SailPoint

Large language models: What’s under the hood? 

SailPoint is always looking for new technologies that can help us improve our products and better serve our customers. One area that has gained significant traction in recent years—and has been attracting mainstream attention in recent months—is Large Language Models (LLMs). In this blog, we’ll delve into LLMs: defining them, pinpointing their technological blind spots, and revealing SailPoint’s plans for enhancing our products. We’ll also touch upon the criticality of our infrastructure choices in delivering top-notch LLMs and pose key questions about the risks of deploying LLMs. 

Large Language Models (LLMs) are advanced artificial intelligence systems that can understand and generate human-like text. They are trained on extensive amounts of data, allowing them to recognize patterns, learn from context, and make predictions. LLMs are used in various applications, such as language translation, chatbots, content creation, and sentiment analysis.  

ChatGPT, an OpenAI product built on the GPT family of LLMs, is estimated to have reached 100 million monthly active users this past January, just two months after its launch, making it the fastest-growing consumer application in history. 

**Everyone** is excited about LLMs, but are they ready for the world’s biggest companies? 

I spent the last month working with a couple of SailPoint’s most talented engineers and our product leadership team to thoroughly investigate the potential impact of LLMs on several core use cases. There are plenty of exciting scenarios where LLMs excel, but we also found a handful of scenarios where LLMs struggle. Below I’ve outlined a few use cases of each.  

Where LLMs struggle 

  1. Unreliable in some scenarios 
  1. Scale poorly to large tasks 
  1. Slower than normal computational processes and expensive  
  1. Not backward/forward compatible 
  1. Security concerns 

While we found clear challenges with LLMs, our team also found a handful of use cases at the intersection of identity security that are well-suited to the strengths of LLMs. These use cases range from the relatively known quantity of using coding assistants to aspirational applications like detecting risk in an organization.  

Where LLMs can be successful  

Search 

Our product usage data shows that search is one of the most used features in SailPoint Identity Security Cloud (our flagship SaaS product). However, becoming proficient in building queries can also have a steep learning curve for customers. Our team investigated using an LLM to convert a human question into an Elasticsearch query, which would then be run on the underlying search datastore. Think of it as a flexible Google search for your identity security data.   

By utilizing in-context learning and providing the index mappings (i.e., the schema) used by search, the LLM produced the correct query when asked: “Find me all identities with accounts on the Active Directory source that are privileged.” Similar techniques can be used to convert the natural language to SQL queries, which could help end users to author new reports in our Access Intelligence Center—SailPoint’s hub for reporting. 
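The in-context-learning step above can be sketched as follows. This is a minimal illustration, not SailPoint's implementation: the index mapping, field names, and prompt wording are all hypothetical, and the actual LLM call and Elasticsearch execution are only indicated in comments.

```python
import json

# Hypothetical index mapping (schema) for an identity search index.
# Field names are illustrative, not SailPoint's actual schema.
INDEX_MAPPING = {
    "identities": {
        "properties": {
            "displayName": {"type": "text"},
            "accounts": {
                "type": "nested",
                "properties": {
                    "source": {"type": "keyword"},
                    "privileged": {"type": "boolean"},
                },
            },
        }
    }
}

def build_query_prompt(question: str) -> str:
    """Assemble an in-context-learning prompt: the index mappings
    (the schema) plus the user's question, asking the model to emit
    only an Elasticsearch query DSL body as JSON."""
    return (
        "You translate questions into Elasticsearch query DSL.\n"
        f"Index mappings:\n{json.dumps(INDEX_MAPPING, indent=2)}\n"
        f"Question: {question}\n"
        "Respond with only the JSON query body."
    )

prompt = build_query_prompt(
    "Find me all identities with accounts on the Active Directory "
    "source that are privileged."
)
# The prompt is then sent to an LLM; the JSON it returns is executed
# against the underlying search datastore, e.g.
#   es.search(index="identities", body=json.loads(llm_response))
```

The key design choice is that the schema travels inside the prompt, so the model can ground field names like `accounts.privileged` without any fine-tuning; the same pattern applies to natural-language-to-SQL for reporting.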

Access descriptions 

Providing our customers with a picture of their access spanning identities, entitlements, and roles is crucial to successful identity governance but is not readily available in our field today.  

SailPoint customers have tens of millions of entitlements (atomic pieces of access), and most lack detailed descriptions of what access they grant to a user. The lack of descriptions makes the job of an identity security admin more complex and opens our customers to security threats. LLMs can generate these descriptions at scale, allowing admins to make more informed access modeling decisions. In early feedback on this feature, human annotators have consistently rated access descriptions generated by LLMs as more informative than their human-authored counterparts.  
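Generating descriptions at scale amounts to mapping a prompt over each entitlement's metadata. The sketch below is illustrative only: `call_llm` is a stand-in for any chat-completion API, and the entitlement fields and stubbed response are hypothetical.

```python
# Sketch of bulk entitlement-description generation. `call_llm` is a
# stand-in for any chat-completion API; fields are illustrative.
def describe_entitlement(name: str, source: str, call_llm) -> str:
    """Prompt the model for a one-sentence, user-facing description
    of what access this entitlement grants."""
    prompt = (
        "Write a one-sentence description of what access this "
        f"entitlement grants.\nEntitlement: {name}\nSource: {source}"
    )
    return call_llm(prompt)

# Example with a stubbed model, so the mapping shape is visible:
fake_llm = lambda prompt: "Grants administrative access."
descriptions = {
    e["name"]: describe_entitlement(e["name"], e["source"], fake_llm)
    for e in [{"name": "Domain Admins", "source": "Active Directory"}]
}
```

In practice the loop would be batched and rate-limited across tens of millions of entitlements, with generated text flagged as machine-authored for admin review.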

What infrastructure challenges lie ahead 

Many companies—both new and established—are working to deploy LLMs. However, there are challenges with upgrading consumer LLMs to enterprise LLMs. Vendors need to ensure any LLM-powered feature is built on the right foundation.  

SailPoint prioritizes data privacy, secure data handling, and security in every aspect of our operations, which is the same lens we used in determining what we value in our infrastructure. We worked with AWS to get preview access to Amazon Bedrock, a fully managed service that makes foundation models from Amazon available through an API, without having to manage any infrastructure. Our engineers tested Bedrock’s capabilities and came away encouraged by what they found. 

Our focus is to positively impact our customers while mitigating risks such as inadvertent disclosure of confidential information, leaking code, or having “hallucinations” that can materially impact a business. 

As the number and types of identities, applications, and unstructured data that must be governed continue to grow, AI is quickly moving from a “nice to have” to a “must have” component of an organization’s identity security strategy. As we delegate more tasks to machines, our customers move closer to autonomy. 

Several examples of low-risk, high-value use cases exist where LLMs can help deliver an autonomous identity security solution. Along with the use cases described above, we also see value in helping with the challenges of preparing for an audit. Rather than exhaustively searching for data and authoring reports in a BI tool, an LLM can generate a first version of audit reports, which can then be refined into the final version, dramatically reducing time and customer resources.  
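The audit flow above is a draft-then-refine pattern: the model produces the first pass, and a human produces the final version. A minimal sketch, with `call_llm` again standing in for any chat-completion API and the evidence format purely hypothetical:

```python
# Sketch of the draft step in a draft-then-refine audit workflow.
# `call_llm` is a stand-in for any chat-completion API.
def draft_audit_report(evidence: list, call_llm) -> str:
    """Ask the model for a first-pass report summarizing access
    review evidence; a human auditor refines the result."""
    prompt = (
        "Draft a first-pass audit report summarizing this access "
        "review evidence:\n"
        + "\n".join(f"- {item}" for item in evidence)
    )
    return call_llm(prompt)

# The returned draft is reviewed and edited by a person before it
# becomes the final audit artifact, keeping a human in the loop.
```

Keeping the model confined to the draft step is what makes this a low-risk use case: any hallucinated detail is caught during human refinement rather than shipped in the final report.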

The LLM-powered use cases we are interested in aren’t just novelties. They will allow our customers to move towards our vision of autonomous identity security. 

What questions should companies ask before deploying LLM products? 

Before deploying any LLM product into your enterprise environment, start by asking the following questions: 

  1. Data provenance: Where is the data originating from, and how is its authenticity verified? Is it somebody’s intellectual property? Does the output contain intellectual property that puts us at legal risk if we use it? 
  1. Data regulation: How does the AI system handle data storage across different regions with varying data privacy laws? Does the deployment infrastructure adhere to international, national, and industry-specific regulatory standards such as GDPR, CCPA, HIPAA, PCI DSS, ISO 27001, and the NIST Cybersecurity Framework? 
  1. Infrastructure: How will the AI system use and secure the data entered into queries? 
  1. PII and confidential data protection: How does the AI system protect against the potential leakage of personally identifiable information and confidential company data? 

As trusted identity security experts, SailPoint’s focus will always be on ensuring the safe and responsible deployment of cutting-edge technologies like LLMs. For more information on SailPoint’s AI efforts, visit the SailPoint website.