SailPoint

Large language models: What’s under the hood? 

SailPoint is always looking for new technologies that can help us improve our products and better serve our customers. One area that has gained significant traction in recent years—and has been attracting mainstream attention in recent months—is Large Language Models (LLMs). In this blog, we’ll delve into LLMs: defining them, pinpointing their technological blind spots, and revealing SailPoint’s plans for enhancing our products. We’ll also touch upon the criticality of our infrastructure choices in delivering top-notch LLMs and pose key questions about the risks of deploying LLMs. 

Large Language Models (LLMs) are advanced artificial intelligence systems that can understand and generate human-like text. They are trained on extensive amounts of data, allowing them to recognize patterns, learn from context, and make predictions. LLMs are used in various applications, such as language translation, chatbots, content creation, and sentiment analysis.  

ChatGPT, an OpenAI product built on the GPT family of LLMs, is estimated to have reached 100 million monthly active users this past January, just two months after its launch, making it the fastest-growing consumer application in history. 

**Everyone** is excited about LLMs, but are they ready for the world’s biggest companies? 

I spent the last month working with a couple of SailPoint’s most talented engineers and our product leadership team to thoroughly investigate the potential impact of LLMs on several core use cases. There are plenty of exciting scenarios where LLMs excel, but we also found a handful of scenarios where LLMs struggle. Below I’ve outlined a few use cases of each.  

Where LLMs struggle 

  1. Unreliable in some scenarios 
  1. Scale poorly to large tasks 
  1. Slower than normal computational processes and expensive  
  1. Not backward/forward compatible 
  1. Security concerns 

While we found clear challenges with LLMs, our team also found a handful of use cases at the intersection of identity security that are well-suited to the strengths of LLMs. These use cases range from the relatively known quantity of using coding assistants to aspirational applications like detecting risk in an organization.  

Where LLMs can be successful  

Search 

Our product usage data shows that search is one of the most used features in SailPoint Identity Security Cloud (our flagship SaaS product). However, becoming proficient in building queries can also have a steep learning curve for customers. Our team investigated using an LLM to convert a human question into an Elasticsearch query, which would then be run on the underlying search datastore. Think of it as a flexible Google search for your identity security data.   

By utilizing in-context learning and providing the index mappings (i.e., the schema) used by search, the LLM produced the correct query when asked: “Find me all identities with accounts on the Active Directory source that are privileged.” Similar techniques can be used to convert the natural language to SQL queries, which could help end users to author new reports in our Access Intelligence Center—SailPoint’s hub for reporting. 
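The in-context-learning step above can be sketched as follows. This is a minimal illustration, not SailPoint's implementation: the index mapping, field names, and prompt wording are all hypothetical, and the actual LLM call and Elasticsearch execution are only indicated in comments.

```python
import json

# Hypothetical index mapping (schema) for an identity search index.
# Field names are illustrative, not SailPoint's actual schema.
INDEX_MAPPING = {
    "identities": {
        "properties": {
            "displayName": {"type": "text"},
            "accounts": {
                "type": "nested",
                "properties": {
                    "source": {"type": "keyword"},
                    "privileged": {"type": "boolean"},
                },
            },
        }
    }
}

def build_query_prompt(question: str) -> str:
    """Assemble an in-context-learning prompt: the index mappings
    (the schema) plus the user's question, asking the model to emit
    only an Elasticsearch query DSL body as JSON."""
    return (
        "You translate questions into Elasticsearch query DSL.\n"
        f"Index mappings:\n{json.dumps(INDEX_MAPPING, indent=2)}\n"
        f"Question: {question}\n"
        "Respond with only the JSON query body."
    )

prompt = build_query_prompt(
    "Find me all identities with accounts on the Active Directory "
    "source that are privileged."
)
# The prompt is then sent to an LLM; the JSON it returns is executed
# against the underlying search datastore, e.g.
#   es.search(index="identities", body=json.loads(llm_response))
```

The key design choice is that the schema travels inside the prompt, so the model can ground field names like `accounts.privileged` without any fine-tuning; the same pattern applies to natural-language-to-SQL for reporting.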

Access descriptions 

Providing our customers with a picture of their access spanning identities, entitlements, and roles is crucial to successful identity governance but is not readily available in our field today.  

SailPoint customers have tens of millions of entitlements (atomic pieces of access), and most lack detailed descriptions of what access they grant to a user. The lack of descriptions makes the job of an identity security admin more complex and opens our customers to security threats. LLMs can generate these descriptions at scale, allowing admins to make more informed access modeling decisions. In early feedback on this feature, human annotators have consistently rated access descriptions generated by LLMs as more informative than their human-authored counterparts.  
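Generating descriptions at scale amounts to mapping a prompt over each entitlement's metadata. The sketch below is illustrative only: `call_llm` is a stand-in for any chat-completion API, and the entitlement fields and stubbed response are hypothetical.

```python
# Sketch of bulk entitlement-description generation. `call_llm` is a
# stand-in for any chat-completion API; fields are illustrative.
def describe_entitlement(name: str, source: str, call_llm) -> str:
    """Prompt the model for a one-sentence, user-facing description
    of what access this entitlement grants."""
    prompt = (
        "Write a one-sentence description of what access this "
        f"entitlement grants.\nEntitlement: {name}\nSource: {source}"
    )
    return call_llm(prompt)

# Example with a stubbed model, so the mapping shape is visible:
fake_llm = lambda prompt: "Grants administrative access."
descriptions = {
    e["name"]: describe_entitlement(e["name"], e["source"], fake_llm)
    for e in [{"name": "Domain Admins", "source": "Active Directory"}]
}
```

In practice the loop would be batched and rate-limited across tens of millions of entitlements, with generated text flagged as machine-authored for admin review.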

What infrastructure challenges lie ahead 

Many companies—both new and established—are working to deploy LLMs. However, there are challenges with upgrading consumer LLMs to enterprise LLMs. Vendors need to ensure any LLM-powered feature is built on the right foundation.  

SailPoint prioritizes data privacy, secure data handling, and security in every aspect of our operations, which is the same lens we used in determining what we value in our infrastructure. We worked with AWS to get preview access to Amazon Bedrock, a fully managed service that makes foundation models from Amazon available through an API, without having to manage any infrastructure. Our engineers tested Bedrock’s capabilities and came away encouraged by what they found. 

Our focus is to positively impact our customers while mitigating risks such as inadvertent disclosure of confidential information, leaking code, or having “hallucinations” that can materially impact a business. 

As the number and types of identities, applications, and unstructured data that must be governed continue to grow, AI is quickly moving from a “nice to have” to a “must have” component of an organization’s identity security strategy. As we delegate more tasks to machines, our customers move closer to autonomy. 

Several examples of low-risk, high-value use cases exist where LLMs can help deliver an autonomous identity security solution. Along with the use cases described above, we also see value in helping with the challenges of preparing for an audit. Rather than exhaustively searching for data and authoring reports in a BI tool, an LLM can generate a first version of audit reports, which can then be refined into the final version, dramatically reducing time and customer resources.  
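The audit flow above is a draft-then-refine pattern: the model produces the first pass, and a human produces the final version. A minimal sketch, with `call_llm` again standing in for any chat-completion API and the evidence format purely hypothetical:

```python
# Sketch of the draft step in a draft-then-refine audit workflow.
# `call_llm` is a stand-in for any chat-completion API.
def draft_audit_report(evidence: list, call_llm) -> str:
    """Ask the model for a first-pass report summarizing access
    review evidence; a human auditor refines the result."""
    prompt = (
        "Draft a first-pass audit report summarizing this access "
        "review evidence:\n"
        + "\n".join(f"- {item}" for item in evidence)
    )
    return call_llm(prompt)

# The returned draft is reviewed and edited by a person before it
# becomes the final audit artifact, keeping a human in the loop.
```

Keeping the model confined to the draft step is what makes this a low-risk use case: any hallucinated detail is caught during human refinement rather than shipped in the final report.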

The LLM-powered use cases we are interested in aren’t just novelties. They will allow our customers to move towards our vision of autonomous identity security. 

What questions should companies ask before deploying LLM products? 

Before deploying any LLM product into your enterprise environment, start by asking the following questions: 

  1. Data provenance: Where is the data originating from, and how is its authenticity verified? Is it somebody’s intellectual property? Does the output contain intellectual property that puts us at legal risk if we use it? 
  1. Data regulation: How does the AI system handle data storage across different regions with varying data privacy laws? Does the deployment infrastructure adhere to international, national, and industry-specific regulatory standards such as GDPR, CCPA, HIPAA, PCI DSS, ISO 27001, and the NIST Cybersecurity Framework? 
  1. Infrastructure: How will the AI system use and secure the data entered into queries? 
  1. PII and confidential data protection: How does the AI system protect against the potential leakage of personally identifiable information and confidential company data? 

As trusted identity security experts, SailPoint’s focus will always be on ensuring the safe and responsible deployment of cutting-edge technologies like LLMs. For more information on SailPoint’s AI efforts, visit the SailPoint website.