If Mom Taught Me Anything … It’s That You Need a Feature Store

“A problem well stated is a problem half-solved.” – Charles Kettering 

It is no secret that data fuels digital progress. The key is to use that data effectively, and SailPoint is no exception. We use data to inform our AI/ML platforms, and we have built a Feature Store as their foundation. So what is a Feature Store?

Features are everything when it comes to quality Machine Learning Engineering. Get them wrong and the quality of your ML model will suffer. Most likely, you will also need a lot of features before you can see which ones carry the strongest signal. Anything you can measure is a potential feature. As a result, feature management can become overwhelming over time.

But what are humans good at?! Right, we are good at creating useful tools so we can focus on solving our main business objectives. This is where the Feature Store concept comes in handy.

We would like to share how we conceptualized and created our solution with the community of data geeks. 

What are we trying to solve? 

We want to provide recommendations for the numerous access requests and certification campaigns our customers handle, with system-level guarantees that each recommendation is correct, consistent, trackable, and, most importantly, trusted.

How we did it. 

First step: we looked around and tried to see if this problem had an existing solution. It turned out that data geeks had already figured out an elegant design.  

Meet Feast and Michelangelo! These two projects enlightened and inspired us with their elegance. After all, you want to learn from the best, don’t you? But… there was a minor road bump. Feast is awesome. It is open source and well designed, but it uses Google Cloud Platform and we are an AWS shop. Michelangelo is a proprietary platform by Uber. Inspired by these two projects and driven by a vision of Machine Learning as a Service (MLaaS), we were committed to having a solid foundation for our future platform. We started working on our own version of the Feature Store that would fit into our existing infrastructure without boiling the ocean.

Meet the SailPoint Feature Store. 

Figure 1

Key design definitions:

  • Entity – the domain object in question. Think of it as a primary key that never changes.
  • Feature – a numerical or categorical property of an entity of a certain type, defined by the feature producer.
  • Feature Group – a logical grouping of Features that have the same Entity.
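
To make these definitions concrete, here is a minimal sketch of how they could be modeled. It is not SailPoint's actual implementation; the class layout and the example names (`identity`, `peer_overlap`, `department`, `access_profile`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Entity:
    """A domain object; `key` acts as a primary key that never changes."""
    entity_type: str  # hypothetical type name, e.g. "identity"
    key: str


@dataclass(frozen=True)
class Feature:
    """A numerical or categorical property of one entity type."""
    name: str
    kind: str          # "numerical" or "categorical"
    entity_type: str

    def __post_init__(self):
        if self.kind not in ("numerical", "categorical"):
            raise ValueError(f"unsupported feature kind: {self.kind}")


@dataclass(frozen=True)
class FeatureGroup:
    """A logical grouping of features that share the same entity type."""
    name: str
    entity_type: str
    features: Tuple[Feature, ...]

    def __post_init__(self):
        # Every feature in the group must describe the group's entity type.
        for feature in self.features:
            if feature.entity_type != self.entity_type:
                raise ValueError(
                    f"{feature.name} belongs to {feature.entity_type}, "
                    f"not {self.entity_type}"
                )


# Hypothetical example values, for illustration only.
alice = Entity(entity_type="identity", key="id-001")
peer_overlap = Feature("peer_overlap", "numerical", "identity")
department = Feature("department", "categorical", "identity")
access_profile = FeatureGroup(
    "access_profile", "identity", (peer_overlap, department)
)
```

Freezing the dataclasses mirrors the idea that an entity key never changes, and the group-level check rejects a feature that was defined for a different entity type.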

Feature Store functionality: 

Ingestion: Feature values need to be ingested at scale and remain immutable.  Feature producers control the logic around what makes up a feature. The serving store schema is automatically updated when producers register a new feature group or add features to existing groups.
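
The ingestion rules can be sketched with a toy in-memory store. This is an illustration under assumptions, not the real service: `IngestionStore`, its method names, and the sample data are all hypothetical, and a production store would sit on a database rather than Python dictionaries.

```python
class IngestionStore:
    """Toy in-memory store; every name here is hypothetical."""

    def __init__(self):
        self.schema = {}   # group name -> set of registered feature names
        self.values = {}   # (group, entity key, feature, stamp) -> value

    def register(self, group, feature_names):
        """Register a new group or add features to an existing one.

        The serving schema is updated as a side effect of registration.
        """
        self.schema.setdefault(group, set()).update(feature_names)

    def ingest(self, group, entity_key, feature, stamp, value):
        """Write one feature value; existing values are never overwritten."""
        if feature not in self.schema.get(group, set()):
            raise KeyError(f"{group}.{feature} is not registered")
        slot = (group, entity_key, feature, stamp)
        if slot in self.values:
            raise ValueError("feature values are immutable once written")
        self.values[slot] = value


store = IngestionStore()
store.register("access_profile", ["peer_overlap"])
store.register("access_profile", ["department"])  # extends the existing group
store.ingest("access_profile", "id-001", "peer_overlap", 1, 0.82)
```

Writing the same slot twice raises an error instead of silently replacing the value, which is one simple way to express immutability.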

Versioning: Every feature value is stamped with an identifying timestamp and needs to be activated before serving.  If the semantics of a feature change, a new version of that feature is created.  Activating features and feature groups is atomic to avoid dirty and inconsistent reads.
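
One common way to get atomic activation is to stage a complete batch under its stamp and then flip a single "active" pointer. The sketch below assumes that approach; the post does not say how SailPoint implements it, and `VersionedGroup` and its methods are hypothetical names.

```python
import threading


class VersionedGroup:
    """Toy versioned feature group: stage a batch, then activate it atomically."""

    def __init__(self):
        self._batches = {}   # stamp -> {entity key: {feature name: value}}
        self._active = None  # stamp of the batch currently being served
        self._lock = threading.Lock()

    def stage(self, stamp, rows):
        """Store a full batch under its stamp without exposing it to readers."""
        with self._lock:
            if stamp in self._batches:
                raise ValueError(f"stamp {stamp} was already ingested")
            self._batches[stamp] = dict(rows)

    def activate(self, stamp):
        """Switch readers to a staged batch in one step."""
        with self._lock:
            if stamp not in self._batches:
                raise KeyError(f"stamp {stamp} has not been staged")
            self._active = stamp

    def read(self, entity_key):
        """Return the active row for an entity, or None if nothing is active."""
        with self._lock:
            if self._active is None:
                return None
            return self._batches[self._active].get(entity_key)


group = VersionedGroup()
group.stage(1, {"id-001": {"peer_overlap": 0.82}})
group.activate(1)
group.stage(2, {"id-001": {"peer_overlap": 0.91}})  # staged, not yet visible
```

Until `activate(2)` is called, readers keep seeing the values from stamp 1, so a half-loaded batch is never served.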

Discovery: Feature metadata is maintained to describe the features and the schemas of the entities and feature groups.  This metadata is queryable so that consumers can learn which features are available.
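
As a rough illustration of what a discovery query could look like, here is a sketch over a small metadata catalog. The catalog fields, the `find_features` helper, and every entry in it are made up for this example.

```python
# Hypothetical metadata catalog: one record per registered feature.
CATALOG = [
    {"group": "access_profile", "entity_type": "identity",
     "feature": "peer_overlap", "kind": "numerical",
     "description": "Share of peers holding the same access"},
    {"group": "access_profile", "entity_type": "identity",
     "feature": "department", "kind": "categorical",
     "description": "Department of the identity"},
    {"group": "role_usage", "entity_type": "role",
     "feature": "member_count", "kind": "numerical",
     "description": "Number of identities assigned to the role"},
]


def find_features(catalog, **filters):
    """Return the catalog records whose fields match every given filter."""
    return [
        record for record in catalog
        if all(record.get(field) == wanted for field, wanted in filters.items())
    ]


identity_features = find_features(CATALOG, entity_type="identity")
numerical_identity = find_features(
    CATALOG, entity_type="identity", kind="numerical"
)
```

A consumer can narrow the search by any combination of metadata fields without knowing anything about how the values are stored.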

Serving: Consumers can retrieve feature values at scale and in bulk. 
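
Bulk retrieval can be sketched as a generator that yields feature values in batches rather than one entity at a time. The function name, the batch size, and the sample rows below are assumptions for illustration only.

```python
from itertools import islice


def serve_in_batches(active_rows, entity_keys, feature_names, batch_size=2):
    """Yield {entity key: {feature name: value}} dictionaries, one per batch.

    Missing entities or features are reported as None instead of raising.
    """
    keys = iter(entity_keys)
    while True:
        chunk = list(islice(keys, batch_size))
        if not chunk:
            return
        yield {
            key: {
                name: active_rows.get(key, {}).get(name)
                for name in feature_names
            }
            for key in chunk
        }


# Hypothetical active feature values.
ROWS = {
    "id-001": {"peer_overlap": 0.82, "department": "engineering"},
    "id-002": {"peer_overlap": 0.40, "department": "finance"},
    "id-003": {"peer_overlap": 0.65, "department": "sales"},
}

batches = list(
    serve_in_batches(ROWS, ["id-001", "id-002", "id-003", "id-404"],
                     ["peer_overlap"])
)
```

Because the generator is lazy, a consumer can stream through a very large key list without holding every result in memory at once.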

SaaS-first: The Feature Store was developed SaaS-first. It handles multi-tenancy and allows some configuration to be shared across tenants.
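
Shared configuration with per-tenant overrides can be sketched with Python's standard `collections.ChainMap`. The setting names and tenant IDs below are invented, and the post does not describe how SailPoint actually stores tenant configuration.

```python
from collections import ChainMap

# Hypothetical settings shared by every tenant.
SHARED_DEFAULTS = {"score_threshold": 0.5, "max_features": 50}

# Hypothetical per-tenant overrides; an empty dict means "use the defaults".
TENANT_OVERRIDES = {
    "tenant-a": {"score_threshold": 0.7},
    "tenant-b": {},
}


def config_for(tenant_id):
    """Return a tenant's effective configuration, overrides first."""
    if tenant_id not in TENANT_OVERRIDES:
        raise KeyError(f"unknown tenant: {tenant_id}")
    return ChainMap(TENANT_OVERRIDES[tenant_id], SHARED_DEFAULTS)


def tenant_key(tenant_id, entity_key):
    """Scope an entity key by tenant so tenants can never collide."""
    return (tenant_id, entity_key)


config_a = config_for("tenant-a")
config_b = config_for("tenant-b")
```

`ChainMap` looks up each setting in the tenant's overrides first and falls back to the shared defaults, so a tenant only needs to store what differs.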

How we use it. 

Our first production functionality to use the Feature Store was the IdentityNow recommendation engine (Figure 1). The Feature Store enabled us to implement a recommendation scoring algorithm powered by simple, performant data producers, both of which are driven by each customer’s configuration. It also gives our engineering team a single source of truth for reliable score computation and troubleshooting.

The system described above is a first step in our journey toward better ML data practices at SailPoint. It may well be superseded by a more advanced solution eventually, especially in a field that evolves as fast as AI/ML. But the key definitions and abstractions of a system like the Feature Store are unlikely to change much, and building on them should let us switch to a different solution later without major headaches.

