In the age of digital transformation, data-driven products that provide fast, accurate, and actionable insights dominate the market. In this article, we demonstrate how a Data Science approach yields effective solutions and opens the door to future growth and development.
As an example, we explore the social aspects of identity governance and illustrate how recognizing them allows us to tackle some of the challenging problems in identity governance and administration (IGA) today. For instance, it enables us to identify risky identities as outliers in a particular network graph and provides insights for recommending actions that mitigate risk.
The Data Revolution
The last two decades saw the convergence of several technologies that culminated in the data revolution we experience today. These range from Big Data’s fault-tolerant, distributed systems that run on commodity hardware, e.g., the Apache Hadoop ecosystem, to affordable, high-performance cloud computing options featuring fast, in-memory analytics tools. Together, these technologies paved the way for industry-wide disruption.
These days, harvesting terabytes (1,000 gigabytes) and petabytes (1,000 terabytes) of data on a regular basis has become commonplace. Data sources include users’ online behavior and transactions, as well as sensor data from billions of mobile devices, social media content, and drone and satellite imagery. This deluge of data allows data scientists to apply advanced algorithms and train sophisticated models that, in several cases, surpass human performance.
Enter Data Science
Data Science is a somewhat nebulous term, encompassing disciplines such as Machine Learning and AI. Hardly a day goes by without one of these buzzwords in the news. The boundaries between these disciplines are quite fuzzy; at the end of the day, it boils down to what the business problem dictates.
In broad terms, Data Science techniques allow data scientists to tackle a wide spectrum of real-world problems using approaches that are entirely data-driven. In more technical terms, the underlying challenge is to rephrase the real-world problem as an optimization (minimization or maximization) problem, e.g., minimizing approximation errors. Solving such a problem yields an optimized model that can then be used to deliver insights, make predictions, or perform actions.
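To make the optimization framing concrete, here is a minimal Python sketch, an illustration rather than production code, that fits a straight line to a handful of toy points by minimizing the mean squared approximation error with plain gradient descent. The function names, the data, the learning rate, and the step count are all hypothetical choices made for this example.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y ~ slope * x + intercept by gradient descent on the mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every data point.
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(residual**2) with respect to each parameter.
        grad_slope = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_intercept = 2.0 / n * sum(residuals)
        # Take a small step downhill, i.e., in the direction that reduces the error.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


def mse(xs, ys, slope, intercept):
    """Mean squared approximation error of a fitted line."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.0, 4.0, 5.0]  # toy data, purely illustrative
slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, mse={mse(xs, ys, slope, intercept):.3f}")
```

For this toy data, the exact least-squares solution is a slope of 1.4 and an intercept of 0.9 (mean squared error 0.05), which the loop converges to. Real models replace the line with a richer function and the hand-written gradient with a library optimizer, but the shape of the problem, minimizing an error, stays the same.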
The Bridges of Königsberg
Graph theory is an elegant branch of mathematics. Its origins date back to the 18th century, when the Swiss mathematician Leonhard Euler used a graph structure to solve a tough puzzle. The city of Königsberg, Prussia (now Kaliningrad, Russia) had a network of seven bridges, and the challenge was to find a path through the city that crosses each of the seven bridges exactly once. With an innovative yet straightforward approach, representing each land mass as a vertex and each bridge as an edge, Euler showed that no such path exists: such a path requires all but at most two land masses to be touched by an even number of bridges, yet all four land masses in Königsberg are touched by an odd number. Since then, graphs have become ubiquitous, with applications in mathematics, computer science, telecommunications, genetics, and numerous applied fields.
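Euler’s argument can be checked mechanically. The Python sketch below models the four land masses as vertices (labelled A to D purely for illustration, with A as the central island) and the seven bridges as edges of a multigraph, then applies Euler’s criterion: a connected multigraph has a path that uses every edge exactly once if and only if zero or two vertices have odd degree. The function and variable names are hypothetical.

```python
from collections import Counter

# Königsberg as a multigraph. Vertex labels are illustrative:
# A = central island (Kneiphof), B = north bank, C = south bank, D = eastern land mass.
BRIDGES = [
    ("A", "B"), ("A", "B"),  # two bridges between the island and the north bank
    ("A", "C"), ("A", "C"),  # two bridges between the island and the south bank
    ("A", "D"),              # island to the eastern land mass
    ("B", "D"),              # north bank to the eastern land mass
    ("C", "D"),              # south bank to the eastern land mass
]


def degrees(edges):
    """Count how many bridges touch each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_euler_path(edges):
    """Euler's criterion for a connected multigraph (connectivity is assumed, not checked):
    a walk crossing every edge exactly once exists iff 0 or 2 vertices have odd degree."""
    odd = [v for v, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)


print(dict(degrees(BRIDGES)))  # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_euler_path(BRIDGES))  # False
```

All four vertices have odd degree, so the check fails. Removing any single bridge would leave exactly two odd-degree vertices, and such a path would then become possible.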
In 1996, Larry Page and Sergey Brin jointly developed PageRank, a graph-based algorithm for ranking search-engine results. They went on to found a search-engine start-up that they named “Google”, and the rest is history. More recently, social media companies like Facebook have applied advanced network graph algorithms at a very large scale, measured in billions of vertices and edges, to solve some of their business problems, e.g., to detect communities of friends and recommend new connections.
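The core idea of PageRank is that a page is important if important pages link to it. Below is a minimal power-iteration sketch of the textbook formulation in Python. The damping factor of 0.85, the convergence tolerance, and the four-page link graph are illustrative assumptions, and this is not Google’s production ranking system.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a dict mapping each page to the pages it links to.

    Pages without out-links ("dangling" pages) spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {p for outs in links.values() for p in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page gets the "random jump" share plus its slice of the dangling rank.
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its damped rank evenly among its out-links.
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once the scores have settled
            break
    return rank


# A tiny hypothetical link graph: A links to B and C, B links to C, and so on.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page C, which collects links from the other three pages, ends up with the highest score, while D, which no page links to, keeps only the baseline (1 - 0.85) / 4 = 0.0375.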
The Identity Graph
SailPoint’s newest product, IdentityAI, is intended to modernize identity governance and bring it into the data-driven world of the 21st century. The premise is that, regardless of how tight or well-established a company’s governance policies are, only a data-driven approach can expose the ground truth of how these rules are actually implemented, allowing users to pinpoint inaccuracies, assess compliance, and rectify any existing issues.
One of the first use cases tackled by IdentityAI’s Data Science team was ‘Peer Groups’, which can be briefly described as follows: given a dataset of identities and their entitlements, find an algorithm that clusters these identities into strongly similar groups based only on their entitlements.
Delving deeper into this problem, we had an epiphany! Entitlement assignment has a strong social aspect: ‘Peer’ identities should be assigned strongly similar, if not identical, entitlement combinations. With this realization, we constructed the ‘Identity Graph’, a network graph representation of the entitlement-based similarity structure among identities. In this graph, vertices represent identities, while weighted edges signify how similar the connected identities are, based on the entitlements they hold. We then use an optimal, graph-based community-detection algorithm to identify the peer groups. This approach has numerous advantages; we summarize a few:
• It is a faithful representation of the similarity structure in an organization. No modeling, no corner cutting, no omissions or extrapolations.
• It yields a physical data model, the graph itself, which can be used for several use cases, e.g., outlier detection. Outliers are candidates for risky entitlement assignments, and the graph exposes these identities through their abnormal configurations. In some cases, it even allows us to propose actions that correct the abnormalities, thus mitigating the risk associated with these identities.
• It opens the door for several other graph-based use cases, e.g., visualization of peer groups, data mining, and querying.
• It can provide augmented signals that boost the accuracy of advanced machine learning models currently under development for upcoming use cases.
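The Peer Groups pipeline described above can be sketched in a few lines of Python. This is not IdentityAI’s implementation: the article does not name its similarity measure or its community-detection algorithm, so the sketch assumes Jaccard similarity between entitlement sets as the edge weight, a hypothetical cutoff of 0.3 below which edges are dropped, and simple weighted label propagation as a stand-in for community detection. Identities left without any sufficiently similar peer are reported as outliers. All identity and entitlement names are made up.

```python
import math
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity: shared entitlements divided by all entitlements of the pair."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_identity_graph(entitlements, min_similarity=0.3):
    """Weighted adjacency dict: identity -> {neighbor: similarity}.
    Edges below min_similarity (a hypothetical cutoff) are dropped."""
    graph = {ident: {} for ident in entitlements}
    for u, v in combinations(sorted(entitlements), 2):
        w = jaccard(entitlements[u], entitlements[v])
        if w >= min_similarity:
            graph[u][v] = w
            graph[v][u] = w
    return graph


def label_propagation(graph, max_iter=100):
    """Deterministic weighted label propagation, a stand-in community-detection method."""
    labels = {node: node for node in graph}  # every identity starts in its own group
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            # Total edge weight pointing at each label among this node's neighbors.
            scores = {}
            for nbr, w in graph[node].items():
                scores[labels[nbr]] = scores.get(labels[nbr], 0.0) + w
            if not scores:
                continue  # an isolated vertex keeps its own label
            best = max(scores.values())
            tied = sorted(lbl for lbl, s in scores.items() if math.isclose(s, best))
            # Keep the current label on a tie; otherwise adopt the smallest best label.
            new_label = labels[node] if labels[node] in tied else tied[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def peer_groups(labels):
    """Group identities that share a label."""
    groups = {}
    for node, label in labels.items():
        groups.setdefault(label, []).append(node)
    return sorted(sorted(g) for g in groups.values())


def outliers(graph):
    """Identities with no sufficiently similar peer, i.e., isolated vertices."""
    return sorted(node for node, nbrs in graph.items() if not nbrs)


identities = {
    "alice": {"crm_read", "crm_write", "email"},
    "bob": {"crm_read", "crm_write", "email"},
    "carol": {"crm_read", "email"},
    "dave": {"git", "ci_deploy", "email"},
    "erin": {"git", "ci_deploy", "email"},
    "frank": {"git", "email"},
    "mallory": {"payroll_admin", "db_root", "email"},
}
graph = build_identity_graph(identities)
print(peer_groups(label_propagation(graph)))
print(outliers(graph))
```

On this toy data, the CRM users and the engineers form two peer groups, even though a weak edge links carol and frank, while mallory, whose payroll and database-admin entitlements resemble no one else’s, ends up isolated in the graph and is flagged as an outlier. Label propagation was chosen here only because it is short to write; modularity-based methods such as Louvain are a common alternative.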
In summary, we live in very exciting times! If history is any guide, data science will allow us to revolutionize IGA and drive growth in a sector that stands to benefit significantly from a data-driven approach. With recent advances made by the Data Science & Engineering teams, we look forward to making IdentityAI a leading product on the market. As we make progress in data acquisition and harvesting techniques to improve data quality and quantity, we can safely say that for IdentityAI’s future, the sky is the limit.