Create, Communicate, Inspire, Repeat: From Data-Driven PoCs to AI innovations

The abundance of data has made it increasingly challenging for IAM professionals to keep a bird’s-eye view of the underlying risks in fast-growing, dynamic enterprise systems. These days, companies are turning to machine learning in hopes of gaining meaningful insights from large-scale, high-dimensional data. The impact of data science work, however, depends on how well others can relate to its findings and act on them. At SailPoint, data science (DS) work goes beyond mere analyses and report-style presentations. Our team goes the extra mile by building high-quality, interactive web applications featuring state-of-the-art data visualizations. This allows the data science team to act rapidly on data-driven ideas and swiftly gauge their value through fast prototyping of real applications. These early prototypes have not only led to new AI innovations, but they’ve also become a crucial part of our communication channel across teams. They have since increased our ability to make machine learning more accessible and to bring non-practitioners into the conversation.

Agile approach to innovation

The crux of our innovation process is our DS rapid development pipeline, which we solidified back in 2018 after multiple successful proof-of-concept use cases. It starts with an idea for an existing challenge and a graph database that accurately represents the social aspect of identity management. We test feasibility by converting the problem into graph database queries; sometimes these queries produce new features that can be developed and put to use within a day. Oftentimes the query results become part of a machine learning solution that can be developed into highly functional features within a few weeks. The magic happens at the end of the pipeline, when we showcase our prototypes. They allow us to collect valuable feedback that inspires our data science team to develop new, impactful solutions.
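To make the "convert the problem into a graph query" step a little more concrete, here is a minimal sketch in Python. It is not our production code and does not use a real graph database: the access graph is a small in-memory stand-in, and the sample identities, the entitlement names, the `peer_pairs` helper, and the 0.5 similarity threshold are all hypothetical choices made for illustration. It asks one simple question of the graph: which identities share enough entitlements to be treated as peers?

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: (identity, entitlement) edges of an access graph.
ACCESS_EDGES = [
    ("alice", "git_read"), ("alice", "git_write"), ("alice", "jira"),
    ("bob", "git_read"), ("bob", "git_write"), ("bob", "jira"),
    ("carol", "payroll"), ("carol", "jira"),
    ("dave", "payroll"), ("dave", "ledger"),
]


def build_graph(edges):
    """Map each identity to the set of entitlements it holds."""
    graph = defaultdict(set)
    for identity, entitlement in edges:
        graph[identity].add(entitlement)
    return dict(graph)


def jaccard(left, right):
    """Share of entitlements two identities have in common (0.0 to 1.0)."""
    union = left | right
    return len(left & right) / len(union) if union else 0.0


def peer_pairs(graph, threshold=0.5):
    """Return (identity, identity, score) for pairs at or above the threshold."""
    pairs = []
    for left, right in combinations(sorted(graph), 2):
        score = jaccard(graph[left], graph[right])
        if score >= threshold:
            pairs.append((left, right, round(score, 2)))
    return pairs


graph = build_graph(ACCESS_EDGES)
peers = peer_pairs(graph)  # alice and bob hold identical entitlements
```

In a real pipeline the same question would be phrased in the query language of the graph database, and the resulting pair scores could feed a downstream model as features. Lowering the threshold widens the peer groups, which is the kind of knob a prototype lets people explore interactively.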

At the center of it all is our DS Proof-of-Concept Platform, developed using the open-source Shiny ecosystem in R. The R package shinydashboard provides an elegant and powerful way to quickly spin up high-quality web applications. This ready-to-use dashboard gives data scientists more time to work on the use cases at hand without having to dive deep into HTML, CSS, and JavaScript. Perhaps the most powerful aspect of Shiny is its support for D3 data visualizations. Not only do multi-functional, interactive visualizations add an impressive oomph to an otherwise passive analysis, but they also enable users to actively engage with their data. Part of what makes interactive visualizations a powerful tool for any data science team is that they enable the team to share its work with a wider audience. We deploy prototypes at enterprise scale using ShinyProxy on our PoC platform. The open-source ShinyProxy has built-in functionality for LDAP authentication and authorization, makes securing user traffic (over TLS) a breeze, and places no limits on concurrent usage of the app.

Fast and furious by design

In the world of big data and ML-driven software, communicating the findings is as important as the findings themselves. High-quality, interactive visualizations can act as a communication tool to emphasize certain aspects of a use case and remove ambiguity or confusion around data science projects. Take identity access management, for example: it’s hard to ignore the underlying network structure and social properties present in identity-based governance. A highly effective way to convey these deeper aspects of our graph-based approach is to visualize the identity social networks for particular use cases, e.g. peer grouping and role mining.

(Example in Shiny v2.0: role-mining use case, step 1)

(Example in Shiny v2.0: role-mining use case, step 2)

(Example in Shiny v2.0: role-mining use case, step 3)

In recent discussions, we’ve witnessed how our prototypes empower customers and SailPoint crew members with varying levels of ML background. The prototypes remove the need to understand the algorithms before interacting with the data and engaging in high-level technical discussions. Presenting our work in the form of an interactive dashboard allowed us to introduce some of our most innovative use cases, such as outlier identity detection and role mining.

(Example in Shiny v1.0 — Structure outliers. Risky outlier identities highlighted in yellow, orange, and red.)

These prototypes greatly increased the relevance and visibility of data science inventions. More often than not, they lead to thoughtful and passionate discussions with customers who end up wanting more functionality. This non-technical feedback is an invaluable resource for driving innovation and fine-tuning our approach. It is the key to measuring the impact of data science in our product. Experimentation and failure are inevitable, as data science sits upstream in the product pipeline. Our work involves thinking ahead without being restricted by specific engineering practices or protocols. Experimenting cheaply allows us to pivot quickly before minor issues grow into larger problems that could be costly for engineering down the road. Prototyping early gives us the opportunity to identify and rectify issues in the overall vision and helps guide business decisions in the right direction. In other words, the ability to move fast helps data science innovate better.

Gearing up for high fives

Data science development is a highly iterative process, and it is best to conclude each iteration by closing the feedback loop with customers while working alongside domain experts. The DS rapid development pipeline not only allows the data science team to explore ideas and quickly ship valuable use cases to production, but also helps avoid pitfalls of agile software development. Our working prototypes demonstrate what machine learning is capable of and how we can leverage it to solve customers’ problems. The data science team helps paint the bigger picture and inspire data-driven visions. At the same time, our work in the early stages of productization paves the way for the team’s success in tackling high-stakes problems early on. This type of work has inspired innovation across teams and created a data-driven culture within the organization. More than anything else, we’ve successfully initiated conversations that sparked some of the most insightful ideas with professional and non-technical teams alike. This is the secret sauce behind how our data science team has led the way for data-driven inventions and driven cutting-edge AI innovation.

