Performance Engineering for SaaS Products at SailPoint
Performance Engineering means different things to different people. Some people think “simple JMeter tests” and others “advanced capacity planning” or “predictive analytics”. The Performance Engineering practice at SailPoint spans all of these concepts. Today we will talk about what Performance Engineering means to IdentityNow and to SailPoint. Let’s take some time to explain how we work, what we do, and why it benefits our customers and our end users. Performance Engineering is a necessary discipline for any successful, scalable software product team.
Research We Do
At SailPoint, the focus of Performance Engineering extends beyond the core responsibilities of technical performance evaluation, such as capturing and understanding performance metrics from our software. We gather that information and apply it to the business cases that call into the systems being tested. Our team does the research to find answers to questions like the following:
- How many self-service password resets can an individual customer organization process per minute on a virtual appliance?
- How many self-service password resets can a micro-service address in a given minute?
- How quickly can we generate a Certification Campaign for a given number of managers?
- How responsive is our UI when we have dozens of administrators concurrently running searches?
- Does our UI remain responsive while background batch account data aggregations are processing?
Our deliverables are phrased in terms that apply directly to features driven by Product Management, for example: “Our testing has shown that in our reference configuration we can have approximately 1,700 end users concurrently performing X activity before performance will degrade.” Understanding and interpreting our performance metrics in this way allows us to have meaningful conversations with Product Management, DevOps, and core Software Engineering on whether our findings require an architecture review, call for expanded or contracted infrastructure, or already meet our customers’ expectations for product behavior. Feature-focused questions and reporting help us understand our software and our customers’ needs.
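A statement such as "N concurrent users before performance degrades" can be derived from load-test samples in many ways. The following is a minimal sketch of one of them, not a description of SailPoint's actual tooling. The function name, the choice of 95th percentile latency as the metric, the 1.5x tolerance, and every number in `samples` are assumptions invented for illustration.

```python
def max_users_before_degradation(samples, baseline_ms, tolerance=1.5):
    """Return the highest tested concurrency whose p95 latency stays within
    tolerance * baseline_ms, or None if even the lowest level exceeds it.

    samples: iterable of (concurrent_users, p95_latency_ms) pairs.
    """
    limit_ms = baseline_ms * tolerance
    best = None
    # Walk the load levels from lightest to heaviest and stop at the
    # first level that breaches the latency limit.
    for users, p95_ms in sorted(samples):
        if p95_ms > limit_ms:
            break
        best = users
    return best


# Invented measurements for illustration only:
# (concurrent users, p95 latency in milliseconds).
samples = [(100, 210), (500, 220), (1000, 240),
           (1500, 290), (2000, 650), (2500, 1400)]

knee = max_users_before_degradation(samples, baseline_ms=210)
print(f"Approximately {knee} concurrent users before degradation")
```

The result is only as precise as the spacing between tested load levels, which is one reason such findings are usually reported as approximate.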
Knowing the performance characteristics of our products ahead of time helps us manage the software lifecycle end-to-end. Keeping everyone informed makes lives better, whether they are customers, support staff, services implementers, or on-call engineers handling escalations. The simple knowledge that a service that normally runs at X records per minute is currently running at one tenth of that rate helps us quickly isolate the problem space when there is a support issue.
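The comparison against a known baseline described above can be sketched in a few lines. This is an illustrative example under stated assumptions: the function names, the 0.5 alert threshold, and the baseline figure of 12,000 records per minute are hypothetical and do not come from any SailPoint system.

```python
def throughput_ratio(current_per_min, baseline_per_min):
    """Return current throughput as a fraction of the recorded baseline."""
    if baseline_per_min <= 0:
        raise ValueError("baseline throughput must be positive")
    return current_per_min / baseline_per_min


def is_degraded(current_per_min, baseline_per_min, alert_below=0.5):
    """Flag a service running below a chosen fraction of its baseline."""
    return throughput_ratio(current_per_min, baseline_per_min) < alert_below


# Hypothetical figures: a service benchmarked at 12,000 records per minute
# is currently observed processing 1,200 records per minute.
ratio = throughput_ratio(1_200, 12_000)
print(f"Running at {ratio:.0%} of baseline; degraded: {is_degraded(1_200, 12_000)}")
```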
From answering RFP questions about performance and scalability to helping with support escalations when a customer feels something isn’t working the way it should, having a record of proactive testing to set everyone’s expectations early provides value. Understanding the upper bounds of load and capacity ahead of time, before customers use a feature, allows SailPoint to proactively take action when we see our operational dashboards approaching those limits. It allows us to proactively communicate with customers when we are reaching known limits. It allows customers to reach out to us when spikes of traffic are coming due to changing business needs.
How We Work
Maintaining engineering velocity in cloud engineering teams contributes to success. Technologies such as CI/CD pipelines and automated unit, integration, and end-to-end regression tests help sustain a high rate of delivery from coded feature to production. The keys here are removing toil and minimizing the amount of human action needed after changed code gets pushed, reviewed, and merged. Performance Engineering is seldom explicitly called out in the pantheon of tests that go into automated CD pipelines, which typically includes unit tests, integration tests, end-to-end tests, gold-master tests, and platform tests.
Performance Engineering complements the typical CD process. Because of the size of the data sets involved, performance testing is usually not integrated into the CI/CD pipelines. Smaller-scale performance evaluations, such as collecting the average time to look up 10,000 different records, can remain part of CI/CD, but larger-scale testing efforts, like those involving 7-figure data sets and “soak” tests that check for resource leakage over the long term, need to be handled in a different process. Performance evaluations that run in under ten minutes are appropriate for CI integration; experience has taught us that these are the minority of performance tests. The vast majority of performance tests must live in other processing streams, either running concurrently with CI testing or running independently on their own schedule.
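A small-scale check of the kind that fits inside CI, such as the average time to look up 10,000 records, might look like the sketch below. It is illustrative only: an in-memory dictionary stands in for a real persistence layer, and the function name and the one-millisecond budget are assumptions chosen for the example.

```python
import time


def average_lookup_seconds(lookup, keys):
    """Call lookup(key) once per key and return the mean time per call."""
    start = time.perf_counter()
    for key in keys:
        lookup(key)
    elapsed = time.perf_counter() - start
    return elapsed / len(keys)


# Stand-in data store: 10,000 in-memory records. A real CI check would
# point `lookup` at the service or persistence layer under test.
records = {f"user-{i}": {"id": i} for i in range(10_000)}
keys = list(records)

avg = average_lookup_seconds(records.__getitem__, keys)

# Hypothetical budget: fail the build if the mean lookup exceeds 1 ms.
BUDGET_SECONDS = 0.001
assert avg < BUDGET_SECONDS, f"lookup too slow: {avg:.6f}s per record"
print(f"average lookup: {avg * 1e6:.2f} microseconds")
```

Because the check finishes in well under a second, it can run on every merge without slowing the pipeline. Soak tests and 7-figure data sets do not fit this pattern and belong in a separate process.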
At SailPoint our individual scrum teams own their software stacks from cradle to production. They choose the programming languages, micro-service technologies, persistence layer interfaces and technologies, and overall object model for how their subsystems work. Performance Engineering acts as a reviewing adviser to these teams by quantifying the performance of newly implemented features and providing architectural critiques and improvement suggestions when appropriate. Performance Engineering does not have veto power and cannot override design decisions made by the core engineering groups; this is important because the software engineers maintain ownership of their design choices.
Engineering managers and lead engineers initiate performance evaluations for new features via a request. Performance Engineering most often gets involved when a project is nearing feature completion and entering its initial hands-on QA time. We establish an understanding of the feature’s software mechanics, including REST calls and the infrastructure involved, and define automated test drivers that load the new feature in a reproducible, concurrent, and tunable fashion. Once we have test jigs in place, we iterate on running load cycles. We then take that data and produce formal reports that give product management and core feature engineering feedback on our findings and recommend possible changes based on the collected data.
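A test driver that is reproducible, concurrent, and tunable can be sketched with a thread pool, as below. This is a minimal illustration rather than the team's actual test jig. All names are hypothetical, and `fake_feature_call` simply sleeps in place of the REST call a real driver would make against the feature under test.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(action):
    """Run action() once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def run_load(action, concurrency, total_requests):
    """Issue total_requests calls with at most `concurrency` in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call, action)
                   for _ in range(total_requests)]
        return [future.result() for future in futures]


def summarize(latencies):
    """Reduce raw latencies to the figures a report would quote."""
    ordered = sorted(latencies)
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "count": len(ordered),
        "mean_s": sum(ordered) / len(ordered),
        "p95_s": ordered[p95_index],
    }


def fake_feature_call():
    """Placeholder for the real request to the feature under test."""
    time.sleep(0.001)


# Tunable knobs: raise concurrency and total_requests between load cycles.
results = run_load(fake_feature_call, concurrency=8, total_requests=40)
report = summarize(results)
print(report)
```

Keeping `concurrency` and `total_requests` as explicit parameters is what makes the load tunable between cycles, and holding them fixed is what makes a cycle repeatable.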
Diplomacy is important during this process. Nobody wants to hear that their baby is ugly or slow. Performance Engineering needs to be careful not to practice “Seagull” evaluation: fly in, make a lot of noise, leave negative notes on everything, and then fly away. When we do find an issue, our duty is to communicate it back in a quantified and non-judgmental manner while being as helpful as possible and remembering that we are all on the same team. This allows the engineers who understand the feature best (the ones who wrote it) to use their judgment on what is a time- and technology-efficient solution to any performance issues identified. We, as a team, own the performance of our software products as much as the core engineering groups that produce them and the DevOps teams that help take them into production. It takes calm and polite consensus between the teams to produce scalable, performant products for our SaaS customers.
Quality is everyone’s responsibility, and this includes Performance Engineering. When we see something, we say something. Sometimes we find feature bugs or API calls that need their response payloads tweaked or minimized for public use. Our work sometimes overlaps with other disciplines, such as feature functionality and security compliance, and helps surface issues there, because our work streams happen concurrently with QA’s involvement in feature development.
Like our UX design, at SailPoint the Performance Engineering discipline is engineering-led from the bottom up. That means engineers, not product managers, decide how we test and what infrastructure we build. We share the responsibility with Product Management for ensuring performance evaluation before new features are released to customers and the market. We coordinate feature-centric stand-ups and synchronization between our test engineers, product engineers, DevOps engineers, and product managers, where we review the performance data and deliverables. In a world where “perfect” can be the enemy of “good enough”, we work to find a balance between the expense of back-end infrastructure and the end-user experience. We achieve an understanding of what our infrastructure can support today, what our anticipated needs and load will be, and what steps to take as we turn on the feature in production.
Do challenges like benchmarking to find the limits of software and infrastructure interest you? Do you enjoy writing concurrent, multi-threaded API clients that emulate the behavior of dozens or hundreds of separate users’ browsers interacting with a platform? If these kinds of problems interest you, then reach out to us and consider joining our team.