Intuitiveness in Exception and Error Handling in Connectors

Connector projects are complex, and exceptions are bound to happen. We need to be able to catch, categorize, and handle exceptions so that the SailPoint identity platform is never left in an inconsistent state. At SailPoint, we rigorously review preproduction and postproduction errors and use what we learn to improve our error-handling techniques, so that runtime errors are either resolved or have their impact minimized through reasonable countermeasures.

For background, a connector is the software that connects the SailPoint identity platform to an external system in order to load data from it and provision changes to it. An application is an instance of a connector in the SailPoint identity platform. Applications have a default exception strategy that sends an exception notification to the activity monitor and logs the exception. The default exception strategy itself never performs any rollback, commit, or consume activities; it leaves these tasks to the upper layers of the product so that they are managed consistently across all application types.

Application Reconnection Strategy

The application reconnection strategy specifies how a connector behaves when its connection fails. A reconnection strategy is also referred to as a “Retry Policy” because the underlying identity platform components provide a generic framework for retrying the same provisioning operation. A retry policy can override the default exception strategy and controls how reconnects are attempted through a number of criteria: the type of exception, the error message, the frequency of reconnection attempts, and the maximum number of retries allowed. With a reconnection strategy, you can better control the behaviour of a failed connection by configuring it, for example, to reattempt the connection only once every 15 minutes and to stop trying to reconnect after 10 attempts.
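As a rough illustration of the four criteria above, a retry policy might be sketched as follows. Everything here (`RetrySketch`, `RetryPolicy`, `execute`, and the local `ConnectionFailedException` stand-in) is a hypothetical name chosen for this example and is not the SailPoint API; the real framework is configured on the application rather than hand-coded like this.

```java
import java.time.Duration;

final class RetrySketch {
    /** Hypothetical stand-in for a connection-failure exception. */
    static class ConnectionFailedException extends RuntimeException {
        ConnectionFailedException(String message) { super(message); }
    }

    /** Hypothetical policy holding the four criteria: type, message, frequency, maximum retries. */
    static final class RetryPolicy {
        private final Class<? extends RuntimeException> exceptionType;
        private final String messageFragment;
        private final Duration interval;
        private final int maxRetries;

        RetryPolicy(Class<? extends RuntimeException> exceptionType, String messageFragment,
                    Duration interval, int maxRetries) {
            this.exceptionType = exceptionType;
            this.messageFragment = messageFragment;
            this.interval = interval;
            this.maxRetries = maxRetries;
        }

        /** Retry only errors of the configured type and message, and only while retries remain. */
        boolean shouldRetry(RuntimeException error, int retriesSoFar) {
            String message = error.getMessage();
            return retriesSoFar < maxRetries
                    && exceptionType.isInstance(error)
                    && message != null
                    && message.contains(messageFragment);
        }
    }

    /** Runs the operation, retrying as the policy allows; returns the number of retries used. */
    static int execute(Runnable operation, RetryPolicy policy) {
        int retries = 0;
        while (true) {
            try {
                operation.run();
                return retries;
            } catch (RuntimeException error) {
                if (!policy.shouldRetry(error, retries)) {
                    throw error; // not retryable, or retries exhausted
                }
                retries++;
                sleep(policy.interval);
            }
        }
    }

    private static void sleep(Duration interval) {
        try {
            Thread.sleep(interval.toMillis());
        } catch (InterruptedException interrupted) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting to retry", interrupted);
        }
    }

    public static void main(String[] args) {
        // Zero interval so the demo ends at once; the example in the text
        // would correspond to Duration.ofMinutes(15) with 10 retries.
        RetryPolicy policy = new RetryPolicy(
                ConnectionFailedException.class, "timed out", Duration.ZERO, 10);
        int[] calls = {0};
        int retries = execute(() -> {
            if (++calls[0] < 3) {
                throw new ConnectionFailedException("read timed out");
            }
        }, policy);
        System.out.println("succeeded after " + retries + " retries");
    }
}
```

In this sketch an exception of a different type, or one whose message does not match, is rethrown immediately, which mirrors the idea that only selected failures are worth a reconnection attempt.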

Categorizing Connector Exceptions

Exceptions can happen at any point within a SailPoint flow, and it is important that we handle them appropriately. How we handle an exception depends on the transport, the message exchange pattern, and the type of exception raised. We identify all of the possible exceptions and categorize them so that we have a consistent way of handling them across all connectors.

Why is it important to be able to categorize exceptions? Because not all exceptions are the same, and different kinds call for different handling.

At a very high level, we classify exceptions as either “Business” or “Non-Business” exceptions. Business exceptions are typically due to incorrect data or an incorrect process flow. A business exception is typically non-retryable, so it would not make sense to configure a retry for it. Instead, processing should stop immediately, and the exception should be sent back as a response or routed as a message to a logger queue, with a notification sent to operations.

If an exception's name does not reveal its actual cause, it is difficult to act on it proactively. To improve readability and allow more specific exception handling, we have introduced exception categories in the underlying components, with clear and meaningful names that state the cause of the exception. Successful exception handling starts with understanding what kinds of exceptions can occur in a connector. For example, errors caused by a faulty application configuration are resolved by correcting that configuration. So if a connector detects that the configuration it needs to connect to an external system is missing or faulty, it throws an InvalidConfigurationException. The name InvalidConfigurationException tells the user to resolve the error by correcting the application configuration. Along the same lines, if an exception is caused by an external system outage, the connector throws a ConnectionFailedException.
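The two categories named above could be modelled roughly as shown below. This is a minimal sketch under stated assumptions: the exception classes are local stand-ins that only borrow the names from the text, and `CategorySketch`, `connect`, `actionFor`, and the `host` attribute are hypothetical and invented for illustration.

```java
import java.util.Collections;
import java.util.Map;

final class CategorySketch {
    /** Hypothetical base type for all connector exceptions in this sketch. */
    static class ConnectorException extends RuntimeException {
        ConnectorException(String message) { super(message); }
    }

    /** Raised when required application configuration is missing or faulty. */
    static class InvalidConfigurationException extends ConnectorException {
        InvalidConfigurationException(String message) { super(message); }
    }

    /** Raised when the external system cannot be reached, for example during an outage. */
    static class ConnectionFailedException extends ConnectorException {
        ConnectionFailedException(String message) { super(message); }
    }

    /** Hypothetical connect step: checks configuration first, then reachability. */
    static void connect(Map<String, String> config, boolean systemReachable) {
        String host = config.get("host"); // "host" is an assumed attribute name
        if (host == null || host.trim().isEmpty()) {
            throw new InvalidConfigurationException(
                    "Required attribute 'host' is missing or empty");
        }
        if (!systemReachable) {
            throw new ConnectionFailedException(
                    "Could not reach external system at " + host);
        }
    }

    /** Maps each category to the action its name implies. */
    static String actionFor(ConnectorException error) {
        if (error instanceof InvalidConfigurationException) {
            return "correct the application configuration";
        }
        if (error instanceof ConnectionFailedException) {
            return "retry after the external system recovers";
        }
        return "investigate further";
    }

    public static void main(String[] args) {
        try {
            // Empty configuration, so the configuration check fails first.
            connect(Collections.<String, String>emptyMap(), true);
        } catch (ConnectorException error) {
            System.out.println(error.getClass().getSimpleName() + ": " + actionFor(error));
        }
    }
}
```

Because the category is carried by the exception type, a caller can decide what to do next without parsing the message text.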

Error Suggestion to Correct an Error

One of the primary success factors in standardizing an exception handling framework is providing enough information at runtime for the system to be monitored effectively. Information is key to solving any problem as quickly and as painlessly as possible. Additionally, consistency across the different connector types allows for similar actions regardless of the type of integrated system.

We provide sophisticated error handling for exceptions in SailPoint applications. Error messages are meaningful and contain a human-readable description of the error. To help solve production problems, exception messages are accurate and contain ample, accessible error data. An error stack trace is attached to each error message to show the history (call stack) of the components involved in the error.

Along with the error, the message itself contains suggestions for correcting that error. Suggestions are easy to understand and distinguishable from the error text. Displaying an error together with possible suggestions makes resolving it more intuitive for the user. A system administrator who follows the hints can resume execution on their own without engaging SailPoint. Suggestions accompany an error unless providing them would jeopardize data security or the purpose of the content.
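One way to picture a message that keeps the suggestion separate from the error text, and withholds it when it would be security-sensitive, is the sketch below. The names `SuggestionSketch`, `ConnectorError`, `of`, and `render`, and the "Error:" and "Suggestion:" labels, are assumptions made for this example rather than the actual message format.

```java
final class SuggestionSketch {
    /** Hypothetical error message: a description plus an optional suggestion. */
    static final class ConnectorError {
        private final String description;
        private final String suggestion; // null when no suggestion is shown

        private ConnectorError(String description, String suggestion) {
            this.description = description;
            this.suggestion = suggestion;
        }

        /** Drops the suggestion when showing it would be security-sensitive. */
        static ConnectorError of(String description, String suggestion,
                                 boolean securitySensitive) {
            return new ConnectorError(description, securitySensitive ? null : suggestion);
        }

        /** Renders the suggestion on its own labelled line, apart from the error text. */
        String render() {
            StringBuilder text = new StringBuilder("Error: ").append(description);
            if (suggestion != null) {
                text.append("\nSuggestion: ").append(suggestion);
            }
            return text.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(ConnectorError.of(
                "Connection to the external system timed out.",
                "Verify the host name and port in the application configuration.",
                false).render());
        // The suggestion is withheld here because the flag marks it as sensitive.
        System.out.println(ConnectorError.of(
                "Authentication failed.",
                "Check the service account credentials.",
                true).render());
    }
}
```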

Customer Satisfaction

Error-handling techniques such as the reconnection strategy, exception categorization, and error suggestions enable customers to resolve most non-business exceptions on their own, allowing them to recover gracefully from unexpected errors without impacting the business.
