DDE search technology and how to get the most out of your queries

The Dynamic Discovery Engine (DDE) in IdentityNow provides users with the ability to search Identities and related objects using a powerful and flexible search syntax.  The DDE utilizes Elasticsearch to power this search functionality.

Elasticsearch is a distributed search and analytics engine based on the Lucene information retrieval library. It is a schema-free JSON document store, which allows dynamic and flexible storage of data. For the IdentityNow implementation we store our key objects as JSON documents. These currently include Identities, Entitlements, Roles, Access Profiles, Account Activity, and Audit Events.

Figure 1 – Example Identity JSON

In order to allow for very fast searching of these document objects, we create a mapping definition which tells Elasticsearch how to index the fields stored in the documents.  Elasticsearch provides several different mapping types depending on what is stored in the field. Standard core index types are available for numerical, boolean, string keywords and date values.

The real power comes with Elasticsearch’s ability to index text data into a full-text index using an inverted index structure. The inverted index splits out the text into individual word tokens with references back to the documents they are contained in. The search engine can then use this index table to calculate a score for each document, taking into account parameters like proximity, fuzzy matching using word stemming, and also eliminating insignificant words to return the most relevant documents.  Elasticsearch also provides the ability to use analyzers that can adjust the way the inverted index is built to fine tune how your searches behave with your data.
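To make the inverted index idea concrete, here is a minimal Python sketch. It is not how Elasticsearch is implemented: it only splits text on whitespace and lowercases it, with no scoring, stemming, or stop-word removal. The sample documents and the function names build_inverted_index and search are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, term):
    """Return the sorted ids of documents that contain the term (case-insensitive)."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical sample data, not real IdentityNow documents.
docs = {
    1: "John Doe Human Resources",
    2: "Jane Doe Accounting",
    3: "John Smith Accounting",
}
index = build_inverted_index(docs)

print(search(index, "doe"))   # ids of documents containing "doe"
print(search(index, "JOHN"))  # lookup is case-insensitive in this sketch
```

The point of the structure is that a lookup goes from a word straight to the documents that contain it, rather than scanning every document for the word.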

Figure 2 – Identity Inverted Index

In addition to the core types and the full-text index, Elasticsearch also supports complex types such as arrays, objects, and nested objects and also specialized types including IP addresses and geo coordinates.

One of the useful complex types we use for IdentityNow is the nested object. A nested object allows us to store and search related objects in the Elasticsearch flattened document store. For example, Identities have three nested objects that are treated as child documents: Accounts, Access, and Apps. Under the covers, Elasticsearch creates separately indexed documents that allow us to search multiple fields on these objects. Non-nested objects in Elasticsearch have their fields flattened into arrays, which limits the ability to do multi-field searching to retrieve a specific item.

Figure 3 – Nested Objects vs. Regular Object

As you can see in the figure above, without the nested document, the search query access.type:role AND access.name:admin would still return Identity “John Doe” since the values are flattened to an array per field. The nested query would not match because the fields do not exist together on a single access document.
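The difference can be sketched in a few lines of Python. This is only an illustration of the matching logic, not of Elasticsearch internals. The identity below is hypothetical sample data chosen so that "role" and "admin" appear on different access items.

```python
# Hypothetical identity: "role" and "admin" appear, but on different access items.
identity = {
    "name": "John Doe",
    "access": [
        {"type": "role", "name": "engineer"},
        {"type": "entitlement", "name": "admin"},
    ],
}


def flatten(items):
    """Collapse a list of objects into one list of values per field."""
    flat = {}
    for item in items:
        for field, value in item.items():
            flat.setdefault(field, []).append(value)
    return flat


def matches_flattened(identity, **criteria):
    """Regular object: each criterion is checked against the per-field value arrays."""
    flat = flatten(identity["access"])
    return all(value in flat.get(field, []) for field, value in criteria.items())


def matches_nested(identity, **criteria):
    """Nested object: all criteria must hold on the same access item."""
    return any(
        all(item.get(field) == value for field, value in criteria.items())
        for item in identity["access"]
    )


print(matches_flattened(identity, type="role", name="admin"))
print(matches_nested(identity, type="role", name="admin"))
```

The flattened check succeeds because "role" is somewhere in the type array and "admin" is somewhere in the name array. The nested check fails because no single access item carries both values.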

Now that we’ve given a very brief description of Elasticsearch and how it indexes data, we can examine the search query syntax used by DDE and identify some tips and tricks for constructing better queries.

Query Syntax Overview

The query syntax utilized by DDE consists of a combination of terms and operators. The terms are the items (words, dates, numbers, other values) you are looking to match in the documents stored in the index. Terms can also be grouped together into phrases by enclosing the terms in double quotes. The operators consist of boolean operators like AND, OR, and NOT, plus operators used for grouping. There are also operators which help narrow down your search and take advantage of the different field types, including wildcards, fuzzy matching, regular expressions, and ranges.

Figure 4 – Query Syntax Reserved Operators

Field Specifiers

The simplest form of a query used in DDE is just a query term.  This query will be searched against all the objects in the index and all the fields on those objects.  Sometimes you may want to target your search term to a specific field.  In this case you would use a field specifier to bind the scope of that term.

Example:

firstName:John AND lastName:Doe AND attributes.department:"Human Resources"

The field specifier can also be combined with grouping and boolean operators to add additional flexibility.

Example:

attributes.department:("Human Resources" OR Accounting OR IT)

If you have a long list of items you want to search for, you can also use a comma instead of the OR operator.

Example:

attributes.department:("Human Resources",Accounting,IT)
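If the list of values comes from a script, a small helper can assemble the grouped query string. The sketch below is a hypothetical convenience function, not part of any IdentityNow SDK. It quotes values that contain spaces and does not escape reserved operator characters.

```python
def field_query(field, values, separator=" OR "):
    """Build field:(v1 OR v2 ...), quoting any value that contains a space."""
    quoted = [f'"{v}"' if " " in v else v for v in values]
    return f"{field}:({separator.join(quoted)})"


departments = ["Human Resources", "Accounting", "IT"]

print(field_query("attributes.department", departments))
print(field_query("attributes.department", departments, separator=","))
```

The first call produces the OR form and the second produces the comma form shown in the examples above.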

Nested Queries

Nested documents are complex field types that allow related child documents to be stored inside a parent document. An Identity has three nested documents: Accounts, Apps, and Access. To search within nested documents, a special syntax is used: @ followed by the nested type, and then the query to run inside parentheses. The search returns the parent documents that match the nested query.

Example to return identities that have the Role “Admin” on Source “Active Directory”:

@access(type:ROLE AND name:"Admin" AND source.name:"Active Directory")

Full Text vs. Keyword

String text fields can be indexed as either a full text field, a keyword string, or potentially both.

Keyword strings are fields that have a finite list of values, like a list of cities, or that come in a specific format, like an email address or phone number. These fields are case sensitive and match a query only on an exact match or when a wildcard is used.

The full text field uses the inverted index mentioned above and can match a query if one or more words exist in the object and a score is provided to rank how the objects are returned.

In some cases you may want to search a field as full text, and in other cases as a keyword. For fields indexed as full text, you can add .exact to the field specifier so that the query string that follows is treated as a keyword query.

Example to return documents that have John anywhere in the name field, regardless of case:

name:John

Example to return documents that have only the term JOHN, exactly in all caps:

name.exact:JOHN
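The two behaviors can be contrasted with a rough Python sketch. It assumes full text matching means "the term equals one of the lowercased word tokens" and keyword matching means "the whole value is identical, case included". Real analyzers do more than this, and both function names are hypothetical.

```python
def full_text_match(field_value, term):
    """True if the term equals any word token of the field, ignoring case."""
    return term.lower() in field_value.lower().split()


def keyword_match(field_value, term):
    """True only when the whole field value is identical to the term, case included."""
    return field_value == term


print(full_text_match("John Doe", "john"))  # token match, case ignored
print(keyword_match("John Doe", "JOHN"))    # whole value differs
print(keyword_match("JOHN", "JOHN"))        # identical value
```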

Fuzziness, Proximity, and Wildcards

Fields that are indexed for full text searching can also be searched using fuzziness and proximity operators.

Fuzziness operators allow for slight variances and misspellings in the search term and will still match.  This can be done by using the ~ character at the end of the term.

Example that matches even with a misspelling:

name:Jonh~
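Fuzzy matching of this kind is commonly based on edit distance, the number of single-character changes needed to turn one term into another. The sketch below computes the optimal string alignment distance, where swapping two adjacent characters counts as one edit. The max_edits default of 2 is an assumption for illustration and may not reflect how DDE is configured.

```python
def osa_distance(a, b):
    """Edit distance allowing insert, delete, substitute, and adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "nh" <-> "hn"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def fuzzy_match(term, candidate, max_edits=2):
    """True if the candidate is within max_edits of the term, ignoring case."""
    return osa_distance(term.lower(), candidate.lower()) <= max_edits


print(osa_distance("jonh", "john"))
print(fuzzy_match("Jonh", "John"))
```

Because "Jonh" differs from "John" by a single swap of adjacent letters, the misspelled term still falls within the allowed distance.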

In similar fashion, we can also allow for variance in the order of terms in a phrase by adding a ~ at the end of the phrase optionally followed by a number which determines how many terms away they can be.

Example that matches a phrase where the terms are at most 2 positions away from each other:

name:"Doe Q John"~2

Finally, you can utilize wildcards to match a single character or multiple characters. To wildcard match a single character you use ? and to match multiple characters you use *.

Example:

name:Joh?

name:Jo*
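The ? and * wildcards behave like shell-style patterns, so Python's standard fnmatch module gives a quick way to check what a pattern would match. This is only an approximation for experimenting: fnmatchcase also treats square brackets specially, which is outside what this article describes. The sample names are hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_filter(values, pattern):
    """Keep the values that match a case-sensitive ? / * wildcard pattern."""
    return [v for v in values if fnmatchcase(v, pattern)]


names = ["Jo", "John", "Josh", "Johnny", "Jane"]

print(wildcard_filter(names, "Joh?"))  # exactly one character after "Joh"
print(wildcard_filter(names, "Jo*"))   # any number of characters after "Jo"
```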

Ranges

Fields that are indexed as keyword strings, numbers, or date values can be searched using range operators.

The range operators work in conjunction with the field specifiers to support both a single value range and a multiple value range.

The single value range tests whether a field is greater than, less than, or equal to a value using the >, <, >=, and <= operators.

Example:

accessCount:>10

date:<=2019-2-10

The multiple value range indicates if a field is between two values using [ TO ] for an inclusive range and { TO } for an exclusive range. The inclusive and exclusive operators can also be combined together such as [ TO } or { TO ].

Example to return all objects created between 2/10/2019 and 3/10/2019 including those dates:

created:[2019-2-10 TO 2019-3-10]

Example to return all objects created between 2/10/2019 and 3/10/2019 excluding those dates:

created:{2019-2-10 TO 2019-3-10}

Example to return all objects created between 2/10/2019 and 3/10/2019 including the start date but excluding the end date:

created:[2019-2-10 TO 2019-3-10}
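The bracket choices map directly onto inclusive and exclusive comparisons. A minimal Python sketch of that logic, using a hypothetical in_range helper and the same dates as the examples above:

```python
from datetime import date


def in_range(value, low, high, include_low=True, include_high=True):
    """Range test where [ or ] means inclusive and { or } means exclusive."""
    above = value >= low if include_low else value > low
    below = value <= high if include_high else value < high
    return above and below


start, end = date(2019, 2, 10), date(2019, 3, 10)

# [start TO end]: both boundary dates match
print(in_range(start, start, end))
# {start TO end}: neither boundary date matches
print(in_range(start, start, end, include_low=False, include_high=False))
# [start TO end}: the start date matches, the end date does not
print(in_range(end, start, end, include_high=False))
```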

Exists

Sometimes it’s useful to know if a value exists or does not exist in a field.  The _exists_ operator is used for that purpose. It can be useful for a specific field value and also for an object or nested object.

Example to return objects that have a phone number:

_exists_:phone

Example to return objects that do not have a phone number:

NOT _exists_:phone

Example to return objects that do not have a manager object:

NOT _exists_:manager
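As a rough mental model, an exists check asks whether a document carries a usable value for the field. The sketch below assumes that a missing field, a null value, and an empty list all count as "does not exist", which mirrors the general Elasticsearch behavior but is an assumption as far as DDE is concerned. The documents are hypothetical.

```python
def exists(doc, field):
    """True if the field is present with a value that is neither None nor an empty list."""
    value = doc.get(field)
    return value is not None and value != []


# Hypothetical sample documents.
with_phone = {"name": "John Doe", "phone": "512-555-0100", "manager": {"name": "Jane Roe"}}
without_phone = {"name": "Jane Roe", "phone": None}

print(exists(with_phone, "phone"))
print(not exists(without_phone, "phone"))    # NOT _exists_:phone
print(not exists(without_phone, "manager"))  # NOT _exists_:manager
```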

Summary

The Dynamic Discovery Engine, built on Elasticsearch, provides a powerful and flexible capability that allows users to craft search queries that can pull from most of the fields and objects in IdentityNow. Hopefully, understanding the technology behind the scenes and gaining insight into the query syntax will help you fine-tune your search skills and get more use out of the tool.
