GraphGrid Search

Platform Version 2.0

API Version 1.0

Introduction

GraphGrid Search provides textual search capabilities across a graph database by integrating ONgDB with Elasticsearch. Users can define policies that populate indexes with highly customizable Elasticsearch documents; the policies support Geequel. Search provides two main capabilities. First, it takes nodes, their properties, and customizable queries and stores them in Elasticsearch; this is called indexing. Second, it takes text input and searches across documents for matches; this is called searching.

Environment

Search requires the following integrations:

  • ONgDB 1.0+
  • Elasticsearch 5.5+

Search supports the following integrations:

  • RabbitMQ 3.5+
  • AWS SQS

Indexing

Indexing refers to the process of pushing documents into indexes that are held by the Elasticsearch server. Information must be indexed into Elasticsearch before it can be searched for and retrieved.

If a broker (such as RabbitMQ or SQS) is set up for GraphGrid Search, indexing should always be done through the broker. Indexing with APOC repeat jobs and triggers does not scale the way a broker does and cannot be used across a cluster.

Index Policies

An index policy defines what indexes and documents should be created in Elasticsearch using graph data from ONgDB. The policy is composed of three parts: indexes, globals, and metadata. Indexes define what should be stored (indexed) in Elasticsearch. Globals define variables that can be used in custom queries for storing custom information. Metadata stores information about the index policy itself.

Index policies can be generated from the nodes on the ONgDB graph itself. These generated policies can be used as-is or can be edited for optimizing and storing custom information. Index policies are written in JSON and are stored in AWS S3.

Metadata

Information about the index policy itself is stored as metadata. It includes a description, a displayName, a createdAt, an updatedAt, a state, and previous versions of the index policy.

The displayName is used to store the name of the index policy.

The createdAt property contains when the original index policy was created, and updatedAt contains when the index policy was last saved. They are both stored in the ISO-8601 format.
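As a small illustration of the timestamp format, the sketch below parses two ISO-8601 values (taken from the example index policy later on this page) with Python's standard library and compares them. The helper name was_modified_after_creation is hypothetical and is not part of GraphGrid Search.

```python
from datetime import datetime


def was_modified_after_creation(created_at: str, updated_at: str) -> bool:
    """Compare two ISO-8601 timestamps that carry a UTC offset."""
    # fromisoformat understands offsets such as "-04:00" and returns
    # timezone-aware datetimes, so the comparison is offset-safe.
    return datetime.fromisoformat(updated_at) > datetime.fromisoformat(created_at)


if __name__ == "__main__":
    print(was_modified_after_creation(
        "2018-07-13T13:37:17-04:00",
        "2018-07-13T13:37:18-04:00",
    ))
```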

The state of an index policy can be ONLINE, OFFLINE, POPULATING, or FAILED. An ONLINE index policy is the one that was most recently used to index documents to Elasticsearch. An OFFLINE index policy is not actively being used for indexing. A POPULATING index policy is currently being used to index and push documents to Elasticsearch. (Note: this state is only applied when the index policy is used to index manually through specific endpoints. An index policy used to create APOC repeat jobs/triggers that push documents to Elasticsearch will have an ONLINE state.) A FAILED index policy indicates that an index within the policy failed to push documents to Elasticsearch.

The versions property in metadata is a list of the previous index policies of the same displayName in the same cluster. The configurable value maxVersions sets how many previous index policies are stored at once, with a default value of six. Note that the previous policies are displayed in full, so when editing an index policy make sure you are editing the actual index policy and not a previous version!

Indexes

An index in the index policy specifies what information will be stored in Elasticsearch and how. An index can be broken into three parts: indexData, index strategies, and a schema. Below is an example of a person index policy, without any inner parts expanded.

{
"person": {
"indexData":{ ... },
"indexStrategies": { ... },
"schema": { ... }
}
}

The name of an index, "person" in this case, is validated and must be an acceptable index name for Elasticsearch. The index names are validated by the following regex: [a-z0-9][a-z0-9_\\-]*[a-z0-9]

That is, names must be lowercase, starting and ending with a letter or number, and allowing underscores and dashes. Names must be at least two characters long.
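The naming rule can be checked locally before a policy is saved. The following is a minimal sketch that applies the pattern quoted above with Python's re module; the function name is_valid_index_name is an assumption for illustration, and the pattern is written with a single backslash as a Python raw string.

```python
import re

# Pattern quoted in the text above, written as a Python raw string.
INDEX_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9_\-]*[a-z0-9]")


def is_valid_index_name(name: str) -> bool:
    """Return True when the whole name matches the documented pattern."""
    return INDEX_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["person", "movie-cast_2", "Person", "_person", "p"]:
        print(candidate, is_valid_index_name(candidate))
```

Because the first and last characters are matched separately, a one-character name such as "p" is rejected, which is where the two-character minimum comes from.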

Index Data

Information about the index is stored here. It includes a description and a displayName. The description is a statement about the index. The displayName stores an alias of the index. Since the index name is subject to validation, the displayName allows for a more flexible name if desired.

{
"person": {
"indexData": {
"description": "The 'person' index representing the 'Person' node label in Elasticsearch.",
"displayName": "person"
}
}
}

Index Strategies

The index strategies define how we query the ONgDB graph to acquire information that we want to store in Elasticsearch. In an index the index strategies are used to define different ways to get the same base information that will be stored in Elasticsearch. They can be used to achieve different indexing features, all from the same index policy. For example, one index strategy can be used to index all Person nodes while another index strategy can be used to index any Person nodes that have not yet been indexed. Below is an index with two index strategies, named defaultFull and defaultPartial, that do just that.

{
"person": {
"indexStrategies": {
"defaultFull": {
"producer": "MATCH (n:`Person`) WITH n AS person RETURN person",
"anchorLabel": "Person",
"anchor": "person",
"anchorId": "person.grn"
},
"defaultPartial": {
"producer": "MATCH (n:`Person`) WHERE n.updatedAt > n.lastSearchIndexedAt WITH n AS person RETURN person",
"anchorLabel": "Person",
"anchor": "person",
"anchorId": "person.grn"
}
}
}
}

Notice that each index strategy contains a producer, anchorLabel, anchor, and anchorId. These are four required properties out of eight total possible properties (more on the others later).

The producer is a Geequel statement that should return any information that will be used in the schema. The information returned by the producer is also used to define some other parameters of an index strategy.

In our example the producer retrieves nodes of type Person and then returns those nodes under the variable person. The only difference between defaultFull and defaultPartial is the producer. Specifically, defaultPartial has a WHERE clause that filters out all Person nodes that have already been indexed.

The anchor is used to connect each Elasticsearch document with a node on the graph. The anchor is evaluated in context of what is returned by the producer.

In our example, the producer returns person and these returned nodes are also what we want our Elasticsearch documents to be connected to. Thus, the anchor is also person. In an example below, we look at a case where what the producer returns is not exactly the same as the anchor (see "Advanced Index Policy").

The anchorLabel is the label that is used when searching for the anchor node. Choosing a label that makes it quick to find the anchor will help speed up performance in certain situations. The example has an anchorLabel of Person, since our anchor person is of type Person. Since full label scans are slow, an anchorLabel is required.

The anchorId is used as the unique identifier for the Elasticsearch document. Similar to the anchor, the anchorId is evaluated in the context of what is returned by the producer, and it should evaluate to a unique value. In the example, every person has a unique grn property, so person.grn is always unique and every person node is assured its own Elasticsearch document.

There are also validation requirements for the anchorId values, which must match the following regex: [A-Za-z0-9_\-\.\:]+. If an evaluated anchorId fails the validation, the indexing process for the current index will be stopped and an error message will be pushed to ONgDB's console and debug log.
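To catch bad identifiers before an indexing run is stopped by them, the same check can be reproduced outside of Search. This is a minimal sketch using the pattern quoted above; the function name and the sample values (including the shape of the grn string) are made up for illustration.

```python
import re

# Pattern quoted in the text above: letters, digits, "_", "-", "." and ":".
ANCHOR_ID_PATTERN = re.compile(r"[A-Za-z0-9_\-\.\:]+")


def is_valid_anchor_id(value: str) -> bool:
    """Return True when the entire evaluated anchorId matches the pattern."""
    return ANCHOR_ID_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    # Hypothetical evaluated anchorId values.
    for value in ["grn:gg:person:abc123", "has space", "a/b", ""]:
        print(repr(value), is_valid_anchor_id(value))
```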

As mentioned above, an index strategy has four more parameters, all of which are optional: batchSize, parallel, retries, and iterateList. These control the way documents are indexed and can be specified or left out entirely. Specifically, they are used in an internal apoc.periodic.iterate call, and further documentation may be found in the APOC documentation for apoc.periodic.iterate.

Index Strategy Recap

Here's a quick recap:

Index strategies parameters:

producer (required, returns the information to be used in storing documents to Elasticsearch)

anchorLabel (required, label that is used to identify the anchor)

anchor (required, node that connects the graph to the document)

anchorId (required, unique identifier of the node)

batchSize (optional, how many rows processed per ONgDB transaction commit, defaults to 1,000)

parallel (optional, whether the producer and consumer in apoc.periodic.iterate run in parallel, defaults to true)

retries (optional, retries for each failed commit, defaults to 0)

iterateList (optional, whether the consumer consumes a single row at once or all rows at once, defaults to true which runs all rows at once)

Index strategies can be named any alphanumeric string with underscores. The names defaultFull and defaultPartial should be used with caution, as these are the index strategy names for the generated index policy.
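The recap can be expressed as a small helper that checks for the four required parameters and fills in the documented defaults for the optional ones. This is only a sketch: normalize_strategy is a hypothetical name, and it uses native Python types for the optional values, whereas the example index policy on this page stores them as strings.

```python
REQUIRED_KEYS = ("producer", "anchorLabel", "anchor", "anchorId")

# Defaults as listed in the recap above.
OPTIONAL_DEFAULTS = {
    "batchSize": 1000,
    "parallel": True,
    "retries": 0,
    "iterateList": True,
}


def normalize_strategy(strategy: dict) -> dict:
    """Reject strategies missing a required key and apply optional defaults."""
    missing = [key for key in REQUIRED_KEYS if key not in strategy]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    # Values given in the strategy override the defaults.
    return {**OPTIONAL_DEFAULTS, **strategy}


if __name__ == "__main__":
    default_full = {
        "producer": "MATCH (n:`Person`) WITH n AS person RETURN person",
        "anchorLabel": "Person",
        "anchor": "person",
        "anchorId": "person.grn",
    }
    print(normalize_strategy(default_full))
```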

Advanced Index Policy

In the example above our producer returned Person nodes as rows, with the variable name person. We used this returned row directly as our anchor, but sometimes we want our producer to return more than just a single node per row. Below is an example where the producer returns more than just a single node:

"director": {
"indexStrategies": {
"full_slice": {
"producer": "MATCH (n:Person)-[:DIRECTED]->(m:Movie) WITH [{director: n, movies: collect(m)}] AS slices UNWIND slices AS slice RETURN slice",
"anchorLabel": "Person",
"anchor": "slice.director",
"anchorId": "slice.director.grn"
}
}
}

The goal of the director index is to store each director (a Person that has DIRECTED a Movie) and their information in Elasticsearch. The index strategy full_slice returns a single row named slice. Notice that slice contains a person node, named director, and the movies that this person has directed, under the name movies. Our anchor is slice.director, since that is the node we want our Elasticsearch document to be associated with. The anchorId is slice.director.grn, which uniquely identifies our anchor and thus its Elasticsearch document. Lastly, our anchorLabel is Person since the anchor has the label Person.

The way we use the MATCH clause ensures that our producer returns slices where director is someone that has directed at least one movie.

The index strategies define how we query the ONgDB graph to acquire information that we want to store in Elasticsearch. In the next section we look at the schema, which defines what information we will store and how we store it.

Schema

The schema defines what should be stored and how that information is computed. The schema of an index references the index strategies also defined by that index. An example schema of an index with the given name person is shown in the Index Policy snippet below:

{
"person": {
"schema": {
"born": { <- Property 1
"schema": {
"generatorStrategies": {
"defaultFull": "",
"defaultPartial": ""
},
"searchQuery": {},
"tokenizer": null,
"suggester": null
}
},
"name": { <- Property 2
"schema": {
"generatorStrategies": {
"defaultFull": "WITH coalesce(person.name, person.full_name)",
"defaultPartial": "WITH coalesce(person.name, person.full_name)"
},
"searchQuery": {},
"tokenizer": null,
"suggester": null
}
},
"baconNumber": { <- Property 3
"schema": {
"generatorStrategies": {
"defaultFull": "MATCH p=shortestPath((bacon:Person {name:'Kevin Bacon'})-[*0..]-(person)) WITH length(p)/2",
"defaultPartial": "MATCH p=shortestPath((bacon:Person {name:'Kevin Bacon'})-[*0..]-(person)) WITH length(p)/2"
},
"searchQuery": {},
"tokenizer": null,
"suggester": null
}
}
}
}
}

Inside the index person we have a schema which holds the properties that we want to store. The properties in this schema are born, name, and baconNumber.

Each property has an internal schema that holds the generatorStrategies which correspond to index strategies defined by the index. The properties also have searchQuery, tokenizer, and suggester, which are used to customize how searches to the Elasticsearch index can be performed.

Generator Strategies

The generatorStrategies are maps from an index strategy name (such as defaultFull), to a string that is used to construct the value to be stored in the property for the Elasticsearch document. This string may be empty or may be a Geequel statement.
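Since every generator strategy key is expected to name an index strategy of the same index, a typo in a key is easy to miss. The sketch below lists such mismatches for one index; undefined_strategy_references is a hypothetical helper, and whether Search itself rejects a policy like this is not stated here.

```python
def undefined_strategy_references(index: dict) -> dict:
    """Map each schema property to generator strategy names that are
    not defined under the index's indexStrategies."""
    defined = set(index.get("indexStrategies", {}))
    problems = {}
    for property_name, body in index.get("schema", {}).items():
        used = set(body.get("schema", {}).get("generatorStrategies", {}))
        unknown = sorted(used - defined)
        if unknown:
            problems[property_name] = unknown
    return problems


if __name__ == "__main__":
    person_index = {
        "indexStrategies": {"defaultFull": {}, "defaultPartial": {}},
        "schema": {
            "born": {"schema": {"generatorStrategies": {"defaultFull": "", "defaultPartial": ""}}},
            # "defaultFul" is a deliberate typo for the demonstration.
            "name": {"schema": {"generatorStrategies": {"defaultFul": ""}}},
        },
    }
    print(undefined_strategy_references(person_index))
```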

Generator Strategies: Empty String

Sometimes the string of a generator strategy is empty, like both strategies in born. Whenever a generator strategy string is left empty the value stored in the Elasticsearch document is the property of anchor. That is, <anchor>.<propertyName> is evaluated and stored in the Elasticsearch document.

Here's an example: for the index strategy defaultFull our anchor is person, and the property born is defined for the person node. This means that person.born is evaluated and stored in the person Elasticsearch document under the property born.

If an empty string is used on a schema property that is not an anchor property, then the property will be stored as null in the Elasticsearch document. There is nothing inherently wrong with this, and it will happen when nodes of the same type have different properties.
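The empty-string rule can be summarized in a few lines. This sketch only models which expression ends up being evaluated; effective_expression is a hypothetical name and not how Search is implemented internally.

```python
def effective_expression(anchor: str, property_name: str, generator: str) -> str:
    """Return the expression evaluated for a schema property.

    An empty generator strategy falls back to <anchor>.<propertyName>;
    anything else is used as written.
    """
    if generator == "":
        return f"{anchor}.{property_name}"
    return generator


if __name__ == "__main__":
    print(effective_expression("person", "born", ""))
    print(effective_expression("slice.director", "name", ""))
    print(effective_expression("person", "name", "WITH coalesce(person.name, person.full_name)"))
```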

Sometimes nodes of the same type have the same information stored in different properties, like some person nodes using a name property to store their name, and other person nodes using a full_name property. To make sure all Elasticsearch documents for person nodes have a name property we use some customized Geequel inside the generator strategies.

Generated index policies make heavy use of empty generator strategies.

Generator Strategies: Customized Geequel

When the generator strategy string is not empty, like name and baconNumber in our example, we can use Geequel to compute a custom value. The Geequel of the generator strategy can use whatever was returned by the producer of the same named index strategy.

In our example, both defaultFull and defaultPartial have producers that return person as a node. Both baconNumber generator strategies use person as a node in their queries computing the bacon number. We can see this explicitly by the use of (person) in the MATCH clause. In our example baconNumber has generator strategies that compute the bacon number of the person and store it in Elasticsearch.

Another example is the name property in the schema. As mentioned above, sometimes nodes of the same type will have the same sort of property (like a 'name') stored in different property keys. To ensure each Elasticsearch document gets correctly populated we use customized Geequel. The Geequel "WITH coalesce(person.name, person.full_name)" uses the first non-null property, either person.name or person.full_name. Using this technique, we can make sure every Elasticsearch document has the correct information for each property, even if it is stored in different property keys on nodes.

It is important that each non-empty generator strategy string ends with a "WITH" instead of ending with a "RETURN" as a normal query would. Internally, we compute the custom value and store it in a map. We do this as many times as there are custom queries and use "WITH"s to string together multiple custom queries.

Moreover, a custom query cannot end with multiple items in the WITH clause (e.g. "WITH [username,password]" is allowed, but "WITH username, password" is NOT allowed.)

Furthermore, the ending expression must evaluate to a scalar, map, or list. This is because Elasticsearch cannot explicitly store certain objects (like nodes) that cannot be converted to JSON implicitly. (IMPORTANT: Storing nodes and relationships in Elasticsearch is possible! Instead of ending the generator strategy as "WITH n" where n is a node, end it as "WITH n {.*}". The expression "n {.*}" evaluates to a map of all of n's properties, which is essentially the node n.)

Lastly, do not alias the last WITH clause with an AS.

Here's a small recap of the rules when using customized queries in the generator strategies:

  • Custom queries must end with a "WITH <stuff>".

  • The <stuff> is a single object. Do not end the WITH with multiple things.

  • The <stuff> is a scalar, map, or list.

  • Do not alias the final WITH. Using "WITH <stuff> AS <name>" will result in an error and break the indexing.

Also, be aware of the following:

  • The queries cannot contain a "true" tab or newline. The JSON parser will throw an error. Use "\n" and "\t" for newlines and tabs if desired. In practice, using only spaces is most readable.

  • Do not use double quotes in the queries. Use single quotes for strings, since an unescaped double quote ends the surrounding JSON string early and results in invalid JSON (e.g. use 'Kevin Bacon' instead of "Kevin Bacon"). (At the time of writing it is unknown whether double quotes escaped with backslashes work inside the query string.)

  • In custom queries it is safer to use an OPTIONAL MATCH rather than a standard MATCH, as an unsuccessful MATCH can result in ONgDB errors and documents not being pushed to Elasticsearch.
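Several of the rules above are mechanical enough to check before a policy is saved. The following is a heuristic sketch, not a Geequel parser: lint_generator_strategy and top_level_commas are hypothetical helpers, and a string literal that happens to contain the words WITH, AS, or RETURN would confuse them.

```python
import re


def top_level_commas(expression: str) -> int:
    """Count commas that are outside brackets and single-quoted strings."""
    depth = 0
    in_string = False
    count = 0
    for ch in expression:
        if ch == "'":
            in_string = not in_string
        elif in_string:
            continue
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == "," and depth == 0:
            count += 1
    return count


def lint_generator_strategy(query: str) -> list:
    """Return a list of rule violations for one generator strategy string."""
    problems = []
    if query == "":
        return problems  # empty means "use <anchor>.<propertyName>"
    if "\n" in query or "\t" in query:
        problems.append("contains a literal tab or newline")
    if '"' in query:
        problems.append("contains a double quote")
    matches = list(re.finditer(r"\bWITH\b", query, flags=re.IGNORECASE))
    if not matches:
        problems.append("does not contain a WITH clause")
        return problems
    # Everything after the last WITH is treated as the final expression.
    tail = query[matches[-1].end():]
    if re.search(r"\bRETURN\b", tail, flags=re.IGNORECASE):
        problems.append("ends with RETURN instead of WITH")
    if re.search(r"\bAS\b", tail, flags=re.IGNORECASE):
        problems.append("final WITH is aliased with AS")
    if top_level_commas(tail) > 0:
        problems.append("final WITH has more than one item")
    return problems


if __name__ == "__main__":
    for q in ["WITH [username,password]", "WITH username, password", "WITH n.name AS name"]:
        print(q, "->", lint_generator_strategy(q))
```

The check does not verify that the final expression is a scalar, map, or list, since that depends on the graph data rather than on the text of the query.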

A More Complex Schema

Let us look at another example, where we're forced to use what the producer returns in a slightly different way. This is the schema for the director index. Recall that the index had one index strategy called full_slice that returned a slice that internally holds a person (named director) and a list of movies they had directed (named movies).

{
"director": {
"schema": {
"born": {
"schema": {
"generatorStrategies": {
"full_slice": ""
},
"searchQuery": {}
}
},
"name": {
"schema": {
"generatorStrategies": {
"full_slice": ""
},
"searchQuery": {}
}
},
"numMovies": {
"schema": {
"generatorStrategies": {
"full_slice": "WITH size(slice.movies)"
},
"searchQuery": {}
}
},
"directedAndStarring": {
"schema": {
"generatorStrategies": {
"full_slice": "OPTIONAL MATCH (:Person {grn: slice.director.grn})-[:ACTED_IN]->(m:Movie) WHERE m IN slice.movies WITH collect( m {.*} )"
},
"searchQuery": {}
}
}
}
}
}

Like before, we are able to use empty strings for born and name since the anchor, slice.director has the properties born and name defined (internally they will be accessed as slice.director.born and slice.director.name). The other two properties to be stored in the Elasticsearch director documents, numMovies and directedAndStarring, use a custom Geequel query.

The numMovies property shows how other parts returned by the producer of an index strategy can be used in a generator strategy. The numMovies property evaluates to number of movies the person has directed.

The directedAndStarring property computes and stores the movies that a director has both directed and acted in. We include it here to point out an important difference between this custom query and the custom query for baconNumber in the person index. In our person index, we could directly use the person node returned by the producer. In our director index we return a row named slice that internally holds a copy of the person node and the movies they have directed. Since the producer of the index strategy does not directly return the person node, we cannot use it in the same way as in the person index. Notice that we retrieve the person node by matching on its grn, MATCH (:Person {grn: slice.director.grn}), rather than trying to directly access the node with (slice.director). This is for two reasons. First, (slice.director) is not valid ONgDB syntax. Second, even if it were valid syntax, the slice returned by the producer does not actually store the node in director, but rather a copy of its properties. This means you cannot alias slice.director AS director and try to use it as a node, like (director). Instead, the node must be retrieved directly rather than worked with as a copy of its properties.

Search Query

The searchQuery is currently unused, but it is planned to support "boosting" queries, a method for tuning query results.

Tokenizer

A tokenizer breaks full text into individual tokens that are searchable. One particular use case is fuzzy search. Details can be found in the Elasticsearch documentation on tokenizers. For now, we support the Partial Word Tokenizers. The tokenizer object consists of four parameters: type, minGram, maxGram, and tokenChars. These parameters are explained in the Elasticsearch documentation for the partial word tokenizers.

Suggester

A suggester suggests similar-looking terms based on a provided text. It uses different data structures and algorithms than the normal search, making it faster at looking up similar-looking terms. The suggester object consists of one parameter, type. The supported suggester types are Term Suggester, Phrase Suggester, Completion Suggester, and Context Suggester.

Globals

Globals are the last part of the index policy. As their name suggests, these are variables that may be used in any generator strategy. The outermost key is "globals", which holds the named globals, in this case "averageBaconNumber" and "kevinBacon". Each global contains a "query" that defines the statement used to evaluate the global variable.

{
"globals": {
"averageBaconNumber": {
"query": "MATCH p=shortestPath((bacon:Person {name:'Kevin Bacon'})-[*0..]-(n)) WITH avg(length(p)/2)"
},
"kevinBacon": {
"query": "MATCH (kevin:Person) WHERE kevin.name='Kevin Bacon' WITH kevin {.*}"
}
}
}

The globals above are called averageBaconNumber and kevinBacon. The averageBaconNumber global has the value of the average bacon number, and the kevinBacon global holds a copy of the properties of the Kevin Bacon node. These are stored in Elasticsearch under the index "globals" and may be used in any generator strategy with the syntax {averageBaconNumber} and {kevinBacon}. Like everything else stored in Elasticsearch, a global must be a scalar, map, or list. Global queries follow the same ending rules as the generatorStrategies (see the subsection "Generator Strategies: Customized Geequel"). Global names must be unique. In the section below, we show an example of how globals are used in custom queries.

Example Index Policy

Combining metadata, indexes, and globals we arrive at our completed index policy:

{
"metadata": {
"description": "Example index policy for Person.",
"displayName": "beta-person-policy",
"createdAt": "2018-07-13T13:37:17-04:00",
"updatedAt": "2018-07-13T13:37:18-04:00",
"state": "ONLINE",
"versions": []
},
"indexes": {
"person": {
"displayName": "person",
"indexData": {
"description": "Index policy for person. Stores the 'Person' node type in Elasticsearch.",
"displayName": "person"
},
"indexStrategies": {
"defaultPartial": {
"producer": "MATCH (n:`Person`) WHERE n.updatedAt > n.lastSearchIndexedAt WITH n AS person RETURN person",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "10000",
"parallel": "true",
"retries": "0",
"iterateList": "true"
},
"defaultFull": {
"producer": "MATCH (n:`Person`) WITH n AS person RETURN person",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "10000",
"parallel": "true",
"retries": "0",
"iterateList": "true"
}
},
"schema": {
"born": {
"schema": {
"generatorStrategies": {
"defaultFull": "",
"defaultPartial": ""
},
"searchQuery": {},
"tokenizer": {},
"suggester": {}
}
},
"name": {
"schema": {
"generatorStrategies": {
"defaultFull": "WITH coalesce(person.name, person.full_name)",
"defaultPartial": "WITH coalesce(person.name, person.full_name)"
},
"searchQuery": {},
"tokenizer": {},
"suggester": {}
}
},
"baconNumber": {
"schema": {
"generatorStrategies": {
"defaultFull": "MATCH p=shortestPath((bacon:Person {name:{kevinBacon}.name})-[*0..]-(person)) WITH length(p)/2",
"defaultPartial": "MATCH p=shortestPath((bacon:Person {name:{kevinBacon}.name})-[*0..]-(person)) WITH length(p)/2"
},
"searchQuery": {},
"tokenizer": {},
"suggester": {}
}
}
}
}
},
"globals": {
"averageBaconNumber":
{
"query": "MATCH p=shortestPath( (bacon:Person {name:'Kevin Bacon'})-[*0..]-(n)) WITH avg(length(p)/2)"
},
"kevinBacon":
{
"query": "MATCH (kevin:Person) WHERE kevin.name='Kevin Bacon' WITH kevin {.*}"
}
}
}

As an example of using globals in a generator strategy, we have slightly changed both generator strategies in baconNumber to use the global variable kevinBacon. We do this by accessing Kevin Bacon's name through {kevinBacon}.name.

caution

Copying this directly may result in incorrect JSON syntax. It has been reformatted from the original JSON to be more readable. If you copy it, make sure the JSON syntax is preserved and that there are no tabs or newlines in the generator strategy and "query" strings! Other than the reformatting, the semantics of this index policy are correct.

Triggers in Indexing

To continuously index the graph, we rely on APOC triggers. Triggers are event-driven listeners that "fire" on every write transaction. We support triggers that add properties to created nodes, triggers that update the updatedAt property any time another property is changed on indexed nodes, and triggers that start APOC periodic repeat jobs to run indexing. The "Continuous Indexing (APOC Repeat and APOC Triggers)" subsection explains Search's use of triggers in more depth.

Indexing Endpoints and Operational Information

This section explores the indexing features, endpoints, and operational information. There are two types of endpoints used for indexing: base endpoints and broker endpoints. Base endpoints are used to manually go through the process of indexing. They can also be used to achieve specific results without running through the whole indexing process. Broker endpoints send indexing jobs through the broker and should always be used if a broker is set up.

The Indexing Process

Indexing is done through a series of four steps:

  1. Adding the correct properties for indexing to nodes on the graph.
  2. Acquiring/generating an index policy.
  3. Evaluating and storing any globals defined by the index policy.
  4. Starting the APOC iterate job that pushes documents to Elasticsearch, or setting up an APOC repeat job/trigger to achieve this.

Base Endpoints and Manual Indexing

Base endpoints each expose a single indexing feature offered by Search. They are useful for modularizing the indexing process, making it easier to debug or to run a single step in the indexing process. They are also useful for understanding the indexing process as a whole.

Generating a generic Index Policy

Basic index policies can be generated from the graph itself, and then edited for customization. There are two endpoints, /generateIndexPolicy and /generateCustomIndexPolicy, for generating index policies from the graph.

The first endpoint /generateIndexPolicy creates an index policy for the entire graph, creating indexes for all node types and each property. Note that this generates the index policy for any node type that is returned from "CALL db.indices()". This includes all node types with constraints, even if no node of that type exists on the graph.

The second endpoint, /generateCustomIndexPolicy, generates an index policy for the node types passed to it in a JSON body.

Generating a custom Index Policy

The /generateCustomIndexPolicy is useful in a situation where only certain nodes of the graph need to be indexed into Elasticsearch. The API endpoint is a POST method and requires a JSON body. Below is an example:

{
"nodeTypes": ["Person", "Movie"]
}

The JSON is a single list, named nodeTypes, with contents that are the node types (labels) that an index policy should be generated for. An index policy for Person and Movie nodes would be generated by the above example.
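For reference, a request with this body could be assembled as follows using only Python's standard library. This is a sketch under assumptions: the base URL is a placeholder, any authentication headers or additional path segments the service may require are not covered, and build_custom_policy_request is a hypothetical helper. The request is only built here, not sent.

```python
import json
import urllib.request


def build_custom_policy_request(base_url: str, node_types: list) -> urllib.request.Request:
    """Build (but do not send) a POST request for /generateCustomIndexPolicy."""
    body = json.dumps({"nodeTypes": node_types}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generateCustomIndexPolicy",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "http://localhost:8080/search" is a placeholder base URL.
    request = build_custom_policy_request("http://localhost:8080/search", ["Person", "Movie"])
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out because the host, port, and security setup depend on the deployment.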

Assumptions in Generated Index Policies

These endpoints make assumptions about generated index policies. They assume that the labels of nodes are acceptable Elasticsearch index names when fully lowercase. An index will not be generated if the lowercase label fails to validate. It is also assumed that the unique identifier will always be grn. For graphs where the unique identifier is not a grn, the anchorId property in the index strategies of the generated policy must be edited manually.

Saving, Loading, and Deleting Index Policies

Index policies can be saved to, loaded from, and deleted from S3. The cluster name in nearly all endpoints specifies the file path within S3, while the parameters spring.aws.index.s3.bucket.name and spring.aws.index.s3.bucket.region, set in the ECS JSON files, specify the bucket name and bucket region. The base endpoint /saveIndexPolicy is used to save index policies directly.

Required Properties for Indexing and Forcing Index Properties

The two properties updatedAt and lastSearchIndexedAt are used to keep track of whether a node has been indexed or not. GraphGrid Search provides capabilities to add these two properties to node types the user specifies.

This is done using the /forceIndexProperties POST endpoint. The body of the POST is JSON containing one list called nodeTypes. Here is an example:

{
"nodeTypes": ["Person", "Movie"]
}

(Notice that this is the same syntax used for generating custom index policies.)

This will create the two properties updatedAt and lastSearchIndexedAt on all Person and Movie nodes. If there is already an updatedAt property then it only creates the lastSearchIndexedAt property. If both are missing it sets updatedAt to the current time and lastSearchIndexedAt to zero.
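The effect on a single node can be modeled with a plain dictionary. This sketch is an illustration only: force_index_properties is a hypothetical function, and it assumes both properties are stored as epoch milliseconds, which is not stated on this page.

```python
import time


def force_index_properties(node: dict, now_ms: int = None) -> dict:
    """Return a copy of the node's properties with the two indexing
    properties added when they are missing."""
    result = dict(node)
    if "updatedAt" not in result:
        # Assumption: timestamps are epoch milliseconds.
        result["updatedAt"] = now_ms if now_ms is not None else int(time.time() * 1000)
    if "lastSearchIndexedAt" not in result:
        result["lastSearchIndexedAt"] = 0
    return result


if __name__ == "__main__":
    print(force_index_properties({"name": "Keanu Reeves"}, now_ms=1531503437000))
    print(force_index_properties({"name": "Keanu Reeves", "updatedAt": 42}))
```

With lastSearchIndexedAt starting at zero, a freshly prepared node satisfies updatedAt > lastSearchIndexedAt and is therefore picked up by a producer that filters on those two properties.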

The generated strategy defaultPartial uses these two properties for knowing which nodes to create documents for. Streamlined endpoints always add/update these two properties to nodes that will be indexed.

Evaluating and Storing Globals In Elasticsearch

Globals are themselves stored in Elasticsearch, and thus must be evaluated and then pushed to Elasticsearch. When indexing through base endpoints, only one endpoint computes and stores globals in Elasticsearch: /indexGlobals. This endpoint must be run if an index policy makes use of globals and is being indexed through base endpoints.
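To decide whether a policy needs its globals indexed first, the generator strategies can be scanned for {name} references. This is a rough sketch: referenced_globals and undefined_globals are hypothetical helpers, and the regular expression only recognizes a bare identifier between braces, so map literals such as {name:'Kevin Bacon'} and projections such as {.*} are ignored.

```python
import re

# Matches {identifier} but not {key: value} maps or {.*} projections.
GLOBAL_REFERENCE = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def referenced_globals(policy: dict) -> set:
    """Collect the names of globals used in any generator strategy."""
    names = set()
    for index in policy.get("indexes", {}).values():
        for prop in index.get("schema", {}).values():
            strategies = prop.get("schema", {}).get("generatorStrategies", {})
            for query in strategies.values():
                names.update(GLOBAL_REFERENCE.findall(query))
    return names


def undefined_globals(policy: dict) -> set:
    """Globals that are referenced but missing from the policy's globals."""
    return referenced_globals(policy) - set(policy.get("globals", {}))


if __name__ == "__main__":
    policy = {
        "indexes": {"person": {"schema": {"baconNumber": {"schema": {"generatorStrategies": {
            "defaultFull": "MATCH p=shortestPath((bacon:Person {name:{kevinBacon}.name})-[*0..]-(person)) WITH length(p)/2"
        }}}}}},
        "globals": {"kevinBacon": {"query": "MATCH (kevin:Person) WHERE kevin.name='Kevin Bacon' WITH kevin {.*}"}},
    }
    print(referenced_globals(policy), undefined_globals(policy))
```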

Continuous Indexing (APOC Repeat and APOC Triggers)

To achieve continuous indexing when no broker is configured, the GraphGrid Search service makes use of apoc.periodic.repeat combined with APOC triggers. These methods of indexing do not scale; a broker is required for large indexing jobs.

APOC Repeat

An APOC repeat job is an APOC procedure that executes custom Geequel on a timed basis. The repeat job kicks off an indexing job every 60 seconds.

APOC repeat jobs are not persistent and will stop if the ONgDB instance is restarted (to achieve persistence we use a trigger to restart APOC repeat jobs if they stop).

APOC repeat jobs can be set up manually through the endpoints /createRepeatJobsForPolicy and /createRepeatJobsForIndex. The index strategy specified in the endpoints' indexStrategyName path variable is the index strategy used to create the apoc repeat job. The index strategy used is important so please see the subsection "Index Strategies and Apoc Repeat/Triggers".

APOC Triggers

The endpoints /createTriggersForPolicy and /createTriggersForIndex each create a total of three triggers. The former does it for all indexes in an index policy, the latter does it for a specific index in an index policy.

  1. One trigger adds the properties grn, updatedAt, lastSearchIndexedAt to newly ingested nodes that have been/will be indexed. These properties are only added if they do not yet exist on the node.
  2. The second trigger listens for changes in properties of nodes that have been/will be indexed. Whenever a node's property is changed, this trigger ensures the node's updatedAt property is also changed. This is how the indexing job kicked off by the APOC repeat job picks up new nodes that need to be indexed.
  3. The third trigger starts an apoc repeat job for the index.

These triggers persist on the ONgDB instance even if it is restarted and are used as a way to make sure indexing happens continuously. If the ONgDB instance is restarted, it will require a WRITE transaction to cause the trigger to start the apoc repeat job.
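
The effect of the first two triggers can be sketched in plain Python. This is only an illustration of the behaviour described above, not the trigger code GraphGrid Search installs: the function names, the dictionary-based node, the generated grn value, and the initial lastSearchIndexedAt of 0 are all assumptions.

```python
def on_node_created(node, now):
    """Trigger 1 (sketch): add the index properties only when they are missing."""
    # The grn format and the starting lastSearchIndexedAt value are hypothetical.
    node.setdefault("grn", "grn:gg:{}:{}".format(node["label"].lower(), node["id"]))
    node.setdefault("updatedAt", now)
    node.setdefault("lastSearchIndexedAt", 0)


def on_property_changed(node, key, value, now):
    """Trigger 2 (sketch): any property change also bumps updatedAt."""
    node[key] = value
    node["updatedAt"] = now


def needs_indexing(node):
    """The check a filtering producer applies to each anchor node."""
    return node["updatedAt"] > node["lastSearchIndexedAt"]


node = {"label": "Person", "id": "abc", "name": "Keanu"}
on_node_created(node, now=100)
new_node_picked_up = needs_indexing(node)

node["lastSearchIndexedAt"] = 150          # an indexing job has processed the node
skipped_when_unchanged = not needs_indexing(node)

on_property_changed(node, "name", "Keanu Reeves", now=200)
picked_up_after_change = needs_indexing(node)
```

A node is therefore picked up when it is new, skipped while it is unchanged, and picked up again after any property change.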

Similar to apoc repeat jobs, the index strategy used when creating apoc triggers is extremely important. The index strategy specified in the endpoints' indexStrategyName path variable is the index strategy used to create the apoc repeat job. Please see the subsection "Index Strategies and Apoc Repeat/Triggers".

Index Strategies and Apoc Repeat/Triggers

Due to how GraphGrid Search achieves continuous indexing, it matters which index strategy is used to create the apoc repeat jobs and the apoc triggers. As a rule, DO NOT create apoc repeat jobs or these apoc triggers using the defaultFull generated index strategy. The defaultRepeat index strategy should be used instead, for the reasons explained below.

The index policy and index strategy used when creating an apoc repeat job or a trigger should have a producer that filters anchor nodes by checking if updatedAt > lastSearchIndexedAt. This filtering is extremely important as it makes sure the indexing job (really an apoc iterate job) running inside the apoc repeat job/trigger will eventually index every document, even if the indexing is interrupted. The generated index strategy defaultRepeat does this producer filtering and defaultFull does not, which is why it is so important to use the defaultRepeat strategy.

If an apoc repeat job/trigger is created from an index strategy that does NOT filter the producer, then the same documents will be re-indexed every 25 seconds, regardless of whether they have changed. In the best case, every document is indexed within the 25 seconds; even then, there is unnecessary overhead from recreating every unchanged document and pushing it to Elasticsearch. In the worst case, not all documents are indexed within the 25 seconds. Because the indexing job does not filter out nodes it has already indexed, it restarts indexing from the beginning. This case continuously indexes the same set of documents every 25 seconds and never finishes indexing fully. It should be avoided, as it is repetitive, wasteful, and risks never completing indexing.
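
The difference between a filtered and an unfiltered producer can be illustrated with a small simulation. This is a minimal sketch rather than GraphGrid code: the node dictionaries, the run_cycle helper, and the per-cycle capacity (how many nodes get indexed before the next cycle starts) are assumptions made for illustration.

```python
def run_cycle(nodes, capacity, now, filtered):
    """Index up to `capacity` nodes in one repeat cycle."""
    if filtered:
        # Filtering producer: only nodes changed since they were last indexed.
        candidates = [n for n in nodes if n["updatedAt"] > n["lastSearchIndexedAt"]]
    else:
        # Unfiltered producer: every cycle starts again from the first node.
        candidates = list(nodes)
    for node in candidates[:capacity]:
        node["indexed"] = True
        node["lastSearchIndexedAt"] = now


def simulate(total, capacity, cycles, filtered):
    """Return how many distinct nodes were indexed after `cycles` repeat cycles."""
    nodes = [
        {"updatedAt": 1, "lastSearchIndexedAt": 0, "indexed": False}
        for _ in range(total)
    ]
    for cycle in range(cycles):
        run_cycle(nodes, capacity, now=2 + cycle, filtered=filtered)
    return sum(1 for n in nodes if n["indexed"])


filtered_total = simulate(total=10, capacity=4, cycles=3, filtered=True)
unfiltered_total = simulate(total=10, capacity=4, cycles=3, filtered=False)
print(filtered_total, unfiltered_total)  # 10 4
```

With filtering, each cycle continues where the previous one stopped, so all 10 nodes are indexed after three cycles. Without filtering, the same first 4 nodes are re-indexed every cycle and the remaining 6 are never reached.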

note

The defaultPartial strategy was previously used for APOC repeat jobs. APOC repeat jobs now use the defaultRepeat strategy instead.

Using Base endpoints to Index and create ES documents

With all the information above we are ready to create an index policy and start indexing the graph! This section briefly covers how to use base endpoints to index the graph.

  1. The first thing is to ensure that the node types to be indexed have updatedAt and lastSearchIndexedAt properties, and that updatedAt is greater than lastSearchIndexedAt so the nodes are picked up for indexing. Run the endpoint /forceIndexProperties with a JSON specifying the node types to force the configuration of those index properties.

  2. Now we will generate and save an index policy. Use the endpoint /generateAndSaveCustomIndexPolicy with a JSON body to generate a specific index policy. Copy the resulting JSON returned by the endpoints and edit the policies if desired. If any globals are added to the index policy make sure you run the /indexGlobals endpoint!

  3. Make sure the necessary triggers are added to the ONgDB instance by running /createTriggersForPolicy, and make sure to use the defaultRepeat index strategy!

  4. Now use the /createRepeatJobsForPolicy endpoint with the JSON index policy as the body. Make sure to use the defaultRepeat index strategy! Running this endpoint will create the apoc repeat job that will start indexing the graph.

  5. Lastly, run /savePolicy and save the index policy.

This is the process to set up persistent indexing for a generated index policy using the base endpoints.

Timeout Risks

Whenever an indexing job (apoc iterate job) is run manually, it can timeout. A timeout has consequences that one should be aware of.

If a timeout occurs, it means that the indexing job did not finish. The indexing job can be run again with the defaultPartial strategy to try to finish the indexing.

However, after a timeout, documents will have been committed to Elasticsearch, but the anchor nodes associated with those documents will not have their lastSearchIndexedAt property updated. Those nodes will be re-indexed if an indexing job is started again using defaultPartial. This wastes some time but is not a big problem, unless the timeout occurs before the first batch of indexed nodes is committed. If the first batch is not completed within the timeout, it is not committed, which means nothing was created or updated. Even though documents were pushed to Elasticsearch, no progress was made, and running another indexing job will just recreate and republish the same documents to Elasticsearch. In this situation nothing is ever committed, no matter how many times the manual indexing is run. Lowering the batchSize in the index strategy can fix this.
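
The relationship between batchSize and the timeout can be made concrete with a short calculation. This is a simplified sketch under stated assumptions: progress is only recorded for batches that completed before the timeout, and nodes_before_timeout (how many nodes the job manages to process before timing out) is a hypothetical figure, not something the service reports.

```python
def committed_after_timeout(total_nodes, batch_size, nodes_before_timeout):
    """Return how many nodes belong to committed batches when the job stops.

    Assumption: a batch only counts once every node in it has been processed.
    """
    processed = min(total_nodes, nodes_before_timeout)
    if processed == total_nodes:
        # The job finished, so the final (possibly short) batch commits as well.
        return total_nodes
    return (processed // batch_size) * batch_size


large_batches = committed_after_timeout(5000, batch_size=1000, nodes_before_timeout=800)
small_batches = committed_after_timeout(5000, batch_size=200, nodes_before_timeout=800)
print(large_batches, small_batches)  # 0 800
```

With a batchSize of 1000 the first batch never completes, so every retry starts from zero. With a batchSize of 200, four batches commit before the timeout and the next run continues from there.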

Broker Integration Strategies

Currently, SQS and RabbitMQ are supported as brokers. Brokers are used to distribute the load of indexing jobs. The indexing jobs can be distributed to either a single GraphGrid Search instance or multiple Search instances. Since the load can be distributed between multiple Search instances, this method of indexing is scalable. By default, the service tries to process index jobs directly (without the use of any messaging system) and only uses the messaging system when specified.

When indexing tens of thousands of nodes, there are not many differences between using brokers and using the other methods of indexing. Internally, however, they work very differently, and they differ greatly in performance and stability as the number of nodes enters the millions.

Enabling SQS and RabbitMQ is done through the application configuration when GraphGrid Search is deployed. To enable SQS set spring.sqs.enabled to true. To enable RabbitMQ set spring.rabbitmq.enabled to true.

Once enabled, using the broker for indexing is specified by each index strategy under the optional key BROKER. The BROKER key can be one of three values: DIRECT, SQS, or RABBITMQ.

The DIRECT option just runs the above-mentioned apoc iterate jobs in batches of 1000; it does not use a broker and cannot scale. This value is used to prevent the defaultBroker index strategy from trying to access SQS and RABBITMQ when they have not been enabled.

The SQS option uses the SQS queue as a messaging system.

The RABBITMQ option uses the RABBITMQ queue as a messaging system.

The index policy generator automatically generates an index strategy that should be used with the broker endpoints, called defaultBroker. The other generated index strategies defaultFull, defaultPartial, and defaultRepeat should not be used with the broker-based endpoints as they use different methods to index the graph.

Broker Based Endpoints:

  • brokerIndexFull
  • brokerIndexPartial
  • brokerIndexPolicyFull
  • brokerIndexPolicyPartial

Other generated index strategies cannot be used for the broker because their producers are different. The defaultBroker makes use of a property named idList which is a list of the anchorIds from the received broker message. This is used to index exactly what nodes were partitioned and sent within that message.

All endpoints pull index policies from S3 directly and do not allow for a passed-in index policy.

In keeping with convention, the brokerIndexFull and brokerIndexPolicyFull endpoints index all nodes, while the brokerIndexPartial and brokerIndexPolicyPartial endpoints index only nodes where updatedAt > lastSearchIndexedAt.

Similar to the other methods of indexing the updatedAt and lastSearchIndexedAt properties are expected to be on the nodes!

info

Any node with the property brokerMessageNumber will not be indexed by a broker. This property is used to keep track of which nodes are in-route to be indexed by a broker. If nodes are not being indexed by the broker and there is no error message then check if they have this property and remove it.

If possible, always use a broker for indexing and avoid the other strategies. The broker is a reliable way to ensure continuous indexing occurs even if something goes wrong. It can also be used to split up indexing between multiple deployed GraphGrid Search instances!

Internal information on Brokers

  • The broker works by getting all the nodes to be indexed (either getting all of them in the case of brokerIndexFull endpoints or getting the ones where updatedAt > lastSearchIndexedAt in the case of brokerIndexPartial endpoints) and partitioning these nodes into batches.
  • Partitioning works by getting a limited number of nodes that do not have the property brokerMessageNumber and then setting that property on them. This ensures that the nodes and their anchorIds are retrieved in batches rather than all at once. The brokerMessageNumber property is set back to null on a node after its corresponding document is pushed up to Elasticsearch.
  • These batches are then passed, along with the index policy information, to the broker, which distributes the messages to any open Search instance.
  • GraphGrid Search consumes the messages and indexes the nodes specified by the message. The property brokerMessageNumber is set to null after being indexed.
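
The partition-and-consume flow described in this list can be sketched in a few lines of Python. This is not the service's implementation: the message shape (messageNumber plus idList), the in-memory node list, and the helper names are assumptions used only to show how the brokerMessageNumber marker keeps batches disjoint.

```python
from itertools import count


def partition(nodes, limit, message_numbers):
    """Claim up to `limit` unmarked nodes and return a message, or None when done."""
    unclaimed = [n for n in nodes if n.get("brokerMessageNumber") is None][:limit]
    if not unclaimed:
        return None
    number = next(message_numbers)
    for node in unclaimed:
        node["brokerMessageNumber"] = number   # mark the node as in-route
    return {"messageNumber": number, "idList": [n["grn"] for n in unclaimed]}


def consume(nodes, message):
    """Index exactly the nodes named in the message, then clear their marker."""
    wanted = set(message["idList"])
    for node in nodes:
        if node["grn"] in wanted:
            node["indexed"] = True
            node["brokerMessageNumber"] = None


nodes = [
    {"grn": "grn:gg:person:{}".format(i), "brokerMessageNumber": None, "indexed": False}
    for i in range(5)
]
numbers = count(1)
messages = []
while True:
    message = partition(nodes, limit=2, message_numbers=numbers)
    if message is None:
        break
    messages.append(message)

for message in messages:
    consume(nodes, message)

batch_sizes = [len(m["idList"]) for m in messages]
```

A node that keeps its marker (for example because a message was lost) is skipped by every later partitioning pass, which matches the troubleshooting advice about removing a stale brokerMessageNumber.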

Notes on Indexing

  • The structure of the JSON for both indexes and globals is very important and cannot be ignored. For example if the index policy uses no globals, still make sure there is a globals part of the JSON even though it will be an empty object.

  • When using the generated defaultPartial index strategy in any base endpoint context (i.e. manual indexing, setting up a repeat job, setting up triggers), the nodes to be indexed are expected to have updatedAt and lastSearchIndexedAt properties. If these two properties are missing, indexing will break. Streamlined endpoints automatically add these properties, so the only concern would be a user manually deleting them.

  • Accessing the Elasticsearch Data Manually

    • Index Names: The indexing job stores documents in an index whose name, if generated, is the decapitalized node type. For example, the documents for node type "Person" are stored under the index "person".
    • Type Names: The indexing job automatically stores the documents in their specified index under the type "doc". There is currently no way to alter this through the index policy.
    • To access ES data manually requires the index, type, and either id or a query. The first two are extremely important for accessing the ES data manually using curl commands or apoc.es commands.

    Note these examples that use both the index and type to query all the Elasticsearch documents under the "person" index and the "doc" type:

    curl -XGET "localhost:9200/person"
    CALL apoc.es.query("localhost", "person", "doc", "_all", null) YIELD value

  • The ES Id: The unique identifier for a document is its corresponding node's anchorId (the anchor's anchorId). If the node does not have the property that corresponds to the anchorId then a RuntimeException will be thrown by ONgDB and indexing will halt. Ensure all nodes to be indexed have a property specified by the anchorId in the index strategy being used.
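
The naming rules in these notes can be combined into a small helper that derives where a node's document lives in Elasticsearch. This is a sketch under assumptions: the helper names are hypothetical, "decapitalized" is taken to mean lowercasing only the first character, and the /{index}/{type}/{id} path is the typed document path used by Elasticsearch 5.x.

```python
from urllib.parse import quote


def index_name_for(node_type):
    """Decapitalize a node type, e.g. "Person" -> "person"."""
    return node_type[:1].lower() + node_type[1:]


def document_path(node_type, anchor_id, doc_type="doc"):
    """Build the Elasticsearch path of the document for one anchor node."""
    # The anchorId (a GRN here) contains colons, so it is percent-encoded.
    return "/{}/{}/{}".format(index_name_for(node_type), doc_type, quote(anchor_id, safe=""))


path = document_path("Person", "grn:gg:person:abc")
print(path)  # /person/doc/grn%3Agg%3Aperson%3Aabc
```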

Searching

The search endpoint is at the root of the API (search/). Two path variables are required in order to use the search endpoint:

  • clusterName: the name of the GraphGrid Cluster
  • policyName: the name of the Index Policy

Only the indexes and properties defined in the index policy will be searched. Details regarding the concepts above can be found here - Index Policy. GraphGrid Search also exposes the Elasticsearch scripting capability. More API information about scripting can be found here.

The complete endpoint path:
search/{clusterName}/{policyName}/?query={"terms":""}
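
Because the query parameter is a JSON string, it has to be URL-encoded before it is placed in the path. The following is a minimal sketch of building such a URL with the Python standard library; the search_url helper is hypothetical, and the http://localhost base and /1.0/ prefix are taken from the API section of this page.

```python
import json
from urllib.parse import quote


def search_url(api_base, cluster_name, policy_name, query):
    """Return the search URL with the JSON query percent-encoded."""
    # Compact separators keep the encoded query short.
    encoded_query = quote(json.dumps(query, separators=(",", ":")), safe="")
    return "{}/1.0/search/{}/{}/?query={}".format(
        api_base,
        quote(cluster_name, safe=""),
        quote(policy_name, safe=""),
        encoded_query,
    )


url = search_url("http://localhost", "default", "gg-dev-index-policy", {"terms": "keanu"})
print(url)
# http://localhost/1.0/search/default/gg-dev-index-policy/?query=%7B%22terms%22%3A%22keanu%22%7D
```

Additional parameters such as indexes or fuzzySearch can be added to the same dictionary before encoding.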

Query

A JSON string that defines the query parameters. The parameters can be grouped into several categories:

General

Parameter | Description
terms (string) | Keywords to search for.
indexes (optional string array) | An array of node types to search. Default: ["_all"]
properties (optional string array) | An array of node properties to search. Default: ["_all"]

Pagination

Parameter | Description
pageNumber (optional integer) | The page of search results to navigate to.
pageSize (optional integer) | The limit of search results per page.
pagePerIndex (optional boolean) | Whether to return the pagination results for each index.

Fuzziness

Parameter | Description
fuzzySearch (optional boolean) | Whether to enable fuzzy search.
editDistance (optional integer; available values: 0, 1, 2) | Maximum allowed Levenshtein edit distance (number of edits). Default: AUTO, a value set by Elasticsearch based on the length of the term: (1) length [0,2] -> 0: the result field must match the query term exactly; (2) length [3,5] -> 1: up to 1 edit is allowed; (3) length [6,∞) -> 2: up to 2 edits are allowed. With the default, the number of results therefore increases as the length of the query term changes from 2 to 3 and from 5 to 6.
prefixLength (optional integer) | The number of initial characters which will not be "fuzzified". This helps to reduce the number of terms which must be examined. Default: 0
maxExpansions (optional integer) | The maximum number of terms that the fuzzy query will expand to. Default: 50
transpositions (optional boolean) | Whether fuzzy transpositions (ab → ba) are supported. Default: true
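
The AUTO rule described for editDistance can be written as a small function. This sketch only restates the length thresholds listed above; the function name is hypothetical, and the real calculation happens inside Elasticsearch.

```python
def auto_edit_distance(term):
    """Return the edit distance AUTO allows for a term of this length."""
    length = len(term)
    if length <= 2:
        return 0   # must match exactly
    if length <= 5:
        return 1   # up to one edit
    return 2       # up to two edits


distances = {term: auto_edit_distance(term) for term in ["ab", "abc", "keanu", "reeves"]}
print(distances)  # {'ab': 0, 'abc': 1, 'keanu': 1, 'reeves': 2}
```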

Tokenizer

Parameter | Description
tokenizer (optional object; available values: ngram, edge_ngram) | Builds the search query with a tokenizer. A tokenizer breaks full text into individual tokens that are searchable. Default: ngram

Suggester

Parameter | Description
suggester (optional object; available values: term, phrase, completion, context) | Builds the search query with a suggester. The suggest feature suggests similar-looking terms based on the provided text. It is useful for fast suggestion lookup. Properties must be explicitly defined when using the suggester. The suggest feature uses data structures and algorithms that differ from those of normal search, so including this parameter disables the normal search features (pagination, fuzzy search, and tokenizers). The suggested results also include a text field that indicates the content of the suggested matches.
maxSuggestSize (optional integer) | Maximum number of suggested results to display. Default: 10

Result

A JSON object that describes the results returned by the search API. There are two main parts: report and data.

Report

Parameter | Description
type (string; available values: info, warning, error) | The overall status type of the search result.
message (string) | The message that explains the search result.
failures (optional array) | If the search query cannot be executed successfully, failures lists the exceptions that occurred, grouped by index.
summary (object) | An object field that holds count statistics of the search result. See below.

Summary

Parameter | Description
totalCount (integer) | The total number of documents retrieved by the search service.
countDistribution (optional map) | A map of each index that appears in the search result to its document count.

Data

It contains the retrieved documents grouped by index.

Parameter | Description
documentCount (integer) | The number of documents from an index.
documentData (object) | The document data from an index.

Document Data

Parameter | Description
id (string) | The GRN of a document.
score (double) | The score returned by Elasticsearch.
caption (string) | The display field for a document result. It is set on the index level. If none of the defined fields is found in the source, this field will be empty.
source (object) | The source data of a document.

Example Search Queries

For examples of search queries and results visit this page.

API

This Search API version is 1.0 and as such all endpoints are rooted at /1.0/. For example, http://localhost/1.0/search/ (requires auth) would be the base context for this Search API under the GraphGrid API deployed at http://localhost. Here we will cover the searching and base search endpoints. For more endpoints related to other search features and index policies, see the links below.

Base Endpoints

Get Status

Check the status of GraphGrid Search. Will return a 200 OK response if healthy.

Base URL: /1.0/search/status
Method: GET

Request

curl --location --request GET "${API_BASE}/1.0/search/status"

Response

200 OK

Search All

Base URL: /1.0/search/{{clusterName}}/{{policyName}}/?query={"terms":""}
Method: GET

Performs a textual search across indexes managed for a GraphGrid Cluster.

Parameter | Description
clusterName (string) | The GraphGrid Cluster to search across.
Request
curl --location --request GET "${API_BASE}/1.0/search/default/gg-dev-index-policy/?query=%7B%22terms%22:%22keanu%22%7D" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${BEARER_TOKEN}"
Response
{
  "report": {
    "summary": {
      "totalCount": 1,
      "countDistribution": {
        "person": 1
      }
    }
  },
  "data": {
    "person": {
      "documentCount": 1,
      "maxScore": 3.2498474,
      "documentData": [
        {
          "id": "grn:gg:person:MXSlswlrRt4Vl80OiEsskriQcLb3ASP9y8jzpaHmz8PK",
          "score": 3.2498474,
          "caption": "Keanu Reeves",
          "source": {
            "name": "Keanu Reeves"
          }
        }
      ]
    }
  },
  "suggest": {},
  "collapse": {},
  "searchParams": {}
}
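
A client usually only needs a few fields from this response. The sketch below walks the data section and collects the captions per index, ordered by score. The captions_by_index helper is hypothetical, and the sample is a trimmed copy of the response shown above.

```python
import json


def captions_by_index(response):
    """Map each index name to the captions of its documents, best score first."""
    result = {}
    for index_name, block in response.get("data", {}).items():
        documents = sorted(
            block.get("documentData", []),
            key=lambda doc: doc["score"],
            reverse=True,
        )
        result[index_name] = [doc["caption"] for doc in documents]
    return result


sample_response = json.loads("""
{
  "report": {"summary": {"totalCount": 1, "countDistribution": {"person": 1}}},
  "data": {
    "person": {
      "documentCount": 1,
      "maxScore": 3.2498474,
      "documentData": [
        {
          "id": "grn:gg:person:MXSlswlrRt4Vl80OiEsskriQcLb3ASP9y8jzpaHmz8PK",
          "score": 3.2498474,
          "caption": "Keanu Reeves",
          "source": {"name": "Keanu Reeves"}
        }
      ]
    }
  }
}
""")

captions = captions_by_index(sample_response)
print(captions)  # {'person': ['Keanu Reeves']}
```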

Index Policy Endpoints

Generate Index Policy

Generates an Index Policy for the GraphGrid Cluster.

Base URL: /1.0/search/{{clusterName}}/generateIndexPolicy/{{policyName}}
Method: POST

Parameter | Description
clusterName (string) | The GraphGrid Cluster with an Index Policy.
policyName (string) | The name used to load the index policy on S3.
Request
curl --location --request POST "${API_BASE}/1.0/search/default/generateIndexPolicy/example-index-policy" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${BEARER_TOKEN}"
Response
{
"metadata": {
"description": "Generated index policy for node types: [Movie, Person].",
"displayName": "example-index-policy",
"createdAt": "2021-03-12T18:57:14.247Z",
"updatedAt": "2021-03-12T18:57:14.248Z",
"versions": [],
"state": "OFFLINE"
},
"caption": {
"priority": [
{
"property": "title"
},
{
"property": "name"
},
{
"property": "address"
},
{
"property": "phone"
},
{
"property": "grn"
}
]
},
"parameter": null,
"hardFilter": null,
"indexes": {
"movie": {
"indexData": {
"description": "Generated 'movie' index from the 'Movie' node type.",
"displayName": "movie",
"caption": {
"priority": []
}
},
"indexStrategies": {
"defaultBroker": {
"producer": "MATCH (n:`Movie`) WHERE n.grn IN {idList} WITH n AS `movie` RETURN `movie`",
"anchorLabel": "Movie",
"anchorId": "movie.grn",
"anchor": "movie",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "RABBITMQ"
},
"defaultPartial": {
"producer": "MATCH (n:`Movie`) WHERE n.updatedAt > n.lastSearchIndexedAt WITH n AS `movie` RETURN `movie`",
"anchorLabel": "Movie",
"anchorId": "movie.grn",
"anchor": "movie",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
},
"defaultFull": {
"producer": "MATCH (n:`Movie`) WITH n AS `movie` RETURN `movie`",
"anchorLabel": "Movie",
"anchorId": "movie.grn",
"anchor": "movie",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
}
},
"settings": {},
"mappings": null,
"schema": {
"createdAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"grn": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lastSearchIndexedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"tagline": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"title": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"released": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"updatedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
}
},
"hardFilter": null,
"viewRequirements": {}
},
"person": {
"indexData": {
"description": "Generated 'person' index from the 'Person' node type.",
"displayName": "person",
"caption": {
"priority": []
}
},
"indexStrategies": {
"defaultBroker": {
"producer": "MATCH (n:`Person`) WHERE n.grn IN {idList} WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "RABBITMQ"
},
"defaultPartial": {
"producer": "MATCH (n:`Person`) WHERE n.updatedAt > n.lastSearchIndexedAt WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
},
"defaultFull": {
"producer": "MATCH (n:`Person`) WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
}
},
"settings": {},
"mappings": null,
"schema": {
"createdAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"grn": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lastSearchIndexedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"born": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"name": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lon": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lat": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"updatedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
}
},
"hardFilter": null,
"viewRequirements": {}
}
},
"globals": {},
"views": {}
}
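
Before using a generated policy for repeat jobs or triggers, it can help to check which strategies actually filter their producer on updatedAt > lastSearchIndexedAt. The following is a rough sketch of such a check, not a GraphGrid feature: the function name is hypothetical, and the regular expression only recognises the producer style shown in the generated policy above.

```python
import re

# Matches e.g. "n.updatedAt > n.lastSearchIndexedAt" with flexible spacing.
UPDATED_FILTER = re.compile(r"updatedAt\s*>\s*\w+\.lastSearchIndexedAt")


def strategies_without_updated_filter(policy):
    """Return (index, strategy) pairs whose producer lacks the updatedAt filter."""
    findings = []
    for index_name, index in policy.get("indexes", {}).items():
        for strategy_name, strategy in index.get("indexStrategies", {}).items():
            if not UPDATED_FILTER.search(strategy.get("producer", "")):
                findings.append((index_name, strategy_name))
    return findings


policy = {
    "indexes": {
        "person": {
            "indexStrategies": {
                "defaultBroker": {
                    "producer": "MATCH (n:`Person`) WHERE n.grn IN {idList} WITH n AS `person` RETURN `person`"
                },
                "defaultPartial": {
                    "producer": "MATCH (n:`Person`) WHERE n.updatedAt > n.lastSearchIndexedAt WITH n AS `person` RETURN `person`"
                },
                "defaultFull": {
                    "producer": "MATCH (n:`Person`) WITH n AS `person` RETURN `person`"
                },
            }
        }
    },
    "globals": {},
}

unfiltered = strategies_without_updated_filter(policy)
print(unfiltered)  # [('person', 'defaultBroker'), ('person', 'defaultFull')]
```

Strategies reported here are not necessarily wrong (defaultBroker filters on idList instead), but they should not be used to create apoc repeat jobs or triggers.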

Generate Custom Index Policy

Generates a custom Index Policy for the GraphGrid Cluster.

Base URL: /1.0/search/{{clusterName}}/generateCustomIndexPolicy/{{policyName}}
Method: POST

Parameter | Description
clusterName (string) | The GraphGrid Cluster with an Index Policy.
policyName (string) | The name used to load the index policy on S3.
Request
curl --location --request POST "${API_BASE}/1.0/search/default/generateCustomIndexPolicy/gg-dev-index-policy" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${BEARER_TOKEN}" \
--data-raw '{
"nodeTypes" : [
"Person"
]
}'
Response (200)
{
"metadata": {
"description": "Generated index policy for node types: [Person].",
"displayName": "gg-dev-index-policy",
"createdAt": "2021-03-12T19:43:17.528Z",
"updatedAt": "2021-03-12T19:43:17.528Z",
"versions": [],
"state": "OFFLINE"
},
"caption": {
"priority": [
{
"property": "title"
},
{
"property": "name"
},
{
"property": "address"
},
{
"property": "phone"
},
{
"property": "grn"
}
]
},
"parameter": null,
"hardFilter": null,
"indexes": {
"person": {
"indexData": {
"description": "Generated 'person' index from the 'Person' node type.",
"displayName": "person",
"caption": {
"priority": []
}
},
"indexStrategies": {
"defaultBroker": {
"producer": "MATCH (n:`Person`) WHERE n.grn IN {idList} WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "RABBITMQ"
},
"defaultPartial": {
"producer": "MATCH (n:`Person`) WHERE n.updatedAt > n.lastSearchIndexedAt WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
},
"defaultFull": {
"producer": "MATCH (n:`Person`) WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
}
},
"settings": {},
"mappings": null,
"schema": {
"createdAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"grn": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lastSearchIndexedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"born": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"name": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lon": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lat": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"updatedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
}
},
"hardFilter": null,
"viewRequirements": {}
}
},
"globals": {},
"views": {}
}

Save Index Policy

Saves an Index Policy to a GraphGrid Cluster on S3 under the policy name. Returns the saved policy.

Base URL: /1.0/search/{{clusterName}}/saveIndexPolicy/{{policyName}}
Method: POST

Parameter | Description
clusterName (string) | The GraphGrid Cluster with an Index Policy.
policyName (string) | The name used to load the index policy on S3.
Request
curl --location --request POST "${API_BASE}/1.0/search/default/saveIndexPolicy/example-index-policy" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${BEARER_TOKEN}" \
--data-raw '{
"metadata": {
"description": "Index policy for node types: [Movie, Person].",
"displayName": "example-index-policy",
"createdAt": "2021-03-12T18:57:14.247Z",
"updatedAt": "2021-03-12T18:57:14.248Z",
"versions": [],
"state": "OFFLINE"
},
"caption": {
"priority": [
{
"property": "title"
},
{
"property": "name"
},
{
"property": "address"
},
{
"property": "phone"
},
{
"property": "grn"
}
]
},
"parameter": null,
"hardFilter": null,
"indexes": {
"movie": {
"indexData": {
"description": "Generated 'movie' index from the 'Movie' node type.",
"displayName": "movie",
"caption": {
"priority": []
}
},
"indexStrategies": {
"defaultBroker": {
"producer": "MATCH (n:`Movie`) WHERE n.grn IN {idList} WITH n AS `movie` RETURN `movie`",
"anchorLabel": "Movie",
"anchorId": "movie.grn",
"anchor": "movie",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "RABBITMQ"
},
"defaultPartial": {
"producer": "MATCH (n:`Movie`) WHERE n.updatedAt > n.lastSearchIndexedAt WITH n AS `movie` RETURN `movie`",
"anchorLabel": "Movie",
"anchorId": "movie.grn",
"anchor": "movie",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
},
"defaultFull": {
"producer": "MATCH (n:`Movie`) WITH n AS `movie` RETURN `movie`",
"anchorLabel": "Movie",
"anchorId": "movie.grn",
"anchor": "movie",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
}
},
"settings": {},
"mappings": null,
"schema": {
"createdAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"grn": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lastSearchIndexedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"tagline": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"title": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"released": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"updatedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
}
},
"hardFilter": null,
"viewRequirements": {}
},
"person": {
"indexData": {
"description": "Generated 'person' index from the 'Person' node type.",
"displayName": "person",
"caption": {
"priority": []
}
},
"indexStrategies": {
"defaultBroker": {
"producer": "MATCH (n:`Person`) WHERE n.grn IN {idList} WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "RABBITMQ"
},
"defaultPartial": {
"producer": "MATCH (n:`Person`) WHERE n.updatedAt > n.lastSearchIndexedAt WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
},
"defaultFull": {
"producer": "MATCH (n:`Person`) WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
}
},
"settings": {},
"mappings": null,
"schema": {
"createdAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"grn": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lastSearchIndexedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"born": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"name": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lon": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lat": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"updatedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
}
},
"hardFilter": null,
"viewRequirements": {}
}
},
"globals": {},
"views": {}
}
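
A generated policy can be long, so it can help to list which strategies each index defines and which broker each one uses. The following is a minimal sketch, not part of GraphGrid Search itself: the `summarize_strategies` helper and the trimmed `policy_json` excerpt are hypothetical, and the excerpt only mirrors the shape of the response above.

```python
import json

# Trimmed, hypothetical excerpt shaped like the generated policy above.
policy_json = """
{
  "indexes": {
    "person": {
      "indexStrategies": {
        "defaultBroker": {"anchorLabel": "Person", "batchSize": "1000", "broker": "RABBITMQ"},
        "defaultPartial": {"anchorLabel": "Person", "batchSize": "1000", "broker": "DIRECT"},
        "defaultFull": {"anchorLabel": "Person", "batchSize": "1000", "broker": "DIRECT"}
      }
    }
  },
  "globals": {},
  "views": {}
}
"""


def summarize_strategies(policy):
    """Return (index, strategy, broker, batch_size) rows for every index strategy."""
    rows = []
    for index_name, index in sorted(policy.get("indexes", {}).items()):
        strategies = index.get("indexStrategies", {})
        for strategy_name, strategy in sorted(strategies.items()):
            # The policy stores numbers as strings ("1000"), so convert explicitly.
            rows.append(
                (index_name, strategy_name, strategy["broker"], int(strategy["batchSize"]))
            )
    return rows


policy = json.loads(policy_json)
rows = summarize_strategies(policy)
for row in rows:
    print(row)
```

Rows are sorted by index name and then by strategy name, so the output is stable between runs.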

Force Index Properties

Ensures that nodes of the given types have the lastSearchIndexedAt and updatedAt properties, both of which are required for indexing.

Request
curl --location --request POST "${API_BASE}/1.0/search/default/forceIndexProperties/2" \
--header "Authorization: Bearer ${BEARER_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw '{
"nodeTypes": [
"Person",
"Movie"
]
}'
Response
["Person", "Movie"]
A limit may be set to cap how many properties are set at once. It is passed as the final path segment: /1.0/search/{{clusterName}}/forceIndexProperties/{{limit}}. In the request above, the limit is 2.
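
The URL and body for this call can also be assembled in code. This is a minimal sketch under stated assumptions: `force_index_properties_call` is a hypothetical helper, the host `api.example.invalid` is a placeholder, and treating the limit segment as optional is inferred from the wording "may be set" rather than confirmed by the API.

```python
import json
from urllib.parse import quote


def force_index_properties_call(api_base, cluster_name, node_types, limit=None):
    """Return the (url, body) pair for a forceIndexProperties POST request."""
    url = (
        f"{api_base.rstrip('/')}/1.0/search/"
        f"{quote(cluster_name, safe='')}/forceIndexProperties"
    )
    if limit is not None:
        # Assumption: the limit is an integer appended as the last path segment.
        url += f"/{int(limit)}"
    body = json.dumps({"nodeTypes": list(node_types)})
    return url, body


url, body = force_index_properties_call(
    "https://api.example.invalid/", "default", ["Person", "Movie"], limit=2
)
print(url)
print(body)
```

The helper only builds the request. Sending it still requires the Authorization and Content-Type headers shown in the curl request.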

Load Index Policy

Loads and returns the Index Policy for the GraphGrid Cluster.

Base URL: /1.0/search/{{clusterName}}/loadIndexPolicy/{{policyName}}
Method: GET

Parameter | Description
clusterName (string) | The GraphGrid Cluster with an Index Policy.
policyName (string) | The name used to load the index policy on S3.
Request
curl --location --request GET "${API_BASE}/1.0/search/default/loadIndexPolicy/example-index-policy" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${BEARER_TOKEN}"
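
The same GET request can be prepared with Python's standard library. This is a sketch only: `load_index_policy_request` is a hypothetical helper, and the host and token values are placeholders. The request object is built but not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def load_index_policy_request(api_base, cluster_name, policy_name, bearer_token):
    """Build (but do not send) the GET request for loadIndexPolicy."""
    url = (
        f"{api_base.rstrip('/')}/1.0/search/"
        f"{quote(cluster_name, safe='')}/loadIndexPolicy/{quote(policy_name, safe='')}"
    )
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {bearer_token}",
    }
    return Request(url, method="GET", headers=headers)


req = load_index_policy_request(
    "https://api.example.invalid", "default", "example-index-policy", "TOKEN"
)
print(req.get_method(), req.full_url)
```

To send the request, pass `req` to `urllib.request.urlopen` and parse the response body with `json.load`.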

Index Globals

Evaluates the globals defined in the index policy and stores the results in the Elasticsearch globals index.

Base URL: /1.0/search/{{clusterName}}/indexGlobals/{{policyName}}
Method: POST

Parameter | Description
clusterName (string) | The GraphGrid Cluster with an Index Policy.
policyName (string) | The name used to load the index policy on S3.
Request
curl --location --request POST "${API_BASE}/1.0/search/default/indexGlobals/gg-dev-index-policy" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${BEARER_TOKEN}" \
--data-raw '{
"metadata": {
"description": "Test index policy for custom queries using the Person label.",
"displayName": "customQuery-index-policy",
"createdAt": "2018-12-18T19:53:26+00:00",
"updatedAt": "2018-12-20T15:45:11+00:00",
"state": "OFFLINE",
"versions": [
]
},
"caption": {
"priority": [
{
...
}
]
},
"parameter": null,
"hardFilter": null,
"indexes": {
"person": {
"indexData": {
"description": "Generated '\''person'\'' index from the '\''Person'\'' node type.",
"displayName": "person",
"caption": {
"priority": []
}
},
"indexStrategies": {
"defaultFull": {
"producer": "MATCH (n:`Person`) WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "true",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
}
},
...
}
},
"globals": {
"keanuReeves": {
"query": "MATCH (keanu:Person) WHERE keanu.name='\''Keanu Reeves'\'' WITH keanu {.*}"
}
}
}'
Response (200)
{
"metadata": {
"description": "Test index policy for custom queries using the Person label.",
"displayName": "customQuery-index-policy",
"createdAt": "2018-12-18T19:53:26+00:00",
"updatedAt": "2018-12-20T15:45:11+00:00",
"versions": [],
"state": "OFFLINE"
},
"caption": {
"priority": [
{
...
}
]
},
"parameter": null,
"hardFilter": null,
"indexes": {
"person": {
"indexData": {
"description": "Generated 'person' index from the 'Person' node type.",
"displayName": "person",
"caption": {
"priority": []
}
},
"indexStrategies": {
"defaultFull": {
"producer": "MATCH (n:`Person`) WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "true",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
}
},
...
}
},
"globals": {
"keanuReeves": {
"query": "MATCH (keanu:Person) WHERE keanu.name='Keanu Reeves' WITH keanu {.*}"
}
},
"views": {}
}
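
Before posting a policy to indexGlobals, it can be useful to confirm that every global carries a query. The following is a minimal sketch: `global_queries` is a hypothetical helper, and the assumption that each global is an object with a `query` string is taken from the example above, not from a published schema.

```python
def global_queries(policy):
    """Return {global name: query string}, raising if a global lacks a query."""
    queries = {}
    for name, definition in policy.get("globals", {}).items():
        query = definition.get("query") if isinstance(definition, dict) else None
        if not isinstance(query, str) or not query.strip():
            raise ValueError(f"global {name!r} has no query string")
        queries[name] = query
    return queries


# Only the globals part of the example policy is reproduced here.
policy = {
    "globals": {
        "keanuReeves": {
            "query": "MATCH (keanu:Person) WHERE keanu.name='Keanu Reeves' WITH keanu {.*}"
        }
    }
}
queries = global_queries(policy)
for name, query in queries.items():
    print(f"{name}: {query}")
```

A policy with an empty globals object yields an empty dictionary, so the check also works on generated policies that define no globals.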