GraphGrid Search
2.0
API Version 1.0
Introduction
GraphGrid Search provides textual search capabilities across a graph database by integrating ONgDB with Elasticsearch. Users can define policies for populating indexes with highly customizable Elasticsearch documents, built using policies that support Geequel. Search provides two main capabilities. First, it takes nodes, their properties, and customizable queries and stores them in Elasticsearch; this is called indexing. Second, it takes text input and searches across documents for matches; this is called searching.
Environment
Search requires the following integrations:
- ONgDB 1.0+
- Elasticsearch 5.5+
Search supports the following integrations:
- RabbitMQ 3.5+
- AWS SQS
Indexing
Indexing refers to the process of pushing documents into indexes held by the Elasticsearch server. Information must be indexed into Elasticsearch before it can be searched for and retrieved.
If a broker (such as RabbitMQ or SQS) is set up for use by GraphGrid Search, then indexing should always be done through the broker. Indexing with apoc repeat jobs and triggers does not scale the way a broker does and cannot be used across a cluster.
Index Policies
An index policy defines what indexes and documents should be created in Elasticsearch using graph data from ONgDB. The policy is composed of three parts: indexes, globals, and metadata. Indexes define what should be stored (indexed) in Elasticsearch. Globals define variables that can be used in custom queries for storing custom information. Metadata is used to store information about the index policy itself.
Index policies can be generated from the nodes on the ONgDB graph itself. These generated policies can be used as-is or can be edited for optimizing and storing custom information. Index policies are written in JSON and are stored in AWS S3.
Metadata
Information about the index policy itself is stored as metadata. It includes a description, a displayName, a createdAt, an updatedAt, a state, and previous versions of the index policy.
The displayName is used to store the name of the index policy.
The createdAt property records when the original index policy was created, and updatedAt records when the index policy was last saved. Both are stored in ISO-8601 format.
The state of an index policy can be ONLINE, OFFLINE, POPULATING, or FAILED. An ONLINE index policy is the one that was most recently used to index documents to Elasticsearch. An OFFLINE index policy is not actively being used for indexing. A POPULATING index policy is currently being used to index and push documents to Elasticsearch. (Note: this state is only applied when the index policy is used to index manually through specific endpoints. An index policy used to create apoc repeat jobs/triggers that push documents to Elasticsearch will have an ONLINE state.) A FAILED state indicates that an index within the index policy failed to push documents to Elasticsearch.
The versions property in metadata is a list of the previous index policies with the same displayName in the same cluster. The configurable value maxVersions sets how many previous index policies are stored at once, with a default value of six. Note that the previous policies are displayed in full, so when editing an index policy make sure you are editing the actual index policy and not a previous version!
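As a small illustration of the ISO-8601 timestamps stored in createdAt and updatedAt, the sketch below parses two such values with Python's standard library and compares them. The timestamp values are only sample inputs, and nothing here is part of the Search API itself.

```python
from datetime import datetime

# Sample ISO-8601 values with a UTC offset, as stored in policy metadata.
created_at = datetime.fromisoformat("2018-07-13T13:37:17-04:00")
updated_at = datetime.fromisoformat("2018-07-13T13:37:18-04:00")

# Offset-aware datetimes can be subtracted and compared directly.
seconds_since_creation = (updated_at - created_at).total_seconds()
print(seconds_since_creation)
```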
Indexes
An index in the index policy specifies what information will be stored in Elasticsearch and how. An index can be broken into three parts: indexData, index strategies, and a schema. Below is an example of a person index, without any inner parts expanded.
{
"person": {
"indexData":{ ... },
"indexStrategies": { ... },
"schema": { ... }
}
}
The name of an index, "person" in this case, is validated and must be an acceptable index name for Elasticsearch. Index names are validated by the following regex: [a-z0-9][a-z0-9_\\-]*[a-z0-9]
That is, names must be lowercase, must start and end with a letter or number, and may contain underscores and dashes. Names must be at least two characters long.
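The naming rule above can be checked locally before a policy is submitted. The following is a minimal sketch in Python, not the service's own validator; it assumes the doubled backslash in the documented regex is string escaping and that the whole name must match the pattern. The function name is_valid_index_name is hypothetical.

```python
import re

# Pattern taken from the text. The doubled backslash in the documented regex
# is assumed to be string escaping, so a single escaped dash is used here.
INDEX_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9_\-]*[a-z0-9]")


def is_valid_index_name(name: str) -> bool:
    """Return True when the whole name matches the documented pattern."""
    return INDEX_NAME_PATTERN.fullmatch(name) is not None


for candidate in ("person", "Person", "p", "_person", "beta-person_2"):
    print(candidate, is_valid_index_name(candidate))
```

Note that a one-character name fails because the pattern requires a distinct first and last character.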
Index Data
Information about the index is stored here. It includes a description and a displayName. The description is a statement about the index. The displayName is used to store an alias of the index. Since the index name is subject to validation, the displayName allows for a more flexible name if desired.
{
"person": {
"indexData": {
"description": "The 'person' index representing the 'Person' node label in Elasticsearch.",
"displayName": "person"
}
}
}
Index Strategies
The index strategies define how we query the ONgDB graph to acquire information that we want to store in Elasticsearch. In an index, the index strategies are used to define different ways to get the same base information that will be stored in Elasticsearch. They can be used to achieve different indexing features, all from the same index policy. For example, one index strategy can be used to index all Person nodes while another index strategy can be used to index any Person nodes that have not yet been indexed. Below is an index with two index strategies, named defaultFull and defaultPartial, that do just that.
{
"person": {
"indexStrategies": {
"defaultFull": {
"producer": "MATCH (n:`Person`) WITH n AS person RETURN person",
"anchorLabel": "Person",
"anchor": "person",
"anchorId": "person.grn"
},
"defaultPartial": {
"producer": "MATCH (n:`Person`) WHERE n.updatedAt > n.lastSearchIndexedAt WITH n AS person RETURN person",
"anchorLabel": "Person",
"anchor": "person",
"anchorId": "person.grn"
}
}
}
}
Notice that each index strategy contains a producer, anchorLabel, anchor, and anchorId. These are the four required properties out of eight total possible properties (more on the others later).
The producer is a Geequel statement that should return any information that will be used in the schema. The information returned by the producer is also used to define some other parameters of an index strategy.
In our example the producer retrieves nodes of type Person and then returns those nodes under the variable person. The only difference between defaultFull and defaultPartial is the producer. Specifically, defaultPartial has a WHERE clause that filters out all Person nodes that have already been indexed.
The anchor is used to connect each Elasticsearch document with a node on the graph. The anchor is evaluated in the context of what is returned by the producer.
In our example, the producer returns person, and these returned nodes are also what we want our Elasticsearch documents to be connected to. Thus, the anchor is also person. In an example below, we look at a case where what the producer returns is not exactly the same as the anchor (see "Advanced Index Policy").
The anchorLabel is the label that is used when searching for the anchor node. Choosing a label that makes it quick to find the anchor will help performance in certain situations. The example has an anchorLabel of Person, since our anchor person is of type Person. Since full label scans are slow, an anchorLabel is required.
The anchorId is used as the unique identifier for the Elasticsearch document. Similar to the anchor, the anchorId is evaluated in the context of what is returned by the producer. The anchorId should evaluate to a unique value. In the example, every person has a unique grn property, so person.grn is always unique and we can be assured that every person node will have its own Elasticsearch document.
There are also validation requirements for the anchorId values, which must match the following regex: [A-Za-z0-9_\-\.\:]+. If an evaluated anchorId fails the validation, the indexing process for the current index will be stopped and an error message will be pushed to ONgDB's console and debug log.
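Because a single invalid anchorId stops indexing for the whole index, it can be useful to test candidate values ahead of time. Here is a minimal sketch in Python that applies the documented regex to the entire value; the function name and the sample identifiers are hypothetical and are not taken from the service.

```python
import re

# Pattern taken from the text: letters, digits, underscore, dash, dot, colon.
ANCHOR_ID_PATTERN = re.compile(r"[A-Za-z0-9_\-\.\:]+")


def is_valid_anchor_id(value) -> bool:
    """Return True when the evaluated anchorId consists only of allowed characters."""
    return ANCHOR_ID_PATTERN.fullmatch(str(value)) is not None


# Hypothetical sample values, for illustration only.
for candidate in ("grn:gg:person:abc123", "person 1", "a/b", ""):
    print(repr(candidate), is_valid_anchor_id(candidate))
```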
As mentioned above, an index strategy has four more parameters, all of which are optional: batchSize, parallel, retries, and iterateList. These control the way documents are indexed and can be specified or left out entirely. Specifically, they are passed to an internal apoc.periodic.iterate call; further documentation may be found in the APOC documentation for apoc.periodic.iterate.
Index Strategy Recap
Here's a quick recap:
Index strategy parameters:
- producer (required, returns the information to be used in storing documents to Elasticsearch)
- anchorLabel (required, label that is used to identify the anchor)
- anchor (required, node that connects the graph to the document)
- anchorId (required, unique identifier of the node)
- batchSize (optional, how many rows are processed per ONgDB transaction commit, defaults to 1,000)
- parallel (optional, whether the producer and consumer in apoc.periodic.iterate run in parallel, defaults to true)
- retries (optional, retries for each failed commit, defaults to 0)
- iterateList (optional, whether the consumer consumes a single row at once or all rows at once, defaults to true which runs all rows at once)
Index strategies can be named any alphanumeric string with underscores. The names defaultFull and defaultPartial should be used with caution, as these are the index strategy names used in generated index policies.
Advanced Index Policy
In the example above our producer returned Person nodes as rows, with the variable name person. We used this returned row directly as our anchor, but sometimes we want our producer to return more than just a single node per row. Below is an example where the producer returns more than just a single node:
"director": {
"indexStrategies": {
"full_slice": {
"producer": "MATCH (n:Person)-[:DIRECTED]->(m:Movie) WITH [{director: n, movies: collect(m)}] AS slices UNWIND slices AS slice RETURN slice",
"anchorLabel": "Person",
"anchor": "slice.director",
"anchorId": "slice.director.grn"
}
}
}
The goal of the index director is to store each director (a Person that has DIRECTED a Movie) and their information in Elasticsearch. The index strategy full_slice returns a single row named slice. Notice that slice contains a person node, named director, and the movies that this person has directed under the variable named movies. Our anchor is slice.director, since that is the node we want our Elasticsearch document to be associated with. The anchorId is slice.director.grn, which uniquely identifies our anchor and thus its Elasticsearch document. Lastly, our anchorLabel is Person since the anchor has the label Person.
The way we use the MATCH clause ensures that our producer returns slices where director is someone that has directed at least one movie.
The index strategies define how we query the ONgDB graph to acquire information that we want to store in Elasticsearch. In the next section we look into the schema, which defines what information we will store and how we store it.
Schema
The schema defines what should be stored and how that information is computed. The schema of an index references the index strategies also defined by that index. An example schema of an index with the given name person is shown in the index policy snippet below:
{
"person": {
"schema": {
"born": { <- Property 1
"schema": {
"generatorStrategies": {
"defaultFull": "",
"defaultPartial": ""
},
"searchQuery": {},
"tokenizer": null,
"suggester": null
}
},
"name": { <- Property 2
"schema": {
"generatorStrategies": {
"defaultFull": "WITH coalesce(person.name, person.full_name)",
"defaultPartial": "WITH coalesce(person.name, person.full_name)"
},
"searchQuery": {},
"tokenizer": null,
"suggester": null
}
},
"baconNumber": { <- Property 3
"schema": {
"generatorStrategies": {
"defaultFull": "MATCH p=shortestPath((bacon:Person {name:'Kevin Bacon'})-[*0..]-(person)) WITH length(p)/2",
"defaultPartial": "MATCH p=shortestPath((bacon:Person {name:'Kevin Bacon'})-[*0..]-(person)) WITH length(p)/2"
},
"searchQuery": {},
"tokenizer": null,
"suggester": null
}
}
}
}
}
Inside the index person we have a schema which holds the properties that we want to store. The properties in this schema are born, name, and baconNumber.
Each property has an internal schema that holds the generatorStrategies, which correspond to index strategies defined by the index. The properties also have searchQuery, tokenizer, and suggester, which are used to customize how searches to the Elasticsearch index can be performed.
Generator Strategies
The generatorStrategies are maps from an index strategy name (such as defaultFull) to a string that is used to construct the value to be stored in the property for the Elasticsearch document. This string may be empty or may be a Geequel statement.
Generator Strategies: Empty String
Sometimes the string of a generator strategy is empty, like both strategies in born. Whenever a generator strategy string is left empty, the value stored in the Elasticsearch document is the property of the anchor. That is, <anchor>.<propertyName> is evaluated and stored in the Elasticsearch document.
Here's an example: for the index strategy defaultFull our anchor is person, and the property born is defined for the person node. This means that person.born is evaluated and stored in the person Elasticsearch document under the property born.
If an empty string is used on a schema property that is not an anchor property, then the property will be stored as null in the Elasticsearch document. There is nothing inherently wrong with this, and it will happen when nodes of the same type have different properties.
Sometimes nodes of the same type have the same information stored in different properties, like some person nodes using a name property to store their name and other person nodes using a full_name property. To make sure all Elasticsearch documents for person nodes have a name property, we use some customized Geequel inside the generator strategies.
Generated index policies make heavy use of empty generator strategies.
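The empty-string rule can be pictured as a simple fallback. The sketch below is only an illustration in Python of the <anchor>.<propertyName> convention described above, not the service's internal implementation; the function name effective_expression is hypothetical.

```python
def effective_expression(anchor: str, property_name: str, generator_strategy: str) -> str:
    """Return the expression that determines the stored value.

    An empty generator strategy falls back to <anchor>.<propertyName>;
    a non-empty one is used as written.
    """
    if generator_strategy == "":
        return f"{anchor}.{property_name}"
    return generator_strategy


print(effective_expression("person", "born", ""))
print(effective_expression("person", "name", "WITH coalesce(person.name, person.full_name)"))
```

With an anchor such as slice.director, the same fallback yields slice.director.born.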
Generator Strategies: Customized Geequel
When the generator strategy string is not empty, like name and baconNumber in our example, we can use Geequel to compute a custom value. The Geequel of the generator strategy can use whatever was returned by the producer of the index strategy with the same name.
In our example, both defaultFull and defaultPartial have producers that return person as a node. Both baconNumber generator strategies use person as a node in their queries computing the bacon number. We can see this explicitly by the use of (person) in the MATCH clause. In our example baconNumber has generator strategies that compute the bacon number of the person and store it in Elasticsearch.
Another example is the name property in the schema. As mentioned above, sometimes nodes of the same type will have the same sort of property (like a 'name') stored in different property keys. To ensure each Elasticsearch document gets correctly populated, we use customized Geequel. The Geequel "WITH coalesce(person.name, person.full_name)" uses the first non-null property, either person.name or person.full_name. Using this technique, we can make sure every Elasticsearch document has the correct information for each property, even if it is stored in different property keys on nodes.
It is important that each non-empty generator strategy string ends with a "WITH" instead of ending with a "RETURN" as a normal query would. Internally, we compute the custom value and store it in a map. We do this as many times as there are custom queries and use "WITH"s to string together multiple custom queries.
Moreover, a custom query cannot end with multiple items in the WITH clause (e.g. "WITH [username,password]" is allowed, but "WITH username, password" is NOT allowed).
Furthermore, the ending expression must evaluate to a scalar, map, or list. This is due to Elasticsearch not being able to store certain objects (like nodes) that cannot be converted to JSON implicitly. (IMPORTANT: Storing nodes and relationships in Elasticsearch is possible! Instead of ending the generator strategy as "WITH n" where n is a node, end it as "WITH n {.*}". The expression "n {.*}" evaluates to a map of all of n's properties, which is essentially the node n.)
Lastly, do not alias the last WITH clause with an AS.
Here's a small recap of the rules when using customized queries in the generator strategies:
- Custom queries must end with a "WITH <stuff>".
- The <stuff> is a single object. Do not end the WITH with multiple things.
- The <stuff> is a scalar, map, or list.
- Do not alias the final WITH. Using "WITH <stuff> AS <name>" will result in an error and break the indexing.
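Several of these rules can be checked mechanically before a policy is saved. The following is a rough heuristic sketch in Python, not the validation that Search performs: it inspects only the text after the last WITH keyword, does not parse Geequel, and can be fooled by keywords inside string literals or by operators such as STARTS WITH. It also flags raw tabs, raw newlines, and double quotes, which this documentation advises against. The function name check_generator_strategy is hypothetical.

```python
import re


def check_generator_strategy(query: str) -> list:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if query == "":
        return problems  # empty strategies fall back to <anchor>.<propertyName>
    if "\n" in query or "\t" in query:
        problems.append("contains a raw tab or newline")
    if '"' in query:
        problems.append("contains a double quote")

    with_matches = list(re.finditer(r"\bWITH\b", query, flags=re.IGNORECASE))
    if not with_matches:
        problems.append("does not end with a WITH clause")
        return problems

    # Everything after the last WITH keyword is treated as the final expression.
    tail = query[with_matches[-1].end():]
    if re.search(r"\bRETURN\b", tail, flags=re.IGNORECASE):
        problems.append("ends with RETURN instead of WITH")
    if re.search(r"\bAS\b", tail, flags=re.IGNORECASE):
        problems.append("final WITH is aliased with AS")

    # A comma outside any (), [] or {} means the WITH carries multiple items.
    depth = 0
    for char in tail:
        if char in "([{":
            depth += 1
        elif char in ")]}":
            depth -= 1
        elif char == "," and depth == 0:
            problems.append("final WITH has multiple items")
            break
    return problems


print(check_generator_strategy("WITH coalesce(person.name, person.full_name)"))
print(check_generator_strategy("WITH username, password"))
```

A check like this cannot confirm that the final expression is a scalar, map, or list; that still has to be verified against the graph.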
Also, be aware of the following:
- The queries cannot contain a "true" tab or newline; the JSON parser will throw an error. Use "\n" and "\t" for newlines and tabs if desired, although using only spaces is most readable.
- Do not use double quotes in the queries. Use single quotes for strings, since an unescaped double quote ends the JSON string early and results in invalid JSON (e.g. use 'Kevin Bacon' instead of "Kevin Bacon"). (At the time of writing it is unknown whether double quotes escaped with backslashes work inside the query string.)
- In custom queries it is safer to use an OPTIONAL MATCH rather than a standard MATCH, as an unsuccessful MATCH can result in ONgDB errors and documents not being pushed to Elasticsearch.
A More Complex Schema
Let us look at another example, where we're forced to use what the producer returns in a slightly different way. This is the schema for the director index. Recall that the index had one index strategy called full_slice that returned a slice that internally holds a person (named director) and a list of movies they had directed (named movies).
{
"director": {
"schema": {
"born": {
"schema": {
"generatorStrategies": {
"full_slice": ""
},
"searchQuery": {}
}
},
"name": {
"schema": {
"generatorStrategies": {
"full_slice": ""
},
"searchQuery": {}
}
},
"numMovies": {
"schema": {
"generatorStrategies": {
"full_slice": "WITH size(slice.movies)"
},
"searchQuery": {}
}
},
"directedAndStarring": {
"schema": {
"generatorStrategies": {
"full_slice": "OPTIONAL MATCH (:Person {grn: slice.director.grn})-[:ACTED_IN]->(m:Movie) WHERE m IN slice.movies WITH collect( m {.*} )"
},
"searchQuery": {}
}
}
}
}
}
Like before, we are able to use empty strings for born and name since the anchor, slice.director, has the properties born and name defined (internally they will be accessed as slice.director.born and slice.director.name). The other two properties to be stored in the Elasticsearch director documents, numMovies and directedAndStarring, use a custom Geequel query.
The numMovies property shows how other parts returned by the producer of an index strategy can be used in a generator strategy. The numMovies property evaluates to the number of movies the person has directed.
The directedAndStarring property computes and stores the movies that a director has both directed and acted in. We include it here to point out an important difference between this custom query and the custom query for baconNumber in the person index. In our person index, we could directly use the person node returned by the producer. In our director index we return a row named slice, which internally holds a copy of the person node and the movies they have directed. Since we do not directly return the person node in the producer of the index strategy, we cannot use it in the same way as in the person index. Notice that we retrieve the person node by matching on its grn, MATCH (:Person {grn: slice.director.grn}), rather than trying to directly access the node: (slice.director). This is for two reasons. First, (slice.director) is not valid ONgDB syntax. Second, even if it were valid syntax, the slice returned by the producer does not actually store the node in director, but rather stores a copy of its properties. This means you cannot alias slice.director AS director and try to use it as a node, like (director). Instead, the node must be retrieved directly rather than working with a copy of its properties.
Search Query
The searchQuery is currently unused, but it is planned to support "boosting" queries, which is a method for tuning query results.
Tokenizer
A tokenizer breaks full text into individual tokens that are searchable. One particular use case is fuzzy search. Details can be found in the Elasticsearch Tokenizers documentation. For now, we support Partial Word Tokenizers. The tokenizer object consists of four parameters: type, minGram, maxGram, and tokenChars.
The parameters are explained in the Elasticsearch documentation for partial word tokenizers.
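To give a feel for what a partial word tokenizer produces, here is a small sketch in Python. It assumes that minGram and maxGram play the same role as the minimum and maximum gram lengths of an Elasticsearch n-gram tokenizer, which this document does not state explicitly; tokenChars handling is left out, and the function name is hypothetical.

```python
def partial_word_tokens(word: str, min_gram: int, max_gram: int) -> list:
    """Return every substring of the word whose length is between min_gram and max_gram."""
    tokens = []
    for start in range(len(word)):
        for length in range(min_gram, max_gram + 1):
            if start + length <= len(word):
                tokens.append(word[start:start + length])
    return tokens


print(partial_word_tokens("bacon", 2, 3))
```

Indexing these fragments is what allows a search for part of a word to match the full term, at the cost of a larger index.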
Suggester
A suggester suggests similar-looking terms based on a provided text. It uses different data structures and algorithms, making it faster than a normal search at looking up similar-looking terms. The suggester object consists of one parameter, type. The supported suggester types are Term Suggester, Phrase Suggester, Completion Suggester, and Context Suggester.
Globals
Globals are the last part of the index policy. As their name suggests, these are variables that may be used in any generator strategy. The outermost key is "globals", followed by the globals themselves. Each global has a name, in this case "averageBaconNumber" and "kevinBacon". Inside each global, we have a "query" that defines the statement used to evaluate the global variable.
{
"globals": {
"averageBaconNumber": {
"query": "MATCH p=shortestPath((bacon:Person {name:'Kevin Bacon'})-[*0..]-(n)) WITH avg(length(p)/2)"
},
"kevinBacon": {
"query": "MATCH (kevin:Person) WHERE kevin.name='Kevin Bacon' WITH kevin {.*}"
}
}
}
The globals above are called averageBaconNumber and kevinBacon. The averageBaconNumber global has the value of the average bacon number, and the kevinBacon global holds a copy of the properties of the Kevin Bacon node. These are stored in Elasticsearch under the index "globals" and may be used in any generator strategy with the syntax {averageBaconNumber} and {kevinBacon}. Like everything else stored in Elasticsearch, a global must be a scalar, map, or list. Global queries follow the same ending rules as the generatorStrategies (see the sub-subsection "Generator Strategies: Customized Geequel").
Global names must be unique. In the section below, we show an example of how globals are used in custom queries.
Example Index Policy
Combining metadata, indexes, and globals we arrive at our completed index policy:
{
"metadata": {
"description": "Example index policy for Person.",
"displayName": "beta-person-policy",
"createdAt": "2018-07-13T13:37:17-04:00",
"updatedAt": "2018-07-13T13:37:18-04:00",
"state": "ONLINE",
"versions": []
},
"indexes": {
"person": {
"displayName": "person",
"indexData": {
"description": "Index policy for person. Stores the 'Person' node type in Elasticsearch.",
"displayName": "person"
},
"indexStrategies": {
"defaultPartial": {
"producer": "MATCH (n:`Person`) WHERE n.updatedAt > n.lastSearchIndexedAt WITH n AS person RETURN person",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "10000",
"parallel": "true",
"retries": "0",
"iterateList": "true"
},
"defaultFull": {
"producer": "MATCH (n:`Person`) WITH n AS person RETURN person",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "10000",
"parallel": "true",
"retries": "0",
"iterateList": "true"
}
},
"schema": {
"born": {
"schema": {
"generatorStrategies": {
"defaultFull": "",
"defaultPartial": ""
},
"searchQuery": {},
"tokenizer": {},
"suggester": {}
}
},
"name": {
"schema": {
"generatorStrategies": {
"defaultFull": "WITH coalesce(person.name, person.full_name)",
"defaultPartial": "WITH coalesce(person.name, person.full_name)"
},
"searchQuery": {},
"tokenizer": {},
"suggester": {}
}
},
"baconNumber": {
"schema": {
"generatorStrategies": {
"defaultFull": "MATCH p=shortestPath((bacon:Person {name:{kevinBacon}.name})-[*0..]-(person)) WITH length(p)/2",
"defaultPartial": "MATCH p=shortestPath((bacon:Person {name:{kevinBacon}.name})-[*0..]-(person)) WITH length(p)/2"
},
"searchQuery": {},
"tokenizer": {},
"suggester": {}
}
}
}
}
},
"globals": {
"averageBaconNumber":
{
"query": "MATCH p=shortestPath( (bacon:Person {name:'Kevin Bacon'})-[*0..]-(n)) WITH avg(length(p)/2)"
},
"kevinBacon":
{
"query": "MATCH (kevin:Person) WHERE kevin.name='Kevin Bacon' WITH kevin {.*}"
}
}
}
As an example of using globals in a generator strategy, we have slightly changed both generator strategies in baconNumber to use the global variable kevinBacon. We do this by accessing Kevin Bacon's name through {kevinBacon}.name.
Copying this directly may result in incorrect JSON syntax, since it has been reformatted from the original JSON to be more readable. If copied, ensure the JSON syntax is preserved and that there are no tabs or newlines in the "generatorQuery"/"query" strings! Other than reformatting, the semantics of this index policy are correct.
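Since hand-edited policies are easy to break, a quick local check before saving can help. The sketch below, in Python, parses the policy text with the standard json module (which rejects trailing commas and raw tabs or newlines inside strings) and then checks two structural points described earlier: each index strategy has its four required properties, and each generator strategy name refers to an index strategy of the same index. It is only an illustration; the function name and message formats are hypothetical, and Search may validate more or differently.

```python
import json

REQUIRED_STRATEGY_KEYS = ("producer", "anchorLabel", "anchor", "anchorId")


def find_policy_problems(policy_text: str) -> list:
    """Parse an index policy and return a list of structural problems found."""
    policy = json.loads(policy_text)  # raises json.JSONDecodeError on invalid JSON
    problems = []
    for index_name, index in policy.get("indexes", {}).items():
        strategies = index.get("indexStrategies", {})
        for strategy_name, strategy in strategies.items():
            for key in REQUIRED_STRATEGY_KEYS:
                if key not in strategy:
                    problems.append(f"{index_name}.{strategy_name}: missing required key '{key}'")
        for property_name, definition in index.get("schema", {}).items():
            generators = definition.get("schema", {}).get("generatorStrategies", {})
            for generator_name in generators:
                if generator_name not in strategies:
                    problems.append(
                        f"{index_name}.{property_name}: generator strategy "
                        f"'{generator_name}' has no matching index strategy"
                    )
    return problems


# A deliberately incomplete policy: defaultPartial is referenced but never defined.
example = json.dumps({
    "indexes": {
        "person": {
            "indexStrategies": {
                "defaultFull": {
                    "producer": "MATCH (n:`Person`) WITH n AS person RETURN person",
                    "anchorLabel": "Person",
                    "anchor": "person",
                    "anchorId": "person.grn",
                }
            },
            "schema": {
                "born": {"schema": {"generatorStrategies": {"defaultFull": "", "defaultPartial": ""}}}
            },
        }
    }
})
print(find_policy_problems(example))
```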
Triggers in Indexing
To continuously index the graph, we rely on APOC triggers. Triggers are event-driven listeners that "fire" on every write transaction. We support triggers that add properties to created nodes, triggers that update the updatedAt property any time another property is changed on indexed nodes, and triggers that start APOC periodic repeat jobs to run indexing. The "Continuous Indexing (APOC Repeat and APOC Triggers)" subsection explains Search's use of triggers in more depth.
Indexing Endpoints and Operational Information
This section explores the indexing features, endpoints, and operational information. There are two types of endpoints used for indexing: base endpoints and broker endpoints. Base endpoints are used to go through the indexing process manually. They can also be used to achieve specific results without running through the whole indexing process. Broker endpoints send indexing jobs through the broker and should always be used if a broker is set up.
The Indexing Process
Indexing is done through a series of four steps:
- Adding the correct properties for indexing to nodes on the graph.
- Acquiring/generating an index policy.
- Evaluating and storing any globals defined by the index policy.
- Starting the APOC iterate job that pushes documents to Elasticsearch, or setting up an APOC repeat job/trigger to achieve this.
Base Endpoints and Manual Indexing
Base endpoints each expose a single indexing feature offered by Search. They are useful for modularizing the indexing process, making it easier to debug or to run a single step in the indexing process. They are also useful in understanding the indexing process as a whole.
Generating a generic Index Policy
Basic index policies can be generated from the graph itself and then edited for customization. There are two endpoints for generating index policies from the graph: /generateIndexPolicy and /generateCustomIndexPolicy.
The first endpoint, /generateIndexPolicy, creates an index policy for the entire graph, creating indexes for all node types and each property. Note that this generates the index policy for any node type that is returned from "CALL db.indices()". This includes all node types with constraints, even if no node of that type exists on the graph.
The second endpoint, /generateCustomIndexPolicy, generates an index policy for the node types passed to it in a JSON body.
Generating a custom Index Policy
The /generateCustomIndexPolicy endpoint is useful in a situation where only certain nodes of the graph need to be indexed into Elasticsearch. The API endpoint is a POST method and requires a JSON body. Below is an example:
{
"nodeTypes": ["Person", "Movie"]
}
The JSON is a single list, named nodeTypes, whose contents are the node types (labels) that an index policy should be generated for. An index policy for Person and Movie nodes would be generated by the above example.
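A request with this body could be prepared as in the sketch below, written in Python with only the standard library. The base URL http://localhost:8080 is a placeholder, and the real path may carry a prefix (for example a version or cluster name) that is not shown in this section; the helper name is hypothetical. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_node_types_request(base_url: str, endpoint: str, node_types) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a nodeTypes JSON body."""
    body = json.dumps({"nodeTypes": list(node_types)}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder base URL; replace it with the address of the Search service.
request = build_node_types_request(
    "http://localhost:8080", "/generateCustomIndexPolicy", ["Person", "Movie"]
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the resulting object to urllib.request.urlopen would send it, once the correct address and any required authentication are known.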
Assumptions in Generated Index Policies
These endpoints make assumptions about generated index policies. They assume that the labels of nodes are acceptable Elasticsearch index names when fully lowercased. An index will not be generated if the lowercase label fails to validate. It is also assumed that the unique identifier will always be grn. For graphs where the unique identifier is not a "grn", the anchorId property in the index strategies of the generated policy will need to be edited manually.
Saving, Loading, and Deleting Index Policies
Index policies can be saved to, loaded from, and deleted from S3. The cluster name in nearly all endpoints specifies the file path within S3, while the parameters spring.aws.index.s3.bucket.name and spring.aws.index.s3.bucket.region, set in the ECS JSON files, specify the bucket name and bucket region. The base endpoint /saveIndexPolicy is used to save index policies directly.
Required Properties for Indexing and Forcing Index Properties
The two properties updatedAt and lastSearchIndexedAt are used to keep track of whether a node has been indexed or not. GraphGrid Search provides capabilities to add these two properties to node types the user specifies.
This is done using the /forceIndexProperties POST endpoint. The body of the POST is JSON containing one list called nodeTypes. Here is an example:
{
"nodeTypes": ["Person", "Movie"]
}
(Notice that this is the same syntax used for generating custom index policies.)
This will create the two properties updatedAt and lastSearchIndexedAt on all Person and Movie nodes. If there is already an updatedAt property, then it only creates the lastSearchIndexedAt property. If both are missing, it sets updatedAt to the current time and lastSearchIndexedAt to zero.
The generated strategy defaultPartial uses these two properties to know which nodes to create documents for. Streamlined endpoints always add/update these two properties on nodes that will be indexed.
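The interplay of the two properties can be sketched as follows. This Python model is only an illustration of the behaviour described above, using plain dictionaries in place of nodes; it assumes timestamps are comparable integers (for example epoch milliseconds) and treats each property independently, which the text does not specify for every combination. The function names are hypothetical.

```python
def force_index_properties(node: dict, now_millis: int) -> dict:
    """Return a copy of the node with any missing indexing properties added."""
    updated = dict(node)
    updated.setdefault("updatedAt", now_millis)   # an existing value is kept
    updated.setdefault("lastSearchIndexedAt", 0)  # zero marks "never indexed"
    return updated


def needs_indexing(node: dict) -> bool:
    """Mirror the defaultPartial filter: updatedAt > lastSearchIndexedAt."""
    return node["updatedAt"] > node["lastSearchIndexedAt"]


fresh = force_index_properties({"name": "Kevin Bacon"}, now_millis=1531503437000)
print(needs_indexing(fresh))

# After indexing, lastSearchIndexedAt catches up and the node is filtered out.
indexed = dict(fresh, lastSearchIndexedAt=fresh["updatedAt"])
print(needs_indexing(indexed))
```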
Evaluating and Storing Globals In Elasticsearch
Globals are themselves stored in Elasticsearch, and thus must be evaluated and then pushed up to Elasticsearch. When indexing through base endpoints, there is only one endpoint that will compute and store globals in Elasticsearch. The endpoint /indexGlobals must be run if an index policy makes use of globals and is being sent through base endpoints.
Continuous Indexing (APOC Repeat and APOC Triggers)
To achieve continuous indexing when no broker is configured, the GraphGrid Search service makes use of apoc.periodic.repeat combined with apoc triggers. These methods of indexing do not scale; a broker is required for large indexing jobs.
APOC Repeat
An APOC repeat job is an APOC procedure that executes custom Geequel on a timed basis. The repeat job kicks off an indexing job every 60 seconds.
APOC repeat jobs are not persistent and will stop if the ONgDB instance is restarted (to achieve persistence we use a trigger to restart APOC repeat jobs if they stop).
APOC repeat jobs can be set up manually through the endpoints /createRepeatJobsForPolicy and /createRepeatJobsForIndex. The index strategy specified in the endpoints' indexStrategyName path variable is the index strategy used to create the apoc repeat job. The index strategy used is important, so please see the subsection "Index Strategies and Apoc Repeat/Triggers".
Apoc Triggers
The endpoints /createTriggersForPolicy and /createTriggersForIndex each create a total of three triggers. The former does it for all indexes in an index policy, the latter does it for a specific index in an index policy.
- One trigger adds the properties grn, updatedAt, and lastSearchIndexedAt to newly ingested nodes that have been/will be indexed. These properties are only added if they do not yet exist on the node.
- The second trigger listens for changes in properties of nodes that have been/will be indexed. Whenever a node's property is changed, this trigger ensures the node's updatedAt property is also changed. This is how the indexing job kicked off by the apoc repeat job picks up new nodes that need to be indexed.
- The third trigger starts an apoc repeat job for the index.
These triggers persist on the ONgDB instance even if it is restarted and are used as a way to make sure indexing happens continuously. If the ONgDB instance is restarted, a WRITE transaction is required to cause the trigger to start the apoc repeat job.
Similar to apoc repeat jobs, the index strategy used when creating apoc triggers is extremely important. The index strategy specified in the endpoints' indexStrategyName path variable is the index strategy used to create the apoc repeat job. Please see the subsection "Index Strategies and Apoc Repeat/Triggers".
Index Strategies and Apoc Repeat/Triggers
Due to how GraphGrid Search achieves continuous indexing, it is important which index strategy is used to create the APOC repeat jobs and the APOC triggers. As a rule, DO NOT create APOC repeat jobs or these APOC triggers using the defaultFull generated index strategy. The defaultRepeat index strategy should be used instead, for reasons explained below.
The index policy and index strategy used when creating an apoc repeat job or a trigger should have a producer that filters anchor nodes by checking if
updatedAt > lastSearchIndexedAt
. This filtering is extremely important as it makes sure the indexing job (really an apoc iterate job) running inside the apoc
repeat job/trigger will eventually index every document, even if the indexing is interrupted. The generated index strategy defaultRepeat
does this producer
filtering and defaultFull
does not, which is why it is so important to use the defaultRepeat
strategy.
If an apoc repeat job/trigger is created from an index strategy that does NOT filter the producer then the same documents will be re-indexed every 25 seconds, regardless of whether they have changed or not. The best case is where every document is indexed within the 25 seconds. Even in this case there is unnecessary overhead, because every unchanged document is recreated and pushed to Elasticsearch again. The worst case is where not all documents are indexed within the 25 seconds. Because the indexing job does not filter out nodes it has already indexed, it restarts indexing from the beginning. This case continuously indexes the same set of documents every 25 seconds and fails to finish indexing fully. This should be avoided as it is repetitive, wasteful, and risks never fully completing indexing.
The defaultPartial strategy was previously used for APOC repeat jobs; this has now been changed in favor of the defaultRepeat strategy.
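The difference between a filtered and an unfiltered producer can be illustrated with a small simulation. This is only a sketch of the reasoning above, not GraphGrid code: the `Node` class, the two producer functions, and the `per_run_capacity` limit (standing in for how much work fits into one firing of the repeat job) are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    grn: str
    updated_at: int
    last_search_indexed_at: int = 0


def filtered_producer(nodes: List[Node]) -> List[Node]:
    """Mimics a producer with WHERE n.updatedAt > n.lastSearchIndexedAt."""
    return [n for n in nodes if n.updated_at > n.last_search_indexed_at]


def unfiltered_producer(nodes: List[Node]) -> List[Node]:
    """Mimics a producer without a filter: every node is returned on every run."""
    return list(nodes)


def run_repeat_job(
    nodes: List[Node],
    producer: Callable[[List[Node]], List[Node]],
    per_run_capacity: int,
    now: int,
) -> List[str]:
    """One firing of the repeat job; indexes at most per_run_capacity nodes."""
    batch = producer(nodes)[:per_run_capacity]
    for node in batch:
        # Stand-in for pushing the document and recording the index time.
        node.last_search_indexed_at = now
    return [node.grn for node in batch]
```

With five nodes and a capacity of two per run, the filtered producer works through all nodes over successive runs and then returns nothing, while the unfiltered producer returns the same first two nodes on every run and never reaches the rest.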
Using Base endpoints to Index and create ES documents
With all the information above we are ready to create an index policy and start indexing the graph! This section briefly covers how to use base endpoints to index the graph.
- The first thing is to ensure that the node types to be indexed have updatedAt and lastSearchIndexedAt properties, and that updatedAt is greater than lastSearchIndexedAt (the condition the producer filter checks). Run the endpoint /forceIndexProperties with a JSON specifying the node types to force the configuration of those index properties.
- Now we will generate and save an index policy. Use the endpoint /generateAndSaveCustomIndexPolicy with a JSON body to generate a specific index policy. Copy the resulting JSON returned by the endpoint and edit the policy if desired. If any globals are added to the index policy make sure you run the /indexGlobals endpoint!
- Make sure the necessary triggers are added to the ONgDB instance by running /createTriggersForPolicy and make sure to use the defaultPartial index strategy!
- Now use the /createRepeatJobsForPolicy endpoint and make the JSON index policy the body. Make sure to use the defaultPartial index strategy! Running this endpoint will create the apoc repeat job that will start indexing the graph.
- Lastly, run /savePolicy and save the index policy.
This is the process to set up persistent indexing for a generated index policy using the base endpoints.
Timeout Risks
Whenever an indexing job (apoc iterate job) is run manually, it can time out. A timeout has consequences that one should be aware of.
If a timeout occurs, it means that the indexing job did not finish. The indexing job can be run again with the defaultPartial strategy to try and finish the indexing.
However, after a timeout, the documents will have been committed to Elasticsearch but the anchor nodes associated with those documents will not have their lastSearchIndexedAt property updated. Those nodes will be re-indexed if an indexing job is started again using defaultPartial. This wastes some time, but is not a big problem, UNLESS the timeout occurs BEFORE the first batch of indexed nodes is committed. If the first batch is not completed within the timeout then it is not committed, and that means nothing was created/updated. Even though documents were pushed to Elasticsearch, no progress was made, and running another indexing job will just recreate and republish the same documents to Elasticsearch. In this situation nothing is ever actually committed, no matter how many times the manual indexing is run. Lowering the batchSize in the index strategy can fix this.
This is just a precaution and something to be aware of.
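The effect of batchSize on a timed-out job can be shown with a rough model. The sketch below is an assumption-laden illustration, not the service's implementation: it assumes a fixed processing cost per node and that a batch only counts as committed once every node in it has been processed before the timeout. The function name and its parameters are hypothetical.

```python
def committed_before_timeout(
    total_nodes: int, batch_size: int, ms_per_node: int, timeout_ms: int
) -> int:
    """Return how many nodes were committed before the timeout was hit.

    Work done on a batch that is still in progress at the timeout is lost.
    """
    committed = 0
    elapsed_ms = 0
    while committed < total_nodes:
        batch = min(batch_size, total_nodes - committed)
        elapsed_ms += batch * ms_per_node
        if elapsed_ms > timeout_ms:
            break  # timeout hit mid-batch: this batch is never committed
        committed += batch
    return committed
```

Under these assumptions, a batch that takes longer than the timeout commits nothing however often the job is rerun, while a smaller batch size lets at least some batches commit so that each rerun makes progress.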
Broker Integration Strategies
Currently, SQS and RabbitMQ are supported as brokers. Brokers are used to distribute the load of indexing jobs. The indexing jobs can be distributed to either a single GraphGrid Search instance or multiple Search instances. Since the load can be distributed between multiple Search instances, this method of indexing is scalable. By default, the service tries to process index jobs directly (without the use of any messaging system) and only uses the messaging system when specified.
When indexing tens of thousands of nodes there are not many differences between using brokers and using the other methods of indexing. They do work very differently on an internal level though, and they differ greatly in performance and stability as the number of nodes enters the millions.
Enabling SQS and RabbitMQ is done through the application configuration when GraphGrid Search is deployed. To enable SQS set spring.sqs.enabled to true. To enable RabbitMQ set spring.rabbitmq.enabled to true.
Once enabled, using the broker for indexing is specified by each index strategy under the optional key BROKER. The BROKER key can be one of three values: DIRECT, SQS, or RABBITMQ.
The DIRECT option just runs the above-mentioned apoc-iterate jobs in batches of 1000; it does not use a broker and cannot scale. This value is used to prevent the defaultBroker index strategy from trying to access SQS and RABBITMQ when they have not been enabled.
The SQS option uses the SQS queue as a messaging system.
The RABBITMQ option uses the RABBITMQ queue as a messaging system.
The index policy generator automatically generates an index strategy that should be used with the broker endpoints, called defaultBroker. The other generated index strategies defaultFull, defaultPartial, and defaultRepeat should not be used with the broker-based endpoints as they use different methods to index the graph.
Broker Based Endpoints:
- brokerIndexFull
- brokerIndexPartial
- brokerIndexPolicyFull
- brokerIndexPolicyPartial
Other generated index strategies cannot be used for the broker because their producers are different. The defaultBroker
makes use of a property named idList
which is a list of the anchorIds
from the received broker message. This is used to index exactly what nodes were partitioned and sent within that message.
All endpoints pull index policies from S3 directly and do not allow for a passed-in index policy.
Keeping with the convention, the brokerIndexFull and brokerIndexPolicyFull endpoints index all nodes, while the brokerIndexPartial and brokerIndexPolicyPartial endpoints index nodes where updatedAt > lastSearchIndexedAt.
Similar to the other methods of indexing the updatedAt
and lastSearchIndexedAt
properties are expected to be on the nodes!
Any node with the property brokerMessageNumber will not be indexed by a broker. This property is used to keep track of which nodes are en route to be indexed by a broker. If nodes are not being indexed by the broker and there is no error message then check if they have this property and remove it.
If possible, always use a broker for indexing and avoid the other strategies. The broker is a reliable way to ensure continuous indexing occurs even if something goes wrong. It can also be used to split up indexing between multiple deployed GraphGrid Search instances!
Internal information on Brokers
- The broker works by getting all the nodes to be indexed (either getting all of them in the case of brokerIndexFull endpoints or getting the ones where updatedAt > lastSearchIndexedAt in the case of brokerIndexPartial endpoints) and partitioning these nodes into batches.
- Partitioning works by getting a limited number of nodes without the property brokerMessageNumber, then setting that property. This ensures that the nodes and their anchorIds are retrieved in batches, rather than all at once. The property brokerMessageNumber is then set to null on the node after its corresponding document is pushed up to Elasticsearch. Any node with the property brokerMessageNumber will not be indexed by a broker. This property is used to keep track of which nodes are en route to be indexed by a broker. If nodes are not being indexed by the broker and there is no error message then check if they have this property and remove it.
- These batches are then passed with the index policy information to the broker, which distributes the messages to any open Search instance.
- GraphGrid Search consumes the messages and indexes the nodes specified by the message. The property brokerMessageNumber is set to null after being indexed.
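The partition-and-consume cycle described above can be sketched in a few lines. This is an in-memory illustration only: nodes are plain dictionaries, the message layout (`messageNumber`, `idList`) and the function names are assumptions made for the example, and no real broker or Elasticsearch call is involved.

```python
from typing import Dict, List


def partition_for_broker(nodes: List[Dict], batch_size: int) -> List[Dict]:
    """Claim unmarked nodes in limited batches and build one message per batch."""
    messages = []
    message_number = 0
    while True:
        # Take a limited number of nodes that are not already en route.
        batch = [n for n in nodes if n.get("brokerMessageNumber") is None][:batch_size]
        if not batch:
            break
        message_number += 1
        for node in batch:
            node["brokerMessageNumber"] = message_number
        messages.append(
            {"messageNumber": message_number, "idList": [n["grn"] for n in batch]}
        )
    return messages


def consume(message: Dict, nodes: List[Dict]) -> List[str]:
    """Index exactly the nodes named in the message, then clear their marker."""
    indexed = []
    for node in nodes:
        if node["grn"] in message["idList"]:
            indexed.append(node["grn"])  # stand-in for pushing the document
            node["brokerMessageNumber"] = None
    return indexed
```

In this model a node that already carries a brokerMessageNumber is skipped by the partitioning step, which mirrors the troubleshooting advice above: a leftover marker keeps a node out of every later batch until it is removed.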
Notes on Indexing
The structure of the JSON for both indexes and globals is very important and cannot be ignored. For example if the index policy uses no globals, still make sure there is a globals part of the JSON even though it will be an empty object.
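A quick structural check before submitting a policy can catch a missing section early. The sketch below assumes that the three parts named in this documentation (metadata, indexes, globals) should each be present as a JSON object; the function name and the exact messages are hypothetical, and the service itself may validate differently.

```python
from typing import Dict, List

REQUIRED_SECTIONS = ("metadata", "indexes", "globals")


def policy_structure_problems(policy: Dict) -> List[str]:
    """List structural problems; an empty list means all sections are present."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if section not in policy:
            problems.append(f"missing section: {section}")
        elif not isinstance(policy[section], dict):
            # An unused section should still be an empty object, e.g. "globals": {}.
            problems.append(f"section is not a JSON object: {section}")
    return problems
```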
When using the generated defaultPartial index strategy in any base endpoint context (i.e. manual indexing, setting up a repeat job, setting up triggers) the nodes to be indexed are expected to have updatedAt and lastSearchIndexedAt properties. If these two properties are missing, indexing will break. Streamlined endpoints automatically add these properties, so the only concern would be a user manually deleting these properties.
Accessing the Elasticsearch Data Manually
- Index Names: The indexing job stores documents in an index whose name, if generated, corresponds to the decapitalized node type. For example, the documents for node type "Person" are stored under the index "person".
- Type Names: The indexing job automatically stores the documents in their specified index and under the type "doc". There is currently no way to alter this through the index policy.
- Accessing ES data manually requires the index, the type, and either an id or a query. The first two are extremely important for accessing the ES data manually using curl commands or apoc.es commands.
Note these examples that use both the index and type to query all the Elasticsearch documents under the "person" index and the "doc" type:
curl -XGET "localhost:9200/person/doc/_search"
CALL apoc.es.query("localhost", "person", "doc", "_all", null) YIELD value
- The ES Id: The unique identifier for a document is its corresponding node's anchorId (the anchor's anchorId). If the node does not have the property that corresponds to the anchorId then a RuntimeException will be thrown by ONgDB and indexing will halt. Ensure all nodes to be indexed have a property specified by the anchorId in the index strategy being used.
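The naming rules above are enough to derive where a single document should live in Elasticsearch. The following sketch assumes that "decapitalized" means lowercasing only the first character of the node type (the service may normalize names further) and that the document id is the node's anchorId value, here a GRN. Both helper names are hypothetical.

```python
from urllib.parse import quote


def index_name_for(node_type: str) -> str:
    """Lowercase the first character of the node type, e.g. "Person" -> "person"."""
    return node_type[:1].lower() + node_type[1:]


def document_path(node_type: str, anchor_id: str, doc_type: str = "doc") -> str:
    """Build the path of one document: /{index}/{type}/{id}.

    The id is percent-encoded so that characters such as ":" in a GRN
    cannot be misread as part of the URL structure.
    """
    return f"/{index_name_for(node_type)}/{doc_type}/{quote(anchor_id, safe='')}"
```

The resulting path can be appended to the Elasticsearch host in a GET request to fetch that one document.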
Searching
The search endpoint is at the root of the API (search/). Two path variables are required in order to use the search endpoint:
- clusterName: the name of the GraphGrid Cluster
- policyName: the name of the Index Policy
Only the indexes and properties defined in the index policy will be searched. Details regarding the concepts above can be found in the Index Policy documentation. GraphGrid Search also exposes the Elasticsearch scripting capability; more API information can be found in the scripting documentation.
The complete endpoint path:
search/{clusterName}/{policyName}/?query={"terms":""}
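Because the query is a JSON string carried in a URL parameter, it has to be percent-encoded before it is sent. The helper below is a minimal sketch using only the Python standard library; the function name and the `api_base` argument are assumptions for illustration, and the `/1.0/` prefix follows the API section of this document.

```python
import json
from urllib.parse import quote


def build_search_url(api_base: str, cluster_name: str, policy_name: str, query: dict) -> str:
    """Serialize the query as compact JSON and percent-encode it into the URL."""
    compact = json.dumps(query, separators=(",", ":"))
    encoded = quote(compact, safe=":")  # braces and quotes are encoded, ":" is kept
    return (
        f"{api_base}/1.0/search/{quote(cluster_name)}/{quote(policy_name)}"
        f"/?query={encoded}"
    )
```

For the query `{"terms": "keanu"}` this yields the same encoded form used by the curl example in the Search All section, `%7B%22terms%22:%22keanu%22%7D`.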
Query
A JSON string that defines the query parameters. The parameters can be grouped into several categories:
General
Parameter | Description |
---|---|
terms string | Keywords to search for |
indexes optional string array | An array of node types to search. Default: ["_all"] |
properties optional string array | An array of node properties to search. Default: ["_all"] |
Pagination
Parameter | Description |
---|---|
pageNumber optional integer | The page number of the search results to navigate to. |
pageSize optional integer | The limit of search results per page. |
pagePerIndex optional boolean | Whether to return the pagination results for each index. |
Fuzziness
Parameter | Description |
---|---|
fuzzySearch optional boolean | Whether to enable fuzzy search. |
editDistance optional integer available values: - 0 - 1 - 2 | Maximum allowed Levenshtein edit distance (number of edits). Default: AUTO, a variable set by Elasticsearch based on the length of the terms: (1) length [0,2] -> 0: the result field must match the query term exactly; (2) length [3,5] -> 1: allow up to 1 edit; (3) length [6,∞) -> 2: allow up to 2 edits. Therefore, by default, the number of results increases as the length of the query term changes from 2 to 3 and from 5 to 6. |
prefixLength optional integer | The number of initial characters which will not be "fuzzified". This helps to reduce the number of terms which must be examined. Default: 0 |
maxExpansions optional integer | The maximum number of terms that the fuzzy query will expand to. Default: 50 |
transpositions optional boolean | Whether fuzzy transpositions (ab → ba) are supported. Default: true |
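The AUTO rule for editDistance described in the table can be written out as a small function. This is only a restatement of the thresholds listed above for a single term; the function name is hypothetical and the actual computation happens inside Elasticsearch.

```python
def auto_edit_distance(term: str) -> int:
    """Return the edit distance that AUTO would allow for a term of this length."""
    length = len(term)
    if length <= 2:
        return 0  # must match exactly
    if length <= 5:
        return 1  # up to one edit
    return 2      # up to two edits
```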
Tokenizer
Parameter | Description |
---|---|
tokenizer optional object, available values: -ngram -edge_ngram | Build the search query with a tokenizer. A tokenizer breaks full text into individual tokens that are searchable. Default: ngram |
Suggester
Parameter | Description |
---|---|
suggester optional object, available values: -term -phrase -completion -context | Build the search query with a suggester. The suggest feature suggests similar looking terms based on a provided text by using a suggester. It is useful for fast suggestion lookup. Properties must be explicitly defined when using the suggester. The suggest feature uses data structures and algorithms that are different from the normal search, hence including this parameter will disable the features of normal search (pagination, fuzzy search and tokenizers). Also, the suggested result(s) include a text field that indicates the content of the suggested matches. |
maxSuggestSize optional integer | Maximum number of suggested results to display. Default: 10 |
Result
A JSON object that describes the results returned by the search API. There are two main parts: report and data.
Report
Parameter | Description |
---|---|
type string, available values: -info -warning -error | The type of overall status of the search result. |
message string | The message that explains the search result. |
failures optional array | If the search query cannot be executed successfully, failures lists the exceptions that occurred, grouped by index. |
summary object | An object field that holds count statistics of the search result. See below. |
Summary
Parameter | Description |
---|---|
totalCount integer | The total number of documents retrieved by the search service. |
countDistribution optional map | A map of each index that appeared in the search result and its count. |
Data
It contains the retrieved documents grouped by index.
Parameter | Description |
---|---|
documentCount integer | The number of documents from an index. |
documentData object | The document data from an index. |
Document Data
Parameter | Description |
---|---|
id string | The GRN of a document. |
score double | Score returned by Elasticsearch. |
caption string | The display field for a document result. It is set on the index level. If none of the fields defined is found in the source, this field will be empty. |
source object | The source data of a document. |
Example Search Queries
For examples of search queries and results visit this page.
API
This Search API version is 1.0 and as such all endpoints are rooted at /1.0/. For example, http://localhost/1.0/search/ (requires auth) would be the base context for this Search API under the GraphGrid API deployed at http://localhost. Here we will cover the searching and base search endpoints. For more endpoints related to other search features and index policies, see the links below.
- Broker Indexing Endpoints If you're utilizing a message broker to send data to Elasticsearch, use these endpoints.
- Direct (non-broker) Indexing Endpoints If you're sending data directly to Elasticsearch, use these endpoints.
- Script Endpoints Exposes Elasticsearch scripting capabilities.
- Search Showme Endpoints Search has functionality that is compatible with GraphGrid showmes. Showmes allow dynamic APIs to be created using only Geequel.
Base Endpoints
Get Status
Check the status of GraphGrid Search. Returns a 200 OK response if healthy.
Base URL: /1.0/search/status
Method: GET
Request
curl --location --request GET "${API_BASE}/1.0/search/status"
Response
200 OK
Search All
Base URL: /1.0/search/{{clusterName}}/{{policyName}}/?query={"terms":""}
Method: GET
Performs a textual search across indexes managed for a GraphGrid Cluster.
Parameter | Description |
---|---|
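clusterName string | The GraphGrid Cluster to search across. |
policyName string | The name of the Index Policy to search with. |
query string | A JSON string that defines the query parameters (URL-encoded). |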
Request
curl --location --request GET "${API_BASE}/1.0/search/default/gg-dev-index-policy/?query=%7B%22terms%22:%22keanu%22%7D" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${BEARER_TOKEN}"
Response
{
"report": {
"summary": {
"totalCount": 1,
"countDistribution": {
"person": 1
}
}
},
"data": {
"person": {
"documentCount": 1,
"maxScore": 3.2498474,
"documentData": [
{
"id": "grn:gg:person:MXSlswlrRt4Vl80OiEsskriQcLb3ASP9y8jzpaHmz8PK",
"score": 3.2498474,
"caption": "Keanu Reeves",
"source": {
"name": "Keanu Reeves"
}
}
]
}
},
"suggest": {},
"collapse": {},
"searchParams": {}
}
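A response shaped like the one above can be reduced to the fields a client usually displays. The sketch below assumes the layout shown in this example (a `data` object keyed by index, each with a `documentData` list, and a `report.summary` object); the function names are hypothetical and missing keys are tolerated rather than treated as errors.

```python
from typing import Dict, List


def captions_by_index(response: Dict) -> Dict[str, List[str]]:
    """Collect the caption of every returned document, grouped by index."""
    return {
        index: [doc.get("caption", "") for doc in block.get("documentData", [])]
        for index, block in response.get("data", {}).items()
    }


def summary_is_consistent(response: Dict) -> bool:
    """Check that totalCount equals the sum of the per-index counts."""
    summary = response.get("report", {}).get("summary", {})
    distribution = summary.get("countDistribution", {})
    return summary.get("totalCount", 0) == sum(distribution.values())
```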
Index Policy Endpoints
Generate Index Policy
Generates an Index Policy for the GraphGrid Cluster.
Base URL: /1.0/search/{{clusterName}}/generateIndexPolicy/{{policyName}}
Method: POST
Parameter | Description |
---|---|
clusterName string | The GraphGrid Cluster with an Index Policy. |
policyName string | The name used to load the index policy on S3. |
Request
curl --location --request POST "${API_BASE}/1.0/search/default/generateIndexPolicy/example-index-policy" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${BEARER_TOKEN}"
Response
{
"metadata": {
"description": "Generated index policy for node types: [Movie, Person].",
"displayName": "example-index-policy",
"createdAt": "2021-03-12T18:57:14.247Z",
"updatedAt": "2021-03-12T18:57:14.248Z",
"versions": [],
"state": "OFFLINE"
},
"caption": {
"priority": [
{
"property": "title"
},
{
"property": "name"
},
{
"property": "address"
},
{
"property": "phone"
},
{
"property": "grn"
}
]
},
"parameter": null,
"hardFilter": null,
"indexes": {
"movie": {
"indexData": {
"description": "Generated 'movie' index from the 'Movie' node type.",
"displayName": "movie",
"caption": {
"priority": []
}
},
"indexStrategies": {
"defaultBroker": {
"producer": "MATCH (n:`Movie`) WHERE n.grn IN {idList} WITH n AS `movie` RETURN `movie`",
"anchorLabel": "Movie",
"anchorId": "movie.grn",
"anchor": "movie",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "RABBITMQ"
},
"defaultPartial": {
"producer": "MATCH (n:`Movie`) WHERE n.updatedAt > n.lastSearchIndexedAt WITH n AS `movie` RETURN `movie`",
"anchorLabel": "Movie",
"anchorId": "movie.grn",
"anchor": "movie",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
},
"defaultFull": {
"producer": "MATCH (n:`Movie`) WITH n AS `movie` RETURN `movie`",
"anchorLabel": "Movie",
"anchorId": "movie.grn",
"anchor": "movie",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
}
},
"settings": {},
"mappings": null,
"schema": {
"createdAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"grn": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lastSearchIndexedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"tagline": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"title": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"released": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"updatedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
}
},
"hardFilter": null,
"viewRequirements": {}
},
"person": {
"indexData": {
"description": "Generated 'person' index from the 'Person' node type.",
"displayName": "person",
"caption": {
"priority": []
}
},
"indexStrategies": {
"defaultBroker": {
"producer": "MATCH (n:`Person`) WHERE n.grn IN {idList} WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "RABBITMQ"
},
"defaultPartial": {
"producer": "MATCH (n:`Person`) WHERE n.updatedAt > n.lastSearchIndexedAt WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
},
"defaultFull": {
"producer": "MATCH (n:`Person`) WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
}
},
"settings": {},
"mappings": null,
"schema": {
"createdAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"grn": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lastSearchIndexedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"born": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"name": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lon": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lat": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"updatedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
}
},
"hardFilter": null,
"viewRequirements": {}
}
},
"globals": {},
"views": {}
}
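When reviewing a generated policy it can help to see which index strategies only pick up changed nodes. The sketch below scans each producer for the updatedAt > lastSearchIndexedAt comparison with a regular expression. It is a heuristic written for this example, not part of the Search API: the function name is hypothetical and a producer that expresses the filter differently would not be detected.

```python
import re
from typing import Dict, List

# Matches e.g. "n.updatedAt > n.lastSearchIndexedAt" inside a producer query.
INCREMENTAL_FILTER = re.compile(r"\.updatedAt\s*>\s*\w+\.lastSearchIndexedAt")


def incremental_strategies(policy: Dict) -> Dict[str, List[str]]:
    """Map each index to the strategies whose producer filters on changed nodes."""
    result = {}
    for index_name, index in policy.get("indexes", {}).items():
        strategies = index.get("indexStrategies", {})
        result[index_name] = sorted(
            name
            for name, strategy in strategies.items()
            if INCREMENTAL_FILTER.search(strategy.get("producer", ""))
        )
    return result
```

Applied to the generated policy above, only defaultPartial would be reported for each index, since defaultFull has no filter and defaultBroker selects nodes by idList.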
Generate Custom Index Policy
Generates a custom Index Policy for the GraphGrid Cluster.
Base URL: /1.0/search/{{clusterName}}/generateCustomIndexPolicy/{{policyName}}
Method: POST
Parameter | Description |
---|---|
clusterName string | The GraphGrid Cluster with an Index Policy. |
policyName string | The name used to load the index policy on S3. |
Request
curl --location --request POST "${API_BASE}/1.0/search/default/generateCustomIndexPolicy/gg-dev-index-policy" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${BEARER_TOKEN}" \
--data-raw '{
"nodeTypes" : [
"Person"
]
}'
Response (200)
{
"metadata": {
"description": "Generated index policy for node types: [Person].",
"displayName": "gg-dev-index-policy",
"createdAt": "2021-03-12T19:43:17.528Z",
"updatedAt": "2021-03-12T19:43:17.528Z",
"versions": [],
"state": "OFFLINE"
},
"caption": {
"priority": [
{
"property": "title"
},
{
"property": "name"
},
{
"property": "address"
},
{
"property": "phone"
},
{
"property": "grn"
}
]
},
"parameter": null,
"hardFilter": null,
"indexes": {
"person": {
"indexData": {
"description": "Generated 'person' index from the 'Person' node type.",
"displayName": "person",
"caption": {
"priority": []
}
},
"indexStrategies": {
"defaultBroker": {
"producer": "MATCH (n:`Person`) WHERE n.grn IN {idList} WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "RABBITMQ"
},
"defaultPartial": {
"producer": "MATCH (n:`Person`) WHERE n.updatedAt > n.lastSearchIndexedAt WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
},
"defaultFull": {
"producer": "MATCH (n:`Person`) WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
}
},
"settings": {},
"mappings": null,
"schema": {
"createdAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"grn": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lastSearchIndexedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"born": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"name": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lon": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lat": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"updatedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
}
},
"hardFilter": null,
"viewRequirements": {}
}
},
"globals": {},
"views": {}
}
Save Index Policy
Saves an Index Policy to a GraphGrid Cluster on S3 under the policy name. Returns the saved policy.
Base URL: /1.0/search/{{clusterName}}/saveIndexPolicy/{{policyName}}
Method: POST
Parameter | Description |
---|---|
clusterName string | The GraphGrid Cluster with an Index Policy. |
policyName string | The name used to load the index policy on S3. |
Request
curl --location --request POST "${API_BASE}/1.0/search/default/saveIndexPolicy/example-index-policy" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${BEARER_TOKEN}" \
--data-raw '{
"metadata": {
"description": "Index policy for node types: [Movie, Person].",
"displayName": "example-index-policy",
"createdAt": "2021-03-12T18:57:14.247Z",
"updatedAt": "2021-03-12T18:57:14.248Z",
"versions": [],
"state": "OFFLINE"
},
"caption": {
"priority": [
{
"property": "title"
},
{
"property": "name"
},
{
"property": "address"
},
{
"property": "phone"
},
{
"property": "grn"
}
]
},
"parameter": null,
"hardFilter": null,
"indexes": {
"movie": {
"indexData": {
"description": "Generated 'movie' index from the 'Movie' node type.",
"displayName": "movie",
"caption": {
"priority": []
}
},
"indexStrategies": {
"defaultBroker": {
"producer": "MATCH (n:`Movie`) WHERE n.grn IN {idList} WITH n AS `movie` RETURN `movie`",
"anchorLabel": "Movie",
"anchorId": "movie.grn",
"anchor": "movie",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "RABBITMQ"
},
"defaultPartial": {
"producer": "MATCH (n:`Movie`) WHERE n.updatedAt > n.lastSearchIndexedAt WITH n AS `movie` RETURN `movie`",
"anchorLabel": "Movie",
"anchorId": "movie.grn",
"anchor": "movie",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
},
"defaultFull": {
"producer": "MATCH (n:`Movie`) WITH n AS `movie` RETURN `movie`",
"anchorLabel": "Movie",
"anchorId": "movie.grn",
"anchor": "movie",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
}
},
"settings": {},
"mappings": null,
"schema": {
"createdAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"grn": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lastSearchIndexedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"tagline": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"title": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"released": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"updatedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
}
},
"hardFilter": null,
"viewRequirements": {}
},
"person": {
"indexData": {
"description": "Generated 'person' index from the 'Person' node type.",
"displayName": "person",
"caption": {
"priority": []
}
},
"indexStrategies": {
"defaultBroker": {
"producer": "MATCH (n:`Person`) WHERE n.grn IN {idList} WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "RABBITMQ"
},
"defaultPartial": {
"producer": "MATCH (n:`Person`) WHERE n.updatedAt > n.lastSearchIndexedAt WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
},
"defaultFull": {
"producer": "MATCH (n:`Person`) WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "false",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
}
},
"settings": {},
"mappings": null,
"schema": {
"createdAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"grn": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lastSearchIndexedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"born": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"name": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lon": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"lat": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
},
"updatedAt": {
"schema": {
"generatorStrategies": {
"defaultBroker": "",
"defaultPartial": "",
"defaultFull": ""
}
},
"searchQuery": {},
"viewRequirements": {}
}
},
"hardFilter": null,
"viewRequirements": {}
}
},
"globals": {},
"views": {}
}
Force Index Properties
Enforces that the node(s) on the graph have the lastSearchIndexedAt and updatedAt properties that are required for indexing.
Request
curl --location --request POST "${API_BASE}/1.0/search/default/forceIndexProperties/2" \
--header "Authorization: Bearer ${BEARER_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw '{
"nodeTypes": [
"Person",
"Movie"
]
}'
Response
["Person", "Movie"]
Base URL: /1.0/search/{{clusterName}}/forceIndexProperties/{{limit}}
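The request body for this endpoint is a JSON object with a single nodeTypes array, so it can be generated from a list of labels instead of being written by hand. The following is a minimal sketch, not part of GraphGrid Search: the function names node_types_json and force_index_properties are hypothetical, API_BASE and BEARER_TOKEN are assumed to be set as in the request above, and the label names are assumed to contain no quotes or backslashes because no JSON escaping is performed.

```shell
#!/bin/sh
# Build the JSON request body from a list of node type names.
# Assumption: names contain no quotes or backslashes (no JSON escaping is done).
node_types_json() {
  body='{"nodeTypes":['
  sep=''
  for t in "$@"; do
    body="${body}${sep}\"${t}\""
    sep=','
  done
  printf '%s]}' "$body"
}

# Hypothetical wrapper around the documented POST request.
# $1 = cluster name, $2 = value for the {{limit}} path segment,
# remaining arguments = node types.
force_index_properties() {
  cluster=$1
  limit=$2
  shift 2
  curl --fail --silent --show-error --location --request POST \
    "${API_BASE}/1.0/search/${cluster}/forceIndexProperties/${limit}" \
    --header "Authorization: Bearer ${BEARER_TOKEN}" \
    --header 'Content-Type: application/json' \
    --data-raw "$(node_types_json "$@")"
}
```

Calling force_index_properties default 2 Person Movie would send the same request as the curl example above.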
Load Index Policy
Loads and returns the Index Policy for the GraphGrid Cluster.
Base URL: /1.0/search/{{clusterName}}/loadIndexPolicy/{{policyName}}
Method: GET
Parameter | Description |
---|---|
clusterName string | The GraphGrid Cluster with an Index Policy. |
policyName string | The name used to load the index policy on S3. |
Request
curl --location --request GET "${API_BASE}/1.0/search/default/loadIndexPolicy/example-index-policy" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${BEARER_TOKEN}"
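Because this endpoint returns the whole index policy, it can be convenient to save the response to a file for editing. The sketch below shows one way to do that under stated assumptions: the function names load_policy_url and load_index_policy are hypothetical, API_BASE and BEARER_TOKEN are assumed to be set as in the request above, and a successful call is assumed to return HTTP status 200, which this section does not state for this endpoint.

```shell
#!/bin/sh
# Build the endpoint URL from the base URL, cluster name and policy name.
load_policy_url() {
  printf '%s/1.0/search/%s/loadIndexPolicy/%s' "$1" "$2" "$3"
}

# Hypothetical helper: fetch a policy and write it to a file.
# $1 = cluster name, $2 = policy name, $3 = output file.
# Returns non-zero if curl fails or the status is not 200 (assumed success code).
load_index_policy() {
  out=$3
  status=$(curl --silent --show-error --location --request GET \
    "$(load_policy_url "$API_BASE" "$1" "$2")" \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer ${BEARER_TOKEN}" \
    --output "$out" --write-out '%{http_code}') || return 1
  [ "$status" = "200" ]
}
```

For example, load_index_policy default example-index-policy policy.json would write the policy from the request above to policy.json.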
Index Globals
Evaluates and stores globals into the Elasticsearch index globals.
Base URL: /1.0/search/{{clusterName}}/indexGlobals/{{policyName}}
Method: POST
Parameter | Description |
---|---|
clusterName string | The GraphGrid Cluster with an Index Policy. |
policyName string | The name used to load the index policy on S3. |
Request
curl --location --request POST "${API_BASE}/1.0/search/default/indexGlobals/gg-dev-index-policy" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${BEARER_TOKEN}" \
--data-raw '{
"metadata": {
"description": "Test index policy for custom queries using the Person label.",
"displayName": "customQuery-index-policy",
"createdAt": "2018-12-18T19:53:26+00:00",
"updatedAt": "2018-12-20T15:45:11+00:00",
"state": "OFFLINE",
"versions": [
]
},
"caption": {
"priority": [
{
...
}
]
},
"parameter": null,
"hardFilter": null,
"indexes": {
"person": {
"indexData": {
"description": "Generated '\''person'\'' index from the '\''Person'\'' node type.",
"displayName": "person",
"caption": {
"priority": []
}
},
"indexStrategies": {
"defaultFull": {
"producer": "MATCH (n:`Person`) WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "true",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
}
},
...
}
},
"globals": {
"keanuReeves": {
"query": "MATCH (keanu:Person) WHERE keanu.name='\''Keanu Reeves'\'' WITH keanu {.*}"
}
}
}'
Response (200)
{
"metadata": {
"description": "Test index policy for custom queries using the Person label.",
"displayName": "customQuery-index-policy",
"createdAt": "2018-12-18T19:53:26+00:00",
"updatedAt": "2018-12-20T15:45:11+00:00",
"versions": [],
"state": "OFFLINE"
},
"caption": {
"priority": [
{
...
}
]
},
"parameter": null,
"hardFilter": null,
"indexes": {
"person": {
"indexData": {
"description": "Generated 'person' index from the 'Person' node type.",
"displayName": "person",
"caption": {
"priority": []
}
},
"indexStrategies": {
"defaultFull": {
"producer": "MATCH (n:`Person`) WITH n AS `person` RETURN `person`",
"anchorLabel": "Person",
"anchorId": "person.grn",
"anchor": "person",
"batchSize": "1000",
"parallel": "true",
"retries": "0",
"iterateList": "true",
"broker": "DIRECT"
}
},
...
}
},
"globals": {
"keanuReeves": {
"query": "MATCH (keanu:Person) WHERE keanu.name='Keanu Reeves' WITH keanu {.*}"
}
},
"views": {}
}
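The inline request above has to write every single quote inside the policy as '\'' because the body sits in a single-quoted shell string. Sending the policy from a file avoids that escaping. The following is a minimal sketch under stated assumptions: the function names index_globals_url and index_globals are hypothetical, API_BASE and BEARER_TOKEN are assumed to be set as in the request above, and the file is assumed to contain the complete index policy JSON.

```shell
#!/bin/sh
# Build the endpoint URL from the base URL, cluster name and policy name.
index_globals_url() {
  printf '%s/1.0/search/%s/indexGlobals/%s' "$1" "$2" "$3"
}

# Hypothetical helper: POST an index policy stored in a file.
# $1 = cluster name, $2 = policy name, $3 = path to the policy JSON file.
# Returns 2 without contacting the server if the file cannot be read.
index_globals() {
  cluster=$1
  policy=$2
  file=$3
  if [ ! -r "$file" ]; then
    echo "cannot read policy file: $file" >&2
    return 2
  fi
  # --data-binary @file sends the file contents unchanged as the request body.
  curl --fail --silent --show-error --location --request POST \
    "$(index_globals_url "$API_BASE" "$cluster" "$policy")" \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer ${BEARER_TOKEN}" \
    --data-binary "@${file}"
}
```

With the policy saved as policy.json, index_globals default gg-dev-index-policy policy.json would target the same URL as the request above.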