Version: 2.0

Graph Schema and Queries for NLP Data Extraction

A feature of GraphGrid's NLP module is that once it processes a text it stores its results in a graph format. This section explains what gets created and stored in the graph when text data is processed by NLP, and provides examples of how to query that data.

How NLP data extraction data is stored

GraphGrid's NLP module runs the following tasks on text data during data extraction:

Named entity recognition (NER)
Part of speech tagging (POS)
Coreference resolution
Relation extraction (RE)
Sentiment analysis
Summarization
Translation
Keyphrase tagging

The results from these tasks are stored in a graph database format. Once extraction is complete, the processed text node will have a relationship to an AnnotatedText node.

The diagram below shows a processed Article node connected to an AnnotatedText node by a relationship called HAS_ANNOTATED_TEXT.

Sm_Diagram

note

The properties of your Article node may be different based on the custom data added to the graph.

Annotated Text Nodes

If we were to think of the schema of the stored NLP extraction data, the AnnotatedText node would be the root. Each processed node has its own AnnotatedText node that all of its NLP task results are related to.

For example, the diagram below shows the simplified expansion of an AnnotatedText node.

Diagram

The Article node represents the users text data on the graph. The node label Article is not required for NLP processing. Users of the NLP Module are free to choose another node label to do text processing. The important thing is that our text node (Article in this case) is processed by NLP and then connected to the AnnotatedText node. Several nodes related to NLP tasks are connected to the AnnotatedText node.

The text node, (Article node), is connected to the AnnotatedText node by the relationship HAS_ANNOTATED_TEXT. Each Article text node will have its own AnnotatedText node once it has been processed.

Sm_Diagram

Properties of AnnotatedText nodes

Let's look at the properties of an AnnotatedText node and its HAS_ANNOTATED_TEXT created by data extraction.

AnnotatedText node

Property	Description
`createdAt`	The timestamp of when the node was created.
`grn`	The grn of the node.
`lastSearchIndexedAt`	The timestamp of when the node was last indexed by Search.
`text`	The text that was processed by NLP. This property name is the same as the processed text node property (E.g. an `Article` node with the property `text` that holds the value of the textual data being processed will have an `AnnotatedText` node with the property `text` for that textual data).
`updatedAt`	The timestamp of when the node was last updated.

HAS_ANOTATED_TEXT relationship properties

Property	Description
`createdAt`	The timestamp of when the relationship was created.
`grn`	The grn of the relationship.

General guide for querying NLP extraction data

When attempting to access the NLP extraction data in the graph, it is important to be familiar with the schema created once extraction is complete. For instance if we want to only query text nodes that have specific mentions or keyphrases we might think that we can run a query like this:

MATCH (a:Article)-[:HAS_KEYPHRASE]->(k:Keyphrase) WHERE k.value = "GraphGrid" RETURN a

However this query will return no results because the relationships created by processing are not directly connected to the Article node. The reason that the AnnotatedText node is generated is so that the NLP extraction data is isolated from the user's data. If we are basing our query off of the processed text node to access information related to NLP data extraction, we need to match the processed text (Article) node to its AnnotatedText node first.

We do this like so:

MATCH (a:Article)-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:HAS_KEYPHRASE]->(k:Keyphrase) WHERE k.value = "GraphGrid" RETURN a

Every query that is based on the processed text node will start MATCH (a:Article)-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText).

Query 25 Article nodes that have AnnotatedText nodes:

MATCH (a:Article)-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText) RETURN a, at LIMIT 25

Query Article nodes that have AnnotatedText nodes that include "chewbacca" in the text:

MATCH (a:Article)-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText) WHERE toLower(at.text) INCLUDES "chewbacca" RETURN a

Corpus node

Every AnnotatedText node is connected to the Corpus node.

Diagram

Like a regular corpus, it is a collection of texts but exists here in a graph format. We can think of the Corpus like a folder that contains the data processing information.

The Corpus node is connected to every article in the graph by a relationship called CONTAINS_ANNOTATED_TEXT.

Corpus node properties

Property	Description
`createdAt`	The timestamp of when the node was created.
`grn`	The grn of the node.
`lastSearchIndexedAt`	The timestamp of when the node was last indexed by Search.
`name`	The name of the node.
`updatedAt`	The timestamp of when the node was last updated.

CONTAINS_ANNOTATED_TEXT relationship properties

Property	Description
`createdAt`	The timestamp of when the relationship was created.
`grn`	The grn of the relationship.

Query the Corpus node

To query the Corpus node run:

MATCH (c:Corpus) RETURN c

NLP Metric Nodes

The NlpMetric node holds metrics for each processed text. Each annotatedText node is connected to an NlpMetric node by a HAS_METRIC relationship.

Diagram

Properties of NlpMetric Nodes

Property	Description
`createdAt`	The timestamp of when the node was created.
`grn`	The grn of the node.
`lastSearchIndexedAt`	The timestamp of when the node was last indexed by Search.
`numberOfWords`	The total number of words in the original text.
`totalTimeElapsed`	The amount of time in seconds that it took to fully annotate the text.
`updatedAt`	The timestamp of when the node was last updated.

NLP Metric Node Queries

To query all NlpMetric nodes:

MATCH (n:NlpMetric) RETURN n

To query the NlpMetric node for a certain AnnotatedText:

MATCH (a:AnnotatedText)-[:HAS_METRIC]->(n:NlpMetric) RETURN n

Sentence Nodes

NLP works by extracting and processing a text sentence by sentence. This sentence data is stored on the graph as Sentence nodes.

In the diagram below, the AnnotatedText node has several outgoing relationships called CONTAINS_SENTENCE that connect to a Sentence node. There is a Sentence node for every sentence in the processed text. Some Sentence nodes have more than one incoming relationship from the AnnotatedText node, which we will go over as we continue.

Diagram

The AnnotatedText node in this diagram has two outgoing relationships, CONTAINS_SENTENCE and FIRST_SENTENCE, to a Sentence node.

Diagram

This Sentence node represents the first sentence of the processed text. Each sentence has a numeric sentenceNumber property starting with 0 and is assigned in ascending order until the end of the processed text.

Diagram

NLP uses this property to determine the NEXT_SENTENCE relationship.

Properties of sentence nodes and their relationships

Sentence node

Property	Description
`createdAt`	The timestamp of when the node was created.
`grn`	The grn of the node.
`lastSearchIndexedAt`	The timestamp of when the node was last indexed by Search.
`sentence`	A sentence of the processed text.
`sentenceNumber`	The number assigned to the sentence in order by which it appears in the processed text.
`updatedAt`	The timestamp of when the node was last updated.

CONTAINS_SENTENCE, NEXT SENTENCE, FIRST_SENTENCE relationship properties

Property	Description
`createdAt`	The timestamp of when the relationship was created.
`grn`	The grn of the relationship.
`updatedAt`	The timestamp of when the node was last updated.

Query Sentence Nodes

As mentioned under the general NLP data extraction querying section, to query all Sentence nodes based on the processed text, we have to first match the processed text (Article) node to its AnnotatedText node. From there, our query can look for the Sentence nodes.

If we run this query, we'll get all the Sentence nodes for a particular Article node:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)  RETURN s

This query matches the Article by its grn, then looks for an outgoing relationship HAS_ANNOTATED_TEXT connected to an AnnotatedText node that has an outgoing relationship of CONTAINS_SENTENCE that connects to a Sentence node and returns that sentence.

If we want to query only the first sentence of a particular article we would run this:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:FIRST_SENTENCE]->(s:Sentence) RETURN s

This query follows a similar sequence as the one above, but instead of looking for the CONTAINS_SENTENCE relationship it's looking for FIRST_SENTENCE. However, we're still matching our Article node to its AnnotatedText node first, then matching the AnnotatedText node to the Sentence node.

We can also run a query that looks for and returns a sentence with a specific sentenceNumber property in a particular article:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence) WHERE s.sentenceNumber = 3 RETURN s

This query matches the Sentence node's property instead of its relationship.

Let's say we wanted to get more specific. If we want to query only sentences in this article that include "Chewbacca" we'd run this:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence) WHERE s.sentence CONTAINS "Chewbacca"  RETURN s

Once again, the query matches our Article node by grn to our AnnotatedText node, and finds our Sentence node; but this time the query looks for the sentence property and whether or not it contains "Chewbacca".

Summarization

NLP's summarization task calculates the most representative sentences in the processed text. This data is stored in the graph as a relationship called SUMMARY_SENTENCE that points to those representative Sentence nodes. The SUMMARY_SENTENCE relationship is an outgoing relationship from the AnnotatedText node to a Sentence node. Each SUMMARY_SENTENCE relationship has an order property. The value of this property represents the order of the sentences for the predicted summary.

Diagram

Summarization is the last task run during NLP data extraction processing. When a text is fully processed the last node created is called the NlpProcessingSummary node.

Diagram

The AnnotatedText node has a relationship called HAS_PROCESSING_SUMMARY that connects to the NlpProcessingNode. The node exists to trigger Fuze policy changes that require NLP extraction data.

NlpProcessingSummary

Property	Description
`createdAt`	The timestamp of when the node was created.
`grn`	The grn of the node.
`lastSearchIndexedAt`	The timestamp of when the node was last indexed by Search.
`updatedAt`	The timestamp of when the node was last updated.

SUMMARY_SENTENCE relationship properties

Property	Description
`createdAt`	The timestamp of when the relationship was created.
`grn`	The grn of the relationship.
`order`	The order by which the sentence occurs in the overall predicted summary.
`updatedAt`	The timestamp of when the node was last updated.

Queries for Summarization

To return all of the summary sentences for a particular article run this query:

MATCH (a:Article {grn: "grn:gg:article:Oee1ccdOP8TYb0CC75mJ4j58t7bBqzG3hPvQb2IDtavP"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:SUMMARY_SENTENCE]->(s:Sentence) RETURN s

Use this query to return the summary sentences in order:

MATCH (a:Article {grn: "grn:gg:article:Oee1ccdOP8TYb0CC75mJ4j58t7bBqzG3hPvQb2IDtavP"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[sum:SUMMARY_SENTENCE]->(s:Sentence) RETURN s ORDER by sum.order

Mention Nodes

A Mention node represents an entity (person, place, organization, or thing). Results from the NER, POS and coreference resolution models are used to create and match existing Mention nodes on the graph.

Since different sentences may refer to the same entity we use a single Mention node as a representation of that entity. This is captured through the relationship HAS_MENTION between a Mention and a Sentence. The graph also tracks each occurrence of a mention as distinct information. Any sentence that a mention occurs in creates a new MentionOccurrence node and attaches it to the sentence through SENTENCE_MENTION_OCCURRENCE. MentionOccurrence nodes represent individual usages of the Mention and store POS information about the usage. Lastly, a MentionOccurrence is always connected back to its representative Mention through the MENTION_OCCURRENCE_MENTION relationship.

Diagram

For example, this Sentence node has an outgoing relationship called HAS_MENTION that is connected to a Mention node. The Sentence node has another outgoing relationship called SENTENCE_MENTION_OCCURRENCE that is connected to a MentionOccurrence node. The MentionOccurrence node has an incoming relationship called MENTION_OCCURRENCE_MENTION that comes from the Mention node.

Diagram

In the diagram above we see that there are 3 Sentence nodes that are each connected to one MentionOccurrence node. The MentionOccurrence nodes have a POS property value of "NNP" and are each connected to the same Mention node that has a property value of "Chewbacca". From this we can interpret that NLP predicted that these 3 sentences mention "Chewbacca" as a singular, proper noun (NNP).

The Mention node also has an incoming relationship of MENTION_REPRESENTATIVE that is attached to a Person node with the property name equal to "Chewbacca". This node is the result of NER processing. NLP determined that the mention "Chewbacca" was a person, and therefore created a Person node that is representative of the Mention node.

The node attached to the other end of the MENTION_REPRESENTATIVE can be labeled as a PERSON, LOCATION, ORGANIZATION, or MISCELLANEOUS node. Some mentions may have a combination of, or all three labels depending on NLP's computation and the context that the mention is used in.

Properties of Mention nodes

Mention node

Property	Description
`createdAt`	The timestamp of when the node was created.
`grn`	The grn of the node.
`langauge`	The langauge of the text.
`lastSearchIndexedAt`	The timestamp of when the node was last indexed by Search.
`mentionId`	The unique id of the mention, the value and the language property separated by an underscore ("_").
`multiplicity`	The multiplicity of the mention.
`ne`	The named entity or entities of the mention (`PERSON`, `LOCATION`, `ORGANIZATION`, `MISCELLANEOUS` ).
`pos`	The part of speech (POS) of the mention.
`updatedAt`	The timestamp of when the node was last updated.
`value`	The value of the mention.

Mention Occurrence node

Property	Description
`createdAt`	The timestamp of when the node was created.
`grn`	The grn of the node.
`langauge`	The langauge of the text.
`lastSearchIndexedAt`	The timestamp of when the node was last indexed by Search.
`pos`	The part of speech (POS) of the mention.
`updatedAt`	The timestamp of when the node was last updated.
`value`	The value of the mention.

HAS_MENTION, SENTENCE_MENTION_OCCURRENCE, MENTION_OCCURRENCE_MENTION, MENTION_OCCURRENCE_REPRESENTATIVE relationship properties

Property	Description
`createdAt`	The timestamp of when the relationship was created.
`grn`	The grn of the relationship.
`order`	The order by which the sentence occurs in the overall predicted summary.
`updatedAt`	The timestamp of when the node was last updated.

Querying Mention nodes

If we wanted to query all Mention nodes for a particular Article we would, once again, follow the sequence to get from Article to Sentence, only this time we're traversing one more relationship, HAS_MENTION, in order to get to the Mention node.

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_MENTION]->(m:Mention)  RETURN m

This query matches the Article node by its grn, then finds it's AnnotatedText node in order to find its Sentence nodes. Finally, the query looks at the Sentence node's outgoing relationship,HAS_MENTION, to find and return all Mention nodes.

If we wanted to get more specific, we can return only the Sentence nodes which have a connection to Mention nodes that have a value of "Chewbacca" by running this query:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_MENTION]->(m:Mention) WHERE toLower(m.value) CONTAINS "chewbacca" RETURN s

To query all the people in a particular article:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_MENTION]->(m:Mention)<-[:MENTION_REPRESENTATIVE]-(p:Person) RETURN p

This query will return all nodes that have a label of Person. Some of these nodes may be tagged with several entities. For example, the Person node may have a combination of, if not all the NER tags PERSON, LOCATION, ORGANIZATION, and/or MISCELLANEOUS. If you want to query nodes that are purely tagged PERSON nodes run this:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_MENTION]->(m:Mention)<-[:MENTION_REPRESENTATIVE]-(n:Person) WHERE NOT n:Location AND NOT n:Organization RETURN n

Similarity Scoring

Similarities are scored based on Term Frequency-Inverse Document Frequency (TF-IDF) and Similarity Clustering methods. Each Mention for a text document has its TF-IDF value calculated, and then highest valued Mentions used as a base to calculate how similar two texts are. Note that TF-IDF scores are calculated in the context of a Corpus, meaning the existing documents affect the TF-IDF values on mentions of a new document.

Each Mention node has an incoming relationship called HAS_TF. This relationship is where the TF value part of TF-IDF is stored. These scores are used to calculate how similar two articles, or texts, are. If an article is similar to another, its AnnotatedText node will have a relationship to its similar article's AnnotatedText node called HAS_SIMILAR.

Since IDF values change each time a new document is processed, IDF values are computed on the fly rather than stored in the graph.

Diagram

In the diagram above the two AnnotatedText nodes share two Mention nodes and a Keyphrase node through a relationship called HAS_TF. The Mention and Keyphrase nodes' TF-IDF scores are used to calculate if two articles are similar.

Similar texts are organized within the graph itself. Texts that are predicted to be similar, and belong to the same corpus, become part of a Similarity Cluster represented in the graph as a SimilarityCluster node. TheAnnotatedText nodes belong to the SimilarityCluster node, as representened in the graph by their outgoing BELONGS_TO_SIMILARITY_CLUSTER relationship.

Diagram

Properties for Similarity

SimilarityCluster node

Property	Description
`createdAt`	The timestamp of when the relationship was created.
`depth`	The depth of the similarity cluster.
`grn`	The grn of the relationship.
`lastSearchIndexedAt`	The timestamp of when the node was last indexed by Search.
`updatedAt`	The timestamp of when the node was last updated.

HAS_TF

Property	Description
`createdAt`	The timestamp of when the relationship was created.
`grn`	The grn of the relationship.
`tf`	The term-frequency (TF) score of the mention or keyphrase.
`updatedAt`	The timestamp of when the node was last updated.

HAS_SIMILAR

Property	Description
`corpusGrn`	The grn of the `Corpus` that the `AnnotatedText` belongs to.
`createdAt`	The timestamp of when the relationship was created.
`grn`	The grn of the relationship.
`similarity`	The similarity score of the two similar texts.
`updatedAt`	The timestamp of when the node was last updated.

BELONGS_TO_SIMILARITY_CLUSTER

Property	Description
`createdAt`	The timestamp of when the relationship was created.
`grn`	The grn of the relationship.
`updatedAt`	The timestamp of when the node was last updated.

Queries for Similarity Scoring

Use this query to return the all the similar AnnotatedText nodes for a particular article:

MATCH (a:Article {grn: "grn:gg:article:ODSSvMKZzl4Nq2JqPwJtUBWW6izuxu43SMS5aQUY2Dce"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:HAS_SIMILAR]->(p:AnnotatedText) RETURN p

Remember, if we want to access information related to NLP data extraction, we always need to match the Article to its AnnotatedText node first. Our query finds the article's AnnotatedText node and then traverses the HAS_SIMILAR relationships to get the similar AnnotatedText nodes and returns them.

Use this query to return the all the similar Article nodes for a particular article:

MATCH (a:Article {grn: "grn:gg:article:ODSSvMKZzl4Nq2JqPwJtUBWW6izuxu43SMS5aQUY2Dce"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:HAS_SIMILAR]->(p:AnnotatedText)<-[HAS_ANNOTATED_TEXT]-(x:Article) RETURN x

Just like the query above but with one added traversal, this query finds the similar AnnotatedText nodes, then looks at their HAS_ANNOTATED_TEXT relationships to find their Article nodes.

To query all SimilarityCluster nodes run this:

MATCH (a:Article)-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:BELONGS_TO_SIMILARITY_CLUSTER]->(sc:SimilarityCluster) RETURN a

Keyphrase

Keyphrase extaction data is stored as Keyphrase nodes on the graph. Keyphrase nodes are directly related to Sentence nodes through a relationship called HAS_KEYPHRASE.

Diagram

The Sentence node is connected to a Keyphrase node by a relationship HAS_KEYPHRASE. A Keyphrase node can also be directly connected to an AnntatedText node by a HAS_TF relationship.

Properties for Keyphrase

Keyphrase node

Property	Description
`createdAt`	The timestamp of when the relationship was created.
`grn`	The grn of the relationship.
`keyphraseId`	The unique id of the keyphrase.
`lastSearchIndexedAt`	The timestamp of when the node was last indexed by Search.
`updatedAt`	The timestamp of when the node was last updated.
`value`	The value of the keyphrase.

HAS_KEYPHRASE relationship properties

Property	Description
`createdAt`	The timestamp of when the relationship was created.
`grn`	The grn of the relationship.
`updatedAt`	The timestamp of when the node was last updated.

Queries for Keyphrase

To return the Keyphrase nodes that contain a particular phrase:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_KEYPHRASE]->(k:Keyphrase) RETURN k

To return the Keyphrase nodes that have a value that contain the a particular value:

MATCH (a:Article)-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_KEYPHRASE]->(k:Keyphrase) WHERE toLower(k.value) contains "the force" RETURN k

Sentiment

The NLP sentiment task data is stored as two types of nodes, SentenceSentiment and AnnotatedTextSentiment.

The SentenceSentiment is the sentiment of a particular sentence, while the AnnotatedTextSentiment is the average of the SentenceSentiment scores.

Sm_Diagram

The Sentence node is connected to the SentenceSentiment node by a relationship called HAS_SENTIMENT. The AnnotatedText node is connected to the AnnotatedTextSentiment node by a relationship also called HAS_SENTIMENT.

Properties for Sentiment

SentenceSentiment node

Property	Description
`createdAt`	The timestamp of when the relationship was created.
`grn`	The grn of the relationship.
`lastSearchIndexedAt`	The timestamp of when the node was last indexed by Search.
`sentiment`	The sentiment score of the `Sentence` node
`sentimentType`	The sentiment type. This can be either binary (positive/negative) or categorical.
`updatedAt`	The timestamp of when the node was last updated.

AnnotatedTextSentiment node

Property	Description
`createdAt`	The timestamp of when the relationship was created.
`grn`	The grn of the relationship.
`lastSearchIndexedAt`	The timestamp of when the node was last indexed by Search.
`sentiment`	The sentiment score of the `AnnotatedText` node.
`updatedAt`	The timestamp of when the node was last updated.

HAS_SENTIMENT relationship properties

Property	Description
`createdAt`	The timestamp of when the relationship was created.
`grn`	The grn of the relationship.
`updatedAt`	The timestamp of when the node was last updated.

Queries for Sentiment

This queries all SentenceSentiment nodes for a particular article:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_SENTIMENT]->(sent:SentenceSentiment) RETURN sent

This query returns sentences that contain "Doctor Strange" in them that have a negative sentiment:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_SENTIMENT]-(sent:SentenceSentiment) WHERE s.sentence CONTAINS "Doctor Strange" AND sent.sentimentType = "negative" RETURN s

To query the AnnotatedTextSentiment node run this:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:HAS_SENTIMENT]->(sent:AnnotatedTextSentiment) RETURN sent

How NLP data extraction data is stored​

Annotated Text Nodes​

Properties of AnnotatedText nodes​

AnnotatedText node​

HAS_ANOTATED_TEXT relationship properties​

General guide for querying NLP extraction data​

Corpus node​

Corpus node properties​

CONTAINS_ANNOTATED_TEXT relationship properties​

Query the Corpus node​

NLP Metric Nodes​

Properties of NlpMetric Nodes​

NLP Metric Node Queries​

Sentence Nodes​

Properties of sentence nodes and their relationships​

Sentence node​

CONTAINS_SENTENCE, NEXT SENTENCE, FIRST_SENTENCE relationship properties​

Query Sentence Nodes​

Summarization​

NlpProcessingSummary​

SUMMARY_SENTENCE relationship properties​

Queries for Summarization​

Mention Nodes​

Properties of Mention nodes​

Mention node​

Mention Occurrence node​

HAS_MENTION, SENTENCE_MENTION_OCCURRENCE, MENTION_OCCURRENCE_MENTION, MENTION_OCCURRENCE_REPRESENTATIVE relationship properties​

Querying Mention nodes​

Similarity Scoring​

Properties for Similarity​

SimilarityCluster node​

HAS_TF​

HAS_SIMILAR​

BELONGS_TO_SIMILARITY_CLUSTER​

Queries for Similarity Scoring​

Keyphrase​

Properties for Keyphrase​

Keyphrase node​

HAS_KEYPHRASE relationship properties​

Queries for Keyphrase​

Sentiment​

Properties for Sentiment​

SentenceSentiment node​

AnnotatedTextSentiment node​

HAS_SENTIMENT relationship properties​

Queries for Sentiment​

How NLP data extraction data is stored

Annotated Text Nodes

Properties of AnnotatedText nodes

AnnotatedText node

HAS_ANOTATED_TEXT relationship properties

General guide for querying NLP extraction data

Corpus node

Corpus node properties

CONTAINS_ANNOTATED_TEXT relationship properties

Query the Corpus node

NLP Metric Nodes

Properties of NlpMetric Nodes

NLP Metric Node Queries

Sentence Nodes

Properties of sentence nodes and their relationships

Sentence node

CONTAINS_SENTENCE, NEXT SENTENCE, FIRST_SENTENCE relationship properties

Query Sentence Nodes

Summarization

NlpProcessingSummary

SUMMARY_SENTENCE relationship properties

Queries for Summarization

Mention Nodes

Properties of Mention nodes

Mention node

Mention Occurrence node

HAS_MENTION, SENTENCE_MENTION_OCCURRENCE, MENTION_OCCURRENCE_MENTION, MENTION_OCCURRENCE_REPRESENTATIVE relationship properties

Querying Mention nodes

Similarity Scoring

Properties for Similarity

SimilarityCluster node

HAS_TF

HAS_SIMILAR

BELONGS_TO_SIMILARITY_CLUSTER

Queries for Similarity Scoring

Keyphrase

Properties for Keyphrase

Keyphrase node

HAS_KEYPHRASE relationship properties

Queries for Keyphrase

Sentiment

Properties for Sentiment

SentenceSentiment node

AnnotatedTextSentiment node

HAS_SENTIMENT relationship properties

Queries for Sentiment