Skip to main content
Version: 2.0

Graph Schema and Queries for NLP Data Extraction

A feature of GraphGrid's NLP module is that once it processes a text it stores its results in a graph format. This section explains what gets created and stored in the graph when text data is processed by NLP, and provides examples of how to query that data.

How NLP data extraction data is stored

GraphGrid's NLP module runs the following tasks on text data during data extraction:

  • Named entity recognition (NER)
  • Part of speech tagging (POS)
  • Coreference resolution
  • Relation extraction (RE)
  • Sentiment analysis
  • Summarization
  • Translation
  • Keyphrase tagging

The results from these tasks are stored in a graph database format. Once extraction is complete, the processed text node will have a relationship to an AnnotatedText node.

The diagram below shows a processed Article node connected to an AnnotatedText node by a relationship called HAS_ANNOTATED_TEXT.

Sm_Diagram

note

The properties of your Article node may be different based on the custom data added to the graph.

Annotated Text Nodes

If we were to think of the schema of the stored NLP extraction data, the AnnotatedText node would be the root. Each processed node has its own AnnotatedText node that all of its NLP task results are related to.

For example, the diagram below shows the simplified expansion of an AnnotatedText node.

Diagram

The Article node represents the users text data on the graph. The node label Article is not required for NLP processing. Users of the NLP Module are free to choose another node label to do text processing. The important thing is that our text node (Article in this case) is processed by NLP and then connected to the AnnotatedText node. Several nodes related to NLP tasks are connected to the AnnotatedText node.

The text node, (Article node), is connected to the AnnotatedText node by the relationship HAS_ANNOTATED_TEXT. Each Article text node will have its own AnnotatedText node once it has been processed.

Sm_Diagram

Properties of AnnotatedText nodes

Let's look at the properties of an AnnotatedText node and its HAS_ANNOTATED_TEXT created by data extraction.

AnnotatedText node

PropertyDescription
createdAtThe timestamp of when the node was created.
grnThe grn of the node.
lastSearchIndexedAtThe timestamp of when the node was last indexed by Search.
textThe text that was processed by NLP. This property name is the same as the processed text node property (E.g. an Article node with the property text that holds the value of the textual data being processed will have an AnnotatedText node with the property text for that textual data).
updatedAtThe timestamp of when the node was last updated.
HAS_ANOTATED_TEXT relationship properties
PropertyDescription
createdAtThe timestamp of when the relationship was created.
grnThe grn of the relationship.

General guide for querying NLP extraction data

When attempting to access the NLP extraction data in the graph, it is important to be familiar with the schema created once extraction is complete. For instance if we want to only query text nodes that have specific mentions or keyphrases we might think that we can run a query like this:

MATCH (a:Article)-[:HAS_KEYPHRASE]->(k:Keyphrase) WHERE k.value = "GraphGrid" RETURN a

However this query will return no results because the relationships created by processing are not directly connected to the Article node. The reason that the AnnotatedText node is generated is so that the NLP extraction data is isolated from the user's data. If we are basing our query off of the processed text node to access information related to NLP data extraction, we need to match the processed text (Article) node to its AnnotatedText node first.

We do this like so:

MATCH (a:Article)-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:HAS_KEYPHRASE]->(k:Keyphrase) WHERE k.value = "GraphGrid" RETURN a

Every query that is based on the processed text node will start MATCH (a:Article)-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText).

Query 25 Article nodes that have AnnotatedText nodes:

MATCH (a:Article)-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText) RETURN a, at LIMIT 25

Query Article nodes that have AnnotatedText nodes that include "chewbacca" in the text:

MATCH (a:Article)-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText) WHERE toLower(at.text) INCLUDES "chewbacca" RETURN a

Corpus node

Every AnnotatedText node is connected to the Corpus node.

Diagram

Like a regular corpus, it is a collection of texts but exists here in a graph format. We can think of the Corpus like a folder that contains the data processing information.

The Corpus node is connected to every article in the graph by a relationship called CONTAINS_ANNOTATED_TEXT.

Corpus node properties

PropertyDescription
createdAtThe timestamp of when the node was created.
grnThe grn of the node.
lastSearchIndexedAtThe timestamp of when the node was last indexed by Search.
nameThe name of the node.
updatedAtThe timestamp of when the node was last updated.
CONTAINS_ANNOTATED_TEXT relationship properties
PropertyDescription
createdAtThe timestamp of when the relationship was created.
grnThe grn of the relationship.

Query the Corpus node

To query the Corpus node run:

MATCH (c:Corpus) RETURN c

NLP Metric Nodes

The NlpMetric node holds metrics for each processed text. Each annotatedText node is connected to an NlpMetric node by a HAS_METRIC relationship.

Diagram

Properties of NlpMetric Nodes

PropertyDescription
createdAtThe timestamp of when the node was created.
grnThe grn of the node.
lastSearchIndexedAtThe timestamp of when the node was last indexed by Search.
numberOfWordsThe total number of words in the original text.
totalTimeElapsedThe amount of time in seconds that it took to fully annotate the text.
updatedAtThe timestamp of when the node was last updated.
NLP Metric Node Queries

To query all NlpMetric nodes:

MATCH (n:NlpMetric) RETURN n

To query the NlpMetric node for a certain AnnotatedText:

MATCH (a:AnnotatedText)-[:HAS_METRIC]->(n:NlpMetric) RETURN n

Sentence Nodes

NLP works by extracting and processing a text sentence by sentence. This sentence data is stored on the graph as Sentence nodes.

In the diagram below, the AnnotatedText node has several outgoing relationships called CONTAINS_SENTENCE that connect to a Sentence node. There is a Sentence node for every sentence in the processed text. Some Sentence nodes have more than one incoming relationship from the AnnotatedText node, which we will go over as we continue.

Diagram

The AnnotatedText node in this diagram has two outgoing relationships, CONTAINS_SENTENCE and FIRST_SENTENCE, to a Sentence node.

Diagram

This Sentence node represents the first sentence of the processed text. Each sentence has a numeric sentenceNumber property starting with 0 and is assigned in ascending order until the end of the processed text.

Diagram

NLP uses this property to determine the NEXT_SENTENCE relationship.

Properties of sentence nodes and their relationships

Sentence node
PropertyDescription
createdAtThe timestamp of when the node was created.
grnThe grn of the node.
lastSearchIndexedAtThe timestamp of when the node was last indexed by Search.
sentenceA sentence of the processed text.
sentenceNumberThe number assigned to the sentence in order by which it appears in the processed text.
updatedAtThe timestamp of when the node was last updated.
CONTAINS_SENTENCE, NEXT SENTENCE, FIRST_SENTENCE relationship properties
PropertyDescription
createdAtThe timestamp of when the relationship was created.
grnThe grn of the relationship.
updatedAtThe timestamp of when the node was last updated.

Query Sentence Nodes

As mentioned under the general NLP data extraction querying section, to query all Sentence nodes based on the processed text, we have to first match the processed text (Article) node to its AnnotatedText node. From there, our query can look for the Sentence nodes.

If we run this query, we'll get all the Sentence nodes for a particular Article node:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)  RETURN s

This query matches the Article by its grn, then looks for an outgoing relationship HAS_ANNOTATED_TEXT connected to an AnnotatedText node that has an outgoing relationship of CONTAINS_SENTENCE that connects to a Sentence node and returns that sentence.

If we want to query only the first sentence of a particular article we would run this:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:FIRST_SENTENCE]->(s:Sentence) RETURN s

This query follows a similar sequence as the one above, but instead of looking for the CONTAINS_SENTENCE relationship it's looking for FIRST_SENTENCE. However, we're still matching our Article node to its AnnotatedText node first, then matching the AnnotatedText node to the Sentence node.

We can also run a query that looks for and returns a sentence with a specific sentenceNumber property in a particular article:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence) WHERE s.sentenceNumber = 3 RETURN s

This query matches the Sentence node's property instead of its relationship.

Let's say we wanted to get more specific. If we want to query only sentences in this article that include "Chewbacca" we'd run this:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence) WHERE s.sentence CONTAINS "Chewbacca"  RETURN s

Once again, the query matches our Article node by grn to our AnnotatedText node, and finds our Sentence node; but this time the query looks for the sentence property and whether or not it contains "Chewbacca".

Summarization

NLP's summarization task calculates the most representative sentences in the processed text. This data is stored in the graph as a relationship called SUMMARY_SENTENCE that points to those representative Sentence nodes. The SUMMARY_SENTENCE relationship is an outgoing relationship from the AnnotatedText node to a Sentence node. Each SUMMARY_SENTENCE relationship has an order property. The value of this property represents the order of the sentences for the predicted summary.

Diagram

Summarization is the last task run during NLP data extraction processing. When a text is fully processed the last node created is called the NlpProcessingSummary node.

Diagram

The AnnotatedText node has a relationship called HAS_PROCESSING_SUMMARY that connects to the NlpProcessingNode. The node exists to trigger Fuze policy changes that require NLP extraction data.

NlpProcessingSummary
PropertyDescription
createdAtThe timestamp of when the node was created.
grnThe grn of the node.
lastSearchIndexedAtThe timestamp of when the node was last indexed by Search.
updatedAtThe timestamp of when the node was last updated.

SUMMARY_SENTENCE relationship properties

PropertyDescription
createdAtThe timestamp of when the relationship was created.
grnThe grn of the relationship.
orderThe order by which the sentence occurs in the overall predicted summary.
updatedAtThe timestamp of when the node was last updated.

Queries for Summarization

To return all of the summary sentences for a particular article run this query:

MATCH (a:Article {grn: "grn:gg:article:Oee1ccdOP8TYb0CC75mJ4j58t7bBqzG3hPvQb2IDtavP"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:SUMMARY_SENTENCE]->(s:Sentence) RETURN s

Use this query to return the summary sentences in order:

MATCH (a:Article {grn: "grn:gg:article:Oee1ccdOP8TYb0CC75mJ4j58t7bBqzG3hPvQb2IDtavP"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[sum:SUMMARY_SENTENCE]->(s:Sentence) RETURN s ORDER by sum.order

Mention Nodes

A Mention node represents an entity (person, place, organization, or thing). Results from the NER, POS and coreference resolution models are used to create and match existing Mention nodes on the graph.

Since different sentences may refer to the same entity we use a single Mention node as a representation of that entity. This is captured through the relationship HAS_MENTION between a Mention and a Sentence. The graph also tracks each occurrence of a mention as distinct information. Any sentence that a mention occurs in creates a new MentionOccurrence node and attaches it to the sentence through SENTENCE_MENTION_OCCURRENCE. MentionOccurrence nodes represent individual usages of the Mention and store POS information about the usage. Lastly, a MentionOccurrence is always connected back to its representative Mention through the MENTION_OCCURRENCE_MENTION relationship.

Diagram

For example, this Sentence node has an outgoing relationship called HAS_MENTION that is connected to a Mention node. The Sentence node has another outgoing relationship called SENTENCE_MENTION_OCCURRENCE that is connected to a MentionOccurrence node. The MentionOccurrence node has an incoming relationship called MENTION_OCCURRENCE_MENTION that comes from the Mention node.

Diagram

In the diagram above we see that there are 3 Sentence nodes that are each connected to one MentionOccurrence node. The MentionOccurrence nodes have a POS property value of "NNP" and are each connected to the same Mention node that has a property value of "Chewbacca". From this we can interpret that NLP predicted that these 3 sentences mention "Chewbacca" as a singular, proper noun (NNP).

The Mention node also has an incoming relationship of MENTION_REPRESENTATIVE that is attached to a Person node with the property name equal to "Chewbacca". This node is the result of NER processing. NLP determined that the mention "Chewbacca" was a person, and therefore created a Person node that is representative of the Mention node.

The node attached to the other end of the MENTION_REPRESENTATIVE can be labeled as a PERSON, LOCATION, ORGANIZATION, or MISCELLANEOUS node. Some mentions may have a combination of, or all three labels depending on NLP's computation and the context that the mention is used in.

Properties of Mention nodes

Mention node
PropertyDescription
createdAtThe timestamp of when the node was created.
grnThe grn of the node.
langaugeThe langauge of the text.
lastSearchIndexedAtThe timestamp of when the node was last indexed by Search.
mentionIdThe unique id of the mention, the value and the language property separated by an underscore ("_").
multiplicityThe multiplicity of the mention.
neThe named entity or entities of the mention (PERSON, LOCATION, ORGANIZATION, MISCELLANEOUS ).
posThe part of speech (POS) of the mention.
updatedAtThe timestamp of when the node was last updated.
valueThe value of the mention.
Mention Occurrence node
PropertyDescription
createdAtThe timestamp of when the node was created.
grnThe grn of the node.
langaugeThe langauge of the text.
lastSearchIndexedAtThe timestamp of when the node was last indexed by Search.
posThe part of speech (POS) of the mention.
updatedAtThe timestamp of when the node was last updated.
valueThe value of the mention.
HAS_MENTION, SENTENCE_MENTION_OCCURRENCE, MENTION_OCCURRENCE_MENTION, MENTION_OCCURRENCE_REPRESENTATIVE relationship properties
PropertyDescription
createdAtThe timestamp of when the relationship was created.
grnThe grn of the relationship.
orderThe order by which the sentence occurs in the overall predicted summary.
updatedAtThe timestamp of when the node was last updated.

Querying Mention nodes

If we wanted to query all Mention nodes for a particular Article we would, once again, follow the sequence to get from Article to Sentence, only this time we're traversing one more relationship, HAS_MENTION, in order to get to the Mention node.

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_MENTION]->(m:Mention)  RETURN m

This query matches the Article node by its grn, then finds it's AnnotatedText node in order to find its Sentence nodes. Finally, the query looks at the Sentence node's outgoing relationship,HAS_MENTION, to find and return all Mention nodes.

If we wanted to get more specific, we can return only the Sentence nodes which have a connection to Mention nodes that have a value of "Chewbacca" by running this query:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_MENTION]->(m:Mention) WHERE toLower(m.value) CONTAINS "chewbacca" RETURN s

To query all the people in a particular article:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_MENTION]->(m:Mention)<-[:MENTION_REPRESENTATIVE]-(p:Person) RETURN p

This query will return all nodes that have a label of Person. Some of these nodes may be tagged with several entities. For example, the Person node may have a combination of, if not all the NER tags PERSON, LOCATION, ORGANIZATION, and/or MISCELLANEOUS. If you want to query nodes that are purely tagged PERSON nodes run this:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_MENTION]->(m:Mention)<-[:MENTION_REPRESENTATIVE]-(n:Person) WHERE NOT n:Location AND NOT n:Organization RETURN n

Similarity Scoring

Similarities are scored based on Term Frequency-Inverse Document Frequency (TF-IDF) and Similarity Clustering methods. Each Mention for a text document has its TF-IDF value calculated, and then highest valued Mentions used as a base to calculate how similar two texts are. Note that TF-IDF scores are calculated in the context of a Corpus, meaning the existing documents affect the TF-IDF values on mentions of a new document.

Each Mention node has an incoming relationship called HAS_TF. This relationship is where the TF value part of TF-IDF is stored. These scores are used to calculate how similar two articles, or texts, are. If an article is similar to another, its AnnotatedText node will have a relationship to its similar article's AnnotatedText node called HAS_SIMILAR.

Since IDF values change each time a new document is processed, IDF values are computed on the fly rather than stored in the graph.

Diagram

In the diagram above the two AnnotatedText nodes share two Mention nodes and a Keyphrase node through a relationship called HAS_TF. The Mention and Keyphrase nodes' TF-IDF scores are used to calculate if two articles are similar.

Similar texts are organized within the graph itself. Texts that are predicted to be similar, and belong to the same corpus, become part of a Similarity Cluster represented in the graph as a SimilarityCluster node. TheAnnotatedText nodes belong to the SimilarityCluster node, as representened in the graph by their outgoing BELONGS_TO_SIMILARITY_CLUSTER relationship.

Diagram

Properties for Similarity

SimilarityCluster node
PropertyDescription
createdAtThe timestamp of when the relationship was created.
depthThe depth of the similarity cluster.
grnThe grn of the relationship.
lastSearchIndexedAtThe timestamp of when the node was last indexed by Search.
updatedAtThe timestamp of when the node was last updated.
HAS_TF
PropertyDescription
createdAtThe timestamp of when the relationship was created.
grnThe grn of the relationship.
tfThe term-frequency (TF) score of the mention or keyphrase.
updatedAtThe timestamp of when the node was last updated.
HAS_SIMILAR
PropertyDescription
corpusGrnThe grn of the Corpus that the AnnotatedText belongs to.
createdAtThe timestamp of when the relationship was created.
grnThe grn of the relationship.
similarityThe similarity score of the two similar texts.
updatedAtThe timestamp of when the node was last updated.
BELONGS_TO_SIMILARITY_CLUSTER
PropertyDescription
createdAtThe timestamp of when the relationship was created.
grnThe grn of the relationship.
updatedAtThe timestamp of when the node was last updated.

Queries for Similarity Scoring

Use this query to return the all the similar AnnotatedText nodes for a particular article:

MATCH (a:Article {grn: "grn:gg:article:ODSSvMKZzl4Nq2JqPwJtUBWW6izuxu43SMS5aQUY2Dce"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:HAS_SIMILAR]->(p:AnnotatedText) RETURN p

Remember, if we want to access information related to NLP data extraction, we always need to match the Article to its AnnotatedText node first. Our query finds the article's AnnotatedText node and then traverses the HAS_SIMILAR relationships to get the similar AnnotatedText nodes and returns them.

Use this query to return the all the similar Article nodes for a particular article:

MATCH (a:Article {grn: "grn:gg:article:ODSSvMKZzl4Nq2JqPwJtUBWW6izuxu43SMS5aQUY2Dce"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:HAS_SIMILAR]->(p:AnnotatedText)<-[HAS_ANNOTATED_TEXT]-(x:Article) RETURN x

Just like the query above but with one added traversal, this query finds the similar AnnotatedText nodes, then looks at their HAS_ANNOTATED_TEXT relationships to find their Article nodes.

To query all SimilarityCluster nodes run this:

MATCH (a:Article)-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:BELONGS_TO_SIMILARITY_CLUSTER]->(sc:SimilarityCluster) RETURN a

Keyphrase

Keyphrase extaction data is stored as Keyphrase nodes on the graph. Keyphrase nodes are directly related to Sentence nodes through a relationship called HAS_KEYPHRASE.

Diagram

The Sentence node is connected to a Keyphrase node by a relationship HAS_KEYPHRASE. A Keyphrase node can also be directly connected to an AnntatedText node by a HAS_TF relationship.

Properties for Keyphrase

Keyphrase node
PropertyDescription
createdAtThe timestamp of when the relationship was created.
grnThe grn of the relationship.
keyphraseIdThe unique id of the keyphrase.
lastSearchIndexedAtThe timestamp of when the node was last indexed by Search.
updatedAtThe timestamp of when the node was last updated.
valueThe value of the keyphrase.
HAS_KEYPHRASE relationship properties
PropertyDescription
createdAtThe timestamp of when the relationship was created.
grnThe grn of the relationship.
updatedAtThe timestamp of when the node was last updated.

Queries for Keyphrase

To return the Keyphrase nodes that contain a particular phrase:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_KEYPHRASE]->(k:Keyphrase) RETURN k

To return the Keyphrase nodes that have a value that contain the a particular value:

MATCH (a:Article)-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_KEYPHRASE]->(k:Keyphrase) WHERE toLower(k.value) contains "the force" RETURN k

Sentiment

The NLP sentiment task data is stored as two types of nodes, SentenceSentiment and AnnotatedTextSentiment.

The SentenceSentiment is the sentiment of a particular sentence, while the AnnotatedTextSentiment is the average of the SentenceSentiment scores.

Sm_Diagram

The Sentence node is connected to the SentenceSentiment node by a relationship called HAS_SENTIMENT. The AnnotatedText node is connected to the AnnotatedTextSentiment node by a relationship also called HAS_SENTIMENT.

Properties for Sentiment

SentenceSentiment node
PropertyDescription
createdAtThe timestamp of when the relationship was created.
grnThe grn of the relationship.
lastSearchIndexedAtThe timestamp of when the node was last indexed by Search.
sentimentThe sentiment score of the Sentence node
sentimentTypeThe sentiment type. This can be either binary (positive/negative) or categorical.
updatedAtThe timestamp of when the node was last updated.
AnnotatedTextSentiment node
PropertyDescription
createdAtThe timestamp of when the relationship was created.
grnThe grn of the relationship.
lastSearchIndexedAtThe timestamp of when the node was last indexed by Search.
sentimentThe sentiment score of the AnnotatedText node.
updatedAtThe timestamp of when the node was last updated.
HAS_SENTIMENT relationship properties
PropertyDescription
createdAtThe timestamp of when the relationship was created.
grnThe grn of the relationship.
updatedAtThe timestamp of when the node was last updated.

Queries for Sentiment

This queries all SentenceSentiment nodes for a particular article:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_SENTIMENT]->(sent:SentenceSentiment) RETURN sent

This query returns sentences that contain "Doctor Strange" in them that have a negative sentiment:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_SENTIMENT]-(sent:SentenceSentiment) WHERE s.sentence CONTAINS "Doctor Strange" AND sent.sentimentType = "negative" RETURN s

To query the AnnotatedTextSentiment node run this:

MATCH (a:Article {grn: "grn:gg:article:JXqxrUXmNJq6uPw1tbUWH5Kw9iUfAciP952FPd1msrky"})-[:HAS_ANNOTATED_TEXT]->(at:AnnotatedText)-[:HAS_SENTIMENT]->(sent:AnnotatedTextSentiment) RETURN sent