adjustment

John 2022-10-02 11:17:59 +02:00
parent adf28391f0
commit 55f6d54f76
1 changed files with 71 additions and 34 deletions


@ -5,16 +5,19 @@ created: 2021-05-04 14:58:11Z
---
## show database schema info
CALL db.schema.visualization()
CALL db.schema.relTypeProperties()
CALL db.schema.nodeTypeProperties()
CALL db.propertyKeys()
## syntax
MATCH (variable:Label {propertyKey: propertyValue, propertyKey2: propertyValue2})
RETURN variable
## relationships
() // a node
()--() // 2 nodes have some type of relationship
()-[]-() // 2 nodes have some type of relationship
@ -27,31 +30,35 @@ RETURN node1, node2
MATCH (node1)-[:REL_TYPEA | REL_TYPEB]->(node2)
RETURN node1, node2
## show node with name "Tom Hanks"
MATCH (tom {name: "Tom Hanks"}) RETURN tom
## return all nodes in database
MATCH (a:Person) WHERE a.name = "Tom" RETURN a
MATCH (a:Person) RETURN a.name
## with where clause
match (a:Movie)
where a.released >= 1990 and a.released < 1999
return a.title;
## a list of all properties that match a string
MATCH (n) WITH keys(n) AS p UNWIND p AS x WITH DISTINCT x WHERE x =~ ".*" RETURN collect(x) AS SET;
## delete all nodes and relations
MATCH (n)
DETACH DELETE n
## create
```cypher
create (:Person {name: 'jan', age: 32})
```
match(n:Person {age: 32}) return n
@ -78,7 +85,6 @@ WHERE directors >= 2
OPTIONAL MATCH (p:Person)-[:REVIEWED]->(m)
RETURN m.title, p.name
match (a:Person), (m:Movie), (b:Person)
where a.name = 'Liam Neeson'
and b.name = 'Benjamin Melniker'
@ -120,25 +126,33 @@ return p
match (p:Person {name:'Robert Zemeckis'}), (m:Movie {title:'Forrest Gump'})
merge (p)-[r:DIRECTED]->(m)
return p,r,m
```
## constrain uniqueness
```
CREATE CONSTRAINT UniqueMovieTitleConstraint
ON (m:Movie)
ASSERT m.title IS UNIQUE
```
## constrain uniqueness over two properties
## only enterprise edition
CREATE CONSTRAINT UniqueNameBornConstraint
ON (p:Person)
ASSERT (p.name, p.born) IS NODE KEY
## needs enterprise edition of neo4j
create constraint PersonBornExistsConstraint on (p:Person)
assert exists(p.born)
## existence constraint (possible for node properties)
CREATE CONSTRAINT ExistsMovieTagline
ON (m:Movie)
ASSERT exists(m.tagline)
@ -146,79 +160,100 @@ CREATE CONSTRAINT ExistsMovieTagline
DROP CONSTRAINT MovieTitleConstraint
## existence constraint for relationship
## only enterprise edition of neo4j
CREATE CONSTRAINT ExistsREVIEWEDRating
ON ()-[rel:REVIEWED]-()
ASSERT exists(rel.rating)
## drop constraint
DROP CONSTRAINT ExistsREVIEWEDRating
CALL db.constraints() // older procedure; SHOW CONSTRAINTS is the better choice
## Indexes
## Single property index
CREATE INDEX MovieReleased FOR (m:Movie) ON (m.released)
## composite index
CREATE INDEX MovieReleasedVideoFormat
FOR (m:Movie)
ON (m.released, m.videoFormat)
## full-text schema index
CALL db.index.fulltext.createNodeIndex(
'MovieTitlePersonName',['Movie', 'Person'], ['title', 'name'])
### To use a full-text schema index, you must call the query procedure that uses the index
CALL db.index.fulltext.queryNodes(
'MovieTitlePersonName', 'Jerry')
YIELD node, score
RETURN node.title, score
### Searching on a particular property
CALL db.index.fulltext.queryNodes(
'MovieTitlePersonName', 'name: Jerry') YIELD node
RETURN node
## drop index
DROP INDEX MovieReleasedVideoFormat
## dropping full-text schema index
CALL db.index.fulltext.drop('MovieTitlePersonName')
## search a full-text schema index
CALL db.index.fulltext.queryNodes('MovieTaglineFTIndex', 'real OR world')
YIELD node
RETURN node.title, node.tagline
## set parameters
:param year => 2000
:params {actorName: 'Tom Cruise', movieName: 'Top Gun'}
## for statement
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = $actorName AND m.title = $movieName
RETURN p, m
## clear
:params {}
## view
:params
## Analyzing queries
- EXPLAIN provides estimates of the graph engine processing that will occur, but does not execute the Cypher statement.
- PROFILE provides real profiling information for what has occurred in the graph engine during the query and executes the Cypher statement. (run-time performance metrics)
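As a brief sketch of how the two keywords are used, either one is placed in front of an ordinary query. The query below assumes the Movies example graph (`Person`, `Movie`, `ACTED_IN`) that the rest of these notes use; the name `'Tom Hanks'` is only an illustrative value.

```cypher
// Show the estimated plan only; the query itself is not executed
EXPLAIN
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN m.title;

// Execute the query and report actual rows and db hits per operator
PROFILE
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN m.title;
```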
## Monitoring queries
:queries
## exercise
:params {year:2006, ratingValue:65}
match (p:Person)-[r:REVIEWED]->(m:Movie)<-[:ACTED_IN]-(a:Person)
where m.released = $year and r.rating = $ratingValue
return p.name, m.title, m.released, r.rating, collect(a.name)
:auto USING PERIODIC COMMIT LOAD CSV
Commits every 1000 rows by default.
Periodic commit has no effect when the query plan contains an Eager operator, e.g.:
@ -241,15 +276,14 @@ MERGE (actor:Person {name: line.name})
ON CREATE SET actor.born = toInteger(trim(line.birthYear)), actor.actorId = line.id
ON MATCH SET actor.actorId = line.id
## before load
CREATE CONSTRAINT UniqueMovieIdConstraint ON (m:Movie) ASSERT m.id IS UNIQUE;
## after load
CREATE INDEX MovieTitleIndex FOR (m:Movie) ON (m.title);
// Delete all constraints and indexes
CALL apoc.schema.assert({},{},true);
// Delete all nodes and relationships
@ -259,19 +293,19 @@ CALL apoc.periodic.iterate(
{ batchSize:500 }
)
## test apoc
CALL dbms.procedures()
YIELD name WHERE name STARTS WITH "apoc"
RETURN name
## Graph modelling
How does Neo4j support graph data modeling?
- allows you to create property graphs.
- traversing the graph: traversal means anchoring a query based upon a property value, then traversing the graph to satisfy the query
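The anchor-then-traverse idea can be sketched as follows. This is a minimal illustration that assumes the Movies example graph used elsewhere in these notes; the `DIRECTED` and `ACTED_IN` relationship types come from that dataset, and the anchor value is just an example.

```cypher
// Anchor: locate the starting node by a property value
MATCH (p:Person {name: 'Tom Hanks'})
// Traverse: follow relationships outward from the anchor
MATCH (p)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
RETURN m.title, d.name;
```

An index on the anchor property (here `Person.name`) would keep the anchor lookup cheap; the traversal itself only follows relationships from the nodes already found.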
Nodes and relationships are the key components of a graph.
Nodes must have labels to categorize entities.
A label is used to categorize a set of nodes.
@ -280,22 +314,26 @@ A relationship is only traversed once during a query.
Nodes and relationships can have properties.
Properties are used to provide specific values to a node or relationship.
## Your model must address Nodes
- Uniqueness of nodes: always have a property (or set of properties) that uniquely identify a node.
- Complex data: balance between number of properties that represent complex data vs. multiple nodes and relationships.
- Super nodes: nodes with lots of fan-in or fan-out.
- Reduce property duplication (no repeating property values)
- Reduce gather-and-inspect (traversal)
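One way to act on the uniqueness point is to always create nodes through `MERGE` on the identifying property, so that rerunning a statement does not produce duplicates. The sketch below assumes `title` is the identifying property for `Movie` nodes, as in the uniqueness constraint earlier in these notes.

```cypher
// MERGE matches an existing Movie with this title, or creates it if absent
MERGE (m:Movie {title: 'Forrest Gump'})
// Properties that are not part of the identity are set only on creation
ON CREATE SET m.released = 1994
RETURN m;
```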
## Best practices for modeling relationships
- Using specific relationship types.
- Reducing symmetric relationships.
  - No semantically identical relationships (PARENT_OF and CHILD_OF)
  - Not all mutual relationships are semantically symmetric (FOLLOWS)
- Using types vs. properties.
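The difference between a specific relationship type and a generic type with a property might look like this. The `CONNECTED_TO` type with a `role` property on `Person`-to-`Movie` relationships is a hypothetical model invented for the comparison; `DIRECTED` is the type used in the Movies example graph.

```cypher
// Generic type (hypothetical model): every relationship's property must be read
MATCH (p:Person)-[r:CONNECTED_TO]->(m:Movie)
WHERE r.role = 'director'
RETURN p.name, m.title;

// Specific type: the relationship type alone narrows the traversal
MATCH (p:Person)-[:DIRECTED]->(m:Movie)
RETURN p.name, m.title;
```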
## Property best practices
In the case of property value complexity, it depends on how the property is used. Anchors and traversal paths that use property values need to be parsed at query time.
- Property lookups have a cost.
@ -304,6 +342,7 @@ In the case of property value complexity, it depends on how the property is used
- Identifiers, outputs, and decoration are OK as complex values.
## Hierarchy of accessibility
1. Anchor node label, indexed anchor node properties (cheap)
2. Relationship types (cheap)
3. Non-indexed anchor node properties
@ -312,26 +351,25 @@ In the case of property value complexity, it depends on how the property is used
Downstream labels and properties are most expensive.
## Common graph structures used in modeling
1. Intermediate nodes
   - solve hyperedges (n-ary relationships)
   - sharing context (share contextual information)
   - sharing data (deduplicate information)
   - organizing data (avoid density of nodes)
2. Linked lists (useful whenever the sequence of objects matters)
   - Interleaved linked list
   - Head and tail of linked list (root points to head and tail)
   - No doubly linked lists (redundant symmetrical relationships)
3. Timeline trees
   - use time as either an anchor or a navigational aid
   - topmost node in the timeline is an “all time” node
   - timeline trees consume a lot of space
4. Multiple structures in a single graph
CREATE (:Airport {code: "ABQ"})<-[:CONNECTED_TO {airline: "WN", flightNumber: 500, date: "2019-1-3", departure: 1445, arrival: 1710}]-(:Airport {code: "LAS"})-[:CONNECTED_TO {airline: "WN", flightNumber: 82, date: "2019-1-3", departure: 1715, arrival: 1820}]->(:Airport {code: "LAX"})
LOAD CSV WITH HEADERS FROM 'file:///flights_2019_1k.csv' AS row
MERGE (origin:Airport {code: row.Origin})
MERGE (destination:Airport {code: row.Dest})
@ -340,4 +378,3 @@ MERGE (origin)-[connection:CONNECTED_TO {
flightNumber: row.FlightNum,
date: toInteger(row.Year) + '-' + toInteger(row.Month) + '-' + toInteger(row.DayofMonth)}]->(destination)
ON CREATE SET connection.departure = toInteger(row.CRSDepTime), connection.arrival = toInteger(row.CRSArrTime)