title | updated | created |
---|---|---|
Intermediate Cypher Queries | 2022-08-08 19:42:32Z | 2022-08-01 13:15:27Z |
Filtering Queries
CALL db.schema.visualization()
CALL db.schema.nodeTypeProperties()
CALL db.schema.relTypeProperties()
SHOW CONSTRAINTS
:HISTORY
:USE database
Checking multiple labels (nodes that are both an Actor and a Director)
match (p)
where p:Actor:Director
and p.born.year >= 1950 and p.born.year <= 1959
return count(p)
MATCH (p:Director)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(p)
WHERE "German" IN m.languages
return p.name, labels(p), m.title
match (n)-[a]->(m:Movie)
where (n:Actor or n:Director)
and toUpper(a.role) contains 'DOG'
return n.name, m.title, a.role
Difference between EXPLAIN and PROFILE
- EXPLAIN shows the execution plan with the estimated number of rows for each step, without running the query.
- PROFILE runs the query and shows the steps actually executed, with the number of rows and database hits for each step.
Provided you are only querying the graph and not updating anything, it is fine to run a query multiple times with PROFILE. In fact, as part of query tuning, you should run the query at least twice: the first execution includes generating the execution plan, which is then cached. That is, the first PROFILE of a query will always be more expensive than subsequent ones.
Using exists to exclude patterns in the graph
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
AND NOT exists {(p)-[:DIRECTED]->(m)}
RETURN m.title
If you profile this query, you will find that it is not very efficient, but it is the only way to express this query.
Multiple MATCH Clauses
MATCH (a:Person)-[:ACTED_IN]->(m:Movie),
(m)<-[:DIRECTED]-(d:Person)
WHERE m.year > 2000
RETURN a.name, m.title, d.name
In general, a single MATCH clause performs better than multiple MATCH clauses, because relationship uniqueness is enforced within a single MATCH, so fewer relationships are traversed.
The same query, written as a single path pattern:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
WHERE m.year > 2000
RETURN a.name, m.title, d.name
Optionally matching rows
MATCH (m:Movie) WHERE m.title = "Kiss Me Deadly"
MATCH (m)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(rec:Movie)
OPTIONAL MATCH (m)<-[:ACTED_IN]-(a:Actor)-[:ACTED_IN]->(rec)
RETURN rec.title, a.name
In this query, the pattern in which an actor acted in both movies is optional: for a recommended movie with no such actor, a.name is returned as null. In general, and depending on your graph, an optional match returns more rows than the equivalent MATCH, because rows without a match are kept instead of dropped.
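OPTIONAL MATCH behaves much like a left outer join: every incoming row is kept, and variables from the optional pattern are set to null when nothing matches. As an illustration only, the plain-Python sketch below models that behaviour on hypothetical rows (the titles and names are placeholders, and `optional_match` is not a Neo4j API):

```python
def optional_match(rows, matches_for):
    """Model OPTIONAL MATCH: keep every input row; if the optional
    pattern has no match for a row, emit that row once with None (null)."""
    out = []
    for row in rows:
        found = matches_for(row)
        if found:
            out.extend((row, match) for match in found)
        else:
            out.append((row, None))
    return out

# Hypothetical recommended movies and the actors they share with the source movie
recommended = ["Rec A", "Rec B"]
shared_actors = {"Rec A": ["Actor 1", "Actor 2"]}

rows = optional_match(recommended, lambda title: shared_actors.get(title, []))
for title, actor in rows:
    print(title, actor)
```

With a plain MATCH, "Rec B" would disappear from the result; with OPTIONAL MATCH it survives with a null actor.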
Controlling Results Returned
Ordering Returned Results
MATCH (p:Person)
WHERE p.born.year = 1980
RETURN p.name AS name, p.born AS birthDate
ORDER BY birthDate DESC, name ASC
Limiting results; Skipping some results
MATCH (p:Person)
WHERE p.born.year = 1980
RETURN p.name as name,
p.born AS birthDate
ORDER BY p.born SKIP 40 LIMIT 10
In this query, we return 10 rows representing page 5, where each page contains 10 rows.
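The SKIP value for a page follows from skip = (page - 1) * pageSize, so page 5 with 10 rows per page gives SKIP 40 LIMIT 10. A small hypothetical Python helper (not part of any Neo4j API) that computes the two values and applies the same window to an already ordered list:

```python
def skip_and_limit(page, page_size):
    """Return (skip, limit) for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return (page - 1) * page_size, page_size

skip, limit = skip_and_limit(5, 10)
print(f"SKIP {skip} LIMIT {limit}")  # SKIP 40 LIMIT 10

# The same window over 100 already-ordered rows numbered 1..100
rows = list(range(1, 101))
page_rows = rows[skip:skip + limit]
print(page_rows)  # rows 41 to 50
```

Paging is only predictable when the ORDER BY is deterministic. Since several people can share a birth date, adding a tiebreaker such as p.name to the ORDER BY may help keep page boundaries stable between runs.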
MATCH (p:Person)-[:ACTED_IN|DIRECTED]->(m)
WHERE m.title = 'Toy Story'
MATCH (p)-[:ACTED_IN]->()<-[:ACTED_IN]-(p2:Person)
RETURN p.name, p2.name
Finds the people who acted in or directed Toy Story, then returns each of them paired with every person who acted in a movie with them (in any movie they acted in, not only Toy Story).
Map projections
MATCH (p:Person)
WHERE p.name CONTAINS "Thomas"
RETURN p { .* } AS person
ORDER BY p.name ASC
MATCH (p:Person)
WHERE p.name CONTAINS "Thomas"
RETURN p { .name, .born } AS person
ORDER BY p.name
MATCH (m:Movie)<-[:DIRECTED]-(d:Director)
WHERE d.name = 'Woody Allen'
RETURN m {.*, favorite: true} AS movie
Adds a favorite property with the value true to each movie map returned. Only the returned map changes; the node in the graph is not modified.
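A map projection builds a new map from a node's properties: `.*` copies all of them, `.name` copies a single key (null if the property is missing), and `key: value` adds a literal entry. The Python function below is a hypothetical model of those three forms over a property dictionary, not Neo4j code:

```python
def project(props, keys=None, **extra):
    """Model a Cypher map projection over a dict of node properties.
    keys=None mimics {.*}; otherwise only the listed keys are copied,
    with None (null) for missing ones. extra mimics literal entries."""
    if keys is None:
        result = dict(props)
    else:
        result = {k: props.get(k) for k in keys}
    result.update(extra)
    return result

node = {"title": "Movie A", "year": 1990}  # hypothetical node properties

print(project(node, favorite=True))           # like m {.*, favorite: true}
print(project(node, keys=["title", "plot"]))  # like m {.title, .plot}
```

The input dictionary is copied rather than mutated, matching the fact that a projection never changes the node.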
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
WHERE p.name = 'Henry Fonda'
RETURN m.title AS movie,
CASE
WHEN m.year < 1940 THEN 'oldies'
WHEN 1940 <= m.year < 1950 THEN 'forties'
WHEN 1950 <= m.year < 1960 THEN 'fifties'
WHEN 1960 <= m.year < 1970 THEN 'sixties'
WHEN 1970 <= m.year < 1980 THEN 'seventies'
WHEN 1980 <= m.year < 1990 THEN 'eighties'
WHEN 1990 <= m.year < 2000 THEN 'nineties'
ELSE 'two-thousands'
END
AS timeFrame
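A CASE expression checks its WHEN branches in order and returns the result of the first one that is true, falling back to ELSE. Python also supports chained comparisons such as `1940 <= year < 1950`, so the expression can be mirrored almost line by line. `time_frame` is a hypothetical name used only for this illustration:

```python
def time_frame(year):
    """Mirror of the CASE expression: the first true branch wins."""
    if year < 1940:
        return "oldies"
    elif 1940 <= year < 1950:
        return "forties"
    elif 1950 <= year < 1960:
        return "fifties"
    elif 1960 <= year < 1970:
        return "sixties"
    elif 1970 <= year < 1980:
        return "seventies"
    elif 1980 <= year < 1990:
        return "eighties"
    elif 1990 <= year < 2000:
        return "nineties"
    else:
        return "two-thousands"

print([time_frame(y) for y in (1939, 1940, 1999, 2005)])
# ['oldies', 'forties', 'nineties', 'two-thousands']
```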
Aggregating Data
If an aggregation function such as count() is used, all non-aggregated result columns become grouping keys.
If you specify count(n), the graph engine counts the non-null occurrences of n. If you specify count(*), it counts the number of rows retrieved, including rows with null values.
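Both rules can be seen in a small plain-Python model with hypothetical (genre, director) rows, where one row has a null director, as an OPTIONAL MATCH might produce. The genre column is not aggregated, so it becomes the grouping key; `count_n` plays the role of count(director) and `count_star` the role of count(*):

```python
from collections import defaultdict

# Hypothetical rows: (genre, director); None models a null value
rows = [
    ("Drama", "Dir 1"),
    ("Drama", None),
    ("Comedy", "Dir 2"),
    ("Comedy", "Dir 2"),
]

def aggregate(rows):
    """Model RETURN genre, count(director), count(*)."""
    groups = defaultdict(lambda: {"count_n": 0, "count_star": 0})
    for genre, director in rows:       # genre is the grouping key
        group = groups[genre]
        group["count_star"] += 1       # count(*) counts every row
        if director is not None:
            group["count_n"] += 1      # count(director) skips nulls
    return dict(groups)

print(aggregate(rows))
```

count(director) still counts "Dir 2" twice for Comedy; count(DISTINCT director) would count it once.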
Returning a list
MATCH (p:Person)
RETURN p.name, [p.born, p.died] AS lifeTime
LIMIT 10
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.year = 1920
RETURN collect( DISTINCT m.title) AS movies,
collect( a.name) AS actors
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
RETURN m.title AS movie,
collect(a.name)[2..] AS castMember,
size(collect(a.name)) as castSize
collect(a.name)[2..] returns a slice of the collected list: the names from index 2 (the third element) to the end.
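Cypher list indexes start at 0, and a range `[start..end]` includes `start` but excludes `end`, the same convention as a Python slice. A quick comparison on a hypothetical collected list of names:

```python
cast = ["Name 1", "Name 2", "Name 3", "Name 4"]  # hypothetical collect(a.name)

print(cast[2:])   # like collect(a.name)[2..]   -> ['Name 3', 'Name 4']
print(cast[0:2])  # like collect(a.name)[0..2]  -> ['Name 1', 'Name 2']
print(len(cast))  # like size(collect(a.name))  -> 4
```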
List comprehension
MATCH (m:Movie)
RETURN m.title as movie,
[x IN m.countries WHERE x = 'USA' OR x = 'Germany']
AS country LIMIT 500
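Cypher's list comprehension `[x IN list WHERE predicate | expression]` has a direct Python counterpart, `[expression for x in list if predicate]`. Using a hypothetical countries list:

```python
countries = ["USA", "France", "Germany", "Japan"]  # hypothetical m.countries

# [x IN m.countries WHERE x = 'USA' OR x = 'Germany']
selected = [x for x in countries if x == "USA" or x == "Germany"]

# With a mapping expression as well: [x IN m.countries WHERE x <> 'USA' | toUpper(x)]
upper_non_usa = [x.upper() for x in countries if x != "USA"]

print(selected)       # ['USA', 'Germany']
print(upper_non_usa)  # ['FRANCE', 'GERMANY', 'JAPAN']
```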
Pattern comprehension
MATCH (m:Movie)
WHERE m.year = 2015
RETURN m.title,
[(dir:Person)-[:DIRECTED]->(m) | dir.name] AS directors,
[(actor:Person)-[:ACTED_IN]->(m) | actor.name] AS actors
A pattern comprehension is written in square brackets: the pattern, an optional WHERE filter, then the pipe character, then the expression whose value is placed in the list for each match of the pattern.
[<pattern> | value]
MATCH (a:Person {name: 'Tom Hanks'})
RETURN [(a)-->(b:Movie) WHERE b.title CONTAINS "Toy" | b.title + ": " + b.year] AS movies
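A pattern comprehension evaluates a pattern from the current node and turns each match into one list element. The sketch below models `[(dir:Person)-[:DIRECTED]->(m) | dir.name]`, with an optional WHERE, over a hypothetical list of (start, type, end) relationship triples; `pattern_list` is an illustrative name, not a Neo4j function:

```python
# Hypothetical relationships as (start node, relationship type, end node)
rels = [
    ("Dir 1", "DIRECTED", "Movie X"),
    ("Actor 1", "ACTED_IN", "Movie X"),
    ("Actor 2", "ACTED_IN", "Movie X"),
    ("Actor 3", "ACTED_IN", "Movie Y"),
]

def pattern_list(rels, rel_type, end, where=lambda start: True,
                 value=lambda start: start):
    """Model [(s)-[:rel_type]->(end) WHERE where(s) | value(s)]:
    one list element per matching relationship."""
    return [value(s) for s, t, e in rels
            if t == rel_type and e == end and where(s)]

directors = pattern_list(rels, "DIRECTED", "Movie X")
actors = pattern_list(rels, "ACTED_IN", "Movie X")
tagged = pattern_list(rels, "ACTED_IN", "Movie X",
                      where=lambda s: s.endswith("2"),
                      value=lambda s: s + ": actor")
print(directors, actors, tagged)
```

A movie with no matching relationship simply gets an empty list, so unlike a MATCH, a pattern comprehension never removes the row.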
Working with maps
A Cypher map is a collection of key/value pairs, each written as key: value.
RETURN {Jan: 31, Feb: 28, Mar: 31, Apr: 30 , May: 31, Jun: 30 , Jul: 31, Aug: 31, Sep: 30, Oct: 31, Nov: 30, Dec: 31}['Feb'] AS daysInFeb
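The same value can also be read with dot notation:
RETURN {Jan: 31, Feb: 28, Mar: 31, Apr: 30, May: 31, Jun: 30, Jul: 31, Aug: 31, Sep: 30, Oct: 31, Nov: 30, Dec: 31}.Feb AS daysInFeb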
Map projections
MATCH (m:Movie)
WHERE m.title CONTAINS 'Matrix'
RETURN m { .title, .released } AS movie
Working with Dates and Times
RETURN date(), datetime(), time()
CALL apoc.meta.nodeTypeProperties()
List node properties
MATCH (x:Test {id: 1})
RETURN x.date.day, x.date.year,
x.datetime.year, x.datetime.hour,
x.datetime.minute
Extract date components
MATCH (x:Test {id: 1})
SET x.datetime1 = datetime('2022-01-04T10:05:20'),
x.datetime2 = datetime('2022-04-09T18:33:05')
RETURN x
Set datetime properties from ISO 8601 date-time strings.
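MATCH (x:Test {id: 1})
RETURN duration.between(x.date1, x.date2)
MATCH (x:Test {id: 1})
RETURN duration.inDays(x.datetime1, x.datetime2).days
MATCH (x:Test {id: 1})
RETURN x.date1 + duration({months: 6})
Durations between two temporal values, and adding a duration to a date. Each MATCH/RETURN pair is a separate query.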
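For comparison, Python's `datetime` module computes the same kind of difference for the two timestamps set earlier. A `timedelta` tracks only days, seconds and microseconds, with no month component, so it corresponds more closely to `duration.inDays` than to `duration.between`, which can also report months:

```python
from datetime import datetime

# The same ISO 8601 values used for x.datetime1 and x.datetime2 above
datetime1 = datetime.fromisoformat("2022-01-04T10:05:20")
datetime2 = datetime.fromisoformat("2022-04-09T18:33:05")

delta = datetime2 - datetime1
print(delta)       # 95 days, 8:27:45
print(delta.days)  # 95 whole days
```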
APOC to format dates and times
MATCH (x:Test {id: 1})
RETURN x.datetime as Datetime,
apoc.temporal.format( x.datetime, 'HH:mm:ss.SSSS')
AS formattedDateTime
Graph Traversal
Anchor of a query
The execution plan determines the set of nodes that are the starting points for the query: the anchor. The anchor is mostly based on the MATCH clause, and is typically determined by metadata stored in the graph or by a filter provided inline or in a WHERE clause. The anchor is chosen so that the fewest nodes need to be retrieved into memory.
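As a deliberately simplified illustration (Neo4j's planner is cost-based and weighs far more than a single number), anchor selection can be pictured as choosing the candidate starting set with the smallest estimated size. The candidate descriptions and counts below are hypothetical:

```python
# Hypothetical estimated sizes of possible starting points for
# MATCH (p:Person)-[:ACTED_IN]->(m:Movie) WHERE p.name = 'Tom Hanks'
candidates = {
    "all Person nodes": 10000,
    "all Movie nodes": 9000,
    "Person nodes with name 'Tom Hanks'": 1,
}

def choose_anchor(candidates):
    """Pick the starting point that needs the fewest nodes loaded."""
    return min(candidates, key=candidates.get)

print(choose_anchor(candidates))  # Person nodes with name 'Tom Hanks'
```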
Variable-Length Traversal
MATCH p = shortestPath((p1:Person)-[*]-(p2:Person))
WHERE p1.name = "Eminem"
AND p2.name = "Charlton Heston"
RETURN p
Shortest path between the two people, over relationships of any type and in either direction.
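In an unweighted graph, a shortest path can be found with breadth-first search, which visits every node one hop away before any node two hops away. The sketch below is a generic BFS over a small hypothetical undirected graph with placeholder node names. It illustrates the idea rather than Neo4j's actual implementation, and like shortestPath it returns a single path even when several of the same length exist:

```python
from collections import deque

def shortest_path(adj, start, goal):
    """Breadth-first search; returns one shortest path as a list of nodes, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:       # walk back through the parents
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in adj.get(node, ()):
            if neighbour not in parents:  # first visit is via a shortest route
                parents[neighbour] = node
                queue.append(neighbour)
    return None

# Hypothetical graph; [*] ignores type and direction, so edges go both ways
edges = [("Person 1", "Movie 1"), ("Movie 1", "Person 2"),
         ("Person 2", "Movie 2"), ("Movie 2", "Person 3"),
         ("Person 1", "Movie 3"), ("Movie 3", "Person 3")]
adj = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

print(shortest_path(adj, "Person 1", "Person 3"))
# ['Person 1', 'Movie 3', 'Person 3']
```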
MATCH (p:Person {name: 'Eminem'})-[:ACTED_IN*2]-(others:Person)
RETURN others.name
Two hops away from Eminem using the ACTED_IN relationship
MATCH (p:Person {name: 'Eminem'})-[:ACTED_IN*1..4]-(others:Person)
RETURN others.name
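People 1 to 4 hops away from Eminem along ACTED_IN relationships (up to 4 deep). Because ACTED_IN links a Person to a Movie, only paths of 2 or 4 hops end at a Person.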
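A variable-length pattern returns one row per matching path, and within a single path a relationship is never traversed twice, although nodes may be revisited. The hypothetical sketch below enumerates such paths on a small undirected graph with placeholder names ("P0" plays the role of Eminem), mirroring `-[:ACTED_IN*2]-` and `-[:ACTED_IN*1..4]-` followed by the `(others:Person)` label check:

```python
# Hypothetical ACTED_IN relationships, stored as (person, movie)
edges = [("P0", "M1"), ("P1", "M1"), ("P1", "M2"), ("P2", "M2")]
people = {"P0", "P1", "P2"}

def path_ends(edges, start, min_hops, max_hops):
    """End node of every path from start whose length is between min_hops
    and max_hops, ignoring direction and never reusing a relationship
    within one path. Duplicates are kept: one entry per path."""
    ends = []

    def walk(node, used, depth):
        if min_hops <= depth <= max_hops:
            ends.append(node)
        if depth == max_hops:
            return
        for i, (a, b) in enumerate(edges):
            if i not in used and node in (a, b):
                walk(b if node == a else a, used | {i}, depth + 1)

    walk(start, frozenset(), 0)
    return ends

two_hops = [n for n in path_ends(edges, "P0", 2, 2) if n in people]
up_to_four = [n for n in path_ends(edges, "P0", 1, 4) if n in people]
print(two_hops)    # ['P1']
print(up_to_four)  # ['P1', 'P2']
```

Paths of 1 and 3 hops end at movies, which the Person label check then discards.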
Pipelining Queries
MATCH (n:Movie)
WHERE n.imdbRating IS NOT NULL
AND n.poster IS NOT NULL
WITH n {
.title,
.year,
.languages,
.plot,
.poster,
.imdbRating,
directors: [ (n)<-[:DIRECTED]-(d) | d { tmdbId:d.imdbId, .name } ]
}
ORDER BY n.imdbRating DESC LIMIT 4
RETURN collect(n)
WITH 'Clint Eastwood' AS a, 'high' AS t
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
// pass a and t on through WITH so they stay in scope for the WHERE
WITH p, m, a, t, toLower(m.title) AS movieTitle
WHERE p.name = a
AND movieTitle CONTAINS t
RETURN p.name AS actor, m.title AS movie
WITH 'Tom Hanks' AS theActor
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = theActor
AND m.revenue IS NOT NULL
// Use WITH here to order the movies by revenue and keep only the top one
WITH m ORDER BY m.revenue DESC LIMIT 1
RETURN m.revenue AS revenue, m.title AS title
MATCH (n:Movie)
WHERE n.imdbRating IS NOT NULL AND n.poster IS NOT NULL
WITH n {
.title,
.imdbRating,
actors: [(a)-[:ACTED_IN]->(n) | a { .name }],
genre: [(n)-[:IN_GENRE]->(g) | g { .name }]
}
ORDER BY n.imdbRating DESC LIMIT 4
// flatten the actor lists of the four movies and count how often each actor appears
WITH collect(n.actors) AS a
UNWIND a AS b
UNWIND b AS listB
RETURN listB.name, count(listB.name)
ORDER BY listB.name
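The two UNWIND clauses flatten a list of lists (one actors list per movie) into individual rows, and count() then groups those rows by actor name. The same steps in plain Python, over hypothetical actor lists for four movies:

```python
from collections import Counter

# Hypothetical value of collect(n.actors): one list of actor maps per movie
actors_per_movie = [
    [{"name": "A"}, {"name": "B"}],
    [{"name": "B"}, {"name": "C"}],
    [{"name": "A"}, {"name": "B"}],
    [{"name": "D"}],
]

# UNWIND a AS b, then UNWIND b AS listB: one row per actor appearance
names = [actor["name"] for movie in actors_per_movie for actor in movie]
counts = Counter(names)         # count(listB.name), grouped by name

for name in sorted(counts):     # ORDER BY listB.name
    print(name, counts[name])
```

In Cypher the collect()/UNWIND round trip is not strictly needed: UNWIND n.actors AS listB directly after the LIMIT should give the same counts.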
Aggregation and pipelining
MATCH (:Movie {title: 'Toy Story'})-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(m)
WHERE m.imdbRating IS NOT NULL
WITH
g.name AS genre,
count(m) AS moviesInCommon,
sum(m.imdbRating) AS total
RETURN
genre, moviesInCommon,
total/moviesInCommon AS score
ORDER BY score DESC
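This pipeline first aggregates per genre in the WITH clause, then derives a score from the aggregated values in RETURN. Modelled in Python with hypothetical (genre, imdbRating) rows:

```python
from collections import defaultdict

# Hypothetical (genre, m.imdbRating) rows after the MATCH and WHERE
rows = [("Animation", 7.0), ("Animation", 8.0),
        ("Comedy", 6.0), ("Comedy", 7.0), ("Comedy", 8.0)]

# WITH g.name AS genre, count(m) AS moviesInCommon, sum(m.imdbRating) AS total
totals = defaultdict(lambda: [0, 0.0])
for genre, rating in rows:
    totals[genre][0] += 1
    totals[genre][1] += rating

# RETURN genre, moviesInCommon, total/moviesInCommon AS score ORDER BY score DESC
scores = sorted(
    ((genre, count, total / count) for genre, (count, total) in totals.items()),
    key=lambda row: row[2],
    reverse=True,
)
print(scores)  # [('Animation', 2, 7.5), ('Comedy', 3, 7.0)]
```

Because the WHERE clause removes null ratings, total/moviesInCommon equals avg(m.imdbRating), which could also be computed directly in the WITH clause.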
MATCH (u:User {name: "Misty Williams"})-[r:RATED]->(:Movie)
WITH u, avg(r.rating) AS average
MATCH (u)-[r:RATED]->(m:Movie)
WHERE r.rating > average
RETURN
average , m.title AS movie,
r.rating as rating
ORDER BY rating DESC
Using WITH for collecting
MATCH (m:Movie)--(a:Actor)
WHERE m.title CONTAINS 'New York'
WITH
m,
collect (a.name) AS actors,
count(*) AS numActors
RETURN
m.title AS movieTitle,
actors
ORDER BY numActors DESC
MATCH (m:Movie)<-[:ACTED_IN]-(a:Actor)
WHERE m.title CONTAINS 'New York'
WITH
m,
collect (a.name) AS actors,
count(*) AS numActors
ORDER BY numActors DESC
RETURN collect(m { .title, actors, numActors }) AS movies
Using LIMIT early
MATCH (p:Actor)
WHERE p.born.year = 1980
WITH p LIMIT 3
MATCH (p)-[:ACTED_IN]->(m:Movie)-[:IN_GENRE]->(g:Genre)
WITH
p,
collect(DISTINCT g.name) AS genres
RETURN p.name AS actor, genres
Match (a:Actor)-[:ACTED_IN]->(m)
where a.name = 'Tom Hanks'
with m
match (m)<-[r:RATED]-(u)
with
m,
avg(r.rating) as rating
return rating, m.title
order by rating desc
limit 1
Unwinding Lists
MATCH (m:Movie)
UNWIND m.languages AS lang
WITH
m,
trim(lang) AS language
// language is the only non-aggregated column, so it is the grouping key and each language appears once
WITH
language,
collect(m.title) AS movies
RETURN
language,
movies[0..10]
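UNWIND turns each element of a list into its own row, and the following WITH regroups those rows by language. A plain-Python equivalent over hypothetical movies (note the stray leading space in one language value, which trim() and Python's `str.strip()` both remove):

```python
from collections import defaultdict

# Hypothetical movies with a languages list property
movies = [
    {"title": "Movie A", "languages": ["English", " French"]},
    {"title": "Movie B", "languages": ["English"]},
]

by_language = defaultdict(list)
for m in movies:
    for lang in m["languages"]:                   # UNWIND m.languages AS lang
        language = lang.strip()                   # trim(lang) AS language
        by_language[language].append(m["title"])  # collect(m.title), grouped by language

# RETURN language, movies[0..10]: at most the first 10 titles per language
result = {language: titles[0:10] for language, titles in by_language.items()}
print(result)  # {'English': ['Movie A', 'Movie B'], 'French': ['Movie A']}
```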
Reducing Memory (CALL, UNION)
If the MATCH clauses of a query need more memory than the VM is configured with, the query fails. Subqueries can help reduce the memory a query needs. A subquery is a set of Cypher statements that execute within their own scope.
Important things to know about a subquery:
- A subquery returns values referred to by the variables in the RETURN clause.
- A subquery cannot return variables with the same name used in the enclosing query.
- You must explicitly pass in variables from the enclosing query to a subquery.
CALL
MATCH (m:Movie)
CALL {
WITH m
MATCH (m)<-[r:RATED]-(u:User)
WHERE r.rating = 5
RETURN count(u) AS numReviews
}
RETURN m.title, numReviews
ORDER BY numReviews DESC
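A CALL subquery runs once for each row coming into it, sees only the variables passed in with WITH, and here returns a single count per movie. The Python sketch below mimics that per-row execution on hypothetical ratings; `five_star_reviews` is an illustrative name only:

```python
# Hypothetical ratings as (movie title, user, rating)
ratings = [("M1", "u1", 5), ("M1", "u2", 3), ("M1", "u3", 5), ("M2", "u1", 4)]
movies = ["M1", "M2", "M3"]

def five_star_reviews(title):
    """Subquery body: receives only the imported variable (the movie)
    and returns one value, like count(u)."""
    return sum(1 for t, _user, rating in ratings if t == title and rating == 5)

# MATCH (m:Movie) CALL { WITH m ... } RETURN m.title, numReviews
rows = [(title, five_star_reviews(title)) for title in movies]
rows.sort(key=lambda row: row[1], reverse=True)  # ORDER BY numReviews DESC
print(rows)  # [('M1', 2), ('M2', 0), ('M3', 0)]
```

Because the subquery aggregates with count(), it returns a row with 0 for movies without any five-star rating, so those movies are kept rather than dropped.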
UNION [ALL]
MATCH (p:Person)
WITH p LIMIT 100
CALL {
WITH p
OPTIONAL MATCH (p)-[:ACTED_IN]->(m:Movie)
RETURN m.title + ": " + "Actor" AS work
UNION
WITH p
OPTIONAL MATCH (p)-[:DIRECTED]->(m:Movie)
RETURN m.title+ ": " + "Director" AS work
}
RETURN p.name, collect(work)
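UNION removes duplicate rows from the combined result, while UNION ALL keeps them. That matters in this query: for a person who neither acted nor directed, both branches return a null work, and UNION collapses those into a single row, which collect() then ignores because it skips nulls. A plain-Python model with hypothetical values (the helper names are illustrative, and the row order is Python's; Cypher does not guarantee an order here):

```python
def union_all(*branches):
    """UNION ALL: concatenate every branch's rows."""
    return [row for branch in branches for row in branch]

def union(*branches):
    """UNION: the same rows with duplicates removed (first occurrence kept)."""
    return list(dict.fromkeys(union_all(*branches)))

def collect(values):
    """collect() skips nulls (None here)."""
    return [v for v in values if v is not None]

# Hypothetical person who neither acted nor directed: both branches yield null
print(union_all([None], [None]))       # [None, None]
print(union([None], [None]))           # [None]
print(collect(union([None], [None])))  # []

# Hypothetical person who acted in one movie but directed none
print(collect(union(["Movie A: Actor"], [None])))  # ['Movie A: Actor']
```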
MATCH (g:Genre)
call {
with g
match (m:Movie)-[:IN_GENRE]->(g)
where 'France' in m.countries
return count(m) as numMovies
}
RETURN g.name AS genre, numMovies
ORDER BY numMovies DESC
Using Parameters
:params {actorName: 'Tom Cruise', movieName: 'Top Gun'}
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = $actorName
RETURN m.released AS releaseDate,
m.title AS title
ORDER BY m.released DESC
:params {actorName: 'Tom Cruise', movieName: 'Top Gun', l:2}
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.title = $movieName RETURN p.name LIMIT $l
Setting Integers
:param number: 10 converts the value to a float.
:param number => 10 keeps the value as an integer.
:param
to view all set parameters
:param {}
clear all set parameters
Application Examples Using Parameters
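from neo4j import GraphDatabase

# Placeholder connection details; replace with your own URI and credentials
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

def get_actors(tx, movieTitle):
    # $title in the Cypher string is bound from the title= keyword argument
    result = tx.run("""
        MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
        WHERE m.title = $title
        RETURN p
    """, title=movieTitle)
    # Access the `p` value from each record
    return [record["p"] for record in result]

with driver.session() as session:
    actors = session.read_transaction(get_actors, movieTitle="Toy Story")

driver.close()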