title: Intermediate Cypher Queries
updated: 2022-08-08 19:42:32Z
created: 2022-08-01 13:15:27Z

Filtering Queries

CALL db.schema.visualization()
CALL db.schema.nodeTypeProperties()
CALL db.schema.relTypeProperties()
SHOW CONSTRAINTS
:HISTORY
:USE database

check multiple labels

MATCH (p)
WHERE p:Actor:Director
AND p.born.year >= 1950 AND p.born.year <= 1959
RETURN count(p)

MATCH (p:Director)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(p)
WHERE "German" IN m.languages
RETURN p.name, labels(p), m.title

MATCH (n)-[a]->(m:Movie)
WHERE (n:Actor OR n:Director)
AND toUpper(a.role) CONTAINS 'DOG'
RETURN n.name, m.title, a.role

Difference EXPLAIN vs PROFILE

  • EXPLAIN shows the execution plan with estimated row counts, without running the query.
  • PROFILE runs the query and shows the actual steps and the number of rows retrieved at each step.

Provided you are only querying the graph and not updating anything, it is fine to execute the query multiple times using PROFILE. In fact, as part of query tuning, you should execute the query at least twice, because the first execution includes generating the execution plan, which is then cached. That is, the first PROFILE of a query will typically be more expensive than subsequent runs.

Using exists to exclude patterns in the graph

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE  p.name = 'Tom Hanks'
AND NOT exists {(p)-[:DIRECTED]->(m)}
RETURN  m.title

If you profile this query, you will find that it is not performant, but it is the only way to perform this query.

Multiple MATCH Clauses

MATCH (a:Person)-[:ACTED_IN]->(m:Movie),
      (m)<-[:DIRECTED]-(d:Person)
WHERE m.year > 2000
RETURN a.name, m.title, d.name

In general, using a single MATCH clause will perform better than multiple MATCH clauses. This is because relationship uniqueness is enforced within a single MATCH, so fewer relationships are traversed.

The same query written as a single pattern

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
WHERE m.year > 2000
RETURN a.name, m.title, d.name

Optionally matching rows

MATCH (m:Movie) WHERE m.title = "Kiss Me Deadly"
MATCH (m)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(rec:Movie)
OPTIONAL MATCH (m)<-[:ACTED_IN]-(a:Actor)-[:ACTED_IN]->(rec)
RETURN rec.title, a.name

In this query, the pattern in which an actor acted in both movies is optional: a recommended movie is returned even when no such actor exists, and a null value is returned for a.name in that row. In general, and depending on your graph, an optional match will return more rows than the equivalent MATCH.
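
The behaviour described above resembles a left outer join. The following Python sketch illustrates it on made-up in-memory data; the titles, actor names, and both function names are placeholders chosen for illustration and do not come from the movie graph or from any Neo4j API.

```python
# Hypothetical data: recommended movies and the actors (if any)
# they share with the source movie.
recommendations = ["Movie A", "Movie B", "Movie C"]
shared_actors = {"Movie A": ["Actor X", "Actor Y"]}  # B and C share no actor


def match_rows(recs, actors_by_movie):
    """Plain MATCH: a row survives only if the pattern matches."""
    return [(rec, a) for rec in recs for a in actors_by_movie.get(rec, [])]


def optional_match_rows(recs, actors_by_movie):
    """OPTIONAL MATCH: keep every incoming row, padding with None (null)."""
    rows = []
    for rec in recs:
        actors = actors_by_movie.get(rec) or [None]
        rows.extend((rec, a) for a in actors)
    return rows


required = match_rows(recommendations, shared_actors)
optional = optional_match_rows(recommendations, shared_actors)
print(required)
print(optional)
```

In this sketch the optional variant keeps Movie B and Movie C with None in the actor column, which is why it yields more rows.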

Controlling Results Returned

Ordering Returned Results

MATCH (p:Person)
WHERE p.born.year = 1980
RETURN p.name AS name, p.born AS birthDate
ORDER BY birthDate DESC , name ASC

Limiting results; Skipping some results

MATCH (p:Person)
WHERE p.born.year = 1980
RETURN  p.name as name,
p.born AS birthDate
ORDER BY p.born SKIP 40 LIMIT 10

In this query, we return 10 rows representing page 5, where each page contains 10 rows.
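
The SKIP value for a page follows from simple arithmetic: skip = (page - 1) * pageSize. A minimal Python sketch of that calculation, where the helper name page_window is an illustrative assumption and not part of any Neo4j API:

```python
from typing import Tuple


def page_window(page: int, page_size: int) -> Tuple[int, int]:
    """Return (skip, limit) for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return (page - 1) * page_size, page_size


skip, limit = page_window(page=5, page_size=10)
print(skip, limit)  # 40 10
```

The two values could then be supplied as query parameters, for example SKIP $skip LIMIT $limit.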

MATCH (p:Person)-[:ACTED_IN| DIRECTED]->(m)
WHERE m.title = 'Toy Story'
MATCH (p)-[:ACTED_IN]->()<-[:ACTED_IN]-(p2:Person)
RETURN  p.name, p2.name

Finds the people who acted in or directed the movie Toy Story, then for each of them retrieves all people who acted in a movie alongside them, and returns both names.

Map projections

MATCH (p:Person)
WHERE p.name CONTAINS "Thomas"
RETURN p { .* } AS person
ORDER BY p.name ASC

MATCH (p:Person)
WHERE p.name CONTAINS "Thomas"
RETURN p { .name, .born } AS person
ORDER BY p.name

MATCH (m:Movie)<-[:DIRECTED]-(d:Director)
WHERE d.name = 'Woody Allen'
RETURN m {.*, favorite: true} AS movie

The last query adds a favorite property with a value of true to each Movie map returned.

MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
WHERE p.name = 'Henry Fonda'
RETURN m.title AS movie,
CASE
WHEN m.year < 1940 THEN 'oldies'
WHEN 1940 <= m.year < 1950 THEN 'forties'
WHEN 1950 <= m.year < 1960 THEN 'fifties'
WHEN 1960 <= m.year < 1970 THEN 'sixties'
WHEN 1970 <= m.year < 1980 THEN 'seventies'
WHEN 1980 <= m.year < 1990 THEN 'eighties'
WHEN 1990 <= m.year < 2000 THEN 'nineties'
ELSE  'two-thousands'
END
AS timeFrame
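
A CASE expression checks its WHEN branches in order and returns the value of the first one that matches, falling back to ELSE. The Python sketch below reproduces the same bucketing for a non-null year; the names DECADES and time_frame are illustrative only.

```python
# (exclusive upper bound, label) pairs, checked in order like WHEN branches
DECADES = [
    (1940, "oldies"),
    (1950, "forties"),
    (1960, "fifties"),
    (1970, "sixties"),
    (1980, "seventies"),
    (1990, "eighties"),
    (2000, "nineties"),
]


def time_frame(year: int) -> str:
    """Return the label of the first bucket whose upper bound exceeds year."""
    for upper, label in DECADES:
        if year < upper:
            return label
    return "two-thousands"  # the ELSE branch


print(time_frame(1957))  # fifties
```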

Aggregating Data

If an aggregation function like count() is used, all non-aggregated result columns become grouping keys.

If you specify count(n), the graph engine calculates the number of non-null occurrences of n. If you specify count(*), the graph engine calculates the number of rows retrieved, including those with null values.
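
To make the difference concrete, here is a small Python sketch that groups made-up rows by their non-aggregated column and computes both counts. The data and the names count_star and count_actor are assumptions for illustration; None stands in for null.

```python
from collections import defaultdict

# Hypothetical result rows: (movie title, actor name or None)
rows = [
    ("M1", "A"),
    ("M1", "B"),
    ("M2", None),
    ("M3", "C"),
    ("M3", None),
]


def aggregate(rows):
    """Group by title (the grouping key) and compute both kinds of count."""
    groups = defaultdict(lambda: {"count_star": 0, "count_actor": 0})
    for title, actor in rows:
        groups[title]["count_star"] += 1       # like count(*): every row
        if actor is not None:
            groups[title]["count_actor"] += 1  # like count(n): non-null only
    return dict(groups)


result = aggregate(rows)
for title, counts in result.items():
    print(title, counts)
```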

Returning a list

MATCH (p:Person)
RETURN p.name, [p.born, p.died] AS lifeTime
LIMIT 10

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.year = 1920
RETURN collect(DISTINCT m.title) AS movies,
collect(a.name) AS actors

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
RETURN m.title AS movie,
collect(a.name)[2..] AS castMember,
size(collect(a.name)) AS castSize

collect(a.name)[2..] returns a slice of the collected list, starting at the third element.

List comprehension

MATCH (m:Movie)
RETURN m.title as movie,
[x IN m.countries WHERE x = 'USA' OR x = 'Germany']
AS country LIMIT 500

Pattern comprehension

MATCH (m:Movie)
WHERE m.year = 2015
RETURN m.title,
[(dir:Person)-[:DIRECTED]->(m) | dir.name] AS directors,
[(actor:Person)-[:ACTED_IN]->(m) | actor.name] AS actors

For pattern comprehension, enclose the expression in square brackets: first the pattern, then the pipe character, then the value that each match of the pattern contributes to the list.

[<pattern> | value]

MATCH (a:Person {name: 'Tom Hanks'}) RETURN [(a)-->(b:Movie) WHERE b.title CONTAINS "Toy" | b.title + ": " + b.year] AS movies
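
A pattern comprehension behaves much like a list comprehension that is evaluated once per row. The sketch below imitates that idea in Python over a made-up edge list; the data and the helper people_for are hypothetical and only stand in for the graph.

```python
# Hypothetical edge list: (person, relationship type, movie title)
edges = [
    ("Person 1", "DIRECTED", "Movie A"),
    ("Person 2", "ACTED_IN", "Movie A"),
    ("Person 3", "ACTED_IN", "Movie A"),
    ("Person 2", "ACTED_IN", "Movie B"),
]


def people_for(movie, rel_type):
    """Roughly mirrors [(x:Person)-[:REL_TYPE]->(m) | x.name] for one movie."""
    return [
        person
        for person, rel, title in edges
        if rel == rel_type and title == movie
    ]


row = {
    "title": "Movie A",
    "directors": people_for("Movie A", "DIRECTED"),
    "actors": people_for("Movie A", "ACTED_IN"),
}
print(row)
```

When the pattern has no match, the result is an empty list rather than a missing row, as with people_for("Movie B", "DIRECTED") here.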


Working with maps

A Cypher map is a list of key/value pairs, where each element has the form key: value.

RETURN {Jan: 31, Feb: 28, Mar: 31, Apr: 30 , May: 31, Jun: 30 , Jul: 31, Aug: 31, Sep: 30, Oct: 31, Nov: 30, Dec: 31}['Feb'] AS daysInFeb

The same lookup also works with dot notation: end the expression with Dec: 31}.Feb AS daysInFeb

Map projections

MATCH (m:Movie)
WHERE m.title CONTAINS 'Matrix'
RETURN m { .title, .released } AS movie

Working with Dates and Times

RETURN date(), datetime(), time()


CALL apoc.meta.nodeTypeProperties()

List node properties


MATCH (x:Test {id: 1})
RETURN x.date.day, x.date.year,
x.datetime.year, x.datetime.hour,
x.datetime.minute

Extract date components


MATCH (x:Test {id: 1})
SET x.datetime1 = datetime('2022-01-04T10:05:20'),
x.datetime2 = datetime('2022-04-09T18:33:05')
RETURN x

Sets datetime properties from ISO 8601 strings.

MATCH (x:Test {id: 1})
RETURN duration.between(x.date1, x.date2)

MATCH (x:Test {id: 1})
RETURN duration.inDays(x.datetime1, x.datetime2).days

MATCH (x:Test {id: 1})
RETURN x.date1 + duration({months: 6})
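
As a rough cross-check of the day difference, the same two timestamps that were stored in datetime1 and datetime2 can be subtracted in Python. This is only an analogy: Python's timedelta has no month component, so it does not model everything a Cypher duration can express.

```python
from datetime import datetime

# The two values set on the Test node earlier in these notes
start = datetime.fromisoformat("2022-01-04T10:05:20")
end = datetime.fromisoformat("2022-04-09T18:33:05")

elapsed = end - start
print(elapsed.days)  # whole days between the two timestamps: 95
print(elapsed)       # days plus the remaining hours, minutes, seconds
```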

APOC to format dates and times

MATCH (x:Test {id: 1})
RETURN x.datetime as Datetime,
apoc.temporal.format( x.datetime, 'HH:mm:ss.SSSS')
AS formattedDateTime

Graph Traversal

Anchor of a query

The execution plan determines the set of nodes that serve as starting points for the query; this set is the anchor. The anchor is mostly based on the MATCH clause. It is typically determined by metadata that is stored in the graph or by a filter that is provided inline or in a WHERE clause. The anchor for a query will be the choice that requires the fewest nodes to be retrieved into memory.

Varying Length Traversal

MATCH p = shortestPath((p1:Person)-[*]-(p2:Person))
WHERE p1.name = "Eminem"
AND p2.name = "Charlton Heston"
RETURN p

Shortest path between the two people, regardless of relationship type

MATCH (p:Person {name: 'Eminem'})-[:ACTED_IN*2]-(others:Person)
RETURN others.name

Two hops away from Eminem using the ACTED_IN relationship

MATCH (p:Person {name: 'Eminem'})-[:ACTED_IN*1..4]-(others:Person)
RETURN  others.name

People 1 to 4 hops away from Eminem using the ACTED_IN relationship, that is, connections of connected nodes up to 4 deep
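
The two traversal styles can be pictured with a small in-memory graph. The Python sketch below is an illustration under stated assumptions, not how Neo4j implements them: the graph is made up, shortest_path is a plain breadth-first search, and reachable enumerates paths without reusing a relationship, loosely imitating [:ACTED_IN*min..max]. It does not filter by label, so movie nodes appear in the result too.

```python
from collections import deque

# Hypothetical undirected graph: P* are people, M* are movies
graph = {
    "P1": ["M1"],
    "P2": ["M1", "M2"],
    "P3": ["M2"],
    "M1": ["P1", "P2"],
    "M2": ["P2", "P3"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def reachable(graph, start, min_hops, max_hops):
    """End nodes of paths with min_hops..max_hops hops, never reusing an edge.

    Edges are identified by their two endpoints, so this assumes at most
    one relationship between any pair of nodes.
    """
    found = set()

    def walk(node, used_edges, depth):
        if depth >= min_hops:
            found.add(node)
        if depth == max_hops:
            return
        for nxt in graph.get(node, []):
            edge = frozenset((node, nxt))
            if edge not in used_edges:
                walk(nxt, used_edges | {edge}, depth + 1)

    walk(start, frozenset(), 0)
    return found


print(shortest_path(graph, "P1", "P3"))
print(reachable(graph, "P1", 2, 2))
print(reachable(graph, "P1", 1, 4))
```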

Pipelining Queries

MATCH (n:Movie)
WHERE n.imdbRating IS NOT NULL
AND n.poster IS NOT NULL
WITH n {
  .title,
  .year,
  .languages,
  .plot,
  .poster,
  .imdbRating,
  directors: [ (n)<-[:DIRECTED]-(d) | d { tmdbId:d.imdbId, .name } ]
}
ORDER BY n.imdbRating DESC LIMIT 4
RETURN collect(n)

WITH  'Clint Eastwood' AS a, 'high' AS t
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
with p, m, toLower(m.title) as movieTitle
WHERE p.name = a
AND movieTitle CONTAINS t
RETURN p.name AS actor, m.title AS movie

WITH  'Tom Hanks' AS theActor
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = theActor
AND m.revenue IS NOT NULL
with m order by m.revenue desc limit 1
// Use WITH here to limit the movie node to 1 and order it by revenue
RETURN m.revenue AS revenue, m.title AS title

MATCH (n:Movie)
WHERE n.imdbRating IS NOT NULL and n.poster IS NOT NULL
with n {
    .title,
    .imdbRating,
    actors: [(a)-[:ACTED_IN]->(n) | a {name:a.name, .name}],
    genre: [(n)-[:IN_GENRE]->(g) | g {name:g.name, .name}]}
ORDER BY n.imdbRating DESC LIMIT 4
with collect(n.actors) as a
unwind a as b
unwind b as listB
return listB.name, count(listB.name)
order by listB.name

Aggregation and pipelining

MATCH (:Movie {title: 'Toy Story'})-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(m)
WHERE m.imdbRating IS NOT NULL
WITH
	g.name AS genre,
	count(m) AS moviesInCommon,
	sum(m.imdbRating) AS total
RETURN
	genre, moviesInCommon,
	total/moviesInCommon AS score
ORDER BY score DESC

MATCH (u:User {name: "Misty Williams"})-[r:RATED]->(:Movie)
WITH u, avg(r.rating) AS average
MATCH (u)-[r:RATED]->(m:Movie)
WHERE r.rating > average
RETURN
	average , m.title AS movie,
	r.rating as rating
ORDER BY rating DESC
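
The second query works in two stages: it first aggregates one value (the user's average rating), then carries that value forward with WITH and uses it as a filter. A minimal Python sketch of the same two stages, using made-up ratings:

```python
from statistics import mean

# Hypothetical ratings by one user: (movie title, rating)
ratings = [
    ("Movie A", 5.0),
    ("Movie B", 3.0),
    ("Movie C", 4.0),
    ("Movie D", 2.0),
]

# Stage 1: like WITH u, avg(r.rating) AS average
average = mean(rating for _, rating in ratings)

# Stage 2: like MATCH ... WHERE r.rating > average ... ORDER BY rating DESC
above_average = sorted(
    ((title, rating) for title, rating in ratings if rating > average),
    key=lambda pair: pair[1],
    reverse=True,
)

print(average)        # 3.5
print(above_average)  # [('Movie A', 5.0), ('Movie C', 4.0)]
```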

Using WITH for collecting

MATCH (m:Movie)--(a:Actor)
WHERE m.title CONTAINS 'New York'
WITH
	m,
	collect (a.name) AS actors,
	count(*) AS numActors
RETURN
	m.title AS movieTitle,
	actors
ORDER BY numActors DESC

MATCH (m:Movie)<-[:ACTED_IN]-(a:Actor)
WHERE m.title CONTAINS 'New York'
WITH
	m,
	collect (a.name) AS actors,
	count(*) AS numActors
ORDER BY numActors DESC
RETURN collect(m { .title, actors, numActors }) AS movies

Using LIMIT early

MATCH (p:Actor)
WHERE p.born.year = 1980
WITH p  LIMIT 3
MATCH (p)-[:ACTED_IN]->(m:Movie)-[:IN_GENRE]->(g:Genre)
WITH
	p,
	collect(DISTINCT g.name) AS genres
RETURN p.name AS actor, genres

MATCH (a:Actor)-[:ACTED_IN]->(m)
where a.name = 'Tom Hanks'
with m
match (m)<-[r:RATED]-(u)
with
	m,
	avg(r.rating) as rating
return rating, m.title
order by rating desc
limit 1

Unwinding Lists

MATCH (m:Movie)
UNWIND m.languages AS lang
WITH
	m,
	trim(lang) AS language
// language becomes distinct in the next step because it is the grouping key for collect()
WITH
	language,
	collect(m.title) AS movies
RETURN
	language,
	movies[0..10]
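
In Python terms, this query flattens each movie's language list into one row per language, trims the value, and regroups the titles by language. A minimal sketch with made-up movies (the function name movies_by_language is illustrative):

```python
from collections import defaultdict

# Hypothetical movies; note the stray space that trim() would remove
movies = [
    {"title": "Movie A", "languages": ["English", " French"]},
    {"title": "Movie B", "languages": ["French"]},
    {"title": "Movie C", "languages": []},
]


def movies_by_language(movies, limit=10):
    grouped = defaultdict(list)
    for movie in movies:
        # UNWIND: one row per list element; an empty list yields no rows
        for lang in movie["languages"]:
            # trim(lang), then collect(m.title) grouped by language
            grouped[lang.strip()].append(movie["title"])
    # movies[0..10]: keep at most the first `limit` titles per language
    return {lang: titles[:limit] for lang, titles in grouped.items()}


by_language = movies_by_language(movies)
print(by_language)
```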

Reducing Memory (CALL, UNION)

If the memory requirements of a set of MATCH clauses exceed what the VM is configured with, the query will fail. A subquery is a set of Cypher statements that execute within their own scope.

Important things to know about a subquery:

  • A subquery returns values referred to by the variables in the RETURN clause.
  • A subquery cannot return variables with the same name used in the enclosing query.
  • You must explicitly pass in variables from the enclosing query to a subquery.

CALL

MATCH (m:Movie)
CALL {
    WITH m
    MATCH (m)<-[r:RATED]-(u:User)
     WHERE r.rating = 5
    RETURN count(u) AS numReviews
}
RETURN m.title, numReviews
ORDER BY numReviews DESC

UNION [ALL]

MATCH (p:Person)
WITH p LIMIT 100
CALL {
  WITH p
  OPTIONAL MATCH (p)-[:ACTED_IN]->(m:Movie)
  RETURN m.title + ": " + "Actor" AS work
UNION
  WITH p
  OPTIONAL MATCH (p)-[:DIRECTED]->(m:Movie)
  RETURN m.title+ ": " +  "Director" AS work
}
RETURN p.name, collect(work)

MATCH (g:Genre)
call {
    with g
    match (m:Movie)-[:IN_GENRE]->(g)
    where 'France' in m.countries
    return count(m) as numMovies
}
RETURN g.name AS genre, numMovies
ORDER BY numMovies DESC
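
On the UNION [ALL] heading of this section: UNION combines the rows of its branches and removes duplicate rows, while UNION ALL keeps every row. The Python sketch below shows the difference on made-up rows; the function names are illustrative, and the output order is a property of this sketch only.

```python
def union_all(*branches):
    """Like UNION ALL: concatenate all rows, duplicates included."""
    return [row for branch in branches for row in branch]


def union(*branches):
    """Like UNION: concatenate, then drop duplicate rows."""
    seen = set()
    unique = []
    for row in union_all(*branches):
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Hypothetical rows from two branches, with one row appearing in both
first_branch = ["Movie A", "Movie B"]
second_branch = ["Movie B", "Movie C"]

print(union_all(first_branch, second_branch))
print(union(first_branch, second_branch))
```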

Using Parameters

:params {actorName: 'Tom Cruise', movieName: 'Top Gun'}

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = $actorName
RETURN m.released AS releaseDate,
m.title AS title
ORDER BY m.released DESC

:params {actorName: 'Tom Cruise', movieName: 'Top Gun', l:2}

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.title = $movieName
RETURN p.name
LIMIT $l

Setting Integers

:param number: 10

The value is converted to a float.

:param number => 10

The value remains an integer.

:params

to view all set parameters

:params {}

to clear all set parameters

Application Examples Using Parameters

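from neo4j import GraphDatabase

# Placeholder connection details (assumption); replace with your own.
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

def get_actors(tx, movieTitle):
    # The title is passed as the $title parameter, not concatenated into the query
    result = tx.run("""
        MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
        WHERE m.title = $title
        RETURN p
    """, title=movieTitle)

    # Access the `p` value from each record
    return [record["p"] for record in result]

with driver.session() as session:
    result = session.read_transaction(get_actors, movieTitle="Toy Story")

driver.close()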