---
title: Intermediate Cypher Queries
updated: 2022-08-08 19:42:32Z
created: 2022-08-01 13:15:27Z
---

# Filtering Queries

```
CALL db.schema.visualization()
CALL db.schema.nodeTypeProperties()
CALL db.schema.relTypeProperties()
SHOW CONSTRAINTS
:HISTORY
:USE database
```

Check multiple labels

```
MATCH (p)
WHERE p:Actor:Director
AND p.born.year >= 1950 AND p.born.year <= 1959
RETURN count(p)
```

```
MATCH (p:Director)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(p)
WHERE "German" IN m.languages
RETURN p.name, labels(p), m.title
```

```
MATCH (n)-[a]->(m:Movie)
WHERE (n:Actor OR n:Director)
AND toUpper(a.role) CONTAINS 'DOG'
RETURN n.name, m.title, a.role
```

### Difference EXPLAIN vs PROFILE

- EXPLAIN provides estimates of the query steps.
- PROFILE provides the exact steps and number of rows retrieved for the query.

Provided you are only querying the graph and not updating anything, it is fine to execute the query multiple times using **PROFILE**. In fact, as part of query tuning, you should _execute the query at least twice_, because the first execution involves the generation of the execution plan, which is then cached. That is, the first PROFILE of a query will always be more expensive than subsequent ones.

A useful application of exists is excluding patterns in the graph

```
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
AND NOT exists {(p)-[:DIRECTED]->(m)}
RETURN m.title
```

If you profile this query, you will find that it is not performant, but it is the only way to perform this query.

### Multiple MATCH Clauses

```
MATCH (a:Person)-[:ACTED_IN]->(m:Movie),
      (m)<-[:DIRECTED]-(d:Person)
WHERE m.year > 2000
RETURN a.name, m.title, d.name
```

In general, using a single MATCH clause will perform better than multiple MATCH clauses. This is because relationship uniqueness is enforced, so fewer relationships are traversed.

The same query as above, written as a single pattern

```
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
WHERE m.year > 2000
RETURN a.name, m.title, d.name
```

### Optionally matching rows

```
MATCH (m:Movie)
WHERE m.title = "Kiss Me Deadly"
MATCH (m)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(rec:Movie)
OPTIONAL MATCH (m)<-[:ACTED_IN]-(a:Actor)-[:ACTED_IN]->(rec)
RETURN rec.title, a.name
```

In this query, the pattern in which an actor acted in both movies is optional: rows without such an actor are still returned, with a null value for a.name. In general, and depending on your graph, an optional match will return more rows.

## Controlling Results Returned

### Ordering Returned Results

```
MATCH (p:Person)
WHERE p.born.year = 1980
RETURN p.name AS name, p.born AS birthDate
ORDER BY birthDate DESC, name ASC
```

### Limiting results; Skipping some results

```
MATCH (p:Person)
WHERE p.born.year = 1980
RETURN p.name AS name, p.born AS birthDate
ORDER BY p.born
SKIP 40 LIMIT 10
```

In this query, we return 10 rows representing page 5, where each page contains 10 rows (SKIP 40 passes over the first four pages).

```
MATCH (p:Person)-[:ACTED_IN|DIRECTED]->(m)
WHERE m.title = 'Toy Story'
MATCH (p)-[:ACTED_IN]->()<-[:ACTED_IN]-(p2:Person)
RETURN p.name, p2.name
```

Returns the names of the people who acted in or directed the movie Toy Story, and then retrieves all people who acted in the same movies as they did.

### Map projections

```
MATCH (p:Person)
WHERE p.name CONTAINS "Thomas"
RETURN p { .* } AS person
ORDER BY p.name ASC
```

```
MATCH (p:Person)
WHERE p.name CONTAINS "Thomas"
RETURN p { .name, .born } AS person
ORDER BY p.name
```

```
MATCH (m:Movie)<-[:DIRECTED]-(d:Director)
WHERE d.name = 'Woody Allen'
RETURN m {.*, favorite: true} AS movie
```

Returns a property named favorite with a value of true for each Movie object returned.

```
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
WHERE p.name = 'Henry Fonda'
RETURN m.title AS movie,
CASE
WHEN m.year < 1940 THEN 'oldies'
WHEN 1940 <= m.year < 1950 THEN 'forties'
WHEN 1950 <= m.year < 1960 THEN 'fifties'
WHEN 1960 <= m.year < 1970 THEN 'sixties'
WHEN 1970 <= m.year < 1980 THEN 'seventies'
WHEN 1980 <= m.year < 1990 THEN 'eighties'
WHEN 1990 <= m.year < 2000 THEN 'nineties'
ELSE 'two-thousands'
END
AS timeFrame
```

# Aggregating Data

If an aggregation function like count() is used, all non-aggregated result columns become grouping keys.

_If you specify **count(n)**, the graph engine calculates the number of non-null occurrences of n. If you specify **count(\*)**, the graph engine calculates the number of rows retrieved, including those with null values._

### Returning a list

```
MATCH (p:Person)
RETURN p.name, [p.born, p.died] AS lifeTime
LIMIT 10
```

```
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.year = 1920
RETURN collect(DISTINCT m.title) AS movies,
collect(a.name) AS actors
```

```
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
RETURN m.title AS movie,
collect(a.name)[2..] AS castMember,
size(collect(a.name)) AS castSize
```

Returns a slice of a collection.

### List comprehension

```
MATCH (m:Movie)
RETURN m.title AS movie,
[x IN m.countries WHERE x = 'USA' OR x = 'Germany']
AS country
LIMIT 500
```

### Pattern comprehension

```
MATCH (m:Movie)
WHERE m.year = 2015
RETURN m.title,
[(dir:Person)-[:DIRECTED]->(m) | dir.name] AS directors,
[(actor:Person)-[:ACTED_IN]->(m) | actor.name] AS actors
```

For pattern comprehension, write the list in square brackets that enclose the pattern, followed by the pipe character and then the value that will be placed in the list for each match of the pattern.

```
[<pattern> | value]
```

```
MATCH (a:Person {name: 'Tom Hanks'})
RETURN [(a)-->(b:Movie)
WHERE b.title CONTAINS "Toy" | b.title + ": " + b.year]
AS movies
```

### Working with maps

A Cypher map is a list of key/value pairs, where each element of the list has the format 'key': value.

```
RETURN {Jan: 31, Feb: 28, Mar: 31, Apr: 30, May: 31, Jun: 30, Jul: 31, Aug: 31, Sep: 30, Oct: 31, Nov: 30, Dec: 31}['Feb'] AS daysInFeb
```

Also with dot notation

```
RETURN {Jan: 31, Feb: 28, Mar: 31, Apr: 30, May: 31, Jun: 30, Jul: 31, Aug: 31, Sep: 30, Oct: 31, Nov: 30, Dec: 31}.Feb AS daysInFeb
```

### Map projections

```
MATCH (m:Movie)
WHERE m.title CONTAINS 'Matrix'
RETURN m { .title, .released } AS movie
```

# Working with Dates and Times

```
RETURN date(), datetime(), time()
```

```
CALL apoc.meta.nodeTypeProperties()
```

List node properties

```
MATCH (x:Test {id: 1})
RETURN x.date.day, x.date.year,
x.datetime.year, x.datetime.hour,
x.datetime.minute
```

Extract date components

```
MATCH (x:Test {id: 1})
SET x.datetime1 = datetime('2022-01-04T10:05:20'),
    x.datetime2 = datetime('2022-04-09T18:33:05')
RETURN x
```

Set a date property using a string.

```
MATCH (x:Test {id: 1})
RETURN duration.between(x.date1, x.date2) AS between,
       duration.inDays(x.datetime1, x.datetime2).days AS days,
       x.date1 + duration({months: 6}) AS sixMonthsLater
```

### APOC to format dates and times

```
MATCH (x:Test {id: 1})
RETURN x.datetime AS Datetime,
apoc.temporal.format(x.datetime, 'HH:mm:ss.SSSS')
AS formattedDateTime
```

# Graph Traversal

### Anchor of a query

The execution plan determines the set of nodes that are the starting points for the query. The anchor is mostly based on the MATCH clause. It is typically determined by metadata that is stored in the graph, or by a filter that is provided inline or in a WHERE clause. The anchor for a query will be based upon the fewest number of nodes that need to be retrieved into memory.

# Varying Length Traversal

```
MATCH p = shortestPath((p1:Person)-[*]-(p2:Person))
WHERE p1.name = "Eminem"
AND p2.name = "Charlton Heston"
RETURN p
```

Shortest path, regardless of relationship type

```
MATCH (p:Person {name: 'Eminem'})-[:ACTED_IN*2]-(others:Person)
RETURN others.name
```

Two hops away from Eminem using the ACTED_IN relationship

```
MATCH (p:Person {name: 'Eminem'})-[:ACTED_IN*1..4]-(others:Person)
RETURN others.name
```

One to four hops away from Eminem: all connections of the connected nodes, up to four deep

# Pipelining Queries

```
MATCH (n:Movie)
WHERE n.imdbRating IS NOT NULL AND n.poster IS NOT NULL
WITH n {
  .title,
  .year,
  .languages,
  .plot,
  .poster,
  .imdbRating,
  directors: [ (n)<-[:DIRECTED]-(d) | d { tmdbId:d.imdbId, .name } ]
}
ORDER BY n.imdbRating DESC LIMIT 4
RETURN collect(n)
```

```
WITH 'Clint Eastwood' AS a, 'high' AS t
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, m, toLower(m.title) AS movieTitle
WHERE p.name = a
AND movieTitle CONTAINS t
RETURN p.name AS actor, m.title AS movie
```

```
WITH 'Tom Hanks' AS theActor
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = theActor
AND m.revenue IS NOT NULL
// Use WITH here to limit the movie node to 1 and order it by revenue
WITH m ORDER BY m.revenue DESC LIMIT 1
RETURN m.revenue AS revenue, m.title AS title
```

```
MATCH (n:Movie)
WHERE n.imdbRating IS NOT NULL AND n.poster IS NOT NULL
WITH n {
  .title,
  .imdbRating,
  actors: [(a)-[:ACTED_IN]->(n) | a {name:a.name, .name}],
  genre: [(n)-[:IN_GENRE]->(g) | g {name:g.name, .name}]}
ORDER BY n.imdbRating DESC
LIMIT 4
WITH collect(n.actors) AS a
UNWIND a AS b
UNWIND b AS listB
RETURN listB.name, count(listB.name)
ORDER BY listB.name
```

### Aggregation and pipelining

```
MATCH (:Movie {title: 'Toy Story'})-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(m)
WHERE m.imdbRating IS NOT NULL
WITH g.name AS genre,
count(m) AS moviesInCommon,
sum(m.imdbRating) AS total
RETURN genre, moviesInCommon,
total/moviesInCommon AS score
ORDER BY score DESC
```

```
MATCH (u:User {name: "Misty Williams"})-[r:RATED]->(:Movie)
WITH u, avg(r.rating) AS average
MATCH (u)-[r:RATED]->(m:Movie)
WHERE r.rating > average
RETURN average, m.title AS movie,
r.rating AS rating
ORDER BY rating DESC
```

### Using WITH for collecting

```
MATCH (m:Movie)--(a:Actor)
WHERE m.title CONTAINS 'New York'
WITH m, collect(a.name) AS actors,
count(*) AS numActors
RETURN m.title AS movieTitle, actors
ORDER BY numActors DESC
```

```
MATCH (m:Movie)<-[:ACTED_IN]-(a:Actor)
WHERE m.title CONTAINS 'New York'
WITH m, collect(a.name) AS actors,
count(*) AS numActors
ORDER BY numActors DESC
RETURN collect(m { .title, actors, numActors }) AS movies
```

### Using LIMIT early

```
MATCH (p:Actor)
WHERE p.born.year = 1980
WITH p LIMIT 3
MATCH (p)-[:ACTED_IN]->(m:Movie)-[:IN_GENRE]->(g:Genre)
WITH p, collect(DISTINCT g.name) AS genres
RETURN p.name AS actor, genres
```

```
MATCH (a:Actor)-[:ACTED_IN]->(m)
WHERE a.name = 'Tom Hanks'
WITH m
MATCH (m)<-[r:RATED]-(u)
WITH m, avg(r.rating) AS rating
RETURN rating, m.title
ORDER BY rating DESC LIMIT 1
```

# Unwinding Lists

```
MATCH (m:Movie)
UNWIND m.languages AS lang
WITH m, trim(lang) AS language
// language automatically becomes distinct here because it is the grouping key
WITH language, collect(m.title) AS movies
RETURN language, movies[0..10]
```

# Reducing Memory (CALL, UNION)

If the memory required by a set of MATCH clauses exceeds the memory configured for the VM, the query will fail.

A subquery is a set of Cypher statements that execute within their own scope. Important things to know about a subquery:

- A subquery returns values referred to by the variables in the RETURN clause.
- A subquery cannot return variables with the same name used in the enclosing query.
- You must explicitly pass in variables from the enclosing query to a subquery.

### CALL

```
MATCH (m:Movie)
CALL {
  WITH m
  MATCH (m)<-[r:RATED]-(u:User)
  WHERE r.rating = 5
  RETURN count(u) AS numReviews
}
RETURN m.title, numReviews
ORDER BY numReviews DESC
```

### UNION [ALL]

```
MATCH (p:Person)
WITH p LIMIT 100
CALL {
  WITH p
  OPTIONAL MATCH (p)-[:ACTED_IN]->(m:Movie)
  RETURN m.title + ": " + "Actor" AS work
UNION
  WITH p
  OPTIONAL MATCH (p)-[:DIRECTED]->(m:Movie)
  RETURN m.title + ": " + "Director" AS work
}
RETURN p.name, collect(work)
```

```
MATCH (g:Genre)
CALL {
  WITH g
  MATCH (m:Movie)-[:IN_GENRE]->(g)
  WHERE 'France' IN m.countries
  RETURN count(m) AS numMovies
}
RETURN g.name AS genre, numMovies
ORDER BY numMovies DESC
```

# Using Parameters

```
:params {actorName: 'Tom Cruise', movieName: 'Top Gun'}
```

```
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = $actorName
RETURN m.released AS releaseDate,
m.title AS title
ORDER BY m.released DESC
```

```
:params {actorName: 'Tom Cruise', movieName: 'Top Gun', l:2}
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.title = $movieName
RETURN p.name
LIMIT $l
```

### Setting Integers

- `:param number: 10` will be converted to a float!
- `:param number=> 10` remains an integer!

```
:param
```

View all set parameters

```
:param {}
```

Clear all set parameters

# Application Examples Using Parameters

```
from neo4j import GraphDatabase

# Placeholder connection details (assumptions); replace with your own
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

def get_actors(tx, movieTitle):
    result = tx.run("""
        MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
        WHERE m.title = $title
        RETURN p
    """, title=movieTitle)

    # Access the `p` value from each record
    return [ record["p"] for record in result ]

with driver.session() as session:
    # In version 5 of the driver this method is named execute_read
    result = session.read_transaction(get_actors, movieTitle="Toy Story")
```
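
The paging arithmetic from the SKIP/LIMIT section and the `$` parameters shown above can be combined in a transaction function. What follows is a minimal sketch, not a definitive implementation: the names `PAGED_QUERY`, `page_to_skip`, `get_movies_page`, `page` and `page_size` are hypothetical, and the query assumes the same movie graph used throughout these notes. Python `int` values are sent as Cypher integers by the driver, so the float conversion noted under Setting Integers should not apply here.

```python
# Hypothetical paged query: $skip and $limit are supplied as parameters
PAGED_QUERY = """
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = $actorName
RETURN m.title AS title
ORDER BY m.title
SKIP $skip LIMIT $limit
"""


def page_to_skip(page, page_size):
    """Return how many rows to skip so that the 1-based `page` is returned."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return (page - 1) * page_size


def get_movies_page(tx, actor_name, page, page_size=10):
    """Transaction function: `tx` is the transaction passed in by the driver."""
    result = tx.run(
        PAGED_QUERY,
        actorName=actor_name,
        skip=page_to_skip(page, page_size),
        limit=page_size,
    )
    # Access the `title` value from each record
    return [record["title"] for record in result]
```

With a session opened as in the example above, this could be called as `session.read_transaction(get_movies_page, "Tom Cruise", page=5)`; page 5 with 10 rows per page translates to `SKIP 40 LIMIT 10`, matching the paging query earlier in these notes.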