title: Refactoring Imported Data
updated: 2022-07-31 20:32:04Z
created: 2022-07-31 17:34:56Z

Transforming String Properties to Dates

Converting to Date values

date(property)

Test for an empty string first, because an empty string cannot be parsed as a date:

MATCH (p:Person)
SET p.born = CASE p.born WHEN "" THEN null ELSE date(p.born) END
WITH p
SET p.died = CASE p.died WHEN "" THEN null ELSE date(p.died) END
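The CASE expression can be tried on literal values before touching any nodes. A minimal sketch, where the sample strings are made up for illustration:

```cypher
// Hypothetical sample values: one valid ISO date string and one empty string
UNWIND ["1964-09-02", ""] AS born
RETURN born,
       CASE born WHEN "" THEN null ELSE date(born) END AS bornDate
```

The row with the empty string should come back with bornDate set to null instead of raising a parse error.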

List all node types stored in the database, with their properties and property types:

CALL apoc.meta.nodeTypeProperties()

List all relationship types stored in the database, with their properties and property types:

CALL apoc.meta.relTypeProperties()

Transforming Multi-value Properties

Transform Strings to Lists

  • a multi-value property is stored as a list
  • all elements of the list must have the same type

Transform a string to a list, using "|" as the separator:

MATCH (m:Movie)
SET m.countries = split(coalesce(m.countries,""), "|")

The multi-value property is now a list of strings, stored as a StringArray in the database.
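The split/coalesce combination can be checked on literal values first. A small sketch, with made-up sample strings:

```cypher
// Hypothetical sample values: two countries, one country, and a missing value
UNWIND ["USA|Germany", "France", null] AS countries
RETURN countries,
       split(coalesce(countries, ""), "|") AS countryList
```

The null row is expected to produce a list holding a single empty string rather than an empty list, so it is worth checking how missing values end up in your own data before running the SET.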

Adding labels

Adding more specific labels is a best practice: key queries perform better, especially when the graph is large.

MATCH (p:Person)-[:ACTED_IN]->()
WITH DISTINCT p SET p:Actor
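One way to check the result is to compare the number of people with an ACTED_IN relationship against the number of nodes carrying the new label. A sketch, assuming the statement above has already run:

```cypher
// Count the people who acted in something
MATCH (p:Person)-[:ACTED_IN]->()
WITH count(DISTINCT p) AS expected
// Count the nodes that now carry the Actor label
MATCH (a:Actor)
RETURN expected, count(a) AS labelled
```

If no node has the Actor label yet, the second MATCH finds nothing and the query returns no rows.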

Refactoring Properties as Nodes

To increase performance, create indexes for unique properties such as IDs. Best practice is to always have a unique ID for every type of node in the graph.

View the constraints defined in the database:

SHOW CONSTRAINTS

Before using MERGE, first create a uniqueness constraint:

CREATE CONSTRAINT Genre_name ON (g:Genre) ASSERT g.name IS UNIQUE
CREATE CONSTRAINT Genre_name IF NOT EXISTS ON (x:Genre) ASSERT x.name IS UNIQUE
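The ON ... ASSERT form is the Neo4j 4.x syntax; it was deprecated in 4.4 and removed in Neo4j 5. A sketch of the equivalent statement, assuming a 4.4 or later server:

```cypher
// Same constraint in the FOR ... REQUIRE syntax (Neo4j 4.4 and later)
CREATE CONSTRAINT Genre_name IF NOT EXISTS
FOR (g:Genre) REQUIRE g.name IS UNIQUE
```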

Creating the new nodes from a node property

MATCH (m:Movie)
UNWIND m.genres AS genre
WITH m, genre
MERGE (g:Genre {name:genre})
MERGE (m)-[:IN_GENRE]->(g)
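After the MERGE, a quick count per genre shows whether the new nodes and relationships look plausible. A minimal check, using only the labels and relationship type from the statement above:

```cypher
// Number of movies connected to each Genre node
MATCH (g:Genre)<-[:IN_GENRE]-(m:Movie)
RETURN g.name AS genre, count(m) AS movies
ORDER BY movies DESC
```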

For example, UNWIND turns a list into one row per element:

UNWIND ['aap','olifant'] AS a
RETURN a

To remove a node property, set it to null:

MATCH (m:Movie)
SET m.genres = null
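Cypher also has a REMOVE clause, which has the same effect as setting the property to null:

```cypher
// Alternative to SET m.genres = null
MATCH (m:Movie)
REMOVE m.genres
```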

Show the schema of a database

CALL db.schema.visualization

Importing Large Datasets with Cypher

Data Importer can be used for small to medium datasets containing fewer than 1M rows.

By default, a Cypher statement executes as a single transaction. For a large import, break the execution up into multiple transactions, which reduces the amount of memory needed for the import. In Neo4j Browser:

:auto USING PERIODIC COMMIT LOAD CSV ....
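USING PERIODIC COMMIT was deprecated in Neo4j 4.4 and removed in Neo4j 5, where CALL { ... } IN TRANSACTIONS takes its place. A sketch of the newer form; the file name movies.csv, the column names, and the batch size are hypothetical and only illustrate the shape of the statement:

```cypher
// Hypothetical CSV file and columns; commits after every 1000 rows
:auto LOAD CSV WITH HEADERS FROM 'file:///movies.csv' AS row
CALL {
  WITH row
  MERGE (m:Movie {movieId: toInteger(row.movieId)})
  SET m.title = row.title
} IN TRANSACTIONS OF 1000 ROWS
```

The :auto prefix is a Neo4j Browser command that runs the statement in an auto-commit transaction, which both forms need.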

The advantage of performing the import in multiple passes is that you can check the graph after each import to see if it is getting closer to the data model. If the CSV file were extremely large, you might want to consider a single pass.