title | updated | created |
---|---|---|
Refactoring Imported Data | 2022-07-31 20:32:04Z | 2022-07-31 17:34:56Z |
Transforming String Properties to Dates
Converting to Date values
date(property) requires that the string value is
- in a correct date format, e.g. "yyyy-mm-dd"
- not empty
Reference: Cypher: Temporal (Date/Time) values
Test for an empty string and store null instead; otherwise convert the value to a date:
MATCH (p:Person)
SET p.born = CASE p.born WHEN "" THEN null ELSE date(p.born) END
WITH p
SET p.died = CASE p.died WHEN "" THEN null ELSE date(p.died) END
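Because date() raises an error on strings it cannot parse, a check like the following can be run before the conversion. This is only a minimal sketch: it assumes the values are still stored as strings and that a p.name property exists for display, and it tests only the "yyyy-mm-dd" shape, not whether the date itself is valid.

```cypher
// Run before the conversion, while p.born is still a string.
// Lists values that are neither empty nor shaped like yyyy-mm-dd.
MATCH (p:Person)
WHERE p.born IS NOT NULL
  AND p.born <> ""
  AND NOT (p.born =~ '\\d{4}-\\d{2}-\\d{2}')
RETURN p.name AS name, p.born AS born   // p.name is an assumed property
LIMIT 25
```

The same check can be repeated for p.died.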
List all stored node types in the database:
CALL apoc.meta.nodeTypeProperties()
List all stored relationship types in the database:
CALL apoc.meta.relTypeProperties()
Transforming Multi-value Properties
Transform Strings to Lists
- a multi-value property should be stored as a list
- all elements of the list must have the same type
Transform a string into a list using a separator, e.g. "|"
MATCH (m:Movie)
SET m.countries = split(coalesce(m.countries,""), "|")
This transforms a multi-value property into a list of strings, stored as a StringArray in the database.
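Once the property is stored as a list, membership can be tested with IN. A small sketch, assuming Movie nodes have a title property and that "USA" is one of the stored values (both are illustrative assumptions, not taken from the data):

```cypher
// Find movies whose countries list contains a given value.
MATCH (m:Movie)
WHERE "USA" IN m.countries
RETURN m.title AS title, m.countries AS countries   // m.title is assumed
LIMIT 10
```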
Adding labels
Adding labels is a best practice because key queries perform better with them, especially when the graph is large.
MATCH (p:Person)-[:ACTED_IN]->()
WITH DISTINCT p SET p:Actor
Refactoring Properties as Nodes
Increase performance
For unique properties, such as IDs, create indexes. Best practice is to always have a unique ID for every type of node in the graph.
View the constraints defined in the database:
SHOW CONSTRAINTS
Before using MERGE, first create a uniqueness constraint:
CREATE CONSTRAINT Genre_name ON (g:Genre) ASSERT g.name IS UNIQUE
CREATE CONSTRAINT Genre_name IF NOT EXISTS ON (x:Genre) ASSERT x.name IS UNIQUE
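The ON ... ASSERT form above is the Neo4j 4.x syntax. From Neo4j 4.4 onward the FOR ... REQUIRE form is available, and Neo4j 5 no longer accepts the older form. A sketch of the same constraint in the newer syntax:

```cypher
// Same uniqueness constraint on Genre.name, written for Neo4j 4.4+ / 5.
CREATE CONSTRAINT Genre_name IF NOT EXISTS
FOR (g:Genre) REQUIRE g.name IS UNIQUE
```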
Creating the new nodes from a node property
MATCH (m:Movie)
UNWIND m.genres AS genre
WITH m, genre
MERGE (g:Genre {name:genre})
MERGE (m)-[:IN_GENRE]->(g)
Example of UNWIND, which expands a list into one row per element:
UNWIND ['aap','olifant'] AS a
RETURN a
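Before removing the source property, it can be useful to confirm that the new nodes and relationships were created as expected. A minimal sketch that counts movies per genre, using only the labels and relationship type from the MERGE query above:

```cypher
// Count how many movies are linked to each Genre node.
MATCH (g:Genre)<-[:IN_GENRE]-(m:Movie)
RETURN g.name AS genre, count(m) AS movies
ORDER BY movies DESC
```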
Remove a node property by setting it to null:
MATCH (m:Movie)
SET m.genres = null
Show the schema of a database
CALL db.schema.visualization
Importing Large Datasets with Cypher
The Data Importer can be used for small to medium datasets containing fewer than 1M rows.
In Cypher, by default, your code executes as a single transaction. Breaking up the execution into multiple transactions reduces the amount of memory needed for the import. In Neo4j Browser:
:auto USING PERIODIC COMMIT LOAD CSV ....
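USING PERIODIC COMMIT is the Neo4j 4.x mechanism. Neo4j 4.4 introduced CALL { ... } IN TRANSACTIONS as its replacement, and Neo4j 5 removed USING PERIODIC COMMIT. Below is a sketch of the newer form; the file name, column names, id property, and batch size are hypothetical placeholders, not values from these notes.

```cypher
:auto LOAD CSV WITH HEADERS FROM 'file:///persons.csv' AS row
CALL {
  WITH row
  // person_id and name are assumed column names in the hypothetical file
  MERGE (p:Person {id: row.person_id})
  SET p.name = row.name
} IN TRANSACTIONS OF 1000 ROWS
```

Each batch of 1000 rows is committed in its own transaction, which keeps memory use bounded in the same way as the periodic commit.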
The advantage of performing the import in multiple passes is that you can check the graph after each import to see if it is getting closer to the data model. If the CSV file were extremely large, you might want to consider a single pass.