---
title: Refactoring Imported Data
updated: 2022-07-31 20:32:04Z
created: 2022-07-31 17:34:56Z
---
# Transforming String Properties to Dates
### Converting to Date values
`date(property)` converts a string property to a Date value, provided that:
- the string is in a correct ISO format, e.g. "yyyy-mm-dd"
- the string is not empty
[Cypher: Temporal (Date/Time) values](https://neo4j.com/docs/cypher-manual/current/syntax/temporal/)
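As a minimal illustration, `date()` can also be called on a literal string:
```
RETURN date("2022-07-31") AS d
```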
Guard against empty strings before converting:
```
MATCH (p:Person)
SET p.born = CASE p.born WHEN "" THEN null ELSE date(p.born) END
WITH p
SET p.died = CASE p.died WHEN "" THEN null ELSE date(p.died) END
```
List all stored **node** types and their properties:
```
CALL apoc.meta.nodeTypeProperties()
```
List all stored **relationship** types and their properties:
```
CALL apoc.meta.relTypeProperties()
```
# Transforming Multi-value Properties
### Transform Strings to Lists
A multi-value property should be stored as:
- a list
- whose elements all have the same type

Transform a string into a list, given a separator, e.g. "|":
```
MATCH (m:Movie)
SET m.countries = split(coalesce(m.countries,""), "|")
```
Transforming a multi-value property to a list of strings stores it as a StringArray in the database.
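A minimal illustration of `split()` on a literal string:
```
RETURN split("UK|France|Germany", "|") AS countries
// → ["UK", "France", "Germany"]
```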
# Adding labels
Adding labels is a best practice: key queries perform better, especially when the graph is large.
```
MATCH (p:Person)-[:ACTED_IN]->()
WITH DISTINCT p SET p:Actor
```
# Refactoring Properties as Nodes
Refactoring properties into nodes can increase query performance.
For unique properties, such as IDs, create indexes.
A best practice is to have a unique ID for every type of node in the graph.
View the constraints defined in the database:
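As a sketch of creating such an index (assuming a hypothetical `Movie` node with a `movieId` property):
```
CREATE INDEX Movie_movieId IF NOT EXISTS FOR (m:Movie) ON (m.movieId)
```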
```
SHOW CONSTRAINTS
```
Before using MERGE, first create a unique constraint:
```
CREATE CONSTRAINT Genre_name ON (g:Genre) ASSERT g.name IS UNIQUE
CREATE CONSTRAINT Genre_name IF NOT EXISTS ON (x:Genre) ASSERT x.name IS UNIQUE
```
Create new nodes from a node property:
```
MATCH (m:Movie)
UNWIND m.genres AS genre
WITH m, genre
MERGE (g:Genre {name:genre})
MERGE (m)-[:IN_GENRE]->(g)
```
Compare unwinding a literal list:
```
UNWIND ['aap', 'olifant'] AS a
RETURN a
```
To remove a node property, set it to `null`:
```
MATCH (m:Movie)
SET m.genres = null
```
Show the schema of the database:
```
CALL db.schema.visualization()
```
# Importing Large Datasets with Cypher
Data Importer can be used for small to medium datasets containing fewer than 1M rows.
In Cypher, by default, your code executes as a single transaction. For large imports, break the execution into multiple transactions; this reduces the amount of memory needed for the import. In Neo4j:
```
:auto USING PERIODIC COMMIT LOAD CSV ....
```
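A sketch of how this might look in full (the file name `movies.csv` and its columns are assumptions for illustration):
```
:auto USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file:///movies.csv' AS row
MERGE (m:Movie {movieId: toInteger(row.movieId)})
SET m.title = row.title
```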
The advantage of performing the import in multiple passes is that you can check the graph after each pass to see whether it is getting closer to the data model. If the CSV file is extremely large, you might want to consider a single pass instead.
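A multi-pass import might look like this (the file `ratings.csv` and its columns are assumptions; each pass is run as its own command):
```
// Pass 1: create the nodes
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///ratings.csv' AS row
MERGE (p:Person {personId: toInteger(row.personId)})

// Pass 2: create the relationships, now that the nodes exist
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///ratings.csv' AS row
MATCH (p:Person {personId: toInteger(row.personId)})
MATCH (m:Movie {movieId: toInteger(row.movieId)})
MERGE (p)-[:RATED]->(m)
```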