---
title: Refactoring Imported Data
updated: 2022-07-31 20:32:04Z
created: 2022-07-31 17:34:56Z
---

# Transforming String Properties to Dates

### Converting to Date values

`date(property)` converts a string property to a Date value, provided that the string:

- is in a valid date format, e.g. "yyyy-mm-dd"
- is not empty

[Cypher: Temporal (Date/Time) values](https://neo4j.com/docs/cypher-manual/current/syntax/temporal/)

Test for an empty string before converting:

```
MATCH (p:Person)
SET p.born = CASE p.born WHEN "" THEN null ELSE date(p.born) END
WITH p
SET p.died = CASE p.died WHEN "" THEN null ELSE date(p.died) END
```
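
Because `date()` raises an error on a string it cannot parse, it can help to look for malformed values before running the conversion. A minimal sketch, assuming `born` is still stored as a string and that "yyyy-mm-dd" is the only format expected. It checks only the shape of the value, not whether the date itself is valid:

```cypher
// Find born values that are neither empty nor shaped like yyyy-mm-dd.
// Assumes p.born is still a string, i.e. this runs before the conversion.
MATCH (p:Person)
WHERE p.born IS NOT NULL
  AND p.born <> ""
  AND NOT p.born =~ "[0-9]{4}-[0-9]{2}-[0-9]{2}"
RETURN p.born AS invalidBorn, count(*) AS occurrences
ORDER BY occurrences DESC
```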

List all stored **node** types in the database:

```
CALL apoc.meta.nodeTypeProperties()
```

List all stored **relationship** types in the database:

```
CALL apoc.meta.relTypeProperties()
```

# Transforming Multi-value Properties

### Transform Strings to Lists

A multi-value property is:

- stored as a list
- made up of elements that all have the same type

Transform a string to a list, using a separator, e.g. "|":

```
MATCH (m:Movie)
SET m.countries = split(coalesce(m.countries,""), "|")
```

This transforms the multi-value property into a list of strings, which is stored as a StringArray in the database.
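
To see what `split()` produces, it can be run on a literal string. A small sketch; the country names are made-up sample values:

```cypher
// split() returns a list of strings, one per separator-delimited part.
RETURN split("USA|France|UK", "|") AS countries
// countries: ["USA", "France", "UK"]
```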

# Adding labels

Adding more specific labels to nodes is a best practice: key queries perform better, especially when the graph is large.

```
MATCH (p:Person)-[:ACTED_IN]->()
WITH DISTINCT p SET p:Actor
```
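
The same pattern can be repeated for other roles. A sketch, assuming the graph also contains a `DIRECTED` relationship type (an assumption, not something shown in these notes), followed by a count to check the result of the `Actor` labeling:

```cypher
// Hypothetical: DIRECTED is an assumed relationship type.
MATCH (p:Person)-[:DIRECTED]->()
WITH DISTINCT p SET p:Director;

// Check how many nodes now carry the Actor label.
MATCH (a:Actor)
RETURN count(a) AS actors;
```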

# Refactoring Properties as Nodes

To increase performance, create indexes for unique properties such as IDs.
Best practice is to always have a unique ID for every type of node in the graph.

View the constraints defined in the database:

```
SHOW CONSTRAINTS
```

Before using MERGE, first create a uniqueness constraint:

```
// Fails if a constraint with this name already exists
CREATE CONSTRAINT Genre_name ON (g:Genre) ASSERT g.name IS UNIQUE
// Alternative: create the constraint only if it does not exist yet
CREATE CONSTRAINT Genre_name IF NOT EXISTS ON (x:Genre) ASSERT x.name IS UNIQUE
```
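
The `ON ... ASSERT` form is deprecated as of Neo4j 4.4 and is not accepted by Neo4j 5. On newer versions the equivalent constraint would be written with `FOR ... REQUIRE`. A sketch, assuming the same constraint name and property:

```cypher
// Neo4j 4.4+ syntax for the same uniqueness constraint.
CREATE CONSTRAINT Genre_name IF NOT EXISTS
FOR (g:Genre) REQUIRE g.name IS UNIQUE
```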

Create the new nodes from a node property:

```
MATCH (m:Movie)
UNWIND m.genres AS genre
WITH m, genre
MERGE (g:Genre {name:genre})
MERGE (m)-[:IN_GENRE]->(g)
```
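
After the refactoring, a quick aggregation can confirm that the `Genre` nodes and `IN_GENRE` relationships look as expected. A minimal check:

```cypher
// Count the movies attached to each Genre node.
MATCH (g:Genre)<-[:IN_GENRE]-(m:Movie)
RETURN g.name AS genre, count(m) AS movies
ORDER BY movies DESC
```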

For example, UNWIND turns each element of a list into its own row:

```
UNWIND ['aap','olifant'] AS a
RETURN a
```

To remove a node property, set it to null:

```
MATCH (m:Movie)
SET m.genres = null
```
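
Cypher also has a `REMOVE` clause, which has the same effect as setting the property to null:

```cypher
// Equivalent to SET m.genres = null
MATCH (m:Movie)
REMOVE m.genres
```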

Show the schema of the database:

```
CALL db.schema.visualization
```

# Importing Large Datasets with Cypher

Data Importer can be used for small to medium datasets containing fewer than 1M rows.

By default, Cypher executes your code as a single transaction. For a large import, break up the execution into multiple transactions, which reduces the amount of memory needed for the import. In Neo4j Browser:

```
:auto USING PERIODIC COMMIT LOAD CSV ....
```
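
`USING PERIODIC COMMIT` is deprecated as of Neo4j 4.4 and removed in Neo4j 5, where `CALL { ... } IN TRANSACTIONS` takes its place. A sketch of that form; the file name, column names, and batch size are placeholders, not taken from these notes:

```cypher
:auto LOAD CSV WITH HEADERS FROM 'file:///movies.csv' AS row  // hypothetical file
CALL {
  WITH row
  // Hypothetical columns: movieId and title.
  MERGE (m:Movie {movieId: row.movieId})
  SET m.title = row.title
} IN TRANSACTIONS OF 500 ROWS
```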

The advantage of performing the import in multiple passes is that you can check the graph after each import to see if it is getting closer to the data model. If the CSV file were extremely large, you might want to consider a single pass.