---
title: Refactoring Imported Data
updated: 2022-07-31 20:32:04Z
created: 2022-07-31 17:34:56Z
---
# Transforming String Properties to Dates
### Converting to Date values
`date(property)` converts a string property to a Date value, provided that:
- the string is in a correct ISO format, e.g. "yyyy-mm-dd"
- the string is not empty
[Cypher: Temporal (Date/Time) values](https://neo4j.com/docs/cypher-manual/current/syntax/temporal/)
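As a minimal illustration, `date()` can also be called on a literal string:
```
RETURN date("2022-07-31") AS d
```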
Guard against empty strings before converting:
```
MATCH (p:Person)
SET p.born = CASE p.born WHEN "" THEN null ELSE date(p.born) END
WITH p
SET p.died = CASE p.died WHEN "" THEN null ELSE date(p.died) END
```
List all stored **node** types and their properties:
```
CALL apoc.meta.nodeTypeProperties()
```
List all stored **relationship** types and their properties:
```
CALL apoc.meta.relTypeProperties()
```
# Transforming Multi-value Properties
### Transform Strings to Lists
A multi-value property should be stored as:
- a list
- whose elements all have the same type

Transform a string into a list, given a separator, e.g. "|":
```
MATCH (m:Movie)
SET m.countries = split(coalesce(m.countries,""), "|")
```
Transforming a multi-value property to a list of strings stores it as a StringArray in the database.
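A minimal illustration of `split()` on a literal string:
```
RETURN split("UK|France|Germany", "|") AS countries
// → ["UK", "France", "Germany"]
```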
# Adding labels
Adding labels is a best practice: key queries perform better, especially when the graph is large.
```
MATCH (p:Person)-[:ACTED_IN]->()
WITH DISTINCT p SET p:Actor
```
# Refactoring Properties as Nodes
Refactoring properties into nodes can increase query performance.
For unique properties, such as IDs, create indexes.
A best practice is to have a unique ID for every type of node in the graph.
View the constraints defined in the database:
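As a sketch of creating such an index (assuming a hypothetical `Movie` node with a `movieId` property):
```
CREATE INDEX Movie_movieId IF NOT EXISTS FOR (m:Movie) ON (m.movieId)
```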
```
SHOW CONSTRAINTS
```
Before using MERGE, first create a unique constraint:
```
CREATE CONSTRAINT Genre_name ON (g:Genre) ASSERT g.name IS UNIQUE
CREATE CONSTRAINT Genre_name IF NOT EXISTS ON (x:Genre) ASSERT x.name IS UNIQUE
```
Create new nodes from a node property:
```
MATCH (m:Movie)
UNWIND m.genres AS genre
WITH m, genre
MERGE (g:Genre {name:genre})
MERGE (m)-[:IN_GENRE]->(g)
```
Compare unwinding a literal list:
```
UNWIND ['aap', 'olifant'] AS a
RETURN a
```
To remove a node property, set it to `null`:
```
MATCH (m:Movie)
SET m.genres = null
```
Show the schema of the database:
```
CALL db.schema.visualization()
```
# Importing Large Datasets with Cypher
Data Importer can be used for small to medium datasets containing fewer than 1M rows.
In Cypher, by default, your code executes as a single transaction. For large imports, break the execution into multiple transactions; this reduces the amount of memory needed for the import. In Neo4j:
```
:auto USING PERIODIC COMMIT LOAD CSV ....
```
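A sketch of how this might look in full (the file name `movies.csv` and its columns are assumptions for illustration):
```
:auto USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file:///movies.csv' AS row
MERGE (m:Movie {movieId: toInteger(row.movieId)})
SET m.title = row.title
```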
The advantage of performing the import in multiple passes is that you can check the graph after each pass to see whether it is getting closer to the data model. If the CSV file is extremely large, you might want to consider a single pass instead.
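A multi-pass import might look like this (the file `ratings.csv` and its columns are assumptions; each pass is run as its own command):
```
// Pass 1: create the nodes
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///ratings.csv' AS row
MERGE (p:Person {personId: toInteger(row.personId)})

// Pass 2: create the relationships, now that the nodes exist
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///ratings.csv' AS row
MATCH (p:Person {personId: toInteger(row.personId)})
MATCH (m:Movie {movieId: toInteger(row.movieId)})
MERGE (p)-[:RATED]->(m)
```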