Preparing for Importing Data
Created 2022-07-30 13:45:09Z, updated 2022-07-31 11:38:21Z

What does importing data mean?

Cypher has a built-in clause, LOAD CSV, for importing CSV files. To import JSON or XML files, you must use the APOC library, which can also import CSV. Alternatively, the Neo4j Data Importer lets you import CSV data without writing any Cypher code.
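Here is a minimal sketch of the Cypher-based routes. It assumes a running DBMS with the APOC library installed. The ratings.csv URL comes from the example later in this note, and the JSON and XML URLs are hypothetical placeholders.

```cypher
// Built-in clause: no plugin needed
LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing/ratings.csv'
AS row
RETURN row
LIMIT 3;

// APOC: load a JSON document (hypothetical URL)
CALL apoc.load.json('https://example.com/movies.json')
YIELD value
RETURN value
LIMIT 3;

// APOC: load an XML document (hypothetical URL)
CALL apoc.load.xml('https://example.com/movies.xml')
YIELD value
RETURN value;
```

APOC's CSV counterpart is apoc.load.csv.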

The types of data that you can store as properties in Neo4j include:

  • String
  • Long (integer values)
  • Double (decimal values)
  • Boolean
  • Date/Datetime
  • Point (spatial)
  • StringArray (comma-separated list of strings)
  • LongArray (comma-separated list of integer values)
  • DoubleArray (comma-separated list of decimal values)
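LOAD CSV hands every field to Cypher as a string, so each of these types is usually produced with a conversion function. The sketch below applies those functions to literal strings that stand in for CSV fields. The values and aliases are made up. Encoding list fields with a colon separator is an assumption about the source file.

```cypher
// Convert raw CSV-style strings into the property types listed above
RETURN
  'Tom Hanks'                                AS name,        // String
  toInteger('1956')                          AS born,        // Long
  toFloat('8.5')                             AS rating,      // Double
  toBoolean('true')                          AS isActive,    // Boolean
  date('1956-07-09')                         AS birthDate,   // Date
  datetime('2022-07-30T13:45:09Z')           AS updatedAt,   // Datetime
  point({latitude: 51.5, longitude: -0.12})  AS location,    // Point (WGS-84)
  split('Drama:Comedy', ':')                 AS genres,      // StringArray
  [x IN split('1:2:3', ':') | toInteger(x)]  AS ids,         // LongArray
  [x IN split('1.5:2.5', ':') | toFloat(x)]  AS scores;      // DoubleArray
```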

Two ways that you can import CSV data:

  1. Using the Neo4j Data Importer.
  2. Writing Cypher code to perform the import.

Steps for preparing for importing data

  1. Understand the data in the source CSV files.
  2. Inspect and clean (if necessary) the data in the source data files.
  3. Create or understand the graph data model you will be implementing during the import.

Understanding the Source Data

For CSV files, you must determine:

  • Whether the CSV file has a header row that names the fields.
  • Which delimiter separates the fields in each row.

Including headers in the CSV file reduces syncing issues and is a recommended Neo4j best practice.
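The difference a header row makes can be sketched with the ratings file used later in this note. The column names movieId and userId are assumptions about that file. Without WITH HEADERS, each row is a list accessed by position, and the header line itself comes back as an ordinary data row.

```cypher
// With a header row: fields are accessed by name
LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing/ratings.csv'
AS row
RETURN row.movieId, row.userId
LIMIT 3;

// Without WITH HEADERS: each row is a list, accessed by position
// (the first row returned is the header line itself)
LOAD CSV
FROM 'https://data.neo4j.com/importing/ratings.csv'
AS line
RETURN line[0], line[1]
LIMIT 3;
```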

A Neo4j best practice is to use an ID as a unique property value for each node. If the IDs in your CSV file are not unique for the same entity (node), you will have problems when you load the data and try to create relationships between existing nodes.
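Here is one way to act on this, sketched under assumptions. The first query looks for duplicate IDs in a hypothetical movies.csv placed in the DBMS import directory. The second creates a uniqueness constraint so that duplicates are rejected at load time. The Movie label, the movieId property, and the constraint name are hypothetical. The FOR ... REQUIRE syntax applies to Neo4j 4.4 and later.

```cypher
// Find IDs that appear more than once in the source file
LOAD CSV WITH HEADERS
FROM 'file:///movies.csv'
AS row
WITH row.movieId AS id, count(*) AS occurrences
WHERE occurrences > 1
RETURN id, occurrences;

// Enforce one node per ID in the graph
CREATE CONSTRAINT movie_id_unique IF NOT EXISTS
FOR (m:Movie) REQUIRE m.movieId IS UNIQUE;
```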

Inspecting the Data for Import

Important: By default, every field in each row is read as a string. Use FIELDTERMINATOR if the delimiter is not the default comma (,).
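Here is a minimal sketch of reading a semicolon-delimited file. It assumes a hypothetical ratings_semicolon.csv in the import directory with a rating column. The conversion in the RETURN clause shows that numeric fields still arrive as strings.

```cypher
// FIELDTERMINATOR follows the AS clause; use '\t' for tab-separated files
LOAD CSV WITH HEADERS
FROM 'file:///ratings_semicolon.csv'
AS row
FIELDTERMINATOR ';'
// row.rating is a string such as '4.5'; toFloat converts it to a number
RETURN row.rating AS raw, toFloat(row.rating) AS rating
LIMIT 3;
```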

Test whether all rows in the CSV file can be read, for example by counting them:

```cypher
LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing/ratings.csv'
AS row
RETURN count(row)
```
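Before converting anything, it can help to look at a few raw rows alongside the field names that LOAD CSV took from the header. Here is a small sketch that reuses the same file:

```cypher
// Show the header-derived keys and the raw (string) values of the first rows
LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing/ratings.csv'
AS row
WITH row
LIMIT 5
RETURN keys(row) AS fields, row;
```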

Is the data clean?

Check:

  • Are quotes used correctly?
  • If an element has no value, is an empty string used?
  • Are UTF-8 prefixes used (for example \uc)?
  • Do some fields have trailing spaces?
  • Do the fields contain binary zeros?
  • Understand how lists are formed; by default, a colon (:) separates list elements.
  • Any obvious typos?
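Several of these checks can be run in Cypher before the real import. The sketch below assumes a hypothetical movies.csv in the import directory with title and genres columns. The first query counts problem values in one field. The second previews how a colon-separated list field splits.

```cypher
// Count missing, empty, and untrimmed values in the title column
// (count ignores the nulls produced when a CASE condition is false)
LOAD CSV WITH HEADERS
FROM 'file:///movies.csv'
AS row
RETURN
  count(CASE WHEN row.title IS NULL THEN 1 END)            AS missing,
  count(CASE WHEN row.title = '' THEN 1 END)               AS emptyStrings,
  count(CASE WHEN row.title <> trim(row.title) THEN 1 END) AS untrimmed;

// Preview how a colon-separated list field will be split
LOAD CSV WITH HEADERS
FROM 'file:///movies.csv'
AS row
RETURN row.genres AS raw, split(row.genres, ':') AS genreList
LIMIT 5;
```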

Overview of the Neo4j Data Importer

The benefit of the Data Importer is that you do not need to know Cypher to load the data. It is useful for loading small to medium CSV files that contain fewer than 1M rows. Imported data can be interpreted as string, integer, float, or boolean values.

Requirements for using the Data Importer

  • You must use CSV files for import.
  • CSV files must reside on your local system so you can load them into the graph app.
  • CSV data must be clean.
  • IDs must be unique for all nodes you will be creating.
  • The CSV file must have headers.
  • The DBMS must be started.

If you have de-normalized data, you will need to perform a multi-pass import. That is, you cannot create multiple nodes and relationship types from a single CSV file.
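The Data Importer handles this through its mapping UI. The same idea written in Cypher is a series of passes over one denormalized file. This is a sketch that assumes a hypothetical acted_in.csv whose rows repeat person and movie details (personId, name, movieId, title). The labels, properties, and relationship type are illustrative. Pass 3 relies on the IDs being unique, as described above.

```cypher
// Pass 1: create one Person node per unique personId
LOAD CSV WITH HEADERS FROM 'file:///acted_in.csv' AS row
MERGE (p:Person {personId: row.personId})
SET p.name = row.name;

// Pass 2: create one Movie node per unique movieId
LOAD CSV WITH HEADERS FROM 'file:///acted_in.csv' AS row
MERGE (m:Movie {movieId: row.movieId})
SET m.title = row.title;

// Pass 3: connect the existing nodes using their unique IDs
LOAD CSV WITH HEADERS FROM 'file:///acted_in.csv' AS row
MATCH (p:Person {personId: row.personId})
MATCH (m:Movie {movieId: row.movieId})
MERGE (p)-[:ACTED_IN]->(m);
```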

The Neo4j Data Importer can export its mappings to a JSON file, or to a ZIP file if you also want to include the CSV files, and can import such files again later.