Neo4j ETL

概述

Neo4j Desktop 中的 Neo4j-ETL 用户界面
管理多个 RDBMS 连接
自动从关系型数据库中提取数据库元数据
推导图模型
可视化编辑标签、关系类型、属性名称及类型
将当前模型可视化为图形
将映射持久化为 JSON
从关系型数据库中检索相关的 CSV 数据
通过 neo4j-import、bolt-connector、cypher-shell、neo4j-shell 运行导入
内置 MySQL、PostgreSQL 支持，允许在 Neo4j 企业版中使用自定义 JDBC 驱动

许可证

本工具在 NEO4J 预发布许可协议下授权使用。

问题、反馈与贡献

您可以提交 GitHub Issues 或将反馈发送至 feedback@neo4j.com
也请加入我们的 neo4j-users Slack 并在 #help-import 频道中提问

下载与运行

下载并解压最新的 neo4j-etl.zip。

命令使用示例

最小命令行

./bin/neo4j-etl export \
 --rdbms:url <url> --rdbms:user <user> --rdbms:password <password> \
 --destination $NEO4J_HOME/data/databases/graph.db/ --import-tool $NEO4J_HOME/bin \
 --csv-directory $NEO4J_HOME/import

完整的命令行选项集

./bin/neo4j-etl export \
 --rdbms:url <url> --rdbms:user <user> --rdbms:password <password> --rdbms:schema <schema> \
 --using { bulk:neo4j-import | cypher:neo4j-shell | cypher:shell | cypher:direct | cypher:batch | cypher:fromSQL } \
 --neo4j:url <neo4j url> --neo4j:user <neo4j user> --neo4j:password <neo4j password> \
 --destination $NEO4J_HOME/data/databases/graph.db/ --import-tool $NEO4J_HOME/bin \
 --csv-directory $NEO4J_HOME/import --options-file import-tool-options.json --force --debug

用于 cypher:batch 和 cypher:fromSQL 导入模式的附加命令行选项

 --unwind-batch-size <value> (Batch size that will be used for unwind data) \
 --tx-batch-size <value> (Transaction Batch size that will be used for unwind commit) \

有关详细用法，请参阅：工具文档。

Neo4j-Desktop

您可以通过添加相应的应用程序密钥将 Neo4j ETL 添加到 Neo4j Desktop。请咨询您的 Neo4j 联系人或发送电子邮件至 devrel@neo4j.com

下次启动 Neo4j Desktop 时，您将看到 Neo4j ETL 作为一个交互式 UI 界面。

配置驱动程序

加载映射

编辑映射

导入数据

driver

load mapping

edit mapping

import data

如果您想测试对 Neo4j Desktop UI 的更改，可以尝试

进行必要的更改
执行 npm install && yarn start 以验证 UI 是否运行正常
在 Neo4j Desktop 中安装生产版 ETL 工具
替换 $DESKTOP 中的相关 JavaScript
重新运行 Neo4j Desktop 应用并验证 UI 和功能是否符合预期

JDBC 驱动程序

MySQL 和 PostgreSQL 的驱动程序已内置于 Neo4j-ETL 工具中。

要使用其他 JDBC 驱动程序，请使用这些下载链接和 JDBC URL。将 JDBC 驱动程序 jar 文件提供给命令行工具或 Neo4j-ETL 应用程序。并在 --rdbms:url 参数中或 JDBC-URL 输入字段中使用该 JDBC-URL。

数据库 JDBC-URL 驱动程序来源

数据库	JDBC-URL	驱动程序来源
Oracle	`jdbc:oracle:thin:<user>/<pass>@<host>:<port>/<service_name>`	Oracle JDBC 驱动程序
MS SQLServer	`jdbc:sqlserver://;servername=<servername>;databaseName=<database>;user=<user>;password=<pass>`	SQLServer 驱动程序
IBM DB2	`jdbc:db2://<host>:<port/5021>/<database>:user=<user>;password=<pass>;`	DB2 驱动程序
Derby	`jdbc:derby:derbyDB`	JDK6 起已包含
Cassandra	`jdbc:cassandra://<host>:<port/9042>/<database>`	Cassandra JDBC 封装器
SAP Hana	`jdbc:sap://<host>:<port/39015>/?user=<user>&password=<pass>`	SAP Hana ngdbc 驱动程序
MySQL	`jdbc:mysql://<hostname>:<port/3306>/<database>?user=<user>&password=<pass>`	MySQL 驱动程序
PostgreSQL	`jdbc:postgresql://<hostname>/<database>?user=<user>&password=<pass>`	PostgreSQL JDBC 驱动程序

Oracle

jdbc:oracle:thin:<user>/<pass>@<host>:<port>/<service_name>

Oracle JDBC 驱动程序

MS SQLServer

jdbc:sqlserver://;servername=<servername>;databaseName=<database>;user=<user>;password=<pass>

SQLServer 驱动程序

IBM DB2

jdbc:db2://<host>:<port/5021>/<database>:user=<user>;password=<pass>;

DB2 驱动程序

Derby

jdbc:derby:derbyDB

JDK6 起已包含

Cassandra

jdbc:cassandra://<host>:<port/9042>/<database>

Cassandra JDBC 封装器

SAP Hana

jdbc:sap://<host>:<port/39015>/?user=<user>&password=<pass>

SAP Hana ngdbc 驱动程序

MySQL

jdbc:mysql://<hostname>:<port/3306>/<database>?user=<user>&password=<pass>

MySQL 驱动程序

PostgreSQL

jdbc:postgresql://<hostname>/<database>?user=<user>&password=<pass>

PostgreSQL JDBC 驱动程序

简介

Neo4j ETL，特别是 neo4j-etl 命令行工具，可用于将建模良好（即规范化）的关系数据导入 Neo4j。它应用了一些简单的规则来转换关系模型。

流程概述如下

读取数据库元数据并生成 mapping.json
通过 Neo4j Desktop 中的 neo4j-etl-ui 选择性地编辑 mapping.json
将关系数据导出为 CSV
生成映射头信息
导入 Neo4j，使用
1. 用于初始离线批量加载的 neo4j-import 工具
2. 用于增量离线批量加载的 neo4j-shell 工具
3. 用于增量在线单事务加载的 cypher-shell 工具
4. 用于增量在线批处理加载的 java bolt driver

架构图

它是什么

命令行工具
Java API/库
推断架构并保存到映射文件中
过滤和合并策略
读取映射文件以从其他数据库导出数据，然后
通过不同工具（neo4j-import, neo4j-shell, cypher-shell, java bolt driver）导入 Neo
支持离线和在线模式工作
支持在空图（初始加载）和非空图（增量）中导入
构建索引和约束
支持类 Unix 系统和微软操作系统
支持大多数主流关系数据库，如 MySQL、PostgreSQL、Oracle 和 Microsoft SQL
支持用户指定的 JDBC 驱动程序
用于可视化修改映射的 UI 工具

未来计划

针对名称、数据、链接的自定义映射规则 + 转换
集成到第三方 ETL 管道中的示例
更多数据类型（二进制、日期时间、地理位置）

适用对象

学习使用 Neo4j 进行初始数据导入的开发者
提供 Neo4j 数据集成方案的合作伙伴
基于良好建模的关系数据构建应用的企业级开发者

悬而未决的问题

日期和二进制数据类型
安全性（安全连接、密码处理、数据加密）

neo4j-etl 命令行工具

这是您用来从关系型数据库中检索和映射元数据，并驱动从关系库导出及向 Neo4j 数据库导入的命令行工具。

通过图形用户界面，您可以预览生成的图形数据模型，并最终通过更改标签、属性名称、关系类型和属性类型进行调整。

它支持所有带有 JDBC 驱动程序的关系数据库，如 MySQL、PostgreSQL、Oracle 和 Microsoft SQL。

您可以从 GitHub 获取导入工具的最新版本。

下载并解压缩操作系统特定的 zip / tar.gz 后，您还需要下载适当的 JDBC 驱动程序并将其添加到 lib 文件夹中。

您可以按照下表中的相应链接下载适当的驱动程序 jar 文件

供应商	JDBC 驱动程序 URL
MySql	https://dev.mysqlserver.cn/downloads/connector/j/
PostgreSql	https://jdbc.postgresql.ac.cn/download.html
Oracle	http://www.oracle.com/technetwork/database/features/jdbc/default-2280470.html
Microsoft SQL Server	https://www.microsoft.com/en-us/download/details.aspx?id=55539

供应商

JDBC 驱动程序 URL

MySql

https://dev.mysqlserver.cn/downloads/connector/j/

PostgreSql

https://jdbc.postgresql.ac.cn/download.html

Oracle

http://www.oracle.com/technetwork/database/features/jdbc/default-2280470.html

Microsoft SQL Server

https://www.microsoft.com/en-us/download/details.aspx?id=55539

对于超大型数据库，请确保有足够的磁盘空间用于 CSV 导出和 Neo4j 数据存储，并拥有足够的 RAM 和 CPU 以快速完成导入。

可用命令

'generate-metadata-mapping' 命令

NAME
        neo4j-etl generate-metadata-mapping - Create RDBMS to Neo4j metadata
        mapping Json.

SYNOPSIS
        neo4j-etl generate-metadata-mapping
                [ {--columns | --cols} <Columns>... ]
                [ --config-file <Configuration File> ]
                [ {-d | --database} <RDBMS database> ] [ --debug ]
                [ --delimiter <delimiter> ] [ {--driver | --jars} <--driver <PATH_TO_YOUR_JAR1> --driver <PATH_TO_YOUR_JAR2>>... ]
                [ {--exclusion-mode | --exc} <Tables: exclude|include|none(default)> ]
                [ {--exclusion-mode-column-type | --exctype} <exclude|include|none(default)> ]
                [ {--exclusion-mode-columns | --excc} <exclude|include|none(default)> ]
                [ {--exclusion-mode-tables | --exct} <exclude|include|none(default)> ]
                [ --options-file <option file> ] [ --output-mapping-file <file|stdout> ]
                [ {-p | --port} <RDBMS port> ] [ --quote <quote> ]
                [ {--rdbms:fetch-size | --fs} <fetch-size> ]
                [ {--rdbms:password | --password} <RDBMS password> ]
                [ {--rdbms:schema | -s | --schema} <schema> ]
                [ {--rdbms:url | --url} <RDBMS url> ]
                [ {--rdbms:user | -u | --user} <RDBMS user> ]
                [ {--relationship-name | --rel-name} <table(default)|column> ]
                [ --schemas <Schemas>... ] [ {--tables | --tabs} <Tables>... ]
                [ --tiny-int <byte(default)|boolean> ] [ --types <Types>... ] [--] [ <table1 table2 ...>... ]

OPTIONS
        --columns <Columns>, --cols <Columns>
            Lists all columns to include/exclude by name or pattern
            Use '-r' <PATTERN> to filter by regex, ex. '-r .*\.orders\..*_id' or
            'northwind\.orders\..*_id' ,
            '-g' <PATTERN> for grep syntax, ex. '-g .*\.orders\..*_id' or
            'northwind\.orders\..*_id' ,
            or '-l' <LIST> to list all columns names ex. '-l
            northwind.customers.id,northwind.purchase.id,northwind.orders.id'

        --config-file <Configuration File>
            Specify the path to a file containing the configuration for the
            selected command

        -d <RDBMS database>, --database <RDBMS database>
            RDBMS database.

            This option is required if any of the following options are
            specified: host


        --debug
            Print detailed diagnostic output.

        --delimiter <delimiter>
            Delimiter to separate fields in CSV.

        --driver <--driver <PATH_TO_YOUR_JAR1> --driver <PATH_TO_YOUR_JAR2>>, --jars <--driver <PATH_TO_YOUR_JAR1> --driver <PATH_TO_YOUR_JAR2>>
            List of additional drivers as a list

        --exclusion-mode <Tables: exclude|include|none(default)>, --exc <Tables: exclude|include|none(default)>
            Specifies how to handle table exclusion. Options are mutually
            exclusive.
            exclude: Excludes specified tables from the process. All other
            tables will be included.
            include: Includes specified tables only. All other tables will be
            excluded.
            none: All tables are included in the process.

        --exclusion-mode-column-type <exclude|include|none(default)>, --exctype
        <exclude|include|none(default)>
            Specifies how to handle column type exclusion. Options are mutually
            exclusive.
            exclude: Excludes specified columns types from the process. All
            other columns types will be included.
            include: Includes specified columns types only. All other columns
            types will be excluded.
            none: All columns types are included in the process.

        --exclusion-mode-columns <exclude|include|none(default)>, --excc
        <exclude|include|none(default)>
            Specifies how to handle column exclusion. Options are mutually
            exclusive.
            exclude: Excludes specified columns from the process. All other
            columns will be included.
            include: Includes specified columns only. All other columns will be
            excluded.
            none: All columns are included in the process.

        --exclusion-mode-tables <exclude|include|none(default)>, --exct
        <exclude|include|none(default)>
            Specifies how to handle table exclusion. Options are mutually
            exclusive.
            exclude: Excludes specified tables from the process. All other
            tables will be included.
            include: Includes specified tables only. All other tables will be
            excluded.
            none: All tables are included in the process.

        --options-file <option file>
            Path to file containing Neo4j import tool options.

        --output-mapping-file <file|stdout>
            Path to the output metadata mapping file.

        -p <RDBMS port>, --port <RDBMS port>
            Port number to use for connection to RDBMS.

        --quote <quote>
            Character to treat as quotation character for values in CSV data.

        --rdbms:fetch-size <fetch-size>, --fs <fetch-size>
            RDBMS Fetch size

        --rdbms:password <RDBMS password>, --password <RDBMS password>
            Password for login to RDBMS.

            This option is required if any of the following options are
            specified: --rdbms:url, --url


        --rdbms:schema <schema>, -s <schema>, --schema <schema>
            RDBMS schema.

        --rdbms:url <RDBMS url>, --url <RDBMS url>
            Url to use for connection to RDBMS.

        --rdbms:user <RDBMS user>, -u <RDBMS user>, --user <RDBMS user>
            User for login to RDBMS.

            This option is required if any of the following options are
            specified: --rdbms:url, --url


        --relationship-name <table(default)|column>, --rel-name
        <table(default)|column>
            Specifies whether to get the name for relationships from table names
            or column names.

        --schemas <Schemas>
            Lists all schemas to include by name or pattern.
            Use '-r' <PATTERN> to filter by regex, ex. '-r .*\.north.*',
            '-g' <PATTERN> for grep syntax, ex. '-g .*\.north.*' ,
            or '-l' <LIST> to list all schemas names ex. '-l northwind,exc'

        --tables <Tables>, --tabs <Tables>
            Lists all tables to include/exclude by name or pattern.
            Use '-r' <PATTERN> to filter by regex, ex. '-r .*\.purchase.*' or
            'northwind.purchase.*' ,
            '-g' <PATTERN> for grep syntax, ex. '-g .*\.purchase.*' or
            'northwind.purchase.*' ,
            or '-l' <LIST> to list all tables names ex. '-l
            customers,purchase,orders'

        --tiny-int <byte(default)|boolean>
            Specifies whether to convert TinyInt to byte or boolean

        --types <Types>
            Lists all column types to include/exclude by name separated by
            commas. Valid values:
            unknown,
            binary,
            bit,
            character,
            id,
            integer,
            real,
            reference,
            temporal,
            url,
            xml,
            large_object,
            object;

        --
            This option can be used to separate command-line options from the
            list of arguments (useful when arguments might be mistaken for
            command-line options)

        <table1 table2 ...>
            Tables to be excluded/included

'export' 命令

NAME
        neo4j-etl export - Export from RDBMS and import into NEO4J via CSV
        files.

SYNOPSIS
        neo4j-etl export [ {--columns | --cols} <Columns>... ]
                [ --config-file <Configuration File> ]
                [ --csv-directory <csv directory> ]
                [ {-d | --database} <RDBMS database> ] [ --debug ]
                [ --delimiter <delimiter> ] [ --destination <directory> ] [ {--driver | --jars} <--driver <PATH_TO_YOUR_JAR1> --driver <PATH_TO_YOUR_JAR2>>... ]
                [ {--exclusion-mode | --exc} <Tables: exclude|include|none(default)> ]
                [ {--exclusion-mode-column-type | --exctype} <exclude|include|none(default)> ]
                [ {--exclusion-mode-columns | --excc} <exclude|include|none(default)> ]
                [ {--exclusion-mode-tables | --exct} <exclude|include|none(default)> ]
                [ --force ] [ --import-tool <import tool path> ]
                [ --mapping-file <file|stdin> ] [ {--neo4j:password | --graph:password | --graph:neo4j:password} <neo4j password> ]
                [ {--neo4j:url | --graph:url | --graph:neo4j:url} <neo4j url> ]
                [ {--neo4j:user | --graph:user | --graph:neo4j:user} <neo4j user> ]
                [ --options-file <option file> ] [ --output-mapping-file <file|stdout> ]
                [ {-p | --port} <RDBMS port> ] [ --quote <quote> ]
                [ {--rdbms:fetch-size | --fs} <fetch-size> ]
                [ {--rdbms:password | --password} <RDBMS password> ]
                [ {--rdbms:schema | -s | --schema} <schema> ]
                [ {--rdbms:url | --url} <RDBMS url> ]
                [ {--rdbms:user | -u | --user} <RDBMS user> ]
                [ {--relationship-name | --rel-name} <table(default)|column> ]
                [ --schemas <Schemas>... ] [ {--tables | --tabs} <Tables>... ]
                [ --tiny-int <byte(default)|boolean> ] [ --types <Types>... ]
                [ --using <import tool> ] [--] [ <table1 table2 ...>... ]

OPTIONS
        --columns <Columns>, --cols <Columns>
            Lists all columns to include/exclude by name or pattern
            Use '-r' <PATTERN> to filter by regex, ex. '-r .*\.orders\..*_id' or
            'northwind\.orders\..*_id' ,
            '-g' <PATTERN> for grep syntax, ex. '-g .*\.orders\..*_id' or
            'northwind\.orders\..*_id' ,
            or '-l' <LIST> to list all columns names ex. '-l
            northwind.customers.id,northwind.purchase.id,northwind.orders.id'

        --config-file <Configuration File>
            Specify the path to a file containing the configuration for the
            selected command

        --csv-directory <csv directory>
            Path to directory for intermediate CSV files.

        -d <RDBMS database>, --database <RDBMS database>
            RDBMS database.

            This option is required if any of the following options are
            specified: host


        --debug
            Print detailed diagnostic output.

        --delimiter <delimiter>
            Delimiter to separate fields in CSV.

        --destination <directory>
            Path to destination store directory.

        --driver <--driver <PATH_TO_YOUR_JAR1> --driver <PATH_TO_YOUR_JAR2>>, --jars <--driver <PATH_TO_YOUR_JAR1> --driver <PATH_TO_YOUR_JAR2>>
            List of additional drivers as a list

        --exclusion-mode <Tables: exclude|include|none(default)>, --exc <Tables: exclude|include|none(default)>
            Specifies how to handle table exclusion. Options are mutually
            exclusive.
            exclude: Excludes specified tables from the process. All other
            tables will be included.
            include: Includes specified tables only. All other tables will be
            excluded.
            none: All tables are included in the process.

        --exclusion-mode-column-type <exclude|include|none(default)>, --exctype
        <exclude|include|none(default)>
            Specifies how to handle column type exclusion. Options are mutually
            exclusive.
            exclude: Excludes specified columns types from the process. All
            other columns types will be included.
            include: Includes specified columns types only. All other columns
            types will be excluded.
            none: All columns types are included in the process.

        --exclusion-mode-columns <exclude|include|none(default)>, --excc
        <exclude|include|none(default)>
            Specifies how to handle column exclusion. Options are mutually
            exclusive.
            exclude: Excludes specified columns from the process. All other
            columns will be included.
            include: Includes specified columns only. All other columns will be
            excluded.
            none: All columns are included in the process.

        --exclusion-mode-tables <exclude|include|none(default)>, --exct
        <exclude|include|none(default)>
            Specifies how to handle table exclusion. Options are mutually
            exclusive.
            exclude: Excludes specified tables from the process. All other
            tables will be included.
            include: Includes specified tables only. All other tables will be
            excluded.
            none: All tables are included in the process.

        --force
            Force delete destination store directory if it already exists.

        --import-tool <import tool path>
            Path to directory containing Neo4j import tool. Mandatory only for Offline Bulk Import mode

        --mapping-file <file|stdin>
            Path to an existing metadata mapping file. The name 'stdin' will
            cause the CSV resources definitions to be read from standard input.

        --neo4j:password <neo4j password>, --graph:password <neo4j password>,
        --graph:neo4j:password <neo4j password>
            Password for login to Neo4j.

        --neo4j:url <neo4j url>, --graph:url <neo4j url>, --graph:neo4j:url
        <neo4j url>
            Url to use for connection to Neo4j.

        --neo4j:user <neo4j user>, --graph:user <neo4j user>, --graph:neo4j:user
        <neo4j user>
            User for login to Neo4j.

        --options-file <option file>
            Path to file containing Neo4j import tool options.

        --output-mapping-file <file|stdout>
            Path to the output metadata mapping file.

        -p <RDBMS port>, --port <RDBMS port>
            Port number to use for connection to RDBMS.

        --quote <quote>
            Character to treat as quotation character for values in CSV data.

        --rdbms:fetch-size <fetch-size>, --fs <fetch-size>
            RDBMS Fetch size

        --rdbms:password <RDBMS password>, --password <RDBMS password>
            Password for login to RDBMS.

            This option is required if any of the following options are
            specified: --rdbms:url, --url


        --rdbms:schema <schema>, -s <schema>, --schema <schema>
            RDBMS schema.

        --rdbms:url <RDBMS url>, --url <RDBMS url>
            Url to use for connection to RDBMS.

        --rdbms:user <RDBMS user>, -u <RDBMS user>, --user <RDBMS user>
            User for login to RDBMS.

            This option is required if any of the following options are
            specified: --rdbms:url, --url


        --relationship-name <table(default)|column>, --rel-name
        <table(default)|column>
            Specifies whether to get the name for relationships from table names
            or column names.

        --schemas <Schemas>
            Lists all schemas to include by name or pattern.
            Use '-r' <PATTERN> to filter by regex, ex. '-r .*\.north.*',
            '-g' <PATTERN> for grep syntax, ex. '-g .*\.north.*' ,
            or '-l' <LIST> to list all schemas names ex. '-l northwind,exc'

        --tables <Tables>, --tabs <Tables>
            Lists all tables to include/exclude by name or pattern.
            Use '-r' <PATTERN> to filter by regex, ex. '-r .*\.purchase.*' or
            'northwind.purchase.*' ,
            '-g' <PATTERN> for grep syntax, ex. '-g .*\.purchase.*' or
            'northwind.purchase.*' ,
            or '-l' <LIST> to list all tables names ex. '-l
            customers,purchase,orders'

        --tiny-int <byte(default)|boolean>
            Specifies whether to convert TinyInt to byte or boolean

        --types <Types>
            Lists all column types to include/exclude by name separated by
            commas. Valid values:
            unknown,
            binary,
            bit,
            character,
            id,
            integer,
            real,
            reference,
            temporal,
            url,
            xml,
            large_object,
            object;

        --using <import tool>
            Import tool that will be used to load data into neo4j.

        --
            This option can be used to separate command-line options from the
            list of arguments (useful when arguments might be mistaken for
            command-line options)

        <table1 table2 ...>
            Tables to be excluded/included

参数用法

There are two ways for write Etl parameters:

1) write parameters in command line:

    $NEO4J_HOME/bin/neo4j-etl export|generate-metadata-mapping
      --rdbms:url jdbc:oracle:thin:@localhost:49161:XE
      --rdbms:user northwind --rdbms :password northwind
      --rdbms:schema northwind
      --using bulk:neo4j-import
      --import-tool $NEO4J_HOME/bin
      --csv-directory /tmp/northwind
      --options-file /tmp/northwind/options.json
      --quote '"' --force
      ...

2) use a config file:

    $NEO4J_HOME/bin/neo4j-etl export|generate-metadata-mapping  \
    --config-file <path to .config file>

    Above there is an Example of config file.

#EXAMPLE - ETL CONFIG FILE

#RDBMS
rdbms-url=url
rdbms-schema=schema
rdbms-password=neo4j
rdbms-user=neo4j
rdbms-fetch-size=10000

#NEO4J
using=cypher:direct
neo4j-url=bolt://127.0.0.1:7687
neo4j-user=neo4j
neo4j-password=neo4j

#RULES
exclusion-mode-tables=INCLUDE
tables=-l table1,table2,...
exclusion-mode-columns=INCLUDE
columns=-l column1,column2,...
exclusion-mode-column-types=EXCLUDE
column-types=type1,type2,...

#MISC
output-mapping-file=path_to_output_mapping_file
import-tool=path_to_import_tool
csv-directory=path_to_directory
mapping-file=path_to_file

debug=false

示例会话：基础

生成元数据映射

export NEO4J_HOME=/path/to/neo4j-enterprise-3.4.0

mkdir -p /tmp/northwind

$NEO4J_HOME/bin/neo4j-etl generate-metadata-mapping \
  --rdbms:url jdbc:oracle:thin:@localhost:49161:XE \
  --rdbms:user northwind --rdbms:password northwind \
  --rdbms:schema northwind --output-mapping-file /tmp/northwind/mapping.json

多数据库支持

Neo4j 4.0 企业版具有多租户支持，为了支持此功能，我们在导入视图中添加了一个选择选项框，如下所示

如果您使用命令行工具，可以通过传递 --neo4j:database-name 选项来指定目标数据库，方式如下

echo '{ "multiline-fields" : "true" }' > /tmp/northwind/options.json

$NEO4J_HOME/bin/neo4j-etl export  \
  --rdbms:url jdbc:oracle:thin:@localhost:49161:XE \
  --rdbms:user northwind --rdbms:password northwind \
  --rdbms:schema northwind \
  --using cypher:direct | cypher:batch | cypher:fromSQL \
  --neo4j:url bolt://:7687 \
  --neo4j:user neo4j
  --neo4j:password neo4j \
  --neo4j:database-name myDatabase \
  --import-tool $NEO4J_HOME/bin  \
  --csv-directory /tmp/northwind \
  --options-file /tmp/northwind/options.json \
  --quote '"' --force

通过 `neo4j-import` 工具进行离线批量导入以实现初始加载（Neo4j 数据库必须为空）

注意

请考虑到由于 neo4j-admin import 取消了对 JSON 文件作为选项源的支持，如果数据库涉及大量实体，导致命令行过长，我们可能会遇到问题。

echo '{ "multiline-fields" : "true" }' > /tmp/northwind/options.json

$NEO4J_HOME/bin/neo4j-etl export  \
  --rdbms:url jdbc:oracle:thin:@localhost:49161:XE \
  --rdbms:user northwind --rdbms :password northwind \
  --rdbms:schema northwind \
  --using bulk:neo4j-import \
  --import-tool $NEO4J_HOME/bin  \
  --csv-directory /tmp/northwind \
  --options-file /tmp/northwind/options.json \
  --quote '"' --force

测试离线批量导入结果

$NEO4J_HOME/bin/neo4j-shell -path $NEO4J_HOME/data/databases/graph.db/ -c 'MATCH (n) RETURN labels(n), count(*);'

 +--------------------------+
 | labels(n)     | count(*) |
 +--------------------------+
 | ["Shipper"]   | 3        |
 | ["Employee"]  | 9        |
 | ["Region"]    | 4        |
 | ["Customer"]  | 93       |
 | ["Territory"] | 53       |
 | ["Product"]   | 77       |
 | ["Supplier"]  | 29       |
 | ["Order"]     | 830      |
 | ["Category"]  | 8        |
 +--------------------------+
 9 rows

通过 `java-bolt-driver` 进行在线批处理导入以实现增量加载（Neo4j 可以已经包含数据）

它可以以 3 种模式完成

'cypher:direct' (使用 LOAD CSV)
'cypher:batch' (以与 cypher:direct 相同的方式创建 CSV 文件，但将其转换为 UNWIND 语句而非 LOAD CSV)
'cypher:fromSQL' (将 RDBMS 的表格结果数据进行转换，而不创建 CSV 文件)

在通过 java-bolt-diver 进行的在线模式下，为了允许从临时文件夹导入 CSV，必须

启用属性 dbms.security.allow_csv_import_from_file_urls 并设置为 true，以允许 LOAD CSV 从外部资源读取 CSV 文件
移除或注释掉属性 dbms.directories.import=import，以防止 Neo4j 在默认导入文件夹中搜索 CSV 文件

对于 Neo4j 远程实例，您必须使用 cypher:batch 导入模式

echo '{ "multiline-fields" : "true" }' > /tmp/northwind/options.json

$NEO4J_HOME/bin/neo4j-etl export  \
  --rdbms:url jdbc:oracle:thin:@localhost:49161:XE \
  --rdbms:user northwind --rdbms:password northwind \
  --rdbms:schema northwind \
  --using cypher:direct | cypher:batch | cypher:fromSQL \
  --neo4j:url bolt://:7687 \
  --neo4j:user neo4j --neo4j:password neo4j \
  --import-tool $NEO4J_HOME/bin  \
  --csv-directory /tmp/northwind \
  --options-file /tmp/northwind/options.json \
  --quote '"' --force

Additional command line options for `cypher:batch` and `cypher:fromSQL` import modes:

 --unwindBatchSize <value> (Batch size that will be used for unwind data) \
 --txBatchSize <value> (Transaction Batch size that will be used for unwind commit) \

测试在线增量批处理导入结果

$NEO4J_HOME/bin/cypher-shell -a bolt://:7687 -u neo4j -p neo4j 'MATCH (n) RETURN labels(n), count(*);'

 +--------------------------+
 | labels(n)     | count(*) |
 +--------------------------+
 | ["Shipper"]   | 3        |
 | ["Employee"]  | 9        |
 | ["Region"]    | 4        |
 | ["Customer"]  | 93       |
 | ["Territory"] | 53       |
 | ["Product"]   | 77       |
 | ["Supplier"]  | 29       |
 | ["Order"]     | 830      |
 | ["Category"]  | 8        |
 +--------------------------+
 9 rows

示例会话：Docker + Northwind

此示例会话基于 Northwind 示例数据集。

DDL 脚本可在此处获得

MySQL

下载、启动并配置带有 MySQL 的 docker 容器

docker pull mysql

docker run --name neo4j-etl-mysql -e MYSQL_ROOT_PASSWORD=admin -e MYSQL_DATABASE=northwind -e MYSQL_USER=neo4j -e MYSQL_PASSWORD=neo4j -d -p 3306:3306 mysql:latest

docker exec -it neo4j-etl-mysql bash
root@eb6f279fdb88:/# mysql -u root -p
Enter password: admin
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.18 MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> grant all privileges on *.* to 'neo4j'@'%' with grant option;
Query OK, 0 rows affected (0.00 sec)

mysql> quit;
Bye
root@bf99fbc0d31c:/# exit
exit

通过以下 sql 脚本加载数据库：https://raw.githubusercontent.com/neo4j-contrib/neo4j-etl/master/neo4j-etl-it/src/main/resources/scripts/mysql/northwind.sql

export NEO4J_HOME=/path/to/neo4j-enterprise-3.4.0

mkdir -p /tmp/northwind

echo '{ "multiline-fields" : "true" }' > /tmp/northwind/options.json

./bin/neo4j-etl export \
  --rdbms:url jdbc:mysql://:5433/northwind?autoReconnect=true&useSSL=false \
  --rdbms:user neo4j --rdbms:password neo4j \
  --import-tool $NEO4J_HOME/bin \
  --options-file /tmp/northwind/options.json \
  --csv-directory /tmp/northwind \
  --destination $NEO4J_HOME/data/databases/graph.db/ \
  --quote '"' --force

PostgreSQL

下载、启动并配置带有 PostgreSQL 9.6.2 的 docker 容器

docker pull postgres

docker run --name neo4j-etl-postgres -e POSTGRES_USER=neo4j -e POSTGRES_PASSWORD=neo4j -d -p 5433:5432 postgres

docker run -it --rm --link neo4j-etl-postgres:postgres postgres psql -h postgres -U neo4j
Password for user neo4j:
psql (9.6.2)
Type "help" for help.

neo4j=# DROP DATABASE IF EXISTS northwind;
neo4j=# CREATE DATABASE northwind WITH OWNER 'neo4j' ENCODING 'UTF8' LC_COLLATE = 'en_US.utf8' LC_CTYPE = 'en_US.utf8';
neo4j=# \q

通过以下 sql 脚本加载数据库：northwind.sql

export NEO4J_HOME=/path/to/neo4j-enterprise-3.4.0

mkdir -p /tmp/northwind

echo '{"multiline-fields":"true"}' > /tmp/northwind/options.json

./bin/neo4j-etl export \
  --rdbms:url jdbc:postgresql://:5433/northwind?ssl=false \
  --rdbms:user neo4j --rdbms:password neo4j \
  --import-tool $NEO4J_HOME/bin \
  --options-file /tmp/northwind/options.json \
  --csv-directory /tmp/northwind \
  --destination $NEO4J_HOME/data/databases/graph.db/ \
  --quote '"' --force

Oracle

下载、启动并配置带有 Oracle XE 11g 的 docker 容器

docker pull wnameless/oracle-xe-11g

docker run --name neo4j-etl-oracle -d -p 49160:22 -p 49161:1521 wnameless/oracle-xe-11g

ssh root@localhost -p 49160
root@localhost's password: admin

Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.9.13-moby x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
Last login: Mon May  1 17:32:48 2017 from 172.17.0.1

root@692c446a274b:~# wget https://raw.githubusercontent.com/neo4j-contrib/neo4j-etl/master/neo4j-etl-it/src/main/resources/scripts/oracle/northwind.sql

root@692c446a274b:~# sqlplus system/oracle

SQL> CREATE USER northwind IDENTIFIED BY northwind;
SQL> GRANT DBA TO northwind;
SQL> CONN northwind/northwind;
SQL> SET sqlblanklines ON;
SQL> @northwind.sql
SQL> quit;

root@692c446a274b:~# exit

export NEO4J_HOME=/path/to/neo4j-enterprise-3.4.0

mkdir -p /tmp/northwind

echo '{"multiline-fields":"true"}' > /tmp/northwind/options.json

./bin/neo4j-etl export \
  --rdbms:url jdbc:oracle:thin:@localhost:49161:XE \
  --rdbms:user northwind --rdbms:password northwind \
  --rdbms:schema northwind \
  --import-tool $NEO4J_HOME/bin \
  --options-file /tmp/northwind/options.json \
  --csv-directory /tmp/northwind \
  --destination $NEO4J_HOME/data/databases/graph.db/ \
  --quote '"' --force
  --driver /tmp/ojdbc6-11.2.0.3.jar

Microsoft SQL

下载、启动并配置带有 Microsoft SQL Server 的 docker 容器

docker run --name neo4j-etl-mssql -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=Passw0rd!' -p 1433:1433 -d microsoft/mssql-server-linux

如果您想连接到 Microsoft SQL 客户端控制台，可以运行以下命令

docker exec -it neo4j-etl-mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P 'Passw0rd!' -d <DATABASE>

export NEO4J_HOME=/path/to/neo4j-enterprise-3.4.0

mkdir -p /tmp/wideworldimporters

echo '{"multiline-fields":"true"}' > /tmp/wideworldimporters/options.json

./bin/neo4j-etl export \
    --rdbms:password "Passw0rd!" \
    --rdbms:user sa \
    --rdbms:url "jdbc:sqlserver://:1433;databaseName=WideWorldImporters" \
    --import-tool $NEO4J_HOME/bin \
    --options-file /tmp/wideworldimporters/options.json \
    --csv-directory /tmp/wideworldimporters \
    --destination $NEO4J_HOME/data/databases/graph.db/ \
    --driver /tmp/mssql-jdbc-6.1.0.jre8.jar \

如何将 World Wide Importers 数据库导入到 MS SQL Server Docker 实例中

# Create docker instance for MS-SQL Server
docker run --name mssql-etl \
-e MSSQL_COLLATION=Latin1_General_100_CI_AS \
-e 'ACCEPT_EULA=Y' \
-e 'SA_PASSWORD=<YOUR_PASSWORD>' \
-p 1433:1433 \
-v /tmp:/tmp \
-d microsoft/mssql-server-linux:2017-latest

# Download World Wide Importers backup file
wget https://github.com/Microsoft/sql-server-samples/releases/download/wide-world-importers-v1.0/WideWorldImporters-Full.bak

# Create a backup directory
sudo docker exec -it mssql-etl mkdir /var/opt/mssql/backup

# Load backup file into the container
sudo docker cp WideWorldImporters-Full.bak mssql-etl:/var/opt/mssql/backup

# Restore Wide World Importers database
sudo docker exec -it mssql-etl /opt/mssql-tools/bin/sqlcmd \
-S localhost \
-U SA \
-P '<YOUR_PASSWORD>' \
-Q 'RESTORE FILELISTONLY FROM DISK = "/var/opt/mssql/backup/WideWorldImporters-Full.bak"' \
| tr -s ' ' \
| cut -d ' ' -f 1-2

sudo docker exec -it mssql-etl /opt/mssql-tools/bin/sqlcmd \
-S localhost \
-U SA \
-P '<YOUR_PASSWORD>' \
-Q 'RESTORE DATABASE WideWorldImporters FROM DISK = "/var/opt/mssql/backup/WideWorldImporters-Full.bak" WITH MOVE "WWI_Primary" TO "/var/opt/mssql/data/WideWorldImporters.mdf", MOVE "WWI_UserData" TO "/var/opt/mssql/data/WideWorldImporters_userdata.ndf", MOVE "WWI_Log" TO "/var/opt/mssql/data/WideWorldImporters.ldf", MOVE "WWI_InMemory_Data_1" TO "/var/opt/mssql/data/WideWorldImporters_InMemory_Data_1"'

功能

使用映射规则推断架构 (generate-metadata-mapping)

基于以下规则的通用关系数据库映射
带有外键的表被视为连接，并作为带有关系的节点导入
例：Person -> Address 被导入为 (Person)-[:ADDRESS_ID]->(Address)
带有两个外键的表被视为 连接表 (JoinTable)，并作为关系导入
例：Student <- Student_Course -> Course 被导入为
(Student) -[:STUDENT_COURSE]-> (Course)
具有超过两个外键的表被视为 中间节点，并作为 带有多个关系的节点 导入
例：Order_Detail -> Shipping_Address, Order_Detail -> Payment_Information, Order_Detail -> Shipment_Instructions 被导入为

(Shipping_Address) -[:SHIPPING]-> (Order_Detail)
(Payment_Information) -[:PAYMENT]-> (Order_Detail)
(Shipment_Instructions) -[:SHIPMENT]-> (Order_Detail)

通过复合键解析关系。
支持大多数数据类型。
- TinyInt 可以作为 Byte 或 Boolean 导入 （这是为了支持 MySQL 中保存为 TinyInt 的布尔值）
- 日期 (Dates) 被导入为 字符串 (String)
- Blobs 在导入时会被跳过，直到导入工具支持二进制数组数据。
- Decimal 类型待确认。
关系名称可以使用列名或 所引用的表名
- --relationship-name=table 时，Person->Address 将变为 (Person)-[:ADDRESS]->(Address)
- --relationship-name=column 将变为 (Person)-[:ADDRESS_ID]->(Address)
使用 --include 和 --exclude 过滤您想要包含或排除的表
待办事项：使用 --include 和 --exclude 过滤您想要包含或排除的列
待办事项：根据需要使用 <TBA> 标志保留自然键（标记为 PrimaryKeys 和 ForeignKeys）
- 外键通常用于在 2 个节点之间创建关系，而无需保存为属性。
- 使用此标志，节点将保留该值作为属性。
- 例：贷款具有贷款申请人的 SSN，这通常用于连接 Loan 和 Person 节点。
- 使用此标志后，Loan 节点也会保留 SSN 作为属性。

通过 UI 编辑映射

Neo4j-ETL 图形应用程序可以添加到 Neo4j Desktop 中，从而实现映射的可视化编辑和交互式导入。

UI 允许您更改和设置首选的标签名称、属性名称和类型、关系类型，并预览生成的图形。

导出数据 (export)

根据 mappings.json 生成来自关系源的 CSV 文件
- 待办事项：使用性能更高的流式 API
提供正确的标签、关系类型和标题来导入 CSV
- 待办事项：缺乏将选项作为文件而不是命令行参数传递给 neo4j-import 工具的能力
- 待办事项：创建与非主键列相关的索引和约束

Neo4j ETL

概述

许可证

问题、反馈与贡献

下载与运行

命令使用示例

Neo4j-Desktop

JDBC 驱动程序

简介

架构图

它是什么

未来计划

适用对象

悬而未决的问题

neo4j-etl 命令行工具

可用命令

'generate-metadata-mapping' 命令

'export' 命令

参数用法

示例会话：基础

生成元数据映射

多数据库支持

通过 neo4j-import 工具进行离线批量导入以实现初始加载（Neo4j 数据库必须为空）

通过 java-bolt-driver 进行在线批处理导入以实现增量加载（Neo4j 可以已经包含数据）

示例会话：Docker + Northwind

MySQL

PostgreSQL

Oracle

Microsoft SQL

如何将 World Wide Importers 数据库导入到 MS SQL Server Docker 实例中

功能

使用映射规则推断架构 (generate-metadata-mapping)

通过 UI 编辑映射

导出数据 (export)

通过 `neo4j-import` 工具进行离线批量导入以实现初始加载（Neo4j 数据库必须为空）

通过 `java-bolt-driver` 进行在线批处理导入以实现增量加载（Neo4j 可以已经包含数据）