Neo4j ETL
概述
-
Neo4j Desktop 中的 Neo4j-ETL 用户界面
-
管理多个 RDBMS 连接
-
自动从关系型数据库中提取数据库元数据
-
推导图模型
-
可视化编辑标签、关系类型、属性名称及类型
-
将当前模型可视化为图形
-
将映射持久化为 JSON
-
从关系型数据库中检索相关的 CSV 数据
-
通过 neo4j-import、bolt-connector、cypher-shell、neo4j-shell 运行导入
-
内置 MySQL、PostgreSQL 支持,允许在 Neo4j 企业版中使用自定义 JDBC 驱动
许可证
本工具在 NEO4J 预发布许可协议 下授权使用。
问题、反馈与贡献
-
您可以提交 GitHub Issues 或将反馈发送至 feedback@neo4j.com
-
也请加入我们的 neo4j-users Slack 并在 #help-import 频道中提问
下载与运行
下载并解压最新的 neo4j-etl.zip。
命令使用示例
./bin/neo4j-etl export \ --rdbms:url <url> --rdbms:user <user> --rdbms:password <password> \ --destination $NEO4J_HOME/data/databases/graph.db/ --import-tool $NEO4J_HOME/bin \ --csv-directory $NEO4J_HOME/import
./bin/neo4j-etl export \
--rdbms:url <url> --rdbms:user <user> --rdbms:password <password> --rdbms:schema <schema> \
--using { bulk:neo4j-import | cypher:neo4j-shell | cypher:shell | cypher:direct | cypher:batch | cypher:fromSQL } \
--neo4j:url <neo4j url> --neo4j:user <neo4j user> --neo4j:password <neo4j password> \
--destination $NEO4J_HOME/data/databases/graph.db/ --import-tool $NEO4J_HOME/bin \
--csv-directory $NEO4J_HOME/import --options-file import-tool-options.json --force --debug
cypher:batch 和 cypher:fromSQL 导入模式的附加命令行选项--unwind-batch-size <value> (Batch size that will be used for unwind data) \ --tx-batch-size <value> (Transaction Batch size that will be used for unwind commit) \
有关详细用法,请参阅:工具文档。
Neo4j-Desktop
您可以通过添加相应的应用程序密钥将 Neo4j ETL 添加到 Neo4j Desktop。请咨询您的 Neo4j 联系人或发送电子邮件至 devrel@neo4j.com
下次启动 Neo4j Desktop 时,您将看到 Neo4j ETL 作为一个交互式 UI 界面。
| 配置驱动程序 | 加载映射 | 编辑映射 | 导入数据 |
|---|---|---|---|
|
|
|
|
如果您想测试对 Neo4j Desktop UI 的更改,可以尝试
-
进行必要的更改
-
执行 npm install && yarn start 以验证 UI 是否运行正常
-
在 Neo4j Desktop 中安装生产版 ETL 工具
-
替换 $DESKTOP 中的相关 JavaScript
-
重新运行 Neo4j Desktop 应用并验证 UI 和功能是否符合预期
JDBC 驱动程序
MySQL 和 PostgreSQL 的驱动程序已内置于 Neo4j-ETL 工具中。
要使用其他 JDBC 驱动程序,请使用这些下载链接和 JDBC URL。将 JDBC 驱动程序 jar 文件提供给命令行工具或 Neo4j-ETL 应用程序。并在 --rdbms:url 参数中或 JDBC-URL 输入字段中使用该 JDBC-URL。
| 数据库 | JDBC-URL | 驱动程序来源 |
|---|---|---|
Oracle |
|
|
MS SQLServer |
|
|
IBM DB2 |
|
|
Derby |
|
JDK6 起已包含 |
Cassandra |
|
|
SAP Hana |
|
|
MySQL |
|
|
PostgreSQL |
|
简介
Neo4j ETL,特别是 neo4j-etl 命令行工具,可用于将建模良好(即规范化)的关系数据导入 Neo4j。它应用了一些简单的规则来转换关系模型。
流程概述如下
-
读取数据库元数据并生成 mapping.json
-
通过 Neo4j Desktop 中的
neo4j-etl-ui选择性地编辑 mapping.json -
将关系数据导出为 CSV
-
生成映射头信息
-
导入 Neo4j,使用
-
用于初始 离线 批量加载的
neo4j-import工具 -
用于增量 离线 批量加载的
neo4j-shell工具 -
用于增量 在线 单事务加载的
cypher-shell工具 -
用于增量 在线 批处理加载的
java bolt driver
-
架构图
neo4j-etl 命令行工具
这是您用来从关系型数据库中检索和映射元数据,并驱动从关系库导出及向 Neo4j 数据库导入的命令行工具。
通过图形用户界面,您可以预览生成的图形数据模型,并最终通过更改标签、属性名称、关系类型和属性类型进行调整。
它支持所有带有 JDBC 驱动程序的关系数据库,如 MySQL、PostgreSQL、Oracle 和 Microsoft SQL。
您可以从 GitHub 获取导入工具的最新版本。
下载并解压缩操作系统特定的 zip / tar.gz 后,您还需要下载适当的 JDBC 驱动程序并将其添加到 lib 文件夹中。
您可以按照下表中的相应链接下载适当的驱动程序 jar 文件
| 供应商 | JDBC 驱动程序 URL |
|---|---|
http://www.oracle.com/technetwork/database/features/jdbc/default-2280470.html |
|
https://www.microsoft.com/en-us/download/details.aspx?id=55539 |
| 对于超大型数据库,请确保有足够的磁盘空间用于 CSV 导出和 Neo4j 数据存储,并拥有足够的 RAM 和 CPU 以快速完成导入。 |
'generate-metadata-mapping' 命令
NAME
neo4j-etl generate-metadata-mapping - Create RDBMS to Neo4j metadata
mapping Json.
SYNOPSIS
neo4j-etl generate-metadata-mapping
[ {--columns | --cols} <Columns>... ]
[ --config-file <Configuration File> ]
[ {-d | --database} <RDBMS database> ] [ --debug ]
[ --delimiter <delimiter> ] [ {--driver | --jars} <--driver <PATH_TO_YOUR_JAR1> --driver <PATH_TO_YOUR_JAR2>>... ]
[ {--exclusion-mode | --exc} <Tables: exclude|include|none(default)> ]
[ {--exclusion-mode-column-type | --exctype} <exclude|include|none(default)> ]
[ {--exclusion-mode-columns | --excc} <exclude|include|none(default)> ]
[ {--exclusion-mode-tables | --exct} <exclude|include|none(default)> ]
[ --options-file <option file> ] [ --output-mapping-file <file|stdout> ]
[ {-p | --port} <RDBMS port> ] [ --quote <quote> ]
[ {--rdbms:fetch-size | --fs} <fetch-size> ]
[ {--rdbms:password | --password} <RDBMS password> ]
[ {--rdbms:schema | -s | --schema} <schema> ]
[ {--rdbms:url | --url} <RDBMS url> ]
[ {--rdbms:user | -u | --user} <RDBMS user> ]
[ {--relationship-name | --rel-name} <table(default)|column> ]
[ --schemas <Schemas>... ] [ {--tables | --tabs} <Tables>... ]
[ --tiny-int <byte(default)|boolean> ] [ --types <Types>... ] [--] [ <table1 table2 ...>... ]
OPTIONS
--columns <Columns>, --cols <Columns>
Lists all columns to include/exclude by name or pattern
Use '-r' <PATTERN> to filter by regex, ex. '-r .*\.orders\..*_id' or
'northwind\.orders\..*_id' ,
'-g' <PATTERN> for grep syntax, ex. '-g .*\.orders\..*_id' or
'northwind\.orders\..*_id' ,
or '-l' <LIST> to list all columns names ex. '-l
northwind.customers.id,northwind.purchase.id,northwind.orders.id'
--config-file <Configuration File>
Specify the path to a file containing the configuration for the
selected command
-d <RDBMS database>, --database <RDBMS database>
RDBMS database.
This option is required if any of the following options are
specified: host
--debug
Print detailed diagnostic output.
--delimiter <delimiter>
Delimiter to separate fields in CSV.
--driver <--driver <PATH_TO_YOUR_JAR1> --driver <PATH_TO_YOUR_JAR2>>, --jars <--driver <PATH_TO_YOUR_JAR1> --driver <PATH_TO_YOUR_JAR2>>
List of additional drivers as a list
--exclusion-mode <Tables: exclude|include|none(default)>, --exc <Tables: exclude|include|none(default)>
Specifies how to handle table exclusion. Options are mutually
exclusive.
exclude: Excludes specified tables from the process. All other
tables will be included.
include: Includes specified tables only. All other tables will be
excluded.
none: All tables are included in the process.
--exclusion-mode-column-type <exclude|include|none(default)>, --exctype
<exclude|include|none(default)>
Specifies how to handle column type exclusion. Options are mutually
exclusive.
exclude: Excludes specified columns types from the process. All
other columns types will be included.
include: Includes specified columns types only. All other columns
types will be excluded.
none: All columns types are included in the process.
--exclusion-mode-columns <exclude|include|none(default)>, --excc
<exclude|include|none(default)>
Specifies how to handle column exclusion. Options are mutually
exclusive.
exclude: Excludes specified columns from the process. All other
columns will be included.
include: Includes specified columns only. All other columns will be
excluded.
none: All columns are included in the process.
--exclusion-mode-tables <exclude|include|none(default)>, --exct
<exclude|include|none(default)>
Specifies how to handle table exclusion. Options are mutually
exclusive.
exclude: Excludes specified tables from the process. All other
tables will be included.
include: Includes specified tables only. All other tables will be
excluded.
none: All tables are included in the process.
--options-file <option file>
Path to file containing Neo4j import tool options.
--output-mapping-file <file|stdout>
Path to the output metadata mapping file.
-p <RDBMS port>, --port <RDBMS port>
Port number to use for connection to RDBMS.
--quote <quote>
Character to treat as quotation character for values in CSV data.
--rdbms:fetch-size <fetch-size>, --fs <fetch-size>
RDBMS Fetch size
--rdbms:password <RDBMS password>, --password <RDBMS password>
Password for login to RDBMS.
This option is required if any of the following options are
specified: --rdbms:url, --url
--rdbms:schema <schema>, -s <schema>, --schema <schema>
RDBMS schema.
--rdbms:url <RDBMS url>, --url <RDBMS url>
Url to use for connection to RDBMS.
--rdbms:user <RDBMS user>, -u <RDBMS user>, --user <RDBMS user>
User for login to RDBMS.
This option is required if any of the following options are
specified: --rdbms:url, --url
--relationship-name <table(default)|column>, --rel-name
<table(default)|column>
Specifies whether to get the name for relationships from table names
or column names.
--schemas <Schemas>
Lists all schemas to include by name or pattern.
Use '-r' <PATTERN> to filter by regex, ex. '-r .*\.north.*',
'-g' <PATTERN> for grep syntax, ex. '-g .*\.north.*' ,
or '-l' <LIST> to list all schemas names ex. '-l northwind,exc'
--tables <Tables>, --tabs <Tables>
Lists all tables to include/exclude by name or pattern.
Use '-r' <PATTERN> to filter by regex, ex. '-r .*\.purchase.*' or
'northwind.purchase.*' ,
'-g' <PATTERN> for grep syntax, ex. '-g .*\.purchase.*' or
'northwind.purchase.*' ,
or '-l' <LIST> to list all tables names ex. '-l
customers,purchase,orders'
--tiny-int <byte(default)|boolean>
Specifies whether to convert TinyInt to byte or boolean
--types <Types>
Lists all column types to include/exclude by name separated by
commas. Valid values:
unknown,
binary,
bit,
character,
id,
integer,
real,
reference,
temporal,
url,
xml,
large_object,
object;
--
This option can be used to separate command-line options from the
list of arguments (useful when arguments might be mistaken for
command-line options)
<table1 table2 ...>
Tables to be excluded/included
'export' 命令
NAME
neo4j-etl export - Export from RDBMS and import into NEO4J via CSV
files.
SYNOPSIS
neo4j-etl export [ {--columns | --cols} <Columns>... ]
[ --config-file <Configuration File> ]
[ --csv-directory <csv directory> ]
[ {-d | --database} <RDBMS database> ] [ --debug ]
[ --delimiter <delimiter> ] [ --destination <directory> ] [ {--driver | --jars} <--driver <PATH_TO_YOUR_JAR1> --driver <PATH_TO_YOUR_JAR2>>... ]
[ {--exclusion-mode | --exc} <Tables: exclude|include|none(default)> ]
[ {--exclusion-mode-column-type | --exctype} <exclude|include|none(default)> ]
[ {--exclusion-mode-columns | --excc} <exclude|include|none(default)> ]
[ {--exclusion-mode-tables | --exct} <exclude|include|none(default)> ]
[ --force ] [ --import-tool <import tool path> ]
[ --mapping-file <file|stdin> ] [ {--neo4j:password | --graph:password | --graph:neo4j:password} <neo4j password> ]
[ {--neo4j:url | --graph:url | --graph:neo4j:url} <neo4j url> ]
[ {--neo4j:user | --graph:user | --graph:neo4j:user} <neo4j user> ]
[ --options-file <option file> ] [ --output-mapping-file <file|stdout> ]
[ {-p | --port} <RDBMS port> ] [ --quote <quote> ]
[ {--rdbms:fetch-size | --fs} <fetch-size> ]
[ {--rdbms:password | --password} <RDBMS password> ]
[ {--rdbms:schema | -s | --schema} <schema> ]
[ {--rdbms:url | --url} <RDBMS url> ]
[ {--rdbms:user | -u | --user} <RDBMS user> ]
[ {--relationship-name | --rel-name} <table(default)|column> ]
[ --schemas <Schemas>... ] [ {--tables | --tabs} <Tables>... ]
[ --tiny-int <byte(default)|boolean> ] [ --types <Types>... ]
[ --using <import tool> ] [--] [ <table1 table2 ...>... ]
OPTIONS
--columns <Columns>, --cols <Columns>
Lists all columns to include/exclude by name or pattern
Use '-r' <PATTERN> to filter by regex, ex. '-r .*\.orders\..*_id' or
'northwind\.orders\..*_id' ,
'-g' <PATTERN> for grep syntax, ex. '-g .*\.orders\..*_id' or
'northwind\.orders\..*_id' ,
or '-l' <LIST> to list all columns names ex. '-l
northwind.customers.id,northwind.purchase.id,northwind.orders.id'
--config-file <Configuration File>
Specify the path to a file containing the configuration for the
selected command
--csv-directory <csv directory>
Path to directory for intermediate CSV files.
-d <RDBMS database>, --database <RDBMS database>
RDBMS database.
This option is required if any of the following options are
specified: host
--debug
Print detailed diagnostic output.
--delimiter <delimiter>
Delimiter to separate fields in CSV.
--destination <directory>
Path to destination store directory.
--driver <--driver <PATH_TO_YOUR_JAR1> --driver <PATH_TO_YOUR_JAR2>>, --jars <--driver <PATH_TO_YOUR_JAR1> --driver <PATH_TO_YOUR_JAR2>>
List of additional drivers as a list
--exclusion-mode <Tables: exclude|include|none(default)>, --exc <Tables: exclude|include|none(default)>
Specifies how to handle table exclusion. Options are mutually
exclusive.
exclude: Excludes specified tables from the process. All other
tables will be included.
include: Includes specified tables only. All other tables will be
excluded.
none: All tables are included in the process.
--exclusion-mode-column-type <exclude|include|none(default)>, --exctype
<exclude|include|none(default)>
Specifies how to handle column type exclusion. Options are mutually
exclusive.
exclude: Excludes specified columns types from the process. All
other columns types will be included.
include: Includes specified columns types only. All other columns
types will be excluded.
none: All columns types are included in the process.
--exclusion-mode-columns <exclude|include|none(default)>, --excc
<exclude|include|none(default)>
Specifies how to handle column exclusion. Options are mutually
exclusive.
exclude: Excludes specified columns from the process. All other
columns will be included.
include: Includes specified columns only. All other columns will be
excluded.
none: All columns are included in the process.
--exclusion-mode-tables <exclude|include|none(default)>, --exct
<exclude|include|none(default)>
Specifies how to handle table exclusion. Options are mutually
exclusive.
exclude: Excludes specified tables from the process. All other
tables will be included.
include: Includes specified tables only. All other tables will be
excluded.
none: All tables are included in the process.
--force
Force delete destination store directory if it already exists.
--import-tool <import tool path>
Path to directory containing Neo4j import tool. Mandatory only for Offline Bulk Import mode
--mapping-file <file|stdin>
Path to an existing metadata mapping file. The name 'stdin' will
cause the CSV resources definitions to be read from standard input.
--neo4j:password <neo4j password>, --graph:password <neo4j password>,
--graph:neo4j:password <neo4j password>
Password for login to Neo4j.
--neo4j:url <neo4j url>, --graph:url <neo4j url>, --graph:neo4j:url
<neo4j url>
Url to use for connection to Neo4j.
--neo4j:user <neo4j user>, --graph:user <neo4j user>, --graph:neo4j:user
<neo4j user>
User for login to Neo4j.
--options-file <option file>
Path to file containing Neo4j import tool options.
--output-mapping-file <file|stdout>
Path to the output metadata mapping file.
-p <RDBMS port>, --port <RDBMS port>
Port number to use for connection to RDBMS.
--quote <quote>
Character to treat as quotation character for values in CSV data.
--rdbms:fetch-size <fetch-size>, --fs <fetch-size>
RDBMS Fetch size
--rdbms:password <RDBMS password>, --password <RDBMS password>
Password for login to RDBMS.
This option is required if any of the following options are
specified: --rdbms:url, --url
--rdbms:schema <schema>, -s <schema>, --schema <schema>
RDBMS schema.
--rdbms:url <RDBMS url>, --url <RDBMS url>
Url to use for connection to RDBMS.
--rdbms:user <RDBMS user>, -u <RDBMS user>, --user <RDBMS user>
User for login to RDBMS.
This option is required if any of the following options are
specified: --rdbms:url, --url
--relationship-name <table(default)|column>, --rel-name
<table(default)|column>
Specifies whether to get the name for relationships from table names
or column names.
--schemas <Schemas>
Lists all schemas to include by name or pattern.
Use '-r' <PATTERN> to filter by regex, ex. '-r .*\.north.*',
'-g' <PATTERN> for grep syntax, ex. '-g .*\.north.*' ,
or '-l' <LIST> to list all schemas names ex. '-l northwind,exc'
--tables <Tables>, --tabs <Tables>
Lists all tables to include/exclude by name or pattern.
Use '-r' <PATTERN> to filter by regex, ex. '-r .*\.purchase.*' or
'northwind.purchase.*' ,
'-g' <PATTERN> for grep syntax, ex. '-g .*\.purchase.*' or
'northwind.purchase.*' ,
or '-l' <LIST> to list all tables names ex. '-l
customers,purchase,orders'
--tiny-int <byte(default)|boolean>
Specifies whether to convert TinyInt to byte or boolean
--types <Types>
Lists all column types to include/exclude by name separated by
commas. Valid values:
unknown,
binary,
bit,
character,
id,
integer,
real,
reference,
temporal,
url,
xml,
large_object,
object;
--using <import tool>
Import tool that will be used to load data into neo4j.
--
This option can be used to separate command-line options from the
list of arguments (useful when arguments might be mistaken for
command-line options)
<table1 table2 ...>
Tables to be excluded/included
参数用法
There are two ways for write Etl parameters:
1) write parameters in command line:
$NEO4J_HOME/bin/neo4j-etl export|generate-metadata-mapping
--rdbms:url jdbc:oracle:thin:@localhost:49161:XE
--rdbms:user northwind --rdbms :password northwind
--rdbms:schema northwind
--using bulk:neo4j-import
--import-tool $NEO4J_HOME/bin
--csv-directory /tmp/northwind
--options-file /tmp/northwind/options.json
--quote '"' --force
...
2) use a config file:
$NEO4J_HOME/bin/neo4j-etl export|generate-metadata-mapping \
--config-file <path to .config file>
Above there is an Example of config file.
#EXAMPLE - ETL CONFIG FILE
#RDBMS
rdbms-url=url
rdbms-schema=schema
rdbms-password=neo4j
rdbms-user=neo4j
rdbms-fetch-size=10000
#NEO4J
using=cypher:direct
neo4j-url=bolt://127.0.0.1:7687
neo4j-user=neo4j
neo4j-password=neo4j
#RULES
exclusion-mode-tables=INCLUDE
tables=-l table1,table2,...
exclusion-mode-columns=INCLUDE
columns=-l column1,column2,...
exclusion-mode-column-types=EXCLUDE
column-types=type1,type2,...
#MISC
output-mapping-file=path_to_output_mapping_file
import-tool=path_to_import_tool
csv-directory=path_to_directory
mapping-file=path_to_file
debug=false
示例会话:基础
生成元数据映射
export NEO4J_HOME=/path/to/neo4j-enterprise-3.4.0 mkdir -p /tmp/northwind $NEO4J_HOME/bin/neo4j-etl generate-metadata-mapping \ --rdbms:url jdbc:oracle:thin:@localhost:49161:XE \ --rdbms:user northwind --rdbms:password northwind \ --rdbms:schema northwind --output-mapping-file /tmp/northwind/mapping.json
多数据库支持
Neo4j 4.0 企业版具有 多租户支持,为了支持此功能,我们在导入视图中添加了一个选择选项框,如下所示
如果您使用命令行工具,可以通过传递 --neo4j:database-name 选项来指定目标数据库,方式如下
echo '{ "multiline-fields" : "true" }' > /tmp/northwind/options.json
$NEO4J_HOME/bin/neo4j-etl export \
--rdbms:url jdbc:oracle:thin:@localhost:49161:XE \
--rdbms:user northwind --rdbms:password northwind \
--rdbms:schema northwind \
--using cypher:direct | cypher:batch | cypher:fromSQL \
--neo4j:url bolt://:7687 \
--neo4j:user neo4j
--neo4j:password neo4j \
--neo4j:database-name myDatabase \
--import-tool $NEO4J_HOME/bin \
--csv-directory /tmp/northwind \
--options-file /tmp/northwind/options.json \
--quote '"' --force
通过 neo4j-import 工具进行离线批量导入以实现初始加载(Neo4j 数据库必须为空)
请考虑到由于 neo4j-admin import 取消了对 JSON 文件作为选项源的支持,如果数据库涉及大量实体,导致命令行过长,我们可能会遇到问题。
echo '{ "multiline-fields" : "true" }' > /tmp/northwind/options.json
$NEO4J_HOME/bin/neo4j-etl export \
--rdbms:url jdbc:oracle:thin:@localhost:49161:XE \
--rdbms:user northwind --rdbms :password northwind \
--rdbms:schema northwind \
--using bulk:neo4j-import \
--import-tool $NEO4J_HOME/bin \
--csv-directory /tmp/northwind \
--options-file /tmp/northwind/options.json \
--quote '"' --force
$NEO4J_HOME/bin/neo4j-shell -path $NEO4J_HOME/data/databases/graph.db/ -c 'MATCH (n) RETURN labels(n), count(*);' +--------------------------+ | labels(n) | count(*) | +--------------------------+ | ["Shipper"] | 3 | | ["Employee"] | 9 | | ["Region"] | 4 | | ["Customer"] | 93 | | ["Territory"] | 53 | | ["Product"] | 77 | | ["Supplier"] | 29 | | ["Order"] | 830 | | ["Category"] | 8 | +--------------------------+ 9 rows
通过 java-bolt-driver 进行在线批处理导入以实现增量加载(Neo4j 可以已经包含数据)
它可以以 3 种模式完成
-
'cypher:direct' (使用 LOAD CSV)
-
'cypher:batch' (以与 cypher:direct 相同的方式创建 CSV 文件,但将其转换为 UNWIND 语句而非 LOAD CSV)
-
'cypher:fromSQL' (将 RDBMS 的表格结果数据进行转换,而不创建 CSV 文件)
在通过 java-bolt-diver 进行的在线模式下,为了允许从临时文件夹导入 CSV,必须
-
启用属性
dbms.security.allow_csv_import_from_file_urls并设置为 true,以允许 LOAD CSV 从外部资源读取 CSV 文件 -
移除或注释掉属性
dbms.directories.import=import,以防止 Neo4j 在默认导入文件夹中搜索 CSV 文件
对于 Neo4j 远程实例,您必须使用 cypher:batch 导入模式
echo '{ "multiline-fields" : "true" }' > /tmp/northwind/options.json
$NEO4J_HOME/bin/neo4j-etl export \
--rdbms:url jdbc:oracle:thin:@localhost:49161:XE \
--rdbms:user northwind --rdbms:password northwind \
--rdbms:schema northwind \
--using cypher:direct | cypher:batch | cypher:fromSQL \
--neo4j:url bolt://:7687 \
--neo4j:user neo4j --neo4j:password neo4j \
--import-tool $NEO4J_HOME/bin \
--csv-directory /tmp/northwind \
--options-file /tmp/northwind/options.json \
--quote '"' --force
Additional command line options for `cypher:batch` and `cypher:fromSQL` import modes:
--unwindBatchSize <value> (Batch size that will be used for unwind data) \
--txBatchSize <value> (Transaction Batch size that will be used for unwind commit) \
$NEO4J_HOME/bin/cypher-shell -a bolt://:7687 -u neo4j -p neo4j 'MATCH (n) RETURN labels(n), count(*);' +--------------------------+ | labels(n) | count(*) | +--------------------------+ | ["Shipper"] | 3 | | ["Employee"] | 9 | | ["Region"] | 4 | | ["Customer"] | 93 | | ["Territory"] | 53 | | ["Product"] | 77 | | ["Supplier"] | 29 | | ["Order"] | 830 | | ["Category"] | 8 | +--------------------------+ 9 rows
示例会话:Docker + Northwind
此示例会话基于 Northwind 示例数据集。
DDL 脚本可在此处获得
MySQL
下载、启动并配置带有 MySQL 的 docker 容器
docker pull mysql docker run --name neo4j-etl-mysql -e MYSQL_ROOT_PASSWORD=admin -e MYSQL_DATABASE=northwind -e MYSQL_USER=neo4j -e MYSQL_PASSWORD=neo4j -d -p 3306:3306 mysql:latest docker exec -it neo4j-etl-mysql bash root@eb6f279fdb88:/# mysql -u root -p Enter password: admin Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 5.7.18 MySQL Community Server (GPL) Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> grant all privileges on *.* to 'neo4j'@'%' with grant option; Query OK, 0 rows affected (0.00 sec) mysql> quit; Bye root@bf99fbc0d31c:/# exit exit
通过以下 sql 脚本加载数据库:https://raw.githubusercontent.com/neo4j-contrib/neo4j-etl/master/neo4j-etl-it/src/main/resources/scripts/mysql/northwind.sql
export NEO4J_HOME=/path/to/neo4j-enterprise-3.4.0
mkdir -p /tmp/northwind
echo '{ "multiline-fields" : "true" }' > /tmp/northwind/options.json
./bin/neo4j-etl export \
--rdbms:url jdbc:mysql://:5433/northwind?autoReconnect=true&useSSL=false \
--rdbms:user neo4j --rdbms:password neo4j \
--import-tool $NEO4J_HOME/bin \
--options-file /tmp/northwind/options.json \
--csv-directory /tmp/northwind \
--destination $NEO4J_HOME/data/databases/graph.db/ \
--quote '"' --force
PostgreSQL
下载、启动并配置带有 PostgreSQL 9.6.2 的 docker 容器
docker pull postgres docker run --name neo4j-etl-postgres -e POSTGRES_USER=neo4j -e POSTGRES_PASSWORD=neo4j -d -p 5433:5432 postgres docker run -it --rm --link neo4j-etl-postgres:postgres postgres psql -h postgres -U neo4j Password for user neo4j: psql (9.6.2) Type "help" for help. neo4j=# DROP DATABASE IF EXISTS northwind; neo4j=# CREATE DATABASE northwind WITH OWNER 'neo4j' ENCODING 'UTF8' LC_COLLATE = 'en_US.utf8' LC_CTYPE = 'en_US.utf8'; neo4j=# \q
通过以下 sql 脚本加载数据库:northwind.sql
export NEO4J_HOME=/path/to/neo4j-enterprise-3.4.0
mkdir -p /tmp/northwind
echo '{"multiline-fields":"true"}' > /tmp/northwind/options.json
./bin/neo4j-etl export \
--rdbms:url jdbc:postgresql://:5433/northwind?ssl=false \
--rdbms:user neo4j --rdbms:password neo4j \
--import-tool $NEO4J_HOME/bin \
--options-file /tmp/northwind/options.json \
--csv-directory /tmp/northwind \
--destination $NEO4J_HOME/data/databases/graph.db/ \
--quote '"' --force
Oracle
下载、启动并配置带有 Oracle XE 11g 的 docker 容器
docker pull wnameless/oracle-xe-11g
docker run --name neo4j-etl-oracle -d -p 49160:22 -p 49161:1521 wnameless/oracle-xe-11g
ssh root@localhost -p 49160
root@localhost's password: admin
Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.9.13-moby x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
Last login: Mon May 1 17:32:48 2017 from 172.17.0.1
root@692c446a274b:~# wget https://raw.githubusercontent.com/neo4j-contrib/neo4j-etl/master/neo4j-etl-it/src/main/resources/scripts/oracle/northwind.sql
root@692c446a274b:~# sqlplus system/oracle
SQL> CREATE USER northwind IDENTIFIED BY northwind;
SQL> GRANT DBA TO northwind;
SQL> CONN northwind/northwind;
SQL> SET sqlblanklines ON;
SQL> @northwind.sql
SQL> quit;
root@692c446a274b:~# exit
export NEO4J_HOME=/path/to/neo4j-enterprise-3.4.0
mkdir -p /tmp/northwind
echo '{"multiline-fields":"true"}' > /tmp/northwind/options.json
./bin/neo4j-etl export \
--rdbms:url jdbc:oracle:thin:@localhost:49161:XE \
--rdbms:user northwind --rdbms:password northwind \
--rdbms:schema northwind \
--import-tool $NEO4J_HOME/bin \
--options-file /tmp/northwind/options.json \
--csv-directory /tmp/northwind \
--destination $NEO4J_HOME/data/databases/graph.db/ \
--quote '"' --force
--driver /tmp/ojdbc6-11.2.0.3.jar
Microsoft SQL
下载、启动并配置带有 Microsoft SQL Server 的 docker 容器
docker run --name neo4j-etl-mssql -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=Passw0rd!' -p 1433:1433 -d microsoft/mssql-server-linux
如果您想连接到 Microsoft SQL 客户端控制台,可以运行以下命令
docker exec -it neo4j-etl-mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P 'Passw0rd!' -d <DATABASE>
export NEO4J_HOME=/path/to/neo4j-enterprise-3.4.0
mkdir -p /tmp/wideworldimporters
echo '{"multiline-fields":"true"}' > /tmp/wideworldimporters/options.json
./bin/neo4j-etl export \
--rdbms:password "Passw0rd!" \
--rdbms:user sa \
--rdbms:url "jdbc:sqlserver://:1433;databaseName=WideWorldImporters" \
--import-tool $NEO4J_HOME/bin \
--options-file /tmp/wideworldimporters/options.json \
--csv-directory /tmp/wideworldimporters \
--destination $NEO4J_HOME/data/databases/graph.db/ \
--driver /tmp/mssql-jdbc-6.1.0.jre8.jar \
如何将 World Wide Importers 数据库导入到 MS SQL Server Docker 实例中
# Create docker instance for MS-SQL Server docker run --name mssql-etl \ -e MSSQL_COLLATION=Latin1_General_100_CI_AS \ -e 'ACCEPT_EULA=Y' \ -e 'SA_PASSWORD=<YOUR_PASSWORD>' \ -p 1433:1433 \ -v /tmp:/tmp \ -d microsoft/mssql-server-linux:2017-latest # Download World Wide Importers backup file wget https://github.com/Microsoft/sql-server-samples/releases/download/wide-world-importers-v1.0/WideWorldImporters-Full.bak # Create a backup directory sudo docker exec -it mssql-etl mkdir /var/opt/mssql/backup # Load backup file into the container sudo docker cp WideWorldImporters-Full.bak mssql-etl:/var/opt/mssql/backup # Restore Wide World Importers database sudo docker exec -it mssql-etl /opt/mssql-tools/bin/sqlcmd \ -S localhost \ -U SA \ -P '<YOUR_PASSWORD>' \ -Q 'RESTORE FILELISTONLY FROM DISK = "/var/opt/mssql/backup/WideWorldImporters-Full.bak"' \ | tr -s ' ' \ | cut -d ' ' -f 1-2 sudo docker exec -it mssql-etl /opt/mssql-tools/bin/sqlcmd \ -S localhost \ -U SA \ -P '<YOUR_PASSWORD>' \ -Q 'RESTORE DATABASE WideWorldImporters FROM DISK = "/var/opt/mssql/backup/WideWorldImporters-Full.bak" WITH MOVE "WWI_Primary" TO "/var/opt/mssql/data/WideWorldImporters.mdf", MOVE "WWI_UserData" TO "/var/opt/mssql/data/WideWorldImporters_userdata.ndf", MOVE "WWI_Log" TO "/var/opt/mssql/data/WideWorldImporters.ldf", MOVE "WWI_InMemory_Data_1" TO "/var/opt/mssql/data/WideWorldImporters_InMemory_Data_1"'
功能
使用映射规则推断架构 (generate-metadata-mapping)
-
基于以下规则的通用关系数据库映射
-
带有外键的 表 被视为 连接,并作为带有 关系 的 节点 导入
-
例:
Person -> Address被导入为(Person)-[:ADDRESS_ID]->(Address) -
带有两个外键的 表 被视为 连接表 (JoinTable),并作为 关系 导入
-
例:
Student <- Student_Course -> Course被导入为
(Student) -[:STUDENT_COURSE]-> (Course) -
具有超过两个外键的 表 被视为 中间节点,并作为 带有多个关系的节点 导入
-
例:
Order_Detail -> Shipping_Address, Order_Detail -> Payment_Information, Order_Detail -> Shipment_Instructions被导入为
(Shipping_Address) -[:SHIPPING]-> (Order_Detail) (Payment_Information) -[:PAYMENT]-> (Order_Detail) (Shipment_Instructions) -[:SHIPMENT]-> (Order_Detail)
-
通过复合键解析关系。
-
支持大多数数据类型。
-
TinyInt 可以作为 Byte 或 Boolean 导入 (这是为了支持 MySQL 中保存为 TinyInt 的布尔值)
-
日期 (Dates) 被导入为 字符串 (String)
-
Blobs 在导入时会被跳过,直到导入工具支持二进制数组数据。
-
Decimal 类型待确认。
-
-
关系名称可以使用 列名 或 所引用的表名
-
--relationship-name=table时,Person->Address将变为(Person)-[:ADDRESS]->(Address) -
--relationship-name=column将变为(Person)-[:ADDRESS_ID]->(Address)
-
-
使用
--include和--exclude过滤您想要包含或排除的表 -
待办事项:使用
--include和--exclude过滤您想要包含或排除的列 -
待办事项:根据需要使用 <TBA> 标志保留自然键(标记为 PrimaryKeys 和 ForeignKeys)
-
外键通常用于在 2 个节点之间创建关系,而无需保存为属性。
-
使用此标志,节点将保留该值作为属性。
-
例:贷款具有贷款申请人的 SSN,这通常用于连接
Loan和Person节点。 -
使用此标志后,
Loan节点也会保留SSN作为属性。
-



