导出为 CSV
导出 CSV 的过程将数据导出为 Python 和 R 生态系统中数据科学库更支持的格式。我们也可以将数据导出为 JSON 格式,以便导入其他工具或用于查询结果的通用共享。本节描述的过程支持导出到文件或作为流(stream)输出。
对于 apoc.export.csv.all、apoc.export.csv.data 和 apoc.export.csv.graph,节点和关系的属性按字母顺序排列,并使用以下结构:
_id,_labels,<list_nodes_properties_naturally_sorted>,_start,_end,_type,<list_rel_properties_naturally_sorted>.
对于一个包含节点属性 age, city, kids, male, name 和 street,以及关系属性 bar 和 foo 的图,我们将得到以下结构:
_id,_labels,age,city,kids,male,name,street,_start,_end,_type,bar,foo
导出的标签按字母顺序排列。labels() 函数的输出不是排序的,请将其与 apoc.coll.sort() 结合使用。
请注意,为了执行正确的 Point 序列化,不建议导出坐标为 x,y 和 crs: 'wgs-84' 的点,例如 point({x: 56.7, y: 12.78, crs: 'wgs-84'})。否则,该点将被导出为经度和纬度(以及高度),而不是 x 和 y(以及 z)。
可用过程
下表描述了可用的过程:
| 限定名称 | 类型 |
|---|---|
apoc.export.csv.all |
|
apoc.export.csv.data |
|
apoc.export.csv.graph |
|
apoc.export.csv.query |
|
导出到文件
默认情况下,导出到文件系统是被禁用的。我们可以通过在 apoc.conf 中设置以下属性来启用它:
apoc.export.file.enabled=true
有关访问 apoc.conf 的更多信息,请参阅配置选项章节。
如果我们尝试在未先设置此属性的情况下使用任何导出过程,我们将收到以下错误消息:
Failed to invoke procedure: Caused by: java.lang.RuntimeException: Export to files not enabled, please set apoc.export.file.enabled=true in your apoc.conf. Otherwise, if you are running in a cloud environment without filesystem access, use the |
导出文件被写入 import 目录,该目录由 server.directories.import 属性定义。这意味着我们提供的任何文件路径都是相对于此目录的。如果我们尝试写入绝对路径(例如 /tmp/filename),我们将收到类似于以下内容的错误消息:
Failed to invoke procedure: Caused by: java.io.FileNotFoundException: /path/to/neo4j/import/tmp/fileName (No such file or directory) |
我们可以通过在 apoc.conf 中设置以下属性来允许写入文件系统上的任何位置:
apoc.import.file.use_neo4j_config=false
|
Neo4j 现在将能够写入文件系统上的任何位置,因此在设置此属性之前,请确保这是您的意图。 |
示例
本节包含展示如何使用导出到 CSV 过程的示例。这些示例基于电影数据集,可以通过运行以下 Cypher 查询来导入:
CREATE (TheMatrix:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967})
CREATE (Laurence:Person {name:'Laurence Fishburne', born:1961})
CREATE (Hugo:Person {name:'Hugo Weaving', born:1960})
CREATE (LillyW:Person {name:'Lilly Wachowski', born:1967})
CREATE (LanaW:Person {name:'Lana Wachowski', born:1965})
CREATE (JoelS:Person {name:'Joel Silver', born:1952})
CREATE
(Keanu)-[:ACTED_IN {roles:['Neo']}]->(TheMatrix),
(Carrie)-[:ACTED_IN {roles:['Trinity']}]->(TheMatrix),
(Laurence)-[:ACTED_IN {roles:['Morpheus']}]->(TheMatrix),
(Hugo)-[:ACTED_IN {roles:['Agent Smith']}]->(TheMatrix),
(LillyW)-[:DIRECTED]->(TheMatrix),
(LanaW)-[:DIRECTED]->(TheMatrix),
(JoelS)-[:PRODUCED]->(TheMatrix);
下方的 Neo4j Browser 可视化显示了导入的图
将整个数据库导出为 CSV
apoc.export.csv.all 过程将整个数据库导出到 CSV 文件或作为流输出。
movies.csv:CALL apoc.export.csv.all("movies.csv", {})
| file | source | format | 节点 | relationships | 属性 | time | rows | batchSize | batches | done | data |
|---|---|---|---|---|---|---|---|---|---|---|---|
"movies.csv" |
"database: nodes(8), rels(7)" |
"csv" |
8 |
7 |
21 |
39 |
15 |
20000 |
1 |
TRUE |
NULL |
movies.csv 的内容如下所示:
"_id","_labels","born","name","released","tagline","title","_start","_end","_type","roles"
"188",":Movie","","","1999","Welcome to the Real World","The Matrix",,,,
"189",":Person","1964","Keanu Reeves","","","",,,,
"190",":Person","1967","Carrie-Anne Moss","","","",,,,
"191",":Person","1961","Laurence Fishburne","","","",,,,
"192",":Person","1960","Hugo Weaving","","","",,,,
"193",":Person","1967","Lilly Wachowski","","","",,,,
"194",":Person","1965","Lana Wachowski","","","",,,,
"195",":Person","1952","Joel Silver","","","",,,,
,,,,,,,"189","188","ACTED_IN","[""Neo""]"
,,,,,,,"190","188","ACTED_IN","[""Trinity""]"
,,,,,,,"191","188","ACTED_IN","[""Morpheus""]"
,,,,,,,"192","188","ACTED_IN","[""Agent Smith""]"
,,,,,,,"193","188","DIRECTED",""
,,,,,,,"194","188","DIRECTED",""
,,,,,,,"195","188","PRODUCED",""
data 列中返回整个数据库的流:CALL apoc.export.csv.all(null, {stream:true})
YIELD file, nodes, relationships, properties, data
RETURN file, nodes, relationships, properties, data
| file | 节点 | relationships | 属性 | data |
|---|---|---|---|---|
|
|
|
|
"\"_id\",\"_labels\",\"born\",\"name\",\"released\",\"tagline\",\"title\",\"_start\",\"_end\",\"_type\",\"roles\" \"188\",\":Movie\",\"\",\"\",\"1999\",\"Welcome to the Real World\",\"The Matrix\",,,, \"189\",\":Person\",\"1964\",\"Keanu Reeves\",\"\",\"\",\"\",,,, \"190\",\":Person\",\"1967\",\"Carrie-Anne Moss\",\"\",\"\",\"\",,,, \"191\",\":Person\",\"1961\",\"Laurence Fishburne\",\"\",\"\",\"\",,,, \"192\",\":Person\",\"1960\",\"Hugo Weaving\",\"\",\"\",\"\",,,, \"193\",\":Person\",\"1967\",\"Lilly Wachowski\",\"\",\"\",\"\",,,, \"194\",\":Person\",\"1965\",\"Lana Wachowski\",\"\",\"\",\"\",,,, \"195\",\":Person\",\"1952\",\"Joel Silver\",\"\",\"\",\"\",,,, ,,,,,,,\"189\",\"188\",\"ACTED_IN\",\"[\"\"Neo\"\"]\" ,,,,,,,\"190\",\"188\",\"ACTED_IN\",\"[\"\"Trinity\"\"]\" ,,,,,,,\"191\",\"188\",\"ACTED_IN\",\"[\"\"Morpheus\"\"]\" ,,,,,,,\"192\",\"188\",\"ACTED_IN\",\"[\"\"Agent Smith\"\"]\" ,,,,,,,\"193\",\"188\",\"DIRECTED\",\"\" ,,,,,,,\"194\",\"188\",\"DIRECTED\",\"\" ,,,,,,,\"195\",\"188\",\"PRODUCED\",\"\" " |
将指定的节点和关系导出为 CSV
apoc.export.csv.data 过程将指定的节点和关系导出到 CSV 文件或作为流输出。
:Person 标签且 name 属性以 L 开头的节点导出到文件 movies-l.csv:MATCH (person:Person)
WHERE person.name STARTS WITH "L"
WITH collect(person) AS people
CALL apoc.export.csv.data(people, [], "movies-l.csv", {})
YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
| file | source | format | 节点 | relationships | 属性 | time | rows | batchSize | batches | done | data |
|---|---|---|---|---|---|---|---|---|---|---|---|
"movies-l.csv" |
"data: nodes(3), rels(0)" |
"csv" |
3 |
0 |
6 |
2 |
3 |
20000 |
1 |
TRUE |
NULL |
movies-l.csv 的内容如下所示:
"_id","_labels","born","name","_start","_end","_type"
"191",":Person","1961","Laurence Fishburne",,,
"193",":Person","1967","Lilly Wachowski",,,
"194",":Person","1965","Lana Wachowski",,,
ACTED_IN 关系以及该关系两侧带有 Person 和 Movie 标签的节点导出到文件 movies-actedIn.csv:MATCH (person:Person)-[actedIn:ACTED_IN]->(movie:Movie)
WITH collect(DISTINCT person) AS people, collect(DISTINCT movie) AS movies, collect(actedIn) AS actedInRels
CALL apoc.export.csv.data(people + movies, actedInRels, "movies-actedIn.csv", {})
YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
| file | source | format | 节点 | relationships | 属性 | time | rows | batchSize | batches | done | data |
|---|---|---|---|---|---|---|---|---|---|---|---|
"movies-actedIn.csv" |
"data: nodes(5), rels(4)" |
"csv" |
5 |
4 |
15 |
2 |
9 |
20000 |
1 |
TRUE |
NULL |
movies-actedIn.csv 的内容如下所示:
"_id","_labels","born","name","released","tagline","title","_start","_end","_type","roles"
"189",":Person","1964","Keanu Reeves","","","",,,,
"190",":Person","1967","Carrie-Anne Moss","","","",,,,
"191",":Person","1961","Laurence Fishburne","","","",,,,
"192",":Person","1960","Hugo Weaving","","","",,,,
"188",":Movie","","","1999","Welcome to the Real World","The Matrix",,,,
,,,,,,,"189","188","ACTED_IN","[""Neo""]"
,,,,,,,"190","188","ACTED_IN","[""Trinity""]"
,,,,,,,"191","188","ACTED_IN","[""Morpheus""]"
,,,,,,,"192","188","ACTED_IN","[""Agent Smith""]"
data 列中返回所有 ACTED_IN 关系以及该关系两侧带有 Person 和 Movie 标签的节点的流:MATCH (person:Person)-[actedIn:ACTED_IN]->(movie:Movie)
WITH collect(DISTINCT person) AS people, collect(DISTINCT movie) AS movies, collect(actedIn) AS actedInRels
CALL apoc.export.csv.data(people + movies, actedInRels, null, {stream: true})
YIELD file, nodes, relationships, properties, data
RETURN file, nodes, relationships, properties, data
| file | 节点 | relationships | 属性 | data |
|---|---|---|---|---|
|
|
|
|
"\"_id\",\"_labels\",\"born\",\"name\",\"released\",\"tagline\",\"title\",\"_start\",\"_end\",\"_type\",\"roles\" \"190\",\":Person\",\"1967\",\"Carrie-Anne Moss\",\"\",\"\",\"\",,,, \"189\",\":Person\",\"1964\",\"Keanu Reeves\",\"\",\"\",\"\",,,, \"191\",\":Person\",\"1961\",\"Laurence Fishburne\",\"\",\"\",\"\",,,, \"192\",\":Person\",\"1960\",\"Hugo Weaving\",\"\",\"\",\"\",,,, \"188\",\":Movie\",\"\",\"\",\"1999\",\"Welcome to the Real World\",\"The Matrix\",,,, ,,,,,,,\"189\",\"188\",\"ACTED_IN\",\"[\"\"Neo\"\"]\" ,,,,,,,\"190\",\"188\",\"ACTED_IN\",\"[\"\"Trinity\"\"]\" ,,,,,,,\"191\",\"188\",\"ACTED_IN\",\"[\"\"Morpheus\"\"]\" ,,,,,,,\"192\",\"188\",\"ACTED_IN\",\"[\"\"Agent Smith\"\"]\" " |
将 Cypher 查询结果导出为 CSV
apoc.export.csv.query 过程将 Cypher 查询的结果导出到 CSV 文件或作为流输出。
DIRECTED 关系以及该关系两侧带有 Person 和 Movie 标签的节点导出到文件 movies-directed.csv:WITH "MATCH path = (person:Person)-[:DIRECTED]->(movie)
RETURN person.name AS name, person.born AS born,
movie.title AS title, movie.tagline AS tagline, movie.released AS released" AS query
CALL apoc.export.csv.query(query, "movies-directed.csv", {})
YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data;
| file | source | format | 节点 | relationships | 属性 | time | rows | batchSize | batches | done | data |
|---|---|---|---|---|---|---|---|---|---|---|---|
"movies-directed.csv" |
"statement: cols(5)" |
"csv" |
0 |
0 |
10 |
3 |
2 |
20000 |
1 |
TRUE |
NULL |
movies-directed.csv 的内容如下所示:
"name","born","role","title","tagline","released"
"Lilly Wachowski","1967","","The Matrix","Welcome to the Real World","1999"
"Lana Wachowski","1965","","The Matrix","Welcome to the Real World","1999"
DIRECTED 关系以及该关系两侧带有 Person 和 Movie 标签的节点的流:WITH "MATCH path = (person:Person)-[:DIRECTED]->(movie)
RETURN person.name AS name, person.born AS born,
movie.title AS title, movie.tagline AS tagline, movie.released AS released" AS query
CALL apoc.export.csv.query(query, null, {stream: true})
YIELD file, nodes, relationships, properties, data
RETURN file, nodes, relationships, properties, data;
| file | 节点 | relationships | 属性 | data |
|---|---|---|---|---|
|
|
|
|
"\"name\",\"born\",\"title\",\"tagline\",\"released\" \"Lilly Wachowski\",\"1967\",\"The Matrix\",\"Welcome to the Real World\",\"1999\" \"Lana Wachowski\",\"1965\",\"The Matrix\",\"Welcome to the Real World\",\"1999\" " |
|
您还可以压缩要导出的文件。请点击此处获取更多信息 |
当启用 bulkImport 配置时,它会创建一个可用于 Neo4j 批量导入的文件列表。
此配置仅可与 apoc.export.csv.all 和 apoc.export.csv.graph 一起使用。
所有创建的文件命名如下:
-
节点文件由输入文件的名称加上
.nodes.[LABEL_NAME].csv组成。 -
关系文件由输入文件的名称加上
.relationships.[TYPE_NAME].csv组成。
如果节点或关系有多个标签/类型,它将为每个标签/类型创建一个文件。
配置参数
该过程支持以下配置参数
| 参数 | 默认 | description(描述) |
|---|---|---|
batchSize |
20000 |
批处理大小。 |
delim |
"," |
CSV 文件的分隔符字符。 |
arrayDelim |
";" |
用于数组的分隔符字符(用于批量导入)。 |
quotes |
'always' |
用于 CSV 的引用字符,可能的值为:
|
differentiateNulls |
false |
在空字符串和 null 值之间创建区分。其行为受
|
useTypes |
false |
将类型添加到文件头中。 |
bulkImport |
true |
为 Neo4j Admin 导入创建文件。 |
timeoutSeconds |
100 |
查询在超时前应运行的最大秒数。 |
separateHeader |
false |
创建两个文件:一个用于头部,一个用于数据。 |
streamStatements |
false |
通过配置 |
stream |
false |
等同于 |
引用配置示例
可以使用不同的引用策略将图导出到 CSV 文件。
以下查询是在包含一个节点的图上执行的:
CREATE (:Quote {name: 'foo,bar,baz', array:["a","b","c"], other: 123})
不带引用导出
CALL apoc.export.csv.all("fileNoQuote.csv", {quotes: 'none'})
返回以下 CSV 文件:
_id,_labels,array,name,other,_start,_end,_type
5,:Quote,["a","b","c"],foo,bar,baz,123,,,
区分 null 值配置示例
默认情况下,空字符串和 null 值被视为相同,但可以使用 differentiateNulls 配置项更改此行为。
以下查询是在包含两个节点的图上执行的:
CREATE (:Quote {value1: "Hello", value2: ""})
CREATE (:Quote {value1: "I have a \"null\" value", value2: null})
带引用并区分 null 值导出
CALL apoc.export.csv.all("fileAlwaysQuote.csv", {quotes: 'always', differentiateNulls: true})
返回以下 CSV 文件:
"_id","_labels","value1","value2","_start","_end","_type"
"0",":Quote","Hello","",,,
"1",":Quote","I have a ""null"" value",,,,
value2 的空字符串值由引号 "" 表示,而 null 值则留空。
必要时带引用并区分 null 值导出
CALL apoc.export.csv.all("fileIfNeededQuote.csv", {quotes: 'ifNeeded', differentiateNulls: true})
返回以下 CSV 文件:
_id,_labels,value1,value2,_start,_end,_type
0,:Quote,Hello,"",,,
1,:Quote,"I have a ""null"" value",,,,
字符串 Hello 未加引号,因为它不包含任何特殊字符。字符串 I have a "null" value 被加了引号,因为它包含特殊字符。空字符串被加了引号以区别于 null 值。null 值为空。