apoc.import.graphml过程
|
从本地文件导入数据需要在 apoc.conf 中设置 |
语法 |
|
||
描述 |
从提供的 GraphML 文件导入图。 |
||
输入参数 |
名称 |
类型 |
描述 |
|
|
用于导入数据的文件名或二进制数据。 |
|
|
|
|
|
返回参数 |
名称 |
类型 |
描述 |
|
|
导入数据所用的文件名。 |
|
|
|
导入数据的来源:"file"(文件)、"binary"(二进制)或 "file/binary"。 |
|
|
|
文件格式:["csv", "graphml", "json"]。 |
|
|
|
导入的节点数量。 |
|
|
|
导入的关系数量。 |
|
|
|
导入的属性数量。 |
|
|
|
导入持续时间。 |
|
|
|
返回的行数。 |
|
|
|
导入过程中使用的批处理大小。 |
|
|
|
导入过程中执行的批次数。 |
|
|
|
导入是否成功完成。 |
|
|
|
导入返回的数据。 |
|
配置参数
该过程支持以下配置参数
| 名称 | 类型 | 默认 | 描述 |
|---|---|---|---|
|
|
false |
根据 |
|
|
RELATED |
如果 GraphML 文件中未指定关系类型,则使用的默认关系类型 |
|
|
false |
存储 |
|
|
20000 |
每个事务处理的元素数量 |
|
|
|
允许接收二进制数据,可以是未压缩的(值: |
|
|
空映射 |
见下文 |
|
|
空映射 |
见下文 |
源/目标配置
当源节点和/或目标节点不在文件中时,允许导入关系,通过自定义标签和属性搜索节点。为此,我们可以在配置映射中插入 source: {label: '<MY_SOURCE_LABEL>', id: ’<MY_SOURCE_ID>'}` 和/或 source: {label: '<MY_TARGET_LABEL>', id: ’<MY_TARGET_ID>'}`。通过这种方式,我们可以通过 edge 标签的 source 和 target 属性来搜索开始和结束节点。
例如,配置映射为 {source: {id: 'myId', label: 'Foo'}, target: {id: 'other', label: 'Bar'}},且边行记录为 <edge id="e0" source="n0" target="n1" label="KNOWS"><data key="label">KNOWS</data></edge> 时,我们会搜索源节点 (:Foo {myId: 'n0'}) 和结束节点 (:Bar {other: 'n1'})。id 键是可选的(默认值为 'id')。
输出参数
| 名称 | 类型 |
|---|---|
file |
STRING |
source |
STRING |
format |
STRING |
节点 |
INTEGER(整数) |
relationships |
INTEGER(整数) |
属性 |
INTEGER(整数) |
time |
INTEGER(整数) |
rows |
INTEGER(整数) |
batchSize |
INTEGER(整数) |
batches |
INTEGER(整数) |
done |
布尔值 (BOOLEAN) |
data |
STRING |
从文件读取
默认情况下,禁止从文件系统导入。我们可以通过在 apoc.conf 中设置以下属性来启用它
apoc.import.file.enabled=true
如果我们尝试在未先设置此属性的情况下使用任何导入过程,将会收到以下错误消息
Failed to invoke procedure: Caused by: java.lang.RuntimeException: Import from files not enabled, please set apoc.import.file.enabled=true in your apoc.conf |
导入文件从 import 目录读取,该目录由 server.directories.import 属性定义。这意味着我们提供的任何文件路径都是相对于此目录的。如果我们尝试从绝对路径(例如 /tmp/filename)读取,将会收到类似于以下的错误消息
Failed to invoke procedure: Caused by: java.lang.RuntimeException: Can’t read url or key file:/path/to/neo4j/import/tmp/filename as json: /path/to/neo4j//import/tmp/filename (No such file or directory) |
我们可以通过在 apoc.conf 中设置以下属性来启用从文件系统任意位置读取文件
apoc.import.file.use_neo4j_config=false
|
Neo4j 现在可以从文件系统的任何位置读取,因此在设置此属性之前,请确保这是您的意图。 |
使用示例
导入简单的 GraphML 文件
simple.graphml 文件包含了来自 GraphML 入门指南的图形表示。
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="https://w3org.cn/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<graph id="G" edgedefault="undirected">
<node id="n0"/>
<node id="n1"/>
<node id="n2"/>
<node id="n3"/>
<node id="n4"/>
<node id="n5"/>
<node id="n6"/>
<node id="n7"/>
<node id="n8"/>
<node id="n9"/>
<node id="n10"/>
<edge source="n0" target="n2"/>
<edge source="n1" target="n2"/>
<edge source="n2" target="n3"/>
<edge source="n3" target="n5"/>
<edge source="n3" target="n4"/>
<edge source="n4" target="n6"/>
<edge source="n6" target="n5"/>
<edge source="n5" target="n7"/>
<edge source="n6" target="n8"/>
<edge source="n8" target="n7"/>
<edge source="n8" target="n9"/>
<edge source="n8" target="n10"/>
</graph>
</graphml>
simple.graphml 导入图形CALL apoc.import.graphml("http://graphml.graphdrawing.org/primer/simple.graphml", {})
如果我们运行此查询,将看到以下输出
| file | source | format | 节点 | relationships | 属性 | time | rows | batchSize | batches | done | data |
|---|---|---|---|---|---|---|---|---|---|---|---|
"http://graphml.graphdrawing.org/primer/simple.graphml" |
"file" |
"graphml" |
11 |
12 |
0 |
618 |
0 |
-1 |
0 |
TRUE |
NULL |
我们也可以将 simple.graphml 复制到 Neo4j 的 import 目录中,并从那里导入文件。
然后,我们可以通过以下方式运行导入过程
simple.graphml 导入图形CALL apoc.import.graphml("file://simple.graphml", {})
下方的 Neo4j Browser 可视化显示了导入的图
导入由 Export GraphML 过程创建的 GraphML 文件
movies.graphml 包含了 Neo4j 电影图谱的一个子集,由 Export GraphML 过程生成。
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="https://w3org.cn/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="born" for="node" attr.name="born"/>
<key id="name" for="node" attr.name="name"/>
<key id="tagline" for="node" attr.name="tagline"/>
<key id="label" for="node" attr.name="label"/>
<key id="title" for="node" attr.name="title"/>
<key id="released" for="node" attr.name="released"/>
<key id="roles" for="edge" attr.name="roles"/>
<key id="label" for="edge" attr.name="label"/>
<graph id="G" edgedefault="directed">
<node id="n188" labels=":Movie"><data key="labels">:Movie</data><data key="title">The Matrix</data><data key="tagline">Welcome to the Real World</data><data key="released">1999</data></node>
<node id="n189" labels=":Person"><data key="labels">:Person</data><data key="born">1964</data><data key="name">Keanu Reeves</data></node>
<node id="n190" labels=":Person"><data key="labels">:Person</data><data key="born">1967</data><data key="name">Carrie-Anne Moss</data></node>
<node id="n191" labels=":Person"><data key="labels">:Person</data><data key="born">1961</data><data key="name">Laurence Fishburne</data></node>
<node id="n192" labels=":Person"><data key="labels">:Person</data><data key="born">1960</data><data key="name">Hugo Weaving</data></node>
<node id="n193" labels=":Person"><data key="labels">:Person</data><data key="born">1967</data><data key="name">Lilly Wachowski</data></node>
<node id="n194" labels=":Person"><data key="labels">:Person</data><data key="born">1965</data><data key="name">Lana Wachowski</data></node>
<node id="n195" labels=":Person"><data key="labels">:Person</data><data key="born">1952</data><data key="name">Joel Silver</data></node>
<edge id="e267" source="n189" target="n188" label="ACTED_IN"><data key="label">ACTED_IN</data><data key="roles">["Neo"]</data></edge>
<edge id="e268" source="n190" target="n188" label="ACTED_IN"><data key="label">ACTED_IN</data><data key="roles">["Trinity"]</data></edge>
<edge id="e269" source="n191" target="n188" label="ACTED_IN"><data key="label">ACTED_IN</data><data key="roles">["Morpheus"]</data></edge>
<edge id="e270" source="n192" target="n188" label="ACTED_IN"><data key="label">ACTED_IN</data><data key="roles">["Agent Smith"]</data></edge>
<edge id="e271" source="n193" target="n188" label="DIRECTED"><data key="label">DIRECTED</data></edge>
<edge id="e272" source="n194" target="n188" label="DIRECTED"><data key="label">DIRECTED</data></edge>
<edge id="e273" source="n195" target="n188" label="PRODUCED"><data key="label">PRODUCED</data></edge>
</graph>
</graphml>
movies.graphml 导入图形CALL apoc.import.graphml("movies.graphml", {})
如果我们运行此查询,将看到以下输出
| file | source | format | 节点 | relationships | 属性 | time | rows | batchSize | batches | done | data |
|---|---|---|---|---|---|---|---|---|---|---|---|
"movies.graphml" |
"file" |
"graphml" |
8 |
7 |
36 |
23 |
0 |
-1 |
0 |
TRUE |
NULL |
我们可以运行以下查询来查看导入的图形
MATCH p=()-->()
RETURN p
| p |
|---|
({name: "Laurence Fishburne", born: "1961", labels: ":Person"})-[:ACTED_IN {roles: "[\"Morpheus\"]", label: "ACTED_IN"}]→({tagline: "Welcome to the Real World", title: "The Matrix", released: "1999", labels: ":Movie"}) |
({name: "Carrie-Anne Moss", born: "1967", labels: ":Person"})-[:ACTED_IN {roles: "[\"Trinity\"]", label: "ACTED_IN"}]→({tagline: "Welcome to the Real World", title: "The Matrix", released: "1999", labels: ":Movie"}) |
({name: "Lana Wachowski", born: "1965", labels: ":Person"})-[:DIRECTED {label: "DIRECTED"}]→({tagline: "Welcome to the Real World", title: "The Matrix", released: "1999", labels: ":Movie"}) |
({name: "Joel Silver", born: "1952", labels: ":Person"})-[:PRODUCED {label: "PRODUCED"}]→({tagline: "Welcome to the Real World", title: "The Matrix", released: "1999", labels: ":Movie"}) |
({name: "Lilly Wachowski", born: "1967", labels: ":Person"})-[:DIRECTED {label: "DIRECTED"}]→({tagline: "Welcome to the Real World", title: "The Matrix", released: "1999", labels: ":Movie"}) |
({name: "Keanu Reeves", born: "1964", labels: ":Person"})-[:ACTED_IN {roles: "[\"Neo\"]", label: "ACTED_IN"}]→({tagline: "Welcome to the Real World", title: "The Matrix", released: "1999", labels: ":Movie"}) |
({name: "Hugo Weaving", born: "1960", labels: ":Person"})-[:ACTED_IN {roles: "[\"Agent Smith\"]", label: "ACTED_IN"}]→({tagline: "Welcome to the Real World", title: "The Matrix", released: "1999", labels: ":Movie"}) |
GraphML 文件中定义的标签已被添加到每个节点的 labels 属性中,而不是作为原生节点标签添加。我们可以设置配置属性 readLabels: true 来导入原生标签
movies.graphml 导入图形并存储节点标签CALL apoc.import.graphml("movies.graphml", {readLabels: true})
| file | source | format | 节点 | relationships | 属性 | time | rows | batchSize | batches | done | data |
|---|---|---|---|---|---|---|---|---|---|---|---|
"movies.graphml" |
"file" |
"graphml" |
8 |
7 |
21 |
23 |
0 |
-1 |
0 |
TRUE |
NULL |
现在让我们重新运行查询以查看导入的图形
MATCH p=()-->()
RETURN;
| p |
|---|
(:Person {name: "Lilly Wachowski", born: "1967"})-[:DIRECTED]→(:Movie {tagline: "Welcome to the Real World", title: "The Matrix", released: "1999"}) |
(:Person {name: "Carrie-Anne Moss", born: "1967"})-[:ACTED_IN {roles: "[\"Trinity\"]"}]→(:Movie {tagline: "Welcome to the Real World", title: "The Matrix", released: "1999"}) |
(:Person {name: "Hugo Weaving", born: "1960"})-[:ACTED_IN {roles: "[\"Agent Smith\"]"}]→(:Movie {tagline: "Welcome to the Real World", title: "The Matrix", released: "1999"}) |
(:Person {name: "Laurence Fishburne", born: "1961"})-[:ACTED_IN {roles: "[\"Morpheus\"]"}]→(:Movie {tagline: "Welcome to the Real World", title: "The Matrix", released: "1999"}) |
(:Person {name: "Keanu Reeves", born: "1964"})-[:ACTED_IN {roles: "[\"Neo\"]"}]→(:Movie {tagline: "Welcome to the Real World", title: "The Matrix", released: "1999"}) |
(:Person {name: "Joel Silver", born: "1952"})-[:PRODUCED]→(:Movie {tagline: "Welcome to the Real World", title: "The Matrix", released: "1999"}) |
(:Person {name: "Lana Wachowski", born: "1965"})-[:DIRECTED]→(:Movie {tagline: "Welcome to the Real World", title: "The Matrix", released: "1999"}) |
二进制文件
您还可以从二进制 byte[](未压缩)或压缩文件中导入(允许的压缩算法有:GZIP、BZIP2、DEFLATE、BLOCK_LZ4、FRAMED_SNAPPY)。
CALL apoc.import.graphml(`binaryGzipByteArray`, {compression: 'GZIP'})
或
CALL apoc.import.graphml(`binaryFileNotCompressed`, {compression: 'NONE'})
例如,这可以很好地与 apoc.util.compress 函数配合使用
WITH apoc.util.compress('<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="https://w3org.cn/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<graph id="G" edgedefault="directed">
<node id="n0"> <data key="labels">:FOO</data><data key="name">foo</data> </node>
<node id="n1"> <data key="labels">:BAR</data><data key="name">bar</data> <data key="kids">[a,b,c]</data> </node>
<edge id="e0" source="n0" target="n1"> <data key="label">:EDGE_LABEL</data> <data key="name">foo</data> </edge>
</graph>
</graphml>', {compression: 'DEFLATE'}) as xmlCompressed
CALL apoc.import.graphml(xmlCompressed, {compression: 'DEFLATE'})
YIELD source, format, nodes, relationships, properties
RETURN source, format, nodes, relationships, properties
| source | format | 节点 | relationships | 属性 |
|---|---|---|---|---|
"binary" |
"graphml" |
2 |
1 |
7 |
分离式 GraphML 文件的往返操作
使用此数据集
CREATE (f:Foo:Foo2:Foo0 {name:'foo', born:Date('2018-10-10'), place:point({ longitude: 56.7, latitude: 12.78, height: 100 })})-[:KNOWS]->(b:Bar {name:'bar',age:42, place:point({ longitude: 56.7, latitude: 12.78})});
CREATE (:Foo {name: 'zzz'})-[:KNOWS]->(:Bar {age: 0});
CREATE (:Foo {name: 'aaa'})-[:KNOWS {id: 1}]->(:Bar {age: 666});
我们可以执行这 3 个导出查询
// Foo nodes
call apoc.export.graphml.query('MATCH (start:Foo)-[:KNOWS]->(:Bar) RETURN start', 'queryNodesFoo.graphml', {useTypes: true});
// Bar nodes
call apoc.export.graphml.query('MATCH (:Foo)-[:KNOWS]->(end:Bar) RETURN end', 'queryNodesBar.graphml', {useTypes: true});
// KNOWS rels
MATCH (:Foo)-[rel:KNOWS]->(:Bar)
WITH collect(rel) as rels
call apoc.export.graphml.data([], rels, 'queryRelationship.graphml', {useTypes: true})
YIELD nodes, relationships RETURN nodes, relationships;
在这种情况下,我们将获得这 3 个文件:.queryNodesFoo.graphml
<?xml version='1.0' encoding='UTF-8'?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="https://w3org.cn/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="born" for="node" attr.name="born" attr.type="string"/>
<key id="name" for="node" attr.name="name" attr.type="string"/>
<key id="place" for="node" attr.name="place" attr.type="string"/>
<key id="labels" for="node" attr.name="labels" attr.type="string"/>
<graph id="G" edgedefault="directed">
<node id="n0" labels=":Foo:Foo0:Foo2"><data key="labels">:Foo:Foo0:Foo2</data><data key="born">2018-10-10</data><data key="name">foo</data><data key="place">{"crs":"wgs-84-3d","latitude":12.78,"longitude":56.7,"height":100.0}</data></node>
<node id="n3" labels=":Foo"><data key="labels">:Foo</data><data key="name">zzz</data></node>
<node id="n5" labels=":Foo"><data key="labels">:Foo</data><data key="name">aaa</data></node>
</graph>
</graphml>
<?xml version='1.0' encoding='UTF-8'?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="https://w3org.cn/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="name" for="node" attr.name="name" attr.type="string"/>
<key id="place" for="node" attr.name="place" attr.type="string"/>
<key id="age" for="node" attr.name="age" attr.type="long"/>
<key id="labels" for="node" attr.name="labels" attr.type="string"/>
<graph id="G" edgedefault="directed">
<node id="n1" labels=":Bar"><data key="labels">:Bar</data><data key="name">bar</data><data key="age">42</data><data key="place">{"crs":"wgs-84","latitude":12.78,"longitude":56.7,"height":null}</data></node>
<node id="n4" labels=":Bar"><data key="labels">:Bar</data><data key="age">0</data></node>
<node id="n6" labels=":Bar"><data key="labels">:Bar</data><data key="age">666</data></node>
</graph>
</graphml>
<?xml version='1.0' encoding='UTF-8'?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="https://w3org.cn/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="label" for="edge" attr.name="label" attr.type="string"/>
<key id="id" for="edge" attr.name="id" attr.type="long"/>
<graph id="G" edgedefault="directed">
<edge id="e0" source="n0" target="n1" label="KNOWS"><data key="label">KNOWS</data></edge>
<edge id="e1" source="n3" target="n4" label="KNOWS"><data key="label">KNOWS</data></edge>
<edge id="e2" source="n5" target="n6" label="KNOWS"><data key="label">KNOWS</data><data key="id">1</data></edge>
</graph>
</graphml>
因此,我们可以在另一个数据库中通过以下查询方式进行导入,以重现原始数据集
CALL apoc.import.graphml('queryNodesFoo.graphml', {readLabels: true, storeNodeIds: true});
CALL apoc.import.graphml('queryNodesBar.graphml', {readLabels: true, storeNodeIds: true});
CALL apoc.import.graphml('queryRelationship.graphml', {readLabels: true, source: {label: 'Foo'}, target: {label: 'Bar'}});
请注意,我们必须先执行节点的导入,并使用 useTypes: true 将 node 标签的 id 属性作为属性导入,并使用 readLabels 来填充节点的标签。
使用自定义属性键
另外,我们可以利用自定义属性并避免导入 id 属性(通过 useTypes:true),操作如下(使用与之前相同的数据集和节点导出查询)
// KNOWS rels
MATCH (:Foo)-[rel:KNOWS]->(:Bar)
WITH collect(rel) as rels
call apoc.export.graphml.data([], rels, 'queryRelationship.graphml',
{useTypes: true, source: {id: 'name'}, label: {id: 'age'}})
YIELD nodes, relationships RETURN nodes, relationships;
强烈建议使用唯一约束以确保唯一性,因此本例中针对标签 Foo 和属性 name,以及标签 Bar 和属性 age 设置约束
上述查询生成此关系文件
<?xml version='1.0' encoding='UTF-8'?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="https://w3org.cn/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="label" for="edge" attr.name="label" attr.type="string"/>
<key id="id" for="edge" attr.name="id" attr.type="long"/>
<graph id="G" edgedefault="directed">
<edge id="e0" source="foo" sourceType="string" target="42" targetType="long" label="KNOWS"><data key="label">KNOWS</data></edge>
<edge id="e1" source="zzz" sourceType="string" target="0" targetType="long" label="KNOWS"><data key="label">KNOWS</data></edge>
<edge id="e2" source="aaa" sourceType="string" target="666" targetType="long" label="KNOWS"><data key="label">KNOWS</data><data key="id">1</data></edge>
</graph>
</graphml>
最后,我们可以使用与上述相同的 id(名称和年龄)导入文件
CALL apoc.import.graphml('queryNodesFoo.graphml', {readLabels: true});
CALL apoc.import.graphml('queryNodesBar.graphml', {readLabels: true});
CALL apoc.import.graphml('queryRelationship.graphml',
{readLabels: true, source: {label: 'Foo', id: 'name'}, target: {label: 'Bar', id: 'age'}});
导入带有 NaN 值的 GraphML 文件
某些程序导出的属性带有 NaN 值。自 APOC 2025.07 起,现在可以处理这种情况。NaN 值在 Cypher 中属于 FLOAT 类型,因此所有导入 NaN 值的数字类型都将被表示为 FLOAT。
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="https://w3org.cn/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="age" for="node" attr.name="age" attr.type="long"/>
<key id="height" for="node" attr.name="height" attr.type="int"/>
<key id="weight" for="node" attr.name="weight" attr.type="double"/>
<key id="shoeSize" for="node" attr.name="shoeSize" attr.type="float"/>
<graph id="G" edgedefault="directed">
<node id="n0">
<data key="age">27</data>
<data key="height">175</data>
<data key="weight">71.2</data>
<data key="shoeSize">43.5</data>
</node>
<node id="n1">
<data key="age">nan</data>
<data key="height">nan</data>
<data key="weight">nan</data>
<data key="shoeSize">nan</data>
</node>
</graph>
</graphml>
首先,使用以下命令导入图形
CALL apoc.import.graphml('nanExample.graphml',{readLabels: true, storeNodeIds:true})
然后使用以下命令查看类型和值
MATCH (a)
WITH a.age AS age, a.height AS height, a.weight AS weight, a.shoeSize AS shoeSize
RETURN
age, valueType(age) AS ageType,
height, valueType(height) AS heightType,
weight, valueType(weight) AS weightType,
shoeSize, valueType(shoeSize) AS shoeSizeType
| age | ageType | height | heightType | weight | weightType | shoeSize | shoeSizeType |
|---|---|---|---|---|---|---|---|
NaN |
"FLOAT NOT NULL" |
NaN |
"FLOAT NOT NULL" |
NaN |
"FLOAT NOT NULL" |
NaN |
"FLOAT NOT NULL" |
27 |
"INTEGER NOT NULL" |
175 |
"INTEGER NOT NULL" |
71.2 |
"FLOAT NOT NULL" |
43.5 |
"FLOAT NOT NULL" |