Create a job specification file

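The job specification file tells Dataflow how to run the import: where to source data from, how to map it into Neo4j, and so on. It is a JSON object made up of four sections.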

Job specification JSON skeleton

{
  "version": "1",
  "config": { ... },  (1)
  "sources": [  (2)
    { ... }
  ],
  "targets": { ... },  (3)
  "actions": [  (4)
    { ... }
  ]
}

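1	`config`: global flags that affect how the import is performed (optional)
2	`sources`: data source definitions (relational)
3	`targets`: data target definitions (graph: nodes, relationships, Cypher queries)
4	`actions`: one-off actions (optional)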

At a high level, the job fetches data from the sources, transforms it, and imports it into the targets.

A valid specification file contains at least one source object and one target object.
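The minimum-content rule above can be checked mechanically. The following is a minimal Python sketch that inspects a parsed specification; the helper name `check_spec_skeleton` is hypothetical and not part of the Dataflow template, which performs its own validation.

```python
import json


def check_spec_skeleton(spec):
    """Return a list of problems with the top-level structure of a job spec."""
    problems = []
    sources = spec.get("sources", [])
    if not isinstance(sources, list) or not sources:
        problems.append("at least one source is required")
    targets = spec.get("targets", {})
    # targets is an object grouping nodes, relationships and queries
    if not isinstance(targets, dict):
        problems.append("targets must be an object")
    elif not any(targets.get(kind) for kind in ("nodes", "relationships", "queries")):
        problems.append("at least one target is required")
    return problems


minimal = json.loads("""
{
  "version": "1",
  "sources": [{"type": "text", "name": "persons",
               "urls": ["gs://neo4j-examples/persons.csv"],
               "header": ["person_tmdbId", "name"]}],
  "targets": {"nodes": [{"source": "persons", "name": "Persons"}]}
}
""")
print(check_spec_skeleton(minimal))  # []
```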

Full example

Below is a ready-to-use job specification file that imports the publicly available movies dataset.

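The dataset contains the entities Person and Movie, linked together by DIRECTED and ACTED_IN relationships. In other words, each Person may have DIRECTED and/or ACTED_IN a Movie. Both entities and relationships carry additional details. The data comes from the following files: persons.csv, movies.csv, acted_in.csv, directed.csv.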

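The following sections break this example down in detail and give context for each part. We recommend keeping the example job specification at hand while reading this guide.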

{
  "version": "1",
  "config": {
    "reset_db": true
  },
  "sources": [
    {
      "type": "text",
      "name": "persons",
      "urls": ["gs://neo4j-examples/persons.csv"],
      "format": "excel",
      "header": ["person_tmdbId","bio","born","bornIn","died","person_imdbId","name","person_poster","person_url"]
    },
    {
      "type": "text",
      "name": "movies",
      "urls": ["gs://neo4j-examples/movies.csv"],
      "format": "excel",
      "header": ["movieId","title","budget","countries","movie_imdbId","imdbRating","imdbVotes","languages","plot","movie_poster","released","revenue","runtime","movie_tmdbId","movie_url","year","genres"]
    },
    {
      "type": "text",
      "name": "directed",
      "urls": ["gs://neo4j-examples/directed.csv"],
      "format": "excel",
      "header": ["movieId","person_tmdbId"]
    },
    {
      "type": "text",
      "name": "acted_in",
      "urls": ["gs://neo4j-examples/acted_in.csv"],
      "format": "excel",
      "header": ["movieId","person_tmdbId","role"]
    }
  ],
  "targets": {
    "nodes": [
      {
        "source": "persons",
        "name": "Persons",
        "write_mode": "merge",
        "labels": [ "Person" ],
        "properties": [
          {
            "source_field": "person_tmdbId",
            "target_property": "id",
            "target_property_type": "string"
          },
          {
            "source_field": "name",
            "target_property": "name",
            "target_property_type": "string"
          },
          {
            "source_field": "bornIn",
            "target_property": "bornLocation",
            "target_property_type": "string"
          },
          {
            "source_field": "born",
            "target_property": "bornDate",
            "target_property_type": "date"
          },
          {
            "source_field": "died",
            "target_property": "diedDate",
            "target_property_type": "date"
          }
        ],
        "schema": {
          "key_constraints": [
            {
              "name": "personIdKey",
              "label": "Person",
              "properties": ["id"]
            }
          ],
          "unique_constraints": [
            {
              "name": "personNameUnique",
              "label": "Person",
              "properties": ["name"]
            }
          ]
        }
      },
      {
        "source": "movies",
        "name": "Movies",
        "write_mode": "merge",
        "labels": [ "Movie" ],
        "properties": [
          {
            "source_field": "movieId",
            "target_property": "id",
            "target_property_type": "string"
          },
          {
            "source_field": "title",
            "target_property": "title",
            "target_property_type": "string"
          },
          {
            "source_field": "year",
            "target_property": "releaseYear",
            "target_property_type": "string"
          },
          {
            "source_field": "imdbRating",
            "target_property": "imdbRating",
            "target_property_type": "float"
          }
        ],
        "schema": {
          "key_constraints": [
            {
              "name": "movieIdKey",
              "label": "Movie",
              "properties": ["id"]
            }
          ],
          "unique_constraints": [
            {
              "name": "movieTitleUnique",
              "label": "Movie",
              "properties": ["title"]
            }
          ]
        }
      }
    ],
    "relationships": [
      {
        "source": "directed",
        "name": "Directed",
        "type": "DIRECTED",
        "write_mode": "merge",
        "node_match_mode": "match",
        "start_node_reference": "Persons",
        "end_node_reference": "Movies"
      },
      {
        "source": "acted_in",
        "name": "Acted_in",
        "type": "ACTED_IN",
        "write_mode": "merge",
        "node_match_mode": "match",
        "start_node_reference": "Persons",
        "end_node_reference": "Movies",
        "properties": [
          {
            "source_field": "role",
            "target_property": "role",
            "target_property_type": "string"
          }
        ]
      }
    ]
  }
}

Configuration

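The config object contains global configuration for the import job. Every setting has a default value, so you only need to specify the ones you want to change.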

Configuration settings and their default values

"config": {
  "reset_db": false,
  "index_all_properties": false,
  "node_target_batch_size": 5000,
  "relationship_target_batch_size": 1000,
  "query_target_batch_size": 1000,
  "node_target_parallelism": 10,
  "relationship_target_parallelism": 1,
  "query_target_parallelism": 1
}

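reset_db (bool): whether to clear the target database before importing. This deletes data, indexes, and constraints.
index_all_properties (bool): whether to create indexes for all properties. See Cypher® → Search-performance indexes.
node_target_batch_size (int): number of rows processed in each transaction of a node target import.
relationship_target_batch_size (int): number of rows processed in each transaction of a relationship target.
query_target_batch_size (int): number of rows processed in each transaction of a custom query target.
node_target_parallelism (int): maximum number of concurrent transactions per worker for node targets.
relationship_target_parallelism (int): maximum number of concurrent transactions per worker for relationship targets. Be careful with values greater than 1, as they can cause deadlocks.
query_target_parallelism (int): maximum number of concurrent transactions per worker for custom Cypher query targets. Be careful with values greater than 1, as they can cause deadlocks.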
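Because every setting has a default, the effective configuration is the defaults overlaid with whatever the spec provides. The Python sketch below illustrates this using the defaults listed above and flags the parallelism values the notes warn about. The function name `effective_config` is hypothetical, and rejecting unknown keys is an assumption of this sketch, not documented template behavior.

```python
DEFAULT_CONFIG = {
    "reset_db": False,
    "index_all_properties": False,
    "node_target_batch_size": 5000,
    "relationship_target_batch_size": 1000,
    "query_target_batch_size": 1000,
    "node_target_parallelism": 10,
    "relationship_target_parallelism": 1,
    "query_target_parallelism": 1,
}


def effective_config(user_config=None):
    """Overlay user settings on the documented defaults and collect warnings."""
    user_config = user_config or {}
    unknown = set(user_config) - set(DEFAULT_CONFIG)
    if unknown:
        # Assumption of this sketch: unknown keys are treated as mistakes
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    config = {**DEFAULT_CONFIG, **user_config}
    warnings = []
    for key in ("relationship_target_parallelism", "query_target_parallelism"):
        if config[key] > 1:
            # Values above 1 can cause deadlocks, per the notes above
            warnings.append(f"{key}={config[key]} may cause deadlocks")
    return config, warnings


config, warnings = effective_config({"reset_db": True})
print(config["reset_db"], config["node_target_batch_size"], warnings)  # True 5000 []
```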

Sources

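The sources section contains the data source definitions, as a list. As a rough guideline, think of it as one table <=> one source. The importer takes the data each source provides and makes it available to the targets, which eventually map it into Neo4j.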

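Each source object must specify at least the type, name, urls, and header attributes. The default column delimiter and line separator are determined by the specified format, following Apache's CSVFormat.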

Source object specification and default values

{
  "type": "text",
  "name": "<sourceName>",
  "urls": [ "<csvPath1>", "<csvPath2>", ... ],
  "format": "default",
  "column_delimiter": "",
  "line_separator": "",
  "header": [ "<colName1>", "<colName2>", ... ]
}

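type (string): text.
name (string): a friendly name for the source (must be unique across all names). You use this name to refer to the source in other parts of the specification file.
urls (list of strings): Google Storage locations of the CSV files (for example, gs://neo4j-datasets/movies.csv).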

How do I get the Google Storage location of a file?

To get the Google Storage location of a file in a Cloud Storage bucket, expand the file options through the three dots on the right, then select Copy gsutil URI.
format (string): format of the provided CSV files.
Valid values are: default, excel, informix, mongo, mongo_tsv, mysql, oracle, postgres, postgresql_csv, rfc4180.
Format behavior follows Apache's CSVFormat.
column_delimiter (string): CSV field delimiter.
line_separator (string): CSV line separator.
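header (list of strings): the complete list of field names the CSV files contain, in order. Alternatively, the list can be restricted to the first few columns. The column names specified here determine the row field names that targets map from.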

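The header field must list either all the columns the CSV contains or a contiguous subset starting from the first column. Arbitrary subsets of columns are not supported.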

Valid and invalid header examples for a CSV with columns ID,name,title,rating

Valid: ID,name,title,rating
Valid: ID,name
Valid: ID,name,title
Invalid: ID,rating
Invalid: title,rating
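The header rule amounts to a prefix check. The Python sketch below implements that check and shows how header names become row field names. It assumes the CSV text holds only data records (no header line) and that columns beyond the header are ignored; both helper names are hypothetical.

```python
import csv
import io


def is_valid_header(csv_columns, header):
    """A header is valid if it is a non-empty prefix of the CSV's columns."""
    return 0 < len(header) <= len(csv_columns) and list(csv_columns[:len(header)]) == list(header)


def rows_from_csv(text, header):
    """Turn CSV data records into dicts keyed by the header names.

    Assumes text holds data records only; zip() drops columns past the header.
    """
    reader = csv.reader(io.StringIO(text), dialect="excel")
    return [dict(zip(header, record)) for record in reader]


columns = ["ID", "name", "title", "rating"]
for candidate in (["ID", "name"], ["ID", "rating"], ["title", "rating"]):
    print(",".join(candidate), is_valid_header(columns, candidate))
# ID,name True
# ID,rating False
# title,rating False

# Hypothetical data record
print(rows_from_csv("1,Keanu Reeves,The Matrix,8.7\n", ["ID", "name"]))
# [{'ID': '1', 'name': 'Keanu Reeves'}]
```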

Example

Example source object that imports rows from the persons.csv file

{
  "type": "text",
  "name": "persons",
  "urls": ["gs://neo4j-examples/persons.csv"],
  "format": "excel",
  "header": ["person_tmdbId","bio","born","bornIn","died","person_imdbId","name","person_poster","person_url"]
}
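The following is a minimal Python sketch of a validator for text source objects, built only from the rules in this section. The function name is hypothetical. Requiring every URL to start with gs:// is an assumption drawn from the Google Storage description of urls, and the template's own checks may differ.

```python
VALID_FORMATS = {
    "default", "excel", "informix", "mongo", "mongo_tsv", "mysql",
    "oracle", "postgres", "postgresql_csv", "rfc4180",
}


def validate_text_source(source):
    """Return a list of problems with a text source object."""
    errors = [f"missing required key: {k}" for k in ("type", "name", "urls", "header") if k not in source]
    if source.get("type", "text") != "text":
        errors.append('type must be "text"')
    urls = source.get("urls", [])
    if not isinstance(urls, list):
        errors.append("urls must be a list of strings")
    else:
        # Assumption: every URL is a Google Storage location
        errors.extend(f"not a gs:// URI: {url}" for url in urls if not str(url).startswith("gs://"))
    fmt = source.get("format", "default")
    if fmt not in VALID_FORMATS:
        errors.append(f"unknown format: {fmt}")
    return errors


persons = {
    "type": "text",
    "name": "persons",
    "urls": ["gs://neo4j-examples/persons.csv"],
    "format": "excel",
    "header": ["person_tmdbId", "bio", "born"],
}
print(validate_text_source(persons))  # []
print(validate_text_source({**persons, "urls": "gs://neo4j-examples/persons.csv"}))
# ['urls must be a list of strings']
```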

Targets

The targets section contains the definitions of the graph entities that result from the import.

You must specify at least one target object.

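Neo4j represents objects with nodes (for example, movies, people) and connects them with relationships (for example, ACTED_IN, DIRECTED). Each object in the targets section produces the corresponding entity (node or relationship) in Neo4j from the source data. You can also run custom Cypher queries.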

Target specification skeleton

"targets": {
  "nodes": [ ... ],
  "relationships": [ ... ],
  "queries": [ ... ]
}

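By default, you don't need to think about dependencies between nodes and relationships: relationship targets are always processed after the targets that correspond to their start and end nodes. You can, however, add other targets as dependencies (see depends_on).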
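The ordering rule can be pictured as a dependency graph: each relationship target depends on its start and end node targets, and depends_on adds explicit edges. The sketch below uses Python's standard graphlib module (3.9+) to produce one valid order. The helper name and the Audit query target are hypothetical, and the template's real scheduler may handle independent targets differently, for example by running them in parallel.

```python
from graphlib import TopologicalSorter


def execution_order(targets):
    """Return target names ordered so that every target follows its dependencies."""
    graph = {}
    for kind in ("nodes", "relationships", "queries"):
        for target in targets.get(kind, []):
            deps = set(target.get("depends_on", []))
            if kind == "relationships":
                # Implicit dependencies on the start and end node targets
                for ref_key in ("start_node_reference", "end_node_reference"):
                    ref = target[ref_key]
                    # A reference is a target name or an object with a "name" key
                    deps.add(ref if isinstance(ref, str) else ref["name"])
            graph[target["name"]] = deps
    # static_order() raises graphlib.CycleError if dependencies form a cycle
    return list(TopologicalSorter(graph).static_order())


targets = {
    "nodes": [{"name": "Persons"}, {"name": "Movies"}],
    "relationships": [
        {"name": "Directed", "start_node_reference": "Persons", "end_node_reference": "Movies"}
    ],
    # Hypothetical query target that explicitly depends on a relationship target
    "queries": [{"name": "Audit", "depends_on": ["Directed"]}],
}
print(execution_order(targets))
```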

Node objects

Node entities must be grouped in a list under the key nodes inside the targets object.

Node target specification skeleton

"targets": {
  "nodes": [
    { <nodeSpec1> },
    { <nodeSpec2> },
    ...
  ]
}

Required fields

Each node object must have at least the source, name, labels, properties, and write_mode attributes.

{
  "source": "<sourceName>",
  "name": "<targetName>",
  "labels": ["<label1>", "<label2>", ...],
  "properties": [
    {
      "source_field": "<sourceFieldName>",
      "target_property": "<neo4jPropertyName>",
      "target_property_type": "<neo4jPropertyType>"
    },
    { <propertyObj2> },
    ...
  ],
  "write_mode": "merge"
}

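source (string): name of the source this target draws data from. Must match one of the names in the sources object.
name (string): a friendly name for the target (must be unique across all names).
labels (list of strings): labels to assign to the nodes.
properties (list of objects): mapping between source columns and node properties.
Valid values for target_property_type are: boolean, byte_array (assumed to be base64-encoded), date, duration, float, integer, local_date, local_datetime, local_time, point, string, zoned_datetime, zoned_time.
write_mode (string): creation mode in Neo4j, either create or merge. See CREATE and MERGE for the behavior of the corresponding Cypher clauses.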
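The required fields and allowed values above translate into a straightforward check. This is a minimal Python sketch; `validate_node_target` and its error messages are hypothetical, and property types are only checked when present.

```python
REQUIRED_NODE_KEYS = ("source", "name", "labels", "properties", "write_mode")
VALID_PROPERTY_TYPES = {
    "boolean", "byte_array", "date", "duration", "float", "integer",
    "local_date", "local_datetime", "local_time", "point", "string",
    "zoned_datetime", "zoned_time",
}


def validate_node_target(target, source_names):
    """Check a node target against the required fields and allowed values."""
    errors = [f"missing required key: {key}" for key in REQUIRED_NODE_KEYS if key not in target]
    if "source" in target and target["source"] not in source_names:
        errors.append(f"unknown source: {target['source']}")
    if "write_mode" in target and target["write_mode"] not in ("create", "merge"):
        errors.append(f"invalid write_mode: {target['write_mode']}")
    for prop in target.get("properties", []):
        ptype = prop.get("target_property_type")
        if ptype is not None and ptype not in VALID_PROPERTY_TYPES:
            errors.append(f"invalid target_property_type for {prop.get('source_field')}: {ptype}")
    return errors


persons = {
    "source": "persons",
    "name": "Persons",
    "labels": ["Person"],
    "write_mode": "merge",
    "properties": [
        # "datetime" is not one of the valid types listed above
        {"source_field": "born", "target_property": "bornDate", "target_property_type": "datetime"}
    ],
}
print(validate_node_target(persons, {"persons", "movies"}))
# ['invalid target_property_type for born: datetime']
```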

Schema definition

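You can create indexes and constraints on the imported nodes through the schema object. The schema settings are equivalent to running the relevant CREATE INDEX/CONSTRAINT commands manually, except that they run automatically before each entity type is imported.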

If the global configuration index_all_properties is set to true, all properties are indexed with range indexes.

Node target schema definition and default values

{
  ...
  "schema": {
    "enable_type_constraints": true,
    "key_constraints": [
      {
        "name": "<constraintName>",
        "label": "<label>",
        "properties": ["<neo4jPropertyName1>", "<neo4jPropertyName2>", ...],
        "options": {}
      }
    ],
    "unique_constraints": [
      {
        "name": "<constraintName>",
        "label": "<label>",
        "properties": ["<neo4jPropertyName1>", "<neo4jPropertyName2>", ...],
        "options": {}
      }
    ],
    "existence_constraints": [
      {
        "name": "<constraintName>",
        "label": "<label>",
        "property": "<neo4jPropertyName>"
      }
    ],
    "range_indexes": [
      {
        "name": "<indexName>",
        "label": "<label>",
        "properties": ["<neo4jPropertyName1>", "<neo4jPropertyName2>", ...]
      }
    ],
    "text_indexes": [
      {
        "name": "<indexName>",
        "label": "<label>",
        "property": "<neo4jPropertyName>",
        "options": {}
      }
    ],
    "point_indexes": [
      {
        "name": "<indexName>",
        "label": "<label>",
        "property": "<neo4jPropertyName>",
        "options": {}
      }
    ],
    "fulltext_indexes": [
      {
        "name": "<indexName>",
        "labels": ["label1", "label2", ...],
        "properties": ["<neo4jPropertyName1>", "<neo4jPropertyName2>", ...],
        "options": {}
      }
    ],
    "vector_indexes": [
      {
        "name": "<indexName>",
        "label": "<label>",
        "property": "<neo4jPropertyName>",
        "options": {}
      }
    ]
  }
}

The attributes of each object are:

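name (string): name of the index or constraint to create in Neo4j.
label (string) or labels (list of strings): label(s) on which the index or constraint is enforced.
property (string) or properties (list of strings): property(ies) on which the index or constraint is enforced.
options (object): options to use when creating the index or constraint (see the page for each index and constraint type). Optional, except for vector indexes, where it is mandatory.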

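Source data must not contain null values in key_constraints columns, or they will conflict with the node key constraint. If your source data is not clean in this respect, consider cleaning it before the import, or use the where attribute in the source transformations to filter out rows that don't satisfy the constraint (for example, person_tmdbId IS NOT NULL).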

The key_constraints and existence_constraints options require Neo4j/Aura Enterprise Edition and have no effect when run on a Neo4j Community Edition installation.
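To illustrate what "equivalent to running CREATE CONSTRAINT" means, the sketch below renders a node schema object's key, unique, and existence constraints as Neo4j 5 Cypher DDL strings. It is an approximation under stated assumptions: the statements the template actually issues may differ in form, options are ignored, and the helper names are hypothetical.

```python
def _props(var, names):
    """Render one property as n.p, or several as (n.a, n.b)."""
    items = [f"{var}.{name}" for name in names]
    return items[0] if len(items) == 1 else "(" + ", ".join(items) + ")"


def node_constraint_statements(schema):
    """Translate key, unique and existence constraints into Cypher statements."""
    statements = []
    for kind, predicate in (("key_constraints", "IS NODE KEY"),
                            ("unique_constraints", "IS UNIQUE")):
        for c in schema.get(kind, []):
            statements.append(
                f"CREATE CONSTRAINT {c['name']} IF NOT EXISTS "
                f"FOR (n:{c['label']}) REQUIRE {_props('n', c['properties'])} {predicate}"
            )
    for c in schema.get("existence_constraints", []):
        statements.append(
            f"CREATE CONSTRAINT {c['name']} IF NOT EXISTS "
            f"FOR (n:{c['label']}) REQUIRE n.{c['property']} IS NOT NULL"
        )
    return statements


schema = {
    "key_constraints": [{"name": "personIdKey", "label": "Person", "properties": ["id"]}],
    "unique_constraints": [{"name": "personNameUnique", "label": "Person", "properties": ["name"]}],
}
for statement in node_constraint_statements(schema):
    print(statement)
# CREATE CONSTRAINT personIdKey IF NOT EXISTS FOR (n:Person) REQUIRE n.id IS NODE KEY
# CREATE CONSTRAINT personNameUnique IF NOT EXISTS FOR (n:Person) REQUIRE n.name IS UNIQUE
```

Key and existence constraints generated this way still need Enterprise Edition, as noted above.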

Configuration

Node target configuration options and default values

{
  ...
  "active": true,
  "source_transformations": {
    "enable_grouping": true
  },
  "depends_on": ["<dependencyTargetName1>", "<dependencyTargetName2>", ...]
}

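active (bool): whether this target should be included in the import (default: true).
source_transformations (object): if enable_grouping is set to true, the importer appends a SQL GROUP BY clause on all fields specified in key_constraints and properties. If set to false, any duplicate data in the source is pushed into Neo4j, which may raise constraint errors or make insertion less efficient. The object can also contain aggregation functions and further fields; see Source transformations.
depends_on (list of strings): names of the targets that should be executed before the current one.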
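The effect of enable_grouping can be pictured as de-duplicating the source rows on the grouping fields before they reach Neo4j. The in-memory Python sketch below mimics a GROUP BY without aggregates, whereas the importer itself expresses this as a SQL GROUP BY. The function name and sample rows are hypothetical.

```python
def group_rows(rows, fields):
    """Keep one row per distinct combination of fields, like SQL GROUP BY <fields>."""
    seen = set()
    grouped = []
    for row in rows:
        key = tuple(row.get(field) for field in fields)
        if key not in seen:
            seen.add(key)
            grouped.append(dict(zip(fields, key)))
    return grouped


# Hypothetical rows: the same person appears once per role
rows = [
    {"person_tmdbId": "1", "name": "Keanu Reeves", "role": "Neo"},
    {"person_tmdbId": "1", "name": "Keanu Reeves", "role": "John Wick"},
    {"person_tmdbId": "2", "name": "Carrie-Anne Moss", "role": "Trinity"},
]
print(group_rows(rows, ["person_tmdbId", "name"]))
# [{'person_tmdbId': '1', 'name': 'Keanu Reeves'}, {'person_tmdbId': '2', 'name': 'Carrie-Anne Moss'}]
```

Without grouping, both rows for person 1 would be sent to Neo4j, which is where duplicate writes or constraint errors come from.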

Example

Example node object that imports Person nodes

{
  "source": "persons",
  "name": "Persons",
  "write_mode": "merge",
  "labels": [ "Person" ],
  "properties": [
    {
      "source_field": "person_tmdbId",
      "target_property": "id",
      "target_property_type": "string"
    },
    {
      "source_field": "name",
      "target_property": "name",
      "target_property_type": "string"
    },
    {
      "source_field": "bornIn",
      "target_property": "bornLocation",
      "target_property_type": "string"
    },
    {
      "source_field": "born",
      "target_property": "bornDate",
      "target_property_type": "local_date"
    },
    {
      "source_field": "died",
      "target_property": "diedDate",
      "target_property_type": "local_date"
    }
  ],
  "schema": {
    "key_constraints": [
      {
        "name": "personIdKey",
        "label": "Person",
        "properties": ["id"]
      }
    ],
    "unique_constraints": [
      {
        "name": "personNameUnique",
        "label": "Person",
        "properties": ["name"]
      }
    ]
  }
}

Relationship objects

Relationship entities must be grouped in a list under the key relationships inside the targets object.

Relationship target specification skeleton

"targets": {
  ...
  "relationships": [
    { <relationshipSpec1> },
    { <relationshipSpec2> },
    ...
  ]
}

Required fields

Each relationship object must have at least the source, name, type, start_node_reference, end_node_reference, node_match_mode, and write_mode attributes.

{
  "source": "<sourceName>",
  "name": "<targetName>",
  "type": "<relationshipType>",
  "start_node_reference": "<nodeTarget>",
  "end_node_reference": "<nodeTarget>",
  "node_match_mode": "<match/merge>",
  "write_mode": "<create/merge>"
}

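source (string): name of the source this target draws data from. Must match one of the names in the sources object.
name (string): a friendly name for the target (must be unique across all names).
type (string): type to assign to the relationship.
node_match_mode (string): which Cypher clause to use to fetch the start and end nodes before creating the relationship. Valid values are match and merge, corresponding to the Cypher clauses MATCH and MERGE respectively.
write_mode (string): creation mode in Neo4j, either create or merge. See CREATE and MERGE for the behavior of the corresponding Cypher clauses.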

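The start/end_node_reference attributes define which node targets the relationship links. You can specify them in two ways: by node target name only, or as objects that also define key mappings.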

  "start_node_reference": "<nodeTargetName>",
  "end_node_reference": "<nodeTargetName>",

start_node_reference (string): name of the node target that acts as the relationship's start node.
end_node_reference (string): name of the node target that acts as the relationship's end node.

Example

  "start_node_reference": "Persons",
  "end_node_reference": "Movies",

  "start_node_reference": {
    "name": "<nodeTargetName>",
    "key_mappings": [
      {
        "source_field": "<sourceMappingKey>",
        "node_property": "<nodeTargetMappingKey>"
      }
    ]
  },
  "end_node_reference": {
    "name": "<nodeTargetName>",
    "key_mappings": [
      {
        "source_field": "<sourceMappingKey>",
        "node_property": "<nodeTargetMappingKey>"
      }
    ]
  },

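start_node_reference (object): name of the node target that acts as the relationship's start node, plus the source column used as key (source_field) and the imported node target property used as key (node_property).
end_node_reference (object): name of the node target that acts as the relationship's end node, plus the source column used as key (source_field) and the imported node target property used as key (node_property).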

Example

  "start_node_reference": {
    "name": "Persons",
    "key_mappings": [
      {
        "source_field": "person_tmdbId",
        "node_property": "id"
      }
    ]
  },
  "end_node_reference": {
    "name": "Movies",
    "key_mappings": [
      {
        "source_field": "movieId",
        "node_property": "id"
      }
    ]
  },

To handle composite keys, list several objects (each with the same structure) in key_mappings.
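Conceptually, each key mapping pairs a column of the relationship's source row with a key property of the referenced node target, which is what lets the importer find the node to link. The sketch below covers the object form only and builds that lookup for one row, including a composite key. The helper name, the People target, the first_name/last_name columns, and the ID values are all hypothetical.

```python
def node_lookup(reference, row):
    """Return the node target name and the {node_property: value} map identifying the node."""
    keys = {
        mapping["node_property"]: row[mapping["source_field"]]
        for mapping in reference["key_mappings"]
    }
    return reference["name"], keys


row = {"person_tmdbId": "42", "movieId": "7"}
start = {"name": "Persons", "key_mappings": [{"source_field": "person_tmdbId", "node_property": "id"}]}
print(node_lookup(start, row))  # ('Persons', {'id': '42'})

# A composite key simply lists several mappings
composite = {
    "name": "People",
    "key_mappings": [
        {"source_field": "first_name", "node_property": "firstName"},
        {"source_field": "last_name", "node_property": "lastName"},
    ],
}
print(node_lookup(composite, {"first_name": "Keanu", "last_name": "Reeves"}))
# ('People', {'firstName': 'Keanu', 'lastName': 'Reeves'})
```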

Properties

Relationships can also map source columns to properties.

{
  ...
  "properties": [
    {
      "source_field": "<sourceFieldName>",
      "target_property": "<neo4jPropertyName>",
      "target_property_type": "<neo4jPropertyType>"
    },
    { <propertyObj2> },
    ...
  ]
}

properties (list of objects): mapping between source columns and relationship properties.
Valid values for target_property_type are: boolean, byte_array (assumed to be base64-encoded), date, duration, float, integer, local_date, local_datetime, local_time, point, string, zoned_datetime, zoned_time.

Schema definition

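You can create indexes and constraints on the imported relationships through the schema object. The schema settings are equivalent to running the relevant CREATE INDEX/CONSTRAINT commands manually, except that they run automatically before each relationship type is imported.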

If the global configuration index_all_properties is set to true, all properties are indexed with range indexes.

Relationship target schema definition and default values

{
  ...
  "schema": {
    "enable_type_constraints": true,
    "key_constraints": [
      {
        "name": "<constraintName>",
        "type": "<relationshipType>",
        "properties": ["<neo4jPropertyName1>", "<neo4jPropertyName2>", ...],
        "options": {}
      }
    ],
    "unique_constraints": [
      {
        "name": "<constraintName>",
        "type": "<relationshipType>",
        "properties": ["<neo4jPropertyName1>", "<neo4jPropertyName2>", ...],
        "options": {}
      }
    ],
    "existence_constraints": [
      {
        "name": "<constraintName>",
        "type": "<relationshipType>",
        "property": "<neo4jPropertyName>"
      }
    ],
    "range_indexes": [
      {
        "name": "<indexName>",
        "type": "<relationshipType>",
        "properties": ["<neo4jPropertyName1>", "<neo4jPropertyName2>", ...]
      }
    ],
    "text_indexes": [
      {
        "name": "<indexName>",
        "type": "<relationshipType>",
        "property": "<neo4jPropertyName>",
        "options": {}
      }
    ],
    "point_indexes": [
      {
        "name": "<indexName>",
        "type": "<relationshipType>",
        "property": "<neo4jPropertyName>",
        "options": {}
      }
    ],
    "fulltext_indexes": [
      {
        "name": "<indexName>",
        "types": ["<relationshipType1>", "<relationshipType2>", ...],
        "properties": ["<neo4jPropertyName1>", "<neo4jPropertyName2>", ...],
        "options": {}
      }
    ],
    "vector_indexes": [
      {
        "name": "<indexName>",
        "type": "<relationshipType>",
        "property": "<neo4jPropertyName>",
        "options": {}
      }
    ]
  }
}

The attributes of each object are:

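name (string): name of the index or constraint to create in Neo4j.
type (string): relationship type on which the index or constraint is enforced (full-text indexes take types, a list of strings, instead).
property (string) or properties (list of strings): property(ies) on which the index or constraint is enforced.
options (object): options to use when creating the index or constraint (see the page for each index and constraint type). Optional, except for vector indexes, where it is mandatory.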

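Source data must not contain null values in key_constraints columns, or they will conflict with the relationship key constraint. If your source data is not clean in this respect, consider cleaning it before the import, or use the where attribute in the source transformations to filter out rows that don't satisfy the constraint (for example, person_tmdbId IS NOT NULL).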

The key_constraints and existence_constraints options require Neo4j/Aura Enterprise Edition and have no effect when run on a Neo4j Community Edition installation.
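As with node targets, a relationship schema corresponds to ordinary Cypher DDL. The sketch below renders key and unique constraints and range indexes for a relationship type using Neo4j 5 syntax; relationship key and uniqueness constraints need a recent Neo4j 5 release (5.7 or later). The statements the template actually generates may differ, options are ignored, and the helper names and the creditId property are hypothetical.

```python
def _props(var, names):
    """Render one property as r.p, or several as (r.a, r.b)."""
    items = [f"{var}.{name}" for name in names]
    return items[0] if len(items) == 1 else "(" + ", ".join(items) + ")"


def relationship_schema_statements(schema):
    """Translate key/unique constraints and range indexes on a relationship type."""
    statements = []
    for kind, predicate in (("key_constraints", "IS RELATIONSHIP KEY"),
                            ("unique_constraints", "IS UNIQUE")):
        for c in schema.get(kind, []):
            statements.append(
                f"CREATE CONSTRAINT {c['name']} IF NOT EXISTS "
                f"FOR ()-[r:{c['type']}]-() REQUIRE {_props('r', c['properties'])} {predicate}"
            )
    for idx in schema.get("range_indexes", []):
        # Range indexes always take a parenthesised property list
        props = ", ".join(f"r.{name}" for name in idx["properties"])
        statements.append(
            f"CREATE RANGE INDEX {idx['name']} IF NOT EXISTS "
            f"FOR ()-[r:{idx['type']}]-() ON ({props})"
        )
    return statements


schema = {
    "key_constraints": [{"name": "actedInCreditKey", "type": "ACTED_IN", "properties": ["creditId"]}],
    "range_indexes": [{"name": "actedInRoleIndex", "type": "ACTED_IN", "properties": ["role"]}],
}
for statement in relationship_schema_statements(schema):
    print(statement)
# CREATE CONSTRAINT actedInCreditKey IF NOT EXISTS FOR ()-[r:ACTED_IN]-() REQUIRE r.creditId IS RELATIONSHIP KEY
# CREATE RANGE INDEX actedInRoleIndex IF NOT EXISTS FOR ()-[r:ACTED_IN]-() ON (r.role)
```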

Configuration

Relationship target configuration options and default values

{
  ...
  "active": true,
  "source_transformations": {
    "enable_grouping": true
  },
  "depends_on": ["<dependencyTargetName1>", "<dependencyTargetName2>", ...]
}

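active (bool): whether this target should be included in the import (default: true).
source_transformations (object): if enable_grouping is set to true, the importer performs a SQL GROUP BY on all fields specified in key_constraints and properties. If set to false, any duplicate data in the source is pushed into Neo4j, which may raise constraint errors or make insertion less efficient. The object can also contain aggregation functions and further fields; see Source transformations.
depends_on (list of strings): names of the targets that should be executed before the current one.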

Example

Example relationship object that imports ACTED_IN relationships

{
  "source": "acted_in",
  "name": "Acted_in",
  "type": "ACTED_IN",
  "write_mode": "merge",
  "node_match_mode": "match",
  "start_node_reference": "Persons",
  "end_node_reference": "Movies",
  "properties": [
    {
      "source_field": "role",
      "target_property": "role",
      "target_property_type": "string"
    }
  ]
}

Custom query targets

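Custom query targets are useful when the import requires complex queries that don't fit the node/relationship target format. A query target receives batches of rows through the variable $rows.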

Custom queries must be grouped in a list under the key queries inside the targets object.

Query target specification skeleton

"targets": {
  ...
  "queries": [
    { <querySpec1> },
    { <querySpec2> },
    ...
  ]
}

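Don't use custom queries to run Cypher that doesn't directly depend on a source; use actions instead. One-off queries (especially non-idempotent ones) are not suitable for custom query targets, because queries in targets run in batches: depending on how many $rows batches are extracted from the source, a custom query may run several times.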
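To see why a one-off statement is risky here, consider how rows are split into batches. The sketch below chunks hypothetical source rows using the default query_target_batch_size of 1000: 2,500 rows yield three batches, so a query target would run three times. This is a simplified model that ignores how Dataflow distributes work across workers, which can change the number and size of batches; the function name is hypothetical.

```python
import math


def batches(rows, batch_size):
    """Yield consecutive slices of rows; each slice is one $rows parameter value."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


rows = [{"person_tmdbId": str(i)} for i in range(2500)]
sizes = [len(batch) for batch in batches(rows, 1000)]
print(sizes)  # [1000, 1000, 500]
print(math.ceil(len(rows) / 1000))  # 3
```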

Required fields

Each query target must have at least the source, name, and query attributes.

{
  "source": "<sourceName>",
  "name": "<targetName>",
  "query": "<cypherQuery>"
}

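source (string): name of the source this target draws data from. Must match one of the names in the sources object.
name (string): a friendly name for the target (must be unique across all names).
query (string): the Cypher query. Source data is available as a list in the parameter $rows.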

Configuration

Query target configuration options and default values

{
  ...
  "active": true,
  "depends_on": ["<dependencyTargetName1>", "<dependencyTargetName2>", ...]
}

active (bool): whether this target should be included in the import (default: true).
depends_on (list of strings): names of the targets that should be executed before the current one.

Example

Example query object that imports Person nodes and records a timestamp when each node is created

{
  "name": "Person nodes",
  "source": "persons",
  "query": "UNWIND $rows AS row WITH row WHERE row.person_tmdbId IS NOT NULL MERGE (p:Person {id: row.person_tmdbId}) ON CREATE SET p.created_time = datetime() SET p.name = row.name, p.born_in = row.bornIn, p.born = date(row.born), p.died = date(row.died)"
}

Source transformations

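Each node and relationship target can optionally have a source_transformations attribute containing aggregation functions. This is useful for extracting higher-level information from more granular sources. Aggregations produce extra fields that become available for property mappings.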

"source_transformations": {
  "enable_grouping": true,
  "aggregations": [
    {
      "expression": "",
      "field_name": ""
    },
    { aggregationObj2 }, ...
  ],
  "limit": -1,
  "where": "",
  "order_by": [
    {
      "expression": "column_name",
      "order": "<asc/desc>"
    },
    { orderObj2 }, ...
  ]
}

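enable_grouping (bool): must be true for aggregations and where to take effect.
aggregations (list of objects): each aggregation is given as a SQL expression in the expression attribute; its result is made available as a source column under the name given in field_name.
limit (int): caps the number of source rows considered for import (default: no limit, encoded as -1).
where (string): filters the source data before import, in SQL WHERE clause format.
order_by (list of objects): sorts the source rows.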

Example

Example transformation object for a dummy dataset

{
  "enable_grouping": true,
  "aggregations": [
    {
      "expression": "SUM(unit_price*quantity)",
      "field_name": "total_amount_sold"
    },
    {
      "expression": "SUM(quantity)",
      "field_name": "total_quantity_sold"
    }
  ],
  "limit": 50,
  "where": "sourceId IS NOT NULL"
}
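The transformation example above corresponds roughly to selecting SUM(unit_price*quantity) AS total_amount_sold and SUM(quantity) AS total_quantity_sold with WHERE sourceId IS NOT NULL, grouped by some set of fields, LIMIT 50. The in-memory Python sketch below reproduces that logic on a few made-up rows so the effect of each field is visible. Grouping by a productId column and applying limit to the grouped rows are assumptions, since the example does not say which fields are grouped; the function name is hypothetical.

```python
from collections import defaultdict


def apply_transformations(rows, group_by, limit=-1):
    """Filter, group and aggregate rows like the transformation example above."""
    totals = defaultdict(lambda: {"total_amount_sold": 0, "total_quantity_sold": 0})
    for row in rows:
        # where: "sourceId IS NOT NULL"
        if row.get("sourceId") is None:
            continue
        agg = totals[tuple(row[field] for field in group_by)]
        agg["total_amount_sold"] += row["unit_price"] * row["quantity"]  # SUM(unit_price*quantity)
        agg["total_quantity_sold"] += row["quantity"]                    # SUM(quantity)
    result = [dict(zip(group_by, key), **agg) for key, agg in totals.items()]
    # limit: -1 means no limit
    return result if limit == -1 else result[:limit]


rows = [
    {"productId": "p1", "sourceId": "s1", "unit_price": 2.5, "quantity": 4},
    {"productId": "p1", "sourceId": "s2", "unit_price": 2.5, "quantity": 2},
    {"productId": "p2", "sourceId": None, "unit_price": 9.0, "quantity": 1},
]
print(apply_transformations(rows, ["productId"], limit=50))
# [{'productId': 'p1', 'total_amount_sold': 15.0, 'total_quantity_sold': 6}]
```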

Actions

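The actions section contains commands that can run before or after specific steps of the import process. Each step is called a stage. For example, you can submit an HTTP request when a step completes, execute SQL queries on the source, or run Cypher statements on the Neo4j target instance.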

Action specification skeleton

  ...
  "actions": [
    { <actionSpec1> },
    { <actionSpec2> },
    ...
  ]

Each action object must have at least the name, type, and stage attributes. Further attributes depend on the action type.

HTTP actions

{
  "type": "http",
  "name": "<actionName>",
  "stage": "<stageName>",
  "method": "<get/post>",
  "url": "<targetUrl>",
  "headers": {}
}

type (string): the action type.
name (string): a friendly name for the action (must be unique across all names).
stage (string): the import stage at which the action runs. Valid values are: start, post_sources, pre_nodes, post_nodes, pre_relationships, post_relationships, pre_queries, post_queries, end.
method (string): HTTP method, either get or post.
url (string): URL the HTTP request is sent to.
headers (object, optional): request headers.

Example action that sends a GET request after the import completes

{
  "type": "http",
  "name": "Post load ping",
  "stage": "end",
  "method": "get",
  "url": "/success",
  "headers": {
    "secret": "314159",
    "moreSecret": "17320"
  }
}

Cypher actions

{
  "type": "cypher",
  "name": "<actionName>",
  "stage": "<stageName>",
  "query": "<cypherQuery>",
  "execution_mode": "<transaction/autocommit>"
}

type (string): the action type.
name (string): a friendly name for the action (must be unique across all names).
stage (string): the import stage at which the action runs. Valid values are: start, post_sources, pre_nodes, post_nodes, pre_relationships, post_relationships, pre_queries, post_queries, end.
query (string): the Cypher query to run.
execution_mode (string, optional): the mode in which the query is executed. Valid values are transaction and autocommit (default: transaction).

Example action that creates an importJob node after the import completes

{
  "type": "cypher",
  "name": "Post load log",
  "stage": "end",
  "query": "MERGE (:importJob {date: datetime()})"
}

BigQuery actions

{
  "type": "bigquery",
  "name": "<actionName>",
  "stage": "<stageName>",
  "sql": "<sqlQuery>"
}

type (string): the action type.
name (string): a friendly name for the action (must be unique across all names).
stage (string): the import stage at which the action runs. Valid values are: start, post_sources, pre_nodes, post_nodes, pre_relationships, post_relationships, pre_queries, post_queries, end.
sql (string): the SQL query to run.

Example action that logs the import time to a BigQuery table after the import completes

{
  "type": "bigquery",
  "name": "Post load log",
  "stage": "end",
  "sql": "INSERT INTO logs.imports (time) VALUES (NOW())"
}
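The rules for the three action types can be collected into a small checker. The Python sketch below validates action objects and sorts them by stage. Treating the order in which stages are listed as their execution order, and treating method and url as required for HTTP actions (only headers is marked optional), are assumptions; the helper names and the Pre-nodes ping action are hypothetical.

```python
STAGES = [
    "start", "post_sources", "pre_nodes", "post_nodes", "pre_relationships",
    "post_relationships", "pre_queries", "post_queries", "end",
]
# Type-specific required keys, per the field lists above
REQUIRED_BY_TYPE = {"http": ("method", "url"), "cypher": ("query",), "bigquery": ("sql",)}


def validate_action(action):
    """Return a list of problems with a single action object."""
    errors = [f"missing required key: {k}" for k in ("name", "type", "stage") if k not in action]
    if "stage" in action and action["stage"] not in STAGES:
        errors.append(f"invalid stage: {action['stage']}")
    kind = action.get("type")
    if kind is not None and kind not in REQUIRED_BY_TYPE:
        errors.append(f"unknown type: {kind}")
    for key in REQUIRED_BY_TYPE.get(kind, ()):
        if key not in action:
            errors.append(f"missing {kind} key: {key}")
    if kind == "http" and action.get("method", "get") not in ("get", "post"):
        errors.append(f"invalid method: {action['method']}")
    if kind == "cypher" and action.get("execution_mode", "transaction") not in ("transaction", "autocommit"):
        errors.append(f"invalid execution_mode: {action['execution_mode']}")
    return errors


def order_actions(actions):
    """Sort actions by stage; sorted() is stable, so list order breaks ties."""
    return sorted(actions, key=lambda action: STAGES.index(action["stage"]))


actions = [
    {"type": "cypher", "name": "Post load log", "stage": "end",
     "query": "MERGE (:importJob {date: datetime()})"},
    # Hypothetical HTTP action that runs before the node targets
    {"type": "http", "name": "Pre-nodes ping", "stage": "pre_nodes",
     "method": "post", "url": "/start"},
]
print([action["name"] for action in order_actions(actions)])  # ['Pre-nodes ping', 'Post load log']
print(validate_action({"type": "http", "name": "x", "stage": "after_end", "method": "put"}))
# ['invalid stage: after_end', 'missing http key: url', 'invalid method: put']
```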

Variables

Variables are not supported yet.