Weaviate

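The following is a list of all available Weaviate procedures. Note that the list and the procedure signatures are consistent with those of the other vector databases, such as Qdrant.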

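name	description
apoc.vectordb.weaviate.info($host, $collectionName, $config)	Returns information about the specified collection; throws a FileNotFoundException if the collection does not exist.
apoc.vectordb.weaviate.createCollection(hostOrKey, collection, similarity, size, $config)	Creates a collection with the name given in the 2nd parameter, using the specified `similarity` and `size`. The default endpoint is `<hostOrKey param>/schema`.
apoc.vectordb.weaviate.deleteCollection(hostOrKey, collection, $config)	Deletes the collection with the name given in the 2nd parameter. The default endpoint is `<hostOrKey param>/schema/<collection param>`.
apoc.vectordb.weaviate.upsert(hostOrKey, collection, vectors, $config)	Upserts, in the collection with the name given in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}]. The default endpoint is `<hostOrKey param>/objects`.
apoc.vectordb.weaviate.delete(hostOrKey, collection, ids, $config)	Deletes the vectors with the specified `ids`. The default endpoint is `<hostOrKey param>/schema`.
apoc.vectordb.weaviate.get(hostOrKey, collection, ids, $config)	Gets the vectors with the specified `ids`. The default endpoint is `<hostOrKey param>/schema`.
apoc.vectordb.weaviate.query(hostOrKey, collection, vector, filter, limit, $config)	Retrieves, from the collection with the name given in the 2nd parameter, the `limit` results closest to the given `vector`. Note that, besides the usual config parameters, this procedure requires a `fields: [listOfProperty]` config, which defines the properties to retrieve through the GraphQL query run under the hood. The default endpoint is `<hostOrKey param>/graphql`.
apoc.vectordb.weaviate.getAndUpdate(hostOrKey, collection, ids, $config)	Gets the vectors with the specified `ids`, and optionally creates/updates Neo4j entities. The default endpoint is `<hostOrKey param>/schema`.
apoc.vectordb.weaviate.queryAndUpdate(hostOrKey, collection, vector, filter, limit, $config)	Retrieves, from the collection with the name given in the 2nd parameter, the `limit` results closest to the given `vector`, and optionally creates/updates Neo4j entities. Note that, besides the usual config parameters, this procedure requires a `fields: [listOfProperty]` config, which defines the properties to retrieve through the GraphQL query run under the hood. The default endpoint is `<hostOrKey param>/graphql`.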

Here the 1st parameter can be a key defined in the APOC configuration, for example apoc.weaviate.<key>.host=myHost. If hostOrKey is null, the default is 'https://:8080/v1'.
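As a minimal sketch of the key-based lookup described above, assume that `apoc.conf` contains an entry under the hypothetical key `myKey`. Both the key name and the `<myHost>` placeholder are illustrative and do not come from the original text. The key can then be passed in place of the host:

```cypher
// Assumed apoc.conf entry (myKey and <myHost> are placeholders):
//   apoc.weaviate.myKey.host=<myHost>

// Pass the configured key instead of a full host URL.
CALL apoc.vectordb.weaviate.info('myKey', 'test_collection', {})
```

This keeps the connection details in the server configuration rather than in each query.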

Examples

Get collection info (it leverages this API)

CALL apoc.vectordb.weaviate.info($host, 'test_collection', {<optional config>})

Table 1. Example results
value
{"vectorizer": "none", "invertedIndexConfig": {"bm25": {"b": 0.75, "k1": 1.2}, "stopwords": {"additions": null, "removals": null, "preset": "en"}, "cleanupIntervalSeconds": 60}, "vectorIndexConfig": {"ef": -1, "dynamicEfMin": 100, "pq": {"centroids": 256, "trainingLimit": 100000, "encoder": {"type": "kmeans", "distribution": "log-normal"}, "enabled": false, "bitCompression": false, "segments": 0}, "distance": "cosine", "skip": false, "dynamicEfFactor": 8, "bq": {"enabled": false}, "vectorCacheMaxObjects": 1000000000000, "cleanupIntervalSeconds": 300, "dynamicEfMax": 500, "efConstruction": 128, "flatSearchCutoff": 40000, "maxConnections": 64}, "multiTenancyConfig": {"enabled": false}, "vectorIndexType": "hnsw", "replicationConfig": {"factor": 1}, "shardingConfig": {"desiredVirtualCount": 128, "desiredCount": 1, "actualCount": 1, "function": "murmur3", "virtualPerPhysical": 128, "strategy": "hash", "actualVirtualCount": 128, "key": "_id"}, "class": "TestCollection", "properties": [{"name": "city", "description": "This property was generated by Weaviate's auto-schema feature on Wed Jul 10 12:50:18 2024", "indexFilterable": true, "tokenization": "word", "indexSearchable": true, "dataType": ["text"]}, {"name": "foo", "description": "This property was generated by Weaviate's auto-schema feature on Wed Jul 10 12:50:18 2024", "indexFilterable": true, "tokenization": "word", "indexSearchable": true, "dataType": ["text"]}]}

Create a collection (it leverages this API)

CALL apoc.vectordb.weaviate.createCollection($host, 'test_collection', 'Cosine', 4, {<optional config>})

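Table 2. Example results
vectorizer	invertedIndexConfig	vectorIndexConfig	multiTenancyConfig	vectorIndexType	replicationConfig	shardingConfig	class	properties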
none	{"bm25": { "b": 0.75, "k1": 1.2 }, "stopwords": { "additions": null, "removals": null, "preset": "en" }, "cleanupIntervalSeconds": 60}	{ "ef": -1, "dynamicEfMin": 100, "pq": { "centroids": 256, "trainingLimit": 100000, "encoder": { "type": "kmeans", "distribution": "log-normal" }, "enabled": false, "bitCompression": false, "segments": 0 }, "distance": "cosine", "skip": false, "dynamicEfFactor": 8, "bq": { "enabled": false }, "vectorCacheMaxObjects": 1000000000000, "cleanupIntervalSeconds": 300, "dynamicEfMax": 500, "efConstruction": 128, "flatSearchCutoff": 40000, "maxConnections": 64 }	{ "enabled": false }	hnsw	{ "factor": 1 }	{ "desiredVirtualCount": 128, "desiredCount": 1, "actualCount": 1, "function": "murmur3", "virtualPerPhysical": 128, "strategy": "hash", "actualVirtualCount": 128, "key": "_id" }	TestCollection	null

Create a collection on a remote instance, authenticating with an API key (see here)

CALL apoc.vectordb.weaviate.createCollection("https://<weaviateInstanceId>.weaviate.network",
    'TestCollection',
    'cosine',
    4,
    {headers: {Authorization: 'Bearer <apiKey>'}})

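Table 3. Example results
vectorizer	invertedIndexConfig	vectorIndexConfig	multiTenancyConfig	vectorIndexType	replicationConfig	shardingConfig	class	properties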
none	{"bm25": { "b": 0.75, "k1": 1.2 }, "stopwords": { "additions": null, "removals": null, "preset": "en" }, "cleanupIntervalSeconds": 60}	{ "ef": -1, "dynamicEfMin": 100, "pq": { "centroids": 256, "trainingLimit": 100000, "encoder": { "type": "kmeans", "distribution": "log-normal" }, "enabled": false, "bitCompression": false, "segments": 0 }, "distance": "cosine", "skip": false, "dynamicEfFactor": 8, "bq": { "enabled": false }, "vectorCacheMaxObjects": 1000000000000, "cleanupIntervalSeconds": 300, "dynamicEfMax": 500, "efConstruction": 128, "flatSearchCutoff": 40000, "maxConnections": 64 }	{ "enabled": false }	hnsw	{ "factor": 1 }	{ "desiredVirtualCount": 128, "desiredCount": 1, "actualCount": 1, "function": "murmur3", "virtualPerPhysical": 128, "strategy": "hash", "actualVirtualCount": 128, "key": "_id" }	TestCollection	null
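To avoid hard-coding the API key in the query text, the header can presumably be assembled from query parameters. The sketch below assumes two hypothetical parameters, `$weaviateUrl` and `$apiKey`, supplied by the client. Neither name appears in the original text.

```cypher
// $weaviateUrl and $apiKey are hypothetical parameters supplied by the caller.
CALL apoc.vectordb.weaviate.createCollection($weaviateUrl,
    'TestCollection',
    'cosine',
    4,
    // Build the bearer token header from the parameter at call time.
    {headers: {Authorization: 'Bearer ' + $apiKey}})
```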

Delete a collection (it leverages this API)

CALL apoc.vectordb.weaviate.deleteCollection($host, 'test_collection', {<optional config>})

This returns an empty result.

Upsert vectors (it leverages this API)

CALL apoc.vectordb.weaviate.upsert($host, 'test_collection',
    [
        {id: "8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", vector: [0.05, 0.61, 0.76, 0.74], metadata: {city: "Berlin", foo: "one"}},
        {id: "9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", vector: [0.19, 0.81, 0.75, 0.11], metadata: {city: "London", foo: "two"}}
    ],
    {<optional config>})

Table 4. Example results
lastUpdateTimeUnix	vector	id	creationTimeUnix	class	properties
1721293838439	[0.05, 0.61, 0.76, 0.74]	8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308	1721293838439	TestCollection	{city: "Berlin", foo: "one"}
1721293838439	[0.19, 0.81, 0.75, 0.11]	9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308	1721293838439	TestCollection	{city: "London", foo: "two"}

Get vectors (it leverages this API)

CALL apoc.vectordb.weaviate.get($host, 'test_collection', [1,2], {<optional config>})

Table 5. Example results
score	metadata	id	vector	text	entity
null	{city: "Berlin", foo: "one"}	null	null	null	null
null	{city: "London", foo: "two"}	null	null	null	null

Get vectors with {allResults: true}

CALL apoc.vectordb.weaviate.get($host, 'test_collection', [1,2], {allResults: true, <optional config>})

Table 6. Example results
score	metadata	id	vector	text	entity
null	{city: "Berlin", foo: "one"}	1	[…]	null	null
null	{city: "London", foo: "two"}	2	[…]	null	null

Query vectors (it leverages this API)

CALL apoc.vectordb.weaviate.query($host,
    'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    '{operator: Equal, valueString: "London", path: ["city"]}',
    5,
    {fields: ["city", "foo"], allResults: true, <other optional config>})

Table 7. Example results
score	metadata	id	vector	text
1	{city: "Berlin", foo: "one"}	1	[…]	null
0.1	{city: "London", foo: "two"}	2	[…]	null

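We can define a mapping to fetch the associated nodes and relationships, and optionally create them, by leveraging the vector metadata.

For example, if we have created 2 vectors with the upsert procedure above, we can populate some existing nodes (e.g. (:Test {myId: 'one'}) and (:Test {myId: 'two'})):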

CALL apoc.vectordb.weaviate.queryAndUpdate($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {
        embeddingKey: "vect",
        nodeLabel: "Test",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })

This populates the two nodes as (:Test {myId: 'one', city: 'Berlin', vect: [vector1]}) and (:Test {myId: 'two', city: 'London', vect: [vector2]}), which are returned in the entity column of the result.
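The node example can be reproduced end to end with two small helper statements, sketched here under the assumption that the database holds no other `:Test` nodes. The first statement creates the nodes that the mapping looks up, and the second inspects what the procedure wrote. The `dimensions` alias is illustrative.

```cypher
// Hypothetical setup, run before queryAndUpdate: the nodes to be populated.
CREATE (:Test {myId: 'one'}), (:Test {myId: 'two'});

// Run after queryAndUpdate: inspect the properties written by the mapping.
MATCH (n:Test)
WHERE n.vect IS NOT NULL
RETURN n.myId AS myId, n.city AS city, size(n.vect) AS dimensions
ORDER BY myId;
```

With the 4-dimensional vectors upserted earlier, `dimensions` should be 4 for each populated node.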

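We can also set the `mode` of the mapping configuration to CREATE_IF_MISSING (which creates the nodes if they do not exist), READ_ONLY (which searches for nodes/relationships without making updates), or UPDATE_EXISTING (the default behavior):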

CALL apoc.vectordb.weaviate.queryAndUpdate($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {
        mode: "CREATE_IF_MISSING",
        embeddingKey: "vect",
        nodeLabel: "Test",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })

This creates 2 new nodes, as above.

Alternatively, we can populate existing relationships (e.g. (:Start)-[:TEST {myId: 'one'}]->(:End) and (:Start)-[:TEST {myId: 'two'}]->(:End)):

CALL apoc.vectordb.weaviate.queryAndUpdate($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {
        embeddingKey: "vect",
        relType: "TEST",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })

This populates the two relationships as ()-[:TEST {myId: 'one', city: 'Berlin', vect: [vector1]}]-() and ()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-(), which are returned in the entity column of the result.
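As with nodes, the relationship example can be set up and checked with plain Cypher. The following is a sketch that assumes no other `:TEST` relationships exist in the database. The `dimensions` alias is illustrative.

```cypher
// Hypothetical setup, run before queryAndUpdate: the relationships to be populated.
CREATE (:Start)-[:TEST {myId: 'one'}]->(:End),
       (:Start)-[:TEST {myId: 'two'}]->(:End);

// Run after queryAndUpdate: inspect the properties written by the mapping.
MATCH (:Start)-[r:TEST]->(:End)
WHERE r.vect IS NOT NULL
RETURN r.myId AS myId, r.city AS city, size(r.vect) AS dimensions
ORDER BY myId;
```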

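We can also use a mapping with the apoc.vectordb.weaviate.query procedure, to search for nodes/relationships matching the label/type and metadataKey without making updates (i.e. equivalent to the *.queryAndUpdate procedure with mode: "READ_ONLY" in the mapping configuration).

For example, with the previous relationships, we can execute the following procedure, which simply returns the relationships in the rel column: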

CALL apoc.vectordb.weaviate.query($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {
        relType: "TEST",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })
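To work with the matched relationships directly, the same call can be followed by a `YIELD` clause. This is a sketch that assumes the `score` and `rel` result columns named in the surrounding text. The aliases in the `RETURN` clause are illustrative.

```cypher
CALL apoc.vectordb.weaviate.query($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { fields: ["city", "foo"],
      mapping: {relType: "TEST", entityKey: "myId", metadataKey: "foo"}
    })
YIELD score, rel
// rel is the relationship matched through the mapping; nothing is updated.
RETURN rel.myId AS myId, type(rel) AS relType, score
ORDER BY score DESC
```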

We can use a mapping with the apoc.vectordb.weaviate.get* procedures as well.
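A minimal sketch of such a call follows. It assumes the two ids used in the upsert example above and the node mapping shown earlier. The yielded columns are those listed in the result tables and in the apoc.ml.rag example.

```cypher
// Fetch two vectors by id and populate the matching (:Test) nodes.
CALL apoc.vectordb.weaviate.getAndUpdate($host, 'test_collection',
    ["8ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308", "9ef2b3a7-1e56-4ddd-b8c3-2ca8901ce308"],
    { mapping: {
        embeddingKey: "vect",
        nodeLabel: "Test",
        entityKey: "myId",
        metadataKey: "foo"
      }
    })
YIELD id, metadata, node
RETURN id, metadata, node
```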

To optimize performance, we can choose what to YIELD from the apoc.vectordb.weaviate.query and apoc.vectordb.weaviate.get procedures.

For example, by executing CALL apoc.vectordb.weaviate.query(…) YIELD metadata, score, id, the RestAPI request will contain {"with_payload": false, "with_vectors": false}, so that the values we do not need are not returned.
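Written out in full, such a call might look like the sketch below. It reuses the arguments from the earlier query examples, and the `city` alias is illustrative.

```cypher
CALL apoc.vectordb.weaviate.query($host, 'test_collection',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    {fields: ["city", "foo"]})
// Only the columns named here are requested; vector and text are omitted.
YIELD metadata, score, id
RETURN id, score, metadata.city AS city
ORDER BY score DESC
```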

The vector database procedures can be used together with apoc.ml.rag, as follows:

CALL apoc.vectordb.weaviate.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD score, node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value

This returns a string that answers the $question by leveraging the embeddings of the vector database.

Delete vectors (it leverages this API)

CALL apoc.vectordb.weaviate.delete($host, 'test_collection', [1,2], {<optional config>})

Table 8. Example results
value
["1", "2"]