Microsoft Azure Cognitive Services
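The Microsoft Azure Cognitive Services API uses machine learning to uncover insights and relationships in text. The procedures in this chapter act as wrappers around calls to this API. They extract entities and key phrases, and provide sentiment analysis, for text stored as node properties.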
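Each procedure has two modes:
- Stream - returns a map constructed from the JSON returned by the API
- Graph - creates a graph or virtual graph based on the values returned by the API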
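Procedure Overview
The procedures are described below:
| Qualified Name | Type | Version |
|---|---|---|
| apoc.nlp.azure.entities.graph | Procedure | |
| apoc.nlp.azure.entities.stream | Procedure | |
| apoc.nlp.azure.keyPhrases.graph | Procedure | |
| apoc.nlp.azure.keyPhrases.stream | Procedure | |
| apoc.nlp.azure.sentiment.graph | Procedure | |
| apoc.nlp.azure.sentiment.stream | Procedure | |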
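Currently, the Microsoft Azure Cognitive Services API supports text input in more than 10 languages. For best results, make sure your text is in one of the languages supported by Cognitive Services.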
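Entity Extraction
The entity extraction procedures (apoc.nlp.azure.entities.*) are wrappers around the Entities endpoint of the Azure Text Analytics API. This API method returns a list of known entities and general named entities ("Person", "Location", "Organization", etc.) found in a given document.
The procedures are described below:
| signature |
|---|
| apoc.nlp.azure.entities.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?) |
| apoc.nlp.azure.entities.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?) |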
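Each stream procedure yields node, value and error columns. If you consume these rows in client code, a small helper can separate successful rows from failed ones. The following is a minimal Python sketch. It assumes, without confirmation from this page, that error is null (None in a driver) or an empty map for rows that succeeded. `split_stream_rows` and the sample rows are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_stream_rows(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split stream rows into (successes, failures).

    Assumption: a row succeeded when its "error" column is None or an empty map.
    """
    successes: List[Row] = []
    failures: List[Row] = []
    for row in rows:
        # A non-empty error map marks a failed row; None and {} are falsy.
        (failures if row.get("error") else successes).append(row)
    return successes, failures


# Hypothetical rows shaped like (node, value, error); integers stand in for nodes.
rows = [
    {"node": 1, "value": {"id": "0", "score": 0.5}, "error": None},
    {"node": 2, "value": None, "error": {"message": "hypothetical failure"}},
    {"node": 3, "value": {"id": "1", "score": 0.9}, "error": {}},
]

ok, failed = split_stream_rows(rows)
print([r["node"] for r in ok], [r["node"] for r in failed])
```

Keeping the failed rows, rather than silently dropping them, makes it easier to retry only the nodes whose text could not be analyzed.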
The procedures support the following config parameters:
| name | type | default | description |
|---|---|---|---|
| key | String | null | Microsoft.CognitiveServicesTextAnalytics API Key |
| url | String | null | Microsoft.CognitiveServicesTextAnalytics endpoint |
| nodeProperty | String | text | The property on the provided node that contains the unstructured text to be analyzed |
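In addition, apoc.nlp.azure.entities.graph supports the following config parameters:
| name | type | default | description |
|---|---|---|---|
| scoreCutoff | Double | 0.0 | Lower bound for the score required for an entity to be present in the graph. Value must be between 0 and 1. The score indicates how confident the Azure Text Analytics API is in the accuracy of the detection. |
| write | Boolean | false | Persist the entity graph |
| writeRelationshipType | String | ENTITY | Relationship type for relationships from the source node to entity nodes |
| writeRelationshipProperty | String | score | Relationship property for relationships from the source node to entity nodes |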
CALL apoc.nlp.azure.entities.stream(source:Node or List<Node>, {
key: String,
url: String,
nodeProperty: String
})
YIELD value
CALL apoc.nlp.azure.entities.graph(source:Node or List<Node>, {
key: String,
url: String,
nodeProperty: String,
scoreCutoff: Double,
writeRelationshipType: String,
writeRelationshipProperty: String,
write: Boolean
})
YIELD graph
Key Phrases
The key phrase procedures (apoc.nlp.azure.keyPhrases.*) are wrappers around the Key Phrases endpoint of the Azure Text Analytics API. Key phrases are the main talking points in the input text.
The procedures are described below:
| signature |
|---|
| apoc.nlp.azure.keyPhrases.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?) |
| apoc.nlp.azure.keyPhrases.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?) |
The procedures support the following config parameters:
| name | type | default | description |
|---|---|---|---|
| key | String | null | Microsoft.CognitiveServicesTextAnalytics API Key |
| url | String | null | Microsoft.CognitiveServicesTextAnalytics endpoint |
| nodeProperty | String | text | The property on the provided node that contains the unstructured text to be analyzed |
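In addition, apoc.nlp.azure.keyPhrases.graph supports the following config parameters:
| name | type | default | description |
|---|---|---|---|
| write | Boolean | false | Persist the key phrase graph |
| writeRelationshipType | String | KEY_PHRASE | Relationship type for relationships from the source node to key phrase nodes |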
CALL apoc.nlp.azure.keyPhrases.stream(source:Node or List<Node>, {
key: String,
url: String,
nodeProperty: String
})
YIELD value
CALL apoc.nlp.azure.keyPhrases.graph(source:Node or List<Node>, {
key: String,
url: String,
nodeProperty: String,
writeRelationshipType: String,
write: Boolean
})
YIELD graph
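Sentiment
The sentiment procedures (apoc.nlp.azure.sentiment.*) are wrappers around the Sentiment endpoint of the Azure Text Analytics API. The API returns a numeric score between 0 and 1. Scores close to 1 indicate positive sentiment, and scores close to 0 indicate negative sentiment. A score of 0.5 indicates a lack of sentiment, as in a factual statement.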
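The score ranges described above can be made concrete with a helper that maps a score to a coarse label. Below is a minimal Python sketch. The 0.4 and 0.6 thresholds are arbitrary values chosen for illustration and are not defined by Azure, so tune them to your data. `sentiment_label` is a hypothetical helper.

```python
def sentiment_label(score: float, low: float = 0.4, high: float = 0.6) -> str:
    """Map a 0..1 sentiment score to a coarse label.

    The default thresholds are illustrative assumptions, not Azure values.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("sentiment score must be between 0 and 1")
    if score >= high:
        return "positive"  # close to 1
    if score <= low:
        return "negative"  # close to 0
    return "neutral"  # around 0.5, e.g. a factual statement


for s in (0.95, 0.5, 0.05):
    print(s, sentiment_label(s))
```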
The procedures are described below:
| signature |
|---|
| apoc.nlp.azure.sentiment.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?) |
| apoc.nlp.azure.sentiment.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?) |
The procedures support the following config parameters:
| name | type | default | description |
|---|---|---|---|
| key | String | null | Microsoft.CognitiveServicesTextAnalytics API Key |
| url | String | null | Microsoft.CognitiveServicesTextAnalytics endpoint |
| nodeProperty | String | text | The property on the provided node that contains the unstructured text to be analyzed |
In addition, apoc.nlp.azure.sentiment.graph supports the following config parameters:
| name | type | default | description |
|---|---|---|---|
| write | Boolean | false | Persist the sentiment graph |
CALL apoc.nlp.azure.sentiment.stream(source:Node or List<Node>, {
key: String,
url: String,
nodeProperty: String
})
YIELD value
CALL apoc.nlp.azure.sentiment.graph(source:Node or List<Node>, {
key: String,
url: String,
nodeProperty: String,
write: Boolean
})
YIELD graph
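Install Dependencies
The NLP procedures depend on Kotlin and client libraries that are not included in the APOC Extended library.
These dependencies are included in apoc-nlp-dependencies-2025.10.0-all.jar, which can be downloaded from the releases page. Once you have downloaded that file, place it in the plugins directory and restart the Neo4j Server.
Setting up API Key and URL
We can generate an API key and URL by following the instructions in the Quickstart: Use the Text Analytics client library article. Once we've done that, we should see a page listing our credentials, similar to the screenshot below.
In this case, our API URL is https://neo4j-nlp-text-analytics.cognitiveservices.azure.com/, and we can use either of the hidden keys.
Let's populate and execute the following commands to create parameters that contain these details.
apiKey and apiUrl parameters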
:param apiKey => ("<api-key-here>");
:param apiUrl => ("<api-url-here>");
Alternatively, we can add these credentials to apoc.conf and retrieve them with the static value storage functions. See Static Value Storage.
apoc.static.azure.apiKey=<api-key-here>
apoc.static.azure.apiUrl=<api-url-here>
Retrieve Azure credentials from apoc.conf
RETURN apoc.static.getAll("azure") AS azure;
| azure |
|---|
| {apiKey: "<api-key-here>", apiUrl: "<api-url-here>"} |
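The result above suggests that apoc.static.getAll("azure") returns every apoc.static.azure.* entry, keyed by the remainder of its name. The Python sketch below illustrates that mapping on lines like the apoc.conf entries shown earlier. `static_get_all` is a hypothetical helper, not APOC's implementation, and real apoc.conf parsing may handle edge cases differently.

```python
def static_get_all(conf_lines, prefix):
    """Collect apoc.static.<prefix>.* entries into a map without the prefix.

    Illustration only: APOC's actual implementation is not shown in this document.
    """
    full_prefix = f"apoc.static.{prefix}."
    values = {}
    for raw in conf_lines:
        line = raw.strip()
        # Skip blank lines, comments and lines without a key=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith(full_prefix):
            values[key[len(full_prefix):]] = value.strip()
    return values


apoc_conf = [
    "# Azure Text Analytics credentials",
    "apoc.static.azure.apiKey=<api-key-here>",
    "apoc.static.azure.apiUrl=<api-url-here>",
    "apoc.static.other.apiKey=<not-azure>",  # hypothetical unrelated entry
]

print(static_get_all(apoc_conf, "azure"))
```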
Examples
The examples in this section are based on the following sample graph:
CREATE (:Article {
uri: "/blog/pokegraph-gotta-graph-em-all/",
body: "These days I’m rarely more than a few feet away from my Nintendo Switch and I play board games, card games and role playing games with friends at least once or twice a week. I’ve even organised lunch-time Mario Kart 8 tournaments between the Neo4j European offices!"
});
CREATE (:Article {
uri: "https://en.wikipedia.org/wiki/Nintendo_Switch",
body: "The Nintendo Switch is a video game console developed by Nintendo, released worldwide in most regions on March 3, 2017. It is a hybrid console that can be used as a home console and portable device. The Nintendo Switch was unveiled on October 20, 2016. Nintendo offers a Joy-Con Wheel, a small steering wheel-like unit that a Joy-Con can slot into, allowing it to be used for racing games such as Mario Kart 8."
});
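Entity Extraction
First, let's extract entities from an Article node. The text we want to analyze is stored in the body property of the node, so we need to specify it via the nodeProperty configuration parameter.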
MATCH (a:Article {uri: "/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.entities.stream(a, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body"
})
YIELD value
UNWIND value.entities AS entity
RETURN entity;
| entity |
|---|
| {name: "Nintendo Switch", wikipediaId: "Nintendo Switch", type: "Other", matches: [{length: 15, text: "Nintendo Switch", wikipediaScore: 0.8339868065025469, offset: 56}], bingId: "b3d617ef-81fc-4188-9a2b-a5cf1f8534b5", wikipediaLanguage: "en", wikipediaUrl: "https://en.wikipedia.org/wiki/Nintendo_Switch"} |
| {name: "Nintendo Switch", type: "Organization", matches: [{length: 15, entityTypeScore: 0.94, text: "Nintendo Switch", offset: 56}]} |
| {name: "Oberon Media", wikipediaId: "Oberon Media", type: "Organization", matches: [{length: 6, text: "I play", wikipediaScore: 0.032446316016667254, offset: 76}], bingId: "166f6e0f-33b7-8707-bb8b-5a932c498333", wikipediaLanguage: "en", wikipediaUrl: "https://en.wikipedia.org/wiki/Oberon_Media"} |
| {name: "a week", subType: "Duration", type: "DateTime", matches: [{length: 6, entityTypeScore: 0.8, text: "a week", offset: 166}]} |
| {name: "Mario Kart 8", wikipediaId: "Mario Kart 8", type: "Other", matches: [{length: 12, text: "Mario Kart 8", wikipediaScore: 0.7802000593632747, offset: 205}], bingId: "ce6f55ec-d3d7-032a-0bf8-15ad3e8df3f4", wikipediaLanguage: "en", wikipediaUrl: "https://en.wikipedia.org/wiki/Mario_Kart_8"} |
| {name: "Mario Kart", type: "Organization", matches: [{length: 10, entityTypeScore: 0.72, text: "Mario Kart", offset: 205}]} |
| {name: "8", subType: "Number", type: "Quantity", matches: [{length: 1, entityTypeScore: 0.8, text: "8", offset: 216}]} |
| {name: "Neo4j", wikipediaId: "Neo4j", type: "Other", matches: [{length: 5, text: "Neo4j", wikipediaScore: 0.8150388253887939, offset: 242}], bingId: "bc2f436b-8edd-6ba6-b2d3-69901348d653", wikipediaLanguage: "en", wikipediaUrl: "https://en.wikipedia.org/wiki/Neo4j"} |
| {name: "Europe", wikipediaId: "Europe", type: "Location", matches: [{length: 8, text: "European", wikipediaScore: 0.00591759926701263, offset: 248}], bingId: "501457aa-5b70-cfba-cfd8-be882b4bac1e", wikipediaLanguage: "en", wikipediaUrl: "https://en.wikipedia.org/wiki/Europe"} |
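Every match carries an offset and a length into the analyzed text. As a sanity check, the Python sketch below slices the blog article's body with those values and compares each slice with the match's text. It assumes that offsets count characters (Unicode code points). That holds for this sample, but it has not been confirmed for every input or API version. `matched_text` is a hypothetical helper.

```python
# Body of the blog Article node; \u2019 is the right single quotation mark
# used in "I’m" and "I’ve" in the sample data.
body = (
    "These days I\u2019m rarely more than a few feet away from my Nintendo Switch "
    "and I play board games, card games and role playing games with friends at "
    "least once or twice a week. I\u2019ve even organised lunch-time Mario Kart 8 "
    "tournaments between the Neo4j European offices!"
)

# (text, offset, length) triples copied from the stream output above.
matches = [
    ("Nintendo Switch", 56, 15),
    ("I play", 76, 6),
    ("a week", 166, 6),
    ("Mario Kart 8", 205, 12),
    ("Mario Kart", 205, 10),
    ("8", 216, 1),
    ("Neo4j", 242, 5),
    ("European", 248, 8),
]


def matched_text(text, offset, length):
    """Return the substring that a match's offset and length point at."""
    return text[offset:offset + length]


for expected, offset, length in matches:
    actual = matched_text(body, offset, length)
    print(f"{offset:>3} {actual!r} ok={actual == expected}")
```

The "Oberon Media" entity is a useful reminder that the entity name and the matched text can differ: its match covers only the words "I play".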
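We get back 9 entities, although some of them refer to the same thing with different type values. We can then apply a Cypher statement that creates one Entity node per entity name, stores the collected type values on it, and creates an ENTITY relationship from the Article node to each of those nodes.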
MATCH (a:Article {uri: "/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.entities.stream(a, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body"
})
YIELD value
UNWIND value.entities AS entity
WITH a, entity.name AS entityName, collect(entity.type) AS types
MERGE (e:Entity {name: entityName})
SET e.type = types
MERGE (a)-[:ENTITY]->(e);
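The WITH ... collect(entity.type) AS types step groups the 9 entity maps by name. The two "Nintendo Switch" rows therefore end up as a single Entity node with two types. If you post-process stream results in application code instead, the same grouping can be sketched in Python as below. `group_types_by_name` is a hypothetical helper, and it keeps the types in the order the rows appear.

```python
from collections import defaultdict


def group_types_by_name(entities):
    """Group entity maps by name, collecting every type seen for that name.

    Mirrors the Cypher step that aggregates collect(entity.type) per name.
    """
    types_by_name = defaultdict(list)
    for entity in entities:
        types_by_name[entity["name"]].append(entity["type"])
    return dict(types_by_name)


# Name/type pairs copied from the stream output for the blog article.
entities = [
    {"name": "Nintendo Switch", "type": "Other"},
    {"name": "Nintendo Switch", "type": "Organization"},
    {"name": "Oberon Media", "type": "Organization"},
    {"name": "a week", "type": "DateTime"},
    {"name": "Mario Kart 8", "type": "Other"},
    {"name": "Mario Kart", "type": "Organization"},
    {"name": "8", "type": "Quantity"},
    {"name": "Neo4j", "type": "Other"},
    {"name": "Europe", "type": "Location"},
]

grouped = group_types_by_name(entities)
print(len(entities), "rows ->", len(grouped), "Entity nodes")
print(grouped["Nintendo Switch"])
```

Grouping only merges exact name matches, so "Mario Kart" and "Mario Kart 8" remain separate nodes.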
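Alternatively, we can use the graph mode to create the entity graph automatically. In addition to the Entity label, each entity node gets a second label based on the value of its type property. By default, a virtual graph is returned.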
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.azure.entities.graph(articles, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body",
writeRelationshipType: "ENTITY"
})
YIELD graph AS g
RETURN g
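We can see a Neo4j Browser visualization of the virtual graph in Pokemon and Nintendo Switch Entity Graph.
In this visualization we can also see the score for each entity. This score represents the API's confidence in its detection of that entity. We can set a minimum cutoff for the score with the scoreCutoff property.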
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.azure.entities.graph(articles, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body",
scoreCutoff: 0.7,
writeRelationshipType: "ENTITY"
})
YIELD graph AS g
RETURN g
We can see a Neo4j Browser visualization of the virtual graph in Pokemon and Nintendo Switch Entity Graph with confidence >= 0.7.
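To make the scoreCutoff semantics concrete, the Python sketch below filters entity maps shaped like the stream output for the blog article. It assumes that an entity's score is the highest entityTypeScore or wikipediaScore among its matches. This is not documented here, but it is consistent with the scores in the HAS_ENTITY query result further down. `entity_score` and `apply_score_cutoff` are hypothetical helpers, not part of APOC.

```python
def entity_score(entity):
    """Return the best entityTypeScore/wikipediaScore among an entity's matches.

    Assumption: this is how the graph procedure scores an entity.
    """
    scores = []
    for match in entity.get("matches", []):
        for key in ("entityTypeScore", "wikipediaScore"):
            if key in match:
                scores.append(match[key])
    return max(scores, default=0.0)


def apply_score_cutoff(entities, score_cutoff=0.0):
    """Keep entities whose score is at least score_cutoff (0 <= cutoff <= 1)."""
    if not 0.0 <= score_cutoff <= 1.0:
        raise ValueError("score_cutoff must be between 0 and 1")
    return [e for e in entities if entity_score(e) >= score_cutoff]


# Names and scores copied from the stream output for the blog article.
entities = [
    {"name": "Nintendo Switch", "matches": [{"wikipediaScore": 0.8339868065025469}]},
    {"name": "Nintendo Switch", "matches": [{"entityTypeScore": 0.94}]},
    {"name": "Oberon Media", "matches": [{"wikipediaScore": 0.032446316016667254}]},
    {"name": "a week", "matches": [{"entityTypeScore": 0.8}]},
    {"name": "Mario Kart 8", "matches": [{"wikipediaScore": 0.7802000593632747}]},
    {"name": "Mario Kart", "matches": [{"entityTypeScore": 0.72}]},
    {"name": "8", "matches": [{"entityTypeScore": 0.8}]},
    {"name": "Neo4j", "matches": [{"wikipediaScore": 0.8150388253887939}]},
    {"name": "Europe", "matches": [{"wikipediaScore": 0.00591759926701263}]},
]

kept = apply_score_cutoff(entities, 0.7)
print(sorted({e["name"] for e in kept}))
```

With a cutoff of 0.7, the low-confidence "Oberon Media" and "Europe" detections are dropped, and six distinct names remain.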
If we're happy with this graph and want to persist it in Neo4j, we can do so by specifying the write: true configuration.
HAS_ENTITY relationships
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.azure.entities.graph(articles, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body",
scoreCutoff: 0.7,
writeRelationshipType: "HAS_ENTITY",
writeRelationshipProperty: "azureEntityScore",
write: true
})
YIELD graph AS g
RETURN g;
We can then write a query that returns the entities that have been created.
MATCH (article:Article)
RETURN article.uri AS article,
[(article)-[r:HAS_ENTITY]->(e:Entity) | {text: e.text, score: r.azureEntityScore}] AS entities;
| article | entities |
|---|---|
| "/blog/pokegraph-gotta-graph-em-all/" | [{score: 0.72, text: "Mario Kart"}, {score: 0.7802000593632747, text: "Mario Kart 8"}, {score: 0.8, text: "8"}, {score: 0.8, text: "a week"}, {score: 0.94, text: "Nintendo Switch"}, {score: 0.8150388253887939, text: "Neo4j"}] |
| "https://en.wikipedia.org/wiki/Nintendo_Switch" | [{score: 0.9023679924293266, text: "Joy-Con"}, {score: 0.98, text: "Nintendo"}, {score: 0.8, text: "March 3, 2017"}, {score: 0.9355623498560008, text: "Nintendo Switch"}, {score: 0.92, text: "Mario Kart"}, {score: 0.8, text: "8"}, {score: 0.8863202650046607, text: "Mario Kart 8"}, {score: 0.8, text: "October 20, 2016"}] |
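Key Phrases
Now let's extract key phrases from an Article node. The text we want to analyze is stored in the body property of the node, so we need to specify it via the nodeProperty configuration parameter.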
MATCH (a:Article {uri: "/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.keyPhrases.stream(a, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body"
})
YIELD value
UNWIND value.keyPhrases AS keyPhrase
RETURN keyPhrase;
| keyPhrase |
|---|
| "board games" |
| "card games" |
| "tournaments" |
| "role" |
| "organised lunch-time Mario Kart" |
| "Neo4j European offices" |
| "Nintendo Switch" |
| "friends" |
| "feet" |
| "days" |
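Alternatively, we can use the graph mode to create a key phrase graph automatically. A node with the KeyPhrase label is created for each extracted key phrase.
By default, a virtual graph is returned, but the graph can be persisted by specifying the write: true configuration.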
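Conceptually, the graph mode turns the stream output into one KeyPhrase node per phrase, plus a relationship from the source node to each phrase. The Python sketch below builds those records as plain data. The record layout and the rule of deduplicating phrases by their text are assumptions for illustration, not APOC's internal representation. The text property does match the k.text used in the query that follows. `key_phrase_graph` is a hypothetical helper.

```python
def key_phrase_graph(source_uri, key_phrases, rel_type="KEY_PHRASE"):
    """Build hypothetical node/relationship records for a key phrase graph.

    One KeyPhrase node per distinct phrase (keyed by its text) and one
    rel_type relationship from the source node to each of those nodes.
    """
    nodes = {}
    relationships = []
    for phrase in key_phrases:
        if phrase not in nodes:
            nodes[phrase] = {"labels": ["KeyPhrase"], "text": phrase}
            relationships.append((source_uri, rel_type, phrase))
    return list(nodes.values()), relationships


# Key phrases copied from the stream output for the blog article.
phrases = [
    "board games", "card games", "tournaments", "role",
    "organised lunch-time Mario Kart", "Neo4j European offices",
    "Nintendo Switch", "friends", "feet", "days",
]

nodes, rels = key_phrase_graph("/blog/pokegraph-gotta-graph-em-all/", phrases)
print(len(nodes), len(rels), rels[0])
```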
MATCH (a:Article {uri: "/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.keyPhrases.graph(a, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body",
writeRelationshipType: "KEY_PHRASE",
write: true
})
YIELD graph AS g
RETURN g;
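We can see a Neo4j Browser visualization of the key phrase graph in Pokemon key phrases graph.
We can then write a query that returns the key phrases that have been created.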
MATCH (a:Article {uri: "/blog/pokegraph-gotta-graph-em-all/"})
RETURN a.uri AS article,
[(a)-[:KEY_PHRASE]->(k:KeyPhrase) | k.text] AS keyPhrases;
| article | keyPhrases |
|---|---|
| "/blog/pokegraph-gotta-graph-em-all/" | ["card games", "board games", "friends", "feet", "Nintendo Switch", "days", "organised lunch-time Mario Kart", "tournaments", "Neo4j European offices", "role"] |
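Sentiment
Now let's extract the sentiment of an Article node. The text we want to analyze is stored in the body property of the node, so we need to specify it via the nodeProperty configuration parameter.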
MATCH (a:Article {uri: "/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.sentiment.stream(a, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body"
})
YIELD value
RETURN value;
| value |
|---|
| {score: 0.5, id: "0"} |
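Alternatively, we can use the graph mode to store the sentiment score automatically.
By default, a virtual graph is returned, but the graph can be persisted by specifying the write: true configuration. The sentiment score is stored in the sentimentScore property.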
MATCH (a:Article {uri: "/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.sentiment.graph(a, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body",
write: true
})
YIELD graph AS g
UNWIND g.nodes AS node
RETURN node {.uri, .sentimentScore} AS node;
| node |
|---|
| {uri: "/blog/pokegraph-gotta-graph-em-all/", sentimentScore: 0.5} |