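Find movies from a search prompt
Once movie nodes have embeddings that encode their title and plot, and the database has a vector index on those embeddings, you can retrieve movies that match a vague description (the search prompt). This is similar to typing a few keywords into a search engine to find relevant web pages.
The examples on this page show how to retrieve movies related to the prompt a criminal is changed through love. Think of it as going to the cinema and asking: "What movies would you recommend? Today I feel like watching something about a criminal who is changed through love." The main difference is that the conversation at the cinema happens in natural language, whereas a database search first converts the prompt into an embedding. The vector index then uses that prompt embedding to retrieve the nodes whose embeddings are most similar to it.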
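To make "most similar" concrete, here is a minimal sketch of what a similarity search does conceptually: score every stored vector against the prompt vector and keep the best matches. The three-dimensional vectors and the helper names `cosine_similarity` and `top_k` are invented for illustration. A real vector index works on much larger vectors and typically relies on approximate nearest-neighbor search rather than this brute-force loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f'dimension mismatch: {len(a)} != {len(b)}')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, candidates, k):
    """Return the k (title, score) pairs most similar to query, best first."""
    scored = [(title, cosine_similarity(query, vector))
              for title, vector in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical values, not real model output).
toy_embeddings = {
    'Movie A': [0.9, 0.1, 0.0],
    'Movie B': [0.0, 1.0, 0.0],
    'Movie C': [0.7, 0.7, 0.0],
}
toy_query = [1.0, 0.0, 0.0]

for title, score in top_k(toy_query, toy_embeddings, k=2):
    print(f'{title}: {score:.3f}')
```

The sketch ranks candidates only by the angle between vectors, which is why the results of a real search can only be as meaningful as the embeddings themselves.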
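| Always use the same model to generate embeddings: pick one model and use it for both the dataset and the search prompt. Mixing models with different vector dimensions results in errors. Models with the same dimension can be combined, but they usually don't work well together because they were trained differently. Provide the dimension when creating the vector index: this ensures that only vectors of that size are indexed and that queries fail explicitly on a dimension mismatch. |
Similarity with embeddings generated by open-source libraries
This example uses SentenceTransformers to retrieve the nodes related to the description a criminal is changed through love.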
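A client-side guard can surface a dimension mismatch before the query even reaches the database. The following is a minimal sketch, not part of the Neo4j API: `check_dimension` and `INDEX_DIMENSION` are hypothetical names, and the value 384 assumes the index was created for vectors from all-MiniLM-L6-v2.

```python
def check_dimension(embedding, expected_dimension):
    """Raise early if an embedding does not match the index dimension."""
    actual = len(embedding)
    if actual != expected_dimension:
        raise ValueError(
            f'embedding has {actual} dimensions, '
            f'but the index expects {expected_dimension}'
        )
    return embedding


# Assumption: the index was created for 384-dimensional vectors,
# the output size of all-MiniLM-L6-v2.
INDEX_DIMENSION = 384

# A vector of the right size passes through unchanged.
check_dimension([0.0] * 384, INDEX_DIMENSION)

try:
    # A vector of another size, as a different model would produce.
    check_dimension([0.0] * 1536, INDEX_DIMENSION)
except ValueError as error:
    print(error)
```

Note that such a check only catches size mismatches. Two models with the same dimension would pass it and still produce poorly comparable embeddings.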
from sentence_transformers import SentenceTransformer
import neo4j


URI = '<database-uri>'
AUTH = ('<username>', '<password>')
DB_NAME = '<database-name>'  # examples: 'recommendations-5.26', 'neo4j'

model = SentenceTransformer('all-MiniLM-L6-v2')  # (1)
query_prompt = 'a criminal is changed through love'  # (2)
query_embedding = model.encode(query_prompt)

with neo4j.GraphDatabase.driver(URI, auth=AUTH) as driver:
    driver.verify_connectivity()

    related_movies, _, _ = driver.execute_query('''
        MATCH (movie:Movie)
        SEARCH movie IN (  // (3)
            VECTOR INDEX moviePlots
            FOR $queryEmbedding
            LIMIT 5
        ) SCORE AS similarityScore
        RETURN movie.title AS title, movie.plot AS plot, similarityScore
        ''', queryEmbedding=query_embedding,
        database_=DB_NAME)
    print(f'Movies whose plot and title relate to `{query_prompt}`:')
    for record in related_movies:
        print(record)
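| 1 | The embedding for the search prompt must be generated with the same model that generated the embeddings being searched. This tutorial uses all-MiniLM-L6-v2 to generate embeddings, so the same model is reused here. |
| 2 | The query prompt contains a vague description of the movies to retrieve. It is then encoded into an embedding so that it can be used to query for similar nodes. |
| 3 | To query a vector index, use the Cypher clause SEARCH. The database returns the 5 nodes in the vector index moviePlots that are most similar to the query embedding, together with a score of how well each of them matches it. |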
Movies whose plot and title relate to `a criminal is changed through love`:
<Record title='I Love You Phillip Morris' plot="A cop turns con man once he comes out of the closet. Once imprisoned, he meets the second love of his life, whom he'll stop at nothing to be with." score=0.792834997177124>
<Record title='Laura' plot='A police detective falls in love with the woman whose murder he is investigating.' score=0.7741715908050537>
<Record title='Despicable Me' plot='When a criminal mastermind uses a trio of orphan girls as pawns for a grand scheme, he finds their love is profoundly changing him for the better.' score=0.772994875907898>
<Record title='Laws of Attraction' plot='Amidst a sea of litigation, two New York City divorce lawyers find love.' score=0.7727792263031006>
<Record title='Love the Hard Way' plot='The story of a petty thief who meets an innocent young woman and brings her into his world of crime while she teaches him the lessons of enjoying life and being loved.' score=0.7681001424789429>
Similarity with embeddings generated by OpenAI and other cloud services
This example uses OpenAI to retrieve the nodes related to the description a criminal is changed through love.
import neo4j


URI = '<database-uri>'
AUTH = ('<username>', '<password>')
DB_NAME = '<database-name>'  # examples: 'recommendations-5.26', 'neo4j'

openAI_token = '<OpenAI API token>'  # (1)
search_prompt = 'a criminal is changed through love'  # (2)

with neo4j.GraphDatabase.driver(URI, auth=AUTH) as driver:
    driver.verify_connectivity()

    records, summary, _ = driver.execute_query('''
        WITH ai.text.embed($searchPrompt, 'OpenAI', {
            token: $token, model: 'text-embedding-3-small'
        }) AS queryVector  // (3)
        MATCH (movie:Movie)
        SEARCH movie IN (  // (4)
            VECTOR INDEX moviePlots
            FOR queryVector
            LIMIT 5
        ) SCORE AS similarityScore
        RETURN movie.title AS title, movie.plot AS plot, similarityScore
        ''', searchPrompt=search_prompt, token=openAI_token,
        database_=DB_NAME)
    print(f'Movies whose plot and title relate to `{search_prompt}`:')
    for record in records:
        print(record)
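| 1 | Your OpenAI API token, for example sk-proj-XXXX. |
| 2 | The query prompt contains a vague description of the movies to retrieve. |
| 3 | The query prompt is encoded into an embedding with the Cypher function ai.text.embed(), so that it can be used to query for similar nodes. |
| 4 | To query a vector index, use the Cypher clause SEARCH. The database returns the 5 nodes in the vector index moviePlots that are most similar to queryVector, together with a score of how well each of them matches the query embedding. |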
Movies whose plot and title relate to `a criminal is changed through love`:
<Record title='I Love You Phillip Morris' node.plot="A cop turns con man once he comes out of the closet. Once imprisoned, he meets the second love of his life, whom he'll stop at nothing to be with." score=0.9272396564483643>
<Record title='Love the Hard Way' node.plot='The story of a petty thief who meets an innocent young woman and brings her into his world of crime while she teaches him the lessons of enjoying life and being loved.' score=0.9221653938293457>
<Record title='Laura' node.plot='A police detective falls in love with the woman whose murder he is investigating.' score=0.9215129017829895>
<Record title='Despicable Me' node.plot='When a criminal mastermind uses a trio of orphan girls as pawns for a grand scheme, he finds their love is profoundly changing him for the better.' score=0.9206478595733643>
<Record title='Cook the Thief His Wife & Her Lover, The' node.plot='The wife of an abusive criminal finds solace in the arms of a kind regular guest in her husbands restaurant.' score=0.9205931425094604>
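Matching quality
The quality of the matches depends entirely on the embedding model and the dataset, not on the Neo4j vector index. Embeddings are always generated by a model outside Neo4j; the database only stores them as properties.
Consider the nodes retrieved for the search prompt a criminal is changed through love when using SentenceTransformers (the OpenAI results are similar):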
Movies whose plot and title relate to `a criminal is changed through love`:
<Record title='I Love You Phillip Morris' plot="A cop turns con man once he comes out of the closet. Once imprisoned, he meets the second love of his life, whom he'll stop at nothing to be with." score=0.792834997177124>
<Record title='Laura' plot='A police detective falls in love with the woman whose murder he is investigating.' score=0.7741715908050537>
<Record title='Despicable Me' plot='When a criminal mastermind uses a trio of orphan girls as pawns for a grand scheme, he finds their love is profoundly changing him for the better.' score=0.772994875907898>
<Record title='Laws of Attraction' plot='Amidst a sea of litigation, two New York City divorce lawyers find love.' score=0.7727792263031006>
<Record title='Love the Hard Way' plot='The story of a petty thief who meets an innocent young woman and brings her into his world of crime while she teaches him the lessons of enjoying life and being loved.' score=0.7681001424789429>
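The example shows that embeddings work as expected: Despicable Me ranks third, with a relevance score of 77%. At the same time, it also shows the limitations of embedding models, because several of the retrieved movies are not really related to the prompt.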
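- Laura has no criminal who is changed through love, but it features a police detective (detectives often deal with criminals) who falls in love in the context of a murder case (also related to criminals).
- Laws of Attraction has no criminals at all, but it contains attraction (related to love), litigation (which usually takes place in court, and courts are associated with criminals), lawyers (often associated with criminals), and love, albeit between the lawyers.
- Love the Hard Way is almost the opposite: an innocent student falls in love with a low-level criminal (a petty thief) and is then drawn into a downward spiral.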
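Even though these movies have little to do with the search prompt, the database is right: according to the embeddings, they are the most relevant results. Why the embeddings fail to encode meaning the way a person would expect is not a problem with the vector index but with the external AI model. If your search prompts return poor results, examine the embedding model and the dataset it is applied to, rather than only tuning Neo4j settings.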
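One simple way to examine a model is to hand-label which retrieved titles you consider relevant and compute the fraction of hits among the top results. The sketch below is only an illustration: `precision_at_k` is a hypothetical helper, and the relevance labels are a judgment call that treats as relevant the two titles the discussion above does not single out as unrelated.

```python
def precision_at_k(retrieved_titles, relevant_titles, k):
    """Fraction of the first k retrieved titles that are labelled relevant."""
    top = retrieved_titles[:k]
    if not top:
        return 0.0
    hits = sum(1 for title in top if title in relevant_titles)
    return hits / len(top)


# Titles in the order returned by the SentenceTransformers example.
retrieved = [
    'I Love You Phillip Morris',
    'Laura',
    'Despicable Me',
    'Laws of Attraction',
    'Love the Hard Way',
]

# Assumption: hand-made relevance labels, not data from the database.
relevant = {'I Love You Phillip Morris', 'Despicable Me'}

print(f'precision@5 = {precision_at_k(retrieved, relevant, k=5):.2f}')
```

Running the same measurement with a different embedding model, against the same labelled prompts, gives a rough basis for comparing models on your own dataset.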