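LlamaIndex + Neo4j Integration

Overview

LlamaIndex is an open-source data orchestration framework for building applications powered by large language models (LLMs). It provides data connectors for ingesting data from many sources, indexing and retrieval mechanisms, query engines and chat interfaces, event-driven workflows for complex agent applications, and integrations with vector stores, databases, and other LLM frameworks.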

Installation

pip install llama-index-core llama-index-tools-mcp llama-index-vector-stores-neo4jvector

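Key Features

Event-driven workflows and FunctionAgent for building multi-agent applications
Native Neo4j integration through the llama-index-vector-stores-neo4jvector package
MCP server support through llama-index-tools-mcp
Custom tool creation with FunctionTool.from_defaults()
Support for nearly all major LLM providers (OpenAI, Anthropic, Google, Cohere, Mistral, AWS Bedrock, Azure, and others)
LlamaCloud tools for document parsing (LlamaParse), classification (LlamaClassify), and extraction (LlamaExtract)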

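Examples

Notebook	Description
llamaindex_functionagent.ipynb	Builds a company research agent with LlamaIndex, the Neo4j MCP server, custom tools, vector search, and a FunctionAgent workflow
build_knowledge_graph_with_neo4j_llamacloud.ipynb	End-to-end legal document processing pipeline that uses LlamaClassify, LlamaExtract, and Neo4j to build a knowledge graph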

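Extension Points

1. MCP Integration

LlamaIndex supports MCP servers through the llama-index-tools-mcp package. Use BasicMCPClient and McpToolSpec to connect to an MCP server and retrieve its tools.

Neo4j MCP server: Use the official Neo4j MCP server to read the schema and execute Cypher queries

2. Direct Neo4j Integration

LlamaIndex provides native Neo4j integration:

Neo4jVectorStore: Vector store integration through llama-index-vector-stores-neo4jvector. It enables semantic search over graph data and supports hybrid search, metadata filtering, and custom retrieval queries
Neo4j Python driver: You can always use the Neo4j Python driver directly to execute Cypher queries inside custom tools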
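As a rough illustration of the vector store option, the sketch below collects the connection settings in one place and only then constructs a Neo4jVectorStore with hybrid search turned on. The helper names (neo4j_vector_store_kwargs, build_vector_index), the environment variable defaults, and the embedding dimension of 1536 are assumptions made for this example; adjust them to match your embedding model and database.

```python
import os


def neo4j_vector_store_kwargs(embedding_dimension: int = 1536) -> dict:
    """Collect Neo4jVectorStore constructor arguments from the environment.

    The defaults below are placeholders for a local instance (assumption).
    """
    return {
        "username": os.environ.get("NEO4J_USERNAME", "neo4j"),
        "password": os.environ.get("NEO4J_PASSWORD", ""),
        "url": os.environ.get("NEO4J_URI", "bolt://localhost:7687"),
        "database": os.environ.get("NEO4J_DATABASE", "neo4j"),
        # Must match the output size of the embedding model in use.
        "embedding_dimension": embedding_dimension,
        # Combine vector similarity with keyword (full-text) matching.
        "hybrid_search": True,
    }


def build_vector_index(**overrides):
    """Create a VectorStoreIndex backed by Neo4j (connects to the database)."""
    # Imported lazily so the settings helper works without the packages installed.
    from llama_index.core import VectorStoreIndex
    from llama_index.vector_stores.neo4jvector import Neo4jVectorStore

    store = Neo4jVectorStore(**{**neo4j_vector_store_kwargs(), **overrides})
    return VectorStoreIndex.from_vector_store(store)
```

Calling build_vector_index().as_query_engine() would then give a query engine over the stored embeddings, provided an embedding model and LLM are configured in LlamaIndex.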

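3. Custom Tools/Functions

Use FunctionTool.from_defaults() to define custom Neo4j tools:

Implement functions that execute Cypher queries through the Neo4j Python driver
Wrap a Neo4j vector store as a tool with QueryEngineTool
Combine MCP tools and custom tools in a single FunctionAgent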
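The pattern above might look roughly like the following sketch: a plain Python function runs Cypher through the Neo4j driver, FunctionTool.from_defaults() turns it into a tool, and the result is passed to a FunctionAgent alongside tools retrieved from an MCP server. The names make_cypher_runner and build_agent, the tool description, and the system prompt are hypothetical; driver is assumed to be a neo4j.Driver, mcp_tools a list of tools already loaded from McpToolSpec, and llm any LlamaIndex LLM instance.

```python
from typing import Any


def make_cypher_runner(driver: Any, database: str = "neo4j"):
    """Return a function that runs a Cypher query on the given driver."""

    def run_cypher(query: str) -> list[dict]:
        """Run a Cypher query and return the result rows as dictionaries."""
        result = driver.execute_query(query, database_=database)
        return [record.data() for record in result.records]

    return run_cypher


def build_agent(driver: Any, mcp_tools: list, llm: Any):
    """Combine MCP tools and a custom Cypher tool in one FunctionAgent."""
    # Imported lazily so the Cypher helper can be tested without LlamaIndex.
    from llama_index.core.agent.workflow import FunctionAgent
    from llama_index.core.tools import FunctionTool

    cypher_tool = FunctionTool.from_defaults(
        fn=make_cypher_runner(driver),
        name="run_cypher",
        description="Run a Cypher query against the Neo4j database and return rows.",
    )
    return FunctionAgent(
        tools=[*mcp_tools, cypher_tool],
        llm=llm,
        system_prompt="You answer questions using the Neo4j graph tools.",
    )
```

Keeping the query function separate from the tool wrapper makes it easy to unit test with a stand-in driver before wiring it into an agent.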

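4. LlamaCloud Tools

Use LlamaCloud services to build knowledge graphs from documents:

LlamaParse: Parses complex document formats (PDFs, presentations, and so on)
LlamaClassify: AI-based document classification with custom rules
LlamaExtract: Extracts structured data using Pydantic schemas

5. Text-to-Cypher and GraphRAG Retrieval

LlamaIndex provides TextToCypherRetriever and VectorContextRetriever for building GraphRAG agents that combine semantic search with Cypher generation from natural language. Both retrievers operate on a Neo4jPropertyGraphStore and can be combined in a single query engine that is exposed as an agent tool.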

# Requires: pip install llama-index-graph-stores-neo4j
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
from llama_index.core.retrievers import (
    PGRetriever,
    VectorContextRetriever,
    TextToCypherRetriever,
)
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.tools import QueryEngineTool

graph_store = Neo4jPropertyGraphStore(
    username="companies",
    password="companies",
    url="neo4j+s://demo.neo4jlabs.com:7687",
    database="companies",
)

# Semantic search over article chunks linked to company nodes
# (uses the embedding model configured in llama_index.core.Settings)
vector_retriever = VectorContextRetriever(
    graph_store,
    include_text=True,
    similarity_top_k=3,
)

# Natural language → Cypher for structured graph queries
cypher_retriever = TextToCypherRetriever(graph_store)

# Combine both sub-retrievers; the graph store itself has no as_retriever(),
# so PGRetriever is used to merge their results
query_engine = RetrieverQueryEngine.from_args(
    PGRetriever(sub_retrievers=[vector_retriever, cypher_retriever])
)

research_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="company_research",
    description=(
        "Search news and relationships in the companies knowledge graph. "
        "Use for questions about organizations, industries, leadership, and recent articles."
    ),
)

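6. Neo4j Query Engine Tools

The llama-index-tools-neo4j package provides Neo4jQueryToolSpec, which creates ready-made query engines over a Neo4j graph. Available engine types include vector-based entity retrieval, keyword-based retrieval, hybrid retrieval, raw vector index retrieval, KnowledgeGraphQueryEngine, and KnowledgeGraphRAGRetriever. Each type is exposed as a callable tool that the agent can select at runtime.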

pip install llama-index-tools-neo4j

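MCP Authentication

Supported Mechanisms

✅ Environment variables (STDIO transport) - For local MCP servers, set the environment variables before starting the process. BasicMCPClient can connect to a local process over the stdio transport.

✅ HTTP headers (HTTP/SSE transport) - For remote MCP servers, pass an API key or bearer token through the headers parameter (for example, Authorization: Basic ${CREDENTIALS} or Authorization: Bearer ${API_TOKEN}).

✅ OAuth 2.0 (in-client) - BasicMCPClient supports OAuth 2.0 authentication through its with_oauth() method, with configurable token storage.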
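For the HTTP header mechanism, the two header shapes mentioned above can be built with the standard library alone. This is a minimal sketch; the function names basic_auth_header and bearer_auth_header are made up for illustration and are not part of llama-index-tools-mcp.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic header: base64 of 'username:password'."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def bearer_auth_header(api_token: str) -> dict:
    """Build an HTTP Bearer header from an API token."""
    return {"Authorization": f"Bearer {api_token}"}
```

Either dictionary can be passed as the headers argument of BasicMCPClient when connecting to a remote server.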

Configuration Example (HTTP Transport)

import os
import base64
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec

# Set environment variables for the MCP server
os.environ["NEO4J_URI"] = "neo4j+s://demo.neo4jlabs.com"
os.environ["NEO4J_DATABASE"] = "companies"
os.environ["NEO4J_MCP_TRANSPORT"] = "http"

# Credentials passed via HTTP headers
credentials = base64.b64encode(
    f"{os.environ['NEO4J_USERNAME']}:{os.environ['NEO4J_PASSWORD']}".encode()
).decode()

mcp_client = BasicMCPClient(
    "https://:80/mcp",
    headers={"Authorization": f"Basic {credentials}"},
)

mcp_tool_spec = McpToolSpec(client=mcp_client)
tools = await mcp_tool_spec.to_tool_list_async()