---
title: "[S2-T6] End-to-End Retrieval-Generation Call Chain and Sequence Diagrams"
author: AI Knowledge Base Solutions Expert
source-commit: feae2f2e (MemoryBear)
last-reviewed-at: 2026-05-08
scope: api/app/{services,app_chat_service,draft_run_service,core/agent/langchain_agent,core/models/{llm,rerank,embedding},core/rag/{nlp/search,vdb/elasticsearch/elasticsearch_vector,app/naive,graphrag/{search,general/index}}}
---
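# [S2-T6] End-to-End Retrieval-Generation Call Chain and Sequence Diagrams
## One-Line Summary
This is the "full-pipeline stitching" document for Sprint-2. It consolidates the call stacks, data structures, and configuration items from the five standalone deep dives [S2-T1] through [S2-T5] into **two end-to-end sequence diagrams** (Query side + Indexing side), **one critical path table**, **three scenario call chains**, and **one error degradation path diagram**. Every function reference comes directly from the sub-task documents; none is invented.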
---
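## 1. Query-Side E2E Sequence Diagram
**Scenario**: a user starts a conversation through a share link, the Agent calls the knowledge base retrieval tool, and the answer is streamed back.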
```mermaid
sequenceDiagram
autonumber
actor U as User
participant FE as Frontend (Web)
participant API as FastAPI<br/>api/main.py
participant CS as AppChatService<br/>services/app_chat_service.py
participant AS as AgentRunService<br/>services/draft_run_service.py
participant Agent as LangChainAgent<br/>core/agent/langchain_agent.py
participant Tool as knowledge_retrieval_tool<br/>draft_run_service.py:195
participant KR as knowledge_retrieval()<br/>core/rag/nlp/search.py:36
participant RK as _retrieve_for_knowledge()<br/>core/rag/nlp/search.py:149
participant VDB as ElasticSearchVector<br/>core/rag/vdb/elasticsearch/
participant ES as Elasticsearch
participant Graph as KGSearch<br/>core/rag/graphrag/search.py:19
participant LLM as RedBearLLM<br/>core/models/llm.py
participant CM as Chat Model<br/>core/rag/llm/chat_model.py
U->>FE: Enter query
FE->>API: POST /api/v1/chat<br/>{message, conversation_id, ...}
API->>CS: await agnet_chat()<br/>app_chat_service.py:43
Note over CS: Sync/blocking: load model config + assemble tools
CS->>CS: Load features_config + validate files
CS->>CS: ModelApiKeyService.get_available_api_key()<br/>Fetch LLM api_key/model_name
CS->>CS: render_prompt_message()<br/>Substitute variables in system_prompt
CS->>AS: load_knowledge_retrieval_config()<br/>Assemble the knowledge retrieval tool
CS->>Agent: LangChainAgent()<br/>langchain_agent.py:26
Note over Agent: Input: system_prompt + tools<br/>max_iterations = 5 + len(tools)*2
Agent->>Agent: _prepare_messages()<br/>langchain_agent.py:230<br/>Assemble: history + context + query
Note over Agent: Data structure: List[BaseMessage]<br/>[SystemMessage, HumanMessage, AIMessage, ...]
Agent->>LLM: invoke(messages)<br/>models/llm.py:65
LLM->>CM: _chat()<br/>chat_model.py:122
Note over CM: Sync/blocking HTTP call<br/>stream=False (first round decides on tool use)
CM-->>LLM: AIMessage(content="", tool_calls=[...])
LLM-->>Agent: Needs to call knowledge_retrieval_tool
Agent->>Tool: Run the knowledge retrieval tool
Tool->>KR: knowledge_retrieval(query, config)<br/>search.py:36
Note over KR: Input: query=str<br/>config={knowledge_bases, retrieve_type, reranker_id, use_graph}
loop For each knowledge base
KR->>RK: _retrieve_for_knowledge()<br/>search.py:149
Note over RK: Input: db_knowledge, kb_config<br/>Output: List[DocumentChunk]
alt retrieve_type == "semantic" (vector only)
RK->>VDB: search_by_vector()<br/>elasticsearch_vector.py:374
VDB->>VDB: embeddings.embed_query(query)<br/>models/embedding.py:65
VDB->>ES: script_score: cosineSimilarity()<br/>filter: metadata.status=1
ES-->>VDB: List[hit] (score / 2 normalized to [0,1])
else retrieve_type == "participle" (keyword only)
RK->>VDB: search_by_full_text()<br/>elasticsearch_vector.py:468
VDB->>ES: match + ik_max_word<br/>filter: metadata.status=1
ES-->>VDB: List[hit] (normalized as _score/max_score)
else retrieve_type == "hybrid" (hybrid)
par Two concurrent paths
RK->>VDB: search_by_vector() [async]
RK->>VDB: search_by_full_text() [async]
end
RK->>RK: Deduplicate by metadata.doc_id
RK->>VDB: rerank(query, docs, top_k)<br/>elasticsearch_vector.py:560
VDB->>VDB: RedBearRerank.compress_documents()<br/>models/rerank.py:11
end
alt retrieve_type == "graph" and use_graph=true
RK->>Graph: kg_retriever.retrieval()<br/>graphrag/search.py:19
Graph->>Graph: query_rewrite(): LLM extracts entities + types
Graph->>ES: Three-way recall: entity/relation/community
ES-->>Graph: {page_content: entities+relations+community}
Graph-->>RK: DocumentChunk inserted at rs[0]
end
end
alt reranker_id configured
KR->>KR: rerank()<br/>search.py:284
KR->>KR: RedBearRerank.compress_documents()<br/>models/rerank.py:11
Note over KR: External rerank API call<br/>sync/blocking, 100-500ms
end
KR-->>Tool: List[DocumentChunk]<br/>page_content + metadata
Tool->>Tool: Join chunks into a context string
Tool-->>Agent: f"检索到以下相关信息: {context}"<br/>("Retrieved the following relevant information")
Agent->>Agent: _prepare_messages()<br/>Append the tool result to the message list
Agent->>LLM: astream_events(version="v2")<br/>models/llm.py:117
LLM->>CM: _chat_streamly()<br/>chat_model.py:152
Note over CM: Async/streaming HTTP SSE<br/>yield (delta, token_count)
loop For each token chunk received
CM-->>LLM: GenerationChunk
LLM-->>Agent: on_chat_model_stream event
Agent-->>CS: yield SSE chunk
CS-->>API: StreamingResponse
API-->>FE: data: {"content": "..."}
FE-->>U: Render incrementally
end
CS->>CS: _filter_citations()<br/>draft_run_service.py:474<br/>Filter citations + download links
CS-->>API: {content, citations, tokens_used}
API-->>FE: JSON response
```
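The streaming leg of the diagram emits one `data: {"content": "..."}` frame per token chunk. The sketch below shows how such frames could be produced with an async generator. It is an illustration only, not the `AppChatService` code: `format_sse`, `stream_answer`, and `collect` are hypothetical names, and a plain list of strings stands in for the events coming out of `astream_events`.

```python
import asyncio
import json
from typing import AsyncIterator, Iterable, List


def format_sse(content: str) -> str:
    """Frame one token chunk as a Server-Sent Events data line."""
    payload = json.dumps({"content": content}, ensure_ascii=False)
    return f"data: {payload}\n\n"


async def stream_answer(tokens: Iterable[str]) -> AsyncIterator[str]:
    """Yield one SSE frame per token chunk (hypothetical stand-in for the agent stream)."""
    for token in tokens:
        yield format_sse(token)
        await asyncio.sleep(0)  # hand control back to the event loop between frames


async def collect(tokens: Iterable[str]) -> List[str]:
    """Drain the stream into a list so the frames can be inspected."""
    return [frame async for frame in stream_answer(tokens)]


frames = asyncio.run(collect(["Hello", ", ", "world"]))
print(frames[0], end="")
```

In a FastAPI handler, a generator like `stream_answer` would typically be passed to `StreamingResponse` with `media_type="text/event-stream"`; whether MemoryBear wires it exactly this way is not stated in the source.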
### 1.1 Key Call Stack Annotations
| Step | Function | File:Line | Sync/Async | Input | Output |
|------|------|-----------|-----------|------|------|
| 1 | `agnet_chat()` | `services/app_chat_service.py:43` | `async` | message, config, files | Dict |
| 2 | `LangChainAgent.__init__()` | `core/agent/langchain_agent.py:26` | sync | model_name, tools, system_prompt | Agent instance |
| 3 | `_prepare_messages()` | `core/agent/langchain_agent.py:230` | sync | message, history, context | `List[BaseMessage]` |
| 4 | `knowledge_retrieval()` | `core/rag/nlp/search.py:36` | sync | query, config | `List[DocumentChunk]` |
| 5 | `_retrieve_for_knowledge()` | `core/rag/nlp/search.py:149` | sync | db_knowledge, kb_config | `List[DocumentChunk]` |
| 6 | `search_by_vector()` | `core/rag/vdb/elasticsearch/elasticsearch_vector.py:374` | sync | query, top_k, score_threshold | `List[DocumentChunk]` |
| 7 | `embed_query()` | `core/models/embedding.py:65` | sync | query_str | `List[float]` |
| 8 | `search_by_full_text()` | `core/rag/vdb/elasticsearch/elasticsearch_vector.py:468` | sync | query, top_k, score_threshold | `List[DocumentChunk]` |
| 9 | `rerank()` (standalone) | `core/rag/nlp/search.py:284` | sync | query, docs, top_k | `List[DocumentChunk]` |
| 10 | `RedBearRerank.compress_documents()` | `core/models/rerank.py:11` | sync | documents, query | `List[Document]` |
| 11 | `KGSearch.retrieval()` | `core/rag/graphrag/search.py:19` | sync | question, kb_ids, emb_mdl | Dict |
| 12 | `_chat_streamly()` | `core/rag/llm/chat_model.py:152` | async streaming | messages | `AsyncGenerator` |
| 13 | `_filter_citations()` | `services/draft_run_service.py:474` | sync | features_config, citations | `List[Dict]` |
### 1.2 Input and Output Data Structures
```python
# Schema notation: types stand in for values, so this block is illustrative rather than executable.

# 1. DocumentChunk (one unit of retrieval output)
# core/rag/models/chunk.py
class DocumentChunk(BaseModel):
    page_content: str            # chunk text
    vector: list[float] | None   # embedding (usually empty at retrieval time)
    metadata: dict = {
        "doc_id": str,           # unique document identifier
        "file_name": str,        # original file name
        "score": float,          # similarity / rerank score
        "knowledge_id": str,     # owning knowledge base
        ...
    }

# 2. knowledge_retrieval config structure
config = {
    "knowledge_bases": [{
        "kb_id": str,
        "retrieve_type": "participle" | "semantic" | "hybrid" | "graph",
        "similarity_threshold": float,      # default 0.2
        "vector_similarity_weight": float,  # default 0.3
        "top_k": int,                       # default 4
    }],
    "reranker_id": str | None,
    "reranker_top_k": int,                  # default 1024
    "use_graph": bool,                      # whether GraphRAG is enabled
}

# 3. LangChainAgent message structure
messages = [
    SystemMessage(content="system_prompt + skill_prompts"),
    HumanMessage(content="previous user message..."),
    AIMessage(content="previous assistant reply..."),
    # Prompt template literal: "Reference information: {chunks}  User question: {query}"
    HumanMessage(content="参考信息:\n\n{chunks}\n\n用户问题:\n{query}"),
]
```
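For readers who want an executable version of the config schema, the sketch below restates it as dataclasses carrying the defaults quoted in the comments above (0.2, 0.3, 4, 1024). The class names `KnowledgeBaseConfig` and `RetrievalConfig` are hypothetical, and two defaults are assumptions not stated in the schema itself: `retrieve_type="hybrid"` (the document elsewhere calls hybrid the default branch) and `use_graph=False`.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Literal, Optional

RetrieveType = Literal["participle", "semantic", "hybrid", "graph"]


@dataclass
class KnowledgeBaseConfig:
    """Per-knowledge-base retrieval settings (hypothetical class name)."""
    kb_id: str
    retrieve_type: RetrieveType = "hybrid"   # assumed default
    similarity_threshold: float = 0.2
    vector_similarity_weight: float = 0.3
    top_k: int = 4


@dataclass
class RetrievalConfig:
    """Top-level config handed to knowledge_retrieval (hypothetical class name)."""
    knowledge_bases: List[KnowledgeBaseConfig] = field(default_factory=list)
    reranker_id: Optional[str] = None
    reranker_top_k: int = 1024
    use_graph: bool = False                  # assumed default


cfg = RetrievalConfig(knowledge_bases=[KnowledgeBaseConfig(kb_id="kb-1")])
config = asdict(cfg)  # plain dict in the same shape as the schema
print(config["knowledge_bases"][0]["top_k"])
```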
---
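## 2. Indexing-Side E2E Sequence Diagram
**Scenario**: a user uploads a PDF document to a knowledge base, and the system parses it, chunks it, embeds it, writes it to ES, and builds the graph.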
```mermaid
sequenceDiagram
autonumber
actor U as User
participant API as document_controller.py
participant Task as Celery Task<br/>tasks.py
participant Chunk as chunk()<br/>core/rag/app/naive.py:508
participant Parser as DeepDoc Parser<br/>core/rag/deepdoc/parser/
participant NLP as naive_merge<br/>core/rag/nlp/__init__.py
participant Emb as RedBearEmbeddings<br/>core/models/embedding.py
participant VDB as ElasticSearchVector<br/>core/rag/vdb/elasticsearch/
participant ES as Elasticsearch
participant Graph as GraphRAG Index<br/>core/rag/graphrag/general/index.py
U->>API: POST /documents<br/>Upload file + knowledge_id
API->>API: Save the original file to storage
API->>Task: Trigger the chunk task asynchronously
Task->>Chunk: chunk(filename, binary, ...)<br/>naive.py:508
Note over Chunk: Main entry point, dispatches by file extension
alt PDF format
Chunk->>Chunk: Select engine by parser_config.layout_recognize<br/>PARSERS dict: naive.py:97
Chunk->>Parser: Pdf.__call__()<br/>pdf_parser.py:1006
Parser->>Parser: __images__() OCR<br/>ocr.py:522
Parser->>Parser: _layouts_rec() layout recognition<br/>layout_recognizer.py:147
Parser->>Parser: _table_transformer_job() TSR<br/>table_structure_recognizer.py
Parser->>Parser: _text_merge() + _concat_downward()<br/>XGBoost paragraph joining
Parser-->>Chunk: sections=[(text, position_tag), ...]<br/>tables=[...]
else DOCX format
Chunk->>Parser: Docx.parse()<br/>docx_parser.py:9
Parser-->>Chunk: sections=[(text, image), ...]
else Excel/CSV
Chunk->>Parser: ExcelParser.__call__()<br/>excel_parser.py:203
Parser-->>Chunk: sections (one per row)
else Markdown
Chunk->>Parser: MarkdownParser<br/>markdown_parser.py:10
Parser-->>Chunk: sections (element block)
end
Chunk->>NLP: naive_merge(sections)<br/>nlp/__init__.py:562
Note over NLP: Split by token limit + delimiter<br/>default chunk_token_num=512 (PDF) / 128 (others)
NLP->>NLP: tokenize_chunks()<br/>nlp/__init__.py:258
Note over NLP: Injected ES fields:<br/>content_with_weight, content_ltks, content_sm_ltks,<br/>page_num_int, position_int, top_int, docnm_kwd
Chunk-->>Task: List[Dict] (ES doc format)
Task->>Emb: embed_documents(texts)<br/>models/embedding.py:65
Note over Emb: Multi-provider support:<br/>OpenAI/DashScope/Volcano/Xinference/...
Emb-->>Task: List[List[float]]
Task->>VDB: add_chunks(chunks, embeddings)<br/>elasticsearch_vector.py:55
VDB->>VDB: create_collection() lazy index creation<br/>elasticsearch_vector.py:65
Note over VDB: mapping: page_content(text+ik),<br/>metadata(object), vector(dense_vector+cosine)
VDB->>ES: helpers.bulk(actions)<br/>bulk write
ES-->>VDB: result (success count)
alt GraphRAG enabled (use_graphrag=true)
Task->>Graph: run_graphrag_for_kb()<br/>graphrag/general/index.py:122
Graph->>Graph: generate_subgraph()<br/>index.py:333
Note over Graph: LLM extracts entities + relations<br/>multi-round gleaning (max=2)
Graph->>Graph: merge_subgraph()<br/>index.py:409
Graph->>ES: Write entity/relation chunks<br/>with q_{dim}_vec vector field
alt General mode + with_resolution
Graph->>Graph: EntityResolution()<br/>entity_resolution.py:53
Note over Graph: Edit-distance prefilter + batched LLM judgment<br/>batch=100, concurrent=5
end
alt General mode + with_community
Graph->>Graph: leiden.run()<br/>leiden.py:95
Graph->>Graph: CommunityReportsExtractor()<br/>community_reports_extractor.py:55
Graph->>ES: Write community_report chunks
end
end
Task-->>API: {ok_documents, failed_documents, seconds}
API-->>U: Ingestion-complete notification
```
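The chunking step in the diagram packs parsed sections into chunks under a token limit (`chunk_token_num`). The sketch below illustrates that idea with a greedy packer. It is not the `naive_merge()` implementation: `count_tokens` and `merge_sections` are hypothetical names, whitespace splitting stands in for the project's tokenizer, and delimiter-based splitting is left out.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Whitespace token count; an assumed stand-in for the real tokenizer."""
    return len(text.split())


def merge_sections(sections: List[str], chunk_token_num: int = 128) -> List[str]:
    """Greedily pack consecutive sections into chunks of at most chunk_token_num tokens.

    A single section larger than the budget is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for section in sections:
        n = count_tokens(section)
        # Flush the running chunk if adding this section would exceed the budget.
        if current and current_tokens + n > chunk_token_num:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(section)
        current_tokens += n
    if current:
        chunks.append("\n".join(current))
    return chunks


chunks = merge_sections(["a b c", "d e", "f g h i"], chunk_token_num=5)
print(chunks)
```

With a budget of 5 tokens, the first two sections (3 + 2 tokens) share a chunk and the third starts a new one.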
### 2.1 Key Call Stack Annotations
| Step | Function | File:Line | Sync/Async | Input | Output |
|------|------|-----------|-----------|------|------|
| 1 | `chunk()` | `core/rag/app/naive.py:508` | sync | filename/binary, parser_config | `List[Dict]` ES doc |
| 2 | `Pdf.__call__()` | `pdf_parser.py:1006` | sync | filename, callback | sections, tables |
| 3 | `OCR.__call__()` | `vision/ocr.py:522` | sync | PIL.Image | text_boxes |
| 4 | `LayoutRecognizer4YOLOv10.__call__()` | `layout_recognizer.py:147` | sync | image_list | layout_types |
| 5 | `naive_merge()` | `core/rag/nlp/__init__.py:562` | sync | sections, chunk_token_num | `List[str]` chunks |
| 6 | `tokenize_chunks()` | `core/rag/nlp/__init__.py:258` | sync | chunks, doc | `List[Dict]` ES docs |
| 7 | `embed_documents()` | `core/models/embedding.py:65` | sync | texts | `List[List[float]]` |
| 8 | `add_chunks()` | `core/rag/vdb/elasticsearch/elasticsearch_vector.py:55` | sync | chunks, embeddings | uuids |
| 9 | `create_collection()` | `elasticsearch_vector.py:609` | sync | embeddings | mapping created |
| 10 | `helpers.bulk()` | elasticsearch.helpers | sync | actions | (success, errors) |
| 11 | `run_graphrag_for_kb()` | `graphrag/general/index.py:122` | async (trio) | document_ids | subgraphs |
| 12 | `generate_subgraph()` | `graphrag/general/index.py:333` | async | extractor, chunks | nx.Graph |
| 13 | `EntityResolution.__call__()` | `entity_resolution.py:53` | async | graph, nodes | merged_graph |
| 14 | `leiden.run()` | `graphrag/general/leiden.py:95` | sync | graph | communities |
### 2.2 ES Doc Field Contract
```python
# Structure of a chunk document written to ES (from S2-T1 §6.7)
{
    "docnm_kwd": str,                    # file name (keyword)
    "title_tks": str,                    # title, coarse-grained tokens
    "title_sm_tks": str,                 # title, fine-grained tokens
    "content_with_weight": str,          # raw chunk text (BM25 weighted)
    "content_ltks": str,                 # content, coarse-grained tokens (whitespace analyzer)
    "content_sm_ltks": str,              # content, fine-grained tokens
    "page_num_int": [int],               # list of page numbers
    "position_int": [(p,x0,x1,y0,y1)],   # coordinates
    "top_int": [int],                    # y coordinate of the line top
    "image": bytes | None,               # PIL.Image binary
    "doc_type_kwd": str | None,          # "image" or empty
    "q_{dim}_vec": [float],              # embedding vector (added in S2-T2)
    "metadata": {
        "doc_id": str,
        "file_name": str,
        "knowledge_id": str,
        "status": 1,
    }
}
```
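The `q_{dim}_vec` field name depends on the embedding dimension, so the key has to be derived from the vector itself. The sketch below builds a partial document under that convention. `vector_field_name` and `build_es_doc` are hypothetical helpers, and only a subset of the contract is filled in: the tokenized fields are omitted because they require the project's tokenizer.

```python
from typing import Any, Dict, List


def vector_field_name(vector: List[float]) -> str:
    """Derive the dimension-specific vector field name, e.g. q_1024_vec."""
    return f"q_{len(vector)}_vec"


def build_es_doc(text: str, file_name: str, doc_id: str,
                 knowledge_id: str, vector: List[float]) -> Dict[str, Any]:
    """Assemble a partial ES chunk document (subset of the field contract)."""
    return {
        "docnm_kwd": file_name,
        "content_with_weight": text,
        vector_field_name(vector): vector,
        "metadata": {
            "doc_id": doc_id,
            "file_name": file_name,
            "knowledge_id": knowledge_id,
            "status": 1,  # the query-side filter is metadata.status=1
        },
    }


doc = build_es_doc("hello world", "a.pdf", "doc-1", "kb-1", [0.0, 0.1, 0.2, 0.3])
print(sorted(doc))
```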
---
## 3. Critical Path Table
> Latency baselines are estimates based on code comments, log anchors, and engineering experience. Actual values depend on document complexity, model provider, network latency, and ES cluster size.
| # | Stage | Key function | File:Line | P50 | P95 | Blocking / Non-blocking | Bottleneck |
|---|------|---------|-----------|-----|-----|------------|---------|
| 1 | **PDF parsing (OCR+Layout+TSR)** | `Pdf.__call__()` | `deepdoc/parser/pdf_parser.py:1006` | 3s | 15s | Blocking (CPU/GPU) | 🔴 |
| 2 | **Chunking (tokenize)** | `naive_merge()` + `tokenize_chunks()` | `nlp/__init__.py:562,258` | 50ms | 200ms | Blocking (local CPU) | 🟡 |
| 3 | **Embedding (batch)** | `embed_documents()` | `models/embedding.py:65` | 200ms | 1s | Blocking (network I/O) | 🔴 |
| 4 | **ES bulk write** | `helpers.bulk()` | `elasticsearch_vector.py:85` | 100ms | 500ms | Blocking (network I/O) | 🟡 |
| 5 | **GraphRAG entity extraction** | `generate_subgraph()` | `graphrag/general/index.py:333` | 30s | 120s | Blocking (LLM I/O) | 🔴 |
| 6 | **GraphRAG entity resolution** | `EntityResolution.__call__()` | `entity_resolution.py:53` | 10s | 60s | Blocking (LLM I/O) | 🔴 |
| 7 | **GraphRAG community reports** | `CommunityReportsExtractor.__call__()` | `community_reports_extractor.py:55` | 20s | 90s | Blocking (LLM I/O) | 🔴 |
| 8 | **Query embedding** | `embed_query()` | `models/embedding.py:65` | 50ms | 300ms | Blocking (network I/O) | 🟡 |
| 9 | **ES vector search** | `search_by_vector()` | `elasticsearch_vector.py:374` | 30ms | 200ms | Blocking (network I/O) | 🟡 |
| 10 | **ES keyword search** | `search_by_full_text()` | `elasticsearch_vector.py:468` | 20ms | 100ms | Blocking (network I/O) | 🟢 |
| 11 | **External rerank** | `RedBearRerank.compress_documents()` | `models/rerank.py:11` | 100ms | 500ms | Blocking (network I/O) | 🟡 |
| 12 | **GraphRAG retrieval** | `KGSearch.retrieval()` | `graphrag/search.py:19` | 200ms | 1s | Blocking (LLM+ES) | 🟡 |
| 13 | **First LLM call (tool decision)** | `_chat()` | `chat_model.py:122` | 500ms | 3s | Blocking (network I/O) | 🔴 |
| 14 | **LLM streaming generation** | `_chat_streamly()` | `chat_model.py:152` | 500ms | 5s | Non-blocking (SSE streaming) | 🔴 |
| 15 | **Citation backfill** | `Dealer.insert_citations()` | `search.py:489` | 100ms | 500ms | Blocking (local embedding) | 🟡 |
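As a worked example of reading the table, the sketch below adds up the per-stage estimates for one hybrid query (rows 8 to 11, 13, and 14). It rests on two assumptions: the vector leg (query embedding plus vector search) and the keyword leg run concurrently, so only the slower one counts, and every other stage is serial. Summing P95 values gives a pessimistic bound, not the true end-to-end P95. The name `hybrid_query_budget` is hypothetical.

```python
from typing import Dict, Tuple

# (P50, P95) in milliseconds, copied from the critical path table.
STAGES_MS: Dict[str, Tuple[int, int]] = {
    "embed_query": (50, 300),
    "search_by_vector": (30, 200),
    "search_by_full_text": (20, 100),
    "rerank": (100, 500),
    "llm_tool_decision": (500, 3000),
    "llm_streaming": (500, 5000),
}


def hybrid_query_budget(stages: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum per-stage estimates for one hybrid query; concurrent legs count once."""
    totals = []
    for idx in (0, 1):  # 0 = P50 column, 1 = P95 column
        vector_leg = stages["embed_query"][idx] + stages["search_by_vector"][idx]
        keyword_leg = stages["search_by_full_text"][idx]
        retrieval = max(vector_leg, keyword_leg)  # the two legs run concurrently
        serial = sum(stages[name][idx]
                     for name in ("llm_tool_decision", "rerank", "llm_streaming"))
        totals.append(retrieval + serial)
    return totals[0], totals[1]


p50_ms, p95_ms = hybrid_query_budget(STAGES_MS)
print(p50_ms, p95_ms)  # 1180 9000
```

Under these assumptions the two LLM calls account for 1000 ms of the 1180 ms P50 estimate, which matches the 🔴 markers on rows 13 and 14.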
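### 3.1 Bottleneck Analysis
| Bottleneck | Root cause | Mitigation |
|------|------|---------|
| PDF parsing (P95=15s) | OCR + Layout + TSR run serially, and GPU models load slowly | Replace with MinerU / async queue / preload models |
| Embedding API (P95=1s) | External API latency, and batch_size=16 is too small | Local deployment via Xinference / GPUStack |
| GraphRAG graph build (P95=120s) | Multi-round LLM extraction, serial within a single document | Raise max_parallel_documents / incremental updates |
| LLM streaming output (P95=5s) | Slow time to first token (TTFT), and long answers take long overall | Cache frequent queries / lower max_tokens |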
---
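## 4. Multi-Scenario Call Chains
### 4.1 Scenario A: Pure Vector Retrieval Q&A
**When to use**: knowledge bases where semantic matching works well and user questions are phrased in the same style as the documents.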
```
[User Query]
  → AppChatService.agnet_chat()             [services/app_chat_service.py:43]  async
  → LangChainAgent.invoke()                 [core/agent/langchain_agent.py:65]  sync
  → knowledge_retrieval_tool call
  → knowledge_retrieval()                   [core/rag/nlp/search.py:36]  sync
  → _retrieve_for_knowledge()               [core/rag/nlp/search.py:149]  sync
      retrieve_type="semantic"
  → ElasticSearchVector.search_by_vector()  [core/rag/vdb/elasticsearch/elasticsearch_vector.py:374]  sync
      ├─► embed_query(query)                [core/models/embedding.py:65]  sync, HTTP
      │     → List[float] query_vector
      └─► ES script_score: cosineSimilarity(params.query_vector, 'vector') + 1.0
            filter: metadata.status=1
  → List[DocumentChunk] (score / 2 normalized to [0,1])
  → score_threshold filter (default 0.3)
  → return top_k chunks → Agent context assembly
  → LLM _chat_streamly() streams the answer
```
**Data structure flow**:
```
query: str
  → query_vector: List[float] (dim=512/768/1024/1536)
  → ES hits: List[{_score, _source}]
  → DocumentChunk[] (score ∈ [0,1])
  → context: str (chunks joined with "\n\n")
  → messages: List[BaseMessage] (system + history + context + query)
  → SSE stream: AsyncGenerator[str]
```
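The two normalizations named in this document (`score / 2` for the vector path, `_score/max_score` for the keyword path) and the threshold filter can be written out as below. This is a minimal sketch with hypothetical function names, not the `ElasticSearchVector` code. The cosine case relies on the `+ 1.0` offset in the `script_score` query, which puts raw scores in [0, 2].

```python
from typing import List


def normalize_cosine(es_score: float) -> float:
    """Map a script_score result (cosine similarity + 1.0, range [0, 2]) onto [0, 1]."""
    return es_score / 2.0


def normalize_bm25(scores: List[float]) -> List[float]:
    """Scale BM25 scores by the best hit so the top result becomes 1.0."""
    max_score = max(scores, default=0.0)
    if max_score <= 0:
        return [0.0 for _ in scores]
    return [s / max_score for s in scores]


def filter_by_threshold(scores: List[float], score_threshold: float = 0.3) -> List[float]:
    """Keep only normalized scores at or above the threshold."""
    return [s for s in scores if s >= score_threshold]


vector_scores = [normalize_cosine(s) for s in (2.0, 1.0, 0.5)]
kept = filter_by_threshold(vector_scores)
print(vector_scores, kept)
```

A raw score of 1.0 corresponds to a cosine similarity of 0 and normalizes to 0.5, so a threshold of 0.3 on the normalized scale still admits vectors with a mildly negative cosine similarity.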
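### 4.2 Scenario B: Hybrid Retrieval Q&A (Keyword + Vector)
**When to use**: cases where keyword precision and semantic recall complement each other, such as technical documentation libraries.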
```
[User Query]
  → knowledge_retrieval()        [core/rag/nlp/search.py:36]  sync
  → _retrieve_for_knowledge()    [core/rag/nlp/search.py:149]  sync
      retrieve_type="hybrid" (default branch)
┌─────────────────────────────────────────┐
│ Two concurrent paths (asyncio.gather)   │
│                                         │
│ Path 1: search_by_vector()              │
│   [elasticsearch_vector.py:374]         │
│   → embed_query() → ES script_score     │
│   → normalize: score / 2 → [0,1]        │
│                                         │
│ Path 2: search_by_full_text()           │
│   [elasticsearch_vector.py:468]         │
│   → match + ik_max_word → BM25          │
│   → normalize: _score/max_score → [0,1] │
└─────────────────────────────────────────┘
  → deduplicate by metadata.doc_id (later arrivals dropped)
  → ElasticSearchVector.rerank()          [elasticsearch_vector.py:560]  sync
  → RedBearRerank.compress_documents()    [core/models/rerank.py:11]  sync
      external API call (Xinference/GPUStack/DashScope)
  → sort by relevance_score descending, take top_k
  → return DocumentChunk[] → Agent
```
**Fusion formula** (path B, application layer):
```
candidates = vector_topk(q) ∪ bm25_topk(q)
deduped = unique_by(metadata.doc_id, candidates)
final = reranker(query, deduped)[:top_k]        (if a reranker is configured)
     or sort_by_score_desc(deduped)[:top_k]     (if not)
```
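The fusion formula translates almost line for line into code. The sketch below is an illustration under simplifying assumptions: chunks are plain dicts standing in for `DocumentChunk`, `fuse` is a hypothetical name, and the reranker is any callable taking `(query, docs)`. The vector hits are placed first, so when both paths return the same `doc_id` the keyword copy is the one dropped.

```python
from typing import Any, Callable, Dict, List, Optional

Chunk = Dict[str, Any]  # simplified stand-in for DocumentChunk
Reranker = Callable[[str, List[Chunk]], List[Chunk]]


def fuse(query: str, vector_hits: List[Chunk], bm25_hits: List[Chunk],
         top_k: int, reranker: Optional[Reranker] = None) -> List[Chunk]:
    """Union both result lists, dedupe by metadata.doc_id, then rerank or sort."""
    seen = set()
    deduped: List[Chunk] = []
    for hit in vector_hits + bm25_hits:
        doc_id = hit["metadata"]["doc_id"]
        if doc_id in seen:
            continue  # later arrival with the same doc_id is dropped
        seen.add(doc_id)
        deduped.append(hit)
    if reranker is not None:
        return reranker(query, deduped)[:top_k]
    return sorted(deduped, key=lambda h: h["metadata"]["score"], reverse=True)[:top_k]


def make_hit(doc_id: str, score: float) -> Chunk:
    """Build a toy chunk for the demo."""
    return {"page_content": doc_id, "metadata": {"doc_id": doc_id, "score": score}}


vector_hits = [make_hit("a", 0.9), make_hit("b", 0.4)]
bm25_hits = [make_hit("b", 1.0), make_hit("c", 0.6)]
top = fuse("q", vector_hits, bm25_hits, top_k=2)
print([h["metadata"]["doc_id"] for h in top])
```

Note the side effect of first-come deduplication: document `b` keeps its vector score of 0.4 even though its keyword score was 1.0, which is one reason a reranker is useful after the merge.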
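### 4.3 Scenario C: GraphRAG Relational Reasoning Q&A
**When to use**: complex questions that need multi-hop reasoning, entity relationship analysis, or global insight.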
```
[User Query]
  → knowledge_retrieval()        [core/rag/nlp/search.py:36]  sync
  → _retrieve_for_knowledge()    [core/rag/nlp/search.py:149]  sync
      retrieve_type="graph"
      ├─► run hybrid retrieval first (same as Scenario B)
  → KGSearch.retrieval()         [core/rag/graphrag/search.py:19]  sync
  → query_rewrite()              [graphrag/search.py:33]
      ├─► LLM prompt: minirag_query2kwd
      │     input: question + TYPE_POOL (sampled from ES)
      │     output: {answer_type_keywords, entities_from_query}
┌─────────────────────────────────────────┐
│ Three-way recall in parallel            │
│                                         │
│ Path 1: get_relevant_ents_by_keywords() │
│   → embed_query(entities) → ES knn      │
│   → entity vector recall                │
│     (sim_threshold=0.3)                 │
│                                         │
│ Path 2: get_relevant_ents_by_types()    │
│   → exact match on answer_type_keywords │
│                                         │
│ Path 3: get_relevant_relations_by_txt() │
│   → relation vector similarity recall   │
└─────────────────────────────────────────┘
  → n-hop path expansion (precomputed)
      sim_decay = 1/(2 + hop_depth)
  → fused scoring: score = sim × pagerank
      entity ranking: sim × pagerank
      relation ranking: sim × pagerank × boost
  → token budget truncation (max_token decreases)
  → community report recall (comm_topn=1)
  → returns: {page_content: entities + relations + community,
              metadata: {...}, vector: None}
  → inserted at the head of the hybrid results: rs.insert(0, graph_chunk)
  → Agent context assembly → LLM generation
```
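The three scoring expressions in the chain (`sim_decay = 1/(2 + hop_depth)`, `sim × pagerank`, and `sim × pagerank × boost`) are written out below. The `Entity` dataclass and the function names are hypothetical, and the sketch deliberately does not guess how `KGSearch` combines the decay with the similarity, since the source does not say.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    """Toy entity record: name, query similarity, and precomputed PageRank."""
    name: str
    sim: float
    pagerank: float


def hop_decay(hop_depth: int) -> float:
    """Decay factor for an n-hop neighbor: 1 / (2 + hop_depth)."""
    return 1.0 / (2 + hop_depth)


def entity_score(entity: Entity) -> float:
    """Entity ranking key: sim × pagerank."""
    return entity.sim * entity.pagerank


def relation_score(sim: float, pagerank: float, boost: float) -> float:
    """Relation ranking key: sim × pagerank × boost."""
    return sim * pagerank * boost


def rank_entities(entities: List[Entity]) -> List[Entity]:
    """Order entities from highest to lowest fused score."""
    return sorted(entities, key=entity_score, reverse=True)


ranked = rank_entities([Entity("A", sim=0.9, pagerank=0.1),
                        Entity("B", sim=0.5, pagerank=0.4)])
print([e.name for e in ranked])
```

Entity `B` outranks `A` despite a lower similarity because its PageRank is four times higher, which shows how the product favors well-connected nodes.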
**GraphRAG graph-build call chain** (prerequisite):
```
tasks.py:build_graphrag_for_kb()
  → run_graphrag_for_kb()                    [graphrag/general/index.py:122]
  → generate_subgraph()                      [index.py:333]
      → LLM extracts entities + relations (multi-round gleaning, max=2)
  → merge_subgraph()                         [index.py:409]
      → graph_merge()                        [utils.py:199]
  → [optional] EntityResolution()            [entity_resolution.py:53]
  → [optional] leiden.run()                  [leiden.py:95]
  → [optional] CommunityReportsExtractor()   [community_reports_extractor.py:55]
  → ES write of entity/relation/community chunks
```
---
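## 5. Error Propagation and Degradation Paths
### 5.1 Error Propagation Matrix
| Stage | Failure mode | Blast radius | Fallback logic | Source location |
|------|---------|---------|---------|---------|
| **PDF parsing** | OCR model missing / GPU unavailable | Single document fails | `callback(-1, "OCR model not found")`, task marked as failed_document | `pdf_parser.py:50` |
| **LibreOffice conversion** | soffice not installed / 120s timeout | PPT/DOC fails | Raises HTTP 500, no automatic degradation | `utils/libre_office.py:11` |
| **Embedding API** | Timeout / rate limit / auth failure | Single batch of chunks fails | Raises an exception; helpers.bulk does not catch it, so the whole batch fails and must be retried | `models/embedding.py:65` |
| **ES write** | ConnectionTimeout / cluster unavailable | Single batch of chunks fails | `ATTEMPT_TIME=2` retries, resend after reconnecting | `utils/es_conn.py:294` |
| **GraphRAG extraction** | Malformed LLM output | Single chunk fails | `json_repair` tolerance + max_errors=3, skipped once the limit is exceeded | `extractor.py:97` |
| **GraphRAG resolution** | LLM timeout (280s) | Resolution fails | `trio.move_on_after` times out, resolution stage skipped | `entity_resolution.py:53` |
| **Knowledge base retrieval** | Single KB unavailable | Other KBs unaffected | `try/except continue`, failed KB skipped | `search.py:110` |
| **Empty vector retrieval** | Threshold too strict / dimension mismatch | No results for the current KB | fallback: lower min_match 0.3→0.1, raise similarity 0.1→0.17 | `search.py:447` |
| **External rerank** | API timeout / model unavailable | No reranked results | fallback: return the original results (order unchanged) | `search.py:115` |
| **GraphRAG retrieval** | Graph not built / ES query fails | No graph-augmented results | fallback: return hybrid results only | `search.py:263` |
| **LLM call** | RATE_LIMIT / SERVER_ERROR | Generation fails | Retry 5 times with random jitter; if still failing, return `**ERROR**: ...` | `chat_model.py:64` |
| **LLM truncation** | finish_reason="length" | Incomplete answer | Truncation notice appended automatically (adapts to Chinese/English) | `chat_model.py:152` |
| **Citation backfill** | Embedding match fails | No citation markers | Skip citation insertion, return plain text | `search.py:489` |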
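The "empty vector retrieval" row describes a two-pass search: a strict first pass, then one retry with `min_match` lowered to 0.1 and `similarity` set to 0.17. The sketch below shows that control flow in isolation. `search_with_fallback` and the `search` callable are hypothetical; the real logic lives inside `Dealer` in `search.py` and takes many more parameters.

```python
from typing import Callable, List, Tuple

SearchFn = Callable[[str, float, float], List[str]]


def search_with_fallback(search: SearchFn, query: str) -> List[str]:
    """Run a strict search first; if it returns nothing, retry once with relaxed settings."""
    hits = search(query, 0.3, 0.1)      # first pass: min_match=0.3, similarity=0.1
    if hits:
        return hits
    return search(query, 0.1, 0.17)     # second pass: min_match=0.1, similarity=0.17


calls: List[Tuple[float, float]] = []


def fake_search(query: str, min_match: float, similarity: float) -> List[str]:
    """Toy backend that only matches once min_match has been relaxed."""
    calls.append((min_match, similarity))
    return ["chunk-1"] if min_match <= 0.1 else []


result = search_with_fallback(fake_search, "what is MemoryBear?")
print(result, calls)
```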
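### 5.2 Degradation Path Diagram
```
Normal path:
  Query → Hybrid retrieval → Rerank → LLM generation → Citation backfill → Output
Degradation path 1 (empty retrieval):
  Query → Hybrid retrieval (empty) → fallback retries with relaxed thresholds → still empty → LLM answers directly (no context)
Degradation path 2 (rerank failure):
  Query → Hybrid retrieval → Rerank API timeout → fallback returns the original order → LLM generation
Degradation path 3 (GraphRAG failure):
  Query → Hybrid retrieval → GraphRAG query fails → fallback to hybrid results only → LLM generation
Degradation path 4 (single KB failure):
  Query → KB-A (fails, try/except) + KB-B (succeeds) → merged results → LLM generation
Degradation path 5 (LLM failure):
  Query → Retrieval succeeds → LLM call fails (after 5 retries) → returns "**ERROR**: service temporarily unavailable"
Degradation path 6 (ES cluster unavailable):
  Query → ES connection fails → no retrieval results → LLM answers directly (no context) / returns an error
```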
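Degradation paths 2 and 4 share one pattern: catch the failure, log it, and continue with whatever results remain. A self-contained sketch of that pattern is shown here. `retrieve_all`, `retrieve_one`, and the toy backends are hypothetical names chosen for illustration; they are not the signatures used in `search.py`.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("retrieval_sketch")


def retrieve_all(kb_ids: List[str],
                 retrieve_one: Callable[[str], List[str]],
                 rerank: Optional[Callable[[List[str]], List[str]]] = None) -> List[str]:
    """Query every knowledge base, skipping failures, then rerank on a best-effort basis."""
    results: List[str] = []
    for kb_id in kb_ids:
        try:
            results.extend(retrieve_one(kb_id))
        except Exception as exc:  # one failing KB must not sink the others
            logger.warning("retrieval for knowledge base %s failed: %s", kb_id, exc)
            continue
    if rerank is not None and results:
        try:
            results = rerank(results)
        except Exception as exc:  # keep the original order if the reranker is down
            logger.warning("reranker failed, keeping original order: %s", exc)
    return results


def flaky_retrieve(kb_id: str) -> List[str]:
    """Toy backend: KB-A is unavailable, every other KB returns one chunk."""
    if kb_id == "KB-A":
        raise ConnectionError("KB-A unavailable")
    return [f"chunk-from-{kb_id}"]


def broken_rerank(docs: List[str]) -> List[str]:
    """Toy reranker that always times out."""
    raise TimeoutError("rerank API timeout")


merged = retrieve_all(["KB-A", "KB-B"], flaky_retrieve, rerank=broken_rerank)
print(merged)
```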
### 5.3 Key Degradation Code Snippets
```python
# 1. A single KB failure does not affect the rest (search.py:110)
try:
    rs, chat_model, embedding_model = _retrieve_for_knowledge(...)
    all_results.extend(rs)
except Exception as e:
    print(f"retrieval knowledge({kb_id}) failed: {str(e)}")
    continue  # skip the failed KB

# 2. Rerank failure fallback (search.py:115-128)
if reranker_id and all_results:
    try:
        all_results = rerank(...)
    except Exception as rerank_error:
        logger.warning("Reranker failed, falling back to original results")
        # fallback: keep the original order

# 3. Empty retrieval fallback (search.py:447-459)
if total == 0:
    matchText, _ = self.qryr.question(qst, min_match=0.1)  # 0.3 → 0.1
    matchDense.extra_options["similarity"] = 0.17           # 0.1 → 0.17
    res = self.dataStore.search(...)

# 4. GraphRAG failure fallback (search.py:263)
try:
    graph_doc = kg_retriever.retrieval(...)
    rs.insert(0, DocumentChunk(...))
except Exception as graph_error:
    logger.warning(f"Graph retrieval failed...")  # hybrid results only

# 5. LLM retry (chat_model.py:64-89)
retry_max = LLM_MAX_RETRIES  # default 5
while retry_max > 0:
    try:
        return self.client.chat.completions.create(...)
    except (RateLimitError, APIConnectionError, APIError):
        time.sleep(random.uniform(1, LLM_BASE_DELAY * 2 ** (5-retry_max)))
        retry_max -= 1
```
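Snippet 5 can be turned into a runnable sketch of retry with jittered exponential backoff, as below. Several things here are assumptions rather than facts about `chat_model.py`: the function name `call_with_retries`, the `base_delay=2.0` value (the source does not give `LLM_BASE_DELAY`), the generic exception types, and the injectable `sleep` parameter that lets the demo run without real waiting. Unlike the excerpt, this version returns the `**ERROR**` string directly after the last failed attempt instead of sleeping once more.

```python
import random
import time
from typing import Callable, List, Tuple, Type


def call_with_retries(fn: Callable[[], str],
                      max_retries: int = 5,
                      base_delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep,
                      retriable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
                      ) -> str:
    """Call fn, retrying retriable errors with a randomized, exponentially growing delay."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retriable as exc:
            if attempt == max_retries - 1:
                return f"**ERROR**: {exc}"
            # Upper bound doubles each attempt: base_delay * 1, 2, 4, ...
            sleep(random.uniform(1, base_delay * 2 ** attempt))
    return "**ERROR**: no attempts were made"


attempts = {"count": 0}
delays: List[float] = []


def flaky_llm() -> str:
    """Toy LLM call that fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("rate limited")
    return "ok"


answer = call_with_retries(flaky_llm, sleep=delays.append)
print(answer, len(delays))
```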
---
## Appendix: Cross-Document Reference Index
| Section in this document | Referenced source | Cited document |
|--------|---------|---------|
| §1 Loader/Parser/Chunking | `naive.py:508`, `naive_merge()` | [S2-T1] |
| §1/§2 Embedding | `embed_documents()`, `embed_query()` | [S2-T2] |
| §1/§2 VDB retrieval and write | `search_by_vector()`, `add_chunks()`, mapping | [S2-T3] |
| §1/§2 GraphRAG | `KGSearch.retrieval()`, `run_graphrag()` | [S2-T4] |
| §1 Rerank/Prompt/LLM | `RedBearRerank`, `_chat_streamly()`, `_filter_citations()` | [S2-T5] |
---
*This document is consolidated directly from the source references and flow descriptions in the five sub-task documents [S2-T1] through [S2-T5]. Every file:line reference can be verified in the MemoryBear repository at commit `feae2f2e`.*