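---
title: "[S2-T6] End-to-End Retrieval-Generation Call Chain and Sequence Diagrams"
author: AI Knowledge Base Solutions Expert
source-commit: feae2f2e (MemoryBear)
last-reviewed-at: 2026-05-08
scope: api/app/{services,app_chat_service,draft_run_service,core/agent/langchain_agent,core/models/{llm,rerank,embedding},core/rag/{nlp/search,vdb/elasticsearch/elasticsearch_vector,app/naive,graphrag/{search,general/index}}}
---

# [S2-T6] End-to-End Retrieval-Generation Call Chain and Sequence Diagrams

## One-Line Summary

This is Sprint-2's "full-pipeline stitching" document. It consolidates the call stacks, data structures, and configuration items from the five standalone deep-dive documents [S2-T1] through [S2-T5] into **two end-to-end sequence diagrams** (query side + indexing side), **one critical path table**, **three multi-scenario call chains**, and **one error degradation path diagram**. Every function reference is taken directly from the sub-task documents; none is invented.

---

## 1. Query-Side E2E Sequence Diagram

**Scenario**: A user starts a conversation through a share link, the Agent calls the knowledge-base retrieval tool, and the answer is streamed back.
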
```mermaid
sequenceDiagram
    autonumber
    actor U as User
    participant FE as Frontend (Web)
    participant API as FastAPI<br/>api/main.py
    participant CS as AppChatService<br/>services/app_chat_service.py
    participant AS as AgentRunService<br/>services/draft_run_service.py
    participant Agent as LangChainAgent<br/>core/agent/langchain_agent.py
    participant Tool as knowledge_retrieval_tool<br/>draft_run_service.py:195
    participant KR as knowledge_retrieval()<br/>core/rag/nlp/search.py:36
    participant RK as _retrieve_for_knowledge()<br/>core/rag/nlp/search.py:149
    participant VDB as ElasticSearchVector<br/>core/rag/vdb/elasticsearch/
    participant ES as Elasticsearch
    participant Graph as KGSearch<br/>core/rag/graphrag/search.py:19
    participant LLM as RedBearLLM<br/>core/models/llm.py
    participant CM as Chat Model<br/>core/rag/llm/chat_model.py

    U->>FE: Enter query
    FE->>API: POST /api/v1/chat<br/>{message, conversation_id, ...}

    API->>CS: await agnet_chat()<br/>app_chat_service.py:43
    Note over CS: Sync/blocking: load model config + assemble tools

    CS->>CS: Load features_config + validate files
    CS->>CS: ModelApiKeyService.get_available_api_key()<br/>Fetch LLM api_key/model_name
    CS->>CS: render_prompt_message()<br/>Substitute variables in system_prompt
    CS->>AS: load_knowledge_retrieval_config()<br/>Assemble the knowledge retrieval tool

    CS->>Agent: LangChainAgent()<br/>langchain_agent.py:26
    Note over Agent: Input: system_prompt + tools<br/>max_iterations = 5 + len(tools)*2

    Agent->>Agent: _prepare_messages()<br/>langchain_agent.py:230<br/>Assemble: history + context + query
    Note over Agent: Data structure: List[BaseMessage]<br/>[SystemMessage, HumanMessage, AIMessage, ...]

    Agent->>LLM: invoke(messages)<br/>models/llm.py:65
    LLM->>CM: _chat()<br/>chat_model.py:122
    Note over CM: Sync/blocking HTTP call<br/>stream=False (first round decides on tool use)

    CM-->>LLM: AIMessage(content="", tool_calls=[...])
    LLM-->>Agent: knowledge_retrieval_tool must be called

    Agent->>Tool: Run the knowledge retrieval tool
    Tool->>KR: knowledge_retrieval(query, config)<br/>search.py:36
    Note over KR: Input: query=str<br/>config={knowledge_bases, retrieve_type, reranker_id, use_graph}

    loop For each knowledge base
        KR->>RK: _retrieve_for_knowledge()<br/>search.py:149
        Note over RK: Input: db_knowledge, kb_config<br/>Output: List[DocumentChunk]

        alt retrieve_type == "semantic" (vector only)
            RK->>VDB: search_by_vector()<br/>elasticsearch_vector.py:374
            VDB->>VDB: embeddings.embed_query(query)<br/>models/embedding.py:65
            VDB->>ES: script_score: cosineSimilarity()<br/>filter: metadata.status=1
            ES-->>VDB: List[hit] (score / 2, normalized to [0,1])
        else retrieve_type == "participle" (keyword only)
            RK->>VDB: search_by_full_text()<br/>elasticsearch_vector.py:468
            VDB->>ES: match + ik_max_word<br/>filter: metadata.status=1
            ES-->>VDB: List[hit] (_score/max_score normalized)
        else retrieve_type == "hybrid" (combined)
            par Two concurrent legs
                RK->>VDB: search_by_vector() [async]
                RK->>VDB: search_by_full_text() [async]
            end
            RK->>RK: Deduplicate by metadata.doc_id
            RK->>VDB: rerank(query, docs, top_k)<br/>elasticsearch_vector.py:560
            VDB->>VDB: RedBearRerank.compress_documents()<br/>models/rerank.py:11
        end

        alt retrieve_type == "graph" and use_graph=true
            RK->>Graph: kg_retriever.retrieval()<br/>graphrag/search.py:19
            Graph->>Graph: query_rewrite() LLM extracts entities + types
            Graph->>ES: Three-way recall: entity/relation/community
            ES-->>Graph: {page_content: entities+relations+community}
            Graph-->>RK: DocumentChunk inserted at rs[0]
        end
    end

    alt reranker_id configured
        KR->>KR: rerank()<br/>search.py:284
        KR->>KR: RedBearRerank.compress_documents()<br/>models/rerank.py:11
        Note over KR: External rerank API call<br/>Sync/blocking, 100-500 ms
    end

    KR-->>Tool: List[DocumentChunk]<br/>page_content + metadata
    Tool->>Tool: Join chunks into a context string
    Tool-->>Agent: f"检索到以下相关信息: {context}"<br/>("Retrieved the following relevant information")

    Agent->>Agent: _prepare_messages()<br/>Append the tool result to the message list
    Agent->>LLM: astream_events(version="v2")<br/>models/llm.py:117
    LLM->>CM: _chat_streamly()<br/>chat_model.py:152
    Note over CM: Async/streaming HTTP SSE<br/>yield (delta, token_count)

    loop For each token chunk received
        CM-->>LLM: GenerationChunk
        LLM-->>Agent: on_chat_model_stream event
        Agent-->>CS: yield SSE chunk
        CS-->>API: StreamingResponse
        API-->>FE: data: {"content": "..."}
        FE-->>U: Render token by token
    end

    CS->>CS: _filter_citations()<br/>draft_run_service.py:474<br/>Filter citations + attach download links
    CS-->>API: {content, citations, tokens_used}
    API-->>FE: JSON response
```

### 1.1 Key Call Stack Notes

| Step | Function | File:Line | Sync/Async | Input | Output |
|------|----------|-----------|------------|-------|--------|
| 1 | `agnet_chat()` | `services/app_chat_service.py:43` | `async` | message, config, files | Dict |
| 2 | `LangChainAgent.__init__()` | `core/agent/langchain_agent.py:26` | sync | model_name, tools, system_prompt | Agent instance |
| 3 | `_prepare_messages()` | `core/agent/langchain_agent.py:230` | sync | message, history, context | `List[BaseMessage]` |
| 4 | `knowledge_retrieval()` | `core/rag/nlp/search.py:36` | sync | query, config | `List[DocumentChunk]` |
| 5 | `_retrieve_for_knowledge()` | `core/rag/nlp/search.py:149` | sync | db_knowledge, kb_config | `List[DocumentChunk]` |
| 6 | `search_by_vector()` | `core/rag/vdb/elasticsearch/elasticsearch_vector.py:374` | sync | query, top_k, score_threshold | `List[DocumentChunk]` |
| 7 | `embed_query()` | `core/models/embedding.py:65` | sync | query_str | `List[float]` |
| 8 | `search_by_full_text()` | `core/rag/vdb/elasticsearch/elasticsearch_vector.py:468` | sync | query, top_k, score_threshold | `List[DocumentChunk]` |
| 9 | `rerank()` (standalone) | `core/rag/nlp/search.py:284` | sync | query, docs, top_k | `List[DocumentChunk]` |
| 10 | `RedBearRerank.compress_documents()` | `core/models/rerank.py:11` | sync | documents, query | `List[Document]` |
| 11 | `KGSearch.retrieval()` | `core/rag/graphrag/search.py:19` | sync | question, kb_ids, emb_mdl | Dict |
| 12 | `_chat_streamly()` | `core/rag/llm/chat_model.py:152` | async streaming | messages | `AsyncGenerator` |
| 13 | `_filter_citations()` | `services/draft_run_service.py:474` | sync | features_config, citations | List[Dict] |

### 1.2 Input/Output Data Structures

```python
# 1. DocumentChunk (the unit of a retrieval result)
# core/rag/models/chunk.py
class DocumentChunk(BaseModel):
    page_content: str                  # chunk text
    vector: list[float] | None         # embedding (usually empty at retrieval time)
    metadata: dict = {
        "doc_id": str,                 # unique document identifier
        "file_name": str,              # original file name
        "score": float,                # similarity / rerank score
        "knowledge_id": str,           # owning knowledge base
        ...
    }

# 2. knowledge_retrieval config structure
config = {
    "knowledge_bases": [{
        "kb_id": str,
        "retrieve_type": "participle" | "semantic" | "hybrid" | "graph",
        "similarity_threshold": float,        # default 0.2
        "vector_similarity_weight": float,    # default 0.3
        "top_k": int,                         # default 4
    }],
    "reranker_id": str | None,
    "reranker_top_k": int,                    # default 1024
    "use_graph": bool,                        # whether GraphRAG is enabled
}

# 3. LangChainAgent message structure
messages = [
    SystemMessage(content="system_prompt + skill_prompts"),
    HumanMessage(content="历史消息..."),      # earlier user messages
    AIMessage(content="历史回复..."),         # earlier assistant replies
    # Template below reads: "Reference information: {chunks} / User question: {query}"
    HumanMessage(content="参考信息:\n\n{chunks}\n\n用户问题:\n{query}"),
]
```

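
The hand-off from retrieval results to the final user message can be sketched as follows. This is a minimal illustration, not the MemoryBear implementation: `Chunk`, `build_context`, and `build_user_message` are hypothetical names, and only the `"\n\n"` join and the message template are taken from this document.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Chunk:
    """Hypothetical stand-in for DocumentChunk (only the fields used here)."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def build_context(chunks: list[Chunk]) -> str:
    """Join chunk texts with a blank line, skipping empty chunks."""
    return "\n\n".join(c.page_content for c in chunks if c.page_content.strip())


def build_user_message(chunks: list[Chunk], query: str) -> str:
    """Fill the final HumanMessage template with the context and the query."""
    return f"参考信息:\n\n{build_context(chunks)}\n\n用户问题:\n{query}"


chunks = [
    Chunk("Alpha doc text.", {"doc_id": "d1", "score": 0.9}),
    Chunk("Beta doc text.", {"doc_id": "d2", "score": 0.7}),
]
message = build_user_message(chunks, "What is alpha?")
```

Keeping context assembly in one small function makes it easy to swap the separator or to add per-chunk source labels later without touching the agent code.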
---

## 2. Indexing-Side E2E Sequence Diagram

**Scenario**: A user uploads a PDF document to a knowledge base; the system parses it, chunks it, computes embeddings, writes to ES, and builds the graph.

```mermaid
sequenceDiagram
    autonumber
    actor U as User
    participant API as document_controller.py
    participant Task as Celery Task<br/>tasks.py
    participant Chunk as chunk()<br/>core/rag/app/naive.py:508
    participant Parser as DeepDoc Parser<br/>core/rag/deepdoc/parser/
    participant NLP as naive_merge<br/>core/rag/nlp/__init__.py
    participant Emb as RedBearEmbeddings<br/>core/models/embedding.py
    participant VDB as ElasticSearchVector<br/>core/rag/vdb/elasticsearch/
    participant ES as Elasticsearch
    participant Graph as GraphRAG Index<br/>core/rag/graphrag/general/index.py

    U->>API: POST /documents<br/>Upload file + knowledge_id
    API->>API: Save the original file to storage
    API->>Task: Trigger the chunk task asynchronously

    Task->>Chunk: chunk(filename, binary, ...)<br/>naive.py:508
    Note over Chunk: Main entry point, dispatches by file extension

    alt PDF
        Chunk->>Chunk: Pick the engine from parser_config.layout_recognize<br/>PARSERS dict: naive.py:97
        Chunk->>Parser: Pdf.__call__()<br/>pdf_parser.py:522
        Parser->>Parser: __images__() OCR<br/>ocr.py:522
        Parser->>Parser: _layouts_rec() layout recognition<br/>layout_recognizer.py:147
        Parser->>Parser: _table_transformer_job() TSR<br/>table_structure_recognizer.py
        Parser->>Parser: _text_merge() + _concat_downward()<br/>XGBoost paragraph joining
        Parser-->>Chunk: sections=[(text, position_tag), ...]<br/>tables=[...]
    else DOCX
        Chunk->>Parser: Docx.parse()<br/>docx_parser.py:9
        Parser-->>Chunk: sections=[(text, image), ...]
    else Excel/CSV
        Chunk->>Parser: ExcelParser.__call__()<br/>excel_parser.py:203
        Parser-->>Chunk: sections (one per row)
    else Markdown
        Chunk->>Parser: MarkdownParser<br/>markdown_parser.py:10
        Parser-->>Chunk: sections (element block)
    end

    Chunk->>NLP: naive_merge(sections)<br/>nlp/__init__.py:562
    Note over NLP: Split by token limit + delimiter<br/>Default chunk_token_num=512 (PDF) / 128 (others)

    NLP->>NLP: tokenize_chunks()<br/>nlp/__init__.py:258
    Note over NLP: Injects ES fields:<br/>content_with_weight, content_ltks, content_sm_ltks,<br/>page_num_int, position_int, top_int, docnm_kwd

    Chunk-->>Task: List[Dict] (ES doc format)

    Task->>Emb: embed_documents(texts)<br/>models/embedding.py:65
    Note over Emb: Multiple providers supported:<br/>OpenAI/DashScope/Volcano/Xinference/...
    Emb-->>Task: List[List[float]]

    Task->>VDB: add_chunks(chunks, embeddings)<br/>elasticsearch_vector.py:55
    VDB->>VDB: create_collection() lazy index creation<br/>elasticsearch_vector.py:65
    Note over VDB: mapping: page_content(text+ik),<br/>metadata(object), vector(dense_vector+cosine)
    VDB->>ES: helpers.bulk(actions)<br/>Bulk write
    ES-->>VDB: result (success count)

    alt GraphRAG enabled (use_graphrag=true)
        Task->>Graph: run_graphrag_for_kb()<br/>graphrag/general/index.py:122
        Graph->>Graph: generate_subgraph()<br/>index.py:333
        Note over Graph: LLM extracts entities + relations<br/>Multi-round gleaning (max=2)
        Graph->>Graph: merge_subgraph()<br/>index.py:409
        Graph->>ES: Write entity/relation chunks<br/>with the q_{dim}_vec vector field

        alt General mode + with_resolution
            Graph->>Graph: EntityResolution()<br/>entity_resolution.py:53
            Note over Graph: Edit-distance pre-filter + batched LLM judgment<br/>batch=100, concurrent=5
        end

        alt General mode + with_community
            Graph->>Graph: leiden.run()<br/>leiden.py:95
            Graph->>Graph: CommunityReportsExtractor()<br/>community_reports_extractor.py:55
            Graph->>ES: Write community_report chunks
        end
    end

    Task-->>API: {ok_documents, failed_documents, seconds}
    API-->>U: Ingestion-complete notification
```

### 2.1 Key Call Stack Notes

| Step | Function | File:Line | Sync/Async | Input | Output |
|------|----------|-----------|------------|-------|--------|
| 1 | `chunk()` | `core/rag/app/naive.py:508` | sync | filename/binary, parser_config | `List[Dict]` ES doc |
| 2 | `Pdf.__call__()` | `pdf_parser.py:1006` | sync | filename, callback | sections, tables |
| 3 | `OCR.__call__()` | `vision/ocr.py:522` | sync | PIL.Image | text_boxes |
| 4 | `LayoutRecognizer4YOLOv10.__call__()` | `layout_recognizer.py:147` | sync | image_list | layout_types |
| 5 | `naive_merge()` | `core/rag/nlp/__init__.py:562` | sync | sections, chunk_token_num | `List[str]` chunks |
| 6 | `tokenize_chunks()` | `core/rag/nlp/__init__.py:258` | sync | chunks, doc | `List[Dict]` ES docs |
| 7 | `embed_documents()` | `core/models/embedding.py:65` | sync | texts | `List[List[float]]` |
| 8 | `add_chunks()` | `core/rag/vdb/elasticsearch/elasticsearch_vector.py:55` | sync | chunks, embeddings | uuids |
| 9 | `create_collection()` | `elasticsearch_vector.py:609` | sync | embeddings | mapping created |
| 10 | `helpers.bulk()` | elasticsearch.helpers | sync | actions | (success, errors) |
| 11 | `run_graphrag_for_kb()` | `graphrag/general/index.py:122` | async (trio) | document_ids | subgraphs |
| 12 | `generate_subgraph()` | `graphrag/general/index.py:333` | async | extractor, chunks | nx.Graph |
| 13 | `EntityResolution.__call__()` | `entity_resolution.py:53` | async | graph, nodes | merged_graph |
| 14 | `leiden.run()` | `graphrag/general/leiden.py:95` | sync | graph | communities |

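
The "split by delimiter, then merge up to a token limit" idea behind the chunking step can be sketched as below. This is an illustrative approximation under stated assumptions, not the actual `naive_merge()` code: `merge_sections` and `count_tokens` are hypothetical names, the whitespace word count stands in for the real tokenizer, and the delimiter pattern is an assumed example. Only the default limit of 128 for non-PDF input comes from this document.

```python
import re


def count_tokens(text: str) -> int:
    """Stand-in tokenizer (assumption): count whitespace-separated words."""
    return len(text.split())


def merge_sections(sections, chunk_token_num=128, delimiter_pattern=r"[\n.!?]+"):
    """Split sections on delimiters, then greedily pack pieces into chunks.

    A new chunk starts whenever adding the next piece would push the
    current chunk over chunk_token_num.
    """
    chunks = []
    current, current_tokens = "", 0
    for section in sections:
        for piece in re.split(delimiter_pattern, section):
            piece = piece.strip()
            if not piece:
                continue
            n = count_tokens(piece)
            if current and current_tokens + n > chunk_token_num:
                chunks.append(current)
                current, current_tokens = "", 0
            current = f"{current} {piece}".strip()
            current_tokens += n
    if current:
        chunks.append(current)
    return chunks


demo = merge_sections(
    ["one two three. four five", "six seven eight nine"],
    chunk_token_num=4,
)
```

A single piece longer than the limit is kept whole in this sketch; a production splitter would need a secondary hard cut for that case.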
### 2.2 ES Doc Field Contract

```python
# Structure of a chunk document written to ES (from S2-T1 §6.7)
{
    "docnm_kwd": str,                    # file name (keyword)
    "title_tks": str,                    # title, coarse-grained tokens
    "title_sm_tks": str,                 # title, fine-grained tokens
    "content_with_weight": str,          # original chunk text (BM25 weighted)
    "content_ltks": str,                 # content, coarse-grained tokens (whitespace analyzer)
    "content_sm_ltks": str,              # content, fine-grained tokens
    "page_num_int": [int],               # list of page numbers
    "position_int": [(p,x0,x1,y0,y1)],   # coordinates
    "top_int": [int],                    # y coordinate of the line top
    "image": bytes | None,               # PIL.Image binary
    "doc_type_kwd": str | None,          # "image" or empty
    "q_{dim}_vec": [float],              # embedding vector (added in S2-T2)
    "metadata": {
        "doc_id": str,
        "file_name": str,
        "knowledge_id": str,
        "status": 1,
    }
}
```

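
To make the field contract concrete, the sketch below pairs chunk dicts with their embeddings and produces action dicts in the `_index` / `_id` / `_source` shape that `elasticsearch.helpers.bulk()` accepts. It is a hedged illustration: `build_bulk_actions` and the index name `kb_demo` are hypothetical, and the real `add_chunks()` may build its actions differently. The only rule taken from the contract above is that the vector is stored under `q_{dim}_vec`, where `dim` is the embedding dimension.

```python
import uuid


def build_bulk_actions(index_name, chunks, embeddings):
    """Attach each embedding to its chunk under q_{dim}_vec and wrap it as a bulk action."""
    if len(chunks) != len(embeddings):
        raise ValueError("chunks and embeddings must have the same length")
    actions = []
    for chunk, vector in zip(chunks, embeddings):
        source = dict(chunk)  # shallow copy so the caller's dict is not mutated
        source[f"q_{len(vector)}_vec"] = vector
        actions.append({
            "_index": index_name,
            "_id": str(uuid.uuid4()),
            "_source": source,
        })
    return actions


demo_chunks = [{
    "docnm_kwd": "a.pdf",
    "content_with_weight": "hello world",
    "metadata": {"doc_id": "d1", "file_name": "a.pdf", "knowledge_id": "kb1", "status": 1},
}]
demo_actions = build_bulk_actions("kb_demo", demo_chunks, [[0.1, 0.2, 0.3]])
```

The resulting list can be passed to `helpers.bulk(client, demo_actions)`; the call itself is left out so the sketch runs without a cluster.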
---

## 3. Critical Path Table

> Latency baselines are estimates based on code comments, log anchors, and engineering experience. Actual values depend on document complexity, model provider, network latency, and ES cluster size.

| # | Stage | Key function | File:Line | P50 | P95 | Blocking/Non-blocking | Bottleneck |
|---|-------|--------------|-----------|-----|-----|-----------------------|------------|
| 1 | **PDF parsing (OCR+Layout+TSR)** | `Pdf.__call__()` | `deepdoc/parser/pdf_parser.py:1006` | 3s | 15s | Blocking (CPU/GPU) | 🔴 |
| 2 | **Chunking (tokenize)** | `naive_merge()` + `tokenize_chunks()` | `nlp/__init__.py:562,258` | 50ms | 200ms | Blocking (local CPU) | 🟡 |
| 3 | **Embedding (batch)** | `embed_documents()` | `models/embedding.py:65` | 200ms | 1s | Blocking (network I/O) | 🔴 |
| 4 | **ES bulk write** | `helpers.bulk()` | `elasticsearch_vector.py:85` | 100ms | 500ms | Blocking (network I/O) | 🟡 |
| 5 | **GraphRAG entity extraction** | `generate_subgraph()` | `graphrag/general/index.py:333` | 30s | 120s | Blocking (LLM I/O) | 🔴 |
| 6 | **GraphRAG entity resolution** | `EntityResolution.__call__()` | `entity_resolution.py:53` | 10s | 60s | Blocking (LLM I/O) | 🔴 |
| 7 | **GraphRAG community reports** | `CommunityReportsExtractor.__call__()` | `community_reports_extractor.py:55` | 20s | 90s | Blocking (LLM I/O) | 🔴 |
| 8 | **Query embedding** | `embed_query()` | `models/embedding.py:65` | 50ms | 300ms | Blocking (network I/O) | 🟡 |
| 9 | **ES vector search** | `search_by_vector()` | `elasticsearch_vector.py:374` | 30ms | 200ms | Blocking (network I/O) | 🟡 |
| 10 | **ES keyword search** | `search_by_full_text()` | `elasticsearch_vector.py:468` | 20ms | 100ms | Blocking (network I/O) | 🟢 |
| 11 | **External rerank** | `RedBearRerank.compress_documents()` | `models/rerank.py:11` | 100ms | 500ms | Blocking (network I/O) | 🟡 |
| 12 | **GraphRAG retrieval** | `KGSearch.retrieval()` | `graphrag/search.py:19` | 200ms | 1s | Blocking (LLM+ES) | 🟡 |
| 13 | **First LLM call (tool decision)** | `_chat()` | `chat_model.py:122` | 500ms | 3s | Blocking (network I/O) | 🔴 |
| 14 | **LLM streaming generation** | `_chat_streamly()` | `chat_model.py:152` | 500ms | 5s | Non-blocking (SSE streaming) | 🔴 |
| 15 | **Citation backfill** | `Dealer.insert_citations()` | `search.py:489` | 100ms | 500ms | Blocking (local embedding) | 🟡 |

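### 3.1 Bottleneck Analysis

| Bottleneck | Root cause | Mitigation |
|------------|------------|------------|
| PDF parsing (P95=15s) | OCR + Layout + TSR run serially; GPU model loading is slow | Replace with MinerU / async queue / preload models |
| Embedding API (P95=1s) | External API latency; batch_size=16 is too small | Deploy locally with Xinference / GPUStack |
| GraphRAG graph build (P95=120s) | Multi-round LLM extraction; each document is processed serially | Raise max_parallel_documents / incremental updates |
| LLM streaming output (P95=5s) | Slow time to first token (TTFT); long answers take long overall | Cache frequent queries / reduce max_tokens |
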
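---

## 4. Multi-Scenario Call Chains

### 4.1 Scenario A: Pure Vector Retrieval Q&A

**Applies to**: knowledge bases with high semantic-match quality, where user questions are phrased in the same style as the documents.
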
```
[User Query]
    │
    ▼
AppChatService.agnet_chat()  [services/app_chat_service.py:43]  async
    │
    ▼
LangChainAgent.invoke()  [core/agent/langchain_agent.py:65]  sync
    │
    ▼
knowledge_retrieval_tool call
    │
    ▼
knowledge_retrieval()  [core/rag/nlp/search.py:36]  sync
    │
    ▼
_retrieve_for_knowledge()  [core/rag/nlp/search.py:149]  sync
    │  retrieve_type="semantic"
    ▼
ElasticSearchVector.search_by_vector()  [core/rag/vdb/elasticsearch/elasticsearch_vector.py:374]  sync
    │
    ├─► embed_query(query)  [core/models/embedding.py:65]  sync, HTTP
    │       │
    │       ▼
    │   List[float] query_vector
    │
    ▼
ES script_score: cosineSimilarity(params.query_vector, 'vector') + 1.0
    filter: metadata.status=1
    │
    ▼
List[DocumentChunk]  (score / 2, normalized to [0,1])
    │
    ▼
score_threshold filter (default 0.3)
    │
    ▼
Return top_k chunks → Agent context assembly
    │
    ▼
LLM _chat_streamly() streams the answer
```

**Data structure flow**:
```
query: str
  → query_vector: List[float]  (dim=512/768/1024/1536)
  → ES hits: List[{_score, _source}]
  → DocumentChunk[]  (score ∈ [0,1])
  → context: str  (chunks joined with "\n\n")
  → messages: List[BaseMessage]  (system + history + context + query)
  → SSE stream: AsyncGenerator[str]
```

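
The score handling in Scenario A can be illustrated with a short sketch. Because the ES script returns `cosineSimilarity(...) + 1.0`, raw scores fall in [0, 2], and dividing by 2 maps them to [0, 1] before the threshold is applied. The function names and the hit layout below are hypothetical, and whether the real comparison is `>=` or `>` is an assumption.

```python
def normalize_cosine_score(es_score: float) -> float:
    """Map an ES score of cosineSimilarity + 1.0 (range [0, 2]) onto [0, 1]."""
    return es_score / 2.0


def filter_hits(hits, score_threshold=0.3, top_k=4):
    """Normalize scores, drop hits below the threshold, keep the best top_k."""
    kept = []
    for hit in hits:
        score = normalize_cosine_score(hit["_score"])
        if score >= score_threshold:  # assumption: inclusive comparison
            kept.append({"page_content": hit["_source"]["page_content"], "score": score})
    kept.sort(key=lambda c: c["score"], reverse=True)
    return kept[:top_k]


hits = [
    {"_score": 1.8, "_source": {"page_content": "a"}},
    {"_score": 0.4, "_source": {"page_content": "b"}},
    {"_score": 1.2, "_source": {"page_content": "c"}},
]
result = filter_hits(hits)
```

Here the hit with raw score 0.4 normalizes to 0.2 and is dropped by the 0.3 threshold, while the other two survive in descending score order.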
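### 4.2 Scenario B: Hybrid Retrieval Q&A (Keyword + Vector)

**Applies to**: scenarios where keyword precision and semantic recall complement each other, such as technical documentation libraries.
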
```
[User Query]
    │
    ▼
knowledge_retrieval()  [core/rag/nlp/search.py:36]  sync
    │
    ▼
_retrieve_for_knowledge()  [core/rag/nlp/search.py:149]  sync
    │  retrieve_type="hybrid" (default branch)
    ▼
    ┌──────────────────────────────────────────────┐
    │  Two concurrent legs (asyncio.gather)        │
    │                                              │
    │  Leg 1: search_by_vector()                   │
    │         [elasticsearch_vector.py:374]        │
    │         → embed_query() → ES script_score    │
    │         → normalize score / 2 → [0,1]        │
    │                                              │
    │  Leg 2: search_by_full_text()                │
    │         [elasticsearch_vector.py:468]        │
    │         → match + ik_max_word → BM25         │
    │         → normalize _score/max_score → [0,1] │
    └──────────────────────────────────────────────┘
    │
    ▼
Deduplicate by metadata.doc_id (later arrivals are dropped)
    │
    ▼
ElasticSearchVector.rerank()  [elasticsearch_vector.py:560]  sync
    │
    ▼
RedBearRerank.compress_documents()  [core/models/rerank.py:11]  sync
    │  External API call (Xinference/GPUStack/DashScope)
    ▼
Sort by relevance_score descending, take top_k
    │
    ▼
Return DocumentChunk[] → Agent
```

**Fusion formula** (Scenario B, application layer):
```
candidates = vector_topk(q) ∪ bm25_topk(q)
deduped    = unique_by(metadata.doc_id, candidates)
final      = reranker(query, deduped)[:top_k]        (if a reranker is configured)
             or sort_by_score_desc(deduped)[:top_k]  (if not)
```

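
The fusion formula translates almost line for line into code. The sketch below is an illustration under assumptions: `fuse` is a hypothetical helper, documents are plain dicts with `doc_id` and `score`, the vector leg is placed first in the union, and the reranker is any callable that returns documents in its preferred order.

```python
def fuse(vector_hits, bm25_hits, top_k, reranker=None, query=""):
    """Union both legs, drop later duplicates by doc_id, then rerank or sort."""
    seen = set()
    deduped = []
    for doc in [*vector_hits, *bm25_hits]:
        if doc["doc_id"] in seen:
            continue  # later arrivals are dropped
        seen.add(doc["doc_id"])
        deduped.append(doc)
    if reranker is not None:
        return reranker(query, deduped)[:top_k]
    return sorted(deduped, key=lambda d: d["score"], reverse=True)[:top_k]


vector_hits = [{"doc_id": "a", "score": 0.8}, {"doc_id": "b", "score": 0.5}]
bm25_hits = [{"doc_id": "b", "score": 0.9}, {"doc_id": "c", "score": 0.7}]

by_score = fuse(vector_hits, bm25_hits, top_k=2)
by_reranker = fuse(
    vector_hits, bm25_hits, top_k=2,
    reranker=lambda q, docs: list(reversed(docs)),  # toy reranker for the demo
)
```

Note the side effect of first-wins deduplication: document `b` keeps its vector score of 0.5 even though its BM25 score was 0.9, so without a reranker it falls out of the top 2.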
### 4.3 Scenario C: GraphRAG Relational Reasoning Q&A

**Applies to**: complex Q&A that needs multi-hop reasoning, entity association analysis, or global insight.

```
[User Query]
    │
    ▼
knowledge_retrieval()  [core/rag/nlp/search.py:36]  sync
    │
    ▼
_retrieve_for_knowledge()  [core/rag/nlp/search.py:149]  sync
    │  retrieve_type="graph"
    ├─► Run hybrid retrieval first (same as Scenario B)
    │
    ▼
KGSearch.retrieval()  [core/rag/graphrag/search.py:19]  sync
    │
    ▼
query_rewrite()  [graphrag/search.py:33]
    │
    ├─► LLM prompt: minirag_query2kwd
    │     Input:  question + TYPE_POOL (sampled from ES)
    │     Output: {answer_type_keywords, entities_from_query}
    │
    ▼
    ┌──────────────────────────────────────────────────────────┐
    │  Three recall legs in parallel                           │
    │                                                          │
    │  Leg 1: get_relevant_ents_by_keywords()                  │
    │    → embed_query(entities) → ES knn                      │
    │    → entity vector-similarity recall (sim_threshold=0.3) │
    │                                                          │
    │  Leg 2: get_relevant_ents_by_types()                     │
    │    → exact match on answer_type_keywords                 │
    │                                                          │
    │  Leg 3: get_relevant_relations_by_txt()                  │
    │    → relation vector-similarity recall                   │
    └──────────────────────────────────────────────────────────┘
    │
    ▼
n-hop path expansion (precomputed)
    │  sim_decay = 1/(2 + hop_depth)
    ▼
Fusion scoring: score = sim × pagerank
    │  Entity ranking:   sim × pagerank
    │  Relation ranking: sim × pagerank × boost
    ▼
Token-budget truncation (max_token counts down)
    │
    ▼
Community report recall (comm_topn=1)
    │
    ▼
Returns: {page_content: entities + relations + community,
          metadata: {...}, vector: None}
    │
    ▼
Insert at the head of the hybrid results: rs.insert(0, graph_chunk)
    │
    ▼
Agent context assembly → LLM generation
```

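
The two scoring rules in the diagram, `sim × pagerank` for ranking and `1/(2 + hop_depth)` for n-hop decay, can be sketched as follows. This is a hedged reading of the formulas rather than the `KGSearch` code: `rank_entities` and `nhop_relation_sims` are hypothetical names, and counting `hop_depth` from 0 at the first edge is an assumption.

```python
def rank_entities(entities):
    """Order entities by sim * pagerank, highest first."""
    return sorted(entities, key=lambda e: e["sim"] * e["pagerank"], reverse=True)


def nhop_relation_sims(entity_sim, path):
    """Spread an entity's similarity along a precomputed path.

    The edge at hop_depth i receives entity_sim / (2 + i), so edges
    further from the seed entity contribute less.
    """
    sims = {}
    for hop_depth in range(len(path) - 1):
        edge = (path[hop_depth], path[hop_depth + 1])
        sims[edge] = sims.get(edge, 0.0) + entity_sim / (2 + hop_depth)
    return sims


entities = [
    {"name": "A", "sim": 0.9, "pagerank": 0.1},
    {"name": "B", "sim": 0.5, "pagerank": 0.4},
]
ranked = rank_entities(entities)
edge_sims = nhop_relation_sims(0.8, ["A", "B", "C"])
```

In this toy input, entity `B` outranks `A` despite its lower similarity because its pagerank is four times higher, which is exactly the trade-off the product score encodes.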
**GraphRAG graph-build call chain** (prerequisite):
```
tasks.py:build_graphrag_for_kb()
  → run_graphrag_for_kb()  [graphrag/general/index.py:122]
    → generate_subgraph()  [index.py:333]
      → LLM extracts entities + relations (multi-round gleaning, max=2)
    → merge_subgraph()  [index.py:409]
      → graph_merge()  [utils.py:199]
    → [optional] EntityResolution()  [entity_resolution.py:53]
    → [optional] leiden.run()  [leiden.py:95]
    → [optional] CommunityReportsExtractor()  [community_reports_extractor.py:55]
  → Write entity/relation/community chunks to ES
```

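---

## 5. Error Propagation and Degradation Paths

### 5.1 Error Propagation Matrix

| Stage | Failure mode | Impact | Fallback logic | Source location |
|-------|--------------|--------|----------------|-----------------|
| **PDF parsing** | OCR model missing / GPU unavailable | Single document fails | `callback(-1, "OCR model not found")`; the task is marked as failed_document | `pdf_parser.py:50` |
| **LibreOffice conversion** | soffice not installed / 120s timeout | PPT/DOC fails | Raises HTTP 500; no automatic fallback | `utils/libre_office.py:11` |
| **Embedding API** | Timeout / rate limiting / auth failure | Single batch of chunks fails | The exception is raised and helpers.bulk does not catch it; the whole batch fails and must be retried | `models/embedding.py:65` |
| **ES write** | ConnectionTimeout / cluster unavailable | Single batch of chunks fails | `ATTEMPT_TIME=2` retries; resend after reconnecting | `utils/es_conn.py:294` |
| **GraphRAG extraction** | Malformed LLM output | Single chunk fails | `json_repair` tolerance + max_errors=3; the chunk is skipped once the limit is exceeded | `extractor.py:97` |
| **GraphRAG entity resolution** | LLM timeout (280s) | Resolution fails | `trio.move_on_after` times out; the resolution stage is skipped | `entity_resolution.py:53` |
| **Knowledge-base retrieval** | A single KB is unavailable | Other KBs are unaffected | `try/except continue`; the failed KB is skipped | `search.py:110` |
| **Empty vector retrieval** | Threshold too strict / dimension mismatch | No results for the current KB | fallback: lower min_match 0.3→0.1, raise similarity 0.1→0.17 | `search.py:447` |
| **External rerank** | API timeout / model unavailable | No reranked results | fallback: return the original results (order unchanged) | `search.py:115` |
| **GraphRAG retrieval** | Graph not built / ES query fails | No graph-augmented results | fallback: return hybrid results only | `search.py:263` |
| **LLM call** | RATE_LIMIT / SERVER_ERROR | Generation fails | 5 retries + random jitter; if it still fails, return `**ERROR**: ...` | `chat_model.py:64` |
| **LLM truncation** | finish_reason="length" | Incomplete answer | A truncation notice is appended automatically (Chinese or English, chosen adaptively) | `chat_model.py:152` |
| **Citation backfill** | Embedding match fails | No citation markers | Citation insertion is skipped; plain text is returned | `search.py:489` |
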
### 5.2 Degradation Path Diagram

```
Normal path:
  Query → Hybrid retrieval → Rerank → LLM generation → Citation backfill → Output

Degradation path 1 (empty retrieval):
  Query → Hybrid retrieval (empty) → fallback: lower thresholds and retry → still empty → LLM answers directly (no context)

Degradation path 2 (rerank failure):
  Query → Hybrid retrieval → Rerank API timeout → fallback: return the original order → LLM generation

Degradation path 3 (GraphRAG failure):
  Query → Hybrid retrieval → GraphRAG query fails → fallback: hybrid results only → LLM generation

Degradation path 4 (single KB failure):
  Query → KB-A (fails, try/except) + KB-B (succeeds) → Merge results → LLM generation

Degradation path 5 (LLM failure):
  Query → Retrieval succeeds → LLM call fails (after 5 retries) → Return "**ERROR**: service temporarily unavailable"

Degradation path 6 (ES cluster unavailable):
  Query → ES connection fails → No retrieval results → LLM answers directly (no context) / return an error
```

### 5.3 Key Degradation Code Snippets

```python
# 1. A single failing KB does not affect the overall result (search.py:110)
try:
    rs, chat_model, embedding_model = _retrieve_for_knowledge(...)
    all_results.extend(rs)
except Exception as e:
    print(f"retrieval knowledge({kb_id}) failed: {str(e)}")
    continue  # skip the failed KB

# 2. Rerank failure fallback (search.py:115-128)
if reranker_id and all_results:
    try:
        all_results = rerank(...)
    except Exception as rerank_error:
        logger.warning("Reranker failed, falling back to original results")
        # fallback: keep the original order

# 3. Empty retrieval fallback (search.py:447-459)
if total == 0:
    matchText, _ = self.qryr.question(qst, min_match=0.1)  # 0.3 → 0.1
    matchDense.extra_options["similarity"] = 0.17          # 0.1 → 0.17
    res = self.dataStore.search(...)

# 4. GraphRAG failure fallback (search.py:263)
try:
    graph_doc = kg_retriever.retrieval(...)
    rs.insert(0, DocumentChunk(...))
except Exception as graph_error:
    logger.warning(f"Graph retrieval failed...")  # hybrid results only

# 5. LLM retry (chat_model.py:64-89)
retry_max = LLM_MAX_RETRIES  # default 5
while retry_max > 0:
    try:
        return self.client.chat.completions.create(...)
    except (RateLimitError, APIConnectionError, APIError):
        time.sleep(random.uniform(1, LLM_BASE_DELAY * 2 ** (5-retry_max)))
        retry_max -= 1
```

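
The LLM retry excerpt depends on class state and library exceptions, so it cannot be run on its own. The sketch below is a self-contained approximation of the same pattern (bounded retries, exponentially growing jittered delay, an `**ERROR**: ...` string on final failure). `call_with_retry`, the default `base_delay`, and the built-in exception types are assumptions chosen for illustration, not values from `chat_model.py`.

```python
import random
import time


def call_with_retry(
    fn,
    max_retries=5,
    base_delay=2.0,  # assumption: stands in for LLM_BASE_DELAY
    retry_on=(ConnectionError, TimeoutError),
    sleep=time.sleep,  # injectable so tests do not actually wait
):
    """Call fn(), retrying on transient errors with jittered exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_retries - 1:
                return f"**ERROR**: {exc}"
            # The upper bound doubles on every attempt; jitter spreads out retries.
            sleep(random.uniform(0, base_delay * 2 ** attempt))


calls = {"n": 0}


def flaky():
    """Fail twice with a transient error, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("down")
    return "ok"


delays = []
outcome = call_with_retry(flaky, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the backoff logic testable: the demo records the delays it would have waited instead of blocking.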
---

## Appendix: Cross-Document Reference Index

| Section | Cited source | Referenced document |
|---------|--------------|---------------------|
| §2 Loader/Parser/Chunking | `naive.py:508`, `naive_merge()` | [S2-T1] |
| §1/§2 Embedding | `embed_documents()`, `embed_query()` | [S2-T2] |
| §1/§2 VDB retrieval and write | `search_by_vector()`, `add_chunks()`, mapping | [S2-T3] |
| §1/§2 GraphRAG | `KGSearch.retrieval()`, `run_graphrag()` | [S2-T4] |
| §1 Rerank/Prompt/LLM | `RedBearRerank`, `_chat_streamly()`, `_filter_citations()` | [S2-T5] |

---

*This document is consolidated directly from the source references and flow descriptions in the five sub-task documents [S2-T1] through [S2-T5]; every file:line reference can be verified in the MemoryBear repository at commit `feae2f2e`.*