%% MemoryBear online retrieval sequence diagram (Query Pipeline)
%% Start: user query. End: LLM-generated answer.
sequenceDiagram
    autonumber
    participant User as User/API
    participant WF as Workflow Engine<br/>(workflow/nodes/knowledge/node.py)
    participant Config as config.py<br/>KnowledgeRetrievalNodeConfig
    participant Retriever as nlp/search.py<br/>knowledge_retrieval()
    participant Dealer as nlp/search.py:349<br/>Dealer.search()
    participant Qryr as nlp/query.py<br/>Query understanding
    participant ESVec as ESVector<br/>elasticsearch_vector.py
    participant Graph as graphrag/search.py<br/>KGSearch.retrieval()
    participant Rerank as models/rerank.py<br/>RedBearRerank
    participant Prompt as prompts/generator.py
    participant LLM as llm/chat_model.py<br/>Base.chat()
    participant Cache as utils/redis_conn.py

    Note over User,Cache: === Stage 1: Query preparation ===
    User->>WF: User submits query
    WF->>WF: _render_template(query, variable_pool)
    WF->>Config: Read knowledge_bases[]<br/>reranker_id / retrieve_type

    Note over Retriever,ESVec: === Stage 2: Multi-knowledge-base retrieval ===
    loop Each knowledge base
        WF->>Retriever: knowledge_retrieval(query, config)
        Retriever->>DB: Validate KB status (chunk_num>0, status=1)
        alt RetrieveType == PARTICIPLE
            Retriever->>ESVec: search_by_full_text(query, top_k)
            ESVec->>ESVec: match on page_content (ik_max_word)
            ESVec-->>Retriever: List[DocumentChunk]
        else RetrieveType == SEMANTIC
            Retriever->>ESVec: search_by_vector(query, top_k)
            ESVec->>ESVec: script_score cosineSimilarity
            ESVec-->>Retriever: List[DocumentChunk]
        else RetrieveType == HYBRID
            par Vector search
                Retriever->>ESVec: search_by_vector()
                ESVec-->>Retriever: rs1
            and Full-text search
                Retriever->>ESVec: search_by_full_text()
                ESVec-->>Retriever: rs2
            end
            Retriever->>Retriever: _deduplicate_docs(rs1, rs2)
            Retriever->>Rerank: rerank(query, docs, top_k)
            Rerank->>Rerank: similarity() cross-encoder scoring
            Rerank-->>Retriever: sorted docs
        else RetrieveType == GRAPH
            par Vector search
                Retriever->>ESVec: search_by_vector()
                ESVec-->>Retriever: rs1
            and Full-text search
                Retriever->>ESVec: search_by_full_text()
                ESVec-->>Retriever: rs2
            end
            Retriever->>Retriever: dedup + rerank
            Retriever->>Graph: kg_retriever.retrieval(question)
            Graph->>Graph: query_rewrite() → keywords + entities
            Graph->>ESVec: get_relevant_ents_by_keywords()
            Graph->>ESVec: get_relevant_relations_by_txt()
            Graph->>Graph: n_hop_with_weight path expansion
            Graph->>Graph: Score = pagerank * sim
            Graph->>Graph: _community_retrieval_()
            Graph-->>Retriever: Entity+Relation+CommunityReport chunk
            Retriever->>Retriever: insert(0, graph_result)
        end
        Retriever-->>WF: List[DocumentChunk]
    end
    WF->>WF: _deduplicate_docs(all_results)
    alt reranker_id configured
        WF->>Rerank: rerank(query, all_results, reranker_top_k)
        Rerank-->>WF: reranked chunks
    end

    Note over Prompt,Cache: === Stage 3: Prompt assembly + LLM generation ===
    WF->>WF: Return {"chunks": [...], "citations": [...]}
    WF->>Prompt: citation_prompt(chunks)
    Prompt->>Prompt: Assemble system prompt + retrieved context
    Prompt->>Cache: get_llm_cache(model, prompt)
    alt cache miss
        Prompt->>LLM: chat(system, history, gen_conf)
        LLM-->>Prompt: answer, tokens
        Prompt->>Cache: set_llm_cache(model, prompt, answer)
    else cache hit
        Cache-->>Prompt: cached answer
    end

    Note over User,Cache: === Stage 4: Post-processing ===
    Prompt->>Dealer: insert_citations(answer, chunks, chunk_v)
    Dealer->>Dealer: pagerank*sim locates citation positions
    Dealer-->>Prompt: answer_with_citations, cited_ids
    Prompt-->>User: Final answer (with citation markers)