Submit the formed RAG documentation set produced across Sprint-1/2/3 (WS-12 through WS-26) under docs/rag/.

Includes:
- README.md / INDEX.md: landing + total index (responsibility matrix, review verdicts, dual-link to source issues)
- overview/: full-pipeline architecture (4 .mmd diagrams), 11-stage boundary contracts, doc map, source-code inventory
- pipeline/: 5 deep-dives (Loader/Parser/Chunking, Embedding, VDB & retrieval, GraphRAG, Rerank/Prompt/LLM)
- graphrag/, end-to-end/: v1.0 formal versions with full source retained as reference
- evolution/: 11 architecture-refactor proposals, 6-direction roadmap, capability map
- review/: S3-T1 / S3-T2 final reviews, S2-T7 final summary
- _indexes/: glossary (81 terms), source->doc reverse index, chart index
- _release/: v1.0-RC1 release manifest, versioning convention, ops & freshness plan
- _meta/README.md: placeholder noting WS-12 governance assets gap

Aggregate review score 92.6/100 (8/8 PASS, 31/31 source-code spot checks hit).

The legacy docs/ ignore in .gitignore is narrowed to docs/* with an explicit allowlist for docs/rag/.

Refs: WS-26
Co-authored-by: multica-agent <github@multica.ai>
79 lines
3.3 KiB
Plaintext
%% MemoryBear GraphRAG index-build sequence diagram
%% Covers the differences between the Light and General branches

sequenceDiagram
    autonumber
    participant Celery as Celery<br/>tasks.py:473
    participant Index as graphrag/general/index.py<br/>run_graphrag_for_kb()
    participant KGExt as GraphExtractor<br/>light/graph_extractor.py:31<br/>general/graph_extractor.py:34
    participant LLM as llm/chat_model.py
    participant ES as ESVector<br/>elasticsearch_vector.py
    participant Merge as merge_subgraph()
    participant Resolve as entity_resolution.py<br/>EntityResolution
    participant Leiden as general/leiden.py<br/>run()
    participant Community as general/<br/>community_reports_extractor.py:37

    Note over Celery,Community: === Trigger conditions ===
    Celery->>Celery: build_graphrag_for_kb(kb_id)
    Celery->>Celery: Check parser_config.graphrag.use_graphrag
    Celery->>Index: run_graphrag_for_kb(row, document_ids, ...)

    Note over Index,LLM: === Stage 1: Subgraph generation (per chunk) ===
    Index->>Index: init_graphrag(task, vector_size)
    Index->>Index: generate_subgraph() per chunk

    loop For each chunk
        Index->>KGExt: _process_single_content(chunk_key_dp, chunk_text)

        alt Light branch
            KGExt->>KGExt: LightRAG-style prompt<br/>+ content_keywords extraction
            KGExt->>KGExt: GLEANING loop (max 2)
        else General branch
            KGExt->>KGExt: MS GraphRAG-style prompt<br/>perform_variable_replacements
            KGExt->>KGExt: tiktoken logit-bias Y/N loop
        end

        KGExt->>LLM: LLM call → entities + relations JSON
        LLM-->>KGExt: extracted data
        KGExt->>KGExt: _merge_nodes() + _merge_edges()
        KGExt-->>Index: (entities_data, relationships_data)
    end

    Index->>ES: store subgraph (entities + relations chunks)

    Note over Merge,ES: === Stage 2: Subgraph merge ===
    Index->>Merge: merge_subgraph()
    Merge->>ES: get_graph() loads the global graph
    Merge->>Merge: graph_merge(old_graph, subgraph, change)
    Merge->>Merge: nx.pagerank(new_graph)
    Merge->>ES: set_graph() writes back the global graph + entities + relations

    Note over Resolve,ES: === Stage 3: Entity resolution (optional) ===
    opt with_resolution == True
        Index->>Resolve: resolve_entities(graph, subgraph_nodes)
        Resolve->>LLM: Pairwise entity-similarity matching via LLM
        LLM-->>Resolve: Merge suggestions
        Resolve->>Resolve: nx.pagerank(graph)
        Resolve->>ES: set_graph()
    end

    Note over Leiden,Community: === Stage 4: Community reports (General only) ===
    opt with_community == True (General)
        Index->>Leiden: leiden.run(graph)
        Leiden->>Leiden: graspologic.partition.<br/>hierarchical_leiden<br/>max_cluster_size=12
        Leiden-->>Index: {level: {community_id: {nodes: [...]}}}

        loop For each community (nodes >= 2)
            Index->>Community: __call__(graph, callback)
            Community->>Community: Build entity_df + relation_df
            Community->>LLM: COMMUNITY_REPORT_PROMPT
            LLM-->>Community: {title, summary, findings, rating}
            Community->>Community: add_community_info2graph()
        end

        Community->>ES: index community_report chunks
    end

    Note over Index,ES: === Mind Map (standalone feature, not on the main path) ===
    Note right of Index: mind_map_extractor.py<br/>Invoked externally, not part of the indexing pipeline<br/>sections → hierarchical markdown mind map