%% MemoryBear GraphRAG index-build sequence diagram
%% Covers the differences between the Light and General branches
sequenceDiagram
    autonumber
    participant Celery as Celery<br/>tasks.py:473
    participant Index as graphrag/general/index.py<br/>run_graphrag_for_kb()
    participant KGExt as GraphExtractor<br/>light/graph_extractor.py:31<br/>general/graph_extractor.py:34
    participant LLM as llm/chat_model.py
    participant ES as ESVector<br/>elasticsearch_vector.py
    participant Merge as merge_subgraph()
    participant Resolve as entity_resolution.py<br/>EntityResolution
    participant Leiden as general/leiden.py<br/>run()
    participant Community as general/<br/>community_reports_extractor.py:37

    Note over Celery,Community: === Trigger ===
    Celery->>Celery: build_graphrag_for_kb(kb_id)
    Celery->>Celery: check parser_config.graphrag.use_graphrag
    Celery->>Index: run_graphrag_for_kb(row, document_ids, ...)

    Note over Index,LLM: === Phase 1: subgraph generation (per chunk) ===
    Index->>Index: init_graphrag(task, vector_size)
    Index->>Index: generate_subgraph() per chunk
    loop each chunk
        Index->>KGExt: _process_single_content(chunk_key_dp, chunk_text)
        alt Light branch
            KGExt->>KGExt: LightRAG-style prompt<br/>+ content_keywords extraction
            KGExt->>KGExt: GLEANING loop (max 2)
        else General branch
            KGExt->>KGExt: MS GraphRAG-style prompt<br/>perform_variable_replacements
            KGExt->>KGExt: tiktoken logit-bias Y/N loop
        end
        KGExt->>LLM: LLM call → entities + relations JSON
        LLM-->>KGExt: extracted data
        KGExt->>KGExt: _merge_nodes() + _merge_edges()
        KGExt-->>Index: (entities_data, relationships_data)
    end
    Index->>ES: store subgraph (entities + relations chunks)

    Note over Merge,ES: === Phase 2: subgraph merge ===
    Index->>Merge: merge_subgraph()
    Merge->>ES: get_graph() loads the global graph
    Merge->>Merge: graph_merge(old_graph, subgraph, change)
    Merge->>Merge: nx.pagerank(new_graph)
    Merge->>ES: set_graph() writes back the global graph + entities + relations

    Note over Resolve,ES: === Phase 3: entity resolution (optional) ===
    opt with_resolution == True
        Index->>Resolve: resolve_entities(graph, subgraph_nodes)
        Resolve->>LLM: pairwise entity similarity matching via LLM
        LLM-->>Resolve: merge suggestions
        Resolve->>Resolve: nx.pagerank(graph)
        Resolve->>ES: set_graph()
    end

    Note over Leiden,Community: === Phase 4: community reports (General only) ===
    opt with_community == True (General)
        Index->>Leiden: leiden.run(graph)
        Leiden->>Leiden: graspologic.partition.<br/>hierarchical_leiden<br/>max_cluster_size=12
        Leiden-->>Index: {level: {community_id: {nodes: [...]}}}
        loop each community (nodes >= 2)
            Index->>Community: __call__(graph, callback)
            Community->>Community: build entity_df + relation_df
            Community->>LLM: COMMUNITY_REPORT_PROMPT
            LLM-->>Community: {title, summary, findings, rating}
            Community->>Community: add_community_info2graph()
        end
        Community->>ES: index community_report chunks
    end

    Note over Index,ES: === Mind Map (standalone feature, not on the main path) ===
    Note right of Index: mind_map_extractor.py<br/>called externally, not part of the indexing pipeline<br/>sections → hierarchical markdown mind map