%% MemoryBear GraphRAG index-building sequence diagram
%% Covers the differences between the Light and General branches
sequenceDiagram
autonumber
participant Celery as Celery<br/>tasks.py:473
participant Index as graphrag/general/index.py<br/>run_graphrag_for_kb()
participant KGExt as GraphExtractor<br/>light/graph_extractor.py:31<br/>general/graph_extractor.py:34
participant LLM as llm/chat_model.py
participant ES as ESVector<br/>elasticsearch_vector.py
participant Merge as merge_subgraph()
participant Resolve as entity_resolution.py<br/>EntityResolution
participant Leiden as general/leiden.py<br/>run()
participant Community as general/<br/>community_reports_extractor.py:37
Note over Celery,Community: === Trigger conditions ===
Celery->>Celery: build_graphrag_for_kb(kb_id)
Celery->>Celery: check parser_config.graphrag.use_graphrag
Celery->>Index: run_graphrag_for_kb(row, document_ids, ...)
Note over Index,LLM: === Stage 1: subgraph generation (per chunk) ===
Index->>Index: init_graphrag(task, vector_size)
Index->>Index: generate_subgraph() per chunk
loop each chunk
Index->>KGExt: _process_single_content(chunk_key_dp, chunk_text)
alt Light branch
KGExt->>KGExt: LightRAG-style prompt<br/>+ content_keywords extraction
KGExt->>KGExt: GLEANING loop (max 2)
else General branch
KGExt->>KGExt: MS GraphRAG-style prompt<br/>perform_variable_replacements
KGExt->>KGExt: tiktoken logit-bias Y/N loop
end
KGExt->>LLM: LLM call → entities + relations JSON
LLM-->>KGExt: extracted data
KGExt->>KGExt: _merge_nodes() + _merge_edges()
KGExt-->>Index: (entities_data, relationships_data)
end
Index->>ES: store subgraph (entities + relations chunks)
Note over Merge,ES: === Stage 2: subgraph merge ===
Index->>Merge: merge_subgraph()
Merge->>ES: get_graph() loads the global graph
Merge->>Merge: graph_merge(old_graph, subgraph, change)
Merge->>Merge: nx.pagerank(new_graph)
Merge->>ES: set_graph() writes back the global graph + entities + relations
Note over Resolve,ES: === Stage 3: entity resolution (optional) ===
opt with_resolution == True
Index->>Resolve: resolve_entities(graph, subgraph_nodes)
Resolve->>LLM: pairwise entity-similarity matching via LLM
LLM-->>Resolve: merge suggestions
Resolve->>Resolve: nx.pagerank(graph)
Resolve->>ES: set_graph()
end
Note over Leiden,Community: === Stage 4: community reports (General only) ===
opt with_community == True (General)
Index->>Leiden: leiden.run(graph)
Leiden->>Leiden: graspologic.partition.<br/>hierarchical_leiden<br/>max_cluster_size=12
Leiden-->>Index: {level: {community_id: {nodes: [...]}}}
loop each community (nodes >= 2)
Index->>Community: __call__(graph, callback)
Community->>Community: 构建 entity_df + relation_df
Community->>LLM: COMMUNITY_REPORT_PROMPT
LLM-->>Community: {title, summary, findings, rating}
Community->>Community: add_community_info2graph()
end
Community->>ES: index community_report chunks
end
Note over Index,ES: === Mind Map (standalone feature, not part of the main pipeline) ===
Note right of Index: mind_map_extractor.py<br/>called externally, not by the indexing pipeline<br/>sections → hierarchical markdown mind map