%% MemoryBear/docs/rag/overview/04-graphrag-indexing.mmd

%% MemoryBear GraphRAG index-building sequence diagram
%% Covers the differences between the Light and General branches

sequenceDiagram
    autonumber
    participant Celery as Celery<br/>tasks.py:473
    participant Index as graphrag/general/index.py<br/>run_graphrag_for_kb()
    participant KGExt as GraphExtractor<br/>light/graph_extractor.py:31<br/>general/graph_extractor.py:34
    participant LLM as llm/chat_model.py
    participant ES as ESVector<br/>elasticsearch_vector.py
    participant Merge as merge_subgraph()
    participant Resolve as entity_resolution.py<br/>EntityResolution
    participant Leiden as general/leiden.py<br/>run()
    participant Community as general/<br/>community_reports_extractor.py:37

    Note over Celery,Community: === Trigger conditions ===
    Celery->>Celery: build_graphrag_for_kb(kb_id)
    Celery->>Celery: Check parser_config.graphrag.use_graphrag
    Celery->>Index: run_graphrag_for_kb(row, document_ids, ...)

    Note over Index,LLM: === Phase 1: subgraph generation (per chunk) ===
    Index->>Index: init_graphrag(task, vector_size)
    Index->>Index: generate_subgraph() per chunk

    loop For each chunk
        Index->>KGExt: _process_single_content(chunk_key_dp, chunk_text)

        alt Light branch
            KGExt->>KGExt: LightRAG-style prompt<br/>+ content_keywords extraction
            KGExt->>KGExt: GLEANING loop (max 2)
        else General branch
            KGExt->>KGExt: MS GraphRAG-style prompt<br/>perform_variable_replacements
            KGExt->>KGExt: tiktoken logit-bias Y/N loop
        end

        KGExt->>LLM: LLM call → entities + relations JSON
        LLM-->>KGExt: extracted data
        KGExt->>KGExt: _merge_nodes() + _merge_edges()
        KGExt-->>Index: (entities_data, relationships_data)
    end

    Index->>ES: store subgraph (entities + relations chunks)

    Note over Merge,ES: === Phase 2: subgraph merge ===
    Index->>Merge: merge_subgraph()
    Merge->>ES: get_graph() loads the global graph
    Merge->>Merge: graph_merge(old_graph, subgraph, change)
    Merge->>Merge: nx.pagerank(new_graph)
    Merge->>ES: set_graph() writes back the global graph + entities + relations

    Note over Resolve,ES: === Phase 3: entity resolution (optional) ===
    opt with_resolution == True
        Index->>Resolve: resolve_entities(graph, subgraph_nodes)
        Resolve->>LLM: Pairwise entity-similarity matching via LLM
        LLM-->>Resolve: Merge suggestions
        Resolve->>Resolve: nx.pagerank(graph)
        Resolve->>ES: set_graph()
    end

    Note over Leiden,Community: === Phase 4: community reports (General only) ===
    opt with_community == True (General)
        Index->>Leiden: leiden.run(graph)
        Leiden->>Leiden: graspologic.partition.<br/>hierarchical_leiden<br/>max_cluster_size=12
        Leiden-->>Index: {level: {community_id: {nodes: [...]}}}

        loop For each community (nodes >= 2)
            Index->>Community: __call__(graph, callback)
            Community->>Community: Build entity_df + relation_df
            Community->>LLM: COMMUNITY_REPORT_PROMPT
            LLM-->>Community: {title, summary, findings, rating}
            Community->>Community: add_community_info2graph()
        end

        Community->>ES: index community_report chunks
    end

    Note over Index,ES: === Mind Map (standalone feature, not on the main path) ===
    Note right of Index: mind_map_extractor.py<br/>Called externally, not part of the indexing pipeline<br/>sections → hierarchical markdown mind map