MemoryBear/docs/rag/evolution/capability-map.mmd
Multica PM Agent 343a5eebe3
docs(rag): add MemoryBear RAG implementation docs v1.0
Submit the finalized RAG documentation set produced across Sprint-1/2/3
(WS-12 through WS-26) under docs/rag/. Includes:

- README.md / INDEX.md: landing + total index (responsibility matrix,
  review verdicts, dual-link to source issues)
- overview/: full-pipeline architecture (4 .mmd diagrams),
  11-stage boundary contracts, doc map, source-code inventory
- pipeline/: 5 deep-dives (Loader/Parser/Chunking, Embedding,
  VDB & retrieval, GraphRAG, Rerank/Prompt/LLM)
- graphrag/, end-to-end/: v1.0 formal versions with full source
  retained as reference
- evolution/: 11 architecture-refactor proposals,
  6-direction roadmap, capability map
- review/: S3-T1 / S3-T2 final reviews, S2-T7 final summary
- _indexes/: glossary (81 terms), source->doc reverse index, chart index
- _release/: v1.0-RC1 release manifest, versioning convention,
  ops & freshness plan
- _meta/README.md: placeholder noting WS-12 governance assets gap

Aggregate review score 92.6/100 (8/8 PASS, 31/31 source-code spot
checks hit). The legacy docs/ ignore in .gitignore is narrowed to
docs/* with an explicit allowlist for docs/rag/.

Refs: WS-26
Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 10:51:48 +08:00

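%% MemoryBear RAG Capability Map
%% Horizontal axis: capability domain; vertical axis: maturity (available now / near-term / mid- to long-term vision)
%% Aligned with the Retriever / Reranker / Generator / Embedder abstract interfaces proposed in [S3-T1]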
graph LR
classDef have fill:#10b981,stroke:#065f46,color:#fff,stroke-width:1px
classDef near fill:#f59e0b,stroke:#92400e,color:#fff,stroke-width:1px
classDef vision fill:#6366f1,stroke:#3730a3,color:#fff,stroke-width:1px
classDef domain fill:#e5e7eb,stroke:#374151,color:#111,stroke-width:1px
subgraph DLOAD[Data Ingestion]
L1[Web crawler]:::have
L2[Feishu / Yuque / file upload]:::have
L3[Enterprise IM / email / Notion / S3 incremental sync]:::near
L4[Streaming data / Kafka / CDC]:::vision
end
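subgraph DPARSE[Parsing and Multimodal Capture]
P1[deepdoc PDF/OCR/Layout/Table]:::have
P2[Image OCR + VLM describe]:::have
P3[Audio ASR]:::have
P4[Video VLM whole-clip description]:::have
P5[Audio/video timestamped frame extraction + keyframe captions]:::near
P6[Native CLIP/BGE-VL cross-modal embedding]:::vision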
end
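subgraph DCHUNK[Chunking and Representation]
C1[naive_merge / type-specific chunkers]:::have
C2[RagTokenizer Chinese/English tokenization]:::have
C3[Late-Interaction / ColBERT subword representations]:::near
C4[Semantic chunking + adaptive granularity]:::vision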
end
subgraph DEMB[Embedding]
E1[10+ provider factory]:::have
E2[Question augmentation question_proposal]:::have
E3[Sparse vectors / SPLADE learned sparse]:::near
E4[Multi-Vector / unified multilingual encoding]:::vision
end
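subgraph DVDB[Vectors and Retrieval]
V1[ES dense_vector + BM25]:::have
V2[FusionExpr 0.05/0.95 weighted fusion]:::have
V3[KGSearch N-hop + Community]:::have
V4[HNSW quantization / sparse index rollout]:::near
V5[Semantic routing / adaptive multi-retriever composition]:::near
V6[Federated retrieval / cross-tenant privacy-preserving retrieval]:::vision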
end
subgraph DRANK[Reranking]
R1[Built-in token+vector fusion ranking]:::have
R2[Jina / DashScope / Xinference external rerankers]:::have
R3[Cross-Encoder distillation + online pairwise learning]:::near
R4[Feedback-driven automatic reranker fine-tuning]:::vision
end
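subgraph DKG[Knowledge Graph]
K1[GraphRAG light + general]:::have
K2[entity_resolution + Leiden communities]:::have
K3[Incremental graph evolution + timestamps]:::near
K4[Path explainability + Neo4j dual engine]:::near
K5[Multi-source graph fusion / automatic ontology evolution]:::vision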
end
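subgraph DMEM[Conversational Memory]
M1[memory.forgetting_engine Ebbinghaus]:::have
M2[memory.reflection_engine periodic reflection]:::have
M3[langgraph read-graph Agent]:::have
M4[Short-term ↔ long-term ↔ retrieval recall three-stage bridging]:::near
M5[Persona-aware memory policies + user preference learning]:::vision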
end
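subgraph DEVAL[Evaluation and Feedback Loop]
EV1[README F1/BLEU/J paper-grade evaluation]:::have
EV2[RAGAS / TruLens integration + online A/B]:::near
EV3[👍/👎 feedback → rerank fine-tuning loop]:::near
EV4[Self-evolving routing policy / RLHF long-term memory]:::vision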
end
subgraph DOPS[Platform and Observability]
O1[Celery task chain + Redis cache]:::have
O2[FastAPI / Swagger]:::have
O3[OpenTelemetry tracing + retrieval metrics dashboard]:::near
O4[Prompt repository + Eval CI / canary release]:::vision
end
%% Cross-domain dependencies (key edges only, to keep the graph from getting too dense)
DLOAD --> DPARSE
DPARSE --> DCHUNK
DCHUNK --> DEMB
DEMB --> DVDB
DVDB --> DRANK
DRANK -. citations .-> DOPS
DCHUNK -. async .-> DKG
DKG --> DVDB
DEVAL -. metrics .-> DRANK
DEVAL -. metrics .-> DVDB
DMEM -. memory-augmented retrieval .-> DVDB
DMEM -. summary into prompt .-> DRANK