Submit the finalized RAG documentation set produced across Sprint-1/2/3 (WS-12 through WS-26) under docs/rag/.

Includes:
- README.md / INDEX.md: landing page + master index (responsibility matrix, review verdicts, dual links to source issues)
- overview/: full-pipeline architecture (4 .mmd diagrams), 11-stage boundary contracts, doc map, source-code inventory
- pipeline/: 5 deep-dives (Loader/Parser/Chunking, Embedding, VDB & retrieval, GraphRAG, Rerank/Prompt/LLM)
- graphrag/, end-to-end/: v1.0 formal versions, with the full source retained as reference
- evolution/: 11 architecture-refactor proposals, 6-direction roadmap, capability map
- review/: S3-T1 / S3-T2 final reviews, S2-T7 final summary
- _indexes/: glossary (81 terms), source->doc reverse index, chart index
- _release/: v1.0-RC1 release manifest, versioning convention, ops & freshness plan
- _meta/README.md: placeholder noting the WS-12 governance-assets gap

Aggregate review score: 92.6/100 (8/8 PASS, 31/31 source-code spot checks hit).

The legacy docs/ ignore rule in .gitignore is narrowed to docs/* with an explicit allowlist for docs/rag/.

Refs: WS-26
Co-authored-by: multica-agent <github@multica.ai>
99 lines
3.7 KiB
Plaintext
%% MemoryBear RAG Capability Map
%% X-axis: capability domain; Y-axis: maturity (available now / near-term / mid-to-long-term vision)
%% Aligned with the Retriever / Reranker / Generator / Embedder abstract interfaces proposed in [S3-T1]
graph LR
  classDef have fill:#10b981,stroke:#065f46,color:#fff,stroke-width:1px
  classDef near fill:#f59e0b,stroke:#92400e,color:#fff,stroke-width:1px
  classDef vision fill:#6366f1,stroke:#3730a3,color:#fff,stroke-width:1px
  classDef domain fill:#e5e7eb,stroke:#374151,color:#111,stroke-width:1px

  subgraph DLOAD[Data Ingestion]
    L1[Web crawler]:::have
    L2[Feishu / Yuque / file upload]:::have
    L3[Enterprise IM / email / Notion / S3 incremental sync]:::near
    L4[Streaming data / Kafka / CDC]:::vision
  end

  subgraph DPARSE[Parsing and Multimodal Capture]
    P1[deepdoc PDF/OCR/Layout/Table]:::have
    P2[Image OCR + VLM describe]:::have
    P3[Audio ASR]:::have
    P4[Video whole-clip VLM description]:::have
    P5[Timestamped audio/video frame sampling + keyframe captions]:::near
    P6[Native CLIP/BGE-VL cross-modal embedding]:::vision
  end

  subgraph DCHUNK[Chunking and Representation]
    C1[naive_merge / typed chunkers]:::have
    C2[RagTokenizer Chinese/English tokenization]:::have
    C3[Late-Interaction / ColBERT subword-level representations]:::near
    C4[Semantic chunking + adaptive granularity]:::vision
  end

  subgraph DEMB[Embedding]
    E1[10+ provider factory]:::have
    E2[Question augmentation question_proposal]:::have
    E3[Sparse vectors / SPLADE learned sparse]:::near
    E4[Multi-vector / unified multilingual encoding]:::vision
  end

  subgraph DVDB[Vectors and Retrieval]
    V1[ES dense_vector + BM25]:::have
    V2[FusionExpr 0.05/0.95 weighted fusion]:::have
    V3[KGSearch N-hop + Community]:::have
    V4[HNSW quantization / sparse index rollout]:::near
    V5[Semantic routing / adaptive multi-retriever composition]:::near
    V6[Federated retrieval / privacy-preserving cross-tenant retrieval]:::vision
  end

  subgraph DRANK[Reranking]
    R1[Built-in token+vector fusion ranking]:::have
    R2[Jina / DashScope / Xinference external rerankers]:::have
    R3[Cross-encoder distillation + online pairwise learning]:::near
    R4[Feedback-driven automatic reranker fine-tuning]:::vision
  end

  subgraph DKG[Knowledge Graph]
    K1[GraphRAG light + general]:::have
    K2[entity_resolution + Leiden communities]:::have
    K3[Incremental graph evolution + timestamps]:::near
    K4[Path explainability + Neo4j dual engine]:::near
    K5[Multi-source graph fusion / automatic ontology evolution]:::vision
  end

  subgraph DMEM[Conversational Memory]
    M1[memory.forgetting_engine Ebbinghaus]:::have
    M2[memory.reflection_engine periodic reflection]:::have
    M3[langgraph read-graph agent]:::have
    M4[Short-term ↔ long-term ↔ retrieval recall three-tier bridge]:::near
    M5[Persona-aware memory policies + user preference learning]:::vision
  end

  subgraph DEVAL[Evaluation and Feedback Loop]
    EV1[README F1/BLEU/J paper-grade evaluation]:::have
    EV2[RAGAS / TruLens integration + online A/B]:::near
    EV3[👍/👎 feedback → rerank fine-tuning loop]:::near
    EV4[Self-evolving routing policy / RLHF long-term memory]:::vision
  end

  subgraph DOPS[Platform and Observability]
    O1[Celery task chains + Redis cache]:::have
    O2[FastAPI / Swagger]:::have
    O3[OpenTelemetry traces + retrieval metrics dashboard]:::near
    O4[Prompt repository + eval CI / canary releases]:::vision
  end

  %% Cross-domain dependencies (only key edges are drawn, to avoid clutter)
  DLOAD --> DPARSE
  DPARSE --> DCHUNK
  DCHUNK --> DEMB
  DEMB --> DVDB
  DVDB --> DRANK
  DRANK -. citations .-> DOPS
  DCHUNK -. async .-> DKG
  DKG --> DVDB
  DEVAL -. metrics .-> DRANK
  DEVAL -. metrics .-> DVDB
  DMEM -. memory-augmented retrieval .-> DVDB
  DMEM -. summary into prompt .-> DRANK
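Node V2 lists `FusionExpr 0.05/0.95` weighted fusion of the BM25 and dense-vector results from V1. The Python sketch below illustrates the general weighted-sum fusion technique. It assumes, since the map does not say, that 0.05 weights the full-text/BM25 score and 0.95 weights the vector similarity. It also assumes each retriever's scores are min-max normalized before fusion. `weighted_sum_fusion` and `min_max_normalize` are hypothetical names, not the project's FusionExpr API.

```python
from typing import Dict, List, Tuple

# Hypothetical defaults mirroring node V2; which weight applies to which
# retriever is an assumption, not taken from the project's code.
TEXT_WEIGHT = 0.05    # assumed weight for the BM25 / full-text score
VECTOR_WEIGHT = 0.95  # assumed weight for the dense-vector similarity


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1]; if all scores are equal, each maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_sum_fusion(
    text_scores: Dict[str, float],
    vector_scores: Dict[str, float],
    text_weight: float = TEXT_WEIGHT,
    vector_weight: float = VECTOR_WEIGHT,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Combine two retrievers' scores into one ranked list of (doc_id, score)."""
    text_norm = min_max_normalize(text_scores)
    vector_norm = min_max_normalize(vector_scores)
    # A document missing from one retriever contributes 0 from that side.
    fused = {
        doc_id: text_weight * text_norm.get(doc_id, 0.0)
        + vector_weight * vector_norm.get(doc_id, 0.0)
        for doc_id in set(text_norm) | set(vector_norm)
    }
    # Highest fused score first; ties broken by doc_id for a stable order.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    bm25 = {"a": 12.0, "b": 3.0, "c": 7.5}
    dense = {"a": 0.62, "b": 0.91, "d": 0.80}
    ranked = weighted_sum_fusion(bm25, dense, top_k=3)
    print([doc_id for doc_id, _ in ranked])  # ['b', 'd', 'a']
```

With a 0.95 vector weight, the document with the best dense score ("b") outranks the BM25 leader ("a"). Min-max normalization is used here because raw BM25 scores are unbounded while cosine similarity is bounded. The actual engine may normalize differently.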