feat(multimodal): support document image extraction and inline vision processing

Added document image extraction capability for PDF and DOCX files, including page/index metadata and storage integration. Extended `process_files` with `document_image_recognition` flag to conditionally enable vision-based image processing when model supports it. Updated knowledge repository and workflow node logic to enforce status=1 checks. Added PyMuPDF dependency.
This commit is contained in:
Timebomb2018
2026-04-24 11:18:50 +08:00
parent 9fdb952396
commit 767eb5e6f2
13 changed files with 397 additions and 52 deletions

View File

@@ -821,7 +821,7 @@ def get_rag_content(
for document in documents:
try:
kb = knowledge_repository.get_knowledge_by_id(db, document.kb_id)
if not kb:
if not (kb and kb.status == 1):
business_logger.warning(f"知识库不存在: kb_id={document.kb_id}")
continue