- Pass workspace_id to multimodal_service.process_files across app_chat_service, draft_run_service
- Fetch tenant_id from workspace in multimodal_service for proper file storage scoping
- Update image placeholder format from "[第N页 第M张图片]" to "[图片 第N页 第M张图片]" for clarity
- Add strict URL preservation rules to system prompt for agents handling document images
- Refactor _save_doc_image_to_storage to accept explicit tenant_id and workspace_id instead of inferring from FileMetadata
Added document image extraction capability for PDF and DOCX files, including page/index metadata and storage integration. Extended `process_files` with `document_image_recognition` flag to conditionally enable vision-based image processing when model supports it. Updated knowledge repository and workflow node logic to enforce status=1 checks. Added PyMuPDF dependency.
- Change extract_document_text from private to instance method in multimodal service for external access
- Optimize knowledge base search logic to improve parallel retrieval performance
1. Historical multimodal message writing is incorporated into the conversation context;
2. Resolve the issues where csv, json, and txt files cannot be recognized due to encoding problems;
3. File quantity limit;
4. Error details
1.The end users are still bound to the app.
2. Multi-modal file support includes xlsx, csv, and json.
3. The file routing protocol is consistent with the page routing.
1. Increase support for visual models and multimodal models;
2. The application and workflow can input various multimodal files such as images, documents, audio, and videos.