Added document image extraction capability for PDF and DOCX files, including page/index metadata and storage integration. Extended `process_files` with `document_image_recognition` flag to conditionally enable vision-based image processing when model supports it. Updated knowledge repository and workflow node logic to enforce status=1 checks. Added PyMuPDF dependency.
Multiple services were performing individual database queries for FileMetadata when resolving missing file names/sizes. This change batches the queries using `in_()` to reduce database round trips and improve performance.
feat(app): support file metadata in chat messages and DSL app overwrite
- Extended chat message file objects with `name`, `size`, and `file_type` fields across app_chat_service and workflow_service
- Added ability to overwrite existing app configurations via DSL import in app_dsl_service, including type validation and config update logic for AgentConfig, MultiAgentConfig, and WorkflowConfig
Create OpenClawTool class inheriting BuiltinTool with dedicated config
Remove all x-openclaw special handling from CustomTool (~270 lines)
Add multi-operation support (print_task, device_query, image_understand, general)
Change ensure_builtin_tools_initialized to incremental mode for auto-provisioning
Fix OperationTool and LangchainAdapter to support OpenClaw operation routing
- Detect x-openclaw flag in OpenAPI schema and init dedicated config
- Implement multimodal input/output (image download, compress, base64)
- Add OpenClaw connection test and status validation in tool service
- Fix auth_config token check to support both api_key and bearer_token
- Inject runtime context (user_id, conversation_id, files) in chat services
1. Handling the storage of multimodal messages and adapting to the loading of historical messages for multi-round conversations;
2. Obtain the interface for retrieving the voice status of the reply;
3. File Information Retrieval Interface
1. Historical multimodal message writing is incorporated into the conversation context;
2. Resolve the issues where csv, json, and txt files cannot be recognized due to encoding problems;
3. File quantity limit;
4. Error details
1.Voice output is generated in a streaming manner.
2.Multimodal file storage type repair;
3.Adding features to the configuration of the sub-agents in the multi-agent system
1. Increase support for visual models and multimodal models;
2. The application and workflow can input various multimodal files such as images, documents, audio, and videos.