- Delete ExtractionOrchestrator (~2500 lines) and write_tools legacy path;
MemoryService/WritePipeline is now the sole write path
- Remove NEW_PIPELINE_ENABLED feature flag from memory_agent_service
- Simplify pilot_run_service to always use PilotWritePipeline
- Add dialog_at field to statement and triplet extraction prompts as the
primary reference time for resolving relative temporal expressions
- Rewrite relative time phrases (e.g. 昨天 "yesterday", 下周 "next week") into
concrete dates directly in statement_text when stably resolvable from dialog_at
- Rename extracat_Pruning.jinja2 to extracat_pruning.jinja2; expand
few-shot examples and update memory type enum (drop NULL, add
agreement/repetition/other)
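In the change above the rewrite is carried out by the LLM through the extraction prompts. The snippet below is only a deterministic sketch of the intended effect for day-level phrases; the function name `resolve_relative_dates` and the phrase table are hypothetical and do not come from the codebase.

```python
from datetime import datetime, timedelta

# Hypothetical phrase table (assumption): the real prompts leave this to the LLM.
_DAY_OFFSETS = {
    "昨天": -1, "今天": 0, "明天": 1,
    "yesterday": -1, "today": 0, "tomorrow": 1,
}


def resolve_relative_dates(statement_text: str, dialog_at: datetime) -> str:
    """Replace day-level relative phrases with ISO dates anchored at dialog_at."""
    anchor = dialog_at.date()
    resolved = statement_text
    for phrase, offset in _DAY_OFFSETS.items():
        concrete = (anchor + timedelta(days=offset)).isoformat()
        resolved = resolved.replace(phrase, concrete)
    return resolved


if __name__ == "__main__":
    when = datetime(2024, 5, 10, 9, 30)
    print(resolve_relative_dates("I met Li yesterday", when))
```

Week-level phrases such as 下周 are left out of the table for brevity; text containing no known phrase is returned unchanged.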
Remove the deprecated expired_at field from all graph models, Neo4j
Cypher queries, repositories, and pipeline code. Replace it with dialog_at
on StatementNode to track the original dialog timestamp.
- Strip expired_at from DialogueNode, ChunkNode, StatementNode,
ExtractedEntityNode, edges, and all Cypher queries
- Add dialog_at to MessageItem schema and propagate through extraction
and graph build steps
- Extract emotion/metadata async submission from WritePipeline into
a generic _submit_celery_task helper
- Add post_store_dedup_and_alias_merge Celery task for async alias
merging and second-layer dedup after Neo4j write
- Switch pytest async backend from anyio to asyncio_mode=auto
- Replace extract_user_metadata_task with entity-level extract_metadata_batch_task
- Add MetadataExtractionStep following ExtractionStep pattern with Jinja2 prompts
- Flatten MetadataExtractionResponse to 9-field schema (aliases, core_facts, etc.)
- Add Cypher queries for incremental metadata writeback and alias edge redirection
- Wire _extract_metadata into WritePipeline as Step 3.6 (fire-and-forget)
- Add pilot_write() to MemoryService; refactor pilot_run_service to use it
- Extract snapshot logic into WriteSnapshotRecorder
- Add valid_at/invalid_at passthrough in triplet extraction prompt (both zh/en)
- Propagate temporal_validity to EntityEntityEdge in ExtractionOrchestrator
- Use coalesce() for valid_at/invalid_at in Neo4j cypher queries to handle NULLs
- Fix workspace_id/config_id UUID parsing in read_memory config resolution
- Downgrade verbose extraction pipeline logs from info to debug
- Remove UUID and short API key patterns from sensitive filter to reduce false positives
- Standardize log message format (use = spacing, end_user_id label)
- Fix misindented TODO comment in write_pipeline.py
- Rename StatementExtractionStep → StatementTemporalExtractionStep and
extract_statement.jinja2 → extract_statement_temporal.jinja2 to reflect
merged temporal extraction logic
- Move extraction_pipeline_orchestrator.py out of steps/ to engine root
- Move dedup_step.py into steps/ directory
- Introduce WriteMemoryRequest schema to replace positional args in write_memory()
- Extract _resolve_and_load_config, _preprocess_files, _write_neo4j, and
_invalidate_interest_cache as private helpers in MemoryAgentService
- Remove shadow pipeline and simplify NEW_PIPELINE_ENABLED branch
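- Merge 类型归属/成员隶属/任职服务 (type membership / member affiliation / employment) relation types into a single 归属身份关系 (affiliation-identity relation) in triplet prompt
- Add alias merge logic (别名属于, "alias belongs to") in deduplication and MERGE_ALIAS_BELONGS_TO Cypher query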
- Add StorageType, Language, MessageItem enums/models to memory_agent_schema
- Reduce AgentMemory_Long_Term.DEFAULT_SCOPE from 6 to 1
- Delete standalone extract_temporal.jinja2 (logic merged into statement step)
Introduce ExtractionStep abstraction with modular pipeline stages:
- Add base ExtractionStep class with render/call/parse lifecycle
- Implement StatementExtractionStep, TripletExtractionStep,
EmbeddingStep, EmotionStep, GraphBuildStep, and DedupStep
- Add SidecarStepFactory for hot-pluggable non-critical steps
- Define Pydantic I/O schemas for all pipeline stages
- Refactor WritePipeline to orchestrate new step-based flow
- Add NEW_PIPELINE_ENABLED env switch for old/new pipeline routing
- Add emotion_enabled config flag to MemoryConfig
- Fix workspace_id reference in get_end_user_connected_config
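The render/call/parse lifecycle listed above can be sketched as a template-method base class. This is a minimal synchronous illustration under stated assumptions: the `run` method, the type parameters, and the toy `EchoStatementStep` subclass are hypothetical, and the actual class may well be async and take different arguments.

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

InT = TypeVar("InT")
OutT = TypeVar("OutT")


class ExtractionStep(ABC, Generic[InT, OutT]):
    """Template-method base: render a prompt, call the model, parse the reply."""

    @abstractmethod
    def render(self, data: InT) -> str:
        """Turn step input into a prompt string."""

    @abstractmethod
    def call(self, prompt: str) -> str:
        """Send the prompt to a model and return its raw reply."""

    @abstractmethod
    def parse(self, raw: str) -> OutT:
        """Convert the raw reply into the step's typed output."""

    def run(self, data: InT) -> OutT:
        # Fixed lifecycle; subclasses only fill in the three hooks.
        return self.parse(self.call(self.render(data)))


class EchoStatementStep(ExtractionStep[str, list[str]]):
    """Toy step (assumption): echoes the input instead of calling an LLM."""

    def render(self, data: str) -> str:
        return f"Extract statements:\n{data}"

    def call(self, prompt: str) -> str:
        return prompt.split("\n", 1)[1]

    def parse(self, raw: str) -> list[str]:
        return [s.strip() for s in raw.split(".") if s.strip()]
```

Keeping the lifecycle in the base class means each concrete step only supplies its prompt, its model call, and its parser, which is what makes steps interchangeable inside a pipeline.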
Introduce a layered pipeline architecture for the memory write flow:
- WritePipeline: orchestrates preprocess → extract → store → cluster → summarize
with deadlock retry, resource cleanup, and pilot-run support
- MemoryService: facade that delegates to WritePipeline, placeholder methods
for read/forget/reflect
- BearLogger: structured step-level logging with perf threshold alerts
- Shadow pipeline integration in MemoryAgentService (env-gated pilot run)
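The deadlock retry mentioned for WritePipeline can be illustrated with a small retry wrapper. This is a sketch, not the pipeline's code: `TransientDeadlockError` stands in for whatever exception the graph driver raises, and the attempt count and exponential backoff are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientDeadlockError(Exception):
    """Hypothetical stand-in for the driver's deadlock exception."""


def run_with_deadlock_retry(
    op: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying with exponential backoff when a deadlock is reported."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except TransientDeadlockError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Only the deadlock exception is retried; any other error propagates immediately so that genuine failures are not masked.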
Also includes:
- Fix deprecated SQLAlchemy declarative_base import
- Extend Neo4j Entity fulltext index to cover description and aliases
- Migrate Pydantic schemas to v2 (ConfigDict, field_validator)
The default thinking budget tokens value was changed from 10000 to 1024 in base.py, and the minimum validation constraint was updated from 1024 to 1 in app_schema.py to allow smaller budgets while maintaining backward compatibility.
- Replace plain image URLs with `<img src="..." data-url="...">` HTML tags in multimodal and document extractor services
- Propagate citations from workflow end events to client responses
- Update system prompts to instruct LLMs to render images using Markdown image syntax with strict UUID-preserving URL copying
- Introduce `reasoning_content`, `suggested_questions`, `citations`, and `audio_status` fields in conversation and app response schemas
- Conditionally set `audio_status` to `"pending"` only when `audio_url` is present
- Replace `model_dump` override with `@model_serializer(mode="wrap")` for cleaner serialization logic
- Change knowledge base validation failure from `RuntimeError` to warning + `continue` to avoid halting retrieval on invalid KB
- Add app type filtering to log search so that keywords can be searched within workflow_executions
- Introduce execution sequence markers so that logs are displayed in chronological order
- Improve error handling to capture successful node outputs alongside failure details
- Fix the handling of empty JSON bodies in HTTP request nodes
- Fix relative imports in memory_read.py to use absolute app paths
- Change celery scheduler command from `python app/celery_task_scheduler.py` to `python -m app.celery_task_scheduler`
- Consolidate node data retrieval from workflow_executions.output_data to unify storage access.
- Optimize the construction of messages and execution records to support opening suggestions.
- Eliminate redundant queries and storage logic to simplify the overall codebase structure.
- Removed `last_request` field and related logic for storing raw request string
- Replaced `_extract_output` and `_extract_extra_fields` to use `process_data` instead of `request`
- Updated `_build_content` to directly parse JSON body without intermediate rendering step
- Modified `execute` to generate `process_data` from actual HTTP request object instead of manual string building
- Added `process_data` field to `HttpRequestNodeOutput` model for consistent debugging info
- Extend workflow logs with execution status fields and loop node information.
- Refactor log service to handle distinct processing logic for workflows and agents.
- Build message and node logs from workflow_executions data.
Added `allow_download` flag to citation config and `download_url` field to citation output. Implemented `/citations/{document_id}/download` endpoint to serve original files when enabled. Removed unused `files` field and `HttpRequestDataProcessing` model from HTTP request node config.
- Fix exception propagation during node execution failures so that errors are raised correctly.
- Extend workflow logging to record failed status and persist node execution data, including loop nodes.
Added document image extraction capability for PDF and DOCX files, including page/index metadata and storage integration. Extended `process_files` with a `document_image_recognition` flag to conditionally enable vision-based image processing when the model supports it. Updated knowledge repository and workflow node logic to enforce status=1 checks. Added PyMuPDF dependency.
- feat(http_request): augment debugging capabilities with raw request generation and improved error handling.
- feat(app_log): extend session filtering logic to support retrieving all session types.
- feat(log): add 'process' field to node execution records for better data tracking.
- Parameterize SKIP/LIMIT in Cypher query instead of f-string interpolation
- Add UUID format validation in validate_end_user_in_workspace before DB query
- Update limit/depth Query descriptions to clarify auto-cap behavior in service layer
- Move uuid import to module level in api_key_utils.py
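The SKIP/LIMIT parameterization and the UUID pre-check listed above can be sketched together. The query text, node label, property names, `MAX_LIMIT`, and both function names are illustrative assumptions, not the project's actual code; only the use of `$skip`/`$limit` parameters in place of f-string interpolation reflects the change described.

```python
import uuid

# Hypothetical query (assumption): labels and properties are illustrative only.
PAGED_QUERY = (
    "MATCH (s:Statement {end_user_id: $end_user_id}) "
    "RETURN s ORDER BY s.created_at DESC "
    "SKIP $skip LIMIT $limit"
)

MAX_LIMIT = 100  # assumed service-layer cap


def is_valid_uuid(value: object) -> bool:
    """Return True if value parses as a UUID, without touching the database."""
    try:
        uuid.UUID(value)  # type: ignore[arg-type]
    except (ValueError, AttributeError, TypeError):
        return False
    return True


def build_paged_query(end_user_id: str, page: int, page_size: int) -> tuple[str, dict]:
    """Return the query text and a parameter dict; no values are interpolated."""
    if not is_valid_uuid(end_user_id):
        raise ValueError(f"invalid end_user_id: {end_user_id!r}")
    limit = max(1, min(page_size, MAX_LIMIT))  # auto-cap instead of rejecting
    skip = max(0, page - 1) * limit
    return PAGED_QUERY, {"end_user_id": end_user_id, "skip": skip, "limit": limit}
```

Passing the values as parameters keeps the query text constant, which avoids injection through pagination arguments and lets the database reuse its query plan. Note that `uuid.UUID` also accepts a few alternate spellings (for example, 32 hex digits without hyphens), so a stricter check may be needed if only the canonical form is allowed.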
Modified files:
- api/app/services/memory_explicit_service.py
- api/app/core/api_key_utils.py
- api/app/controllers/service/user_memory_api_controller.py
- Augment HTTP request node capabilities and add generated curl commands for easier debugging.
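As an illustration of the generated curl commands mentioned above, a request can be rendered into a shell-safe command line roughly as follows. The function name `to_curl` and the chosen flags are assumptions for this sketch; the node's real output format is not shown in this log.

```python
import shlex
from typing import Mapping, Optional


def to_curl(
    method: str,
    url: str,
    headers: Optional[Mapping[str, str]] = None,
    body: Optional[str] = None,
) -> str:
    """Render an HTTP request as a curl command with shell-quoted arguments."""
    parts = ["curl", "-X", method.upper(), shlex.quote(url)]
    for name, value in (headers or {}).items():
        parts += ["-H", shlex.quote(f"{name}: {value}")]
    if body is not None:
        parts += ["--data-raw", shlex.quote(body)]
    return " ".join(parts)


if __name__ == "__main__":
    print(to_curl("post", "https://example.com/api?x=1",
                  {"Content-Type": "application/json"}, '{"a": 1}'))
```

Quoting every user-controlled argument with `shlex.quote` means the command can be pasted into a POSIX shell without the URL or body being reinterpreted.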
feat(log): implement workflow execution logs and search functionality
- Add detailed logging for workflow node execution and enable search capabilities within application logs.
feat(auth): introduce middleware to verify application publication status
- Add a check to ensure the application is published before allowing access.
fix(converter): rectify variable handling logic in Dify converter
- Correct issues related to processing variables within the Dify converter module.
refactor(model): remove quota check decorator from model update operations
- Decouple quota validation from the model update process to streamline the logic.
Add external service APIs for memory detail queries
Provides memory data access endpoints for external service integration
Add utility functions for API key user resolution and end_user validation
Modified files:
- api/app/controllers/service/user_memory_api_controller.py
- api/app/core/api_key_utils.py
- api/app/controllers/service/__init__.py