Commit Graph

727 Commits

Author SHA1 Message Date
lanceyq
aa9eb66668 feat(memory): add alias invalidation support for entity alias management
Introduce the `别名失效` predicate to handle cases where an alias is
explicitly no longer applicable to an entity.

Changes:
- write_pipeline.py: extend _merge_alias_in_memory to process
  `别名失效` edges — removes invalidated alias names from target
  entity's aliases list in-memory before Neo4j write
- cypher_queries.py: add REMOVE_INVALID_ALIASES and DELETE_ALIAS_NODES
  queries; update REDIRECT_ALIAS_EDGES to handle both `别名属于` and
  `别名失效` predicates
- tasks.py: add step 1.5 in post_store_dedup_and_alias_merge_task to
  execute REMOVE_INVALID_ALIASES and sync removals to PostgreSQL;
  add step 3 to delete alias nodes after edge redirection; add
  snapshot step 3.5 for post-merge entity state; pass snapshot_dir
  to the task
- end_user_info_repository.py: add remove_aliases() method to remove
  specified aliases from end_user_info.aliases (case-insensitive)
- write_snapshot_recorder.py: add save_alias_merge_result() static
  method to write stage 8 snapshot after alias merge and deletion
- extract_triplet.jinja2: document `别名失效` predicate with usage
  rules — only use when conversation explicitly negates an alias
2026-05-08 11:28:44 +08:00
lanceyq
e3ab19dd4f feat(memory): sync user entity aliases and metadata to PostgreSQL
- Add `aliases` and `end_user_id` fields to user entity dicts in
  `collect_user_entities_for_metadata` so downstream tasks can write
  them to PostgreSQL
- Add `update_aliases_and_metadata` method to `EndUserInfoRepository`
  for incremental, case-insensitive dedup merge of aliases and
  structured metadata fields
- Add `_sync_end_user_info_pg` helper in tasks.py that writes aliases
  and extracted metadata to `end_user_info`, and back-fills
  `end_user.other_name` when empty
- Call `_sync_end_user_info_pg` from `extract_metadata_batch_task`
  after Neo4j write, and also when no new metadata but aliases exist
- Filter `meta_data` response in `UserMemoryService.get_end_user_info`
  to expose only four core fields: goals, traits, interests, core_facts
2026-05-08 11:28:44 +08:00
lanceyq
d255f33f1f [changes] dashscope applies patches and modifies prompts 2026-05-08 11:28:33 +08:00
lanceyq
6419dcd932 [commit] Refactor write pipeline 2026-05-08 11:28:24 +08:00
lanceyq
9dc9b7aee7 refactor(memory): remove legacy extraction pipeline and add dialog_at temporal grounding
- Delete ExtractionOrchestrator (~2500 lines) and write_tools legacy path;
  MemoryService/WritePipeline is now the sole write path
- Remove NEW_PIPELINE_ENABLED feature flag from memory_agent_service
- Simplify pilot_run_service to always use PilotWritePipeline
- Add dialog_at field to statement and triplet extraction prompts as the
  primary reference time for resolving relative temporal expressions
- Rewrite relative time phrases (e.g. 昨天, 下周) into concrete dates
  directly in statement_text when stably resolvable from dialog_at
- Rename extracat_Pruning.jinja2 to extracat_pruning.jinja2; expand
  few-shot examples and update memory type enum (drop NULL, add
  agreement/repetition/other)
2026-05-08 11:28:24 +08:00
lanceyq
cf389bb978 refactor(memory): remove expired_at field and add dialog_at timestamp
Remove the deprecated expired_at field from all graph models, Neo4j
Cypher queries, repositories, and pipeline code. Replace with dialog_at
on StatementNode to track the original dialog timestamp.

- Strip expired_at from DialogueNode, ChunkNode, StatementNode,
  ExtractedEntityNode, edges, and all Cypher queries
- Add dialog_at to MessageItem schema and propagate through extraction
  and graph build steps
- Extract emotion/metadata async submission from WritePipeline into
  a generic _submit_celery_task helper
- Add post_store_dedup_and_alias_merge Celery task for async alias
  merging and second-layer dedup after Neo4j write
- Switch pytest async backend from anyio to asyncio_mode=auto
2026-05-08 11:27:59 +08:00
lanceyq
d66d601e41 refactor(memory): redesign metadata extraction as async pipeline step
- Replace extract_user_metadata_task with entity-level extract_metadata_batch_task
- Add MetadataExtractionStep following ExtractionStep pattern with Jinja2 prompts
- Flatten MetadataExtractionResponse to 9-field schema (aliases, core_facts, etc.)
- Add Cypher queries for incremental metadata writeback and alias edge redirection
- Wire _extract_metadata into WritePipeline as Step 3.6 (fire-and-forget)
- Add pilot_write() to MemoryService; refactor pilot_run_service to use it
- Extract snapshot logic into WriteSnapshotRecorder
2026-05-08 11:27:51 +08:00
lanceyq
4af9b02815 feat(memory): propagate temporal validity fields through extraction pipeline
- Add valid_at/invalid_at passthrough in triplet extraction prompt (both zh/en)
- Propagate temporal_validity to EntityEntityEdge in ExtractionOrchestrator
- Use coalesce() for valid_at/invalid_at in Neo4j cypher queries to handle NULLs
- Fix workspace_id/config_id UUID parsing in read_memory config resolution
- Downgrade verbose extraction pipeline logs from info to debug
- Remove UUID and short API key patterns from sensitive filter to reduce false positives
- Standardize log message format (use = spacing, end_user_id label)
- Fix misindented TODO comment in write_pipeline.py
2026-05-08 11:26:24 +08:00
lanceyq
1f0c88a5f0 refactor(memory): consolidate write pipeline and rename statement extraction step
- Rename StatementExtractionStep → StatementTemporalExtractionStep and
  extract_statement.jinja2 → extract_statement_temporal.jinja2 to reflect
  merged temporal extraction logic
- Move extraction_pipeline_orchestrator.py out of steps/ to engine root
- Move dedup_step.py into steps/ directory
- Introduce WriteMemoryRequest schema to replace positional args in write_memory()
- Extract _resolve_and_load_config, _preprocess_files, _write_neo4j, and
  _invalidate_interest_cache as private helpers in MemoryAgentService
- Remove shadow pipeline and simplify NEW_PIPELINE_ENABLED branch
- Merge 类型归属/成员隶属/任职服务 relation types into single 归属身份关系 in triplet prompt
- Add alias merge logic (别名属于) in deduplication and MERGE_ALIAS_BELONGS_TO Cypher query
- Add StorageType, Language, MessageItem enums/models to memory_agent_schema
- Reduce AgentMemory_Long_Term.DEFAULT_SCOPE from 6 to 1
- Delete standalone extract_temporal.jinja2 (logic merged into statement step)
2026-05-08 11:26:24 +08:00
lanceyq
7747ed7ac1 refactor(memory): enhance extraction ontology and add assistant pruning graph support
- Expand entity type ontology with detailed definitions, examples, and notes
  (merged types: 地点设施, 物品设备, 产品服务, 软件平台, 角色职业, 知识能力, 偏好习惯目标, 称呼别名, 智能体)
- Add relation ontology taxonomy with 15 predicate categories and usage rules
- Strengthen reference resolution rules: resolve pronouns before extraction,
  skip unresolvable references entirely
- Add guidelines to avoid extracting abstract propositions, emotions, and
  low-value entities (effort/reward/success patterns)
- Add 7 new extraction examples covering edge cases
- Add AssistantOriginal/AssistantPruned node models and graph persistence
  (PRUNED_TO and BELONGS_TO_DIALOG edges, Neo4j indexes and constraints)
- Add graph_build_step.py for building graph nodes/edges from DialogData
- Update write_pipeline.py to pass assistant pruning nodes/edges to graph saver
- Update data_pruning.py with related preprocessing changes
2026-05-08 11:26:24 +08:00
lanceyq
2355536b44 refactor(memory): add PilotWritePipeline and enrich extraction schema
- Add dedicated PilotWritePipeline (statement → triplet → graph_build → layer-1 dedup, no Neo4j write)
- Add type_description/predicate_description fields across entity and triplet models, Cypher queries, and graph builders
- Refactor data_pruning with LRU cache and snapshot support; skip assistant chunks in extraction
- Remove strict Predicate enum whitelist; support statement_text alias in legacy extractor
- Wire PipelineSnapshot through preprocessing and emotion extraction for debug tracing
- Add PILOT_RUN_USE_REFACTORED_PIPELINE env toggle for pipeline selection
2026-05-08 11:26:04 +08:00
lanceyq
b0ddd12cc6 feat(memory): add emotion batch extraction task and improve extraction prompts
- Add extract_emotion_batch_task for async emotion extraction
- Refine Chinese entity types and relation types in extraction prompts
- Add STATEMENT_EMOTION_UPDATE Cypher query for Neo4j backfill
- Refactor statement_step and triplet_step implementations
2026-05-08 11:26:04 +08:00
lanceyq
a98011fc8a feat(memory): implement step-based extraction pipeline architecture
Introduce ExtractionStep abstraction with modular pipeline stages:
- Add base ExtractionStep class with render/call/parse lifecycle
- Implement StatementExtractionStep, TripletExtractionStep,
  EmbeddingStep, EmotionStep, GraphBuildStep, and DedupStep
- Add SidecarStepFactory for hot-pluggable non-critical steps
- Define Pydantic I/O schemas for all pipeline stages
- Refactor WritePipeline to orchestrate new step-based flow
- Add NEW_PIPELINE_ENABLED env switch for old/new pipeline routing
- Add emotion_enabled config flag to MemoryConfig
- Fix workspace_id reference in get_end_user_connected_config
2026-05-08 11:26:04 +08:00
lanceyq
41535c34e6 feat(memory): add WritePipeline and MemoryService facade
Introduce a layered pipeline architecture for the memory write flow:
- WritePipeline: orchestrates preprocess → extract → store → cluster → summarize
  with deadlock retry, resource cleanup, and pilot-run support
- MemoryService: facade that delegates to WritePipeline, placeholder methods
  for read/forget/reflect
- BearLogger: structured step-level logging with perf threshold alerts
- Shadow pipeline integration in MemoryAgentService (env-gated pilot run)

Also includes:
- Fix deprecated SQLAlchemy declarative_base import
- Extend Neo4j Entity fulltext index to cover description and aliases
- Migrate Pydantic schemas to v2 (ConfigDict, field_validator)
2026-05-08 11:26:04 +08:00
Eternity
e38a60e107 feat(core): add configurable SANDBOX_URL for code node sandbox requests 2026-04-29 20:24:10 +08:00
Timebomb2018
28694fefb0 fix(app): adjust thinking budget tokens default and validation range
The default thinking budget tokens value was changed from 10000 to 1024 in base.py, and the minimum validation constraint was updated from 1024 to 1 in app_schema.py to allow smaller budgets while maintaining backward compatibility.
2026-04-28 16:10:44 +08:00
Timebomb2018
531d785629 fix(multimodal): support HTML image tags in document extraction and chat responses
- Replace plain image URLs with `<img src="..." data-url="...">` HTML tags in multimodal and document extractor services
- Propagate citations from workflow end events to client responses
- Update system prompts to instruct LLMs to render images using Markdown `![alt](url)` with strict UUID-preserving URL copying
2026-04-27 17:56:58 +08:00
山程漫悟
ce4a3daec7 Merge pull request #1012 from SuanmoSuanyangTechnology/fix/wxy-032
feat(workflow): augment logging queries and ameliorate error handling
2026-04-27 16:00:49 +08:00
Timebomb2018
faf8d1a51a fix(workflow): add reasoning content, suggested questions, citations and audio status support
- Introduce `reasoning_content`, `suggested_questions`, `citations`, and `audio_status` fields in conversation and app response schemas
- Conditionally set `audio_status` to `"pending"` only when `audio_url` is present
- Replace `model_dump` override with `@model_serializer(mode="wrap")` for cleaner serialization logic
- Change knowledge base validation failure from `RuntimeError` to warning + `continue` to avoid halting retrieval on invalid KB
2026-04-27 15:35:26 +08:00
wxy
b64bcc2c50 feat(workflow): augment logging queries and ameliorate error handling
- Augment log search with app type filtering to enable keyword searching within workflow_executions.
- Introduce execution sequence markers to ensure logs are displayed in the correct chronological order.
- Ameliorate error handling to capture successful node outputs alongside failure details.
- Rectify the processing of empty JSON bodies in HTTP request nodes.
2026-04-27 15:20:25 +08:00
Ke Sun
675c7faf32 Merge pull request #1004 from SuanmoSuanyangTechnology/fix/memory_search
fix(api): convert config_id to string in write_router
2026-04-25 11:08:51 +08:00
Eternity
cd34d5f5ce fix(api): convert config_id to string in write_router 2026-04-24 20:13:46 +08:00
Ke Sun
1403b38648 Merge pull request #1003 from SuanmoSuanyangTechnology/fix/memory_search
fix(api): convert end_user_id to string in write_router
2026-04-24 19:59:24 +08:00
Eternity
b6e27da7b0 fix(api): convert end_user_id to string in write_router 2026-04-24 19:56:55 +08:00
Ke Sun
ed5f98a746 Merge pull request #1000 from SuanmoSuanyangTechnology/fix/memory_search
fix(api): correct import paths in memory_read and celery task command
2026-04-24 19:11:23 +08:00
Eternity
422af69904 fix(api): correct import paths in memory_read and celery task command
- Fix relative imports in memory_read.py to use absolute app paths
- Change celery scheduler command from `python app/celery_task_scheduler.py` to `python -m app.celery_task_scheduler`
2026-04-24 19:09:18 +08:00
山程漫悟
6cb48664b7 Merge pull request #992 from wanxunyang/develop-wxy
fix(workflow): rectify error handling and bolster execution logging
2026-04-24 18:58:40 +08:00
Eternity
8dee2eae6a fix(api): correct import paths in memory_read and celery task command
- Fix relative imports in memory_read.py to use absolute app paths
- Change celery scheduler command from `python app/celery_task_scheduler.py` to `python -m app.celery_task_scheduler`
2026-04-24 18:50:58 +08:00
wxy
f63bcd6321 refactor(tool): flatten request body parameters for model exposure
- Refactor the extraction logic in tool service to flatten request body parameters into independent arguments exposed to the model.
2026-04-24 18:49:55 +08:00
Eternity
caef0fe44e fix(api): correct import paths in memory_read and celery task command
- Fix relative imports in memory_read.py to use absolute app paths
- Change celery scheduler command from `python app/celery_task_scheduler.py` to `python -m app.celery_task_scheduler`
2026-04-24 18:36:27 +08:00
wxy
21eb500680 refactor(workflow): streamline node execution handling and log service logic
- Consolidate node data retrieval from workflow_executions.output_data to unify storage access.
- Optimize the construction of messages and execution records to support opening suggestions.
- Eliminate redundant queries and storage logic to simplify the overall codebase structure.
2026-04-24 18:20:14 +08:00
Ke Sun
c70f536acc Merge pull request #986 from SuanmoSuanyangTechnology/feat/episodic-memory-detail-and-pagination
feat:episodic memory detail and pagination
2026-04-24 18:19:11 +08:00
Ke Sun
5f96a6380e Merge pull request #990 from SuanmoSuanyangTechnology/feature/celery-task-scheduler
Feature/celery task scheduler
2026-04-24 18:19:00 +08:00
Timebomb2018
f2aedd29bc refactor(http_request): simplify request handling and remove unused fields
- Removed `last_request` field and related logic for storing raw request string
- Replaced `_extract_output` and `_extract_extra_fields` to use `process_data` instead of `request`
- Updated `_build_content` to directly parse JSON body without intermediate rendering step
- Modified `execute` to generate `process_data` from actual HTTP request object instead of manual string building
- Added `process_data` field to `HttpRequestNodeOutput` model for consistent debugging info
2026-04-24 17:09:01 +08:00
wwq
cf8db47389 feat(workflow): augment logging capabilities with execution status and loop support
- Augment workflow logs with execution status fields and loop node information.
- Refactor log service to handle distinct processing logic for workflows and agents.
- Construct message and node logs derived from workflow_executions data.
2026-04-24 17:02:03 +08:00
Eternity
be10bab763 refactor(core): migrate task scheduler to per-user queue with dynamic sharding 2026-04-24 14:21:18 +08:00
Timebomb2018
89f2f9a045 feat(citation): support downloading cited documents with allow_download toggle
Added `allow_download` flag to citation config and `download_url` field to citation output. Implemented `/citations/{document_id}/download` endpoint to serve original files when enabled. Removed unused `files` field and `HttpRequestDataProcessing` model from HTTP request node config.
2026-04-24 14:18:25 +08:00
wwq
b33f5951d8 fix(workflow): rectify error handling and bolster execution logging
- Rectify exception propagation during node execution failures to ensure errors are correctly raised.
- Bolster workflow logging to support failed status records and persist node execution data, including loop nodes.
2026-04-24 11:52:15 +08:00
wwq
2d120a64b1 fix(workflow): rectify error handling and bolster execution logging
- Rectify exception propagation during node execution failures to ensure errors are correctly raised.
- Bolster workflow logging to support failed status records and persist node execution data, including loop nodes.
2026-04-24 11:50:48 +08:00
wwq
0f7a7263eb fix(workflow): rectify error handling and bolster execution logging
- Rectify exception propagation during node execution failures to ensure errors are correctly raised.
- Bolster workflow logging to support failed status records and persist node execution data, including loop nodes.
2026-04-24 11:39:33 +08:00
Timebomb2018
767eb5e6f2 feat(multimodal): support document image extraction and inline vision processing
Added document image extraction capability for PDF and DOCX files, including page/index metadata and storage integration. Extended `process_files` with `document_image_recognition` flag to conditionally enable vision-based image processing when model supports it. Updated knowledge repository and workflow node logic to enforce status=1 checks. Added PyMuPDF dependency.
2026-04-24 11:18:50 +08:00
山程漫悟
9fdb952396 Merge pull request #985 from wanxunyang/develop-wxy
feat: enhance workflow debugging, logging and auth middleware
2026-04-24 10:17:32 +08:00
wwq
fb23c34475 feat: enhance HTTP request debugging and extend logging data
- feat(http_request): augment debugging capabilities with raw request generation and improved error handling.
- feat(app_log): extend session filtering logic to support retrieving all session types.
- feat(log): add 'process' field to node execution records for better data tracking.
2026-04-23 20:55:34 +08:00
wwq
5f39d9a208 feat(workflow): enhance HTTP request node with curl debugging support 2026-04-23 18:26:49 +08:00
wwq
f6cf53f81c feat(workflow): enhance HTTP request node with curl debugging support 2026-04-23 18:24:19 +08:00
wwq
08a455f6b3 feat(workflow): enhance HTTP request node with curl debugging support 2026-04-23 18:20:05 +08:00
miao
7ac0eff0b8 fix(memory): fix problems
- Parameterize SKIP/LIMIT in Cypher query instead of f-string interpolation
- Add UUID format validation in validate_end_user_in_workspace before DB query
- Update limit/depth Query descriptions to clarify auto-cap behavior in service layer
- Move uuid import to module level in api_key_utils.py

Modified files:
- api/app/services/memory_explicit_service.py
- api/app/core/api_key_utils.py
- api/app/controllers/service/user_memory_api_controller.py
2026-04-23 16:29:22 +08:00
wwq
404ce9f9ba feat(workflow): enhance HTTP request node with curl debugging support
- Augment HTTP request node capabilities and add generated curl commands for easier debugging.

feat(log): implement workflow execution logs and search functionality

- Add detailed logging for workflow node execution and enable search capabilities within application logs.

feat(auth): introduce middleware to verify application publication status

- Add a check to ensure the application is published before allowing access.

fix(converter): rectify variable handling logic in Dify converter

- Correct issues related to processing variables within the Dify converter module.

refactor(model): remove quota check decorator from model update operations

- Decouple quota validation from the model update process to streamline the logic.
2026-04-23 15:46:12 +08:00
miao
bf9a3503de feat(memory-api): add memory detail external service APIs
Add external service APIs for memory detail queries
Provides memory data access endpoints for external service integration
Add utility functions for API key user resolution and end_user validation

Modified files:
- api/app/controllers/service/user_memory_api_controller.py
- api/app/core/api_key_utils.py
- api/app/controllers/service/__init__.py
2026-04-23 15:36:45 +08:00
Ke Sun
b8009074d5 Merge branch 'release/v0.3.1' into develop 2026-04-23 12:16:57 +08:00