- Add .qoder/repowiki/zh/ to .gitignore to exclude generated repowiki content
- Update CORE_GENERAL_TYPES in env.example to align with ontology.md 13-category entity taxonomy (Chinese labels)
- Add PIPELINE_SNAPSHOT_ENABLED config for extraction pipeline stage snapshot output
- Fix missing newline at end of env.example
Introduce the `别名失效` ("alias invalidated") predicate to handle cases
where an alias explicitly no longer applies to an entity.
Changes:
- write_pipeline.py: extend _merge_alias_in_memory to process
`别名失效` edges, removing invalidated alias names from the target
entity's aliases list in memory before the Neo4j write
- cypher_queries.py: add REMOVE_INVALID_ALIASES and DELETE_ALIAS_NODES
queries; update REDIRECT_ALIAS_EDGES to handle both `别名属于` and
`别名失效` predicates
- tasks.py: add step 1.5 in post_store_dedup_and_alias_merge_task to
execute REMOVE_INVALID_ALIASES and sync removals to PostgreSQL;
add step 3 to delete alias nodes after edge redirection; add
snapshot step 3.5 for post-merge entity state; pass snapshot_dir
to the task
- end_user_info_repository.py: add remove_aliases() method to remove
specified aliases from end_user_info.aliases (case-insensitive)
- write_snapshot_recorder.py: add save_alias_merge_result() static
method to write stage 8 snapshot after alias merge and deletion
- extract_triplet.jinja2: document the `别名失效` predicate and its usage
rule: use it only when the conversation explicitly negates an alias
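
A minimal sketch of the case-insensitive alias removal described above. The standalone function and its signature are assumptions made for illustration; the actual `remove_aliases()` method on the repository may differ.

```python
from typing import Iterable, List


def remove_aliases(aliases: Iterable[str], invalidated: Iterable[str]) -> List[str]:
    """Return aliases with every invalidated name dropped (case-insensitive).

    Hypothetical helper: it only illustrates the matching rule, not the
    PostgreSQL or Neo4j update itself.
    """
    # Normalise the invalidated names once so each lookup is O(1).
    blocked = {name.strip().casefold() for name in invalidated if name and name.strip()}
    # Keep the original spelling and order of the surviving aliases.
    return [alias for alias in aliases if alias.strip().casefold() not in blocked]
```

For example, `remove_aliases(["Bob", "Bobby", "小明"], ["bobby"])` keeps `"Bob"` and `"小明"` and drops `"Bobby"`.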
- Add `aliases` and `end_user_id` fields to user entity dicts in
`collect_user_entities_for_metadata` so downstream tasks can write
them to PostgreSQL
- Add `update_aliases_and_metadata` method to `EndUserInfoRepository`
for incremental, case-insensitive dedup merge of aliases and
structured metadata fields
- Add `_sync_end_user_info_pg` helper in tasks.py that writes aliases
and extracted metadata to `end_user_info`, and back-fills
`end_user.other_name` when empty
- Call `_sync_end_user_info_pg` from `extract_metadata_batch_task`
after the Neo4j write, and also when there is no new metadata but aliases exist
- Filter `meta_data` response in `UserMemoryService.get_end_user_info`
to expose only four core fields: goals, traits, interests, core_facts
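
The incremental alias merge and the `meta_data` filtering above could look roughly like this. Both function names and the `CORE_FIELDS` constant are hypothetical; only the four field names and the case-insensitive dedup rule come from the notes above.

```python
from typing import Any, Dict, Iterable, List

# The four fields exposed by get_end_user_info, per the change notes.
CORE_FIELDS = ("goals", "traits", "interests", "core_facts")


def merge_aliases(existing: Iterable[str], incoming: Iterable[str]) -> List[str]:
    """Append new aliases to existing ones, skipping case-insensitive duplicates."""
    seen = set()
    merged: List[str] = []
    for name in [*existing, *incoming]:
        cleaned = name.strip()
        key = cleaned.casefold()
        if key and key not in seen:
            seen.add(key)
            merged.append(cleaned)  # first spelling seen wins
    return merged


def filter_meta_data(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the core fields that are present in the stored metadata."""
    return {key: meta[key] for key in CORE_FIELDS if key in meta}
```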
- Delete ExtractionOrchestrator (~2500 lines) and write_tools legacy path;
MemoryService/WritePipeline is now the sole write path
- Remove NEW_PIPELINE_ENABLED feature flag from memory_agent_service
- Simplify pilot_run_service to always use PilotWritePipeline
- Add dialog_at field to statement and triplet extraction prompts as the
primary reference time for resolving relative temporal expressions
- Rewrite relative time phrases (e.g. 昨天 "yesterday", 下周 "next week")
into concrete dates directly in statement_text when stably resolvable
from dialog_at
- Rename extracat_Pruning.jinja2 to extracat_pruning.jinja2; expand
few-shot examples and update memory type enum (drop NULL, add
agreement/repetition/other)
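
The prompts ask the LLM to resolve relative time expressions against dialog_at. The sketch below only illustrates the intended arithmetic for single-day phrases; the offset table and function name are assumptions, not code from the repository.

```python
from datetime import date, datetime, timedelta
from typing import Optional

# Hypothetical lookup of day-level phrases and their offset from dialog_at.
DAY_OFFSETS = {"前天": -2, "昨天": -1, "今天": 0, "明天": 1, "后天": 2}


def resolve_relative_day(phrase: str, dialog_at: datetime) -> Optional[date]:
    """Map a relative day phrase to a concrete date, or None if not resolvable.

    Phrases that cover a range (such as 下周, "next week") are left
    unresolved here because they do not name a single date.
    """
    offset = DAY_OFFSETS.get(phrase)
    if offset is None:
        return None
    return dialog_at.date() + timedelta(days=offset)
```

With `dialog_at = datetime(2024, 3, 1, 9, 30)`, the phrase 昨天 resolves to 2024-02-29.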
Remove the deprecated expired_at field from all graph models, Neo4j
Cypher queries, repositories, and pipeline code. Replace with dialog_at
on StatementNode to track the original dialog timestamp.
- Strip expired_at from DialogueNode, ChunkNode, StatementNode,
ExtractedEntityNode, edges, and all Cypher queries
- Add dialog_at to MessageItem schema and propagate through extraction
and graph build steps
- Extract emotion/metadata async submission from WritePipeline into
a generic _submit_celery_task helper
- Add post_store_dedup_and_alias_merge Celery task for async alias
merging and second-layer dedup after Neo4j write
- Switch pytest async backend from anyio to asyncio_mode=auto
- Replace extract_user_metadata_task with entity-level extract_metadata_batch_task
- Add MetadataExtractionStep following ExtractionStep pattern with Jinja2 prompts
- Flatten MetadataExtractionResponse to 9-field schema (aliases, core_facts, etc.)
- Add Cypher queries for incremental metadata writeback and alias edge redirection
- Wire _extract_metadata into WritePipeline as Step 3.6 (fire-and-forget)
- Add pilot_write() to MemoryService; refactor pilot_run_service to use it
- Extract snapshot logic into WriteSnapshotRecorder
- Add valid_at/invalid_at passthrough in triplet extraction prompt (both zh/en)
- Propagate temporal_validity to EntityEntityEdge in ExtractionOrchestrator
- Use coalesce() for valid_at/invalid_at in Neo4j cypher queries to handle NULLs
- Fix workspace_id/config_id UUID parsing in read_memory config resolution
- Downgrade verbose extraction pipeline logs from info to debug
- Remove UUID and short API key patterns from sensitive filter to reduce false positives
- Standardize log message format (use = spacing, end_user_id label)
- Fix misindented TODO comment in write_pipeline.py
- Rename StatementExtractionStep → StatementTemporalExtractionStep and
extract_statement.jinja2 → extract_statement_temporal.jinja2 to reflect
merged temporal extraction logic
- Move extraction_pipeline_orchestrator.py out of steps/ to engine root
- Move dedup_step.py into steps/ directory
- Introduce WriteMemoryRequest schema to replace positional args in write_memory()
- Extract _resolve_and_load_config, _preprocess_files, _write_neo4j, and
_invalidate_interest_cache as private helpers in MemoryAgentService
- Remove shadow pipeline and simplify NEW_PIPELINE_ENABLED branch
- Merge 类型归属/成员隶属/任职服务 relation types into single 归属身份关系 in triplet prompt
- Add alias merge logic (别名属于) in deduplication and MERGE_ALIAS_BELONGS_TO Cypher query
- Add StorageType, Language, MessageItem enums/models to memory_agent_schema
- Reduce AgentMemory_Long_Term.DEFAULT_SCOPE from 6 to 1
- Delete standalone extract_temporal.jinja2 (logic merged into statement step)
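
A rough illustration of replacing positional arguments with a request object, as done for write_memory(). The field names below are assumptions chosen from identifiers mentioned in these notes; the real WriteMemoryRequest and MessageItem schemas may carry different fields, and this sketch uses dataclasses rather than the project's own models.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MessageItem:
    role: str
    content: str
    dialog_at: Optional[str] = None  # original dialog timestamp, if known


@dataclass(frozen=True)
class WriteMemoryRequest:
    # Hypothetical field set; keyword construction replaces positional args.
    end_user_id: str
    messages: List[MessageItem]
    config_id: Optional[str] = None
    language: str = "zh"


def write_memory(request: WriteMemoryRequest) -> str:
    """Stand-in for the service method: it only summarises the request."""
    return f"end_user_id={request.end_user_id} messages={len(request.messages)}"
```

Callers then build the request by keyword, which keeps call sites readable when optional parameters are added later.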
Introduce ExtractionStep abstraction with modular pipeline stages:
- Add base ExtractionStep class with render/call/parse lifecycle
- Implement StatementExtractionStep, TripletExtractionStep,
EmbeddingStep, EmotionStep, GraphBuildStep, and DedupStep
- Add SidecarStepFactory for hot-pluggable non-critical steps
- Define Pydantic I/O schemas for all pipeline stages
- Refactor WritePipeline to orchestrate new step-based flow
- Add NEW_PIPELINE_ENABLED env switch for old/new pipeline routing
- Add emotion_enabled config flag to MemoryConfig
- Fix workspace_id reference in get_end_user_connected_config
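
A minimal sketch of a render/call/parse lifecycle like the one named above. The `run` method, the injected `llm` callable, and the `LineStatementStep` subclass are assumptions for illustration and do not mirror the actual base class.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List


class ExtractionStep(ABC):
    """Template for one pipeline stage: render a prompt, call the model, parse."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self._llm = llm  # hypothetical: any callable mapping prompt -> raw text

    @abstractmethod
    def render(self, payload: Dict[str, Any]) -> str:
        """Build the prompt for this step."""

    def call(self, prompt: str) -> str:
        return self._llm(prompt)

    @abstractmethod
    def parse(self, raw: str) -> Any:
        """Turn the raw model output into a structured result."""

    def run(self, payload: Dict[str, Any]) -> Any:
        return self.parse(self.call(self.render(payload)))


class LineStatementStep(ExtractionStep):
    """Toy subclass: treats each non-empty output line as one statement."""

    def render(self, payload: Dict[str, Any]) -> str:
        return f"Extract statements from: {payload['text']}"

    def parse(self, raw: str) -> List[str]:
        return [line.strip() for line in raw.splitlines() if line.strip()]
```

Keeping `call` in the base class means a subclass only has to describe its prompt and its output format.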
Introduce a layered pipeline architecture for the memory write flow:
- WritePipeline: orchestrates preprocess → extract → store → cluster → summarize
with deadlock retry, resource cleanup, and pilot-run support
- MemoryService: facade that delegates to WritePipeline, placeholder methods
for read/forget/reflect
- BearLogger: structured step-level logging with perf threshold alerts
- Shadow pipeline integration in MemoryAgentService (env-gated pilot run)
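
The deadlock retry mentioned for WritePipeline can be pictured as a bounded retry loop with backoff. This is a synchronous sketch under assumed names (`DeadlockError`, `run_with_deadlock_retry`); the real pipeline may be asynchronous and catch a driver-specific exception instead.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DeadlockError(Exception):
    """Hypothetical stand-in for the storage layer's deadlock exception."""


def run_with_deadlock_retry(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on DeadlockError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```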
Also includes:
- Fix deprecated SQLAlchemy declarative_base import
- Extend Neo4j Entity fulltext index to cover description and aliases
- Migrate Pydantic schemas to v2 (ConfigDict, field_validator)
Change the default thinking budget from 10000 to 1024 tokens in base.py, and lower the minimum validation constraint from 1024 to 1 in app_schema.py to allow smaller budgets while maintaining backward compatibility.
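
A small sketch of the resulting behaviour, assuming a plain validation function. The constant and function names are hypothetical; only the values 1024 (default) and 1 (minimum) come from the change described above.

```python
from typing import Optional

DEFAULT_THINKING_BUDGET_TOKENS = 1024  # previously 10000
MIN_THINKING_BUDGET_TOKENS = 1  # previously 1024


def validate_thinking_budget(value: Optional[int] = None) -> int:
    """Return the budget to use, applying the default and the lower bound."""
    if value is None:
        return DEFAULT_THINKING_BUDGET_TOKENS
    if value < MIN_THINKING_BUDGET_TOKENS:
        raise ValueError(
            f"thinking budget must be >= {MIN_THINKING_BUDGET_TOKENS}, got {value}"
        )
    return value
```

Values that were valid under the old minimum, such as 10000, still pass, which is why the change stays backward compatible.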
- Replace truthiness checks with 'is not None' for data.message in graph_data and community_graph endpoints so that an empty string is handled correctly
- Remove Optional wrapper from GraphStatistics.edge_types since it already has a default_factory
- Add user_memory_schema.py with typed Pydantic models for all user memory
API responses: MemoryInsightReportData, UserSummaryData, GraphData,
MemoryTypeStatItem, cache result models, and RelationshipEvolutionData
- Refactor user_memory_controllers.py to construct schema instances and
return model_dump() instead of raw dicts
- Remove unused imports (datetime, timestamp_to_datetime, EndUserInfoResponse,
EndUserInfoCreate, EndUser)
- Replace plain image URLs with `<img src="..." data-url="...">` HTML tags in multimodal and document extractor services
- Propagate citations from workflow end events to client responses
- Update system prompts to instruct LLMs to render images using Markdown image syntax with strict UUID-preserving URL copying
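
The data.message fix listed earlier comes down to the difference between a truthiness check and an explicit `is not None` check. The sketch below uses hypothetical function names and a made-up default value to show why an empty string needs the explicit form.

```python
from typing import Optional


def resolve_message(message: Optional[str], default: str = "success") -> str:
    """Explicit check: an empty string is a real value and is preserved."""
    return message if message is not None else default


def resolve_message_truthy(message: Optional[str], default: str = "success") -> str:
    """Truthiness check: an empty string is wrongly replaced by the default."""
    return message or default
```

Only `None` falls back to the default in the first function, whereas the second one also discards `""`.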