feat(memory): add async user metadata extraction pipeline

- Add MetadataExtractor to collect user-related statements post-dedup and extract profile/behavioral metadata via independent LLM call - Add Celery task (extract_user_metadata) routed to memory_tasks queue - Add metadata models (UserMetadata, UserMetadataProfile, etc.) - Add metadata utility functions (clean, validate, merge with _op support) - Add Jinja2 prompt template for metadata extraction (zh/en) - Fix Lucene query parameter naming: rename `q` to `query` across all Cypher queries, graph_search functions, and callers - Escape `/` in Lucene queries to prevent TokenMgrError - Add `speaker` field to ChunkNode and persist it in Neo4j - Remove unused imports (argparse, os, UUID) in search.py - Fix unnecessary db context nesting in interest distribution task
2026-04-09 11:01:56 +08:00
parent cfbf83f71e
commit f2d7479229
17 changed files with 714 additions and 78 deletions
--- a/api/app/celery_app.py
+++ b/api/app/celery_app.py
@@ -111,6 +111,9 @@ celery_app.conf.update(
        # Clustering tasks → memory_tasks queue (使用相同的 worker，避免 macOS fork 问题)
        'app.tasks.run_incremental_clustering': {'queue': 'memory_tasks'},

+        # Metadata extraction → memory_tasks queue
+        'app.tasks.extract_user_metadata': {'queue': 'memory_tasks'},
+
        # Document tasks → document_tasks queue (prefork worker)
        'app.core.rag.tasks.parse_document': {'queue': 'document_tasks'},
        'app.core.rag.tasks.build_graphrag_for_kb': {'queue': 'document_tasks'},