Introduce ExtractionStep abstraction with modular pipeline stages:
- Add base ExtractionStep class with render/call/parse lifecycle
- Implement StatementExtractionStep, TripletExtractionStep,
EmbeddingStep, EmotionStep, GraphBuildStep, and DedupStep
- Add SidecarStepFactory for hot-pluggable non-critical steps
- Define Pydantic I/O schemas for all pipeline stages
- Refactor WritePipeline to orchestrate new step-based flow
- Add NEW_PIPELINE_ENABLED env switch for old/new pipeline routing
- Add emotion_enabled config flag to MemoryConfig
- Fix workspace_id reference in get_end_user_connected_config
Introduce a layered pipeline architecture for the memory write flow:
- WritePipeline: orchestrates preprocess → extract → store → cluster → summarize
with deadlock retry, resource cleanup, and pilot-run support
- MemoryService: facade that delegates to WritePipeline, placeholder methods
for read/forget/reflect
- BearLogger: structured step-level logging with perf threshold alerts
- Shadow pipeline integration in MemoryAgentService (env-gated pilot run)
Also includes:
- Fix deprecated SQLAlchemy declarative_base import
- Extend Neo4j Entity fulltext index to cover description and aliases
- Migrate Pydantic schemas to v2 (ConfigDict, field_validator)
The default thinking budget tokens value was changed from 10000 to 1024 in base.py, and the minimum validation constraint was updated from 1024 to 1 in app_schema.py to allow smaller budgets while maintaining backward compatibility.
- Replace truthiness checks with 'is not None' for data.message in graph_data and community_graph endpoints to handle empty string correctly
- Remove Optional wrapper from GraphStatistics.edge_types since it already has a default_factory
- Add user_memory_schema.py with typed Pydantic models for all user memory
API responses: MemoryInsightReportData, UserSummaryData, GraphData,
MemoryTypeStatItem, cache result models, and RelationshipEvolutionData
- Refactor user_memory_controllers.py to construct schema instances and
return model_dump() instead of raw dicts
- Remove unused imports (datetime, timestamp_to_datetime, EndUserInfoResponse,
EndUserInfoCreate, EndUser)
- Replace plain image URLs with `<img src="..." data-url="...">` HTML tags in multimodal and document extractor services
- Propagate citations from workflow end events to client responses
- Update system prompts to instruct LLMs to render images using Markdown `` with strict UUID-preserving URL copying
- Introduce `reasoning_content`, `suggested_questions`, `citations`, and `audio_status` fields in conversation and app response schemas
- Conditionally set `audio_status` to `"pending"` only when `audio_url` is present
- Replace `model_dump` override with `@model_serializer(mode="wrap")` for cleaner serialization logic
- Change knowledge base validation failure from `RuntimeError` to warning + `continue` to avoid halting retrieval on invalid KB
- Augment log search with app type filtering to enable keyword searching within workflow_executions.
- Introduce execution sequence markers to ensure logs are displayed in the correct chronological order.
- Ameliorate error handling to capture successful node outputs alongside failure details.
- Rectify the processing of empty JSON bodies in HTTP request nodes.