Files
MemoryBear/api/app/services/memory_storage_service.py
Ke Sun 79ab929fb0 Release/v0.2.3 (#355)
* feat(web): add PageEmpty component

* feat(web): add PageTabs component

* feat(web): add PageEmpty component

* feat(web): add PageTabs component

* feat(prompt): add history tracking for prompt releases

* feat(web): add prompt menu

* refactor: The PageScrollList component supports two generic parameters

* feat(web): BodyWrapper component updates PageLoading

* feat(web): add Ontology menu

* feat(web): memory management add scene

* feat(tasks): add celery task configuration for periodic jobs

- Add ignore_result=True to prevent storing results for periodic tasks
- Set max_retries=0 to skip failed periodic tasks without retry attempts
- Configure acks_late=False for immediate acknowledgment in beat tasks
- Add time_limit and soft_time_limit to regenerate_memory_cache task (3600s/3300s)
- Add time_limit and soft_time_limit to workspace_reflection_task (300s/240s)
- Add time_limit and soft_time_limit to run_forgetting_cycle_task (7200s/7000s)
- Improve task reliability and resource management for scheduled jobs
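The pairing of hard and soft time limits above can be sketched as plain data; this is a minimal illustration (the task names come from the bullets, the dict layout is an assumption, not the project's Celery setup), and it makes the key invariant easy to check: each task's soft limit must fire before its hard limit so cleanup can run.

```python
# Illustrative sketch of the periodic-task options described above.
# These would normally be passed to Celery's @app.task decorator.
PERIODIC_TASK_LIMITS = {
    "regenerate_memory_cache": {"time_limit": 3600, "soft_time_limit": 3300},
    "workspace_reflection_task": {"time_limit": 300, "soft_time_limit": 240},
    "run_forgetting_cycle_task": {"time_limit": 7200, "soft_time_limit": 7000},
}

COMMON_OPTIONS = {
    "ignore_result": True,  # don't store results for periodic tasks
    "max_retries": 0,       # skip failed runs instead of retrying
    "acks_late": False,     # acknowledge immediately for beat tasks
}

def task_options(name: str) -> dict:
    """Merge the shared options with one task's time limits."""
    return {**COMMON_OPTIONS, **PERIODIC_TASK_LIMITS[name]}
```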

* feat(sandbox): add Node.js code execution support to sandbox

* Release/v0.2.2 (#260)

* [modify] migration script

* [add] migration script

* fix(web): change form message

* fix(web): the memoryContent field is compatible with numbers and strings

* feat(web): code node hidden

* fix(model):
1. When creating a basic model, check whether the name and provider are duplicated.
2. The results showed erroneous models because the provider had created API Keys for all matching models.

---------

Co-authored-by: Mark <zhuwenhui5566@163.com>
Co-authored-by: zhaoying <yzhao96@best-inc.com>
Co-authored-by: yingzhao <zhaoyingyz@126.com>
Co-authored-by: Timebomb2018 <18868801967@163.com>

* Feature/ontology class clean (#249)

* [add] Complete ontology engineering feature implementation

* [add] Add ontology feature integration and validation utilities

* [add] Add OWL validator and validation utilities

* [fix] Add missing render_ontology_extraction_prompt function

* [fix]Add dependencies, fix functionality

* [add] migration script

* feat(celery): add dedicated periodic tasks worker and queue (#261)

* fix(web): conflict resolve

* Fix/v022 bug (#263)

* [fix]Fix the issue of inconsistent language in explicit and episodic memory.

* [fix]Fix the issue of inconsistent language in explicit and episodic memory.

* [add]Add scene_id

* [fix]Based on the AI review to fix the code

* Fix/develop memory reflex (#265)

* Missed history mappings

* Missed history mappings

* Handle reflection background errors

* [add] migration script

* fix: chat conversation_id add node_start

* feat(web): show code node

* fix(web): restructure the CustomSelect component; fix the interface being called multiple times when the form updates

* feat(web): RadioGroupCard support block mode

* feat(web): create space add icon

* feat(app and model): token consumption statistics

* Add/develop memory (#264)

* Missed history mappings

* Missed history mappings

* Missed history mappings

* Missed history mappings

* Missed history mappings

* Missed history mappings

* Missed history mappings

* Missed history mappings

* Missed history mappings

* Add long-term memory feature

* Add long-term memory feature

* Add long-term memory feature

* Redundant fields in knowledge base retrieval

* Long-term

* feat(app and model): token consumption statistics of the cluster

* memory_BUG_fix

* fix(web): prompt history remove pageLoading

* fix(prompt): remove hard-coded import of prompt file paths (#279)

* Fix/develop memory bug (#274)

* Missed history mappings

* Missed history mappings

* fix_timeline_memories

* fix(web): update retrieve_type key

* Fix/develop memory bug (#276)

* Missed history mappings

* Missed history mappings

* fix_timeline_memories

* fix_timeline_memories

* write_graph/bug_fix

* write_graph/bug_fix

* write_graph/bug_fix

* chore(celery): disable periodic task scheduling

* fix(prompt): remove hard-coded import of prompt file paths

---------

Co-authored-by: lixinyue11 <94037597+lixinyue11@users.noreply.github.com>
Co-authored-by: zhaoying <yzhao96@best-inc.com>
Co-authored-by: yingzhao <zhaoyingyz@126.com>
Co-authored-by: Ke Sun <kesun5@illinois.edu>

* fix(web): remove delete confirm content

* refactor(workflow): relocate template directory into workflow

* feat(memory): add long-term storage task routing and batching

* fix(web): PageScrollList loading update

* fix(web): PageScrollList loading update

* Ontology v1 bug (#291)

* [changes]Add 'id' as the secondary sorting key, and 'scene_id' now returns a UUID object

* [fix]Fix the "end_user" return to be sorted by update time.

* [fix]Set the default values of the memory configuration model based on the spatial model.

* [fix]Remove the entity extraction check combination model, read the configuration list, and add the return of scene_id

* [fix]Fix the "end_user" return to be sorted by update time.

* [fix]

* fix(memory): add Redis session validation

- Add macOS fork() safety configuration in celery_app.py to prevent initialization issues
- Add null/False checks for Redis session queries in term_memory_save to handle missing sessions gracefully
- Add null/False checks in memory_long_term_storage to prevent processing empty Redis results
- Add null/False checks in aggregate_judgment before format_parsing to avoid errors on missing data
- Initialize redis_messages variable in window_dialogue for consistency
- Add debug logging when no existing session found in Redis for better troubleshooting
- Add TODO comments for magic numbers (scope=6, time=5) to be extracted as constants
- Improve error handling when Redis returns False or empty results instead of crashing
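The null/False guard described above can be sketched as follows; this is a hedged illustration, not the project's code (`load_session` and its key layout are invented for the example). redis-py's `get` returns `None` for a missing key, and the bullets indicate some session reads can also yield `False`, so both are treated as "no session" and logged instead of crashing.

```python
import json
import logging

logger = logging.getLogger(__name__)

def load_session(redis_client, session_key: str):
    """Return the decoded session, or None when Redis has nothing usable."""
    raw = redis_client.get(session_key)
    if raw is None or raw is False:
        # Missing session: log for troubleshooting instead of raising
        logger.debug("No existing session found in Redis for %s", session_key)
        return None
    return json.loads(raw)
```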

* fix(web): PageScrollList style update

* fix(workflow): fix argument passing in code execution nodes

* fix(web): prompt add disabled

* fix(web): space icon required

* feat(app): modify the key of the token

* fix(app): fix the key of the app's token

* fix(workflow): switch code input encoding to base64+URL encoding
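A minimal sketch of the base64+URL encoding scheme this commit names; the order of operations (base64 first, then percent-encoding so padding characters survive transport in a URL or form field) is an assumption, as are the function names.

```python
import base64
from urllib.parse import quote, unquote

def encode_code_input(source: str) -> str:
    # base64-encode the source, then percent-encode so '=', '+' and '/' are URL-safe
    b64 = base64.b64encode(source.encode("utf-8")).decode("ascii")
    return quote(b64, safe="")

def decode_code_input(encoded: str) -> str:
    # Reverse the two layers: percent-decode, then base64-decode
    b64 = unquote(encoded)
    return base64.b64decode(b64).decode("utf-8")
```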

* [add]The main project adds multi-API Key load balancing.

* [changes]Attribute security access, secure numerical conversion, unified use of local variables

* fix(web): save add session update

* fix(web): language editor support paste

* [changes]Active status filtering logic, API Key selection strategy
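The two commits above (multi-API-Key load balancing, active-status filtering, and a key-selection strategy) could look roughly like this; the `ApiKey` shape and the rotate-to-back strategy are hypothetical stand-ins, not the project's implementation.

```python
from dataclasses import dataclass

@dataclass
class ApiKey:
    value: str
    active: bool = True

class KeyBalancer:
    """Pick among active API keys, spreading load round-robin."""

    def __init__(self, keys):
        self._keys = list(keys)

    def pick(self) -> str:
        # Active-status filtering: only consider keys marked active
        active = [k for k in self._keys if k.active]
        if not active:
            raise RuntimeError("no active API keys")
        # Selection strategy: take the first active key, then move it to
        # the back of the list so successive calls rotate through keys
        key = active[0]
        self._keys.remove(key)
        self._keys.append(key)
        return key.value
```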

* memory_BUG

* memory_BUG_long_term

* [changes]

* memory_BUG_long_term

* memory_BUG_long_term

* Fix/release memory bug (#306)

* memory_BUG_fix

* memory_BUG

* memory_BUG_long_term

* memory_BUG_long_term

* memory_BUG_long_term

* knowledge_retrieval/bug/fix

* knowledge_retrieval/bug/fix

* knowledge_retrieval/bug/fix

* [fix] 1. The "read_all_config" interface returns "scene_name"; 2. Memory configuration supports lightweight queries of ontology scenes

* fix(web): replace code editor

* [changes]Modify the description of the time for the recent event

* [changes]Modify the code based on the AI review

* feat(web): update memory config ontology api

* fix(web): ui update

* knowledge_retrieval/bug/fix

* knowledge_retrieval/bug/fix

* knowledge_retrieval/bug/fix

* feat(workflow): add token usage statistics for question classifier and parameter extraction

* feat(web): move prompt menu

* Multiple independent transactions -> single transaction

* Multiple independent transactions -> single transaction

* Multiple independent transactions -> single transaction

* Multiple independent transactions -> single transaction

* Write Missing None (#321)

* Write Missing None

* Write Missing None

* Write Missing None

* Apply suggestion from @sourcery-ai[bot]

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

* Write Missing None

---------

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

* Fix/release memory bug (#324)

* Write Missing None

* Write Missing None

* Write Missing None

* Apply suggestion from @sourcery-ai[bot]

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

* Write Missing None

* redis update

* redis update

* redis update

* redis update

---------

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

* Fix/writer memory bug (#326)

* [fix]Fix the bug

* [fix]Fix the bug

* [fix]Correct the direction indication.

* fix(web): markdown table ui update

* Fix/release memory bug (#332)

* Write Missing None

* Write Missing None

* Write Missing None

* Apply suggestion from @sourcery-ai[bot]

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

* Write Missing None

* redis update

* redis update

* redis update

* redis update

* writer_dup_bug/fix

---------

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

* Fix/fact summary (#333)

* [fix]Disable the contents related to fact_summary

* [fix]Disable the contents related to fact_summary

* [fix]Modify the code based on the AI review

* Fix/release memory bug (#335)

* Write Missing None

* Write Missing None

* Write Missing None

* Apply suggestion from @sourcery-ai[bot]

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

* Write Missing None

* redis update

* redis update

* redis update

* redis update

* writer_dup_bug/fix

* writer_graph_bug/fix

* writer_graph_bug/fix

---------

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

* Revert "feat(web): move prompt menu"

This reverts commit 9e6e8f50f8.

* fix(web): ui update

* fix(web): update text

* fix(web): ui update

* fix(model): change the "vl" model type of dashscope to "chat"

* fix(model): change the "vl" model type of dashscope to "chat"

---------

Co-authored-by: zhaoying <yzhao96@best-inc.com>
Co-authored-by: Eternity <1533512157@qq.com>
Co-authored-by: Mark <zhuwenhui5566@163.com>
Co-authored-by: yingzhao <zhaoyingyz@126.com>
Co-authored-by: Timebomb2018 <18868801967@163.com>
Co-authored-by: 乐力齐 <162269739+lanceyq@users.noreply.github.com>
Co-authored-by: lixinyue11 <94037597+lixinyue11@users.noreply.github.com>
Co-authored-by: lixinyue <2569494688@qq.com>
Co-authored-by: Eternity <61316157+myhMARS@users.noreply.github.com>
Co-authored-by: lanceyq <1982376970@qq.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
2026-02-06 19:01:57 +08:00

"""
Memory Storage Service
Handles business logic for memory storage operations.
"""
import asyncio
import json
import os
import time
from datetime import datetime
from typing import Any, AsyncGenerator, Dict, List, Optional
from app.core.logging_config import get_config_logger, get_logger
from app.core.memory.analytics.hot_memory_tags import (
get_hot_memory_tags,
get_raw_tags_from_db,
filter_tags_with_llm,
)
from app.core.memory.analytics.recent_activity_stats import get_recent_activity_stats
from app.models.user_model import User
from app.repositories.memory_config_repository import MemoryConfigRepository
from app.repositories.neo4j.neo4j_connector import Neo4jConnector
from app.schemas.memory_config_schema import ConfigurationError
from app.schemas.memory_storage_schema import (
ConfigKey,
ConfigParamsCreate,
ConfigParamsDelete,
ConfigPilotRun,
ConfigUpdate,
ConfigUpdateExtracted,
)
from app.services.memory_config_service import MemoryConfigService
from app.utils.sse_utils import format_sse_message
from dotenv import load_dotenv
from sqlalchemy.orm import Session
logger = get_logger(__name__)
config_logger = get_config_logger()
# Load environment variables for Neo4j connector
load_dotenv()
_neo4j_connector = Neo4jConnector()
class MemoryStorageService:
"""Service for memory storage operations"""
def __init__(self):
logger.info("MemoryStorageService initialized")
async def get_storage_info(self) -> dict:
"""
Example wrapper method - retrieves storage information
Args:
Returns:
Storage information dictionary
"""
logger.info("Getting storage info ")
# Empty wrapper - implement your logic here
result = {
"status": "active",
"message": "This is an example wrapper"
}
return result
class DataConfigService:  # Data configuration service (PostgreSQL)
    """Service layer for config params CRUD.

    Uses SQLAlchemy ORM for database operations.
    """

    def __init__(self, db: Session) -> None:
        """Initialize the service.

        Args:
            db: SQLAlchemy database session
        """
        self.db = db

    @staticmethod
    def _convert_timestamps_to_format(data_list: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Convert created_at and updated_at from datetime objects to YYYYMMDDHHmmss strings."""
        for item in data_list:
            for field in ['created_at', 'updated_at']:
                if field in item and item[field] is not None:
                    value = item[field]
                    dt = None
                    # Handle the different types a timestamp value may arrive as
                    if hasattr(value, 'to_native'):
                        # Neo4j DateTime object
                        dt = value.to_native()
                    elif isinstance(value, datetime):
                        # Python datetime object
                        dt = value
                    elif isinstance(value, str):
                        # String form
                        try:
                            dt = datetime.fromisoformat(value.replace('Z', '+00:00'))
                        except Exception:
                            pass  # keep the original value
                    # Convert to YYYYMMDDHHmmss format
                    if dt:
                        item[field] = dt.strftime('%Y%m%d%H%M%S')
        return data_list
    # --- Create ---
    def create(self, params: ConfigParamsCreate) -> Dict[str, Any]:  # Create config params (name and description only)
        # If workspace_id is present and the model fields are not all specified, fetch them automatically
        if params.workspace_id and not all([params.llm_id, params.embedding_id, params.rerank_id]):
            configs = self._get_workspace_configs(params.workspace_id)
            if configs is None:
                raise ValueError(f"Workspace not found: workspace_id={params.workspace_id}")
            # Only fill in unspecified fields (manual overrides are allowed)
            if not params.llm_id:
                params.llm_id = configs.get('llm')
            if not params.embedding_id:
                params.embedding_id = configs.get('embedding')
            if not params.rerank_id:
                params.rerank_id = configs.get('rerank')
        # reflection_model_id and emotion_model_id default to llm_id
        if not params.reflection_model_id:
            params.reflection_model_id = params.llm_id
        if not params.emotion_model_id:
            params.emotion_model_id = params.llm_id
        config = MemoryConfigRepository.create(self.db, params)
        self.db.commit()
        return {"affected": 1, "config_id": config.config_id}

    def _get_workspace_configs(self, workspace_id) -> Optional[Dict[str, Any]]:
        """Fetch the workspace model configs (internal method, easier to test)."""
        from app.db import SessionLocal
        from app.repositories.workspace_repository import get_workspace_models_configs
        db_session = SessionLocal()
        try:
            return get_workspace_models_configs(db_session, workspace_id)
        finally:
            db_session.close()
    # --- Delete ---
    def delete(self, key: ConfigParamsDelete) -> Dict[str, Any]:  # Delete config params by config ID
        success = MemoryConfigRepository.delete(self.db, key.config_id)
        if not success:
            raise ValueError("Config not found")
        return {"affected": 1}

    # --- Update ---
    def update(self, update: ConfigUpdate) -> Dict[str, Any]:  # Partially update config params
        config = MemoryConfigRepository.update(self.db, update)
        if not config:
            raise ValueError("Config not found")
        return {"affected": 1}

    def update_extracted(self, update: ConfigUpdateExtracted) -> Dict[str, Any]:  # Update memory extraction engine config params
        config = MemoryConfigRepository.update_extracted(self.db, update)
        if not config:
            raise ValueError("Config not found")
        return {"affected": 1}

    # --- Forget config params ---
    # Forgetting-engine config methods have moved to memory_forget_service.py.
    # Use MemoryForgetService.read_forgetting_config() and MemoryForgetService.update_forgetting_config() instead.

    # --- Read ---
    def get_extracted(self, key: ConfigKey) -> Dict[str, Any]:  # Get extraction config params
        result = MemoryConfigRepository.get_extracted_config(self.db, key.config_id)
        if not result:
            raise ValueError("Config not found")
        return result
    # --- Read All ---
    def get_all(self, workspace_id=None) -> List[Dict[str, Any]]:  # Get all config params
        results = MemoryConfigRepository.get_all(self.db, workspace_id)
        # Convert the ORM objects into a list of dicts
        data_list = []
        for config, scene_name in results:
            # Safely convert config_id_old to int
            config_id_old = None
            if config.config_id_old:
                try:
                    config_id_old = int(config.config_id_old)
                except (ValueError, TypeError):
                    config_id_old = None
            if config_id_old:
                memory_config = config_id_old
            else:
                memory_config = config.config_id
            config_dict = {
                "config_id": memory_config,
                "config_name": config.config_name,
                "config_desc": config.config_desc,
                "workspace_id": str(config.workspace_id) if config.workspace_id else None,
                "end_user_id": config.end_user_id,
                "config_id_old": config_id_old,
                "apply_id": config.apply_id,
                "scene_id": str(config.scene_id) if config.scene_id else None,
                "scene_name": scene_name,  # New: scene name
                "llm_id": config.llm_id,
                "embedding_id": config.embedding_id,
                "rerank_id": config.rerank_id,
                "enable_llm_dedup_blockwise": config.enable_llm_dedup_blockwise,
                "enable_llm_disambiguation": config.enable_llm_disambiguation,
                "deep_retrieval": config.deep_retrieval,
                "t_type_strict": config.t_type_strict,
                "t_name_strict": config.t_name_strict,
                "t_overall": config.t_overall,
                "state": config.state,
                "chunker_strategy": config.chunker_strategy,
                "pruning_enabled": config.pruning_enabled,
                "pruning_scene": config.pruning_scene,
                "pruning_threshold": config.pruning_threshold,
                "enable_self_reflexion": config.enable_self_reflexion,
                "iteration_period": config.iteration_period,
                "reflexion_range": config.reflexion_range,
                "baseline": config.baseline,
                "statement_granularity": config.statement_granularity,
                "include_dialogue_context": config.include_dialogue_context,
                "max_context": config.max_context,
                "lambda_time": config.lambda_time,
                "lambda_mem": config.lambda_mem,
                "offset": config.offset,
                "created_at": config.created_at,
                "updated_at": config.updated_at,
            }
            data_list.append(config_dict)
        # Convert created_at and updated_at to YYYYMMDDHHmmss format
        return self._convert_timestamps_to_format(data_list)
    async def pilot_run_stream(self, payload: ConfigPilotRun) -> AsyncGenerator[str, None]:
        """
        Stream a pilot run, yielding SSE-formatted progress events.

        Args:
            payload: Pilot-run configuration and dialogue text

        Yields:
            SSE-formatted strings with the following event types:
            - stage names: progress updates (e.g. starting, knowledge_extraction_complete)
            - result: the final result
            - error: error information
            - done: completion marker

        Raises:
            ValueError: when the configuration is invalid or a parameter is missing
            RuntimeError: when the pipeline fails
        """
        from pathlib import Path
        project_root = str(Path(__file__).resolve().parents[2])
        try:
            # Emit the initial progress event
            yield format_sse_message("starting", {
                "message": "Starting pilot run...",
                "time": int(time.time() * 1000)
            })

            # Step 1: load and validate configuration (database first)
            payload_cid = str(getattr(payload, "config_id", "") or "").strip()
            cid: Optional[str] = payload_cid if payload_cid else None
            if not cid:
                raise ValueError("payload.config_id not provided; refusing to start pilot run")

            # dialogue_text is required
            dialogue_text = payload.dialogue_text.strip() if payload.dialogue_text else ""
            logger.info(f"[PILOT_RUN_STREAM] Received dialogue_text length: {len(dialogue_text)}, preview: {dialogue_text[:100]}")
            if not dialogue_text:
                raise ValueError("Pilot-run mode requires the dialogue_text parameter")

            # Load configuration from database only using centralized manager
            try:
                config_service = MemoryConfigService(self.db)
                memory_config = config_service.load_memory_config(
                    config_id=str(cid),
                    service_name="MemoryStorageService.pilot_run_stream"
                )
                logger.info(f"Configuration loaded successfully: {memory_config.config_name}")
            except ConfigurationError as e:
                raise RuntimeError(f"Configuration loading failed: {e}")

            # Step 2: create a progress callback to capture pipeline progress.
            # A queue carries progress events from the callback to this generator.
            progress_queue: asyncio.Queue = asyncio.Queue()

            async def progress_callback(stage: str, message: str, data: Optional[Dict[str, Any]] = None) -> None:
                """
                Progress callback that puts progress events onto the queue.

                Args:
                    stage: stage identifier
                    message: progress message
                    data: optional result data (used to pass node execution results)
                """
                await progress_queue.put((stage, message, data))

            # Step 3: run the pipeline in a background task
            async def run_pipeline():
                """Run the pipeline in the background and capture exceptions."""
                try:
                    from app.services.pilot_run_service import run_pilot_extraction
                    logger.info(f"[PILOT_RUN_STREAM] Calling run_pilot_extraction with dialogue_text length: {len(dialogue_text)}")
                    await run_pilot_extraction(
                        memory_config=memory_config,
                        dialogue_text=dialogue_text,
                        db=self.db,
                        progress_callback=progress_callback,
                    )
                    logger.info("[PILOT_RUN_STREAM] pipeline_main completed")
                    # Mark the pipeline as complete
                    await progress_queue.put(("__PIPELINE_COMPLETE__", "", None))
                except Exception as e:
                    # Put the exception onto the queue
                    await progress_queue.put(("__PIPELINE_ERROR__", str(e), None))

            # Start the background task
            pipeline_task = asyncio.create_task(run_pipeline())

            # Step 4: read progress events from the queue and emit them
            while True:
                try:
                    # Wait for a progress event, with a timeout so client disconnects can be detected
                    stage, message, data = await asyncio.wait_for(
                        progress_queue.get(),
                        timeout=0.5
                    )
                    # Check for the sentinel markers
                    if stage == "__PIPELINE_COMPLETE__":
                        break
                    elif stage == "__PIPELINE_ERROR__":
                        raise RuntimeError(message)
                    # Build the progress event payload
                    progress_data = {
                        "message": message,
                        "time": int(time.time() * 1000)
                    }
                    # Attach result data to the event if present
                    if data:
                        progress_data["data"] = data
                    # Emit the progress event, using the stage as the event type
                    yield format_sse_message(stage, progress_data)
                except asyncio.TimeoutError:
                    # Timed out; keep waiting (this allows detecting client disconnects)
                    continue

            # Wait for the pipeline task to finish
            await pipeline_task

            # Step 5: read the extraction result
            from app.core.config import settings
            result_path = settings.get_memory_output_path("extracted_result.json")
            if not os.path.isfile(result_path):
                raise FileNotFoundError(f"Pilot run finished, but the extraction result file was not found: {result_path}")
            with open(result_path, "r", encoding="utf-8") as rf:
                extracted_result = json.load(rf)

            # Step 6: emit the result event
            result_data = {
                "config_id": cid,
                "time_log": os.path.join(project_root, "logs", "time.log"),
                "extracted_result": extracted_result,
            }
            yield format_sse_message("result", result_data)

            # Step 7: emit the done event
            yield format_sse_message("done", {
                "message": "Pilot run complete",
                "time": int(time.time() * 1000)
            })
        except asyncio.CancelledError:
            # The client disconnected
            logger.info("[PILOT_RUN_STREAM] Client disconnected during streaming")
            raise
        except Exception as e:
            # Emit an error event
            logger.error(f"[PILOT_RUN_STREAM] Error during streaming: {e}", exc_info=True)
            yield format_sse_message("error", {
                "code": 5000,
                "message": "Pilot run failed",
                "error": str(e),
                "time": int(time.time() * 1000)
            })
# -------------------- Neo4j Search & Analytics (fused from data_search_service.py) --------------------
# Ensure env for connector (e.g., NEO4J_PASSWORD)
load_dotenv()
_neo4j_connector = Neo4jConnector()


async def search_dialogue(end_user_id: Optional[str] = None) -> Dict[str, Any]:
    result = await _neo4j_connector.execute_query(
        MemoryConfigRepository.SEARCH_FOR_DIALOGUE,
        end_user_id=end_user_id,
    )
    data = {"search_for": "dialogue", "num": result[0]["num"]}
    return data


async def search_chunk(end_user_id: Optional[str] = None) -> Dict[str, Any]:
    result = await _neo4j_connector.execute_query(
        MemoryConfigRepository.SEARCH_FOR_CHUNK,
        end_user_id=end_user_id,
    )
    data = {"search_for": "chunk", "num": result[0]["num"]}
    return data


async def search_statement(end_user_id: Optional[str] = None) -> Dict[str, Any]:
    result = await _neo4j_connector.execute_query(
        MemoryConfigRepository.SEARCH_FOR_STATEMENT,
        end_user_id=end_user_id,
    )
    data = {"search_for": "statement", "num": result[0]["num"]}
    return data


async def search_entity(end_user_id: Optional[str] = None) -> Dict[str, Any]:
    result = await _neo4j_connector.execute_query(
        MemoryConfigRepository.SEARCH_FOR_ENTITY,
        end_user_id=end_user_id,
    )
    data = {"search_for": "entity", "num": result[0]["num"]}
    return data


async def search_all(end_user_id: Optional[str] = None) -> Dict[str, Any]:
    result = await _neo4j_connector.execute_query(
        MemoryConfigRepository.SEARCH_FOR_ALL,
        end_user_id=end_user_id,
    )
    # Check whether the result is empty or too short
    if not result or len(result) < 4:
        data = {
            "total": 0,
            "counts": {
                "dialogue": 0,
                "chunk": 0,
                "statement": 0,
                "entity": 0,
            },
        }
        return data
    data = {
        "total": result[-1]["Count"],
        "counts": {
            "dialogue": result[0]["Count"],
            "chunk": result[1]["Count"],
            "statement": result[2]["Count"],
            "entity": result[3]["Count"],
        },
    }
    return data
async def kb_type_distribution(end_user_id: Optional[str] = None) -> Dict[str, Any]:
    """Unified knowledge-base type distribution endpoint.

    Aggregates the dialogue/chunk/statement/entity counts into a single
    distribution structure the frontend can consume in one call.
    """
    result = await _neo4j_connector.execute_query(
        MemoryConfigRepository.SEARCH_FOR_ALL,
        end_user_id=end_user_id,
    )
    # Check whether the result is empty or too short
    if not result or len(result) < 4:
        data = {
            "total": 0,
            "distribution": [
                {"type": "dialogue", "count": 0},
                {"type": "chunk", "count": 0},
                {"type": "statement", "count": 0},
                {"type": "entity", "count": 0},
            ]
        }
        return data
    total = result[-1]["Count"]
    distribution = [
        {"type": "dialogue", "count": result[0]["Count"]},
        {"type": "chunk", "count": result[1]["Count"]},
        {"type": "statement", "count": result[2]["Count"]},
        {"type": "entity", "count": result[3]["Count"]},
    ]
    data = {"total": total, "distribution": distribution}
    return data


async def search_detials(end_user_id: Optional[str] = None) -> List[Dict[str, Any]]:
    result = await _neo4j_connector.execute_query(
        MemoryConfigRepository.SEARCH_FOR_DETIALS,
        end_user_id=end_user_id,
    )
    return result


async def search_edges(end_user_id: Optional[str] = None) -> List[Dict[str, Any]]:
    result = await _neo4j_connector.execute_query(
        MemoryConfigRepository.SEARCH_FOR_EDGES,
        end_user_id=end_user_id,
    )
    return result
async def analytics_hot_memory_tags(
    db: Session,
    current_user: User,
    limit: int = 10
) -> List[Dict[str, Any]]:
    """
    Get hot memory tags, sorted by count, returning the top N.

    Optimization strategy:
    1. Collect raw tags from all users first (no LLM calls)
    2. Aggregate and merge frequencies of identical tags
    3. Sort and take the top N
    4. Call the LLM only once, for filtering
    """
    workspace_id = current_user.current_workspace_id
    # Fetch extra tags (limit * 4) for the LLM to filter
    raw_limit = limit * 4
    from app.services.memory_dashboard_service import get_workspace_end_users
    # Use asyncio.to_thread to avoid blocking the event loop
    end_users = await asyncio.to_thread(get_workspace_end_users, db, workspace_id, current_user)
    if not end_users:
        return []
    # Step 1: collect raw tags from all users (no LLM calls)
    connector = Neo4jConnector()
    try:
        all_raw_tags = []
        for end_user in end_users:
            raw_tags = await get_raw_tags_from_db(
                connector,
                str(end_user.id),
                limit=raw_limit,
                by_user=False
            )
            if raw_tags:
                all_raw_tags.extend(raw_tags)
        if not all_raw_tags:
            return []
        # Step 2: aggregate frequencies of identical tags
        tag_frequency_map = {}
        for tag_name, frequency in all_raw_tags:
            if tag_name in tag_frequency_map:
                tag_frequency_map[tag_name] += frequency
            else:
                tag_frequency_map[tag_name] = frequency
        # Step 3: sort by frequency, descending, and take the top raw_limit
        sorted_tags = sorted(
            tag_frequency_map.items(),
            key=lambda x: x[1],
            reverse=True
        )[:raw_limit]
        if not sorted_tags:
            return []
        # Step 4: call the LLM only once, for filtering
        tag_names = [tag for tag, _ in sorted_tags]
        # Use the first user's end_user_id to fetch the LLM config,
        # since users in the same workspace should share the same config
        first_end_user_id = str(end_users[0].id)
        filtered_tag_names = await filter_tags_with_llm(tag_names, first_end_user_id)
        # Step 5: build the final list from the LLM filter result, keeping frequencies
        final_tags = []
        for tag, freq in sorted_tags:
            if tag in filtered_tag_names:
                final_tags.append((tag, freq))
        # Step 6: return only the top `limit`
        top_tags = final_tags[:limit]
        return [{"name": t, "frequency": f} for t, f in top_tags]
    finally:
        await connector.close()
async def analytics_recent_activity_stats() -> Dict[str, Any]:
    stats, _msg = get_recent_activity_stats()
    total = (
        stats.get("chunk_count", 0)
        + stats.get("statements_count", 0)
        + stats.get("triplet_entities_count", 0)
        + stats.get("triplet_relations_count", 0)
        + stats.get("temporal_count", 0)
    )
    # Simplified: only report how long ago the latest activity happened
    latest_relative = None
    try:
        info = stats.get("log_path", "")
        idx = info.rfind("最新:")  # marker ("latest:") written by the stats collector
        if idx != -1:
            latest_path = info[idx + 3:].strip()  # skip the 3-character marker
            if latest_path and os.path.exists(latest_path):
                diff = max(0.0, time.time() - os.path.getmtime(latest_path))
                m = int(diff // 60)
                if m < 1:
                    latest_relative = "刚刚"  # "just now"
                elif m < 60:
                    latest_relative = "一会前"  # "a little while ago"
                else:
                    latest_relative = "较早前"  # "earlier"
    except Exception:
        pass
    data = {"total": total, "stats": stats, "latest_relative": latest_relative}
    return data