Merge #105 into develop from feature/user-summary

[feature] Extend user memory content

* feature/user-summary: (11 commits squashed)

  - [ADD]Support graph search

  - Merge #82 into develop from feature/20251219_myh
    
    fix: correct function naming for memory retrieval
    
    * feature/20251219_myh: (2 commits squashed)
    
      - perf(workflow): adjust default template to be compatible with frontend format
    
      - fix: correct function naming for memory retrieval
    
    Signed-off-by: Eternity <1533512157@qq.com>
    Reviewed-by: zhuwenhui5566@163.com <zhuwenhui5566@163.com>
    Reviewed-by: aliyun6762716068 <accounts_68cb7c6b61f5dcc4200d6251@mail.teambition.com>
    Merged-by: aliyun6762716068 <accounts_68cb7c6b61f5dcc4200d6251@mail.teambition.com>
    
    CR-link: https://codeup.aliyun.com/redbearai/python/redbear-mem-open/change/82

  - [fix]parsed excel document error:float division by zero

  - [fix]parsed excel document error:float division by zero

  - [fix]parsed excel document error:float division by zero

  - [fix]parsed excel document error:float division by zero

  - [changes]1.Fix the Neo4j alert;2.Separate the functions of "insight" and "summary"

  - [feature]Develop user summary

  - [feature]Developing Memory Insights

  - [changes]Modify the data types and processing procedures of the configuration parameters

  - [fix]fix

Signed-off-by: 乐力齐 <accounts_690c7b0af9007d7e338af636@mail.teambition.com>
Reviewed-by: aliyun6762716068 <accounts_68cb7c6b61f5dcc4200d6251@mail.teambition.com>
Merged-by: aliyun6762716068 <accounts_68cb7c6b61f5dcc4200d6251@mail.teambition.com>

CR-link: https://codeup.aliyun.com/redbearai/python/redbear-mem-open/change/105
This commit is contained in:
乐力齐
2026-01-05 04:34:12 +00:00
committed by 孙科
parent e8a5cfe7e3
commit 1fc81d1347
10 changed files with 881 additions and 249 deletions


@@ -1,7 +1,14 @@
"""
This module provides the MemoryInsight class for analyzing user memory data.
This script can be executed directly to generate a memory insight report for a test user.
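MemoryInsight is a utility class that provides basic data retrieval and analysis:
- get_domain_distribution(): get the memory domain distribution
- get_active_periods(): get the active periods
- get_social_connections(): get the social connections
Business logic such as generating the insight report should be implemented in the service layer, user_memory_service.py.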
This script can be executed directly to test the memory insight generation for a test user.
"""
import asyncio
@@ -221,25 +228,32 @@ class MemoryInsight:
async def get_social_connections(self) -> dict | None:
"""
Finds the user with whom the most memories are shared.
Uses the Chunk-Statement CONTAINS relationship, because the system does not create Dialogue-Statement MENTIONS relationships.
"""
# Find shared memories via the CONTAINS relationship between Chunk and Statement
query = f"""
MATCH (d1:Dialogue {{group_id: '{self.user_id}'}})<-[:MENTIONS]-(s:Statement)-[:MENTIONS]->(d2:Dialogue)
WHERE d1 <> d2
RETURN d2.group_id AS other_user_id, COUNT(s) AS common_statements
MATCH (c1:Chunk {{group_id: '{self.user_id}'}})
OPTIONAL MATCH (c1)-[:CONTAINS]->(s:Statement)
OPTIONAL MATCH (s)<-[:CONTAINS]-(c2:Chunk)
WHERE c1.group_id <> c2.group_id AND s IS NOT NULL AND c2 IS NOT NULL
WITH c2.group_id AS other_user_id, COUNT(DISTINCT s) AS common_statements
WHERE common_statements > 0
RETURN other_user_id, common_statements
ORDER BY common_statements DESC
LIMIT 1
"""
records = await self.neo4j_connector.execute_query(query)
if not records:
if not records or not records[0].get("other_user_id"):
return None
most_connected_user = records[0]["other_user_id"]
common_memories_count = records[0]["common_statements"]
# Use the time range of the Chunk nodes
time_range_query = f"""
MATCH (d:Dialogue)
WHERE d.group_id IN ['{self.user_id}', '{most_connected_user}']
RETURN min(d.created_at) AS start_time, max(d.created_at) AS end_time
MATCH (c:Chunk)
WHERE c.group_id IN ['{self.user_id}', '{most_connected_user}']
RETURN min(c.created_at) AS start_time, max(c.created_at) AS end_time
"""
time_records = await self.neo4j_connector.execute_query(time_range_query)
start_year, end_year = "N/A", "N/A"
@@ -253,84 +267,6 @@ class MemoryInsight:
"time_range": f"{start_year}-{end_year}",
}
async def generate_insight_report(self) -> str:
"""
Generates the final insight report in natural language.
"""
domain_dist, active_periods, social_conn = await asyncio.gather(
self.get_domain_distribution(),
self.get_active_periods(),
self.get_social_connections(),
)
prompt_parts = []
if domain_dist:
top_domains = ", ".join([f"{k}({v:.0%})" for k, v in list(domain_dist.items())[:3]])
prompt_parts.append(f"- 核心领域: 用户的记忆主要集中在 {top_domains}")
if active_periods:
months_str = "".join(map(str, active_periods))
prompt_parts.append(f"- 活跃时段: 用户在每年的 {months_str} 月最为活跃。")
if social_conn:
prompt_parts.append(
f"- 社交关联: 与用户\"{social_conn['user_id']}\"拥有最多共同记忆({social_conn['common_memories_count']}条),时间范围主要在 {social_conn['time_range']}"
)
if not prompt_parts:
return "暂无足够数据生成洞察报告。"
system_prompt = '''你是一位资深的个人记忆分析师。你的任务是根据我提供的要点,为用户生成一段简洁、自然、个性化的记忆洞察报告。
重要规则:
1. 报告需要将所有要点流畅地串联成一个段落
2. 语言风格要亲切、易于理解,就像和朋友聊天一样
3. 不要添加任何额外的解释或标题,直接输出报告内容
4. 只使用我提供的要点,不要编造或推测任何信息
5. 如果某个维度没有数据(如没有活跃时段信息),就不要在报告中提及该维度
例如,如果输入是:
- 核心领域: 用户的记忆主要集中在 旅行(38%), 工作(24%), 家庭(21%)。
- 活跃时段: 用户在每年的 4 和 10 月最为活跃。
- 社交关联: 与用户"张明"拥有最多共同记忆(47条),时间范围主要在 2017-2020。
你的输出应该是:
"您的记忆集中在旅行(38%)、工作(24%)和家庭(21%)三大领域。每年4月和10月是您最活跃的记录期,可能与春秋季旅行计划相关。您与'张明'共同拥有最多记忆(47条),主要集中在2017-2020年间。"
如果输入只有:
- 核心领域: 用户的记忆主要集中在 教育(65%), 学习(25%)。
你的输出应该是:
"您的记忆主要集中在教育(65%)和学习(25%)两大领域,显示出您对知识和成长的持续关注。"'''
user_prompt = "\n".join(prompt_parts)
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}
]
response = await self.llm_client.chat(messages=messages)
# Ensure a string is returned
content = response.content
if isinstance(content, list):
# If it is a list (e.g. [{'type': 'text', 'text': '...'}]), extract the text
if len(content) > 0:
if isinstance(content[0], dict):
# Try to extract the 'text' field
text = content[0].get('text', content[0].get('content', str(content[0])))
return str(text)
else:
return str(content[0])
return ""
elif isinstance(content, dict):
# If it is a dict, extract the text field
return str(content.get('text', content.get('content', str(content))))
else:
# Already a string or another type; convert to string
return str(content) if content is not None else ""
async def close(self):
"""
Closes the database connection.
@@ -346,10 +282,13 @@ async def main():
test_user_id = DEFAULT_GROUP_ID
print(f"正在为用户 {test_user_id} 生成记忆洞察报告...\n")
insight = None
try:
insight = MemoryInsight(user_id=test_user_id)
report = await insight.generate_insight_report()
# Generate the report via the service-layer function
from app.services.user_memory_service import analytics_memory_insight_report
result = await analytics_memory_insight_report(end_user_id=test_user_id)
report = result.get("report", "")
print("--- 记忆洞察报告 ---")
print(report)
print("---------------------")
@@ -379,9 +318,6 @@ async def main():
print(f"写入 User-Dashboard.json 失败: {e}")
except Exception as e:
print(f"生成报告时出错: {e}")
finally:
if insight:
await insight.close()
if __name__ == "__main__":
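The queries in `get_social_connections` above interpolate `self.user_id` directly into the Cypher text with f-strings. A minimal sketch of the same query using a Cypher parameter (`$group_id`) instead is given here. The helper name `build_social_connection_query` is hypothetical, and whether `neo4j_connector.execute_query` accepts a parameter mapping is an assumption that this diff does not confirm.

```python
from typing import Any, Dict, Tuple


def build_social_connection_query(user_id: str) -> Tuple[str, Dict[str, Any]]:
    """Return the Cypher text and its parameters as two separate values.

    Hypothetical helper: the query body mirrors the one in the diff, but the
    user id travels as a parameter instead of being formatted into the text.
    """
    query = """
    MATCH (c1:Chunk {group_id: $group_id})
    OPTIONAL MATCH (c1)-[:CONTAINS]->(s:Statement)
    OPTIONAL MATCH (s)<-[:CONTAINS]-(c2:Chunk)
    WHERE c1.group_id <> c2.group_id AND s IS NOT NULL AND c2 IS NOT NULL
    WITH c2.group_id AS other_user_id, COUNT(DISTINCT s) AS common_statements
    WHERE common_statements > 0
    RETURN other_user_id, common_statements
    ORDER BY common_statements DESC
    LIMIT 1
    """
    return query, {"group_id": user_id}


# Usage: the id never becomes part of the query text, even with quotes in it.
query, params = build_social_connection_query("user'42")
```

Keeping the value out of the query text means an id containing a quote character cannot change the structure of the query.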


@@ -80,7 +80,7 @@ class UserSummary:
async def close(self):
await self.connector.close()
async def _get_recent_statements(self, limit: int = 80) -> List[StatementRecord]:
async def _get_recent_statements(self, limit: int = 80) -> List[StatementRecord]: # TODO Used by user_memory_service
"""Fetch recent statements authored by the user/group for context."""
query = (
"MATCH (s:Statement) "
@@ -100,70 +100,25 @@ class UserSummary:
async def _get_top_entities(self, limit: int = 30) -> List[Tuple[str, int]]:
"""Reuse hot tag logic to get meaningful entities and their frequencies."""
# get_hot_memory_tags internally filters out non-meaningful nouns with LLM
return await get_hot_memory_tags(self.user_id, limit=limit)
return await get_hot_memory_tags(self.user_id, limit=limit) # TODO Used by user_memory_service
async def generate(self) -> str:
"""Generate a Chinese '关于我' style summary using the LLM."""
# 1) Collect context
entities = await self._get_top_entities(limit=40)
statements = await self._get_recent_statements(limit=100)
entity_lines = [f"{name} ({freq})" for name, freq in entities][:20]
statement_samples = [s.statement.strip() for s in statements if (s.statement or '').strip()][:20]
# 2) Compose prompt
system_prompt = (
"你是一位中文信息压缩助手。请基于提供的实体与语句,"
"生成非常简洁的用户摘要,禁止臆测或虚构。要求:\n"
"- 3-4 句,总字数不超过 120\n"
"- 先交代身份/城市,其次长期兴趣或习惯,最后给一两项代表性经历;\n"
"- 避免形容词堆砌与空话,不用项目符号,不分段;\n"
"- 使用客观的第三人称描述,语气克制、中立。"
)
user_content_parts = [
f"用户ID: {self.user_id}",
"核心实体与频次: " + (", ".join(entity_lines) if entity_lines else "(空)"),
"代表性语句样本: " + (" | ".join(statement_samples) if statement_samples else "(空)"),
]
user_prompt = "\n".join(user_content_parts)
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
]
# 3) Call LLM
response = await self.llm.chat(messages=messages)
async def generate_user_summary(user_id: str | None = None) -> str: # TODO useless
"""
Convenience function for generating a user summary.
Args:
user_id: Optional user ID
# Ensure a string is returned
content = response.content
if isinstance(content, list):
# If it is a list (e.g. [{'type': 'text', 'text': '...'}]), extract the text
if len(content) > 0:
if isinstance(content[0], dict):
# Try to extract the 'text' field
text = content[0].get('text', content[0].get('content', str(content[0])))
return str(text)
else:
return str(content[0])
return ""
elif isinstance(content, dict):
# If it is a dict, extract the text field
return str(content.get('text', content.get('content', str(content))))
else:
# Already a string or another type; convert to string
return str(content) if content is not None else ""
async def generate_user_summary(user_id: str | None = None) -> str:
# Defaults to the value read from the environment variable
effective_group_id = user_id or DEFAULT_GROUP_ID
svc = UserSummary(effective_group_id)
try:
return await svc.generate()
finally:
await svc.close()
Returns:
The user summary string
"""
# Import the service-layer function
from app.services.user_memory_service import analytics_user_summary
# Call the service-layer function
result = await analytics_user_summary(user_id)
return result.get("summary", "")
if __name__ == "__main__":
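The response-normalization branch removed from `generate()` in this file is the same logic that was removed from `generate_insight_report()`. A minimal sketch of how that logic could live in one shared helper follows. The name `extract_text` is hypothetical, and where the service layer actually performs this step is not shown in this diff.

```python
from typing import Any


def extract_text(content: Any) -> str:
    """Coerce an LLM response payload into a plain string.

    Mirrors the removed branches: a list contributes only its first item,
    a dict is read via 'text' then 'content', anything else is str()-ed.
    """
    if content is None:
        return ""
    if isinstance(content, list):
        if not content:
            return ""
        content = content[0]  # only the first item is used, as in the diff
    if isinstance(content, dict):
        return str(content.get("text", content.get("content", str(content))))
    return str(content)


# Usage with the list-of-parts shape mentioned in the removed comments.
summary = extract_text([{"type": "text", "text": "hello"}])
```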


@@ -316,3 +316,73 @@ async def render_emotion_suggestions_prompt(
})
return rendered_prompt
async def render_user_summary_prompt(
user_id: str,
entities: str,
statements: str
) -> str:
"""
Renders the user summary prompt using the user_summary.jinja2 template.
Args:
user_id: User identifier
entities: Core entities with frequency information
statements: Representative statement samples
Returns:
Rendered prompt content as string
"""
template = prompt_env.get_template("user_summary.jinja2")
rendered_prompt = template.render(
user_id=user_id,
entities=entities,
statements=statements
)
# Log the rendered result to the prompt log
log_prompt_rendering('user summary', rendered_prompt)
# Optional: log template rendering details
log_template_rendering('user_summary.jinja2', {
'user_id': user_id,
'entities_len': len(entities),
'statements_len': len(statements)
})
return rendered_prompt
async def render_memory_insight_prompt(
domain_distribution: str | None = None,
active_periods: str | None = None,
social_connections: str | None = None
) -> str:
"""
Renders the memory insight prompt using the memory_insight.jinja2 template.
Args:
domain_distribution: Core domain distribution information
active_periods: Active period information
social_connections: Social connection information
Returns:
Rendered prompt content as string
"""
template = prompt_env.get_template("memory_insight.jinja2")
rendered_prompt = template.render(
domain_distribution=domain_distribution,
active_periods=active_periods,
social_connections=social_connections
)
# Log the rendered result to the prompt log
log_prompt_rendering('memory insight', rendered_prompt)
# Optional: log template rendering details
log_template_rendering('memory_insight.jinja2', {
'has_domain_distribution': bool(domain_distribution),
'has_active_periods': bool(active_periods),
'has_social_connections': bool(social_connections)
})
return rendered_prompt
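`render_memory_insight_prompt` takes three optional strings, while the `MemoryInsight` getters return structured data. A minimal sketch of turning those results into the three string arguments is shown here. The helper name `format_insight_inputs` and the exact wording of the strings are assumptions. Only the `name(percent)` formatting of the top three domains and the `social_conn` keys are taken from the removed `generate_insight_report` code.

```python
from typing import Dict, List, Optional


def format_insight_inputs(
    domain_dist: Optional[Dict[str, float]],
    active_periods: Optional[List[int]],
    social_conn: Optional[dict],
) -> Dict[str, Optional[str]]:
    """Build keyword arguments for render_memory_insight_prompt (hypothetical)."""
    domain_text = None
    if domain_dist:
        # Keep the top three domains, formatted as name(percent).
        top = list(domain_dist.items())[:3]
        domain_text = ", ".join(f"{name}({share:.0%})" for name, share in top)

    periods_text = None
    if active_periods:
        periods_text = ", ".join(str(month) for month in active_periods)

    social_text = None
    if social_conn:
        social_text = (
            f"{social_conn['user_id']}: "
            f"{social_conn['common_memories_count']} shared memories, "
            f"{social_conn['time_range']}"
        )

    return {
        "domain_distribution": domain_text,
        "active_periods": periods_text,
        "social_connections": social_text,
    }


# Usage: a dimension without data stays None, so the template can skip it.
kwargs = format_insight_inputs({"travel": 0.38, "work": 0.24}, [4, 10], None)
```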


@@ -0,0 +1,152 @@
{% macro tidy(name) -%}
{{ name.replace('_', ' ')}}
{%- endmacro %}
===Task===
Your task is to generate a comprehensive memory insight report based on the provided data analysis. The report should include four distinct sections that capture different aspects of the user's memory patterns and characteristics.
===Inputs===
{% if domain_distribution %}
- 核心领域分布: {{ domain_distribution }}
{% endif %}
{% if active_periods %}
- 活跃时段: {{ active_periods }}
{% endif %}
{% if social_connections %}
- 社交关联: {{ social_connections }}
{% endif %}
===Report Generation Requirements===
**General Guidelines:**
1. Base your analysis ONLY on the provided data - do not speculate or fabricate information
2. Use objective third-person descriptions with a professional and analytical tone
3. Avoid excessive adjectives and empty phrases
4. Strictly follow the output format specified below
5. If a dimension lacks data, skip that section or provide a brief note
**Section-Specific Requirements:**
1. **总体概述 (Overview)** (100-150 Chinese characters)
- Focus on: Overall analysis of user profile based on interaction logs
- Describe the user's main role, work network, and collaboration spirit
- Use professional, data-driven language style
- Example reference: "通过对156次交互日志的深度分析,系统发现张三是一位主要从事用户档案和数据分析的产品经理。他的工作网络体现出鲜明的目标导向和团队协作精神。"
2. **行为模式 (Behavior Pattern)** (80-120 Chinese characters)
- Focus on: Work patterns, time regularity, and behavioral characteristics
- Describe weekly work patterns and time preferences
- Use objective, analytical language
- Example reference: "张三的工作模式呈现出鲜明的周期性:周一通常用于规划和会议,周三周四专注于产品设计和用户研究,周五进行总结和复盘。他倾向于在上午进行头脑风暴,下午处理执行性工作。"
3. **关键发现 (Key Findings)** (3-4 bullet points, 30-50 characters each)
- Focus on: Specific, insightful observations about user behavior and preferences
- Use bullet points (•) format
- Each finding should be concrete and data-supported
- Example reference:
"• 在产品决策中,张三总是优先考虑用户反应,这在68%的决策记录中得到体现
• 他善于使用数据可视化工具来支持论点,这种习惯在项目管理中发挥了重要作用
• 团队成员对他的评价中,"思路清晰"和"思路敏捷"两个关键词出现频率最高
• 他对AI机器学习领域保持持续关注,近3个月参加了7次相关培训"
4. **成长轨迹 (Growth Trajectory)** (100-150 Chinese characters)
- Focus on: User's growth journey, key milestones, and capability improvements
- Organize content chronologically
- Highlight role changes and achievements
- Use positive, encouraging tone
- Example reference: "从入职时的产品经理成长为高级产品经理,张三在产品规划、团队管理和技术理解三个方面都有显著提升。特别是在最近一年,他开始独立主导更复杂的项目,展现出更强的战略思维能力。他的成长轨迹显示出对新技术的持续学习和对产品思维的不断深化。"
===Output Format (MUST STRICTLY FOLLOW)===
【总体概述】
[100-150 characters describing overall user profile and work network based on interaction analysis]
【行为模式】
[80-120 characters describing work patterns, time regularity, and behavioral characteristics]
【关键发现】
• [First key finding with data support, 30-50 characters]
• [Second key finding with data support, 30-50 characters]
• [Third key finding with data support, 30-50 characters]
• [Fourth key finding with data support, 30-50 characters]
【成长轨迹】
[100-150 characters describing growth journey, milestones, and capability improvements]
===Example===
Example Input:
- 核心领域分布: 产品管理(38%), 数据分析(24%), 团队协作(21%)
- 活跃时段: 用户在每年的 4 和 10 月最为活跃
- 社交关联: 与用户"李明"拥有最多共同记忆(47条),时间范围主要在 2020-2023
Example Output:
【总体概述】
通过对156次交互日志的深度分析,系统发现张三是一位主要从事用户档案和数据分析的产品经理。他的工作网络体现出鲜明的目标导向和团队协作精神,在产品管理、数据分析和团队协作三个领域都有深入的实践。
【行为模式】
张三的工作模式呈现出鲜明的周期性:周一通常用于规划和会议,周三周四专注于产品设计和用户研究,周五进行总结和复盘。他倾向于在上午进行头脑风暴,下午处理执行性工作。每年4月和10月是他最活跃的时期。
【关键发现】
• 在产品决策中,张三总是优先考虑用户反应,这在68%的决策记录中得到体现
• 他善于使用数据可视化工具来支持论点,这种习惯在项目管理中发挥了重要作用
• 团队成员对他的评价中,"思路清晰"和"思路敏捷"两个关键词出现频率最高
• 他对AI机器学习领域保持持续关注,近3个月参加了7次相关培训
【成长轨迹】
从入职时的产品经理成长为高级产品经理,张三在产品规划、团队管理和技术理解三个方面都有显著提升。特别是在最近一年,他开始独立主导更复杂的项目,展现出更强的战略思维能力。他与李明的47条共同记忆见证了他的成长历程。
===End of Example===
===Reflection Process===
After generating the report, perform the following self-review steps:
**Step 1: Data Grounding Check**
- Verify all statements are supported by the provided data
- Ensure no fabricated or speculated information is included
- Confirm all claims can be traced back to the input data
**Step 2: Format Compliance**
- Verify each section follows the specified format with section headers
- Check character count limits for each section
- Ensure proper use of section markers (【】)
- Verify bullet points format for Key Findings section
**Step 3: Tone and Style Review**
- Confirm objective third-person perspective is maintained
- Check for excessive adjectives or empty phrases
- Verify professional and analytical tone throughout
**Step 4: Completeness Check**
- Ensure all four sections are present and complete
- Verify each section addresses its specific focus area
- Confirm the report provides actionable insights
===Output Requirements===
**LANGUAGE REQUIREMENT:**
- The output language should ALWAYS be Chinese (Simplified)
- All section content must be in Chinese
- Section headers must use the specified Chinese format: 【总体概述】【行为模式】【关键发现】【成长轨迹】
**FORMAT REQUIREMENT:**
- Each section must start with its header on a new line
- Content follows immediately after the header
- Sections are separated by blank lines
- Key Findings section must use bullet points (•)
- Strictly adhere to character limits for each section
**CONTENT REQUIREMENT:**
- Only use provided data points
- Do not fabricate or speculate information
- If data is insufficient for a section, provide a brief note or skip
- Maintain professional, analytical tone throughout
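
The template above asks the model to emit four sections, each introduced by a 【…】 header. A minimal sketch of how a caller might split such a report back into named sections is given here. The function name `parse_sections` is hypothetical, and this diff does not show how the service layer actually post-processes the model output.

```python
import re
from typing import Dict

# A header is any text wrapped in 【】; the body runs until the next header
# or the end of the report.
SECTION_RE = re.compile(r"【([^】]+)】\s*(.*?)(?=【[^】]+】|\Z)", re.S)


def parse_sections(report: str) -> Dict[str, str]:
    """Map each 【header】 to its trimmed body text (hypothetical helper)."""
    return {title: body.strip() for title, body in SECTION_RE.findall(report)}


# Usage with a shortened two-section report.
sample = "【总体概述】\n概述内容\n\n【关键发现】\n• 发现一\n• 发现二\n"
sections = parse_sections(sample)
```

A section the model skipped is simply absent from the result, so a caller can detect missing dimensions with a key lookup.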


@@ -0,0 +1,124 @@
{% macro tidy(name) -%}
{{ name.replace('_', ' ')}}
{%- endmacro %}
===Task===
Your task is to generate a comprehensive user profile based on the provided entities and statements. The profile should include four distinct sections that capture different aspects of the user's identity and characteristics.
===Inputs===
{% if user_id %}
- User ID: {{ user_id }}
{% endif %}
{% if entities %}
- Core Entities & Frequency: {{ entities }}
{% endif %}
{% if statements %}
- Representative Statement Samples: {{ statements }}
{% endif %}
===Profile Generation Requirements===
**General Guidelines:**
1. Base your analysis ONLY on the provided data - do not speculate or fabricate information
2. Use objective third-person descriptions with a restrained and neutral tone
3. Avoid excessive adjectives and empty phrases
4. Strictly follow the output format specified below
**Section-Specific Requirements:**
1. **Basic Introduction** (4-5 sentences, max 150 Chinese characters)
- Focus on: identity, occupation, location, and other basic demographic information
- Provide factual background about who the user is
2. **Personality Traits** (2-3 sentences, max 80 Chinese characters)
- Focus on: personality characteristics, behavioral habits, communication style
- Describe observable patterns in how the user interacts and behaves
3. **Core Values** (1-2 sentences, max 50 Chinese characters)
- Focus on: values, beliefs, goals, and aspirations
- Capture what matters most to the user and what drives their decisions
4. **One-Sentence Summary** (1 sentence, max 40 Chinese characters)
- Provide a highly condensed characterization of the user's core traits
- Similar to a personal tagline or motto that captures their essence
===Output Format (MUST STRICTLY FOLLOW)===
【基本介绍】
[4-5 sentences describing the user's basic identity, occupation, and location]
【性格特点】
[2-3 sentences describing the user's personality traits, behavioral habits, and communication style]
【核心价值观】
[1-2 sentences describing the user's values, beliefs, and goals]
【一句话总结】
[1 sentence providing a highly condensed summary of the user's core characteristics]
===Example===
Example Input:
- User ID: user_12345
- Core Entities & Frequency: 产品经理 (15), AI (12), 深圳 (10), 数据分析 (8), 团队协作 (7)
- Representative Statement Samples: 我在深圳从事产品经理工作已经5年了 | 我相信好的产品源于对用户需求的深刻理解 | 我喜欢在团队中起到协调作用 | 数据驱动决策是我的工作原则
Example Output:
【基本介绍】
我是张三,一名充满热情的高级产品经理。在过去的5年里,我专注于AI和数据驱动的产品设计,致力于创造能够真正改善用户生活的产品。我相信好的产品源于对用户需求的深刻理解和对技术可能性的不断探索。
【性格特点】
性格开朗,善于沟通,注重细节。喜欢在团队中起到协调作用,帮助大家达成共识。面对挑战时保持乐观,相信每个问题都有解决方案。
【核心价值观】
用户至上、数据驱动、持续学习、团队协作
【一句话总结】
"让每一个产品决策都充满温度。"
===End of Example===
===Reflection Process===
After generating the profile, perform the following self-review steps:
**Step 1: Data Grounding Check**
- Verify all statements are supported by the provided entities and statements
- Ensure no fabricated or speculated information is included
- Confirm all claims can be traced back to the input data
**Step 2: Format Compliance**
- Verify each section follows the specified format with section headers
- Check character count limits for each section
- Ensure proper use of section markers (【】)
**Step 3: Tone and Style Review**
- Confirm objective third-person perspective is maintained
- Check for excessive adjectives or empty phrases
- Verify neutral and restrained tone throughout
**Step 4: Completeness Check**
- Ensure all four sections are present and complete
- Verify each section addresses its specific focus area
- Confirm the one-sentence summary effectively captures the user's essence
===Output Requirements===
**LANGUAGE REQUIREMENT:**
- The output language should ALWAYS be Chinese (Simplified)
- All section content must be in Chinese
- Section headers must use the specified Chinese format: 【基本介绍】【性格特点】【核心价值观】【一句话总结】
**FORMAT REQUIREMENT:**
- Each section must start with its header on a new line
- Content follows immediately after the header
- Sections are separated by blank lines
- Strictly adhere to character limits for each section
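
The template above sets a character limit for each of the four sections (150, 80, 50 and 40). A minimal sketch of checking those limits on the caller side is given here, assuming the model output has already been split into a header-to-body mapping. The helper name `find_violations` is hypothetical, and counting with `len()` (code points, punctuation included) is an assumption about how "characters" should be measured.

```python
from typing import Dict, List

# Limits taken from the section-specific requirements in the template.
LIMITS = {
    "基本介绍": 150,
    "性格特点": 80,
    "核心价值观": 50,
    "一句话总结": 40,
}


def find_violations(sections: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means all checks pass."""
    problems = []
    for header, limit in LIMITS.items():
        body = sections.get(header)
        if body is None:
            problems.append(f"missing section: {header}")
        elif len(body) > limit:
            problems.append(f"{header}: {len(body)} > {limit}")
    return problems


# Usage: only the one-sentence summary exceeds its limit here.
profile = {
    "基本介绍": "a" * 150,
    "性格特点": "b",
    "核心价值观": "c",
    "一句话总结": "d" * 41,
}
issues = find_violations(profile)
```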