Merge pull request #686 from SuanmoSuanyangTechnology/feature/user-alias
Feature/user alias
@@ -39,6 +39,30 @@
|
||||
比如:输入历史信息内容:[{'Query': '4月27日,我和你推荐过一本书,书名是什么?', 'Answer': '张曼玉推荐了《小王子》'}]
|
||||
拆分问题:4月27日,我和你推荐过一本书,书名是什么?,可以拆分为:4月27日,张曼玉推荐过一本书,书名是什么?
|
||||
|
||||
## 指代消歧规则(Coreference Resolution):
|
||||
在拆分问题时,必须解析并替换所有指代词和抽象称呼,使问题具体化:
|
||||
|
||||
1. **"用户"的消歧**:
|
||||
- "用户是谁?" → 分析历史记录,找出对话发起者的姓名
|
||||
- 如果历史中有"我叫X"、"我的名字是X"、或多次提到某个人物,则"用户"指的就是这个人
|
||||
- 示例:历史中有"老李的原名叫李建国",则"用户是谁?"应拆分为"李建国是谁?"或"老李(李建国)是谁?"
|
||||
|
||||
2. **"我"的消歧**:
|
||||
- "我喜欢什么?" → 从历史中找出对话发起者的姓名,替换为"X喜欢什么?"
|
||||
- 示例:历史中有"张曼玉推荐了《小王子》",则"我推荐的书是什么?"应拆分为"张曼玉推荐的书是什么?"
|
||||
|
||||
3. **"他/她/它"的消歧**:
|
||||
- 从上下文或历史中找出最近提到的同类实体
|
||||
- 示例:历史中有"老李的同事叫他建国哥",则"他的同事怎么称呼他?"应拆分为"老李的同事怎么称呼他?"
|
||||
|
||||
4. **"那个人/这个人"的消歧**:
|
||||
- 从历史中找出最近提到的人物
|
||||
- 示例:历史中有"李建国",则"那个人的原名是什么?"应拆分为"李建国的原名是什么?"
|
||||
|
||||
5. **优先级**:
|
||||
- 如果历史记录中反复出现某个人物(如"老李"、"李建国"、"建国哥"),则"用户"很可能指的就是这个人
|
||||
- 如果无法从历史中确定指代对象,保留原问题,但在reason中说明"无法确定指代对象"
|
||||
|
||||
|
||||
|
||||
输出要求:
|
||||
@@ -71,6 +95,34 @@
|
||||
"reason": "输出原问题的关键要素"
|
||||
}
|
||||
]
|
||||
|
||||
## 指代消歧示例(重要):
|
||||
示例1 - "用户"的消歧:
|
||||
输入历史:[{'Query': '老李的原名叫什么?', 'Answer': '李建国'}, {'Query': '老李的同事叫他什么?', 'Answer': '建国哥'}]
|
||||
输入问题:"用户是谁?"
|
||||
输出:
|
||||
[
|
||||
{
|
||||
"original_question": "用户是谁?",
|
||||
"extended_question": "李建国是谁?",
|
||||
"type": "单跳",
|
||||
"reason": "历史中反复提到'老李/李建国/建国哥','用户'指的就是对话发起者李建国"
|
||||
}
|
||||
]
|
||||
|
||||
示例2 - "我"的消歧:
|
||||
输入历史:[{'Query': '张曼玉推荐了什么书?', 'Answer': '《小王子》'}]
|
||||
输入问题:"我推荐的书是什么?"
|
||||
输出:
|
||||
[
|
||||
{
|
||||
"original_question": "我推荐的书是什么?",
|
||||
"extended_question": "张曼玉推荐的书是什么?",
|
||||
"type": "单跳",
|
||||
"reason": "历史中提到张曼玉推荐了书,'我'指的就是张曼玉"
|
||||
}
|
||||
]
|
||||
|
||||
**Output format**
|
||||
**CRITICAL JSON FORMATTING REQUIREMENTS:**
|
||||
1. Use only standard ASCII double quotes (") for JSON structure - never use Chinese quotation marks (“”) or other Unicode quotes
|
||||
|
||||
@@ -27,6 +27,30 @@
|
||||
比如:输入历史信息内容:[{'Query': '4月27日,我和你推荐过一本书,书名是什么?', 'Answer': '张曼玉推荐了《小王子》'}]
|
||||
拆分问题:4月27日,我和你推荐过一本书,书名是什么?,可以拆分为:4月27日,张曼玉推荐过一本书,书名是什么?
|
||||
|
||||
## 指代消歧规则(Coreference Resolution):
|
||||
在拆分问题时,必须解析并替换所有指代词和抽象称呼,使问题具体化:
|
||||
|
||||
1. **"用户"的消歧**:
|
||||
- "用户是谁?" → 分析历史记录,找出对话发起者的姓名
|
||||
- 如果历史中有"我叫X"、"我的名字是X"、或多次提到某个人物(如"老李"、"李建国"),则"用户"指的就是这个人
|
||||
- 示例:历史中反复出现"老李/李建国/建国哥",则"用户是谁?"应拆分为"李建国是谁?"或"老李(李建国)是谁?"
|
||||
|
||||
2. **"我"的消歧**:
|
||||
- "我喜欢什么?" → 从历史中找出对话发起者的姓名,替换为"X喜欢什么?"
|
||||
- 示例:历史中有"张曼玉推荐了《小王子》",则"我推荐的书是什么?"应拆分为"张曼玉推荐的书是什么?"
|
||||
|
||||
3. **"他/她/它"的消歧**:
|
||||
- 从上下文或历史中找出最近提到的同类实体
|
||||
- 示例:历史中有"老李的同事叫他建国哥",则"他的同事怎么称呼他?"应拆分为"老李的同事怎么称呼他?"
|
||||
|
||||
4. **"那个人/这个人"的消歧**:
|
||||
- 从历史中找出最近提到的人物
|
||||
- 示例:历史中有"李建国",则"那个人的原名是什么?"应拆分为"李建国的原名是什么?"
|
||||
|
||||
5. **优先级**:
|
||||
- 如果历史记录中反复出现某个人物(如"老李"、"李建国"、"建国哥"),则"用户"很可能指的就是这个人
|
||||
- 如果无法从历史中确定指代对象,保留原问题,但在reason中说明"无法确定指代对象"
|
||||
|
||||
## 指令:
|
||||
你是一个智能数据拆分助手,请根据数据特性判断输入属于哪种类型:
|
||||
单跳(Single-hop)
|
||||
@@ -151,6 +175,34 @@
|
||||
]
|
||||
- 必须通过json.loads()的格式支持的形式输出
|
||||
- 必须通过json.loads()的格式支持的形式输出,响应必须是与此确切模式匹配的有效JSON对象。不要在JSON之前或之后包含任何文本。
|
||||
|
||||
## 指代消歧示例(重要):
|
||||
示例1 - "用户"的消歧:
|
||||
输入历史:[{'Query': '老李的原名叫什么?', 'Answer': '李建国'}, {'Query': '老李的同事叫他什么?', 'Answer': '建国哥'}]
|
||||
输入问题:"用户是谁?"
|
||||
输出:
|
||||
[
|
||||
{
|
||||
"id": "Q1",
|
||||
"question": "李建国是谁?",
|
||||
"type": "单跳",
|
||||
"reason": "历史中反复提到'老李/李建国/建国哥','用户'指的就是对话发起者李建国"
|
||||
}
|
||||
]
|
||||
|
||||
示例2 - "我"的消歧:
|
||||
输入历史:[{'Query': '张曼玉推荐了什么书?', 'Answer': '《小王子》'}]
|
||||
输入问题:"我推荐的书是什么?"
|
||||
输出:
|
||||
[
|
||||
{
|
||||
"id": "Q1",
|
||||
"question": "张曼玉推荐的书是什么?",
|
||||
"type": "单跳",
|
||||
"reason": "历史中提到张曼玉推荐了书,'我'指的就是张曼玉"
|
||||
}
|
||||
]
|
||||
|
||||
- 关键的JSON格式要求
|
||||
1.JSON结构仅使用标准ASCII双引号(")-切勿使用中文引号(“”)或其他Unicode引号
2.如果提取的语句文本包含引号,请使用反斜杠(\")正确转义它们
|
||||
|
||||
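The output contract in this prompt (a JSON array that `json.loads()` accepts, with no text before or after it, and items carrying `id`, `question`, `type`, and `reason` as in the examples) can be checked mechanically before the result is used. The following is a minimal sketch under that assumption; `parse_split_output` and `REQUIRED_KEYS` are hypothetical names for illustration and are not part of this PR.

```python
import json

# Field names taken from the examples in the prompt above.
REQUIRED_KEYS = {"id", "question", "type", "reason"}


def parse_split_output(raw: str) -> list:
    """Parse a model response and check that it matches the documented shape."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) when the
    # response uses curly quotes or has extra text around the JSON.
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of question objects")
    for item in items:
        if not isinstance(item, dict):
            raise ValueError("each array element must be a JSON object")
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
    return items
```

A caller could catch `ValueError` and retry or fall back to the original question, depending on how strict the pipeline needs to be.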
@@ -176,6 +176,22 @@ async def write(
|
||||
)
|
||||
    if success:
        logger.info("Successfully saved all data to Neo4j")

        # Sync user aliases to PostgreSQL
        try:
            # Create a temporary orchestrator instance to call the sync method
            temp_orchestrator = ExtractionOrchestrator(
                llm_client=llm_client,
                embedder_client=embedder_client,
                connector=neo4j_connector,
                embedding_id=embedding_model_id
            )
            await temp_orchestrator._update_end_user_other_name(all_entity_nodes, chunked_dialogs)
            logger.info("Successfully synced user aliases to PostgreSQL")
        except Exception as sync_error:
            logger.error(f"Failed to sync user aliases to PostgreSQL: {sync_error}", exc_info=True)
            # A sync failure must not interrupt the main flow

        # After a successful write, wait synchronously for clustering to finish
        # (avoids concurrency conflicts with Memory Summary)
        await _trigger_clustering_sync(
            all_entity_nodes,
|
||||
|
||||
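The alias sync in this hunk is treated as a best-effort step: a failure is logged with its traceback and the write path carries on. A minimal sketch of that pattern in isolation follows; `run_best_effort` is a hypothetical helper, not a function from this repository.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def run_best_effort(step_name, coro_factory) -> bool:
    """Await an optional async step; log and swallow any failure.

    Returns True when the step succeeded and False when it raised, so the
    caller can continue with the main flow either way.
    """
    try:
        await coro_factory()
        return True
    except Exception as exc:
        logger.error("%s failed: %s", step_name, exc, exc_info=True)
        return False
```

Passing a factory rather than an already created coroutine keeps the creation of the coroutine inside the `try`, so errors raised while building it are also contained.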
@@ -203,6 +203,7 @@ def accurate_match(
|
||||
) -> Tuple[List[ExtractedEntityNode], Dict[str, str], Dict[str, Dict]]:
    """
    Exact match: merge entities by (end_user_id, name, entity_type) and build the redirect and merge records.
    Also check whether an entity's name hits another entity's aliases; if it does, merge them directly.
    Returns: (deduped_entities, id_redirect, exact_merge_map)
    """
    exact_merge_map: Dict[str, Dict] = {}
|
||||
@@ -240,6 +241,48 @@ def accurate_match(
|
||||
pass
|
||||
|
||||
    deduped_entities = list(canonical_map.values())

    # 2) Second pass: check whether an entity's name hits another entity's aliases (exact alias-to-name merge)
    # Scenario: the LLM extracts a word that is already in aliases (e.g. "齐齐") as a separate entity, which must be merged away here
    # Optimization: first build a reverse index (end_user_id, alias_lower) -> canonical so each lookup is O(1)
    alias_index: Dict[tuple, ExtractedEntityNode] = {}
    for canonical in deduped_entities:
        uid = getattr(canonical, "end_user_id", None)
        for alias in (getattr(canonical, "aliases", []) or []):
            alias_lower = alias.strip().lower()
            if alias_lower:
                alias_index[(uid, alias_lower)] = canonical

    i = 0
    while i < len(deduped_entities):
        ent = deduped_entities[i]
        ent_name = (getattr(ent, "name", "") or "").strip().lower()
        ent_uid = getattr(ent, "end_user_id", None)
        canonical = alias_index.get((ent_uid, ent_name))
        # Make sure the match is not the entity itself
        if canonical is not None and canonical.id != ent.id:
            _merge_attribute(canonical, ent)
            id_redirect[ent.id] = canonical.id
            for k, v in list(id_redirect.items()):
                if v == ent.id:
                    id_redirect[k] = canonical.id
            try:
                k = f"{canonical.end_user_id}|{(canonical.name or '').strip()}|{(canonical.entity_type or '').strip()}"
                if k not in exact_merge_map:
                    exact_merge_map[k] = {
                        "canonical_id": canonical.id,
                        "end_user_id": canonical.end_user_id,
                        "name": canonical.name,
                        "entity_type": canonical.entity_type,
                        "merged_ids": set(),
                    }
                exact_merge_map[k]["merged_ids"].add(ent.id)
            except Exception:
                pass
            deduped_entities.pop(i)
        else:
            i += 1

    return deduped_entities, id_redirect, exact_merge_map
|
||||
|
||||
def fuzzy_match(
|
||||
|
||||
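The second pass added to `accurate_match` can be illustrated on its own: build a reverse index from `(end_user_id, lowercased alias)` to the owning entity, then fold any entity whose name hits that index into the owner. The sketch below is a simplified stand-in, not the repository code: `Entity` and `merge_alias_hits` are hypothetical, only aliases are merged (the real code delegates to `_merge_attribute`), and it additionally skips targets that were themselves already merged, which is a choice made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    """Hypothetical stand-in for ExtractedEntityNode with only the fields used here."""
    id: str
    end_user_id: str
    name: str
    aliases: List[str] = field(default_factory=list)


def merge_alias_hits(entities: List[Entity]) -> Tuple[List[Entity], Dict[str, str]]:
    """Merge entities whose name equals another entity's alias for the same user."""
    # Reverse index: (end_user_id, alias_lower) -> entity that owns the alias.
    alias_index: Dict[tuple, Entity] = {}
    for ent in entities:
        for alias in ent.aliases or []:
            key = alias.strip().lower()
            if key:
                alias_index[(ent.end_user_id, key)] = ent

    kept: List[Entity] = []
    id_redirect: Dict[str, str] = {}
    for ent in entities:
        owner = alias_index.get((ent.end_user_id, ent.name.strip().lower()))
        # Skip self-matches and owners that were already merged away.
        if owner is not None and owner.id != ent.id and owner.id not in id_redirect:
            for alias in ent.aliases:
                if alias not in owner.aliases:
                    owner.aliases.append(alias)
            id_redirect[ent.id] = owner.id
        else:
            kept.append(ent)
    return kept, id_redirect
```

With a user entity named "用户" that lists "齐齐" as an alias, a separately extracted entity named "齐齐" for the same `end_user_id` is folded into the user entity, while an entity with the same name under a different `end_user_id` is left alone.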
@@ -19,6 +19,7 @@
|
||||
import asyncio
|
||||
import logging
|
||||
import os
|
||||
import uuid
|
||||
from datetime import datetime
|
||||
from typing import Any, Awaitable, Callable, Dict, List, Optional, Tuple
|
||||
|
||||
@@ -62,6 +63,10 @@ from app.core.memory.storage_services.extraction_engine.pipeline_help import (
|
||||
export_test_input_doc,
|
||||
)
|
||||
from app.core.memory.utils.data.ontology import TemporalInfo
|
||||
from app.db import get_db_context
|
||||
from app.models.end_user_info_model import EndUserInfo
|
||||
from app.repositories.end_user_info_repository import EndUserInfoRepository
|
||||
from app.repositories.end_user_repository import EndUserRepository
|
||||
from app.repositories.neo4j.neo4j_connector import Neo4jConnector
|
||||
|
||||
# Configure logging
|
||||
@@ -1324,6 +1329,151 @@ class ExtractionOrchestrator:
|
||||
perceptual_edges
|
||||
)
|
||||
|
||||
    async def _update_end_user_other_name(
        self,
        entity_nodes: List[ExtractedEntityNode],
        dialog_data_list: List[DialogData]
    ) -> None:
        """
        Read the user entity's final aliases from Neo4j and sync them to the end_user and end_user_info tables.

        Notes:
        1. other_name uses the first alias extracted from the current conversation (preserves chronological order)
        2. aliases are read from Neo4j (preserves completeness)

        Args:
            entity_nodes: list of entity nodes
            dialog_data_list: list of dialog data
        """
        try:
            if not dialog_data_list:
                logger.warning("dialog_data_list is empty; skipping user alias sync")
                return

            end_user_id = dialog_data_list[0].end_user_id
            if not end_user_id:
                logger.warning("end_user_id is empty; skipping user alias sync")
                return

            # 1. Extract the user aliases from the current conversation (keep the LLM's original order, no sorting)
            current_aliases = self._extract_current_aliases(entity_nodes)

            # 2. Fetch the complete aliases from Neo4j (authoritative data source)
            neo4j_aliases = await self._fetch_neo4j_user_aliases(end_user_id)

            if not neo4j_aliases:
                # No aliases in Neo4j; fall back to the aliases extracted from the current conversation
                neo4j_aliases = current_aliases
            if not neo4j_aliases:
                logger.debug(f"aliases is empty; skipping sync: end_user_id={end_user_id}")
                return

            logger.info(f"Aliases extracted from the current conversation: {current_aliases}")
            logger.info(f"Complete aliases in Neo4j: {neo4j_aliases}")

            # 3. Sync to the database
            end_user_uuid = uuid.UUID(end_user_id)
            with get_db_context() as db:
                # Update the end_user table
                end_user = EndUserRepository(db).get_by_id(end_user_uuid)
                if not end_user:
                    logger.warning(f"No user record found for end_user_id={end_user_id}")
                    return

                new_name = self._resolve_other_name(end_user.other_name, current_aliases, neo4j_aliases)
                if new_name is not None:
                    end_user.other_name = new_name
                    logger.info(f"Updated end_user table other_name → {new_name}")
                else:
                    logger.debug(f"end_user table other_name unchanged: {end_user.other_name}")

                # Update or create the end_user_info record
                info = EndUserInfoRepository(db).get_by_end_user_id(end_user_uuid)
                if info:
                    new_name_info = self._resolve_other_name(info.other_name, current_aliases, neo4j_aliases)
                    if new_name_info is not None:
                        info.other_name = new_name_info
                        logger.info(f"Updated end_user_info table other_name → {new_name_info}")
                    if info.aliases != neo4j_aliases:
                        info.aliases = neo4j_aliases
                        logger.info(f"Synced Neo4j aliases to end_user_info: {neo4j_aliases}")
                else:
                    first_alias = current_aliases[0].strip() if current_aliases else ""
                    if first_alias:
                        db.add(EndUserInfo(
                            end_user_id=end_user_uuid,
                            other_name=first_alias,
                            aliases=neo4j_aliases,
                            meta_data={}
                        ))
                        logger.info(f"Created end_user_info record, other_name={first_alias}, aliases={neo4j_aliases}")

                db.commit()

        except Exception as e:
            logger.error(f"Failed to update end_user other_name: {e}", exc_info=True)
|
||||
|
||||
|
||||
|
||||
    def _extract_current_aliases(self, entity_nodes: List[ExtractedEntityNode]) -> List[str]:
        """Extract the user's aliases from the entity nodes (keeps the LLM's original order, no sorting).

        Returns the alias list extracted by the LLM without modification.
        The first alias will be used as other_name.

        Args:
            entity_nodes: list of entity nodes

        Returns:
            List of aliases (in the LLM's original extraction order)
        """
        USER_NAMES = {'用户', '我', 'User', 'I'}
        for entity in entity_nodes:
            # `or ''` guards against a name attribute that is present but None
            if (getattr(entity, 'name', '') or '').strip() in USER_NAMES:
                aliases = getattr(entity, 'aliases', []) or []
                logger.debug(f"Extracted user aliases (original order): {aliases}")
                return aliases
        return []
|
||||
|
||||
|
||||
    async def _fetch_neo4j_user_aliases(self, end_user_id: str) -> List[str]:
        """Query Neo4j for the user entity's complete aliases list."""
        cypher = """
        MATCH (e:ExtractedEntity)
        WHERE e.end_user_id = $end_user_id AND e.name IN ['用户', '我', 'User', 'I']
        RETURN e.aliases AS aliases
        LIMIT 1
        """
        result = await Neo4jConnector().execute_query(cypher, end_user_id=end_user_id)
        if not result:
            logger.debug(f"No user entity found in Neo4j: end_user_id={end_user_id}")
            return []
        aliases = result[0].get('aliases') or []
        if not aliases:
            logger.debug(f"Neo4j user entity has empty aliases: end_user_id={end_user_id}")
        return aliases
|
||||
|
||||
    def _resolve_other_name(
        self,
        current: Optional[str],
        current_aliases: List[str],
        neo4j_aliases: List[str]
    ) -> Optional[str]:
        """
        Decide whether other_name needs updating; return the new value, or None if no update is needed.

        Decision rules:
        - Empty → use the first alias from the current conversation
        - Not in the Neo4j aliases → use the first Neo4j alias (the current name has been removed)
        - Otherwise → keep unchanged (return None)
        """
        if not current or not current.strip():
            return current_aliases[0].strip() if current_aliases else None
        if current not in neo4j_aliases:
            return neo4j_aliases[0].strip() if neo4j_aliases else None

        return None
|
||||
|
||||
async def _run_dedup_and_write_summary(
|
||||
self,
|
||||
dialogue_nodes: List[DialogueNode],
|
||||
|
||||
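The three decision rules of `_resolve_other_name` are easier to follow with concrete inputs. The sketch below restates the rule as a free function so it can be run in isolation; `resolve_other_name` is a hypothetical module-level copy written for illustration, and the sample names are made up.

```python
from typing import List, Optional


def resolve_other_name(
    current: Optional[str],
    current_aliases: List[str],
    neo4j_aliases: List[str],
) -> Optional[str]:
    """Return the new other_name, or None when the stored value should be kept."""
    # Rule 1: nothing stored yet, so take the first alias from this conversation.
    if not current or not current.strip():
        return current_aliases[0].strip() if current_aliases else None
    # Rule 2: the stored name is no longer among the Neo4j aliases, so replace it.
    if current not in neo4j_aliases:
        return neo4j_aliases[0].strip() if neo4j_aliases else None
    # Rule 3: the stored name is still valid.
    return None


# Empty stored name: the first alias of the current conversation wins.
print(resolve_other_name(None, ["张三", "小张"], ["张三", "小张"]))
# Stored name missing from Neo4j: fall back to the first Neo4j alias.
print(resolve_other_name("老张", [], ["张三"]))
# Stored name still present: no update.
print(resolve_other_name("张三", ["小张"], ["张三", "小张"]))
```

Note that `None` serves two purposes here: it means "keep the current value" in rule 3, and it is also what rules 1 and 2 return when there is no alias to fall back on.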
@@ -5,6 +5,15 @@
|
||||
===Task===
|
||||
Extract entities and knowledge triplets from the given statement.
|
||||
|
||||
**⚠️ CRITICAL REQUIREMENTS:**
|
||||
1. **ALIASES ORDER IS CRITICAL**: The FIRST alias in the array will be used as the user's primary display name (other_name). You MUST put the most important/frequently used name FIRST.
|
||||
2. **ALWAYS include aliases field**: Even if empty, you MUST include "aliases": [] in EVERY entity.
|
||||
|
||||
<!-- TODO: v0.2.10 - the denied_aliases feature is temporarily disabled; it will be implemented via a Cypher query
|
||||
2. **DENIED_ALIASES**: When user explicitly denies a name (e.g., "我不叫X", "I'm not called X"), you MUST put X in denied_aliases field, NOT in aliases.
|
||||
3. **ALWAYS include both fields**: Even if empty, you MUST include "aliases": [] and "denied_aliases": [] in EVERY entity.
|
||||
-->
|
||||
|
||||
{% if language == "zh" %}
|
||||
**重要:请使用中文生成实体名称(name)、描述(description)和示例(example)。**
|
||||
{% else %}
|
||||
@@ -18,34 +27,29 @@ Extract entities and knowledge triplets from the given statement.
|
||||
{% if ontology_types %}
|
||||
===Ontology Type Guidance===
|
||||
|
||||
**CRITICAL RULE: You MUST ONLY use the predefined ontology type names listed below for the entity "type" field. Do NOT use any other type names, even if they seem reasonable.**
|
||||
**CRITICAL: Use ONLY predefined type names below. If no exact match, use CLOSEST type. NEVER invent new types.**
|
||||
|
||||
**If no predefined type fits an entity, use the CLOSEST matching predefined type. NEVER invent new type names.**
|
||||
**Type Priority:**
|
||||
1. [场景类型] Scene Types (domain-specific, prefer first)
|
||||
2. [通用类型] General Types (standard ontologies)
|
||||
3. [通用父类] Parent Types (hierarchy context)
|
||||
|
||||
**Type Priority (from highest to lowest):**
|
||||
1. **[场景类型] Scene Types** - Domain-specific types, ALWAYS prefer these first
|
||||
2. **[通用类型] General Types** - Common types from standard ontologies (DBpedia)
|
||||
3. **[通用父类] Parent Types** - Provide type hierarchy context
|
||||
**Rules:**
|
||||
- Type MUST exactly match predefined names
|
||||
- Do NOT modify, translate, or abbreviate type names
|
||||
- Prefer scene types over general types
|
||||
|
||||
**Type Matching Rules:**
|
||||
- Entity type MUST exactly match one of the predefined type names below
|
||||
- Do NOT use types like "Equipment", "Component", "Concept", "Action", "Condition", "Data", "Duration" unless they appear in the predefined list
|
||||
- Do NOT modify, translate, abbreviate, or create variations of type names
|
||||
- Prefer scene types (marked [场景类型]) over general types when both could apply
|
||||
- If uncertain, check the type description to find the best match
|
||||
|
||||
**Predefined Ontology Types:**
|
||||
**Predefined Types:**
|
||||
{{ ontology_types }}
|
||||
|
||||
{% if type_hierarchy_hints %}
|
||||
**Type Hierarchy Reference:**
|
||||
The following shows type inheritance relationships (Child → Parent → Grandparent):
|
||||
**Hierarchy:**
|
||||
{% for hint in type_hierarchy_hints %}
|
||||
- {{ hint }}
|
||||
{% endfor %}
|
||||
{% endif %}
|
||||
|
||||
**ALLOWED Type Names (use EXACTLY one of these, no exceptions):**
|
||||
**ALLOWED Names:**
|
||||
{{ ontology_type_names | join(', ') }}
|
||||
|
||||
{% endif %}
|
||||
@@ -62,66 +66,88 @@ The following shows type inheritance relationships (Child → Parent → Grandpa
|
||||
- **Entity descriptions must be in English**
|
||||
- **Examples must be in English**
|
||||
{% endif %}
|
||||
- **Semantic Memory Classification (is_explicit_memory):**
|
||||
* Set to `true` if the entity represents **explicit/semantic memory**:
|
||||
- **Concepts:** "Machine Learning", "Photosynthesis", "Democracy"
|
||||
- **Knowledge:** "Python Programming Language", "Theory of Relativity"
|
||||
- **Definitions:** "API (Application Programming Interface)", "REST API"
|
||||
- **Principles:** "SOLID Principles", "First Law of Thermodynamics"
|
||||
- **Theories:** "Evolution Theory", "Quantum Mechanics"
|
||||
- **Methods/Techniques:** "Agile Development", "Machine Learning Algorithm"
|
||||
- **Technical Terms:** "Neural Network", "Database"
|
||||
* Set to `false` for:
|
||||
- **People:** "John Smith", "Dr. Wang"
|
||||
- **Organizations:** "Microsoft", "Harvard University"
|
||||
- **Locations:** "Beijing", "Central Park"
|
||||
- **Events:** "2024 Conference", "Project Meeting"
|
||||
- **Specific objects:** "iPhone 15", "Building A"
|
||||
- **Example Generation (IMPORTANT for semantic memory entities):**
|
||||
* For entities where `is_explicit_memory=true`, generate a **concise example (around 20 characters)** to help understand the concept
|
||||
* The example should be:
|
||||
- **Specific and concrete**: Use real-world scenarios or applications
|
||||
- **Brief**: Around 20 characters (can be slightly longer if needed for clarity)
|
||||
- **Semantic Memory (is_explicit_memory):**
|
||||
* `true` for: Concepts, Knowledge, Definitions, Theories, Methods (e.g., "Machine Learning", "REST API")
|
||||
* `false` for: People, Organizations, Locations, Events, Specific objects
|
||||
* For `is_explicit_memory=true`, provide concise example (~20 chars{% if language == "zh" %},使用中文{% endif %})
|
||||
|
||||
**🚨🚨🚨 ALIASES - MANDATORY FIELD 🚨🚨🚨**

**CRITICAL RULES (violations will cause extraction to fail):**

1. **EVERY entity MUST have aliases field:**
   - `"aliases": [...]` - REQUIRED, even if empty `[]`

2. **ALIASES - extraction rules:**
|
||||
{% if language == "zh" %}
|
||||
- **使用中文**
|
||||
- 包含:昵称、全名、简称、别称、网名等
|
||||
- 顺序:**第一个别名将作为用户的主显示名称(other_name),必须把最重要/最常用的名字放在第一位**
|
||||
- 提取顺序:严格按照对话中首次出现的顺序
|
||||
- 示例:
|
||||
* "我叫张三,大家叫我小张" → aliases=["张三", "小张"](张三是第一个,将成为 other_name)
|
||||
* "大家叫我小李,我全名叫李明" → aliases=["小李", "李明"](小李先出现,将成为 other_name)
|
||||
- 空值:如果没有别名,使用 `[]`
|
||||
- 重要:只提取本次对话中明确提到的别名,不要推测或添加未提及的名字
|
||||
{% else %}
|
||||
- **In English**
|
||||
- Include: nicknames, full names, abbreviations, alternative names
|
||||
- Order: **The FIRST alias will be used as the user's primary display name (other_name). Put the most important/frequently used name FIRST**
|
||||
- Extraction order: Strictly follow the order of first appearance in conversation
|
||||
- Examples:
|
||||
* "I'm John, people call me Johnny" → aliases=["John", "Johnny"] (John is first, will become other_name)
|
||||
* "People call me Mike, my full name is Michael" → aliases=["Mike", "Michael"] (Mike appears first, will become other_name)
|
||||
- Empty: If no aliases, use `[]`
|
||||
- Important: Only extract aliases explicitly mentioned in current conversation, do not infer or add unmentioned names
|
||||
{% endif %}
|
||||
* For non-semantic entities (`is_explicit_memory=false`), the example field can be empty
|
||||
- **Aliases Extraction:**
|
||||
|
||||
|
||||
|
||||
3. **USER ENTITY SPECIAL HANDLING:**
|
||||
{% if language == "zh" %}
|
||||
* 别名使用中文
|
||||
- 用户实体的 name 字段:使用 "用户" 或 "我"
|
||||
- 用户的真实姓名:放入 aliases
|
||||
- 示例:
|
||||
* "我叫李明" → name="用户", aliases=["李明"]
|
||||
{% else %}
|
||||
* Aliases should be in English
|
||||
- User entity name field: use "User" or "I"
|
||||
- User's real name: put in aliases
|
||||
- Examples:
|
||||
* "I'm John" → name="User", aliases=["John"]
|
||||
{% endif %}
|
||||
* Include common alternative names, abbreviations and full names
|
||||
* If no aliases exist, use empty array: []
|
||||
- Exclude lengthy quotes, calendar dates, temporal ranges, and temporal expressions
|
||||
- For numeric values: extract as separate entities (instance_of: 'Numeric', name: units, numeric_value: value)
|
||||
Example: £30 → name: 'GBP', numeric_value: 30, instance_of: 'Numeric'
|
||||
|
||||
|
||||
|
||||
4. **ALIASES ORDER:**
|
||||
{% if language == "zh" %}
|
||||
- 顺序优先级:按出现顺序,先出现的在前
|
||||
{% else %}
|
||||
- Order priority: by appearance order, first mentioned comes first
|
||||
{% endif %}
|
||||
|
||||
**EXAMPLES OF CORRECT EXTRACTION:**
|
||||
{% if language == "zh" %}
|
||||
- "我叫张三" → aliases=["张三"] (张三将成为 other_name)
|
||||
- "大家叫我小明,我全名叫李明" → aliases=["小明", "李明"] (小明先出现,将成为 other_name)
|
||||
- "我是李华,网名叫华仔" → aliases=["李华", "华仔"] (李华先出现,将成为 other_name)
|
||||
{% else %}
|
||||
- "I'm John" → aliases=["John"] (John will become other_name)
|
||||
- "People call me Mike, my full name is Michael" → aliases=["Mike", "Michael"] (Mike appears first, will become other_name)
|
||||
- "I'm John Smith, username JSmith" → aliases=["John Smith", "JSmith"] (John Smith appears first, will become other_name)
|
||||
{% endif %}
|
||||
|
||||
- Exclude lengthy quotes, dates, temporal expressions
|
||||
- Numeric values: extract as entities (instance_of: 'Numeric', name: units, numeric_value: value)
|
||||
|
||||
**Triplet Extraction:**
|
||||
- Extract (subject, predicate, object) triplets where:
|
||||
- Subject: main entity performing the action or being described
|
||||
- Predicate: relationship between entities (e.g., 'is', 'works at', 'believes')
|
||||
- Object: entity, value, or concept affected by the predicate
|
||||
- Extract (subject, predicate, object) where subject/object are entities, predicate is relationship
|
||||
{% if language == "zh" %}
|
||||
- subject_name 和 object_name 必须使用中文
|
||||
- subject_name 和 object_name 使用中文
|
||||
{% else %}
|
||||
- subject_name and object_name must be in English (translate if original is in another language)
|
||||
- subject_name and object_name in English
|
||||
{% endif %}
|
||||
- Exclude all temporal expressions from every field
|
||||
- Use ONLY the predicates listed in "Predicate Instructions" (uppercase English tokens)
|
||||
- Do NOT translate predicate tokens
|
||||
- Do NOT include `statement_id` field (assigned automatically)
|
||||
|
||||
**When NOT to extract triplets:**
|
||||
- Non-propositional utterances (emotions, fillers, onomatopoeia)
|
||||
- No clear predicate from the given definitions applies
|
||||
- Standalone noun phrases or checklist items → extract as entities only
|
||||
- Do NOT invent generic predicates (e.g., "IS_DOING", "FEELS", "MENTIONS")
|
||||
|
||||
**If no valid triplet exists:** Return triplets: [], extract entities if present, otherwise both arrays empty.
|
||||
- Use ONLY predicates from "Predicate Instructions" (uppercase tokens)
|
||||
- Exclude temporal expressions, do NOT include `statement_id`
|
||||
- **When NOT to extract:** emotions, fillers, no clear predicate, standalone nouns
|
||||
- **If no valid triplet:** Return triplets: []
|
||||
{%- if predicate_instructions -%}
|
||||
|
||||
**Predicate Instructions:**
|
||||
@@ -207,26 +233,44 @@ Output:
|
||||
{"entity_idx": 0, "name": "三脚架", "type": "Equipment", "description": "摄影器材配件", "example": "", "aliases": ["相机三脚架"], "is_explicit_memory": false}
|
||||
]
|
||||
}
|
||||
|
||||
**Example 4 (别名 - Chinese):** "我的名字是乐力齐,我的小名是齐齐,同事们都叫我小乐"
|
||||
Output:
|
||||
{
|
||||
"triplets": [],
|
||||
"entities": [
|
||||
{"entity_idx": 0, "name": "用户", "type": "Person", "description": "用户本人", "example": "", "aliases": ["乐力齐", "齐齐", "小乐"], "is_explicit_memory": false}
|
||||
]
|
||||
}
|
||||
|
||||
**Example 5 (别名顺序 - Chinese):** "我叫陈思远。对了,我的网名叫「远山」"
|
||||
Output:
|
||||
{
|
||||
"triplets": [],
|
||||
"entities": [
|
||||
{"entity_idx": 0, "name": "用户", "type": "Person", "description": "用户本人", "example": "", "aliases": ["陈思远", "远山"], "is_explicit_memory": false}
|
||||
]
|
||||
}
|
||||
|
||||
|
||||
{% endif %}
|
||||
===End of Examples===
|
||||
|
||||
{% if ontology_types %}
|
||||
**⚠️ REMINDER: The examples above use generic type names for illustration only. You MUST use ONLY the predefined ontology type names from the "ALLOWED Type Names" list above. For example, use "PredictiveMaintenance" instead of "Concept", use "ProductionLine" instead of "Equipment", etc. Map each entity to the closest matching predefined type.**
|
||||
**⚠️ REMINDER: Examples use generic types for illustration. You MUST use predefined types from "ALLOWED Names" above.**
|
||||
{% endif %}
|
||||
|
||||
===Output Format===
|
||||
|
||||
**JSON Requirements:**
|
||||
- Use only ASCII double quotes (") for JSON structure
|
||||
- Never use Chinese quotation marks (“”) or Unicode quotes
|
||||
- Escape quotation marks in text with backslashes (\")
|
||||
- Ensure proper string closure and comma separation
|
||||
- No line breaks within JSON string values
|
||||
- Use ASCII double quotes ("), escape with \"
|
||||
- No Chinese quotes (“”), no line breaks in strings
|
||||
{% if language == "zh" %}
|
||||
- **语言要求:实体名称(name)、描述(description)、示例(example)、subject_name、object_name 必须使用中文**
|
||||
- **语言:name、description、example、subject_name、object_name 使用中文**
|
||||
{% else %}
|
||||
- **Language Requirement: Entity names, descriptions, examples, subject_name, object_name must be in English**
|
||||
- **If the original text is in Chinese, translate all names to English**
|
||||
- **Language: names, descriptions, examples in English (translate if needed)**
|
||||
{% endif %}
|
||||
- **⚠️ ALIASES ORDER: preserve temporal order of appearance**
|
||||
- **🚨 MANDATORY FIELD: EVERY entity MUST include "aliases" field, even if empty array []**
|
||||
|
||||
{{ json_schema }}
|
||||
|
||||
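Because the first alias of the user entity becomes `other_name`, a consumer of this prompt's output may want to verify that every entity carries an `aliases` array before picking that first value. The following is a minimal sketch under that assumption; `primary_name_from_extraction` is a hypothetical helper, and the sample input is the Example 4 output from the prompt.

```python
import json
from typing import Optional

# Names the prompt reserves for the user entity.
USER_NAMES = {"用户", "我", "User", "I"}


def primary_name_from_extraction(raw: str) -> Optional[str]:
    """Check the mandatory aliases field and return the user's first alias, if any."""
    data = json.loads(raw)
    entities = data.get("entities", [])
    for entity in entities:
        # The prompt requires "aliases" on every entity, even when empty.
        if not isinstance(entity.get("aliases"), list):
            raise ValueError(f"entity {entity.get('name')!r} is missing the aliases array")
    for entity in entities:
        if (entity.get("name") or "").strip() in USER_NAMES:
            aliases = entity["aliases"]
            return aliases[0].strip() if aliases else None
    return None


example_4 = (
    '{"triplets": [], "entities": [{"entity_idx": 0, "name": "用户", '
    '"type": "Person", "description": "用户本人", "example": "", '
    '"aliases": ["乐力齐", "齐齐", "小乐"], "is_explicit_memory": false}]}'
)
print(primary_name_from_extraction(example_4))
```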