refactor(memory): enhance extraction ontology and add assistant pruning graph support

- Expand entity type ontology with detailed definitions, examples, and notes
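  (merged types: 地点设施 [place/facility], 物品设备 [item/equipment],
  产品服务 [product/service], 软件平台 [software/platform],
  角色职业 [role/occupation], 知识能力 [knowledge/skill],
  偏好习惯目标 [preference/habit/goal], 称呼别名 [appellation/alias],
  智能体 [agent])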
- Add relation ontology taxonomy with 15 predicate categories and usage rules
- Strengthen reference resolution rules: resolve pronouns before extraction,
  skip unresolvable references entirely
- Add guidelines to avoid extracting abstract propositions, emotions, and
  low-value entities (effort/reward/success patterns)
- Add 7 new extraction examples covering edge cases
- Add AssistantOriginal/AssistantPruned node models and graph persistence
  (PRUNED_TO and BELONGS_TO_DIALOG edges, Neo4j indexes and constraints)
- Add graph_build_step.py for building graph nodes/edges from DialogData
- Update write_pipeline.py to pass assistant pruning nodes/edges to graph saver
- Update data_pruning.py with related preprocessing changes
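
The node and edge additions listed above could look roughly like the sketch below. It is only an illustration: the field names, the ID scheme, the edge directions (original to pruned, both nodes to the dialog) and the helper `build_pruning_graph` are assumptions, not the contents of graph_build_step.py.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AssistantOriginal:
    """Full assistant reply before pruning (field names are assumed)."""
    id: str
    dialog_id: str
    text: str


@dataclass(frozen=True)
class AssistantPruned:
    """Shortened hint kept after pruning (field names are assumed)."""
    id: str
    dialog_id: str
    text: str


@dataclass(frozen=True)
class Edge:
    type: str
    source_id: str
    target_id: str


def build_pruning_graph(
    dialog_id: str, original_text: str, pruned_text: str
) -> Tuple[AssistantOriginal, AssistantPruned, List[Edge]]:
    """Hypothetical helper: build both nodes and the edges that link them."""
    # Deterministic IDs keep repeated writes idempotent (an assumed scheme).
    original = AssistantOriginal(f"{dialog_id}:assistant:original", dialog_id, original_text)
    pruned = AssistantPruned(f"{dialog_id}:assistant:pruned", dialog_id, pruned_text)
    edges = [
        # Assumed direction: the original reply points at its pruned form.
        Edge("PRUNED_TO", original.id, pruned.id),
        # Assumed direction: each assistant node points at its dialog.
        Edge("BELONGS_TO_DIALOG", original.id, dialog_id),
        Edge("BELONGS_TO_DIALOG", pruned.id, dialog_id),
    ]
    return original, pruned, edges
```

Deriving IDs from the dialog ID is one way to make reprocessing the same dialog safe once the uniqueness constraints on `id` are in place.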
lanceyq
2026-04-28 13:32:29 +08:00
parent 2355536b44
commit 7747ed7ac1
11 changed files with 917 additions and 421 deletions

@@ -46,6 +46,12 @@ async def create_fulltext_indexes():
            OPTIONS { indexConfig: { `fulltext.analyzer`: 'cjk' } }
        """)
        # Create a full-text index on the AssistantPruned pruned text
        await connector.execute_query("""
            CREATE FULLTEXT INDEX assistantPrunedFulltext IF NOT EXISTS FOR (p:AssistantPruned) ON EACH [p.text]
            OPTIONS { indexConfig: { `fulltext.analyzer`: 'cjk' } }
        """)
    finally:
        await connector.close()
@@ -135,6 +141,17 @@ async def create_vector_indexes():
                `vector.similarity_function`: 'cosine'
            }}
        """)
        # AssistantPruned text embedding index (optional, for semantic search on pruned hints)
        await connector.execute_query("""
            CREATE VECTOR INDEX assistant_pruned_embedding_index IF NOT EXISTS
            FOR (p:AssistantPruned)
            ON p.text_embedding
            OPTIONS {indexConfig: {
                `vector.dimensions`: 1024,
                `vector.similarity_function`: 'cosine'
            }}
        """)
    finally:
        await connector.close()
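
For context, the two AssistantPruned indexes created above could be queried roughly as follows. This is a sketch under assumptions: the helper names and the returned columns are hypothetical, and only the index names, the `text` property and the 1024-dimension setting come from the diff. `db.index.fulltext.queryNodes` and `db.index.vector.queryNodes` are the standard Neo4j procedures for these index types.

```python
from typing import Any, Dict, List, Tuple

FULLTEXT_INDEX = "assistantPrunedFulltext"          # name from the diff
VECTOR_INDEX = "assistant_pruned_embedding_index"   # name from the diff
EMBEDDING_DIM = 1024                                # matches `vector.dimensions`


def fulltext_search_query(text: str, limit: int = 5) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: keyword search over pruned assistant text."""
    cypher = (
        "CALL db.index.fulltext.queryNodes($index, $text) "
        "YIELD node, score "
        "RETURN node.id AS id, node.text AS text, score "
        "ORDER BY score DESC LIMIT $limit"
    )
    return cypher, {"index": FULLTEXT_INDEX, "text": text, "limit": limit}


def vector_search_query(embedding: List[float], k: int = 5) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: nearest-neighbour search on text_embedding."""
    # Reject vectors that cannot match the index dimensionality.
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(embedding)}")
    cypher = (
        "CALL db.index.vector.queryNodes($index, $k, $embedding) "
        "YIELD node, score "
        "RETURN node.id AS id, node.text AS text, score"
    )
    return cypher, {"index": VECTOR_INDEX, "k": k, "embedding": embedding}
```

Each helper returns a Cypher string plus its parameters, so the pair can be handed to whatever query executor the project uses; whether `connector.execute_query` accepts parameters in this form is not shown in the diff.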
@@ -179,6 +196,22 @@ async def create_unique_constraints():
            """
        )
        # AssistantOriginal.id unique
        await connector.execute_query(
            """
            CREATE CONSTRAINT assistant_original_id_unique IF NOT EXISTS
            FOR (o:AssistantOriginal) REQUIRE o.id IS UNIQUE
            """
        )
        # AssistantPruned.id unique
        await connector.execute_query(
            """
            CREATE CONSTRAINT assistant_pruned_id_unique IF NOT EXISTS
            FOR (p:AssistantPruned) REQUIRE p.id IS UNIQUE
            """
        )
    finally:
        await connector.close()
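
The uniqueness constraints above make `id` a safe key for idempotent writes. A minimal sketch of how a saver might upsert one original/pruned pair with `MERGE` follows; the function name, parameter names and the PRUNED_TO direction are assumptions and do not reflect the actual graph saver in this commit.

```python
from typing import Any, Dict, Tuple

# MERGE on the constrained `id` property matches an existing node or creates
# it, so running the statement twice does not duplicate nodes or the edge.
UPSERT_PRUNING_PAIR = """
MERGE (o:AssistantOriginal {id: $original_id})
SET o.text = $original_text
MERGE (p:AssistantPruned {id: $pruned_id})
SET p.text = $pruned_text
MERGE (o)-[:PRUNED_TO]->(p)
"""


def upsert_pruning_pair(
    original_id: str, original_text: str, pruned_id: str, pruned_text: str
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: return the Cypher and parameters for one pair."""
    params = {
        "original_id": original_id,
        "original_text": original_text,
        "pruned_id": pruned_id,
        "pruned_text": pruned_text,
    }
    return UPSERT_PRUNING_PAIR, params
```

Passing values as parameters rather than formatting them into the string avoids quoting problems with arbitrary assistant text.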