feat(multimodel): support multimodal memory display and improve code style

2026-03-13 13:33:58 +08:00
parent cbc8714414
commit b71bc1f875
31 changed files with 877 additions and 543 deletions
--- a/api/app/services/prompt/perceptual_summary_system.jinja2
+++ b/api/app/services/prompt/perceptual_summary_system.jinja2
@@ -0,0 +1,53 @@
+{% raw %}You are a professional information extraction system.
+
+Your task is to analyze the provided document content and generate structured metadata.
+
+Extract the following fields:
+
+* **summary**: A concise summary of the document in 2–4 sentences.
+* **keywords**: 5–10 important keywords or key phrases that best represent the document. This field MUST be a JSON array of strings.
+* **topic**: The primary topic of the document expressed as a short phrase (3–8 words).
+* **domain**: The broader knowledge domain or field the document belongs to (e.g., Artificial Intelligence, Computer Science, Finance, Healthcare, Education, Law, etc.).
+
+STRICT RULES:
+
+1. Output MUST be valid JSON.
+2. Do NOT output markdown.
+3. Do NOT output explanations.
+4. Do NOT output any text before or after the JSON.
+5. The JSON MUST contain EXACTLY these keys:
+   * summary
+   * keywords
+   * topic
+   * domain{% endraw %}
+{% if file_type == 'image' or file_type == 'video' %}   * scene{% endif %}
+{% if file_type == 'audio' %}   * speaker_count{% endif %}
+{% if file_type == 'document' %}   * section_count
+   * title
+   * first_line
+{% endif %}
+{% raw %}
+6. `keywords` MUST be a JSON array of strings.
+7. If the document content is insufficient, infer the best possible answer based on context.
+8. Ensure the JSON is syntactically correct.
+{% endraw %}
+9. Write the values of all string fields in {{ language }}.
+{% raw %}
+Required JSON format:
+
+{
+"summary": "string",
+"keywords": ["keyword1", "keyword2", "keyword3", "keyword4", "keyword5"],
+"topic": "string",
+"domain": "string",
+{% endraw %}
+{% if file_type == 'image' or file_type == 'video' %} "scene": ["string", "string"] {% endif %}
+{% if file_type == 'document' %} "section_count": integer,
+"title": "string",
+"first_line": "string"
+{% endif %}
+{% if file_type == 'audio' %} "speaker_count": integer {% endif %}
+{% raw %}
+}
+
+Now analyze the following document and return the JSON result.{% endraw %}
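
The prompt above asks the model for strict JSON whose key set depends on `file_type`. A caller would probably want to check the reply before storing it. The following is a minimal sketch of such a check, using only the Python standard library. The helper name `validate_summary` and the constants `BASE_KEYS` and `EXTRA_KEYS` are hypothetical and not part of this commit; the key sets are copied from the template's conditionals.

```python
import json

# Keys every reply must carry, per rule 5 of the prompt.
BASE_KEYS = {"summary", "keywords", "topic", "domain"}

# Extra keys the template adds for each file_type value.
EXTRA_KEYS = {
    "image": {"scene"},
    "video": {"scene"},
    "audio": {"speaker_count"},
    "document": {"section_count", "title", "first_line"},
}


def validate_summary(raw: str, file_type: str) -> dict:
    """Parse a model reply and check it against the prompt's rules.

    Hypothetical helper for illustration only.
    Raises ValueError when the reply breaks a rule.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    # Unknown file types fall back to the base keys only.
    expected = BASE_KEYS | EXTRA_KEYS.get(file_type, set())
    if set(data) != expected:
        missing = sorted(expected - set(data))
        unexpected = sorted(set(data) - expected)
        raise ValueError(f"key mismatch: missing={missing}, unexpected={unexpected}")

    # Rule 6: keywords must be a JSON array of strings.
    keywords = data["keywords"]
    if not isinstance(keywords, list) or not all(isinstance(k, str) for k in keywords):
        raise ValueError("keywords must be a JSON array of strings")
    return data
```

Checking for an exact key match, rather than a superset, mirrors the "EXACTLY" wording of rule 5; a more lenient caller could instead drop unexpected keys.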