fix(http-request,embedding,naive): tighten form-data validation, reduce truncation length to 8000, and disable chunking for Excel

The form-data validation now checks that every item in the list is an HttpFormData instance. The truncation length for embedding inputs is reduced from 8191 to 8000 to leave headroom for tokenizer differences and avoid overflow. Excel parsing now disables chunking by setting chunk_token_num to 0 (previously 12800), aligning with the intended behavior for structured file ingestion.
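The first two changes can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's code. `HttpFormData` is stubbed as a hypothetical dataclass. `validate_form_data`, `truncate_tokens` and `MAX_EMBED_TOKENS` are hypothetical names. Truncation is applied to an already-tokenized list of ints rather than through whatever tokenizer the embedding module actually uses.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical stand-in for the project's HttpFormData type; the real class
# and its fields are not shown in this commit.
@dataclass
class HttpFormData:
    key: str
    value: str


def validate_form_data(items: Any) -> list:
    """Accept only a list in which every element is an HttpFormData."""
    if not isinstance(items, list):
        raise TypeError("form-data must be a list")
    # Check every element, recording the positions of any that fail.
    bad = [i for i, item in enumerate(items) if not isinstance(item, HttpFormData)]
    if bad:
        raise TypeError(f"form-data items at positions {bad} are not HttpFormData")
    return items


# Limit taken from the commit message: 8000, down from 8191.
MAX_EMBED_TOKENS = 8000


def truncate_tokens(tokens: list, limit: int = MAX_EMBED_TOKENS) -> list:
    """Keep at most `limit` tokens, leaving headroom below an 8191-token cap."""
    return tokens[:limit]
```

Because every element is checked, a mixed list such as `[HttpFormData("a", "1"), {"key": "b"}]` is rejected even though its first item is valid.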
Timebomb2018
2026-04-14 16:14:01 +08:00
parent 0965008210
commit e3265e4ba3
3 changed files with 10 additions and 5 deletions


@@ -675,7 +675,7 @@ def chunk(filename, binary=None, from_page=0, to_page=100000,
             parser_config["chunk_token_num"] = 0
         else:
             sections = [(_, "") for _ in excel_parser(binary) if _]
-            parser_config["chunk_token_num"] = 12800
+            parser_config["chunk_token_num"] = 0
     elif re.search(r"\.(txt|py|js|java|c|cpp|h|php|go|ts|sh|cs|kt|sql)$", filename, re.IGNORECASE):
         callback(0.1, "Start to parse.")
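In the hunk above, the Excel branch of `chunk()` changes `parser_config["chunk_token_num"]` from 12800 to 0. The effect depends on how the downstream merge step reads that value, and this hunk does not show that step. The following is only a sketch. It assumes a greedy merger that closes a chunk once it reaches the token budget and approximates token counts by whitespace splitting. `merge_sections` is a hypothetical name, not a function from this repository. Under those assumptions, a budget of 0 keeps every spreadsheet section as its own chunk.

```python
def merge_sections(sections: list, chunk_token_num: int) -> list:
    """Greedily pack sections into chunks of roughly chunk_token_num tokens.

    Token counts are approximated by whitespace splitting (an assumption).
    A chunk is closed once its size reaches the budget, so with
    chunk_token_num == 0 every section starts a new chunk and nothing merges.
    """
    chunks = []
    sizes = []
    for sec in sections:
        n = len(sec.split())
        if not chunks or sizes[-1] >= chunk_token_num:
            # Current chunk is full (or none exists yet): start a new one.
            chunks.append(sec)
            sizes.append(n)
        else:
            # Append to the current chunk and grow its running size.
            chunks[-1] += "\n" + sec
            sizes[-1] += n
    return chunks


rows = ["a b", "c d", "e f"]
print(merge_sections(rows, 0))      # each row stays separate
print(merge_sections(rows, 12800))  # all rows merged into one chunk
```

With the old budget of 12800, small rows would be packed together. With 0, the row boundaries produced by the Excel parser are preserved.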