Longer context ≠ Better: Why RAG still matters

Authors: Jeffrey Rengifo and Eduard Martin, from Elastic

Learn why RAG strategies are still relevant and how they can produce the most efficient and better results.

Elasticsearch has native integrations with industry-leading generative AI tools and providers. Check out our webinars on going beyond RAG basics, or on building production-ready apps with the Elastic vector database.

To build the best search solution for your use case, start a free cloud trial or try Elastic on your local machine.


Models with over 1 million tokens of context are nothing new; Google released Gemini 1.5 with a 1M-token context more than a year ago. One million tokens is roughly 2,000 A5 pages, which in many cases is more than all the data we have stored.

So the question arises: "What if I just put everything in the prompt?"

In this article, we will compare RAG against sending everything to a long-context model and letting the LLM analyze the context to answer a question.

You can find the notebook with the full experiment here.

Initial thoughts

Before starting, we can put forward some hypotheses to validate:

  • Convenience: Not many models offer long-context versions, so our options are limited.
  • Performance: An LLM processing 1M tokens should be much slower than Elasticsearch retrieval plus an LLM processing a smaller context.
  • Price: The cost per question should be noticeably higher.
  • Accuracy: A RAG system can effectively filter out noise for us, letting the LLM focus on what matters.

While sending everything as context is an advantage, it raises a question: can you be sure that every document relevant to the query was captured? Elasticsearch gives you the flexibility to combine different strategies to find the right documents: filters, full-text search, semantic search, and hybrid search. A sketch of a hybrid query follows below.
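As an illustration of that flexibility, here is a minimal sketch of a hybrid query combining full-text and semantic retrieval with reciprocal rank fusion (RRF). It assumes Elasticsearch 8.14+ and the text/semantic_text fields from the mappings we create in the next section; the query string is hypothetical.

es_query = {
    "retriever": {
        "rrf": {
            "retrievers": [
                # Lexical (BM25) leg over the raw text field
                {"standard": {"query": {"match": {"text": "error handling in production"}}}},
                # Semantic leg over the inference-backed semantic_text field
                {
                    "standard": {
                        "query": {
                            "semantic": {
                                "field": "semantic_text",
                                "query": "error handling in production",
                            }
                        }
                    }
                },
            ]
        }
    },
    "size": 10,
}

response = es_client.search(index=index_name, body=es_query)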

Test definition

Model / RAG specs

  • LLM model: gemini-2.0-flash
  • Model provider: Google
  • Dataset: Elasticsearch Search Labs articles

For each test case, we will evaluate the following:

  • LLM cost
  • End-to-end latency
  • Answer correctness

Test cases

Using a dataset of Elasticsearch articles, we will test the two strategies, RAG and full context sent to the LLM, on two different types of questions:

  • Textual: the question matches text in the documents verbatim.

  • Non-textual: the question does not appear verbatim in the documents, so the LLM has to reason or gather information from different chunks.

Running the tests

1) Indexing data

Download the dataset in NDJSON format to run the following steps:

The steps and screenshots below are from a cloud-hosted deployment. In the deployment, go to "Overview", scroll down, and click "Upload a file". Then click "here", since we need to add custom mappings.

On the new page, drag and drop the ndjson file containing the dataset, and then click Import.

Then click Advanced, enter the index name, and add the following mappings:

{
  "properties": {
    "text": { "type": "text", "copy_to": "semantic_text" },
    "meta_description": { "type": "keyword", "copy_to": "semantic_text" },
    "title": { "type": "keyword", "copy_to": "semantic_text" },
    "imported_at": { "type": "date" },
    "url": { "type": "keyword" },
    "semantic_text": {
      "type": "semantic_text"
    }
  }
}

Click Import to finish, and wait for the data to be indexed.
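If you prefer to skip the UI, the index can also be created and populated with the Python client. This is a minimal sketch under stated assumptions, not the notebook's exact setup: the connection details and the articles.ndjson file name are placeholders, and the semantic_text field falls back to the cluster's default inference endpoint.

import json

from elasticsearch import Elasticsearch, helpers

es_client = Elasticsearch("https://localhost:9200", api_key="your-api-key")  # placeholder connection
index_name = "technical-articles"

# Create the index with the same mappings as above
es_client.indices.create(
    index=index_name,
    mappings={
        "properties": {
            "text": {"type": "text", "copy_to": "semantic_text"},
            "meta_description": {"type": "keyword", "copy_to": "semantic_text"},
            "title": {"type": "keyword", "copy_to": "semantic_text"},
            "imported_at": {"type": "date"},
            "url": {"type": "keyword"},
            "semantic_text": {"type": "semantic_text"},
        }
    },
)

# Bulk-index the NDJSON dataset: one JSON document per line
with open("articles.ndjson") as f:  # placeholder file name
    actions = (
        {"_index": index_name, "_source": json.loads(line)}
        for line in f
        if line.strip()
    )
    helpers.bulk(es_client, actions)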

2) Textual RAG

I extracted a fragment of the article "Elasticsearch in JavaScript the proper way, part II" to use as the query string.

query_str = """
Let’s now create a test.js file and install our mock client: Now, add a mock for semantic search: We can now create a test for our code, making sure that the Elasticsearch part will always return the same results: Let’s run the tests.
"""

Running a match_phrase query

This is the query we will use to retrieve results from Elasticsearch, using match_phrase search. We pass query_str as the input for the phrase search.

# time, json, es_client, and index_name are imported/defined in earlier notebook cells
textual_rag_summary = {}  # Variable to store results

start_time = time.time()

es_query = {
    "query": {"match_phrase": {"text": {"query": query_str}}},
    "_source": ["title"],
    "highlight": {
        "pre_tags": [""],
        "post_tags": [""],
        "fields": {"title": {}, "text": {}},
    },
    "size": 10,
}

response = es_client.search(index=index_name, body=es_query)
hits = response["hits"]["hits"]

textual_rag_summary["time"] = (
    time.time() - start_time
)  # save time taken to run the query
textual_rag_summary["es_results"] = hits  # save hits

print("ELASTICSEARCH RESULTS: \n", json.dumps(hits, indent=4))

The hits returned by the query:

ELASTICSEARCH RESULTS: 
 [
    {
        "_index": "technical-articles",
        "_id": "tnWcO5cBTbKqUnB5yeVn",
        "_score": 36.27694,
        "_source": {
            "title": "Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs"
        },
        "highlight": {
            "text": [
                "Let\u2019s now create a test.js file and install our mock client: Now, add a mock for semantic search: We can now create a test for our code, making sure that the Elasticsearch part will always return the same results: Let\u2019s run the tests"
            ]
        }
    }
]

This prompt template provides the LLM with the instructions and the context it needs to answer the question. At the end of the prompt, we ask for the article that contains the information we are looking for.

The same prompt template will be used for all the tests.

# LLM prompt template
template = """
  Instructions:

  - You are an assistant for question-answering tasks.
  - Answer questions truthfully and factually using only the context presented.
  - If you don't know the answer, just say that you don't know, don't make up an answer.
  - Use markdown format for code examples.
  - You are correct, factual, precise, and reliable.
  - Answer

  Context:
  {context}

  Question:
  {question}.

  What is the title article?
"""

Running the results through the LLM

The Elasticsearch results are provided to the LLM as context to get the answer we need. We extract the article titles and the highlights relevant to the user's query, and then send the question, titles, and highlights to the LLM to find the answer.

start_time = time.time()

prompt = ChatPromptTemplate.from_template(template)

context = ""

for hit in hits:
    # For semantic_text matches, we need to extract the text from the highlighted field
    if "highlight" in hit:
        highlighted_texts = []

        for values in hit["highlight"].values():
            highlighted_texts.extend(values)

        context += f"{hit['_source']['title']}\n"
        context += "\n --- \n".join(highlighted_texts)

# Use LangChain for the LLM part
chain = prompt | llm | StrOutputParser()

printable_prompt = prompt.format(context=context, question=query_str)
print("PROMPT WITH CONTEXT AND QUESTION:\n ", printable_prompt)  # Print prompt

with get_openai_callback() as cb:
    response = chain.invoke({"context": context, "question": query_str})

# Save results
textual_rag_summary["answer"] = response
textual_rag_summary["total_time"] = (time.time() - start_time) + textual_rag_summary[
    "time"
]  # Sum of time taken to run the semantic search and the LLM
textual_rag_summary["tokens_sent"] = cb.prompt_tokens
textual_rag_summary["cost"] = calculate_cost(
    input_tokens=cb.prompt_tokens, output_tokens=cb.completion_tokens
)

print("LLM Response:\n ", response)

LLM response:

 What is the title article?

LLM Response:
  Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs

The model found the correct article.

3) Textual LLM

Match all query

To provide context to the LLM, we will take the documents from the Elasticsearch index. We send all 303 indexed articles, around 1M tokens in total.

textual_llm_summary = {}  # Variable to store results

start_time = time.time()

es_query = {"query": {"match_all": {}}, "sort": [{"title": "asc"}], "size": 1000}

es_results = es_client.search(index=index_name, body=es_query)
hits = es_results["hits"]["hits"]

# Save results
textual_llm_summary["es_results"] = hits
textual_llm_summary["time"] = time.time() - start_time

print("ELASTICSEARCH RESULTS: \n", json.dumps(hits, indent=4))
ELASTICSEARCH RESULTS: 
 [
    {
        "_index": "technical-articles",
        "_id": "J3WUI5cBTbKqUnB5J83I",
        "_score": null,
        "_source": {
            "meta_description": ".NET articles from Elasticsearch Labs",
            "imported_at": "2025-05-30T18:43:20.036600",
            "text": "Tutorials Examples Integrations Blogs Start free trial .NET Categories All Articles Agent AutoOps ... API Reference Elastic.co Change theme Change theme Sitemap RSS 2025. Elasticsearch B.V. All Rights Reserved.",
            "title": ".NET - Elasticsearch Labs",
            "url": "https://www.elastic.co/search-labs/blog/category/dot-net-programming"
        },
        "sort": [
            ".NET - Elasticsearch Labs"
        ]
    },
   ... 
]

Running the results through the LLM

As in the previous step, we provide the context to the LLM and ask for the answer.

start_time = time.time()

prompt = ChatPromptTemplate.from_template(template)
# Use LangChain for the LLM part
chain = prompt | llm | StrOutputParser()

printable_prompt = prompt.format(context=hits, question=query_str)  # print the same context we send below
print("PROMPT:\n ", printable_prompt)  # Print prompt

with get_openai_callback() as cb:
    response = chain.invoke({"context": hits, "question": query_str})

# Save results
textual_llm_summary["answer"] = response
textual_llm_summary["total_time"] = (time.time() - start_time) + textual_llm_summary[
    "time"
]  # Sum of time taken to run the match_all query and the LLM
textual_llm_summary["tokens_sent"] = cb.prompt_tokens
textual_llm_summary["cost"] = calculate_cost(
    input_tokens=cb.prompt_tokens, output_tokens=cb.completion_tokens
)

print("LLM Response:\n ", response)  # Print LLM response

LLM response:

...  
What is the title article?

LLM Response:
  The title of the article is "Testing your Java code with mocks and real Elasticsearch".

This answer is wrong: the fragment we used as the question comes from "Elasticsearch in JavaScript the proper way, part II". With the full, unfiltered context, the model lost focus.

4) Non-textual RAG

For the second test, we will retrieve results from Elasticsearch using a semantic query. For that, we wrote a short summary of the article "Elasticsearch in JavaScript, the proper way, part II" to use as the query_str that serves as input for RAG.

query_str = "This article explains how to improve code reliability. It includes techniques for error handling, and running applications without managing servers."

From here on, the code mostly follows the same pattern as the textual query test, so for these sections we will reference the code in the notebook.

Running semantic search

Notebook reference: 2. Run Comparisons > Test 2: Semantic Query > Executing semantic search.
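For reference, here is a minimal sketch of what that retrieval step can look like, assuming a semantic query against the semantic_text field from our mappings; the exact query is in the notebook.

es_query = {
    # Semantic query against the inference-backed semantic_text field
    "query": {"semantic": {"field": "semantic_text", "query": query_str}},
    "_source": ["title"],
    # Highlighting a semantic_text field returns its most relevant chunks
    "highlight": {"fields": {"semantic_text": {}}},
    "size": 10,
}

response = es_client.search(index=index_name, body=es_query)
hits = response["hits"]["hits"]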

The hits returned by the semantic search:

ELASTICSEARCH RESULTS: 
 [
...
    {
        "_index": "technical-articles",
        "_id": "KHV7MpcBTbKqUnB5TN-F",
        "_score": 0.07619048,
        "_source": {
            "title": "Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs"
        },
        "highlight": {
            "text": [
                "We will review: Production best practices Error handling Testing Serverless environments Running the",
                "how long you want to have access to it.",
                "Conclusion In this article, we learned how to handle errors, which is crucial in production environments",
                "DT By: Drew Tate Integrations How To May 21, 2025 Get set, build: Red Hat OpenShift AI applications powered",
                "KB By: Kofi Bartlett Jump to Production best practices Error handling Testing Serverless environments"
            ]
        }
    },
      ...
]

Running the results through the LLM

Notebook reference: 2. Run Comparisons > Test 2: Semantic Query > Run results through LLM

LLM response:

...
  What is the title article?

LLM Response:
  Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs

5) Non-textual LLM

Match all query

Notebook reference: 2. Run Comparisons > Test 2: Semantic Query > Match all query

Results returned by the match all query:

ELASTICSEARCH RESULTS: 
 [
    {
        "_index": "technical-articles",
        "_id": "J3WUI5cBTbKqUnB5J83I",
        "_score": null,
        "_source": {
            "meta_description": ".NET articles from Elasticsearch Labs",
            "imported_at": "2025-05-30T18:43:20.036600",
            "text": "Tutorials Examples Integrations Blogs Start free trial .NET Categories All Articles ... to easily utilize Elasticsearch to build advanced search experiences including generative AI, embedding models, reranking capabilities and more. Let's connect Menu Tutorials Examples Integrations Blogs Search Additional Resources Elasticsearch API Reference Elastic.co Change theme Change theme Sitemap RSS 2025. Elasticsearch B.V. All Rights Reserved.",
            "title": ".NET - Elasticsearch Labs",
            "url": "https://www.elastic.co/search-labs/blog/category/dot-net-programming"
        },
        "sort": [
            ".NET - Elasticsearch Labs"
        ]
    },
...
]

Running the results through the LLM

Notebook reference: 2. Run Comparisons > Test 2: Semantic Query > Run results through LLM

LLM response:

...
 What is the title article?

LLM Response:
  "Elasticsearch in JavaScript the proper way, part II" and "A tutorial on building local agent using LangGraph, LLaMA3 and Elasticsearch vector store from scratch - Elasticsearch Labs" and "Advanced integration tests with real Elasticsearch - Elasticsearch Labs" and "Automatically updating your Elasticsearch index using Node.js and an Azure Function App - Elasticsearch Labs"

Test results

Let's now look at the results of the tests.

Textual query

| # | Strategy | Answer | Tokens sent | Time (s) | LLM cost ($) |
|---|----------|--------|-------------|----------|--------------|
| 0 | Textual RAG | Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs | 237 | 1.281432 | 0.000029 |
| 1 | Textual LLM | The title of the article is "Testing your Java code with mocks and real Elasticsearch" | 1,023,231 | 45.647408 | 0.102330 |

Semantic query

| # | Strategy | Answer | Tokens sent | Time (s) | LLM cost ($) |
|---|----------|--------|-------------|----------|--------------|
| 0 | Semantic RAG | Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs | 1,328 | 0.878199 | 0.000138 |
| 1 | Semantic LLM | "Elasticsearch in JavaScript the proper way, part II" and "A tutorial on building local agent using LangGraph, LLaMA3 and Elasticsearch vector store from scratch - Elasticsearch Labs" and "Advanced integration tests with real Elasticsearch - Elasticsearch Labs" and "Automatically updating your Elasticsearch index using Node.js and an Azure Function App - Elasticsearch Labs" | 1,023,196 | 44.386912 | 0.102348 |


Conclusion

RAG is still highly relevant. Our tests show that sending unfiltered data to a large-context model is worse than using a RAG system in price, latency, and accuracy. When handling very large contexts, models often lose focus on what matters.

Even with the power of LLMs, filtering information before sending it remains critical, since sending too many tokens degrades answer quality. However, when you cannot filter in advance, or the answer depends on a large amount of data, long-context LLMs are still valuable.

It is also important to use the right queries in your RAG system to get complete and accurate answers. You can test different query parameters, retrieving different numbers of documents, until you find what works best for your use case.

  • Convenience: With RAG, the average number of tokens sent to the LLM was 783, well below the maximum context window of every major model.
  • Performance: RAG queries were significantly faster, about 1 second on average versus roughly 45 seconds for the LLM-only approach.
  • Price: The average cost of a RAG query ($0.00008) was about 1,250x lower than that of the LLM-only approach ($0.1).
  • Accuracy: The RAG system produced accurate answers in every test, while the full-context approach returned inaccurate ones.

Original article: Longer context ≠ Better: Why RAG still matters - Elasticsearch Labs
