Longer context ≠ Better: Why RAG still matters

Authors: Jeffrey Rengifo and Eduard Martin, from Elastic

Learn why RAG strategies are still relevant and how they can produce the most efficient and better results.

Elasticsearch has native integrations with industry-leading generative AI tools and providers. Check out our webinars on going beyond RAG basics, or on building production-ready apps with the Elastic vector database.

To build the best search solution for your use case, start a free cloud trial or try Elastic on your local machine.


Models with over 1 million tokens of context are nothing new; Google released Gemini 1.5 with a 1M-token context more than a year ago. One million tokens is roughly 2,000 A5 pages, which in many cases is more than all the data we have stored.

So the question arises: "What if I just put everything in the prompt?"

In this article, we will compare RAG against sending everything to a long-context model and letting the LLM analyze the context to answer a question.

You can find the notebook with the full experiment here.

Initial thoughts

Before starting, we can put forward some hypotheses to validate:

  • Convenience: Not many models offer long-context versions, so our options are limited.
  • Performance: An LLM processing 1M tokens should be much slower than Elasticsearch retrieval plus an LLM processing a smaller context.
  • Price: The cost per question should be noticeably higher.
  • Accuracy: A RAG system can effectively filter out noise for us, letting the LLM focus on what matters.

While sending everything as context is an advantage, it raises a question: can you be sure that every document relevant to the query was captured? Elasticsearch gives you the flexibility to combine different strategies to find the right documents: filters, full-text search, semantic search, and hybrid search. A sketch of a hybrid query follows below.
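As an illustration of that flexibility, here is a minimal sketch of a hybrid query combining full-text and semantic retrieval with reciprocal rank fusion (RRF). It assumes Elasticsearch 8.14+ and the text/semantic_text fields from the mappings we create in the next section; the query string is hypothetical.

es_query = {
    "retriever": {
        "rrf": {
            "retrievers": [
                # Lexical (BM25) leg over the raw text field
                {"standard": {"query": {"match": {"text": "error handling in production"}}}},
                # Semantic leg over the inference-backed semantic_text field
                {
                    "standard": {
                        "query": {
                            "semantic": {
                                "field": "semantic_text",
                                "query": "error handling in production",
                            }
                        }
                    }
                },
            ]
        }
    },
    "size": 10,
}

response = es_client.search(index=index_name, body=es_query)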

Test definition

Model / RAG specs

  • LLM model: gemini-2.0-flash
  • Model provider: Google
  • Dataset: Elasticsearch Search Labs articles

For each test case, we will evaluate the following:

  • LLM cost
  • End-to-end latency
  • Answer correctness

Test cases

Using a dataset of Elasticsearch articles, we will test the two strategies, RAG and full context sent to the LLM, on two different types of questions:

  • Textual: the question matches text in the documents verbatim.

  • Non-textual: the question does not appear verbatim in the documents, so the LLM has to reason or gather information from different chunks.

Running the tests

1) Indexing data

Download the dataset in NDJSON format to run the following steps:

The steps and screenshots below are from a cloud-hosted deployment. In the deployment, go to "Overview", scroll down, and click "Upload a file". Then click "here", since we need to add custom mappings.

On the new page, drag and drop the ndjson file containing the dataset, and then click Import.

Then click Advanced, enter the index name, and add the following mappings:

{
  "properties": {
    "text": { "type": "text", "copy_to": "semantic_text" },
    "meta_description": { "type": "keyword", "copy_to": "semantic_text" },
    "title": { "type": "keyword", "copy_to": "semantic_text" },
    "imported_at": { "type": "date" },
    "url": { "type": "keyword" },
    "semantic_text": {
      "type": "semantic_text"
    }
  }
}

Click Import to finish, and wait for the data to be indexed.
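If you prefer to skip the UI, the index can also be created and populated with the Python client. This is a minimal sketch under stated assumptions, not the notebook's exact setup: the connection details and the articles.ndjson file name are placeholders, and the semantic_text field falls back to the cluster's default inference endpoint.

import json

from elasticsearch import Elasticsearch, helpers

es_client = Elasticsearch("https://localhost:9200", api_key="your-api-key")  # placeholder connection
index_name = "technical-articles"

# Create the index with the same mappings as above
es_client.indices.create(
    index=index_name,
    mappings={
        "properties": {
            "text": {"type": "text", "copy_to": "semantic_text"},
            "meta_description": {"type": "keyword", "copy_to": "semantic_text"},
            "title": {"type": "keyword", "copy_to": "semantic_text"},
            "imported_at": {"type": "date"},
            "url": {"type": "keyword"},
            "semantic_text": {"type": "semantic_text"},
        }
    },
)

# Bulk-index the NDJSON dataset: one JSON document per line
with open("articles.ndjson") as f:  # placeholder file name
    actions = (
        {"_index": index_name, "_source": json.loads(line)}
        for line in f
        if line.strip()
    )
    helpers.bulk(es_client, actions)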

2) Textual RAG

I extracted a fragment of the article "Elasticsearch in JavaScript the proper way, part II" to use as the query string.

query_str = """
Let’s now create a test.js file and install our mock client: Now, add a mock for semantic search: We can now create a test for our code, making sure that the Elasticsearch part will always return the same results: Let’s run the tests.
"""

Running a match_phrase query

This is the query we will use to retrieve results from Elasticsearch, using match_phrase search. We pass query_str as the input for the phrase search.

# time, json, es_client, and index_name are imported/defined in earlier notebook cells
textual_rag_summary = {}  # Variable to store results

start_time = time.time()

es_query = {
    "query": {"match_phrase": {"text": {"query": query_str}}},
    "_source": ["title"],
    "highlight": {
        "pre_tags": [""],
        "post_tags": [""],
        "fields": {"title": {}, "text": {}},
    },
    "size": 10,
}

response = es_client.search(index=index_name, body=es_query)
hits = response["hits"]["hits"]

textual_rag_summary["time"] = (
    time.time() - start_time
)  # save time taken to run the query
textual_rag_summary["es_results"] = hits  # save hits

print("ELASTICSEARCH RESULTS: \n", json.dumps(hits, indent=4))

The hits returned by the query:

ELASTICSEARCH RESULTS: 
 [
    {
        "_index": "technical-articles",
        "_id": "tnWcO5cBTbKqUnB5yeVn",
        "_score": 36.27694,
        "_source": {
            "title": "Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs"
        },
        "highlight": {
            "text": [
                "Let\u2019s now create a test.js file and install our mock client: Now, add a mock for semantic search: We can now create a test for our code, making sure that the Elasticsearch part will always return the same results: Let\u2019s run the tests"
            ]
        }
    }
]

This prompt template provides the LLM with the instructions and the context it needs to answer the question. At the end of the prompt, we ask for the article that contains the information we are looking for.

The same prompt template will be used for all the tests.

# LLM prompt template
template = """
  Instructions:

  - You are an assistant for question-answering tasks.
  - Answer questions truthfully and factually using only the context presented.
  - If you don't know the answer, just say that you don't know, don't make up an answer.
  - Use markdown format for code examples.
  - You are correct, factual, precise, and reliable.
  - Answer

  Context:
  {context}

  Question:
  {question}.

  What is the title article?
"""

Running the results through the LLM

The Elasticsearch results are provided to the LLM as context to get the answer we need. We extract the article titles and the highlights relevant to the user's query, and then send the question, titles, and highlights to the LLM to find the answer.

start_time = time.time()

prompt = ChatPromptTemplate.from_template(template)

context = ""

for hit in hits:
    # For semantic_text matches, we need to extract the text from the highlighted field
    if "highlight" in hit:
        highlighted_texts = []

        for values in hit["highlight"].values():
            highlighted_texts.extend(values)

        context += f"{hit['_source']['title']}\n"
        context += "\n --- \n".join(highlighted_texts)

# Use LangChain for the LLM part
chain = prompt | llm | StrOutputParser()

printable_prompt = prompt.format(context=context, question=query_str)
print("PROMPT WITH CONTEXT AND QUESTION:\n ", printable_prompt)  # Print prompt

with get_openai_callback() as cb:
    response = chain.invoke({"context": context, "question": query_str})

# Save results
textual_rag_summary["answer"] = response
textual_rag_summary["total_time"] = (time.time() - start_time) + textual_rag_summary[
    "time"
]  # Sum of time taken to run the semantic search and the LLM
textual_rag_summary["tokens_sent"] = cb.prompt_tokens
textual_rag_summary["cost"] = calculate_cost(
    input_tokens=cb.prompt_tokens, output_tokens=cb.completion_tokens
)

print("LLM Response:\n ", response)

LLM response:

 What is the title article?

LLM Response:
  Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs

The model found the correct article.

3) Textual LLM

Match all query

To provide context to the LLM, we will take the documents from the Elasticsearch index. We send all 303 indexed articles, around 1M tokens in total.

textual_llm_summary = {}  # Variable to store results

start_time = time.time()

es_query = {"query": {"match_all": {}}, "sort": [{"title": "asc"}], "size": 1000}

es_results = es_client.search(index=index_name, body=es_query)
hits = es_results["hits"]["hits"]

# Save results
textual_llm_summary["es_results"] = hits
textual_llm_summary["time"] = time.time() - start_time

print("ELASTICSEARCH RESULTS: \n", json.dumps(hits, indent=4))
ELASTICSEARCH RESULTS: 
 [
    {
        "_index": "technical-articles",
        "_id": "J3WUI5cBTbKqUnB5J83I",
        "_score": null,
        "_source": {
            "meta_description": ".NET articles from Elasticsearch Labs",
            "imported_at": "2025-05-30T18:43:20.036600",
            "text": "Tutorials Examples Integrations Blogs Start free trial .NET Categories All Articles Agent AutoOps ... API Reference Elastic.co Change theme Change theme Sitemap RSS 2025. Elasticsearch B.V. All Rights Reserved.",
            "title": ".NET - Elasticsearch Labs",
            "url": "https://www.elastic.co/search-labs/blog/category/dot-net-programming"
        },
        "sort": [
            ".NET - Elasticsearch Labs"
        ]
    },
   ... 
]

Running the results through the LLM

As in the previous step, we provide the context to the LLM and ask for the answer.

start_time = time.time()

prompt = ChatPromptTemplate.from_template(template)
# Use LangChain for the LLM part
chain = prompt | llm | StrOutputParser()

printable_prompt = prompt.format(context=hits, question=query_str)  # print the same context we send below
print("PROMPT:\n ", printable_prompt)  # Print prompt

with get_openai_callback() as cb:
    response = chain.invoke({"context": hits, "question": query_str})

# Save results
textual_llm_summary["answer"] = response
textual_llm_summary["total_time"] = (time.time() - start_time) + textual_llm_summary[
    "time"
]  # Sum of time taken to run the match_all query and the LLM
textual_llm_summary["tokens_sent"] = cb.prompt_tokens
textual_llm_summary["cost"] = calculate_cost(
    input_tokens=cb.prompt_tokens, output_tokens=cb.completion_tokens
)

print("LLM Response:\n ", response)  # Print LLM response

LLM response:

...  
What is the title article?

LLM Response:
  The title of the article is "Testing your Java code with mocks and real Elasticsearch".

This answer is wrong: the fragment we used as the question comes from "Elasticsearch in JavaScript the proper way, part II". With the full, unfiltered context, the model lost focus.

4) Non-textual RAG

For the second test, we will retrieve results from Elasticsearch using a semantic query. For that, we wrote a short summary of the article "Elasticsearch in JavaScript, the proper way, part II" to use as the query_str that serves as input for RAG.

query_str = "This article explains how to improve code reliability. It includes techniques for error handling, and running applications without managing servers."

From here on, the code mostly follows the same pattern as the textual query test, so for these sections we will reference the code in the notebook.

Running semantic search

Notebook reference: 2. Run Comparisons > Test 2: Semantic Query > Executing semantic search.
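For reference, here is a minimal sketch of what that retrieval step can look like, assuming a semantic query against the semantic_text field from our mappings; the exact query is in the notebook.

es_query = {
    # Semantic query against the inference-backed semantic_text field
    "query": {"semantic": {"field": "semantic_text", "query": query_str}},
    "_source": ["title"],
    # Highlighting a semantic_text field returns its most relevant chunks
    "highlight": {"fields": {"semantic_text": {}}},
    "size": 10,
}

response = es_client.search(index=index_name, body=es_query)
hits = response["hits"]["hits"]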

The hits returned by the semantic search:

ELASTICSEARCH RESULTS: 
 [
...
    {
        "_index": "technical-articles",
        "_id": "KHV7MpcBTbKqUnB5TN-F",
        "_score": 0.07619048,
        "_source": {
            "title": "Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs"
        },
        "highlight": {
            "text": [
                "We will review: Production best practices Error handling Testing Serverless environments Running the",
                "how long you want to have access to it.",
                "Conclusion In this article, we learned how to handle errors, which is crucial in production environments",
                "DT By: Drew Tate Integrations How To May 21, 2025 Get set, build: Red Hat OpenShift AI applications powered",
                "KB By: Kofi Bartlett Jump to Production best practices Error handling Testing Serverless environments"
            ]
        }
    },
      ...
]

Running the results through the LLM

Notebook reference: 2. Run Comparisons > Test 2: Semantic Query > Run results through LLM

LLM response:

...
  What is the title article?

LLM Response:
  Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs

5) Non-textual LLM

Match all query

Notebook reference: 2. Run Comparisons > Test 2: Semantic Query > Match all query

Results returned by the match all query:

ELASTICSEARCH RESULTS: 
 [
    {
        "_index": "technical-articles",
        "_id": "J3WUI5cBTbKqUnB5J83I",
        "_score": null,
        "_source": {
            "meta_description": ".NET articles from Elasticsearch Labs",
            "imported_at": "2025-05-30T18:43:20.036600",
            "text": "Tutorials Examples Integrations Blogs Start free trial .NET Categories All Articles ... to easily utilize Elasticsearch to build advanced search experiences including generative AI, embedding models, reranking capabilities and more. Let's connect Menu Tutorials Examples Integrations Blogs Search Additional Resources Elasticsearch API Reference Elastic.co Change theme Change theme Sitemap RSS 2025. Elasticsearch B.V. All Rights Reserved.",
            "title": ".NET - Elasticsearch Labs",
            "url": "https://www.elastic.co/search-labs/blog/category/dot-net-programming"
        },
        "sort": [
            ".NET - Elasticsearch Labs"
        ]
    },
...
]

Running the results through the LLM

Notebook reference: 2. Run Comparisons > Test 2: Semantic Query > Run results through LLM

LLM response:

...
 What is the title article?

LLM Response:
  "Elasticsearch in JavaScript the proper way, part II" and "A tutorial on building local agent using LangGraph, LLaMA3 and Elasticsearch vector store from scratch - Elasticsearch Labs" and "Advanced integration tests with real Elasticsearch - Elasticsearch Labs" and "Automatically updating your Elasticsearch index using Node.js and an Azure Function App - Elasticsearch Labs"

Test results

Let's now look at the results of the tests.

Textual query

| # | Strategy | Answer | Tokens sent | Time (s) | LLM cost ($) |
|---|----------|--------|-------------|----------|--------------|
| 0 | Textual RAG | Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs | 237 | 1.281432 | 0.000029 |
| 1 | Textual LLM | The title of the article is "Testing your Java code with mocks and real Elasticsearch" | 1,023,231 | 45.647408 | 0.102330 |

Semantic query

| # | Strategy | Answer | Tokens sent | Time (s) | LLM cost ($) |
|---|----------|--------|-------------|----------|--------------|
| 0 | Semantic RAG | Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs | 1,328 | 0.878199 | 0.000138 |
| 1 | Semantic LLM | "Elasticsearch in JavaScript the proper way, part II" and "A tutorial on building local agent using LangGraph, LLaMA3 and Elasticsearch vector store from scratch - Elasticsearch Labs" and "Advanced integration tests with real Elasticsearch - Elasticsearch Labs" and "Automatically updating your Elasticsearch index using Node.js and an Azure Function App - Elasticsearch Labs" | 1,023,196 | 44.386912 | 0.102348 |


Conclusion

RAG is still highly relevant. Our tests show that sending unfiltered data to a large-context model is worse than using a RAG system in price, latency, and accuracy. When handling very large contexts, models often lose focus on what matters.

Even with the power of LLMs, filtering information before sending it remains critical, since sending too many tokens degrades answer quality. However, when you cannot filter in advance, or the answer depends on a large amount of data, long-context LLMs are still valuable.

It is also important to use the right queries in your RAG system to get complete and accurate answers. You can test different query parameters, retrieving different numbers of documents, until you find what works best for your use case.

  • Convenience: With RAG, the average number of tokens sent to the LLM was 783, well below the maximum context window of every major model.
  • Performance: RAG queries were significantly faster, about 1 second on average versus roughly 45 seconds for the LLM-only approach.
  • Price: The average cost of a RAG query ($0.00008) was about 1,250x lower than that of the LLM-only approach ($0.1).
  • Accuracy: The RAG system produced accurate answers in every test, while the full-context approach returned inaccurate ones.

Original article: Longer context ≠ Better: Why RAG still matters - Elasticsearch Labs
