Large Models from Beginner to Application: LangChain Chains - [Chains and Indexes: Document Analysis and Chat Over Documents]

Analyze Document

The AnalyzeDocumentChain is more of an end chain: it takes in a single document, splits it up, and then runs it through a CombineDocumentsChain. It can be used as a more complete end-to-end chain.

with open("../../state_of_the_union.txt") as f:
    state_of_the_union = f.read()
Summarization

Below, let's see how this works by using it to summarize a long document.

from langchain import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.chains import AnalyzeDocumentChain

llm = OpenAI(temperature=0)
summary_chain = load_summarize_chain(llm, chain_type="map_reduce")

summarize_document_chain = AnalyzeDocumentChain(combine_docs_chain=summary_chain)
summarize_document_chain.run(state_of_the_union)

Output:

" In this speech, President Biden addresses the American people and the world, discussing the recent aggression of Russia's Vladimir Putin in Ukraine and the US response. He outlines economic sanctions and other measures taken to hold Putin accountable, and announces the US Department of Justice's task force to go after the crimes of Russian oligarchs. He also announces plans to fight inflation and lower costs for families, invest in American manufacturing, and provide military, economic, and humanitarian assistance to Ukraine. He calls for immigration reform, protecting the rights of women, and advancing the rights of LGBTQ+ Americans, and pays tribute to military families. He concludes with optimism for the future of America."
Question Answering

Let's use a question-answering chain:

from langchain.chains.question_answering import load_qa_chain

qa_chain = load_qa_chain(llm, chain_type="map_reduce")
qa_document_chain = AnalyzeDocumentChain(combine_docs_chain=qa_chain)
qa_document_chain.run(input_document=state_of_the_union, question="what did the president say about justice breyer?")

Output:

' The president thanked Justice Breyer for his service.'

Chat Over Documents with Chat History

This section describes how to use the ConversationalRetrievalChain to set up a chain that chats over documents while keeping track of chat history. The only difference between this chain and the RetrievalQAChain is that this one allows chat history to be passed in, which enables follow-up questions.

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain
Load Documents

We can swap this out for any type of data loader we want.

from langchain.document_loaders import TextLoader

loader = TextLoader("../../state_of_the_union.txt")
documents = loader.load()

If we had multiple loaders that we wanted to combine, we could do something like:

# loaders = [....]
# docs = []
# for loader in loaders:
#     docs.extend(loader.load())
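
For instance, a runnable version of that sketch, using TextLoader on two files (the second path is hypothetical):

from langchain.document_loaders import TextLoader

# Replace these paths with your own files; the second one is a placeholder
loaders = [TextLoader("../../state_of_the_union.txt"), TextLoader("../../another_document.txt")]
docs = []
for loader in loaders:
    docs.extend(loader.load())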

Then we split the documents, create embeddings for them, and put them into a vector store. This lets us do semantic search over them.

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents, embeddings)

Log output:

Using embedded DuckDB without persistence: data will be transient
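
As the log line notes, this embedded Chroma instance is transient. If the index should survive the process, Chroma can be given a persist directory at construction time; a hedged sketch (the directory path is an assumption, and the persistence API reflects the Chroma wrapper of this LangChain version):

# Persist the index to disk so it can be reloaded later (path is hypothetical)
vectorstore = Chroma.from_documents(documents, embeddings, persist_directory="./chroma_db")
vectorstore.persist()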

Now we create a memory object, which is needed to track the inputs/outputs and hold a conversation.

from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

Now we can initialize the ConversationalRetrievalChain:

qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), memory=memory)
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query})
result["answer"]

Output:

" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."

Input:

query = "Did he mention who she suceeded"
result = qa({"question": query})
result['answer']

Output:

' Ketanji Brown Jackson succeeded Justice Stephen Breyer on the United States Supreme Court.'
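
Because the chain writes each turn into the shared memory object, we can inspect the accumulated history directly. A quick sanity check, assuming the attribute layout of ConversationBufferMemory in this LangChain version:

# Print each stored message (HumanMessage / AIMessage) with its content
for message in memory.chat_memory.messages:
    print(type(message).__name__, ":", message.content)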
Passing in Chat History

In the example above, we used a memory object to track chat history. We can also just pass it in explicitly; to do that, we need to initialize a chain without any memory object.

qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever())

Here's an example of asking a question with no chat history:

chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query, "chat_history": chat_history})
result["answer"]

Output:

" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."

Here's an example of asking a question with some chat history:

chat_history = [(query, result["answer"])]
query = "Did he mention who she suceeded"
result = qa({"question": query, "chat_history": chat_history})
result['answer']

Output:

' Ketanji Brown Jackson succeeded Justice Stephen Breyer on the United States Supreme Court.'
Using a Different Model to Condense the Question

This chain has two steps:

  1. Condense the current question and the chat history into a standalone question. This is done to create a standalone embedding to use for retrieval (the default prompt for this step is printed after this list).
  2. Do retrieval and answer the question, using retrieval-augmented generation with a separate model.
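
To see what the first step does concretely, we can print the default condense-question prompt that LangChain ships with (the same CONDENSE_QUESTION_PROMPT imported again later in this section):

from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT

# The template rewrites the follow-up question plus the chat history into a standalone question
print(CONDENSE_QUESTION_PROMPT.template)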

Part of the benefit of LangChain's declarative nature is that we can easily use a separate language model for each call. This is useful for putting a cheaper, faster model on the simpler task of condensing the question, and a more expensive model on answering it. Here is an example of doing so:

from langchain.chat_models import ChatOpenAI

qa = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(temperature=0, model="gpt-4"),
    vectorstore.as_retriever(),
    condense_question_llm=ChatOpenAI(temperature=0, model='gpt-3.5-turbo'),
)
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query, "chat_history": chat_history})
chat_history = [(query, result["answer"])]
query = "Did he mention who she suceeded"
result = qa({"question": query, "chat_history": chat_history})
Return Source Documents

We can also easily return source documents from the ConversationalRetrievalChain. This is useful when we want to inspect which documents were retrieved.

qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), return_source_documents=True)
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query, "chat_history": chat_history})
result['source_documents'][0]

Output:
Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../state_of_the_union.txt'})
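
To list just the provenance of every retrieved chunk rather than the full contents, we can iterate over the document metadata:

for doc in result["source_documents"]:
    print(doc.metadata["source"])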
ConversationalRetrievalChain with search_distance

If you are using a vector store that supports filtering by search distance, you can add a threshold parameter:

vectordbkwargs = {"search_distance": 0.9}
qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), return_source_documents=True)
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query, "chat_history": chat_history, "vectordbkwargs": vectordbkwargs})
ConversationalRetrievalChain with map_reduce

We can also use different types of combine-documents chains in the ConversationalRetrievalChain:

from langchain.chains import LLMChain
from langchain.chains.question_answering import load_qa_chain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT

llm = OpenAI(temperature=0)
question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)
doc_chain = load_qa_chain(llm, chain_type="map_reduce")

chain = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(),
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
)
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = chain({"question": query, "chat_history": chat_history})
result['answer']

Output:

" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, from a family of public school educators and police officers, a consensus builder, and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."
ConversationalRetrievalChain with Question Answering with Sources

We can also use this chain with the question-answering-with-sources chain:

from langchain.chains.qa_with_sources import load_qa_with_sources_chain

llm = OpenAI(temperature=0)
question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)
doc_chain = load_qa_with_sources_chain(llm, chain_type="map_reduce")

chain = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(),
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
)
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = chain({"question": query, "chat_history": chat_history})
result['answer']

Output:

" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, from a family of public school educators and police officers, a consensus builder, and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \nSOURCES: ../../state_of_the_union.txt"
ConversationalRetrievalChain with Streaming Output to stdout

In this example, the output of the chain is streamed to stdout token by token:

from langchain.chains.llm import LLMChain
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT
from langchain.chains.question_answering import load_qa_chain

# Construct a ConversationalRetrievalChain with a streaming LLM for combining documents
# and a separate, non-streaming LLM for question generation
llm = OpenAI(temperature=0)
streaming_llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)

question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)
doc_chain = load_qa_chain(streaming_llm, chain_type="stuff", prompt=QA_PROMPT)

qa = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(), combine_docs_chain=doc_chain, question_generator=question_generator)
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query, "chat_history": chat_history})

Output:

 The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.

Input:

chat_history = [(query, result["answer"])]
query = "Did he mention who she suceeded"
result = qa({"question": query, "chat_history": chat_history})

Output:

 Ketanji Brown Jackson succeeded Justice Stephen Breyer on the United States Supreme Court.
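
Streaming to stdout is only one option: tokens can be routed anywhere by supplying a custom callback handler. A minimal sketch, assuming the BaseCallbackHandler interface with on_llm_new_token from this LangChain version:

from langchain.callbacks.base import BaseCallbackHandler

class TokenCollector(BaseCallbackHandler):
    """Collect streamed tokens into a list instead of printing them."""
    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.tokens.append(token)

collector = TokenCollector()
collecting_llm = OpenAI(streaming=True, callbacks=[collector], temperature=0)
# Build the chain exactly as above but with collecting_llm as the streaming LLM;
# after a call, the full answer is "".join(collector.tokens)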
The get_chat_history Function

We can also specify a get_chat_history function, which is used to format the chat_history string.

def get_chat_history(inputs) -> str:
    res = []
    for human, ai in inputs:
        res.append(f"Human:{human}\nAI:{ai}")
    return "\n".join(res)
qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), get_chat_history=get_chat_history)
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query, "chat_history": chat_history})
result['answer']

Output:

" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."

