大模型从入门到应用——LangChain：模型（Models）-[大型语言模型（LLMs）：加载与保存LLM类、流式传输LLM与Chat Model响应和跟踪tokens使用情况]

本文链接：https://blog.csdn.net/hy592070616/article/details/131905445

LangChain系列文章：

加载与保存LLM类

本部分将演示如何将LLM配置写入磁盘并从磁盘读取。如果我们想保存给定LLM的配置（比如：the provider、the temperature等），这个方法将非常有用。

from langchain.llms import OpenAI
from langchain.llms.loading import load_llm

加载LLM

首先，我们了解从磁盘加载LLM。LLM可以以两种格式保存在磁盘上：json或yaml。无论扩展名如何，它们都以相同的方式加载。

!cat llm.json

输出文件内容：

{
    "model_name": "text-davinci-003",
    "temperature": 0.7,
    "max_tokens": 256,
    "top_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "n": 1,
    "best_of": 1,
    "request_timeout": null,
    "_type": "openai"
}

加载LLM：

llm = load_llm("llm.json")

yaml文件示例：

!cat llm.yaml

输出文件内容：

_type: openai
best_of: 1
frequency_penalty: 0.0
max_tokens: 256
model_name: text-davinci-003
n: 1
presence_penalty: 0.0
request_timeout: null
temperature: 0.7
top_p: 1.0

加载LLM：

llm = load_llm("llm.yaml")

保存LLM

如果我们想将LLM从内存中转换为其序列化版本，只需调用.save方法即可轻松实现。同样，它支持json和yaml两种格式。

llm.save("llm.json")
llm.save("llm.yaml")

流式传输LLM与Chat Model响应

LangChain为LLMs提供了流媒体支持。目前，LangChain支持OpenAI、ChatOpenAI和ChatAnthropic实现的流媒体，但是学习LangChain的路线图上还有其他LLM实现的流媒体支持。要使用流媒体，请使用一个实现on_llm_new_token的CallbackHandler。在下面的例子中，我们使用了StreamingStdOutCallbackHandler：

from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI, ChatAnthropic
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage

llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
resp = llm("Write me a song about sparkling water.")

输出：
Verse 1

I'm sippin' on sparkling water,
It's so refreshing and light,
It's the perfect way to quench my thirst
On a hot summer night.

Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.

Verse 2
I'm sippin' on sparkling water,
It's so bubbly and bright,
It's the perfect way to cool me down
On a hot summer night.

Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.

Verse 3
I'm sippin' on sparkling water,
It's so light and so clear,
It's the perfect way to keep me cool
On a hot summer night.

Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.

如果使用generate，我们仍然可以访问最终的LLMResult。然而，目前暂不支持流式处理中的token_usage。

llm.generate(["Tell me a joke."])

输出：

Q: What did the fish say when it hit the wall?

A: Dam!

LLMResult(generations=[[Generation(text='\n\nQ: What did the fish say when it hit the wall?\nA: Dam!', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {}, 'model_name': 'text-davinci-003'})

这是 ChatOpenAI聊天模型实现的示例：

chat = ChatOpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
resp = chat([HumanMessage(content="Write me a song about sparkling water.")])

输出：

Verse 1:
Bubbles rising to the top
A refreshing drink that never stops
Clear and crisp, it's oh so pure
Sparkling water, I can't ignore

Chorus:
Sparkling water, oh how you shine
A taste so clean, it's simply divine
You quench my thirst, you make me feel alive
Sparkling water, you're my favorite vibe

Verse 2:
No sugar, no calories, just H2O
A drink that's good for me, don't you know
With lemon or lime, you're even better
Sparkling water, you're my forever

Chorus:
Sparkling water, oh how you shine
A taste so clean, it's simply divine
You quench my thirst, you make me feel alive
Sparkling water, you're my favorite vibe

Bridge:
You're my go-to drink, day or night
You make me feel so light
I'll never give you up, you're my true love
Sparkling water, you're sent from above

Chorus:
Sparkling water, oh how you shine
A taste so clean, it's simply divine
You quench my thirst, you make me feel alive
Sparkling water, you're my favorite vibe

Outro:
Sparkling water, you're the one for me
I'll never let you go, can't you see
You're my drink of choice, forevermore
Sparkling water, I adore.

这里有一个使用ChatAnthropic聊天模型实现的例子，它使用了claude模型：

chat = ChatAnthropic(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
resp = chat([HumanMessage(content="Write me a song about sparkling water.")])

输出：

 Here is my attempt at a song about sparkling water:

Sparkling water, bubbles so bright, 
Dancing in the glass with delight.
Refreshing and crisp, a fizzy delight,
Quenching my thirst with each sip I take.
The carbonation tickles my tongue,
As the refreshing water song is sung.
Lime or lemon, a citrus twist,
Makes sparkling water such a bliss.
Healthy and hydrating, a drink so pure,
Sparkling water, always alluring.
Bubbles ascending in a stream, 
Sparkling water, you're my dream!

跟踪tokens使用情况

下面介绍如何跟踪特定调用的tokens使用情况。目前，这个方法仅适用于OpenAI API。首先让我们看一个极其简单的示例，跟踪token在单个LLM调用中的使用情况：

from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)
with get_openai_callback() as cb:
    result = llm("Tell me a joke")
    print(cb)

输出：

Tokens Used: 42

Prompt Tokens: 4

Completion Tokens: 38

Successful Requests: 1

Total Cost (USD): $0.00084

上下文管理器中的任何内容都将被跟踪。以下是使用它来跟踪多次连续调用的示例。

with get_openai_callback() as cb:
    result = llm("Tell me a joke")
    result2 = llm("Tell me a joke")
    print(cb.total_tokens)

输出：

如果使用了具有多个步骤的链或代理，它将跟踪所有这些步骤：

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
with get_openai_callback() as cb:
    response = agent.run("Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?")
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")

输出：

> Entering new AgentExecutor chain...
 I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.
Action: Search
Action Input: "Olivia Wilde boyfriend"
Observation: Sudeikis and Wilde's relationship ended in November 2020. Wilde was publicly served with court documents regarding child custody while she was presenting Don't Worry Darling at CinemaCon 2022. In January 2021, Wilde began dating singer Harry Styles after meeting during the filming of Don't Worry Darling.
Thought: I need to find out Harry Styles' age.
Action: Search
Action Input: "Harry Styles age"
Observation: 29 years
Thought: I need to calculate 29 raised to the 0.23 power.
Action: Calculator
Action Input: 29^0.23
Observation: Answer: 2.169459462491557

Thought: I now know the final answer.
Final Answer: Harry Styles, Olivia Wilde's boyfriend, is 29 years old and his age raised to the 0.23 power is 2.169459462491557.

> Finished chain.
Total Tokens: 1506
Prompt Tokens: 1350
Completion Tokens: 156
Total Cost (USD): $0.03012

参考文献：
[1] LangChain 🦜️🔗 中文网，跟着LangChain一起学LLM/GPT开发：https://www.langchain.com.cn/
[2] LangChain中文网 - LangChain 是一个用于开发由语言模型驱动的应用程序的框架：http://www.cnlangchain.com/