RAG for Large Language Models

1. Clarifying a Concept

Do NOT use the figure below as your reference for RAG!!!

[Figure: architecture diagram from the original RAG research paper]

This figure comes from a research paper:

  • That paper was the first to coin the name "RAG"
  • In that research, the authors tried to combine retrieval and generation inside a single model architecture

However, this is NOT how RAG is built in real production systems!!!

2. What Is Retrieval-Augmented Generation (RAG)?

2.1 Inherent Limitations of LLMs

  1. An LLM's knowledge is not up to date
  2. An LLM may not know your private domain or business knowledge

2.2 Retrieval-Augmented Generation

RAG (Retrieval-Augmented Generation), as the name suggests, enhances the capability of a generative model by means of retrieval.

[Figure: the RAG pipeline]

Analogy: you can think of this process as an open-book exam. The LLM gets to consult the book first, then answer the question.

3. Basic Steps for Building a RAG System

The build process:

  1. Load the documents and split them into chunks according to certain rules
  2. Feed the chunks into a search engine
  3. Wrap the retrieval step behind an interface
  4. Assemble the calling flow: Query -> Retrieval -> Prompt -> LLM -> Reply

3.1 Loading and Splitting Documents

```python
# !pip install --upgrade openai
# Install the PDF parsing library
# !pip install pdfminer.six
from pdfminer.high_level import extract_pages
from pdfminer.layout import LTTextContainer

def extract_text_from_pdf(filename, page_numbers=None, min_line_length=1):
    '''Extract the text from a PDF file (optionally restricted to the given pages)'''
    paragraphs = []
    buffer = ''
    full_text = ''
    # Extract all the text
    for i, page_layout in enumerate(extract_pages(filename)):
        # If a page range was specified, skip pages outside of it
        if page_numbers is not None and i not in page_numbers:
            continue
        for element in page_layout:
            if isinstance(element, LTTextContainer):
                full_text += element.get_text() + '\n'
    # Reassemble the text into paragraphs, using blank (or too-short) lines as separators
    lines = full_text.split('\n')
    for text in lines:
        if len(text) >= min_line_length:
            # Rejoin words hyphenated across line breaks; otherwise join lines with a space
            buffer += (' '+text) if not text.endswith('-') else text.strip('-')
        elif buffer:
            paragraphs.append(buffer)
            buffer = ''
    if buffer:
        paragraphs.append(buffer)
    return paragraphs

paragraphs = extract_text_from_pdf("llama2.pdf", min_line_length=10)
for para in paragraphs[:4]:
    print(para+"\n")
```
Llama 2: Open Foundation and Fine-Tuned Chat Models

 Hugo Touvron∗ Louis Martin† Kevin Stone† Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev Punit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic Sergey Edunov Thomas Scialom∗

 GenAI, Meta

 In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based onour human evaluations for helpfulness and safety, may be a suitable substitute for closed source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.
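The extraction above simply treats every paragraph as one chunk. In practice you often also cap the chunk size and keep some overlap between adjacent chunks, so that information sitting on a chunk boundary is not lost. Here is a minimal sketch of that idea; the `split_into_chunks` helper and its `chunk_size`/`overlap` parameters are illustrative additions, not part of the original pipeline:

```python
def split_into_chunks(paragraphs, chunk_size=300, overlap=50):
    '''Greedily pack words into chunks of at most chunk_size words,
    carrying the last `overlap` words over into the next chunk.'''
    chunks, current = [], []
    for para in paragraphs:
        for word in para.split():
            current.append(word)
            if len(current) >= chunk_size:
                chunks.append(' '.join(current))
                current = current[-overlap:]  # keep the overlapping tail
    if current:
        chunks.append(' '.join(current))
    return chunks

chunks = split_into_chunks(paragraphs)
```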

3.2 The Retrieval Engine

Let's start with the most basic implementation.

Install the ES client:

```python
#!pip install elasticsearch7
```

Install NLTK (a text-processing toolkit):

```python
#!pip install nltk
```

```python
from elasticsearch7 import Elasticsearch, helpers
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import nltk
import re

import warnings
warnings.simplefilter("ignore")  # silence some warnings from ES

# Already built into the lab platform
# nltk.download('punkt')      # English tokenization, stemming, sentence splitting, etc.
# nltk.download('stopwords')  # English stop-word list

def to_keywords(input_string):
    '''Keep only the keywords of an (English) text'''
    # Replace all non-alphanumeric characters with spaces
    no_symbols = re.sub(r'[^a-zA-Z0-9\s]', ' ', input_string)
    word_tokens = word_tokenize(no_symbols)
    # Load the stop-word list
    stop_words = set(stopwords.words('english'))
    ps = PorterStemmer()
    # Drop stop words, reduce each remaining word to its stem
    filtered_sentence = [ps.stem(w)
                         for w in word_tokens if not w.lower() in stop_words]
    return ' '.join(filtered_sentence)
```
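For a quick feel of what `to_keywords` produces, try it on a query; the test call is illustrative, and the stems shown in the comment are approximately what NLTK's Porter stemmer yields:

```python
print(to_keywords("how many parameters does llama 2 have?"))
# Stop words ("how", "does", "have") are dropped and the rest are stemmed,
# giving something like: mani paramet llama 2
```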
The `to_keywords` above is an implementation for English; for a Chinese version, see chinese_utils.py:

```python
# chinese_utils.py
import re
import jieba
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')

def to_keywords(input_string):
    """Turn a sentence into a sequence of retrieval keywords"""
    # Tokenize in search-engine mode
    word_tokens = jieba.cut_for_search(input_string)
    # Load the stop-word list
    stop_words = set(stopwords.words('chinese'))
    # Drop stop words
    filtered_sentence = [w for w in word_tokens if not w in stop_words]
    return ' '.join(filtered_sentence)

def sent_tokenize(input_string):
    """Split into sentences at punctuation marks"""
    # Split at sentence-ending punctuation
    sentences = re.split(r'(?<=[。!?;?!])', input_string)
    # Drop empty strings
    return [sentence for sentence in sentences if sentence.strip()]

if "__main__" == __name__:
    # Test keyword extraction
    print(to_keywords("小明硕士毕业于中国科学院计算所,后在日本京都大学深造"))
    # Test sentence splitting
    print(sent_tokenize("这是,第一句。这是第二句吗?是的!啊"))
```

Load the text into the search engine:

```python
import os, time

# Read the configuration
ELASTICSEARCH_BASE_URL = os.getenv('ELASTICSEARCH_BASE_URL')
ELASTICSEARCH_PASSWORD = os.getenv('ELASTICSEARCH_PASSWORD')
ELASTICSEARCH_NAME = os.getenv('ELASTICSEARCH_NAME')

# Tip: to run this locally, print(ELASTICSEARCH_BASE_URL) on the line below to see the actual configuration

# 1. Create the Elasticsearch connection
es = Elasticsearch(
    hosts=[ELASTICSEARCH_BASE_URL],  # service address and port
    http_auth=(ELASTICSEARCH_NAME, ELASTICSEARCH_PASSWORD),  # username, password
)

# 2. Define the index name
index_name = "teacher_demo_index0"

# 3. If the index already exists, delete it (for demo purposes only; not needed in real applications)
if es.indices.exists(index=index_name):
    es.indices.delete(index=index_name)

# 4. Create the index
es.indices.create(index=index_name)

# 5. Build the bulk-indexing actions
actions = [
    {
        "_index": index_name,
        "_source": {
            "keywords": to_keywords(para),
            "text": para
        }
    }
    for para in paragraphs
]

# 6. Index the documents
helpers.bulk(es, actions)

# Indexing is asynchronous: documents only become searchable after the index refreshes (every 1s by default)
time.sleep(2)
```
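Instead of sleeping for a fixed interval, you can also force a refresh explicitly so that the just-indexed documents become searchable immediately; a small sketch using the client's refresh API:

```python
# Make the newly indexed documents searchable right away
es.indices.refresh(index=index_name)
```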

Implement keyword search:

```python
def search(query_string, top_n=3):
    # The ES query DSL
    search_query = {
        "match": {
            "keywords": to_keywords(query_string)
        }
    }
    res = es.search(index=index_name, query=search_query, size=top_n)
    return [hit["_source"]["text"] for hit in res["hits"]["hits"]]

results = search("how many parameters does llama 2 have?", 2)
for r in results:
    print(r+"\n")
```
1. Llama 2, an updated version of Llama 1, trained on a new mix of publicly available data. We also increased the size of the pretraining corpus by 40%, doubled the context length of the model, and adopted grouped-query attention (Ainslie et al., 2023). We are releasing variants of Llama 2 with 7B, 13B, and 70B parameters. We have also trained 34B variants, which we report on in this paper but are not releasing.§

 In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based onour human evaluations for helpfulness and safety, may be a suitable substitute for closed source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.

3.3 Wrapping the LLM Interface

```python
from openai import OpenAI
import os
# Load environment variables
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # reads the local .env file, which defines OPENAI_API_KEY

client = OpenAI()

def get_completion(prompt, model="gpt-3.5-turbo"):
    '''Wrap the OpenAI API'''
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,  # randomness of the model's output; 0 means as deterministic as possible
    )
    return response.choices[0].message.content
```

3.4 The Prompt Template

```python
def build_prompt(prompt_template, **kwargs):
    '''Fill the slots of a prompt template with values'''
    inputs = {}
    for k, v in kwargs.items():
        if isinstance(v, list) and all(isinstance(elem, str) for elem in v):
            val = '\n\n'.join(v)
        else:
            val = v
        inputs[k] = val
    return prompt_template.format(**inputs)

prompt_template = """
You are a question-answering bot.
Your task is to answer the user's question based on the reference information given below.

Reference information:
{context}

User question:
{query}

If the reference information does not contain the answer to the user's question, or is insufficient to answer it, reply exactly "I cannot answer your question".
Do not output any information or answers that are not contained in the reference information.
Answer the user's question in Chinese.
"""
```

3.5 A First Look at the RAG Pipeline

user_query = "how many parameters does llama 2 have?"

# 1. 检索
search_results = search(user_query, 2)

# 2. 构建 Prompt
prompt = build_prompt(prompt_template, context=search_results, query=user_query)
print("===Prompt===")
print(prompt)

# 3. 调用 LLM
response = get_completion(prompt)

print("===回复===")
print(response)
===Prompt===

You are a question-answering bot.
Your task is to answer the user's question based on the reference information given below.

Reference information:
 1. Llama 2, an updated version of Llama 1, trained on a new mix of publicly available data. We also increased the size of the pretraining corpus by 40%, doubled the context length of the model, and adopted grouped-query attention (Ainslie et al., 2023). We are releasing variants of Llama 2 with 7B, 13B, and 70B parameters. We have also trained 34B variants, which we report on in this paper but are not releasing.§

 In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based onour human evaluations for helpfulness and safety, may be a suitable substitute for closed source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.

User question:
how many parameters does llama 2 have?

If the reference information does not contain the answer to the user's question, or is insufficient to answer it, reply exactly "I cannot answer your question".
Do not output any information or answers that are not contained in the reference information.
Answer the user's question in Chinese.

===Reply===
Llama 2有7B, 13B和70B参数。

Further reading:
    • Elasticsearch (ES for short) is a widely used open-source search engine: https://www.elastic.co/
    • Plenty of material on installing and deploying ES is available online, for example: https://juejin.cn/post/7104875268166123528
    • For more detail on classical information-retrieval techniques, see: https://nlp.stanford.edu/IR-book/information-retrieval-book.html

3.6 Limitations of Keyword Search

The same meaning phrased with different words may fail to retrieve any relevant results.

# user_query="Does llama 2 have a chat version?"
user_query = "Does llama 2 have a conversational variant?"

search_results = search(user_query, 2)

for res in search_results:
    print(res+"\n")
1. Llama 2, an updated version of Llama 1, trained on a new mix of publicly available data. We also increased the size of the pretraining corpus by 40%, doubled the context length of the model, and adopted grouped-query attention (Ainslie et al., 2023). We are releasing variants of Llama 2 with 7B, 13B, and 70B parameters. We have also trained 34B variants, which we report on in this paper but are not releasing.§

 variants of this model with 7B, 13B, and 70B parameters as well.
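The failure becomes clear if you compare the keyword forms of the two phrasings: after stop-word removal and stemming they barely share a term, so the "chat"-related passages never match. A quick check (the stems in the comments are approximately what the Porter stemmer yields):

```python
print(to_keywords("Does llama 2 have a chat version?"))
# roughly: llama 2 chat version
print(to_keywords("Does llama 2 have a conversational variant?"))
# roughly: llama 2 convers variant  (no overlap with "chat")
```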

4. Vector Retrieval

4.1 What Is a Vector?

A vector is a mathematical object that has both magnitude and direction. It can be represented as a directed line segment from one point to another. For example, a vector in two-dimensional space can be written as (x, y), denoting a directed line segment from the origin (0, 0) to the point (x, y).
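In vector retrieval, text is embedded into such vectors (with many more dimensions), and relevance is measured by the angle or distance between them. As a taste of what follows, here is a minimal sketch of cosine similarity using numpy; the 2-D "embedding" values are made up purely for illustration:

```python
import numpy as np

def cos_sim(a, b):
    '''Cosine similarity: the cosine of the angle between two vectors'''
    a, b = np.asarray(a), np.asarray(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 2-D "embeddings": closer directions mean higher similarity
print(cos_sim([1.0, 0.2], [0.9, 0.3]))   # near 1: similar
print(cos_sim([1.0, 0.2], [-0.2, 1.0]))  # 0: orthogonal, unrelated
```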
