[AI LLM Application Development][Hands-on Project] AI + Search: Build Your Own AI Search Engine Step by Step (Complete Code Included) - CSDN Blog

Article link: https://blog.csdn.net/Android062005/article/details/147008508

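There are many AI + search apps and plugins on the market now, and I have long wanted to learn how they work under the hood. Today we will study the principles and then put them into practice, building our own AI search engine from scratch. The final result looks like this: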

[Figure: the finished AI search engine]

Without further ado, let's get started.

The code in this article is based on the API from: mp.weixin.qq.com/s/6F22Mls7z…

0. Framework

Let's set up the framework first.

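The server uses Python with the Flask framework, and the front end is plain HTML. The HTML page is rendered with Flask's render_template function, a helper Flask provides for rendering Jinja2 templates. Jinja2 is a Python template engine that lets you use Python variables and expressions inside an HTML file. Here it loops over the chat history to rebuild the conversation when the page loads.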

Here is the code:

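from flask import Flask, render_template, request, jsonify

app = Flask(__name__)
history = []  # chat history shared by all requests: [{"user": ..., "bot": ...}, ...]

@app.route('/', methods=['GET'])
def index():
    # Looks up ai_search.html in the templates folder and passes it the chat history
    return render_template('ai_search.html', history=history)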

The HTML page is named ai_search.html.

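Note: when you render pages this way, the HTML file must be placed in a folder named templates next to the script. Otherwise Flask cannot find it and raises an error (jinja2.exceptions.TemplateNotFound).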

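In other words, the project directory should look like this:

[Figure: project layout, with ai_search.py at the top level and ai_search.html inside the templates folder]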

1. Server (Python + Flask)

The server uses Flask to expose a set of HTTP endpoints, each of which handles one task.

1.1 The search endpoint

@app.route('/search', methods=['GET', 'POST'])
def search():
    # The keyword arrives as a form field (POST) or a query parameter (GET)
    if request.method == 'POST':
        keyword = request.form.get('keyword', '')
    else:
        keyword = request.args.get('keyword', '')

    results = crawl_pages(keyword)
    output = ""
    for result in results:
        if not result['url']:
            continue  # skip links that had no targetpage parameter
        # One clickable <li> per result; clicking it calls handleLinkClick(url) in the page
        output += f"<li><a href='javascript:void(0);' onclick='handleLinkClick(\"{result['url']}\")'>{result['title']}</a></li>"
    return output

The search endpoint receives the keyword entered by the user and then calls crawl_pages to fetch the search results.

1.1.1 The crawl_pages function

def crawl_pages(query_text, page_num=2):
    browser = mechanicalsoup.Browser()
    query_text_encoded = quote(query_text)  # percent-encode the keyword (e.g. Chinese characters) so it can be used in a URL
    results = []
    for page_index in range(1, page_num + 1):
        url = f"https://search.cctv.com/search.php?qtext={query_text_encoded}&type=web&page={page_index}"
        page = browser.get(url)
        soup = BeautifulSoup(page.text, 'html.parser')
        # Result links on the search page have ids starting with web_content_
        web_content_links = soup.find_all('a', id=lambda x: x and x.startswith('web_content_'))
        for link in web_content_links:
            # The article URL is carried in the targetpage query parameter of the link
            target_page = parse_qs(urlparse(link.get('href', '')).query).get('targetpage', [None])[0]
            results.append({'title': link.text, 'url': target_page})
    return results

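Given a keyword, this function searches a fixed website (CCTV's site search). It fetches the first two pages of results (page_num=2) and scrapes the title and URL of every result on them.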
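You can try both halves of crawl_pages offline. The sketch below first builds the same search URLs. build_search_urls is a made-up helper name, but the base URL and the qtext/type/page parameters come from the article. It then extracts results from a page. For the extraction, the sketch swaps BeautifulSoup for the standard library's html.parser.HTMLParser so it runs without extra packages. The sample HTML is invented, and the real CCTV result page may be structured differently.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, quote, urlparse

# Base URL and query parameters as used by crawl_pages() in the article
SEARCH_BASE = "https://search.cctv.com/search.php"


def build_search_urls(query_text, page_num=2):
    """Step 1: one search URL per result page (page numbers start at 1)."""
    encoded = quote(query_text)  # percent-encode spaces, '&', Chinese characters, etc.
    return [
        f"{SEARCH_BASE}?qtext={encoded}&type=web&page={page_index}"
        for page_index in range(1, page_num + 1)
    ]


class ResultLinkParser(HTMLParser):
    """Step 2: collect <a> tags whose id starts with 'web_content_'.

    A standard-library stand-in for the BeautifulSoup find_all() filter.
    """

    def __init__(self):
        super().__init__()
        self.results = []     # [{'title': ..., 'url': ...}, ...]
        self._current = None  # result being built while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and (attrs.get("id") or "").startswith("web_content_"):
            href = attrs.get("href") or ""
            # Same extraction as crawl_pages(): read the targetpage query parameter
            target = parse_qs(urlparse(href).query).get("targetpage", [None])[0]
            self._current = {"title": "", "url": target}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data  # includes text of nested tags such as <b>

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.results.append(self._current)
            self._current = None


def extract_results(html_text):
    parser = ResultLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.results


# Invented sample markup; the real result page may look different
SAMPLE_HTML = """
<a id="web_content_1" href="/jump?targetpage=https%3A%2F%2Fnews.example.com%2Fa.shtml">Sample <b>title</b></a>
<a id="nav_home" href="/jump?targetpage=https%3A%2F%2Fother.example.com">Home</a>
"""

urls = build_search_urls("人工智能")
results = extract_results(SAMPLE_HTML)
```

Here urls holds the two page URLs for the Chinese keyword 人工智能 ("artificial intelligence"), which is encoded as %E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD. results keeps only the web_content_ link, with the title "Sample title" and the decoded targetpage URL.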

(1) url = f"https://search.cctv.com/search.php?qtext={query_text_encoded}&type=web&page={page_index}" specifies which website the keyword is searched on. Requesting this URL is equivalent to searching for the keyword on the CCTV website:

[Figure: searching for the keyword on the CCTV search page]

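(2) A simple scraper then extracts the URL and title of every result from the returned result page. For example, the line that computes target_page extracts the URL. The href of each result link carries the article's address in its targetpage query parameter, and parse_qs(urlparse(...).query) reads it out.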

(3) You now have a list of titles and URLs, which is returned to the search endpoint. There, each result is wrapped in a clickable <li> link whose onclick calls handleLinkClick with the result's URL. The assembled HTML is then inserted into the page and displayed as the sidebar:

[Figure: the search results listed in the sidebar]
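Building the sidebar HTML by string concatenation has one catch. A title containing < or a URL containing a quote ends up verbatim in the markup, which can break the page or inject HTML. Below is a sketch of a safer variant. It assumes the same result dictionaries that crawl_pages() returns. render_result_items is a hypothetical helper, not part of the original code.

```python
import html
import json


def render_result_items(results):
    """Build the sidebar <li> items with the title and URL escaped (hypothetical helper)."""
    items = []
    for result in results:
        if not result.get("url"):
            continue  # crawl_pages() stores None when a link has no targetpage parameter
        # json.dumps gives a valid JavaScript string literal; html.escape makes it
        # safe inside the double-quoted onclick attribute (the browser un-escapes
        # the attribute value before running the JavaScript)
        js_arg = html.escape(json.dumps(result["url"]))
        title = html.escape(result["title"])
        items.append(
            f'<li><a href="javascript:void(0);" onclick="handleLinkClick({js_arg})">{title}</a></li>'
        )
    return "".join(items)


sidebar = render_result_items([
    {"title": "<b>CCTV</b> & more", "url": "https://news.example.com/a.shtml?x=1&y=2"},
    {"title": "no target", "url": None},
])
```

The title is rendered as literal text, and the second entry is dropped because it has no URL.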

1.2 The generate-text endpoint

@app.route('/generate-text', methods=['POST'])
def generate_text_api():
    prompt = request.json['prompt']
    result = generate_text(prompt)
    return jsonify(result)

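This endpoint passes the user's keyword to the LLM as the prompt and returns whatever the model replies. Nothing special happens in between. The one thing worth noting is history.append({"user": prompt, "bot": generated_text}), which adds each exchange to the chat history so the page can render it again when it is reloaded. The history is only used for display: each request to the model contains just the current prompt.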

def get_openai_chat_completion(messages, temperature, model = "gpt-3.5-turbo-1106"):
    response = client.chat.completions.create(
        model = model,
        messages = messages,
        temperature = temperature,
    )
    return response

def generate_text(prompt, temperature=0.5):
    messages = [
        {
            "role": "user",
            "content": prompt,
        }   
    ]
    response = get_openai_chat_completion(messages = messages, temperature=temperature)
    generated_text = response.choices[0].message.content
    history.append({"user": prompt, "bot": generated_text})  # add the user input and the model output to the chat history
    return {"status": "success", "response": generated_text}

The result of this step looks like this. Note that it has nothing to do with the search results yet:

[Figure: the model's reply shown in the chat area]
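You can follow the data flow of generate_text without an API key by swapping the model call for a stub. In the sketch below, fake_completion is a hypothetical stand-in for client.chat.completions.create(); the real code reads response.choices[0].message.content instead. The complete parameter is added only so the stub can be injected. It is not part of the original function.

```python
# Hypothetical stand-in for the OpenAI call; it just echoes the last user message.
def fake_completion(messages, temperature):
    return "echo: " + messages[-1]["content"]


history = []  # plays the role of the module-level history list in ai_search.py


def generate_text(prompt, temperature=0.5, complete=fake_completion):
    # Only the current prompt is sent; history is kept for display, not as context
    messages = [{"role": "user", "content": prompt}]
    generated_text = complete(messages, temperature)
    history.append({"user": prompt, "bot": generated_text})
    return {"status": "success", "response": generated_text}


first = generate_text("What is Flask?")
second = generate_text("And Jinja2?")
```

Each call returns the same {"status": ..., "response": ...} shape that the front end checks. history grows by one entry per call, which is what the template loops over on reload.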

1.3 The page_content endpoint

This endpoint fetches a web page by its URL. It is a simple scraper that extracts the text and the image links from the page.

from urllib.parse import urljoin

@app.route('/page_content')
def page_content():
    url = request.args.get('url', '')
    if not url:
        return 'Missing url parameter', 400
    browser = mechanicalsoup.Browser()
    page = browser.get(url)
    page.encoding = 'utf-8'  # decode the page as UTF-8
    soup = BeautifulSoup(page.text, 'html.parser')
    all_text = ''
    all_images = []

    # Collect the text of all paragraphs, headings and spans
    for element in soup.find_all(['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'span']):
        all_text += element.get_text() + ' '

    # Collect all image links, resolved against the page URL
    for img in soup.find_all('img'):
        img_src = img.get('src')
        if img_src:
            # urljoin handles //host/a.jpg, /a.jpg and absolute https://... sources alike
            all_images.append(urljoin(url, img_src))

    return f"Text content: {all_text}<br>Image links: {', '.join(all_images)}"
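A note on image links: src values come in several forms. They can be protocol-relative (//host/a.jpg), root-relative (/img/a.jpg), relative to the page (a.jpg) or absolute (https://host/a.jpg). Simply prepending "https:" only handles the first form. urllib.parse.urljoin resolves all of them against the page URL. The sketch below shows this in isolation. resolve_image_urls and the example URLs are hypothetical.

```python
from urllib.parse import urljoin


def resolve_image_urls(page_url, srcs):
    """Resolve <img src> values against the page URL, skipping empty and inline (data:) sources."""
    resolved = []
    for src in srcs:
        if not src or src.startswith("data:"):
            continue
        absolute = urljoin(page_url, src)
        if absolute not in resolved:  # drop duplicates, keep the original order
            resolved.append(absolute)
    return resolved


images = resolve_image_urls(
    "https://news.example.com/2024/01/article.shtml",
    ["//img.example.com/a.jpg", "/static/b.png", "c.gif", "https://cdn.example.com/d.webp", "", "/static/b.png"],
)
```

Relative names such as c.gif are resolved against the page's directory, so they end up under /2024/01/.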

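2. Front End (HTML)

2.1 What happens after the user enters a keyword

First, let's look at what the front end does when the user clicks the submit button. The key lines are these: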

inputForm.addEventListener('submit', async (event) => {
    ......

    const aa = document.getElementById('listView');
    aa.innerHTML = await getA(userInput);
    const response = await generateText(userInput);
    hideTypingAnimation(userMessage);
    
    ......
});

As you can see, after the user submits, the page first calls the getA function:

async function getA(prompt) {
    const response = await fetch(SERVER_URL + `/search?keyword=${encodeURIComponent(prompt)}`, {
        method: 'GET',
        headers: {
            'Content-Type': 'application/json'
        }
    });
    return await response.text();
}

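getA calls the server's search endpoint. That endpoint searches the fixed site for the keyword and returns the list of titles and URLs as <li> elements.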
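Keywords must be URL-encoded before they go into the query string; in the browser this is done with encodeURIComponent. The sketch below shows what happens otherwise, from the server's side. urllib.parse.parse_qs stands in for Flask's query-string parsing, and urlencode plays the role of the browser-side encoding. The keyword is just an example.

```python
from urllib.parse import urlencode, urlparse, parse_qs

keyword = "C++ & Python"

# Keyword pasted into the query string as-is
raw_url = "/search?keyword=" + keyword
# Keyword encoded first (urlencode is the Python counterpart of encodeURIComponent here)
encoded_url = "/search?" + urlencode({"keyword": keyword})

# parse_qs stands in for how the server splits and decodes the query string
raw_seen = parse_qs(urlparse(raw_url).query).get("keyword")
encoded_seen = parse_qs(urlparse(encoded_url).query).get("keyword")
```

In the unencoded case, the query is split at & and each + is decoded as a space, so the server receives something other than the original keyword. The encoded version round-trips intact.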

Right after that, it calls generateText:

async function generateText(prompt) {
    const response = await fetch(SERVER_URL + '/generate-text', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json'
        },
        body: JSON.stringify({
            prompt
        })
    });
    return await response.json();
}

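generateText calls the server's generate-text endpoint so that the LLM replies to the keyword. Because both calls are awaited, the LLM request only starts after the search results have been inserted into the sidebar.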
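To exercise the endpoint without the browser, you can build the same request in Python with the standard library. This is a minimal sketch. build_generate_text_request is a hypothetical helper, and http://127.0.0.1:5000 assumes Flask's default development address. The function only constructs the request; sending it requires the server to be running.

```python
import json
import urllib.request


def build_generate_text_request(server_url, prompt):
    """Build (but do not send) the same POST request that generateText() issues."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        server_url + "/generate-text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_text_request("http://127.0.0.1:5000", "What is RAG?")
# To send it to a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

The JSON body {"prompt": ...} matches what the endpoint reads with request.json['prompt'].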

2.2 What happens when the user clicks a title in the sidebar

When the user clicks a title in the sidebar, the following code runs:

async function handleLinkClick(link) {
    const content = await getPageContent(link);
    
    ......
    
    const response = await generateText("Summarize the following content: " + content);
    
    ......
}

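First, it calls getPageContent. This goes through the server's page_content endpoint to scrape all the text and image links from that URL.

Then it calls generateText, which goes through the server's generate-text endpoint to have the LLM summarize that content. The result looks like this: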

[Figure: the model's summary of the clicked page]
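Scraped pages can be long, and the whole text is sent to the model in a single prompt. One option is to cap the length before building the summary prompt. The sketch below is an assumption, not part of the original code. The 6000-character limit is an arbitrary placeholder that you should choose according to your model's context window. build_summary_prompt is a hypothetical helper that mirrors the prompt handleLinkClick() builds: a fixed instruction prefix followed by the page text.

```python
# Hypothetical character budget, not from the article; tune it to your model's context window.
MAX_CONTENT_CHARS = 6000


def build_summary_prompt(content, max_chars=MAX_CONTENT_CHARS):
    """Build a summary prompt from scraped page text, capped at max_chars characters."""
    text = " ".join(content.split())  # collapse the whitespace runs left by the scraper
    return "Summarize the following content: " + text[:max_chars]


prompt = build_summary_prompt("Text content:   Flask  is\n a micro framework ")
```

A character cap is only a rough proxy for the model's token limit. Truncating also drops the end of the page, so a long article may be summarized only partially.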

3. Complete Code

3.1 ai_search.html

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Chat with AI</title>
    <style>
        body {
            display: flex;
            flex-direction: column;
            height: 100vh;
            margin: 0;
            font-family: Arial, sans-serif;
        }

        .website-container {
            position: fixed;
            top: 0;
            right: 0;
            width: 350px;
            height: 100%;
            border: 1px solid #ccc;
            overflow-y: auto;
            background-color: #f9f9f9;
            padding: 10px;
        }

        .chat-container {
            height: 100%;
            width: 85%;
            overflow: hidden;
            overflow-y: auto;
            padding: 10px;
            margin-right: 220px;
            /* leave room for the right sidebar */
        }

        .chat-container::-webkit-scrollbar {
            display: none;
        }

        .avatar-user {
            width: 40px;
            height: 40px;
            background-color: #7fb8e7;
            /* user avatar color */
            border-radius: 50%;
            /* make the avatar round */
            margin-left: 10px;
            /* spacing between the avatar and the message */
        }

        .avatar-bot {
            width: 40px;
            height: 40px;
            right: 0;
            background-color: #28a745;
            /* bot avatar color */
            border-radius: 50%;
            /* make the avatar round */
            margin-right: 10px;
            /* spacing between the avatar and the message */
            object-fit: cover;
            /* keep the avatar from being distorted */
        }

        .message {
            display: flex;
            align-items: center;
            /* vertically center the message and the avatar */
            margin-bottom: 1rem;
        }


        .message-text {
            padding: 10px;
            word-wrap: break-word;
            border-radius: 6px;
            max-width: 70%;
        }

        .message-text-user {
            padding: 10px;
            border-radius: 6px;
            max-width: 70%;
            word-wrap: break-word;
            background-color: #ececec;
        }

        .user-message {
            display: flex;
            justify-content: flex-end;

        }

        .bot-message .message-text {
            background-color: #2ea44f;
            color: white;
        }

        .input-container {
            position: fixed;
            bottom: 0;
            left: 0;
            width: calc(100% - 220px);
            /* account for the width of the right sidebar */
            display: flex;
            align-items: center;
            background-color: #f9f9f9;
            padding: 10px;
        }

        .input-field {
            flex-grow: 1;
            padding: 0.75rem;
            border: 1px solid #d1d5da;
            border-radius: 6px;
            margin-right: 1rem;
        }

        .send-button {
            padding: 0.75rem 1rem;
            background-color: #2ea44f;
            color: white;
            border: none;
            border-radius: 6px;
            cursor: pointer;
        }

        .del-button {
            padding: 0.75rem 1rem;
            background-color: #aeaeae;
            color: white;
            border: none;
            margin-right: 10px;
            border-radius: 6px;
            cursor: pointer;
        }

        .send-button:disabled {
            opacity: 0.5;
            cursor: not-allowed;
        }

        .typing-indicator {
            position: absolute;
            margin-bottom: 50px;
            font-size: 0.8rem;
            color: #586069;
        }

        .typing:before,
        .typing:after {
            content: '';
            display: inline-block;
            width: 0.75rem;
            height: 0.75rem;
            border-radius: 50%;
            margin-right: 0.25rem;
            animation: typing 1s infinite;
        }

        @keyframes typing {
            0% {
                transform: scale(0);
            }

            50% {
                transform: scale(1);
            }

            100% {
                transform: scale(0);
            }
        }

        /* Styles for the search result list */
        .listView {
            list-style-type: none;
            margin: 0;
            padding: 0;
        }

        .listView li {
            background-color: #f4f4f4;
            padding: 10px;
            margin-bottom: 5px;
            box-shadow: 2px 2px 5px rgba(0, 0, 0, 0.1);
            transition: box-shadow 0.3s ease;
        }

        .listView li:hover {
            box-shadow: 2px 2px 10px rgba(0, 0, 0, 0.2);
        }

        .listView li a {
            text-decoration: none;
            color: #333;
            display: block;
            transition: color 0.3s ease;
        }

        .listView li a:hover {
            color: #ff6600;
        }
    </style>
</head>

<body style="display: flex; flex-direction: column; height: 100vh;">

    <div id="website-container" class="website-container">
        <ul class="listView" id="listView"></ul>
    </div>
    <div style="height: 90%; width:80%; overflow-y: auto; display: flex; flex-direction: column;">
        <ul class="chat-container" id="chat-container">
            {% for item in history %}
                <!-- each history entry: the user's message followed by the bot's reply -->
                <li class="message user-message">
                    <div class="message-text-user">{{ item.user }}</div>
                    <div class="avatar-user"></div>
                </li>
                <li class="message bot-message">
                    <div class="avatar-bot"></div>
                    <div class="message-text">{{ item.bot }}</div>
                </li>
            {% endfor %}
        </ul>
    </div>

    <form class="input-container" id="input-form" method="POST"
        style="position: fixed; bottom: 0; left: 0; width: 65%;">
        <button type="button" class="del-button" id="del-button" style="width: 100px;" onclick='del()'>Clear</button>
        <input type="text" placeholder="You ask, I'll find it" class="input-field" id="input-field" name="prompt" autocomplete="off"
            style="width: calc(100% - 100px);">
        <button type="submit" class="send-button" id="send-button" disabled style="width: 100px;">Search</button>
    </form>

    <script>
        const SERVER_URL = '';
        const inputForm = document.getElementById('input-form');
        const inputField = document.getElementById('input-field');
        const chatContainer = document.getElementById('chat-container');

        inputField.addEventListener('input', () => {
            const userInput = inputField.value.trim();
            document.getElementById('send-button').disabled = !userInput;
        });

        inputForm.addEventListener('submit', async (event) => {
            event.preventDefault();
            const userInput = inputField.value.trim();
            const chatContainer = document.getElementById('chat-container');
            if (!userInput) {
                return;
            }
            const userMessage = createMessageElement(userInput, 'user-message', "message-text-user", "avatar-user");
            chatContainer.appendChild(userMessage);
            inputField.value = '';
            chatContainer.scrollTop = chatContainer.scrollHeight;
            inputField.disabled = true;
            document.getElementById('send-button').disabled = true;
            showTypingAnimation(userMessage);

            const aa = document.getElementById('listView');
            aa.innerHTML = await getA(userInput);
            const response = await generateText(userInput);
            hideTypingAnimation(userMessage);
            if (response.status === 'success') {
                const botResponse = createMessageElement(response.response, 'bot-message', "message-text", "avatar-bot");
                chatContainer.appendChild(botResponse);
                printMessageText(botResponse);

            } else {
                alert(response.message);
            }
            inputField.disabled = false;
            inputField.focus();
        });

        async function generateText(prompt) {
            const response = await fetch(SERVER_URL + '/generate-text', {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json'
                },
                body: JSON.stringify({
                    prompt
                })
            });
            return await response.json();
        }
        async function getA(prompt) {
            const response = await fetch(SERVER_URL + `/search?keyword=${encodeURIComponent(prompt)}`, {
                method: 'GET',
                headers: {
                    'Content-Type': 'application/json'
                }
            });
            return await response.text();
        }
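        function createMessageElement(text, className, name, bot) {
            // className: 'user-message' or 'bot-message'; name: CSS class of the text; bot: CSS class of the avatar
            const message = document.createElement('li');
            message.classList.add('message', className, 'typing');
            const avatar = document.createElement('div');
            avatar.className = bot;
            const content = document.createElement('div');
            content.className = name;
            content.textContent = text;  // textContent keeps user or model text from being parsed as HTML
            if (bot == "avatar-bot") {
                message.append(avatar, content);  // bot: avatar on the left
            } else {
                message.append(content, avatar);  // user: avatar on the right
            }
            return message;
        }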

        function showTypingAnimation(element) {
            const chatContainer = document.getElementById('chat-container');
            chatContainer.scrollTop = chatContainer.scrollHeight + 10;
            const rect = element.getBoundingClientRect();
            const topPosition = rect.top + window.scrollY + rect.height;
            const leftPosition = rect.left + window.scrollX;
            const typingIndicator = document.createElement('div');
            typingIndicator.classList.add('typing-indicator');
            typingIndicator.style.top = `${topPosition}px`;
            typingIndicator.style.left = `${leftPosition}px`;
            typingIndicator.textContent = 'Thinking...';

            document.body.appendChild(typingIndicator);
        }

        function hideTypingAnimation(element) {
            const typingIndicator = document.querySelector('.typing-indicator');
            if (typingIndicator) {
                typingIndicator.remove();
            }
            element.classList.remove('typing');
        }

        // Typewriter effect: reveal the reply one character at a time
        function printMessageText(message) {
            const chatContainer = document.getElementById('chat-container');
            const text = message.querySelector('.message-text');
            const textContent = text.textContent;
            text.textContent = '';
            for (let i = 0; i < textContent.length; i++) {
                setTimeout(() => {
                    text.textContent += textContent.charAt(i);
                    chatContainer.scrollTop = chatContainer.scrollHeight;
                }, i * 10); // delay per character in ms (controls the printing speed)
            }
        }
        async function handleLinkClick(link) {
            const chatContainer = document.getElementById('chat-container');
            // Add the message to the page first so the "Thinking..." indicator can be positioned below it
            const userMessage = createMessageElement("Summarizing: " + link, 'user-message', "message-text-user", "avatar-user");
            chatContainer.appendChild(userMessage);
            showTypingAnimation(userMessage);

            const content = await getPageContent(link);  // scraped text and image links of the page
            console.log(link);
            console.log(content);
            const response = await generateText("Summarize the following content: " + content);
            hideTypingAnimation(userMessage);
            if (response.status === 'success') {
                const botResponse = createMessageElement(response.response, 'bot-message', "message-text", "avatar-bot");
                chatContainer.appendChild(botResponse);
                printMessageText(botResponse);
            } else {
                alert(response.message);
            }
        }
        async function del() {
            // Wait until the server has cleared the history, then reload the page
            await fetch(SERVER_URL + `/clear`, {
                method: 'POST'
            });
            location.replace("/");
        }
        // Fetch the scraped content of a page through the server
        async function getPageContent(url) {
            const response = await fetch(SERVER_URL + `/page_content?url=${encodeURIComponent(url)}`, {
                method: 'GET'
            });
            return await response.text();
        }
    </script>
</body>

</html>

3.2 ai_search.py

from flask import Flask, render_template, request, jsonify
from http import HTTPStatus
from openai import OpenAI
import mechanicalsoup
from bs4 import BeautifulSoup
from flask_cors import CORS
from urllib.parse import urlparse, parse_qs, quote, urljoin

app = Flask(__name__)
client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

CORS(app)

history = []  # chat history shared by all requests: [{"user": ..., "bot": ...}, ...]

def crawl_pages(query_text, page_num=2):
    browser = mechanicalsoup.Browser()
    query_text_encoded = quote(query_text)  # percent-encode the keyword so it can be used in a URL
    results = []
    for page_index in range(1, page_num + 1):
        url = f"https://search.cctv.com/search.php?qtext={query_text_encoded}&type=web&page={page_index}"
        page = browser.get(url)
        soup = BeautifulSoup(page.text, 'html.parser')
        # Result links on the search page have ids starting with web_content_
        web_content_links = soup.find_all('a', id=lambda x: x and x.startswith('web_content_'))
        for link in web_content_links:
            # The article URL is carried in the targetpage query parameter of the link
            target_page = parse_qs(urlparse(link.get('href', '')).query).get('targetpage', [None])[0]
            results.append({'title': link.text, 'url': target_page})
    return results

def get_openai_chat_completion(messages, temperature, model="gpt-3.5-turbo-1106"):
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )
    return response

def generate_text(prompt, temperature=0.5):
    messages = [
        {
            "role": "user",
            "content": prompt,
        }
    ]
    response = get_openai_chat_completion(messages=messages, temperature=temperature)
    generated_text = response.choices[0].message.content
    history.append({"user": prompt, "bot": generated_text})  # add the user input and the model output to the chat history
    return {"status": "success", "response": generated_text}

@app.route('/', methods=['GET'])
def index():
    return render_template('ai_search.html', history=history)

@app.route('/generate-text', methods=['POST'])
def generate_text_api():
    prompt = request.json['prompt']
    result = generate_text(prompt)
    return jsonify(result)

@app.route('/clear', methods=['POST'])
def clear():
    history.clear()  # empty the shared list in place
    return '', HTTPStatus.NO_CONTENT

@app.route('/search', methods=['GET', 'POST'])
def search():
    # The keyword arrives as a form field (POST) or a query parameter (GET)
    if request.method == 'POST':
        keyword = request.form.get('keyword', '')
    else:
        keyword = request.args.get('keyword', '')

    results = crawl_pages(keyword)
    output = ""
    for result in results:
        if not result['url']:
            continue  # skip links that had no targetpage parameter
        # One clickable <li> per result; clicking it calls handleLinkClick(url) in the page
        output += f"<li><a href='javascript:void(0);' onclick='handleLinkClick(\"{result['url']}\")'>{result['title']}</a></li>"
    return output

@app.route('/page_content')
def page_content():
    url = request.args.get('url', '')
    if not url:
        return 'Missing url parameter', 400
    browser = mechanicalsoup.Browser()
    page = browser.get(url)
    page.encoding = 'utf-8'  # decode the page as UTF-8
    soup = BeautifulSoup(page.text, 'html.parser')
    all_text = ''
    all_images = []

    # Collect the text of all paragraphs, headings and spans
    for element in soup.find_all(['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'span']):
        all_text += element.get_text() + ' '

    # Collect all image links, resolved against the page URL
    for img in soup.find_all('img'):
        img_src = img.get('src')
        if img_src:
            # urljoin handles //host/a.jpg, /a.jpg and absolute https://... sources alike
            all_images.append(urljoin(url, img_src))

    return f"Text content: {all_text}<br>Image links: {', '.join(all_images)}"

if __name__ == '__main__':
    app.run(debug=True)

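3.3 Running

Set the OPENAI_API_KEY environment variable (the OpenAI() client reads the key from it). Then run python ai_search.py and open the link printed in the console; Flask's development server defaults to http://127.0.0.1:5000.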

[Figure: console output of the running server, showing the link to open]

3.4 Dependencies you may need to install

pip install Flask -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install mechanicalsoup -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install openai beautifulsoup4 flask-cors -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install Jinja2
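To check which of the packages imported by ai_search.py are still missing from your environment, the sketch below uses importlib.util.find_spec. The module list is taken from the imports in ai_search.py. Some import names differ from the pip package names: beautifulsoup4 is imported as bs4, and flask-cors as flask_cors.

```python
import importlib.util

# Import names used by ai_search.py
REQUIRED_MODULES = ["flask", "flask_cors", "mechanicalsoup", "bs4", "openai"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print(missing_modules() or "all dependencies found")
```

find_spec only locates top-level modules here and does not import them, so the check is cheap to run before starting the server.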

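3.5 Always load the HTML through Flask and Jinja2; do not open the HTML file directly

If you open the HTML file directly in the browser, the page is displayed incorrectly. Jinja2 tags such as {% for item in history %} appear unrendered, and relative requests such as /search never reach the Flask server: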

[Figure: the page displayed incorrectly when ai_search.html is opened directly]

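4. Summary

In this article we built an AI + search engine from scratch. The overall principle is quite simple. Searching means combining a fixed search URL with the keyword and scraping the titles and URLs from the result page; those are the results. Text summarization needs little explanation here, since earlier articles covered it in detail with hands-on practice.

The example is simple but fairly complete. It can serve as a quick start for similar projects, and you can build your own prototype on top of it.

Give it a try. While running it, you will probably come up with your own ideas for improving it.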
