Even without a GPU, you can try out the free large-model environment that NVIDIA provides.
The demo prepared here calls the lightweight 3.8B-parameter model microsoft/phi-3-mini-4k-instruct and wraps it in a simple Flask web page.
a) Project structure
app.py
templates/index.html
b) Install the dependencies
# pip install Flask openai
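Before wiring up Flask, you may want to confirm that the endpoint and your key work with a single standalone call. The following is a minimal sketch, assuming your key is exported as the NVIDIA_API_KEY environment variable; it makes one non-streaming request and prints the full reply.
# check_api.py -- one-off sanity check for the NVIDIA endpoint
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"]  # your own nvapi-... key
)

# Non-streaming request: the whole reply comes back in one response object.
completion = client.chat.completions.create(
    model="microsoft/phi-3-mini-4k-instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64
)
print(completion.choices[0].message.content)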
c) Create the Flask application (app.py)
import os

from flask import Flask, request, render_template
from openai import OpenAI

app = Flask(__name__)

# NVIDIA's endpoint is OpenAI-compatible, so the standard OpenAI client works.
# Set the NVIDIA_API_KEY environment variable to your own nvapi-... key.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"]
)

@app.route("/", methods=["GET", "POST"])
def index():
    result = ""
    if request.method == "POST":
        user_input = request.form["user_input"]
        # Request a streamed completion from phi-3-mini.
        completion = client.chat.completions.create(
            model="microsoft/phi-3-mini-4k-instruct",
            messages=[{"role": "user", "content": user_input}],
            temperature=0.2,
            top_p=0.7,
            max_tokens=1024,
            stream=True
        )
        # Accumulate the streamed chunks into one string before rendering.
        for chunk in completion:
            if chunk.choices[0].delta.content is not None:
                result += chunk.choices[0].delta.content
    return render_template("index.html", result=result)

if __name__ == "__main__":
    app.run(debug=True)
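The index handler above buffers the whole reply before rendering the page. If you would rather see tokens as they arrive, Flask can forward the stream directly; the sketch below adds a hypothetical /stream route (not part of the original demo) that returns plain text chunk by chunk.
# Optional: hypothetical /stream route that forwards tokens as they arrive.
from flask import Response, stream_with_context

@app.route("/stream", methods=["POST"])
def stream():
    user_input = request.form["user_input"]
    completion = client.chat.completions.create(
        model="microsoft/phi-3-mini-4k-instruct",
        messages=[{"role": "user", "content": user_input}],
        temperature=0.2,
        top_p=0.7,
        max_tokens=1024,
        stream=True
    )

    def generate():
        # Yield each chunk immediately instead of buffering the full reply.
        for chunk in completion:
            if chunk.choices[0].delta.content is not None:
                yield chunk.choices[0].delta.content

    return Response(stream_with_context(generate()), mimetype="text/plain")
If you add this to app.py, place it above the if __name__ == "__main__": block so the route is registered before the server starts.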
d) Create the web page template (templates/index.html)
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>LLM Request</title>
</head>
<body>
    <h1>Enter your request</h1>
    <form method="POST">
        <textarea name="user_input" rows="4" cols="50" placeholder="Type your prompt here..."></textarea><br>
        <input type="submit" value="Submit">
    </form>
    <h2>Result:</h2>
    <pre>{{ result }}</pre>
</body>
</html>
e) Run the Flask application
# python app.py
* Serving Flask app 'app'
* Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
Press CTRL+C to quit
* Restarting with stat
* Debugger is active!
* Debugger PIN: 831-621-819
f) Query the model
Open a browser, go to http://127.0.0.1:5000, type a prompt into the text box, and click Submit; the model's reply is rendered under "Result:".
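The form can also be exercised without a browser. This is a small sketch assuming the requests package is installed (pip install requests) and the Flask app is already running; it POSTs the form field the same way the page does and prints the rendered HTML, in which the model's reply sits inside the <pre> tag.
# test_request.py -- post a prompt to the running Flask app (assumes `pip install requests`)
import requests

resp = requests.post(
    "http://127.0.0.1:5000/",
    data={"user_input": "Explain Flask in one sentence."}
)
print(resp.status_code)   # 200 on success
print(resp.text)          # rendered index.html containing the model's reply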