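I spent a long time looking for the Bilibili endpoint that returns the video data and could not find it, so I ended up following other bloggers' posts.
Requirements analysis
Scraping a Bilibili video and saving it locally can be split into four steps:
1. Find the interface through which Bilibili returns the audio and video.
2. Get the audio and video URLs from that interface.
3. Download the audio and video.
4. Merge the audio and video with ffmpeg.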
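Network interface analysis
I use Google Chrome here. Open any Bilibili video, open the developer tools, and use the video's URL to find the matching request in the Network panel.
Filter the requests by document type, select the page request, and click Response. The response contains the video and audio information along with their URLs: video holds the video streams and audio holds the audio streams. Bilibili serves audio and video separately, so the two have to be merged after downloading.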
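As a rough sketch of what that response contains, the snippet below pulls the embedded `window.__playinfo__` JSON out of a page's HTML and lists the stream entries. The sample HTML is made up for illustration, and the field names (`data.dash.video`, `data.dash.audio`, `id`, `baseUrl`) are the ones the script in this post relies on; they are not guaranteed to stay stable.

```python
import json
import re

# Hypothetical, heavily trimmed page source; a real page is far larger.
SAMPLE_HTML = (
    '<html><head><script>window.__playinfo__='
    '{"data":{"dash":{'
    '"video":[{"id":80,"baseUrl":"https://example.com/v80.m4s"},'
    '{"id":64,"baseUrl":"https://example.com/v64.m4s"}],'
    '"audio":[{"id":30280,"baseUrl":"https://example.com/a.m4s"}]}}}'
    '</script></head><body><h1>demo</h1></body></html>'
)


def extract_playinfo(html):
    """Return the parsed window.__playinfo__ object, or None if absent."""
    match = re.search(r'window\.__playinfo__=(.*?)</script>', html, re.S)
    if match is None:
        return None
    return json.loads(match.group(1))


def list_streams(playinfo):
    """Return (video, audio) lists of (id, baseUrl) pairs."""
    dash = playinfo["data"]["dash"]
    video = [(v["id"], v["baseUrl"]) for v in dash["video"]]
    audio = [(a["id"], a["baseUrl"]) for a in dash["audio"]]
    return video, audio


playinfo = extract_playinfo(SAMPLE_HTML)
video_streams, audio_streams = list_streams(playinfo)
print(video_streams)
print(audio_streams)
```

Returning None when the marker is missing makes it easier to notice that the page layout has changed, instead of failing with an index error.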
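Each video entry has an id that identifies its quality, so you can filter by id to choose a resolution. An id of 80 is 1080p, which is probably the highest quality available here.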
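Filtering by id could look roughly like the sketch below. The helper name `pick_stream` and the sample entries are hypothetical, and the fallback assumes that a larger id means a higher quality, which matches the ids seen here but is not verified for every case.

```python
def pick_stream(streams, preferred_id=80):
    """Pick the entry whose id equals preferred_id.

    Falls back to the highest id below preferred_id, then to the first
    entry, so the caller always gets something when the list is non-empty.
    """
    if not streams:
        raise ValueError("no streams available")
    exact = [s for s in streams if s["id"] == preferred_id]
    if exact:
        return exact[0]
    lower = [s for s in streams if s["id"] < preferred_id]
    if lower:
        return max(lower, key=lambda s: s["id"])
    return streams[0]


# Hypothetical entries shaped like the "video" list in the response.
videos = [
    {"id": 64, "baseUrl": "https://example.com/v64.m4s"},
    {"id": 32, "baseUrl": "https://example.com/v32.m4s"},
]
print(pick_stream(videos, preferred_id=80)["id"])  # no id 80 here, falls back to 64
```

The script in this post simply takes the first entry of the list; an explicit lookup like this is only useful if you want a specific resolution.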
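Implementation
Preparation:
You need to download ffmpeg, which is used to merge the audio and video files:
Builds - CODEX FFMPEG @ gyan.dev
This code uses Python 3.9. Install the dependencies it imports (requests and lxml) before running it.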
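Before running the script it can help to confirm that ffmpeg is actually reachable. The following is a minimal sketch using only the standard library; the function name `resolve_ffmpeg` and its `explicit_path` parameter are assumptions made for this example.

```python
import os
import shutil


def resolve_ffmpeg(explicit_path=None):
    """Return a usable ffmpeg executable path, or None if none is found.

    explicit_path is a hypothetical parameter for a user-supplied location;
    when it is not given, the system PATH is searched instead.
    """
    if explicit_path:
        return explicit_path if os.path.isfile(explicit_path) else None
    return shutil.which("ffmpeg")


ffmpeg = resolve_ffmpeg()
if ffmpeg is None:
    print("ffmpeg not found; add its bin directory to PATH or pass a full path")
else:
    print("using ffmpeg at", ffmpeg)
```

If the bin directory of the downloaded build is added to PATH, the merge step can call plain `ffmpeg` instead of a hard-coded location.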
import os
import re
import json

import requests
from lxml import etree

# 1. Request headers used when fetching the video page source
header = {
    "referer": "https://www.bilibili.com",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0"
}
# 2. Extract the playback URLs of the video and audio streams
def get_play_url(url):
    r = requests.get(url, headers=header)
    # The stream information is embedded in the page as window.__playinfo__
    info = re.findall('window.__playinfo__=(.*?)</script>', r.text)[0]
    dash = json.loads(info)["data"]["dash"]  # parse the JSON only once
    video_url = dash["video"][0]["baseUrl"]
    audio_url = dash["audio"][0]["baseUrl"]
    # Use the page's <h1> title as the file name, replacing characters
    # that are not allowed in Windows file names
    html = etree.HTML(r.text)
    filename = html.xpath('//h1/text()')[0]
    filename = re.sub(r'[\\/:*?"<>|]', '_', filename).strip()
    return video_url, audio_url, filename
# 3. Download and save the video and audio
def download_files(video_url, audio_url, filename, video_path, audio_path):
    print("Downloading video and audio")
    # Create the target directories if they do not exist yet
    os.makedirs(video_path, exist_ok=True)
    os.makedirs(audio_path, exist_ok=True)
    video_content = requests.get(video_url, headers=header).content
    audio_content = requests.get(audio_url, headers=header).content
    with open(f'{video_path}/{filename}.mp4', 'wb') as f:
        f.write(video_content)
    print("Video download finished")
    with open(f'{audio_path}/{filename}.mp3', 'wb') as f:
        f.write(audio_content)
    print("Audio download finished")
# 4. Merge the video and audio with ffmpeg
def combin_video_audio(filename, video_path, audio_path):
    # If ffmpeg is on your PATH, the full path below can be replaced with just "ffmpeg".
    # The file paths are quoted so that titles containing spaces do not break the command.
    # -loglevel quiet hides ffmpeg's log output; it is optional.
    cmd = fr'D:\ApplicationsSoftware\FFmpeg\ffmpeg-7.1-full_build\ffmpeg-7.1-full_build\bin\ffmpeg -i "{video_path}/{filename}.mp4" -i "{audio_path}/{filename}.mp3" -c:v copy -c:a aac -strict experimental -map 0:v -map 1:a "{video_path}/output-{filename}.mp4" -loglevel quiet'
    os.system(cmd)
    print("Audio and video merged")
    print("--" * 10)
    # Remove the separate video and audio files, keeping only the merged output
    os.remove(f'{video_path}/{filename}.mp4')
    os.remove(f'{audio_path}/{filename}.mp3')
    print('Temporary files deleted')
if __name__ == '__main__':
    # url = 'https://www.bilibili.com/video/BV1AA4y1D7h2/?spm_id_from=333.337.search-card.all.click&vd_source=d9407807cd22419d13fabdc976906958'
    url = 'https://www.bilibili.com/video/BV1F6qnYoEz1/?t=6&spm_id_from=333.1007.tianma.3-3-9.click'
    video_path = r'E:\Download\videos'
    audio_path = r'E:\Download\audio'
    video_url, audio_url, filename = get_play_url(url)
    download_files(video_url, audio_url, filename, video_path, audio_path)
    combin_video_audio(filename, video_path, audio_path)
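
One possible hardening of the merge step, shown only as a sketch: build the ffmpeg call as an argument list and run it with `subprocess.run`, so that video titles containing spaces or shell-special characters need no manual quoting. The helper names (`safe_filename`, `build_merge_command`, `merge`) are hypothetical; the ffmpeg options are the same ones used in the script above.

```python
import re
import subprocess


def safe_filename(title):
    """Replace characters that Windows forbids in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return cleaned or "untitled"


def build_merge_command(ffmpeg, video_file, audio_file, output_file):
    """Build the ffmpeg argument list; no shell quoting is needed."""
    return [
        ffmpeg, "-loglevel", "quiet",
        "-i", video_file,
        "-i", audio_file,
        "-c:v", "copy", "-c:a", "aac", "-strict", "experimental",
        "-map", "0:v", "-map", "1:a",
        output_file,
    ]


def merge(ffmpeg, video_file, audio_file, output_file):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    cmd = build_merge_command(ffmpeg, video_file, audio_file, output_file)
    subprocess.run(cmd, check=True)


# Only the command is built here; call merge(...) to actually run ffmpeg.
name = safe_filename('demo: part 1/2?')
print(name)
print(build_merge_command("ffmpeg", f"{name}.mp4", f"{name}.mp3", f"output-{name}.mp4"))
```

Unlike `os.system`, `subprocess.run(..., check=True)` raises an exception when ffmpeg fails, so the source files would not be deleted after an unsuccessful merge.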