Learning the requests library

点点滴滴25

Category: Python learning

Article link: https://blog.csdn.net/qq_41325698/article/details/103054395

GET requests with parameters:

1. As a dictionary

import requests
url="http://httpbin.org/get"
data={
    'name':'germey',
    'age':22
}
res=requests.get(url=url,params=data)
print(res.text)

import requests
url="http://httpbin.org/get?name=germey&age=22"

res=requests.get(url=url)
print(res.text)

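The two are equivalent: the first passes the parameters as a dictionary through params, the second writes them directly into the URL query string.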
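
To see why the two forms are equivalent, the request can be built without being sent. This is a minimal sketch that uses requests.Request and prepare(), which the original examples do not use; it only shows how params ends up in the URL.

```python
import requests

url = "http://httpbin.org/get"
data = {"name": "germey", "age": 22}

# Build the request without sending it, so no network access is needed
prepared = requests.Request("GET", url, params=data).prepare()

# params is URL-encoded and appended to the URL as a query string
print(prepared.url)
```

The printed URL is the same one that the second example writes out by hand.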

Parsing JSON

import requests
import json
url="http://httpbin.org/get?name=germey&age=22"

res=requests.get(url=url)
print(res.text)
print(res.json())   # equivalent to the json.loads call on the next line
print(json.loads(res.text))
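
res.json() fails when the body is not valid JSON, for example when a site answers with an HTML error page. A minimal sketch of a guard, assuming a hypothetical helper named parse_json that is not part of requests:

```python
def parse_json(res):
    """Return the decoded JSON body of a response, or None if it is not JSON."""
    try:
        return res.json()
    except ValueError:
        # requests raises a ValueError subclass when the body is not valid JSON
        return None

# Usage (needs network access):
#   res = requests.get("http://httpbin.org/get?name=germey&age=22")
#   print(parse_json(res)["args"])
```

Catching ValueError rather than a more specific class keeps the helper independent of the installed requests version.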

Fetching binary data:

import requests
url="https://ss0.bdstatic.com/94oJfD_bAAcT8t7mm9GUKT-xh_/timg?image&quality=100&size=b4000_4000&sec=1573635010&di=6582f9faaa9eac482013d63dc96380cf&src=http://hbimg.b0.upaiyun.com/b8a2f3cb90ebfdcc8f432e55137d8008d8e0b53c656d-LYlEC1_fw658"

res=requests.get(url=url)
print(res.content)
with open('baidu.jpg','wb') as f:
    f.write(res.content)  # the with block closes the file automatically

Adding headers:

If we request Zhihu without setting any headers, the server answers with an error page:

import requests
url="https://www.zhihu.com/"
res=requests.get(url=url)
print(res.text)

<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>openresty</center>
</body>
</html>
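
One plausible explanation for the 400 above is that, without explicit headers, requests identifies itself with its own User-Agent, which some sites reject. The sketch below only prints that default value; whether Zhihu filters on it is an assumption, not something the response confirms.

```python
import requests

# The User-Agent that requests sends when none is set explicitly
print(requests.utils.default_user_agent())

# A new Session starts with the same default header
print(requests.Session().headers["User-Agent"])
```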

After adding headers:


import requests
url="https://zhuanlan.zhihu.com/p/91229194"
headers={
    'User-Agent':"Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"
}
res=requests.get(url=url,headers=headers)
res.encoding = res.apparent_encoding
print(res.text)

Now the page source is returned. Along the way I ran into garbled Chinese characters.

Fix:

res.encoding = res.apparent_encoding

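This fixes the vast majority of garbled text. The reason: when the Content-Type header declares a text type but no charset, response.encoding defaults to ISO-8859-1, whereas response.apparent_encoding is a fallback that guesses the encoding from the response body itself.
Why only most cases and not 100%? The guess is statistical and can be wrong, and a site developer may also deliberately mix two encodings in one page to hide important information.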
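
The effect can be reproduced without any network access. This sketch uses a made-up sample string and requests.utils.get_encoding_from_headers to show both the garbling and the header-based fallback; the sample text and variable names are illustrative only.

```python
import requests

original = "中文页面"
raw = original.encode("utf-8")          # bytes as a server would send them

# Decoding UTF-8 bytes as ISO-8859-1 never fails, but yields garbled text
garbled = raw.decode("iso-8859-1")
# Decoding with the correct encoding recovers the text
fixed = raw.decode("utf-8")
print(garbled != original, fixed == original)

# What requests derives from the headers, without and with a charset
fallback = requests.utils.get_encoding_from_headers({"content-type": "text/html"})
explicit = requests.utils.get_encoding_from_headers(
    {"content-type": "text/html; charset=utf-8"}
)
print(fallback, explicit)
```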

Basic POST request


import requests
data={
    'name':'germey',
    'age':'22'
}
url="https://www.zhihu.com/question/338590379/answer/818309354"
headers={
    'User-Agent':"Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"
}
res=requests.post(url=url,data=data,headers=headers)
res.encoding = res.apparent_encoding
print(res.text)
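
The data argument sends a form-encoded body. requests also accepts a json argument, which the example above does not use. A minimal sketch comparing the two by preparing the requests without sending them (the httpbin URL is only a placeholder target):

```python
import requests

url = "http://httpbin.org/post"
data = {"name": "germey", "age": "22"}

# data= sends a form-encoded body
form = requests.Request("POST", url, data=data).prepare()
print(form.headers["Content-Type"], form.body)

# json= serializes the dict to JSON and sets the JSON content type
as_json = requests.Request("POST", url, json=data).prepare()
print(as_json.headers["Content-Type"], as_json.body)
```

Which one to use depends on what the server expects: HTML forms usually take the first, JSON APIs the second.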

Advanced operations:

1. File upload:

import requests

# Open the file in binary mode; the with block closes it after the upload
with open('baidu.jpg','rb') as f:
    res=requests.post("http://httpbin.org/post",files={'file':f})
print(res.text)
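
The files argument also accepts a tuple of (filename, content, content type), which is handy when the data is already in memory. A sketch with a made-up file name and content, prepared but not sent:

```python
import requests

# (filename, content, content type); content may be bytes or an open file object
files = {"file": ("hello.txt", b"hello world", "text/plain")}
prepared = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()

# The body is multipart/form-data with a generated boundary
print(prepared.headers["Content-Type"])
print(b'filename="hello.txt"' in prepared.body)
```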

2. Getting cookies:


import requests
url="https://www.zhihu.com/question/338590379/answer/818309354"

res=requests.get(url=url)
print(res.cookies)
for key,value in res.cookies.items():
    print(key + '=' +value)

Cookies are used to maintain login state.
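
Going the other way, cookies can be sent with a request through the cookies argument. A minimal sketch, reusing the httpbin cookie name and value that appear later in this article; the request is prepared but not sent:

```python
import requests

cookies = {"number": "123456"}
prepared = requests.Request(
    "GET", "http://httpbin.org/cookies", cookies=cookies
).prepare()

# The dict is turned into a Cookie request header
print(prepared.headers["Cookie"])
```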

3. Session persistence


import requests


requests.get('http://httpbin.org/cookies/set/number/123456')
res=requests.get('http://httpbin.org/cookies')
print(res.text)

Fetching cookies this way is like using two separate browsers: one sets the cookie and the other reads it, so the result is empty.

import requests

s=requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456')
res=s.get('http://httpbin.org/cookies')
print(res.text)

This is like working within a single browser.
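
A Session also carries default headers, and it can be used as a context manager so that its connections are closed afterwards. In this sketch the User-Agent string is hypothetical and the request is prepared but not sent:

```python
import requests

with requests.Session() as s:
    s.headers.update({"User-Agent": "my-crawler/0.1"})   # hypothetical UA string
    s.cookies.set("number", "123456")
    # Every request made through s picks up the session headers and cookies
    prepared = s.prepare_request(
        requests.Request("GET", "http://httpbin.org/cookies")
    )

print(prepared.headers["User-Agent"])
print(prepared.headers["Cookie"])
```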

4. Certificate verification

When accessing HTTPS sites, requests verifies the SSL certificate by default. Verification can also be turned off:

res=requests.get('https://www.12306.cn',verify=False)

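The default is verify=True.

With verify=False, requests emits an InsecureRequestWarning, which recommends enabling certificate verification.

The warning can be suppressed: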

import requests
import urllib3

# Suppress the InsecureRequestWarning emitted when verify=False
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

res=requests.get('https://www.12306.cn',verify=False)

print(res.status_code)

5. Setting a proxy:

import requests
proxies={
    "http":"http://10.251.234.146:9743",
    "https":"https://10.251.234.146:9743"
}


res=requests.get('https://www.taobao.com',proxies=proxies)

print(res.status_code)

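I was on a campus network using the Ruijie client, and this request failed with an error saying the connection was actively refused.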
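
A refused proxy connection surfaces as an exception rather than a status code. A minimal sketch of handling it, assuming a hypothetical helper named status_via_proxy and a timeout value chosen only for illustration:

```python
import requests

def status_via_proxy(url, proxies, timeout=5):
    """Return the status code, or None if the proxy or host is unreachable."""
    try:
        return requests.get(url, proxies=proxies, timeout=timeout).status_code
    except requests.exceptions.ConnectionError as e:
        # ProxyError and ConnectTimeout are both subclasses of ConnectionError
        print("connection failed:", type(e).__name__)
        return None

# Usage, with the proxy address from the example above:
#   status_via_proxy("https://www.taobao.com",
#                    {"http": "http://10.251.234.146:9743",
#                     "https": "https://10.251.234.146:9743"})
```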

6. Authentication:

import requests
from requests.auth import HTTPBasicAuth
res=requests.get('https://www.taobao.com',auth=HTTPBasicAuth("user","passwd"))
print(res.status_code)

Or:

res=requests.get('https://www.taobao.com',auth=("user","passwd"))

Both forms perform HTTP Basic authentication; the tuple is shorthand for HTTPBasicAuth.
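
What Basic authentication actually sends is an Authorization header containing the base64 encoding of user:password. A short sketch that checks this on a prepared, unsent request, using the placeholder credentials from above:

```python
import base64
import requests

prepared = requests.Request(
    "GET", "https://www.taobao.com", auth=("user", "passwd")
).prepare()

# Basic auth is "Basic " followed by base64("user:passwd")
expected = "Basic " + base64.b64encode(b"user:passwd").decode("ascii")
print(prepared.headers["Authorization"] == expected)
```

Because base64 is an encoding and not encryption, the credentials are only protected when the request goes over HTTPS.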

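Other tips:

Printing page content:

print(res.read().decode('utf-8'))

Here res is a response returned by urllib.request.urlopen, not a requests Response (which has no read() method; use res.text instead). read() returns bytes, and decode('utf-8') converts those bytes into a str.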

Exceptions:

Template

from urllib import error,request

try:
    res=request.urlopen('http://www.baidu.com')
except error.HTTPError as e:
    # HTTPError is a subclass of URLError, so it must be caught first
    print(e.reason,e.code,e.headers,sep='\n')
except error.URLError as e:
    print(e.reason)
else:
    print("Request successful!")

urllib.error defines three exception types:

URLError

HTTPError

ContentTooShortError

Catching the first two is usually enough.
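
The template above is for urllib. A roughly equivalent pattern for requests might look like the sketch below; fetch_text is a hypothetical helper name and the timeout value is an arbitrary choice.

```python
import requests

def fetch_text(url, timeout=5):
    """Return the response body as text, or None if the request fails."""
    try:
        res = requests.get(url, timeout=timeout)
        res.raise_for_status()          # turn 4xx/5xx responses into HTTPError
    except requests.exceptions.HTTPError as e:
        print("HTTP error:", e.response.status_code)
    except requests.exceptions.RequestException as e:
        # Base class: covers connection errors, timeouts, invalid URLs, ...
        print("request failed:", type(e).__name__)
    else:
        return res.text
    return None

# No scheme in the URL, so this fails before any network access
print(fetch_text("not-a-url"))
```

As with urllib, the more specific HTTPError is caught before the general base class.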