Scraping house-for-sale listings from a real-estate website with requests, bs4's BeautifulSoup, and pandas

whatthinking

Published 2020-09-03 22:37:45


Category: python. Tags: web scraping, python

Article link: https://blog.csdn.net/whatthinking/article/details/108394511


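This article shows how to fetch page data with Python's requests library, parse the HTML with BeautifulSoup, and then clean and organize the data with pandas, in order to scrape house-for-sale listings from a real-estate website.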
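The parse-then-tabulate part of that pipeline can be sketched without touching the network. This is a minimal illustration, not the article's own code. The sample markup, the English column names, the `price` column, and the helper names `parse_rows` and `to_frame` are all hypothetical. It also assumes that each link's `href` is relative to the `http://www.lgfdcw.com/cs/` prefix, which may not match the real site.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample markup; the real page layout is an assumption.
SAMPLE_HTML = """
<table>
  <tr bgcolor="#CCCCCC"><td>Address</td><td>Price</td></tr>
  <tr bgcolor="#FFFFFF">
    <td><a target="_blank" href="house_1.html">Road A No. 1</a></td>
    <td> 85 </td>
  </tr>
  <tr bgcolor="#FFFFFF">
    <td><a target="_blank" href="house_2.html">Road B No. 2</a></td>
    <td>n/a</td>
  </tr>
</table>
"""

# Assumed base for relative detail links.
BASE_URL = "http://www.lgfdcw.com/cs/"


def parse_rows(html):
    """Extract one dict per data row (rows with a white background)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr", attrs={"bgcolor": "#FFFFFF"}):
        link = tr.find("a", attrs={"target": "_blank"})
        if link is None:
            # Skip rows without a detail link instead of raising IndexError.
            continue
        cells = tr.find_all("td")
        rows.append({
            "address": link.get_text(strip=True),
            "detail_url": BASE_URL + link.get("href", ""),
            "price": cells[1].get_text(strip=True) if len(cells) > 1 else "",
        })
    return rows


def to_frame(rows):
    """Build a DataFrame and apply some basic cleaning."""
    df = pd.DataFrame(rows, columns=["address", "detail_url", "price"])
    # Non-numeric prices such as "n/a" become NaN rather than raising.
    df["price"] = pd.to_numeric(df["price"], errors="coerce")
    return df.drop_duplicates(subset="detail_url").reset_index(drop=True)


df = to_frame(parse_rows(SAMPLE_HTML))
print(df)
```

Using `tr.find(...)` and checking for `None` avoids the `IndexError` that indexing `find_all(...)[0]` would raise on a row with no matching link. The resulting frame could then be written out with `df.to_csv(...)`.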


# -*- coding:utf-8 -*-
import requests
from bs4 import BeautifulSoup
import pandas as pd

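# Fetch the page source
def get_html(url):
    try:
        res = requests.get(url, timeout=30)  # send the request and bind the response to res
        res.encoding = 'gb2312'  # force GB2312 decoding for every page
        return res.text
    except requests.RequestException:
        # network error or timeout: fall back to an empty page
        return ''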
def parse_html(html):
    # Parse the fetched page with BeautifulSoup, using the built-in html.parser
    soup = BeautifulSoup(html, 'html.parser')
    # Collect every <tr> tag whose bgcolor attribute is #FFFFFF into a list
    tr_list = soup.find_all('tr', attrs={'bgcolor': '#FFFFFF'})
    # Holds the info for every house
    houses = []
    for tr in tr_list:
        house = {}
        # Full address
        house['详细地址'] = tr.find_all('a', attrs={'target': '_blank'})[0].string
        # Detail-page link
        house['详情链接']='http://www.lgfdcw.com/cs/'+tr.find_all('a',attrs={'target':'
