Titanic Passenger Survival Prediction

Python Libraries

  • NumPy: Python library for scientific computing
  • Pandas: Python library for data analysis and processing
  • Scikit-learn: Python library for machine learning

About the Data

The data used here comes from Kaggle.


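Data Preprocessing

Missing values are filled in before training: gaps in Age are replaced with the column median, and gaps in Embarked are replaced with 'S'. Sex and Embarked are then encoded as integers.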
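To make the filling step concrete, here is a minimal sketch on a tiny hand-made table. The rows are hypothetical stand-ins for train.csv (they are not real Titanic records); only the column names Age and Embarked and the two fillna calls follow the scripts in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for train.csv; not real Titanic records.
toy = pd.DataFrame({
    'Age': [22.0, np.nan, 38.0, 26.0],
    'Embarked': ['S', 'C', None, 'Q'],
})

# Count missing values per column before filling.
missing_before = toy.isnull().sum()

# Fill numeric gaps with the column median and categorical gaps with 'S'.
toy['Age'] = toy['Age'].fillna(toy['Age'].median())
toy['Embarked'] = toy['Embarked'].fillna('S')

# Count again to confirm that nothing is missing any more.
missing_after = toy.isnull().sum()
print(missing_before.to_dict(), missing_after.to_dict())
```

Running isnull().sum() on the real file first is a quick way to see which columns need this treatment at all.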

Linear Regression Model

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time:    2020/1/3 9:42
# @Author:  Martin
# @File:    Titanic1.py
# @Software:PyCharm

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
# Load the data
titanic = pd.read_csv('../res/train.csv')
# Preprocessing: fill missing values, then encode categories as integers
titanic['Age'] = titanic['Age'].fillna(titanic['Age'].median())
titanic['Sex'] = titanic['Sex'].map({'male': 0, 'female': 1})
titanic['Embarked'] = titanic['Embarked'].fillna('S')
titanic['Embarked'] = titanic['Embarked'].map({'S': 0, 'C': 1, 'Q': 2})
# Linear regression with 3-fold cross-validation
predictors = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
alg = LinearRegression()
kf = KFold(n_splits=3, shuffle=False, random_state=None)
predictions = []
for train, test in kf.split(titanic[predictors]):
    train_predictors = (titanic[predictors].iloc[train, :])
    train_target = titanic['Survived'].iloc[train]
    alg.fit(train_predictors, train_target)
    test_predictions = alg.predict(titanic[predictors].iloc[test, :])
    predictions.append(test_predictions)
# Evaluate accuracy: threshold the regression output at 0.5
predictions = np.concatenate(predictions, axis=0)
predictions[predictions > .5] = 1
predictions[predictions <= .5] = 0
accuracy = sum(predictions == titanic['Survived']) / len(predictions)
print(accuracy)

The result is:
0.7833894500561167
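Linear regression returns continuous values, not class labels, so the script turns them into 0/1 by cutting at 0.5 and then counts the matches. A minimal sketch of just that step, with made-up numbers (the arrays raw and truth are hypothetical, not model output from this post):

```python
import numpy as np

# Hypothetical regression outputs and true labels, for illustration only.
raw = np.array([0.12, 0.81, 0.50, 0.67, -0.05])
truth = np.array([0, 1, 1, 1, 0])

# Values above 0.5 become 1 (survived); everything else becomes 0.
labels = (raw > 0.5).astype(int)

# Accuracy is the fraction of labels that match the truth.
accuracy = (labels == truth).mean()
print(labels, accuracy)
```

Note that a value of exactly 0.5 falls on the "did not survive" side, which matches the `<= .5` rule in the script above.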

Random Forest Model

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time:    2020/1/3 12:16
# @Author:  Martin
# @File:    Titanic2.py
# @Software:PyCharm

from sklearn.ensemble import RandomForestClassifier
from sklearn import model_selection
import pandas as pd
# Load the data
titanic = pd.read_csv('../res/train.csv')
# Preprocessing: fill missing values, then encode categories as integers
titanic['Age'] = titanic['Age'].fillna(titanic['Age'].median())
titanic['Sex'] = titanic['Sex'].map({'male': 0, 'female': 1})
titanic['Embarked'] = titanic['Embarked'].fillna('S')
titanic['Embarked'] = titanic['Embarked'].map({'S': 0, 'C': 1, 'Q': 2})
# Random forest with shuffled 3-fold cross-validation
predictors = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
alg = RandomForestClassifier(random_state=1, n_estimators=50, min_samples_split=4, min_samples_leaf=2)
kf = model_selection.KFold(n_splits=3, shuffle=True, random_state=1)
scores = model_selection.cross_val_score(alg, titanic[predictors], titanic['Survived'], cv=kf)
print(scores.mean())

The result is:
0.8260381593714926

Summary

The prediction accuracy can be improved further by:

  • Adding features
  • Combining several algorithms (regression + random forest)
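As a rough sketch of both ideas, the code below adds one extra feature and averages the predictions of two models. Several things here are assumptions and not part of the original scripts: the data is synthetic (generated with Titanic-like column names so the example runs without train.csv), the FamilySize feature is just one possible addition, and LogisticRegression is swapped in for linear regression so that both models expose predict_proba.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data with Titanic-like column names; not the Kaggle file.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    'Pclass': rng.integers(1, 4, n),
    'Sex': rng.integers(0, 2, n),
    'Age': rng.uniform(1, 70, n),
    'SibSp': rng.integers(0, 4, n),
    'Parch': rng.integers(0, 3, n),
})
# Synthetic target loosely tied to Sex so the models have something to learn.
df['Survived'] = ((df['Sex'] == 1) | (rng.random(n) < 0.2)).astype(int)

# Added feature (an assumption, not from the article): family size.
df['FamilySize'] = df['SibSp'] + df['Parch']

features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'FamilySize']
train, test = df.iloc[:150], df.iloc[150:]

models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(random_state=1, n_estimators=50,
                           min_samples_split=4, min_samples_leaf=2),
]

# Fit each model and collect its predicted survival probability on the test rows.
per_model = [
    m.fit(train[features], train['Survived']).predict_proba(test[features])[:, 1]
    for m in models
]

# Average the two probability vectors, then threshold at 0.5.
probs = np.mean(per_model, axis=0)
ensemble_pred = (probs > 0.5).astype(int)
accuracy = (ensemble_pred == test['Survived'].to_numpy()).mean()
print(accuracy)
```

A plain average treats both models equally; weighting the stronger model more heavily is another option worth trying on the real data.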