Datawhale AI Spring Training Camp
AI4S Protein Track Study Notes
Workflow
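Register for the competition at http://competition.sais.com.cn/competitionDetail/532313/format?spm=CHANNEL-0001
After opening the page, create an account, enter your mobile number, and complete real-name verification through Alipay. You can then sign up for the track.
You must sign up for the track before you can download the official materials, such as the dataset and the baseline code.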
- Register for the competition
- Download and install Docker, then start it
- Train the model on free cloud compute by running the baseline training code
git lfs install
git clone https://www.modelscope.cn/datasets/Datawhale/sais_third_synthetic_baseline.git
- Enable the Alibaba Cloud container image service and create an image repository named sais_synthetic
- Download the five files needed for the image, including the trained model
model.pkl, ml_baseline.py, Dockerfile, requirements.txt, run.sh
- Build the Docker image locally and push it
docker login --username=xx xxxx
About 3 minutes
docker build -t sais_synthetic:v1 .
About 5 minutes
docker tag sais_synthetic:v1 xxxxxx/sais_synthetic:v1
docker tag sais_synthetic:v1 crpi-yimn1cg16ys23bar.cn-hangzhou.personal.cr.aliyuncs.com/sais_synthetic_wumao/sais_synthetic:v1
docker push xxxxxx/sais_synthetic:v1
docker push crpi-yimn1cg16ys23bar.cn-hangzhou.personal.cr.aliyuncs.com/sais_synthetic_wumao/sais_synthetic:v1
- Submit the image and get a score
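The tag and push commands must name exactly the same remote reference, otherwise the push cannot find the image. As a minimal sketch, the two commands can be generated from one set of values so they cannot drift apart. The function name build_push_commands and the values registry.example.com and my_namespace are hypothetical placeholders, not part of the official baseline.

```python
from typing import List


def build_push_commands(local_image: str, registry: str, namespace: str,
                        repository: str, tag: str) -> List[str]:
    """Return the docker commands that tag a local image and push it."""
    # Full remote reference: <registry>/<namespace>/<repository>:<tag>
    remote = f"{registry}/{namespace}/{repository}:{tag}"
    return [
        f"docker tag {local_image} {remote}",
        f"docker push {remote}",
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own registry address and namespace.
    for cmd in build_push_commands(
        local_image="sais_synthetic:v1",
        registry="registry.example.com",
        namespace="my_namespace",
        repository="sais_synthetic",
        tag="v1",
    ):
        print(cmd)
```

Printing the commands instead of executing them keeps the sketch free of side effects; they can be copied into a terminal once they look right.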
Generating model.pkl
model.pkl is produced by a Jupyter notebook. Open the cloned Datawhale baseline repository.
It contains ml_baseline.ipynb, and running this notebook generates model.pkl.
!pip install gensim
import pickle
import gensim
import gensim.models
import os
import sys
import random
import numpy as np
import pandas as pd
from joblib import load, dump
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import classification_report
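# Load the public dataset: a list of records, each with a "sequence" and a position-aligned "label"
with open("WSAA_data_public.pkl", "rb") as f:
    datas = pickle.load(f)

# Train Word2Vec; the seed and the vector size are drawn at random on every run.
# Each sentence is a space-joined string, so gensim treats every character
# (including the space) as a token.
random_seed = random.randint(0, 10000)
model_w2v = gensim.models.Word2Vec(
    sentences=[' '.join(x["sequence"]) for x in datas],
    vector_size=random.choice([10, 20, 40, 50, 100]),
    min_count=1,
    seed=random_seed
)

# Build one feature vector per residue: the mean embedding of a local window.
# The slice covers positions idx-2 .. idx+1, clipped at the sequence ends.
data_x = []
data_y = []
for data in datas:
    sequence = list(data["sequence"])
    for idx, (_, y) in enumerate(zip(sequence, data['label'])):
        data_x.append(
            model_w2v.wv[sequence[max(0, idx-2): min(len(sequence), idx+2)]].mean(0)
        )
        data_y.append(y)

# Estimate performance with cross-validated predictions
model = GaussianNB()
pred = cross_val_predict(
    model, data_x, data_y
)
print(classification_report(data_y, pred))

# Refit on all data and save the classifier together with the Word2Vec model
model = GaussianNB()
model.fit(data_x, data_y)
dump((model, model_w2v), "model.pkl")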
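To make the window slice in the feature loop easier to follow, here is a minimal pure-Python sketch of the same averaging step. It assumes a toy one-dimensional embedding table in place of the trained Word2Vec vectors. The names window_mean_features and toy_embedding are hypothetical and do not appear in the baseline.

```python
from typing import Dict, List


def window_mean_features(sequence: str,
                         embedding: Dict[str, List[float]]) -> List[List[float]]:
    """Average the token embeddings in the window idx-2 .. idx+1 for each position."""
    features = []
    for idx in range(len(sequence)):
        # Same slice as the baseline: the end index idx+2 is exclusive.
        window = sequence[max(0, idx - 2): min(len(sequence), idx + 2)]
        vectors = [embedding[token] for token in window]
        dim = len(vectors[0])
        features.append(
            [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
        )
    return features


# Toy embedding: each letter maps to a single number for easy checking.
toy_embedding = {"A": [1.0], "B": [2.0], "C": [3.0], "D": [4.0]}
features = window_mean_features("ABCD", toy_embedding)
print(features)  # [[1.5], [2.0], [2.5], [3.0]]
```

The window is asymmetric: it looks two positions back but only one position forward, because the slice end is exclusive.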
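Then push the generated model.pkl, together with the Dockerfile, the scripts, and the other required files, to the Alibaba Cloud image registry as required. After that you can submit.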
After installing Docker, build the image:
docker build -t sais_synthetic:v2 .
docker images
Push:
docker tag sais_synthetic:v2 xxxxxx/sais_synthetic:v1
docker push xxxxxx/sais_synthetic:v1
# For example
docker tag sais_synthetic:v1 crpi-yimn1cg16ys23bar.cn-hangzhou.personal.cr.aliyuncs.com/sais_synthetic_wumao/sais_synthetic:v1
docker push crpi-yimn1cg16ys23bar.cn-hangzhou.personal.cr.aliyuncs.com/sais_synthetic_wumao/sais_synthetic:v1
After the upload finishes, submit the image on the official site. Remember to copy the public-network address.
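Before pasting the address into the submission form, it may help to check that it has the full shape seen in the example above: registry, namespace, repository, and tag. The following is a minimal sketch under that assumption. The names ImageRef and parse_image_ref are hypothetical helpers, and the three-part path layout is inferred from the example address rather than from any official specification.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    namespace: str
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split '<registry>/<namespace>/<repository>:<tag>' into its parts."""
    path, sep, tag = ref.rpartition(":")
    # A slash after the last colon means the colon belonged to a port, not a tag.
    if not sep or "/" in tag or not tag:
        raise ValueError(f"missing tag in image reference: {ref!r}")
    parts = path.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected registry/namespace/repository, got: {path!r}")
    return ImageRef(parts[0], parts[1], parts[2], tag)


example = ("crpi-yimn1cg16ys23bar.cn-hangzhou.personal.cr.aliyuncs.com"
           "/sais_synthetic_wumao/sais_synthetic:v1")
print(parse_image_ref(example))
```

A reference without a tag, or one missing the namespace, raises ValueError instead of being silently accepted.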