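How does Kubernetes schedule GPU resources?
1. Install the GPU driver
Driver package: NVIDIA-Linux-x86_64-535.113.01.run (CUDA 12.2)
Download mirror for mainland China (fast); replace the driver version in the URL as needed: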
# wget https://cn.download.nvidia.com/XFree86/Linux-x86_64/535.113.01/NVIDIA-Linux-x86_64-535.113.01.run
# sh NVIDIA-Linux-x86_64-535.113.01.run
Reboot once the installation finishes.
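The download and install commands both embed the driver version, so it is easy for the two to drift apart. A minimal sketch that keeps the version in one place is shown here; the function name nvidia_driver_url and the variable DRIVER_VERSION are hypothetical, and the URL layout is simply the one used by the mirror above.

```shell
# Hypothetical helper: build the mirror URL for a given driver version.
# The host and path layout are copied from the wget command above.
nvidia_driver_url() {
  version=$1
  echo "https://cn.download.nvidia.com/XFree86/Linux-x86_64/${version}/NVIDIA-Linux-x86_64-${version}.run"
}

# Example usage (not executed here):
#   DRIVER_VERSION=535.113.01
#   wget "$(nvidia_driver_url "$DRIVER_VERSION")"
#   sudo sh "NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"
```

Changing DRIVER_VERSION then updates both the downloaded file and the installer that gets run.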
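Enable persistence mode. This needs root privileges; run as an ordinary user, as in the session below, it fails with a permissions error, so use sudo nvidia-smi -pm 1 instead: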
(base) ubuntu@ubuntu:~$ nvidia-smi -pm 1
Unable to set persistence mode for GPU 00000000:17:00.0: Insufficient Permissions
Terminating early due to previous errors.
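The "Insufficient Permissions" error above comes from running the command without root. One way to avoid it in a setup script is a small wrapper that uses sudo only when needed. This is a sketch, not part of the original procedure: the as_root helper and the SUDO override variable are hypothetical names introduced here.

```shell
# Hypothetical helper: run a command directly when already root,
# otherwise run it through sudo.
# The SUDO variable is an assumption of this sketch; it lets the
# privilege wrapper be swapped out (for example in a dry run).
as_root() {
  if [ "$(id -u)" -eq 0 ]; then
    "$@"
  else
    ${SUDO:-sudo} "$@"
  fi
}

# Example usage (not executed here):
#   as_root nvidia-smi -pm 1
```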
List the GPUs with nvidia-smi:
(base) ubuntu@ubuntu:~$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-88717b49-0372-9d05-e6ca-238870f93bf3)
GPU 1: NVIDIA GeForce RTX 4090 (UUID: GPU-74b01939-bc8b-833b-10ac-daa5c60fc594)
GPU 2: NVIDIA GeForce RTX 4090 (UUID: GPU-0715eb37-44d8-d7ca-cd20-79452c93fe86)
GPU 3: NVIDIA GeForce RTX 4090 (UUID: GPU-b9f5ac04-9684-71fe-88b6-6363e7c2936d)
(base) ubuntu@ubuntu:~$
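As a quick sanity check before moving on, the listing can be reduced to a GPU count. The sketch below assumes the line format shown in the output above (each GPU on its own line starting with "GPU <index>:"); count_gpus is a hypothetical helper name.

```shell
# Hypothetical helper: count GPUs from `nvidia-smi -L` output read on stdin.
# Assumes one line per GPU, each beginning with "GPU <index>:".
count_gpus() {
  grep -c '^GPU [0-9][0-9]*:'
}

# Example usage (not executed here):
#   nvidia-smi -L | count_gpus
```

On the machine above this should report 4, one per RTX 4090.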
2. Install Docker
Configure the apt repository:
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
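The echo command above expands two command substitutions before writing docker.list. To make the result easier to check, the sketch below builds the same line from explicit arguments; docker_apt_line is a hypothetical helper, and amd64 and jammy in the usage note are example values, not values taken from this server.

```shell
# Hypothetical helper: print the apt source line that the echo | tee
# command above would write, given an architecture and a release codename.
docker_apt_line() {
  arch=$1
  codename=$2
  echo "deb [arch=${arch} signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu ${codename} stable"
}

# Example usage (not executed here), using the same lookups as above:
#   docker_apt_line "$(dpkg --print-architecture)" \
#                   "$(. /etc/os-release && echo "$VERSION_CODENAME")"
```

Comparing this output with the contents of /etc/apt/sources.list.d/docker.list is a simple way to confirm the repository was configured for the intended release.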
Install Docker:
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Restart Docker:
sudo systemctl restart docker
3. Install the NVIDIA Container Toolkit (nvidia-container-toolkit)
Installing with apt
Configure the repository:
# curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list \
&& \
sudo apt-get update
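The sed stage in the pipeline above inserts a signed-by option into every "deb https://" entry, so apt only trusts packages signed with the keyring just installed. The sketch below applies the same expression to one sample line; the sample line is an assumption about what the downloaded list file contains, and add_signed_by is a hypothetical helper name.

```shell
# Hypothetical helper: apply the same sed expression as the pipeline above
# to whatever arrives on stdin.
add_signed_by() {
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g'
}

# Assumed sample entry; the real file is fetched by curl in the step above.
printf '%s\n' 'deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /' | add_signed_by
```

Using # as the sed delimiter avoids having to escape the slashes in the URL and the keyring path.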
Install the NVIDIA Container Toolkit:
# sudo apt-get install -y nvidia-container-toolkit
Test the installation:
# docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
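4. Install Kubernetes with kubeadm
Docker is already installed on the server, so we use it as the container runtime instead of setting up containerd separately (on Kubernetes 1.24 and later this requires the cri-dockerd adapter, since dockershim was removed).
Base environment setup
1. Set a hostname that clearly identifies the machine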
hostnamectl set-hostname ubuntu
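By default kubeadm registers the node under its hostname, and Kubernetes node names must be valid DNS subdomain names: lowercase letters, digits, '-' and '.', starting and ending with a letter or digit, at most 253 characters. The check below is a minimal sketch of that rule; valid_node_name is a hypothetical helper and gpu-node-01 is only an example name.

```shell
# Hypothetical helper: succeed (exit 0) if the argument looks like a valid
# DNS subdomain name, which is what Kubernetes expects for node names.
valid_node_name() {
  name=$1
  [ "${#name}" -le 253 ] || return 1
  printf '%s\n' "$name" |
    grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}

# Example usage (not executed here):
#   valid_node_name gpu-node-01 && sudo hostnamectl set-hostname gpu-node-01
```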
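2. Disable SELinux (only relevant where SELinux is installed; stock Ubuntu uses AppArmor, so this step can be skipped there)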
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
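On a host without SELinux the config file does not exist and the sed command above simply fails. A guarded variant is sketched below; set_selinux_permissive is a hypothetical helper, and it takes the config path as an argument so it can be tried on a copy first.

```shell
# Hypothetical helper: switch SELINUX=enforcing to permissive in the given
# config file, or skip quietly when the file is absent.
# For the real /etc/selinux/config this must run with root privileges.
set_selinux_permissive() {
  conf=${1:-/etc/selinux/config}
  if [ ! -f "$conf" ]; then
    echo "skip: $conf not found (SELinux is probably not installed)"
    return 0
  fi
  sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' "$conf"
  # Show the resulting mode line for confirmation.
  grep '^SELINUX=' "$conf"
}

# Example usage (not executed here):
#   sudo setenforce 0            # takes effect immediately, until reboot
#   set_selinux_permissive       # makes the change persistent
```

The setenforce call changes the running mode, while the config edit makes the change survive a reboot.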
3. Disable swap
swapoff -a # disable temporarily (until the next reboot)
sed -r