Managing four RTX 4090 GPUs with Kubernetes on Ubuntu 22.04

How does Kubernetes schedule GPU resources?
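
In short, Kubernetes does not discover GPUs on its own. A device plugin (for NVIDIA cards, the NVIDIA device plugin) advertises them to the kubelet as the extended resource nvidia.com/gpu, and the scheduler places a Pod on a node that still has enough unallocated GPUs when the Pod asks for that resource in its limits. The sketch below prints such a Pod manifest. The helper name gen_gpu_pod and the Pod name are made up for illustration; the image is the one used for the Docker test later in this post.

```shell
#!/bin/sh
# Print a Pod manifest that requests whole GPUs through the
# nvidia.com/gpu extended resource.
# gen_gpu_pod is a hypothetical helper name used only in this sketch.
gen_gpu_pod() {
  name="$1"   # Pod name
  gpus="$2"   # number of whole GPUs to request
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:11.0.3-base-ubuntu20.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: ${gpus}
EOF
}

# Usage (on a cluster where the NVIDIA device plugin is already running):
#   gen_gpu_pod gpu-test 1 | kubectl apply -f -
```

GPUs are requested in whole units under limits; a Pod that asks for more GPUs than any node can offer stays Pending.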

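1. Install the GPU driver

Driver package: NVIDIA-Linux-x86_64-535.113.01.run (CUDA 12.2)

The mirror below is hosted in China and downloads quickly; replace the driver version in the URL as needed.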

# wget https://cn.download.nvidia.com/XFree86/Linux-x86_64/535.113.01/NVIDIA-Linux-x86_64-535.113.01.run
# sh NVIDIA-Linux-x86_64-535.113.01.run
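
Because the version string appears in both the directory and the file name, it is easy to download one version and try to run another. A minimal sketch that derives both from a single variable; driver_url is a hypothetical helper name, and the URL pattern is simply the one from the mirror above.

```shell
#!/bin/sh
# driver_url is a hypothetical helper for this sketch: it prints the
# download URL for the given driver version on the mirror used above.
driver_url() {
  version="$1"
  printf 'https://cn.download.nvidia.com/XFree86/Linux-x86_64/%s/NVIDIA-Linux-x86_64-%s.run\n' \
    "$version" "$version"
}

# Usage:
#   DRIVER_VERSION=535.113.01
#   wget "$(driver_url "$DRIVER_VERSION")"
#   sudo sh "NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"
```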

Reboot once the installation finishes.

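Enable persistence mode. The command needs root privileges; run as a regular user it fails:

(base) ubuntu@ubuntu:~$ nvidia-smi -pm 1
Unable to set persistence mode for GPU 00000000:17:00.0: Insufficient Permissions
Terminating early due to previous errors.

Run it with sudo instead:

(base) ubuntu@ubuntu:~$ sudo nvidia-smi -pm 1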

List the GPUs with nvidia-smi

(base) ubuntu@ubuntu:~$  nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-88717b49-0372-9d05-e6ca-238870f93bf3)
GPU 1: NVIDIA GeForce RTX 4090 (UUID: GPU-74b01939-bc8b-833b-10ac-daa5c60fc594)
GPU 2: NVIDIA GeForce RTX 4090 (UUID: GPU-0715eb37-44d8-d7ca-cd20-79452c93fe86)
GPU 3: NVIDIA GeForce RTX 4090 (UUID: GPU-b9f5ac04-9684-71fe-88b6-6363e7c2936d)
(base) ubuntu@ubuntu:~$ 
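
Before moving on, it can help to confirm that the driver sees all four cards. A small sketch that counts the "GPU N:" lines in the listing above; count_gpus is a hypothetical helper name.

```shell
#!/bin/sh
# count_gpus is a hypothetical helper for this sketch: it reads the
# output of `nvidia-smi -L` on stdin and prints how many GPUs are listed.
count_gpus() {
  # Each GPU is reported on a line of the form "GPU <index>: <name> (UUID: ...)".
  grep -c '^GPU [0-9][0-9]*:'
}

# Usage:
#   if [ "$(nvidia-smi -L | count_gpus)" -ne 4 ]; then
#     echo "expected 4 GPUs" >&2
#   fi
```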

2. Install Docker

Configure the apt repository

# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

Install Docker

sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Restart Docker

sudo systemctl restart docker

3. Install the NVIDIA Container Toolkit (nvidia-container-toolkit)

Install with apt

Configure the repository:

# curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list \
  && \
    sudo apt-get update

Install the NVIDIA Container Toolkit:

# sudo apt-get install -y nvidia-container-toolkit

Test the installation

# docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
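
The --gpus flag is a Docker CLI option, and Kubernetes does not pass it when it starts containers. When Docker is the runtime under Kubernetes, the NVIDIA runtime therefore usually has to be Docker's default runtime so that the device plugin can expose GPUs to Pods. The sketch below writes the commonly used daemon.json for that. The helper name write_nvidia_daemon_json is hypothetical, and the function overwrites the target file, so merge by hand if the file already holds other settings.

```shell
#!/bin/sh
# write_nvidia_daemon_json is a hypothetical helper for this sketch.
# It writes a Docker daemon configuration that makes the NVIDIA runtime
# the default. WARNING: it overwrites the target file.
write_nvidia_daemon_json() {
  target="$1"   # e.g. /etc/docker/daemon.json
  cat > "$target" <<'EOF'
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
}

# Usage (as root), followed by a Docker restart:
#   write_nvidia_daemon_json /etc/docker/daemon.json
#   systemctl restart docker
#   docker info | grep -i 'default runtime'
```

After the restart, docker info should report nvidia as the default runtime.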

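4. Install Kubernetes with kubeadm

Docker is already installed on this server, so we use it as the container runtime instead of containerd. Note that Kubernetes 1.24 and later no longer ship dockershim, so on those versions the kubelet needs the cri-dockerd adapter to talk to Docker Engine.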

Base environment configuration

1. Set a hostname that clearly identifies the node

sudo hostnamectl set-hostname ubuntu

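2. Disable SELinux (Ubuntu uses AppArmor and does not enable SELinux by default, so this step applies only if SELinux has been installed)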

sudo setenforce 0

sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

3. Disable swap

sudo swapoff -a  # temporary; swap comes back after a reboot

sed -r
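
To keep swap off across reboots, the swap entries in /etc/fstab also have to be commented out. A minimal sketch of one way to do that with sed -r; the helper name comment_out_swap and the exact expression are assumptions for illustration, not necessarily the command the original post used.

```shell
#!/bin/sh
# comment_out_swap is a hypothetical helper for this sketch.
# It comments out every active fstab entry whose filesystem type field
# is "swap", keeping a backup copy next to the file as <file>.bak.
comment_out_swap() {
  fstab="$1"   # normally /etc/fstab
  # Match lines that do not already start with '#' and contain the word
  # "swap" surrounded by whitespace, then prefix them with '#'.
  sed -r -i.bak 's/^([^#].*[[:space:]]swap[[:space:]].*)$/#\1/' "$fstab"
}

# Usage (as root):
#   comment_out_swap /etc/fstab
#   swapon --show    # prints nothing when no swap is active
```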
