Installing CUDA Toolkit 8.0 on Ubuntu 16.04

This post walks through installing the NVIDIA CUDA Toolkit 8.0 on Ubuntu 16.04, including disabling the Nouveau driver, installing the CUDA toolkit and samples from the runfile, and setting up the environment variables.



NVIDIA - Leading AI Computing
https://www.nvidia.cn/

CUDA Zone
https://developer.nvidia.com/cuda-zone

Download Now -> Legacy Releases
https://developer.nvidia.com/cuda-downloads

CUDA Toolkit Archive
https://developer.nvidia.com/cuda-toolkit-archive

CUDA Toolkit 8.0 - CUDA Toolkit 8.0 GA2
https://developer.nvidia.com/cuda-80-ga2-download-archive

CUDA Toolkit Documentation v8.0
https://docs.nvidia.com/cuda/archive/8.0/

1. CUDA Toolkit 8.0 - CUDA Toolkit 8.0 GA2

1.1 Select Target Platform


1.2 Download Installers for Linux Ubuntu 16.04 x86_64

The base installer is available for download below.
There is 1 patch available. This patch requires the base installer to be installed first.



cuda_8.0.61.2_linux.run
cuda_8.0.61_375.26_linux.run


The runfile installation method is recommended.

2. PRE-INSTALLATION ACTIONS

Some actions must be taken before the CUDA Toolkit and Driver can be installed on Linux.

You can override the install-time prerequisite checks by running the installer with the -override flag. Remember that the prerequisites will still be required to use the NVIDIA CUDA Toolkit.
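For example, with the CUDA 8.0 runfile used later in this post (a hypothetical invocation shown only for illustration; only use -override if you know which prerequisite check you are bypassing):

$ sudo sh cuda_8.0.61_375.26_linux.run -override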

2.1. Verify You Have a CUDA-Capable GPU

To verify that your GPU is CUDA-capable, go to your distribution’s equivalent of System Properties, or, from the command line, enter:

$ lspci | grep -i nvidia

If you do not see any settings, update the PCI hardware database that Linux maintains by entering update-pciids (generally found in /sbin) at the command line and rerun the previous lspci command.
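A minimal sketch of that recovery step (update-pciids usually needs root privileges and network access to refresh the database):

$ sudo update-pciids
$ lspci | grep -i nvidia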

If your graphics card is from NVIDIA and it is listed in http://developer.nvidia.com/cuda-gpus, your GPU is CUDA-capable.

The Release Notes for the CUDA Toolkit also contain a list of supported products.

yongqiang@famu-sys:~$ lspci | grep -i nvidia
02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1)
02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1)
03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1)
82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
83:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1)
83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
yongqiang@famu-sys:~$

2.2. Verify You Have a Supported Version of Linux

The CUDA Development Tools are only supported on some specific distributions of Linux. These are listed in the CUDA Toolkit release notes.

To determine which distribution and release number you’re running, type the following at the command line:

$ uname -m && cat /etc/*release

You should see output similar to the following, modified for your particular system:

x86_64
Red Hat Enterprise Linux Workstation release 6.0 (Santiago)

The x86_64 line indicates you are running on a 64-bit system. The remainder gives information about your distribution.

yongqiang@famu-sys:~$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
yongqiang@famu-sys:~$

2.3. Verify the System Has gcc Installed

The gcc compiler is required for development using the CUDA Toolkit. It is not required for running CUDA applications. It is generally installed as part of the Linux installation, and in most cases the version of gcc installed with a supported version of Linux will work correctly.

To verify the version of gcc installed on your system, type the following on the command line:

$ gcc --version

If an error message displays, you need to install the development tools from your Linux distribution or obtain a version of gcc and its accompanying toolchain from the Web.

yongqiang@famu-sys:~$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

yongqiang@famu-sys:~$

2.4. Verify the System has the Correct Kernel Headers and Development Packages Installed

The CUDA Driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt. For example, if your system is running kernel version 3.17.4-301, the 3.17.4-301 kernel headers and development packages must also be installed.

While the Runfile installation performs no package validation, the RPM and Deb installations of the driver will make an attempt to install the kernel header and development packages if no version of these packages is currently installed. However, it will install the latest version of these packages, which may or may not match the version of the kernel your system is using. Therefore, it is best to manually ensure the correct version of the kernel headers and development packages are installed prior to installing the CUDA Drivers, as well as whenever you change the kernel version.

The version of the kernel your system is running can be found by running the following command:

$ uname -r

This is the version of the kernel headers and development packages that must be installed prior to installing the CUDA Drivers. This command will be used multiple times below to specify the version of the packages to install. Note that below are the common-case scenarios for kernel usage. More advanced cases, such as custom kernel branches, should ensure that their kernel headers and sources match the kernel build they are running.

Ubuntu
The kernel headers and development packages for the currently running kernel can be installed with:

$ sudo apt-get install linux-headers-$(uname -r)

yongqiang@famu-sys:~$ uname -r
4.13.0-36-generic
yongqiang@famu-sys:~$ 
yongqiang@famu-sys:~$ dpkg --get-selections | grep linux-
linux-base					install
linux-firmware					install
linux-generic-hwe-16.04				install
linux-headers-4.13.0-36				hold
linux-headers-4.13.0-36-generic			hold
linux-headers-generic-hwe-16.04			install
linux-image-4.13.0-36-generic			hold
linux-image-extra-4.13.0-36-generic		hold
linux-image-generic-hwe-16.04			install
linux-libc-dev:amd64				install
linux-sound-base				install
syslinux-common					install
syslinux-legacy					install
yongqiang@famu-sys:~$

2.5 Download Verification

The download can be verified by comparing the MD5 checksum posted at http://developer.nvidia.com/cuda-downloads/checksums with that of the downloaded file. If either of the checksums differ, the downloaded file is corrupt and needs to be downloaded again.

To calculate the MD5 checksum of the downloaded file, run the following:

$ md5sum <file>

yongqiang@famu-sys:/media/famu/DISK_DEEP/software$ ls -l
总用量 1584920
-rw-rw-r-- 1 yongqiang yongqiang   97546170 6月  24 00:12 cuda_8.0.61.2_linux.run
-rw-rw-r-- 1 yongqiang yongqiang 1465528129 6月  24 00:17 cuda_8.0.61_375.26_linux.run
yongqiang@famu-sys:/media/famu/DISK_DEEP/software$ 
yongqiang@famu-sys:/media/famu/DISK_DEEP/software$ sudo chmod 777 cuda_*
[sudo] yongqiang 的密码: 
yongqiang@famu-sys:/media/famu/DISK_DEEP/software$ 
yongqiang@famu-sys:/media/famu/DISK_DEEP/software$ ll
总用量 1584928
drwxrwxr-x  2 yongqiang yongqiang       4096 6月  25 10:39 ./
drwxrwxrwx 10 nobody    nogroup         4096 6月  24 10:01 ../
-rwxrwxrwx  1 yongqiang yongqiang   97546170 6月  24 00:12 cuda_8.0.61.2_linux.run*
-rwxrwxrwx  1 yongqiang yongqiang 1465528129 6月  24 00:17 cuda_8.0.61_375.26_linux.run*
yongqiang@famu-sys:/media/famu/DISK_DEEP/software$ 
yongqiang@famu-sys:/media/famu/DISK_DEEP/software$ md5sum cuda_8.0.61.2_linux.run 
09adbda67db5267a7d4444fb5173f182  cuda_8.0.61.2_linux.run
yongqiang@famu-sys:/media/famu/DISK_DEEP/software$ 
yongqiang@famu-sys:/media/famu/DISK_DEEP/software$ md5sum cuda_8.0.61_375.26_linux.run 
33e1bd980e91af4e55f3ef835c103f9b  cuda_8.0.61_375.26_linux.run
yongqiang@famu-sys:/media/famu/DISK_DEEP/software$ 
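To compare against the value published on the checksums page automatically, the expected hash can be fed to md5sum -c. A sketch, reusing the hash computed above; substitute the value copied from NVIDIA's checksums page:

$ echo "33e1bd980e91af4e55f3ef835c103f9b  cuda_8.0.61_375.26_linux.run" | md5sum -c -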

2.7. Handle Conflicting Installation Methods

Before installing CUDA, any previous installations that could conflict should be uninstalled. This will not affect systems which have not had CUDA installed previously, or systems where the installation method has been preserved (RPM/Deb vs. Runfile). The relevant uninstall commands are listed below.

Commands for uninstalling the NVIDIA Driver and the CUDA Toolkit:
Use the following command to uninstall a Toolkit runfile installation:

$ sudo /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl

Use the following command to uninstall a Driver runfile installation:

$ sudo /usr/bin/nvidia-uninstall

Use the following commands to uninstall a RPM/Deb installation:

$ sudo apt-get --purge remove <package_name> # Ubuntu
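For the runfile installation performed in this post, the concrete commands would look like the sketch below (assuming the CUDA 8.0 uninstall script follows the X.Y naming pattern shown above):

$ sudo /usr/local/cuda-8.0/bin/uninstall_cuda_8.0.pl
$ sudo /usr/bin/nvidia-uninstall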

3. RUNFILE INSTALLATION

This section describes the installation and configuration of CUDA when using the standalone installer. The standalone installer is a “.run” file and is completely self-contained.

3.1. Overview

The Runfile installation installs the NVIDIA Driver, CUDA Toolkit, and CUDA Samples via an interactive text-based interface.

Distribution-specific instructions on disabling the Nouveau drivers as well as steps for verifying device node creation are also provided.

The Runfile installation does not include support for cross-platform development. For cross-platform development, see the CUDA Cross-Platform Environment section.

3.2 Disabling the Nouveau Drivers

To install the Display Driver, the Nouveau drivers must first be disabled. Each distribution of Linux has a different method for disabling Nouveau.

The Nouveau drivers are loaded if the following command prints anything:

$ lsmod | grep nouveau

If lsmod | grep nouveau prints nothing, Nouveau is not running and has already been disabled.
Nouveau is the third-party open-source NVIDIA driver that ships with Ubuntu; it must be disabled before installing the official NVIDIA driver. Do not install the OpenGL files (-no-opengl-files).

Ubuntu

  1. Create a file at /etc/modprobe.d/blacklist-nouveau.conf with the following contents:

blacklist nouveau
options nouveau modeset=0

If you create the file with vim, press ESC and then type :wq to save and exit. (A non-interactive way to create this file is shown right after the next step.)

  2. Regenerate the kernel initramfs:

$ sudo update-initramfs -u
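
As an alternative to editing the file interactively, the blacklist file can be written non-interactively. A convenience sketch that produces exactly the two lines shown above:

$ sudo tee /etc/modprobe.d/blacklist-nouveau.conf <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF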

yongqiang@famu-sys:~$ lsmod | grep nouveau
yongqiang@famu-sys:~$ 
yongqiang@famu-sys:~$ ls /etc/modprobe.d/
alsa-base.conf          blacklist-firewire.conf     blacklist-oss.conf           fbdev-blacklist.conf
blacklist-ath_pci.conf  blacklist-framebuffer.conf  blacklist-rare-network.conf  iwlwifi.conf
blacklist.conf          blacklist-modem.conf        blacklist-watchdog.conf      mlx4.conf
yongqiang@famu-sys:~$ 
yongqiang@famu-sys:~$ sudo vim /etc/modprobe.d/blacklist-nouveau.conf
yongqiang@famu-sys:~$ 
yongqiang@famu-sys:~$ cat /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
yongqiang@famu-sys:~$
yongqiang@famu-sys:~$ sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-4.13.0-36-generic
yongqiang@famu-sys:~$

3.3. Installation

Reboot into text mode (runlevel 3). This can usually be accomplished by adding the number "3" to the end of the system's kernel boot parameters.

Since the NVIDIA drivers are not yet installed, the text terminals may not display correctly. Temporarily adding “nomodeset” to the system’s kernel boot parameters may fix this issue.
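One common way to add these parameters for a single boot is to edit the boot entry from the GRUB menu (a sketch; menu entries and the kernel line differ between systems):

# At the GRUB menu, highlight the default Ubuntu entry and press 'e'.
# Append "3 nomodeset" to the end of the line that starts with "linux", e.g.
#   linux /boot/vmlinuz-... ro quiet splash 3 nomodeset
# Press Ctrl+X (or F10) to boot once with these parameters; the change is not saved.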

The reboot is required to completely unload the Nouveau drivers and prevent the graphical interface from loading. The CUDA driver cannot be installed while the Nouveau drivers are loaded or while the graphical interface is active.

Verify that the Nouveau drivers are not loaded. If the Nouveau drivers are still loaded, consult your distribution’s documentation to see if further steps are needed to disable Nouveau.

$ sudo sh cuda_<version>_linux.run

When installing CUDA here, do not install the graphics driver again.
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.26? - the NVIDIA Accelerated Graphics Driver is already installed on this machine, so do not install it again; answer (n)o.
Do you want to install a symbolic link at /usr/local/cuda? - this is the second CUDA version being installed on this system, so answer (n)o here.

Press CTRL + ALT + F7: if no graphical interface appears, the GUI has already been shut down.
Press CTRL + ALT + F1 to switch to the tty1 text console, then stop the (graphical) display manager lightdm.

login as: yongqiang
yongqiang@192.168.3.41's password:
Welcome to Ubuntu 16.04.5 LTS (GNU/Linux 4.13.0-36-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

371 个可升级软件包。
268 个安全更新。

Last login: Wed Apr 10 10:01:11 2019 from 192.168.6.29
yongqiang@famu-sys:~$
yongqiang@famu-sys:~$ lsmod | grep nouveau
yongqiang@famu-sys:~$
yongqiang@famu-sys:~$ sudo service lightdm stop
[sudo] yongqiang 的密码:
yongqiang@famu-sys:~$
yongqiang@famu-sys:~$ cd software/
yongqiang@famu-sys:~/software$ ls
cuda_8.0.61.2_linux.run  cuda_8.0.61_375.26_linux.run  libcudnn6-dev_6.0.21-1+cuda8.0_amd64.deb
yongqiang@famu-sys:~/software$
yongqiang@famu-sys:~/software$ sudo sh cuda_8.0.61_375.26_linux.run
Logging to /tmp/cuda_install_2438.log
Using more to view the EULA.
End User License Agreement

......

Default Install Location of CUDA Toolkit

Windows platform:
%ProgramFiles%\NVIDIA GPU Computing Toolkit\CUDA\v#.#

Linux platform:
/usr/local/cuda-#.#

Mac platform:
/Developer/NVIDIA/CUDA-#.#


Default Install Location of CUDA Samples

Windows platform:
%ProgramData%\NVIDIA Corporation\CUDA Samples\v#.#

Linux platform:
/usr/local/cuda-#.#/samples
and
$HOME/NVIDIA_CUDA-#.#_Samples

Mac platform:
/Developer/NVIDIA/CUDA-#.#/samples


Default Install Location of Nsight Visual Studio Edition

Windows platform:
%ProgramFiles(x86)%\NVIDIA Corporation\Nsight Visual Studio Edition #.#

......

Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.26?
(y)es/(n)o/(q)uit: n

Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-8.0 ]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: n

Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
 [ default is /home/yongqiang ]:

Installing the CUDA Toolkit in /usr/local/cuda-8.0 ...
Installing the CUDA Samples in /home/yongqiang ...
Copying samples to /home/yongqiang/NVIDIA_CUDA-8.0_Samples now...
Finished copying samples.

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-8.0
Samples:  Installed in /home/yongqiang

Please make sure that
 -   PATH includes /usr/local/cuda-8.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_2438.log
yongqiang@famu-sys:~/software$
yongqiang@famu-sys:~$ sudo service lightdm start
[sudo] yongqiang 的密码:
yongqiang@famu-sys:~$

Default Installation Directory
CUDA Toolkit - /usr/local/cuda-8.0
CUDA Samples - $(HOME)/NVIDIA_CUDA-8.0_Samples

4. Environment Setup

The PATH variable needs to include /usr/local/cuda-8.0/bin
To add this path to the PATH variable:

$ export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}

In addition, when using the runfile installation method, the LD_LIBRARY_PATH variable needs to contain /usr/local/cuda-8.0/lib64 on a 64-bit system, or /usr/local/cuda-8.0/lib on a 32-bit system.

To change the environment variables for 64-bit operating systems:

$ export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64\
 ${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

To change the environment variables for 32-bit operating systems:

$ export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib\
 ${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Note that the above paths change when using a custom install path with the runfile installation method.

4.1 /etc/profile

Add the following environment settings to /etc/profile.

# foreverstrong cuda-8.0
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# foreverstrong cuda-9.0
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

sudo vim /etc/profile
source /etc/profile

cat /etc/profile
echo $PATH

4.2 ~/.bashrc

Add the following environment settings to ~/.bashrc.

# foreverstrong cuda-8.0
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# foreverstrong cuda-9.0
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

sudo gedit ~/.bashrc
source ~/.bashrc

cat ~/.bashrc
echo $PATH
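
After sourcing the file, a quick sanity check that nvcc is found on PATH (with both the cuda-8.0 and cuda-9.0 entries exported above, the path prepended last is searched first):

$ which nvcc
$ nvcc --version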

5. samples

/usr/local/cuda-8.0/samples/

5.1 /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery

yongqiang@famu-sys:~$ cd /usr/local/cuda-8.0/samples/
yongqiang@famu-sys:/usr/local/cuda-8.0/samples$ ll
总用量 144
drwxr-xr-x 11 root root  4096 6月  25 11:30 ./
drwxr-xr-x 17 root root  4096 6月  25 11:30 ../
drwxr-xr-x 48 root root  4096 6月  25 11:30 0_Simple/
drwxr-xr-x  7 root root  4096 6月  25 11:30 1_Utilities/
drwxr-xr-x 12 root root  4096 6月  25 11:30 2_Graphics/
drwxr-xr-x 21 root root  4096 6月  25 11:30 3_Imaging/
drwxr-xr-x 10 root root  4096 6月  25 11:30 4_Finance/
drwxr-xr-x 10 root root  4096 6月  25 11:30 5_Simulations/
drwxr-xr-x 31 root root  4096 6月  25 11:30 6_Advanced/
drwxr-xr-x 37 root root  4096 6月  25 11:30 7_CUDALibraries/
drwxr-xr-x  6 root root  4096 6月  25 11:30 common/
-rw-r--r--  1 root root 96407 6月  25 11:30 EULA.txt
-rw-r--r--  1 root root  2652 6月  25 11:30 Makefile
yongqiang@famu-sys:/usr/local/cuda-8.0/samples$ 
yongqiang@famu-sys:/usr/local/cuda-8.0/samples$ cd 1_Utilities/
yongqiang@famu-sys:/usr/local/cuda-8.0/samples/1_Utilities$ ll
总用量 28
drwxr-xr-x  7 root root 4096 6月  25 11:30 ./
drwxr-xr-x 11 root root 4096 6月  25 11:30 ../
drwxr-xr-x  2 root root 4096 6月  25 11:30 bandwidthTest/
drwxr-xr-x  2 root root 4096 6月  25 11:30 deviceQuery/
drwxr-xr-x  2 root root 4096 6月  25 11:30 deviceQueryDrv/
drwxr-xr-x  2 root root 4096 6月  25 11:30 p2pBandwidthLatencyTest/
drwxr-xr-x  2 root root 4096 6月  25 11:30 topologyQuery/
yongqiang@famu-sys:/usr/local/cuda-8.0/samples/1_Utilities$ cd deviceQuery
yongqiang@famu-sys:/usr/local/cuda-8.0/samples/1_Utilities/deviceQuery$ make
"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o deviceQuery.o -c deviceQuery.cpp
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
Assembler messages:
致命错误: can't create deviceQuery.o: 权限不够
Makefile:250: recipe for target 'deviceQuery.o' failed
make: *** [deviceQuery.o] Error 1
yongqiang@famu-sys:/usr/local/cuda-8.0/samples/1_Utilities/deviceQuery$ 
yongqiang@famu-sys:/usr/local/cuda-8.0/samples/1_Utilities/deviceQuery$ sudo make
[sudo] yongqiang 的密码: 
"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o deviceQuery.o -c deviceQuery.cpp
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
"/usr/local/cuda-8.0"/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60 -o deviceQuery deviceQuery.o 
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release
yongqiang@famu-sys:/usr/local/cuda-8.0/samples/1_Utilities/deviceQuery$ 
yongqiang@famu-sys:/usr/local/cuda-8.0/samples/1_Utilities/deviceQuery$ sudo ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 4 CUDA Capable device(s)

Device 0: "GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          9.1 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11177 MBytes (11720130560 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1620 MHz (1.62 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          9.1 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11178 MBytes (11721506816 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1620 MHz (1.62 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 3 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 2: "GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          9.1 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11178 MBytes (11721506816 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1620 MHz (1.62 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 130 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 3: "GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          9.1 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11178 MBytes (11721506816 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1620 MHz (1.62 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 131 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from GeForce GTX 1080 Ti (GPU0) -> GeForce GTX 1080 Ti (GPU1) : Yes
> Peer access from GeForce GTX 1080 Ti (GPU0) -> GeForce GTX 1080 Ti (GPU2) : No
> Peer access from GeForce GTX 1080 Ti (GPU0) -> GeForce GTX 1080 Ti (GPU3) : No
> Peer access from GeForce GTX 1080 Ti (GPU1) -> GeForce GTX 1080 Ti (GPU0) : Yes
> Peer access from GeForce GTX 1080 Ti (GPU1) -> GeForce GTX 1080 Ti (GPU2) : No
> Peer access from GeForce GTX 1080 Ti (GPU1) -> GeForce GTX 1080 Ti (GPU3) : No
> Peer access from GeForce GTX 1080 Ti (GPU2) -> GeForce GTX 1080 Ti (GPU0) : No
> Peer access from GeForce GTX 1080 Ti (GPU2) -> GeForce GTX 1080 Ti (GPU1) : No
> Peer access from GeForce GTX 1080 Ti (GPU2) -> GeForce GTX 1080 Ti (GPU3) : Yes
> Peer access from GeForce GTX 1080 Ti (GPU3) -> GeForce GTX 1080 Ti (GPU0) : No
> Peer access from GeForce GTX 1080 Ti (GPU3) -> GeForce GTX 1080 Ti (GPU1) : No
> Peer access from GeForce GTX 1080 Ti (GPU3) -> GeForce GTX 1080 Ti (GPU2) : Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 8.0, NumDevs = 4, Device0 = GeForce GTX 1080 Ti, Device1 = GeForce GTX 1080 Ti, Device2 = GeForce GTX 1080 Ti, Device3 = GeForce GTX 1080 Ti
Result = PASS
yongqiang@famu-sys:/usr/local/cuda-8.0/samples/1_Utilities/deviceQuery$ 
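Building inside /usr/local/cuda-8.0/samples requires root, which is why the first make above failed with a permission error. The installer also placed a writable copy of the samples under /home/yongqiang/NVIDIA_CUDA-8.0_Samples (see the installation summary), so the same sample can be built there without sudo, for example:

$ cd ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery
$ make
$ ./deviceQuery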

6. Checking Version Information

Check the GPU driver version:

yongqiang@famu-sys:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  390.48  Thu Mar 22 00:42:57 PDT 2018
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9) 
yongqiang@famu-sys:~$
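
If the NVIDIA driver is loaded, nvidia-smi (installed together with the driver) also reports the driver version along with per-GPU memory usage and utilization:

$ nvidia-smi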

The default CUDA installation path is /usr/local/cuda-8.0 or /usr/local/cuda-9.0. /usr/local/cuda is a symbolic link pointing to the /usr/local/cuda-8.0 or /usr/local/cuda-9.0 directory; use ls -l to see which directory it points to.

yongqiang@famu-sys:~$ ll /usr/local/
总用量 48
drwxr-xr-x 12 root root 4096 6月  25 11:30 ./
drwxr-xr-x 11 root root 4096 3月   1  2018 ../
drwxr-xr-x  2 root root 4096 6月  25 20:24 bin/
lrwxrwxrwx  1 root root   19 10月 17  2018 cuda -> /usr/local/cuda-9.0/
drwxr-xr-x 17 root root 4096 6月  25 11:30 cuda-8.0/
drwxr-xr-x 18 root root 4096 10月 17  2018 cuda-9.0/
drwxr-xr-x  2 root root 4096 3月   1  2018 etc/
drwxr-xr-x  2 root root 4096 3月   1  2018 games/
drwxr-xr-x  2 root root 4096 3月   1  2018 include/
drwxr-xr-x  4 root root 4096 3月   1  2018 lib/
lrwxrwxrwx  1 root root    9 10月 17  2018 man -> share/man/
drwxr-xr-x  2 root root 4096 3月   1  2018 sbin/
drwxr-xr-x  8 root root 4096 3月   1  2018 share/
drwxr-xr-x  2 root root 4096 3月   1  2018 src/
yongqiang@famu-sys:~$

7. Runfile Installer

Perform the following steps to install CUDA and verify the installation.

  1. Disable the Nouveau drivers. Create a file at /etc/modprobe.d/blacklist-nouveau.conf with the following contents:

blacklist nouveau
options nouveau modeset=0

    Then regenerate the kernel initramfs:

$ sudo update-initramfs -u

  2. Reboot into runlevel 3 by temporarily adding the number "3" and the word "nomodeset" to the end of the system's kernel boot parameters.

  3. Run the installer silently to install with the default selections (implies acceptance of the EULA):

$ sudo sh cuda_<version>_linux.run --silent

  4. Create an xorg.conf file to use the NVIDIA GPU for display:

$ sudo nvidia-xconfig

    If the GPU used for display is an NVIDIA GPU, the X server configuration file, /etc/X11/xorg.conf, may need to be modified. In some cases, nvidia-xconfig can be used to automatically generate an xorg.conf file that works for the system. For non-standard systems, such as those with more than one GPU, it is recommended to manually edit the xorg.conf file. Consult the xorg.conf documentation for more information.

  5. Reboot the system to load the graphical interface.

  6. Set up the development environment by modifying the PATH and LD_LIBRARY_PATH variables:

$ export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
$ export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64\
                         ${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

  7. Install a writable copy of the samples, then build and run the nbody sample:

$ cuda-install-samples-8.0.sh ~
$ cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
$ make
$ ./nbody