---
title: nvidia
updated: 2022-04-03 11:44:16Z
created: 2021-05-04 14:58:11Z
---

# NVIDIA

## show installed video drivers

nvidia-smi

[Latest drivers](https://www.nvidia.com/Download/index.aspx?lang=en-us)

---

## list installed hardware

lspci | grep -i nvidia

sudo lshw -numeric -C display

## find NVIDIA modules

find /usr/lib/modules -name 'nvidia.ko*'

The wildcard also matches compressed modules (for example nvidia.ko.zst or nvidia.ko.xz).

## Settings

nvidia-settings

## run

```bash
nvidia-smi        # show driver version and GPU status
nvidia-smi -L     # list the GPUs in the system
nvidia-smi -l n   # refresh every n seconds
```

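For scripted monitoring, the output of nvidia-smi can be reduced to plain CSV and post-processed. The sketch below assumes the `--query-gpu` and `--format=csv,noheader,nounits` options of nvidia-smi; the function name `gpu_mem_percent` is made up for this note and is not part of any tool.

```bash
# Hypothetical helper: read "used, total" lines (MiB) on stdin,
# one line per GPU, and print the memory usage percentage.
gpu_mem_percent() {
    awk -F', *' '$2 > 0 {
        printf "GPU %d: %d%% (%d/%d MiB)\n", NR - 1, $1 * 100 / $2, $1, $2
    }'
}

# On a machine with a working driver (commented out so the block runs anywhere):
# nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | gpu_mem_percent

# Offline check with a made-up sample line:
printf '1024, 4096\n' | gpu_mem_percent   # prints "GPU 0: 25% (1024/4096 MiB)"
```

Adding `-l n` to the nvidia-smi call should repeat the query every n seconds, as in the commands above.
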
## monitoring nvidia

https://github.com/fbcotter/py3nvml

---

## error: successful NUMA node read from SysFS had negative value (-1)

Full message: "successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero".

Fix: on the host, set the -1 to 0 in the numa_node file of the device, for example:

/sys/bus/pci/devices/0000:2b:00.0/numa_node

Or for all PCI devices at once:

for a in /sys/bus/pci/devices/*; do echo 0 | sudo tee -a "$a/numa_node"; done

https://stackoverflow.com/questions/44232898/memoryerror-in-tensorflow-and-successful-numa-node-read-from-sysfs-had-negativ

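Writing 0 to every PCI device is broader than needed. A narrower sketch, assuming the usual sysfs layout where each device directory has a `vendor` and a `numa_node` file, only touches NVIDIA devices (PCI vendor ID 0x10de) that currently report -1. The function name `fix_nvidia_numa` is made up for this note.

```bash
# Hypothetical helper: set numa_node to 0 for NVIDIA PCI devices reporting -1.
# The optional first argument is the device directory (default: the real sysfs path).
fix_nvidia_numa() {
    root="${1:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        # Skip entries that lack the expected files.
        [ -f "$dev/vendor" ] && [ -f "$dev/numa_node" ] || continue
        # 0x10de is the PCI vendor ID of NVIDIA.
        [ "$(cat "$dev/vendor")" = "0x10de" ] || continue
        if [ "$(cat "$dev/numa_node")" = "-1" ]; then
            echo 0 > "$dev/numa_node"
            echo "fixed: $dev"
        fi
    done
}
```

On the real /sys tree this has to run in a root shell, and the value does not survive a reboot.
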
---

## set NUMA value at boot

```bash
sudo crontab -e
# Add the following line
@reboot (echo 0 | tee -a "/sys/bus/pci/devices/<PCI_ID>/numa_node")
```

[Source](https://askubuntu.com/questions/1379119/how-to-set-the-numa-node-for-an-nvidia-gpu-persistently)

---

## start docker with --gpus=all every time, otherwise these errors appear

### failed call to cuInit: UNKNOWN ERROR (-1)

### no NVIDIA GPU device is present: /dev/nvidia0 does not exist

docker run -it -p 8888:8888 --gpus=all tensorflow/tensorflow:latest-gpu-jupyter

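To avoid forgetting the flag, a small shell function can add it on every call. This is only a sketch: `docker_gpu_run` and the `DRY_RUN` variable are names made up for this note, not part of Docker.

```bash
# Hypothetical wrapper: always pass --gpus=all to `docker run`.
# With DRY_RUN=1 the command is printed instead of executed.
docker_gpu_run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "docker run --gpus=all $*"
    else
        docker run --gpus=all "$@"
    fi
}

# Example (commented out because it needs Docker and a GPU):
# docker_gpu_run -it -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter
```

Placed in ~/.bashrc, the function is available in every new shell.
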
---

## update nvidia drivers

sudo ubuntu-drivers autoinstall