---
title: nvidea
updated: 2022-04-03 11:44:16Z
created: 2021-05-04 14:58:11Z
---
# NVIDIA
## show installed video drivers
nvidia-smi
[Latest drivers](https://www.nvidia.com/Download/index.aspx?lang=en-us)
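
To see just the GPU name and driver version without the full table, `nvidia-smi` supports query flags; a quick sketch:

```bash
# print GPU name and driver version for each GPU as CSV
nvidia-smi --query-gpu=name,driver_version --format=csv
```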
---
## list installed hw
lspci | grep -i nvidia
sudo lshw -numeric -C display
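
To confirm which kernel driver is actually bound to the card (nvidia vs. nouveau), `lspci` can show the driver in use; the bus ID `2b:00.0` below is only an example, take yours from the `lspci` output above:

```bash
# show the kernel driver currently in use for the GPU
lspci -k -s 2b:00.0
```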
## find NVIDIA modules
find /usr/lib/modules -name nvidia.ko
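
Whether the module is actually loaded (not just present on disk) can be checked with `lsmod`:

```bash
# list loaded NVIDIA kernel modules (nvidia, nvidia_drm, nvidia_uvm, ...)
lsmod | grep nvidia
```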
## Settings
nvidia-settings
## run
```bash
nvidia-smi      # show current GPU status
nvidia-smi -L   # list the installed GPUs
nvidia-smi -l n # run every n seconds
```
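
As an alternative to `nvidia-smi -l`, `watch` refreshes the output in place instead of appending it (plain coreutils, nothing NVIDIA-specific):

```bash
# refresh the nvidia-smi output every 2 seconds
watch -n 2 nvidia-smi
```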
## monitoring nvidia
https://github.com/fbcotter/py3nvml
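
py3nvml is a Python package; a minimal setup sketch, assuming pip is available in the active environment:

```bash
# install the Python NVML bindings
pip install py3nvml
```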
---
## "successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero" error
Fix: on the host, change the -1 to 0 in
/sys/bus/pci/devices/0000:2b:00.0/numa_node
for a in /sys/bus/pci/devices/*; do echo 0 | sudo tee -a $a/numa_node; done
[Source](https://stackoverflow.com/questions/44232898/memoryerror-in-tensorflow-and-successful-numa-node-read-from-sysfs-had-negativ)
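
To verify the change took effect (the PCI address 0000:2b:00.0 is the example from above; adjust to your GPU):

```bash
# should print 0 after the fix, -1 before
cat /sys/bus/pci/devices/0000:2b:00.0/numa_node
```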
---
## set numa value at start computer
```bash
sudo crontab -e
# Add the following line
@reboot (echo 0 | tee -a "/sys/bus/pci/devices/<PCI_ID>/numa_node")
```
[Source](https://askubuntu.com/questions/1379119/how-to-set-the-numa-node-for-an-nvidia-gpu-persistently)
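
The `<PCI_ID>` in the crontab line is the GPU's full PCI address; it can be looked up from `lspci`, whose bus ID is normally prefixed with the `0000:` domain in sysfs:

```bash
# find the GPU's bus ID, e.g. "2b:00.0"
lspci | grep -i nvidia
# the sysfs path then uses the domain-qualified form, e.g.
ls /sys/bus/pci/devices/0000:2b:00.0/
```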
---
## start Docker with --gpus=all every time, otherwise errors such as:
### failed call to cuInit: UNKNOWN ERROR (-1)
### no NVIDIA GPU device is present: /dev/nvidia0 does not exist
docker run -it -p 8888:8888 --gpus=all tensorflow/tensorflow:latest-gpu-jupyter
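
A quick way to check that the container actually sees the GPU (assuming the NVIDIA Container Toolkit is installed on the host):

```bash
# should print the same GPU table as on the host
docker run --rm --gpus=all tensorflow/tensorflow:latest-gpu-jupyter nvidia-smi
```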
---
## update NVIDIA drivers
sudo ubuntu-drivers autoinstall
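
To see which driver packages are detected and which one is recommended before installing:

```bash
# list detected devices and the recommended driver package
ubuntu-drivers devices
```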