title: nvidea
updated: 2022-04-03 11:44:16Z
created: 2021-05-04 14:58:11Z

NVIDIA

show installed video drivers

nvidia-smi

Latest drivers


list installed hw

lspci | grep -i nvidia
sudo lshw -numeric -C display

find NVIDIA modules

find /usr/lib/modules -name 'nvidia.ko*'   # also matches compressed modules such as nvidia.ko.zst

Settings

nvidia-settings

run

nvidia-smi        # show driver version and GPU status
nvidia-smi -L     # list the GPUs in the system
nvidia-smi -l n   # refresh every n seconds
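
For scripting, the query mode of nvidia-smi can be easier to parse than the default table. A minimal sketch, assuming the CSV fields are separated by ", "; the helper name gpu_mem_percent is made up for illustration:

```shell
# Hypothetical helper: read CSV rows "name, used_MiB, total_MiB" on stdin
# and print one line per GPU with memory usage as a percentage.
gpu_mem_percent() {
    awk -F', ' '{ printf "%s: %d%%\n", $1, ($2 * 100) / $3 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # nounits drops the "MiB" suffix so awk can do the arithmetic
    nvidia-smi --query-gpu=name,memory.used,memory.total \
               --format=csv,noheader,nounits | gpu_mem_percent
else
    echo "nvidia-smi not found; is the driver installed?" >&2
fi
```

The parsing lives in its own function, so it can be tried on a hand-written CSV line without a GPU.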

monitoring nvidia

https://github.com/fbcotter/py3nvml


TensorFlow logs "successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero". To get rid of it, change the -1 to 0 on the host in the device's numa_node file:

/sys/bus/pci/devices/0000:2b:00.0/numa_node

# writes 0 for every PCI device, not only the GPU
for a in /sys/bus/pci/devices/*; do echo 0 | sudo tee -a "$a/numa_node"; done

https://stackoverflow.com/questions/44232898/memoryerror-in-tensorflow-and-successful-numa-node-read-from-sysfs-had-negativ
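
The loop above touches every PCI device. A narrower variant, as a sketch only: it changes just the NVIDIA devices (PCI vendor ID 0x10de) that currently report -1. The function name fix_nvidia_numa and its optional directory argument are made up for illustration; the argument defaults to the real sysfs path.

```shell
# Sketch: set numa_node to 0 only for NVIDIA devices that report -1.
# $1 is an optional directory (illustration only), default /sys/bus/pci/devices.
fix_nvidia_numa() {
    sysfs_pci="${1:-/sys/bus/pci/devices}"
    for dev in "$sysfs_pci"/*; do
        [ -r "$dev/vendor" ] || continue
        # 0x10de is NVIDIA's PCI vendor ID
        [ "$(cat "$dev/vendor")" = "0x10de" ] || continue
        if [ "$(cat "$dev/numa_node")" = "-1" ]; then
            echo 0 > "$dev/numa_node"
            echo "fixed ${dev##*/}"
        fi
    done
}

# Needs root on a real system, e.g. save as a script and run it with sudo:
# fix_nvidia_numa
```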


set the NUMA value at boot (the sysfs value is reset on every reboot)

sudo crontab -e
# Add the following line
@reboot (echo 0 | tee -a "/sys/bus/pci/devices/<PCI_ID>/numa_node")

Source


always start docker with --gpus=all, otherwise the container cannot see the GPU and fails with errors such as:

failed call to cuInit: UNKNOWN ERROR (-1)

no NVIDIA GPU device is present: /dev/nvidia0 does not exist

docker run -it -p 8888:8888 --gpus=all tensorflow/tensorflow:latest-gpu-jupyter
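
A quick way to confirm the GPU is visible before starting Jupyter, as a sketch: it assumes the tensorflow/tensorflow:latest-gpu image and python3 inside it. The function name check_tf_gpu and the DOCKER override variable are made up for illustration.

```shell
# Sketch: ask TensorFlow inside a throwaway container which GPUs it sees.
# DOCKER is a hypothetical override (e.g. DOCKER=podman); defaults to docker.
check_tf_gpu() {
    ${DOCKER:-docker} run --rm --gpus=all tensorflow/tensorflow:latest-gpu \
        python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
}

# check_tf_gpu    # an empty list [] means the container sees no GPU
```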


update NVIDIA drivers

sudo ubuntu-drivers autoinstall