nvmlDeviceGetName throws UnicodeDecodeError invalid start byte #53

Open
jsoft88 opened this issue May 25, 2024 · 8 comments

@jsoft88

jsoft88 commented May 25, 2024

Running the following code on WSL2 throws the error mentioned in the title:

from pynvml import *

nvmlInit()  # NVML must be initialized before any device query
handle = nvmlDeviceGetHandleByIndex(0)
print(nvmlDeviceGetName(handle))

Stacktrace:

  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.9/site-packages/pynvml/nvml.py", line 1744, in wrapper
    return res.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte

The nvidia-smi command, by contrast, reports the device without issues:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03              Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:01:00.0  On |                  N/A |
|  0%   35C    P8             16W /  370W |     947MiB /  24576MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

If I decode the output of nvmlDeviceGetName with the utf-16 codec, I get this string:
'闸膠\uf88e肑要郸膐\uf889낑ꂀ釸膠\uf8a5ꂜ꾁駸膐\uf8a3ꂔꂀ雸膀\uf894낌ꂀ軸肐グ'

pynvml version 11.5.0

@UlionTse

Same error in WSL2. @rjzamora @XuehaiPan
[Screenshot: pynvml_error]

@wookayin

This repository is a wrong place. It's not where NVIDIA's pynvml lives.

@mattip

mattip commented May 29, 2024

This is weird. I have reproduced this with the latest pynvml and the latest NVIDIA drivers on WSL2. This is the c_name.value returned from the call:

-> return c_name.value
(Pdb) p [x for x in c_name.value]
[248, 149, 160, 129, 142, 248, 145, 128, 129, 137, 248, 144, 144, 129, 137, 248, 145, 176, 128, 160, 248, 145, 160, 129, 165, 248, 156, 160, 129, 175, 248, 153, 144, 129, 163, 248, 145, 176, 128, 160, 248, 150, 128, 129, 148, 248, 140, 144, 128, 160, 248, 141, 160, 128, 182, 248, 136, 128, 128, 176]
(Pdb) p c_name.value
b'\xf8\x95\xa0\x81\x8e\xf8\x91\x80\x81\x89\xf8\x90\x90\x81\x89\xf8\x91\xb0\x80\xa0\xf8\x91\xa0\x81\xa5\xf8\x9c\xa0\x81\xaf\xf8\x99\x90\x81\xa3\xf8\x91\xb0\x80\xa0\xf8\x96\x80\x81\x94\xf8\x8c\x90\x80\xa0\xf8\x8d\xa0\x80\xb6\xf8\x88\x80\x80\xb0'
(Pdb) len(c_name.value)
60

Note the repeating 5-byte pattern (every group starts with 248, i.e. 0xf8); the length of the string is 60. On the Windows host I get:

(Pdb) [x for x in c_name.value]
[78, 86, 73, 68, 73, 65, 32, 71, 101, 70, 111, 114, 99, 101, 32, 71, 84, 88, 32, 49, 54, 54, 48, 32, 83, 85, 80, 69, 82]
(Pdb) c_name.value
b'NVIDIA GeForce GTX 1660 SUPER'
(Pdb) len(c_name.value)
29

I don't see the connection between the two results. Maybe a bug in the NVIDIA drivers v555.85?
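
One possible reading of the WSL2 bytes, offered as a sketch rather than a confirmed diagnosis: each 5-byte group has the shape of the legacy (pre-RFC 3629) 5-byte UTF-8 form, and the 26-bit value it carries looks like two UTF-16 code units packed into one 32-bit integer. Undoing both steps on the byte list from the pdb session recovers most of the device name. The function name `decode_packed_name` is hypothetical, and the explanation of how the driver produced these bytes is an assumption.

```python
import struct

# c_name.value as reported on WSL2 (copied from the pdb output in this thread).
WSL_NAME = bytes([
    248, 149, 160, 129, 142, 248, 145, 128, 129, 137,
    248, 144, 144, 129, 137, 248, 145, 176, 128, 160,
    248, 145, 160, 129, 165, 248, 156, 160, 129, 175,
    248, 153, 144, 129, 163, 248, 145, 176, 128, 160,
    248, 150, 128, 129, 148, 248, 140, 144, 128, 160,
    248, 141, 160, 128, 182, 248, 136, 128, 128, 176,
])


def decode_packed_name(data: bytes) -> str:
    """Undo the suspected double conversion, one 5-byte group at a time."""
    pieces = []
    for i in range(0, len(data), 5):
        lead, *cont = data[i:i + 5]
        # Legacy 5-byte UTF-8: 111110xx followed by four 10xxxxxx bytes.
        well_formed = (
            lead & 0xFC == 0xF8
            and len(cont) == 4
            and all(b & 0xC0 == 0x80 for b in cont)
        )
        if not well_formed:
            raise ValueError(f"unexpected byte group at offset {i}")
        value = lead & 0x03
        for b in cont:
            value = (value << 6) | (b & 0x3F)
        # Assumption: the 32-bit value holds two little-endian UTF-16 units.
        pieces.append(struct.pack("<I", value).decode("utf-16-le"))
    return "".join(pieces)


print(repr(decode_packed_name(WSL_NAME)))  # 'NVIDIA GeForce GTX 1660 '
```

The result covers 24 of the 29 characters of 'NVIDIA GeForce GTX 1660 SUPER'. That would be consistent with the converted string being cut off by a fixed-size buffer, though this is also a guess.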

@mattip

mattip commented May 29, 2024

nvidia-smi on WSL somehow gets the name right:

$ nvidia-smi
Wed May 29 11:59:40 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03              Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1660 ...    On  |   00000000:08:00.0  On |                  N/A |
| 28%   39C    P8             16W /  125W |    1945MiB /   6144MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+

@mattip

mattip commented May 29, 2024

> This repository is a wrong place. It's not where NVIDIA's pynvml lives.

Right. I can confirm this also happens in gpustat with nvidia-ml-py-12.550.52. Is there a place to get NVIDIA's attention?

$ python -m gpustat --debug
Error on querying NVIDIA devices. Use --debug flag to see more details.
'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte

Traceback (most recent call last):
  File "/tmp/venv310/lib/python3.10/site-packages/gpustat/cli.py", line 58, in print_gpustat
    gpu_stats = GPUStatCollection.new_query(debug=debug, id=id)
  File "/tmp/venv310/lib/python3.10/site-packages/gpustat/core.py", line 603, in new_query
    gpu_info = get_gpu_info(handle)
  File "/tmp/venv310/lib/python3.10/site-packages/gpustat/core.py", line 456, in get_gpu_info
    name = _decode(N.nvmlDeviceGetName(handle))
  File "/tmp/venv310/lib/python3.10/site-packages/pynvml.py", line 2094, in wrapper
    return res.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte
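
Both tracebacks in this thread show the exception being raised inside the pynvml wrapper (`return res.decode()`), so a caller's own decoding step is never reached. A caller that wants to keep running on an affected driver could catch the error around the call itself. This is a minimal sketch, not a fix: `device_name_or_placeholder` and the placeholder text are hypothetical names, not part of pynvml or gpustat.

```python
from typing import Any, Callable, Union


def device_name_or_placeholder(
    get_name: Callable[[Any], Union[str, bytes]],
    handle: Any,
    placeholder: str = "<unknown GPU>",
) -> str:
    """Return the device name, or a placeholder if it cannot be decoded."""
    try:
        name = get_name(handle)
    except UnicodeDecodeError:
        # The wrapper failed to decode the bytes returned by the driver.
        return placeholder
    if isinstance(name, bytes):
        # Some versions return raw bytes; decode them without raising.
        return name.decode("utf-8", errors="replace")
    return name
```

With pynvml this would be called as `device_name_or_placeholder(pynvml.nvmlDeviceGetName, handle)` after `nvmlInit()`.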

@mattip

mattip commented May 29, 2024

I posted to an NVIDIA forum, https://forums.developer.nvidia.com/t/nvmldevicegetname-problem-in-wsl-on-windows/294491, but I am not optimistic. The other posts there do not see much traffic.

@rjzamora
Collaborator

Thanks all for engaging. I'll do my best to find someone who can help. Sorry for the delay.

@rjzamora
Collaborator

rjzamora commented May 30, 2024

Small Update: This issue has been escalated to the NVML team and the fix has been merged into the upcoming r560 driver branch. I do not believe there are plans to re-release the short-lived r555 branch.
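
For anyone who needs to decide at runtime whether a workaround is required, the driver branch can be read from the version string. This is a rough sketch under one assumption taken from this thread: only the r555 branch is affected. The function names are hypothetical, and it has not been checked which version string NVML reports under WSL2 (nvidia-smi above shows both 555.42.03 and 555.85).

```python
def driver_branch(version: str) -> int:
    """Return the major branch number, e.g. 555 for '555.85'."""
    return int(version.strip().split(".", 1)[0])


def on_r555(version: str) -> bool:
    # Assumption: per this thread, the bug is limited to r555 and fixed in r560.
    return driver_branch(version) == 555


print(on_r555("555.85"))
```

The version string itself could come from `pynvml.nvmlSystemGetDriverVersion()`, which may return bytes in older pynvml releases and would then need decoding first.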
