[Linuxppc-users] nvidia-smi displaying V100-SXM2 GPU memory as 15360MiB as opposed to 16152MiB

Frank Novak fnovak at us.ibm.com
Fri May 11 07:13:03 AEST 2018


There's a bug open to address what I think this is about:
https://bugzilla.linux.ibm.com/show_bug.cgi?id=167260

Bug 167260 - RH1574524- Pegas1.1 - GPU total memory is reduced with recent 
Pegas1.1 and Nvidia device driver (CORAL)

It basically has to do with memory page allocations.
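
For anyone who wants to compare the reported totals across systems, here is a minimal sketch in Python. It assumes nvidia-smi is on the PATH and uses its --query-gpu=index,memory.total query with CSV output; the helper names (parse_memory_totals, shortfall_mib, query_memory_totals) are made up for illustration and are not part of any NVIDIA tool.

```python
import subprocess


def parse_memory_totals(csv_text):
    """Parse 'index, MiB' lines as printed by
    nvidia-smi --query-gpu=index,memory.total --format=csv,noheader,nounits
    into a {gpu_index: total_mib} dict."""
    totals = {}
    for line in csv_text.strip().splitlines():
        index, mib = (field.strip() for field in line.split(","))
        totals[int(index)] = int(mib)
    return totals


def shortfall_mib(reported_mib, expected_mib):
    """Return how many MiB fewer are reported than expected."""
    return expected_mib - reported_mib


def query_memory_totals():
    """Run nvidia-smi (assumed to be installed) and return per-GPU totals."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.total",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    )
    return parse_memory_totals(result.stdout)
```

With the figures from the report below, shortfall_mib(15360, 16152) comes to 792 MiB per card, and 15360 MiB is exactly 15 GiB.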

Cheers,
Frank 

-----------------------------------------------------------------------------------------------------------------------
Frank Novak  ( 诺帆 nuò、fān )
STSM, SCEM Open Hypervisor
IBM Linux Technology Center
US:  fnovak at us.ibm.com  ;  Notes:   Frank Novak/Watson/IBM @IBMUS
cell : 919-671-7966
-------------------------------------------------------------------------------------------------------------------------



From:   "Franck Barillaud" <fbarilla at us.ibm.com>
To:     linuxppc-users at lists.ozlabs.org
Date:   05/10/2018 04:54 PM
Subject:        [Linuxppc-users] nvidia-smi displaying V100-SXM2 GPU memory as 15360MiB as opposed to 16152MiB
Sent by:        "Linuxppc-users" <linuxppc-users-bounces+fnovak=us.ibm.com at lists.ozlabs.org>



From a customer:

nvidia-smi is showing the 4 V100-SXM2 cards, but the memory is displaying 
at 15360MiB as opposed to 16152MiB that I get from the same cards on 
Amazon. Is this a known thing, or maybe some oddity with newer drivers or 
something? 

On a Minsky system:

[root@mnode3-6 PyTorch]# nvidia-smi
Thu May 10 14:35:25 2018 
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-SXM2...  Off  | 00000002:01:00.0 Off |                    0 |
| N/A   30C    P0    35W / 300W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-SXM2...  Off  | 00000006:01:00.0 Off |                    0 |
| N/A   30C    P0    32W / 300W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+


What could cause the difference?

Regards,
Franck Barillaud
STSM, CTO ISV Power Cloud Infrastructure
Master Inventor
Ext Phone: (512) 286-5242    Tie Line: 363-5242
e-mail: fbarilla at us.ibm.com




