Fixing CUDNN_STATUS_INTERNAL_ERROR while testing cuDNN

If you're testing your CUDA Toolkit 8.0 and cuDNN installation (for TensorFlow, say; the output below is from cuDNN 7.0.4), and the mnistCUDNN sample gives you the following error:

mike@alien ~/c/mnistCUDNN> ./mnistCUDNN
cudnnGetVersion() : 7004 , CUDNN_VERSION from cudnn.h : 7004 (7.0.4)
Host compiler version : GCC 5.4.0
There are 2 CUDA capable devices on your machine :
device 0 : sms 15  Capabilities 6.1, SmClock 1683.0 Mhz, MemSize (Mb) 8114, MemClock 4004.0 Mhz, Ecc=0, boardGroupID=0
device 1 : sms 15  Capabilities 6.1, SmClock 1683.0 Mhz, MemSize (Mb) 8114, MemClock 4004.0 Mhz, Ecc=0, boardGroupID=1
Using device 0

Testing single precision
CUDNN failure
Error: CUDNN_STATUS_INTERNAL_ERROR
mnistCUDNN.cpp:394
Aborting...

... that's most likely due to a corrupted NVIDIA cache (the ~/.nv/ directory in your home), which you can fix by deleting it:

mike@alien ~/c/mnistCUDNN> sudo rm -rf ~/.nv/
mike@alien ~/c/mnistCUDNN> ./mnistCUDNN
cudnnGetVersion() : 7004 , CUDNN_VERSION from cudnn.h : 7004 (7.0.4)
Host compiler version : GCC 5.4.0
There are 2 CUDA capable devices on your machine :
device 0 : sms 15  Capabilities 6.1, SmClock 1683.0 Mhz, MemSize (Mb) 8114, MemClock 4004.0 Mhz, Ecc=0, boardGroupID=0
device 1 : sms 15  Capabilities 6.1, SmClock 1683.0 Mhz, MemSize (Mb) 8114, MemClock 4004.0 Mhz, Ecc=0, boardGroupID=1
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.020480 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.026624 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.036864 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.091136 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.137216 time requiring 203008 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.024576 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.038912 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.049152 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.101376 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.149504 time requiring 203008 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!
mike@alien ~/c/mnistCUDNN>
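
If you'd rather not delete the cache outright, a more cautious variant of the fix above is to move the directory aside, so it can be restored if the error turns out to have another cause. This is only a minimal sketch: the function name reset_nv_cache and the .bak.<timestamp> suffix are made up for illustration, and it assumes the cache lives in ~/.nv/ as in the transcript. It also prints the directory's owner first, since a cache directory owned by root (for example after running a CUDA program with sudo) may be why the sample can't use it.

```shell
#!/bin/sh
# Hypothetical helper (not part of CUDA or cuDNN): move the per-user
# NVIDIA cache directory aside instead of deleting it.
# Usage: reset_nv_cache [cache_dir]    (defaults to "$HOME/.nv")
reset_nv_cache() {
    cache_dir="${1:-$HOME/.nv}"
    cache_dir="${cache_dir%/}"          # drop a trailing slash, if any

    if [ ! -d "$cache_dir" ]; then
        echo "no cache directory at $cache_dir, nothing to do"
        return 0
    fi

    # Show owner and permissions; an owner other than you is a hint
    # that the cache was created by another user (e.g. root).
    ls -ld "$cache_dir"

    # Rename rather than remove, so the old cache can be put back.
    backup="${cache_dir}.bak.$(date +%s)"
    if mv "$cache_dir" "$backup"; then
        echo "moved $cache_dir to $backup"
    else
        echo "could not move $cache_dir (try again with sudo?)" >&2
        return 1
    fi
}
```

Call reset_nv_cache with no arguments, then re-run ./mnistCUDNN. If the test passes, the backup directory can be deleted.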