Float16 Slower Than Float32 In Keras
Solution 1:
The cuDNN documentation (section 2.7, subsection "Type Conversion") states:
Note: Accumulators are 32-bit integers which wrap on overflow.
and that this applies to the standard INT8 configuration, in which the data input, the filter input, and the output are all INT8.
Under those assumptions, @jiandercy is right that there is a float16-to-float32 conversion before the computation and a conversion back to float16 before the result is returned, so float16 would be slower. A sketch of this setup follows.
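To make the setup concrete, here is a minimal sketch (using tf.keras; the OP's actual model is not shown in this section, so the tiny convnet below is an assumption). The point is that `set_floatx` only changes the storage dtype of the layers; cuDNN can still accumulate in float32, so each op may pay an up-cast and a down-cast:

```python
# Minimal sketch: switching Keras to float16 storage end to end.
# NOTE: the model itself is a hypothetical stand-in for the OP's code.
import tensorflow as tf
from tensorflow import keras

keras.backend.set_floatx('float16')  # layers now default to float16 storage

model = keras.Sequential([
    keras.layers.Conv2D(32, 3, activation='relu', input_shape=(32, 32, 3)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='softmax'),
])

# The layer dtype is float16, but cuDNN may still up-cast inputs and
# accumulate in float32 internally, then down-cast the result.
print(model.layers[0].dtype)  # 'float16'
```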
Solution 2:
I updated to CUDA 10.0, cuDNN 7.4.1, TensorFlow 1.13.1, Keras 2.2.4, and Python 3.7.3. Using the same code as in the OP, training time was marginally faster with float16 than with float32.
I fully expect that a more complex network architecture would show a bigger difference in performance, but I didn't test this.
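For reference, a rough timing comparison along those lines might look like the following. The small convnet and the random data are stand-in assumptions, since the OP's code is not reproduced here; run the script once per dtype in a fresh process, because `set_floatx` must be called before the model is built:

```python
# Rough benchmark sketch: train the same model once with float16 and once
# with float32, and compare wall-clock time. The model and data below are
# hypothetical stand-ins for the OP's benchmark.
import time
import numpy as np
from tensorflow import keras

DTYPE = 'float16'  # change to 'float32' for the comparison run
keras.backend.set_floatx(DTYPE)

x = np.random.rand(2048, 32, 32, 3).astype(DTYPE)
y = np.random.randint(0, 10, size=(2048,))

model = keras.Sequential([
    keras.layers.Conv2D(64, 3, activation='relu', input_shape=(32, 32, 3)),
    keras.layers.Conv2D(64, 3, activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

model.fit(x, y, batch_size=128, epochs=1, verbose=0)  # warm-up pass
start = time.time()
model.fit(x, y, batch_size=128, epochs=3, verbose=0)
print(f'{DTYPE}: {time.time() - start:.2f}s')
```

With a network this small, the measured gap between the two dtypes may be within run-to-run noise, which is consistent with the "marginally faster" result reported above.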