Slow PyTorch CPU performance

I’m writing a paper about modern deep learning architectures. As part of my experiments, I wanted to show specifically how much faster GPUs are than CPUs. My configuration was quite good (I know you all have a TITAN V at your disposal, but still):

  • GeForce GTX 1070 Ti (8 GB),
  • Intel® Xeon® Processor E5-2630 v4

I was training my models on this nice GPU, but performance on the CPU turned out to be painfully slow. I was aware of CPU parallelism limitations, but when I returned after more than three hours, training was still in the first epoch.

When I checked processor utilization, I couldn’t believe it. PyTorch was happily using 48 GB of RAM but only about 10% of the CPU. As it turned out, my problem wasn’t an isolated case.
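A quick way to see whether PyTorch is actually limited to a handful of threads is to compare its intra-op thread count with the number of logical cores on the machine. This is just a diagnostic sketch; the exact numbers will depend on your build and environment:

import multiprocessing
import torch

# Logical cores visible to the OS
print("logical cores:", multiprocessing.cpu_count())

# Threads PyTorch will use for intra-op parallelism (BLAS, convolutions, etc.)
print("torch threads:", torch.get_num_threads())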

Here is a step-by-step solution that allowed PyTorch to utilize the CPU at 100%:

$ conda install -c pytorch pytorch-nightly-cpu
$ conda install -c fastai torchvision-nightly-cpu
$ conda install -c intel openmp
$ conda env update -f environment-cpu.yml
$ conda activate fastai-cpu
$ export NUM_CORES=40
$ export MKL_NUM_THREADS=$NUM_CORES OMP_NUM_THREADS=$NUM_CORES
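If you prefer not to rely on environment variables, the thread count can also be set from inside the script. A minimal sketch, assuming the same 40 usable cores as in the export above:

import torch

# In-script alternative to OMP_NUM_THREADS / MKL_NUM_THREADS
# (40 cores assumed here, matching NUM_CORES above)
torch.set_num_threads(40)
print("torch threads:", torch.get_num_threads())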

Then I can finally run my script on the CPU:

$ CUDA_VISIBLE_DEVICES="" python med_network.py
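Hiding the GPU via CUDA_VISIBLE_DEVICES works at the environment level; you can also force the device choice inside the script itself. A minimal sketch (the model and tensor here are hypothetical placeholders, not from my actual med_network.py):

import torch
import torch.nn as nn

# Explicitly run on the CPU, regardless of which CUDA devices are visible
device = torch.device("cpu")

model = nn.Linear(128, 10).to(device)    # hypothetical model
x = torch.randn(32, 128, device=device)  # hypothetical batch
out = model(x)
print(out.shape, out.device)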

[Screenshot, 2018-10-07: processor utilization after the change]

Now the memory load has dropped to about 10 GB, training is more than ten times faster, and I can use all the available computing power.
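If you want to sanity-check how much the thread count matters on your own machine, a rough micro-benchmark on a large matrix multiply shows the effect. This is only a sketch, not a proper benchmark of my training script; the numbers will vary:

import time
import torch

default_threads = torch.get_num_threads()
x = torch.randn(4000, 4000)

for threads in (1, default_threads):
    torch.set_num_threads(threads)
    start = time.time()
    for _ in range(10):
        _ = x @ x  # large matmul, parallelized across intra-op threads
    print(f"{threads} thread(s): {time.time() - start:.2f} s")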

Remember, once you’re done with CPU training, switch back to your Anaconda GPU environment.

Note: I’m using the fast.ai library with PyTorch as a backend.