I’m writing a paper about modern deep learning architectures, and as part of my experiments I wanted to show exactly how much faster GPUs are than CPUs. My configuration was quite good (I know you all have a TITAN V at your disposal, but still):
- GeForce GTX 1070 Ti (8 GB)
- Intel® Xeon® Processor E5-2630 v4
Training my models on this nice GPU went fine, but on the CPU performance was painfully slow. I was aware of the limits of CPU parallelism, but when I came back after more than three hours, the run was still in its first epoch.
When I checked processor utilization, I couldn’t believe it: PyTorch was happily using 48 GB of RAM and only 10% of the CPU. As it turned out, my problem wasn’t an isolated case.
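A quick way to see how many cores the machine exposes and how many threads PyTorch thinks it may use (a minimal sketch; the numbers will of course depend on your machine):

import multiprocessing as mp
import torch

print(mp.cpu_count())           # logical cores visible to the OS
print(torch.get_num_threads())  # intra-op threads PyTorch will actually use

If the second number is far below the first, PyTorch is only using a fraction of the processor, which matches the 10% utilization I was seeing.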
Here is the step-by-step solution that allowed PyTorch to utilize the CPU at 100%:
$ conda install -c pytorch pytorch-nightly-cpu
$ conda install -c fastai torchvision-nightly-cpu
$ conda install -c intel openmp
$ conda env update -f environment-cpu.yml
$ conda activate fastai-cpu
$ export NUM_CORES=40
$ export MKL_NUM_THREADS=$NUM_CORES OMP_NUM_THREADS=$NUM_CORES

Then, I can finally run my script on the CPU:
$ CUDA_VISIBLE_DEVICES="" python med_network.py
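To confirm the script really ends up on the CPU with all threads available, here is a small sanity check in Python (a minimal sketch; it assumes the environment variables above have already been exported in the same shell):

import os
import torch

print(torch.cuda.is_available())          # should be False with CUDA_VISIBLE_DEVICES=""
print(os.environ.get("OMP_NUM_THREADS"))  # should match NUM_CORES
print(torch.get_num_threads())            # intra-op thread count PyTorch picked up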

Now the memory load has dropped to about 10 GB, training is more than ten times faster, and I can use all of the machine's computing power.
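If you want to convince yourself that the thread settings are what makes the difference, a small illustrative benchmark on a plain matrix multiplication shows the effect (the matrix size and loop count are just for illustration, not from my actual training script):

import time
import torch

x = torch.randn(4096, 4096)
y = torch.randn(4096, 4096)

max_threads = torch.get_num_threads()
for n_threads in (1, max_threads):
    torch.set_num_threads(n_threads)
    start = time.perf_counter()
    for _ in range(10):
        torch.mm(x, y)  # dense matmul, dominated by MKL/OpenMP threading
    print(n_threads, "threads:", round(time.perf_counter() - start, 2), "s")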
Remember, once you’re done with CPU training, switch back to your Anaconda GPU environment.
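For example (assuming your GPU environment is called fastai; adjust the name to whatever yours actually is):

$ conda activate fastai
$ python -c "import torch; print(torch.cuda.is_available())"  # should print True again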
Note: I’m using the fast.ai library with PyTorch as a backend.