The initial version of the Imaging Whiteboard had as its mission to see whether real-time image processing was possible on a modern desktop PC. The result was a qualified yes.
Since then, its mission has expanded to providing a complete imaging solution: algorithmic processing, frequency domain processing, and neural networks. Version 3.6 includes generative AI.
Generative AI has received a lot of attention lately as it has been successful in obvious ways. How much of this success is due to advances in the science, and how much is due to the availability of large datacenters filled with Nvidia chips? Attempting to demonstrate generative AI on a desktop PC might shed some light on this question.
Neural networks date back to 1957, when the perceptron was invented by Frank Rosenblatt. The same math is still used today. In 1969 Minsky and Papert showed how limited a single-layer perceptron was, and funding and interest in neural networks declined. In 1974 backpropagation was first described, becoming popular by 1986; this enabled the training of multilayer neural networks. In the 1980s Convolutional Neural Networks (CNNs) became practical for handwriting recognition and later computer vision. In 2014 GANs were introduced. By 2022 generative AI was everywhere.
So, how much of this can be reproduced on a PC? A classifier can be trained in a day or two and work fairly well. A generator can be trained in a few hours, but the results are not very good. An autoencoder can be trained in a couple of days and work reasonably well. GANs cannot be trained in any reasonable amount of time.
It is clear that more performance is required. What do the big-name AI companies do? They use Nvidia chips, lots of them. There is an Nvidia GPU in my PC that does not seem to be doing anything, and Managed Cuda is available to let C# call C++ code that has been compiled to run on the GPU. So, I wrote a couple of small C++ routines to implement backpropagation for a convolutional layer and a fully connected layer, introduced a Use GPU checkbox in the UI, and it ran about 60x slower than the original CPU code!
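For readers who have not seen it written out, here is a minimal CPU-side sketch of the backward pass for a fully connected layer. It is not the routine used in the Imaging Whiteboard; the names `FcGradients` and `fcBackward`, the row-major weight layout, and the use of single precision are all assumptions made for illustration. On a GPU, one natural mapping is one thread per weight, so the number of threads is roughly the number of outputs times the number of inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gradients produced by one backward pass through a fully connected layer.
// (Hypothetical type, for illustration only.)
struct FcGradients {
    std::vector<float> dWeights; // same layout as the weights: [out][in], row-major
    std::vector<float> dBias;    // one value per output
    std::vector<float> dInput;   // one value per input, passed to the previous layer
};

// Backward pass for y = W x + b (any activation is assumed to be handled elsewhere).
// dOutput holds dLoss/dy for each output of the layer.
FcGradients fcBackward(const std::vector<float>& weights,
                       const std::vector<float>& input,
                       const std::vector<float>& dOutput) {
    const std::size_t nIn = input.size();
    const std::size_t nOut = dOutput.size();
    assert(weights.size() == nIn * nOut);

    FcGradients g{std::vector<float>(nIn * nOut, 0.0f),
                  dOutput, // dLoss/db equals dLoss/dy
                  std::vector<float>(nIn, 0.0f)};

    for (std::size_t o = 0; o < nOut; ++o) {
        for (std::size_t i = 0; i < nIn; ++i) {
            // dLoss/dW[o][i] = dLoss/dy[o] * x[i]
            g.dWeights[o * nIn + i] = dOutput[o] * input[i];
            // dLoss/dx[i] accumulates W[o][i] * dLoss/dy[o] over all outputs
            g.dInput[i] += weights[o * nIn + i] * dOutput[o];
        }
    }
    return g;
}
```

The inner loop body is the work a single GPU thread would do under the one-thread-per-weight mapping, although the accumulation into `dInput` would then need an atomic add or a separate reduction step.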
Timing the various steps in the process showed that the problem is the time taken for the GPU code to execute, not the initial suspects: moving the data into GPU memory and the results back out. Double precision floating point numbers do not do well on consumer GPUs, but converting to single precision gave only a small improvement. It turns out that the problem is that I require too few threads to make the GPU efficient. Optimally the GPU should be running at least 43K threads (at least for the GPU on my machine), but my code required at most 18K threads. At that scale the CPU will likely always be faster!
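The thread-count arithmetic can be sketched as follows. This assumes the 43K figure is the number of streaming multiprocessors (SMs) multiplied by the maximum resident threads per SM; the values 28 and 1536 used below are hypothetical, chosen only because their product is 43,008, and are not known to describe the GPU in this post. On a real machine these two numbers can be read from the `multiProcessorCount` and `maxThreadsPerMultiProcessor` fields that `cudaGetDeviceProperties` fills in. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Threads needed to keep every streaming multiprocessor (SM) fully occupied.
std::int64_t threadsToFillGpu(int smCount, int maxThreadsPerSm) {
    assert(smCount > 0 && maxThreadsPerSm > 0);
    return static_cast<std::int64_t>(smCount) * maxThreadsPerSm;
}

// Fraction of the GPU's resident-thread capacity a workload can occupy, capped at 1.
double fillFraction(std::int64_t workloadThreads, std::int64_t capacity) {
    assert(capacity > 0 && workloadThreads >= 0);
    const double f = static_cast<double>(workloadThreads) / static_cast<double>(capacity);
    return f < 1.0 ? f : 1.0;
}
```

With the hypothetical values above, `threadsToFillGpu(28, 1536)` gives 43,008, and `fillFraction(18000, 43008)` comes out a little under 0.42, so a workload of 18K threads would leave more than half of such a GPU idle.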
So, for now I have to say that training a GAN on a PC is not really a practical proposition; at least not for one man, one PC, and zero budget!
I am looking at other approaches, so this could change.