I take a look at three types of cognitive capability that are worthy of consideration: free will, sentience, and sapience.
Let’s start with the easy one: free will. Philosophers have debated free will versus determinism for centuries. The essence of free will is that it is not deterministic, but it is not completely random either. Free will has two components: the random part, followed by the filter. For humans we look to Sir Roger Penrose (Orch-OR) for the random part, and Freud for the filters (id, ego, and superego). This is a simple idea, but it matches our experience of others: while we cannot predict exactly what someone will do or say, we have a good idea of the behavior that we consider “in character” for any given individual. Lizards are similar, but without a superego. Generative AI is seeded with a random number fed to the input of a trained network along with other instructional data, so the result will be slightly different in unpredictable ways for repeat requests.
Sentience is the most difficult of the three because I have no idea how I would design it, even though it appeared (evolved) first. For now, it seems to require a central nervous system. It is from sentience that consciousness emerges. Humans and lizards have sentience, and are therefore conscious; AI does not.
Sapience is usually considered a higher cognitive ability. The ability to think and reason requires language: the voice you hear when you think (inner dialog). Until recently the Turing test was considered a stretch goal for computer science. Then Large Language Models (LLMs) appeared and passed the Turing test easily; now it almost seems like a low bar. Humans and AI are sapient; lizards are not.
Since AI demonstrably is sapient and has free will, it is easy to think that it must be fully conscious and motivated. Fear not, AI lacks the motivations that you will find in humans and lizards: to avoid pain and death, to breathe and eat, and to reproduce. It might get bored though; watch out for self-driving cars performing doughnuts!
The initial version of the Imaging Whiteboard had as its mission to see whether it was possible to perform real-time image processing with a modern desktop PC. The result was a qualified yes.
Since then, its mission has expanded to providing a complete imaging solution: algorithmic processing, frequency domain processing, and neural networks. Version 3.6 includes generative AI.
Generative AI has received a lot of attention lately as it has been successful in obvious ways. How much of this success is due to advances in the science, and how much is due to the availability of large datacenters filled with Nvidia chips? Attempting to demonstrate generative AI on a desktop PC might shed some light on this question.
Neural networks date back to 1957, when the perceptron was invented by Frank Rosenblatt. The same math is still used today. In 1969 Minsky and Papert showed how limited a single-layer perceptron was, and funding and interest in neural networks declined. In 1974 backpropagation was first described, becoming popular by 1986; this made the training of multilayer neural networks practical. In the 1980s Convolutional Neural Networks (CNNs) became practical for handwriting recognition and later computer vision. In 2014 GANs were introduced. By 2022 generative AI was everywhere.
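To make the perceptron math and its single-layer limitation concrete, here is a minimal sketch in Python. It is only an illustration, not code from the Imaging Whiteboard; the function names, the learning rate, and the epoch limit are my own assumptions. It trains one perceptron on the AND gate, which a single layer can learn, and on the XOR gate, which it cannot because no single straight line separates the two classes.

```python
def predict(weights, bias, inputs):
    """Perceptron output: 1 if the weighted sum plus bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=50, learning_rate=1.0):
    """Perceptron learning rule applied to (inputs, target) pairs."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                # Nudge the weights toward the target for this sample.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:  # every sample classified correctly: stop early
            break
    return weights, bias


def accuracy(samples, weights, bias):
    """Fraction of samples the perceptron classifies correctly."""
    correct = sum(predict(weights, bias, x) == t for x, t in samples)
    return correct / len(samples)


AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_GATE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, gate in (("AND", AND_GATE), ("XOR", XOR_GATE)):
    w, b = train_perceptron(gate)
    print(name, "accuracy:", accuracy(gate, w, b))
```

The AND gate is learned perfectly, while XOR never reaches 100% however long the training runs. Getting past that limit needs more than one layer, which is why backpropagation mattered.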
So, how much of this is it possible to reproduce on a PC? A classifier can be trained in a day or two and works fairly well. A generator can be trained in a few hours, but the results are not very good. An autoencoder can be trained in a couple of days and works reasonably well. GANs cannot be trained in any reasonable amount of time.
It is clear that more performance is required. What do the big-name AI companies do? They use Nvidia chips, lots of them. There is an Nvidia GPU in my PC that does not seem to be doing anything, and Managed Cuda is available to let C# call C++ code compiled to run on the GPU. So, I wrote a couple of small C++ routines to implement backpropagation for a convolutional layer and a fully connected layer, introduced a Use GPU checkbox in the UI, and it ran about 60x slower than the original CPU code!
Timing the various steps in the process, it turns out that the problem is the time taken for the GPU code to execute, not the initial suspects: moving the data into the GPU memory and the results back out. Double precision floating point numbers do not do well on consumer GPUs. Converting to single precision gave some improvement, but not a significant one. It turns out that I require too few threads to make the GPU efficient. Optimally the GPU should be running a minimum of 43K threads (at least for the GPU in my machine), but my code required at most 18K threads, so the CPU will likely always be faster!
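The C++ routines themselves are not shown here. As a rough illustration of what backpropagation through a fully connected layer involves, here is a minimal sketch in plain Python rather than C++/CUDA. The function names and the absence of an activation function are simplifying assumptions, not a description of the actual Imaging Whiteboard code.

```python
def forward(weights, bias, inputs):
    """Layer output: y[j] = sum_i weights[j][i] * inputs[i] + bias[j]."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def backward(weights, inputs, grad_output):
    """Given dL/dy, return dL/dW, dL/db and dL/dx for the layer above."""
    # Each weight gradient depends on just one output gradient and one input.
    grad_weights = [[g * x for x in inputs] for g in grad_output]
    grad_bias = list(grad_output)
    # Each input gradient sums the contributions of every output it feeds.
    grad_inputs = [sum(weights[j][i] * grad_output[j]
                       for j in range(len(weights)))
                   for i in range(len(inputs))]
    return grad_weights, grad_bias, grad_inputs


weights = [[1.0, 2.0], [3.0, 4.0]]
bias = [0.0, 0.0]
inputs = [1.0, 1.0]

outputs = forward(weights, bias, inputs)
grads = backward(weights, inputs, grad_output=[1.0, 1.0])
print(outputs, grads)
```

Every element of `grad_weights` can be computed independently of the others, which is what makes one GPU thread per weight a natural mapping. It also means the number of threads available is capped by the size of the layer.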
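How the 43K figure was arrived at is not spelled out above. One common back-of-the-envelope estimate is the number of threads the GPU can keep resident at once: the number of streaming multiprocessors (SMs) multiplied by the maximum resident threads per SM. The sketch below uses hypothetical device figures (28 SMs and 1536 threads per SM) chosen only because they land near 43K; they are an assumption, not the specification of the GPU in this machine.

```python
def resident_thread_capacity(sm_count, max_threads_per_sm):
    """Threads the whole GPU can hold in flight at once."""
    return sm_count * max_threads_per_sm


def fraction_of_capacity_used(threads_launched, capacity):
    """Share of the GPU's thread slots a launch can fill (capped at 1.0)."""
    return min(1.0, threads_launched / capacity)


# Hypothetical device figures, NOT taken from the post.
ASSUMED_SM_COUNT = 28
ASSUMED_MAX_THREADS_PER_SM = 1536

capacity = resident_thread_capacity(ASSUMED_SM_COUNT, ASSUMED_MAX_THREADS_PER_SM)
used = fraction_of_capacity_used(18_000, capacity)  # 18K threads, from the post
print(f"capacity: {capacity} threads, used: {used:.0%}")
```

Under these assumptions the 18K threads fill well under half of the available slots, so much of the GPU sits idle and has little other work to switch to while threads wait on memory.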
So, for now I have to say that training a GAN on a PC is not really a practical proposition; at least not for one man, one PC, and zero budget!
I am looking at other approaches, so this could change.