Tuesday, April 25, 2023

Just What is an AI Chip Anyway?

These days the discussion of advances in artificial intelligence seems to emphasize the neural networks that, through training on vast amounts of web data, learn to recognize patterns and act on them--as with the "word prediction"-based GPT-4 that has been making headlines everywhere this past month (such that news outlets which do not ordinarily give such matters much heed are writing about them profusely). By contrast we hear less of the hardware on which the neural networks are run--but all the same you have probably heard the term "AI chip" being bandied about. If you looked it up you probably also found it hard to get a straightforward explanation as to how AI chips are different from regular--"general-purpose"--computing chips in their functioning, and why this matters.

There is some reason for that. The material is undeniably technical, with many an important concept having little apparent meaning without reference to another concept. (It is a lot easier to appreciate "parallel processing" if one knows about "sequential processing," for example.) Still, getting some grasp of the basics is not as hard as one might think.

Basically general-purpose chips are intended to be usable for pretty much anything and everything computers do. AI chips, by contrast, are designed to get through as many of the specific calculations AI systems need--which is to say, the calculations used in training neural nets on data, and in applying that training to new inputs (the term for which is "inference")--as possible, even at the expense of their ability to perform the wider variety of tasks to which general-purpose computers are put.
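For the concrete-minded, the training/inference distinction can be seen in a toy sketch in Python--a made-up one-weight model and invented data, nothing like a real neural network in scale, but the shape of the two phases is the same: training repeatedly adjusts the model's weight against data, while inference just applies the frozen weight to new inputs.

    # Toy sketch of the two phases an AI chip accelerates: training
    # (repeatedly adjusting weights against data) and inference
    # (applying the frozen weights to new inputs). One made-up weight,
    # made-up data--illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)                         # toy inputs
    y = 3.0 * x + rng.normal(scale=0.1, size=1000)    # toy targets (true weight is 3)

    # Training: many repeated multiply-and-adjust passes over the data.
    w = 0.0
    learning_rate = 0.1
    for _ in range(100):
        prediction = w * x                            # forward pass
        gradient = np.mean(2 * (prediction - y) * x)  # slope of the average squared error
        w -= learning_rate * gradient                 # nudge the weight

    # Inference: apply the learned weight to new inputs; no more updates.
    new_inputs = np.array([0.5, -1.2, 2.0])
    print("learned weight:", round(w, 3))
    print("predictions   :", np.round(w * new_inputs, 3))

Both phases boil down to enormous numbers of simple multiplications and additions--which is exactly the kind of work AI chips are built to churn through.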

Putting it crudely, this specialization comes to sacrificing "quality" for "quantity" where calculations are concerned--the chip doing many, many more "imprecise" calculations in a given amount of time, because those less precise calculations are "good enough" for an objective like pattern recognition, and the premium on getting as many calculations done as quickly as possible is high. (Pattern recognition is very calculation-intensive, so it can be better to have more rough calculations than fewer precise ones.) Admittedly this still sounds a bit abstract, but it has a clear, concrete basis in the aspects of AI chip design presented below, namely:

1. Optimization for Low Precision Calculations. (Think Lower-Bit Execution Units on a Logic Chip--But More of Them.)
It is fairly basic computer science knowledge that computers perform their calculations using strings of "bits"--the 0s and 1s of binary code--with increasingly advanced computers using longer and longer strings enabling more precise calculation. For instance, we may speak of 8-bit calculations involving strings of 8 1s and 0s (allowing for, at 2 to the power of 8, a mere 256 values) as against 16-bit calculations using strings of 16 such 1s and 0s (which means at 2 to the power of 16, 256 times 256, or 65,536, values).

However, it may be the case that even where a 16-bit calculation is available, for particular purposes 8-bit calculations are adequate, especially if we go about making those calculations the right way (e.g. do a good job of rounding the numbers). It just so happens that neural net training and inference is one area where this works, because the values involved may be known to fall within a limited range, the task coming back as it does to pattern recognition. After all, the pattern the algorithm is supposed to look for is either there or not--as with some image it is supposed to recognize.
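As a very loose illustration of the point--not how any particular chip or framework actually does it--the little Python sketch below maps some made-up "weights," known to fall between -1 and 1, onto the 256 levels an 8-bit integer offers, then converts them back. The numbers come out slightly off, but only slightly, which is the sense in which lower precision can be "good enough."

    # Illustrative only: squeeze values with a known range into 8 bits.
    import numpy as np

    rng = np.random.default_rng(1)
    weights = rng.uniform(-1.0, 1.0, size=8)       # pretend values, known to lie in [-1, 1]

    scale = 127 / np.max(np.abs(weights))          # map that range onto int8's -127..127
    as_8bit = np.round(weights * scale).astype(np.int8)
    recovered = as_8bit / scale                    # back to ordinary floats for comparison

    print("original :", np.round(weights, 4))
    print("8-bit    :", as_8bit)
    print("recovered:", np.round(recovered, 4))
    print("worst error:", np.max(np.abs(weights - recovered)))

Real AI hardware and software use more careful schemes than this (and often even lower precisions), but the basic bargain--a sliver of accuracy traded for a much smaller representation--is the same.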

Why does this matter? The answer is that you can, on a given "logic" chip (the kind we use for processing, not memory storage), get a lot more 8-bit calculations done than 16-bit calculations. An 8-bit execution unit, for example, uses just one-sixth the chip space--and energy--that a 16-bit execution unit does. The result is that opting for the 8-bit unit when given a choice between the two means many more execution units can be put on a given chip, and that many more 8-bit calculations can be done at once (as against one 16-bit unit doing one 16-bit calculation). Given that pattern recognition can be a very calculation-intensive task, trading the precision of calculations for the quantity of calculations can be well worth the while.
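To put some purely notional numbers on it, take the one-sixth figure above at face value and give both kinds of execution unit the same slice of chip to live on:

    # Back-of-the-envelope arithmetic only; the areas are assumptions,
    # with the one-sixth ratio taken from the text above.
    AREA_BUDGET = 600        # arbitrary units of chip area set aside for execution units
    AREA_16BIT_UNIT = 6      # assumed relative size of a 16-bit execution unit
    AREA_8BIT_UNIT = 1       # assumed relative size of an 8-bit execution unit

    units_16 = AREA_BUDGET // AREA_16BIT_UNIT    # 100 units
    units_8 = AREA_BUDGET // AREA_8BIT_UNIT      # 600 units

    # If each unit finishes one calculation per cycle, the same silicon
    # turns out six times as many (less precise) calculations at once.
    print("16-bit units:", units_16, "-> calculations per cycle:", units_16)
    print(" 8-bit units:", units_8, "-> calculations per cycle:", units_8)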

2. "Model-Level Parallelism." (Chop Up the Task So Those Lower-Bit But More Numerous Execution Units Can Work Simultaneously--in Parallel--to Get it Done Faster.)
In general-purpose computers, logic chips are designed for sequential processing--an execution unit does one calculation by itself all the way through. However, computers can alternatively use parallel processing, which splits a task into "batches" that can be performed all at once by different execution units on a chip, or different chips within a bigger system--the work divided among the units, each doing its part of the calculations, with the results then combined. This permits a given piece of processing to be done more quickly.

That being the case you might wonder why we do not use parallel processing for all computing tasks. The reason is that parallel processing means more complexity and higher costs all around--more processors, and more of everything required to keep them running properly (energy, etc.). Additionally, not every problem lends itself well to this kind of task division. Parallelism works best when you can chop up one big task into a lot of small, highly repetitive tasks performed over and over again--in computer jargon, when the task is "fine-grained" with numerous "iterations"--until some condition is met, like performing that task a pre-set number of times, or triggering some response. It works less well when the task is less divisible or repetitive. (Indeed, the time taken to split up and distribute the batches of the task among the various processors may end up making such a process slower than if it were done sequentially on one processor.)

As it happens, the kind of neural network operations with which AI research is concerned are exactly the kind of situation where parallel processing pays off, because the operations involved tend to be "identical and independent of the results of other computations." Consider, for example, how this can work when a neural network is asked to recognize an image--different execution units responsible for examining different regions, or parts, of the image all at once--with the overall neural network "adding up" the results of the calculations in those individual units to decide whether the image is or is not what it is supposed to be looking for.
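A loose, software-level analogy (it runs across operating-system processes rather than execution units on a piece of silicon, and the "pattern check" here is invented) might look like this in Python:

    # Split an "image" into tiles, score every tile for a pattern at the
    # same time, then combine the per-tile results. The per-tile work is
    # identical and independent--which is what makes it parallelizable.
    from concurrent.futures import ProcessPoolExecutor
    import numpy as np

    def tile_score(tile):
        """Stand-in for a per-region pattern check: how 'bright' is this region?"""
        return float(tile.mean())

    def split_into_tiles(image, tiles_per_side=4):
        rows = np.array_split(image, tiles_per_side, axis=0)
        return [tile for row in rows for tile in np.array_split(row, tiles_per_side, axis=1)]

    if __name__ == "__main__":
        rng = np.random.default_rng(2)
        image = rng.random((256, 256))           # stand-in for pixel data
        tiles = split_into_tiles(image)

        with ProcessPoolExecutor() as pool:      # the tiles are examined in parallel
            scores = list(pool.map(tile_score, tiles))

        print("highest tile score:", round(max(scores), 3))
        print("pattern detected:", max(scores) > 0.55)

The essential move is the same one an AI chip makes: carve the job into small, identical pieces, hand them out, work on them simultaneously, and only then pull the answers together.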

3. Memory Optimization. (Given All the Space Savings, and the Predictability of the Task, You Might Even Be Able to Put the Memory on the Same Chip Doing the Processing, Speeding Up the Work Yet Again.)
As previously noted, in general-purpose computing there is a separation between logic chips and memory chips, which means the logic chips have to access memory "off-chip" as they process data--because, given the premium on the chip's flexibility, it is not clear in advance just what data the processor will have to access to perform its task.

As it happens, the mechanics of accessing data off-chip constitute a significant drag on a processor's performance. It can take more time, and energy, to access off-chip data like this than to actually process that data, with all that means performance-wise--the more so as processing speed has improved more rapidly than the speed of memory access.

However, if one knows in advance what data a particular process will need, the memory storage can be located closer to the processor, shortening the distance and saving time and energy. In fact, especially given the space savings that those lower-bit execution units afford, the prospect exists of getting around the processing-memory "bottleneck" by putting the processing and the memory it needs to use together on the very same chip. Moreover, while chips can be designed for particular operations from the outset (a type known as "Application-Specific Integrated Circuits," or ASICs), chips can also be designed so that even after fabrication suitable programming can reconfigure their circuitry to most efficiently run operations developed afterward (these are called Field Programmable Gate Arrays, or FPGAs). The result is, again, an improvement in speed and efficiency that AI chips rely on heavily to maximize that capacity for low-precision calculation at the heart of their usage.
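There is no way to show on-chip memory from a laptop, but the cost of fetching data from "farther away" has a rough everyday analogue in how much faster a processor chews through numbers laid out in the order it reads them than numbers it has to jump around in memory to collect. A small Python sketch of that analogy (exact timings will differ from machine to machine; the gap between the two is the point):

    # Same numbers, summed twice: once laid out in the order they are read,
    # once scattered so the processor must jump around memory to gather them.
    import time
    import numpy as np

    a = np.random.random((4000, 4000))    # row-major: each row sits contiguously in memory
    b = np.asfortranarray(a)              # identical values, column-major memory layout

    def time_row_sums(array):
        start = time.perf_counter()
        array.sum(axis=1)                 # add up each row
        return time.perf_counter() - start

    print(f"rows contiguous in memory: {time_row_sums(a):.4f} s")
    print(f"rows scattered in memory : {time_row_sums(b):.4f} s")

Keeping the data an operation needs right next to the circuits doing the operation is the hardware version of the same idea, taken to its limit.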

To sum up: the value of AI chips lies in their use of more, but lower-bit, execution units, organized for parallel processing, on chips physically arranged to reduce or eliminate the time and energy costs of memory access--all so as to maximize their efficiency at low-precision calculations, in a way that by no means works for everything, but works well for neural net training and use.

Of course, knowing all that may leave us wondering just how much difference it has all actually made in real-life computing. As it happens, for all the hype about how many hundreds and hundreds of billions of dollars the market for AI chips will expand to by 2030 or some such date, in the real-life year of 2021 they were an $11 billion market. That sounds like a lot--until one remembers that the overall chip market is over $550 billion, making the AI chip market just 2 percent of the total. Yes, just 2 percent--which is a reminder that, even if it can look from perusing the "tech" news as if these chips are everywhere, where everyday life is concerned we are still relying on fourth-generation computing--while, again, the AI chips we do have, being inferior for general computing use, are largely used for research and are probably not about to displace the general-purpose kind in your general-purpose gadgets anytime soon.

Still, as one study of AI chips from Georgetown University's Center for Security and Emerging Technology reports, in the training and inference of neural networks such chips afford a gain of one to three orders of magnitude in speed and efficiency as against general-purpose chips. Put another way, being able to use AI chips for this work, rather than just the general-purpose kind, by letting computer scientists train neural nets tens, hundreds or even thousands of times faster than they otherwise could, may have advanced the state of the art in this field by decades--bringing us to the present moment, when even experts look at our creations and wonder if "artificial general intelligence" has not already arrived.
