Battle Royale over AI chips

NVIDIA and the battle for the future of AI chips

Wired UK has a compelling write-up:

 While NVIDIA’s early work has given the GPU maker a head start, challengers are racing to catch up. Google started making its own chips in 2015; Amazon last year began shifting Alexa’s brains to its own Inferentia chips, after buying Annapurna Labs in 2016; Baidu has Kunlun, recently valued at $2 billion; Qualcomm has its Cloud AI 100; and IBM is working on an energy-efficient design. AMD acquired Xilinx for AI data centre work, and Intel added AI acceleration to its Xeon data centre CPUs in 2019; it has also bought two startups, Nervana in 2016 for $408 million and Habana Labs in 2019 for $2 billion. 

An AI chip is any processor that has been optimised to run machine learning workloads, via programming frameworks such as Google’s TensorFlow and Facebook’s PyTorch. AI chips don’t necessarily do all the work when training or running a deep-learning model, but operate as accelerators by quickly churning through the most intense workloads. 

(emphasis mine)

I have been tracking NVIDIA’s meteoric rise since back in the day, when I nursed gaming ambitions that I had to scale back because of the costs involved. Consoles had not yet caught on, and I remember PlayStations being rigged together to create “supercomputers”. Same story, different names now. Wired does an excellent job of simplifying the narrative so that investors understand the “hype” around “number crunching”. I usually don’t link to mainstream technology articles because they offer more fluff than substance. Yet this write-up had something substantial.

Google’s TPUs are application-specific integrated circuits (ASICs), designed for specific workloads; Cerebras makes a Wafer-Scale Engine, a behemoth chip 56 times larger than any other; IBM and BrainChip make neuromorphic chips, modelled on the human brain; and Mythic and Graphcore both make Intelligence Processing Units (IPU), though their designs differ….

ARM is a major designer of the chips that will apply deep learning in the real world – so-called inference at the edge. 

(emphasis mine)

Edge computing, where the “computing” happens locally on the device, is making inroads. For example, Siri processes queries locally to give you intelligible answers (possibly), and translation or transcription can run on the device in real time. As chip designs get more efficient, there will be spin-off benefits in several areas.

For example, EMRs (electronic medical records) will have digital assistants to transcribe everything, while image analysis happens in the cloud because it is more compute-intensive. Likewise, algorithms requiring “split-second decisions” will run at the edge, while bulk computing happens in the cloud. That is why the 5G hype has overtaken us: it promises to reduce latency. If you zoom out, it is about creating platforms (and ecosystems) and winning the narrative around them. Any company that dominates platform creation will have extortionate rent-seeking power, and that explains the mad scramble.

Hence, we need a new perspective on these hardware improvements, which are moving beyond traditional Moore’s law scaling and creating rifts, because the computational cost of “training algorithms” is exorbitant. I can foresee consortia of hospitals owning common “rails” to facilitate processing at scale, unless some key components of this extremely complex, interdependent ecosystem become commoditized.

The one point on which I completely agree with the Wired article is:

AI already has a bias problem, and that is exacerbated by unequal access to hardware. “It means we’ll only be looking at one side of the coin,” says Kate Kallot, head of emerging areas at NVIDIA. “If you leave out a large chunk of the population of the world… how are we going to be able to solve challenges everywhere in the world?”

The future is unevenly distributed.