Individual accelerator chips can be ganged together in a single module to tackle both the small jobs and the big ones without sacrificing efficiency
There’s no doubt that GPU-powerhouse Nvidia would like to have a solution for all size scales of AI—from massive data center jobs down to the always-on, low-power neural networks that listen for wakeup words in voice assistants.
Right now, that would take several different technologies, because none of them scale up or down particularly well. It’s clearly preferable to be able to deploy one technology rather than several. So, according to Nvidia chief scientist Bill Dally, the company has been seeking to answer the question: “Can you build something scalable… while still maintaining competitive performance-per-watt across the entire spectrum?”
It looks like the answer is yes. Last month at the VLSI Symposia in Kyoto, Nvidia detailed a tiny test chip that can work on its own to do the low-end jobs or be linked tightly together with up to 36 of its kin in a single module to do deep learning’s heavy lifting. And it does it all while achieving roughly the same top-class performance.
The individual accelerator chip is designed to perform the execution side of deep learning rather than the training part. Engineers generally
Read the rest of this post here