
Sunday, August 26, 2018

Why iPhones will have NVRAM-based AI in 2019

Storage is a major bottleneck for Artificial Intelligence, but Apple's control of all hardware and software will make their AI fast and affordable. Here's how.


Artificial Intelligence is remaking our world. But the biggest challenge is making AI work at the edge -- on a self-driving car, or your smartphone. Storage is a major AI bottleneck, but Apple's control of all hardware and software will make their AI fast and affordable.

Different types of workloads have different types of memory and storage access, and the subset of AI called machine learning (ML) is no different. ML has two main workloads: Training and inference.

ML training is data intensive. To learn to tell a picture of a dog from a picture of a cat, the ML needs to see as many labeled pictures of each as possible, adjusting its recognition parameters until it can accurately distinguish the two.

Once trained, the resulting model needs much less storage capacity, since it is now just another program and doesn't store thousands of pictures. However, the model does need to be close to the processor -- whether that's a CPU, GPU, or dedicated AI hardware -- so that when a picture is sent to it for classification, the ML can respond quickly and, on edge devices, without wasting energy.
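To make the contrast concrete, here is a toy sketch in Python (NumPy): a made-up "cat vs. dog" classifier trained on synthetic features. The features, labels, and numbers are purely illustrative -- this is not how Apple or anyone else builds production ML -- but it shows why training needs the whole labeled dataset while the finished model is just a small bundle of parameters.

    # Toy illustration of the training/inference split: a tiny
    # logistic-regression "cat vs. dog" classifier on synthetic features.
    # Every name and number here is made up for illustration only.
    import numpy as np

    rng = np.random.default_rng(0)

    # --- Training: data intensive ---
    # Pretend we have 10,000 labeled images, each reduced to 512 features.
    n_images, n_features = 10_000, 512
    X = rng.normal(size=(n_images, n_features)).astype(np.float32)
    true_w = rng.normal(size=n_features).astype(np.float32)   # hidden "real" rule
    y = (X @ true_w > 0).astype(np.float32)                   # 0 = cat, 1 = dog

    w = np.zeros(n_features, dtype=np.float32)                # model parameters
    lr = 0.1
    for epoch in range(5):                          # pass over all labeled data
        p = 1.0 / (1.0 + np.exp(-X @ w))            # predictions
        w -= lr * (X.T @ (p - y)) / n_images        # adjust parameters

    # --- Inference: the model is just the learned parameters ---
    print(f"training data: {X.nbytes / 1e6:.1f} MB, "
          f"trained model: {w.nbytes / 1e3:.1f} KB")

    new_image = rng.normal(size=n_features).astype(np.float32)
    print("dog" if 1.0 / (1.0 + np.exp(-new_image @ w)) > 0.5 else "cat")

Real image models are far larger than a 2 KB weight vector, of course, but the ratio is the point: the training set dwarfs the model that comes out of it.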

GPUS

Responding quickly requires massive parallelism, which is why graphics processing units (GPUs) are popular in AI servers: their parallelism and high memory bandwidth make them fast. But their high cost and energy use make GPUs impractical for mobile devices.

GPUs weren't designed for ML -- they just happened to be the best platform available for ML at the time -- and now there are dozens of dedicated AI/ML processor and co-processor designs, including the Neural Engine in Apple's A11 Bionic. AI processors typically combine a wide array of simple cores for parallelism, reduced floating-point precision, and tuning for matrix operations.
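Here's a rough illustration of the reduced-precision tradeoff those designs exploit. NumPy's half-precision floats on a CPU are only a stand-in for what dedicated silicon does, but the idea is the same: half the bytes per value, with a numerical error small enough that classification results rarely change.

    # Rough illustration of reduced-precision matrix math: the same
    # multiply in 32-bit and 16-bit floats, and the resulting error.
    # NumPy on a CPU is a stand-in here, not any particular AI chip.
    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(256, 256)).astype(np.float32)
    B = rng.normal(size=(256, 256)).astype(np.float32)

    full = A @ B                                        # 32-bit result
    half = (A.astype(np.float16) @ B.astype(np.float16)).astype(np.float32)

    rel_err = np.abs(full - half).max() / np.abs(full).max()
    print(f"bytes per value: 4 vs 2; worst-case relative error ~{rel_err:.2%}")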

Sending an image to the ML program for classification requires multiple off-chip memory and storage accesses, which slows performance and increases energy consumption. Placing AI processors in close proximity to the model and the data enables a dramatic reduction in I/O.
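A back-of-the-envelope calculation shows why that matters. The per-access energy figures below are rough, commonly cited order-of-magnitude numbers -- they vary widely by process node and are assumptions for illustration, not measurements of any Apple part.

    # Back-of-envelope energy comparison for fetching a model's weights.
    # The per-access energies are rough order-of-magnitude assumptions.
    DRAM_PJ_PER_32BIT = 640   # off-chip DRAM, picojoules per 32-bit word
    SRAM_PJ_PER_32BIT = 5     # on-chip SRAM, picojoules per 32-bit word

    model_mb = 32
    words = model_mb * 1024 * 1024 // 4          # 32-bit words in the model

    off_chip_mj = words * DRAM_PJ_PER_32BIT / 1e9   # picojoules -> millijoules
    on_chip_mj = words * SRAM_PJ_PER_32BIT / 1e9
    print(f"Reading a {model_mb} MB model once: "
          f"~{off_chip_mj:.1f} mJ off-chip vs ~{on_chip_mj:.2f} mJ on-chip")

Two orders of magnitude per access adds up fast when a model's weights are read for every single classification.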

WHERE THE NVRAM COMES IN

A chunk -- say 32 to 64MB -- of on-chip non-volatile random access memory (NVRAM) is big enough to hold most ML models close to the hardware that is doing the work. Today, the model must be loaded from flash storage into DRAM, with relevant parts moved to on-chip registers and static RAM as needed.
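Is 32 to 64MB really enough? Some quick arithmetic with approximate published parameter counts for two well-known networks suggests so, assuming weights are stored at reduced precision (the 8- and 16-bit options below are assumptions about how a vendor might quantize, not a statement about what Apple ships).

    # Rough sizing: do common models fit in 32-64 MB of on-chip NVRAM?
    # Parameter counts are approximate public figures; the 8-bit and
    # 16-bit weight sizes are illustrative quantization assumptions.
    models = {
        "MobileNetV2":  3.4e6,    # ~3.4 M parameters, mobile-class
        "ResNet-50":   25.6e6,    # ~25.6 M parameters, server-class
    }
    for name, params in models.items():
        for bits in (8, 16):
            mb = params * bits / 8 / 1e6
            print(f"{name}: ~{mb:.1f} MB at {bits}-bit weights")

A mobile-class network fits with room to spare, and even a server-class ResNet-50 squeezes into 64MB at 16-bit weights.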

HOW DOES THIS HELP APPLE?

Apple's sole source for its custom processors is TSMC, the world's largest semiconductor foundry. TSMC has put NVRAM on its roadmap for 2019. Put the advantages of on-chip AI using NVRAM together with TSMC's commitment to NVRAM, and you have an obvious and significant improvement to Apple's mobile devices.

THE STORAGE BITS TAKE

Much was made of Apple's slow start in AI a few years ago. But Apple has often been slow to enter markets it later came to dominate, such as music players and smartphones.

Apple's ability to control the entire hardware and software stack has been instrumental in their march to a $1T valuation. The key is that it allows them to produce higher performance devices since they don't have to make the compromises that integrators of commodity parts do.

Placing ML models in on-chip NVRAM next to the AI processor will improve performance, reduce energy consumption, and make it easy for Apple to update ML models during iOS updates. It will also disadvantage competitors who can't afford that level of on-chip integration.

AI is remaking our world. It will also remake the mobile device market, with Apple taking the lead.


