NVIDIA’s meteoric growth in the datacenter, where its business is now generating some $1.6B annually, has been largely driven by the demand to train deep neural networks for Machine Learning (ML) and Artificial Intelligence (AI)—an area where the computational requirements are simply mind-boggling. Much of this business is coming from the largest datacenters in the US, including Amazon, Google, Facebook, IBM, and Microsoft. Recently, NVIDIA announced new technology and customer initiatives at its annual Beijing GTC event to help drive revenue in the inference market for Machine Learning, as well as to solidify the company’s position in the huge Chinese AI market. For those unfamiliar, inference is where the trained neural network is used to predict and classify sample data. The inference market will likely end up being larger, in terms of chip unit volumes, than the training market; after all, once you train a neural network, you probably intend to use it, and use it a lot. It is therefore critical that NVIDIA capture its share of this market as AI moves from early R&D to commercial deployment, both in the cloud and at the edge.
What did NVIDIA announce?
As is typically the case, NVIDIA’s CEO, Jensen Huang, made these announcements during a keynote address at the GPU Technology Conference (GTC) in Beijing, the first stop on a worldwide tour of GTC events. First, and perhaps most importantly, Huang announced the new TensorRT 3 software, which optimizes trained neural networks for inference processing on NVIDIA GPUs. TensorRT 3 can be used to package, or compile, neural networks built with any ML framework for deployment across the NVIDIA portfolio of datacenter and edge devices; TensorRT is essentially the CUDA of inferencing. Huang also announced that TensorRT 3 is now being deployed by all of China’s largest Internet datacenters, namely Alibaba, Baidu, Tencent, and JD.com, for ML workloads.
Figure 1: TensorRT software is the cornerstone that should enable NVIDIA to deliver optimized inference performance in the cloud and at the edge. Source: NVIDIA
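To make the workflow concrete, here is a minimal, hedged sketch of how a trained model gets compiled into an optimized inference engine. It uses the current TensorRT Python API and its ONNX parser, which postdate the TensorRT 3 release discussed here (that generation shipped Caffe and TensorFlow/UFF importers instead), so treat it as an illustration of the build-then-deploy flow rather than the exact API Huang announced; the model filename is hypothetical.

```python
# Illustrative sketch: compile a trained, exported network into a TensorRT engine.
# Assumes a recent TensorRT Python package; the TensorRT 3-era API differed.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path: str) -> bytes:
    """Parse an ONNX model and return a serialized, optimized inference engine."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):          # parse the framework-exported graph
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse the ONNX model")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)       # reduced precision is one source of the speedup
    return builder.build_serialized_network(network, config)

# engine_bytes = build_engine("resnet50.onnx")  # hypothetical exported model
```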
In addition to announcing the Chinese deployment wins, Huang provided some compelling benchmarks to demonstrate the company’s prowess in accelerating Machine Learning inference, both in the datacenter and at the edge. Note the ~20X increase in performance directly attributable to the new NVIDIA software (compare the two V100 (Volta) results in Figure 2).
Figure 2: TensorRT 3 performance for inference processing of images (ResNet-50). Source: NVIDIA
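For readers who want to reproduce the flavor of this metric, below is a minimal sketch of how ResNet-50 inference throughput is typically measured in images per second. It uses PyTorch and torchvision as stand-ins, not NVIDIA’s benchmark harness, and the batch size and iteration counts are arbitrary choices for illustration.

```python
# Minimal throughput sketch: images/sec for ResNet-50 inference on a GPU (or CPU fallback).
# This is not NVIDIA's benchmark methodology; it only illustrates the metric in Figure 2.
import time
import torch
import torchvision

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torchvision.models.resnet50().eval().to(device)

batch_size, iters = 64, 50
images = torch.randn(batch_size, 3, 224, 224, device=device)  # synthetic input batch

with torch.no_grad():
    for _ in range(5):                    # warm-up passes
        model(images)
    if device == "cuda":
        torch.cuda.synchronize()          # make sure warm-up work has finished
    start = time.time()
    for _ in range(iters):
        model(images)
    if device == "cuda":
        torch.cuda.synchronize()          # wait for all GPU work before stopping the clock
    elapsed = time.time() - start

print(f"{batch_size * iters / elapsed:,.0f} images/sec")
```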
In addition to the TensorRT 3 deployments, Huang announced that the largest Chinese cloud service providers, Alibaba, Baidu, and Tencent, are all offering the company’s newest Tesla V100 GPUs to their customers for scientific and deep learning applications. For customers wanting to deploy deep learning in their own datacenters, he announced that Huawei, Inspur, and Lenovo would be selling HGX-based servers with Volta to their global customer bases. HGX is an 8-GPU chassis with the NVLink interconnect, used to provide high levels of GPU scaling in a dense package. HGX, announced earlier this year, was designed with Microsoft and is available as an open source hardware platform through the Open Compute Project. The Lenovo win is significant: the company has been seeking a high-density GPU server for large-scale training workloads and is, at least for now, the only global OEM to offer HGX.
Figure 3: The Open Compute HGX platform allows 8 P100 or V100 GPUs to connect to any server for Machine Learning acceleration. Source: NVIDIA
Continuing with the theme of inference processing in China, NVIDIA also announced that JD.com’s delivery subsidiary would be using the NVIDIA Jetson platform to guide and control the drones behind its land and air delivery services. Delivering products through China’s crowded highway infrastructure is unreliable and time-consuming. To address this growing challenge, JD.com plans to have a million drones, with NVIDIA Jetson on board, in service by 2020.
Figure 4: These self-piloting drones will help JD.com quickly deliver goods through or above the congested Chinese urban transportation system. Source: NVIDIA
Conclusions
As Machine Learning matures beyond the research and development stage, attention is turning to the processing needs of inference. The data being inferenced can be quite simple, such as text or images, or incredibly demanding, such as real-time spoken translation and high-definition video/Lidar. The corresponding processing requirements will therefore vary from simple mobile processors in our phones to miniature supercomputers in our autonomous vehicles. NVIDIA is not content with just being the brains behind the creation of these AIs; it is positioning itself to compete with CPUs, FPGAs, and ASICs for the coming explosion in datacenter and edge ML processing. The customer wins announced by Mr. Huang demonstrate that the company has what it takes to be a player in the next phase of Machine Learning and AI. However, unlike training, which has been an all-NVIDIA show, the diversity of inference data, latency, and power requirements will create a wide range of solutions and an interesting competitive landscape.
Disclosure: Moor Insights & Strategy, like all research and analyst firms, provides or has provided research, analysis, advising and/or consulting to many high-tech companies in the industry mentioned in this article, including AMD, Intel, Microsoft, NVIDIA, Xilinx, and others. The author does not have any investment positions in any of the companies named in this article, except Google.
When Google announced its second generation of ASICs to accelerate the company’s machine learning processing, my phone started ringing off the hook with questions about the potential impact on the semiconductor industry. Would the other members of the Super 7, the world’s largest datacenters, all rush to build their own chips for AI? How might this affect NVIDIA, a leading supplier of AI silicon and platforms, and potentially other companies such as AMD, Intel, and the many startups that hope to enter this lucrative market? Is it game over for GPUs and FPGAs just when they were beginning to seem so promising? To answer these and other questions, let us get inside the heads of these Goliaths of the Internet and see what they may be planning.
The Google Cloud TPU is a four-ASIC board that delivers 180 teraflops of performance. Source: Google
As I explored in an article earlier this year, there are four major types of technology that can be used to accelerate the training and use of deep neural networks: CPUs, GPUs, FPGAs, and ASICs. The good old standby CPU has the advantage of being infinitely programmable, with decent but not stellar performance; it is used primarily in inference workloads, where the trained neural network guides the computation to make accurate predictions about each input data item. FPGAs from Intel and Xilinx, on the other hand, offer excellent performance at very low power, plus the flexibility to change the underlying hardware to best support changing software. FPGAs are used primarily in Machine Learning inference, video algorithms, and thousands of small-volume specialized applications. However, the skills needed to program FPGA hardware are fairly hard to come by, and the performance of an FPGA will not approach that of a high-end GPU for certain workloads.
There are many types of hardware accelerators that are used in Machine Learning today, in training and inference, and in the cloud and at the edge. Source: Moor Insights & Strategy
Technically, a GPU is an ASIC built for processing graphics algorithms. The difference is that a GPU offers an instruction set and libraries that allow it to be programmed to operate on locally stored data, as an accelerator for many parallel algorithms. GPUs excel at the matrix operations (primarily matrix multiplications, if you remember your high school math) that underlie graphics, AI, and many scientific algorithms. Basically, GPUs are very fast and relatively flexible.
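As a small illustration of why these workloads map so well onto matrix hardware, here is a NumPy sketch of a single fully connected neural-network layer; the layer sizes are arbitrary and purely illustrative. The forward pass reduces to one large matrix multiplication, which is exactly the operation GPUs (and, as discussed below, the TPU) are built to parallelize.

```python
# A fully connected layer's forward pass is, at its core, a matrix multiplication.
import numpy as np

batch, in_features, out_features = 64, 1024, 1024   # arbitrary illustrative sizes

x = np.random.randn(batch, in_features).astype(np.float32)          # a batch of inputs
W = np.random.randn(in_features, out_features).astype(np.float32)   # learned weights
b = np.zeros(out_features, dtype=np.float32)                        # learned biases

y = x @ W + b            # one (64 x 1024) @ (1024 x 1024) matmul plus a broadcast add
y = np.maximum(y, 0.0)   # ReLU activation, cheap relative to the matmul

print(y.shape)  # (64, 1024)
```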
The alternative is to design a custom ASIC dedicated to performing a fixed set of operations extremely fast, since the entire chip’s logic area can be devoted to a few narrow functions. In the case of the Google TPU, those functions lend themselves to a high degree of parallelism, and processing neural networks is an “embarrassingly parallel” workload. Think of an ASIC as a drag racer: it can go very fast, but it can only carry one person in a straight line for a quarter mile. You couldn’t drive one around the block or take it out on an oval racetrack.
Here’s the catch: designing an ASIC can be an expensive endeavor, costing many tens or even hundreds of millions of dollars and requiring a team of fairly expensive engineers. Paying for all that development means many tens or hundreds of thousands of chips are needed to amortize those expenses across the useful lifetime of the design (typically 2-3 years). Additionally, the chip will need to be updated frequently to keep abreast of new techniques and manufacturing processes. Finally, because the designers froze the logic early in the development process, they will be unable to react quickly when new ideas emerge in a fast-moving field such as AI. An FPGA (and to a limited extent even a GPU), on the other hand, can be reprogrammed to implement a new feature.
If you think about Google’s business, it has three attributes that likely led it to invest in custom silicon for AI. Examining these factors may be helpful in assessing other companies’ potential likelihood of making similar investments.
  1. Strategic Intent: Google has repeatedly stated that it has become an “AI First” company. In other words, AI technology has a strategic role across the entire business: Search, self-driving vehicles, Google Cloud, Google Home, and many other new and existing products and services. It therefore makes sense that Google would want to control its own hardware accelerators (the TPU) and its own software framework (TensorFlow) on which it will build those products and services. The company is willing to invest to give itself an edge over others with similar, albeit perhaps less ambitious, aspirations.
  2. Required Scale: Google’s computing infrastructure is the largest in the world, and that scale means it may have the volume needed to justify the significant costs of developing and maintaining its own hardware platform for AI acceleration. In fact, Google claims that the TPU saved the company from building another 12 datacenters to handle the AI load. Let’s do some sensitivity analysis to understand the likely scale required for a single ASIC cycle (a quick numerical sketch of this break-even math follows this list). For the sake of argument, assume Google spent on the order of $100M, including mask production, and that each chip saves it on the order of $1K. For reference, a single Cloud TPU chip at 45 TFLOPS has a little more than one third the performance of an NVIDIA Volta GPU at a peak 120 TFLOPS, so roughly three TPU chips are needed to displace one high-end GPU. That implies Google just about breaks even if it deploys on the order of 100K TPUs, not accounting for the time value of money. That’s a lot of chips for most companies, even for Google. On the other hand, if it only cost Google around $60M to develop the chip and TPU board, and each chip saves around $2K, then only about 30K chips are needed to break even. A similar effort by another large datacenter operator may therefore require on the order of 30K-100K chips just to break even over a 2-3 year design lifetime.
  3. Importance of Google Cloud: Google execs can’t be satisfied to remain a distant 3rd in the global cloud computing market behind Amazon and Microsoft. They are investing a great deal in Google Cloud under the leadership of Diane Greene and are now enjoying some of the fastest growth in the industry. Google could use the pricing power and performance of the Cloud TPU, along with the popularity of TensorFlow, as a potentially significant advantage in capturing market share for the development of machine learning in the cloud. However, it is important to note that Google says Cloud TPU access will be priced at parity with a high-end GPU; Google does not intend to sell the TPU outright.
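The break-even arithmetic in point 2 is easy to sanity-check. Here is a tiny sketch using the article’s own order-of-magnitude assumptions (roughly $60M-$100M of development cost, roughly $1K-$2K saved per deployed chip); none of these figures are real Google numbers.

```python
# Back-of-the-envelope break-even for a custom AI ASIC program.
# All inputs are the rough order-of-magnitude assumptions from the text, not real data.

def chips_to_break_even(development_cost: float, savings_per_chip: float) -> float:
    """Chips that must be deployed before per-chip savings cover the development cost."""
    return development_cost / savings_per_chip

# Pessimistic case: ~$100M program cost, ~$1K saved per chip.
print(f"{chips_to_break_even(100e6, 1e3):,.0f} chips")   # 100,000

# Optimistic case: ~$60M program cost, ~$2K saved per chip.
print(f"{chips_to_break_even(60e6, 2e3):,.0f} chips")    # 30,000
```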
Frankly, while all of the other Super 7 members (Amazon, Alibaba, Baidu, Facebook, Microsoft, and Tencent) are capable of building their own accelerator, nobody exhibits all three attributes to the extent that Google does. Furthermore, of the companies that are actually close, several seem to be moving in different directions:
  1. Baidu has recently stated publicly that it is in partnership with NVIDIA for its AI initiatives in the Cloud, Home, and Autos. This doesn’t mean Baidu can’t and won’t build its own chip someday, but for now, the company seems to be satisfied to concentrate on its software and services, which the Chinese market already values. Also, Baidu’s cloud remains a relatively small part of its business.
  2. Microsoft is the 2nd-largest cloud services provider, has a large stable of AI engineers (more than 5,000), and is on a mission to “democratize” AI through its tools and APIs for enterprise customers. However, the company has decided (at least for now) to use Altera FPGAs from Intel in its Azure and Bing infrastructure, believing that it can benefit from a more flexible hardware platform in a fast-changing world. Also, Microsoft uses NVIDIA GPUs to train its neural networks.
  3. Amazon is perhaps the closest to the Google model outlined above; AWS is huge, and the company is investing heavily in AI. While Amazon may favor the Apache MXNet framework for AI development, its AWS cloud services for AI support all major frameworks, making it the open-software Switzerland of the AI development world. Also, being an NVIDIA-based yin to Google’s TPU-centric yang could be an effective strategy. However, Amazon has gone down the ASIC path before: it acquired Annapurna Labs in 2015, apparently to shave costs and latencies off the AWS infrastructure, so the company already has a chip team on board in Israel. Finally, Amazon, like Baidu, seems keen on using FPGAs for their all-programmable nature.
I’m not forecasting that none of the other Super 7 companies will jump the GPU ship and hop on board their own ASICs, but it seems highly unlikely to me that many will, at least not soon. They all seem to have their hands full developing Machine Learning models with their vast troves of data, and they are busy monetizing those models in a variety of products and services. Building an ASIC, and the software that enables it, is an ongoing and expensive proposition that could be a distraction. Alternatively, combining the performance of a GPU for training with the flexibility and efficiency of an FPGA for inference also holds a great deal of promise.
So I, for one, do not think the GPU sky is falling, at least not in the near future. AMD certainly believes there is plenty of demand for GPUs and is aiming its Vega technology right at it.
One final note: if one or more of these companies decides to go down the ASIC path, NVIDIA has made an attractive offer with its own Deep Learning Accelerator (DLA), an open source ASIC design that the company is building into its next-generation Drive PX platform for autonomous driving. AI hardware developers will be able to pair NVIDIA’s latest AI hardware and software technology (which is arguably best-in-class) with hardware IP that NVIDIA is contributing as open source, and NVIDIA could then potentially monetize that free open source hardware IP with software and services. I suspect this DLA technology will be better suited to IoT-type applications; NVIDIA CEO Jensen Huang half-jokingly spoke of TPUs for intelligent lawn mowers when he announced the DLA in April. But there’s nothing stopping a development team from taking it to the limit if it chooses to do so.
Disclosure: Moor Insights & Strategy, like all research and analyst firms, provides or has provided research, analysis, advising and/or consulting to many high-tech companies in the industry mentioned in this article, including AMD, Intel, Microsoft, and NVIDIA. The author does not have any investment positions in any of the companies named in this article except Amazon and Google.