Alibaba’s first chip has arrived!

Moments ago at the Yunqi Conference, Zhang Jianfeng, CTO of Alibaba Group and Dean of the DAMO Academy (Dharma Institute), unveiled the Hanguang 800 (“Light 800”), Alibaba’s first AI chip.

Alibaba made a bold promise about this chip a year ago, but few expected what it would show a year later: the chip has not only taped out, it is already in service on Alibaba Cloud.

It is also the first hardware product since the founding of Pingtouge, Alibaba’s chip unit, and the first self-developed chip to reach mass production in Alibaba’s 20-year history.

At a time of upheaval in the chip industry, Alibaba is relying on its strength and a rapid breakthrough to seize the initiative for the next stage. The meaning and value of this may go far beyond the chip itself.

At the launch, however, what Zhang Jianfeng chose to emphasize was a sense of awe.

He said: “In the global chip field, Alibaba is a newcomer. Xuantie and Light 800 are the first step of Pingtouge’s long march. We still have a long way to go.”

What is the Light 800?

The name follows Pingtouge’s tradition of naming its products after legendary swords.

“Hanguang” (“containing light”), the chip’s Chinese name, is one of the three great swords of antiquity. The sword holds its light within rather than displaying it, much like the way the chip works: invisible, yet supplying powerful computing.

Specifically, it is a cloud AI chip built for inference, with a focus on vision workloads.

On performance, it breaks existing AI chip records: according to Alibaba, both its performance and its energy efficiency rank first in the world.

The chip is built on a 12nm process and contains as many as 17 billion transistors.

In the industry-standard ResNet-50 test, the Light 800 reached an inference throughput of 78,563 IPS (images per second), four times better than the best AI chips in the industry.

Its energy efficiency is 500 IPS/W, 3.3 times that of the second-place chip.
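As a rough sanity check, the throughput and efficiency figures quoted above can be combined to back out an implied power draw and the implied numbers for the runner-up chips. This is only arithmetic on the article’s figures. The variable names are illustrative, the derived values are not official specifications, and reading “four times better” as a 4x ratio is an assumption.

```python
# Figures quoted in the article for the ResNet-50 inference test.
throughput_ips = 78_563        # images per second
efficiency_ips_per_w = 500     # images per second per watt

# Implied power draw, assuming both figures describe the same run.
implied_power_w = throughput_ips / efficiency_ips_per_w

# Implied runner-up throughput, assuming "four times better" means a 4x ratio.
implied_runner_up_ips = throughput_ips / 4

# Implied runner-up efficiency from "3.3 times that of the second place".
implied_runner_up_eff = efficiency_ips_per_w / 3.3

print(f"implied power draw:       {implied_power_w:.1f} W")
print(f"implied runner-up speed:  {implied_runner_up_ips:.0f} IPS")
print(f"implied runner-up eff.:   {implied_runner_up_eff:.1f} IPS/W")
```

Under these assumptions the chip would draw a little under 160 W during the test, which is worth keeping in mind when comparing it with much lower-power inference cards.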

In a head-to-head comparison, the Light 800 delivers five times the performance of NVIDIA’s latest T4 and 46 times that of the widely deployed NVIDIA P4, exceeding even the boast Alibaba made when the design began last year.

The Light 800 is already deployed at scale in several business scenarios inside Alibaba.

These range from video and image recognition, classification and search to the City Brain. In the future it could also be applied to medical imaging, autonomous driving and other fields.

At the conference, Zhang Jianfeng demonstrated the chip’s performance.

The Pailitao product library adds 1 billion product images a day. With the Light 800, recognition is 12 times more efficient, and the job that traditional general-purpose GPUs took far longer to finish is done in 5 minutes.

Then there is the City Brain. Processing traffic video from Hangzhou’s main urban area in real time takes 40 traditional GPUs with a latency of 300ms. With the Light 800 it takes only 4 chips, and latency drops to 150ms.
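The two deployment examples above reduce to simple ratios, sketched here. The numbers are the ones quoted in the article and the variable names are illustrative. The implied baseline for the product-image job assumes the 12x figure refers to elapsed time, which the article does not state outright.

```python
# City Brain: real-time traffic video for Hangzhou's main urban area.
gpus_before, latency_before_ms = 40, 300
chips_after, latency_after_ms = 4, 150

device_ratio = gpus_before / chips_after               # fewer devices needed
latency_ratio = latency_before_ms / latency_after_ms   # lower latency

# Product-image recognition: 12x efficiency, 5 minutes on the new chip.
speedup = 12
minutes_after = 5
# Assumption: the 12x gain is measured in elapsed time.
implied_minutes_before = speedup * minutes_after

print(f"devices: {device_ratio:.0f}x fewer, latency: {latency_ratio:.0f}x lower")
print(f"implied GPU baseline for the image job: {implied_minutes_before} minutes")
```

On that reading, the City Brain workload uses a tenth of the devices at half the latency, and the image job would have taken about an hour on GPUs.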

Alibaba said the Light 800 will first serve Alibaba’s internal businesses, and that AI cloud services built on the Light 800 are being officially launched. Its computing power will be offered externally through Alibaba Cloud, but the chip itself will not be sold directly.

Before this, Pingtouge had released the Swordless (Wujian) SoC platform and the Xuantie processor IP within the past two months, pursuing its goal of “a world with no hard-to-make chips” by helping companies lower the threshold for chip design.

Now the Light 800, Pingtouge’s first hardware product and its most hardcore one, carries the same mission: it hopes to let users enjoy high-performance computing anytime, anywhere through Alibaba Cloud’s AI cloud services.

It also means that, within a year, Pingtouge has covered the full path from soft products (processor IP, SoC platform) to a taped-out hardware chip.

This is the iconic “hands-on” moment of Alibaba’s chip making.

Alibaba’s year of chip making

Alibaba’s AI chip plans first came to light in April 2018.

That September, at the Yunqi Conference, Alibaba announced the founding of Pingtouge and made its first bold claim: it was building an NPU whose architecture would deliver results 40 times better than the industry’s best AI processor.

The claim caused a stir.

Now, a year later, at report-card time, Pingtouge has kept its word. The Light 800 delivers 46 times the performance of the P4 and is even 5 times ahead of the latest-generation NVIDIA T4.

Although the process was not entirely smooth, over the past year the Light 800 team not only went from nothing to a finished chip but also exceeded everyone’s expectations.

Along the way, the Light 800 team did extensive hardware and software design work, including architectural innovation, the software compiler, frameworks and the toolchain, followed later by heavy optimization for the INT8 data type.

The person in charge of the Light 800 revealed that the chip uses a self-developed architecture tailored to the large volumes of weight parameters and tensor data used in deep learning. Building on support for sparse compression and quantization, it uses purpose-designed data fetching and pipelined processing to greatly reduce I/O demand and data movement.
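To make the terms concrete, here is a minimal sketch of what INT8 quantization and sparse compression mean in general. The article does not describe the Light 800’s actual scheme, so this shows one common textbook approach (symmetric per-tensor quantization and index/value storage of non-zeros). All function names and the sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]


def compress_sparse(values):
    """Store only the non-zero entries as (index, value) pairs."""
    return [(i, v) for i, v in enumerate(values) if v != 0]


def decompress_sparse(pairs, length):
    """Rebuild the dense list from (index, value) pairs."""
    dense = [0] * length
    for i, v in pairs:
        dense[i] = v
    return dense


# Hypothetical weight vector in which most entries are zero.
weights = [0.0, 0.5, 0.0, -1.27, 0.0, 0.0, 0.02, 0.0]
quantized, scale = quantize_int8(weights)
pairs = compress_sparse(quantized)
restored = dequantize(decompress_sparse(pairs, len(quantized)), scale)

print(quantized)   # one signed byte per weight instead of a 4-byte float
print(pairs)       # only 3 of the 8 entries need to be stored or moved
```

Each weight shrinks from four bytes to one, and zeros are never fetched at all. That is the sense in which quantization and sparsity cut I/O demand and data movement, at the cost of a rounding error of at most half a quantization step per weight.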

The chip optimizes convolution, matrix multiplication, vector computation and a range of activation functions together, and pushes the performance and energy efficiency of AI operations to the limit through highly efficient hardware resource scheduling and fully parallel data-stream processing.

In addition, DAMO Academy algorithms are integrated: compute and storage density are deeply optimized for CNNs and vision algorithms, so that a large network model can run on a single NPU.

Even more commendable, the Pingtouge team overcame a series of challenges.

One example is how to balance performance, yield, power consumption and so on. Pingtouge took these problems into account in both software and hardware and completed the chip design and the whole tape-out process in the shortest possible time: front-end design was finished in 7 months, and tape-out succeeded in only 3 months.

From the perspective of traditional chip making, this was almost an impossible task. But in the end, Alibaba’s AI chip team set a record, met the challenge, and made the impossible possible.

Behind this, naturally, is the team’s round-the-clock “007” hard work, but the tailwind of a broader trend should not be overlooked either.

As the saying goes, it takes the right time, the right place and the right people: the particular chip needs of the AI era and the advantage of Alibaba’s business scenarios are core reasons that have to be mentioned.

NPU: the chip industry’s iPhone moment

First, the particular chip needs of the AI era.

As the name suggests, an NPU (neural network processor) is a chip specialized for deep neural network algorithms. At their core, these algorithms imitate the structure of biological neural networks. Their most basic feature is to mimic the way signals pass between neurons in the brain so that input information is processed quickly.

Traditional general-purpose processors, however, are based on the von Neumann architecture, in which storage and computation are separate. Running a deep neural network on them requires a large number of memory reads and writes, so performance is limited by bandwidth and efficiency is low.

Neural network chips such as the Light 800 are therefore designed around the characteristics of neural network inference, with dedicated hardware neurons, a high-speed interconnect and storage structure, and a dedicated instruction set. This lets them organize and manage memory and compute units efficiently and complete multiple operations with a single instruction, improving both computational efficiency and memory access efficiency.
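The bandwidth limit described above can be illustrated with the standard roofline model, which is a general analysis tool and not something the article attributes to Pingtouge. Attainable throughput is the smaller of peak compute and memory bandwidth times arithmetic intensity (operations per byte moved). The hardware numbers below are hypothetical round figures chosen only for illustration.

```python
def attainable_ops_per_s(peak_ops_per_s, bandwidth_bytes_per_s, ops_per_byte):
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak_ops_per_s, bandwidth_bytes_per_s * ops_per_byte)


def matvec_intensity(rows, cols, bytes_per_weight):
    """Operations per byte for a matrix-vector product, counting weight traffic only."""
    ops = 2 * rows * cols                         # one multiply and one add per weight
    bytes_moved = rows * cols * bytes_per_weight  # every weight is read once
    return ops / bytes_moved


# Hypothetical processor: 10 tera-ops/s peak compute, 100 GB/s memory bandwidth.
PEAK = 10e12
BANDWIDTH = 100e9

fp32 = matvec_intensity(1024, 1024, bytes_per_weight=4)
int8 = matvec_intensity(1024, 1024, bytes_per_weight=1)

fp32_rate = attainable_ops_per_s(PEAK, BANDWIDTH, fp32)
int8_rate = attainable_ops_per_s(PEAK, BANDWIDTH, int8)

print(f"FP32: {fp32} ops/byte -> {fp32_rate / PEAK:.1%} of peak")
print(f"INT8: {int8} ops/byte -> {int8_rate / PEAK:.1%} of peak")
```

In this toy setting both cases stay far below peak compute because memory traffic is the bottleneck. Moving fewer bytes per operation, through smaller data types, on-chip reuse or compression, is what raises achievable throughput, which is the motivation for the dedicated memory and compute organization described above.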

In short, a dedicated chip is more efficient and cheaper, and the returns are correspondingly better.

On the other hand, because the requirements are clear and the application scenarios well targeted, the threshold for building an AI chip is much lower than for a CPU or GPU.

So the whole industry is arriving at an “iPhone moment”: software redefines hardware, and scenario requirements redefine the chip.

In that process, it pays to work closely with the application scenarios, verifying, gathering feedback and iterating continuously, and to tape out and complete the physical implementation only once the target results are reached.

Hence today’s trend: not only have AI chip startups formed a small boom, but Internet giants have also crossed into hardware and begun developing their own AI chips.

But wanting to build a chip, being able to build one, and the quality of what finally gets built are entirely different levels. It is a contest of overall strength.

This is why Alibaba’s business-scenario advantage accelerated Pingtouge’s first AI chip, and why Alibaba holds what might be called “destiny” in the chip competition of the new era.

How was an AI chip built in one year?

In fact, fast as the work was and impressive as the results are, Alibaba’s AI chip did not quite “start from zero”.

Pingtouge was founded only a year ago, but the DAMO Academy and Alibaba’s major businesses have long experience in AI algorithms and software.

The Light 800 team revealed that Alibaba’s algorithms and Alibaba Group’s years of accumulated expertise in hardware infrastructure are the secret behind the chip’s rebuilt software and hardware stack.

Because the differentiation of an AI chip lies mainly in its hardware architecture and software algorithms, the two must be closely matched to maximize the chip’s value.

On the algorithm side, the DAMO Academy’s Machine Intelligence Lab has built a complete algorithm system over the past two years, covering speech intelligence, language technology, machine vision, decision intelligence and more, and has achieved many world-leading results.

On the hardware side, Alibaba has many years of experience in servers, FPGAs and storage, and the Pingtouge team has deep technical reserves in architecture and compiler technology.

Building on these capabilities, Pingtouge quickly bridged the gap between algorithms and hardware, developing its own chip architecture and designing a complete software stack on top of the algorithmic capabilities.

The design approach paid off immediately. Power consumption, for example, is a common problem across the AI chip industry, but Pingtouge’s self-developed architecture greatly reduces memory accesses and can bring the chip’s power consumption to a minimum while maintaining top performance.

In addition, being a new entrant has its own advantages.

According to the leader of the Light 800 team, semiconductor giants that build AI chips carry the burden of an existing developer ecosystem, whereas the Pingtouge team’s goal is to achieve the strongest computing power, fully unleash the hardware’s capabilities, and build a bigger ecosystem.

Throughout the process, the advantages of Alibaba’s business scenarios and organizational cohesion were on constant display.

At the start of architecture design, Pingtouge received detailed feedback on requirements and practical experience, which directly helped clarify the requirements.

Then, during algorithm integration and the verification iterations, the DAMO Academy and the business teams tirelessly helped test, give feedback and submit iterations, providing the final safeguard before tape-out.

So although Alibaba set up a company, Pingtouge Semiconductor, to make chips, taking the Light 800 from scratch was not the fight of one person or one team alone.

So what are Alibaba’s advantages in making AI chips?

Beyond a do-or-die determination, the investment of real money, and the recruitment of senior industry talent…

Pingtouge’s chief scientist and Alibaba senior researcher Yuan Zun attributes the specific advantages to “ABCDE”.

  • A: Algorithm. Alibaba’s own technical reserves and AI strength give it world-leading depth in algorithms.
  • B: Big Data. A vast ecosystem of scenarios, with businesses covering every area, gives it an advantage in both data quality and quantity.
  • C: Computing. Secure and stable computing power; Alibaba Cloud’s market-leading position speaks for itself.
  • D: Domain knowledge. Alibaba is not just one company but an economy made up of dozens of companies, and its wide variety of application scenarios forms a natural base for applying the latest technologies and products.
  • E: Ecosystem. Compared with traditional semiconductor companies, Alibaba’s broad coverage, versatility and application prospects reflect its overall strength.

Yuan Zun believes that Alibaba, which has all of “ABCDE”, is naturally likely to reach center stage in AI chips faster than chip companies that have only the C.

Moreover, “ABCDE” works in two directions: it helps Alibaba build the chip, and it gives the chip somewhere to be used.

Alibaba’s AI chip business model

Inside Alibaba, rich scenario requirements and sheer business volume mean that the demand for high-performance AI computing power is long-standing.

In e-commerce, the image search behind new ways of shopping such as Pailitao needs AI chips.

In entertainment, Youku’s video restoration and analysis also rely on AI.

Then there is the City Brain, which Alibaba is rolling out at scale: detecting and tracking all types of vehicles, extracting features and detecting attributes all depend on stronger computing power.

In the future, important verticals such as healthcare and autonomous driving offer vast room and great commercial potential.

In-house use alone already makes a self-developed chip worthwhile.

What’s more, Alibaba Cloud’s position and strength allow this AI computing power to reach more fields and more enterprises through the cloud.

The business model chosen for the Light 800, delivering the chip’s computing power through cloud services, therefore makes sense.

Alibaba itself prefers to emphasize the core idea behind the business model of its first AI chip: inclusiveness, which has been consistent since Pingtouge’s founding.

The Swordless SoC platform and the Xuantie processor IP, both launched earlier this year, are licensed openly to help companies lower the threshold for chip design.

The Light 800’s inclusiveness takes the form of Alibaba Cloud’s AI cloud services, which let enterprises use high-performance computing anytime, anywhere.

Pingtouge’s next step

The next step for Alibaba’s chips was also discussed at the Yunqi Conference in Hangzhou.

With the release of the Light 800, Pingtouge has assembled a full-stack chip family:

  • Processor IP, the basic building block: the C-Sky series and the Xuantie series provide cost-effective IP for AIoT terminal chips;
  • A one-stop chip design platform: the Swordless SoC platform integrates CPU, GPU, NPU and more, lowering the threshold for chip design;
  • The AI chip: the Light 800 provides high-performance computing power for AI scenarios through AI cloud services.

These three product lines give Pingtouge an initial device-to-cloud chip ecosystem.

Next, Pingtouge’s product focus will be cloud AI training chips, online inference chips, and SoC chips for Alibaba’s Shenlong servers, to meet the computing needs of more scenarios.

In addition, with Pingtouge’s chips achieving an initial closed loop of software and hardware, the synergy among Alibaba’s three major businesses of chips, cloud and AI is beginning to take shape.

Seen against the broader trend, the three were always a trinity.

AI algorithms are gradually being built into chips; dedicated chips with integrated algorithms give cloud services more performance; and cloud computing in turn accelerates the large-scale deployment of AI applications.

Over the past decade, Alibaba has focused on AI and cloud computing and has shown results in both.

Now Pingtouge fills in the chip piece, and Alibaba’s layout forms an iron triangle.

However, China’s Internet technology giants naturally have a grander blueprint and greater ambitions: an industry ecosystem and a developer ecosystem.

Earlier, when the Xuantie 910 was released, Alibaba made clear that Pingtouge’s goal is to build the infrastructure platform of the AIoT era, continuing the Alibaba approach seen from Alibaba B2B, Taobao and Alipay through to Alibaba Cloud and Cainiao.

At the Yunqi Conference, developer cases for the RISC-V-based Xuantie processor and the Swordless SoC platform were also presented on site, including the artificial intelligence unicorn Yuntian Lifei, the veteran chip maker Torch Core Technology, and a leading reconfigurable computing chip company.

Of course, since an AI chip is a complete rebuild that integrates hardware and software, Alibaba will not stand by when it comes to the software stack and model frameworks.

The clearest signal is the high-profile hiring of Jia Yangqing, the creator of Caffe.

One more thing: light

Finally, the name of Alibaba’s first AI chip carries deep meaning.

The name “Hanguang” (“containing light”) comes from the “Tang Wen” chapter of the Liezi, where it is an ancient sword, the first of the three swords of the Shang Son of Heaven.

“Look at it and it cannot be seen; wield it and you do not know it is there. What it touches shows no trace of a boundary; it passes through things and they do not feel it.”

The sword glows only faintly. Like light and like wind, it is nearly invisible, formless and everywhere, and it prevails wherever it goes.

The Pingtouge team explained that the name was chosen by an internal vote of the Light 800 team.

It is meant both to convey the NPU’s capabilities and to show the attitude with which Pingtouge’s first hardware product enters the chip field: next to giants such as Intel and NVIDIA it is still a “youngster”, making chips with a sense of awe.

Even so, the Light 800 is an important step in Pingtouge’s history.

Alibaba has just passed its 20th birthday, and it spent those 20 years pursuing “a world with no hard business to do.”

In the next 20 years, can it realize the bold dream of “a world with no hard-to-make chips”?