
Huawei to release Ascend 910 AI processor today: its most powerful chip, built on the Da Vinci architecture

Feb 02 84
According to Huawei's official website, Huawei will hold a launch event for the Ascend 910 AI processor and the MindSpore open source computing framework in Shenzhen today (August 23). Huawei said it will use the event to launch what it calls the industry's fastest AI processor and an all-scenario AI computing framework.

Huawei officially announced that rotating chairman Xu Zhijun will unveil the Ascend 910 AI processor and the MindSpore computing framework at the event. Xu Zhijun, Chief Strategy Architect Dang Wenshuan, Chip and Hardware Strategy Fellow Ai Wei, and Cloud BU EI Product General Manager Jia Yongli will take part in the Q&A session.

As early as October 2018, at the Huawei Connect conference, rotating chairman Xu Zhijun laid out Huawei's AI strategy for the first time and officially announced two AI chips, the Ascend 910 and the Ascend 310. Xu Zhijun said that the Ascend 910 had the highest computing density of any single chip.

According to a Fast Technology report, Huawei briefly presented some details of the Ascend 910 at Hot Chips, a leading industry conference that opened on Monday.

Slides shown at the conference indicate that the Ascend 910 is based on the Da Vinci core architecture and is built on an enhanced 7nm EUV process. A single die has 32 built-in Da Vinci cores, delivers up to 256 TFLOPS at half precision (FP16), and consumes 350 W. According to the slides, the Ascend 910's computing density surpasses that of the competing NVIDIA Tesla V100 and Google TPU v3. Huawei has also designed an AI computing cluster with 2048 nodes, with an overall performance of up to 512 PFLOPS (2048 x 256 TFLOPS).
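The cluster figure can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above and assumes one 256 TFLOPS chip per node, as the "2048 x 256" multiplication implies. The 1024-to-1 conversion from TFLOPS to PFLOPS is our assumption about how the slide arrives at a round 512; with the usual factor of 1000 the product comes to about 524 PFLOPS.

```python
# Figures quoted in the article (as reported from the Hot Chips slides).
TFLOPS_PER_CHIP = 256   # peak half-precision throughput per Ascend 910
NODES = 2048            # nodes in the cluster (assumed: one chip per node)
POWER_W = 350           # per-chip power consumption in watts

# Aggregate peak throughput in TFLOPS.
total_tflops = NODES * TFLOPS_PER_CHIP

# Assumption: the slide's "512 Peta Flops" divides by 1024, not 1000.
pflops_binary = total_tflops / 1024
pflops_decimal = total_tflops / 1000

# Peak throughput per watt for a single chip.
tflops_per_watt = TFLOPS_PER_CHIP / POWER_W

print(f"total: {total_tflops} TFLOPS")
print(f"PFLOPS (/1024): {pflops_binary}")
print(f"PFLOPS (/1000): {pflops_decimal}")
print(f"per-chip efficiency: {tflops_per_watt:.3f} TFLOPS/W")
```

These are peak numbers only; they say nothing about sustained throughput on real workloads.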

According to Huawei's official WeChat account, the Da Vinci core consists mainly of the 3D Cube matrix unit, a Vector computing unit, a Scalar computing unit, and other components. The 3D Cube accelerates matrix operations and greatly increases AI computing power per unit of power consumption. Each AI Core can perform 4096 MAC (multiply-accumulate) operations in one clock cycle. Buffers L0A, L0B, and L0C store the input and output matrix data; they feed data to the Cube computing unit and hold the calculation results.
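To make the 4096 figure concrete, here is a minimal software sketch of one Cube step. It assumes a 16 x 16 x 16 tile (16 * 16 * 16 = 4096), and it assumes that L0A and L0B hold the two input tiles while L0C holds the accumulated output; the article does not spell out either detail. The function `cube_step` and the test data are hypothetical, and the nested loops only count the work that the hardware would perform in parallel within a single cycle.

```python
CUBE = 16  # assumed tile edge: 16 * 16 * 16 = 4096 MACs per step


def cube_step(l0a, l0b, l0c):
    """Accumulate the product of two CUBE x CUBE tiles into l0c.

    l0a, l0b: input tiles (assumed roles of buffers L0A and L0B).
    l0c:      output/accumulator tile (assumed role of buffer L0C).
    Returns the number of multiply-accumulate operations performed.
    """
    macs = 0
    for i in range(CUBE):
        for j in range(CUBE):
            acc = l0c[i][j]
            for k in range(CUBE):
                acc += l0a[i][k] * l0b[k][j]  # one MAC
                macs += 1
            l0c[i][j] = acc
    return macs


# Hypothetical data: an arbitrary tile multiplied by the identity matrix,
# so the accumulated result should equal the first input tile.
l0a = [[float(i + k) for k in range(CUBE)] for i in range(CUBE)]
l0b = [[1.0 if k == j else 0.0 for j in range(CUBE)] for k in range(CUBE)]
l0c = [[0.0] * CUBE for _ in range(CUBE)]

macs = cube_step(l0a, l0b, l0c)
print(macs)
```

A larger matrix multiplication would be split into such tiles, with partial results accumulated in the output tile across successive steps.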