Huawei Reveals In-House HBM and Ascend 950, Places Big Bet on Large-Scale SuperClusters

Huawei Unveils Ambitious AI Hardware Roadmap at Connect 2025

At its Connect 2025 event, Huawei introduced a bold new hardware strategy, signaling its intent to compete at the system level in the rapidly evolving AI accelerator market. The company showcased its first self-developed High Bandwidth Memory (HBM) alongside a new generation of Ascend AI accelerators, emphasizing large-scale solutions over single-chip performance.

Ascend 950: Next-Generation AI Accelerators with In-House HBM

Huawei confirmed the upcoming Ascend 950 family, scheduled for early 2026, which will be available in two variants. The Ascend 950PR features approximately 128 GB of proprietary HBM and delivers around 1.6 TB/s of memory bandwidth. The 950DT variant increases memory capacity to 144 GB and boosts bandwidth to nearly 4 TB/s, which is roughly 2.5 times the bandwidth of the 950PR for about 12.5% more capacity. Together, the two variants mark a significant step forward in Huawei’s ability to deliver high-performance AI hardware tailored for large-scale deployments.
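To put the two memory configurations in perspective, the short Python sketch below estimates how long each variant would take to read its entire HBM once at the quoted bandwidth. It is back-of-the-envelope arithmetic, not a benchmark: it assumes decimal units (1 TB = 1000 GB), rounds "nearly 4 TB/s" to 4.0, and the function and variable names are illustrative only.

```python
# Illustrative arithmetic only; capacities and bandwidths are the
# approximate figures quoted in the article, not measured values.
GB_PER_TB = 1000  # assumption: decimal units


def full_sweep_ms(capacity_gb: float, bandwidth_tb_s: float) -> float:
    """Milliseconds needed to read the whole HBM once at the quoted bandwidth."""
    return capacity_gb / (bandwidth_tb_s * GB_PER_TB) * 1000


variants = {
    "Ascend 950PR": (128, 1.6),
    "Ascend 950DT": (144, 4.0),  # "nearly 4 TB/s" rounded to 4.0 (assumption)
}

for name, (capacity_gb, bandwidth_tb_s) in variants.items():
    sweep = full_sweep_ms(capacity_gb, bandwidth_tb_s)
    print(f"{name}: about {sweep:.0f} ms per full memory sweep")
```

Under these assumptions the 950PR needs about 80 ms per full sweep and the 950DT about 36 ms, so the DT variant can cycle through its larger memory in less than half the time.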

Looking further ahead, Huawei outlined plans for the Ascend 960 and Ascend 970 accelerators, expected in 2027 and 2028. These future devices promise even greater aggregate memory capacities, higher interconnect bandwidth, and expanded support for FP8 precision formats, positioning Huawei as a serious contender in the global AI hardware landscape.

System-Scale AI: SuperPoDs, SuperClusters, and Advanced Networking

Rather than focusing solely on per-chip performance, Huawei’s competitive strategy centers on dense packaging and advanced networking. The company introduced its SuperPoD architecture and described the construction of massive SuperClusters, interconnected by the new Lingqu protocol and high-speed optical links. This approach enables the integration of hundreds of thousands of AI accelerators into a single, cohesive system.

Huawei announced that the Atlas 950 supernode, based on Ascend 950 units, will launch in Q4 2025 and is designed to handle exascale FP8 workloads. Future Atlas 960 clusters will further increase the number of accelerators and overall throughput. The largest planned SuperCluster is specified at 524 ExaFLOPS of FP8 compute and up to one ZettaFLOPS of FP4 compute, built from 64 Atlas SuperPoDs that together house as many as 524,288 AI accelerators. A system of that size could support the training and inference needs of multiple AI research labs simultaneously.
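The aggregate SuperCluster figures can be broken down with simple division. The Python sketch below derives the number of accelerators per SuperPoD and the average per-accelerator throughput implied by the totals quoted above. These are implied averages under the assumption that the aggregate numbers scale evenly across all accelerators; they are not per-chip specifications stated in this article, and the variable names are illustrative.

```python
# Implied averages derived from the aggregate figures quoted in the article.
# Assumption: throughput is spread evenly across all accelerators.
TOTAL_ACCELERATORS = 524_288
SUPERPODS = 64
FP8_TOTAL_FLOPS = 524e18  # 524 ExaFLOPS
FP4_TOTAL_FLOPS = 1e21    # "up to" one ZettaFLOPS

accelerators_per_pod = TOTAL_ACCELERATORS // SUPERPODS
fp8_per_chip_pflops = FP8_TOTAL_FLOPS / TOTAL_ACCELERATORS / 1e15
fp4_per_chip_pflops = FP4_TOTAL_FLOPS / TOTAL_ACCELERATORS / 1e15

print(f"Accelerators per SuperPoD: {accelerators_per_pod}")
print(f"Implied FP8 per accelerator: {fp8_per_chip_pflops:.2f} PFLOPS")
print(f"Implied FP4 per accelerator: {fp4_per_chip_pflops:.2f} PFLOPS")
```

The division works out to 8,192 accelerators per SuperPoD, roughly 1 PFLOPS of FP8 per accelerator, and about 1.9 PFLOPS of FP4 per accelerator at the stated upper bound.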

Competing with Global Leaders in AI Compute

Huawei’s large-scale designs now rival the most ambitious Western projects, such as xAI’s Colossus 2, which is set to deploy over 550,000 NVIDIA GB200 and GB300 GPUs. By focusing on system-level performance, Huawei aims to close the gap with leading chipmakers like NVIDIA and AMD, leveraging its control over memory and networking to deliver aggregate throughput that can offset per-chip performance differences.

Power Infrastructure and the Future of AI SuperClusters

One of the key factors enabling Huawei’s approach is China’s robust power grid, which can support the energy demands of massive AI clusters. Unlike many U.S.-based data centers, which are often constrained by power limitations, Chinese infrastructure allows for the deployment of large-scale systems that prioritize total compute performance, even at the cost of higher energy consumption. This fundamental difference in power availability could reshape how data center operators worldwide evaluate the trade-offs between single-chip efficiency and cluster-level performance.

If Huawei’s SuperClusters are realized as planned, they have the potential to redefine the landscape of AI infrastructure, offering unprecedented scale and performance for both domestic and global AI workloads.

Maya Stein

Maya is a tech journalist who specializes in PC hardware and semiconductor industry trends. When she’s not writing, she’s building custom PCs and testing the latest peripherals.