This post originally appeared on Recode China AI.
For more than a decade, Nvidia’s chips have been the beating heart of China’s AI ecosystem. Its GPUs powered search engines, video apps, smartphones, electric vehicles, and the current wave of generative AI models. Even as Washington tightened export rules for advanced AI chips, Chinese companies kept buying “China-only” Nvidia chips stripped of their most advanced features—the H800, A800, and H20.
But by 2025, patience in Beijing had seemingly snapped. State media began labeling Nvidia’s China-compliant H20 as unsafe and possibly compromised with hidden “backdoors.” Regulators summoned company executives for questioning, and the Financial Times reported that tech companies such as Alibaba and ByteDance had quietly been told to cancel new Nvidia GPU orders. The Chinese AI startup DeepSeek also signaled in August that its next model would be designed to run on China’s “next-generation” domestic AI chips.
The message was clear: China could no longer bet its AI future on a U.S. supplier. If Nvidia wouldn’t—or couldn’t—sell its best hardware in China, domestic alternatives would have to fill the void by designing specialized chips for both AI training (building models) and AI inference (running them).
That’s difficult—in fact, some say it’s impossible. Nvidia’s chips set the global benchmark for AI computing power. Matching them requires not just raw silicon performance but also memory, interconnect bandwidth, software ecosystems, and above all, production capacity at scale.
Still, a few contenders have emerged as China’s best hope: Huawei, Alibaba, Baidu, and Cambricon. Each tells a different story about China’s bid to reinvent its AI hardware stack.
Huawei’s AI Chips Are in the Lead
Huawei is betting on rack-scale supercomputing clusters that pool thousands of chips together for massive gains in computing power. VCG/Getty Images
If Nvidia is out, Huawei, one of China’s largest tech companies, looks like the natural replacement. Its Ascend line of AI chips has matured under U.S. sanctions, and in September 2025 the company laid out a multi-year public roadmap:
- Ascend 950, expected in 2026 with a performance target of 1 petaflop in the low-precision FP8 number format commonly used in AI workloads. It will have 128 to 144 gigabytes of memory, and interconnect bandwidths (a measure of how fast it moves data between components) of up to 2 terabytes per second.
- Ascend 960, expected in 2027, is projected to double the 950’s capabilities.
- Ascend 970 is further down the line, and promises significant leaps in both compute power and memory bandwidth.
The current offering is the Ascend 910B, introduced after U.S. sanctions cut Huawei off from global suppliers. Roughly comparable to the A100, Nvidia’s top chip in 2020, it became the de facto option for companies that couldn’t get Nvidia’s GPUs. One Huawei official even claimed in 2024 that the 910B outperformed the A100 by around 20 percent in some training tasks. But the chip still relies on an older type of high-speed memory (HBM2E), and it can’t match Nvidia’s H20: It holds about a third less data in memory and transfers data between chips about 40 percent more slowly.
The company’s latest answer is the 910C, a dual-chiplet design that fuses two 910Bs. In theory, it can approach the performance of the H100, Nvidia’s flagship until 2024; Huawei showcased a 384-chip Atlas 900 A3 SuperPoD cluster that reached roughly 300 Pflops of compute, implying that each 910C delivers just under 800 teraflops when performing calculations in the FP16 format. That’s still shy of the H100’s roughly 2,000 Tflops, but it’s enough to train large-scale models if deployed at scale. In fact, Huawei has detailed how it used Ascend AI chips to train DeepSeek-like models.
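That per-chip figure falls out of simple division on the cluster totals reported above; the snippet below is just illustrative arithmetic, not data beyond what’s already cited:

```python
# Back-of-envelope check of the per-chip figure implied by Huawei's
# Atlas 900 A3 SuperPoD announcement (cluster totals as reported above).
cluster_fp16_pflops = 300      # total FP16 compute of the showcased cluster
num_chips = 384                # 910C chips in the cluster

per_chip_tflops = cluster_fp16_pflops * 1_000 / num_chips
print(f"Implied FP16 throughput per 910C: {per_chip_tflops:.0f} Tflops")
# -> roughly 781 Tflops, i.e. "just under 800," versus the H100's ~2,000 Tflops
```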
To address the performance gap at the single-chip level, Huawei is betting on rack-scale supercomputing clusters that pool thousands of chips together for massive gains in computing power. Building on its Atlas 900 A3 SuperPoD, the company plans to launch the Atlas 950 SuperPoD in 2026, linking 8,192 Ascend chips to deliver 8 exaflops of FP8 performance, backed by 1,152 TB of memory and 16.3 petabytes per second of interconnect bandwidth. The cluster will span a footprint larger than two full basketball courts. Looking further ahead, Huawei’s Atlas 960 SuperPoD is set to scale up to 15,488 Ascend chips.
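Those cluster totals line up with the per-chip Ascend 950 roadmap targets quoted earlier, as a bit of division shows (again, purely illustrative arithmetic on the reported figures):

```python
# Sanity check: do the Atlas 950 cluster totals match the per-chip
# Ascend 950 targets from Huawei's roadmap? (Figures as reported above.)
cluster_fp8_eflops = 8         # total FP8 compute
cluster_memory_tb = 1_152      # total memory
num_chips = 8_192              # Ascend chips in the cluster

fp8_pflops_per_chip = cluster_fp8_eflops * 1_000 / num_chips
memory_gb_per_chip = cluster_memory_tb * 1_000 / num_chips

print(f"FP8 per chip: {fp8_pflops_per_chip:.2f} Pflops")  # ~0.98, near the 1-petaflop target
print(f"Memory per chip: {memory_gb_per_chip:.0f} GB")    # ~141 GB, within the 128-144 GB range
```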
Hardware isn’t Huawei’s only play. Its MindSpore deep-learning framework and lower-level CANN software are designed to lock customers into its ecosystem, offering domestic alternatives to PyTorch (a popular framework from Meta) and CUDA (Nvidia’s platform for programming GPUs), respectively.
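To give a sense of what switching stacks involves, here’s a minimal sketch of the same tiny network defined in PyTorch and then in MindSpore. The API names follow each framework’s public documentation, but treat the snippet as illustrative rather than verified against any particular release:

```python
# Illustrative only: the same small model in PyTorch and in MindSpore.

# PyTorch version
import torch.nn as torch_nn
torch_model = torch_nn.Sequential(
    torch_nn.Linear(128, 64),
    torch_nn.ReLU(),
    torch_nn.Linear(64, 10),
)

# MindSpore version: nn.Dense plays the role of nn.Linear,
# and SequentialCell the role of Sequential.
import mindspore.nn as ms_nn
ms_model = ms_nn.SequentialCell(
    ms_nn.Dense(128, 64),
    ms_nn.ReLU(),
    ms_nn.Dense(64, 10),
)
```

The layer definitions map nearly one to one; the harder porting work tends to lie in data pipelines, distributed-training setup, and any custom CUDA kernels that have no direct CANN equivalent.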
State-backed firms and U.S.-sanctioned companies like iFlytek, 360, and SenseTime have already signed on as Huawei clients. The Chinese tech giants ByteDance and Baidu have also ordered small batches of chips for trials.
Yet Huawei isn’t an automatic winner. Chinese telecom operators such as China Mobile and China Unicom, which are also responsible for building China’s data centers, remain wary of Huawei’s influence. They often prefer to mix GPUs and AI chips from different suppliers rather than fully commit to Huawei. Big internet platforms, meanwhile, worry that partnering too closely could hand Huawei leverage over their own intellectual property.
Even so, Huawei is better positioned than ever to take on Nvidia.
Alibaba Pushes AI Chips to Protect Its Cloud Business
Alibaba Cloud’s business depends on reliable access to training-grade AI chips. So it’s making its own. Sun Pengxiong/VCG/Getty Images
Alibaba’s chip unit, T-Head, was founded in 2018 with modest ambitions around open-source RISC-V processors and data center servers. Today, it’s emerging as one of China’s most aggressive bids to compete with Nvidia.
T-Head’s first AI chip, the Hanguang 800, was announced in 2019. Designed for AI inference, it can process 78,000 images per second and accelerate recommendation algorithms and large language models (LLMs). Built on a 12-nanometer process with around 17 billion transistors, it can perform up to 820 trillion operations per second (TOPS) and access its memory at around 512 gigabytes per second.
But its latest design—the PPU chip—is something else entirely. Built with 96 GB of high-bandwidth memory and support for high-speed PCIe 5.0 connections, the PPU is pitched as a direct rival to Nvidia’s H20.
On a state-backed television program featuring a China Unicom data center, the PPU was presented as capable of rivaling the H20. Reports suggest this data center runs more than 16,000 PPUs out of 22,000 chips in total. The Information has also reported that Alibaba has been using its own AI chips to train LLMs.
Beyond chips, Alibaba Cloud also recently upgraded its supernode server, named Panjiu, which now features 128 AI chips per rack, a modular design for easy upgrades, and full liquid cooling.
For Alibaba, the motivation is as much about cloud dominance as national policy. Its Alibaba Cloud business depends on reliable access to training-grade chips. By making its own silicon competitive with Nvidia’s, Alibaba keeps its infrastructure roadmap under its own control.
Baidu’s Big Chip Reveal in 2025
At a recent developer conference, Baidu unveiled a 30,000-chip cluster powered by its third-generation P800 processors. Qilai Shen/Bloomberg/Getty Images
Baidu’s chip story began long before today’s AI frenzy. As early as 2011, the search giant was experimenting with field-programmable gate arrays (FPGAs) to accelerate its deep learning workloads for search and advertising. That internal project later grew into Kunlun.
The first generation arrived in 2018. Kunlun 1 was built on Samsung’s 14-nm process and delivered around 260 TOPS with a peak memory bandwidth of 512 GB per second. Three years later came Kunlun 2, a modest upgrade. Fabricated on a 7-nm node, it delivered 256 TOPS for low-precision INT8 calculations and 128 Tflops for FP16, all while reducing power to about 120 watts. Baidu aimed this second generation less at training and more at inference-heavy tasks such as its Apollo autonomous cars and Baidu AI Cloud services. Also in 2021, Baidu spun off Kunlun into an independent company called Kunlunxin, which was then valued at US $2 billion.
For years, little surfaced about Kunlun’s progress. But that changed dramatically in 2025. At its developer conference, Baidu unveiled a 30,000-chip cluster powered by its third-generation P800 processors. Each P800 chip, according to research by Guosen Securities, reaches roughly 345 Tflops at FP16, putting it on the same level as Huawei’s 910B and Nvidia’s A100. Its interconnect bandwidth is reportedly close to that of Nvidia’s H20. Baidu pitched the system as capable of training “DeepSeek-like” models with hundreds of billions of parameters. Baidu’s latest multimodal models, the Qianfan-VL family of models with 3 billion, 8 billion, and 70 billion parameters, were all trained on its Kunlun P800 chips.
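For a sense of scale, here’s the peak aggregate compute those reported figures imply; note these are peak numbers only, and real-world training utilization always lands well below them:

```python
# Peak aggregate FP16 compute implied by the reported cluster figures.
# Actual sustained throughput during training is always well below peak.
chips = 30_000
tflops_per_chip = 345          # P800 FP16, per Guosen Securities

total_eflops = chips * tflops_per_chip / 1_000_000
print(f"Peak cluster compute: ~{total_eflops:.1f} Eflops FP16")  # ~10.4 Eflops
```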
Kunlun’s ambitions extend beyond Baidu’s internal demands. This year, Kunlun chips secured orders worth over 1 billion yuan (about $139 million) for China Mobile’s AI projects. That news helped restore investor confidence: Baidu’s stock is up 64 percent this year, with the Kunlun reveal playing a central role in that rise.
Just today, Baidu announced a roadmap for its AI chips, promising to roll out a new product every year for the next five years. In 2026, the company will launch the M100, optimized for large-scale inference; in 2027, the M300 will arrive, optimized for training and inference of massive multimodal models. Baidu hasn’t yet released the chips’ detailed specifications.
Still, challenges loom. Samsung has been Baidu’s foundry partner from day one, producing Kunlun chips on advanced process nodes. Yet reports from Seoul suggest Samsung has paused production of Baidu’s 4-nm designs.
Cambricon’s Chip Moves Make Waves in the Stock Market
Cambricon struggled in the early 2020s, with chips like the MLU 290 that couldn’t compete with Nvidia chips. CFOTO/Future Publishing/Getty Images
The chip company Cambricon is arguably the best-performing publicly traded company on China’s domestic stock market: over the past 12 months, its share price has jumped nearly 500 percent.
The company was officially spun out of the Chinese Academy of Sciences in 2016, but its roots stretch back to a 2008 research program focused on brain-inspired processors for deep learning. By the mid-2010s, the founders believed AI-specific chips were the future.
In its early years, Cambricon focused on accelerators called neural processing units (NPUs) for both mobile devices and servers. Huawei was a crucial first customer, licensing Cambricon’s designs for its Kirin mobile processors. But as Huawei pivoted to develop its own chips, Cambricon lost a flagship partner, forcing it to expand quickly into edge and cloud accelerators. Backing from Alibaba, Lenovo, iFlytek, and major state-linked funds helped push Cambricon’s valuation to $2.5 billion by 2018 and eventually landed it on Shanghai’s Nasdaq-like STAR Market in 2020.
The next few years were rough. Revenues fell, investors pulled back, and the company bled cash while struggling to keep up with Nvidia’s breakneck pace. For a while, Cambricon looked like another cautionary tale of Chinese semiconductor ambition. But by late 2024, fortunes began to change. The company returned to profitability, thanks in large part to its newest MLU series of chips.
That product line has steadily matured. The MLU 290, built on a 7-nm process with 46 billion transistors, was designed for hybrid training and inference tasks, with interconnect technology that could scale to clusters of more than 1,000 chips. The follow-up MLU 370, the last version before Cambricon was sanctioned by the United States government in 2022, can reach 96 Tflops at FP16.
Cambricon’s real breakthrough came with the MLU 590 in 2023. Built on a 7-nm process, the 590 delivered peak performance of 345 Tflops at FP16, with some reports suggesting it could even surpass Nvidia’s H20 in certain scenarios. Importantly, it introduced support for lower-precision data formats like FP8, which eased memory-bandwidth pressure and boosted efficiency, as the arithmetic below shows. The chip didn’t just mark a technical leap; it turned Cambricon’s finances around, restoring confidence that the company could deliver commercially viable products.
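The bandwidth argument is simple arithmetic: an FP8 value occupies 1 byte versus FP16’s 2, so the same memory bandwidth moves twice as many values per second. A minimal illustration, using a made-up bandwidth figure:

```python
# Why lower precision eases memory-bandwidth pressure: fewer bytes per
# value means more values moved per second at the same bandwidth.
bandwidth_gb_per_s = 1_000            # hypothetical figure, for illustration only
bytes_per_value = {"FP16": 2, "FP8": 1}

for fmt, nbytes in bytes_per_value.items():
    values_per_s = bandwidth_gb_per_s * 1e9 / nbytes
    print(f"{fmt}: {values_per_s:.1e} values/second")
# FP8 moves twice as many values as FP16 for the same bandwidth.
```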
Now all eyes are on the MLU 690, currently in development. Industry chatter suggests it could approach, or even rival, Nvidia’s H100 in some metrics. Expected upgrades include denser compute cores, stronger memory bandwidth, and further refinements in FP8 support. If successful, it would catapult Cambricon from “domestic alternative” status to a genuine competitor at the global frontier.
Cambricon still faces hurdles: its chips aren’t yet produced at the same scale as Huawei’s or Alibaba’s, and past instability makes buyers cautious. But symbolically, its comeback matters. Once dismissed as a struggling startup, Cambricon is now seen as proof that China’s domestic chip path can yield profitable, high-performance products.
A Geopolitical Tug-of-War
At its core, the battle over Nvidia’s place in China isn’t really about teraflops or bandwidth. It’s about control. Washington sees chip restrictions as a way to protect national security and slow Beijing’s advance in AI. Beijing sees rejecting Nvidia as a way to reduce strategic vulnerability, even if it means temporarily living with less powerful hardware.
China’s big four contenders (Huawei, Alibaba, Baidu, and Cambricon), along with smaller players such as Biren, Muxi, and Suiyuan, don’t yet offer real substitutes. Most of their offerings are barely comparable with the A100, Nvidia’s best chip five years ago, and they are still working to catch up with the H100, which arrived three years ago.
Each player is also bundling its chips with a proprietary software stack. This approach could force Chinese developers accustomed to Nvidia’s CUDA to spend more time adapting their AI models, which, in turn, could slow both training and inference.
DeepSeek’s development of its next AI model, for example, has reportedly been delayed. The primary reason appears to be the company’s effort to run more of its AI training or inference on Huawei’s chips.
The question is not whether Chinese companies can build chips—they clearly can. The question is whether and when they can match Nvidia’s combination of performance, software support, and trust from end-users. On that front, the jury’s still out.
But one thing is certain: China no longer wants to play second fiddle in the world’s most important technology race.