AI infrastructure startup Tensordyne has announced the development of its first commercial accelerator, the Napier chip, with fabrication currently underway on TSMC’s 3nm process. This product, developed in collaboration with Juniper Networks and Broadcom, aims to deliver higher throughput and lower power consumption than conventional GPUs.
The core innovation behind Napier lies in its use of logarithmic mathematics to simplify matrix multiplication, the computationally intensive operation at the heart of AI workloads. In digital hardware, adders cost less silicon area and power than multipliers. Because log(a * b) = log(a) + log(b), representing values as logarithms lets Tensordyne replace each multiplication with an addition.
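The identity behind this approach can be illustrated in a few lines of Python. This is only a sketch of the general principle, not Tensordyne's implementation: the function name is hypothetical, it uses exact floating-point logarithms rather than a hardware number format, and it handles positive inputs only (a real logarithmic number system would track the sign separately).

```python
import math


def log_domain_multiply(a: float, b: float) -> float:
    """Multiply two positive numbers by adding their base-2 logarithms."""
    if a <= 0 or b <= 0:
        # Assumption: signs and zero would be handled outside the log domain.
        raise ValueError("this sketch only handles positive inputs")
    log_sum = math.log2(a) + math.log2(b)  # the multiply becomes an add
    return 2.0 ** log_sum                  # convert back to the linear domain


product = log_domain_multiply(6.0, 7.0)
print(product)
```

The addition itself is cheap; the cost moves into the two conversions, which is why the accuracy and size of the conversion step matter so much.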
Efficient Logarithmic Calculations
The challenge with this approach is converting values to logarithmic form and back without significant loss of accuracy. A lookup table (LUT) would have been the straightforward solution, but co-founder Gilles Backhus indicated that it would be impractically large. Instead, Tensordyne uses the Mitchell approximation to estimate logarithms, complemented by a section-wise correction mechanism in hardware, to achieve accuracy comparable to FP16. The chip also supports FP8 and 4-bit block floating-point data types.
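For readers unfamiliar with it, the Mitchell approximation writes a positive number as x = 2^k * (1 + f), with 0 <= f < 1, and estimates log2(x) as k + f. The sketch below shows the textbook form of the method in Python. It is an illustration under stated assumptions: the function names are hypothetical, it operates on ordinary floats rather than a hardware format, and it omits the section-wise correction, whose details are not described here.

```python
import math


def mitchell_log2(x: float) -> float:
    """Approximate log2(x) as k + f, where x = 2**k * (1 + f) and 0 <= f < 1."""
    if x <= 0:
        raise ValueError("this sketch only handles positive inputs")
    m, e = math.frexp(x)      # x = m * 2**e with 0.5 <= m < 1
    k = e - 1                 # integer part of the logarithm
    f = 2.0 * m - 1.0         # fractional part, used directly as the estimate
    return k + f


def mitchell_exp2(y: float) -> float:
    """Approximate 2**y as 2**k * (1 + f), the inverse of mitchell_log2."""
    k = math.floor(y)
    f = y - k
    return math.ldexp(1.0 + f, k)


def mitchell_multiply(a: float, b: float) -> float:
    """Approximate a * b with one addition in the approximate log domain."""
    return mitchell_exp2(mitchell_log2(a) + mitchell_log2(b))


for a, b in [(4.0, 8.0), (6.0, 7.0), (3.0, 3.0)]:
    approx = mitchell_multiply(a, b)
    error = (approx - a * b) / (a * b)
    print(f"{a} * {b}: approx={approx}, exact={a * b}, error={error:.1%}")
```

Powers of two come out exact, while 3 * 3 evaluates to 8 instead of 9, an underestimate of about 11%, which is the worst case for the uncorrected method. That gap is the reason a correction stage is needed to approach FP16-level accuracy.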
Specifications and Performance Claims
The Napier chip features a nominal thermal design power (TDP) of 300 watts, 144 GB of HBM3e memory across four stacks, and a memory bandwidth of 4.7 TB/s. It is designed to deliver up to 2.1 petaFLOPS of dense FP8 performance, positioning it as a competitor to Nvidia’s H200 accelerators, while consuming nearly 60% less power. However, actual performance metrics will only be confirmed upon release.
Tensordyne’s system, known as the TDN72, integrates eight air-cooled compute blades, each hosting a 10-core Intel Xeon-D CPU and nine Napier accelerators. This configuration allows for rack-scale deployments with a total of 72 accelerators per pod, connected by a high-speed interconnect fabric similar to that of Nvidia’s rack-scale systems.
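The per-chip figures quoted above imply some pod-level totals, worked through in the short Python sketch below. These are simple multiplications of the announced numbers, not figures published by Tensordyne: they are theoretical peaks, and the power total counts accelerator TDP only, excluding CPUs, networking, and cooling.

```python
# Announced per-chip and per-system figures from the article.
BLADES_PER_POD = 8
ACCELERATORS_PER_BLADE = 9
HBM_PER_ACCELERATOR_GB = 144
FP8_PER_ACCELERATOR_PFLOPS = 2.1
TDP_PER_ACCELERATOR_W = 300

# Derived totals (assumption: ideal scaling with no overheads).
accelerators = BLADES_PER_POD * ACCELERATORS_PER_BLADE
total_hbm_gb = accelerators * HBM_PER_ACCELERATOR_GB
total_fp8_pflops = accelerators * FP8_PER_ACCELERATOR_PFLOPS
accelerator_power_kw = accelerators * TDP_PER_ACCELERATOR_W / 1000

print(f"Accelerators per pod: {accelerators}")
print(f"Total HBM3e:          {total_hbm_gb} GB")
print(f"Peak dense FP8:       {total_fp8_pflops:.1f} petaFLOPS")
print(f"Accelerator TDP:      {accelerator_power_kw:.1f} kW")
```

Under those assumptions a pod would hold roughly 10.4 TB of HBM3e and about 151 petaFLOPS of peak dense FP8 compute within a 21.6 kW accelerator power budget.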
Software and Deployment Considerations
As Tensordyne prepares for the launch of its TDN72 system in 2027, it emphasizes the importance of software compatibility. The company has developed a proprietary serving platform and runtime environment to facilitate the deployment of existing models on its hardware. Moreover, Backhus anticipates that the Napier chip will be capable of generating over 1,000 tokens per second without relying on multi-token prediction techniques.
While Tensordyne has garnered interest from cloud providers like Cirrascale and BlueSky Compute, the success of the Napier chip will ultimately depend on its software ecosystem and real-world performance metrics once it becomes available.
This article was produced by NeonPulse.today using human and AI-assisted editorial processes, based on publicly available information. Content may be edited for clarity and style.








