Kioxia AI SSD Nvidia Partnership Promises 10 Million IOPS XL-Flash Revolutionizing GPU Data Transfer for AI Servers

Kioxia and Nvidia develop a revolutionary AI SSD targeting over 10 million IOPS with XL-Flash and direct GPU communication.

It is a bold claim: storage fast enough to transform the way AI servers work. Kioxia has introduced a new kind of SSD engineered to exceed 10 million input/output operations per second (IOPS) on small data blocks, roughly three times the speed of today's fastest high-end SSDs. To pull it off, Kioxia has joined hands with GPU giant Nvidia.

What is the problem? In a modern AI server, the headache is moving data from storage to the powerful GPUs quickly enough. Today, the CPU sits in the middle of that path, so every transfer picks up latency and raw access slows down. As a result, those high-priced GPUs are still not performing at full capacity.
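To make the bottleneck concrete, here is a minimal Python sketch of the conventional two-hop path described above; the file name and the use of NumPy/CuPy are purely illustrative:

```python
import numpy as np
import cupy as cp  # assumes an Nvidia GPU with CuPy installed

# Conventional path: every byte makes two hops, and the CPU manages both.
host_buf = np.fromfile("model_weights.bin", dtype=np.float32)  # hop 1: SSD -> CPU RAM
gpu_buf = cp.asarray(host_buf)                                 # hop 2: CPU RAM -> GPU VRAM
```

Each hop adds latency and burns CPU cycles that contribute nothing to the actual computation.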

To break through this performance wall, Kioxia is designing an entirely new controller tailored to maximize IOPS, with the goal of exceeding 10 million IOPS on small 512-byte data blocks. The idea is to feed GPUs data at a pace that keeps their processing cores busy all the time.
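A quick back-of-envelope calculation, using only the numbers stated above, shows what that target implies in bandwidth terms:

```python
# Bandwidth implied by the stated target: 10 million IOPS at 512-byte blocks.
iops = 10_000_000
block_bytes = 512
print(f"{iops * block_bytes / 1e9:.2f} GB/s")  # ~5.12 GB/s
```

In other words, the challenge isn't moving a huge amount of data per second; it's completing an enormous number of tiny operations per second.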

The secret sauce for this "AI SSD" will be Kioxia's own single-level cell (SLC) XL-Flash memory. This isn't your average flash: it boasts incredibly low read latencies of 3 to 5 microseconds, compared to the 40 to 100 microseconds you see with typical 3D NAND SSDs. Because SLC stores only one bit of data per cell, it offers much faster access and far greater endurance, both critical under the relentless demands of AI workloads.
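Little's law makes clear why that latency gap matters: the number of I/Os a drive must keep in flight to sustain a given IOPS rate is the rate multiplied by the per-I/O latency. A small sketch using the article's own figures:

```python
# Little's law: I/Os in flight = IOPS target x per-I/O latency.
target_iops = 10_000_000
for name, latency_s in [("XL-Flash, 5 us", 5e-6),
                        ("typical 3D NAND, 100 us", 100e-6)]:
    print(f"{name}: ~{target_iops * latency_s:.0f} I/Os in flight")
# XL-Flash needs ~50 concurrent I/Os; typical NAND would need ~1,000.
```

A 20x latency advantage means 20x less queuing to hit the same IOPS target.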

You might ask: why 512-byte blocks? While larger 4K blocks make sense for raw bandwidth, AI applications such as large language model (LLM) and retrieval-augmented generation (RAG) systems often issue huge numbers of highly random, small lookups, fetching tiny pieces of information like parameters or knowledge base entries. For these workloads, 512-byte blocks match real access patterns far better than larger ones (a sketch of the pattern follows below). This keeps latency low, and for bandwidth, you can simply add more drives.
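The access pattern in question might look something like the following sketch, where the file name and read count are hypothetical stand-ins for a parameter store or knowledge base:

```python
import os
import random

# Hypothetical workload: many independent 512 B reads at random,
# block-aligned offsets, the pattern LLM/RAG lookups tend to produce.
BLOCK = 512
fd = os.open("knowledge_base.bin", os.O_RDONLY)  # hypothetical file
blocks = os.fstat(fd).st_size // BLOCK
for _ in range(10_000):
    offset = random.randrange(blocks) * BLOCK
    entry = os.pread(fd, BLOCK, offset)  # one small, random read
os.close(fd)
```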

Interestingly, Kioxia hasn't said exactly which connection interface the AI SSD will use, but it does not sound like it will need anything as massive as PCIe 6.0 bandwidth.
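The arithmetic backs that up: 10 million 512-byte operations per second works out to only about 5 GB/s, which older PCIe generations handle comfortably (the x4-link figures below are approximate usable bandwidth, not raw signaling rates):

```python
# Approximate usable bandwidth of an x4 link per PCIe generation, in GB/s.
pcie_x4 = {"4.0": 7.9, "5.0": 15.8, "6.0": 31.5}
required = 10_000_000 * 512 / 1e9  # ~5.12 GB/s from the stated target
for gen, gbs in pcie_x4.items():
    verdict = "enough" if gbs >= required else "not enough"
    print(f"PCIe {gen} x4: {gbs:5.1f} GB/s -> {verdict}")
```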

Another big part of the design is direct peer-to-peer communication between the GPU and the SSD. Data can flow straight from storage into the GPU, bypassing the CPU altogether, a direct connection that promises higher performance and lower latency.
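Nvidia already ships a mechanism along these lines today, GPUDirect Storage, and while nothing here confirms Kioxia's drive will use that exact stack, it illustrates the idea. A minimal sketch using Nvidia's kvikio Python bindings, with the file name hypothetical:

```python
import cupy as cp
import kvikio  # Python bindings for Nvidia's GPUDirect Storage (cuFile)

# Direct path: DMA moves data from the NVMe SSD straight into GPU
# memory; the CPU only sets up the transfer instead of relaying data.
gpu_buf = cp.empty(1_000_000, dtype=cp.float32)  # destination in VRAM
f = kvikio.CuFile("model_weights.bin", "r")      # hypothetical file
f.read(gpu_buf)                                  # SSD -> VRAM, no host bounce buffer
f.close()
```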

There is another reason Kioxia and Nvidia settled on 512 bytes: chunks of that size map well onto the way GPUs work internally. GPUs keep data in small cache lines (32, 64, or 128 bytes), and their memory systems are built to burst-read many small, separate memory locations to feed all their stream processors. So 512-byte reads simply fit the GPU's natural way of doing things.
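As a quick sanity check, 512 bytes divides evenly into every one of those line sizes:

```python
# 512 B decomposes cleanly into common GPU cache-line sizes.
for line in (32, 64, 128):
    print(f"512 B = {512 // line} x {line}-byte lines")
```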

Kioxia expects the AI SSD to be instrumental in AI training environments, especially where large amounts of data are accessed rapidly and repeatedly while training large language models. It is also aimed at AI inference, primarily systems that use techniques such as RAG to enhance AI-generated content with real-time reference information in support of AI reasoning. For such machines, super-low-latency, high-bandwidth storage is critical to delivering instantaneous responses and keeping GPUs well utilized.

Kioxia's AI SSD, built in partnership with Nvidia, is set for launch in the second half of 2026; write it down in your diary because it might change the world of AI infrastructure as we know it.
