Supermicro Expands AMD Instinct MI350P AI Infrastructure with Air-Cooled PCIe GPU Systems and Dual EPYC 9005 Processors for Generative AI Efficiency
Supermicro has introduced two 5U PCIe GPU servers designed to accelerate generative AI and high-performance computing workloads. Both systems support AMD Instinct MI350P accelerators and, according to Supermicro and AMD product documentation, give enterprises a direct route to expanding AI capacity within existing air-cooled data centers. The AS-5126GS-TNRT and AS-5126GS-TNRT2 models use dual AMD EPYC 9005 series processors to provide the computational foundation for dense GPU environments.
The 2026 systems are built around dual AMD EPYC 9005 processors, scaling to 384 cores per server, and combine that core density with extensive PCIe Gen 5 lane support and large cache capacity. The architecture connects each GPU directly to a CPU at full PCIe bandwidth, avoiding oversubscription and keeping performance steady during AI training and inference. The AS-5126GS-TNRT uses a dual-root architecture to support eight GPUs, while the TNRT2 variant employs a PCIe-switched topology that accommodates up to ten GPUs in a single 5U chassis.
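The lane arithmetic behind the "full bandwidth, no oversubscription" claim can be sketched as a quick back-of-envelope calculation. The x16-per-GPU slot width and the 128b/130b encoding factor are standard PCIe assumptions, not figures from a Supermicro block diagram:

```python
# Hedged sketch: PCIe Gen 5 lane budget for the dual-root (8-GPU)
# configuration. Lane width per GPU is an illustrative assumption.

PCIE_GEN5_GTS = 32    # PCIe Gen 5 raw transfer rate, GT/s per lane
LANES_PER_GPU = 16    # assume each accelerator occupies a x16 slot
GPUS = 8              # dual-root AS-5126GS-TNRT configuration

# 128b/130b encoding: usable bytes/s per lane ~= raw GT/s * (128/130) / 8
gb_per_lane = PCIE_GEN5_GTS * 128 / 130 / 8     # GB/s per lane, per direction
gpu_bw = gb_per_lane * LANES_PER_GPU            # GB/s per GPU, per direction

lanes_needed = GPUS * LANES_PER_GPU
print(f"{lanes_needed} lanes needed, ~{gpu_bw:.0f} GB/s per GPU per direction")
```

With 128 Gen 5 lanes dedicated to the accelerators, each GPU gets its own ~63 GB/s per-direction link to a CPU root port rather than sharing a narrower upstream path through a switch.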
Each AMD Instinct MI350P GPU provides 144 GB of HBM3e high-bandwidth memory, a footprint large enough to keep exceptionally large language models resident on the GPU and reduce frequent data transfers. Memory bandwidth reaches 3.6 TB/s, supporting the low latency and responsiveness that real-time inference demands. The hardware also supports advanced precision formats such as FP8 and MX-scale data types to improve throughput and performance per watt.
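A rough sense of what "exceptionally large models remain resident" means can be had by dividing the stated 144 GB capacity by bytes per weight. This sketch counts weights only and ignores KV cache, activations, and framework overhead, so real headroom is smaller:

```python
# Hedged back-of-envelope: model sizes that fit in one GPU's
# 144 GB of HBM3e at different weight precisions (weights only).

HBM_GB = 144
BYTES_PER_PARAM = {"FP16/BF16": 2, "FP8": 1}

for fmt, nbytes in BYTES_PER_PARAM.items():
    max_params_b = HBM_GB / nbytes   # billions of parameters
    print(f"{fmt}: ~{max_params_b:.0f}B parameters")
```

On these assumptions, a ~70B-parameter model fits in FP16 and roughly double that in FP8, which is why the lower-precision formats matter for single-GPU residency.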
The servers are designed for deployment into existing facilities without building modifications. Because these 5U systems are fully air cooled, organizations can install them in standard enterprise racks without the cost or risk of liquid-cooling infrastructure, reaching return on investment faster with their existing power and cooling systems. The internal design keeps architectural complexity minimal to reduce latency between components, and storage is handled by 8 or 12 hot-swap NVMe or SATA drives accessible from the front of the unit.
Power efficiency comes from N+N redundant power supplies rated at 96% Titanium-level efficiency, which lowers total cost of ownership by cutting electricity use and heat output. The platform runs the open-source ROCm software stack, giving developers freedom from vendor lock-in and licensing costs, while prebuilt PyTorch and TensorFlow integration offers enterprise customers a frictionless workload migration path to Supermicro AMD systems. With its GPU capacity and standard rack design, the platform lets organizations scale agentic AI and RAG pipelines in 2026.
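What a 96% supply efficiency means for facility power can be illustrated with simple arithmetic. The 10 kW DC load below is an illustrative assumption, not a measured draw for these systems:

```python
# Hedged sketch: wall draw and PSU waste heat at 96% (Titanium-level)
# efficiency. The DC load figure is assumed for illustration.

EFFICIENCY = 0.96
dc_load_w = 10_000                       # assumed GPU+CPU DC load, watts
wall_draw_w = dc_load_w / EFFICIENCY     # AC power drawn from the facility
waste_heat_w = wall_draw_w - dc_load_w   # dissipated in the supplies

print(f"wall: {wall_draw_w:.0f} W, PSU waste heat: {waste_heat_w:.0f} W")
```

At this efficiency only about 4% of input power is lost in conversion, roughly 417 W on a 10 kW load, heat the air-cooling system never has to remove compared with a less efficient supply.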
