Xiaomi Robotics 0 VLA model launches for advanced open-source robotics and physical intelligence development

Xiaomi Unveils Xiaomi Robotics 0 VLA Model for Advanced Robotics

Xiaomi has launched its first Vision-Language-Action (VLA) model, Xiaomi Robotics 0. The open-source model has 4.7 billion parameters and is aimed at advancing physical intelligence. It combines visual perception, language understanding, and real-time physical execution in a single unified system.

Xiaomi Robotics 0 works as a closed loop of perception, decision, and execution. Its operation is split between two main functional components:

Visual Language Model (VLM): the cognitive center of the system. It interprets high-resolution scenes and analyzes commands, and it can handle vague instructions by identifying the relevant objects and drawing logical inferences about what is meant.
Action Expert: this block uses a multi-layer Diffusion Transformer to control all physical movements. It generates complete action sequences that keep motion continuous while maintaining precision.
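To make the perception, decision, and execution loop concrete, here is a minimal Python sketch of how a VLM stage and an action-generating stage can be chained in a closed loop. It is not Xiaomi's code: `Observation`, `vlm_encode`, `action_expert`, `control_loop`, and the toy environment are hypothetical stubs whose made-up arithmetic stands in for the real neural networks, and executing each generated sequence in full before looking at the scene again is an assumption made for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stubs: the real VLM and Action Expert are large neural networks.
# These functions only illustrate how data flows around the closed loop.


@dataclass
class Observation:
    image: List[float]  # placeholder for camera pixels
    instruction: str    # natural-language command


def vlm_encode(obs: Observation) -> List[float]:
    """Perception + decision (stub): fuse image and text into a context vector."""
    return [sum(obs.image) / len(obs.image), float(len(obs.instruction))]


def action_expert(context: List[float], horizon: int) -> List[List[float]]:
    """Execution (stub): emit a whole sequence of `horizon` actions in one call."""
    return [[context[0] * (t + 1) / horizon] for t in range(horizon)]


def control_loop(env_step: Callable[[List[float]], Observation],
                 first_obs: Observation, cycles: int,
                 horizon: int = 4) -> List[List[float]]:
    """Perceive -> decide -> execute a sequence -> perceive again, `cycles` times."""
    obs = first_obs
    executed: List[List[float]] = []
    for _ in range(cycles):
        context = vlm_encode(obs)                # look at the latest scene
        chunk = action_expert(context, horizon)  # plan a short action sequence
        for action in chunk:
            obs = env_step(action)               # act; the world changes
            executed.append(action)
    return executed


# Usage with a toy environment that echoes the last action into the "image".
def toy_env_step(action: List[float]) -> Observation:
    return Observation(image=[action[0], 1.0], instruction="fold the towel")


actions = control_loop(toy_env_step, Observation([0.5, 0.5], "fold the towel"), cycles=3)
```

The second sequence differs from the first only because the observation changed after the robot acted, which is what makes the loop "closed".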

Training proceeds in several stages to bridge scene understanding and motor control. First, the VLM learns to predict action distributions from visual input. The Diffusion Transformer then learns to generate movement sequences by attending to key features of the environment rather than to raw text tokens. This approach lets the robot carry out physical tasks while preserving its reasoning abilities.

To improve efficiency, Xiaomi introduced several techniques that reduce inference latency, which would otherwise leave the robot idle while it waits for the model.

Asynchronous Computing: the robot keeps executing movements while the model is still computing, so complex calculations that need extra processing time do not stall its motion.
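One common way to get this kind of overlap is to compute the next action sequence in a background worker while the robot is still executing the current one. The sketch below assumes exactly that; `slow_policy`, `execute`, `run_async`, and the sleep durations are hypothetical placeholders for model inference and motor commands, not details Xiaomi has published.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List

INFERENCE_S = 0.05  # hypothetical model latency per action sequence
STEP_S = 0.02       # hypothetical time to execute one action


def slow_policy(chunk_index: int, horizon: int = 4) -> List[str]:
    """Stand-in for model inference: returns labelled placeholder actions."""
    time.sleep(INFERENCE_S)
    return [f"a{chunk_index}.{t}" for t in range(horizon)]


def execute(action: str, log: List[str]) -> None:
    """Stand-in for sending one command to the robot."""
    time.sleep(STEP_S)
    log.append(action)


def run_async(num_chunks: int) -> List[str]:
    """Execute sequence i while sequence i + 1 is computed in a background thread."""
    log: List[str] = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        current = slow_policy(0)  # nothing to overlap with yet
        for i in range(1, num_chunks + 1):
            # Start inference for the next sequence (if any) without waiting...
            pending = pool.submit(slow_policy, i) if i < num_chunks else None
            # ...and keep the robot moving through the current sequence meanwhile.
            for action in current:
                execute(action, log)
            if pending is not None:
                current = pending.result()  # normally already finished
    return log


start = time.perf_counter()
log = run_async(3)
elapsed = time.perf_counter() - start
```

With these numbers a fully sequential loop would take roughly 3 × (50 + 80) ms, whereas the overlapped version takes about 50 + 3 × 80 ms, because only the first inference is not hidden behind motion.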

Clean Action Prefix: previously executed actions are fed back into the model, which lets the robot refine its trajectory based on what it has already done.
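The word "clean" suggests a diffusion-style reading, sketched below under that assumption: the actions already committed are fed back in noise-free and held fixed while the remaining actions are denoised, so the new sequence has to continue from where the old one left off. This is not Xiaomi's implementation. `predicted_clean` is a hypothetical stand-in for the Diffusion Transformer that simply returns an even ramp from the last prefix action toward a goal, so the denoising loop converges to that ramp.

```python
import random
from typing import List


def predicted_clean(prefix: List[float], goal: float, horizon: int) -> List[float]:
    """Hypothetical model output conditioned on the prefix: keep the prefix and
    continue in equal steps from its last action toward `goal`."""
    k = len(prefix)
    rest = horizon - k
    assert rest > 0, "horizon must be longer than the prefix"
    start = prefix[-1] if prefix else 0.0
    ramp = [start + (goal - start) * (i + 1) / rest for i in range(rest)]
    return list(prefix) + ramp


def denoise_with_clean_prefix(prefix: List[float], goal: float, horizon: int,
                              steps: int = 10, seed: int = 0) -> List[float]:
    """Diffusion-style generation in which committed actions stay noise-free."""
    rng = random.Random(seed)
    k = len(prefix)
    x = [rng.gauss(0.0, 1.0) for _ in range(horizon)]  # start from pure noise
    target = predicted_clean(prefix, goal, horizon)
    for step in range(steps):
        x[:k] = prefix             # re-insert the clean prefix at every step
        w = 1.0 / (steps - step)   # step size; equals 1.0 on the last step
        x = [xi + w * (ti - xi) for xi, ti in zip(x, target)]
    return x


chunk = denoise_with_clean_prefix([0.0, 0.1, 0.2], goal=1.0, horizon=8)
```

The result starts with the exact prefix and then continues in even steps up to 1.0, so there is no jump where the old and new sequences meet.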

Attention Masking: masking steers the model's attention toward the most recent visual input, so it responds faster to changes in its surroundings.
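The article does not describe the masking pattern itself. As a hypothetical illustration, the sketch below builds a boolean mask in which action tokens may not attend to tokens from an older camera frame, so their attention weight is redistributed over the current frame, the instruction, and other action tokens. The token layout, the masking rule, and the scores are all made up for the example.

```python
import math
from typing import List

# Hypothetical token layout for one forward pass (not Xiaomi's published format):
# two tokens from an older frame, two from the current frame,
# two instruction tokens, and two action tokens being generated.
TOKEN_TYPES = ["old_frame"] * 2 + ["cur_frame"] * 2 + ["text"] * 2 + ["action"] * 2


def build_mask(types: List[str]) -> List[List[bool]]:
    """mask[q][k] is True when query token q may attend to key token k.
    Hypothetical rule: action tokens ignore the stale frame; all other
    tokens attend everywhere."""
    return [[not (q == "action" and k == "old_frame") for k in types] for q in types]


def masked_softmax(scores: List[float], allowed: List[bool]) -> List[float]:
    """Softmax over the allowed positions only; blocked positions get weight 0."""
    m = max(s for s, ok in zip(scores, allowed) if ok)  # for numerical stability
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


mask = build_mask(TOKEN_TYPES)
scores = [2.0, 2.0, 1.0, 1.0, 0.5, 0.5, 0.0, 0.0]  # made-up raw attention scores
weights = masked_softmax(scores, mask[-1])          # last action token as the query
```

Even though the stale-frame tokens have the highest raw scores here, they receive exactly zero weight, and the remaining weights still sum to 1.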

Xiaomi Robotics 0 outperformed 30 competing systems on benchmarks including LIBERO, CALVIN, and SimplerEnv. It was also tested on a dual-arm robot platform performing real-world tasks. The robot completed complex, multi-stage tasks such as folding towels and disassembling construction sets, demonstrating that it can handle both rigid and flexible objects.

Technetbook | The Tech Experts
