Microsoft drives innovation and contributes to the broader datacenter and artificial intelligence community, benefiting the entire industry.
To deliver the cloud infrastructure needed for the era of AI, rapid technological transformation has never been more critical than it is today. To deliver for our customers while driving innovation, we can learn from past technology shifts and see the critical role of community-led innovation and industry standardization. Over the past decade, Microsoft has driven this kind of deep collaboration through cross-industry organizations like the Open Compute Project (OCP). As a result, we continue to advance hardware innovation at every layer of the computing stack, from server and rack architecture, networking and storage, and reliability, availability, and serviceability (RAS) designs to new performance evaluation frameworks and supply chain practices that ensure security,1 sustainability,2 and reliability3 across the cloud value chain.
As we continue to innovate in the era of AI, we’re excited to return to the OCP Global Summit this year with more contributions to support ecosystem innovation, from new power and cooling solutions that address the changing profile of AI datacenters to new hardware security frameworks that put trust and resilience at the center of our infrastructure for accelerated computing.
Cooling evolving datacenters with modular systems designed for global deployment
As AI demands increase, we’re reinventing our datacenters with a focus on increasing rack density and improving cooling efficiency. Last fall, when we announced the Azure Maia 100 system, we also introduced a dedicated liquid cooling “sidekick,” a closed-loop design that uses recirculated fluid to reduce heat. Since then, we’ve continued down the path of cooling innovation, working with partners to develop new datacenter cooling systems that can handle the rising power profiles of AI while remaining easy to deploy. We’re pleased to contribute designs for an advanced liquid cooling heat exchanger unit to OCP so that the entire community can benefit from what we have learned about liquid cooling and keep pace with rapidly evolving AI systems. For more information, read the Tech Community blog.
Disaggregated power architectures for next-generation systems
The evolution of AI systems has also driven higher power densities in hyperscale datacenters. As these systems grow, we’ve discovered new opportunities for flexibility and modularity in system design. While cloud computing and storage systems typically have power densities below 20 kW, AI systems have pushed power densities to hundreds of kW. We’re addressing the growing demands on power infrastructure in the era of AI with Mt. Diablo, our latest collaboration with Meta. It is a new disaggregated rack design that addresses critical space and power constraints. The solution features a disaggregated 400-volt direct current (VDC) power unit that scales from hundreds of kW to 1 MW, enabling 15% to 35% more AI accelerators in each server rack. This modular approach allows power to be adjusted in the disaggregated power rack to meet the changing demands of different inference and training SKUs. We’re excited to continue our engineering collaboration with Meta on this contribution to the OCP community. Read the Tech Community blog to learn more.
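To put the 15% to 35% figure in concrete terms, here is a small worked calculation. The baseline of 64 accelerators per rack and the helper name are hypothetical, chosen only for illustration; the post does not state a baseline.

```python
def accelerators_per_rack(baseline: int, uplift_percent: int) -> int:
    """Whole accelerators that fit after applying a percentage uplift.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    return baseline * (100 + uplift_percent) // 100


# Hypothetical baseline; not a figure from the post.
BASELINE = 64

low = accelerators_per_rack(BASELINE, 15)   # lower bound quoted in the text
high = accelerators_per_rack(BASELINE, 35)  # upper bound quoted in the text

print(f"{BASELINE} -> {low} to {high} accelerators per rack")
```

With this assumed baseline, the quoted range works out to between 73 and 86 accelerators in the same rack.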
Moving toward a secure AI future with new confidential computing solutions
Last month, Microsoft detailed our vision for Trustworthy AI and Azure Confidential Inferencing, where security is rooted in hardware-based Trusted Execution Environments (TEEs) and transparency of the confidential trust boundary. Today, we’re expanding on this vision with new open source silicon innovation: the Adams Bridge quantum-resilient accelerator and its integration into Caliptra 2.0, the next-generation open source silicon root of trust (RoT).
The growing capabilities of quantum computers present challenges for hardware security, because the classical asymmetric cryptographic algorithms widely used throughout hardware security could be easily defeated by a sufficiently powerful quantum computer. Recognizing this risk, the National Institute of Standards and Technology (NIST) has published standards for new quantum-resilient algorithms.
These new quantum-resilient algorithms are significantly different from their classical counterparts. Hardware system manufacturers should pay immediate attention to these changes, because they affect critical hardware security capabilities such as immutable root-of-trust anchors for both code integrity and hardware identity. Today, the challenges facing silicon components are more significant than those facing software, because of longer development times and the immutability of hardware. Immediate action is therefore required for new hardware designs.
As part of Microsoft’s commitment to our Secure Future Initiative (SFI), and to accelerate the adoption of quantum-resilient algorithms, Microsoft and the Caliptra consortium are open-sourcing Adams Bridge, a new silicon block for accelerating quantum-resilient cryptography. To learn more about Adams Bridge and how we are making our future quantum safe, visit the Tech Community blog.
In addition to Caliptra 2.0 and Adams Bridge, Microsoft is taking further steps to improve security in hardware supply chains with OCP-SAFE (OCP Security Appraisal Framework and Enablement). OCP-SAFE, co-founded by Microsoft, calls for systematic and consistent security audits of hardware and firmware. Combined with Caliptra, OCP-SAFE advances transparency and security assurance on the path to hardware Supply Chain Integrity, Transparency, and Trust (SCITT). Read the Tech Community blog for more information.
From bottlenecks to breakthroughs: optimizations at every layer in the era of AI
Over the past few years, Microsoft has been on a journey to expand our supercomputing scale, enabling people and organizations around the world to reap the benefits of generative AI across domains, from education to healthcare, business, and more. Along the way, we’ve continued to evolve and improve our infrastructure, building some of the world’s largest supercomputers with our growing fleet of high-performance accelerators for AI workloads of all shapes and sizes. As we encounter growing demands for innovation in AI, we’ve unlocked performance improvements and efficiencies through system-level optimizations, many of which have been contributed to the open source community.
Through the development of our own custom silicon and system with Azure Maia, we’ve invested in performance-per-watt efficiency through hardware and software algorithmic co-design. We invest in low-precision arithmetic to achieve this through an early implementation of the MX data format, a standard that we contribute to OCP through our leadership in the Microscaling Formats (MX) Alliance alongside AMD, Arm, Intel, Meta, NVIDIA, and Qualcomm.
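To illustrate the general idea behind block-scaled low-precision formats such as MX, the sketch below quantizes a block of floating-point values to small integers that share a single power-of-two scale. It is a simplified teaching example, not the normative MX encoding: the block size of 32, the signed 8-bit element range, and the function names are assumptions of this sketch.

```python
import math

BLOCK_SIZE = 32   # elements sharing one scale (assumption of this sketch)
INT8_MAX = 127    # largest magnitude stored per element


def quantize_block(values):
    """Return (shared_exponent, integer elements) for one block of floats."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # Smallest power-of-two scale that keeps every element within [-127, 127].
    shared_exp = math.ceil(math.log2(max_abs / INT8_MAX))
    scale = 2.0 ** shared_exp
    # Round to nearest integer, clamping as a guard against edge cases.
    elements = [max(-INT8_MAX, min(INT8_MAX, round(v / scale))) for v in values]
    return shared_exp, elements


def dequantize_block(shared_exp, elements):
    """Reconstruct approximate floats from the shared exponent and elements."""
    scale = 2.0 ** shared_exp
    return [e * scale for e in elements]


# Example: 32 values that happen to be exactly representable at this scale.
block = [0.5 * i for i in range(-16, 16)]
exp, q = quantize_block(block)
restored = dequantize_block(exp, q)
print(exp, restored == block)
```

Each element is stored in 8 bits plus one shared exponent per block, instead of 32 bits per value, and the reconstruction error per element is bounded by half the shared scale. That trade of a small, bounded error for far less memory traffic is what makes low-precision formats attractive for performance per watt.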
Next, we addressed the challenge of scaling and massive deployment with our liquid-cooled server design. This innovation ensures that our datacenters around the world can use this technology, and we are bringing the design to the industry to enable wider adoption.
Finally, we recognized that traditional Ethernet was not designed for AI performance and scalability. By making significant contributions to the Ultra Ethernet Consortium (UEC), we’ve extended Ethernet into a fabric capable of delivering the performance, scalability, and reliability needed for AI applications.
Through these efforts, Microsoft continues to drive innovation and contribute to the broader datacenter and AI community, ensuring our advancements benefit the entire industry.
We invite attendees of this year’s OCP Global Summit to visit Microsoft at booth #B35 to explore our latest cloud hardware demos with contributions from partners in the OCP community.
Connect with Microsoft at the OCP Global Summit 2024 and beyond:
1 Delivering consistency and transparency for cloud hardware security, Rani Borkar. October 18, 2022.
2 Learn how Microsoft Azure is accelerating hardware innovations for a sustainable future, Zaid Kahn. November 9, 2021.
3 Fostering AI infrastructure advancements through standardization, Rani Borkar and Reynold D’Sa. October 17, 2023.