

21 Feb 2025
Microsoft's Magma combines language and visual processing to control both software and robotic systems.
The model can process multimodal data such as text, images, and video and act on it, whether that means navigating a user interface, by clicking buttons and filling in forms, or manipulating physical objects, such as commanding a robot to pick up and move items.
It was developed by researchers from Microsoft, the University of Maryland, KAIST, the University of Washington, and the University of Wisconsin-Madison.
Earlier AI-based robotics models, such as Microsoft's ChatGPT for Robotics and Google's RT-2 and PaLM-E, use large language models (LLMs) as interfaces and require separate models for perception and control. Magma integrates both abilities into a single foundation model.
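To make that architectural difference concrete, here is a minimal sketch in Python of what a unified perception-and-control interface might look like. The `UnifiedAgent` class, its `act` method, and the action types are hypothetical illustrations for this article, not Microsoft's actual Magma API.

```python
# Hypothetical sketch of a unified multimodal agent interface.
# None of these names come from Microsoft's Magma release; they
# only illustrate "one model for both perception and control".

from dataclasses import dataclass
from typing import Union


@dataclass
class ClickAction:
    """A UI action: click at screen coordinates."""
    x: int
    y: int


@dataclass
class GraspAction:
    """A robot action: move the gripper to a 3D position."""
    x: float
    y: float
    z: float


Action = Union[ClickAction, GraspAction]


class UnifiedAgent:
    """One model maps (image, instruction) directly to an action,
    instead of chaining a separate perception model to a separate
    controller as in earlier LLM-based robotics pipelines."""

    def act(self, image: bytes, instruction: str) -> Action:
        # A real model would run inference on the image and text here;
        # this stub just branches on the instruction for illustration.
        if "click" in instruction.lower():
            return ClickAction(x=412, y=88)
        return GraspAction(x=0.31, y=-0.12, z=0.05)


agent = UnifiedAgent()
print(agent.act(b"<screenshot>", "Click the Submit button"))
print(agent.act(b"<camera frame>", "Pick up the red block"))
```

The point of the sketch is the single `act` call: the same model consumes a screenshot or a camera frame along with a natural-language instruction and emits either a UI action or a robot action, with no hand-off between separate perception and control models.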
Microsoft is positioning Magma as agentic AI: a system that can formulate plans and act on them rather than just answer questions. OpenAI's Operator and Google's Gemini 2.0 are similar efforts toward agentic AI.
Microsoft also recently launched Muse, a video game AI model that generates gameplay visuals and controller actions.