Researchers from the Massachusetts Institute of Technology (MIT) and collaborating institutions have developed an artificial intelligence–driven robotic assembly system that allows users to design and fabricate physical objects simply by describing them in words, potentially lowering barriers to product design and rapid prototyping.
The system combines generative AI with robotic assembly to convert text prompts into functional, multi-component objects. Instead of relying on traditional computer-aided design (CAD) software — which requires specialised expertise and is often too complex for early-stage ideation — the new framework enables non-experts to participate directly in the design process.
The approach uses two generative AI models. The first creates a 3D representation of an object’s geometry based on a user’s text prompt. A second model then reasons about how prefabricated components should be arranged, taking into account both the geometry and the intended function of the object. Once the design is finalised, a robotic system automatically assembles the object using reusable parts.
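In rough terms, the workflow can be pictured as a three-stage pipeline, sketched below in Python. This is an illustration only, assuming a plausible structure for the system; the function and class names (generate_mesh, plan_assembly, ComponentPlacement, and so on) are hypothetical placeholders, not the researchers' actual code or APIs.

```python
# Illustrative sketch of the three-stage pipeline: text -> geometry -> assembly plan -> robot.
# All names here are hypothetical stand-ins, not the authors' implementation.

from dataclasses import dataclass
from typing import List


@dataclass
class ComponentPlacement:
    """Where one prefabricated part (strut or panel) goes in the finished object."""
    part_id: str            # e.g. "strut_30cm" or "panel_square"
    position: tuple         # (x, y, z) in the robot's workspace frame
    orientation: tuple      # rotation as (roll, pitch, yaw)


def generate_mesh(prompt: str) -> dict:
    """Stage 1: a text-to-3D generative model turns the prompt into rough geometry.
    Stub in this sketch; a real system would call a text-to-3D model here."""
    return {"prompt": prompt, "vertices": [], "faces": []}


def plan_assembly(mesh: dict, part_library: List[str]) -> List[ComponentPlacement]:
    """Stage 2: a vision-language model reasons over the geometry and the object's
    intended function to decide how reusable struts and panels should be arranged.
    Stub in this sketch; it returns an empty plan."""
    return []


def assemble(plan: List[ComponentPlacement]) -> None:
    """Stage 3: a robotic system places each component in sequence."""
    for step, placement in enumerate(plan, start=1):
        print(f"step {step}: place {placement.part_id} at {placement.position}")


if __name__ == "__main__":
    mesh = generate_mesh("a small chair with a slatted backrest")
    plan = plan_assembly(mesh, part_library=["strut_30cm", "panel_square"])
    assemble(plan)
```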
Using this end-to-end framework, the researchers fabricated items such as chairs and shelves from prefabricated structural and panel components. Because the components can be disassembled and reused, the system reduces material waste during fabrication.
In a user study evaluating the designs, more than 90 percent of participants preferred objects created with the AI-driven system over those produced by alternative automated approaches.

The research team said the framework could be particularly useful for rapid prototyping of complex objects such as aerospace components or architectural structures. Over time, the system could also enable local fabrication of everyday objects, such as furniture, reducing the need for shipping bulky products from central manufacturing facilities.
“Sooner or later, we want to be able to communicate and talk to a robot and AI system the same way we talk to each other to make things together. Our system is a first step toward enabling that future,” said lead author Alex Kyaw, a graduate student in MIT’s departments of Electrical Engineering and Computer Science and Architecture, in a media statement.
The study was authored by Kyaw along with Richa Gupta, an MIT architecture graduate student; Faez Ahmed, associate professor of mechanical engineering; Lawrence Sass, professor and chair of the Computation Group in the Department of Architecture; senior author Randall Davis, an MIT professor and member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and researchers from Google DeepMind and Autodesk Research. The paper was recently presented at the Conference on Neural Information Processing Systems.
Vision-language models drive assembly decisions
A key technical challenge addressed by the researchers was enabling AI models to generate component-level designs suitable for robotic assembly. While existing generative models can create 3D meshes from text, these representations often lack the detailed structure needed to identify how individual parts should be assembled.
To solve this, the team used a vision-language model (VLM) trained to understand both images and text. The model determines how prefabricated structural and panel components should fit together to form an object.
“There are many ways we can put panels on a physical object, but the robot needs to see the geometry and reason over that geometry to make a decision about it. By serving as both the eyes and brain of the robot, the VLM enables the robot to do this,” Kyaw said.
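One way to picture that "eyes and brain" role is as a structured query to a vision-language model: rendered views of the partial structure plus a text question about where panels belong. The sketch below is an assumption about how such a request might look; the prompt wording, helper function, and face names are illustrative, not drawn from the paper.

```python
# Hypothetical sketch of querying a VLM for panel-placement decisions.
# The helper name, prompt text, and face identifiers are illustrative assumptions.

import json


def build_vlm_request(rendered_views, candidate_faces, user_goal):
    """Package rendered images of the frame plus a text question for a VLM."""
    prompt = (
        "You are choosing where to attach flat panels on the structural frame "
        "shown in the images. The object should function as: " + user_goal + ". "
        "Candidate faces: " + json.dumps(candidate_faces) + ". "
        "Return a JSON list of face ids that should receive panels, choosing only "
        "faces that serve the object's function (for example a seat or backrest)."
    )
    return {"images": rendered_views, "text": prompt}


# Example request for a chair frame with four candidate faces.
request = build_vlm_request(
    rendered_views=["front.png", "side.png", "top.png"],  # renders of the frame
    candidate_faces=["seat_top", "back_rest", "left_side", "under_seat"],
    user_goal="a chair a person can sit on and lean against",
)
print(request["text"])
```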
The system allows users to remain actively involved throughout the design process, refining outputs through feedback. For example, a user can specify that panels be used only on certain parts of an object.
“The design space is very big, so we narrow it down through user feedback. We believe this is the best way to do it because people have different preferences, and building an idealised model for everyone would be impossible,” Kyaw said.
“The human-in-the-loop process allows the users to steer the AI-generated designs and have a sense of ownership in the final result,” added Gupta.
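Conceptually, that feedback acts as a constraint that narrows the space of candidate designs. The brief sketch below is one assumed way such filtering could work; the data layout and function name are illustrative only.

```python
# Sketch of human-in-the-loop narrowing: user feedback filters candidate designs.
# The design representation and function name are illustrative assumptions.

def refine(candidates, feedback):
    """Keep only candidate designs consistent with the user's stated constraints.

    candidates: list of dicts mapping region names to True/False (panel or not)
    feedback:   dict of hard constraints, e.g. {"under_seat": False}
    """
    return [
        design for design in candidates
        if all(design.get(region) == wanted for region, wanted in feedback.items())
    ]


candidates = [
    {"seat_top": True, "back_rest": True, "under_seat": True},
    {"seat_top": True, "back_rest": True, "under_seat": False},
]

# The user asks for panels only where they are functionally needed, not under the seat.
print(refine(candidates, feedback={"under_seat": False}))
```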
According to the researchers, the vision-language model demonstrated an ability to reason about function, such as identifying surfaces needed for sitting or leaning in a chair, rather than assigning components randomly.
Looking ahead, the team plans to expand the system’s capabilities to handle more complex materials and functional parts, including hinges and gears, enabling the creation of objects with moving components.
“Our hope is to drastically lower the barrier of access to design tools. We have shown that we can use generative AI and robotics to turn ideas into physical objects in a fast, accessible, and sustainable manner,” Davis said.