AI at Scale: Trust, Systems, and Supercomputing

How can AI systems become both powerful and trustworthy when deployed in critical infrastructure? This question framed Supercomputers for Trusted AI, a dialogue hosted by Swissnex in Japan that brought together two ETH Zurich experts with complementary perspectives on the development and deployment of artificial intelligence.

On January 26-29, the conference “SCA/HPCAsia 2026” took place in Osaka as the joint edition of SupercomputingAsia (SCA) and the International Conference on High Performance Computing in the Asia-Pacific Region (HPCAsia), gathering some 2,500 participants. Swissnex invited Prof. Dr. Torsten Hoefler (Chief Architect for Machine Learning, CSCS) and Prof. Dr. John Lygeros (Director, NCCR Automation) to explore how AI can move from research environments into real-world systems, focusing on trust, reliability, and the practical challenges of operating at scale.

From Data to Computation

Torsten Hoefler opened the discussion by approaching AI from the perspective of high-performance computing and AI for science, inviting the audience to take a step back and reflect on how AI systems are built and understood, rather than focusing on applications alone.

Drawing a historical analogy, Hoefler compared today’s AI systems to early engineering achievements: structures that worked remarkably well long before the underlying theory was fully understood. In a similar way, many AI systems perform impressively despite limited theoretical insight into why they work. This gap between capability and understanding, he suggested, becomes particularly important as systems scale.

He also challenged the familiar idea that “data is the new oil.” While large datasets played a key role in the early development of AI, Hoefler argued that computation has become the defining resource today. The ability to design, operate, and sustain large-scale computing infrastructure increasingly shapes who can push AI research forward.

In this context, he pointed to AI for science as a promising space where machine learning and physics-based models, both deeply reliant on supercomputing, begin to converge. Rather than evolving separately, the two approaches may benefit from closer interaction.
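One simple way to picture this convergence is a hybrid loss that fits a data-driven surrogate while penalizing violations of a known governing equation. The sketch below is a minimal illustration in Python, not anything presented at the event: the quadratic surrogate, the assumed decay law dy/dt = -y, and the weighting are all illustrative choices. Because everything is linear in the coefficients, a single least-squares solve handles both terms.

```python
import numpy as np

# Illustrative sketch: fit a quadratic surrogate y(t) = a + b*t + c*t^2
# to sparse data while enforcing the physics dy/dt = -y at collocation
# points. All specifics here are assumptions for demonstration.

t_data = np.array([0.0])            # sparse, noisy observation
y_data = np.array([1.02])
t_phys = np.linspace(0.0, 2.0, 20)  # where the physics is enforced
w = 1.0                             # weight on the physics term

# Data rows: a + b*t + c*t^2 = y
A_data = np.column_stack([np.ones_like(t_data), t_data, t_data**2])

# Physics rows: dy/dt + y = a*1 + b*(1 + t) + c*(2*t + t^2) = 0
A_phys = np.column_stack([np.ones_like(t_phys), 1 + t_phys, 2*t_phys + t_phys**2])

A = np.vstack([A_data, w * A_phys])
rhs = np.concatenate([y_data, np.zeros_like(t_phys)])
a, b, c = np.linalg.lstsq(A, rhs, rcond=None)[0]

# With the physics rows active, the surrogate tracks the true solution
# exp(-t) far better than a quadratic fit to one data point ever could.
print("y(1) ≈", a + b + c, "   exact exp(-1) ≈", np.exp(-1))
```

The design point is that the data term and the physics term pull on the same coefficients, so even very sparse observations can yield a usable model when the governing equation is known.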

Control Theory Meets AI Deployment

John Lygeros complemented this view with a systems perspective grounded in the work of NCCR Automation, which focuses on trustworthy automation in domains such as robotics, energy systems, mobility, and industrial processes. He shared examples from real-world testbeds in Swiss communities, where theoretical models meet the realities of deployment, from energy management systems to industrial park development.

Lygeros highlighted that deploying AI in operational environments introduces challenges that go beyond model accuracy. Feedback loops, uncertainty, and long-term system behavior all matter, particularly when AI systems begin to interact with each other and with physical infrastructure. One concern raised during the discussion was the amplification of bias and error at scale: as AI systems increasingly generate data that is then used to train new models, small imperfections can compound over time. Understanding and managing these dynamics remains an open challenge.
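To see how such compounding can arise even in the simplest setting, consider the toy simulation below, an illustrative sketch rather than anything presented at the event. Each "generation" of a model is fit to a finite sample drawn from its predecessor, and finite-sample estimation error accumulates: the fitted distribution drifts and its spread tends to shrink.

```python
import numpy as np

# Toy "generations" loop: each model is refit on samples drawn from its
# predecessor. With finite samples, small estimation errors compound,
# so the mean drifts and the variance tends to collapse over time.

rng = np.random.default_rng(0)

mu, sigma = 0.0, 1.0   # the original "real-world" data distribution
n_samples = 200        # finite training set per generation

for generation in range(10):
    data = rng.normal(mu, sigma, n_samples)  # train on the previous model's output
    mu, sigma = data.mean(), data.std()      # refit; errors accumulate
    print(f"gen {generation}: mu={mu:+.3f}, sigma={sigma:.3f}")
```

Nothing in this loop is adversarial; the degradation comes purely from repeatedly estimating a distribution from its own finite output, which is the dynamic that becomes worrying when model-generated data feeds back into training pipelines at scale.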

In response, the dialogue touched on emerging mechanisms aimed at maintaining reliability, such as grounding models in verifiable sources or combining exploratory AI behavior with strong validation constraints. Such approaches may become especially relevant in scientific and safety-critical contexts.
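The second idea, pairing exploration with strict validation, can be sketched in a few lines. In the hypothetical example below (the names propose and is_valid are illustrative, not drawn from any system discussed at the event), a generator guesses freely, but only candidates that pass an independent, cheap-to-verify check are ever emitted.

```python
import random

# Hypothetical propose-and-verify loop: an exploratory generator proposes
# candidates, and only those passing a strict, independently checkable
# validation constraint are accepted.

def propose(rng: random.Random) -> int:
    """Exploratory step: guess a candidate factor of N."""
    return rng.randint(2, 10_000)

def is_valid(candidate: int, n: int) -> bool:
    """Validation constraint: a proposed factor is trivially verifiable."""
    return n % candidate == 0 and 1 < candidate < n

N = 8_051  # 83 * 97; hard to guess a factor, easy to check one
rng = random.Random(42)

accepted = None
for _ in range(100_000):
    c = propose(rng)
    if is_valid(c, N):   # strong validation gates what the system emits
        accepted = c
        break

print("verified factor:", accepted)
```

The asymmetry is the point: proposing a factor of N reliably is hard, but verifying one is a single modulo operation, so unreliable exploration can still produce trustworthy output when a cheap, sound check stands between generation and acceptance.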

Beyond Performance Metrics

The dialogue highlighted how developing trusted AI systems requires thinking beyond model performance. Questions of system-level behavior, feedback dynamics, verification approaches, and infrastructure constraints all shape what is possible as AI moves into real-world applications. By bringing together perspectives from high-performance computing and control theory, Supercomputers for Trusted AI offered a space where different technical approaches could engage constructively. As AI systems move toward deployment in critical infrastructure, conversations that bridge computational capability with systems-level safety thinking become increasingly essential.