My Experience Running 5,000 NVIDIA H100 GPUs: Inside MareNostrum 5
This article shares a practical, first-hand account of using MareNostrum 5 to run large language models at scale, highlighting the infrastructure, support, and real-world impact of accessing thousands of GPUs.
Over the past weeks, I have been working hands-on with MareNostrum 5 (MN5) to run and evaluate large language models at scale. The experience has been highly positive, both from a technical and operational point of view.
Disclaimer
Although I hold a degree in Computer Science Engineering, my day-to-day technical skills are relatively limited. I am not an HPC specialist nor a deep systems engineer. However, with the help of a custom-made GAIA solution, I was able to set up the entire workflow end-to-end on my own, from environment configuration to model execution to large-scale prompt experimentation. This experience demonstrates not only the power of MareNostrum 5, but also how accessible it can be when the right tools and support are in place.
Outstanding Support and Responsiveness
The first thing that stands out is the quality of support.
The response time, technical depth and practical help provided by the MN5 support teams (mainly thanks to Bernardo!) are genuinely impressive. Issues are addressed quickly, explanations are clear, and solutions are pragmatic. This level of support makes a huge difference when working with complex systems such as large-scale GPU clusters.
In short, the support is world-class.
Technology Stack Used
The environment is modern, robust and designed for serious AI workloads.
Software stack
SLURM for job scheduling
Python 3.12
PyTorch 2.3 (CUDA enabled)
Hugging Face Transformers
Accelerate for multi-GPU and device mapping
Fully offline model execution (no external internet access)
Custom batch pipelines for repeated prompt execution
Hardware stack
Latest-generation GPU nodes
High-bandwidth interconnect
Large shared GPFS scratch storage
Optimised for both single-GPU and multi-GPU workloads
This stack allows rapid experimentation while remaining production-grade.
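To illustrate how the pieces of this stack fit together, here is a minimal sketch that composes a SLURM batch script for an offline, multi-GPU inference job. The job name, GPU count, and the `run_prompts.py` command are hypothetical placeholders, not MareNostrum 5's actual configuration; the two environment variables are the standard Hugging Face switches for fully offline execution.

```python
# Sketch: compose a SLURM batch script for a multi-GPU inference job.
# Job name, GPU count and the run_prompts.py command are hypothetical
# placeholders, not MareNostrum 5's actual configuration.

def build_sbatch_script(job_name: str, gpus: int,
                        wall_time: str, command: str) -> str:
    """Return the text of an sbatch script requesting `gpus` GPUs on one node."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --gres=gpu:{gpus}",
        "#SBATCH --nodes=1",
        f"#SBATCH --time={wall_time}",
        "#SBATCH --output=%x-%j.out",
        "",
        # Fully offline execution: forbid any Hugging Face Hub network access.
        "export HF_HUB_OFFLINE=1",
        "export TRANSFORMERS_OFFLINE=1",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"

script = build_sbatch_script("gpt-oss-20b-eval", gpus=4,
                             wall_time="02:00:00",
                             command="python run_prompts.py")
print(script)
```

The resulting text would be written to a file and submitted with `sbatch`; keeping the script generation in Python makes it easy to sweep over job parameters programmatically.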
Working With the 20B Model
The 20B parameter model (gpt-oss-20b) runs exceptionally well on MN5.
Fast model loading
Stable inference
Predictable performance
Easy batching of prompts
Using SLURM jobs, I ran the same prompt dozens of times in a fully reproducible way, automatically collecting outputs for analysis. This is precisely what is needed for serious evaluation of prompts, system instructions and behaviour consistency.
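The repeated-run workflow described above can be sketched as a small batch loop. `generate_fn` stands in for whatever inference call the pipeline wraps (for example a Transformers `generate` helper), and the JSONL record layout is an illustrative choice, not the exact format used on MN5.

```python
import json
from typing import Callable, Dict, List

def run_prompt_batch(generate_fn: Callable[[str, int], str],
                     prompt: str, n_runs: int) -> List[Dict]:
    """Run the same prompt n_runs times, tagging each output with its
    run index and seed so the experiment is fully reproducible."""
    records = []
    for run_id in range(n_runs):
        # Use the run index as the seed: each run is deterministic yet
        # distinct. generate_fn is assumed to honour the seed argument.
        output = generate_fn(prompt, run_id)
        records.append({"run": run_id, "seed": run_id,
                        "prompt": prompt, "output": output})
    return records

def to_jsonl(records: List[Dict]) -> str:
    """Serialise records as JSON Lines for downstream analysis."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

# Stand-in generator used only to demonstrate the pipeline shape.
fake_generate = lambda prompt, seed: f"answer-{seed}"
records = run_prompt_batch(fake_generate, "What is HPC?", 3)
print(to_jsonl(records))
```

One JSONL line per run makes the outputs trivial to collect across SLURM jobs and feed into downstream analysis tools.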
A Major Upgrade Over Deucalion
Having previously worked with Deucalion, the difference is clear.
MareNostrum 5 is a significant upgrade in:
raw compute power
GPU availability
system maturity
tooling
operational reliability
The workflow is smoother, faster and far more scalable. From now on, MN5 will be my default platform for this type of work.
Access via FCT, CNCA and BSC Barcelona
Access to MareNostrum 5 is provided through:
FCT (Fundação para a Ciência e Tecnologia)
CNCA (Centro Nacional de Computação Avançada)
BSC (Barcelona Supercomputing Center)
This collaboration model works exceptionally well and gives researchers and practitioners access to infrastructure that would otherwise be unreachable.
MareNostrum 5 Capabilities
MareNostrum 5 offers:
Thousands of GPUs
Massive parallelism
High-performance storage
Long-running and interactive workloads
Support for large-scale AI, HPC and hybrid workloads
It is clearly designed for the next generation of AI experimentation and deployment.
Massive Potential for System Prompt Testing
One of the most exciting aspects is the ability to test system prompts at scale.
With MN5, it becomes trivial to:
run the same prompt hundreds of times
compare different system prompts
analyse consistency, drift and edge cases
generate structured outputs for downstream analysis
For organisations working with LLMs, this opens the door to industrial-scale prompt engineering and validation, something that is not feasible on local machines or small clusters.
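One simple way to quantify the consistency mentioned above is to normalise the collected outputs and measure how often the most common answer appears. The sketch below shows one possible metric; it is an illustrative choice, not the analysis actually run on MN5.

```python
from collections import Counter
from typing import List, Tuple

def consistency(outputs: List[str]) -> Tuple[str, float]:
    """Return the most common (normalised) output and the fraction of
    runs that produced it. A score of 1.0 means perfectly consistent
    behaviour; lower values indicate drift across runs."""
    normalised = [o.strip().lower() for o in outputs]
    answer, count = Counter(normalised).most_common(1)[0]
    return answer, count / len(normalised)

# Toy example: four runs of the same prompt, one divergent answer.
runs = ["Paris", "paris", "Paris ", "Lyon"]
top, share = consistency(runs)
print(top, share)  # 'paris' appears in 3 of 4 runs -> 0.75
```

Applied per system prompt, a metric like this makes it easy to rank candidate prompts by behavioural stability before deeper qualitative review.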
Final Thoughts
MareNostrum 5 combines top-tier hardware, a modern AI software stack, and exceptional human support.
For anyone serious about large language models, experimentation, or AI evaluation at scale, MN5 is not just an option; it is a reference platform.
From my side, the decision is clear: this is the infrastructure I will be using going forward.


