Hello Blabladores,
You might have noticed that some models have been changing, coming and going.
As they say, troubles like to come with company :-) So we have a couple of issues at the same time:
1 - The supercomputer where we run four of our models (GPT-OSS, Qwen3-235, Llama3-405 and Qwen-3-Coder with function calling) is offline. You can check the status of Jureca-HWAI at
https://status.jsc.fz-juelich.de/
2 - The change on the API server last Friday: we mostly use the vLLM backend, and vLLM has recently changed its architecture.
This in turn made many models run out of memory on the same hardware they ran on before. So I am carefully reducing the context size and the size of the CUDA graphs. It’s a manual, boring and slow process.
I am sorry about this. I am working as fast as I can so we can keep barking loud!! :-D
Dr. Alexandre Strube
a.strube@fz-juelich.de
Helmholtz AI
Jülich Supercomputing Centre
Forschungszentrum Juelich GmbH
52425 Jülich, Germany
Phone: +49 2461 61-3866
JSC is the coordinator of the
John von Neumann Institute for Computing (NIC)
and member of the
Gauss Centre for Supercomputing (GCS)