Hello Blabladores,

You might have noticed that some models have been changing, coming and going. As they say, troubles like to come with company :-) So we have a couple of issues at once:

1 - The supercomputer where we run four of our models (GPT-OSS, Qwen3-235, Llama3-405, and Qwen-3-Coder with function calling) is offline. You can check the status of Jureca-HWAI at https://status.jsc.fz-juelich.de/

2 - The change on the API server last Friday. We mostly use the vLLM backend, and vLLM has recently changed its architecture. As mentioned here, https://docs.vllm.ai/en/latest/configuration/conserving_memory.html#quantiza..., "CUDA graph capture takes up more memory in V1 than in V0." This in turn made many models run out of memory on the same hardware they ran on before.

So I am carefully reducing the context size and the size of the CUDA graphs. It's a manual, boring, and slow process. I am sorry, working as fast as I can here so we can keep barking loud!! :-D

Dr. Alexandre Strube
a.strube@fz-juelich.de
Helmholtz AI
Jülich Supercomputing Centre
Forschungszentrum Juelich GmbH
52425 Jülich, Germany
Phone: +49 2461 61-3866
JSC is the coordinator of the John von Neumann Institute for Computing (NIC) and member of the Gauss Centre for Supercomputing (GCS)
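For readers curious what "reducing context size and the size of CUDA graphs" looks like in practice, here is a hedged sketch using vLLM's serve CLI. This is not the actual Blablador configuration: the model name and the specific values are placeholders, and the flag names reflect recent vLLM releases, so check the docs for your version.

```shell
# Sketch only, not the real Blablador config. Trims vLLM memory use:
# --max-model-len caps the context length (smaller KV cache),
# --gpu-memory-utilization limits the fraction of GPU memory vLLM claims,
# and cudagraph_capture_sizes restricts which batch sizes are captured
# as CUDA graphs, reducing the larger V1 capture overhead.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90 \
  --compilation-config '{"cudagraph_capture_sizes": [1, 2, 4, 8]}'

# As a last resort, --enforce-eager disables CUDA graph capture entirely,
# trading some latency for lower memory use.
```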