Attempting to phase out llama3.3 for alias-large in favor of Qwen3-30B-A3B

Hello Blabladores!

In keeping with the latest developments in LLMs, I decided to try Qwen3-30B-A3B. This is a much smaller model than Llama 3.3-70B, as well as the previous model running as alias-large, DeepSeek-R1-70B, which was not so good and very unstable. Qwen3-30B-A3B is a Mixture of Experts (MoE) model with 30 billion parameters in total, of which only about 3 billion are active for any given token. This means a number of things:

- Since the model is much smaller, we can have a much bigger context size, as there is more GPU memory available. The consequence is that the model is more useful for bigger workloads and for agents. While I had to keep Llama 3.3-70B limited to a maximum context size of 8192 tokens, I can let Qwen3 run at its maximum size, which is 128k tokens.

- The Mixture of Experts architecture means that inference only runs through the experts activated for each token, about 3 billion parameters, so it is MUCH faster, which again makes it more useful.

Qwen3 is also a much more modern model than Llama 3.3; five months in this field can make a huge difference. Besides, Qwen3 uses the Apache-2.0 license, and I am much more partial to it than to the arcane Meta Llama License, which should not even exist.

Mind you that while this is a reasoning model, I had to remove the reasoning part from the chat template, because the thinking process was causing the web UI to fail intermittently. When I get a fix for that, I will remove the setting so you can see the reasoning process. This affects both the Web UI and the API access: the chat template has sep="<|im_end|>/nothink", so the model never reasons. (A minimal API sketch is in the P.S. below.)

If you have any feedback on the workings of the model, or if you think the quality has suffered, please let me know right away! In the worst case I would revert to Llama 3.3. But I hope it won't be needed!

Let's bark!

Alex
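P.S.: For anyone calling the model programmatically, here is a minimal sketch of what a request to alias-large could look like through an OpenAI-compatible chat endpoint, using the openai Python client. The base URL and API key below are placeholders, not the actual service values; substitute your own endpoint and token.

    # Minimal sketch: query alias-large (now Qwen3-30B-A3B) via an
    # OpenAI-compatible endpoint. Base URL and key are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://YOUR-BLABLADOR-ENDPOINT/v1",  # placeholder endpoint
        api_key="YOUR_API_KEY",                         # placeholder token
    )

    response = client.chat.completions.create(
        model="alias-large",
        messages=[
            {"role": "user", "content": "Summarize the following document: ..."},
        ],
        # The 128k context window means much larger prompts now fit,
        # e.g. whole documents or long agent histories.
        max_tokens=1024,
    )

    # The chat template currently appends /nothink, so the reply contains
    # only the final answer, with no visible reasoning trace.
    print(response.choices[0].message.content)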