After the great news of GLM-Flash, I decided to try the full, 385-billion-parameter GLM 4.7. And guess what? It’s running!

Compared to Minimax, it is slower, but better at math, and multi-modal: it can understand text, video, images, and audio. While Minimax is theoretically capable of one million tokens of context, GLM is limited to 204 thousand.

Given that it is only slightly better than Minimax, while also being slower, I am not sure if I will keep running it. What do you think?

Leeeeeeet’s bark!

Alex
Well, bite me. The quantized version I am running apparently isn’t multi-modal :-(
On 22. Jan 2026, at 21:57, Strube, Alexandre <a.strube@fz-juelich.de> wrote: