

Yeah it does that, heh.
The Qwen team recommends a fairly high temperature, but I find it works better with modified sampling (lower temperature, 0.1 MinP, a bit of rep penalty or DRY). Then it tends not to “second guess” itself by taking the lower-probability choice of continuing to reason.
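For anyone unsure what MinP actually does, here is a rough Python sketch of the idea, not any backend's real code. The function name `min_p_sample`, the 0.6 temperature and the toy logits are just assumptions for illustration, and real backends may apply temperature and MinP in a different order.

```python
import math
import random

def min_p_sample(logits, temperature=0.6, min_p=0.1, rng=random):
    """Hypothetical helper: temperature-scaled softmax followed by a MinP cutoff."""
    # Lower temperature sharpens the distribution toward the top token.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # MinP: drop every token whose probability is below min_p * (top probability).
    threshold = min_p * max(probs)
    kept = [i for i, p in enumerate(probs) if p >= threshold]
    weights = [probs[i] for i in kept]

    # random.choices treats the weights as relative, so no renormalising is needed.
    token = rng.choices(kept, weights=weights, k=1)[0]
    return token, kept

# Toy example: three candidate tokens, the last one being very unlikely.
token, kept = min_p_sample([5.0, 4.0, 0.0], rng=random.Random(0))
```

With these toy logits the third token falls below the cutoff, so only the first two can ever be picked. That is the "don't take the low-probability branch" effect described above.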
If you’re looking for alternatives, Koboldcpp does support Vulkan. It may not be as fast as the (SYCL?) docker container, but it supports new models and more features. It’s also precompiled as a one-click exe: https://github.com/LostRuins/koboldcpp
It turns out (with the right optics) none of that stuff really matters. Attention/fame trump all.