Introduction To The QwQ 32B Model
breaking news from Qwen, which is run by Alibaba: they have just released their QwQ-32B model. This is their new reasoning model with 32 billion parameters, and it rivals other models such as DeepSeek R1. It's also open source and you can host it locally; I'll come on to exactly how to do that today, plus we'll be testing it out and I'll show you how to get access.

You can see some of the key benchmarks here: this is QwQ-32B versus DeepSeek R1, OpenAI o1-mini, and the DeepSeek R1 distilled models. QwQ is in red, and it's doing pretty well on the AIME test. On LiveCodeBench it falls just behind R1, it outperforms DeepSeek R1 and o1-mini on LiveBench, and it also leads on the BFCL test, which basically measures function calling.

If you want access, the easiest way is to go to Qwen Chat; you can also get it on Hugging Face, and it's available through Ollama to run locally for free. Just download Ollama, make sure it's running in the background (you can see its icon right here), open up a terminal, and enter "ollama run qwq" to pull and start the 32B model. The only thing I'll say about that is it's super slow: I'm on a Mac M3 Pro and it is really slow to run offline.

You can also navigate to chat.qwen.ai and use the model picker in the top left to switch between the different models. If we scroll down there's Qwen2.5-Max, which is another really good model, but the one we're looking at here is QwQ-32B-preview, the reasoning and coding model, so select that to get access. If we switch on thinking mode, we can use QwQ along with web search, and you also get access to Artifacts. This is super useful, because if you combine QwQ with Artifacts you can start doing some coding tasks, so let's pull this up.
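If you'd rather call the local model from a script than type into the terminal, here's a minimal sketch using the official ollama Python package (pip install ollama). It assumes the Ollama app is running in the background and the qwq model has already been pulled; the prompt is just an illustrative placeholder:

```python
# Minimal sketch: chat with a locally hosted QwQ-32B through Ollama.
# Assumes Ollama is running and `ollama run qwq` (or `ollama pull qwq`)
# has already downloaded the model.
import ollama

stream = ollama.chat(
    model="qwq",
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
    stream=True,  # stream tokens so the long reasoning trace is readable live
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
print()
```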
Using QwQ 32B For Coding Tasks
For example, we'll say in a side tab here: write a Python script of a ball bouncing inside a spinning tesseract. The thing to note as well is that DeepSeek R1 has been a little unreliable recently when I've tested it, so this is a good option, plus it's a free model. With OpenAI's o1-preview, o3-mini, and those sorts of tools you're going to have to pay, unless you just want to keep hitting the free limits, so this is a free alternative.

There's other stuff you can do too. If we open up multiple tabs, we can also do a live web search: if we ask "what happened on March 5th 2025 in the AI industry?", we can run multiple tabs using QwQ's thinking mode in each of them. So let's switch over to 32B-preview and start running this. The one thing I'd say is that it takes a while to reply, but obviously this is a reasoning model, and it's the same with DeepSeek and all the other reasoning models. Coming to the results: we asked what happened on March 5th 2025, QwQ-32B-preview replied, and you can see it says "here's what happened", all sourced from today. So you can connect the reasoning model to the internet without paying for a tool like Perplexity, unlike DeepSeek R1, where the search feature typically breaks directly in the chat and you usually have to use Perplexity to get it working.

The Artifacts request is taking a lot of time, as you can see, so it could be a while before it comes back to us. That's not because the model is slow; it's really thinking the problem through before it responds with the Python script of a ball bouncing inside a spinning tesseract. While we're waiting for that to load, we can compare outputs side by side across three models: we'll open up Qwen2.5-Max over here, then the QwQ-32B-preview model, and we'll also use DeepSeek R1 to compare the outputs.
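As an aside before the comparisons: here's roughly the shape of program that tesseract prompt is asking for. This is my own minimal sketch rather than the model's output, simplified from a spinning 4D tesseract down to a ball bouncing inside a spinning 2D square, and it assumes pygame is installed (pip install pygame):

```python
# Simplified stand-in for the prompt: a ball bouncing inside a spinning
# square (2D) instead of a tesseract (4D). Physics runs in the square's
# own rotating frame, where the walls stay axis-aligned, so a bounce is
# just a sign flip; the walls' own motion is deliberately ignored to
# keep the sketch short.
import math
import pygame

W, H, HALF, RADIUS = 640, 640, 200, 10
CENTER = pygame.Vector2(W / 2, H / 2)

def to_screen(p, angle):
    """Rotate a point from the square's frame into screen coordinates."""
    c, s = math.cos(angle), math.sin(angle)
    return CENTER + pygame.Vector2(p.x * c - p.y * s, p.x * s + p.y * c)

pygame.init()
screen = pygame.display.set_mode((W, H))
clock = pygame.time.Clock()

pos, vel, angle = pygame.Vector2(40, -30), pygame.Vector2(4, 3), 0.0
running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    pos += vel
    for axis in (0, 1):  # clamp and reflect against each pair of walls
        if abs(pos[axis]) > HALF - RADIUS:
            pos[axis] = math.copysign(HALF - RADIUS, pos[axis])
            vel[axis] = -vel[axis]
    angle += 0.01  # spin the container a little each frame

    screen.fill((15, 15, 25))
    corners = [pygame.Vector2(sx * HALF, sy * HALF)
               for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1))]
    pygame.draw.polygon(screen, (200, 200, 220),
                        [to_screen(c, angle) for c in corners], width=2)
    pygame.draw.circle(screen, (255, 120, 80), to_screen(pos, angle), RADIUS)
    pygame.display.flip()
    clock.tick(60)

pygame.quit()
```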
Comparing Outputs Of Different Models
So, Qwen's previous model versus 32B versus DeepSeek R1. We'll test them with this prompt: "there's a tree on the other side of the river in winter, how can I pick an apple?" We'll plug that into each model. And to give this a proper run-through, we're also going to say: "build the best possible AI automation calculator website". This is Qwen2.5-Max without reasoning; we'll compare it against the QwQ-32B-preview model using thinking mode, which is of course Qwen's latest model, and then against DeepSeek R1 as well. What's also quite interesting here is just how similar the user interface for each of these models is; very similar stuff.

Qwen2.5-Max has given us a working calculator, as you can see. I do like the Artifacts feature inside here, it's super useful, but the result is very basic: we said build the website, and it's just created a basic calculator, nothing interesting there. We've now got the output back from DeepSeek R1, so let's pull this up. This actually looks a lot better. If we compare the basic calculator from Qwen2.5-Max against DeepSeek R1, the front end is a lot nicer and looks a lot more modern, some nice buttons there, and so on. But look at this: it's turned it into a full website, and it's included some nice icons and that sort of thing for the landing page as well. If you compare the outputs, the difference between the two is massive. One is a full-blown website; the other is just a basic tool that pretty much any AI can produce right now.

Now if we come back to Artifacts, it's given us this bouncing tesseract using thinking mode, so we're using QwQ as the thinking model; a pretty nice preview right there. One thing to note as well, and this is stated inside the chat, is that QwQ-32B-preview performs worse in multi-turn conversation compared with single-turn. So when you're using this model, it's better to try and get the prompt right the first time; if you go back and forth with it, it's not going to perform as well.
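Given that single-turn caveat, one practical pattern is to front-load every requirement into one message instead of refining over several turns. Here's a hypothetical sketch using the ollama package against a local copy of the model; the prompt and its requirement list are purely illustrative:

```python
# Hypothetical single-turn pattern: because QwQ-32B-preview reportedly
# performs worse in multi-turn chat, pack all requirements into one
# prompt rather than iterating on the reply.
import ollama

prompt = """Build the best possible AI automation calculator website.
Requirements (all stated up front, in a single turn):
- one self-contained HTML file with embedded CSS and JavaScript
- a modern landing page with a hero section and icons
- a working calculator widget
- a responsive layout"""

response = ollama.chat(model="qwq", messages=[{"role": "user", "content": prompt}])
print(response["message"]["content"])
```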
Recap And Final Thoughts
So just to recap and really spell out the differences: Alibaba's QwQ-32B model is more efficient. DeepSeek R1 uses 671 billion parameters, while the new QwQ model uses 32 billion. It also uses multi-stage reinforcement learning, which is pretty crazy when you think about it: it's a lot smaller, but it matches the same performance in logic, math, and coding. And it's open source, free for commercial and research use under the Apache 2.0 license. So compare the benchmarks right here: fewer parameters, more efficient, similar benchmarks on math, similar benchmarks on coding, open source, and it actually has a higher context window, with 131,000 tokens versus DeepSeek R1's 128,000 tokens. Appreciate you reading, and see you in the next one. Bye-bye.

