My coding tests show a bell curve for the quants of QwQ Preview in Ollama, where Q4_K_M is more capable than both the lower and the higher quants.
Let's do some more testing to see if the curve holds up.
[Reference manual omitted for clarity. See https://github.com/leikareipa/retro-ngon/blob/7e7bf5902b626483fb946eea8895ecb5983505fd/docs/api-reference.md]
Above is the reference manual for a JavaScript software 3D renderer. Write a pixel shader function that applies a fisheye lens effect.
Q3_K_M and Q4_K_M produced the requested fisheye effect; Q2_K and Q8_0 both failed completely. Of the two that worked, I think Q4_K_M's version is the better one.
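For context, a passing answer looks roughly like the sketch below. This is my own illustration, not any of the quants' outputs, and it assumes the renderer's pixel-shader interface from the reference manual: a render context whose pixelBuffer exposes width, height, and an RGBA data array, the same interface the vignette sample later in this post uses. The function name and the distortion strength are arbitrary choices of mine.

// Pixel shader: warps the pixel buffer with a simple fisheye-style (barrel) distortion.
// My own sketch; assumes the pixelBuffer interface described in the reference manual.
function ps_fisheye(renderContext) {
    const {width, height, data:pixels} = renderContext.pixelBuffer;
    const source = pixels.slice(); // Unmodified copy of the frame to sample from.
    const centerX = (width / 2);
    const centerY = (height / 2);
    const strength = 1.5; // Larger values exaggerate the bulge.

    for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
            // Offset from the image center, normalized to roughly [-1, 1].
            const nx = ((x - centerX) / centerX);
            const ny = ((y - centerY) / centerY);
            const r = Math.sqrt((nx * nx) + (ny * ny));

            // Pull the sampling point toward the center as r grows, which
            // magnifies the middle of the image and compresses the edges.
            const scale = (r > 0)? (Math.atan(r * strength) / (r * strength)) : 1;
            const srcX = Math.min((width - 1), Math.max(0, Math.round(centerX + (nx * scale * centerX))));
            const srcY = Math.min((height - 1), Math.max(0, Math.round(centerY + (ny * scale * centerY))));
            const src = ((srcX + (srcY * width)) * 4);
            const dst = ((x + (y * width)) * 4);

            pixels[dst + 0] = source[src + 0];
            pixels[dst + 1] = source[src + 1];
            pixels[dst + 2] = source[src + 2];
        }
    }
}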
Write a program using QBasic that draws a natural-looking lightning bolt.
[Outputs omitted; all but Q4_K_M's contained syntax errors]
Q4_K_M was the only quant that managed to produce syntactically valid, runnable code; all other quants' code had various syntax errors.
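For readers who want a feel for the task: one common way to get a natural-looking bolt is recursive midpoint displacement. The sketch below is my own, written in JavaScript rather than QBasic to keep this post to one language, and it prints the bolt as ASCII art; the function name and parameters are arbitrary.

// Returns a jagged polyline from (x1, y1) to (x2, y2) by recursively
// displacing each segment's midpoint sideways by a random amount.
// My own sketch for illustration; not any quant's output.
function lightningPoints(x1, y1, x2, y2, jitter, depth) {
    if (depth <= 0) {
        return [[x1, y1], [x2, y2]];
    }
    const midX = (((x1 + x2) / 2) + ((Math.random() - 0.5) * jitter));
    const midY = ((y1 + y2) / 2);
    const left = lightningPoints(x1, y1, midX, midY, (jitter / 2), (depth - 1));
    const right = lightningPoints(midX, midY, x2, y2, (jitter / 2), (depth - 1));
    return left.concat(right.slice(1));
}

// Plot the bolt onto a character grid and print it.
const width = 40;
const height = 20;
const grid = Array.from({length: height}, () => Array(width).fill(" "));

for (const [x, y] of lightningPoints((width / 2), 0, (width / 2), (height - 1), 24, 5)) {
    const col = Math.max(0, Math.min((width - 1), Math.round(x)));
    grid[Math.round(y)][col] = "#";
}

console.log(grid.map((row) => row.join("")).join("\n"));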
// Pixel shader: Applies a vignette effect to the pixel buffer.
function ps_vignette(renderContext) {
    const {width, height, data:pixels} = renderContext.pixelBuffer;
    const centerX = (width / 2);
    const centerY = (height / 2);
    const radius = Math.max(centerX, centerY);
    const intensity = 1.0;

    for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
            const dx = x - centerX;
            const dy = y - centerY;
            const distanceSquared = (dx * dx) + (dy * dy);
            const vignette = Math.max(0, 1 - (distanceSquared / (radius * radius)));
            const i = (x + y * width) * 4;

            pixels[i + 0] *= (1 - intensity + (vignette * intensity));
            pixels[i + 1] *= (1 - intensity + (vignette * intensity));
            pixels[i + 2] *= (1 - intensity + (vignette * intensity));
        }
    }
}
Above is a sample pixel shader for a JavaScript 3D software renderer. Write a pixel shader that applies a fisheye effect.
Averaging eight runs of this prompt per quant, and accounting for the spread in the results, the sweet spot appears to be Q3_K_M through Q5_K_M, while Q8_0 performs worse than Q2_K.
For reference, the results table includes the unquantized version of QwQ as hosted on Hugging Face Playground.
These tests roughly confirm the previously found bell curve, where Q4_K_M is the apex quant and categorically better than Q8_0. More broadly, the sweet spot appears to lie between Q3_K_M and Q5_K_M, inclusive.
Q8_0 was in fact so bad that it suggests a potential problem with Ollama itself. Outside of these tests, I've also seen preliminary indications that Ollama's FP16 struggles in a similar way. I opened an issue about this, but it was closed without resolution, so it doesn't appear to be something the Ollama authors are concerned about.
All of that said, my time with QwQ has also shown a fair bit of variance in output quality within any given quant. Ideally, you'd run each test many times to arrive at a realistic average.