Hey Qwen3 (8B), Draw me a sheep (svg)!
an experiment in
What I got
What I wanted
Time to try reinforcement learning!
Wait, maybe a vision-language model (VLM) can be both the generator AND the evaluator?
Let's test this with Qwen3-VL
Hey Qwen3-VL (8B-Instruct), describe this image:
This is a minimalist, stylized illustration of the character Baymax from the Disney animated film Big Hero 6.
This is a minimalist, stylized illustration of a sheep, rendered in a simple black-and-white design against a plain white background.
Not bad, we can work with that!
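Getting descriptions like these just means feeding the rendered image back to the model. Here's a minimal sketch of that call using the Hugging Face transformers chat-template flow; the model id and the exact processor behavior are my assumptions, so check the model card for the recommended usage:

```python
# Minimal sketch: ask the VLM about an image via the transformers
# chat-template flow. Model id and processor details are assumptions.
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "Qwen/Qwen3-VL-8B-Instruct"  # assumed HF repo name

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)

def ask_vlm(image, prompt: str) -> str:
    """Send one image (file path or PIL.Image) plus a prompt, return the answer."""
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt},
        ],
    }]
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True,
        tokenize=True, return_dict=True, return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    # Drop the prompt tokens so only the model's answer is decoded.
    answer_ids = out[0][inputs["input_ids"].shape[1]:]
    return processor.decode(answer_ids, skip_special_tokens=True)

print(ask_vlm("sheep.png", "Describe this image."))
```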
Working with Qwen3-VL (8B), I realized it generated invalid SVGs about half of the time.
So I did an SFT run: 200 examples generated by Claude, augmented to 1,200 entries (dataset). The invalid rate dropped to ~5%. Some samples from the fine-tuned model:
The prompt was the same: "Generate an svg illustration of a pet - output svg code only"
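For reference, "invalid" can be checked mechanically. A minimal sketch, using my own definition (the output must parse as XML with an <svg> root and must rasterize); cairosvg is an assumption, the post doesn't name its renderer:

```python
# Sketch of an invalid-SVG check: well-formed XML, <svg> root, and it
# must rasterize. cairosvg is an assumption, not the post's pipeline.
import xml.etree.ElementTree as ET

import cairosvg

def is_valid_svg(svg_code: str) -> bool:
    try:
        root = ET.fromstring(svg_code)
        # The root tag is usually namespaced: "{http://www.w3.org/2000/svg}svg".
        if not root.tag.endswith("svg"):
            return False
        cairosvg.svg2png(bytestring=svg_code.encode())
        return True
    except Exception:
        return False
```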
I chose GRPO for this project because it seemed the most practical way to optimize for aesthetics with limited GPU resources.
At each training step the model generates 4 SVGs. We convert them to PNGs and the same model scores each one. Its own critiques become the reward signal that improves the next round.
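Concretely, the reward function just bridges those two steps: rasterize each completion, then have the model grade the result. A minimal sketch, assuming a TRL GRPOTrainer-style interface (one float per completion), cairosvg for rasterization, and a 0-10 scoring prompt of my own invention; ask_vlm is the helper sketched earlier:

```python
# Sketch of the self-evaluation reward. The scoring prompt, 0-10 scale,
# and regex parsing are my assumptions, not the post's exact setup.
import io
import re

import cairosvg
from PIL import Image

from vlm_utils import ask_vlm  # the helper sketched earlier (hypothetical module)

SCORE_PROMPT = (
    "Rate this illustration of a pet from 0 to 10 for correctness and "
    "visual appeal. Answer with a single number."
)

def svg_reward(completions, **kwargs):
    """Score a batch of plain-text completions; invalid SVGs get 0.0."""
    rewards = []
    for text in completions:
        svg_match = re.search(r"<svg.*?</svg>", text, flags=re.S)
        if not svg_match:
            rewards.append(0.0)
            continue
        try:
            png = cairosvg.svg2png(bytestring=svg_match.group().encode())
        except Exception:
            rewards.append(0.0)  # broken SVG: lowest reward, don't crash the step
            continue
        image = Image.open(io.BytesIO(png))
        answer = ask_vlm(image, SCORE_PROMPT)
        number = re.search(r"\d+(?:\.\d+)?", answer)
        rewards.append(float(number.group()) / 10.0 if number else 0.0)
    return rewards
```

With TRL this would plug in as reward_funcs=svg_reward with num_generations=4 to match the four SVGs per step, though the post doesn't name its training stack.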
16 training steps later, here are some of the illustrations the model learned to create.
Some are good, some not so good (and some are plain ugly):
The end