Hey Qwen3 (8B), Draw me a sheep (svg)!
an experiment in
What I got
What I wanted
Time to try reinforcement learning!
Wait, maybe a vision-language model (VLM) can be both the generator AND the evaluator?
Let's test this with Qwen3-VL
Hey Qwen3-VL (8B-Instruct), describe this image:
This is a minimalist, stylized illustration of the character Baymax from the Disney animated film Big Hero 6.
This is a minimalist, stylized illustration of a sheep, rendered in a simple black-and-white design against a plain white background.
Not bad, we can work with that!
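Getting descriptions like these just means feeding the rendered image back to the model. Here's a minimal sketch of that call using the Hugging Face transformers chat-template flow; the model id and the exact processor behavior are my assumptions, so check the model card for the recommended usage:

```python
# Minimal sketch: ask the VLM about an image via the transformers
# chat-template flow. Model id and processor details are assumptions.
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "Qwen/Qwen3-VL-8B-Instruct"  # assumed HF repo name

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)

def ask_vlm(image, prompt: str) -> str:
    """Send one image (file path or PIL.Image) plus a prompt, return the answer."""
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt},
        ],
    }]
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True,
        tokenize=True, return_dict=True, return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    # Drop the prompt tokens so only the model's answer is decoded.
    answer_ids = out[0][inputs["input_ids"].shape[1]:]
    return processor.decode(answer_ids, skip_special_tokens=True)

print(ask_vlm("sheep.png", "Describe this image."))
```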
Working with Qwen3-VL (8B), I realized it generated invalid SVGs about half of the time.
So I did an SFT run: 200 examples generated by Claude, augmented to 1,200 entries (dataset). The invalid rate dropped to ~5%. Some samples from the fine-tuned model:
The prompt was the same: "Generate an svg illustration of a pet - output svg code only"
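For reference, "invalid" can be checked mechanically. A minimal sketch, using my own definition (the output must parse as XML with an <svg> root and must rasterize); cairosvg is an assumption, the post doesn't name its renderer:

```python
# Sketch of an invalid-SVG check: well-formed XML, <svg> root, and it
# must rasterize. cairosvg is an assumption, not the post's pipeline.
import xml.etree.ElementTree as ET

import cairosvg

def is_valid_svg(svg_code: str) -> bool:
    try:
        root = ET.fromstring(svg_code)
        # The root tag is usually namespaced: "{http://www.w3.org/2000/svg}svg".
        if not root.tag.endswith("svg"):
            return False
        cairosvg.svg2png(bytestring=svg_code.encode())
        return True
    except Exception:
        return False
```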
I chose GRPO for this project because it seemed the most practical way to optimize for aesthetics with limited GPU resources.
At each training step the model generates 4 SVGs. We convert them to PNGs and the same model scores each one. Its own critiques become the reward signal that improves the next round.
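Concretely, the reward function just bridges those two steps: rasterize each completion, then have the model grade the result. A minimal sketch, assuming a TRL GRPOTrainer-style interface (one float per completion), cairosvg for rasterization, and a 0-10 scoring prompt of my own invention; ask_vlm is the helper sketched earlier:

```python
# Sketch of the self-evaluation reward. The scoring prompt, 0-10 scale,
# and regex parsing are my assumptions, not the post's exact setup.
import io
import re

import cairosvg
from PIL import Image

from vlm_utils import ask_vlm  # the helper sketched earlier (hypothetical module)

SCORE_PROMPT = (
    "Rate this illustration of a pet from 0 to 10 for correctness and "
    "visual appeal. Answer with a single number."
)

def svg_reward(completions, **kwargs):
    """Score a batch of plain-text completions; invalid SVGs get 0.0."""
    rewards = []
    for text in completions:
        svg_match = re.search(r"<svg.*?</svg>", text, flags=re.S)
        if not svg_match:
            rewards.append(0.0)
            continue
        try:
            png = cairosvg.svg2png(bytestring=svg_match.group().encode())
        except Exception:
            rewards.append(0.0)  # broken SVG: lowest reward, don't crash the step
            continue
        image = Image.open(io.BytesIO(png))
        answer = ask_vlm(image, SCORE_PROMPT)
        number = re.search(r"\d+(?:\.\d+)?", answer)
        rewards.append(float(number.group()) / 10.0 if number else 0.0)
    return rewards
```

With TRL this would plug in as reward_funcs=svg_reward with num_generations=4 to match the four SVGs per step, though the post doesn't name its training stack.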
16 training steps later, here are some of the illustrations the model learned to create.
Some are good, some not so good (and some are plain ugly):
The end