PROMPET

an experiment in

Reinforcement learning

Can an AI model teach itself to draw?

Let's find out

Hey Qwen3 (8B), draw me a sheep (SVG)!

What I got

Generated sheep illustration
vs

What I wanted

Target sheep illustration

Time to try reinforcement learning!

Wait, maybe a Vision-Language-Model (VLM) can be both the generator AND the evaluator?

Let's test this with Qwen3-VL

Hey Qwen3-VL (8B-Instruct), describe this image:

Baymax sheep illustration

This is a minimalist, stylized illustration of the character Baymax from the Disney animated film Big Hero 6.

Target sheep illustration

This is a minimalist, stylized illustration of a sheep, rendered in a simple black-and-white design against a plain white background.

Not bad, we can work with that!

The plan

  1. Qwen3-VL 8B generates SVG illustrations
  2. The same model critiques them (as PNGs)
  3. Drawing improves
  4. ...
  5. Profit!
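The loop in steps 1–3 can be sketched as follows. This is a hypothetical skeleton, not the actual training code: `generate_svg`, `score_png`, and `svg_to_png` are stand-ins for the Qwen3-VL calls and the rasterizer.

```python
# Hypothetical sketch of the generate-and-critique loop (steps 1-3 above).
# The three helpers are stubs so the control flow is visible; a real run
# would call Qwen3-VL for generation and scoring, and a rasterizer
# (e.g. cairosvg) for the SVG-to-PNG conversion.

def generate_svg(prompt: str, seed: int) -> str:
    # Placeholder: a real run samples the VLM here.
    return f'<svg xmlns="http://www.w3.org/2000/svg"><!-- candidate {seed} --></svg>'

def svg_to_png(svg_code: str) -> bytes:
    # Placeholder for an actual rasterizer.
    return svg_code.encode()

def score_png(png_bytes: bytes) -> float:
    # Placeholder: a real run asks the same VLM to critique the image.
    return float(len(png_bytes) % 10)

def training_step(prompt: str, group_size: int = 4) -> list[tuple[str, float]]:
    """Generate a group of SVG candidates and score each one."""
    candidates = [generate_svg(prompt, seed) for seed in range(group_size)]
    return [(svg, score_png(svg_to_png(svg))) for svg in candidates]
```

Each call to `training_step` yields one scored group; the scores become the reward signal for the policy update.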

Step 1: nailing the basics (fine-tuning)

Working with Qwen3-VL (8B), I realized it generated invalid SVGs about half the time.

So I did an SFT run: 200 examples generated by Claude, augmented to 1200 entries (dataset). The invalid rate dropped to ~5%. Some samples from the fine-tuned model:

Prompt was the same: "Generate an svg illustration of a pet - output svg code only"
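One simple way to measure an invalid-SVG rate like the one above is to try parsing each generation as XML and check the root tag. This is a rough sketch, not necessarily the validation used here; it catches malformed markup but not every rendering problem.

```python
# Rough validity check for generated SVG code: parse as XML and verify
# the root element is <svg>. Won't catch all rendering issues.
import xml.etree.ElementTree as ET

def is_valid_svg(code: str) -> bool:
    try:
        root = ET.fromstring(code)
    except ET.ParseError:
        return False
    # ElementTree expands namespaces, so a namespaced root reads
    # "{http://www.w3.org/2000/svg}svg"; strip that prefix before comparing.
    return root.tag.split("}")[-1] == "svg"

samples = [
    '<svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg>',
    '<svg><rect width="10" height="10">',  # unclosed tags: invalid
]
invalid_rate = sum(not is_valid_svg(s) for s in samples) / len(samples)
```

Running a check like this over a batch of generations gives the invalid rate directly.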

Step 2: reinforcement learning in action

I chose GRPO for this project: unlike PPO, it needs no separate critic model, which made it the most practical choice for aesthetic optimization with limited GPU resources.

At each training step the model generates 4 SVGs. We convert them to PNGs and the same model scores each one. Its own critiques become the reward signal that improves the next round.

Training step 1/15