✦ Conditional GAN · Academic Research Demo
Turn Sketches Into Reality

AI-powered design preview using Conditional Generative Adversarial Networks. Draw a shoe sketch and watch it come to life.

Live Demo

Draw a shoe sketch or upload an edge map

Your Sketch

400 × 400
Pen size: 4px

Generated Shoe

The Pipeline

How It Works

01

Input

Draw or upload a shoe sketch

Provide a clean edge map or line drawing of a shoe silhouette. The model expects a 256×256 binary-style sketch in which white strokes on a black background define the structural outline, similar to the output of a Canny edge detector.
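A minimal sketch of how a drawing could be brought into that input format, using plain Python lists instead of an image library. It assumes the canvas holds dark strokes on a light background (the page does not say which way round the canvas is), and the names prepare_sketch and threshold are hypothetical. Nearest-neighbour sampling stands in for whatever resizing the demo actually uses.

```python
def prepare_sketch(canvas, size=256, threshold=128):
    """Resize a grayscale canvas (rows of 0-255 values, dark strokes on a
    light background) to size x size and return white-on-black pixels."""
    src_h, src_w = len(canvas), len(canvas[0])
    out = []
    for y in range(size):
        src_y = y * src_h // size              # nearest-neighbour row lookup
        row = []
        for x in range(size):
            src_x = x * src_w // size          # nearest-neighbour column lookup
            is_stroke = canvas[src_y][src_x] < threshold
            row.append(255 if is_stroke else 0)  # stroke -> white, rest -> black
        out.append(row)
    return out


# A 400x400 white canvas with one dark horizontal stroke, 4 px thick.
canvas = [[255] * 400 for _ in range(400)]
for y in range(200, 204):
    canvas[y] = [0] * 400

sketch = prepare_sketch(canvas)
print(len(sketch), len(sketch[0]))  # 256 256
```

Very thin strokes can vanish under nearest-neighbour downsampling, so a real pipeline would likely use an area-averaging resize before thresholding.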

02

U-Net Generator

AI translates structure to texture

A U-Net encoder-decoder maps your sketch to a realistic image. The encoder compresses spatial information through 8 downsampling blocks; skip connections bridge encoder and decoder layers, preserving fine-grained details that would otherwise be lost during compression.
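The compression described above can be traced with a few lines of arithmetic. This sketch assumes each of the 8 downsampling blocks halves the spatial size (as a stride-2 convolution would); the page does not state the stride, and encoder_sizes is a hypothetical helper.

```python
def encoder_sizes(input_size=256, num_blocks=8):
    """Spatial side length after each downsampling block, assuming every
    block halves the resolution."""
    sizes = [input_size]
    for _ in range(num_blocks):
        sizes.append(sizes[-1] // 2)
    return sizes


print(encoder_sizes())  # [256, 128, 64, 32, 16, 8, 4, 2, 1]
```

Under that assumption the bottleneck is a single 1×1 position, which is why the skip connections are needed to carry spatial detail across to the decoder.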

03

Output

Photorealistic shoe image

The generator outputs a 256×256 RGB image with realistic textures, lighting, and color. The PatchGAN discriminator, with its 70×70 receptive field, encourages local texture coherence, so leather, mesh, or fabric detail comes out fine-grained rather than blurred.
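Because the generator ends in a Tanh activation, its raw outputs lie in [-1, 1] and need rescaling before they can be displayed as an RGB image. A minimal sketch of that conversion follows; the function name tanh_to_uint8 is hypothetical, and the demo's actual post-processing is not shown on this page.

```python
def tanh_to_uint8(value):
    """Map a Tanh output in [-1, 1] to an 8-bit channel value in [0, 255]."""
    clipped = max(-1.0, min(1.0, value))   # guard against numerical overshoot
    return int(round((clipped + 1.0) * 127.5))


def pixel_to_rgb(pixel):
    """Convert one (r, g, b) triple of Tanh outputs to 8-bit RGB."""
    return tuple(tanh_to_uint8(c) for c in pixel)


print(pixel_to_rgb((-1.0, 1.0, -1.0)))  # (0, 255, 0)
```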

Technical Deep Dive

Model Architecture

G

Generator: U-Net

Encoder → Decoder + Skip Connections

Encoder → Bottleneck → Decoder

Channels per block (simplified): 64 → 128 → 256 → 512 → 512 → 512 → 512 → 512 → 512 → 512 → 512 → 512 → 256 → 128 → 64 → 3

Input

256×256

Activation

Tanh

Skip links

8 pairs

Skip connections concatenate encoder feature maps to the corresponding decoder layers, allowing the network to preserve low-level spatial details (edges, corners) that would otherwise be lost during downsampling.
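To make the concatenation concrete, the sketch below works out how many input channels each decoder block would receive. It takes the first eight simplified channel counts listed on this page as the encoder and the last eight as the decoder, and it assumes the common pix2pix layout in which the bottleneck feeds the first decoder block directly and every later block receives the previous decoder output concatenated with the encoder feature map of the same resolution. Under that assumption there are seven concatenations; the page's own skip count may follow a different convention. decoder_input_channels is a hypothetical helper.

```python
ENCODER = [64, 128, 256, 512, 512, 512, 512, 512]   # output channels, blocks 1-8
DECODER = [512, 512, 512, 512, 256, 128, 64, 3]     # output channels, blocks 1-8


def decoder_input_channels(encoder, decoder):
    """Input channel count of each decoder block under the assumed layout."""
    inputs = [encoder[-1]]                 # block 1 sees only the bottleneck
    for i in range(len(decoder) - 1):
        skip = encoder[-2 - i]             # encoder map at the same resolution
        inputs.append(decoder[i] + skip)   # channel-wise concatenation
    return inputs


print(decoder_input_channels(ENCODER, DECODER))
# [512, 1024, 1024, 1024, 1024, 512, 256, 128]
```

The doubled input widths are the cost of the skip links: each decoder block has to process both the upsampled features and the encoder's copy.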

D

Discriminator: PatchGAN

5 Conv Layers (3 Strided)

5 Conv Layers → Patch Grid

Input 256×256 → Conv 128×128 → Conv 64×64 → Conv 32×32 → Conv 30×30 → Patch Output

30×30 real/fake judgments
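The 30×30 grid can be reproduced from the standard convolution output-size formula. The configuration below (4×4 kernels, padding 1, strides 2, 2, 2, 1, 1) is the usual 70×70 PatchGAN setup and is an assumption here, since the page does not list kernel sizes or strides; patch_grid_sizes is a hypothetical helper.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output side length of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def patch_grid_sizes(input_size=256, strides=(2, 2, 2, 1, 1)):
    """Side length after each conv layer under the assumed configuration."""
    sizes = [input_size]
    for stride in strides:
        sizes.append(conv_out(sizes[-1], stride=stride))
    return sizes


print(patch_grid_sizes())  # [256, 128, 64, 32, 31, 30]
```

With these assumed settings the two stride-1 layers each trim one pixel, giving an intermediate 31×31 map before the final 30×30 output.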

Receptive field

70×70

Output

30×30

Input

Concat

PatchGAN judges whether each 70×70 image patch looks real or fake. Compared with a single full-image decision, this tends to produce sharper textures and captures high-frequency texture statistics such as fabric weave and stitching detail.
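The 70×70 figure follows from how receptive fields compose across stacked convolutions. A short sketch, assuming the usual PatchGAN layer list of five 4×4 convolutions with strides 2, 2, 2, 1, 1 (not stated on this page); receptive_field is a hypothetical helper.

```python
def receptive_field(layers):
    """Receptive field of one output unit for a stack of (kernel, stride)
    conv layers, computed from the last layer back to the input."""
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


PATCHGAN = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]  # assumed configuration
print(receptive_field(PATCHGAN))  # 70
```

Each of the 30×30 output values therefore depends on one 70×70 window of the input, and neighbouring windows overlap heavily.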

Performance

Training Results

Epochs Trained: 26
Best Val L1: 0.1602
Training Pairs: 8,000

Loss Curves

[Figure: G Loss, D Loss, and Train L1 plotted against epoch (x-axis: epochs 1 to 26; y-axis: 0.2 to 1.8)]

Sample Outputs

Edge Map
Input sketch: edge-detected outline from dataset

Generated
AI output: Pix2Pix U-Net generator output

Real Photo
Ground truth: original paired photo from dataset

Implementation Details

Methodology

Dataset

  • KEY: Edges2Shoes · 50,000 paired images

  • Each pair: Canny edge map + real photograph

  • Paired training enables direct pixel-level supervision — the model learns exact sketch-to-photo mappings rather than unpaired style transfer

Augmentation

  • KEY: Random jitter: resize to 286×286 then crop to 256×256

  • Random horizontal flip with 50% probability

  • These two transforms reduce overfitting and make the model more robust to small shifts and mirroring, without introducing unrealistic distortions
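The augmentation above can be sketched in plain Python. The important detail is that the sketch and the photo must receive the same crop offsets and the same flip decision, otherwise the pair no longer lines up pixel for pixel. Images are represented as nested lists, nearest-neighbour resizing stands in for whatever interpolation the real pipeline uses, and the names resize_nearest and random_jitter are hypothetical.

```python
import random


def resize_nearest(img, size):
    """Nearest-neighbour resize of a square-ish image (list of rows)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def random_jitter(sketch, photo, rng, load=286, crop=256):
    """Resize both images to load x load, then apply one shared random
    crop and one shared 50% horizontal flip."""
    sketch = resize_nearest(sketch, load)
    photo = resize_nearest(photo, load)
    top = rng.randint(0, load - crop)      # same offsets for both images
    left = rng.randint(0, load - crop)
    flip = rng.random() < 0.5              # same flip decision for both

    def crop_flip(img):
        rows = [row[left:left + crop] for row in img[top:top + crop]]
        return [row[::-1] for row in rows] if flip else rows

    return crop_flip(sketch), crop_flip(photo)


# Usage: identical inputs must stay identical after augmentation.
image = [[y * 256 + x for x in range(256)] for y in range(256)]
aug_sketch, aug_photo = random_jitter(image, image, random.Random(0))
print(len(aug_sketch), len(aug_sketch[0]), aug_sketch == aug_photo)  # 256 256 True
```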

Loss Function

  • KEY: L1 (λ=100) + Adversarial BCE

  • L1 penalises per-pixel distance — enforces structural fidelity and prevents colour drift

  • Adversarial loss from PatchGAN pushes the generator toward photorealistic textures that L1 alone cannot produce
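As a worked illustration of how the two terms combine, the sketch below computes the generator objective as adversarial BCE plus λ times L1, with λ = 100 as stated above. It operates on flat lists of numbers rather than tensors; the function names are hypothetical, and the unweighted sum of real and fake terms in the discriminator loss is an assumption, since the page does not say how that loss is scaled.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for one probability and a 0/1 target."""
    eps = 1e-7
    p = min(max(prob, eps), 1.0 - eps)     # avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def mean(values):
    return sum(values) / len(values)


def generator_loss(d_fake_probs, fake_pixels, real_pixels, lam=100.0):
    """Adversarial term (fool the discriminator) plus lam * L1 distance."""
    adv = mean([bce(p, 1.0) for p in d_fake_probs])
    l1 = mean([abs(f - r) for f, r in zip(fake_pixels, real_pixels)])
    return adv + lam * l1, adv, l1


def discriminator_loss(d_real_probs, d_fake_probs):
    """Real patches should score 1, generated patches 0 (assumed unweighted)."""
    real = mean([bce(p, 1.0) for p in d_real_probs])
    fake = mean([bce(p, 0.0) for p in d_fake_probs])
    return real + fake


total, adv, l1 = generator_loss([0.5, 0.5], [0.0, 1.0], [0.0, 0.5])
print(round(adv, 4), l1, round(total, 4))  # 0.6931 0.25 25.6931
```

With λ = 100, even a modest per-pixel error dominates the total, which is why the L1 term anchors structure while the adversarial term mainly shapes texture.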

Optimizer

  • KEY: Adam · β₁ = 0.5, β₂ = 0.999, LR 2×10⁻⁴

  • Lowering β₁ from Adam's default of 0.9 to 0.5 reduces momentum, which helps GAN stability because gradients change direction frequently

  • Linear LR decay after epoch 25 gradually shrinks the step size, damping oscillation as training approaches convergence
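The schedule can be written as a small function: a constant rate of 2×10⁻⁴ up to epoch 25, then a straight-line decay. The length of the decay phase is not given on this page, so decay_epochs=25 below is a hypothetical value chosen only for illustration, as are the names ADAM_CONFIG and learning_rate.

```python
# Optimizer settings as listed above, kept as a plain dict for reference.
ADAM_CONFIG = {"lr": 2e-4, "betas": (0.5, 0.999)}


def learning_rate(epoch, base_lr=ADAM_CONFIG["lr"], decay_start=25,
                  decay_epochs=25):
    """Constant LR through decay_start, then linear decay to zero over
    decay_epochs further epochs (decay_epochs is an assumed value)."""
    if epoch <= decay_start:
        return base_lr
    progress = min(1.0, (epoch - decay_start) / decay_epochs)
    return base_lr * (1.0 - progress)


for epoch in (1, 25, 26, 50):
    print(epoch, learning_rate(epoch))
```

Under these assumed settings the rate at epoch 26 is 4% below the base rate, so a 26-epoch run would see only the very start of the decay.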