AI-powered design preview using Conditional Generative Adversarial Networks. Draw a shoe sketch and watch it come to life.
Live Demo
Draw a shoe sketch or upload an edge map
The Pipeline
How It Works
Input
Draw or upload a shoe sketch
Provide a clean edge map or line drawing of a shoe silhouette. The model expects a 256×256 binary-style sketch where white strokes on black background define the structural outline — just like the Canny edge detector would produce.
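As a rough illustration of what "binary-style" means in practice, the snippet below thresholds a grayscale sketch and rescales it to [-1, 1], the range a Tanh-output generator conventionally works in. The threshold of 128 and the [-1, 1] scaling are assumptions based on common pix2pix practice, not details confirmed by this project; `preprocess_sketch` is a hypothetical helper.

```python
def preprocess_sketch(gray, threshold=128):
    """Binarize a grayscale sketch (rows of 0-255 ints) and scale it to [-1, 1].

    Assumed convention: bright pixels (strokes) -> +1.0, dark background -> -1.0.
    """
    return [[1.0 if px >= threshold else -1.0 for px in row] for row in gray]


sketch = [
    [0, 0, 255, 0],
    [0, 200, 255, 30],
]
print(preprocess_sketch(sketch))
# [[-1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, 1.0, -1.0]]
```

Resizing the sketch to 256×256 would happen before this step and is omitted here.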
U-Net Generator
AI translates structure to texture
A U-Net encoder-decoder maps your sketch to a realistic image. The encoder compresses spatial information through 8 downsampling blocks; skip connections bridge encoder and decoder layers, preserving fine-grained details that would otherwise be lost during compression.
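Each of the eight downsampling blocks halves the spatial resolution. Assuming 4×4 convolutions with stride 2 and padding 1 (the configuration of the original pix2pix U-Net; this page does not state its kernel settings), a 256×256 input shrinks to a 1×1 bottleneck:

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a 2D convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_sizes(input_size=256, num_downs=8):
    """Feature-map side length after each downsampling block."""
    sizes = []
    size = input_size
    for _ in range(num_downs):
        size = conv_out(size)
        sizes.append(size)
    return sizes


print(encoder_sizes())  # [128, 64, 32, 16, 8, 4, 2, 1]
```

The decoder typically mirrors this path with transposed convolutions, doubling the resolution back up to 256×256.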
Output
Photorealistic shoe image
The generator outputs a 256×256 RGB image with realistic textures, lighting, and color. The PatchGAN discriminator rewards local texture coherence over a 70×70 receptive field, encouraging fine leather, mesh, or fabric detail instead of the global blurriness that a pixel-wise loss alone tends to produce.
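Because the generator ends in a Tanh activation, its raw outputs lie in [-1, 1]. A minimal sketch of converting them to displayable 8-bit values, assuming the usual linear mapping (x + 1) / 2 × 255; `to_uint8` is a hypothetical helper:

```python
def to_uint8(values):
    """Map Tanh outputs in [-1, 1] to integer pixel values in [0, 255]."""
    pixels = []
    for x in values:
        x = max(-1.0, min(1.0, x))  # clamp any numerical overshoot
        pixels.append(round((x + 1.0) / 2.0 * 255.0))
    return pixels


print(to_uint8([-1.0, 0.0, 1.0]))  # [0, 128, 255]
```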
Technical Deep Dive
Model Architecture
Generator: U-Net
Encoder → Decoder + Skip Connections
Diagram: Encoder → Bottleneck → Decoder (channels per block, simplified)
Input: 256×256
Output activation: Tanh
Skip connections: 7
Skip connections concatenate encoder feature maps to the corresponding decoder layers, allowing the network to preserve low-level spatial details (edges, corners) that would otherwise be lost during downsampling.
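To make the skip connections concrete, the sketch below traces channel counts through the decoder. It assumes the encoder widths from the original pix2pix paper (64, 128, 256, then 512 for the remaining five blocks); this page only shows a simplified diagram, so the real widths may differ. Each decoder step upsamples and then concatenates the mirrored encoder feature map, which is why the channel count doubles at every skip.

```python
ENCODER_CHANNELS = [64, 128, 256, 512, 512, 512, 512, 512]  # assumed widths


def decoder_trace(encoder_channels=ENCODER_CHANNELS, out_channels=3):
    """Return (input, upsampled, after-concat) channel counts per decoder step."""
    steps = []
    in_ch = encoder_channels[-1]  # bottleneck output (1x1); it has no skip of its own
    for skip_ch in reversed(encoder_channels[:-1]):
        up_ch = skip_ch  # up-conv maps in_ch to the width of the mirrored encoder layer
        steps.append((in_ch, up_ch, up_ch + skip_ch))
        in_ch = up_ch + skip_ch  # the concatenated map feeds the next up-conv
    steps.append((in_ch, out_channels, out_channels))  # final up-conv to RGB, no skip
    return steps


trace = decoder_trace()
for step in trace:
    print(step)
num_skips = len(trace) - 1  # 7
```

Under these assumptions the decoder has seven concatenating skip connections; the eighth encoder block is the bottleneck and feeds the decoder directly.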
Discriminator: PatchGAN
5 Conv Layers (3 strided) → Patch Grid
30×30 real/fake judgments
Receptive field: 70×70
Output: 30×30
Input: sketch and photo concatenated along the channel axis
PatchGAN judges whether each 70×70 image patch looks real or fake and averages the 30×30 judgments into a single loss. This produces sharper textures than full-image discrimination and captures high-frequency texture statistics such as fabric weave and stitching detail.
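The 70×70 and 30×30 figures follow from the layer configuration. Assuming the standard pix2pix 70×70 PatchGAN (five 4×4 convolutions with strides 2, 2, 2, 1, 1 and padding 1), both numbers can be computed directly:

```python
LAYERS = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]  # (kernel, stride), assumed


def receptive_field(layers=LAYERS):
    """Receptive field of one output unit, accumulated from the last layer backwards."""
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


def output_size(input_size=256, layers=LAYERS, padding=1):
    """Side length of the patch grid produced for a square input."""
    size = input_size
    for kernel, stride in layers:
        size = (size + 2 * padding - kernel) // stride + 1
    return size


print(receptive_field(), output_size())  # 70 30
```

The spatial sizes run 256 → 128 → 64 → 32 → 31 → 30, so each of the 900 output scores sees a 70×70 window of the input.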
Performance
Training Results
Loss Curves
Sample Outputs

Input sketch
Edge-detected outline from dataset

AI output
Pix2Pix U-Net generator output

Ground truth
Original paired photo from dataset
Implementation Details
Methodology
Dataset
Edges2Shoes · 50,000 paired images
Each pair: HED edge map + real photograph
Paired training enables direct pixel-level supervision: every sketch has a matching target photo, so the model learns the sketch-to-photo mapping directly instead of inferring it from unpaired collections as in unpaired style transfer
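Paired datasets in the pix2pix family are commonly distributed with each sketch and photo stored side by side in one image. Assuming that layout (edge map on the left, photo on the right; this is an assumption about this project's data files), splitting a pair means cutting every row in half. `split_pair` is a hypothetical helper operating on rows of pixels:

```python
def split_pair(image):
    """Split a side-by-side image (list of pixel rows) into (sketch, photo)."""
    width = len(image[0])
    if width % 2 != 0:
        raise ValueError("expected an even width for a side-by-side pair")
    half = width // 2
    sketch = [row[:half] for row in image]
    photo = [row[half:] for row in image]
    return sketch, photo


pair = [[1, 2, 3, 4], [5, 6, 7, 8]]
sketch, photo = split_pair(pair)
print(sketch, photo)  # [[1, 2], [5, 6]] [[3, 4], [7, 8]]
```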
Augmentation
Random jitter: resize to 286×286 then crop to 256×256
Random horizontal flip with 50% probability
These two transforms reduce overfitting and add a small amount of translational robustness without introducing unrealistic distortions; the same crop and flip are applied to the sketch and its photo so the pair stays aligned
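A minimal sketch of the jitter-and-flip augmentation on a sketch/photo pair. It uses nearest-neighbour resizing on nested lists for simplicity (a real pipeline would more likely use an image library with bilinear or bicubic interpolation), and it draws one set of random parameters per pair. `jitter_pair` and `resize_nearest` are hypothetical helpers.

```python
import random


def resize_nearest(image, new_size):
    """Nearest-neighbour resize of a square image given as a list of rows."""
    old_size = len(image)
    return [
        [image[r * old_size // new_size][c * old_size // new_size] for c in range(new_size)]
        for r in range(new_size)
    ]


def jitter_pair(sketch, photo, load_size=286, crop_size=256, rng=random):
    """Resize both images, take the same random crop, and flip both with p = 0.5."""
    top = rng.randint(0, load_size - crop_size)   # inclusive bounds
    left = rng.randint(0, load_size - crop_size)
    flip = rng.random() < 0.5
    out = []
    for image in (sketch, photo):
        big = resize_nearest(image, load_size)
        crop = [row[left:left + crop_size] for row in big[top:top + crop_size]]
        if flip:
            crop = [row[::-1] for row in crop]
        out.append(crop)
    return out[0], out[1]


# Tiny demo: 8x8 images jittered to 9x9 and cropped back to 8x8.
rng = random.Random(0)
s = [[c for c in range(8)] for _ in range(8)]
p = [[c * 10 for c in range(8)] for _ in range(8)]
s_aug, p_aug = jitter_pair(s, p, load_size=9, crop_size=8, rng=rng)
```

Sharing `top`, `left`, and `flip` between the two images is the essential part: independent random crops would shift the sketch relative to its photo and corrupt the pixel-level L1 supervision.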
Loss Function
L1 (λ=100) + Adversarial BCE
L1 penalises per-pixel distance, enforcing structural fidelity and discouraging colour drift
Adversarial loss from PatchGAN pushes the generator toward photorealistic textures that L1 alone cannot produce
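The combined objective can be sketched in plain Python on flattened values. This assumes the discriminator returns raw logits for its 30×30 patch grid (flattened to a list) and that the adversarial BCE is averaged over patches; halving the discriminator loss follows the original pix2pix recipe and is an assumption about this project.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy on a single raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def generator_loss(fake_logits, fake, real, lam=100.0):
    """Adversarial BCE (fool D on every patch) + lam * mean absolute error."""
    adv = sum(bce_with_logits(z, 1.0) for z in fake_logits) / len(fake_logits)
    l1 = sum(abs(f - r) for f, r in zip(fake, real)) / len(fake)
    return adv + lam * l1


def discriminator_loss(real_logits, fake_logits):
    """Mean of BCE on real patches (target 1) and on fake patches (target 0)."""
    real_term = sum(bce_with_logits(z, 1.0) for z in real_logits) / len(real_logits)
    fake_term = sum(bce_with_logits(z, 0.0) for z in fake_logits) / len(fake_logits)
    return 0.5 * (real_term + fake_term)


# Toy 2-pixel image with an L1 error of 0.1 per pixel and an undecided D (logit 0).
print(generator_loss([0.0, 0.0], fake=[0.0, 0.0], real=[0.1, -0.1]))  # ≈ 10.693
```

With λ = 100, a mean absolute error of just 0.01 (on the [-1, 1] scale) already contributes 1.0 to the generator loss, comparable to the adversarial term for an undecided discriminator (ln 2 ≈ 0.69).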
Optimizer
Adam · β₁ = 0.5, β₂ = 0.999, LR 2×10⁻⁴
A lower β₁ shortens the momentum window, which helps stabilise GAN training because the adversarial gradients change direction frequently
Linear LR decay after epoch 25 gradually anneals the learning rate, damping oscillation as training converges
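Both optimiser settings fit in a few lines. The sketch below shows a single scalar Adam update with β₁ = 0.5 and a linear schedule that holds the learning rate constant through epoch 25 and then ramps it down. The 50-epoch total and the decay to exactly zero are hypothetical choices; this page only states that decay starts after epoch 25.

```python
import math


def adam_step(param, grad, m, v, t, lr=2e-4, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


def linear_decay_lr(epoch, base_lr=2e-4, decay_start=25, total_epochs=50):
    """Constant LR up to decay_start, then linear decay to 0 at total_epochs (assumed)."""
    if epoch <= decay_start:
        return base_lr
    frac = (epoch - decay_start) / (total_epochs - decay_start)
    return base_lr * max(0.0, 1.0 - frac)


p, m, v = adam_step(1.0, 0.5, m=0.0, v=0.0, t=1)
print(p)  # ≈ 0.9998: the first bias-corrected step moves by about lr
print([linear_decay_lr(e) for e in (0, 25, 30, 50)])
```

With β₁ = 0.5 the momentum term averages over roughly the last 1 / (1 − β₁) = 2 gradients, versus about 10 for the common default β₁ = 0.9, so the optimiser reacts quickly when the adversarial gradients change direction.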