Stable Diffusion for Advanced Users
Stable Diffusion: Open-Source AI Image Generation
Stable Diffusion is fundamentally different from Midjourney and DALL-E. It's open-source — you can run it locally on your own computer, modify it, extend it, and use it without ongoing subscription costs. The tradeoff is complexity: getting good results from Stable Diffusion requires more technical knowledge and configuration.
Why Stable Diffusion Exists in a Class of Its Own
Free and open: The base models are free to download and use. No per-image fees, no monthly subscription, no limits on generations.
Local execution: Run on your own hardware. Your images stay private — nothing is sent to an external server.
Unlimited customization: Fine-tune models on your own images, swap in specialized models, stack LoRAs, control every aspect of generation.
Massive model ecosystem: Thousands of community-trained models on Civitai and Hugging Face — specialized for anime, photorealism, architecture, product photography, specific art styles, specific people (with appropriate consent).
The catch: You need a reasonably powerful GPU (NVIDIA with 6GB+ VRAM for comfortable use), some technical setup, and patience to learn the system.
Running Stable Diffusion: Your Options
Option 1: AUTOMATIC1111 WebUI
A widely used local SD interface. Feature-rich, extensible through a large library of community extensions, and popular with power users.
Installation (Windows/Mac/Linux):
- Install Python 3.10 and Git
- Clone the repository: git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
- Download a model (e.g., from Civitai) and place it in /models/Stable-diffusion/
- Run webui.bat (Windows) or webui.sh (Mac/Linux)
- The browser opens at localhost:7860
Option 2: ComfyUI
Node-based interface — more powerful for building complex generation workflows, steeper learning curve.
Best for: Power users who want to build reusable generation pipelines with complex conditioning, LoRA stacking, or multi-model workflows.
Option 3: Cloud Services (No Local GPU Required)
Run Stable Diffusion without a powerful local machine:
- Replicate.com: API access to SD models, pay-per-image
- RunDiffusion.com: Hosted Automatic1111 environment
- Google Colab: Run SD notebooks in the cloud (free tier with limitations)
Core Concepts
Models (Checkpoints)
The base model determines the fundamental output style. Most popular:
SDXL (Stable Diffusion XL): The current flagship — higher resolution (1024x1024 native), better prompt following, more detailed outputs.
Realistic Vision: Fine-tuned for photorealistic photography output.
DreamShaper: Versatile — good at both realistic and artistic styles.
Majicmix: Particularly good for Asian-influenced art styles and portraiture.
Anything V5 / Counterfeit: Anime and illustration style.
Download models from Civitai (civitai.com) or Hugging Face. Place .safetensors or .ckpt files in /models/Stable-diffusion/.
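To confirm that downloaded files landed where the interface will look for them, a small script can list the checkpoints in the models folder. This is a minimal sketch: the function name list_checkpoints is made up for illustration, and the folder path you pass in depends on where you cloned the WebUI.

```python
from pathlib import Path

# File extensions mentioned above for checkpoint files.
CHECKPOINT_SUFFIXES = {".safetensors", ".ckpt"}


def list_checkpoints(models_dir):
    """Return the sorted names of checkpoint files found directly in models_dir.

    Returns an empty list if the folder does not exist.
    """
    folder = Path(models_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in CHECKPOINT_SUFFIXES
    )


# Example call (the path is an assumption about your install location):
# print(list_checkpoints("stable-diffusion-webui/models/Stable-diffusion"))
```

If the list comes back empty, the files are probably in the wrong folder or still have a partial-download extension.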
LoRA (Low-Rank Adaptation)
LoRAs are small model add-ons that inject specific styles, characters, or concepts:
- Load a face LoRA to consistently generate a specific face style
- Load a lighting LoRA to apply a specific lighting aesthetic
- Stack multiple LoRAs with weight control
Usage in prompt: <lora:filename:0.8> (the number is the weight, typically between 0.1 and 1.0)
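When prompts are stored or generated by a script, it helps to pull the LoRA tags out and inspect their weights. The sketch below parses the <lora:filename:weight> syntax shown above. The function name extract_loras and the LoRA names in the example are hypothetical, and treating a missing weight as 1.0 is an assumption of this sketch.

```python
import re

# Matches <lora:name> or <lora:name:weight>, where weight is a plain decimal.
LORA_TAG = re.compile(r"<lora:([^:<>]+)(?::([0-9]*\.?[0-9]+))?>")


def extract_loras(prompt):
    """Split a prompt into (text_without_tags, [(name, weight), ...])."""
    loras = [
        (m.group(1), float(m.group(2)) if m.group(2) else 1.0)
        for m in LORA_TAG.finditer(prompt)
    ]
    text = LORA_TAG.sub("", prompt)
    # Collapse the whitespace left behind and trim stray separators at the ends.
    text = re.sub(r"\s+", " ", text).strip(" ,")
    return text, loras


text, loras = extract_loras("portrait, <lora:filmgrain:0.8> <lora:softlight:0.5>")
print(text)   # prompt text with the tags removed
print(loras)  # list of (name, weight) pairs
```

A list like this makes it easy to spot stacked LoRAs whose combined weights are high enough to fight each other.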
VAE (Variational Autoencoder)
The VAE decodes the model's latent output into the final pixels, so it affects color saturation and sharpness. Many models benefit from a specific VAE. Common ones: vae-ft-mse-840000-ema-pruned (sharp, saturated), kl-f8-anime2 (anime aesthetic).
Samplers
The sampling algorithm affects quality and speed:
- DPM++ 2M Karras: Most popular for photorealism, good quality at 20-30 steps
- Euler a: Fast, creative, good for variation
- DDIM: Fast, good for precise control
- DPM++ SDE Karras: High quality, slower
Steps: 20-30 is usually sufficient. Beyond that range, extra steps mostly add render time rather than visible quality.
Prompting for Stable Diffusion
SD prompting is similar to Midjourney but with important differences:
Positive and Negative prompts: SD has an explicit negative prompt for listing what you want excluded, which gives you direct control over common defects such as blur, watermarks, and malformed hands.
Comma-separated descriptors:
Positive: masterpiece, best quality, 8k, ultra detailed, photorealistic portrait,
professional studio lighting, beautiful woman, blue eyes, elegant dress,
bokeh background
Negative: (worst quality, low quality:1.4), blurry, watermark, text,
ugly, deformed, extra fingers, mutation, duplicate
Emphasis: Parentheses with a weight increase attention to a term, e.g. (blue dress:1.3); square brackets decrease it, e.g. [brown hair].
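To make the emphasis syntax concrete, here is a simplified sketch that computes the weight of a single emphasized chunk. It assumes the AUTOMATIC1111 convention in which each bare pair of parentheses multiplies the weight by 1.1 and each pair of square brackets divides it by 1.1. The function name emphasis_weight is hypothetical, and the sketch handles one chunk at a time rather than a full prompt.

```python
import re


def emphasis_weight(chunk):
    """Return (text, weight) for one prompt chunk.

    Handles "(text:1.3)", nested "((text))", and "[text]".
    Not a full prompt parser: pass it a single chunk.
    """
    chunk = chunk.strip()
    # Explicit weight, e.g. "(blue dress:1.3)".
    m = re.fullmatch(r"\((.+):([0-9]*\.?[0-9]+)\)", chunk)
    if m:
        return m.group(1).strip(), float(m.group(2))
    # Implicit weight: peel matching outer brackets one layer at a time.
    weight = 1.0
    while len(chunk) >= 2 and chunk[0] + chunk[-1] in ("()", "[]"):
        weight = weight * 1.1 if chunk[0] == "(" else weight / 1.1
        chunk = chunk[1:-1].strip()
    return chunk, weight


for example in ["(blue dress:1.3)", "((sharp focus))", "[brown hair]", "bokeh"]:
    print(example, "->", emphasis_weight(example))
```

Weights much above roughly 1.5 tend to distort the image, so small adjustments are usually the better starting point.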
Universal Negative Prompt for Photorealism
(worst quality, low quality:1.4), (bad anatomy:1.3), (inaccurate limb:1.2),
bad composition, inaccurate eyes, extra digit, fewer digits,
(extra arms:1.2), text, watermark, logo
Key Settings
CFG Scale (Classifier-Free Guidance): How strictly SD follows your prompt.
- 7-8: Good balance (recommended starting point)
- Lower (4-6): More creative, less prompt-adherent
- Higher (10-15): Strongly follows prompt but can over-saturate
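The behavior in this list follows from how classifier-free guidance combines two noise predictions at each step: one made with the prompt and one made without it. The sketch below shows the standard blending formula on plain Python lists. The function name apply_cfg is hypothetical, and real implementations apply the same arithmetic to tensors.

```python
def apply_cfg(uncond, cond, cfg_scale):
    """Blend unconditional and prompt-conditioned predictions element-wise.

    result = uncond + cfg_scale * (cond - uncond)
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # toy prediction without the prompt
cond = [1.0, 3.0]    # toy prediction with the prompt
for scale in (0.0, 1.0, 7.5):
    print(scale, apply_cfg(uncond, cond, scale))
```

At a scale of 1 the result equals the prompt-conditioned prediction. Larger values push further in the direction of the prompt, which is why very high settings can overshoot into over-saturated output.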
Resolution: SDXL works best at 1024x1024. SD 1.5 models at 512x512 or 768x768.
Seed: Controls randomness. -1 = random each time. Fix the seed to reproduce a specific image.
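These settings map directly onto the request body of the AUTOMATIC1111 WebUI's HTTP API, which is available when the WebUI is launched with the --api flag. The following is a sketch under assumptions rather than a reference client: the helper names are hypothetical, the default values are only starting points taken from this section, and sampler names vary between WebUI versions, so the sampler is left out unless you pass one.

```python
import json
import urllib.request


def build_txt2img_payload(prompt, negative_prompt="", steps=25, cfg_scale=7.0,
                          width=1024, height=1024, seed=-1, sampler_name=None):
    """Build the JSON body for a txt2img request."""
    payload = {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "cfg_scale": cfg_scale,
        "width": width,
        "height": height,
        "seed": seed,  # -1 asks for a random seed; a fixed value reproduces an image
    }
    if sampler_name is not None:
        payload["sampler_name"] = sampler_name
    return payload


def send_txt2img(payload, base_url="http://localhost:7860"):
    """POST the payload to a running WebUI and return base64-encoded images."""
    request = urllib.request.Request(
        base_url + "/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["images"]


payload = build_txt2img_payload(
    "photorealistic portrait, professional studio lighting",
    negative_prompt="blurry, watermark, text",
    seed=12345,
)
print(json.dumps(payload, indent=2))
# images = send_txt2img(payload)  # requires a WebUI running with --api
```

Reusing the same payload with the same seed and model should reproduce the image, which makes it practical to change one setting at a time and compare results.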
Img2img: Transform Existing Images
Img2img lets you use an existing image as a starting point:
- Upload an image
- Set denoising strength (0.1-0.9): lower = closer to original; higher = more transformation
- Prompt what you want the output to look like
Use cases:
- Style transfer (make a photo look like a painting)
- Upscale and enhance an image
- Modify specific elements while preserving overall composition
- Create variations of product photos
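One way to build intuition for denoising strength is to look at how many sampling steps actually run. In a common implementation approach (used, for example, by the img2img pipeline in Hugging Face diffusers), the input image is noised to an intermediate point and only the final fraction of the schedule is executed. The sketch below reproduces that arithmetic. The function name img2img_steps is hypothetical, and other interfaces may round or configure this differently.

```python
def img2img_steps(num_steps, denoising_strength):
    """Return (steps_run, steps_skipped) for an img2img pass.

    A strength of 0.0 leaves the input untouched; 1.0 runs the full
    schedule, which largely ignores the input image.
    """
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0.0 and 1.0")
    steps_run = min(int(num_steps * denoising_strength), num_steps)
    return steps_run, num_steps - steps_run


for strength in (0.0, 0.25, 0.5, 1.0):
    print(strength, img2img_steps(40, strength))
```

Because fewer steps run at low strength, the model has less opportunity to move away from the original, which matches the "lower = closer to original" rule above.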
Use Cases That Justify Stable Diffusion
Privacy-sensitive generation: Generating images of real people (with consent), proprietary product concepts, or confidential designs you can't share with external services.
Scale without cost: Generating thousands of images for training datasets, content pipelines, or creative exploration without per-image costs.
Fine-tuning on your assets: Training a LoRA on your brand's visual style, product images, or specific characters.
Developer integration: Building custom applications with SD via the API without external service dependency.
Who Should Use Stable Diffusion
Stable Diffusion is worth the learning curve for:
- Developers building AI-powered image applications
- Power users who generate images at scale
- Anyone with privacy requirements
- Enthusiasts who want maximum control and experimentation
For most professionals, Midjourney or DALL-E 3 delivers better results with less setup time.
Next lesson: Canva AI — design tools for non-designers and teams.