THE LAST LOCAL IMAGE GENERATION GUIDE YOU'LL EVER NEED

Local Image Generation Ultimate Guide

UPDATED: Jan 2026 | macOS + Windows + Linux | FLUX / SD3.5 / ComfyUI

Local image generation means running AI models like FLUX and Stable Diffusion on your own computer. Instead of sending prompts to a cloud service, your GPU processes everything locally. The models are files you download once, and generation happens offline.

$ cat /var/log/ai-image-gen/2026-state.md

# The Scene

In 2026, local image generation is not a hobby; it's infrastructure. The gap between cloud and local has largely closed. With the right setup, you can generate images that rival Midjourney's output, with zero per-image costs, complete privacy, and full creative control.

WHY LOCAL?
Unlimited Volume: Generate 1,000 images/day. No throttling. No credits.
Zero Censorship: No content filters. Generate exactly what you envision.
Total Privacy: Your prompts never leave your machine.
Fine-Tuning: Train custom LoRAs on your own subjects and styles.

The ecosystem has matured. FLUX emerged as the photorealism king. ComfyUI became the professional standard. LoRAs let anyone train custom styles in hours. The tools exist. You just need to know what to use and when.

$ diff --compare models/2026/*

# The Models

Two architectures dominate local generation: FLUX (by Black Forest Labs, the ex-Stability team) and Stable Diffusion 3.5 (by Stability AI). They're different tools for different jobs.

[F] FLUX BEST FOR PHOTOREALISM

FLUX is a 12B parameter transformer-based model that produces the most photorealistic AI images available. It excels at human anatomy, realistic lighting, and complex scenes. If you want images that look like photographs, FLUX is the answer.

FLUX.1 [schnell] | Apache 2.0

4 steps | ~30s on M3 Max | Commercial OK

Distilled for speed. Great quality at blazing fast generation times. The go-to for iteration and prototyping. Fully open source.

FLUX.1 [dev] | Non-Commercial

20-50 steps | ~2-5 min | Research/Personal

The quality benchmark. Perfect faces, intricate textures, superior composition. For comfortable local use, plan on about 24GB of VRAM on NVIDIA or 48-64GB of unified memory on Apple Silicon.

FLUX.1 [pro] / Ultra | API Only

Up to 4MP | Cloud | Commercial

The commercial API version. 2K resolution output. Best prompt adherence. Not available for local generation.

flux-strengths.txt

+ Photorealistic humans (faces, hands, anatomy)
+ Dramatic lighting and composition
+ Typography (can render text accurately)
+ High-resolution detail preservation
+ Natural skin textures and materials

- Heavy on resources (12B params, ~35GB disk)
- Slower than SD on equivalent hardware
- Fewer community LoRAs (ecosystem still growing)
- [dev] license restricts commercial use

[S] Stable Diffusion 3.5 BEST FOR ARTISTIC STYLES

SD3.5 continues the Stable Diffusion legacy with improved architecture. It produces vibrant, stylized images and has a massive ecosystem of LoRAs, ControlNets, and community tools. Better for artistic work than pure photorealism.

SD 3.5 Large | Community License

8B params | ~40s | Revenue cap: $1M

The flagship. Great at stylized imagery, illustrations, and creative compositions. Huge LoRA ecosystem for custom styles.

SD 3.5 Medium | Community License

2B params | ~15s | Consumer hardware

Optimized for lower-end hardware. Runs on 8GB VRAM cards. Good balance of quality and speed for most users.

sd35-strengths.txt

+ Vibrant colors and artistic flair
+ Massive LoRA ecosystem (thousands available)
+ Runs on consumer hardware (8GB VRAM)
+ Great for illustrations and stylized work
+ Lower disk footprint than FLUX

- Struggles with photorealistic humans
- Finger/hand issues persist (though improved)
- Text rendering less reliable than FLUX
- Less precise prompt following

ASPECT | FLUX | SD 3.5 | VERDICT
Photorealism | Excellent | Good | FLUX by far
Human Anatomy | Superior | Improved | FLUX wins
Artistic Styles | Good | Excellent | SD for variety
Text in Images | Reliable | Hit or miss | FLUX clearly
Hardware Needs | Heavy (24GB+) | Moderate (8GB+) | SD more accessible
LoRA Ecosystem | Growing | Massive | SD dominates
Generation Speed | Slower | Faster | SD faster

Other Models Worth Knowing

Playground v3: Excellent aesthetics, rivals Midjourney for stylized work

Ideogram 2: Best-in-class text rendering, API only

Hunyuan-DiT: Tencent's open model, strong on Asian aesthetics

PixArt-Sigma: Efficient transformer, good quality/speed ratio

$ sysctl hw.memsize && nvidia-smi

# Hardware Reality

Your experience with local generation depends entirely on your hardware. There are two paths: NVIDIA GPU (Windows/Linux) or Apple Silicon (Mac). Both work. They have different trade-offs.

[N] NVIDIA GPU Path RECOMMENDED FOR SPEED

NVIDIA GPUs with CUDA remain the gold standard for AI image generation. Most models are optimized for CUDA first. If raw speed matters, this is the path.

Entry | $300-500

RTX 4060 Ti (16GB) / RTX 3060 (12GB)

Capability: SD3.5 Medium, FLUX schnell (slow)

~30-60s per image

Sweet Spot | $700-900 used

RTX 3090 (24GB)

Capability: All models comfortably, including FLUX dev

~10-20s per image

Best value in 2026. 70-80% of 4090 performance at 1/3 the price.

Professional | $1,800-2,500

RTX 4090 (24GB) / RTX 5090 (32GB)

Capability: Everything, batches, video generation

~5-10s per image

Key insight: VRAM capacity matters more than GPU generation. A 3090 (24GB) outperforms a 4070 (12GB) on large models because more of the model fits in GPU memory, which avoids slow offloading to system RAM.
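The "does it fit" question can be approximated with simple arithmetic: parameter count times bytes per parameter. The sketch below is a rough lower bound, not a benchmark. It assumes 1 GB = 1e9 bytes and counts only the main model weights, ignoring text encoders, the VAE, activations, and framework overhead. The function names are illustrative.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Lower bound only: text encoders, VAE, activations and framework
# overhead are NOT included.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}


def weight_gb(params_billions: float, precision: str = "fp16") -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * BYTES_PER_PARAM[precision] / 1e9


def fits_in_vram(params_billions: float, vram_gb: float, precision: str = "fp16") -> bool:
    """True if the weights alone fit; real workloads need extra headroom."""
    return weight_gb(params_billions, precision) <= vram_gb


if __name__ == "__main__":
    models = [("FLUX.1 (12B)", 12), ("SD 3.5 Large (8B)", 8), ("SD 3.5 Medium (2B)", 2)]
    for name, params in models:
        for vram in (12, 24):
            verdict = "fits" if fits_in_vram(params, vram) else "does not fit"
            print(f"{name}: ~{weight_gb(params):.0f} GB at fp16, {verdict} in {vram} GB")
```

By this estimate a 12B model needs about 24 GB at fp16 for the weights alone, which is why a 12GB card has to fall back on lower precision or offloading while a 24GB card is only just large enough.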

[A] Apple Silicon Path UNIFIED MEMORY ADVANTAGE

Macs use unified memory architecture—your RAM is your VRAM. This means a 64GB Mac can load models that would require a 64GB GPU on PC (which doesn't exist in consumer hardware). The trade-off is slower generation.

Minimum Viable | MacBook Pro 14"

M3/M4 Pro (18-24GB)

Capability: SD3.5 Medium, FLUX schnell (with optimization)

~2-5 min per image

Requires memory optimization. Functional but slow.

Recommended | MacBook Pro 16" / Mac Studio

M3/M4 Max (48-64GB)

Capability: All models including FLUX dev

~30-90s per image

The sweet spot for Mac users. Comfortable daily driver.

Professional | Mac Studio

M2/M3 Ultra (128-192GB)

Capability: Everything + multitasking + large batches

~20-40s per image

Mac reality check: Apple Silicon is 3-5x slower than equivalent NVIDIA setups for image generation. But unified memory lets you run models that simply won't fit on consumer GPUs. Choose Mac if you value the ecosystem and power efficiency and can tolerate slower speeds.
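To see what those per-image times mean for a real workload, it helps to convert them to throughput. This is a minimal sketch using the approximate timings quoted in this section. The helper names are made up for illustration, and actual speeds depend on model, resolution, and step count.

```python
def images_per_hour(seconds_per_image: float) -> float:
    """Throughput for a given per-image generation time."""
    return 3600 / seconds_per_image


def hours_for_batch(n_images: int, seconds_per_image: float) -> float:
    """Wall-clock hours to generate n_images back to back."""
    return n_images * seconds_per_image / 3600


if __name__ == "__main__":
    # Approximate mid-range figures taken from the tiers above.
    setups = {"RTX 3090 (~15s)": 15, "M3/M4 Max (~60s)": 60}
    for label, secs in setups.items():
        print(f"{label}: {images_per_hour(secs):.0f} images/hour, "
              f"{hours_for_batch(1000, secs):.1f} h for 1,000 images")
```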

Storage Requirements

FLUX.1 schnell: ~23GB

FLUX.1 dev: ~34GB

SD 3.5 Large: ~16GB

ComfyUI + models: ~50-100GB

Reserve 100GB minimum for a comfortable setup with multiple models.
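Before downloading anything, a quick free-space check can save a failed multi-gigabyte transfer. The sketch below uses only the Python standard library. The 100 GB default mirrors the recommendation above, and the function names are illustrative.

```python
import shutil


def free_gb(path: str = ".") -> float:
    """Free space on the drive holding `path`, in GB (1 GB = 1e9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path: str = ".", required_gb: float = 100.0) -> bool:
    """True if the drive has at least `required_gb` free."""
    return free_gb(path) >= required_gb


if __name__ == "__main__":
    print(f"Free: {free_gb():.1f} GB, enough for a 100 GB setup: {has_room()}")
```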

$ ls -la /opt/ai/interfaces/

# The User Interfaces

You don't interact with models directly—you use a UI. The UI determines your workflow, what features you can access, and how much control you have. There are four main options.

ComfyUI PROFESSIONAL STANDARD

Node-based workflow editor. Complete control over every aspect of generation. The industry standard for studios and serious creators.

Learning Curve: 10-30 hours

Control Level: Total

Performance: 2x faster than A1111

+ Strengths

First to support new models (FLUX, video)
Shareable workflows as JSON
Best performance and memory efficiency
Modular—only load what you need

- Weaknesses

Steep initial learning curve
Node spaghetti gets complex
Less beginner-friendly documentation

Verdict: Learn this. It's where the industry is going. Video generation, complex workflows, new models—all arrive here first.

SD WebUI Forge BEST FOR BEGINNERS

Fork of A1111 with 30-75% better performance. Familiar interface, optimized backend. The best starting point for newcomers.

Learning Curve: 30 min

Control Level: High

Performance: 30-75% faster than A1111

+ Strengths

Familiar dropdown/slider interface
FLUX support (unlike base A1111)
Memory optimizations for 8GB cards
Extensions work out of box

- Weaknesses

Builds vary in stability; the July 2024 build (a9e0c38) is the one to pin
Less flexible than ComfyUI
Future development is uncertain

Verdict: Start here if you're new. Graduate to ComfyUI when you need more control.

AUTOMATIC1111 LEGACY

The original. Massive extension library, tons of documentation, battle-tested. But showing its age—use Forge instead for new setups.

Learning Curve: 30 min

Control Level: High

Performance: Baseline

Verdict: Only use if you have existing A1111 workflows. For new users, Forge is strictly better.

Fooocus ZERO CONFIG

"Midjourney-like" experience. Minimal interface, smart defaults, just write a prompt and go. Perfect for non-technical users.

Learning Curve: 5 min

Control Level: Basic

Performance: Good

Verdict: Great for quick results or non-technical users. You'll outgrow it if you get serious.

RECOMMENDED PATH

1. Start with Forge to learn basics

2. Move to ComfyUI for production work

3. Build custom workflows and share

$ ./setup.sh --quickstart

# Getting Started

Here's the practical path to your first local image. We'll use ComfyUI with FLUX schnell—the best balance of quality and accessibility.

Get Model Access

FLUX models are "gated"—you need to accept terms before downloading. This takes 2 minutes.

huggingface-setup

# 1. Create account at huggingface.co

# 2. Visit model page and click "Agree and access"

$ open https://huggingface.co/black-forest-labs/FLUX.1-schnell

# 3. Create access token

$ open https://huggingface.co/settings/tokens

# 4. Login via CLI

$ pip install huggingface_hub

$ huggingface-cli login

Enter token: ************************************

Install ComfyUI

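ComfyUI is a Python application. On Windows, the portable release bundles Python and everything else you need.

Windows

# Option 1: download the portable release (includes Python) from the GitHub releases page

# Option 2: clone the repository and install the dependencies yourself

$ git clone https://github.com/comfyanonymous/ComfyUI

$ cd ComfyUI

$ pip install -r requirements.txt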

macOS / Linux

$ git clone https://github.com/comfyanonymous/ComfyUI

$ cd ComfyUI

$ pip install -r requirements.txt

# Mac users: make sure PyTorch is installed (the standard macOS build includes the MPS backend)

$ pip install torch torchvision torchaudio
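After installing, it is worth confirming which accelerator PyTorch can actually see. The following is a small sketch, not part of ComfyUI. The function name `detect_backend` is made up here, and it returns "torch-missing" instead of failing when PyTorch is not installed.

```python
def detect_backend() -> str:
    """Report which compute backend PyTorch would use on this machine."""
    try:
        import torch  # imported lazily so the check works without PyTorch
    except ImportError:
        return "torch-missing"
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU path
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon path
    return "cpu"  # works, but expect very slow generation


if __name__ == "__main__":
    print(f"Detected backend: {detect_backend()}")
```

If this prints "cpu" on a machine with a supported GPU, the PyTorch build is probably the wrong one for your platform.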

Download Models

Place models in the correct ComfyUI folders. FLUX needs the base model plus text encoders.

model-download

# Using huggingface-cli (recommended)

$ cd ComfyUI/models/unet

$ huggingface-cli download black-forest-labs/FLUX.1-schnell \
    flux1-schnell.safetensors --local-dir .

# Download text encoders

$ cd ../clip

$ huggingface-cli download comfyanonymous/flux_text_encoders \
    t5xxl_fp16.safetensors clip_l.safetensors --local-dir .

# Download VAE

$ cd ../vae

$ huggingface-cli download black-forest-labs/FLUX.1-schnell \
    ae.safetensors --local-dir .

Resulting folder structure:

ComfyUI/models/
├── unet/
│   └── flux1-schnell.safetensors
├── clip/
│   ├── t5xxl_fp16.safetensors
│   └── clip_l.safetensors
└── vae/
    └── ae.safetensors
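A misplaced file is the most common reason a FLUX workflow fails to load, so a quick layout check is handy. The sketch below compares a models directory against the tree shown above. The `EXPECTED` mapping and the `missing_files` helper are illustrative names, and the default path assumes you run it from the folder containing ComfyUI.

```python
from pathlib import Path

# Files this guide's FLUX schnell setup expects, keyed by models/ subfolder.
EXPECTED = {
    "unet": ["flux1-schnell.safetensors"],
    "clip": ["t5xxl_fp16.safetensors", "clip_l.safetensors"],
    "vae": ["ae.safetensors"],
}


def missing_files(models_dir) -> list[str]:
    """Return the relative paths of expected model files that are absent."""
    root = Path(models_dir)
    return [
        f"{sub}/{name}"
        for sub, names in EXPECTED.items()
        for name in names
        if not (root / sub / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_files("ComfyUI/models")
    print("All model files in place" if not missing else f"Missing: {missing}")
```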

Generate Your First Image

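Launch ComfyUI and load a FLUX workflow. The stock default graph expects a single checkpoint file rather than the separate unet, clip, and vae files downloaded above, so use a FLUX-specific workflow or grab an optimized one from the community.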

launch

$ cd ComfyUI

$ python main.py

Starting server...

To see the GUI go to: http://127.0.0.1:8188

ComfyUI ready

Quick workflow: Visit openart.ai/workflows and search "FLUX schnell" for ready-to-use workflows

Mac users: Add --force-fp16 flag if you get memory errors
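Once the server is running, workflows can also be queued from a script instead of the browser. This is a hedged sketch using only the standard library. It assumes the server accepts a JSON POST at `/prompt` with the workflow under a `"prompt"` key, the pattern used by ComfyUI's bundled API example scripts, and that the workflow was exported from the UI in API format. The function names are illustrative.

```python
import json
from urllib import request

COMFY_URL = "http://127.0.0.1:8188"  # address printed at startup


def build_prompt_request(workflow: dict, base_url: str = COMFY_URL) -> request.Request:
    """Wrap an API-format workflow in a POST request for the /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )


def queue_workflow(workflow: dict, base_url: str = COMFY_URL) -> dict:
    """Send the workflow to a running server and return its JSON reply."""
    with request.urlopen(build_prompt_request(workflow, base_url)) as resp:
        return json.loads(resp.read())
```

To use it, load your exported workflow with `json.load` and pass the resulting dict to `queue_workflow`. The call raises an error if the server is not running.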

$ man advanced-techniques

# Advanced Concepts

Once you're generating images, these techniques let you customize, control, and improve your results dramatically.

[L]

LoRAs

STYLE CUSTOMIZATION

Low-Rank Adaptation. Small add-on files (50-200MB) that teach the model new styles, characters, or concepts without replacing the base model.

Character likeness | Art styles | Product photos | Specific aesthetics

Find LoRAs: civitai.com, huggingface.co

Tip: Stack multiple LoRAs with different weights (0.3-0.8) for unique combinations.
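Why LoRA files are so small, and what "stacking with weights" means, is easiest to see in the arithmetic. Each LoRA stores two thin matrices B and A whose product is added to a base weight matrix, scaled by the weight you choose. The sketch below is a simplified pure-Python illustration: real implementations also apply an alpha/rank scaling factor and operate on many layers, and all names here are made up for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_loras(base, loras):
    """Return base + sum(weight * B @ A) over all stacked LoRAs.

    base:  d_out x d_in weight matrix
    loras: list of (B, A, weight), B is d_out x r and A is r x d_in
    """
    merged = [row[:] for row in base]  # copy so the base stays untouched
    for B, A, weight in loras:
        delta = matmul(B, A)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += weight * value
    return merged


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Parameters stored by one LoRA pair for a d_out x d_in layer."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    full = 4096 * 4096
    low_rank = lora_param_count(4096, 4096, rank=16)
    print(f"Full layer: {full:,} params, rank-16 LoRA: {low_rank:,} params")
```

For a 4096 x 4096 layer, a rank-16 LoRA stores 131,072 values instead of 16,777,216, which is where the small file sizes come from.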

[C]

ControlNet

COMPOSITION CONTROL

Use reference images to control pose, depth, edges, or composition of generated images. Essential for consistent character poses and precise layouts.

Pose matching | Depth maps | Edge detection | Segmentation

Tip: Canny edge + OpenPose combo gives precise control over both composition and human poses.

[I]

IP-Adapter

IMAGE PROMPTING

Use images as prompts instead of (or alongside) text. Feed a reference image and the model generates variations matching its style, subject, or composition.

Style transfer | Face consistency | Subject variations

Tip: IP-Adapter FaceID + LoRA = consistent character across many images.

[2]

Img2Img

ITERATIVE REFINEMENT

Feed an existing image and denoise it partially. Low strength (0.3-0.5) makes subtle changes; high strength (0.7-0.9) rebuilds most of the image.

Fix faces | Add detail | Style transfer | Upscaling

Tip: Generate with schnell, refine with dev at 0.4 strength for best quality/speed balance.
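One common way to implement denoise strength is to skip the early part of the sampling schedule: the input image is noised to an intermediate level and only the remaining steps are run. The sketch below follows that convention. It is an assumption about how strength maps to steps, not a description of any specific UI's sampler, and `img2img_schedule` is a hypothetical name.

```python
def img2img_schedule(total_steps: int, strength: float) -> tuple[int, int]:
    """Return (start_step, steps_to_run) for a given denoise strength.

    strength 1.0 runs every step (image fully rebuilt);
    strength 0.0 runs none (image returned unchanged).
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    steps_to_run = min(int(total_steps * strength), total_steps)
    start_step = total_steps - steps_to_run
    return start_step, steps_to_run


if __name__ == "__main__":
    for strength in (0.25, 0.5, 1.0):
        start, run = img2img_schedule(20, strength)
        print(f"strength {strength}: start at step {start}, run {run} of 20")
```

Under this convention, lower strength also means less compute per image, which is part of why a low-strength refinement pass is cheap.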

[M]

Inpainting

SELECTIVE EDITING

Regenerate only masked portions of an image. Fix hands, change backgrounds, add or remove objects while keeping the rest intact.

Hand fixing | Background swap | Object removal | Adding elements

Tip: Use generous mask padding (32-64px) for seamless blending.
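Mask padding amounts to growing the masked region outward while staying inside the image. The sketch below does this for a rectangular bounding box. It is a simplified illustration (real tools usually dilate and feather the mask itself), and `pad_box` is a made-up helper name.

```python
def pad_box(box, padding, image_size):
    """Grow a (left, top, right, bottom) box by `padding` px, clamped to the image."""
    left, top, right, bottom = box
    width, height = image_size
    return (
        max(0, left - padding),
        max(0, top - padding),
        min(width, right + padding),
        min(height, bottom + padding),
    )


if __name__ == "__main__":
    hand_region = (400, 600, 520, 740)  # example mask bounds in a 1024x1024 image
    print(pad_box(hand_region, 48, (1024, 1024)))
```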

[U]

Upscaling

RESOLUTION BOOST

Generate at 1024x1024, upscale to 4K. Models like RealESRGAN, SUPIR, and tiled diffusion can add detail while increasing resolution.

4x upscale | Detail enhancement | Print-ready output

Tip: Ultimate SD Upscale in ComfyUI tiles the image for memory-efficient 4x upscaling.
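Tiled upscaling keeps memory use flat by processing the large image in overlapping patches. The sketch below shows one way to lay out such a grid. It is not the algorithm used by Ultimate SD Upscale or any other node, and the tile size, overlap, and function names are assumptions for illustration.

```python
def tile_origins(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets along one axis so tiles cover `length` with overlap."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    if tile >= length:
        return [0]
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # final tile sits flush with the edge
    return origins


def tile_boxes(width: int, height: int, tile: int = 1024, overlap: int = 64):
    """(left, top, right, bottom) boxes covering the whole image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


if __name__ == "__main__":
    boxes = tile_boxes(4096, 4096)
    print(f"{len(boxes)} tiles, last one: {boxes[-1]}")
```

With these defaults a 4096x4096 target is processed as 25 tiles, so peak memory depends on the tile size rather than the final resolution.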

$ cat ~/.config/ai-gen/tips.md

# Pro Tips

Front-load your prompts

CLIP text encoders truncate prompts at 77 tokens. Put the most important details (subject, action, setting) in the first 20 words. Style and mood can come later.

Batch then cherry-pick

Generate 4-8 images at once, pick the best, then refine with img2img. Faster than iterating on single images.

Save your seeds

Found a great composition? Save the seed. You can regenerate with the same structure while changing styles, colors, or details.
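Batching and seed-saving work well together if the seeds are generated and logged in a repeatable way. The sketch below derives a batch of seeds from one master seed and serializes a record you can store next to the image. The helper names and record fields are illustrative, not a format any UI requires.

```python
import json
import random


def make_seeds(count: int, master_seed: int) -> list[int]:
    """Derive `count` reproducible 32-bit seeds from a single master seed."""
    rng = random.Random(master_seed)
    return [rng.randrange(2**32) for _ in range(count)]


def seed_record(prompt: str, seed: int, **settings) -> str:
    """Serialize everything needed to regenerate an image as a JSON string."""
    return json.dumps({"prompt": prompt, "seed": seed, **settings}, sort_keys=True)


if __name__ == "__main__":
    for seed in make_seeds(4, master_seed=2026):
        print(seed_record("lighthouse at dusk", seed, steps=4, model="flux1-schnell"))
```

Keeping one record per image means the keeper from a batch can be regenerated later without guessing which settings produced it.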

Use negative prompts strategically

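Don't overload negatives. Focus on the specific artifacts you're seeing: "blurry, watermark, extra fingers" is better than a wall of text. This applies mainly to Stable Diffusion workflows; FLUX is typically run without a negative prompt.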

schnell -> dev workflow

Iterate ideas with FLUX schnell (fast), then render finals with dev (quality). Use the same seed for consistency.

Memory over CPU

When upgrading, prioritize RAM/VRAM over CPU/GPU generation. A 64GB M4 Max beats a 32GB M4 Max for AI work, regardless of core count.


Frequently Asked Questions