TL;DR - Multi-Modal Prompting Guide
AI can now see, not just read. This guide teaches you to work with AI models that understand images, analyze documents, and generate visuals. Ready-to-use prompts for both image analysis and image generation. For foundational prompting skills, see the Prompt Engineering Fundamentals guide.
What you’ll learn:
- Image Analysis — Get AI to understand and describe images
- Document Processing — Extract insights from PDFs, screenshots, documents
- Image Generation — Create images with DALL-E, Midjourney, and more
- Visual Work — Analyze UI, charts, diagrams, and designs
- Combining Modalities — Text + image prompts for best results
Models covered:
- Image Understanding: GPT-4 Vision, Claude 3, Gemini Pro Vision
- Image Generation: DALL-E 3, Midjourney, Stable Diffusion
For more on AI image generation tools, see the AI Image Generation guide.
💡 Key insight: The best multi-modal prompts combine visual input with specific, context-rich questions. Don’t just ask “What’s in this image?”—tell the AI what you’re trying to learn or accomplish.
Image Analysis Prompts
Use these prompts with GPT-4 Vision, Claude 3, or Gemini Pro Vision.
General Image Analysis
===========================================
PROMPT: Comprehensive Image Analysis
===========================================
Analyze this image in detail.
Provide:
1. DESCRIPTION
What's in this image? Describe the main elements, subjects, and setting.
2. DETAILS
Notable details that might be missed at first glance.
3. CONTEXT CLUES
What can you infer about:
- When this was taken/created
- Where this might be
- The purpose or context
4. TECHNICAL ASPECTS
- Image quality and composition
- Lighting and color
- Style (photo, illustration, screenshot, etc.)
5. TEXT CONTENT
Any visible text, signs, labels, or writing.
6. NOTABLE OBSERVATIONS
Anything unusual, interesting, or significant.
===========================================
UI/UX Screenshot Analysis
===========================================
PROMPT: UI/UX Review
===========================================
Analyze this UI screenshot as a UX expert.
# REPLACE: Add context about the product/screen
# Context: "This is the checkout page for our e-commerce site.
# We're seeing a 40% cart abandonment rate."
Evaluate:
1. FIRST IMPRESSIONS
What would a user think in the first 3 seconds?
2. USABILITY ASSESSMENT
| Element | Issue (if any) | Severity | Suggestion |
- Navigation clarity
- Visual hierarchy
- Call-to-action visibility
- Form design (if applicable)
- Error states (if visible)
3. ACCESSIBILITY CONCERNS
- Color contrast issues
- Text readability
- Touch target sizes
- Missing labels or indicators
4. DESIGN CONSISTENCY
- Visual consistency
- Spacing and alignment
- Typography hierarchy
- Color usage
5. FRICTION POINTS
Where might users get confused or stuck?
6. RECOMMENDATIONS
Priority improvements (ranked):
| Priority | Change | Expected Impact | Effort |
7. POSITIVE ELEMENTS
What's working well that should be preserved?
===========================================
For more on AI-powered design workflows, see the AI for Design guide.
Chart and Data Visualization Analysis
===========================================
PROMPT: Chart Analysis
===========================================
Analyze this chart/graph and extract insights.
# REPLACE: Add context about what you're looking for
# Context: "This is our monthly revenue data for 2024.
# Preparing for board presentation."
Provide:
1. CHART OVERVIEW
- Type of chart
- What data is being visualized
- Time period or scope
2. DATA EXTRACTION
| Category/Period | Value | Notable |
Extract key data points visible in the chart.
3. TRENDS AND PATTERNS
- Overall trend (up, down, stable, volatile)
- Seasonal patterns
- Anomalies or outliers
- Rate of change
4. KEY INSIGHTS
Top 3 insights from this visualization:
1. [Most important finding]
2. [Second insight]
3. [Third insight]
5. COMPARATIVE ANALYSIS
- How do different segments compare?
- What's performing best/worst?
- Any crossover points or inflections?
6. QUESTIONS RAISED
What questions does this data prompt?
- [Question 1]
- [Question 2]
7. PRESENTATION SUMMARY
One paragraph summary suitable for executives.
8. LIMITATIONS
What can't we conclude from this visualization?
===========================================
Product Photo Analysis
===========================================
PROMPT: Product Photo Analysis
===========================================
Analyze this product photo for e-commerce or marketing use.
1. PRODUCT IDENTIFICATION
- What is the product?
- Category and type
- Visible features and specifications
2. PHOTO QUALITY ASSESSMENT
| Aspect | Rating (1-5) | Notes |
| Lighting | | |
| Focus/Sharpness | | |
| Background | | |
| Composition | | |
| Color accuracy | | |
3. E-COMMERCE READINESS
- Suitable for main listing photo? (Y/N + why)
- Recommended image type: (hero, lifestyle, detail, etc.)
- What's missing for a complete listing?
4. VISIBLE PRODUCT DETAILS
- Size indicators
- Materials visible
- Colors and variants
- Branding elements
5. SUGGESTED IMPROVEMENTS
For a better product photo:
- [Improvement 1]
- [Improvement 2]
6. DESCRIPTION ELEMENTS
Key features to highlight in product description:
- [Feature 1]
- [Feature 2]
- [Feature 3]
7. SEO KEYWORDS
Suggested keywords based on what's visible:
- [Keyword 1]
- [Keyword 2]
===========================================
Compare Multiple Images
===========================================
PROMPT: Image Comparison
===========================================
Compare these [X] images and identify differences and similarities.
# REPLACE: Add comparison context
# Context: "These are before/after photos of our office renovation"
# OR: "Compare these competitor product photos"
# OR: "These are design iterations - which is best?"
1. OVERVIEW
Brief description of each image:
- Image 1: [Description]
- Image 2: [Description]
2. SIMILARITIES
| Aspect | Present in Both | Notes |
3. DIFFERENCES
| Aspect | Image 1 | Image 2 | Significance |
4. QUALITY COMPARISON
| Criteria | Image 1 | Image 2 | Better |
| Composition | | | |
| Lighting | | | |
| Clarity | | | |
5. PURPOSE FIT
For [stated purpose]:
- Best option: [Which image]
- Reason: [Why]
6. DETAILED CHANGE LOG (for before/after)
| Element | Before | After | Improvement? |
7. RECOMMENDATION
Based on [context], I recommend [choice] because [reasoning].
===========================================
Document Analysis Prompts
For PDFs, documents, and multi-page content uploaded to AI.
Document Summary
===========================================
PROMPT: Document Summary
===========================================
Summarize this document comprehensively.
# REPLACE: Add what you need the summary for
# Context: "Need to brief my manager on this 40-page report"
1. DOCUMENT METADATA
- Title/Type: [What is this document?]
- Length: [Pages/sections]
- Author/Source: [If visible]
- Date: [If visible]
2. EXECUTIVE SUMMARY
[3-5 sentences capturing the essential message]
3. KEY POINTS
The most important takeaways:
1. [Point 1]
2. [Point 2]
3. [Point 3]
4. [Point 4]
5. [Point 5]
4. SECTION BREAKDOWN
| Section | Topic | Key Points |
5. DATA AND FIGURES
Notable statistics, numbers, or data:
- [Stat 1]
- [Stat 2]
6. CONCLUSIONS/RECOMMENDATIONS
What does the document conclude or recommend?
7. ACTION ITEMS
Any actions suggested by this document:
- [Action 1]
- [Action 2]
8. WHAT'S MISSING
Questions the document doesn't answer:
- [Gap 1]
9. BRIEFING VERSION
[One paragraph you could share verbally in 30 seconds]
===========================================
Extract Data from Document
===========================================
PROMPT: Data Extraction
===========================================
Extract structured data from this document.
# REPLACE: Specify what data you need
# Needed: "All customer names, contact info, and purchase amounts"
# OR: "All dates and deadlines mentioned"
# OR: "Financial figures and their context"
1. EXTRACTED DATA TABLE
| [Column 1] | [Column 2] | [Column 3] | [Column 4] |
| --- | --- | --- | --- |
[Populate with extracted data]
2. RAW DATA LIST
All instances of [data type]:
- [Item 1]
- [Item 2]
- [Item 3]
3. DATA QUALITY NOTES
- Confidence level: [High/Medium/Low]
- Unclear items: [List any ambiguous extractions]
- Missing data: [What couldn't be found]
4. CONTEXT
For each data point, relevant context:
- [Data point]: [Context about where/how it appears]
5. SUGGESTED FORMAT
If you need this data in a specific format:
CSV:
[CSV format]
JSON:
[JSON format]
===========================================
Document Q&A
===========================================
PROMPT: Document Question Answering
===========================================
I've uploaded a document. Answer my questions based on its content.
# REPLACE: Add your questions
# Questions:
# 1. What is the main conclusion?
# 2. What budget was approved?
# 3. Who is responsible for implementation?
# 4. What are the risks mentioned?
# 5. When is the deadline?
For each question:
**Q: [Question]**
A: [Answer based on document]
Source: "[Quote or reference from document]" (page/section X)
Confidence: [High/Medium/Low]
---
[Repeat for each question]
---
ADDITIONAL NOTES:
- Questions I couldn't answer: [List]
- Related information you might want: [Proactive additions]
===========================================
Image Generation Prompts
DALL-E 3 Prompts
DALL-E responds well to natural, descriptive language. For more on advanced prompting techniques, see the Advanced Prompt Engineering guide.
===========================================
TEMPLATE: DALL-E Product Photo
===========================================
Create a professional product photograph:
Subject: [PRODUCT]
Setting: [BACKGROUND/ENVIRONMENT]
Style: [PHOTOGRAPHY STYLE]
Lighting: [LIGHTING DESCRIPTION]
Composition: [FRAMING AND ANGLE]
Mood: [ATMOSPHERE/FEELING]
Additional details:
- [SPECIFIC DETAIL 1]
- [SPECIFIC DETAIL 2]
---
EXAMPLE:
Create a professional product photograph:
Subject: A minimalist ceramic coffee mug, matte white, with a curved handle
Setting: Clean white marble countertop with soft morning light from a window
Style: Commercial product photography, editorial quality
Lighting: Soft natural light from the left, subtle shadows, bright and airy
Composition: 3/4 angle view, centered subject with negative space
Mood: Calm, sophisticated, morning coffee moment
Additional details:
- Steam rising gently from the mug
- A small green plant blurred in the background
- Subtle reflection on the marble surface
===========================================
TEMPLATE: DALL-E Marketing Visual
===========================================
Create a marketing visual for:
Purpose: [AD TYPE/USE CASE]
Product/Brand: [WHAT IS BEING PROMOTED]
Target Audience: [WHO THIS IS FOR]
Visual Style: [ART DIRECTION]
Color Palette: [COLORS TO USE]
Key Elements: [MUST-INCLUDE ITEMS]
Text Space: [WHERE TEXT WILL BE ADDED]
Mood/Emotion: [FEELING TO EVOKE]
Format: [DIMENSIONS/ORIENTATION]
Avoid: [WHAT NOT TO INCLUDE]
---
EXAMPLE:
Create a marketing visual for:
Purpose: Instagram ad for a fitness app
Product/Brand: "FitPulse" - AI-powered workout coaching app
Target Audience: Young professionals (25-35) who work out at home
Visual Style: Modern, energetic, tech-forward with human element
Color Palette: Electric blue (#0066FF), white, with orange accents
Key Elements: Person in workout clothes with smartphone, living room setting
Text Space: Clear space on left side for headline and CTA
Mood/Emotion: Motivated, confident, achievable fitness
Format: 1080x1350 (portrait for Instagram)
Avoid: Gym setting, heavy weights, unrealistic body types
===========================================
TEMPLATE: DALL-E Illustration
===========================================
Create an illustration:
Subject: [WHAT TO ILLUSTRATE]
Style: [ART STYLE]
Medium: [TRADITIONAL MEDIUM TO EMULATE]
Color Scheme: [COLOR APPROACH]
Composition: [LAYOUT AND ARRANGEMENT]
Level of Detail: [SIMPLE/MODERATE/DETAILED]
Mood: [ATMOSPHERE]
Technical notes:
- [SPECIFIC STYLE ELEMENT 1]
- [SPECIFIC STYLE ELEMENT 2]
---
EXAMPLE:
Create an illustration:
Subject: A cozy home office setup with plants, books, and warm lighting
Style: Modern flat illustration with subtle gradients
Medium: Digital illustration resembling gouache painting
Color Scheme: Warm neutrals (beige, terracotta, forest green) with cream highlights
Composition: Isometric view, room corner visible, desk as focal point
Level of Detail: Moderate - recognizable objects without photorealism
Mood: Peaceful, productive, aspirational work-from-home
Technical notes:
- Soft, rounded corners on furniture
- No outlines, shapes defined by color
- Window showing soft golden hour light
Midjourney Prompts
Midjourney uses specific parameters and benefits from artistic references.
===========================================
TEMPLATE: Midjourney Photo-Realistic
===========================================
[SUBJECT DESCRIPTION], [SETTING/ENVIRONMENT],
[LIGHTING DESCRIPTION], [CAMERA/LENS],
[PHOTOGRAPHY STYLE], [MOOD/ATMOSPHERE]
--ar [ASPECT RATIO] --v 6 --style raw
---
EXAMPLE:
Portrait of a woman with silver hair, Japanese garden in autumn,
soft diffused natural light through maple trees, shot on Canon 5D Mark IV 85mm f/1.4,
editorial fashion photography, serene and contemplative mood
--ar 2:3 --v 6 --style raw
---
COMMON PARAMETERS:
--ar 16:9 (widescreen)
--ar 2:3 (portrait photo)
--ar 1:1 (square)
--ar 3:2 (landscape photo)
--v 6 (version 6)
--style raw (more photorealistic)
--q 2 (higher quality, slower)
--s 250 (stylization level, 0-1000)
--c 20 (chaos/variation, 0-100)
===========================================
TEMPLATE: Midjourney Artistic/Stylized
===========================================
[SUBJECT], [ARTISTIC STYLE] style,
[ARTIST REFERENCE] inspired, [COLOR PALETTE],
[MOOD/ATMOSPHERE], [MEDIUM/TECHNIQUE]
--ar [ASPECT] --v 6 --s [STYLIZATION]
---
EXAMPLE:
Ancient temple ruins reclaimed by nature, Studio Ghibli style,
Hayao Miyazaki inspired, lush greens and warm golden light,
magical and peaceful atmosphere, painted backgrounds with fine detail
--ar 16:9 --v 6 --s 750
---
Another example:
Cyberpunk street market at night, Blade Runner aesthetic,
Syd Mead inspired, neon pink and blue with rain-slicked surfaces,
busy and atmospheric, cinematic concept art
--ar 21:9 --v 6 --s 500
===========================================
TEMPLATE: Midjourney Logo/Icon
===========================================
Minimal [STYLE] logo of [SUBJECT],
[COLOR SCHEME], clean vector style,
centered on [BACKGROUND], simple and memorable
--ar 1:1 --v 6 --s 50
---
EXAMPLE:
Minimal geometric logo of a mountain peak inside a circle,
deep blue and white, clean vector style,
centered on solid white background, simple and memorable
--ar 1:1 --v 6 --s 50
---
TIP: Lower stylization (--s 50-250) for logos
Higher stylization (--s 500-1000) for artistic images
Stable Diffusion Prompts
Stable Diffusion works best with comma-separated descriptors.
===========================================
TEMPLATE: Stable Diffusion General
===========================================
POSITIVE PROMPT:
[SUBJECT], [STYLE], [QUALITY BOOSTERS],
[LIGHTING], [CAMERA], [ADDITIONAL DETAILS]
NEGATIVE PROMPT:
[WHAT TO AVOID]
---
EXAMPLE:
POSITIVE PROMPT:
professional portrait photo of a business woman in modern office,
photorealistic, high quality, 8k uhd, dslr,
soft studio lighting, sharp focus, natural skin texture,
wearing navy blazer, confident expression
NEGATIVE PROMPT:
cartoon, anime, illustration, painting, drawing,
low quality, blurry, distorted, deformed,
unusual proportions, bad hands, watermark, text
---
QUALITY BOOSTERS (add these for better results):
- masterpiece, best quality, highly detailed
- 8k uhd, dslr, high resolution
- professional photography
- sharp focus
- intricate details
Combining Text and Images
Image + Context Prompt
===========================================
PROMPT: Contextual Image Analysis
===========================================
[Upload image]
I'm sharing this image because: [CONTEXT]
# REPLACE: Add your specific context
# Example: "This is a competitor's product page. I need to understand
# what they're doing well for our own product page redesign."
Based on this context, please:
1. [SPECIFIC QUESTION OR TASK 1]
2. [SPECIFIC QUESTION OR TASK 2]
3. [SPECIFIC QUESTION OR TASK 3]
Focus particularly on: [SPECIFIC FOCUS AREA]
---
EXAMPLE:
I'm sharing this image because: This is a mockup my designer sent
for our app's new onboarding flow. We're deciding whether to approve
or request changes before development.
Based on this context, please:
1. Identify any usability issues that could cause user drop-off
2. Check if the flow matches mobile UX best practices
3. Suggest 3 specific improvements with reasoning
Focus particularly on: The transition between screens and whether
users will understand what to do at each step.
Iterating on Generated Images
===========================================
PROMPT: Image Iteration
===========================================
[Reference the previous image or upload it]
Keep: [WHAT TO PRESERVE]
Change: [WHAT TO MODIFY]
Add: [WHAT TO INCLUDE]
Remove: [WHAT TO ELIMINATE]
Additional direction:
[MORE SPECIFIC GUIDANCE]
---
EXAMPLE:
The image is good, but please adjust:
Keep: The overall composition and color scheme
Change: Make the lighting warmer, more golden hour
Add: A subtle lens flare in the top right
Remove: The person in the background
Additional direction:
The mood should feel more intimate and inviting.
Current version feels too clinical.
Multi-Modal Workflow Prompts
Image to Content Pipeline
===========================================
PROMPT: Image to Blog Post
===========================================
[Upload image]
Create a blog post based on this image.
Image context: [WHAT THIS IMAGE IS]
Blog audience: [WHO READS THE BLOG]
Blog tone: [VOICE AND STYLE]
Desired length: [WORD COUNT]
Generate:
1. BLOG POST TITLE
3 options with different angles
2. INTRODUCTION
Hook that draws readers in
3. BODY CONTENT
[X] paragraphs expanding on what's shown
4. KEY TAKEAWAYS
Bullet points of main messages
5. CALL TO ACTION
What should readers do next?
6. SEO ELEMENTS
- Meta description
- Alt text for the image
- Suggested tags/categories
---
EXAMPLE:
Image context: Photo from our company's annual team retreat
Blog audience: Potential job candidates visiting our careers page
Blog tone: Warm, authentic, showing real company culture
Desired length: 500-700 words
Visual Feedback Collection
===========================================
PROMPT: Design Feedback Request
===========================================
[Upload design image]
I need feedback on this design before presenting to stakeholders.
Design purpose: [WHAT THIS IS FOR]
Stage: [EARLY CONCEPT / REFINED / FINAL]
Specific concerns: [WHAT YOU'RE UNSURE ABOUT]
Please provide:
1. FIRST IMPRESSION
Gut reaction in one sentence
2. STRENGTHS
What's working well (with specifics)
3. CONCERNS
Potential issues (with severity: critical/moderate/minor)
4. SUGGESTIONS
Specific, actionable improvements
5. STAKEHOLDER LENS
How might [STAKEHOLDER TYPE] react to this?
6. QUESTIONS TO ASK
What should I discuss with the team before proceeding?
---
EXAMPLE:
Design purpose: New checkout page for mobile app
Stage: Refined - presenting to product team tomorrow
Specific concerns: Not sure if the form fields are too small
for thumb input on smaller phones
Quick Reference
Image Analysis
| Need | Prompt Type |
|---|---|
| General understanding | Comprehensive Image Analysis |
| UI/UX feedback | UI/UX Screenshot Analysis |
| Data from charts | Chart and Data Analysis |
| Product photos | Product Photo Analysis |
| Before/after comparison | Image Comparison |
Document Analysis
| Need | Prompt Type |
|---|---|
| Quick summary | Document Summary |
| Get specific data | Data Extraction |
| Answer questions | Document Q&A |
Image Generation
| Tool | Best For |
|---|---|
| DALL-E 3 | Natural descriptions, product shots, marketing |
| Midjourney | Artistic, stylized, photorealistic |
| Stable Diffusion | Customizable, local control, specific styles |
Tips for Multi-Modal Prompting
1. Provide Context with Images
❌ "What is this?"
✅ "This is our competitor's landing page. What are they doing
well that we should consider for our redesign?"
2. Be Specific About What You’re Looking For
❌ "Analyze this chart"
✅ "Analyze this chart for trends in Q3-Q4. I need insights
for a board presentation about revenue growth."
3. Combine Image + Text Strategically
The image provides visual context.
The text provides intent, constraints, and specific questions.
Together they get better results than either alone.
4. For Image Generation: Details Matter
❌ "A cat"
✅ "A fluffy orange tabby cat curled up on a velvet armchair,
afternoon sunlight streaming through lace curtains,
soft focus background, cozy cottage atmosphere"
5. Reference Visual Styles
"In the style of [artist/photographer/brand]"
"Resembling [specific artwork or photo style]"
"With the aesthetic of [reference]"
6. Iterate Systematically
Generate → Review → Specify what to keep → Request changes
Build on what works rather than starting over.
7. Use Multiple Passes
First pass: Get the overall concept right
Second pass: Refine details and style
Third pass: Final polish and variations
What’s Next
- 📚 Prompt Templates & Variables — Create reusable prompts
- 📚 Mega-Prompt Engineering — Build complete AI assistants
- 📚 AI Prompts for Marketers — Marketing-specific image prompts
- 🛠️ AI Image Tools — Put these prompts into action
Found this guide helpful? Share it with your creative team!