GPT-4Vision Use Cases


Taxonomy of GPT-4V Use Cases

V For Vendetta Vision

Welcome to the 3,019 smart, curious folks. Got forwarded this email? Subscribe here

Do you know what to use GPT-4V for?

Just attach an image, simple right?

What do you get when you combine an intelligent model w/ the ability to see the world? An explosion of new use cases.

It’s hard to be on twitter and not see demos flying around. They’re awesome.

Every demo is an inspiration-seed planted in the brains of future founders and builders.

But! GPT-4V demos feel like puzzle pieces on the floor that dropped out of the box - I know there is a bigger picture but I can’t quite see it yet.

So I went on a mission to collect over 100 demos with the goal of putting the pieces together and bucketing the use cases I'm seeing.

Here’s a breakdown of how I’d split these up with examples.


Reply and let me know if you agree or disagree (I respond to every reply)

BTW as email subscribers you get access to the full use case list. Check it out here.

Meta thoughts before we get into the use cases:

  • Because of GPT-4V's friction (no API access yet, browser only for now), most of the examples we see are 1-shot. You give ChatGPT an image and a prompt and then you’re done. Once API access arrives, we’ll see new use cases open up.
  • Be careful: just like the early days of ChatGPT, the output looks good, but there are plenty of hallucinations sneaking around.
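Once API access does open up, that 1-shot pattern (one image + one prompt) will presumably map onto a single request. Here's a minimal sketch that just builds the request payload, assuming an OpenAI-style chat API that accepts image URLs alongside text; the model name and field layout are assumptions until the real API ships:

```python
import json

def build_vision_request(prompt: str, image_url: str) -> dict:
    """Build a 1-shot image + prompt request payload (assumed schema)."""
    return {
        "model": "gpt-4-vision",  # hypothetical model name
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_vision_request(
    "What objects are in this photo?",
    "https://example.com/photo.jpg",
)
print(json.dumps(payload, indent=2))
```

Nothing gets sent here; the point is that the whole interaction fits in one user turn, which is why today's demos are all single-shot.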

7 Categories of GPT-4V Use Cases

I originally tested this framework on Twitter — check out the tweet

1. Describe

First up is the base case, simply describe what's in an image. The examples I saw were either identification of a single object or an inventory of many objects.

Examples: Animal Identification, “What objects are in this photo?”, "Where was this photo taken?"

2. Interpret

The biggest category of the bunch. Here the model will explain the meaning within a photo or provide more context behind an image. These use cases take an image and go a layer deeper with synthesis.

Examples: Technical Flame Graph Interpretation, Schematic Interpretation, Twitter Thread Explainer, Car Crash Anticipation, Schematic Understanding

3. Recommend

Here we have two big categories: Critiques and Recommendations. With critiques, the user will ask GPT-4V for its opinion on an image. This could be a painting, landing page, logo, etc.

Users will also ask GPT-4V to give its opinion via a recommendation. Which piece of fruit is better? Which food menu item should I get?

Examples: Food Recommendations, Website Feedback (tons of these floating around), Painting Feedback

4. Convert

Here the model converts images into other forms (code, narrative, etc.) or generates something new. This category has massive implications.

My favorite is code generation from an image.

The key insight is that you can use models to generate config files - which are the basis for nearly all application interfaces.

Examples: Figma Screens > Code, Adobe Lightroom Settings, Suggest ad copy based on a webpage
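The config-file insight is easy to prototype: ask the model for machine-readable output, then validate it before it touches your application. A minimal sketch, with a hard-coded stand-in for the model's reply (in practice this string would come from GPT-4V looking at, say, a screenshot of photo-editing settings — the keys here are made up for illustration):

```python
import json

# Stand-in for a GPT-4V reply when asked to turn a screenshot of
# photo-editing settings into a config file (hypothetical example).
model_reply = """
{
  "exposure": 0.35,
  "contrast": 12,
  "white_balance": "daylight"
}
"""

def parse_config(reply: str) -> dict:
    """Validate that the model's reply is well-formed JSON before using it."""
    config = json.loads(reply)
    if not isinstance(config, dict):
        raise ValueError("expected a JSON object")
    return config

config = parse_config(model_reply)
print(config["white_balance"])  # -> daylight
```

The validation step matters: per the hallucination warning above, model output that merely looks like a config should never be trusted until it parses.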

5. Extract

Use the model to extract structured data based on entities within an image. I love this one as yet another way unstructured data becomes structured.

Examples: Structured Data From Driver's License, Extract structured items from an image, Handwriting Extraction, Spice Rack Extraction
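To make the extraction category concrete, here's a sketch of the driver's license example: prompt the model to return specific JSON keys, then load the reply into a typed record. The reply string is a made-up stand-in for a real GPT-4V response, and the field names are assumptions:

```python
import json
from dataclasses import dataclass

@dataclass
class DriversLicense:
    name: str
    license_number: str
    expiration: str

# Stand-in for a GPT-4V reply to the prompt: "Return the fields on this
# driver's license as JSON with keys name, license_number, expiration."
# (The image and values here are invented for illustration.)
model_reply = '{"name": "Jane Doe", "license_number": "D1234567", "expiration": "2027-01-31"}'

record = DriversLicense(**json.loads(model_reply))
print(record.name)  # -> Jane Doe
```

Pinning the schema in the prompt and in a dataclass is what turns a one-off demo into a repeatable extraction pipeline.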

6. Assist

Use the model to offer solutions based on an image. The model will interpret what is happening in an image, then propose suggestions on what to do next.

This is similar to when I use the model to debug code, but in real life.

Examples: Excel Formula Helper, Find My Glasses (love this one), Live Poker Advice, Video game recommendations

7. Evaluate

Judgment based on the image. This is when the model evaluates image contents or gives a subjective opinion.

Examples: Dog Cuteness Evaluator, Bounding Box Evaluator, Thumbnail Testing

Cool Greg, so what should I do with this information?

Build. Ship. Iterate.

Jokes aside, you now have new capabilities to solve problems for your customers. Rather than go looking for new problems, revisit the problems you and your customers already have and see if GPT-4V can help solve them.

If you're not sure where to get started, just tinker with whatever sparks your curiosity and have fun.

Me personally? I love the extraction category: we now have a new capability to structure unstructured data.

Let me know what you thought about this email, I love replies (I respond to every one).


Greg Kamradt

Twitter / LinkedIn / Youtube / Work With Me


Greg's Updates & News

AI, Business, and Personal Milestones
