GPT-4Vision Use Cases


Taxonomy of GPT-4V Use Cases

V For Vendetta Vision

Welcome to the 3,019 smart, curious folks. Got forwarded this email? Subscribe here

Do you know what to use GPT-4V for?

Just attach an image, simple right?

What do you get when you combine an intelligent model w/ the ability to see the world? An explosion of new use cases.

It’s hard to be on twitter and not see demos flying around. They’re awesome.

Every demo is an inspiration-seed planted in the brains of future founders and builders.

But! GPT-4V demos feel like puzzle pieces on the floor that dropped out of the box - I know there is a bigger picture but I can’t quite see it yet.

So I went on a mission to collect over 100 demos with the goal of putting the pieces together and bucketing the use cases I'm seeing.

Here’s a breakdown of how I’d split these up with examples.


Reply and let me know if you agree or disagree (I respond to every reply)

BTW as email subscribers you get access to the full use case list. Check it out here.

Meta thoughts before we get into the use cases:

  • Because of GPT-4V's friction (no API access yet, browser only for now), most of the examples we see are 1-shot. You give ChatGPT an image and a prompt and then you’re done. Once API access arrives, we’ll see new use cases open up.
  • Be careful: just like the early days of ChatGPT, the output looks good, but there are plenty of hallucinations sneaking around.
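Once API access does open up, that 1-shot pattern (one image + one prompt) will presumably map onto a single request. Here's a minimal sketch that just builds the request payload, assuming an OpenAI-style chat API that accepts image URLs alongside text; the model name and field layout are assumptions until the real API ships:

```python
import json

def build_vision_request(prompt: str, image_url: str) -> dict:
    """Build a 1-shot image + prompt request payload (assumed schema)."""
    return {
        "model": "gpt-4-vision",  # hypothetical model name
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_vision_request(
    "What objects are in this photo?",
    "https://example.com/photo.jpg",
)
print(json.dumps(payload, indent=2))
```

Nothing gets sent here; the point is that the whole interaction fits in one user turn, which is why today's demos are all single-shot.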

7 Categories of GPT-4V Use Cases

I originally tested this framework on Twitter — check out the tweet

1. Describe

First up is the base case, simply describe what's in an image. The examples I saw were either identification of a single object or an inventory of many objects.

Examples: Animal Identification, “What objects are in this photo?”, "Where was this photo taken?"

2. Interpret

The biggest category of the bunch. Here the model will explain the meaning within a photo or provide more context behind an image. These use cases take an image and go a layer deeper with synthesis.

Examples: Technical Flame Graph Interpretation, Schematic Interpretation, Twitter Thread Explainer, Car Crash Anticipation, Schematic Understanding

3. Recommend

Here we have two big categories: Critiques and Recommendations. With critiques, the user will ask GPT-4V for its opinion on an image. This could be a painting, landing page, logo, etc.

Users will also ask GPT-4V to give its opinion via a recommendation. Which piece of fruit is better? Which food menu item should I get?

Examples: Food Recommendations, Website Feedback (tons of these floating around), Painting Feedback

4. Convert

Here the model converts images into other forms (code, narrative, etc.) or generates something new. This category has massive implications.

My favorite is code generation from an image.

The key insight is that you can use models to generate config files - which are the basis for nearly all application interfaces.

Examples: Figma Screens > Code, Adobe Lightroom Settings, Suggest ad copy based on a webpage
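The config-file insight is easy to prototype: ask the model for machine-readable output, then validate it before it touches your application. A minimal sketch, with a hard-coded stand-in for the model's reply (in practice this string would come from GPT-4V looking at, say, a screenshot of photo-editing settings — the keys here are made up for illustration):

```python
import json

# Stand-in for a GPT-4V reply when asked to turn a screenshot of
# photo-editing settings into a config file (hypothetical example).
model_reply = """
{
  "exposure": 0.35,
  "contrast": 12,
  "white_balance": "daylight"
}
"""

def parse_config(reply: str) -> dict:
    """Validate that the model's reply is well-formed JSON before using it."""
    config = json.loads(reply)
    if not isinstance(config, dict):
        raise ValueError("expected a JSON object")
    return config

config = parse_config(model_reply)
print(config["white_balance"])  # -> daylight
```

The validation step matters: per the hallucination warning above, model output that merely looks like a config should never be trusted until it parses.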

5. Extract

Use the model to extract structured data based on entities within an image. I love this one as yet another way unstructured data becomes structured.

Examples: Structured Data From Driver's License, Extract structured items from an image, Handwriting Extraction, Spice Rack Extraction
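To make the extraction category concrete, here's a sketch of the driver's license example: prompt the model to return specific JSON keys, then load the reply into a typed record. The reply string is a made-up stand-in for a real GPT-4V response, and the field names are assumptions:

```python
import json
from dataclasses import dataclass

@dataclass
class DriversLicense:
    name: str
    license_number: str
    expiration: str

# Stand-in for a GPT-4V reply to the prompt: "Return the fields on this
# driver's license as JSON with keys name, license_number, expiration."
# (The image and values here are invented for illustration.)
model_reply = '{"name": "Jane Doe", "license_number": "D1234567", "expiration": "2027-01-31"}'

record = DriversLicense(**json.loads(model_reply))
print(record.name)  # -> Jane Doe
```

Pinning the schema in the prompt and in a dataclass is what turns a one-off demo into a repeatable extraction pipeline.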

6. Assist

Use the model to offer solutions based on an image. The model will interpret what is happening in an image, then propose suggestions on what to do next.

This is similar to when I use the model to debug code, but in real life.

Examples: Excel Formula Helper, Find My Glasses (love this one), Live Poker Advice, Video game recommendations

7. Evaluate

Judgment based on the image. This is when the model evaluates image contents or gives a subjective opinion.

Examples: Dog Cuteness Evaluator, Bounding Box Evaluator, Thumbnail Testing

Cool Greg, so what should I do with this information?

Build. Ship. Iterate.

Jokes aside, you now have new capabilities to solve problems for your customers. Rather than go looking for new problems, revisit the problems you and your customers already have and see if GPT-4V can help solve them.

If you're not sure where to get started, just tinker with whatever sparks your curiosity and have fun.

Me personally? I love the extraction category: we now have a new capability to structure unstructured data.

Let me know what you thought about this email, I love replies (I respond to every one).


Greg Kamradt

Twitter / LinkedIn / Youtube / Work With Me


Greg's Updates & News

AI, Business, and Personal Milestones
