Retrieval Is All You Need


FullStackRetrieval.com

My new home for retrieval-related content

Welcome to the 135 people who have joined us since last week! If you aren’t subscribed, join 3,189 smart AI folks. View this post online.

The bottleneck is not GPT-4

GPT-4 is actually pretty smart: ask it to make a decision, and you’ll likely get the right answer.


Give it a few examples of previous successful decisions and you’ll bump your accuracy even further.

This led me to a realization: the limit of our ability to do more with Large Language Models (LLMs) isn’t necessarily the model itself, but rather the information we give it.

Give GPT-4 the right information and you’ll provide value to users.

So it’s not a reasoning challenge, it’s a retrieval challenge.

Retrieval is the process of gathering, storing, and serving your applications the data they need. The concept of serving applications data is as old as computing itself, but in the unstructured world of LLM application building, a new retrieval mindset helps us make magic.

All my optimization strategies came back to retrieval.

I constantly think about:

  • What’s the best way to collect my data?
  • How should I split it into manageable pieces?
  • Where should I store it?
  • How should I index it?
  • What’s the best way to actually retrieve the data?
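The questions above can be made concrete. As a toy illustration of the "split it into manageable pieces" step, here's a minimal fixed-size chunker with overlap. The sizes are arbitrary placeholders, not recommendations, and the function is my own sketch rather than any library's API:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping, fixed-size character chunks."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    step = chunk_size - overlap  # how far the window slides each iteration
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

# Toy document: a repeated sentence standing in for real source data.
doc = ("Retrieval is the process of gathering, storing, and serving "
       "your applications the data they need. ") * 5
chunks = chunk_text(doc, chunk_size=120, overlap=30)
```

The overlap keeps a sentence that straddles a chunk boundary intact in at least one chunk; in practice you'd usually split on semantic boundaries (sentences, sections) rather than raw character counts.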

Retrieval is the umbrella that covers most of your application building. It’s the iceberg under your front end.

It’s also extremely hyped (second only to agents and maybe evals). Yet it’s still massively underrated.

As an educator in the space I have a keen eye on how information is presented - with retrieval, I haven’t seen the take on it that I want to see. So I'm taking a stab at fixing this.

My retrieval content is finally getting a home: FullStackRetrieval.com

[Video: FullStackRetrieval.com Trailer]

(As a subscriber you're getting beta access; you may see some rough edges. I'll launch on Twitter later.)

The first thing new members will get is one advanced retrieval method a day, sent over 5 days.

This will include:

  • Philosophies - How should you think about chunking, top k, and retrieval methods?
  • Interviews - Let's hear how production retrieval is happening
  • Tutorials - Advanced RAG, building retrieval into your GPTs/OpenGPTs
  • Videos & code samples - Walkthroughs

When I try to understand a topic, I need to start with the big picture, then understand how its sub-components relate to each other.

Instead of thinking about retrieval in the abstract, here is my “least-wrong” tactical view.


Let’s break retrieval down into its parts:

  1. Query - The initial piece of data that guides the retrieval process. This can be a user question, chat history, an image, audio, a prompt, a table, or other data.
  2. Query Transformation - The process of modifying or reformatting the original query to make it more suitable for retrieval. Transforming your query is optional.
  3. Raw Data Source - The original home of your information: the unprocessed, unstructured data from which you'll extract. This could be websites, images, other applications, you name it. There may be multiple data sources.
  4. Document Loaders - Tools or functions that extract data from a raw source.
  5. Documents - Individual units of data or information that have been extracted and are ready for indexing. This might be individual pieces of texts, single customer records, etc.
  6. Index - A data structure that organizes the information in your documents so that retrieval is faster, more efficient, or better performing.
  7. Knowledge Base - A structured repository of indexed documents from which the retrieval process extracts. This is often the combination of a vector store and a document store.
  8. Retrieval Method - The technique or algorithm used to search for and extract the most relevant documents from the knowledge base in response to a query.
  9. Relevant Docs - The subset of documents that the retrieval method determines to be most useful in addressing the query.
  10. Document Transform - The process of further refining or reformatting the relevant documents to make them more suitable for the language model. This could include summarization, compression (removing information) or other transformations.
  11. Context - The combined content derived from the transformed documents that provides the necessary background or information for the language model to generate its response.
  12. Large Language Model (LLM) - The model that will generate a response based on the context and prompt it's given.
  13. Prompting Method - The technique or method used to present the context to the language model. This also includes chaining different prompts together.
  14. Response - The final answer or output generated by the language model based on the context and prompting method.
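The fourteen parts above can be sketched end to end. Everything below is a toy stand-in — a keyword-overlap "index", a stubbed "LLM", and function names of my own invention — meant only to show how the stages hand off to each other, not how any production system works:

```python
def transform_query(query: str) -> str:
    # 2. Query transformation: here, just normalize the text.
    return query.lower().strip()

def load_documents(raw_sources: list[str]) -> list[str]:
    # 3-5. Document loaders turning raw sources into documents.
    return [s.strip() for s in raw_sources if s.strip()]

def build_index(documents: list[str]) -> list[tuple[set[str], str]]:
    # 6-7. Index + knowledge base: bag-of-words per document.
    return [(set(doc.lower().split()), doc) for doc in documents]

def retrieve(index: list[tuple[set[str], str]], query: str, k: int = 2) -> list[str]:
    # 8-9. Retrieval method: rank by word overlap, return top-k docs.
    q_words = set(query.split())
    ranked = sorted(index, key=lambda item: len(item[0] & q_words), reverse=True)
    return [doc for _, doc in ranked[:k]]

def transform_docs(docs: list[str], max_chars: int = 200) -> list[str]:
    # 10. Document transform: naive truncation as "compression".
    return [d[:max_chars] for d in docs]

def build_prompt(context: list[str], query: str) -> str:
    # 11, 13. Context assembly + prompting method.
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

def fake_llm(prompt: str) -> str:
    # 12, 14. Stand-in for the model call.
    return "Answer based on: " + prompt.splitlines()[-1]

sources = ["Paris is the capital of France.",
           "The Nile is a river in Africa.",
           "GPT-4 is a large language model."]
index = build_index(load_documents(sources))
query = transform_query("  What is the capital of France?  ")
docs = retrieve(index, query, k=1)
prompt = build_prompt(transform_docs(docs), query)
response = fake_llm(prompt)
```

In a real system you'd swap the bag-of-words index for embeddings plus a vector store, and the stub for an actual model call — but the handoffs between stages stay the same.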

The point isn’t to cover 100% of retrieval permutations, but rather to provide a starting point for discussion.

I tried bullet-proofing this view on Twitter, but I’d love your feedback. What do you think?

All of the content is free for now. I may paywall some of it in the future, but as friends, email me and you’ll get the family discount.

I’d love to hear a few things from you:

  • What’re you working on? How can I help?
  • What’re your thoughts on new OpenAI GPTs, has one provided value to you? If so, which?

In case you missed it

  • This analysis on GPT-4-Preview's context length got more traction than I was expecting
  • Awesome post from Charles Frye that extends the "LLM OS" metaphor made famous by Andrej Karpathy


Greg Kamradt

Twitter / LinkedIn / Youtube / Work With Me
