Retrieval Augmented Generation is the ONE Skill You Need to Learn ASAP
Resources on learning RAG, overcoming challenges in RAG pipelines, and assessing results from RAG pipelines.
What’s up, everyone!
Thank you to everyone who joined the first LangChain ZoomCamp!
Don't worry if you missed it; you can find the recording in the podcast section of the Substack. The notebook and slides are available on GitHub.
In this first session, I went over a tentative schedule for the course, spoke at a high level about how LLMs generate text, discussed the applications of LLMs in various industries, and then went straight into building an app using LangChain.
My goal was to inspire you and give you a glimpse of the type of applications we’ll be building in this series. Of course, Murphy’s law kicked in, and I ran into some errors with the demo, but we overcame them and got it to work!
I showcased a basic Retrieval Augmented Generation (RAG) application. This is one of the most important skills data scientists can pick up when working with LLMs.
What is RAG?
RAG is a technique designed to enhance the capabilities of LLMs in generating accurate and up-to-date responses by combining the power of information retrieval with text generation.
Instead of relying solely on internal training data, RAG fetches relevant information from external sources, like Wikipedia, to provide more accurate and current answers.
LLMs, while powerful, sometimes produce "hallucinations": plausible-sounding but incorrect answers.
RAG addresses this by grounding the model's responses in verifiable external information, reducing inaccuracies and enhancing trustworthiness.
When given a prompt, RAG first retrieves relevant documents from an external source. It then combines these documents with the original prompt to provide context. This enriched context is fed to the text generator, which crafts the final response. This two-step process ensures that the model's answers are accurate and up-to-date.
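To make that concrete, here’s a minimal sketch of the retrieve-then-generate loop using LangChain. The source URL and model name are placeholders I chose for illustration; the actual demo code is in the notebook on GitHub.

```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# 1. Load and chunk an external knowledge source (placeholder URL).
docs = WebBaseLoader("https://example.com/knowledge-article").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 2. Embed the chunks and index them for similarity search.
vectorstore = FAISS.from_documents(chunks, OpenAIEmbeddings())

# 3. For each prompt, retrieve relevant chunks and hand them to the LLM as context.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    retriever=vectorstore.as_retriever(),
    chain_type="stuff",  # "stuff" packs the retrieved chunks directly into the prompt
)
print(qa.run("What problem does RAG solve?"))
```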
RAG boasts dual knowledge sources:
Parametric Memory: The knowledge stored within the model's parameters.
Nonparametric Memory: The information fetched from external sources.
This combination allows RAG to offer the flexibility of traditional models while benefiting from the accuracy of retrieval-based methods.
I wrote a blog about RAG here that goes into the components of a RAG system and its subsystems. It’s a hands-on coding tutorial; I think you’ll enjoy it. I also made sure the code from the demo I showed was well documented, so it doubles as a good tutorial.
I’ll give RAG a thorough treatment during the ZoomCamp. Keep an eye out for that.
I also recommend this post on Twitter by Cameron; it’s a succinct, yet detailed, treatment of a RAG system. I highly recommend giving Cameron’s newsletter a sub as well. He puts out stellar content on LLMs.

✨ Blog of the Week
I know right now, and for the next 70+ days, I’m only gonna be talking about LangChain. But I plan on doing 100 Days of LlamaIndex immediately after the 100 Days of LangChain series wraps up.
That’s why my blog of the week pick is this one by the folks over at LlamaIndex.
This blog provides techniques and insights for optimizing Retrieval Augmented Generation (RAG) applications to enhance performance, robustness, and scalability in production settings.
Here’s my summary of the blog:
Prototyping a RAG application might be straightforward at first, but optimizing it for performance, robustness, and scalability presents challenges, especially with a vast knowledge corpus. This guide offers insights and techniques to enhance the performance of a RAG pipeline.
1. General Techniques for Production-Grade RAG:
Decoupling Chunks for Retrieval vs. Synthesis: The chunk that’s ideal for retrieval may differ from the one that’s ideal for synthesis, so it’s important to differentiate between the two. One helpful technique is to index document summaries, or sentences that connect to the surrounding context, that point back to their respective full chunks (a minimal sketch follows this list).
Structured Retrieval for Larger Document Sets: Structured tagging or retrieval can be employed as the number of documents increases to improve retrieval precision. Techniques include using metadata filters and storing document hierarchies.
Dynamic Retrieval Depending on Task: RAG isn't limited to specific fact-based queries. Users might ask for summaries, comparisons, or other types of information. LlamaIndex offers modules like routers, data agents, and advanced query engines to cater to these diverse needs.
Optimizing Context Embeddings: Optimizing embeddings for a specific data corpus is crucial to ensure better retrieval. Techniques include fine-tuning the embedding model over an unstructured text corpus in a label-free manner.
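To make the first technique concrete, here’s a minimal sketch of decoupling retrieval from synthesis in LangChain: short summaries are embedded for retrieval, while the full parent chunks are what the LLM actually sees at synthesis time. The summarization prompt and the in-memory parent mapping are my own illustrative choices, not from the LlamaIndex post.

```python
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import Document
from langchain.vectorstores import FAISS

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Full chunks used for synthesis, keyed by ids we control (placeholder text).
parent_chunks = {
    "doc-0": "<a long chunk of source text>",
    "doc-1": "<another long chunk of source text>",
}

# Index short summaries for retrieval; each one points back to its parent chunk.
summaries = [
    Document(
        page_content=llm.predict(f"Summarize in one sentence: {text}"),
        metadata={"parent_id": chunk_id},
    )
    for chunk_id, text in parent_chunks.items()
]
vectorstore = FAISS.from_documents(summaries, OpenAIEmbeddings())

def answer(question: str, k: int = 2) -> str:
    # Retrieve against the compact summaries...
    hits = vectorstore.similarity_search(question, k=k)
    # ...but synthesize over the full parent chunks they point to.
    context = "\n\n".join(parent_chunks[h.metadata["parent_id"]] for h in hits)
    return llm.predict(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```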
2. Key Takeaways:
RAG applications can be enhanced for production by considering the specific needs of the task and the nature of the data corpus.
Structured retrieval, dynamic chunk retrieval, and optimized embeddings are techniques for better performance.
Tools and platforms like LlamaIndex provide essential modules and resources to streamline optimization.
🛠️ GitHub Gems
Evaluating RAG pipelines is critical in ensuring the effectiveness and accuracy of LLM applications.
The Ragas library offers a streamlined and efficient way to conduct these evaluations, providing insights to drive improvements and optimizations in RAG implementations.
Ragas is a specialized library designed to evaluate RAG pipelines. Here's what it offers:
Comprehensive Metrics: Ragas provides metrics to evaluate retrieval (like context_relevancy and context_recall) and generation (such as faithfulness and answer_relevancy).
Harmonic Mean: It calculates the harmonic mean of the provided metrics to give an overall performance score of the RAG system.
Minimal Annotated Data: Ragas leverages LLMs to produce actionable metrics with as little annotated data as possible, making the evaluation process more efficient.
Integration with Other Platforms: Tools like LangSmith can be integrated with Ragas for a more comprehensive evaluation process.
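For flavor, here’s roughly what a Ragas evaluation looks like. This is a sketch against the Ragas API as of this writing; the sample row is made up, and the column names follow the schema the library expects.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_recall, context_relevancy, faithfulness

# Each row pairs a question with the retrieved contexts, the generated answer,
# and (for context_recall) a ground-truth answer.
eval_data = Dataset.from_dict({
    "question": ["What does RAG stand for?"],
    "contexts": [["RAG stands for Retrieval Augmented Generation ..."]],
    "answer": ["RAG stands for Retrieval Augmented Generation."],
    "ground_truths": [["Retrieval Augmented Generation."]],
})

# Scores both halves of the pipeline: retrieval and generation.
result = evaluate(
    eval_data,
    metrics=[context_relevancy, context_recall, faithfulness, answer_relevancy],
)
print(result)  # per-metric scores plus an overall harmonic-mean score
```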
Here’s a notebook I made which uses the Ragas library. In it, I set up a simple RAG pipeline in LangChain and then performed a basic ablation study to assess the impact of chain_type on the pipeline, using Ragas to score the results.
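If you’d like to try a similar ablation yourself, the knob being varied looks roughly like this (a sketch; the retriever is assumed to be built as in the earlier RAG example, and the eval question is a stand-in):

```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# retriever: assumed built earlier, e.g. vectorstore.as_retriever()
# Build one pipeline per chain_type, holding everything else fixed.
answers = {}
for chain_type in ["stuff", "map_reduce", "refine"]:
    qa = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, chain_type=chain_type)
    answers[chain_type] = qa.run("What does RAG stand for?")

# Each variant's answers (plus retrieved contexts) can then be scored with Ragas.
```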
Enjoy!
🗓️ An upcoming series of panel discussions
I’ve partnered with the Generative AI World Summit to bring you a series of four panel discussions.
One registration link will get all the sessions on your calendar. You can register here.
Below is the schedule and the tentative speakers for each session.
Thursday, September 28th at 4pm CST: "The Future of Generative AI: Vision and Challenges"
This session will explore the potential of Generative AI in reshaping industries, its future trajectory, and the challenges ahead. Panelists will discuss the advancements they foresee in the next decade and the hurdles that must be overcome.
Panelists: Niels Bantilan, Meryem Arik, Rajiv Shah
Thursday, October 5th at 4pm CST: "From Academia to Industry: Bridging the Gap in Generative AI"
This panel will explore translating academic research in Generative AI into real-world applications. Experts will discuss the challenges of this transition and share success stories.
Panelists: James Dborin and more!
Thursday, October 12th at 4pm CST: "Generative AI in Production: Best Practices and Lessons Learned"
This session will cover the practical aspects of deploying Generative AI solutions, the challenges, and the lessons learned.
Panelists: Goku Mohandas, Meryem Arik, and more!
Thursday, October 19th at 4pm CST: "Ethics and Responsibility in Generative AI"
Given the power of Generative AI, this session will address the ethical implications, potential misuse, and the responsibility of researchers and practitioners in ensuring its safe and beneficial use.
Panelists: Niels Bantilan, David Talby, and more!
💡 My Two Cents
Data science is evolving, and mid-career professionals need to keep pace.
Staying relevant is not just a matter of choice but a necessity.
For those who've been in the field for a while, the challenges and tools from earlier in your career might seem worlds apart from today's cutting-edge methodologies.
As we stand at the crossroads of traditional data science and the dawn of LLMs, there are four skills that every mid-career data scientist should consider mastering to remain at the forefront of this dynamic field.
1. Prompt Engineering
Gone are the days when data scientists solely relied on hard-coded algorithms.
Prompt engineering has emerged as a pivotal skill with the rise of models like GPT-3.5 and its successors. Crafting the right prompt can be the difference between a model generating a generic response and one that's insightful and tailored. It's like asking the right questions in an interview - the quality of your output is often directly proportional to the precision of your input.
Mastering prompt engineering allows data scientists to harness the full potential of LLMs, making them more efficient and context-aware.
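As a tiny illustration, here’s the same request framed as a vague prompt versus a precise one (a toy example of my own; the template and variable names are arbitrary):

```python
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

vague = "Tell me about churn."

# A precise prompt pins down the role, scope, and output format.
precise = PromptTemplate.from_template(
    "You are a data analyst. In three bullet points, explain the top drivers of "
    "customer churn for a {industry} subscription business, citing one metric for each."
).format(industry="telecom")

print(llm.predict(vague))    # generic overview
print(llm.predict(precise))  # focused, structured answer
```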
2. Retrieval Augmented Generation (RAG)
The introduction of RAG has been nothing short of revolutionary.
By combining the prowess of information retrieval with text generation, RAG offers a solution to one of the most pressing challenges in AI: producing accurate, up-to-date, and contextually relevant responses. For a mid-career data scientist, understanding RAG is essential.
It's not just about staying updated; it's about ensuring the applications you deploy are grounded in the most current and verifiable information.
3. Fine-tuning an LLM
While pre-trained models like GPT-4 are powerful, they are not one-size-fits-all.
Every industry, from healthcare to finance, has its nuances. Fine-tuning an LLM allows data scientists to tailor these models to specific domains, ensuring higher accuracy and relevance. It's the difference between using off-the-shelf software and software customized for your unique business needs.
In a world where precision can lead to significant business advantages, fine-tuning is a skill that can set you apart.
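For a taste of what that looks like in practice, here’s a minimal causal-LM fine-tuning sketch with Hugging Face Transformers. The base model, toy corpus, and hyperparameters are all placeholders; real domain adaptation needs far more data and care.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # small stand-in; swap for your base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# A made-up domain corpus; in practice you'd load thousands of documents.
texts = [
    "Claims adjusters assess policy coverage before approving payouts.",
    "Premiums are recalculated annually based on loss history.",
]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda row: tokenizer(row["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-llm", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```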
4. Training an LLM from Scratch
While this might seem daunting, especially given the resources and expertise required, there's an undeniable advantage to building something ground-up.
Training an LLM from scratch offers unparalleled customization. It allows data scientists to control the model's architecture, training data, and objectives.
This skill is invaluable for industries with highly specialized needs or those concerned about biases in pre-trained models.
The world of data science is in flux, with LLMs reshaping the landscape.
For the mid-career professional, adapting and evolving is crucial.
By mastering prompt engineering, RAG, fine-tuning, and training LLMs from scratch, data scientists not only future-proof their careers but also contribute to pushing the boundaries of what's possible in artificial intelligence.
I plan on touching on all these topics in this newsletter!
🔍 Research Refined
Sorry, I didn’t have time to get to this section this week. I’ll have it in the next edition.
Let me know how I did!
Shoot me a message and let me know what you thought of this edition or what topic you’d like me to cover in a future edition.
PS, let’s connect on X. You can find me here.
Cheers,
Harpreet