Kicking Off 2024: How We're Gonna Crush It
New LinkedIn courses, my upcoming book, plus news headlines and a comprehensive survey of retrieval augmented generation
What’s up, everyone?
Damn, 2023 was a hell of a year!
Banger models were released every week, and researchers and practitioners from all walks of life pushed the field forward at a breakneck pace. Honestly, there has never been a better time to be part of AI. The field is still in its early stages, and if you haven't begun your journey because you think it's too late... it's not!
Now that 2023 is done, let’s look forward to 2024.
I didn’t have much GenAI experience at the start of 2023. Sure, I had plenty of deep learning experience, but I wasn’t into GenAI. I had no hands-on experience working with LLMs besides typing prompts into ChatGPT.
However, a surprise invitation to collaborate on a research paper ignited my journey into LLMs and GenAI.
During the first half of the year, I did a lot of self-learning. Then, in July, my employer made a pivot into GenAI. At the same time, I joined a couple of cohorts put on by AI Makerspace, which accelerated my learning. It’s been a lot of early mornings, long days, late nights, and weekend hacking…but honestly, it feels like play to me.
Am I an expert? No, not yet. But that’s where I’m headed. I asked myself, “What would a GenAI expert do?” and started doing those things.
Big, highly visible projects? Bring them on. That’s my way of tricking myself into becoming the expert I want to be.
Here’s what you’ll see from me in 2024:
LinkedIn Learning Courses
Prompt Engineering with LangChain
RAG with LlamaIndex
Intro to Text Generation with Hugging Face
Intro to Fine-tuning LLMs with Hugging Face
Books
I’ll be working with Wiley Publications on a few projects. One project is acting as a technical editor for my friend Kristen Kehrer’s book on MLOps/LLMOps.
The other project I'm working on is actually a book that I'm writing!
The book is tentatively titled Practical Retrieval Augmented Generation, and if all goes according to plan, it should be out by the end of 2024.
RAG is gonna be my jam in 2024. I'll be writing blogs, tweets, and newsletter updates on it. I’ve said it once and I'll say it again: Retrieval augmented generation is the one skill you need to learn ASAP.
And I’ll be here to help you learn it!
There's a Punjabi saying that drives me: “Mehnat naal kamm kar agge vadhde.” In English, it translates to: “Progress comes from hard work.”
That's my mantra for 2024. Can't wait to hit next year running, with a trail of breakthroughs behind me and an eye on the even bigger leaps 2025 holds.
Let's make 2024 legendary!
✨ Blog(s) of the Week
Let's end the week (and the year) with some blogs written by yours truly!
I’ve been nerding out super hard on evals. Not from the perspective of academic benchmarks but from the perspective of an AI Engineer/Practitioner. Thankfully, my employer let me go all out and burn many GPU hours to bring you the following pieces:
As a bonus, I wrote a blog highlighting the top Smol LLMs (sub-13B parameters).
I hope you enjoy these reads - let me know if you have any questions or comments!
📰 Industry Pulse
The New York Times has sued OpenAI and Microsoft over the use of its articles to train AI models. The Times is seeking the destruction of the AI models and training data, along with potentially billions in damages. This case highlights the tension between AI innovation and copyright protection, raising questions about the future of content creation and ownership.
My favourite take on this topic comes from Dan Jeffries and Vin Vashishta. Dan comes at it from a hacker’s perspective, and Vin from data strategy. Go read their thoughts!
🤖 Could the very backbone of our digital future—artificial intelligence—be at risk of financial overload due to its reliance on the cloud?
A recent exploration of the intersection between AI and cloud computing makes it clear that while AI is the driving force behind modern business innovation, it also brings a shadow of financial risk. The article highlights how AI's dependence on cloud services for storage and computing power creates a silent crisis of escalating costs.
The Wall Street Journal has shed light on the hidden expenses of cloud infrastructure, which are often overlooked yet can accumulate rapidly, leading to what some call "technical debt."
🍏 Could Apple's innovative edge be at risk with the recent departure of another key designer to Jony Ive's LoveFrom?
In a recent Bloomberg article by Mark Gurman, we learn that Tang Tan, Apple's product design chief responsible for iconic products like the iPhone and AirPods, is set to join Jony Ive's design firm LoveFrom. This move signals a continuation of the talent exodus from Apple's design team. At LoveFrom, Tan will lead the hardware design for a new artificial intelligence project, with OpenAI's Sam Altman providing the software expertise.
This project is still in its infancy, focusing on home devices and leveraging the latest deep learning technology.
GitHub has announced the general availability of Copilot Chat, a ChatGPT-like AI assistant designed to aid programmers. Initially available only to organizations and individual subscribers, the tool is now accessible to all users, including those working in Microsoft's IDEs: Visual Studio Code and Visual Studio.
It's part of the paid GitHub Copilot service but remains free for certain groups like students and open-source project maintainers.
🔍 Research Refined
The paper "Retrieval-Augmented Generation for Large Language Models: A Survey" addresses the challenges LLMs face in practical applications, including issues like hallucinations, slow knowledge updates, and lack of transparency in answers.
The focus is on Retrieval-Augmented Generation (RAG), a method that enhances the accuracy of answers and reduces model hallucinations, especially for knowledge-intensive tasks. RAG combines the parameterized knowledge of LLMs with external, non-parameterized knowledge bases, making it a crucial method for implementing large language models.
The paper presents the development paradigms of RAG within the context of LLMs, specifically outlining three paradigms: Naive RAG, Advanced RAG, and Modular RAG.
Naive RAG: This paradigm follows the basic 'retrieve-then-read' process: the model retrieves relevant information and reads it to formulate a response. It is the fundamental form of RAG, covering just retrieval and response generation (a minimal sketch follows this list).
Advanced RAG: This paradigm uses more refined data processing techniques. It optimizes the indexing of the knowledge base and introduces the concept of multiple or iterative retrievals. This approach represents an evolution from the Naive RAG, incorporating more sophisticated data handling and retrieval methods to improve the quality of generated responses.
Modular RAG: This approach integrates additional techniques like fine-tuning into the RAG process. Modular RAG represents a further advancement, enriching the RAG process with new modules and offering greater flexibility. This paradigm allows for more complex and adaptable implementations of RAG, utilizing a modular approach to tailor the process to specific needs and contexts.
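To make that 'retrieve-then-read' loop concrete, here's a minimal, self-contained Python sketch. The embed() and generate() functions are toy stand-ins I'm using purely for illustration; in a real pipeline, you'd swap in an actual embedding model and an LLM call.

```python
# A minimal, self-contained sketch of Naive RAG's "retrieve-then-read" loop.
# embed() and generate() are toy stand-ins, not a real model or LLM.
from collections import Counter
import math

DOCS = [
    "Naive RAG retrieves relevant chunks and feeds them to the model as context.",
    "Advanced RAG optimizes indexing and supports multiple or iterative retrievals.",
    "Modular RAG adds new modules, such as fine-tuned components, for flexibility.",
]

def embed(text):
    """Toy bag-of-words 'embedding' so the sketch runs without any model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=2):
    """The 'retrieve' step: rank all documents by similarity to the query."""
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(query, context):
    """Stand-in for the 'read' step: show the augmented prompt an LLM would get."""
    return f"Answer '{query}' using:\n" + "\n".join(f"- {c}" for c in context)

print(generate("What does Naive RAG do?", retrieve("What does Naive RAG do?")))
```

The whole Naive RAG idea fits in those two steps: rank documents against the query, then hand the winners to the model as context.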
The paper also provides a detailed overview of the main components of RAG - the retriever, generator, and augmentation methods - along with the key technologies involved in each component.
Retriever: The retriever's primary function is to fetch the top-k relevant documents from a vast knowledge base. Crafting a high-quality retriever involves addressing key challenges such as acquiring accurate semantic representations, matching the semantic spaces of queries and documents, and aligning the output of the retriever with the preferences of the Large Language Model (LLM).
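Here's what that might look like in practice - a sketch of a dense retriever, assuming the sentence-transformers package (the model name is just a common default, not one the survey prescribes):

```python
# Sketch of a dense retriever: embed documents and queries into the same
# semantic space, then take the top-k documents by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "RAG pairs an LLM with an external knowledge base.",
    "The retriever fetches the top-k relevant documents.",
    "The generator turns retrieved text into a fluent answer.",
]
doc_emb = model.encode(docs, convert_to_tensor=True)  # index the corpus once

def top_k(query: str, k: int = 2):
    q_emb = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, doc_emb)[0]  # similarity to every document
    best = scores.topk(k)
    return [(docs[int(i)], float(s)) for s, i in zip(best.values, best.indices)]

print(top_k("How many documents does the retriever fetch?"))
```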
Generator: The generator in RAG transforms the retrieved information into natural and fluent text. Unlike traditional language models, the RAG generator enhances accuracy and relevance by leveraging the retrieved information. It inputs traditional contextual information and relevant text segments obtained through the retriever, allowing for more information-rich and contextually accurate responses.
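That 'information-rich input' is just the user's question stitched together with the retrieved segments. Here's a sketch of the assembly step (the template wording is my own and purely illustrative):

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Stitch retrieved segments into the generator's input."""
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# The resulting string is what gets sent to the LLM for generation.
print(build_prompt("What is RAG?", ["RAG pairs an LLM with retrieval."]))
```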
Augmentation Methods: Augmentation in RAG refers to the technical approaches applied during the pre-training, fine-tuning, and inference stages of the language model. It involves enhancing the performance of Pre-trained Language Models (PTMs) in open-domain Question Answering (QA) through retrieval methods at various stages. These methods include embedding approaches, optimization techniques, and the integration of structured and unstructured data sources to improve the overall effectiveness and adaptability of the RAG system.
The paper discusses methods to evaluate the effectiveness of RAG models, highlighting two evaluation methods and key metrics for assessment.
Independent Evaluation:
Retrieval Module: This involves assessing the performance of the RAG retrieval module. Metrics like Hit Rate, MRR (Mean Reciprocal Rank), NDCG (Normalized Discounted Cumulative Gain), and Precision are used (see the sketch after this list). These metrics measure how effectively the retrieval module ranks items for a given query or task, which is crucial for the success of RAG in various applications.
Generation Module: This refers to evaluating the augmented input formed by adding the retrieved documents to the query, which is distinct from evaluating the final answer/response generation. The evaluation here mainly focuses on context relevance: how related the retrieved documents are to the query or question posed.
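Here are toy implementations of the retrieval-module metrics mentioned above, computed over a ranked result list and a set of known-relevant document IDs (the data is made up for illustration):

```python
def hit_rate(ranked: list[str], relevant: set[str], k: int) -> float:
    """1.0 if any relevant doc appears in the top-k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0

def mrr(ranked: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant doc (0.0 if none found)."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k

# Example: the only relevant doc, d1, shows up at rank 2.
ranked, relevant = ["d3", "d1", "d7"], {"d1"}
print(hit_rate(ranked, relevant, 3))        # 1.0
print(mrr(ranked, relevant))                # 0.5
print(precision_at_k(ranked, relevant, 3))  # 0.333...
```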
End-to-End Evaluation:
This method assesses the final response generated by the RAG model for a given input. It involves evaluating the relevance and alignment of the model-generated answers with the input query.
Evaluation can be divided into:
Unlabeled Content: Metrics include answer fidelity, answer relevance, harmlessness, etc.
Labeled Content: Metrics include Accuracy and EM (Exact Match).
End-to-end evaluation can be divided into manual evaluation and automated evaluation using LLMs.
Specific evaluation metrics are adopted depending on the domain where RAG is applied, such as EM for question-answering tasks.
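Since EM comes up in both the labeled-content metrics and domain-specific QA evaluation, here's a toy version of Exact Match and accuracy. The normalization step is a common convention, not something the survey specifies:

```python
import string

def normalize(text: str) -> str:
    """Lowercase, trim, and strip punctuation before comparing."""
    text = text.lower().strip()
    return text.translate(str.maketrans("", "", string.punctuation))

def exact_match(pred: str, gold: str) -> float:
    """1.0 if prediction and gold answer match after normalization."""
    return float(normalize(pred) == normalize(gold))

def accuracy(pairs: list[tuple[str, str]]) -> float:
    """Mean exact match over (prediction, gold) pairs."""
    return sum(exact_match(p, g) for p, g in pairs) / len(pairs)

print(accuracy([("Paris.", "paris"), ("London", "Paris")]))  # 0.5
```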
All in all, the paper provides a great overview of RAG, making it an excellent starting point for those interested in AI Engineering.
That’s it for this year!
Thanks for being one of the 17k readers of this newsletter. Your support means the world to me, and I hope you’re gaining value from each of these. If you are, please do me a favour and spread the word about The Generative Generation!
Share it with your network on Twitter, LinkedIn, Discord, etc.
I’ll be pushing a lot of exclusive content through 2024, and I can’t wait to share it with you all!
Cheers and happy new year! I hope 2024 is everything you want it to be, and more!
Harpreet