Exploring the inner workings of a transformer model and how I experiment with LLMs
Plus community events, news headlines, papers that I didn't get to read, and yes I'm still a t-shirt designer
What’s up, everyone?
Apologies for being out the last couple of weeks, but I am back in action!
Heads up: All the LangChain Zoomcamp materials are now updated on GitHub. Thanks to everyone who participated live, watched the recording, or followed along with the repository!
🗓️ Community Events
Dec 11 - Beginner's Guide to LangChain: Chat with Your Multi-Model Data: In this session, you'll learn how LangChain handles multi-model data integration for conversational AI, with practical techniques for building dynamic, context-aware interactions, a look at upcoming AI trends, and best practices for implementing LangChain in your projects.
Dec 14 - LangServe: Deploying Mistral-7B RAG to Production: In this advanced technical workshop, you'll learn how to take a Retrieval Augmented Generation (RAG) application from prototype to production with LangServe and Mistral 7B, and how to lean on the LangChain ecosystem, including LCEL and LangSmith, for debugging, testing, and monitoring in production environments.
Dec 15 - A.I. LLM Journal Club hosted by Aggregate Intellect: A weekly series to discuss papers with other practitioners in the field.
✨ Blog of the Week
Have you ever wondered what goes on behind the scenes when you interact with an LLM? It's not just a black box of mystery; a fascinating world of algorithms and processes is at play.
The Interactive Experience: Visualizing AI Thought Processes
Picture this: a task as simple as sorting a sequence of six letters, say 'C B A B B C', into alphabetical order. Easy for a human, but how does an AI tackle it? This website presents an interactive visualization of the process, breaking down the complex LLM algorithm into understandable and visually appealing segments. You'll see each step the model takes to transform 'C B A B B C' into 'A B B B C C', a clear, step-by-step transformation that's both fascinating and revealing.
Deep Dive into the LLM Components
The visualization is more than just a pretty interface.
It's an educational journey through the building blocks that make up the LLM. From the initial embedding, layer normalization, and the intriguing self-attention mechanism to the final output, every stage is laid out for you to explore.
You'll get to see:
Embedding: How the model interprets and processes the input.
Layer Norm: The process of normalizing the data within the model.
Self Attention: A critical component where the model determines which parts of the input to focus on.
Projection and MLP: The transformation stages where the real 'thinking' happens.
Transformer: The model's heart, where all the magic comes together.
Softmax and Output: The final stages where the AI's 'decision' is formed and presented.
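If you want to poke at one of those stages yourself, here's a minimal NumPy sketch of scaled dot-product self-attention followed by a softmax. It's my own toy illustration, not code from the visualization, and it omits multi-head splits and the causal mask a decoder-only LLM would apply:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model) token embeddings, e.g. after layer norm.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Attention scores say how much each position should "look at" every other position.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v  # weighted mix of value vectors, one per position

# Toy example: 6 tokens (like 'C B A B B C') with 8-dimensional embeddings.
rng = np.random.default_rng(0)
seq_len, d_model = 6, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (6, 8): one updated vector per input token
```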
Get a shirt, and support the newsletter.
I keep all my content freely available by partnering with brands for sponsorships. Lately, the pipeline for sponsorships has been a bit dry, so I launched a t-shirt line to gain community support.
I haven't sold any shirts yet! However, that could change if you decide to purchase one for yourself, your favourite data scientist, or even your entire team!
You can check out my designs here and explore the excellent product descriptions I generated using GPT-4!
🛠️ GitHub Gems
My friend, Yujian Tang, over at Zilliz, put together an awesome series for the month of December called “The Advent of Code.”
I’m excited that they’ve included my team’s open-source library, SuperGradients, as part of the series!
Here’s what you need to know about The Advent of Code:
📅 Event: December 1 marks the start of the Advent of Code.
🌐 Daily Activities: Engage with a different open-source project each day.
🎁 Participation: Collect points all month for a chance to win exclusive swag from Zilliz and other projects (including SuperGradients merch!)
📚 Learning and Community: Gain new skills and join the community on the Zilliz Discord channel for tips and resources.
👥 Point System:
1️⃣ Earn one point for activities like starring a GitHub repo, creating and sharing a repo, or social media engagement.
3️⃣ Earn three points for higher involvement like writing a merged PR or blogging about the experience.
🗓️ Submission: Track and submit your activities after December 24th.
⏳ Deadline: Submit points by January 2nd, 2024.
🏆 Winner Announcement: Winners revealed on January 8th, 2024. Good luck!
📚 For those interested, I've introduced a new course on deep learning for image classification via the LinkedIn Learning platform.
Over 800 people have taken the course in the two weeks since its launch, which is insane to me!
If you don’t have a LinkedIn Learning subscription, you can purchase the course outright for $45.
📰 Industry Pulse
A heated debate unfolded one chilly evening among Silicon Valley's elite, and the question at its heart continues to stir controversy today. In a gripping account of that gathering from years past, the article paints a vivid picture of a clash between two tech titans: Larry Page and Elon Musk. Despite a vocal cord ailment, Page whispered his vision of a future where humans and AI merge, competing for resources in a digital utopia.
Musk, on the other hand, starkly warned that such machines would spell doom for humanity. The exchange escalated until Musk was branded a "specieist" for favouring humans over potential digital life forms.
This moment, though once just an esoteric debate at a party, has rippled into a broader, ongoing discourse about the role of AI in our future.
Pro-tip: open an incognito browser to get past the paywall.
Capsule, a Paris-based startup, aims to revolutionize your news consumption experience. Capsule is not your average news aggregator. It's carving out a niche as the "Spotify for news," offering a blend of AI and editorial curation to serve news in a digestible, engaging format. The app's interface is reminiscent of social media giants like TikTok, with a vertical scrolling feed that presents news in bite-sized, visually appealing snippets.
Users can tap on headlines for summaries or follow links to read full articles, all within a user-friendly app environment.
I imagine that would shake up the newsletter format for creators who share only headlines. To be fair, this section is my least favourite part of the newsletter to put together. I'm thinking of scrapping this section and format altogether in favour of something more in-depth, or making this format available only to paying subscribers. Let me know your thoughts, though. I'd love to hear what you want.
In a candid discussion at Slush, a key event for the European startup scene, the general partners of Benchmark, a venture capital firm with a history of successful bets, shared their perspectives on investment strategies, startup valuations, and the current AI boom amidst a general economic downturn. The conversation revealed Benchmark's commitment to understanding "exceptional" opportunities, their cautious approach to soaring valuations, and their belief in the transformative power of AI and open-source models over closed large language models (LLMs).
Despite the AI investment frenzy, Benchmark bets on the future impact of open-source AI over closed LLMs, suggesting a developer-driven world will outpace current AI models.
The Biden administration is taking steps to address AI-generated misinformation with a new executive order, but is it enough to safeguard the authenticity of digital media?
In a recent move, the White House issued an executive order to establish a framework for developing generative artificial intelligence, focusing on content authentication and digital watermarks. This initiative aims to help content creators verify their works amidst the rising tide of AI-generated misinformation.
The article delves into the history of watermarking, modern challenges in protecting digital content, and innovative solutions developed to combat these issues.
💡 My Two Cents
Over the last four months, I've spent 200+ hours playing with open-source models on HuggingFace.
And I've found that, while benchmarks are decent signals, they don't always translate into practical effectiveness or correlate with how I'm planning on using a model.
So, I want to pull back the curtain and share my 'vibe check' method because I don't like blindly following leaderboard rankings.
🏁 Starting with Baseline Generations
What I Do: I test 10-15 diverse prompts using the model's default generation parameters.
Why It Matters: This step gives me a raw, unfiltered look at the model's out-of-the-box behaviour and sets a baseline for further experimentation.
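In practice, that baseline pass can be as simple as the sketch below with Hugging Face transformers. The model name and prompts are placeholders; swap in whichever checkpoint and tasks you actually care about:

```python
from transformers import pipeline

# Placeholder checkpoint; use whichever open-source model you're vibe-checking.
generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.1")

prompts = [
    "Summarize the plot of Hamlet in two sentences.",
    "Write a Python function that reverses a string.",
    "Explain retrieval augmented generation to a product manager.",
    # ...10-15 diverse prompts covering the tasks you care about
]

baseline = {}
for prompt in prompts:
    # No sampling overrides: we want the model's default, out-of-the-box behaviour.
    output = generator(prompt, max_new_tokens=256)
    baseline[prompt] = output[0]["generated_text"]
```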
✅ Selective Prompt Analysis
Process: I choose a balanced mix of 3-5 prompts, some showcasing the model's high performance and others where it falls short.
Objective: Level the playing field so I can see the real impact of tweaking each parameter and get straightforward insights into how those changes play out.
🎛️ Parameter Adjustment - One at a Time
Approach: I experiment with one parameter at a time — temperature, num_beams, top_k, top_p, repetition_penalty, no_repeat_ngram_size.
Goal: Observing changes in output helps me understand how each parameter influences the model’s responses.
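Roughly, the sweep looks something like this. The values are only illustrative, and `generator` and `selected_prompts` (the 3-5 prompts picked above) carry over from the earlier sketch:

```python
# Vary one generation parameter at a time, holding everything else at its default.
sweeps = {
    "temperature": [0.3, 0.7, 1.0, 1.3],
    "top_k": [10, 50, 200],
    "top_p": [0.8, 0.9, 0.95],
    "repetition_penalty": [1.0, 1.1, 1.3],
    "no_repeat_ngram_size": [0, 2, 4],
}

generations = []
for param, values in sweeps.items():
    for value in values:
        for prompt in selected_prompts:
            output = generator(prompt, max_new_tokens=256, do_sample=True, **{param: value})
            generations.append({
                "param": param,
                "value": value,
                "prompt": prompt,
                "text": output[0]["generated_text"],
            })
```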
At this point, I usually have hundreds of generations from the model.
🕵️♂️ Deep Dive into Model Behavior
Method: I manually review the generations, hunting for odd or undesirable outputs.
Insight: This granular analysis is crucial for identifying the model's subtle nuances and potential pitfalls.
💻 Writing Targeted Tests
Strategy: Develop tests for specific issues noticed during the exploratory phase (e.g., output length, gibberish, repetition). Use type-token ratio for assessing lexical diversity, and check for repeat n-gram sizes.
Purpose: Makes it easier to do more fine-grained statistical analysis down the line.
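Here's the kind of quick-and-dirty check I mean. Whitespace tokenization is crude, but it's enough to flag problem generations for a closer look:

```python
from collections import Counter

def type_token_ratio(text: str) -> float:
    # Unique tokens / total tokens; low values point to repetitive, low-diversity output.
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def max_ngram_repeats(text: str, n: int = 3) -> int:
    # Count of the most frequent n-gram; high counts flag looping or stuck output.
    tokens = text.lower().split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return max(Counter(ngrams).values()) if ngrams else 0

for gen in generations:
    gen["length"] = len(gen["text"].split())
    gen["ttr"] = type_token_ratio(gen["text"])
    gen["max_3gram_repeats"] = max_ngram_repeats(gen["text"], n=3)
```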
🧩 The Grid Search
Execution: I perform a detailed grid search over a range of parameter values across several prompts.
Aim: Find a handful of effective settings that consistently yield desirable results.
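A bare-bones version of that grid search, reusing the helpers above (the value ranges are just examples of what I might try):

```python
from itertools import product

grid = {
    "temperature": [0.5, 0.7, 0.9],
    "top_p": [0.9, 0.95],
    "repetition_penalty": [1.0, 1.15],
}

results = []
for combo in product(*grid.values()):
    params = dict(zip(grid.keys(), combo))
    for prompt in selected_prompts:
        output = generator(prompt, max_new_tokens=256, do_sample=True, **params)
        text = output[0]["generated_text"]
        results.append({
            **params,
            "prompt": prompt,
            "ttr": type_token_ratio(text),
            "max_3gram_repeats": max_ngram_repeats(text),
        })

# Keep the handful of settings that score well across all prompts; those survivors
# move on to the larger 20+ prompt round described next.
```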
🎯 The Final Stretch
Process: I test these top settings across an expanded set of 20+ prompts, looking for consistent performance and reliability.
Result: This gives me a comprehensive understanding of how the model behaves under various settings and prompts.
🔬 Utilizing Advanced Tools
Integration: Finally, I use tools like LangChain's criteria evaluators with GPT-4 to assess output.
Benefit: This step adds a layer of sophistication and accuracy to the selection process.
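A rough sketch of that wiring, assuming an OpenAI key is set in the environment; `final_generations` is a stand-in for the (prompt, output) pairs produced by the top settings:

```python
from langchain.chat_models import ChatOpenAI
from langchain.evaluation import load_evaluator

# GPT-4 acts as the judge; a "criteria" evaluator grades an output against a named criterion.
judge = ChatOpenAI(model="gpt-4", temperature=0)
evaluator = load_evaluator("criteria", criteria="helpfulness", llm=judge)

graded = []
for prompt, text in final_generations:  # placeholder: (prompt, output) pairs
    result = evaluator.evaluate_strings(prediction=text, input=prompt)
    graded.append({
        "prompt": prompt,
        "score": result["score"],        # 1 if the criterion is met, 0 otherwise
        "reasoning": result["reasoning"],
    })
```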
I could be totally wrong about the whole approach...but it's the best I've come up with.
There are so many moving parts when selecting an LLM that I was going through some analysis paralysis...this approach is a bit brute force, but it's at least helped me justify why I chose the settings I did.
I guess we could call this "principled vibe checking" lol.
Your Turn: Share Your Insights! Do you have a different approach to selecting and tuning LLMs?
Share your strategies, tips, or even constructive critiques. Looking forward to your stories and experiences in the comments below!
🔍 Research Refined
I didn’t get a chance to read any papers this week, but here are some that I plan on skimming next week. If there’s one in particular you want me to break down/summarize, reply in the comments below.
That’s it for this one.
See you next week, and if there’s anything you want me to cover or have any feedback, shoot me an email.
Thanks for reading The Generative Generation! Subscribe for free to receive new posts and support my work.