From Papers to Production, LLMs as the New OS, and...
How different tokenizers work, automate yourself and get paid for life, and a diffusion model that makes GIFs
What’s up everyone!
Thank you to everyone who joined the LangChain ZoomCamp on Friday. My apologies for having to start the session a bit later than usual. It’ll be back to the normally scheduled time next week.
Don't worry if you missed it; I’ve got the recording for you. I had trouble uploading the video to the podcast section today, so instead, here’s the link to the Zoom recording, where you can watch or download the video.
The notebooks are available on GitHub.
From Academics to Industry: Bridging the Gap in Generative AI (or, From Papers to Production)
This session was brought to you by the Generative AI World Summit, which takes place at the MLOps World Summit in Austin, TX, from October 25th through October 26th (with a virtual component).
Last week, I hosted a panel discussion called From Academics to Industry: Bridging the Gap in Generative AI. I was joined by Chris Alexiuk, Chris Brousseau, James Dborin, John Santerre, and Sophia Yang.
Here are my key takeaways from the panel discussion:
Transitioning from academia to industry? It's not just about the tech. Consider:
Shifting your mindset to prioritize user experience, business cases, and ROI.
Aligning your academic studies with what the industry truly needs.
Balancing the deep theory with hands-on practice.
Collaboration is golden:
Team up within academia and with industry pros to grasp the full tech stack and boost research results.
Contribute to open source and join sprints. It's a great way to gain experience and expand your network.
Facing challenges in the transition? Keep pushing to find where you truly fit in the industry.
When it comes to new generative AI research:
Always question its real-world application and feasibility.
Hold off on jumping in until there's more testing and validation.
And let's not forget scalability. As we bring generative AI models to the real world, we must focus on research and optimization to ensure they can handle the load.
You can register for the conference with the discount code 'harpreet' for $75 off your ticket price.
🗓️ Schedule of upcoming series of panel discussions
I’ve partnered with the Generative AI World Summit to bring you a series of four panel discussions. These are interactive sessions, and I welcome your participation: I’ll have some seed questions prepared, but your questions will guide the direction of the conversation.
One registration link will get all the sessions on your calendar, and I look forward to having you there. You can register here.
🗓️ Thursday, October 12th at 4pm CST: Generative AI in Production: Best Practices and Lessons Learned
This session will cover the practical aspects of deploying Generative AI solutions, the challenges involved, and the lessons learned. The panelists include Chip Huyen, Alessya Visnijc, Greg Loughnane, Hannes Hapke, Meryem Arik, and Ville Tuulos.
🗓️ Thursday, October 19th at 4pm CST: Ethics and Responsibility in Generative AI
Given the power of Generative AI, this session will address the ethical implications, potential misuse, and the responsibility of researchers and practitioners in ensuring its safe and beneficial use.
✨ Blog of the Week
This week’s ‘Blog of the Week’ is actually…a YouTube Video 🧐
Jay Alammar's video, which I watched last night, is amazing; it's surprising that it hasn't gotten more views. The video introduces the crucial role of tokenizers in large language models (LLMs).
Tokenizers chop input text into distinct segments, which can be entire words or fragments. Jay showcases the differences between well-known tokenizers like GPT2, GPT4, BERT, and Starcoder. He offers a tangible demonstration by tokenizing a single piece of text with varied elements: English words with different capitalizations, emojis, a Chinese character, Python code terms, and even tabs and spaces.
The video practically illustrates how each tokenizer interprets and processes the text.
Here are my notes:
BERT Tokenizers:
BERT's tokenizer adds special tokens (CLS and SEP) to the beginning and end of the text. It doesn't recognize new lines and replaces certain characters (emojis and Chinese characters) with an "unknown" token. Jay contrasts the "BERT base uncased" tokenizer, which doesn't preserve capitalization, with the "BERT cased" tokenizer, which does.
GPT2 Tokenizer:
GPT2's tokenizer recognizes new lines and preserves capitalization. It breaks down certain characters, like emojis and Chinese characters, into multiple tokens but can reconstruct the original token when these partial tokens are merged.
FlanT5 Tokenizer:
FlanT5, based on T5, preserves capitalization but loses new lines. It doesn't recognize certain characters like emojis and Chinese characters. It also breaks down certain Python code words into multiple tokens.
GPT4 Tokenizer:
GPT4's tokenizer is similar to GPT2's but has specific tokens for multiple spaces, which is useful for code-related tasks. It treats numbers differently, tokenizing them in a more segmented manner compared to GPT2.
Starcoder Tokenizer:
Starcoder is geared towards code. It has specific tokens for spaces and tabs. Each digit in numbers is treated as an individual token, which might be more consistent for representing numbers.
Galactica Tokenizer:
Galactica is highlighted as a unique and interesting tokenizer, especially in how it deals with spaces and tabs. Jay seemed excited about it and suggests there might be more to explore in future content.
Sadly, Jay didn’t include any links to code or notebooks. However, if anyone decides to take on the challenge, please let me know, and I will share it in next week's newsletter.
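In the meantime, here's a minimal sketch of this kind of comparison using Hugging Face's AutoTokenizer. The sample text and checkpoint IDs are my own picks (common public checkpoints), not necessarily the exact ones Jay used:

```python
# Compare how different tokenizers split the same text.
# Checkpoints and sample text are illustrative choices, not Jay's exact setup.
from transformers import AutoTokenizer

# Text mixing capitalization, code-ish tokens, numbers, whitespace,
# an emoji, and a Chinese character, in the spirit of the video's demo.
text = (
    "English and CAPITALIZATION\n"
    "show_tokens False None elif == >= else\n"
    "12.0*50=600\n"
    "👍 鸟\ttabs\t\tand    spaces"
)

checkpoints = [
    "bert-base-uncased",    # lowercases; emoji/Chinese become [UNK]
    "bert-base-cased",      # preserves capitalization
    "gpt2",                 # byte-level BPE; splits emoji/Chinese into byte tokens
    "google/flan-t5-base",  # SentencePiece; drops newlines, no emoji/Chinese
]

for name in checkpoints:
    tokenizer = AutoTokenizer.from_pretrained(name)
    ids = tokenizer(text).input_ids
    print(f"\n{name} -> {len(ids)} tokens")
    print(tokenizer.convert_ids_to_tokens(ids))
```

GPT-4's tokenizer isn't on the Hub, but tiktoken's `encoding_for_model("gpt-4")` exposes the same encoding if you want to extend the comparison.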
🛠️ GitHub Gems
AVA-AVD: Audio-Visual Speaker Diarization in the Wild
Audio-visual diarization detects "who spoke when" using auditory and visual signals. However, existing datasets mainly focus on indoor environments. To develop methods for challenging in-the-wild videos, researchers created the AVA-AVD dataset. It improves diarization models for such videos despite being relatively small. The benchmark is challenging due to diverse scenes, complicated acoustics, and off-screen speakers. They also designed the AVR-Net to address these challenges. It outperforms state-of-the-art methods and is more robust when varying the ratio of off-screen speakers.
The data and code are publicly available here, and you can access the model on HuggingFace.
Hotshot-XL
Hotshot-XL is a model that creates GIFs using fine-tuned SDXL models. It works seamlessly with any SDXL model and lets you load your SDXL-based LoRAs, making it quicker and easier to create personalized GIFs. Because it trains on images rather than videos, suitable training data is easier to source, and it fits into your existing LoRA workflows.
The weights are available on HuggingFace.
📰 Industry Pulse
AutoGen: Microsoft's Leap Forward in Large Language Model Applications.
Microsoft has introduced AutoGen, a framework designed to streamline the orchestration, optimization, and automation of workflows for LLMs.
Traditionally, leveraging the full potential of LLMs has been a nuanced and demanding process. AutoGen offers customizable agents that communicate using LLMs like GPT-4. These agents can interact with humans, tools, and other agents, making it more intuitive to build complex multi-agent conversation systems.
With AutoGen, developers can more easily build next-generation applications, marking a significant advancement in the capabilities of LLMs.
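To make that concrete, here's a minimal two-agent sketch modeled on AutoGen's launch-era quickstart; the model name and API key are placeholders, and the details may have changed since release:

```python
# A minimal two-agent AutoGen sketch; config values are placeholders.
import autogen

config_list = [{"model": "gpt-4", "api_key": "YOUR_OPENAI_API_KEY"}]

# The assistant plans and writes code; the user proxy runs it locally.
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",  # fully automated; use "ALWAYS" to step in yourself
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# The two agents exchange messages (and execute code) until the task is done.
user_proxy.initiate_chat(
    assistant,
    message="Plot NVDA's closing price for the last year and save it as a PNG.",
)
```

The user proxy executing whatever code the assistant writes is what makes this feel like more than a chatbot.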
The Evolving Landscape of AI: From ChatGPT to Multimodal Capabilities.
This article, penned by Ethan Mollick, talks about the advancements in AI, particularly in the context of LLMs. Beginning with the launch of ChatGPT, the AI landscape has seen significant developments, with Google's upcoming Gemini poised to surpass OpenAI's GPT-4. Ethan emphasizes that while the capabilities of these models are impressive, their full implications remain uncertain, especially in work and education. Integrating vision and voice into AI systems has expanded their potential applications, from deciphering handwritten texts to serving as personal assistants. However, with these advancements come challenges, including the risk of misuse and the ethical considerations of AI's pervasive role in our lives.
As Mollick suggests, AI's future hinges on our decisions and agency, urging us to harness this technology responsibly.
The Future of Work: Automate Yourself and Reap the Benefits.
In a thought-provoking piece by Dan Shipper, a new employment model for the age of AI is introduced. Drawing inspiration from historic job advertisements that promised adventure and uncertainty, Shipper proposes a modern-day job ad for the AI era: talented professionals are sought to train AI systems in their respective fields, with the aim of automating their own roles. As these professionals train the AI, they would gradually phase out their active involvement yet continue to receive their salary and a share of the profits indefinitely. This model envisions a future where employees, rather than trading time for money, exchange their expertise and data for sustained income. While the idea is promising, it comes with challenges, including the pace of AI advancements, the quality of training data, and the need to redefine employment contracts in a world where AI plays a dominant role.
Shipper suggests this approach could revolutionize professional services, turning them into scalable ventures with software-like margins.
💡 My Two Cents
In AI and tech in general, change is the only constant.
Those of us who've journeyed from the days of Windows 3.1 to today's sophisticated operating systems have witnessed a remarkable evolution. We've seen our computers transform from mere task executors to intuitive partners, integrating seamlessly with other software and devices. Remember the days of Novell NetWare and Windows NT, designed specifically for network communication? Fast forward, and we have iOS and Android, tailor-made for our smartphones and brimming with AI capabilities like Siri and Google Assistant.
But as we stand on the brink of another technological revolution, there's a buzz in the air. LLMs are now being touted as the next big thing in operating systems. Think about it: an OS that doesn't just execute but reasons.
Karpathy, a leading voice in AI, envisions LLMs as the core of a new Operating System [1].
He emphasizes their multifaceted capabilities, from handling various modalities like text, audio, and vision to interpreting and executing code and accessing the internet.
Similarly, Nathan Lambert sees LLMs as foundational computing platforms [2].
He touches upon the idea that LLMs can be integrated into various applications, similar to how an OS supports various software. Lambert also mentions the potential risks and challenges associated with the widespread adoption of LLMs, especially as they become more integrated into everyday products.
A recent paper on LLaMaS discusses the potential of integrating LLMs directly into the OS to manage diverse hardware resources [3].
The system leverages the zero-shot learning capabilities of LLMs to adapt to new devices by understanding their textual descriptions. This is not just a technological advancement; it's a redefinition of how we perceive and interact with artificial intelligence.
This isn't just another incremental step in tech evolution; it's a radical leap.
The pace at which LLMs have advanced is nothing short of astonishing. But with great power comes great responsibility. Imagine an OS that learns from us, understands our preferences, and fine-tunes itself based on our interactions. It's like having a personal assistant embedded in our computers.
But what if this assistant gets compromised?
It's not just about losing data anymore; it's about someone holding hostage an entity that knows us intimately. The promise of innovation is immense, but so are the ethical and security challenges.
The future of computing is exciting, but it's up to us to navigate it wisely.
References
1. Karpathy's perspective on LLMs as the kernel process of a new Operating System.
2. LLMs are computing platforms, by Nathan Lambert.
🔍 Research Refined
A paper titled "Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution" by researchers from Google DeepMind introduces a new mechanism, Promptbreeder, designed to evolve and adapt prompts for LLMs.
This technique is like a self-improving loop. It not only refines the main prompts (task prompts) but also tweaks the prompts that modify or mutate these main prompts (mutation prompts). It's like teaching the LLM to ask itself better questions and then refining how it asks those better questions.
The system operates by mutating a population of task prompts, evaluating their fitness on a training set, and iterating this process over multiple generations.
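To make the loop concrete, here's a toy sketch of one generation; the function names, the stubbed LLM call, and the mutation wording are all mine, not the paper's implementation:

```python
# A toy sketch of one Promptbreeder generation (illustrative, not the
# paper's code): mutate task prompts, keep the fitter variant, and
# occasionally mutate the mutation prompts themselves.
import random

def llm(prompt: str) -> str:
    """Stub standing in for a real LLM call; swap in your model client."""
    return "stub answer"

def fitness(task_prompt: str, train_set) -> float:
    """Score a task prompt by exact-match accuracy on training examples."""
    correct = sum(llm(f"{task_prompt}\n{q}").strip() == a for q, a in train_set)
    return correct / len(train_set)

def promptbreeder_step(population, mutation_prompts, train_set):
    """One generation of the self-referential evolutionary loop."""
    next_gen = []
    for task_prompt in population:
        mutation_prompt = random.choice(mutation_prompts)
        mutant = llm(f"{mutation_prompt}\n\nINSTRUCTION: {task_prompt}")
        # Keep the fitter of parent and mutant (binary tournament).
        survivor = max((task_prompt, mutant), key=lambda p: fitness(p, train_set))
        next_gen.append(survivor)
    if random.random() < 0.1:  # occasionally evolve the mutation prompts too
        seed = random.choice(mutation_prompts)
        mutation_prompts.append(llm(f"Improve this prompt-mutation instruction:\n{seed}"))
    return next_gen

train = [("What is 2+2?", "4"), ("What is 3*3?", "9")]
population = ["Let's think step by step.", "Solve the problem carefully."]
mutations = ["Rewrite the instruction to be more effective:"]
for _ in range(3):
    population = promptbreeder_step(population, mutations, train)
print(population)
```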
On arithmetic and commonsense reasoning benchmarks, Promptbreeder surpasses existing prompting strategies such as Chain-of-Thought and Plan-and-Solve prompting.
The paper emphasizes the potential of this approach, especially as LLMs become larger and more capable, suggesting a future where LLMs could further amplify the gains of such methods.
That’s it for this one.
See you next week, and if there’s anything you want me to cover or have any feedback, shoot me an email.
Cheers,
Harpreet