NeurIPS Recap Week
Over the next week I'll share my favourite talks and poster sessions from the conference, starting with Fei-Fei Li's keynote on visual intelligence!
Note: I just wrapped up a week at NeurIPS in Vancouver, and this week I'll bring the conference to you! I'll share what I've learned from my favourite talks, summaries of some interesting posters I encountered, and on-the-floor poster talks I recorded. You can expect ~5 emails from me this week, featuring talks from Fei-Fei Li, Ilya Sutskever, Ted Chiang, and more!
Hey, everyone!
Before we get into Fei-Fei Li's keynote, thank you again for being awesome subscribers and for all your support. Your reactions, comments, and replies keep me going. I wouldn't be doing this without you.
Now, as many of you know, I'm big on education. I'm a lifelong learner who has taught myself everything from Python and classical ML to deep learning, generative AI, and beyond. The downside of being self-taught is that it takes a lot of time. Without a structured path to get you from point A to point B, it's easy to waste valuable time.
I see many of you at this exact crossroads with GenAI right now. You've played with ChatGPT and maybe built a basic RAG system, but you're ready to stop dabbling and start building real solutions.
The landscape is vast, and technology is moving incredibly fast. Putting together random tutorials will not get you where you need to be.
You don't need just another course. You need hands-on experience and guidance from a mentor to build true intuition.
That's exactly what I've designed this course to deliver.
This isn't just another "intro to GenAI" tutorial compiled by a grifter who watched a few YouTube videos. This carefully crafted roadmap, built from real-world experience, takes you from surface-level knowledge to hands-on technical mastery.
You'll learn to:
• Build RAG systems from scratch
• Work with both API and local models
• Implement multimodal solutions
• Fine-tune models effectively
• Properly evaluate model outputs
No more piecing together random tutorials or getting lost in documentation.
This is a structured path to becoming a confident GenAI practitioner.
Perfect for data scientists, ML engineers, analysts, and technical PMs who want to:
• Move beyond basic interfaces
• Understand the tooling landscape: which tool to use when, and how to use it
• Understand patterns that stay relevant as technology evolves
The course starts Jan 6, 2025, and the first cohort gets:
✅ 70% off the future price ($249 vs $798; use discount code COHORTONE at checkout)
✅ Smaller class size
✅ More direct access to the instructor
✅ Opportunity to shape future cohorts
Join 20,000+ professionals who've learned from my courses.
Limited spots are available at the discounted price of $249.
⏰ Early bird discount ends Dec 27. Don't miss out!
Ascending the Ladder of Visual Intelligence with Fei-Fei Li
The "through line" of this entire talk is the ascent of visual intelligence, both in biological evolution and AI development.
Visual intelligence is the ability to perceive, understand, and reason about the world through sight. Its evolution began over 500 million years ago, when simple marine animals developed the first photosensitive cells. Since then, visual intelligence has been deeply intertwined with the evolution of intelligence itself, culminating in the complex visual abilities of humans.
In this talk, Fei-Fei walks us down a path from simple visual perception to complex reasoning and generation, highlighting the critical role of data in driving this progress.
Throughout the talk, I picked up on three key themes:
From Seeing to Doing: Visual intelligence is not merely passive observation; it involves interacting with and acting upon the world. This progression is evident in computer vision, which over the last few decades has moved from understanding images to reasoning about relationships and generating new visual content. Toward the end of the talk, she focuses on robotics, underscoring the goal of building machines that can translate visual perception into meaningful actions in the real world.
The Importance of Data: She consistently emphasized that data is as crucial as algorithms for achieving visual intelligence. The limitations of early machine learning approaches, which struggled with small, homogeneous datasets, led to the creation of ImageNet. This massive dataset fueled the deep learning revolution, demonstrating the power of data-driven learning. To achieve true spatial intelligence, models need to learn from 3D data that captures the geometric and physical properties of the world, including data from 3D scanners, depth sensors, and physics simulations. We also need to leverage existing knowledge from large language models and visual language models to guide robot learning and reduce the reliance on extensive physical data collection.
Beyond the "Flat Earth": Much of the talk focuses on the limitations of training AI models solely on 2D images because true visual intelligence requires understanding the 3D nature of the world. This involves reconstructing 3D scenes from images, generating realistic 3D objects and environments, and enabling robots to effectively navigate and interact with the physical world. The transition from "flat earth" AI to spatial intelligence necessitates new types of data and algorithms that can capture the complexity of 3D environments.
Ultimately, the talk argues that visual intelligence is a journey of continuous advancement, both for biological organisms and AI. This journey is driven by the availability of increasingly rich and diverse data, which allows us to build more capable and sophisticated models. By embracing the world's 3D nature and focusing on the interaction between perception and action, we can unlock new levels of visual intelligence and create AI systems that truly augment human capabilities.
How do we build machines with visual intelligence?
Building visually intelligent machines involves a three-step ladder:
Understanding: Teaching machines to recognize and label objects, scenes, and actions in images and videos. This first step means labeling semantic content and perceptual properties directly from raw pixels.
Reasoning: Enabling machines to infer relationships, predict events, and draw conclusions based on visual information.
Generation: The ability to create or alter pixels is a further step, enabled by advances like GANs, VAEs, and diffusion models. These techniques allow machines to create new images, videos, and even 3D scenes from given instructions or prompts.
These capabilities are achieved through data-driven algorithms, particularly deep learning models, trained on massive datasets.
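To make the first rung concrete, here's a minimal sketch of "Understanding" in action: labeling an image with an ImageNet-pretrained classifier. This is my own illustration, not code from the talk; it assumes PyTorch and torchvision are installed, and `cat.jpg` stands in for any local image.

```python
# Minimal "Understanding" sketch: map raw pixels to a semantic label
# using a classifier pretrained on ImageNet (illustrative, not from the talk).
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT          # ImageNet-pretrained weights
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()           # matching resize/crop/normalize

img = Image.open("cat.jpg").convert("RGB")  # "cat.jpg" is a placeholder path
batch = preprocess(img).unsqueeze(0)        # shape: (1, 3, 224, 224)

with torch.no_grad():
    logits = model(batch)

label = weights.meta["categories"][logits.argmax().item()]
print(label)  # e.g. "tabby", a semantic label recovered from pixels
```

Note that everything the model "knows" here came from data: its ability to map pixels to labels was learned from millions of annotated ImageNet examples, which is exactly the point of the next section.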
What is the role of data in advancing visual intelligence?
Data is as crucial as algorithms for building visually intelligent machines. Large, diverse datasets like ImageNet, Visual Genome, and more recently, Behavior (for robotics), provide the necessary fuel for training powerful deep learning models. Data quality and diversity significantly impact AI systems' ability to generalize and perform well on unseen examples.
Past: Early machine learning algorithms were limited by data availability, resulting in poor generalization and performance on real-world problems.
Creating ImageNet, a massive object-recognition dataset, marked a turning point. It provided the necessary data for training deep learning models and propelled the field forward.
Present: Data remains crucial for advancing visual intelligence, particularly in 3D reasoning, generation, and robotic learning.
Large-scale datasets like Visual Genome, Behavior, and ObjectFolder enable the development of models that can reason about relationships, interact with complex environments, and understand multimodal information.
Future: The importance of data will only continue to grow as we strive to build more capable and sophisticated AI systems.
Future advancements in robotics and embodied AI will heavily rely on diverse, large-scale, and multimodal datasets that capture the complexity of the real world.
This push towards spatial intelligence brings new challenges and opportunities. Building models that can reason about 3D scenes, generate realistic 3D objects, and control robots in complex environments requires new types of data, including 3D models, depth maps, and physics simulations.
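As a concrete taste of what these "new types of data" look like, here's a minimal sketch (my own illustration, not from the talk) that back-projects a depth map into a 3D point cloud using the standard pinhole camera model. The intrinsics below (fx, fy, cx, cy) are made-up values for illustration.

```python
# Back-project an (H, W) depth map into 3D points with the pinhole camera
# model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy (illustrative sketch).
import numpy as np

def depth_to_pointcloud(depth: np.ndarray, fx: float, fy: float,
                        cx: float, cy: float) -> np.ndarray:
    """Convert an (H, W) depth map to an (H*W, 3) array of 3D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Toy example: a flat scene one metre away, with made-up intrinsics.
points = depth_to_pointcloud(np.ones((480, 640)),
                             fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(points.shape)  # (307200, 3)
```

A single depth image like this already carries geometry that no amount of 2D labeling can recover, which is why the sensors and simulations that produce it matter so much for spatial intelligence.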
Collecting data that allows robots to learn complex manipulation tasks, adapt to new environments, and interact naturally with humans will be critical for their successful integration into our lives.
Data has been, is, and will continue to be a driving force in advancing machine learning and realizing truly intelligent machines.
The Future
The talk concludes by looking towards the future of visual intelligence, focusing on the potential of robotics and embodied AI to revolutionize our world. Fei-Fei stresses the importance of building robots that can perform complex tasks in real-world environments, augmenting human capabilities rather than replacing them.
However, achieving this vision requires overcoming the significant challenge of acquiring real-world robotics data. She highlights promising approaches, such as leveraging existing knowledge from large language models (LLMs) to guide robot learning, enabling robots to perform tasks with minimal physical training data.
The talk underscores that data will continue to be the driving force behind the advancement of visual intelligence and the development of truly intelligent machines.
The talk effectively presents a timeline of deep learning's evolution, from its early struggles with limited data to its current successes and future aspirations. The narrative highlights the central role of data in driving this progress and emphasizes the need to move beyond "flat earth" AI towards a deeper understanding of the 3D world. By embracing data-driven approaches and focusing on the interaction between perception and action, we can build AI systems that augment human capabilities and shape a better future.
Thanks for reading!
If you found the newsletter helpful, please do me a favour and smash that like button!
You can also help by sharing this with your network (whether via re-stack, tweet, or LinkedIn post) so they can benefit from it, too.
🤗 This small gesture and your support mean far more to me than you can imagine!
Stay tuned for tomorrow's post, a breakdown of a rare talk by Ilya Sutskever!
Cheers,
Harpreet