Learn AI
The best free resources for understanding and building with AI
Curated by people who've gone through the learning curve. No paywalls, no fluff — just the resources that actually move the needle.
fast.ai — Practical Deep Learning
Free top-down course. The fastest path from zero to building real models.
Free · Self-paced
Neural Networks: Zero to Hero
Andrej Karpathy builds everything from scratch. The clearest explainer on the internet.
Free · YouTube
3Blue1Brown — Neural Networks
The best visual introduction to how neural networks actually work.
Free · YouTube
Hugging Face NLP Course
Free, hands-on course covering transformers, fine-tuning, and deployment.
Free · Interactive
LLMs from Scratch
Sebastian Raschka's book — build a GPT-style LLM from the ground up in PyTorch.
Free · GitHub
DeepLearning.AI Short Courses
Andrew Ng's bite-sized courses on agents, RAG, fine-tuning, and more.
Free · Short courses
OpenAI Cookbook
Practical examples and guides for building with GPT-4o and o3.
Free · Reference
Anthropic Docs
Claude API docs, prompt engineering guides, and the Claude 3.7 system card.
Free · Reference
Hugging Face Docs
The definitive reference for Transformers, Diffusers, and the full HF ecosystem.
Free · Reference
The Batch — DeepLearning.AI
Andrew Ng's weekly newsletter. Consistently the best signal-to-noise in AI media.
Free · Weekly
TLDR AI
Daily 5-minute digest of the most important AI news, research, and launches.
Free · Daily
Import AI — Jack Clark
Weekly newsletter from Anthropic co-founder. Dense, technical, essential.
Free · Weekly
Vega: Learning to Drive with Natural Language Instructions
Vision-language-action models have reshaped autonomous driving by incorporating language into the decision-making process. However, most existing pipelines use the language modality only for scene description or reasoning, and lack the flexibility to follow diverse user instructions for personalized driving. To address this, we first construct a large-scale driving dataset (InstructScene) containing around 100,000 scenes annotated with diverse driving instructions and their corresponding trajectories. We then propose a unified Vision-Language-World-Action model, Vega, for instruction-based g…
Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving
Human driving behavior is inherently personal, shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge, yield, and overtake across diverse situations. However, existing end-to-end autonomous driving systems either optimize for generic objectives or rely on fixed driving modes, lacking the ability to adapt to individual preferences or interpret natural language intent. To address this gap, we propose Drive My Way (DMW), a personalized Vision-Language-Action (VLA) driving framework that aligns with users' long-term driv…
Training the Knowledge Base through Evidence Distillation and Write-Back Enrichment
The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable component and propose WriteBack-RAG, a framework that uses labeled examples to identify where retrieval succeeds, isolate the relevant documents, and distill them into compact knowledge units that are indexed alongside the original corpus. Because the method modifies only the corpus, it can be applied once…
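The write-back loop the abstract describes can be sketched in a few lines. This is a toy illustration under assumed mechanics, not the paper's method: retrieval is plain token overlap, "retrieval succeeded" is approximated by the document containing the answer tokens, and the distilled knowledge unit is a simple query-answer string. All function names are hypothetical.

```python
# Hedged sketch of write-back enrichment: distill evidence for labeled
# queries into compact units and index them alongside the original corpus.

def tokens(text):
    return set(text.lower().split())

def retrieve(corpus, query, k=2):
    # Toy lexical retriever: rank documents by token overlap with the query.
    ranked = sorted(corpus, key=lambda d: len(tokens(d) & tokens(query)), reverse=True)
    return ranked[:k]

def supports(doc, answer):
    # Proxy for "retrieval succeeded": the document contains the answer tokens.
    return tokens(answer) <= tokens(doc)

def write_back(corpus, labeled_examples):
    # Isolate supporting documents per labeled query, distill them into a
    # compact unit, and append that unit to the (otherwise unchanged) corpus.
    enriched = list(corpus)
    for query, answer in labeled_examples:
        evidence = [d for d in retrieve(corpus, query) if supports(d, answer)]
        if evidence:
            enriched.append(f"{query} -> {answer}")  # stand-in distilled unit
    return enriched

corpus = [
    "The LHC is a particle accelerator at CERN near Geneva.",
    "CERN operates the Large Hadron Collider; beams collide at high energy.",
]
examples = [("where is the lhc", "cern")]
print(len(write_back(corpus, examples)))  # corpus grows by one distilled unit
```

Because only the corpus changes, the same enrichment pass could in principle be run once and reused by any retriever over the index.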
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference
Autoregressive video diffusion models have demonstrated remarkable progress, yet they remain bottlenecked by intractable linear KV-cache growth, temporal repetition, and compounding errors during long-video generation. To address these challenges, we present PackForcing, a unified framework that efficiently manages the generation history through a novel three-partition KV-cache strategy. Specifically, we categorize the historical context into three distinct types: (1) Sink tokens, which preserve early anchor frames at full resolution to maintain global semantics; (2) Mid tokens, which achieve…
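The three-partition idea is easy to see on cached token positions alone. The sketch below is an assumption-laden illustration, not PackForcing's actual design: partition sizes are arbitrary, and the mid-token compression is simple striding rather than whatever the paper uses.

```python
# Illustrative three-partition cache policy: keep early "sink" anchors and a
# recent "local" window in full, and subsample the "mid" region in between
# so the cache stays bounded instead of growing linearly with video length.

def partition_cache(positions, n_sink=4, n_local=8, mid_stride=4):
    """Split cached token positions into sink / mid / local partitions."""
    if len(positions) <= n_sink + n_local:
        # Too short to compress: everything after the sink stays local.
        return positions[:n_sink], [], positions[n_sink:]
    sink = positions[:n_sink]                       # early anchor frames, full res
    local = positions[-n_local:]                    # most recent frames, full res
    mid = positions[n_sink:-n_local][::mid_stride]  # compressed middle history
    return sink, mid, local

positions = list(range(40))  # 40 cached timesteps
sink, mid, local = partition_cache(positions)
print(len(sink), len(mid), len(local))  # 4 7 8 — bounded, not 40
```

With a fixed stride the retained cache grows far more slowly than the raw history, which is the property the abstract is after.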
PixelSmile: Toward Fine-Grained Facial Expression Editing
Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, we construct the Flex Facial Expression (FFE) dataset with continuous affective annotations and establish FFE-Bench to evaluate structural confusion, editing accuracy, linear controllability, and the trade-off between expression editing and identity preservation. We propose PixelSmile, a diffusion framework that disentangles expression semantics via fully symmetric joint training. PixelSmile combines intensity supervision with contrastive learning to produce stronger and more distinguis…
Back to Basics: Revisiting ASR in the Age of Voice Agents
Automatic speech recognition (ASR) systems have achieved near-human accuracy on curated benchmarks, yet still fail in real-world voice agents under conditions that current evaluations do not systematically cover. Without diagnostic tools that isolate specific failure factors, practitioners cannot anticipate which conditions, in which languages, will cause what degree of degradation. We introduce WildASR, a multilingual (four-language) diagnostic benchmark sourced entirely from real human speech that factorizes ASR robustness along three axes: environmental degradation, demographic shift, and l…
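A factorized diagnostic of this kind boils down to grouping results by (language, condition) and reporting an error rate per cell, so each failure factor is isolated rather than averaged away. The sketch below is a generic illustration of that idea; the field names and the choice of word error rate are assumptions, not WildASR's actual schema.

```python
# Hypothetical factorized ASR report: aggregate errors and word counts per
# (language, condition) cell, then compute a word error rate for each cell.
from collections import defaultdict

def factorized_report(results):
    """results: dicts with 'language', 'condition', 'errors', 'words' keys."""
    cells = defaultdict(lambda: [0, 0])
    for r in results:
        cell = cells[(r["language"], r["condition"])]
        cell[0] += r["errors"]
        cell[1] += r["words"]
    return {key: errs / words for key, (errs, words) in cells.items()}

results = [
    {"language": "en", "condition": "clean", "errors": 2, "words": 100},
    {"language": "en", "condition": "noisy", "errors": 10, "words": 100},
    {"language": "es", "condition": "noisy", "errors": 15, "words": 100},
]
report = factorized_report(results)
print(report[("en", "noisy")])  # 0.1
```

Reading across a row isolates one axis (e.g. clean vs. noisy for one language); reading down a column isolates another.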
Inside our approach to the Model Spec
Learn how OpenAI’s Model Spec serves as a public framework for model behavior, balancing safety, user freedom, and accountability as AI systems advance.
Build a Domain-Specific Embedding Model in Under a Day
CERN uses tiny AI models burned into silicon for real-time LHC data filtering
STADLER reshapes knowledge work at a 230-year-old company
Learn how STADLER uses ChatGPT to transform knowledge work, saving time and accelerating productivity across 650 employees.
Namespace: We've raised $23M to build the compute layer for code
Show HN: Open-Source Animal Crossing–Style UI for Claude Code Agents
We posted here on Monday and got some great feedback. We've implemented a few of the most requested updates:
- iMessage channel support (agents can text people and you can text agents); other channels are simple to extend
- A built-in browser (agents can navigate and interact with websites)
- Scheduling (run tasks on a timer / cron / in the future)
- Built-in tunneling so that agents can share local stuff with you over the internet
- More robust MCP and Skills support so anyone can e…