Sparse is Enough in Scaling Transformers (aka Terraformer) | ML Research Paper Explained

Length 57:06 • 23.4K Views • 2 years ago

Yannic Kilcher

Related Videos

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning (Paper Explained)

Autoregressive Diffusion Models (Machine Learning Research Paper Explained)

How large language models work, a visual intro to transformers | Chapter 5, Deep Learning

Feedback Transformers: Addressing Some Limitations of Transformers with Feedback Memory (Explained)

Sparse Expert Models (Switch Transformers, GLAM, and more... w/ the Authors)

Fastformer: Additive Attention Can Be All You Need (Machine Learning Research Paper Explained)

∞-former: Infinite Memory Transformer (aka Infty-Former / Infinity-Former, Research Paper Explained)

DeBERTa: Decoding-enhanced BERT with Disentangled Attention (Machine Learning Paper Explained)

FlashAttention - Tri Dao | Stanford MLSys #67

Streamed 1 year ago

PonderNet: Learning to Ponder (Machine Learning Research Paper Explained)

Linformer: Self-Attention with Linear Complexity (Paper Explained)

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

Transformer Neural Networks, ChatGPT's foundation, Clearly Explained!!!

The Narrated Transformer Language Model

Dynamic Inference with Neural Interpreters (w/ author interview)

Think Fast, Talk Smart: Communication Techniques

FNet: Mixing Tokens with Fourier Transforms (Machine Learning Research Paper Explained)

Git & GitHub Tutorial | Visualized Git Course for Beginner & Professional Developers in 2024

Big Bird: Transformers for Longer Sequences (Paper Explained)

Understanding AI from Scratch – Neural Networks Course
