Proximal Policy Optimization (PPO) - How to train Large Language Models

Length 38:23 • 28.5K Views • 9 months ago

Serrano.Academy

Related Videos

Reinforcement Learning with Human Feedback - How to train and fine-tune Transformer Models

A friendly introduction to deep reinforcement learning, Q-networks and policy gradients

How large language models work, a visual intro to transformers | Chapter 5, Deep Learning

DRL Lecture 2: Proximal Policy Optimization (PPO)

[1hr Talk] Intro to Large Language Models

Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning

Transformer Neural Networks, ChatGPT's foundation, Clearly Explained!!!

What are Transformer Models and how do they work?

An introduction to Policy Gradient methods - Deep Reinforcement Learning

Stable Diffusion - How to build amazing images with AI

Proximal Policy Optimization | ChatGPT uses this

Machine Learning for Everybody – Full Course

What is Q-Learning (back to basics)

Stanford CS229 I Machine Learning I Building Large Language Models (LLMs)

Proximal Policy Optimization Explained

MIT 6.S191: Reinforcement Learning

John Schulman - Reinforcement Learning from Human Feedback: Progress and Challenges

Streamed 1 year ago

Proximal Policy Optimization (PPO) is Easy With PyTorch | Full PPO Tutorial

Deep Learning: A Crash Course (2018) | SIGGRAPH Courses

Streamed 6 years ago

CS 285: Eric Mitchell: Reinforcement Learning from Human Feedback: Algorithms & Applications
