Reinforcement Learning

Deep RL algorithms, applications, and small projects

Projects & Highlights

TorchRL

A data-driven decision-making library for PyTorch. Core contributor to the official PyTorch RL library providing modular, efficient components for RL research and applications.

Distributed RL Architectures

Research on integrating distributed training architectures into modular RL libraries, enabling scalable and efficient reinforcement learning across multiple workers.

Guided Exploration with PPO

A method for guided exploration using Proximal Policy Optimization with a single demonstration, combining imitation learning cues with policy gradient methods.

TorchTrade

An ML framework for algorithmic trading built on TorchRL. Supports online RL, offline RL, model-based RL, LLM agents, and rule-based strategies with live trading and backtesting environments.

SpaceX Falcon 9 Landing Simulator

RL agent trained to land a rocket on a drone ship using MuJoCo physics. GPU-accelerated training with 4096 parallel environments and PPO via TorchRL.

Happy to discuss any of these — feel free to reach out.

Other Projects and Implementations

SAC and Extensions (★ 295): SAC with PER, Emphasizing Recent Experience, Munchausen RL, D2RL, and parallel environments.
CQL (★ 147): Conservative Q-Learning for offline RL. Includes DQN-CQL and SAC-CQL variants for discrete and continuous action spaces.
DQN Atari Agents (★ 122): Modular DQN agents for Atari: DDQN, Dueling DQN, Noisy DQN, C51, Rainbow, and DRQN.
IQN and Extensions (★ 93): Implicit Quantile Networks for distributional RL with PER, noisy layers, N-step bootstrapping, and dueling architecture.
Deep RL Algorithm Collection (★ 80): Comprehensive collection of deep RL algorithm implementations in PyTorch.
Upside-Down RL (★ 78): Implementation of Schmidhuber's Upside-Down RL: supervised learning on desired rewards and time horizons.
SAC Discrete (★ 55): Soft Actor-Critic adapted for discrete action spaces.
Implicit Q-Learning (★ 44): IQL for offline RL. Learns optimal policies without querying out-of-distribution actions.
Munchausen RL (★ 44): M-DQN and M-IQN: adding scaled log-policy to the reward for implicit KL regularization.
FQF and Extensions (★ 34): Fully Parameterized Quantile Function for distributional RL with N-step, PER, noisy layers, and dueling.
QR-DQN (★ 29): Distributional RL with quantile regression for learning the full return distribution.
NAF (★ 28): Q-learning for continuous control using normalized advantage functions with PER and N-step returns.
D4PG (★ 24): Distributed Distributional DDPG with an IQN critic, plus Munchausen RL and D2RL extensions.
REDQ (★ 21): Randomized ensembled double Q-learning for sample-efficient continuous control.
GARNE (★ 14): Genetic algorithm with recurrent networks and novelty-driven exploration for neuroevolution.
GA Neural Network Optimization (★ 13): Genetic algorithms for neural network architecture search, hyperparameter tuning, and weight optimization.
GANs (★ 12): ClusterGAN implementation, combining GANs with clustering for unsupervised representation learning.
OFENet (★ 10): Online Feature Extractor Network, learned feature representations for continuous control RL.
RA-PPO (★ 8): Risk-averse PPO, policy optimization with CVaR objectives for safer decision making.
MBPO (★ 6): Model-Based Policy Optimization, using learned dynamics models to generate synthetic rollouts for SAC.
PyTorch V-MPO (★ 5): V-MPO: on-policy maximum a posteriori policy optimization with learned temperature parameters.
Hindsight Experience Replay (★ 4): HER for goal-conditioned RL, relabeling failed episodes with achieved goals to accelerate learning.
PETS-MPC (★ 3): Probabilistic ensemble trajectory sampling with model predictive control for model-based RL.
CEN Network (★ 2): Context-dependent Elastic Network for continual learning without catastrophic forgetting.
D4PG-Ray (★ 2): Distributed D4PG using Ray for parallel data collection with IQN critic and Munchausen RL.

More on GitHub