Large Language Models

Teaching a 0.6B Model to Trade: SFT on GPT-5.2 Reasoning Traces

Distills GPT-5.2 reasoning traces into a 0.6B language model via supervised fine-tuning, producing a compact trading agent that learns structured decision-making from its much larger teacher.

GitHub Blog Post

Can We Fine-Tune a 0.6B LLM with GRPO for Trading?

Explores Group Relative Policy Optimization (GRPO) for fine-tuning a 0.6B language model to make trading decisions, combining RL-based optimization with LLM reasoning for financial markets.

GitHub Blog Post
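
For readers new to GRPO, the core idea is that several completions are sampled for the same prompt and each one is scored relative to the others in its group, so no separate value network is needed. The snippet below is a minimal sketch of that normalization step only, not the code from the project; the function name, the `eps` constant, and the use of the population standard deviation are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one group of completions sampled from the same prompt.

    Assumption: population standard deviation is used; implementations
    differ on this detail and on the value of eps.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Completions better than the group average get a positive advantage,
    # worse ones a negative advantage.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four sampled trading decisions on one prompt.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When every completion in a group receives the same reward, all advantages are zero and that group contributes no gradient signal.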

Distributed RL for LLM Fine-Tuning

Multi-GPU distributed RL framework for fast and memory-efficient LLM fine-tuning. Uses Ray for orchestration, vLLM for inference, and Unsloth for training. Supports flexible actor-to-learner GPU ratios and implements Policy Gradient and GRPO.

GitHub

Agent Tool RL

Training small language models to use tools with RL. Even a 0.5B model goes from 12% to 100% accuracy on math tasks once trained with RL to call a calculator tool.

GitHub

SCoRe

Minimal implementation of "Training Language Models to Self-Correct via Reinforcement Learning" — teaching LLMs to iteratively refine their own answers through RL-based self-correction.

GitHub Paper

ARC Test-Time Training

Simplified test-time training for ARC-AGI abstract reasoning tasks. Fine-tunes per-task LoRA adapters on augmented training examples at inference time to boost puzzle-solving accuracy.

GitHub
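
One common way to build the augmented training examples for per-task fine-tuning is to apply grid symmetries to each input/output pair. The sketch below assumes grids are lists of lists of integers and uses the eight rotations and flips of a square grid; the function names are hypothetical, and the project may use a different or larger set of augmentations.

```python
def rotate90(grid):
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_h(grid):
    """Mirror a grid left to right."""
    return [row[::-1] for row in grid]


def dihedral_augment(pair):
    """Return 8 symmetric variants of an (input, output) pair.

    The same transform is applied to both grids so the underlying
    puzzle rule is preserved.
    """
    inp, out = pair
    variants = []
    for flipped in (False, True):
        a, b = (flip_h(inp), flip_h(out)) if flipped else (inp, out)
        for _ in range(4):
            variants.append((a, b))
            a, b = rotate90(a), rotate90(b)
    return variants


if __name__ == "__main__":
    # Toy example pair, not taken from the ARC dataset.
    example = ([[1, 2], [3, 4]], [[4, 3], [2, 1]])
    for grid_in, grid_out in dihedral_augment(example):
        print(grid_in, "->", grid_out)
```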

CoT-Decoding

Minimal implementation of chain-of-thought reasoning without prompting. Explores top-k alternative token sequences to uncover inherent reasoning paths in LLMs during decoding.

GitHub Paper
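
In the CoT-decoding paper, candidate paths are obtained by branching on the top-k first tokens and decoding each branch greedily; the paths are then ranked by how confident the model is on the answer tokens, measured as the gap between the top two token probabilities. The sketch below shows only that ranking step on toy data. The dictionary layout and function names are assumptions for illustration, and each probability list is assumed to have at least two entries.

```python
def answer_confidence(answer_token_probs):
    """Mean gap between the top-1 and top-2 probabilities over the answer tokens."""
    gaps = []
    for probs in answer_token_probs:
        top1, top2 = sorted(probs, reverse=True)[:2]
        gaps.append(top1 - top2)
    return sum(gaps) / len(gaps)


def pick_path(paths):
    """Select the decoded path whose answer tokens have the highest confidence."""
    return max(paths, key=lambda p: answer_confidence(p["answer_token_probs"]))


if __name__ == "__main__":
    # Toy distributions, one list of probabilities per answer token.
    paths = [
        {"text": "direct guess", "answer_token_probs": [[0.5, 0.4, 0.1]]},
        {"text": "step-by-step path", "answer_token_probs": [[0.9, 0.05, 0.05]]},
    ]
    print(pick_path(paths)["text"])
```

In a real setup the per-token distributions would come from the model's logits at the positions of the answer span.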
