Hi, I'm Tyler Romero.
In the recent past, I worked on large-scale recommender systems at Twitter, where I built all of the ML and ran the A/B tests behind the experimentally successful yet sadly short-lived downvote button. I also researched, trained, and shipped model architecture improvements for ranking Twitter’s home timeline and conversation reply trees. Some of my work at Twitter is now open source, although the git blame has been sanitized. Before Twitter, I worked as a Research Scientist at Microsoft, building out greenfield ML projects.
My academic background includes a Master’s in computer science and machine learning from Stanford and a Bachelor’s in computer engineering from Texas A&M. As an undergraduate, I researched novel implementations of parallel algorithms in C/Cilk and interned as a Software Engineer at Bloomberg and Microsoft. I made a few contributions to Bloomberg’s Asset and Investment Management function and, at Microsoft, wrote a data retrieval package for R that is still supported 8 years later.
Posts
- NanoGPT Speedrun Living Worklog
  February 18, 2025 - How fast can I train GPT-2 on two RTX 4090 GPUs? This is a living worklog of my progress.
- Reducing VRAM Footprint in PPO and GRPO Using Selective Log-Softmax
  February 6, 2025 - Reduce VRAM usage by half when computing log probabilities by selectively applying log-softmax to only the necessary tokens. Perfect for many RLHF post-training algorithms (such as PPO and GRPO), where typically only one token's log probability is needed from the entire vocabulary at each sequence position. A minimal sketch of the core idea appears after this list.
- An Extension to BADGE Active Learning for Variable-Sized Batches
  January 5, 2025 - We show how BADGE's batch selection strategy can be adapted to handle flexible batch sizes without compromising its ability to select diverse, informative samples, enabling more practical active learning workflows.
- Direct Preference Optimization Explained In-depth
  April 13, 2024 - Covering DPO, a recently-proposed alternative to RLHF for preference tuning.
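
For the selective log-softmax post above, here is a minimal PyTorch sketch of the core idea (function and variable names are illustrative, not the post's actual code): rather than materializing a full log-softmax over the vocabulary and then gathering, gather the selected logits and subtract a logsumexp over the vocab dimension.

```python
import torch

def selective_log_softmax(logits: torch.Tensor, index: torch.Tensor) -> torch.Tensor:
    """Log-probabilities of only the selected tokens.

    logits: (batch, seq_len, vocab_size) raw model outputs
    index:  (batch, seq_len) token ids whose log-probabilities are needed
    """
    # Gather the logit of each selected token: shape (batch, seq_len).
    token_logits = torch.gather(logits, dim=-1, index=index.unsqueeze(-1)).squeeze(-1)
    # log p(token) = logit[token] - logsumexp(logits). logsumexp reduces over the vocab
    # dimension directly, so no (batch, seq_len, vocab_size) log-softmax tensor is created.
    return token_logits - torch.logsumexp(logits, dim=-1)
```

Compared to `torch.log_softmax(logits, dim=-1)` followed by a gather, this avoids allocating a second logits-sized tensor, which is where the memory savings come from.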
Projects
- Liger-Kernel
  Recently I've been contributing to Liger-Kernel, a collection of custom Triton kernels for efficient LLM training. I've found these kernels very useful for training LLMs/VLMs on my RTX 4090 (see the short usage sketch after this list). My contributions, as well as those of other top collaborators, were recently featured in a post on the LinkedIn Engineering Blog.
- microR1
  A micro-scale DeepSeek-R1 reproduction in the style of Karpathy's nanoGPT. Intended to be easy to understand and to hack on top of.
- seahorse
  I've also been building seahorse, a small vision-language model meant for research. It's still in its early stages, but it's extensible and designed to train quickly on a single RTX 4090.
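
As a quick illustration of how I use Liger-Kernel, here is a hedged sketch based on my memory of the project's README (the model name is just an example, and the exact API may have changed between releases):

```python
from liger_kernel.transformers import AutoLigerKernelForCausalLM

# Drop-in replacement for AutoModelForCausalLM: supported architectures are patched
# with Liger's fused Triton kernels (e.g. RMSNorm, RoPE, SwiGLU, cross-entropy)
# when the model is loaded. Model name below is only an example.
model = AutoLigerKernelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
```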
Favorite Reads
- siboehm
  An excellent ML engineering blog by Simon Boehm, and a large part of the inspiration for this site. I especially recommend Simon’s posts on optimizing multidimensional matrix multiplication on CPU and pipeline parallelism for distributed training.
- The Ultra-Scale Playbook: Training LLMs on GPU Clusters
  A detailed guide to large-scale training of LLMs, covering 1D through 5D training parallelism, GPU kernel fusion and threading, and more.
- How to Scale Your Model: A Systems View of LLMs on TPUs
  An online book that explains how TPU and GPU hardware works and how the Transformer architecture has evolved to perform well on current hardware.
- Google Research's Tuning Playbook
  A collection of valuable advice and practical guidelines for training deep learning models.
- The 37 Implementation Details of Proximal Policy Optimization
  A legendary ICLR blog post diving into the (unreported/underreported) implementation details of PPO. Necessary reading for anyone working on LLM post-training with PPO or GRPO.
- Learning CUDA by optimizing softmax: A worklog
  A nice post by Maharshi Pandya on optimizing a softmax CUDA kernel.
- Michael Nielsen's Principles of Effective Research
  A concise and thoughtful guide on cultivating habits, vision, and discipline to maximize research impact and personal growth.
Fun Stuff
- Recipe Box
  I find great joy in the process of cooking, and I like to keep a recipe box of my favorite dishes.
- How this startup used AI to keep raccoons from invading my house
  My friends and I help a Seattle tech reporter keep some curious raccoons out of his living room. Related: Found a raccoon in the living room — now seeking a tech solution so it doesn’t happen again
Website
This website is made with 11ty, Tufte CSS, and eleventufte. Custom figures are made with Excalidraw. The combination of Tufte CSS and Excalidraw to achieve a notebook-like appearance was borrowed from Simon Boehm's website, because having a visually appealing site helps motivate me to write.