Hi, I'm Tyler Romero.

I am a Research Engineer at the Allen Institute (Ai2), where I work on open language modeling.

Previously, I was Lead ML Engineer at Groundlight, a startup building multimodal question-answering systems. Prior to that, I worked on large-scale recommender systems at Twitter, where I developed the ML and ran the A/B tests for the experimentally-successful yet short-lived downvote button. I also researched, trained, and shipped model architecture improvements for ranking Twitter's home timeline and conversation reply trees. And some of my work at Twitter is now open-source! Although the git-blame has been sanitized. Earlier in my career, I worked as a Research Scientist at Microsoft, where I built greenfield ML projects.

I studied CS and ML at Stanford and computer engineering at Texas A&M, where I researched parallel algorithms in C / Cilk. Outside of work, I contribute to open-source ML projects and write about what I’m learning here.

Posts

Mar 8, 2025
NanoGPT Speedrun Living Worklog
How fast can I train GPT-2 on two RTX 4090 GPUs?
Feb 6, 2025
Reducing VRAM Footprint in PPO and GRPO Using Selective Log-Softmax
Reduce VRAM usage by half when computing log probabilities by selectively applying log-softmax to only the necessary tokens.
Jan 5, 2025
An Extension to BADGE Active Learning for Variable-Sized Batches
We show how BADGE's batch selection strategy can be adapted to handle flexible batch sizes without compromising its ability to select diverse, informative samples - enabling more practical active learning workflows.
Apr 13, 2024
Direct Preference Optimization Explained In-depth
Covering DPO, a recently-proposed alternative to RLHF for preference tuning.

Publications & Preprints

Mar 2026
Olmo Hybrid ⊕ — William Merrill*, Yanhong Li*, Tyler Romero*, Anej Svete*, Caia Costello*, Pradeep Dasigi, Dirk Groeneveld, David Heineman, Bailey Kuehl, Nathan Lambert, Chuan Li, Kyle Lo, Saumya Malik, DJ Matusz, Benjamin Minixhofer, Jacob Morrison, Luca Soldaini, Finbarr Timbers, Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi, Ashish Sabharwal*.
Ai2 technical report.
We provide evidence for the advantages of hybrid models (mixing attention and linear RNN layers) over pure transformers on several fronts: theoretically, we show that hybrid models can express tasks beyond both transformers and linear RNNs individually; empirically, we find that hybrids scale more efficiently. We train Olmo Hybrid, a 7B model with sliding window layers replaced by Gated DeltaNet layers, and show it outperforms Olmo 3 across standard pretraining and long context evaluations.
Nov 2025
Olmo 3 ⊕ — Team Olmo (incl. Tyler Romero, core contributor).
arXiv preprint arXiv:2512.13961.
We introduce Olmo 3, a family of state-of-the-art, fully-open language models at the 7B and 32B parameter scales. Olmo 3 model construction targets long-context reasoning, function calling, coding, instruction following, general chat, and knowledge recall. This release includes the entire model flow, i.e., the full lifecycle of the family of models, including every stage, checkpoint, data point, and dependency used to build it. Our flagship model, Olmo 3 32B Think, is the strongest fully-open thinking model released to-date.

Projects

Liger-Kernel
Recently I've been contributing to Liger-Kernel, a collection of custom Triton Kernels for efficient LLM training. I've found these kernels very useful for training LLMs/VLMs on my RTX 4090. My contributions, as well as those of other top collaborators, were recently featured in a post on the LinkedIn Engineering Blog.
microR1
A micro-scale DeepSeek-R1 reproduction in the style of Karpathy's nanoGPT. Intended to be easy to understand and to hack on top of.

Favorite Reads

AI/ML Translations
English translations of selected posts from Chinese-language AI/ML blogs and forums.
The Ultra-Scale Playbook: Training LLMs on GPU Clusters
A detailed guide to large-scale training of LLMs, covering 1D through 5D training parallelism, GPU fusing and threading, and more.
How to Scale Your Model: A Systems View of LLMs on TPUs
An online book that explains how TPU and GPU hardware works and how the Transformer architecture has evolved to perform well on current hardware.
Making Deep Learning Go Brrrr From First Principles
A great post by Horace He explaining how to speed up single-GPU training based on whether jobs are compute-, bandwidth-, or overhead-bound. See also What Shapes Do Matrix Multiplications Like?
siboehm
An excellent ML engineering blog by Simon Boehm, and a large part of the inspiration for this site. I especially recommend Simon’s posts on optimizing multidimensional matrix multiplication on CPU and pipeline parallelism for distributed training.
ezyang’s blog
Edward Yang's insightful blog on distributed ML systems. I especially like Computing sharding with einsum.
Simon Willison’s Weblog
An insightful collection of links, quotes, and short blog posts that helps navigate the firehose of ML news.
Interconnects
A substack with long-form technical posts about AI R&D by Nathan Lambert. Also check out Nathan's rlhfbook.com.

Fun Stuff

Recipe Box
I find great joy in the process of cooking and I like to keep a recipe box of my favorite dishes.
How this startup used AI to keep raccoons from invading my house
My friends and I help a Seattle tech reporter keep some curious raccoons out of his living room. Related: Found a raccoon in the living room — now seeking a tech solution so it doesn’t happen again