You May Also Enjoy
Universal Function Approximator
less than 1 minute read
Universal Approximation Theorem The universal approximation theorems (UATs) state that neural networks with a certain structure can, in principle, a...
Solving International Mathematical Olympiad with GPT-OSS-120B (AIMO3)
7 minute read
My Work This post documents my hosted implementation and results. AIMO3 + GPT-OSS 120B Jupyter Notebook HTML Result: On the IMO-style test set, my ...
Energy-Based Models & Structured Prediction
8 minute read
Energy-Based Models (EBMs) assign a scalar energy to configurations of variables and perform inference by minimizing energy. Intro We tackle structured p...
Transformer Architecture Tutorial
less than 1 minute read
Here is the list of good resources to understand transformer architecture. Distilled AI on Transformer Harvard Annotated Transformer ...