Web LLM Inference Workbench
Making the black box of language models transparent through interactive, browser-based visualization of every step in the inference process.
Overview
Language models are opaque in practice: users enter text and receive output without seeing the transformations that happen inside. This workbench changes that by exposing the entire inference process as an interactive visual experience that runs entirely in your browser.
Technical Implementation
- WebLLM.js Integration — Loads open-source model weights into browser memory and runs inference there
- Step-by-Step Visualization — Every stage of inference is clickable, explorable, and modifiable
- WebGL/WebGPU Rendering — Hardware-accelerated visualization of high-dimensional transformations
- Local Processing — All computation happens on your machine—no server required
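Making every stage clickable implies that intermediate values are kept instead of being discarded once the next layer has consumed them. The TypeScript sketch below illustrates that idea on a hypothetical two-layer toy network with hand-picked weights. `forwardWithTrace`, `TraceStep`, and the layer names are illustrative assumptions, not WebLLM.js APIs, and a real transformer would record many more stages.

```typescript
// A named snapshot of one intermediate value, so a UI can render it when clicked.
interface TraceStep {
  name: string;
  values: number[];
}

type Matrix = number[][]; // rows x cols

// y = W x, where W has shape [out][in] and x has length in.
function matVec(w: Matrix, x: number[]): number[] {
  return w.map((row) => row.reduce((acc, wij, j) => acc + wij * x[j], 0));
}

function relu(x: number[]): number[] {
  return x.map((v) => Math.max(0, v));
}

// Hypothetical toy forward pass: embedding lookup -> linear -> ReLU -> linear.
// Every intermediate vector is pushed onto `trace` instead of being thrown away.
function forwardWithTrace(
  tokenId: number,
  embedding: Matrix,
  w1: Matrix,
  w2: Matrix,
): { output: number[]; trace: TraceStep[] } {
  const trace: TraceStep[] = [];
  const embedded = embedding[tokenId];
  if (!embedded) throw new Error(`token id out of range: ${tokenId}`);
  trace.push({ name: "embedding", values: embedded });
  const hidden = matVec(w1, embedded);
  trace.push({ name: "linear1", values: hidden });
  const activated = relu(hidden);
  trace.push({ name: "relu", values: activated });
  const output = matVec(w2, activated);
  trace.push({ name: "linear2", values: output });
  return { output, trace };
}

// Usage with tiny hand-picked weights (vocabulary of 2, hidden size 2).
const embedding: Matrix = [
  [1, 0],
  [0, 1],
];
const w1: Matrix = [
  [1, -1],
  [2, 1],
];
const w2: Matrix = [[1, 1]];
const { output, trace } = forwardWithTrace(1, embedding, w1, w2);
for (const step of trace) {
  console.log(step.name, step.values);
}
```

Each entry in `trace` could back one clickable node in the UI. Keeping every step costs memory, so a real tool might record only the layers a user has expanded.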
What You Can Explore
- Weight Matrices — See the actual parameters that encode the model's knowledge
- Activation Patterns — Watch how inputs activate different parts of the network
- Latent Space Navigation — Explore the high-dimensional hidden-state spaces in which the model represents tokens and their context
- Attention Mechanisms — Understand how the model decides what to focus on
- Token Transformations — Follow how text becomes vectors and back again
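Attention is the easiest of these to show concretely. Below is a minimal single-head sketch of the standard scaled dot-product attention with a causal mask, softmax(QKᵀ/√d_k)V; the returned `weights` matrix is the kind of token-to-token grid an attention view would display. Multi-head projections, batching, and the exact masking any particular model uses are left out, and the function and variable names are illustrative.

```typescript
type Matrix = number[][];

// Numerically stable softmax: subtract the max before exponentiating.
function softmax(xs: number[]): number[] {
  const m = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((acc, ai, i) => acc + ai * b[i], 0);
}

// Single-head scaled dot-product attention with a causal mask.
// q, k, v: one row per token position; q and k rows share width dK,
// v rows may have a different width.
// Returns the attention weights (what a heatmap would show) and the mixed outputs.
function causalAttention(q: Matrix, k: Matrix, v: Matrix): { weights: Matrix; out: Matrix } {
  const dK = q[0].length;
  const scale = 1 / Math.sqrt(dK);
  const weights = q.map((qi, i) => {
    // Position i may only attend to positions 0..i; later positions get -Infinity,
    // which softmax turns into a weight of exactly 0.
    const scores = k.map((kj, j) => (j <= i ? dot(qi, kj) * scale : -Infinity));
    return softmax(scores);
  });
  // Each output row is the attention-weighted sum of the value rows.
  const out = weights.map((row) =>
    v[0].map((_, c) => row.reduce((acc, w, j) => acc + w * v[j][c], 0)),
  );
  return { weights, out };
}

// Three toy token positions with 2-dimensional queries, keys and values.
const q: Matrix = [[1, 0], [0, 1], [1, 1]];
const k: Matrix = [[1, 0], [0, 1], [1, 1]];
const v: Matrix = [[1, 0], [0, 1], [0.5, 0.5]];
const { weights, out } = causalAttention(q, k, v);
console.log(weights, out);
```

Each row of `weights` sums to 1 and is zero to the right of the diagonal, because a position cannot attend to tokens that come after it.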
Mathematical Beauty
The inference process reveals itself as deeply geometric—transformations between manifolds, projections through latent spaces, and the elegant mathematical machinery that turns symbols into understanding. This tool makes these abstract concepts tangible and interactive.
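One place where the geometry is literal is the model's input and output boundary: a token becomes a vector through a row lookup, and a hidden vector becomes a next-token prediction by projecting it onto every vocabulary row. The sketch below uses a made-up three-word vocabulary and assumes tied input and output embeddings, which some models use and others do not; many models also apply a final normalization layer before this projection.

```typescript
type Matrix = number[][];

// Tiny hypothetical vocabulary and embedding table (one row per token).
const vocab = ["cat", "dog", "sat"];
const embeddingTable: Matrix = [
  [1, 0, 0],
  [0, 1, 0],
  [0.7, 0.7, 0],
];

// Symbol -> vector: an embedding is just a row lookup.
function embed(token: string): number[] {
  const id = vocab.indexOf(token);
  if (id < 0) throw new Error(`unknown token: ${token}`);
  return embeddingTable[id];
}

// Vector -> scores: each logit is the dot product (projection) of the hidden
// state onto one token's row. Reusing embeddingTable here is the "tied
// embeddings" assumption.
function logits(hidden: number[]): number[] {
  return embeddingTable.map((row) => row.reduce((acc, w, i) => acc + w * hidden[i], 0));
}

// Numerically stable softmax turns logits into a probability distribution.
function softmax(xs: number[]): number[] {
  const m = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Greedy choice: the token whose row the hidden state projects onto most strongly.
function nearestToken(hidden: number[]): string {
  const ls = logits(hidden);
  return vocab[ls.indexOf(Math.max(...ls))];
}

const h = [0.2, 0.9, 0.1]; // a made-up hidden state
const probs = softmax(logits(h));
console.log(nearestToken(h), probs);
```

Here the made-up hidden state projects most strongly onto the row for "dog", and embedding a token and immediately projecting it back recovers the same token.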
Why This Matters
- Demystification — Breaks down the "magic" of AI into understandable components
- Education — Learn how language models actually work, not just how to use them
- Research — Explore model behavior at a granular level
- Transparency — Part of a larger mission to make AI systems interpretable
Technology Stack
Built on mature, open technologies:
- Open-source model weights in standardized formats
- WebGL/WebGPU for performant visualization
- Well-documented architectures from published papers
- Browser-native computation—no installation required
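Rendering a weight matrix on the GPU usually starts with turning it into pixels. The sketch below shows one possible approach, assuming weights are displayed as a heatmap texture: each entry maps to an RGBA pixel on a diverging blue-white-red scale normalized by the largest absolute weight. The function name and the color map are illustrative choices, not the workbench's actual renderer.

```typescript
// Map a weight matrix to an RGBA pixel buffer (row-major, 4 bytes per pixel).
// Negative weights shade toward blue, positive toward red, zero is white.
// Intensity is scaled by the largest absolute value in the matrix.
function matrixToRgba(m: number[][]): { width: number; height: number; data: Uint8Array } {
  const height = m.length;
  const width = m[0].length;
  let maxAbs = 0;
  for (const row of m) for (const x of row) maxAbs = Math.max(maxAbs, Math.abs(x));
  const scale = maxAbs > 0 ? 1 / maxAbs : 0; // an all-zero matrix renders white
  const data = new Uint8Array(width * height * 4);
  for (let r = 0; r < height; r++) {
    for (let c = 0; c < width; c++) {
      const t = m[r][c] * scale; // normalized to [-1, 1]
      const fade = Math.round(255 * (1 - Math.abs(t))); // 255 at zero, 0 at the extremes
      const i = (r * width + c) * 4;
      data[i] = t < 0 ? fade : 255; // red channel
      data[i + 1] = fade; // green channel
      data[i + 2] = t > 0 ? fade : 255; // blue channel
      data[i + 3] = 255; // fully opaque
    }
  }
  return { width, height, data };
}

const img = matrixToRgba([
  [-2, 0],
  [1, 2],
]);
console.log(img.width, img.height, Array.from(img.data));
```

The `data` buffer has the layout WebGL expects for an `RGBA`/`UNSIGNED_BYTE` texture, so it could be passed to `gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0, gl.RGBA, gl.UNSIGNED_BYTE, data)`. For large matrices, a shader that reads raw float values directly would avoid this CPU-side conversion.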
The goal is simple: take a small model that fits in laptop memory and let you play with it, seeing exactly how it transforms your input into output at every single step.
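Whether a model "fits in laptop memory" can be estimated with simple arithmetic: weight storage is roughly the parameter count times bits per weight divided by 8, and the key/value cache grows linearly with context length. The sketch below uses a hypothetical 1-billion-parameter configuration (not a specific released model) and assumes a 16-bit KV cache. It ignores activations, quantization metadata, and runtime overhead, so its numbers are lower bounds.

```typescript
// Rough memory estimate for running a model locally: weights plus KV cache.
// Activations, quantization scales and runtime overhead are ignored.
interface ModelShape {
  params: number; // total parameter count
  weightBits: number; // bits per weight after quantization (e.g. 16, 8, 4)
  layers: number;
  kvHeads: number; // key/value heads per layer
  headDim: number;
}

function weightBytes(m: ModelShape): number {
  return (m.params * m.weightBits) / 8;
}

// Keys and values (the factor 2) are cached for every layer, KV head and position.
function kvCacheBytes(m: ModelShape, seqLen: number, bytesPerValue = 2): number {
  return 2 * m.layers * m.kvHeads * m.headDim * seqLen * bytesPerValue;
}

const GB = 1e9;
// Hypothetical configuration, chosen only to make the arithmetic concrete.
const shape: ModelShape = { params: 1e9, weightBits: 4, layers: 16, kvHeads: 8, headDim: 64 };
console.log((weightBytes(shape) / GB).toFixed(2), "GB of weights");
console.log((kvCacheBytes(shape, 4096) / GB).toFixed(3), "GB of KV cache at 4096 tokens");
```

With 4-bit weights this works out to 0.5 GB of weights and about 0.13 GB of KV cache for a 4096-token context; the same model stored at 16 bits would need 2 GB for the weights alone.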