Web LLM Inference Workbench

·

Making the black box of language models transparent through interactive, browser-based visualization of every step in the inference process.

Overview

Language models are opaque by design—users input text and receive output without understanding the complex transformations happening inside. This workbench changes that by exposing the entire inference process as an interactive, visual experience running entirely in your browser.

Technical Implementation

  • WebLLM.js Integration — Runs open-source model weights directly in browser memory
  • Step-by-Step Visualization — Every stage of inference is clickable, explorable, and modifiable
  • WebGL/WebGPU Rendering — Hardware-accelerated visualization of high-dimensional transformations
  • Local Processing — All computation happens on your machine—no server required

What You Can Explore

  1. Weight Matrices — See the actual parameters that encode the model's knowledge
  2. Activation Patterns — Watch how inputs activate different parts of the network
  3. Latent Space Navigation — Explore the high-dimensional spaces where meaning lives
  4. Attention Mechanisms — Understand how the model decides what to focus on
  5. Token Transformations — Follow how text becomes vectors and back again

Mathematical Beauty

The inference process reveals itself as deeply geometric—transformations between manifolds, projections through latent spaces, and the elegant mathematical machinery that turns symbols into understanding. This tool makes these abstract concepts tangible and interactive.

Why This Matters

  • Demystification — Breaks down the "magic" of AI into understandable components
  • Education — Learn how language models actually work, not just how to use them
  • Research — Explore model behavior at a granular level
  • Transparency — Part of a larger mission to make AI systems interpretable

Technology Stack

Built on mature, open technologies:

  • Open-source model weights in standardized formats
  • WebGL/WebGPU for performant visualization
  • Well-documented architectures from published papers
  • Browser-native computation—no installation required

The goal is simple: take a small model that fits in laptop memory and let you play with it, seeing exactly how it transforms your input into output at every single step.