I spent the last few weeks building an AI command center from scratch and put the code on GitHub. Twelve JavaScript files, zero dependencies, no build step. You open index.html in a browser and you have 28 models across Groq and OpenRouter, adversarial shadow auditing that fact-checks every fourth response, entity extraction that builds a knowledge graph across sessions, and a pixel-art avatar with a breathing animation and chiptune sound design. The whole thing is at github.com/mikjgens/freeAI.

[Image: freeAI Command Center v2.7 initialization state showing the system panel, fleet selector with Qwen3 32B active, and empty chat awaiting input]

The Command Center in its initialization state. Empty sidebar, fleet selector showing 28 models, and the system awaiting tactical input.

Here's why I built it and what's actually in the code.

KEY TAKEAWAYS
  • freeAI is 12 vanilla JS files. No framework, no build tool, no npm. Open index.html and you're running. github.com/mikjgens/freeAI
  • 28 free-tier models from two providers. 8 Groq models (including Llama 4 Scout, Qwen3 32B, Llama 3.3 70B, GROQ Compound). 20 OpenRouter models (including Kimi K2.6, Nemotron-3 Super, Qwen3 Coder, Gemma 4 31B). Two free API keys and you have the fleet.
  • Adversarial shadow model audits responses. A second model silently checks every fourth answer. Each sentence gets green (agreed), amber (uncertain), or red (disputed) underlines. Hover to see what was flagged.
  • Session intelligence watches for drift. Every 4 messages, a background LLM call scans for contradictions, unresolved questions, and topic drift. It surfaces non-intrusive notice cards with clickable follow-ups.
  • Delta Mode compares 4 models side-by-side. Same question, same system prompt, no conversation history. Staggered dispatch prevents rate-limit collisions. Clean A/B test.

The Problem

AI chat UIs fall into two camps and both of them are broken.

Camp one: SaaS products. ChatGPT, Claude, Perplexity. Monthly subscriptions, usage caps, your data shipped to analytics providers. Good products, but you're renting a UI someone else controls. If they deprecate a model or change the interface, that's your workflow getting disrupted.

Camp two: open-source wrappers. Someone publishes a chat UI on GitHub, usually React or Next.js, calling one API endpoint, requiring npm install and a build step and environment variables. You get one model behind a chat box. No context management. No fallback when the provider goes down.

Neither one gives you what you actually need: a tool. Not a product. Not a demo. Something you pick up, use, modify, and own.

What I Built

freeAI loads 12 JavaScript files from a single HTML page. Here's the architecture:

File | Lines | What it does
app.js | 1,480 | Orchestrator: sendMessage, delta queries, shadow audit, entity extraction, ambient watcher, voice input, event wiring
dom.js | 1,006 | All DOM rendering: chat messages, model list, context meter, streaming output, delta grid, shadow annotations, knowledge graph chips
avatar.js | 333 | 16×16 procedural pixel-art sprite engine with 5-state expression machine, breathing, blink, pupil tracking, error flash
state.js | 228 | Centralized pub/sub store with streaming counter, token cache, session schema v2, knowledge graph persistence
api.js | 187 | OpenAI-compatible SSE streaming, auto failover chain with exponential backoff, Retry-After header parsing
utils.js | 126 | Recursive descent math parser, token estimation, HTML escaping, debounce
tools.js | 95 | Local tool executor: system time, math expression eval, DuckDuckGo web search, UI state inspector
models.js | 60 | 28-model fleet data, provider endpoints, fallback chain configuration, tool definitions, storage keys
rag.js | 49 | TF-IDF document chunking with sentence-aware boundary detection, inverted index, top-3 retrieval
icons.js | 45 | 32 inline SVG icons (Heroicons outline), zero external icon fonts
sound.js | 39 | Web Audio API chiptune oscillator, 12 sound types, pentatonic arpeggio, 40ms throttle

No framework. No build tool. The script tags load in dependency order and everything talks through StateManager, a lightweight pub/sub store in state.js. When a state key like selectedModel, conversationHistory, or isStreaming changes, subscribers fire DOM updates automatically. You don't scatter DomLayer.updateX() calls through business logic. Adding a new UI reaction is one subscribe() call.
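The pattern is small enough to sketch. What follows is a minimal illustration of a keyed pub/sub store, not the actual StateManager from state.js; createStore and the model ID strings are made up for the example.

```javascript
// Hypothetical minimal pub/sub store; the real StateManager in state.js may differ.
function createStore(initialState) {
  const state = { ...initialState };
  const subscribers = new Map(); // state key -> Set of callbacks

  return {
    get(key) {
      return state[key];
    },
    set(key, value) {
      if (state[key] === value) return; // skip no-op writes
      state[key] = value;
      const callbacks = subscribers.get(key);
      if (callbacks) callbacks.forEach((cb) => cb(value));
    },
    subscribe(key, callback) {
      if (!subscribers.has(key)) subscribers.set(key, new Set());
      subscribers.get(key).add(callback);
      // Return an unsubscribe function so listeners can be cleaned up.
      return () => subscribers.get(key).delete(callback);
    },
  };
}

// Usage: a new UI reaction is a single subscribe() call.
const store = createStore({ selectedModel: "qwen3-32b", isStreaming: false });
const seenModels = [];
store.subscribe("selectedModel", (model) => seenModels.push(model));
store.set("selectedModel", "llama-3.3-70b");
```

In the browser, the subscriber would update the DOM instead of pushing to an array; business logic only ever calls set().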

Streaming state uses a counter, not a boolean. StateManager exposes incrementStreaming() and decrementStreaming() so that Delta Mode's parallel model fan-out can't race the Stop button: with a boolean, the first model to finish would mark streaming as done while the others are still writing. That's the kind of thing you catch when you build it yourself instead of wrapping a library.
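To see why the counter matters, here's a hedged sketch. Only the incrementStreaming() and decrementStreaming() names come from the app; createStreamingCounter, isStreaming, and the callback wiring are assumptions made for illustration.

```javascript
// Hypothetical streaming counter; internals are assumed, not copied from state.js.
function createStreamingCounter(onChange) {
  let active = 0; // number of streams currently in flight
  const notify = () => onChange(active > 0);
  return {
    incrementStreaming() {
      active += 1;
      notify();
    },
    decrementStreaming() {
      active = Math.max(0, active - 1); // never drop below zero
      notify();
    },
    isStreaming() {
      return active > 0;
    },
  };
}

// Two parallel streams: the Stop button should stay enabled until both finish.
const stopButtonStates = [];
const streaming = createStreamingCounter((busy) => stopButtonStates.push(busy));
streaming.incrementStreaming(); // model A starts
streaming.incrementStreaming(); // model B starts
streaming.decrementStreaming(); // model A finishes, B is still streaming
const stillBusy = streaming.isStreaming();
streaming.decrementStreaming(); // model B finishes
```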

[Image: freeAI showing active tool calling during a response: web_search and evaluate_math executed inline with Llama 3.3 70B]

Tool calling in action. The system executing web_search and evaluate_math inline during a conversation with Llama 3.3 70B.

The Ambient Intelligence

The chat interface is table stakes. Any competent developer can wire a textarea to an SSE endpoint in an afternoon. What makes freeAI different is what runs around the conversation.

Adversarial Shadow Model

Every fourth response gets silently audited. The system picks the cheapest available model: Llama 3.1 8B Instant on Groq if you've got the key, or whatever fast model you have configured. It sends a structured prompt:

// SHADOW AUDIT PROMPT
Audit this answer. Return ONLY a JSON array:
[{"sentence_index":0,"confidence":"high|medium|low","concern":null|"reason"}]

The shadow model evaluates each sentence and returns a confidence level. The app parses the JSON, splits the response into sentences, and applies CSS classes directly to the rendered text: shadow-high (green underline, agreed), shadow-medium (amber dotted underline, uncertain), shadow-low (red dashed underline, disputed). Hover any annotated sentence to see what the shadow flagged.
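The mapping from audit JSON to annotated sentences can be sketched as a pure function. This is an illustration under assumptions, not the code in app.js: the sentence splitter, the tolerant JSON extraction, and the function names are mine; only the CSS class names and the JSON shape come from the app.

```javascript
// Class names from the article; everything else here is a hypothetical sketch.
const SHADOW_CLASS = { high: "shadow-high", medium: "shadow-medium", low: "shadow-low" };

// Naive splitter: break after ., ! or ? followed by whitespace (assumed behavior).
function splitSentences(text) {
  return text.split(/(?<=[.!?])\s+/).map((s) => s.trim()).filter(Boolean);
}

// Models sometimes wrap JSON in prose, so take the outermost [...] span.
function parseAudit(raw) {
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end <= start) return [];
  try {
    const parsed = JSON.parse(raw.slice(start, end + 1));
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // malformed audit: render the response without annotations
  }
}

function annotate(responseText, rawAudit) {
  const byIndex = new Map(parseAudit(rawAudit).map((e) => [e.sentence_index, e]));
  return splitSentences(responseText).map((sentence, i) => {
    const entry = byIndex.get(i);
    return {
      sentence,
      cssClass: entry ? (SHADOW_CLASS[entry.confidence] ?? null) : null,
      concern: entry?.concern ?? null, // shown on hover in the UI
    };
  });
}

const annotated = annotate(
  "Water boils at 100 C at sea level. It boils at 150 C on Everest.",
  'Here you go: [{"sentence_index":0,"confidence":"high","concern":null},' +
    '{"sentence_index":1,"confidence":"low","concern":"Boiling point drops with altitude"}]'
);
```

A renderer would then wrap each sentence in a span carrying cssClass and put concern in a tooltip.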

This isn't a confidence score in a sidebar. It's embedded in the words you're reading. If the shadow model thinks a claim is weak, you see it the moment your eyes hit that sentence.

Session Intelligence Watcher

Every 4 messages, the same background loop fires a second LLM call. It compiles the last 10 exchanges and sends them with another structured prompt:

// WATCHER PROMPT
Analyze this conversation. Return ONLY a JSON object:
{"contradictions":[],"unresolved_questions":[],"drift_events":[],
 "recommendation":"","should_intervene":false}

If the watcher finds contradictions, unresolved questions, or topic drift, it surfaces a non-intrusive // Notice: card in the chat. Unresolved questions render as clickable links. Click one and it populates the input field for a follow-up. You don't have to remember what you asked four exchanges ago. The system does.
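Here's roughly how the cadence check and the notice card could fit together. Treat it as a sketch: the function names are hypothetical, and I'm assuming one exchange means a user message plus a reply and that a card appears only when at least one finding list is non-empty.

```javascript
const WATCH_EVERY = 4; // messages between watcher runs (from the article)
const WATCH_WINDOW = 10; // exchanges sent to the watcher (from the article)

function shouldRunWatcher(messageCount) {
  return messageCount > 0 && messageCount % WATCH_EVERY === 0;
}

// Assumption: one exchange = a user message plus the assistant reply.
function recentExchanges(history) {
  return history.slice(-WATCH_WINDOW * 2);
}

// Turn the watcher's raw JSON reply into a notice card model, or null for "stay quiet".
function buildNotice(rawReply) {
  let report;
  try {
    report = JSON.parse(rawReply);
  } catch {
    return null; // malformed output: don't interrupt the chat
  }
  if (!report || typeof report !== "object") return null;

  const contradictions = report.contradictions ?? [];
  const followUps = report.unresolved_questions ?? []; // rendered as clickable links
  const driftEvents = report.drift_events ?? [];
  if (contradictions.length + followUps.length + driftEvents.length === 0) return null;

  return {
    title: "// Notice:",
    contradictions,
    followUps,
    driftEvents,
    recommendation: report.recommendation ?? "",
  };
}

const notice = buildNotice(
  '{"contradictions":[],"unresolved_questions":["Which model handles images?"],' +
    '"drift_events":[],"recommendation":"","should_intervene":true}'
);
```

Clicking a follow-up would then copy that string into the input field.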

[Image: Session Intelligence Watcher card surfacing an unresolved question as a clickable follow-up chip beneath an AI response]

The Session Watcher automatically surfacing an unresolved question as a clickable follow-up link below the model's response.

Knowledge Graph

Every response fires entity extraction in two passes. Pass one is local: regex-based extraction of named entities (capitalized multi-word phrases), acronyms (2-8 uppercase characters), and quoted phrases (3-60 characters). A stop-word filter removes common words. Results are upserted into a persistent entity store.
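Pass one is easy to approximate. The sketch below is not the extractor from app.js: the exact regexes, the tiny stop-word list, and the in-memory Map standing in for the entity store are all assumptions chosen to match the description above.

```javascript
// Illustrative stop-word list; the app's real list is presumably longer.
const STOP_WORDS = new Set(["the", "and", "for", "but", "not", "this", "that"]);

// One pattern per entity kind, following the ranges quoted in the article.
const PATTERNS = [
  { kind: "named", regex: /\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b/g, group: 0 },
  { kind: "acronym", regex: /\b[A-Z]{2,8}\b/g, group: 0 },
  { kind: "quoted", regex: /"([^"]{3,60})"/g, group: 1 },
];

// Assumption: a candidate is dropped only if every word in it is a stop word.
function isStopWordOnly(candidate) {
  return candidate.toLowerCase().split(/\s+/).every((w) => STOP_WORDS.has(w));
}

function extractEntities(text) {
  const found = [];
  for (const { kind, regex, group } of PATTERNS) {
    for (const match of text.matchAll(regex)) {
      const name = match[group].trim();
      if (name && !isStopWordOnly(name)) found.push({ name, kind });
    }
  }
  return found;
}

// Upsert: bump the mention count if the entity exists, otherwise insert it.
function upsertEntities(entityStore, entities) {
  for (const { name, kind } of entities) {
    const key = name.toLowerCase();
    const existing = entityStore.get(key);
    if (existing) existing.mentions += 1;
    else entityStore.set(key, { name, kind, mentions: 1 });
  }
  return entityStore;
}

const sampleText =
  'Ada Lovelace described the Analytical Engine. THE idea reached NASA as "the first program".';
const entities = extractEntities(sampleText);
const entityStore = upsertEntities(new Map(), entities);
upsertEntities(entityStore, entities); // a second response mentioning the same entities
```

Here "THE" matches the acronym pattern but is removed by the stop-word filter, which is the job that filter exists to do.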

Pass two fires when a cheap model is available: sub-LLM enrichment that classifies each entity as concept, person, decision, or question. The entities render as a chip rail above the composer, color-coded by type. Click any chip to see its relationships. The graph persists in localStorage under war_chest_graph and accumulates across every session.

Delta Mode

Toggle Delta Mode and your next query fans out to four models in parallel. The system picks candidates by tags from the fleet: the fastest model, the deepest reasoning model, the most creative model, plus your currently selected model. The requests don't all leave at the same instant, though. Dispatch is staggered: Groq models fire at 2-second intervals and OpenRouter models at 1.5-second intervals to avoid rate-limit collisions across providers.

All four receive the same system prompt and the same question with no conversation history. Responses render side-by-side in a CSS grid. Same input, different brains. You can see immediately which model is stronger on a given question.
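The staggering logic can be sketched like this. I'm assuming the stagger is counted per provider and that the first model on each provider fires immediately; planDispatch, runDelta, and the injected send and sleep functions are hypothetical names, not the app's API.

```javascript
// Intervals from the article; per-provider counting is an assumption.
const STAGGER_MS = { groq: 2000, openrouter: 1500 };

// Assign each model a start delay based on how many earlier models share its provider.
function planDispatch(models) {
  const seenPerProvider = {};
  return models.map((model) => {
    const n = seenPerProvider[model.provider] ?? 0;
    seenPerProvider[model.provider] = n + 1;
    return { model, delayMs: n * (STAGGER_MS[model.provider] ?? 0) };
  });
}

// send(model, messages) performs the request; sleep(ms) waits. Both are injected.
async function runDelta(models, systemPrompt, question, send, sleep) {
  const messages = [
    { role: "system", content: systemPrompt },
    { role: "user", content: question }, // deliberately no conversation history
  ];
  const jobs = planDispatch(models).map(async ({ model, delayMs }) => {
    await sleep(delayMs);
    return { id: model.id, reply: await send(model, messages) };
  });
  // allSettled: one failing model should not blank the other three columns.
  return Promise.allSettled(jobs);
}

const deltaModels = [
  { id: "a", provider: "groq" },
  { id: "b", provider: "groq" },
  { id: "c", provider: "openrouter" },
  { id: "d", provider: "groq" },
];
const deltaPlan = planDispatch(deltaModels);
```

In the browser, sleep would be a setTimeout wrapped in a Promise and send would be the streaming fetch call.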

[Image: Delta Mode comparing four models side-by-side: Llama 3.3 70B, Qwen3 32B, GPT-OSS-120B, and Llama 4 Scout responding to the same prompt]

Delta Mode comparing four models on the same question. Staggered dispatch prevents rate-limit collisions.

All of this runs in the browser. The background LLM calls for shadow audits, entity enrichment, and session watching are fired as fetch requests from your machine directly to the provider APIs. There's no server in the middle.

The Fleet

Twenty-eight models. Here are eight highlights from the fleet definition at models.js:3-31:

Provider | Model | Context | Tools | Notable
Groq | Llama 4 Scout | 131K | Function Calling | Vision, 20MB images
Groq | GROQ Compound | 131K | Built-in | Web search, code execution, browser automation
Groq | Llama 3.3 70B | 131K | Function Calling | 280 tokens/sec, smartest in fleet
Groq | Llama 3.1 8B | 131K | Function Calling | 500K TPD, used for background shadow/watcher calls
OpenRouter | Kimi K2.6 | 262K | Function Calling | Agent swarm: 100+ agents
OpenRouter | Nemotron-3 Super | 1M | Function Calling | Deep logic, 120B parameters
OpenRouter | Qwen3 Coder | 1M | Function Calling | Best coding model in fleet
OpenRouter | Gemma 4 31B | 262K | Function Calling | Frontier reasoning from Google

The fleet panel in the right sidebar shows all 28. Filter by provider, by capability (vision, tools, speed), or by free-text search. Click a model to see its full profile: provider, model ID, context window, tool support, tags, and its known weakness. Every model in the fleet has a documented weakness field.

The Guardrails

Running on free-tier APIs means you have to be smart about limits. The system bakes in specific protections:

TPM-aware trimming. Groq free-tier models have token-per-minute budgets (4,000 TPM for Qwen3 32B, 8,000 TPM for Llama 3.3 70B and Llama 3.1 8B). Before every send, the system estimates current token usage against the model's TPM budget and trims conversation history to fit. The trim function walks backward from the most recent message, keeping only what fits.
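A backward-walking trim looks something like this. It's a sketch, not the function in the repo: the 4-characters-per-token heuristic, the reservedTokens parameter, and the function names are assumptions.

```javascript
// Rough heuristic: about 4 characters per token (an assumption, not the app's estimator).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Walk backward from the newest message and keep whatever fits in the budget.
// reservedTokens leaves room for the system prompt and the new user message.
function trimToBudget(history, tpmBudget, reservedTokens = 0) {
  let remaining = tpmBudget - reservedTokens;
  const kept = [];
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (cost > remaining) break; // older messages are dropped from here on
    kept.unshift(history[i]); // preserve chronological order
    remaining -= cost;
  }
  return kept;
}

// Five messages of ~10 tokens each against a 25-token budget keeps the last two.
const longHistory = [1, 2, 3, 4, 5].map((n) => ({
  role: n % 2 ? "user" : "assistant",
  content: String(n).repeat(40),
}));
const trimmed = trimToBudget(longHistory, 25);
```

Stopping at the first message that doesn't fit keeps the surviving history contiguous, so the model never sees a reply without the question that prompted it being more recent than a gap.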

Rate limit retry. When a provider returns 429 or 413, the system doesn't just fail. It parses the Retry-After header from the response metadata, waits that duration, and retries up to 3 times with exponential backoff. If all retries fail on Groq, it walks the fallback chain to OpenRouter.
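A hedged sketch of that retry-then-fallback flow follows. The function names, the 1-second backoff base, and the rule that Retry-After wins over backoff when present are my assumptions; it also handles only the seconds form of Retry-After, not the HTTP-date form. doFetch and sleep are injected so the logic can run outside a browser.

```javascript
const MAX_RETRIES = 3; // from the article
const RETRYABLE = new Set([429, 413]); // status codes the article names

// Prefer the server's Retry-After (seconds form); otherwise back off 1s, 2s, 4s.
function retryDelayMs(response, attempt, baseMs = 1000) {
  const header = response.headers.get("retry-after");
  const seconds = Number(header);
  if (header !== null && Number.isFinite(seconds) && seconds >= 0) {
    return seconds * 1000;
  }
  return baseMs * 2 ** attempt;
}

async function fetchWithRetry(doFetch, request, sleep) {
  let response = await doFetch(request);
  for (let attempt = 0; attempt < MAX_RETRIES && RETRYABLE.has(response.status); attempt++) {
    await sleep(retryDelayMs(response, attempt));
    response = await doFetch(request);
  }
  return response;
}

// chain is an ordered list of request descriptors, e.g. Groq first, then OpenRouter.
async function sendWithFallback(chain, doFetch, sleep) {
  let last;
  for (const request of chain) {
    last = await fetchWithRetry(doFetch, request, sleep);
    if (!RETRYABLE.has(last.status)) return last; // success or a non-retryable error
  }
  return last; // every provider in the chain stayed rate-limited
}
```

In the app, doFetch would wrap fetch() against the provider endpoint and sleep would wrap setTimeout.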

Image data scrubbing. After sending a message with an image attachment, the base64 data URL in the conversation history is replaced with an [image] placeholder. Without this, a single large image could push localStorage past its 5-10MB quota.
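The scrub itself is a small transformation. This sketch assumes messages use the OpenAI-compatible content-parts shape, with images as image_url parts holding a data: URL; scrubImages is a hypothetical name, and the app may store the placeholder differently.

```javascript
// Replace inline base64 images with a text placeholder before persisting history.
// Assumes OpenAI-style content parts: { type: "image_url", image_url: { url } }.
function scrubImages(history) {
  return history.map((message) => {
    if (!Array.isArray(message.content)) return message; // plain-text message
    const content = message.content.map((part) =>
      part.type === "image_url" && part.image_url?.url?.startsWith("data:")
        ? { type: "text", text: "[image]" }
        : part
    );
    return { ...message, content }; // copy instead of mutating the original
  });
}

const withImage = [
  {
    role: "user",
    content: [
      { type: "text", text: "What is in this picture?" },
      { type: "image_url", image_url: { url: "data:image/png;base64," + "A".repeat(5000) } },
    ],
  },
  { role: "assistant", content: "A very uniform image." },
];
const scrubbed = scrubImages(withImage);
```

The scrubbed copy is what would go to localStorage; the request that carried the real image has already been sent by then.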

Sub-call cancellation. When you send a new message, any in-flight background calls (entity extraction, shadow audit, session watcher) are aborted via AbortController. The system doesn't let stale background work compete with your next query.
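One way to wire that up is a small registry of controllers. AbortController and its signal are standard web APIs; the registry, startBackgroundCall, and cancelBackgroundCalls are hypothetical names for this sketch.

```javascript
// Controllers for background calls that are still running.
const inFlight = new Set();

// task receives an AbortSignal and returns a Promise,
// e.g. (signal) => fetch(url, { method: "POST", body, signal })
function startBackgroundCall(task) {
  const controller = new AbortController();
  inFlight.add(controller);
  return task(controller.signal).finally(() => inFlight.delete(controller));
}

// Called at the top of sendMessage: abort everything still running.
function cancelBackgroundCalls() {
  for (const controller of inFlight) controller.abort();
  inFlight.clear();
}

// Stand-in for a slow background request that rejects once its signal aborts.
function fakeSlowCall(signal) {
  return new Promise((resolve, reject) => {
    signal.addEventListener("abort", () => reject(new Error("aborted")));
  });
}

const shadowAudit = startBackgroundCall(fakeSlowCall);
const sessionWatcher = startBackgroundCall(fakeSlowCall);
const pendingBeforeCancel = inFlight.size;
cancelBackgroundCalls(); // the user sent a new message
```

With a real fetch, the aborted promise rejects with an AbortError, which the caller can catch and ignore.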

[Image: CONFIG.YAML Secure Vault modal showing masked API key fields for Groq and OpenRouter with commit and .env upload buttons]

The CONFIG.YAML secure vault. API keys stored in browser localStorage with zero server-side persistence.

Why I Put It On GitHub

I didn't build freeAI to compete with ChatGPT. I built it because the thing I wanted didn't exist. Every AI chat UI I tried was either a subscription product or a thin wrapper with a dropdown menu. Neither gives you ambient intelligence. Neither lets you compare four models on the same question. Neither builds a persistent knowledge graph of everything you discuss. Neither fact-checks itself in the background.

I put the code on GitHub because that's where code goes when you want people to be able to read it. Twelve files. Vanilla JS. Every line of code that touches your data is right there in the repo. API keys stay in your browser's localStorage. The only network requests leaving your machine are the ones you authorize to Groq and OpenRouter. There's no telemetry. No analytics. No backend.

You can fork the repo and strip out everything except the chat interface and one model. You can add your own models, your own tools, your own system prompt. You can break it and fix it. It's a tool, not a service.

The repo is at github.com/mikjgens/freeAI. If you build something cool with it, I want to see it. If you find something broken, open an issue. If none of that happens and I'm the only person who ever uses it, that's fine too. I finally have the AI command center I actually wanted.