When you type a sentence into pneuma and press enter, something unusual happens. There is no app store lookup. No package manager. No installation. Instead, a program is written, compiled, and executed — all in a few seconds. The software you asked for materializes on screen, running natively, from nothing.
This post explains how that works.
The pipeline
Every interaction in pneuma follows the same path:
- Intent classification — figure out what the user wants
- Code generation — an LLM writes a Rust program
- Compilation — rustc compiles it to WebAssembly
- Execution — Wasmtime runs it in a sandboxed thread
- Rendering — draw commands are composited to the GPU
Each step is designed to be fast and recoverable. If any step fails, the system knows how to retry intelligently. Let's walk through each one.
Step 1: Intent classification
Before generating any code, pneuma needs to understand what you're asking for. The input "make a calculator" is a creation request. "add a history panel" is an edit to an existing agent. "close the clock" is a deletion.
Classification starts locally. A fast pattern matcher (what we call T0) handles common commands — quitting, listing agents, opening help — without touching the network. Everything else goes to Claude for classification, which returns one of: create, edit, delete, close, or navigate.
This two-tier approach keeps simple interactions instant and reserves AI inference for genuine ambiguity.
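The two-tier flow can be sketched as a local matcher that falls through to the model. The specific T0 patterns and the `Intent` variants below are illustrative stand-ins, not pneuma's actual tables, and the LLM call is stubbed:

```rust
#[derive(Debug, PartialEq)]
enum Intent {
    Create,
    Edit,
    Delete,
    Close,
    Navigate,
    Local(&'static str), // handled entirely by T0, no network round-trip
    Unknown,             // falls through to the LLM classifier
}

/// Tier 0: fast local pattern matching for common commands.
fn classify_t0(input: &str) -> Intent {
    match input.trim().to_lowercase().as_str() {
        "quit" | "exit" => Intent::Local("quit"),
        "list" | "ls" => Intent::Local("list_agents"),
        "help" | "?" => Intent::Local("help"),
        _ => Intent::Unknown,
    }
}

/// Anything T0 can't handle goes to the model (stubbed here).
fn classify(input: &str) -> Intent {
    match classify_t0(input) {
        Intent::Unknown => classify_with_llm(input),
        local => local,
    }
}

fn classify_with_llm(_input: &str) -> Intent {
    // Placeholder: a real implementation would call Claude and parse
    // one of create / edit / delete / close / navigate.
    Intent::Create
}

fn main() {
    assert_eq!(classify("quit"), Intent::Local("quit"));
    assert_eq!(classify("make a calculator"), Intent::Create);
    println!("ok");
}
```

The point of the split is latency: the `Local` arm returns before any network I/O happens.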
Step 2: Code generation
This is where the real work happens. The user's intent, along with the current screen dimensions, is sent to Claude Sonnet. The system prompt is a 600+ line specification that defines everything a generated program needs to know:
- The complete syscall ABI — drawing, input, networking, storage, IPC, time, audio
- Memory and execution constraints (#![no_std], #![no_main], no allocator)
- Patterns for common tasks — text rendering, number formatting, state management
- A catalog of mistakes LLMs tend to make, with corrections
The prompt asks Claude to output only a Rust source file. No markdown fences, no explanation, no commentary. Just code that conforms to the ABI.
The generated code is a complete, self-contained Rust program. It declares extern "C" imports for the host functions it needs, defines a pneuma_main() entry point, and manages its own state with static mut globals. There's no standard library, no heap allocation, no runtime beyond what the host provides.
A typical agent — say, a clock — is around 40-80 lines of Rust. A complex dashboard with HTTP requests and persistent storage might be 200-400 lines. Claude generates this in 3-8 seconds.
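The overall shape of a generated agent looks roughly like the sketch below. On the real wasm32 target the file would start with #![no_std]/#![no_main] and the pneuma_* functions would be unresolved extern "C" imports supplied by the host; here they are stubbed (and the frame loop bounded) so the sketch compiles and runs natively. The syscall signatures are guesses for illustration:

```rust
// Host syscalls: in a real agent these are `extern "C"` imports; stubbed here.
fn pneuma_clear(_rgba: u32) {}
fn pneuma_text(_x: i32, _y: i32, _s: &str) {}
fn pneuma_present() {}
fn pneuma_sleep(ms: u32) {
    std::thread::sleep(std::time::Duration::from_millis(ms as u64));
}

// Agent state: real agents use `static mut` globals, since there is no heap.
static mut TICKS: u64 = 0;

// The entry point; a real agent would mark this `#[no_mangle]` and loop forever.
pub extern "C" fn pneuma_main() {
    unsafe { TICKS = 0 };
    for _ in 0..3 {
        unsafe { TICKS += 1 };
        pneuma_clear(0x1010_14FF);
        pneuma_text(8, 8, "hello from a generated agent");
        pneuma_present(); // mark the frame complete
        pneuma_sleep(16); // yield until the next frame (~60 FPS)
    }
}

fn main() {
    pneuma_main();
    println!("ran {} frames", unsafe { TICKS });
}
```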
Step 3: Compilation
The generated source goes through a preparation pipeline before reaching rustc:
- Common mistake fixing — automated patches for patterns LLMs get wrong (method-style .sqrt() calls become free functions, integer/float modulo mismatches get corrected)
- Math intrinsic injection — sin, cos, sqrt, abs, and other math functions are injected as WASM-native imports, since no_std Rust on wasm32 doesn't provide them
Then rustc compiles to wasm32-unknown-unknown:
rustc --target wasm32-unknown-unknown \
--edition 2021 -O \
-C debuginfo=1 \
-C link-args=--export=__stack_pointer \
--crate-type cdylib \
agent.rs -o agent.wasm
The flags matter. -O for optimization. debuginfo=1 so WASM backtraces include function names. The __stack_pointer export is critical — it lets the runtime save and restore the stack across fuel-based execution pauses. Dead code and unused import warnings are suppressed, since generated code may include defensive declarations.
Compilation typically takes under a second. The output is a 50-200KB WASM module.
Step 4: Execution
Each agent runs on its own OS thread inside a Wasmtime sandbox. The sandbox enforces hard limits:
- 64MB memory per agent
- Fuel-based scheduling — each frame gets ~1 billion fuel units (roughly 16ms of compute at 60 FPS). When fuel runs out, the agent yields and the host collects its draw commands
- Capability-based permissions — agents declare what they need (network, storage, audio, IPC) and the user approves or denies at launch
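The capability check can be sketched as a bitset the host consults before dispatching any capability-bearing syscall. The flag values and the `check` helper are illustrative, not pneuma's actual representation:

```rust
#[derive(Clone, Copy, PartialEq)]
struct Caps(u8);

impl Caps {
    const NETWORK: Caps = Caps(1 << 0);
    const STORAGE: Caps = Caps(1 << 1);
    const AUDIO: Caps = Caps(1 << 2);
    const IPC: Caps = Caps(1 << 3);

    fn contains(self, other: Caps) -> bool {
        (self.0 & other.0) == other.0
    }
}

/// Host-side gate: run before dispatching a capability-bearing syscall.
fn check(granted: Caps, needed: Caps) -> Result<(), &'static str> {
    if granted.contains(needed) { Ok(()) } else { Err("capability denied") }
}

fn main() {
    // Say the user approved network + IPC at launch, but not storage.
    let granted = Caps(Caps::NETWORK.0 | Caps::IPC.0);
    assert!(check(granted, Caps::NETWORK).is_ok());
    assert!(check(granted, Caps::STORAGE).is_err());
    println!("capability checks enforced");
}
```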
The execution model is cooperative but enforced. An agent's pneuma_main() runs in an infinite loop, calling pneuma_present() to mark each frame and pneuma_sleep() to yield time. If an agent tries to spin without yielding, the fuel limit forces a pause anyway.
A subtle detail: when fuel runs out mid-function, the WASM stack is in an undefined state — the function epilogue never ran. So the runtime saves the __stack_pointer global before each frame call and restores it after fuel exhaustion. Without this, agents would leak stack memory on every frame.
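The bookkeeping can be simulated in a few lines. A real host would read and write the module's exported __stack_pointer global through the runtime's API; here a plain integer stands in for it, and fuel exhaustion is modeled as an early return before the epilogue:

```rust
struct Agent {
    stack_pointer: u64, // stands in for the exported __stack_pointer global
}

enum FrameResult {
    Completed,
    FuelExhausted, // paused mid-function; the epilogue never ran
}

fn run_frame(agent: &mut Agent, fuel_left: u64) -> FrameResult {
    agent.stack_pointer -= 4096; // the frame's prologue pushes a stack frame
    if fuel_left == 0 {
        // Trap: we never reach the epilogue, so the stack pointer is
        // left pointing into the dead frame.
        return FrameResult::FuelExhausted;
    }
    agent.stack_pointer += 4096; // normal epilogue pops the frame
    FrameResult::Completed
}

fn host_tick(agent: &mut Agent, fuel: u64) {
    let saved_sp = agent.stack_pointer; // save before calling into WASM
    if let FrameResult::FuelExhausted = run_frame(agent, fuel) {
        agent.stack_pointer = saved_sp; // restore, undoing the leaked frame
    }
}

fn main() {
    let mut agent = Agent { stack_pointer: 1 << 20 };
    host_tick(&mut agent, 0); // fuel runs out: sp restored, no leak
    assert_eq!(agent.stack_pointer, 1 << 20);
    host_tick(&mut agent, 1_000_000_000); // normal frame: sp balanced
    assert_eq!(agent.stack_pointer, 1 << 20);
    println!("stack pointer balanced");
}
```

Without the `saved_sp` restore, the first call would leave the pointer 4096 bytes lower forever, which is exactly the per-frame leak described above.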
Step 5: Rendering
Agents don't render directly to the screen. They issue draw commands — clear, rect, text, circle, line — which accumulate in a shared buffer. The main thread's compositor collects these buffers every frame and composites them into a single scene using wgpu (Vulkan/Metal/DX12).
Each agent gets a screen region. The compositor handles layout — tiling agents in a grid, letting the user resize and rearrange them. System agents (the status bar, the dock, the prompt overlay) render on top of everything else.
Text is rasterized with glyphon and cosmic-text. Rounded rectangles use SDF shaders. The visual language is minimal: dark backgrounds, muted text, electric blue accents.
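The retained draw-command model above can be sketched as an enum of commands plus a per-frame compositor pass that drains every agent's buffer into one scene. The command fields are illustrative, not pneuma's actual wire format:

```rust
#[derive(Debug, Clone)]
enum DrawCmd {
    Clear { rgba: u32 },
    Rect { x: f32, y: f32, w: f32, h: f32, rgba: u32 },
    Text { x: f32, y: f32, s: String },
}

#[derive(Default)]
struct AgentBuffer {
    cmds: Vec<DrawCmd>,
}

/// Per-frame compositor pass: drain each agent's buffer into one scene,
/// in agent order (system agents last, so they draw on top).
fn composite(agents: &mut [AgentBuffer]) -> Vec<DrawCmd> {
    let mut scene = Vec::new();
    for buf in agents.iter_mut() {
        scene.append(&mut buf.cmds); // leaves the agent's buffer empty
    }
    scene
}

fn main() {
    let mut clock = AgentBuffer::default();
    clock.cmds.push(DrawCmd::Clear { rgba: 0x1010_14FF });
    clock.cmds.push(DrawCmd::Text { x: 8.0, y: 8.0, s: "12:00".into() });
    let mut agents = [clock];
    let scene = composite(&mut agents);
    assert_eq!(scene.len(), 2);
    assert!(agents[0].cmds.is_empty());
    println!("composited {} commands", scene.len());
}
```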
Error correction
LLMs don't write perfect code every time. pneuma is designed around this reality.
When rustc fails to compile generated code, the error message and source are sent back to Claude with an error correction prompt. The system gets up to 3 attempts:
- Attempts 1-2: Claude Sonnet rewrites the code with the compiler error as context
- Attempt 3: the system escalates to Claude Opus — a more capable model that's better at reasoning through complex type errors or logic issues
Runtime errors follow the same pattern. If a WASM agent traps (panics, out-of-bounds access, integer overflow), the crash is caught, and the source is sent back through the correction pipeline with the runtime error message. User agents get 2 crash recovery attempts. System agents (networking, filesystem, crypto, audio) restart automatically with no limit — they're too important to stay down.
This layered retry strategy means most generation failures are invisible to the user. The prompt goes in, and a working program comes out, even if it took a correction pass or two behind the scenes.
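The escalation policy reduces to a small loop: two attempts with the default model, then one with the stronger one, feeding the last compiler error back in each time. The model enum and the compile step below are stand-ins (here anything from "Opus" compiles, so the third attempt succeeds):

```rust
#[derive(Debug, PartialEq)]
enum Model { Sonnet, Opus }

fn generate(model: &Model, _prompt: &str, _last_error: Option<&str>) -> String {
    // Stand-in for an LLM call; returns "source code".
    format!("{:?} output", model)
}

fn compile(source: &str) -> Result<Vec<u8>, String> {
    // Stand-in for rustc; in this sketch only the Opus output "compiles".
    if source.starts_with("Opus") {
        Ok(source.into())
    } else {
        Err("E0308: mismatched types".into())
    }
}

fn generate_with_retries(prompt: &str) -> Result<Vec<u8>, String> {
    let mut last_error = None;
    for attempt in 1..=3 {
        // Attempts 1-2 use Sonnet; attempt 3 escalates to Opus.
        let model = if attempt < 3 { Model::Sonnet } else { Model::Opus };
        let source = generate(&model, prompt, last_error.as_deref());
        match compile(&source) {
            Ok(wasm) => return Ok(wasm),
            Err(e) => last_error = Some(e), // feed the error back as context
        }
    }
    Err(last_error.unwrap())
}

fn main() {
    assert!(generate_with_retries("make a calculator").is_ok());
    println!("recovered after escalation");
}
```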
The agent ecosystem
Not everything in pneuma is generated on the fly. The system ships with a set of immutable system agents written in native Rust:
- Compositor — manages window layout and agent regions
- Status bar — shows tier, agent count, system status
- Dock — side panel for launching and managing agents
- Prompt — the text input overlay
- Net agent — handles all HTTP and TCP/TLS for user agents via IPC
- FS agent — provides scoped file storage backed by an embedded database
- Crypto agent — SHA-256, HMAC, base64
- Audio agent — tone and note playback via rodio
User agents don't make network requests directly. They send an IPC message to the net agent, which makes the request on their behalf. This architecture means the WASM sandbox never needs network access — all I/O goes through capability-checked message passing. The same pattern applies to filesystem, crypto, and audio operations.
The IPC router is a simple message bus: each agent has a 256-message mailbox, messages are binary blobs with optional request ID headers, and delivery is FIFO. Agents can find each other by name and send targeted messages. This is how a weather dashboard fetches data — it asks the net agent to make an HTTP GET, waits for the response in its mailbox, then parses the JSON using kernel-provided pneuma_json_get functions.
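The router described above can be sketched as bounded FIFO queues keyed by agent name, carrying binary payloads with an optional request id. The 256-message capacity and the message fields follow the post; the struct and method names are illustrative:

```rust
use std::collections::{HashMap, VecDeque};

const MAILBOX_CAP: usize = 256;

struct Message {
    from: String,
    request_id: Option<u64>,
    payload: Vec<u8>,
}

#[derive(Default)]
struct Router {
    mailboxes: HashMap<String, VecDeque<Message>>,
}

impl Router {
    fn register(&mut self, name: &str) {
        self.mailboxes.entry(name.to_string()).or_default();
    }

    /// Deliver a message to a named agent; fails if the mailbox is full.
    fn send(&mut self, to: &str, msg: Message) -> Result<(), &'static str> {
        let mbox = self.mailboxes.get_mut(to).ok_or("unknown agent")?;
        if mbox.len() >= MAILBOX_CAP {
            return Err("mailbox full");
        }
        mbox.push_back(msg); // FIFO delivery
        Ok(())
    }

    fn recv(&mut self, name: &str) -> Option<Message> {
        self.mailboxes.get_mut(name)?.pop_front()
    }
}

fn main() {
    let mut router = Router::default();
    router.register("net");
    // A weather dashboard asks the net agent for an HTTP GET.
    router.send("net", Message {
        from: "weather".into(),
        request_id: Some(1),
        payload: b"GET /forecast".to_vec(),
    }).unwrap();
    let msg = router.recv("net").unwrap();
    assert_eq!(msg.from, "weather");
    println!("net agent got request {:?} ({} bytes)", msg.request_id, msg.payload.len());
}
```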
The full picture
pneuma is not a thin wrapper around an LLM. It's a runtime environment — closer to an operating system than a chatbot. The desktop app is pure Rust: wgpu for GPU rendering, winit for windowing, wasmtime for sandboxed execution, sled for storage. There's no Electron, no browser engine (except for the optional embedded webview), no JavaScript in the core path.
The design bet is that code generation is fast enough and reliable enough to replace pre-built software for a meaningful set of use cases. You don't need a weather app — you need weather information, and the program that displays it can be written in real time. You don't need a todo app — you need a list, and the code for that list can exist for exactly as long as you need it.
Every program in pneuma is ephemeral by default and persistent by choice. State lives in a scoped key-value store. The source code lives on the server. The compiled WASM lives in memory. Close an agent and it's gone. Reopen it from the store and it's regenerated. The software is a function of intent, not an artifact to manage.
That's the pipeline. Intent in, software out. Every time.