Building a fully local text-to-CAD editor

Cxmpute 3D Lab, or C3D, started from an observation about local inference. AI models are now small enough to run on consumer hardware, which makes it worth asking which categories of software can embed a model directly rather than depending on a cloud service.

CAD is a reasonable place to ask that question. A CAD installation is already large: SolidWorks can occupy tens of gigabytes, and Fusion, AutoCAD, slicers, simulation tools, plugins, and asset libraries add more on top. Against that baseline, a few gigabytes of model weights is a small marginal cost. The premise behind C3D is that, for that cost, a CAD environment can be AI-native rather than cloud-assisted: a fully local system in which the user describes an object in plain English and the machine produces the corresponding CAD geometry.

CAD also differs from the domains where prompting is already routine. Text-to-image generation produces pixels, and a plausible image is often sufficient. CAD output is not pixels. A generated bracket has to be real geometry: a hole has to be a hole, a cylinder has to be dimensioned, and the part has to render, export, and eventually be printable, machinable, or editable. The output cannot merely look correct from one camera angle; it has to exist as a construction the system can operate on.

C3D is an experiment in that direction: a fully local text-to-CAD editor powered by a fine-tuned Gemma 3n model. The system takes a natural language prompt, generates CadQuery code, executes it locally, renders the resulting object, and opens it in a browser-based 3D viewer. The current version is rough, but the shape of the product is clear: a prompt box, a rendering screen, and a local model that knows how to write CAD code.

The first version of the idea was called prompt3d. It used Replicad, a JavaScript wrapper around OpenCascade, because the browser-native version of the idea was the most attractive one. A browser-based CAD editor with a prompt box and a render screen is a compelling product shape. No install, no heavyweight CAD package, no cloud dependency, just code-CAD in the browser.

That version failed for a simple reason: the models were not good enough at the code dialect. Replicad was too niche, the examples were too sparse, and dumping documentation into a large cloud model's context did not reliably produce valid geometry. The model could write code-shaped text, but it could not consistently write code that OpenCascade could turn into a useful part.

C3D took the opposite approach. Instead of forcing a general model to learn a niche CAD library at inference time, the project fine-tuned a small model directly on CAD generation. The final stack uses CadQuery, a Python wrapper around OpenCascade; a Gemma 3n fine-tune called C3Dv0; Ollama for local inference; FastAPI for the render server; React/Vite for the viewer; and a React Ink CLI as the entry point.

This post covers why the project moved from browser-native JavaScript CAD to Python code-CAD, why a fine-tuned local model made more sense than a larger prompted cloud model, how the rendering pipeline works, how the CLI and viewer fit together, and what needs to be true before text-to-CAD starts feeling like an actual editor instead of a demo.

The product shape

The target experience is not complicated.

c3d generate "make a small phone stand with a backrest and rounded base"

The system should generate a CadQuery script, execute it, convert the resulting geometry into an STL, and make it available in a local viewer. If the code fails, the system should capture the error and try again. If the object renders but is not quite right, the user should be able to reprompt or iterate.

Flattened into one diagram, the architecture looks like this:

             ┌───────────────────────────┐
             │  User prompt              │
             │  "make a phone stand..."  │
             └─────────────┬─────────────┘
                           │
                           ▼
             ┌───────────────────────────┐
             │  React Ink CLI            │
             │  c3d generate / list /    │
             │  render / viewer / config │
             └─────────────┬─────────────┘
                           │
                           ▼
             ┌───────────────────────────┐
             │  Ollama                   │
             │  C3Dv0 local inference    │
             │  Gemma 3n fine-tune       │
             └─────────────┬─────────────┘
                           │ CadQuery code
                           ▼
             ┌───────────────────────────┐
             │  FastAPI render server    │
             │  Python + CadQuery        │
             │  code in, STL out         │
             └─────────────┬─────────────┘
                           │
                           ▼
             ┌───────────────────────────┐
             │  React / Vite viewer      │
             │  browser-based 3D preview │
             │  iteration workflow       │
             └───────────────────────────┘

The CLI is the product entry point. The model is not hidden behind a web app; it is something the user runs locally. The viewer opens in a browser, but it is served by the local server, and the CAD generation pipeline behind it runs entirely on the user's machine.

The core commands are deliberately small:

c3d generate <prompt>        generate a CAD model from text
c3d list                     list previously generated models
c3d render <python_script>   render an existing CadQuery script
c3d viewer                   open the browser viewer
c3d server start             start the local render/viewer server
c3d server stop              stop the local server
c3d server status            inspect server state
c3d config                   configure prompts, retries, and model behavior

The point is not to recreate a full CAD menu system in the terminal. The point is to make generation, rendering, and iteration feel like one local loop.

Why code-CAD

There are two broad ways to build a text-to-3D system.

One is to generate the shape itself. The model outputs mesh vertices, implicit fields, NeRF-like representations, Gaussian splats, or some other direct shape representation. That approach can produce impressive visuals, especially for organic assets, but it is usually a poor fit for engineering geometry. CAD parts need constraints, flat faces, holes, booleans, dimensions, fillets, shells, symmetry, and editable construction history. A mesh that looks like a bracket is not the same thing as a bracket.

The other approach is to generate code that constructs geometry.

Natural language
   │
   ▼
CAD program
   │
   ▼
OpenCascade geometry
   │
   ▼
STL / STEP / preview mesh

The output is not just a shape; it is a recipe. A cylinder is a cylinder because the script made a cylinder. A hole is a hole because the script cut it. A bracket has dimensions because the code contains dimensions. The generated object may still be wrong, but it is wrong in a way that can be inspected, repaired, and regenerated.

That is why C3D uses code-CAD. The model's job is not to hallucinate geometry. The model's job is to write a small Python program.
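
To make "geometry as a recipe" concrete, here is a minimal CadQuery sketch of a plate with one through-hole. The part, its dimensions, and the function names are illustrative assumptions, not output from C3Dv0. CadQuery is imported inside the functions only so that the module still loads on a machine where it is not installed.

```python
# Illustrative dimensions in millimetres (assumed values, not taken from C3D).
PLATE = {"length": 40.0, "width": 20.0, "thickness": 5.0, "hole_diameter": 6.0}


def build_plate(dims=PLATE):
    """Return a CadQuery Workplane holding a plate with one centred through-hole."""
    import cadquery as cq  # lazy import: CadQuery is a heavy dependency

    return (
        cq.Workplane("XY")
        .box(dims["length"], dims["width"], dims["thickness"])  # the plate is a box
        .faces(">Z")  # select the top face
        .workplane()  # start a new workplane on that face
        .hole(dims["hole_diameter"])  # cut a through-hole at its centre
    )


def export_plate(path="plate.stl"):
    """Build the plate and write it to disk; the format follows the file extension."""
    import cadquery as cq

    cq.exporters.export(build_plate(), path)
    return path
```

Changing `hole_diameter` and rerunning the script regenerates the part with a different hole, which is the kind of edit a bare mesh does not support.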

The Replicad version

The first implementation used Replicad. The reason was mostly product taste. Replicad is JavaScript, and JavaScript means the CAD editor can live close to the browser. The dream version of the app looked like this:

Browser
 ├─ prompt box
 ├─ local model call or API call
 ├─ generated Replicad code
 ├─ OpenCascade running in the browser
 └─ rendered object

That architecture is clean. The browser is the runtime, the editor, and the renderer. No Python server, no local daemon, no split between frontend and backend.

The problem was not the architecture. The problem was the model.

Cloud models performed poorly on Replicad. They could infer the broad shape of the task, but they were unreliable on the actual API. The outputs had missing imports, wrong method names, invalid workplane chains, broken booleans, and geometry that did not match the prompt. Prompting helped, but not enough. Context-dumping the library examples helped, but not enough. The model needed familiarity with the CAD dialect, not just a few examples pasted above the request.

That failure changed the direction of the project. Instead of choosing the CAD library that made the frontend architecture prettiest, C3D moved to the CAD library that gave the model the best chance of succeeding.

Why CadQuery

CadQuery is also built on OpenCascade, but it is Python, more widely used, and much more natural for current language models. Python matters. The model does not have to learn both geometry and an obscure syntax at the same time. It can use a language it already understands and a CAD library with more available examples.

The shift from Replicad to CadQuery changed the stack:

Before:

   React frontend
      │
      ▼
   Replicad / OpenCascade in browser
      │
      ▼
   browser-rendered object


After:

   React frontend
      │
      ▼
   FastAPI backend
      │
      ▼
   CadQuery script execution
      │
      ▼
   STL output
      │
      ▼
   browser viewer

Moving CAD execution to a Python backend added a server tier, but the generated code became better. That was the right tradeoff. A fully browser-native CAD editor is only useful if the generated CAD actually works.

The model

C3Dv0 is a Gemma 3n fine-tune for CadQuery generation. The model was trained on roughly 48,000 Text-to-CadQuery examples, converted to GGUF, and published through Hugging Face and Ollama. The Ollama release passed 1,000 downloads.

The training setup was constrained by free-tier compute. It was not a full multi-epoch training run over the entire dataset, and it was not a giant custom CAD foundation model. It was a targeted fine-tune for one job: given a text description, emit CadQuery code likely to produce a renderable object.

That narrowness is the point.

A large general model can know more about the world and still be bad at a specific code-CAD dialect. A smaller model fine-tuned on the right distribution can be less generally intelligent and still better at the task that matters.

The generation boundary looks like this:

Prompt
  │
  ▼
Prompt template
  │
  ▼
C3Dv0 through Ollama
  │
  ▼
Generated CadQuery code
  │
  ▼
Execution check
  │
  ├── success ─► STL + viewer
  │
  └── failure ─► error-aware retry

The model is not asked to explain CAD theory. It is asked to produce executable Python.
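
The model call itself can be sketched against Ollama's local HTTP API. This is a hedged Python illustration of the boundary, not the C3D CLI's actual code (the CLI is written with React Ink, and how it talks to Ollama is not described here). The endpoint is Ollama's default local address; the function names and timeout are assumptions.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_NAME = "joshuaokolo/C3Dv0"


def build_generate_request(prompt, model=MODEL_NAME, url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(prompt, timeout=300):
    """Send the prompt to the local model and return the generated text."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the whole completion arrives in the "response" field.
    return body["response"]
```

Keeping request construction separate from the network call makes the payload easy to inspect without a running Ollama instance.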

Prompting as a protocol

The model behaved less like a chat assistant and more like a completion model. That changed the prompt design.

Four prompt families were tested:

1. Instructional prompt

   "You are a CAD agent. Generate CadQuery code for the user's request."

2. Instructional completion prompt

   "I am a CAD agent. I generate CadQuery code..."

3. Thinking prompt

   Force a response structure:
     <description>
     <reasoning>
     <code>

4. Thinking completion prompt

   Same structure as above, but written as if the model is completing
   the CAD agent's own internal format.

The fourth prompt worked best. The useful detail was not "chain of thought" in the abstract. The useful detail was giving the model a stable output contract. The system needed to know where the code started, where it ended, and what text could be ignored.
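
The four families listed above could be represented as selectable templates. The exact wording C3D uses is not reproduced in this post, so the strings below are assumptions: the first two open with the phrasing quoted above, and the rest is a guess at text that matches the described structure. The names `PROMPT_STYLES` and `build_prompt` are hypothetical.

```python
# Hypothetical template texts. Only the opening phrases of the first two styles
# come from the post; everything else is an assumed wording.
STRUCTURE_HINT = "Respond with <description>, <reasoning>, and <code> sections."

PROMPT_STYLES = {
    "instructional": (
        "You are a CAD agent. Generate CadQuery code for the user's request.\n"
        "Request: {request}\n"
    ),
    "instructional_completion": (
        "I am a CAD agent. I generate CadQuery code.\n"
        "Request: {request}\n"
        "Code:\n"
    ),
    "thinking": (
        "You are a CAD agent. " + STRUCTURE_HINT + "\n"
        "Request: {request}\n"
    ),
    # Completion style: the prompt ends on an open tag for the model to continue.
    "thinking_completion": (
        "I am a CAD agent. I describe the part, reason about it, "
        "then write CadQuery code.\n"
        "Request: {request}\n"
        "<description>"
    ),
}


def build_prompt(request, style="thinking_completion"):
    """Fill the chosen template; an unknown style raises KeyError."""
    return PROMPT_STYLES[style].format(request=request.strip())
```

Ending the completion-style prompt on an open `<description>` tag is one way to nudge a completion-like model into the expected structure instead of asking it to follow instructions.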

A text-to-CAD model has a different failure mode from a chatbot. Extra prose is not harmless. Markdown fences are not harmless. A half-explained idea followed by invalid Python is not harmless. The renderer needs code, and the parser needs to know which part of the model output is code.

So the prompt is less of a personality instruction and more of a wire format.

Model output contract:

   explanation / plan       optional
   generated CadQuery code  required
   render target            required
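
The contract above implies an extraction step between the model and the renderer. A minimal sketch, assuming the code arrives either inside a `<code>` section or inside a Markdown fence (or both); the function name and the fallback behaviour are assumptions, not C3D's actual parser.

```python
import re

TICKS = "`" * 3  # Markdown fence marker, built indirectly so this listing stays readable
# Accept a missing closing tag or fence, since generation can stop early.
CODE_TAG = re.compile(r"<code>(.*?)(?:</code>|\Z)", re.DOTALL)
FENCE = re.compile(TICKS + r"[a-zA-Z]*[ \t]*\n(.*?)(?:" + TICKS + r"|\Z)", re.DOTALL)


def extract_code(model_output):
    """Return the CadQuery source found in a model response, or None."""
    text = model_output
    tagged = CODE_TAG.search(text)
    if tagged:
        text = tagged.group(1)  # keep only what follows <code>
    fenced = FENCE.search(text)
    if fenced:
        text = fenced.group(1)  # unwrap a Markdown fence if the model added one
    elif not tagged:
        return None  # neither marker found: treat it as a contract violation
    code = text.strip()
    return code or None
```

Returning `None` instead of guessing keeps prose from ever reaching the Python interpreter; the caller can count it as a failed attempt.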

The config system lets the user switch prompt styles, because the right prompt is not universal. Some prompts produce cleaner code. Some produce more creative geometry. Some are better at simple primitives. Some recover better after an error message. C3D treats prompting as part of the runtime, not as a fixed hidden constant.

The rendering boundary

The most important boundary in the system is the one between generated text and executable geometry.

The model can say anything. CadQuery cannot run just anything. OpenCascade is the judge.

Generated code
    │
    ▼
Python execution
    │
    ├── syntax error
    ├── import error
    ├── CadQuery API error
    ├── invalid boolean
    ├── empty shape
    └── valid shape
              │
              ▼
            STL

The backend accepts CadQuery code and attempts to render it into an STL. That means the backend is not just a conversion service. It is the validation layer. A successful render is the first hard signal that the model produced something real.

The FastAPI server is intentionally small:

Input:
   CadQuery Python script

Process:
   execute script in controlled environment
   locate exported shape
   convert shape to STL

Output:
   STL file
   render metadata
   error payload on failure

The error payload matters. Without it, failure is a dead end. With it, failure becomes part of the generation loop.
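
One way to produce such a payload is to run the script in a child process and inspect what comes back. The sketch below rests on stated assumptions: the script is expected to write `output.stl` into its working directory (a hypothetical convention), and a subprocess with a timeout stands in for the "controlled environment". It is not a security sandbox, and it is not the actual C3D backend.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

OUTPUT_NAME = "output.stl"  # assumed convention: the script writes this file in its cwd


def render_script(source, timeout=60):
    """Run a generated script in a child process; return an STL path or an error."""
    workdir = Path(tempfile.mkdtemp(prefix="c3d_render_"))
    script = workdir / "model.py"
    script.write_text(source, encoding="utf-8")
    try:
        proc = subprocess.run(
            [sys.executable, str(script)],
            cwd=workdir, capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "error_type": "Timeout", "message": f"exceeded {timeout}s"}
    if proc.returncode != 0:
        # The last traceback line looks like "AttributeError: ...".
        lines = proc.stderr.strip().splitlines()
        last = lines[-1] if lines else "unknown error"
        return {"ok": False, "error_type": last.split(":", 1)[0], "message": last}
    stl = workdir / OUTPUT_NAME
    if not stl.exists() or stl.stat().st_size == 0:
        # The code ran but exported nothing usable.
        return {"ok": False, "error_type": "NoOutput", "message": "script produced no STL"}
    return {"ok": True, "stl_path": str(stl)}
```

A script that runs cleanly but exports nothing is reported as a failure too, which matches the idea that executing is not the same as rendering.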

Attempt 1:
   prompt → code → render
   result: AttributeError: Workplane has no method ...

Attempt 2:
   prompt + previous error → code → render
   result: invalid boolean operation

Attempt 3:
   prompt + previous error → code → render
   result: STL

The current retry loop is simple. It does not make the system autonomous in a deep sense. But it is enough to move from "the model failed" to "the system has another chance with concrete feedback."
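
A minimal sketch of that loop, with the model call and the renderer passed in as plain callables so the control flow can be read on its own. The assumed interface is hypothetical: `generate(prompt)` returns code, and `render(code)` returns a dict with an `ok` flag and, on failure, a `message`. The retry wording is also an assumption.

```python
def generate_with_retries(prompt, generate, render, max_attempts=3):
    """Try generate -> render up to max_attempts times, feeding errors back."""
    attempts = []
    current_prompt = prompt
    for number in range(1, max_attempts + 1):
        code = generate(current_prompt)
        result = render(code)
        attempts.append({"attempt": number, "code": code, "result": result})
        if result["ok"]:
            break
        # Carry only the most recent error so the prompt does not grow unbounded.
        current_prompt = (
            f"{prompt}\n\nThe previous attempt failed with:\n{result['message']}\n"
            "Write corrected CadQuery code."
        )
    return attempts
```

Returning every attempt, not just the last one, keeps failed code and its error available for inspection or for later evaluation.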

The local stack

The local deployment has four moving pieces:

┌──────────────────────────────────────────────────────────────┐
│  React Ink CLI                                               │
│  entry point for generate, render, list, server, config       │
├──────────────────────────────────────────────────────────────┤
│  Ollama                                                      │
│  local inference engine running C3Dv0                        │
├──────────────────────────────────────────────────────────────┤
│  FastAPI                                                     │
│  render server: CadQuery code in, STL out                    │
├──────────────────────────────────────────────────────────────┤
│  React / Vite viewer                                         │
│  browser interface for previewing and iterating on models     │
└──────────────────────────────────────────────────────────────┘

Ollama handles model loading and inference. FastAPI handles Python execution and CadQuery rendering. React handles the visual viewer. React Ink gives the whole thing a clean CLI interface.

The CLI bundles the workflow:

c3d generate "make a hexagonal pencil holder with rounded holes"

Under the hood:

1. Ensure local server is running.
2. Send prompt to Ollama using the configured model and prompt template.
3. Extract CadQuery code from the model output.
4. Send code to the FastAPI render endpoint.
5. If render succeeds, save the script and STL.
6. If render fails, optionally retry with the error message.
7. Add the result to local history.
8. Open or refresh the viewer.

From the user's perspective, this is one command. From the system's perspective, it is a model call, a code extraction problem, a Python execution problem, a geometry conversion problem, and a local file-management problem.
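
The file-management part (steps 5 and 7) can be as small as a directory plus an append-only index. The layout below, with timestamped file names and a `history.jsonl` index, is a hypothetical choice for illustration; the post does not specify how C3D stores its history.

```python
import json
import time
from pathlib import Path


def save_result(history_dir, prompt, code, stl_bytes):
    """Store script and STL under a unique name and append a history record."""
    history_dir = Path(history_dir)
    history_dir.mkdir(parents=True, exist_ok=True)
    # Timestamp plus a running counter keeps names unique within one second.
    stem = time.strftime("%Y%m%d-%H%M%S") + f"-{len(list_results(history_dir)):04d}"
    (history_dir / f"{stem}.py").write_text(code, encoding="utf-8")
    (history_dir / f"{stem}.stl").write_bytes(stl_bytes)
    record = {"id": stem, "prompt": prompt, "script": f"{stem}.py", "stl": f"{stem}.stl"}
    with open(history_dir / "history.jsonl", "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


def list_results(history_dir):
    """Return all history records, oldest first."""
    index = Path(history_dir) / "history.jsonl"
    if not index.exists():
        return []
    with open(index, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Saving the script next to the STL is what makes a later rerender of an edited script possible.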

The CLI as the product

A web UI would have been the obvious choice. But C3D started as a CLI because local AI tools often feel better when they behave like developer tools.

The terminal gives the product a few advantages.

First, it makes server lifecycle explicit. Local AI products tend to accumulate hidden daemons, background processes, ports, and model servers. C3D exposes that directly:

c3d server start
c3d server status
c3d server stop

Second, it makes iteration scriptable. A user can generate several variants, render an existing file, or list past outputs without clicking through a UI.

c3d generate "make a wall hook with two screw holes"
c3d generate "make the same hook thicker and with rounded edges"
c3d list
c3d render ./models/hook_v2.py

Third, it keeps the viewer from becoming the whole product. The viewer is for inspecting geometry. The CLI is for controlling the generation system.

The product shape is closer to a local developer tool than a SaaS app:

Terminal:
   commands, config, history, generation

Browser:
   render, inspect, iterate

Local filesystem:
   scripts, STLs, generated artifacts

Ollama:
   model runtime

That separation made the project easier to ship.

The viewer

The viewer's job is to close the loop between text and object.

Without the viewer, C3D is just a code generator that drops STL files into a directory. With the viewer, it starts to feel like an editor.

Prompt
  │
  ▼
Generated code
  │
  ▼
Rendered object
  │
  ▼
Visual inspection
  │
  ▼
Reprompt / iterate

This matters because text-to-CAD is almost never one-shot. Even when the model produces valid geometry, the result is usually only approximately right. It may have the wrong proportions. It may miss a hole. It may make a base too thin. It may produce a cylinder when the user wanted a rounded rectangular enclosure.

The viewer gives the user a place to decide what to do next.

A future version of the system should make the viewer part of the model loop. The ideal flow is:

1. User submits prompt.
2. Model generates CadQuery code.
3. Backend renders object.
4. Viewer captures screenshot.
5. Vision model compares screenshot against the original request.
6. If mismatch, model revises the code.
7. Repeat until either success or retry budget is exhausted.

That is the actual editor loop. Text-only generation is the first half. Visual self-evaluation is the second half.
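
As control flow, that future loop might look like the sketch below. None of this exists in C3D today. The four callables and their return shapes are assumptions: `render` returns `None` on failure, and `critique` returns a dict with `matches` and `notes`.

```python
def refine_visually(prompt, generate, render, screenshot, critique, budget=3):
    """Iterate generate -> render -> screenshot -> critique within a retry budget."""
    feedback = None
    for attempt in range(1, budget + 1):
        request = prompt if feedback is None else f"{prompt}\n\nReviewer feedback: {feedback}"
        code = generate(request)
        model = render(code)
        if model is None:
            # Nothing to look at: ask for a fix instead of a visual critique.
            feedback = "the previous code did not render"
            continue
        verdict = critique(prompt, screenshot(model))
        if verdict["matches"]:
            return {"status": "accepted", "attempts": attempt, "code": code}
        feedback = verdict["notes"]
    return {"status": "budget_exhausted", "attempts": budget, "code": None}
```

The critique is always made against the original prompt rather than the accumulated feedback, so the target does not drift across iterations.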

Why fully local matters

The local constraint is not just aesthetic.

CAD is often private. A user may be designing a product enclosure, a robotics part, a manufacturing jig, a repair component, or an object that belongs to a larger unreleased project. Sending every prompt and generated part to a cloud model is not always acceptable.

Local also changes the interaction model. Cloud CAD agents usually have latency, account systems, rate limits, and sometimes opaque model behavior. A local model can be worse in raw intelligence but better as a tool because it is always available and fully under the user's control.

C3D runs through Ollama, so C3Dv0 is pulled, stored, and served like any other Ollama model.

Local machine
 ├─ C3Dv0 model weights
 ├─ Ollama runtime
 ├─ FastAPI render server
 ├─ generated CadQuery scripts
 ├─ generated STL files
 └─ browser viewer

The user owns the loop. The generated scripts live locally. The rendered models live locally. The model can be swapped, reconfigured, or fine-tuned again.

There is a cost. The model needs a meaningful amount of memory: on Apple Silicon, running with a large context can use around 10 GB of RAM. That is an acceptable tradeoff for the class of users who already install CAD tools, slicers, local dev environments, and model runtimes.

Why fine-tuning beat context dumping

Before C3Dv0, the obvious approach was to use a stronger cloud model and give it examples. That should have worked in theory. CAD code is code. Modern models write code. A few examples and library docs should be enough.

In practice, it was not enough.

The issue is that code-CAD has a tight correctness surface. A small API mistake can make the whole output unusable. A wrong method name is not a stylistic issue. A bad boolean operation is not a minor hallucination. The renderer either produces a shape or it does not.

The difference between a general model and a fine-tuned model showed up in boring places:

imports
workplane construction
primitive selection
boolean operations
export conventions
object naming
common CadQuery idioms

A fine-tune helps because these are distributional habits. The model does not need to rediscover the CadQuery style on every prompt. It has seen thousands of examples of the task shape.

The lesson was narrower than "fine-tunes are better." The lesson was:

For specialized code generation,
where execution correctness matters,
and the target library is not heavily represented in general pretraining,
a small fine-tune can beat a larger model prompted at inference time.

That is the core bet behind C3D.

The failure modes

C3D has three different kinds of failure.

The first is code failure.

SyntaxError
NameError
AttributeError
missing import
invalid CadQuery method
bad export path

These are the easiest failures. The backend can catch them, pass the error back into the model, and retry.

The second is geometry failure.

empty shape
failed boolean
non-manifold result
object too small
object too large
wrong orientation
unusable STL

These are harder. The code may run, but the object may not be useful. Some of this can be caught with heuristics: bounding boxes, volume checks, face counts, export success, or shape validity checks. But many geometry failures need visual inspection.
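
Two of those heuristics, bounding box and volume, can be sketched on a plain triangle list. The size thresholds are arbitrary illustrative values and the function name is hypothetical. The volume uses the standard signed-tetrahedron sum, which is only meaningful for a closed, consistently oriented mesh.

```python
def mesh_report(triangles, min_size=1.0, max_size=1000.0):
    """Summarise a triangle mesh and flag obviously unusable results.

    triangles: list of triangles, each a tuple of three (x, y, z) vertices in mm.
    """
    if not triangles:
        return {"ok": False, "problems": ["empty shape"]}
    points = [vertex for tri in triangles for vertex in tri]
    extents = [max(p[i] for p in points) - min(p[i] for p in points) for i in range(3)]
    # Signed volume: sum of tetrahedra spanned by the origin and each triangle.
    six_volume = 0.0
    for a, b, c in triangles:
        cross = (
            b[1] * c[2] - b[2] * c[1],
            b[2] * c[0] - b[0] * c[2],
            b[0] * c[1] - b[1] * c[0],
        )
        six_volume += a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2]
    volume = abs(six_volume) / 6.0
    problems = []
    if max(extents) < min_size:
        problems.append("object too small")
    if max(extents) > max_size:
        problems.append("object too large")
    if volume < 1e-9:
        problems.append("near-zero volume")
    return {"ok": not problems, "extents": extents, "volume": volume, "problems": problems}
```

Checks like these catch empty or absurdly scaled output cheaply, but they say nothing about whether the part is the one the user asked for.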

The third is semantic failure.

User asked for a phone stand.
Model made a rectangular block.

User asked for a gear.
Model made a cylinder with holes.

User asked for a clamp.
Model made a bracket.

This is the hardest failure because the system needs to understand the relationship between the rendered object and the original request. A text-only model cannot reliably do that after the object has been rendered. This is why the roadmap points toward multimodal generation and visual feedback.

The current system handles the first category best, some of the second, and only indirectly handles the third through user iteration.

The editor loop

A one-shot generator is not enough. CAD is iterative even when humans do it manually. The user usually does not know the exact object they want until they see a version of it.

The C3D editor loop is:

Generate
  │
  ▼
Render
  │
  ▼
Inspect
  │
  ▼
Reprompt
  │
  ▼
Regenerate

That loop matters more than any single model output. A mediocre first generation can still be useful if the system makes revision cheap. A good generation can still be useless if the user cannot inspect or modify it.

The CLI and viewer are designed around that loop. c3d list lets the user find previous models. c3d render lets them rerender edited scripts. The viewer lets them inspect the result. Configurable prompts and retries let them change how generation behaves.

In the ideal version, C3D becomes less like this:

prompt → object

and more like this:

prompt → object → critique → edit → object → critique → edit

The first version shipped the prompt-to-object path. The next versions should make critique and edit much stronger.

The model artifacts

The model exists in multiple forms because local AI distribution is fragmented.

C3Dv0 transformers
   │
   ├─ useful for research and further fine-tuning
   └─ published on Hugging Face

C3Dv0 GGUF
   │
   ├─ useful for llama.cpp-style runtimes
   └─ published on Hugging Face

C3Dv0 Ollama
   │
   ├─ useful for local installation
   └─ published through Ollama

Ollama is the easiest path for users. Hugging Face is the easiest path for researchers and people who want to inspect or adapt the model. GGUF is the practical bridge between the two worlds.

The product depends on the Ollama path because the goal is not just to publish a model. The goal is to make the model usable inside a local application.

ollama run joshuaokolo/C3Dv0

A model card by itself is not a product. A CLI that knows how to call the model, render its output, and show the result is much closer.

The backend

The backend is intentionally not smart. It does not try to be the agent. It is the execution and rendering boundary.

FastAPI server:

   POST /render
      input: CadQuery script
      output: STL + metadata or error

   GET /models
      output: generated model history

   GET /viewer
      output: browser UI

   server lifecycle:
      controlled by c3d CLI

The backend has one critical responsibility: do not pretend that generated code is valid until it has actually run.

That sounds obvious, but it changes the product. Many AI code generation demos stop at the code block. C3D cannot stop there. The code block is not the artifact. The rendered model is the artifact.

Code that looks right is not success.
Code that executes is closer.
Geometry that renders is the first real success.
Geometry that matches the prompt is the actual goal.

The backend enforces the first hard transition: from text to geometry.

The roadmap

The roadmap has three main pieces.

The first is finishing the text-to-CAD fine-tuning path. The current model is promising, but limited by training resources. More complete training, better data filtering, stronger evaluation sets, and more examples of useful mechanical parts would all improve the model.

The second is better prompt and retry infrastructure. The retry loop should become more systematic:

1. Generate code.
2. Execute code.
3. If code fails, classify error.
4. Repair using error-specific prompt.
5. If geometry succeeds, run basic shape checks.
6. If shape checks fail, retry with structured feedback.
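
Steps 3 and 4 could start as a small lookup from error text to a repair hint. The categories, match strings, and hint wording below are all hypothetical; a real table would be tuned against the errors the model actually produces.

```python
# Hypothetical rules: (category, substrings to look for, repair hint).
REPAIR_RULES = [
    ("syntax", ("SyntaxError", "IndentationError"),
     "Return only valid Python with balanced brackets and consistent indentation."),
    ("missing_name", ("NameError", "ImportError", "ModuleNotFoundError"),
     "Import cadquery as cq at the top and define every name before using it."),
    ("api_misuse", ("AttributeError", "TypeError"),
     "Use only methods that exist on cadquery.Workplane, with valid arguments."),
    ("geometry", ("Standard_Failure", "boolean", "empty shape"),
     "Simplify the construction and make sure solids overlap before cutting or fusing."),
]


def classify_error(message):
    """Return (category, repair_hint) for an error message; first matching rule wins."""
    lowered = message.lower()
    for category, needles, hint in REPAIR_RULES:
        if any(needle.lower() in lowered for needle in needles):
            return category, hint
    return "unknown", "Rewrite the script more simply and try again."


def repair_prompt(original_prompt, message):
    """Compose an error-specific retry prompt."""
    category, hint = classify_error(message)
    return f"{original_prompt}\n\nPrevious error ({category}): {message}\n{hint}"
```

For example, `classify_error("invalid boolean operation")` falls into the `geometry` category, so the retry prompt carries a geometry hint, not a syntax hint.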

The third is multimodal feedback. This is the largest missing piece.

The intended loop is:

User prompt
   │
   ▼
Generated CadQuery
   │
   ▼
Rendered object
   │
   ▼
Screenshot
   │
   ▼
Vision model critique
   │
   ├── matches request ─► done
   └── mismatch ────────► regenerate with critique

There is also a second multimodal path:

Input image
   │
   ▼
Model describes target geometry
   │
   ▼
CadQuery generation
   │
   ▼
Rendered approximation

That path would make C3D closer to image-to-CAD. The blocker is not conceptual. The blocker is runtime support. The local inference stack needs multimodal support that is good enough, accessible enough, and compatible with the model path. Once Ollama or llama.cpp-style runtimes make that practical for the relevant model family, the C3D architecture can absorb it.

The end-to-end picture

The whole system is easier to understand if you flatten it into one flow.

                 ┌─────────────────────────────┐
                 │  User                       │
                 │  writes text prompt         │
                 └──────────────┬──────────────┘
                                │
                                ▼
                 ┌─────────────────────────────┐
                 │  C3D CLI                    │
                 │  React Ink                  │
                 │                             │
                 │  · reads config             │
                 │  · starts server if needed  │
                 │  · formats prompt           │
                 │  · manages retries          │
                 └──────────────┬──────────────┘
                                │
                                ▼
                 ┌─────────────────────────────┐
                 │  Ollama                     │
                 │  C3Dv0 Gemma 3n fine-tune   │
                 │                             │
                 │  prompt in                  │
                 │  CadQuery code out          │
                 └──────────────┬──────────────┘
                                │
                                ▼
                 ┌─────────────────────────────┐
                 │  Code extraction layer      │
                 │                             │
                 │  · strip prose              │
                 │  · recover code block       │
                 │  · normalize output         │
                 └──────────────┬──────────────┘
                                │
                                ▼
                 ┌─────────────────────────────┐
                 │  FastAPI render backend     │
                 │  Python + CadQuery          │
                 │                             │
                 │  · execute script           │
                 │  · catch errors             │
                 │  · export STL               │
                 └──────────────┬──────────────┘
                                │
             ┌──────────────────┴──────────────────┐
             │                                     │
             ▼                                     ▼
┌─────────────────────────────┐       ┌─────────────────────────────┐
│  Failure path               │       │  Success path               │
│                             │       │                             │
│  · syntax error             │       │  · save CadQuery script     │
│  · API error                │       │  · save STL                 │
│  · invalid geometry         │       │  · update model history     │
│  · retry with error         │       │  · open viewer              │
└─────────────────────────────┘       └──────────────┬──────────────┘
                                                     │
                                                     ▼
                                      ┌─────────────────────────────┐
                                      │  Browser viewer             │
                                      │  React / Vite               │
                                      │                             │
                                      │  · inspect object           │
                                      │  · iterate from prompt      │
                                      │  · rerender edited scripts  │
                                      └─────────────────────────────┘

Reading top to bottom: the user gives C3D a prompt; the CLI formats the request and calls the local model through Ollama; C3Dv0 emits CadQuery code; the backend executes the code and attempts to export an STL; failures are turned into retry context; successes are saved and opened in the browser viewer. The whole loop runs locally.

The important part is not any one component. The important part is the boundaries between them. The model generates code. The backend proves whether the code is executable. The viewer proves whether the geometry is useful. The CLI ties those steps into something that feels like a tool rather than a notebook.

C3D is not finished CAD automation. It is an early version of the interface that CAD software is probably moving toward: less menu navigation, more prompt-driven construction, more code-backed geometry, and more local control. The system around the model matters: the execution boundary, the rendering, the iteration loop. But the hard part is the model itself. Getting it to reliably produce not just code-shaped text, but valid, executable CadQuery that renders the object the user actually asked for, is the real bottleneck. That is why the project moved from prompting a large general model to fine-tuning a small specialized one.