The Invisible Assembly Line: How Modern AI Development Actually Works
Dan Crane
June 2, 2026
When people see an AI-powered product and ask how it was built, they usually imagine someone typing prompts into ChatGPT. The reality of anything non-trivial looks almost nothing like that.
What actually happens, when you build thoughtfully rather than just experimentally, is closer to a production line: a series of specialist tools, each handling the step it's best at and passing its output forward to the next station. The user sees the finished product. The assembly line that produced it is invisible.
This matters because the single-model approach, reaching for one tool and trying to make it do everything, is one of the primary reasons AI-assisted development produces mediocre results. Different models have genuinely different strengths. Different tools are designed for different jobs. Chaining them correctly produces outcomes that none of them could produce alone.
Here's what that actually looks like in practice, using a real app ideation and build workflow as the example.
Stage One: Ideation and Research
The starting point for any new application is messy. You have a problem you want to solve, some rough intuitions about how to solve it, and a lot of unanswered questions about whether your approach makes sense, what's already been tried, and what the real constraints are.
This is where a capable reasoning model earns its place, and this is where I consistently reach for Claude Opus. Not because it's the only model that can handle this conversation, but because the quality of its thinking at the ideation stage is genuinely different from what you get from a faster, cheaper model. It pushes back on weak assumptions. It surfaces constraints you hadn't considered. It holds the complexity of a multi-dimensional problem without collapsing it into a simple answer.
The workflow at this stage is conversational but structured. I'll typically start with the problem statement and let the model challenge it: what am I actually solving for, who has this problem, what does success look like, what are the failure modes. Then I move into research grounding: what's been built in this space, what are the technical approaches, what are the architectural decisions I'll need to make. Opus is good at synthesising across a broad knowledge base in a way that surfaces non-obvious connections.
The output of this stage is not code or a design. It's a clear, tested mental model of what you're building and why, with the major assumptions articulated and the key decisions identified. This stage is usually underinvested, and the cost of that underinvestment shows up later when you're rebuilding things that weren't thought through properly at the start.
Stage Two: Visualising and Iterating on the Design
There's a significant gap between having a mental model of an application and being able to see it. Bridging that gap quickly, before committing to any code, is where the visual design and prototyping stage comes in.
This is where Google Stitch and Claude's visual understanding capabilities come into their own as a combination. Stitch allows you to rapidly generate UI mockups and iterate on them through natural language: describe what you want, see it rendered, refine it. Claude's ability to engage with images means you can feed those mockups back into a conversation and discuss them: what works, what doesn't, what a user is likely to misunderstand, how the information hierarchy should change.
The iteration loop here is fast. Considerably faster than designing in a traditional tool, and considerably more exploratory than jumping straight to code. You're not committing pixels to a final design. You're developing a shared language between your mental model and the actual interface, catching misalignments before they become expensive to fix.
The output of this stage is a clear enough visual reference that you can build against it with confidence. Not pixel-perfect, not production-ready. Good enough that the next stage isn't spent making design decisions that should have been made here.
Stage Three: Building on the Right Foundation
With a clear design reference and a tested mental model of what you're building, the build stage is considerably more focused than it would otherwise be. This is where Codex enters the pipeline.
Codex, OpenAI's code-generation model, is capable of translating a well-specified design and set of requirements into working code. The key word is "well-specified." The quality of what Codex produces depends heavily on the quality of the context you give it. A vague brief produces vague code. A precise brief, informed by the ideation and design work done in earlier stages, produces something you can actually use.
My build target for new applications defaults to the Cloudflare developer stack, as I wrote about in a previous post. Workers for the application logic, D1 for the database, R2 for file storage, Pages for the frontend. Codex understands this stack well enough to produce scaffolding that doesn't require significant rework to run in the edge environment.
The workflow here is iterative: generate, test, identify what doesn't work, feed the failure back with context, regenerate. Codex is good at this loop when you give it enough information to understand why something failed. It's less good when you just tell it something is broken without explaining what the expected behaviour was. The quality of your debugging prompts matters as much as the quality of your initial build prompts.
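One way to keep debugging prompts consistent is to fill in the same fields every time something fails. The sketch below is only an illustration: the FailureReport class, its field names, and the closing instruction are hypothetical, not part of Codex or any other tool, and the values in the usage lines are invented.

```python
from dataclasses import dataclass


@dataclass
class FailureReport:
    """One failed attempt, described well enough for a model to act on."""
    goal: str                # what the code was supposed to do
    expected: str            # the behaviour you expected to see
    actual: str              # what happened instead
    error_output: str = ""   # stack trace or log excerpt, if any
    environment: str = ""    # runtime details that may matter


def build_debug_prompt(report: FailureReport) -> str:
    """Render the report as a prompt with labelled sections."""
    sections = [
        ("Goal", report.goal),
        ("Expected behaviour", report.expected),
        ("Actual behaviour", report.actual),
        ("Error output", report.error_output),
        ("Environment", report.environment),
    ]
    # Skip empty sections so the prompt stays short and unambiguous.
    body = "\n\n".join(f"{title}:\n{text}" for title, text in sections if text)
    return body + "\n\nExplain the likely cause, then propose a fix."


# Invented example values, for illustration only.
report = FailureReport(
    goal="Save an uploaded file and return its key",
    expected="HTTP 201 with a JSON body containing the key",
    actual="HTTP 500 on every upload",
    environment="Cloudflare Workers, R2 for file storage",
)
print(build_debug_prompt(report))
```

The point of the structure is that the expected behaviour is always stated, which is the piece most often left out of a "this is broken" message.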
What you end up with at the end of this stage is a working application in a production-capable environment, not a local demo that collapses under real conditions.
Stage Four: Visual Assets and Video
A working application still needs to look like something worth using. This is where the pipeline extends into image and video generation, and where kie.ai has become a genuinely useful part of the toolkit.
Kie.ai's value is that it gives you API access to a broad range of specialist visual models through a single interface. For image generation, that includes GPT Image 2 for photorealistic outputs and product-quality visuals. For video, it provides access to ByteDance's Seedance models, which handle the specific challenge of consistent, cinematic video generation from text and image inputs.
The Seedance workflow is particularly interesting for anyone building marketing assets or product demonstrations. You start with a static image or a visual reference, provide a text prompt describing the motion and context you want, and get back a high-quality video clip that stays consistent across multiple shots. For product demos, explainer content, or social assets, this replaces what would previously have required a videographer, a motion designer, or a significant production budget.
The practical workflow I use is to generate core brand or product imagery first, then feed those images into Seedance to produce motion versions. The consistency between the static and video outputs is strong enough that they read as a coherent visual language rather than outputs from disconnected tools. At 1080p and with native audio support in the newer models, the output quality is well above what most people expect from a generated asset.
Why This Approach Produces Better Outcomes
The reason this assembly line produces better results than reaching for one tool and hoping isn't abstract. Each model in the chain is doing the thing it was designed or optimised for. Opus is doing deep reasoning. Stitch is doing rapid visual iteration. Codex is doing code generation with good context. Seedance is doing high-quality video synthesis.
Asking any one of these to do the others' jobs produces worse results, not because the models are bad, but because they're not optimised for those tasks. A model that's good at reasoning through ambiguous problems is not necessarily the best choice for generating consistent visual assets. A code generation model is not the best choice for the early-stage strategic thinking that determines whether you're building the right thing.
There's also a cost dimension that's worth being direct about. Routing the expensive, capable reasoning model to the steps that genuinely require deep reasoning, and using cheaper, faster models for the steps that don't, is not just sensible architecture. It's what makes the economics of building this way sustainable. Burning Opus-level compute on tasks that a lighter model handles perfectly well is a habit that adds up quickly.
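The routing idea can be written down as a small lookup table. This is a minimal sketch under stated assumptions: the tier names and task labels are hypothetical placeholders, and it carries no real pricing or model identifiers.

```python
from enum import Enum


class Tier(Enum):
    DEEP_REASONING = "deep-reasoning"   # slower, more expensive, strongest judgment
    LIGHTWEIGHT = "lightweight"         # faster, cheaper, fine for routine work


# Hypothetical task labels; replace them with the steps in your own pipeline.
ROUTES = {
    "challenge_problem_statement": Tier.DEEP_REASONING,
    "architecture_tradeoffs": Tier.DEEP_REASONING,
    "summarise_notes": Tier.LIGHTWEIGHT,
    "rename_variables": Tier.LIGHTWEIGHT,
    "format_changelog": Tier.LIGHTWEIGHT,
}


def pick_tier(task: str) -> Tier:
    """Return the tier for a task, defaulting to the cheap tier.

    Unknown tasks go to the lightweight tier on purpose, so that
    escalating to the expensive model is a deliberate choice.
    """
    return ROUTES.get(task, Tier.LIGHTWEIGHT)


print(pick_tier("architecture_tradeoffs").value)
print(pick_tier("fix_typo").value)
```

Defaulting to the cheaper tier is one design choice among several; a team that cares more about quality than spend might default the other way.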
The Coordination Layer
One thing this description glosses over is the human in the middle of all these stages. The assembly line doesn't run itself. Someone has to move output from one stage to the next, evaluate whether it's good enough to proceed, and make the calls about when to iterate within a stage versus move on.
That's the work that doesn't get automated, and it's also the work where experience matters most. Knowing when the ideation stage has produced a clear enough model to move to design. Knowing when the design is specific enough to build against. Knowing when Codex has produced something worth building on versus something that needs to be scrapped and restarted. Those judgment calls are not visible in the output, but they're what determines whether the output is any good.
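Those judgment calls can be pictured as a review gate between stations. The sketch below is a rough model, not a description of any real orchestration tool: run_pipeline, the stage functions, and the review callback are all hypothetical, and in practice the review step is a person, not a function.

```python
from typing import Callable, List, Tuple

# A stage takes (input from the previous stage, reviewer feedback) and returns output.
Produce = Callable[[str, str], str]
# A review takes (stage name, candidate output) and returns (approved, feedback).
Review = Callable[[str, str], Tuple[bool, str]]


def run_pipeline(
    stages: List[Tuple[str, Produce]],
    review: Review,
    brief: str,
    max_rounds: int = 3,
) -> str:
    """Run each stage, iterating within it until the reviewer approves."""
    current = brief
    for name, produce in stages:
        feedback = ""
        for _ in range(max_rounds):
            candidate = produce(current, feedback)
            approved, feedback = review(name, candidate)
            if approved:
                current = candidate   # pass the approved output forward
                break
        else:
            # The reviewer never approved: stop and rethink instead of moving on.
            raise RuntimeError(f"stage {name!r} not approved after {max_rounds} rounds")
    return current
```

The loop has two exits per stage: move on with approved output, or stop entirely. That second exit is the "scrap and restart" call, and it is the one most easily skipped when the line is run on autopilot.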
The AI shepherd concept I wrote about recently is relevant here: the value isn't in the individual tools. It's in the person who understands how they fit together and has the judgment to know when each one has done its job.
If you're building an AI product and want to think through the toolchain and architecture, I'm happy to compare notes. The specific tools are less important than getting the logic of the pipeline right.

