AI Feels Like Magic Until You Understand It. Then It Feels Like Engineering.

You don’t need to know how to build an LLM. But knowing why it just confidently made something up would save a lot of people a lot of time.

I’ve been working with AI tools daily for a couple of years now — across different codebases, different projects, different levels of complexity. And the single biggest shift in how useful these tools became wasn’t a model upgrade or a new feature. It was understanding what’s actually happening under the hood.

They Don’t Know Things. They Predict Things.

LLMs generate text based on statistical probability. That’s it. They’re not retrieving facts from a database. They’re not reasoning through a problem the way you or I would. They’re predicting the most likely next token based on patterns in their training data.

This sounds like a technicality, but it changes everything about how you should work with them.

When you understand this, you stop expecting them to “know” things and start giving them better context to work with. You stop asking “why did it get this wrong?” and start asking “what did I give it that led here?” The responsibility shifts — from the tool to the person using it.

It also explains hallucinations. The model isn’t lying. It’s not broken. It’s doing exactly what it’s designed to do: produce the most statistically probable continuation of your input. Sometimes that continuation is accurate. Sometimes it’s completely fabricated but sounds perfectly confident. The model doesn’t know the difference. You have to.

Why Your Codebase Is the Most Important Part of Your AI Prompt

Here’s something that most conversations about “prompt engineering” miss entirely.

When you’re working with AI tools in your IDE — Copilot, Cursor, Claude, whatever your setup is — the model isn’t just reading your prompt. It’s reading your codebase. Your file structure, your naming conventions, your architectural patterns, your tests, your documentation. All of that is context. All of that is, effectively, part of the prompt.

And this is where things get interesting.

When the code is well-documented, tested, and follows clear architectural patterns — AI is a great colleague. It reads your conventions, fits right in, writes tests that actually make sense. It generates code that looks like it belongs in your project.

When the code is a jungle — no documentation, inconsistent patterns, no tests — AI becomes the fastest machete-wielding maniac you’ve ever seen. Cutting through everything, including the stuff you needed.

Same tool. Wildly different results. The difference isn’t the model. It’s the soil it’s planted in.

The single biggest shift in how useful AI tools became wasn't a model upgrade. It was understanding what's actually happening under the hood.

We spend a lot of time talking about structured prompts and detailed agent markdown files, and those matter. But there’s too little attention paid to the largest portion of the prompt — the codebase itself. Good documentation, clear architectural patterns, solid test coverage — for years these were “the boring stuff.” The things you’d get to after the sprint. After the launch. After the next feature.

Turns out they were always the foundation. AI just made that impossible to ignore.

Context Is Your Responsibility

LLMs don’t remember your previous conversations. They don’t carry knowledge from one chat to the next. Every context window starts from zero.

What they do is compact what they think is relevant into every response within a single conversation. There’s a limit to how much an LLM can hold at once — the context window. When it starts “forgetting” things mid-conversation, it’s not broken. It’s just full.

This means context management is your job, not the model’s.

But “more context” isn’t automatically better. Quality matters more than quantity. If you dump everything into the context window — every file, every requirement, every edge case — you’re not helping the model. You’re diluting the signal. The model has to figure out what’s relevant in a sea of information, and it doesn’t always get that right.

The skill is providing context that is precise and relevant. An outline of the project structure. The specific architectural decisions that matter for this task. The conventions you want followed. Not everything — just what the model needs to do this particular job well.

Think of it like briefing a new contractor. You wouldn’t hand them the entire company wiki and say “figure it out.” You’d tell them what the project is, what standards to follow, what’s already been decided, and what you need from them specifically. Same principle applies.

Constraints Prevent Chaos

Here’s something I’ve learned the hard way: without clear constraints, AI will confidently go out of bounds.

Not maliciously. Not because the model is bad. But because you didn’t tell it where the boundaries are, and its job is to generate the most helpful-seeming response it can. Without guardrails, “helpful” can mean rewriting your architecture, introducing dependencies you didn’t ask for, or solving a problem you don’t have.

Constraints are the engineering discipline of working with AI. Scope the task. Define the format. Specify what not to do. The tighter and clearer the boundaries, the better the output.

This isn’t about limiting the tool. It’s about directing it. A well-constrained prompt with clear context will outperform a vague prompt with a more powerful model almost every time. The quality of the input determines the quality of the output — same as any other engineering tool.

Verify in a Separate Context

One practice that’s made a real difference in my workflow: don’t let the model check its own work.

When you’ve been going back and forth with an AI in a conversation — building something, iterating, refining — the model has accumulated context about what you’re trying to do. That’s useful for building. But it’s terrible for reviewing. The model is biased by its own conversation history. It will defend its previous decisions. It will see patterns that aren’t there because it generated them.

Open a fresh context window. Paste the output. Ask it to review, critique, find problems. A clean context gives you a genuinely fresh perspective — something closer to an actual code review than asking the author to review their own pull request.

This is simple, costs almost nothing in terms of time, and catches things that hours of same-context iteration would miss.

The Cost Dimension

Understanding token limits and context scope also directly impacts costs. This matters right now, and it will matter a lot more soon.

A well-structured prompt with the right context is cheaper than hoping for the best. You use fewer tokens, you get to the answer faster, you need fewer iterations. At scale — whether that’s a team of developers or automated pipelines — the difference between thoughtful and lazy prompting is a real line item.

And if token pricing goes up (and my bet is that it will, significantly), the gap between teams that understand their tools and teams that don’t will show up directly on the invoice.

This is true of smaller models too. Understanding context windows explains why smaller, cheaper, or self-hosted models with shorter context windows can still be incredibly useful — if you know how to work within their limits. You don’t always need the biggest, most expensive model. You need the right model with the right context.

A well-constrained prompt with clear context will outperform a vague prompt with a more powerful model almost every time.

Engineering, Not Magic

We don’t expect miracles from any other engineering tool. We read the docs, understand the limitations, and work within them. LLMs deserve the same respect.

AI is a brilliant application of statistical models with some very clever engineering. But it’s a tool with limitations, good applications, and bad applications.

The people getting real value from it are the ones who treat it that way.