The why

Is anyone even reading blogs anymore?

Genuine question (please share your thoughts in the comments!).

We’re back, apparently. Not with another “AI will change everything” post, because we’ve had enough of those to power a small, inefficient data center. We’re back because the last few years of AI development have been… a lot.

A lot of hype.
A lot of actual progress.
A lot of demos that looked like magic.
A lot of products that looked like magic until you gave them a real production workload, a vague business requirement, and a user who types like they are fighting the keyboard.

And honestly, it made me question whether we should even continue this blog.

Because what do you even write about when every week there is a new model, a new benchmark, a new agent framework, a new “this changes software forever” thread? It is far worse than the JavaScript framework era at its finest, really. Then two days later everyone quietly discovers it still needs authentication, logging, retries, cost controls, permissions, test data, a deployment strategy, and someone to explain to finance why inference is suddenly a line item.

So maybe this is exactly the right time to write.

Not to add more noise, but to make sense of some of it.

This post is the beginning of a series about AI from a software engineering point of view: not the hype version, not the doom version, and definitely not the LinkedIn carousel version.

The engineering version

And before we talk about tools, agents, RAG, coding assistants, model hosting, vector databases, or whether your backlog is now “agent-ready” (whatever that means), we should probably define what we mean by “AI” in 2026.

Because right now, people use the word “AI” to describe at least six different things:

  • a model,
  • a chatbot,
  • a search experience,
  • an automation workflow,
  • a product feature,
  • and occasionally, a PowerPoint slide with a gradient background.

That confusion matters.

Stanford’s 2026 AI Index gives a useful “why now” context. It reports that generative AI reached around 53% population-level adoption within three years, faster than earlier technologies like the PC or the internet. In other words, this is no longer just a research topic or a demo day toy. It is in products, workflows, budgets, roadmaps, and probably somewhere in your company’s strategy document, looking suspiciously expensive.

So, what is AI in 2026?

From a developer’s point of view, “AI” is not one thing. It is a stack.

The model

The model is the part everyone talks about.

GPT, Claude, Gemini, Llama, Mistral, DeepSeek, and whatever gets announced before I finish this sentence.

A large language model is not a brain. It is not a database. It is not a tiny person living in a Kubernetes pod waiting to help you refactor your DTOs.

At its core, a language model estimates likely sequences of tokens. Tokens can be words, parts of words, characters, or other encoded chunks of text. [2] That sounds underwhelming, but scale makes it useful. Train a large enough model on enough data, with enough compute, and it can learn patterns in language, code, documentation, APIs, formats, and human instructions.
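To make “estimates likely sequences of tokens” a bit more concrete, here is a deliberately tiny sketch. It is not how a real LLM works internally (those are neural networks trained on subword tokens); it is a bigram counter over a made-up sentence, and the names `corpus` and `next_token_probabilities` are mine, purely for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus (made up). A real model trains on vastly more text and subword tokens.
corpus = "the model predicts the next token and the next token follows the model".split()

# Count how often each token follows each other token (a bigram table).
follow_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follow_counts[current][nxt] += 1

def next_token_probabilities(token):
    """Estimate P(next token | current token) from raw counts."""
    counts = follow_counts[token]
    total = sum(counts.values())
    return {candidate: n / total for candidate, n in counts.items()}

probs = next_token_probabilities("the")
print(probs)
```

In this toy corpus, “the” is followed by “model” twice and “next” twice, so each gets a probability of 0.5. Scale that idea up by many orders of magnitude, swap the counting for a trained network, and you are in the right neighborhood.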

This is why it can write an email, explain a stack trace, generate a React component, summarize a document, or help think through a system design.

But it is still not “thinking” in the human sense.

It is generating statistically plausible output based on patterns learned during training and the context provided at runtime. That output can be useful, impressive, wrong, confident, or all of those in the same response.

A model is a powerful probabilistic engine.

Not an oracle.

The data

The model is shaped by data.

Some of that data was used during training. Some may be added later through fine-tuning. Some may be provided directly in the prompt. And in many real products, some is retrieved at runtime from external systems.

This is where a lot of misunderstandings begin.

When someone says “the AI knows this”, they may mean:

  1. it appeared somewhere in training data,
  2. it was included in the prompt,
  3. or the application retrieved it from a database, search index, document store, or API.

Those are very different things.

This is why Retrieval-Augmented Generation, or RAG, became such a common pattern. RAG connects a model to external information so the answer can be grounded in more current or domain-specific content. [3]
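A minimal sketch of the idea, under heavy assumptions: the “knowledge base” is a hard-coded list, retrieval is plain word overlap instead of a search index or vector store, and the documents, function names, and prompt wording are all invented for this example. The point is only the shape: retrieve first, then put what you found into the prompt.

```python
import re

# Hypothetical in-memory knowledge base; a real system would query a search index or vector store.
DOCUMENTS = [
    "Refunds are processed within 14 days of the return request.",
    "Support is available on weekdays from 9:00 to 17:00 CET.",
    "Enterprise plans include single sign-on and audit logs.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by word overlap with the question (a stand-in for real retrieval)."""
    question_words = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(question_words & tokenize(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, documents):
    """Put the retrieved text into the prompt so the answer can be grounded in it."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

prompt = build_prompt("How long do refunds take?", DOCUMENTS)
print(prompt)
```

Notice that the model never “knew” the refund policy here. The application looked it up and handed it over, which is case 3 from the list above.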

In simpler terms: the model is not your database.

Your database is your database.

The model is the language interface around it. Sometimes it is brilliant. Sometimes it is drunk autocomplete with excellent posture and suspiciously correct grammar.

Training and inference

Training is where the model learns patterns. For frontier models, this usually happens at massive scale and is not something most product teams do themselves.

Most companies are not training their own GPT from scratch. They are using existing models through APIs, managed platforms, or open-source deployments. They may fine-tune smaller models, adjust prompts, add examples, build retrieval systems, or wrap the model in business logic.

That distinction matters.

When a company says “we trained an AI”, it might mean they trained a model from scratch. More often, it means they configured, fine-tuned, prompted, evaluated, or integrated an existing model.

And then there is inference.

Inference is what happens when you actually use the model. You send input, and the model generates output. In machine learning, inference generally means using a trained model to make predictions on new data. [4] With LLMs, that usually means predicting the next token, then the next one, and so on until the answer is complete.
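The “next token, then the next one” loop can be sketched in a few lines. Big caveat: `MOST_LIKELY_NEXT` is a hypothetical lookup table standing in for a trained model, and it only looks at the previous token, whereas a real LLM conditions on the whole context and usually samples rather than always taking the top choice.

```python
# Hypothetical lookup table standing in for a trained model:
# given the previous token, it returns the single most likely next token.
MOST_LIKELY_NEXT = {
    "<start>": "inference",
    "inference": "is",
    "is": "runtime",
    "runtime": "architecture",
    "architecture": "<end>",
}

def generate(max_tokens=10):
    """Greedy decoding: keep appending the most likely next token until <end> or the limit."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        next_token = MOST_LIKELY_NEXT.get(tokens[-1], "<end>")
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return " ".join(tokens[1:])

print(generate())
```

The `max_tokens` limit is there for a reason: every generated token is another step of work, which is a large part of why long answers are slower and more expensive than short ones.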

This is also where production reality enters the chat.

Latency matters.
Cost matters.
Context size matters.
Rate limits matter.
Caching matters.
Retries matter.
Observability matters.
Data privacy definitely matters.
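To see why cost ends up in that list, a back-of-the-envelope calculation helps. Every number in this sketch is a made-up placeholder, not real pricing from any provider, and `monthly_cost` is a hypothetical helper; plug in your own traffic and your vendor's actual rates.

```python
# All prices are made-up placeholders, not real pricing from any provider.
PRICE_PER_1K_INPUT_TOKENS = 0.002   # hypothetical
PRICE_PER_1K_OUTPUT_TOKENS = 0.006  # hypothetical

def monthly_cost(requests_per_day, input_tokens, output_tokens, days=30):
    """Estimate monthly spend from traffic and average token counts per request."""
    per_request = (
        input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
        + output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
    )
    return requests_per_day * days * per_request

# 10,000 requests a day, each with 3,000 input tokens (prompt plus context) and 500 output tokens.
cost = monthly_cost(requests_per_day=10_000, input_tokens=3_000, output_tokens=500)
print(round(cost, 2))
```

With these invented numbers, a request costs about 0.009 and the month lands around 2,700 in whatever currency the prices are in. Double the context you send per request and the input side of that bill doubles with it.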

A demo can call a model and print the response.

A product needs to handle timeouts, malformed output, prompt injection attempts, user permissions, audit logs, fallback behavior, and the customer who uploads a 400-page scanned PDF and asks for “a quick summary”.
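Here is a rough sketch of the difference, assuming a hypothetical `call_model` function that returns a JSON string and may raise a hypothetical `ModelTimeout`. The retry count and the “truncate the input” fallback are illustrative choices, not recommendations.

```python
import json

class ModelTimeout(Exception):
    """Raised by the hypothetical model client when a call takes too long."""

def summarize_with_guardrails(call_model, text, max_attempts=3):
    """Call the model, validate its JSON output, retry on failure, then fall back."""
    for attempt in range(1, max_attempts + 1):
        try:
            parsed = json.loads(call_model(text))
            if isinstance(parsed, dict) and isinstance(parsed.get("summary"), str):
                return {"summary": parsed["summary"], "attempts": attempt, "fallback": False}
        except (ModelTimeout, json.JSONDecodeError):
            pass  # a real system would log the failure and back off before retrying
    # Fallback: no usable model output, so return a plain truncation instead of an error.
    return {"summary": text[:80], "attempts": max_attempts, "fallback": True}

def make_flaky_model():
    """Fake model for the demo: times out, then returns junk, then returns valid JSON."""
    responses = iter([ModelTimeout(), "not json at all", '{"summary": "Short version."}'])
    def call_model(text):
        item = next(responses)
        if isinstance(item, Exception):
            raise item
        return item
    return call_model

result = summarize_with_guardrails(make_flaky_model(), "A very long document...")
print(result)
```

The fake model fails twice in two different ways before it behaves, and the caller only ever sees a well-formed result. That wrapper is unglamorous, and it is most of the job.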

Inference is not just “call the model”.

Inference is runtime architecture.

The product wrapper

Most users do not interact with a model directly.

They interact with a product wrapper.

That wrapper might be a chat window, a search bar, an IDE plugin, a support assistant, a workflow builder, or a button that says “Generate summary”.

This layer matters more than people think.

A good AI product does not just expose a text box and hope for the best. It gives the model the right context, limits the task, handles permissions, validates output, explains uncertainty, and fits into the user’s actual workflow.
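“Handles permissions” is worth a small illustration, because it is easy to get wrong: the filtering has to happen before anything reaches the prompt. The `Document` type, the role names, and `build_context` below are all hypothetical, and a real system would enforce this in the data layer rather than in a list comprehension.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Document:
    title: str
    body: str
    allowed_roles: frozenset

def build_context(user_roles, documents, max_chars=500):
    """Keep only documents the user may read, then cap the context size."""
    visible = [d for d in documents if d.allowed_roles & set(user_roles)]
    context = "\n".join(f"{d.title}: {d.body}" for d in visible)
    return context[:max_chars]

# Made-up documents and roles for the demo.
docs = [
    Document("Holiday policy", "25 days per year.", frozenset({"employee", "hr"})),
    Document("Salary bands", "Confidential figures.", frozenset({"hr"})),
]

print(build_context({"employee"}, docs))
```

If the salary document never enters the context, the model cannot leak it, no matter how creative the prompt gets. Asking the model nicely not to reveal it is not the same guarantee.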

For a .NET and React team, this is familiar territory.

You still need APIs, identity, authorization, front-end state, queues, jobs, storage, telemetry, deployment pipelines, feature flags, and tests.

AI does not remove software engineering.

It adds a very powerful, very non-deterministic dependency to your software engineering.

Congratulations. The distributed system has opinions now.

The safety layer

Then there is the safety layer.

This is the part many teams underestimate because it does not look exciting in a demo.

Safety is not only about preventing obviously bad outputs. It includes security, privacy, abuse prevention, evaluation, monitoring, access control, and deciding what the system should refuse to do.

NIST’s AI Risk Management Framework organizes AI risk management around four functions (govern, map, measure, and manage), applied across the design, development, deployment, and use of AI systems. [5]

That sounds enterprise-y, because it is.

But the practical point is simple: if your AI feature can read customer data, summarize sensitive information, write code, make recommendations, or trigger workflows, you need more than a prompt saying “be safe”.

You need boundaries.

You need logs.

You need evaluation.

You need someone to ask: “What happens when this fails?”

Because it will fail.

Not always dramatically. Sometimes it will just be slightly wrong in a very confident way, which is worse.
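Evaluation does not have to start as a platform. A first version can be a list of questions with things the answer must contain, run on every change. In this sketch `fake_model` is a hypothetical stand-in for a real model call (with one deliberately wrong answer), and substring matching is a crude check that a real evaluation would replace with something stronger.

```python
def evaluate(model_fn, cases):
    """Run each case through the model and record pass/fail for every one."""
    results = []
    for case in cases:
        output = model_fn(case["input"])
        passed = all(term.lower() in output.lower() for term in case["must_contain"])
        results.append({"input": case["input"], "output": output, "passed": passed})
    pass_rate = sum(r["passed"] for r in results) / len(results) if results else 0.0
    return pass_rate, results

# Hypothetical stand-in for a real model call, with one confidently wrong answer.
def fake_model(question):
    answers = {
        "What is our refund window?": "Refunds are accepted within 14 days.",
        "Who approves production deploys?": "Anyone can deploy at any time.",
    }
    return answers.get(question, "I don't know.")

cases = [
    {"input": "What is our refund window?", "must_contain": ["14 days"]},
    {"input": "Who approves production deploys?", "must_contain": ["release manager"]},
]

pass_rate, results = evaluate(fake_model, cases)
print(pass_rate)
for r in results:
    print("PASS" if r["passed"] else "FAIL", "-", r["input"])
```

The second answer is fluent, confident, and wrong, which is exactly the failure that slips past a demo. Keeping the full `results` list, not just the pass rate, gives you a log to look at when someone asks what happened.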

So what does AI mean now?

In 2026, AI is less about one magical model and more about systems built around models.

The model matters, of course. Stanford’s 2026 AI Index reports rapid capability improvements across areas like coding and agentic computer-use tasks, and notes that industry produced over 90% of notable frontier models in 2025. [6]

But the model is only one layer.

The real product is the full stack:

  • model,
  • data,
  • training or adaptation,
  • inference,
  • product experience,
  • safety.

That is where the interesting engineering work is.

Not “can we call an API?”

Everyone can call an API.

The question is: can we build something reliable, useful, secure, understandable, maintainable, and worth the cost?

That is the version of AI I want this series to explore.

AI in 2026 is not magic.

It is models, data, infrastructure, product design, and safety wrapped together into systems.

And if we want to build with it responsibly, we need to understand the stack before we start decorating everything with sparkle icons.

So, is the article from 1.5 years ago, “What it takes to be a full stack developer”, still relevant?

Welcome back.

Apparently, we’re doing this.

Sources

[1] Stanford’s AI Index report describes itself as tracking AI across research and development, technical performance, responsible AI, economy, science, medicine, education, policy/governance, and public opinion.

[2] Google’s Machine Learning Crash Course describes language models as estimating the probability of tokens or token sequences, where tokens may be words, subwords, or characters. IBM similarly describes LLMs as statistical prediction systems that learn patterns and predict language sequences.

[3] Microsoft Azure describes Retrieval-Augmented Generation as a pattern where an LLM is combined with external data retrieval to ground or enhance generated responses.

[4] AWS documentation describes machine learning predictions/inference as using a model to generate predictions, including real-time predictions for interactive applications.

[5] NIST’s AI Risk Management Framework is intended to help organizations incorporate trustworthiness considerations into the design, development, use, and evaluation of AI systems.

[6] Stanford’s 2026 AI Index top takeaways report that industry produced over 90% of notable frontier models in 2025, while capability benchmarks for coding and agentic computer-use tasks rose significantly.