
Going from Systems of Record to Systems of Intelligence

by Sanjay

Your Database Knows Everything. Your Product Knows Nothing.

Acme Corp has a bunch of PostgreSQL databases strewn across multiple organizational silos: Users, Orders, Invoices, Support, Product Catalog, Shipping Records, CRM, and so on. Every byte of data, faithfully recorded.

And yet, when a customer opens the Acme support chat and asks "Where's my order?" -- the "system" struggles to find the answer. The system has the answer. It just can't think.

This is the gap between a System of Record and a System of Intelligence.

Every enterprise system built in the last 30 years is a System of Record. Salesforce. SAP. Your internal tooling. They store data with extraordinary fidelity. They are the corporate equivalent of a filing cabinet with perfect alphabetical order and zero intuition.


The Three Memory Problem

Here's what changed how I think about building software & products with AI: your systems should model themselves after how human memory actually works.

When a support agent at Acme picks up a customer call, they don't load the customer's entire history into their brain (although I have seen systems attempt to do exactly this -- dumping every record into a context window and praying the model figures it out). The human brain operates on three layers, instinctively:

Working Memory -- Who is this person? What's the immediate context? What did they just say? This is the RAM. Small, fast, always present. You load it for every single interaction.

Episodic Memory -- What happened in this specific interaction? What have they ordered recently? What was their last support ticket about? This is your recent files. Indexed and fast, but you don't load it until you need it.

Semantic Memory -- General knowledge. Company policies. Product specifications. Things not tied to this customer but needed to help them. This is the knowledge base. Slower to search, but deep.
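
For concreteness, here is a minimal sketch of those three layers as data structures. The field names and the index name are illustrative, not from MemGPT or any particular framework:

```python
from dataclasses import dataclass, field


@dataclass
class WorkingMemory:
    """The RAM: small, cached, loaded for every single interaction."""
    customer_name: str
    account_tier: str
    recent_messages: list[str] = field(default_factory=list)  # last few turns only


@dataclass
class EpisodicMemory:
    """Recent, customer-specific history: indexed, fetched only when a query needs it."""
    recent_orders: list[dict] = field(default_factory=list)
    recent_tickets: list[dict] = field(default_factory=list)


@dataclass
class SemanticMemory:
    """General knowledge: policies and product specs, searched as a last resort."""
    knowledge_base_index: str = "acme-policies"  # hypothetical search/vector index name
```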

This is the pattern that MemGPT and the broader industry have converged on for designing AI systems. The research is clear: intelligent systems need layered context, not flat databases.

The reason your current product feels dumb despite having all the data? It treats every query as a fresh lookup against a flat store. No context. No continuity. No progressive understanding. Every message is a cold start.


What "Intelligence" Actually Looks Like

Let me be concrete. At Acme Corp, "intelligence" isn't a chatbot that answers FAQs. I cannot stress this enough -- if your AI strategy is "bolt a chatbot onto our knowledge base," you are building a slightly fancier search bar.

A System of Intelligence does something fundamentally different.

A customer messages: "Can I return the thing I bought last week?"

A System of Record would require the customer to find their order number, navigate to a returns page, fill out a form, and wait for an email. We've all been through this. It's miserable.

A System of Intelligence resolves this in under 3 seconds. Here's what actually happens under the hood:

First, it loads working memory. Who is this customer? What's their account tier? What were the last few messages? This is pre-loaded, cached, instant. Then it classifies the query -- this is an action request about returns. The phrase "the thing I bought last week" gets resolved to an actual date range, mapped to a specific order. No ambiguity, no asking the customer to go hunt for an order number.

Now it knows what to fetch. It pulls the order details, the return policy for that product category, and the customer's return history -- in parallel. Order #AC-7823 (a standing desk, delivered Jan 31). Return policy says 30-day window, free returns for Premium members. Customer has zero prior returns. All of this comes back in under a second.
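
Here's a sketch of that parallel fetch. The lookup functions are hypothetical stand-ins for real database or cache calls; the point is the concurrency and the hard timeout:

```python
import asyncio


# Hypothetical async lookups -- stand-ins for real database / cache calls.
async def fetch_order_details(order_id: str) -> dict:
    return {"order_id": order_id, "item": "standing desk", "delivered": "2026-01-31"}

async def fetch_return_policy(category: str) -> dict:
    return {"category": category, "window_days": 30, "free_for_premium": True}

async def fetch_return_history(customer_id: str) -> list:
    return []  # zero prior returns


async def retrieve_context(order_id: str, category: str, customer_id: str) -> dict:
    """Fetch everything the prompt needs concurrently, with a hard timeout."""
    order, policy, history = await asyncio.wait_for(
        asyncio.gather(
            fetch_order_details(order_id),
            fetch_return_policy(category),
            fetch_return_history(customer_id),
        ),
        timeout=1.0,  # the whole retrieval step must come back in under a second
    )
    return {"order": order, "return_policy": policy, "return_history": history}


print(asyncio.run(retrieve_context("AC-7823", "furniture", "cust_42")))
```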

Then it assembles a prompt. And here's where it gets interesting -- you can't just dump all of this into the context window randomly. LLMs have a well-documented attention problem called "Lost in the Middle." They focus on the beginning and end of their context and lose the plot in the middle. So you structure it: critical context at the top, retrieved details in the middle, customer's actual message at the end. The stuff the model needs to pay attention to sits where it actually pays attention.
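
A minimal sketch of that assembly order; the section names are mine, not a standard:

```python
def assemble_prompt(system_rules: str, working_memory: str,
                    retrieved: str, customer_message: str) -> str:
    """Order sections so the critical pieces sit where the model attends most:
    the very beginning and the very end of the context window."""
    return "\n\n".join([
        system_rules,      # top: instructions and safety constraints
        working_memory,    # top: who the customer is, immediate context
        retrieved,         # middle: order details, policies -- supporting evidence
        customer_message,  # end: the actual question, right before generation
    ])
```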

The response: "Your standing desk from Jan 31 is eligible for free return since you're a Premium member. Want me to start the process? I can schedule a pickup for this week."

But we're not done. The system evaluates its own confidence before sending anything. Did retrieval return relevant data? Does the response reference specific details from that data? Is there hedging language like "I think" or "probably"? This one scores high -- send it directly. If it scored low? Route to a human. More on that in a minute.

Each of those steps is a distinct component with a single responsibility. Classification doesn't fetch data. Retrieval doesn't generate text. The confidence evaluator doesn't deliver messages. This isn't pedantic software architecture -- this is how you build something you can actually debug, tune, and trust in production.


Stop Loading Everything

The most expensive mistake I see teams make when adding AI to their product: loading everything into the context window.

"Let's just put all the customer data in the prompt." This is the AI equivalent of SELECT * FROM everything. It's slow, expensive, noisy, and -- counterintuitively -- makes the AI worse. Remember Lost in the Middle? More context is not better context. More context is more noise for the model to wade through.

The system should load context progressively. Working memory is always there (~800 tokens). Episodic memory gets fetched on demand -- when the customer asks about their order, load order details; when they ask about returns, load the return policy. Semantic memory (general policies, product info) gets searched only when the first two layers don't have the answer.
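
A sketch of that decision logic, with hypothetical loader functions standing in for cache reads and database or vector searches:

```python
# Hypothetical loaders -- stand-ins for cache reads and database / vector searches.
def load_working_memory() -> str:
    return "customer: <name> | tier: Premium | last 3 messages: ..."

def load_episodic(topic: str) -> str:
    return f"recent {topic} records for this customer"

def search_semantic(topic: str) -> str:
    return f"policy and product docs matching '{topic}'"


def load_context(topic: str, needs_general_knowledge: bool) -> list[str]:
    """Widen the context progressively: working memory always, episodic by topic,
    semantic only when the first two layers can't answer."""
    sections = [load_working_memory()]             # always present, ~800 tokens
    if topic in ("orders", "returns", "shipping"):
        sections.append(load_episodic(topic))      # fetched on demand
    if needs_general_knowledge:
        sections.append(search_semantic(topic))    # searched only as a last resort
    return sections
```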

I keep the total input under ~4,000 tokens. That sounds small when models support 200K context windows, but that's the point. You want the model focused, not drowning in data. The customer's entire order history, their full account profile, every support ticket they've ever filed -- that data exists in the database. It doesn't belong in every prompt.

And you enforce a truncation priority. If you're over budget, the first thing to go is semantic (general knowledge). Then episodic (older, less relevant details). You never truncate: the system prompt, pinned safety notes ("this customer has an active dispute"), or the current message. The hierarchy is non-negotiable.
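
Here's one way to express that priority as code. The 4,000-token budget and drop order mirror the numbers above; the token estimate and section names are illustrative:

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token. Use a real tokenizer in practice.
    return len(text) // 4


def fit_to_budget(sections: dict[str, str], budget_tokens: int = 4000) -> dict[str, str]:
    """Drop sections in a fixed priority order until the prompt fits the budget.
    The system prompt, pinned safety notes, and current message are never dropped."""
    drop_order = ["semantic", "episodic"]  # semantic goes first, then episodic
    sections = dict(sections)              # don't mutate the caller's dict
    for name in drop_order:
        if sum(estimate_tokens(t) for t in sections.values()) <= budget_tokens:
            break
        sections.pop(name, None)
    return sections
```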


The Confidence Problem

Here's where most AI implementations fall apart. They either trust the AI completely (dangerous) or don't trust it at all (pointless). Both are wrong.

Every response needs a confidence evaluation. Not just "did the model say it was confident" -- models are always confident, that's the whole problem. You need external signals.

Did retrieval actually return relevant data? Does the response reference specifics from that data, or is it vague? Is there hedging language? Is the topic something where being wrong costs the customer money?

The routing logic is topic-aware, and this matters a lot. An order status question needs maybe 0.75 confidence for the AI to handle directly. A billing dispute? 0.90. Complaints? Always escalate, no exceptions. The threshold reflects the cost of being wrong. Getting an order status slightly off wastes a customer's time. Getting a billing answer wrong erodes trust and potentially costs real money.

The magic is in the middle tier -- AI responds but gets flagged for human review. The customer gets an immediate answer. A human reviews it asynchronously. If the AI was wrong (which will happen), the human follows up. If it was right (which it will be most of the time), the human moves on. You get AI speed with human oversight. This is the tier most teams skip, and it's the one that makes the whole system viable in production.
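
Putting those two ideas together, the routing might look roughly like this. The 0.75 and 0.90 thresholds are the ones mentioned above; the returns threshold, the review margin, and the tier names are my own placeholders:

```python
# Minimum confidence for the AI to answer directly, by topic.
AUTO_SEND_THRESHOLD = {
    "order_status": 0.75,
    "returns": 0.80,   # assumption: somewhere between order status and billing
    "billing": 0.90,
}
REVIEW_MARGIN = 0.15   # below the auto-send bar but close: send, then flag for review


def route(topic: str, confidence: float) -> str:
    if topic == "complaint":
        return "escalate"                              # always a human, no exceptions
    threshold = AUTO_SEND_THRESHOLD.get(topic, 0.90)   # unknown topics get the strict bar
    if confidence >= threshold:
        return "send"                                  # AI answers directly
    if confidence >= threshold - REVIEW_MARGIN:
        return "send_and_review"                       # AI answers, human reviews async
    return "escalate"                                  # hand off to a human
```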


Conversations Have Memory. Your Product Should Too.

A support conversation at Acme might span 50 messages over three days. Loading all 50 messages into every prompt is wasteful and eventually impossible as conversations grow.

The solution is rolling checkpoints. Every 20 messages or so (or when the topic changes, or when a task completes), the system summarizes the conversation into a structured snapshot:

[CHECKPOINT #3 | Messages 45-67 | 2026-02-04]

RESOLVED:
- Return initiated for standing desk (order #AC-7823)
- Pickup scheduled for Feb 6
- Refund of $849 will process in 3-5 business days

PENDING:
- Customer asked about exchange for sit-stand converter
- Waiting on warehouse availability check

KEY FACTS:
- Premium member since 2024
- Home office setup (mentioned multiple items)
- Prefers email confirmations over SMS

Now the system loads: checkpoint summary (~300 tokens) + messages after the checkpoint (~500 tokens). Bounded context regardless of conversation length. The customer can reference something from message #12 and the system still knows, because it's in the checkpoint.
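
A sketch of the trigger and the bounded load. The "every 20 messages" rule is the one above; the function names are hypothetical:

```python
def should_checkpoint(messages_since_last: int, topic_changed: bool,
                      task_completed: bool) -> bool:
    """Summarize roughly every 20 messages, or sooner on a topic change or completed task."""
    return messages_since_last >= 20 or topic_changed or task_completed


def conversation_context(latest_checkpoint: str, messages_after: list[str]) -> str:
    """Bounded context: one structured summary plus only the messages after it."""
    return latest_checkpoint + "\n\n" + "\n".join(messages_after)
```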

This is not a nice-to-have. Without checkpoints, your AI either forgets what happened 20 messages ago or you burn through your token budget loading the full history. Neither works.


Stale Context Will Destroy Trust

Here's a subtlety that separates production systems from demos, and it's one nobody talks about until it bites them: stale context is worse than no context.

If a customer's refund was processed 30 seconds ago but the AI still says "your refund is pending" -- you've destroyed trust. One wrong answer about money and the customer will never trust the AI again. They'll demand a human for everything going forward, and honestly, they should.

The fix is layered freshness, matched to the consequence of being wrong:

Always fresh -- current time, conversation state, pending actions. You compute these every single message. Zero caching.

Event-invalidated -- order status, payment status, account flags. When an order ships, the cache gets invalidated immediately via an event. Seconds of staleness at most.

TTL-cached -- product availability, shipping estimates. A few minutes stale is acceptable.

Session-scoped -- customer name, account tier. Loaded once per session; these won't change mid-conversation.

The trick is matching freshness to consequence. Getting the customer's name wrong is awkward. Getting their payment status wrong is a support escalation.
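
One way to encode that is a policy table keyed by field. The field names, strategies, and TTLs here are illustrative:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Freshness:
    strategy: str                     # "always", "event", "ttl", or "session"
    ttl_seconds: Optional[int] = None


# Match the caching strategy to the cost of being wrong about that field.
FRESHNESS_POLICY = {
    "current_time":         Freshness("always"),                  # recomputed every message
    "pending_actions":      Freshness("always"),
    "order_status":         Freshness("event"),                   # invalidated when the order ships
    "payment_status":       Freshness("event"),
    "product_availability": Freshness("ttl", ttl_seconds=300),    # a few minutes stale is fine
    "shipping_estimate":    Freshness("ttl", ttl_seconds=300),
    "customer_name":        Freshness("session"),                 # loaded once per conversation
    "account_tier":         Freshness("session"),
}
```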


The Human-AI Handoff

Let me be real -- the hardest UX problem isn't the AI. It's what happens when the AI can't handle something and a human needs to take over.

When the system escalates, the customer sees a clean system message: "I'm connecting you with a specialist who can help with this." Not the AI's failed response. Not an awkward "I don't know." A clear, honest handoff.

The agent receives everything: full conversation context (via checkpoint), the customer's last message, the confidence signals that triggered escalation, and a draft of what the AI would have said. The agent has the full picture. The customer doesn't repeat themselves. This matters more than any model improvement. The number one complaint customers have about support is repeating their problem to every new person.

After the agent resolves the issue, the system can resume AI handling. No toggle, no setting. The AI starts responding again when the agent stops. The whole thing should feel seamless -- and if you've done the architecture right, it will be.


The Architecture

I'll keep this brief because this could be its own post. The system is a pipeline:

Message arrives
       |
  WorkingMemoryLoader    --> Loads context (cached + fresh)
       |
  QueryClassifier        --> Intent, topic, entities, urgency
       |
  RetrievalPlanner       --> Decides WHAT to fetch (pure logic, no I/O)
       |
  ContextRetriever       --> Fetches in parallel with timeouts
       |
  ContextAssembler       --> Builds prompt within token budget
       |
  ResponseGenerator      --> LLM call with retry + circuit breaker
       |
  ConfidenceEvaluator    --> Score, route (AI / review / escalate)
       |
  ResponseDeliverer      --> Persist, broadcast, trigger checkpoint

Each component implements a Protocol. The QueryClassifier doesn't know if it's using regex rules, a fine-tuned model, or an LLM fallback. The ContextRetriever doesn't know if episodic memory comes from PostgreSQL or a cache. The ResponseGenerator doesn't know if it's calling Claude or a local model. This is how you swap models, add data sources, and tune routing logic without rewriting the pipeline.
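
Concretely, with Python's typing.Protocol the seams might look like this; the method signatures are illustrative, not from a specific framework:

```python
from typing import Protocol


class QueryClassifier(Protocol):
    def classify(self, message: str) -> dict:
        """Return intent, topic, entities, urgency -- however it's implemented."""
        ...


class ContextRetriever(Protocol):
    def retrieve(self, plan: dict) -> dict:
        """Fetch whatever the plan asks for; the caller doesn't care from where."""
        ...


class ResponseGenerator(Protocol):
    def generate(self, prompt: str) -> str:
        """Produce a response; could be Claude, a local model, or anything else."""
        ...


# The pipeline depends only on these interfaces, so any conforming implementation --
# regex rules, a fine-tuned model, a different data store -- can be swapped in
# without touching the orchestration code.
```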

The classifier itself is worth calling out. It's a 3-layer hybrid: regex rules catch emergencies and explicit escalation requests in ~1ms. A lightweight ML model handles 70-80% of messages in ~20ms. An LLM fallback catches the ambiguous cases in ~200ms. Weighted average classification time: ~40ms at near-zero cost. You don't need to call a frontier model just to figure out if someone is asking about their order or their refund.
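
A sketch of that three-layer fallback. The regex patterns, the confidence cutoff, and the stubbed model calls are placeholders:

```python
import re

ESCALATION_PATTERNS = [r"\bspeak to (a )?human\b", r"\bemergency\b", r"\bchargeback\b"]


def classify(message: str) -> dict:
    # Layer 1: regex rules -- ~1ms, catches emergencies and explicit escalation requests.
    for pattern in ESCALATION_PATTERNS:
        if re.search(pattern, message, re.IGNORECASE):
            return {"intent": "escalate", "source": "rules"}

    # Layer 2: lightweight ML model -- ~20ms, handles the common 70-80% of messages.
    label, confidence = small_model_predict(message)
    if confidence >= 0.85:  # cutoff is a placeholder; tune against real traffic
        return {"intent": label, "source": "ml"}

    # Layer 3: LLM fallback -- ~200ms, only for the ambiguous remainder.
    return {"intent": llm_classify(message), "source": "llm"}


# Hypothetical stand-ins for the real model calls.
def small_model_predict(message: str) -> tuple[str, float]:
    return ("order_status", 0.9)

def llm_classify(message: str) -> str:
    return "other"
```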


The Real Shift

Going from a System of Record to a System of Intelligence isn't about adding a chatbot. And no, Anthropic and OpenAI won't solve this problem for you. This is an architecture problem, not a model problem.

The database doesn't change. The tables don't change. What changes is the layer between the data and the user -- a layer that understands context, loads progressively, classifies intent, retrieves selectively, evaluates confidence honestly, and knows when to hand off to a human.

Acme's databases are an asset, not a limitation. The data was always there. What was missing was the architecture to make it think.

It'll be up to each company to build this into their strategic roadmaps. The model providers give you the raw intelligence. The memory architecture, the confidence routing, the human handoffs, the cache freshness -- that's on you.