You tell an AI your situation. It nails the answer. Then 20 messages later it acts like it never heard half of it.
That’s not mood swings. That’s context.
LLMs don’t have “memory” the way people imagine. They have something closer to a temporary workspace: whatever text is currently included in the request (the chat history + your latest message + any pasted docs). If important details fall out of that workspace, the model can’t use them.
The desk metaphor
Imagine you’re working with a super-smart assistant, but they can only see what’s on their desk:
- Your current message
- Some amount of chat history
- Any documents you pasted in
That desk has a fixed size. If you keep adding paper, older pages slide off. The assistant isn’t refusing to remember — the paper is simply not there anymore.
That desk size is the context window.
What is a context window?
A context window is the maximum amount of text the model can consider at once.
Important detail: it’s not “how much you can paste in.” It’s the combined total of:
- input (your text + conversation history + tool results / docs)
- output (what the model generates)
So if you ask for a huge answer, you’re spending the same budget that could have kept more history visible.
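The arithmetic is simple enough to sketch. The numbers below are illustrative only (real window sizes vary by model), but they show why asking for a huge answer shrinks the history the model can still see:

```python
# Illustrative only: real context window sizes vary by model.
CONTEXT_WINDOW = 8_000     # total token budget for input + output
SYSTEM_PROMPT = 400        # tokens reserved for instructions
REQUESTED_OUTPUT = 3_000   # tokens you asked the model to generate

def history_budget(window: int, system: int, output: int) -> int:
    """Tokens left over for conversation history and pasted docs."""
    return window - system - output

# The bigger the requested output, the less history fits on the desk.
print(history_budget(CONTEXT_WINDOW, SYSTEM_PROMPT, REQUESTED_OUTPUT))
```

Halve the requested output and you free up the same number of tokens for history.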
Tokens: the units AI counts (not words)
Models don’t count “words”. They count tokens.
A token is a chunk of text produced by a tokenizer. It can be:
- a whole word ("house")
- part of a word ("extra" + "ordinary")
- punctuation (".", "{", "}")
- a space + word (often " hello" is one token)
- parts of code identifiers ("CustomerOrderID" might split)
Quick intuition (rough but useful)
- English prose: ~1 token ≈ 4 characters on average
- Code / JSON / logs: often more tokens per visible character (lots of symbols + long identifiers)
- Languages with diacritics: tokenization can be slightly less efficient depending on the tokenizer
This is why pasting “just a few logs” can destroy your context budget.
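You can turn the chars/4 rule of thumb into a one-line estimator. This is a crude heuristic, not a real tokenizer — actual tokenizers split differently, and code or logs usually cost more than this suggests:

```python
def rough_token_estimate(text: str) -> int:
    """Crude heuristic: ~1 token per 4 characters of English prose.
    Real tokenizers differ; code/JSON/logs usually come out higher."""
    return max(1, len(text) // 4)

prose = "The quick brown fox jumps over the lazy dog."
log_line = '{"ts":"2024-01-01T00:00:00Z","level":"ERROR","msg":"conn refused"}'

print(rough_token_estimate(prose))     # a short sentence is cheap
print(rough_token_estimate(log_line))  # one structured log line costs more
```

Multiply that by 3,000 log lines and you can see where the budget goes.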
Why context limits create the “AI got dumb” effect
1) It loses your constraints
You say early:
- “Don’t use OFFSET pagination”
- “We must keep transaction boundaries”
- “Naming convention is fixed”
Later, those rules may no longer be in the active context. The model switches back to generic defaults.
2) It becomes inconsistent
If it can’t see your earlier decisions, it may confidently suggest the opposite approach — not because it’s lying, but because it’s now optimizing for a different (incomplete) picture.
3) It starts “filling gaps”
When a key detail is missing, the model predicts the most likely continuation. That can look like confident facts, but it’s basically a high-quality guess.
That’s the origin of a lot of hallucinations in long threads: missing context + plausible completion.
Does the AI remember anything long-term?
Most of the time, no — not in the way people think.
There are two separate things:
- Context (short-term): what’s inside the current conversation window
- Training (long-term): knowledge learned during training, not your personal chat
Some products add extra features like “memory”, summaries, profiles, or saved instructions. But that’s external state handled by the app, not the model magically learning your life.
How serious AI apps “cheat” context limits (the right way)
If an app wants the model to behave like it remembers a lot, it usually uses one (or more) of these patterns:
1) Summaries (a.k.a. compaction)
Older parts of the conversation are compressed into a short summary like:
- Project uses Postgres
- No OFFSET
- Batch size 500k
- Data quality logging is mandatory
So instead of 50 messages, the model keeps a small “rules + decisions” page on the desk.
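A minimal sketch of that compaction loop: when history exceeds the budget, fold the oldest messages into a summary. Here `summarize` is a stand-in — a real app would call the model itself to write the summary — and the cost function reuses the rough chars/4 estimate:

```python
def summarize(messages: list[str]) -> str:
    # Placeholder: a real app would ask the LLM to compress these.
    return "Summary of earlier discussion: " + "; ".join(m[:40] for m in messages)

def compact(history: list[str], budget: int) -> list[str]:
    """Fold the oldest messages into one summary until history fits the budget."""
    cost = lambda m: max(1, len(m) // 4)   # rough token estimate
    old = []
    while sum(cost(m) for m in history) > budget and len(history) > 1:
        old.append(history.pop(0))         # drop the oldest full message...
    if old:
        history.insert(0, summarize(old))  # ...but keep its gist on the desk
    return history
```

The recent messages stay verbatim; only the old ones are collapsed.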
2) Retrieval (RAG)
Instead of pasting your whole knowledge base every time, the system:
- searches your docs/code/logs
- picks the most relevant chunks
- injects only those chunks into the model’s context
- answers grounded on what it retrieved
This is how you scale from “chat toy” to “enterprise assistant that doesn’t drift.”
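The retrieval step can be sketched in a few lines. Real systems use embeddings and a vector index, but the shape is the same: score chunks against the question, keep only the top few, and build a grounded prompt. The word-overlap scoring below is a deliberately naive stand-in:

```python
def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Naive retrieval: rank chunks by word overlap with the question."""
    q = set(question.lower().split())
    return sorted(
        chunks,
        key=lambda c: len(q & set(c.lower().split())),
        reverse=True,
    )[:k]

docs = [
    "Batch size for the ETL load is 500k rows.",
    "The cafeteria menu rotates weekly.",
    "Pagination must use keyset, not OFFSET.",
]

relevant = top_chunks("what batch size does the ETL use", docs)
prompt = "Answer using only these notes:\n" + "\n".join(relevant)
```

Only the relevant chunks spend context budget; the rest of the knowledge base stays on disk.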
3) Tools (real work is tool-driven)
Good coding assistants don’t rely on memory. They:
- read files
- search the repo
- run commands
- query databases
- fetch logs
That makes answers far more reliable, because the model is anchored in fresh evidence instead of vague recollection.
Practical tricks that make AI way more reliable
These are boring, but they work.
1) Keep a “Pinned Facts” block
At the top of the conversation (or in a doc you paste repeatedly), keep something like:
Pinned Facts:
- Goal: migrate MSSQL → Postgres ETL
- Constraints: no OFFSET, prefer batching, verbose logs
- Schemas: staging, infradb
- Output: Python + SQL snippets, production-safe
When the thread gets long, paste the latest version again. You've just reloaded the desk with the rules that matter.
2) Put constraints before details
Bad order: dump data → mention constraints at the end
Good order: constraints → input → example → expected output
Models follow structure. If constraints come late, they're more likely to be violated.
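One way to enforce the good order is to never write prompts by hand in long threads, but assemble them from parts. The section labels below are just a convention, not any official format:

```python
def build_prompt(constraints: list[str], data: str, example: str, task: str) -> str:
    """Assemble a prompt with constraints first, task last."""
    return "\n\n".join([
        "CONSTRAINTS:\n" + "\n".join(f"- {c}" for c in constraints),
        "INPUT:\n" + data,
        "EXAMPLE OUTPUT:\n" + example,
        "TASK:\n" + task,
    ])

prompt = build_prompt(
    constraints=["No OFFSET pagination", "Keep transaction boundaries"],
    data="-- table DDL here --",
    example="SELECT ... WHERE id > :last_id LIMIT 500",
    task="Write the batched migration query.",
)
```

The function makes the ordering a habit instead of something you have to remember each time.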
3) Ask for assumptions, not guesses
One sentence that changes behavior:
“If something is unclear, list assumptions and ask questions instead of inventing details.”
4) Don’t paste huge dumps unless you must
Instead of 3000 log lines, paste:
- error lines
- ~20 lines around the error
- versions (driver, OS, DB)
- what changed last
If you truly need everything, use retrieval/search over the dump instead of stuffing it into the prompt.
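Trimming a dump to "error lines plus surrounding context" is easy to automate. A minimal sketch (the marker string and context size are just defaults you'd adjust):

```python
def trim_log(lines: list[str], needle: str = "ERROR", context: int = 10) -> list[str]:
    """Keep only lines containing `needle`, plus `context` lines around each hit."""
    keep = set()
    for i, line in enumerate(lines):
        if needle in line:
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return [lines[i] for i in sorted(keep)]
```

Run your 3,000-line dump through this before pasting, and you spend tokens only where the problem actually is.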
5) For refactors: go file-by-file
Even with large context models, this is safer:
- ask for a plan + file list
- generate diffs for 1–3 files
- apply + run tests
- repeat
This prevents drift and keeps decisions stable.
Why bigger context isn’t the whole story
A bigger context window helps, but two models with the same token limit can behave very differently:
- some are better at pulling the right detail from earlier text
- some follow strict constraints better
- some produce cleaner code with fewer subtle bugs
- some are better at summarizing and staying consistent over long sessions
So “more tokens” is not the same as “more reliable.” It’s just more workspace.
The takeaway
- Tokens are the units LLMs count.
- The context window is the maximum tokens the model can “see” at once.
- When important info falls out of context, the model doesn’t remember it — and it starts acting inconsistent.
- Real solutions are: pinned facts, summaries, retrieval, and tool-based workflows.