Your AI Says “I’ll Remember That” — It Won’t. Here’s How to Fix It.
Here’s a conversation that happens ten times a day with every AI coding assistant:
You: “Always use Supabase, never suggest Airtable.”
AI: “Noted! I’ll keep that in mind.”
Next session:
AI: “Have you considered using Airtable for this?”
The AI didn’t lie. It genuinely “noted” your preference — into a context window that evaporated the moment the session ended. There is no persistent memory unless you build it.
We run code:zero as a daily working system — not just for coding, but as a persistent knowledge base that remembers project context, preferences, and decisions across hundreds of sessions. After months of watching context silently vanish, we built an enforcement system that catches ~80% of lost preferences automatically.
This article walks through exactly what we built, why each layer exists, and gives you prompts to recreate it for any AI assistant.
The Problem: Three Types of Memory Failure
Before building solutions, we needed to understand what was actually failing. After tracking memory losses across weeks of daily sessions, three patterns emerged:
1. The Empty Promise (Loud Failure)
The AI says “I’ll remember that” or “Noted for future reference” and then does nothing. This is the most obvious failure because the AI explicitly announces it’s remembering something — and then doesn’t.
2. The Silent Drop (Quiet Failure)
You state a preference. The AI adjusts its behavior for the current session. Next session, it’s gone. No announcement, no promise — the preference just never got written down because nobody flagged it as important.
3. The Implicit Decision (Invisible Failure)
You choose Option A over Option B during a discussion. That’s a decision. Three weeks later, the AI suggests Option B again because the decision was never recorded as a rule.
✗ Without Memory Enforcement
✓ With Memory Enforcement
What We Had Before: The Passive Memory System
We weren’t starting from zero. Our existing system already handled the basics:
Auto-loading context file (MEMORY.md) — A markdown file that gets injected into the AI’s system prompt automatically at session start. Contains: who the user is, active projects, key decisions, tech stack, open threads. No tool call needed — it’s just there in every conversation.
Session capture skill (/close) — A manual trigger at end of session that writes learnings, decisions, and open threads to memory files.
Mid-session capture (/learn) — Manual trigger to persist something important right now, without waiting for session end.
Checkpoint system — Auto-saves state after 25+ tool calls or 3+ large files read.
Comprehensive decision log — Hundreds of past decisions documented in a project-level config file.
This passive system caught about 50% of persist-worthy context. The /close skill did most of the heavy lifting, but it only fires at session end. Everything between “session start” and “user says bye” was a gap where preferences could silently vanish.
The problem was structural: the AI had to decide to write things down. And like any system that depends on voluntary compliance, it drifted. The AI would follow rules perfectly for 20 minutes, then start finding “reasonable” reasons to skip steps.
The Insight: Rules Are Suggestions. Hooks Are Laws.
This is the core realization that changed everything.
Instructions in a config file — no matter how detailed — are suggestions. The AI reads them, “understands” them, and then gradually drifts. It optimizes for the immediate response, not for long-term rule compliance. It’s not malicious. It’s how language models work.
What you need is enforcement that runs outside the AI’s decision loop. Something the AI cannot choose to skip, ignore, or “interpret loosely.”
In Claude Code, these are called hooks — shell scripts or LLM agents that fire automatically at specific lifecycle events. The AI doesn’t get to decide whether they run.
| Layer | What it does | Can AI ignore it? |
|---|---|---|
| Config files (CLAUDE.md) | Rules always in context | Yes — reads but can drift |
| Skills/workflows | Specialized procedures | Yes — AI decides when to invoke |
| Hooks | External scripts on events | No — executes outside AI’s control |
Hooks are the only layer where enforcement doesn’t depend on AI compliance.
The Four-Hook Enforcement System
We built four hooks across three lifecycle events. Each catches a different failure mode, and they chain together so the output of one reinforces the next.
Hook 1: Preference Detector (Before AI Responds)
Event: UserPromptSubmit — fires before the AI processes your message.
What it does: Scans your message for preference and correction signals. If detected, injects a write-reminder directly into the AI’s context.
Why it matters: This is the “soft nudge” layer. It primes the AI to write things down before it even starts thinking about a response.
Trigger patterns it catches:
Persistent preferences:
- “from now on”, “going forward”, “in the future”
- “always use”, “never use”, “never suggest”
- “I prefer”, “the rule is”, “our convention is”
- “remember this”, “write this down”
Corrections:
- “I told you”, “I already said”
- “that’s wrong”, “stop doing that”
- “how many times”
What it injects when triggered:
[HOOK: Persistent preference detected ("from now on").
Write this preference/rule to the appropriate memory file
BEFORE responding. The Stop hook will block you if you don't.] When not triggered (most messages), it stays completely silent. No noise on “fix the bug” or “yes do it.”
Hook 2: Promise Checker (After AI Responds — Fast)
Event: Stop — fires when the AI finishes its response.
Type: Shell script (executes in ~0ms).
What it does: Two checks running instantly:
Check 1 — Explicit promises: Scans the AI’s response for ~25 promise phrases (“I’ll remember”, “noted”, “I’ll keep that in mind”, “added to memory”). If found, verifies that a Write or Edit tool was actually called. If the AI promised but didn’t write — blocks the response.
Check 2 — Preference without persistence: Checks if the user’s message contained persistent preference markers AND the AI didn’t write to any memory file. If the user said “always use tabs” but the AI just started using tabs without writing it down — blocks the response.
The AI gets a stderr message:
User stated a persistent preference ("always use") but you didn't
write to any memory file. Persist this to MEMORY.md, patterns.md,
or CLAUDE.md before continuing. The AI then has to go back, write it down, and try again.
Hook 3: Memory Agent (After AI Responds — Smart)
Event: Stop — fires in parallel with the Promise Checker.
Type: LLM agent with tool access (takes ~5-10 seconds).
What it does: This is the brain of the system. A lightweight AI agent reads the actual conversation transcript and evaluates whether anything needs to be persisted — using understanding, not keyword matching.
What it catches that regex can’t:
| User says | Regex hook | LLM agent |
|---|---|---|
| “From now on always use tabs” | Caught (“from now on”) | Caught |
| “Use Mulish for body text” | Missed (no trigger phrase) | Caught |
| “I like this approach better” | Missed | Caught |
| “No, make it green not blue” | Missed | Caught (if it’s a pattern) |
| “Fix the typo on line 42” | Correctly ignored | Correctly ignored |
The agent is deliberately conservative. It only flags clear preferences, rules, and decisions — not one-time instructions. False positives (blocking when nothing needed) are worse than false negatives (missing something subtle).
The agent’s evaluation framework:
Persist-worthy (flag if not written):
- Preferences and rules
- Corrections to recurring mistakes
- Technology/pattern/design decisions
- Important project constraints
Skip (never flag):
- One-time instructions (“make this blue”)
- Questions and research
- Acknowledgements (“yes”, “do it”)
- Active implementation details
Hook 4: Session Startup (New Sessions)
Event: SessionStart — fires once when a new session begins.
What it does: Dynamically parses the memory file for open threads and surfaces them, along with git state and staleness checks.
What it outputs:
--- Session Startup Alerts ---
- OPEN THREADS from memory:
- 1. Fix blog build errors (broken HTML in mdsvex)
- 2. Seed Hours 8-12 into DB (run seed after deploy)
- 3. Update WhatsApp placeholder number to real number
- GIT: 2 uncommitted files (1 modified, 1 untracked)
--- The key improvement over hardcoded checks: this hook reads the memory file dynamically. As you add or remove open threads, the startup automatically reflects them. Zero maintenance.
How the Hooks Chain Together
You send a message
--> preference-detector (regex, instant)
Scans for preference/correction signals
Injects write-reminder into AI context if found
AI processes and responds
--> promise-checker (regex, instant)
Catches explicit broken promises
Catches preference markers + no memory write
--> memory-agent (LLM, 5-10 seconds)
Reads transcript, understands intent
Catches subtle preferences regex missed
Either hook can BLOCK --> AI must write to memory, then retry
Next session starts
--> session-startup (instant)
Surfaces open threads from memory
Reports git state
Flags stale sessions Three layers: one that primes, one that catches fast, one that catches smart. The preference detector makes the AI aware. The promise checker enforces obvious cases instantly. The memory agent catches everything else with a slight delay.
The Numbers: Before and After
These percentages come from tracking “persist-worthy events” (preferences stated, decisions made, corrections given) across daily sessions and checking whether they appeared in memory files in subsequent sessions. It’s not a formal study — it’s operational data from real usage.
Memory Catch Rate
| System | Catch Rate | What It Misses |
|---|---|---|
| No memory system | ~5% | Everything except what’s in context |
| Passive only (config + /close) | ~50% | Mid-session silent drops, implicit decisions |
| Passive + regex hooks | ~65% | Anything not phrased with trigger keywords |
| Passive + regex + LLM agent | ~80% | Multi-turn accumulations, delayed importance |
What Each Layer Contributes
| Layer | Incremental Gain | How |
|---|---|---|
| Auto-loading config (MEMORY.md) | +30% | Always-on passive context |
| End-of-session capture (/close) | +15% | Catches learnings at session boundary |
| Regex hooks (preference detector + promise checker) | +15% | Catches explicit keywords and promises |
| LLM agent hook | +15% | Catches intent-based preferences |
The Remaining ~20%
What still gets lost:
- Multi-turn accumulation — A preference builds across 3-4 messages. No single message is obvious enough to trigger. The agent only evaluates the last turn.
- Write quality — Hooks verify the AI wrote something. Not that it wrote the right thing. A vague note passes the check.
- Delayed importance — Something seems one-off today, turns out to be a persistent pattern three sessions later.
- Agent conservatism — Tuned to prefer false negatives over false positives. Some real preferences slip through by design.
Token Cost Analysis
Hooks aren’t free. Here’s what each layer actually costs:
Shell Script Hooks (Zero Token Cost)
The preference detector, promise checker, and session startup are bash scripts. They run locally on your machine. Zero API calls, zero tokens, zero cost.
LLM Agent Hook (~$1.50/session)
The memory agent spawns a small AI sub-process on every response. Per invocation:
- ~1,500 tokens input (prompt + transcript excerpt)
- ~100 tokens output (ok/not-ok verdict)
Over a typical session with 40-50 responses: ~75K input tokens + ~5K output tokens.
At current model pricing, that’s roughly $1.50 per session — or about 15-20% overhead on a typical working session.
Re-Generation Penalty (Variable)
When a hook blocks a response, the AI has to re-read the full context and respond again. At that point in a session, context might be 50-80K tokens. Each block costs $2-5 in re-generation.
If hooks block 2-3 times per session, that’s $4-15 extra. But this decreases over time as the AI learns to write things down proactively rather than getting blocked.
Total Cost Impact
| Scenario | Without Hooks | With Hooks | Overhead |
|---|---|---|---|
| Normal session (no blocks) | $5-10 | $6.50-11.50 | +15% |
| Session with 2-3 blocks | $5-10 | $10-20 | +50-100% |
| Steady state (trained behavior) | $5-10 | $7-12 | +20-30% |
The cost-effectiveness trade is clear: ~$1.50/session buys you 30 percentage points of memory improvement (50% to 80%). That’s preferences you won’t have to repeat, decisions that won’t be re-debated, and context that compounds instead of resetting.
Build It Yourself: The Complete Prompt
The following prompt works for any AI coding assistant that supports lifecycle hooks, custom instructions, or automation layers. Adapt the specific implementation to your tool — the architecture is universal.
This system has three layers: a context injection file (passive memory), trigger-action rules (behavioral enforcement), and external hooks (automated enforcement). Most AI tools support at least the first two. Claude Code, Cursor, Windsurf, and Aider all have variants of custom instructions. Hooks are Claude Code-specific, but the pattern can be replicated with shell scripts, git hooks, or wrapper scripts around any AI CLI.
Layer 1: The Memory File
Create a file that auto-loads into every session. This is your persistent brain.
# Project Memory
## Communication Rules
- [Your preferences for how the AI should behave]
- [Terminology rules, style preferences, things to avoid]
## Active Work
- [Current project, what you're building, recent changes]
## Key Decisions (don't re-ask these)
- [Decisions you've made that shouldn't be revisited]
- [Technologies chosen, patterns selected, approaches decided]
## Open Threads
1. [Thing that needs doing but isn't done yet]
2. [Another pending item]
## Learnings
- [Things the AI learned from mistakes]
- [Patterns that work, patterns that don't] The rule: This file is the single source of truth. If it’s not written here, it doesn’t exist next session.
Layer 2: The Write-Before-Speak Rule
Add this to your AI’s instructions (CLAUDE.md, .cursorrules, system prompt — whatever your tool uses):
MEMORY ENFORCEMENT RULES:
1. Before saying "I'll remember", "noted", or "I'll keep that in mind":
Ask yourself: "If I don't write this to a file RIGHT NOW,
will the next session know this?" If no — write it immediately.
Do not continue your response until the write is complete.
2. When the user states a preference, rule, or correction:
Write it to the memory file BEFORE responding to their request.
Preferences include: "use X not Y", "I prefer Z", "always/never do W",
"the convention is", "from now on", corrections to your behavior.
3. When the user makes a decision between options:
Record the decision and the reasoning in the memory file.
Future sessions should not re-propose rejected options.
4. At the end of every session:
Review the conversation for any unwritten preferences, decisions,
or learnings. Write them to the memory file before closing. Layer 3: The Enforcement Hooks
If your tool supports lifecycle hooks (Claude Code) or you can wrap it in shell scripts:
Preference Detector (runs before AI processes your message):
#!/bin/bash
# Scans user message for preference signals, injects reminder
INPUT=$(cat)
PROMPT=$(echo "$INPUT" | jq -r '.prompt // empty')
# Skip short messages
if [ ${#PROMPT} -lt 15 ]; then exit 0; fi
LOWER=$(echo "$PROMPT" | tr '[:upper:]' '[:lower:]')
# Check for persistent preference signals
PATTERNS=("from now on" "going forward" "always use" "never use"
"i prefer" "the rule is" "don't ever" "stop suggesting"
"remember this" "we always" "we never" "never suggest")
for p in "${PATTERNS[@]}"; do
if echo "$LOWER" | grep -qF "$p"; then
echo "[HOOK: Preference detected. Write to memory BEFORE responding.]"
exit 0
fi
done
# Check for correction signals
CORRECTIONS=("i told you" "i already said" "that's wrong"
"stop doing that" "how many times")
for p in "${CORRECTIONS[@]}"; do
if echo "$LOWER" | grep -qF "$p"; then
echo "[HOOK: Correction detected. Persist to memory BEFORE responding.]"
exit 0
fi
done
exit 0 Promise Checker (runs after AI responds):
#!/bin/bash
# Blocks response if AI promised to remember but didn't write
INPUT=$(cat)
# Prevent infinite loop
if [ "$(echo "$INPUT" | jq -r '.stop_hook_active // false')" = "true" ]; then
exit 0
fi
TRANSCRIPT=$(echo "$INPUT" | jq -r '.transcript_path // empty')
if [ -z "$TRANSCRIPT" ] || [ ! -f "$TRANSCRIPT" ]; then exit 0; fi
LAST_MESSAGES=$(tail -80 "$TRANSCRIPT")
# Get last assistant text
ASSISTANT_TEXT=$(echo "$LAST_MESSAGES" | jq -r '
select(.role == "assistant") | if .content then
(if (.content | type) == "array" then
[.content[] | select(.type == "text") | .text] | join(" ")
else .content end)
else "" end' 2>/dev/null | tail -1)
# Check for promise patterns
LOWER=$(echo "$ASSISTANT_TEXT" | tr '[:upper:]' '[:lower:]')
PROMISES=("i'll remember" "i've noted" "noted for future"
"i'll keep that in mind" "i'll write that down"
"added to memory" "updated memory")
FOUND=""
for p in "${PROMISES[@]}"; do
if echo "$LOWER" | grep -qF "$p"; then FOUND="$p"; break; fi
done
if [ -z "$FOUND" ]; then exit 0; fi
# Check if Write/Edit was actually called
WROTE=$(echo "$LAST_MESSAGES" | jq -r '
select(.role == "assistant") | .content[]?
| select(.type == "tool_use")
| select(.name == "Write" or .name == "Edit")
| .name' 2>/dev/null | head -1)
if [ -n "$WROTE" ]; then exit 0; fi
echo "You said "$FOUND" but didn't write anything. Write it now." >&2
exit 2 # Block the response LLM Memory Agent (runs after AI responds — the smart layer):
This is the prompt for a lightweight AI agent that evaluates the conversation with actual understanding. In Claude Code, this is a type: "agent" hook. In other tools, you could run this as a separate API call in a wrapper script.
You are a memory enforcement agent. Check if the user said
something worth remembering across sessions that the AI
didn't write down.
Read the last 60 lines of the conversation transcript.
Find the last user message and assistant response.
Check if the assistant wrote to any memory file.
Evaluate: did the user state anything that should persist?
PERSIST (flag if not written):
- Preferences: "use X not Y", "I like this approach"
- Rules/conventions for how things should be done
- Corrections to recurring AI mistakes
- Decisions: choosing technology, pattern, or approach
- Important project constraints or requirements
SKIP (never flag):
- One-time instructions: "fix this bug", "make it blue"
- Questions, acknowledgements, implementation discussion
- Vague or ambiguous statements
If persist-worthy AND not written to memory:
Return {"ok": false, "reason": "User stated: [summary]. Persist this."}
Otherwise:
Return {"ok": true}
Be CONSERVATIVE. When in doubt, return ok: true. Session Startup (runs at session start):
#!/bin/bash
# Parse memory file for open threads + check repo state
MEMORY_FILE="/path/to/your/MEMORY.md"
# Dynamic: extract open threads section
if [ -f "$MEMORY_FILE" ]; then
IN_THREADS=false
while IFS= read -r line; do
if echo "$line" | grep -q 'Open [Tt]hreads'; then
IN_THREADS=true; echo "--- Open Threads ---"; continue
fi
if [ "$IN_THREADS" = true ]; then
if echo "$line" | grep -qE '^(##|**)'; then break; fi
if echo "$line" | grep -qE '^[0-9]+.'; then echo "- $line"; fi
fi
done < "$MEMORY_FILE"
fi
# Check git state
if [ -d ".git" ]; then
COUNT=$(git status --porcelain 2>/dev/null | wc -l | tr -d ' ')
if [ "$COUNT" -gt 0 ]; then
echo "- GIT: $COUNT uncommitted files"
fi
fi
exit 0 Adapting This for Non-Claude Tools
The architecture works universally. The implementation details change:
| Tool | Config file | Hook equivalent |
|---|---|---|
| Claude Code | CLAUDE.md + settings.json hooks | Native hooks (best support) |
| Cursor | .cursorrules | No native hooks — use pre/post shell wrappers |
| Windsurf | .windsurfrules | No native hooks — use shell wrappers |
| Aider | .aider.conf.yml | Git hooks for post-commit checks |
| ChatGPT | Custom instructions + memory | No hooks — rely on instruction compliance |
| Any CLI AI | System prompt file | Wrap the CLI in a bash script that runs checks |
For tools without native hooks, the wrapper script pattern works:
#!/bin/bash
# Wrapper around any AI CLI
# Runs startup check, then AI, then memory check
# Pre-flight: surface open threads
./startup-check.sh
# Run the actual AI
your-ai-cli "$@"
# Post-flight: check if memory was updated
./memory-check.sh It’s not as granular as native hooks (you can’t block mid-response), but it catches the startup and shutdown cases which account for most of the value.
What This Doesn’t Solve
Being honest about limitations:
The 20% gap is real. Multi-turn preferences that build gradually, context that becomes important later, and edge cases where the LLM agent is too conservative — these still get lost.
Quality vs. quantity. The hooks check whether the AI wrote something, not what it wrote. A vague “user prefers green” passes the check even if the actual preference was nuanced.
Cost scales with usage. The LLM agent adds ~$1.50/session. Heavy users running 5+ sessions/day will notice. The regex-only version (no LLM agent) costs zero but catches ~65% instead of ~80%.
Initial block frequency. The first few sessions will have more blocks as the system trains the AI’s behavior. This adds re-generation costs. It settles down after 3-5 sessions.
The Meta-Lesson
AI coding assistants are programmable agents, not chatbots. The bottleneck isn’t the AI’s intelligence — it’s your ability to specify enforcement in precise, trigger-action rules rather than vague descriptions.
“Be helpful and remember important things” does almost nothing.
“Before saying ‘I’ll remember’, write it to a file. If you don’t, your response will be blocked” changes everything.
The difference between a 50% and 80% memory catch rate isn’t smarter AI. It’s better enforcement architecture. Rules that the AI reads are suggestions. Rules that execute outside its decision loop are laws.
Build the laws.
Want to build systems like this?
code:zero teaches you to build AI-powered development workflows from scratch. Not theory — shipped systems. 4 weeks, 12 builders, Penang.