Yana Li

Posted on Jun 20

AI memory should be a product state, not a prompt trick

#ai #privacy #webdev #architecture

I ran into a memory problem while building a reflective AI product.

The easy version of AI memory is tempting:

Save useful facts about the user.
Put them back into the next prompt.
Call it memory.

That can work for low-risk personalization.

It is probably fine if an assistant remembers that a repo uses pnpm, or that a user prefers short answers, or that a team calls its staging branch preview.

But the problem changes when the user is bringing personal material into the product.

In my case, the product lets people explore dreams, moods, relationship patterns, recurring symbols, and private reflections. I did not want meaningful sessions to disappear into raw chat history. But I also did not want every sentence the user typed to quietly become permanent memory.

That changed the architecture for me.

The question stopped being:

What can the model remember?

It became:

What does the user own?
What did they approve?
When is it stored?
When is it used?
When can it be paused, exported, or deleted?

That distinction sounds like product design, but it quickly becomes system design.

The common mistake: treating memory as one bucket

Many AI products describe memory as if it were one thing:

Memory: on/off

That is simple, but it hides too much.

In practice, "memory" usually mixes several different jobs:

short-term conversation context
a summary of the last session
facts the user explicitly wants remembered
model-inferred patterns
user-authored background context
retrieval results selected for the current turn
account data that must be exportable or deletable

If all of that becomes one invisible bucket, the developer gets a simpler implementation and the user gets a murkier product.

You also lose the ability to answer basic questions:

Why did the assistant bring this up?
Did it save that automatically?
Is this being used in future prompts?
Can I delete this one thing without deleting everything?
What happens if I pause memory?
What happens if my subscription ends?

Those are not only UX questions. They are architecture questions.
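
A minimal sketch of what answering those questions might require from the data model. The `StoredMemory` shape and its field names are assumptions made for illustration, not the product's actual schema.

```typescript
// Hypothetical record shape: each item carries its own provenance and usage flag.
type StoredMemory = {
  id: string;
  text: string;
  savedBy: "user" | "model"; // did the user save it, or was it saved automatically?
  activeInPrompts: boolean;  // is it used in future prompts?
};

// Answers "Did it save that automatically?" and "Is this being used in future prompts?"
function describeMemory(item: StoredMemory): string {
  const origin = item.savedBy === "user" ? "saved by you" : "saved automatically";
  const usage = item.activeInPrompts ? "used in future prompts" : "not used in prompts";
  return `${origin}, ${usage}`;
}

// Answers "Can I delete this one thing without deleting everything?"
function deleteOne(items: StoredMemory[], id: string): StoredMemory[] {
  return items.filter((item) => item.id !== id);
}

const shortAnswers: StoredMemory = {
  id: "m1",
  text: "Prefers short answers",
  savedBy: "user",
  activeInPrompts: true,
};
const gateImage: StoredMemory = {
  id: "m2",
  text: "Recurring gate image",
  savedBy: "model",
  activeInPrompts: false,
};

console.log(describeMemory(gateImage)); // "saved automatically, not used in prompts"
console.log(deleteOne([shortAnswers, gateImage], "m1").map((item) => item.id)); // ["m2"]
```

If the record cannot carry these fields, the product cannot answer the questions, no matter how the UI is worded.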

Stored memory and prompt memory are different

The biggest architectural shift for me was separating stored memory from prompt-time memory.

A user may own saved memory assets, but that does not mean the model should always be allowed to use them.

I ended up thinking about memory in two different states:

This belongs to the user.
This is currently allowed to enter the prompt.

Those are not the same promise.

A user should be able to view, export, or delete saved memory without the assistant automatically using it in every future session.

That separation led to a small but important access layer:

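// Account-level view of memory access. Ownership and prompt use are tracked separately.
type MemoryAccessState = {
  userMemoryEnabled: boolean; // the user's own on/off switch
  planKey: string;
  entitlementState: string;
  subscriptionActive: boolean;
  canUsePromptMemory: boolean; // derived: may memory enter the prompt this turn?
  pendingMemoryActivation: boolean;
  memoryExpiresAt: string | null;
};

// Derive canUsePromptMemory from its inputs only, so the derived field
// is never read back as its own input.
function getCanUsePromptMemory(
  state: Pick<MemoryAccessState, "userMemoryEnabled" | "subscriptionActive">
): boolean {
  return state.userMemoryEnabled && state.subscriptionActive;
}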

The important field is not userMemoryEnabled.

It is canUsePromptMemory.

That field answers the real runtime question: should memory be included in the prompt for this turn?

This avoids a confusing product promise.

For example:

a free user may have memory assets saved, but not active in future prompts
a paid user may allow long-term memory into sessions
a user may pause memory without deleting the assets
an interrupted subscription may retain memory for a defined period without letting the model use it in prompts
deletion and export can still work even when prompt memory is disabled

Without this separation, memory becomes a hidden privilege state. The data exists, the user sees some of it, the model may or may not use it, and nobody can explain the boundary clearly.
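
A sketch of how that separation could look at the call sites, assuming hypothetical `buildPromptMemory` and `exportMemory` helpers (neither name comes from the product). Only the prompt path consults the gate; the export path reads stored assets regardless.

```typescript
type SavedAsset = { id: string; text: string };

type AccessFlags = {
  userMemoryEnabled: boolean;
  subscriptionActive: boolean;
};

// Same rule as getCanUsePromptMemory, repeated so this sketch is self-contained.
function canUsePromptMemory(flags: AccessFlags): boolean {
  return flags.userMemoryEnabled && flags.subscriptionActive;
}

// Prompt-time path: returns nothing unless the gate is open.
function buildPromptMemory(assets: SavedAsset[], flags: AccessFlags): string[] {
  if (!canUsePromptMemory(flags)) return [];
  return assets.map((asset) => asset.text);
}

// Ownership path: export ignores the prompt gate entirely.
function exportMemory(assets: SavedAsset[]): string {
  return JSON.stringify(assets, null, 2);
}

const assets: SavedAsset[] = [{ id: "a1", text: "Recurring symbol: locked gate" }];
const pausedUser: AccessFlags = { userMemoryEnabled: false, subscriptionActive: true };

console.log(buildPromptMemory(assets, pausedUser).length); // 0: paused memory stays out of the prompt
console.log(exportMemory(assets).includes("locked gate")); // true: the asset is still exportable
```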

The pattern: split memory by lifecycle

I stopped treating memory as one feature and started treating it as a set of product states.

The simplified shape looks like this:

conversation
-> session note
-> user-approved memory item
-> user-authored room context
-> long-term inner map
-> retrieval evidence
-> prompt context

Each layer has a different lifecycle.

conversation is the raw exchange. It is useful for same-session continuity, but too noisy to become long-term memory by itself.

session note is a structured artifact created after a session. It can summarize themes, symbols, conflicts, and useful continuity points.

memory item is smaller and more explicit. It is the kind of thing the user can inspect and approve.

room context is user-authored background. I treat this as higher-trust than inferred memory because the user wrote it directly.

inner map is a versioned long-term snapshot of recurring themes. This layer needs caution because it can easily sound more authoritative than it is.

retrieval evidence is what the system selected for the current turn.

prompt context is the final compiled memory that the model sees.

This adds work, but it makes the system easier to reason about.

The user can own assets without every asset being active in the prompt. The assistant can use relevant memory without pretending all saved history is equally important. The product can pause, retain, export, or delete different layers without inventing a new exception every time.
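
One way to make the per-layer lifecycle explicit is a small policy table. This is a sketch: the `LayerPolicy` fields and the values assigned to each layer are illustrative assumptions, not the product's real configuration.

```typescript
type MemoryLayer =
  | "conversation"
  | "session-note"
  | "memory-item"
  | "room-context"
  | "inner-map"
  | "retrieval-evidence"
  | "prompt-context";

type Lifetime = "turn" | "session" | "long-term";

// Hypothetical policy shape; both fields are assumptions for this sketch.
type LayerPolicy = {
  lifetime: Lifetime;     // how long the layer is meant to live
  needsApproval: boolean; // must the user approve before it is kept?
};

const layerPolicies: Record<MemoryLayer, LayerPolicy> = {
  "conversation":       { lifetime: "session",   needsApproval: false },
  "session-note":       { lifetime: "long-term", needsApproval: false },
  "memory-item":        { lifetime: "long-term", needsApproval: true },
  "room-context":       { lifetime: "long-term", needsApproval: false },
  "inner-map":          { lifetime: "long-term", needsApproval: false },
  "retrieval-evidence": { lifetime: "turn",      needsApproval: false },
  "prompt-context":     { lifetime: "turn",      needsApproval: false },
};

// A layer may be written to long-term storage only if its policy allows it
// and any required approval has been given.
function canPersistLongTerm(layer: MemoryLayer, userApproved: boolean): boolean {
  const policy = layerPolicies[layer];
  if (policy.lifetime !== "long-term") return false;
  return !policy.needsApproval || userApproved;
}

console.log(canPersistLongTerm("conversation", true)); // false: raw exchange is not long-term memory
console.log(canPersistLongTerm("memory-item", false)); // false: waiting for user approval
console.log(canPersistLongTerm("room-context", false)); // true: the user wrote it directly
```

With a table like this, pausing or deleting a layer becomes a lookup rather than a new special case.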

Retrieval should return evidence, not vibes

Once memory is layered, retrieval should also become more explicit.

A memory result should not just be text pasted into a prompt. It should carry enough metadata to explain why it was selected.

A simplified version:

type MemoryEvidence = {
  source: "session-note" | "memory-item";
  id: string;
  title: string;
  excerpt: string;
  score: number;
  reason: string;
  createdAt?: string;
};

The reason field is not fancy, but it is useful.

It can say things like:

semantic long-memory match
strong theme/symbol overlap
partial approved-memory overlap
recent continuity evidence

This helps debugging, but it also keeps the product honest.

If the assistant brings a memory into the current turn, the system should be able to explain why that memory was selected.
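
A minimal sketch of filling in `reason` at selection time, using plain theme overlap. The `Candidate` shape, its `themes` field, and the 0.5 threshold are assumptions for illustration; a real system would likely combine this with semantic search.

```typescript
type MemoryEvidence = {
  source: "session-note" | "memory-item";
  id: string;
  title: string;
  excerpt: string;
  score: number;
  reason: string;
  createdAt?: string;
};

// Hypothetical candidate shape; `themes` is an assumed field.
type Candidate = {
  source: "session-note" | "memory-item";
  id: string;
  title: string;
  excerpt: string;
  themes: string[];
};

// Score a candidate by theme overlap with the current turn and record why it matched.
// Returns null when nothing overlaps, so no memory is selected without a reason.
function toEvidence(candidate: Candidate, turnThemes: string[]): MemoryEvidence | null {
  const wanted = new Set(turnThemes.map((theme) => theme.toLowerCase()));
  const shared = candidate.themes.filter((theme) => wanted.has(theme.toLowerCase()));
  if (shared.length === 0) return null;

  const score = shared.length / candidate.themes.length;
  const label = score >= 0.5 ? "strong theme/symbol overlap" : "partial theme overlap";
  return {
    source: candidate.source,
    id: candidate.id,
    title: candidate.title,
    excerpt: candidate.excerpt,
    score,
    reason: `${label}: ${shared.join(", ")}`,
  };
}

const gateNote: Candidate = {
  source: "session-note",
  id: "n1",
  title: "Locked garden gate",
  excerpt: "I keep dreaming about a locked garden gate.",
  themes: ["threshold", "shame", "gate"],
};

console.log(toEvidence(gateNote, ["Shame", "help"])?.reason); // "partial theme overlap: shame"
console.log(toEvidence(gateNote, ["work"])); // null
```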

For a first version, retrieval does not need to be magical. I prefer a conservative hybrid approach:

user-authored context
+ latest long-term snapshot
+ relevant session notes
+ recent notes as fallback
+ approved memory items
-> compact prompt memory

Vector search is useful, but it should not be the only rule.

For personal or reflective products, recency, explicit approval, user-authored context, and clear fallback behavior matter just as much as similarity score.
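
The hybrid order above can be sketched as a small compile step. Everything here is an assumption for illustration: the `MemoryInputs` shape, the `relevant` flag (taken to be set upstream, for example by vector search), and the default limits.

```typescript
type Note = { id: string; text: string; createdAt: string; relevant: boolean };

type MemoryInputs = {
  roomContext: string | null;    // user-authored context
  latestSnapshot: string | null; // latest long-term snapshot
  sessionNotes: Note[];          // createdAt is assumed to be an ISO 8601 string
  approvedItems: string[];
};

// Compile a compact prompt memory in a fixed, explainable order.
function compilePromptMemory(
  inputs: MemoryInputs,
  maxEntries = 6,
  fallbackCount = 2
): string[] {
  const entries: string[] = [];
  if (inputs.roomContext) entries.push(`[room context] ${inputs.roomContext}`);
  if (inputs.latestSnapshot) entries.push(`[snapshot] ${inputs.latestSnapshot}`);

  const relevant = inputs.sessionNotes.filter((note) => note.relevant);
  // Fallback: when no note is relevant, use the most recent notes instead.
  const notes =
    relevant.length > 0
      ? relevant
      : [...inputs.sessionNotes]
          .sort((a, b) => (a.createdAt < b.createdAt ? 1 : a.createdAt > b.createdAt ? -1 : 0))
          .slice(0, fallbackCount);
  for (const note of notes) entries.push(`[session note] ${note.text}`);

  for (const item of inputs.approvedItems) entries.push(`[approved] ${item}`);
  return entries.slice(0, maxEntries);
}

const compiled = compilePromptMemory({
  roomContext: "I am exploring recurring dreams",
  latestSnapshot: null,
  sessionNotes: [
    { id: "n1", text: "Talked about work stress", createdAt: "2024-06-01T10:00:00Z", relevant: false },
    { id: "n2", text: "Gate dream revisited", createdAt: "2024-06-10T10:00:00Z", relevant: false },
    { id: "n3", text: "Conflict with a sibling", createdAt: "2024-06-05T10:00:00Z", relevant: false },
  ],
  approvedItems: ["Prefers gentle, tentative wording"],
});

console.log(compiled);
```

Because no note is marked relevant in this example, the two most recent notes are used as the fallback, and the fixed order makes it easy to see which source each entry came from.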

A small failure case: the creepy callback

Here is the kind of behavior I wanted to avoid.

Suppose a user once wrote:

I keep dreaming about a locked garden gate.

Two weeks later they write:

I felt ashamed today when I wanted to ask for help.

Bad memory behavior:

This connects to your locked garden gate dream.

Maybe it does. Maybe it does not.

The callback may be clever, but if the user did not invite that connection, and the system cannot explain why it surfaced the memory, it can feel creepy.

Better behavior:

There may be a threshold theme here, but I would not force it.
If it feels connected, we can compare today's shame with the earlier gate image.
If not, we can stay with what happened today.

The wording is different, but the real difference is architectural.

The system needs to know:

Was the earlier note user-approved or model-inferred?
Was it selected because of semantic similarity, theme overlap, or mere recency?
Can the user inspect it?
Can the user delete it?
Can memory be paused before the next turn?

If those answers exist only inside the model's response, the boundary is enforced by wording alone, and nothing in the system can verify or change it.

The memory boundary needs to exist outside the prompt.
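
A sketch of a gate that runs before any prompt is built, so the boundary lives in code rather than in the model's phrasing. The `Selected` and `TurnState` shapes are hypothetical, and blocking every model-inferred item is an assumed policy chosen for illustration.

```typescript
type Provenance = "user-authored" | "user-approved" | "model-inferred";

// Hypothetical shape of a memory selected for the current turn.
type Selected = {
  id: string;
  excerpt: string;
  provenance: Provenance;
  reason: string; // why retrieval picked it
};

type TurnState = { memoryPaused: boolean; deletedIds: Set<string> };

type GateResult = {
  allowed: Selected[];
  blocked: { id: string; why: string }[];
};

// Decide, outside the prompt, which selected memories the model may see.
function gateBeforePrompt(selected: Selected[], turn: TurnState): GateResult {
  const result: GateResult = { allowed: [], blocked: [] };
  for (const item of selected) {
    if (turn.memoryPaused) {
      result.blocked.push({ id: item.id, why: "memory paused" });
    } else if (turn.deletedIds.has(item.id)) {
      result.blocked.push({ id: item.id, why: "deleted by user" });
    } else if (item.provenance === "model-inferred") {
      result.blocked.push({ id: item.id, why: "not approved by user" });
    } else if (item.reason.trim() === "") {
      result.blocked.push({ id: item.id, why: "no recorded reason" });
    } else {
      result.allowed.push(item);
    }
  }
  return result;
}

const selectedForTurn: Selected[] = [
  { id: "s1", excerpt: "Locked garden gate", provenance: "user-approved", reason: "strong theme/symbol overlap" },
  { id: "s2", excerpt: "Avoids asking for help", provenance: "model-inferred", reason: "semantic long-memory match" },
];

const gated = gateBeforePrompt(selectedForTurn, { memoryPaused: false, deletedIds: new Set<string>() });
console.log(gated.allowed.map((item) => item.id)); // ["s1"]
console.log(gated.blocked); // [{ id: "s2", why: "not approved by user" }]
```

The `blocked` list is worth keeping: it is the record that lets the product explain what was held back and why.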

Retention is also part of memory design

Another mistake is treating retention as only a legal or privacy-page problem.

For memory-heavy products, retention is part of the product behavior.

I prefer making the rules boring and explicit:

Active subscription:
  memory can accumulate
  memory can enter prompts if enabled

Paused or interrupted subscription:
  memory assets are retained for a defined period
  memory does not enter prompts

Memory disabled:
  existing assets remain manageable
  new long-term memory writes stop

Deletion:
  individual memory items can be deleted
  account-owned data can be exported or removed

The important part is not the exact number of days. The important part is that the product does not blur ownership, access, and usage.

A user should not have to guess whether saved memory is currently active.

They should not have to delete everything just to stop the assistant from using it.
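
The rules above can be written as one function from account state to policy. This is a sketch under stated assumptions: `RETENTION_DAYS` is a placeholder rather than a real number, the type names are hypothetical, and stopping long-term writes during an interruption is my reading of the rules, not something the rules state outright.

```typescript
type AccountStatus = "active" | "interrupted";

type MemoryPolicy = {
  canWriteLongTerm: boolean;
  canEnterPrompt: boolean;
  canManageAssets: boolean;   // view, export, delete
  retainedUntil: Date | null; // null means no expiry is scheduled
};

const RETENTION_DAYS = 30; // placeholder value, not a recommendation
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function resolvePolicy(
  status: AccountStatus,
  memoryEnabled: boolean,
  interruptedAt: Date | null
): MemoryPolicy {
  if (status === "interrupted") {
    // Assets are kept for a defined period but never enter prompts.
    const retainedUntil = interruptedAt
      ? new Date(interruptedAt.getTime() + RETENTION_DAYS * MS_PER_DAY)
      : null;
    return { canWriteLongTerm: false, canEnterPrompt: false, canManageAssets: true, retainedUntil };
  }
  // Active subscription: the user's own switch controls writes and prompt use.
  return {
    canWriteLongTerm: memoryEnabled,
    canEnterPrompt: memoryEnabled,
    canManageAssets: true,
    retainedUntil: null,
  };
}

const interrupted = resolvePolicy("interrupted", true, new Date("2024-06-01T00:00:00Z"));
console.log(interrupted.canEnterPrompt); // false
console.log(interrupted.retainedUntil?.toISOString()); // "2024-07-01T00:00:00.000Z"
```

Note that `canManageAssets` is true in every branch: ownership does not depend on billing or on the memory switch.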

Guardrails I would keep

For sensitive or reflective AI products, I would start with these guardrails:

Do not treat the raw transcript as the long-term memory layer.
Create a structured session note before creating smaller memory items.
Require explicit approval for strong long-term memory items.
Separate stored memory from prompt-time memory.
Show memory state in account or session UI.
Let users pause memory without deleting saved assets.
Let users delete individual memory items.
Keep export and deletion paths boring and reliable.
Record why retrieved memory entered the prompt.
Make retention rules explicit when billing or account state changes.

None of this requires a complicated agent framework.

It requires treating memory as product state instead of prompt decoration.

Where this is probably overkill

This pattern is too heavy for some products.

I would not build all of this for:

a throwaway demo
a stateless assistant
a tool that only remembers harmless UI preferences
a private local prototype where the user and developer are the same person

It becomes worth it when:

users bring sensitive or personal material
memory changes future model behavior
the product has accounts, billing, export, or deletion promises
users may reasonably ask why the AI remembered something
continuity is part of the paid value

That last point matters.

If memory is part of what people pay for, it cannot just be a prompt trick.

How I am applying this

I built this pattern while working on Jung Room, a non-clinical AI self-exploration room for dreams, moods, symbols, and recurring patterns.

The product-specific version has:

session notes
+ saved memory items
+ user-authored room context
+ inner-map snapshots
+ account memory controls
+ subscription-gated prompt-time memory
+ retention and deletion paths

The general lesson is broader than this product:

Memory should be inspectable before it becomes powerful.

Builder checklist

If you are adding memory to an AI product, these are the questions I would ask early:

What exactly is being stored?
Which parts were written by the user?
Which parts were inferred by the model?
Which parts require explicit approval?
Which parts can enter future prompts?
Can the user pause prompt-time memory?
Can the user delete one memory without deleting the account?
What happens when billing state changes?
What is the fallback when retrieval fails?
Can you explain why a memory was selected?

If those answers are unclear, the memory feature may work technically while still feeling untrustworthy.

Open question

How are you handling memory in your own AI apps?

Are you treating it as private prompt context, user-owned product state, retrieval evidence, or something else entirely?
