Save flow
When a memory is saved via POST /api/memories or save_memory, MemContext runs a multi-step pipeline:
- Expand — the content is rewritten into a more searchable sentence using an LLM
- Embed — an embedding is generated from the expanded content
- Find similar — existing current memories are searched for semantic overlap
- Classify — if similar memories exist, an LLM classifies the relationship
- Act — based on the classification, the system saves, updates, extends, or deduplicates
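The ordering of the first four steps can be sketched as a small orchestration function. This is only an illustration of the flow described above, not MemContext's actual code: `plan_save`, `SavePlan`, and the four injected callables are hypothetical names, and the real LLM, embedding, and search calls are replaced by whatever functions the caller passes in.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


@dataclass
class SavePlan:
    expanded: str
    embedding: Vector
    similar_ids: list[str]
    classification: Optional[str]  # None when nothing similar was found


def plan_save(
    content: str,
    expand: Callable[[str], str],                  # hypothetical LLM rewrite step
    embed: Callable[[str], Vector],                # hypothetical embedding call
    find_similar: Callable[[Vector], list[str]],   # hypothetical similarity search
    classify: Callable[[str, list[str]], str],     # hypothetical LLM classifier
) -> SavePlan:
    expanded = expand(content)               # 1. Expand
    embedding = embed(expanded)              # 2. Embed the expanded text, not the raw input
    similar_ids = find_similar(embedding)    # 3. Find similar current memories
    # 4. Classify only when there is something to compare against
    classification = classify(expanded, similar_ids) if similar_ids else None
    return SavePlan(expanded, embedding, similar_ids, classification)
```

Note that the classifier is skipped entirely when no similar memories are found, which matches the "if similar memories exist" condition in the Classify step.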
Relationship classification
When similar memories are found, the LLM classifies the relationship as one of:

| Classification | What happens |
|---|---|
| update | The new memory supersedes the old one. The old memory is marked isCurrent = false and a version chain link is created. |
| extend | The new memory adds detail to an existing one. A relation of type extends is created between them. |
| similar | The new memory is related but distinct. A relation of type similar is created. |
| noop | The new content is a duplicate. Nothing is saved; the existing memory ID is returned. |
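The four outcomes can be made concrete with a minimal in-memory sketch. Everything here is an assumption made for illustration: the `Store` and `Memory` classes, the `act` function, the `supersedes` field standing in for the version chain link, and the direction of the relation tuples are hypothetical and do not reflect MemContext's real schema.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class Memory:
    id: str
    content: str
    is_current: bool = True
    supersedes: Optional[str] = None  # stand-in for the version chain link


@dataclass
class Store:
    memories: dict[str, Memory] = field(default_factory=dict)
    # Each relation is (new_memory_id, relation_type, existing_memory_id)
    relations: list[tuple[str, str, str]] = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def insert(self, content: str, supersedes: Optional[str] = None) -> Memory:
        memory = Memory(id=f"m{next(self._ids)}", content=content, supersedes=supersedes)
        self.memories[memory.id] = memory
        return memory


def act(store: Store, content: str, classification: Optional[str],
        existing_id: Optional[str]) -> str:
    """Apply a classification and return the ID of the resulting memory."""
    if classification is None:       # nothing similar was found: plain save
        return store.insert(content).id
    if classification == "noop":     # duplicate: save nothing, return the existing ID
        return existing_id
    if classification == "update":   # supersede: retire the old memory, link the new one
        store.memories[existing_id].is_current = False
        return store.insert(content, supersedes=existing_id).id
    if classification in ("extend", "similar"):
        new = store.insert(content)
        relation_type = "extends" if classification == "extend" else "similar"
        store.relations.append((new.id, relation_type, existing_id))
        return new.id
    raise ValueError(f"unknown classification: {classification!r}")
```

Only update changes the existing memory; extend and similar leave it current and record a relation instead.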
Search flow
Search via GET /api/memories/search or search_memory uses hybrid retrieval:
- Generate query variants — the original query is expanded into alternative phrasings
- Embed — embeddings are created for the original query and all variants
- Vector search — runs against all query embeddings to find semantically similar memories
- Full-text search — PostgreSQL tsvector search runs in parallel on the original query
- Merge — results are combined using Reciprocal Rank Fusion (RRF) into a single ranked list
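The merge step can be illustrated with the standard RRF formula, in which each result earns 1 / (k + rank) from every list it appears in and the contributions are summed. The sketch below is a generic implementation, not MemContext's own: the function name is hypothetical, and k = 60 is the conventional default for RRF rather than a value taken from this documentation.

```python
from collections import defaultdict
from typing import Iterable, Sequence


def reciprocal_rank_fusion(rankings: Iterable[Sequence[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of memory IDs into one ranked list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit of each list contributes 1 / (k + 1)
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID to keep the order deterministic
    return sorted(scores, key=lambda memory_id: (-scores[memory_id], memory_id))


vector_hits = ["a", "b", "c"]   # e.g. results from the vector search
text_hits = ["b", "d"]          # e.g. results from the full-text search
merged = reciprocal_rank_fusion([vector_hits, text_hits])
print(merged)  # ['b', 'a', 'd', 'c']
```

Memory b ranks first because it appears in both lists, even though it is not the top hit of the vector search. RRF uses only rank positions, so the vector and full-text scores never need to be normalized against each other.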
Temporal filtering
Memories can include a validUntil timestamp. Expired memories are automatically excluded from search and list results. This is useful for:
- Time-sensitive strategy decisions
- Temporary experiments
- Seasonal preferences
- Active project states that change
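The exclusion rule amounts to a simple predicate applied to every candidate result. The sketch below assumes that a memory without a validUntil value never expires and that a memory expires at the exact instant of its timestamp; neither boundary behavior is stated above, and the `is_active` and `filter_active` names and the dictionary shape are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def is_active(valid_until: Optional[datetime], now: Optional[datetime] = None) -> bool:
    """Return True if a memory should still appear in search and list results."""
    now = now or datetime.now(timezone.utc)
    # Assumption: no validUntil means the memory never expires
    return valid_until is None or valid_until > now


def filter_active(memories: list[dict], now: Optional[datetime] = None) -> list[dict]:
    return [m for m in memories if is_active(m.get("validUntil"), now)]


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
memories = [
    {"id": "m1", "validUntil": None},
    {"id": "m2", "validUntil": datetime(2025, 5, 1, tzinfo=timezone.utc)},   # expired
    {"id": "m3", "validUntil": datetime(2025, 12, 31, tzinfo=timezone.utc)},
]
print([m["id"] for m in filter_active(memories, now)])  # ['m1', 'm3']
```

Using timezone-aware datetimes throughout avoids the error Python raises when comparing naive and aware values.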
Memory limits
Each plan has a memory limit. When the limit is reached, new save requests return 403 with the current count and limit. The check happens both before and during the save transaction to handle concurrent requests safely.
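The double check can be sketched as follows. This is a simplified model under stated assumptions: a `threading.Lock` stands in for the database transaction, the `MemoryQuota` and `MemoryLimitReached` names are hypothetical, and the idea that the early check exists to fail fast before the expensive pipeline work is an inference rather than something stated above.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class MemoryLimitReached(Exception):
    """Carries the data a 403 response would report."""

    def __init__(self, count: int, limit: int) -> None:
        super().__init__(f"memory limit reached: {count}/{limit}")
        self.status = 403
        self.count = count
        self.limit = limit


class MemoryQuota:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.count = 0
        self._lock = threading.Lock()  # stand-in for the save transaction

    def _check(self) -> None:
        if self.count >= self.limit:
            raise MemoryLimitReached(self.count, self.limit)

    def save(self, prepare: Callable[[], T]) -> T:
        self._check()            # first check: reject early, before any pipeline work
        prepared = prepare()     # e.g. expand, embed, classify (runs outside the lock)
        with self._lock:
            self._check()        # second check: another request may have used the last slot
            self.count += 1
        return prepared
```

Without the second check, two requests that both pass the first check while one slot remains would both be saved, leaving the account over its limit.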