Memories

GitHub: divyekant/memories

AI assistants lose all context when a session ends. Memories gives them persistent, searchable memory that survives across sessions, projects, and machines. It runs locally as a Docker service, provides sub-50ms hybrid search combining BM25 keyword matching with vector similarity, and works with any AI client that supports MCP or REST.

  • Hybrid search — BM25 keyword + vector similarity with Reciprocal Rank Fusion, under 50ms
  • AUDN extraction pipeline — Automatically classifies facts as Add, Update, Delete, or Noop to keep memory clean over time
  • Multi-client support — MCP for Claude Code, Cursor, Codex; REST API for ChatGPT, OpenClaw, and anything else
  • Automatic memory hooks — 5-event lifecycle (session start, prompt, response, pre-compact, session end) for hands-free memory management
  • Novelty detection — Checks if information is already known before storing, preventing duplicates
  • Auto-backups — Snapshots after every write, with optional cron and Google Drive/S3 off-site backup
  • ONNX Runtime inference — Same model quality as PyTorch (all-MiniLM-L6-v2) in a 68% smaller Docker image
  • Web UI — Built-in memory browser at /ui for viewing and managing stored memories
  • Extraction providers — Anthropic, OpenAI, ChatGPT Subscription, Ollama, or skip entirely

Memories is the foundational persistence layer of the Arkos ecosystem. Carto stores its codebase index in Memories. Learning stores failure-fix patterns in Memories. Hermes writes generated documentation entries to Memories. Any tool that needs to remember something across sessions uses Memories as its backend.

# Clone and start
git clone https://github.com/divyekant/memories.git
cd memories
docker compose -f docker-compose.snippet.yml up -d

# Verify
curl http://localhost:8900/health

# Add a memory
curl -X POST http://localhost:8900/memory/add \
  -H "Content-Type: application/json" \
  -d '{"text": "Always use TypeScript strict mode", "source": "standards.md"}'

# Search
curl -X POST http://localhost:8900/search \
  -H "Content-Type: application/json" \
  -d '{"query": "TypeScript config", "k": 3, "hybrid": true}'

The service runs at http://localhost:8900, with API docs at /docs and the memory browser at /ui.
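For clients that prefer a script over curl, the same two calls can be sketched in Python with only the standard library. The endpoints and payload fields are the ones used in the quick start. The helper names (`build_request`, `post_json`, `add_memory`, `search`) are hypothetical and not part of Memories, and the response schema is not described here, so the decoded JSON is returned untouched.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8900"  # default address from the quick start


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the given endpoint without sending it."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 5.0):
    """Send the request and decode the JSON body.

    The response schema is not documented on this page, so the decoded
    value is returned as-is rather than unpacked into assumed fields.
    """
    request = build_request(path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def add_memory(text: str, source: str):
    """POST /memory/add with the fields shown in the quick start."""
    return post_json("/memory/add", {"text": text, "source": source})


def search(query: str, k: int = 3, hybrid: bool = True):
    """POST /search with the fields shown in the quick start."""
    return post_json("/search", {"query": query, "k": k, "hybrid": hybrid})
```

With the container running, `add_memory("Always use TypeScript strict mode", "standards.md")` followed by `search("TypeScript config")` mirrors the curl commands above.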

Memories runs as a FastAPI service inside Docker. Clients connect over MCP (Claude Code, Codex, Cursor) or the REST API (everything else). The MCP server is a thin Node.js wrapper that translates MCP tool calls into REST requests.

Internally, the engine maintains a vector index (ONNX Runtime embeddings) alongside a BM25 keyword index. Search queries hit both indexes, and the results are fused using Reciprocal Rank Fusion. All data is persisted to disk as vector_index.bin + metadata.json, with automatic backups after every write.
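To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of memory IDs. It illustrates the general technique and is not the Memories implementation. The constant `k=60` is the conventional default for RRF and is assumed here, since the value Memories uses is not stated. The IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, conventional default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["m3", "m1", "m7"]    # hypothetical keyword results, best first
vector_hits = ["m1", "m9", "m3"]  # hypothetical embedding results, best first
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['m1', 'm3', 'm9', 'm7']
```

Because RRF works on ranks rather than raw scores, BM25 scores and cosine similarities never need to be normalized onto a common scale. A memory that appears in both lists (m1 and m3 here) outranks one that appears in only one.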

The optional extraction pipeline uses an LLM (Anthropic, OpenAI, Ollama, or ChatGPT Subscription) to analyze conversation transcripts and classify facts through the AUDN loop before storing them.
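The effect of the AUDN loop on stored memory can be sketched as follows. The sketch covers only the final step, applying decisions that have already been classified, and leaves out the LLM call. The `Decision` type, its field names, and the `m<N>` ID scheme are all hypothetical and chosen for illustration; the real pipeline's data model is not described here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Decision:
    """One classified fact. All field names here are hypothetical."""
    action: str                      # "add", "update", "delete", or "noop"
    text: Optional[str] = None       # fact text for add/update
    memory_id: Optional[str] = None  # target memory for update/delete


def apply_decisions(store: Dict[str, str], decisions: Iterable[Decision]) -> Dict[str, str]:
    """Apply AUDN decisions to a copy of the store and return the copy."""
    result = dict(store)
    next_id = len(result) + 1
    for d in decisions:
        if d.action == "add":
            # Skip over IDs that are already taken.
            while f"m{next_id}" in result:
                next_id += 1
            result[f"m{next_id}"] = d.text
        elif d.action == "update":
            # Only overwrite memories that exist.
            if d.memory_id in result:
                result[d.memory_id] = d.text
        elif d.action == "delete":
            result.pop(d.memory_id, None)
        elif d.action != "noop":
            raise ValueError(f"unknown action: {d.action}")
    return result


store = {"m1": "Project uses Node 18", "m2": "Deploys run on Fridays"}
decisions = [
    Decision("update", "Project uses Node 20", "m1"),   # newer fact replaces old
    Decision("delete", memory_id="m2"),                 # fact no longer true
    Decision("add", "Always use TypeScript strict mode"),
    Decision("noop", "Project uses Node 20"),           # already known, ignored
]
updated = apply_decisions(store, decisions)
```

After this runs, `updated` holds the revised Node fact under `m1` and the new TypeScript rule under `m3`, while `m2` is gone. The Noop case is what keeps repeated facts from piling up as duplicates.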