What it does

An AI character companion with personality, memory, and journaling capabilities. Text-first design, voice optional. Self-hosted and privacy-focused.

Why I built it

Since LLMs first emerged, I’ve been fascinated by them. We suddenly got a huge step closer to the AI assistants we see in media — think J.A.R.V.I.S. I knew I had to build my own version.

When tool use became available, I got obsessed with the idea of controlling everything through natural language. Imagine the accessibility implications: an LLM that could control Home Assistant, manage your files, automate your workflows.

I’ll be honest — I never checked if something like this already existed. I wanted to learn by finding my own solutions. So that’s what I did.

The core innovation: Two-track memory

Traditional chatbots either forget everything between sessions or carry the whole conversation in the context window, losing coherence and slowing down as it grows. Some work around this by “compacting” conversations through summarization. I had that idea too, and I’ll implement it eventually.

But something else bothered me: chatbots can only reference the current chat and what you explicitly give them. Compacting makes them “forget” things permanently. I didn’t like that, so I designed a way to let them remember anything without bloating the context.

Eva uses a two-track architecture:

Track 1: Conversation history
Clean dialogue stored in PostgreSQL — the standard chat history you’d have with any chatbot.

User: Hi Eva! I'm working on a new AI project.
Eva: That sounds exciting! What kind of AI project?
User: I'm building my own assistant!
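A minimal sketch of what Track 1 storage could look like. It uses sqlite3 so the example is self-contained; Eva itself stores this in PostgreSQL, and the table and column names here are illustrative, not Eva's actual schema:

```python
import sqlite3

# Illustrative schema; Eva's real PostgreSQL schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE messages (
           id INTEGER PRIMARY KEY,
           role TEXT NOT NULL,          -- 'user' or 'assistant'
           content TEXT NOT NULL,
           created_at TEXT DEFAULT CURRENT_TIMESTAMP
       )"""
)

def save_message(role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO messages (role, content) VALUES (?, ?)", (role, content)
    )

def load_history() -> list[tuple[str, str]]:
    # Clean dialogue only: no injected memories or metadata live here.
    return conn.execute("SELECT role, content FROM messages ORDER BY id").fetchall()

save_message("user", "Hi Eva! I'm working on a new AI project.")
save_message("assistant", "That sounds exciting! What kind of AI project?")
```

The key property is that this table holds nothing but the dialogue itself, so it stays compact no matter how much context gets injected at generation time.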

Track 2: Injected context
Relevant memories and metadata are injected into the prompt when generating each response, but never stored in the chat history.

Related excerpts from past conversations:
- User mentioned interest in building AI assistants (2024-03-15)
- User has experience with Python and FastAPI

Current context:
- Time: 13:34
- Weather: Sunny, 26°C
- Device: Mobile
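One way to picture how the two tracks come together at generation time (a sketch, not Eva's actual implementation; the function and field names are made up):

```python
def build_prompt(history: list[dict], memories: list[str], context: dict) -> str:
    """Assemble a single prompt: persistent history plus ephemeral context.

    Only `history` is ever persisted. `memories` and `context` are rebuilt
    fresh for each response and discarded afterwards.
    """
    parts = []
    if memories:
        parts.append("Related excerpts from past conversations:")
        parts += [f"- {m}" for m in memories]
    if context:
        parts.append("Current context:")
        parts += [f"- {k}: {v}" for k, v in context.items()]
    parts.append("Conversation:")
    parts += [f"{m['role']}: {m['content']}" for m in history]
    return "\n".join(parts)

history = [{"role": "User", "content": "I'm building my own assistant!"}]
prompt = build_prompt(
    history,
    memories=["User mentioned interest in building AI assistants (2024-03-15)"],
    context={"Time": "13:34", "Weather": "Sunny, 26°C"},
)
# The prompt contains the injected context, but `history` itself is unchanged.
```

Because the injected material lives only in the assembled prompt, the stored conversation stays exactly as long as the dialogue itself.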

This context is used once to generate the response, then discarded. Relevant past conversations are retrieved via semantic search, giving the bot access to information from years ago without filling up the context window.

The result: much longer conversations without summarization, while still remembering things discussed months or years ago.
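The retrieval step can be sketched with plain cosine similarity. In practice Eva uses ChromaDB with real vector embeddings; the toy `embed` function below (a bag-of-characters vector) is only a stand-in for an embedding model:

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: bag-of-characters vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank stored excerpts by similarity to the query; keep the top k.
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

past = [
    "User mentioned interest in building AI assistants",
    "User likes hiking on weekends",
]
top = retrieve("tell me about my AI assistant ideas", past, k=1)
```

With real embeddings, the same shape of query pulls semantically related excerpts from years of history while injecting only the top handful into the prompt.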

Tech stack

  • Backend: Python, FastAPI
  • Databases: PostgreSQL (structured), Redis (cache/sessions), ChromaDB (vector embeddings)
  • LLM: Claude API / local models via Ollama
  • Architecture: Two-track memory with semantic retrieval

Current status

Phase 5 — WebSocket endpoint implementation.

Completed:

  • ✅ Foundation & infrastructure
  • ✅ LLM integration
  • ✅ Database schema & models
  • ✅ Two-track memory system (basics)
  • ✅ Terminal interface with debug modes

What I learned

  • Designing memory systems that scale without bloating context windows
  • Semantic search with vector embeddings (ChromaDB)
  • Balancing local LLM performance vs API costs
  • The importance of separating “memory” from “conversation history”