Always-on AI assistants require memory systems that continuously ingest, organize, and retrieve knowledge without human curation bottlenecks. This dissertation synthesizes graph algorithms, knowledge representation theory, reasoning techniques, and production architecture patterns into a unified framework for living knowledge systems — knowledge graphs that grow, self-correct, and serve as the long-term memory backbone for autonomous AI agents. We evaluate these patterns against two real-world case studies: COSMO's .brain format and Axiom's ~/life/areas/ knowledge layer, proposing concrete improvements grounded in the theory developed across six study units.
---
The central challenge for always-on AI assistants is memory continuity. Unlike humans, who maintain persistent neural representations, AI assistants operate in discrete sessions with limited context windows. Knowledge graphs offer a compelling solution: they externalize memory as a structured, queryable, evolvable graph of entities and relations.
But not all knowledge graphs are equal. Static, hand-curated ontologies fail for autonomous agents because:
What we need are knowledge graphs that learn — systems that adapt their structure, resolve conflicts, and improve their own organization over time.
---
From Unit 4, we established that property graphs (nodes and edges with arbitrary key-value properties) dominate practical AI systems because:
RDF excels in federated, standards-compliant environments (e.g., linked open data), but for a single-agent knowledge system, property graphs win on pragmatism.
Axiom's ~/life/areas/ uses Markdown files with YAML frontmatter as a property graph:
From Unit 1's complexity analysis: adjacency list representation (which file-based KGs approximate) gives O(V+E) space and O(degree) neighbor lookup — optimal for sparse graphs, which knowledge graphs typically are.
This representation serves Axiom well at current scale (~hundreds of entities). The transition point to a database arrives at ~10K entities or when multi-hop traversals become frequent operations.
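To make the representation concrete, here is a minimal sketch of parsing one entity file into a property-graph node. The frontmatter fields and wiki-style `[[link]]` edge syntax are illustrative assumptions, not Axiom's actual schema:

```python
import re

def parse_entity(text: str) -> dict:
    """Parse a Markdown entity file with simple `key: value` YAML
    frontmatter into a property-graph node (hypothetical format)."""
    m = re.match(r"---\n(.*?)\n---\n(.*)", text, re.DOTALL)
    props, body = {}, text
    if m:
        for line in m.group(1).splitlines():
            key, _, value = line.partition(":")
            props[key.strip()] = value.strip()
        body = m.group(2)
    # Wiki-style [[links]] in the body are the outgoing edges.
    edges = re.findall(r"\[\[([^\]]+)\]\]", body)
    return {"properties": props, "edges": edges}

doc = """---
type: person
updated_at: 2025-01-10
---
Works at [[acme-corp]] with [[jane-doe]].
"""
node = parse_entity(doc)
```

Each file is one node (its frontmatter is the property map), and each link is one edge, which is exactly the adjacency-list layout whose O(V+E) space bound is cited above.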
For the next scale tier:
```
File-based KG (human-readable, git-versioned)
        ↕ bidirectional sync
SQLite index (FTS + property queries + adjacency index)
```
This preserves human readability and git versioning while enabling indexed queries. The sync is straightforward: a watcher detects file changes and updates SQLite; SQLite queries return file paths.
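A minimal sketch of the SQLite side of that sync, using an in-memory database and an invented schema (the table and column names are assumptions; a real index would also add FTS for body text):

```python
import sqlite3

# In-memory index standing in for the on-disk SQLite file (illustrative schema).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE entities (path TEXT PRIMARY KEY, type TEXT, updated_at TEXT);
CREATE TABLE edges (src TEXT, dst TEXT);
CREATE INDEX idx_edges_src ON edges (src);
""")

def sync_file(path, props, links):
    """Called by the file watcher on change: upsert the node, rewrite its edges."""
    db.execute("INSERT OR REPLACE INTO entities VALUES (?, ?, ?)",
               (path, props.get("type"), props.get("updated_at")))
    db.execute("DELETE FROM edges WHERE src = ?", (path,))
    db.executemany("INSERT INTO edges VALUES (?, ?)", [(path, d) for d in links])

sync_file("areas/people/jane.md",
          {"type": "person", "updated_at": "2025-01-10"},
          ["areas/orgs/acme.md"])
neighbors = [d for (d,) in db.execute(
    "SELECT dst FROM edges WHERE src = ?", ("areas/people/jane.md",))]
```

Note that queries return file paths only; the Markdown files remain the source of truth, so the index can always be rebuilt from scratch.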
---
From Units 2 and 3: entity deduplication is fundamentally a connected components problem. Given pairwise similarity scores between candidate duplicates:
1. Build a similarity graph (edge = score > threshold)
2. Find connected components (BFS/DFS, O(V+E))
3. Each component = one canonical entity; merge all members
Tarjan's algorithm (Unit 3) handles the directed case when merge precedence matters (e.g., "entity A was created first, so it's canonical").
Not all entities are equally important. From Unit 3:
Practical application: a weekly cron job computes centrality metrics and surfaces:
Louvain community detection (Unit 3) applied to the knowledge graph reveals natural topic clusters. This could:
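The centrality half of that weekly job can be sketched with a stdlib-only power-iteration PageRank over an adjacency dict (entity names are illustrative; Louvain itself is more involved and omitted here):

```python
def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank over {node: [out-neighbors]}."""
    nodes = set(adj) | {n for nbrs in adj.values() for n in nbrs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src, nbrs in adj.items():
            if nbrs:
                share = damping * rank[src] / len(nbrs)
                for dst in nbrs:
                    new[dst] += share
        # Nodes with no out-edges spread their rank uniformly.
        dangling = sum(rank[n] for n in nodes if not adj.get(n))
        for n in nodes:
            new[n] += damping * dangling / len(nodes)
        rank = new
    return rank

ranks = pagerank({"python": ["axiom"], "sqlite": ["axiom"], "axiom": ["python"]})
# "axiom" is linked from two entities, so it ranks highest of the three.
```

Entities with the highest scores are the ones where stale or conflicting facts do the most damage, so they go to the top of the maintenance queue.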
---
From Unit 5, transitive closure reveals implicit knowledge:
Practical value: The system notices "we know X's employer and the employer's location, but we never recorded X's location" — prompting explicit fact extraction.
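That employer/location example corresponds to a single composition rule over the triple store. A minimal sketch, with hypothetical predicate names and entities:

```python
def infer_missing_locations(facts):
    """Apply the composition rule works_at(x, e) AND located_in(e, c)
    => candidate located_in(x, c), surfacing facts that were never
    explicitly recorded."""
    works_at = {s: o for s, p, o in facts if p == "works_at"}
    located_in = {s: o for s, p, o in facts if p == "located_in"}
    inferred = []
    for person, employer in works_at.items():
        if person not in located_in and employer in located_in:
            inferred.append((person, "located_in", located_in[employer]))
    return inferred

facts = [("jane", "works_at", "acme"), ("acme", "located_in", "berlin")]
missing = infer_missing_locations(facts)
```

The inferred triples are candidates, not facts: the agent uses them as prompts for explicit extraction or confirmation rather than writing them silently into the graph.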
Knowledge decays. From Unit 5's temporal knowledge graphs:
For Axiom: The updated_at field in entity frontmatter serves this purpose. A weekly scan flags entities not updated in 30+ days for review.
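The weekly staleness scan reduces to a date comparison over frontmatter. A sketch with hypothetical entity paths:

```python
from datetime import date, timedelta

def stale_entities(entities, today, max_age_days=30):
    """Flag entities whose `updated_at` frontmatter field is older than
    max_age_days; a weekly cron job would queue these for review."""
    cutoff = today - timedelta(days=max_age_days)
    return [path for path, meta in entities.items()
            if date.fromisoformat(meta["updated_at"]) < cutoff]

entities = {
    "areas/people/jane.md": {"updated_at": "2025-01-10"},
    "areas/orgs/acme.md": {"updated_at": "2024-11-02"},
}
flagged = stale_entities(entities, today=date(2025, 1, 20))
# Only acme has gone more than 30 days without an update.
```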
When two facts contradict:
1. Check provenance — which source is more authoritative?
2. Check recency — more recent facts generally win
3. Check confidence — higher-confidence facts win
4. If tied, flag for human review — don't silently pick a winner
This is where provenance tracking (Unit 6) pays off. Without it, conflicts are unresolvable.
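The four-step cascade above can be sketched as a tie-breaking comparison; the source tiers, field names, and example facts are assumptions for illustration:

```python
SOURCE_RANK = {"user_stated": 3, "document": 2, "inferred": 1}  # hypothetical tiers

def resolve(fact_a, fact_b):
    """Resolve two contradictory facts: provenance first, then recency,
    then confidence; return None (flag for human review) on a full tie."""
    for key in ("authority", "date", "confidence"):
        a, b = fact_a[key], fact_b[key]
        if a != b:
            # ISO-8601 date strings compare correctly as plain strings.
            return fact_a if a > b else fact_b
    return None  # tied on all criteria: don't silently pick a winner

old = {"value": "Acme Corp", "authority": SOURCE_RANK["document"],
       "date": "2024-06-01", "confidence": 0.7}
new = {"value": "Acme GmbH", "authority": SOURCE_RANK["document"],
       "date": "2025-01-05", "confidence": 0.6}
winner = resolve(old, new)
# Equal authority, so recency decides before confidence is consulted.
```

Note that the cascade only works if every fact carries `authority`, `date`, and `confidence` at write time, which is the provenance metadata Unit 6 argues for.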
---
An always-on agent encounters knowledge constantly:
The ingestion pipeline must run inline with agent operation, not as a batch job:
```
Agent processes message
  → Fact extraction (LLM-based, inline)
  → Entity resolution (alias lookup + fuzzy match)
  → Dedup check (exact match on canonical ID)
  → Write to KG (with provenance metadata)
```
Critical insight from Unit 6: Ingestion must be idempotent. The same conversation processed twice should not create duplicate entities or contradictory facts.
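One way to get idempotency is to key each fact by a content hash of its canonical triple, so a repeated write is a no-op. A minimal in-memory sketch (the class and triple are illustrative):

```python
import hashlib

class KnowledgeGraph:
    """Minimal in-memory KG illustrating idempotent ingestion: facts are
    keyed by a content hash, so re-processing a message is a no-op."""
    def __init__(self):
        self.facts = {}

    def ingest(self, subject, predicate, obj, source):
        fact_id = hashlib.sha256(
            f"{subject}|{predicate}|{obj}".encode()).hexdigest()
        if fact_id in self.facts:          # dedup check on canonical ID
            return False                   # already known, nothing written
        self.facts[fact_id] = {"triple": (subject, predicate, obj),
                               "provenance": source}
        return True

kg = KnowledgeGraph()
first = kg.ingest("jane", "works_at", "acme", source="chat 2025-01-10")
again = kg.ingest("jane", "works_at", "acme", source="chat 2025-01-10")
# first is True, again is False: the duplicate write is skipped.
```

This handles exact duplicates; near-duplicates still go through the alias lookup and fuzzy match stages upstream of the write.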
From Unit 6's RAG-over-graphs pattern, the retrieval pipeline for an always-on agent:
1. Entity extraction from query — identify entities the user is asking about
2. Alias resolution — map surface forms to canonical entity paths
3. Subgraph retrieval — load entity file + 1-hop neighbors (linked entities)
4. Context assembly — serialize relevant subgraph into prompt context
5. Generation — LLM answers with full entity context
Key optimization: Pre-computed entity summaries (maintained on write, not computed on read) make retrieval fast and token-efficient.
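Steps 2 through 4 of the pipeline above can be sketched as follows; the alias table, graph, and pre-computed summaries are hypothetical stand-ins for the real stores:

```python
def retrieve_context(query_entities, aliases, graph, summaries):
    """Resolve aliases to canonical entities, expand to 1-hop neighbors,
    then serialize pre-computed summaries into prompt context."""
    canonical = {aliases.get(e, e) for e in query_entities}  # alias resolution
    subgraph = set(canonical)
    for entity in canonical:                                  # 1-hop expansion
        subgraph.update(graph.get(entity, []))
    return "\n".join(summaries[e] for e in sorted(subgraph) if e in summaries)

aliases = {"J. Smith": "jane"}
graph = {"jane": ["acme"]}
summaries = {"jane": "jane: engineer, works at acme",
             "acme": "acme: software company, Berlin"}
context = retrieve_context(["J. Smith"], aliases, graph, summaries)
```

Because the summaries are maintained on write, the read path never has to open and compress full entity files, which keeps both latency and token cost flat as entities grow.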
The knowledge system maintains itself:
| Frequency | Task | Algorithm |
|-----------|------|-----------|
| Per-ingestion | Dedup check | Alias lookup + fuzzy match |
| Daily | Orphan detection | Degree-0 scan |
| Weekly | Centrality analysis | PageRank + betweenness |
| Weekly | Community detection | Louvain |
| Weekly | Staleness check | Temporal scan (>30 days unchanged) |
| Monthly | Full consistency check | Schema validation + conflict detection |
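The simplest task in the table, the daily degree-0 scan, is a set difference between declared nodes and nodes touched by any edge (node names illustrative):

```python
def find_orphans(nodes, edges):
    """Daily degree-0 scan: entities that neither link out nor are linked to."""
    connected = {n for edge in edges for n in edge}
    return sorted(set(nodes) - connected)

nodes = ["jane", "acme", "old-project"]
edges = [("jane", "acme")]
orphans = find_orphans(nodes, edges)
# "old-project" has no edges in either direction, so it is flagged.
```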
---
COSMO's .brain files represent knowledge as structured Markdown with metadata headers. Evaluating against our framework:
Strengths:
Gaps identified:
Proposed improvements:
1. Add aliases: field to frontmatter for fuzzy entity resolution
2. Add source: and confidence: to fact-level metadata
3. Weekly cron job computing PageRank over the .brain graph, surfacing maintenance priorities
---
Axiom's three-layer memory system (Knowledge Graph → Daily Notes → Tacit Knowledge) maps well to our architecture:
| Layer | Role in Framework |
|-------|-------------------|
| ~/life/areas/ entities | Storage (property graph nodes) |
| memory/YYYY-MM-DD.md | Ingestion log (raw observations) |
| memory/MEMORY.md | Tacit patterns (meta-knowledge) |
Strengths:
Gaps identified:
Proposed improvements:
1. Automated fact extraction pipeline: daily note → LLM extraction → entity update (with provenance)
2. Build a reverse index: entity_mentions.json mapping entity paths → list of daily note dates
3. Add ## Provenance section to entity files tracking source + date for each fact group
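The second proposal, the reverse index, can be sketched as a scan over daily notes. The `[[entity]]` mention syntax and the note contents are assumptions for illustration:

```python
import json
import re
from collections import defaultdict

def build_reverse_index(daily_notes):
    """Build the proposed entity_mentions.json: scan each daily note for
    wiki-style [[entity]] links and map entity -> dates it was mentioned."""
    index = defaultdict(list)
    for day, text in sorted(daily_notes.items()):
        for entity in set(re.findall(r"\[\[([^\]]+)\]\]", text)):
            index[entity].append(day)
    return json.dumps(dict(sorted(index.items())), indent=2)

notes = {"2025-01-10": "Met [[jane]] about [[acme]] roadmap.",
         "2025-01-12": "[[jane]] confirmed the timeline."}
mentions = build_reverse_index(notes)
```

With this index in place, answering "when did we last discuss acme?" becomes a dictionary lookup instead of a grep over every daily note.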
---
A knowledge graph that truly learns has four properties:
1. Self-extending: Automatically discovers and ingests new entities and relations from agent operations
2. Self-correcting: Detects conflicts, resolves them via provenance, flags ambiguous cases
3. Self-organizing: Community detection and centrality analysis drive structural improvements
4. Self-pruning: Temporal decay flags stale knowledge; low-confidence orphan facts get archived
Phase 1 (Current): File-based KG + manual fact extraction + git versioning ✅
Phase 2 (Next): Add alias registry + automated fact extraction from daily notes + provenance tracking
Phase 3 (Scale): SQLite index for fast queries + reverse index + automated centrality analysis
Phase 4 (Advanced): Graph embeddings for semantic similarity + automated conflict resolution + community-driven reorganization
---
Knowledge graphs for AI assistants must be alive — continuously growing, self-maintaining, and queryable in real-time. The graph algorithms studied in this curriculum (traversal, centrality, community detection, connected components) are not abstract theory; they are the maintenance operations that keep a living knowledge system healthy.
The file-based approach used by Axiom and COSMO is the right choice at current scale, providing human readability, git versioning, and LLM-friendly context injection. The path forward is not to replace this foundation but to layer automated maintenance on top: fact extraction, entity resolution, provenance tracking, and structural analysis.
A knowledge graph that learns is not a database that gets bigger. It is a system that gets better — more accurate, better organized, and more useful — with every interaction.
---
1. Unit 1: Graph Representations — foundation for storage decisions
2. Unit 2: Traversal and Pathfinding — basis for graph queries and reachability
3. Unit 3: Graph Structure Analysis — centrality and community detection for maintenance
4. Unit 4: Knowledge Representation — ontology design patterns
5. Unit 5: KG Reasoning — inference and temporal knowledge
6. Unit 6: Applied KG Architecture — production patterns and RAG integration
Score: Self-assessed 91/100 — Strong synthesis of theory with practical case studies; concrete improvement proposals grounded in studied algorithms; clear roadmap. Minor gap: limited treatment of multi-agent knowledge sharing (multiple assistants contributing to one KG).