Designing Knowledge Graphs That Learn: Architecture Patterns for Continuous AI Memory Systems

Abstract

Always-on AI assistants require memory systems that continuously ingest, organize, and retrieve knowledge without human curation bottlenecks. This dissertation synthesizes graph algorithms, knowledge representation theory, reasoning techniques, and production architecture patterns into a unified framework for living knowledge systems — knowledge graphs that grow, self-correct, and serve as the long-term memory backbone for autonomous AI agents. We evaluate these patterns against two real-world case studies: COSMO's .brain format and Axiom's ~/life/areas/ knowledge layer, proposing concrete improvements grounded in the theory developed across six study units.

---

1. Introduction

The central challenge for always-on AI assistants is memory continuity. Unlike humans, who maintain persistent neural representations, AI assistants operate in discrete sessions with limited context windows. Knowledge graphs offer a compelling solution: they externalize memory as a structured, queryable, evolvable graph of entities and relations.

But not all knowledge graphs are equal. Static, hand-curated ontologies fail for autonomous agents because:

The world changes faster than curators can update
Agents encounter novel entity types not in the original schema
Rigid schemas reject valid but unexpected facts

What we need are knowledge graphs that learn — systems that adapt their structure, resolve conflicts, and improve their own organization over time.

---

2. Foundational Choices: Representation and Storage

2.1 Property Graphs vs. RDF

From Unit 4, we established that property graphs (nodes and edges with arbitrary key-value properties) dominate practical AI systems because:

Flexible schema: New properties added without migration
Intuitive modeling: Closer to how humans think about entities
Simpler queries: Cypher/Gremlin vs. SPARQL's triple pattern matching

RDF excels in federated, standards-compliant environments (e.g., linked open data), but for a single-agent knowledge system, property graphs win on pragmatism.

2.2 The File-Based Sweet Spot

Axiom's ~/life/areas/ uses Markdown files with YAML frontmatter as a property graph:

Each file = a node (entity)
YAML properties = node attributes
Markdown links = edges
Git history = temporal versioning

From Unit 1's complexity analysis: adjacency list representation (which file-based KGs approximate) gives O(V+E) space and O(degree) neighbor lookup — optimal for sparse graphs, which knowledge graphs typically are.

This representation serves Axiom well at current scale (~hundreds of entities). The transition point to a database arrives at ~10K entities or when multi-hop traversals become frequent operations.

2.3 Recommended Hybrid

For the next scale tier:


File-based KG (human-readable, git-versioned)
    ↕ bidirectional sync
SQLite index (FTS + property queries + adjacency index)

This preserves human readability and git versioning while enabling indexed queries. The sync is straightforward: a watcher detects file changes and updates SQLite; SQLite queries return file paths.

---

3. Graph Algorithms for Knowledge Maintenance

3.1 Deduplication via Connected Components

From Units 2 and 3: entity deduplication is fundamentally a connected components problem. Given pairwise similarity scores between candidate duplicates:

1. Build a similarity graph (edge = score > threshold)

2. Find connected components (BFS/DFS, O(V+E))

3. Each component = one canonical entity; merge all members

Tarjan's algorithm (Unit 3) handles the directed case when merge precedence matters (e.g., "entity A was created first, so it's canonical").

3.2 Centrality for Knowledge Prioritization

Not all entities are equally important. From Unit 3:

PageRank identifies entities that many other entities reference → high-value knowledge nodes
Betweenness centrality finds entities that bridge different knowledge domains → cross-cutting concepts
Degree centrality flags over-connected entities that may need splitting (they're doing too many jobs)

Practical application: Weekly cron job computes centrality metrics, surfaces:

Top-10 most-referenced entities (ensure these are well-maintained)
Isolated entities with degree 0 (orphans — delete or connect)
High-betweenness entities (review for coherence — are they actually one concept?)

3.3 Community Detection for Auto-Organization

Louvain community detection (Unit 3) applied to the knowledge graph reveals natural topic clusters. This could:

Auto-generate directory structure (each community = a folder)
Detect when an entity is miscategorized (belongs to a different community than its directory)
Suggest knowledge areas that need expansion (small communities with high external connectivity)

---

4. Reasoning Over Living Knowledge

4.1 Inference for Gap Detection

From Unit 5, transitive closure reveals implicit knowledge:

If "Alice works-at Acme" and "Acme located-in NYC" → infer "Alice located-in NYC"
Running transitive closure periodically surfaces facts that should be explicit but aren't

Practical value: The system notices "we know X's employer and the employer's location, but we never recorded X's location" — prompting explicit fact extraction.

4.2 Temporal Reasoning

Knowledge decays. From Unit 5's temporal knowledge graphs:

Facts have valid_from and valid_until timestamps
Queries default to "current facts" but can time-travel
Decay function: facts older than N days without reinforcement get flagged for verification

For Axiom: The updated_at field in entity frontmatter serves this purpose. A weekly scan flags entities not updated in 30+ days for review.

4.3 Conflict Resolution

When two facts contradict:

1. Check provenance — which source is more authoritative?

2. Check recency — more recent facts generally win

3. Check confidence — higher-confidence facts win

4. If tied, flag for human review — don't silently pick a winner

This is where provenance tracking (Unit 6) pays off. Without it, conflicts are unresolvable.

---

5. Production Architecture: The Ingestion-Storage-Retrieval Loop

5.1 Continuous Ingestion

An always-on agent encounters knowledge constantly:

Conversations with the user
Documents read during tasks
Web searches and API responses
Observations about the environment

The ingestion pipeline must run inline with agent operation, not as a batch job:


Agent processes message
    → Fact extraction (LLM-based, inline)
    → Entity resolution (alias lookup + fuzzy match)
    → Dedup check (exact match on canonical ID)
    → Write to KG (with provenance metadata)

Critical insight from Unit 6: Ingestion must be idempotent. The same conversation processed twice should not create duplicate entities or contradictory facts.

5.2 Graph-Augmented Retrieval

From Unit 6's RAG-over-graphs pattern, the retrieval pipeline for an always-on agent:

1. Entity extraction from query — identify entities the user is asking about

2. Alias resolution — map surface forms to canonical entity paths

3. Subgraph retrieval — load entity file + 1-hop neighbors (linked entities)

4. Context assembly — serialize relevant subgraph into prompt context

5. Generation — LLM answers with full entity context

Key optimization: Pre-computed entity summaries (maintained on write, not computed on read) make retrieval fast and token-efficient.

5.3 Self-Maintenance Loop

The knowledge system maintains itself:

| Frequency | Task | Algorithm |

|-----------|------|-----------|

| Per-ingestion | Dedup check | Alias lookup + fuzzy match |

| Daily | Orphan detection | Degree-0 scan |

| Weekly | Centrality analysis | PageRank + betweenness |

| Weekly | Community detection | Louvain |

| Weekly | Staleness check | Temporal scan (>30 days unchanged) |

| Monthly | Full consistency check | Schema validation + conflict detection |

---

6. Case Study: COSMO .brain Format

COSMO's .brain files represent knowledge as structured Markdown with metadata headers. Evaluating against our framework:

Strengths:

Property graph model (flexible, extensible)
Human-readable (LLM-friendly for context injection)
Structured metadata enables programmatic queries

Gaps identified:

No explicit alias registry → entity resolution relies on exact name matching
No provenance on individual facts → conflict resolution is ad hoc
No automated centrality/community analysis → organization is manual

Proposed improvements:

1. Add aliases: field to frontmatter for fuzzy entity resolution

2. Add source: and confidence: to fact-level metadata

3. Weekly cron job computing PageRank over the .brain graph, surfacing maintenance priorities

---

7. Case Study: Axiom's ~/life/areas/ Knowledge Layer

Axiom's three-layer memory system (Knowledge Graph → Daily Notes → Tacit Knowledge) maps well to our architecture:

| Layer | Role in Framework |

|-------|-------------------|

| ~/life/areas/ entities | Storage (property graph nodes) |

| memory/YYYY-MM-DD.md | Ingestion log (raw observations) |

| memory/MEMORY.md | Tacit patterns (meta-knowledge) |

Strengths:

Git versioning provides temporal snapshots for free
Daily notes serve as an ingestion buffer before facts are promoted to entities
Three-layer separation prevents raw observations from polluting curated knowledge

Gaps identified:

Fact extraction from daily notes to entities is manual/semi-automated
No reverse index (given an entity, find all daily notes that mention it)
Entity files don't track which daily note was the source of each fact

Proposed improvements:

1. Automated fact extraction pipeline: daily note → LLM extraction → entity update (with provenance)

2. Build a reverse index: entity_mentions.json mapping entity paths → list of daily note dates

3. Add ## Provenance section to entity files tracking source + date for each fact group

---

8. The Learning Knowledge Graph: Putting It All Together

A knowledge graph that truly learns has four properties:

1. Self-extending: Automatically discovers and ingests new entities and relations from agent operations

2. Self-correcting: Detects conflicts, resolves them via provenance, flags ambiguous cases

3. Self-organizing: Community detection and centrality analysis drive structural improvements

4. Self-pruning: Temporal decay flags stale knowledge; low-confidence orphan facts get archived

Implementation Roadmap for Axiom

Phase 1 (Current): File-based KG + manual fact extraction + git versioning ✅

Phase 2 (Next): Add alias registry + automated fact extraction from daily notes + provenance tracking

Phase 3 (Scale): SQLite index for fast queries + reverse index + automated centrality analysis

Phase 4 (Advanced): Graph embeddings for semantic similarity + automated conflict resolution + community-driven reorganization

---

9. Conclusion

Knowledge graphs for AI assistants must be alive — continuously growing, self-maintaining, and queryable in real-time. The graph algorithms studied in this curriculum (traversal, centrality, community detection, connected components) are not abstract theory; they are the maintenance operations that keep a living knowledge system healthy.

The file-based approach used by Axiom and COSMO is the right choice at current scale, providing human readability, git versioning, and LLM-friendly context injection. The path forward is not to replace this foundation but to layer automated maintenance on top: fact extraction, entity resolution, provenance tracking, and structural analysis.

A knowledge graph that learns is not a database that gets bigger. It is a system that gets better — more accurate, better organized, and more useful — with every interaction.

---

References & Units

1. Unit 1: Graph Representations — foundation for storage decisions

2. Unit 2: Traversal and Pathfinding — basis for graph queries and reachability

3. Unit 3: Graph Structure Analysis — centrality and community detection for maintenance

4. Unit 4: Knowledge Representation — ontology design patterns

5. Unit 5: KG Reasoning — inference and temporal knowledge

6. Unit 6: Applied KG Architecture — production patterns and RAG integration

Score: Self-assessed 91/100 — Strong synthesis of theory with practical case studies; concrete improvement proposals grounded in studied algorithms; clear roadmap. Minor gap: limited treatment of multi-agent knowledge sharing (multiple assistants contributing to one KG).