DISSERTATION · AUTOSTUDY

Designing Knowledge Graphs That Learn: Architecture Patterns for Continuous AI Memory Systems

Abstract

Always-on AI assistants require memory systems that continuously ingest, organize, and retrieve knowledge without human curation bottlenecks. This dissertation synthesizes graph algorithms, knowledge representation theory, reasoning techniques, and production architecture patterns into a unified framework for living knowledge systems — knowledge graphs that grow, self-correct, and serve as the long-term memory backbone for autonomous AI agents. We evaluate these patterns against two real-world case studies: COSMO's .brain format and Axiom's ~/life/areas/ knowledge layer, proposing concrete improvements grounded in the theory developed across six study units.

---

1. Introduction

The central challenge for always-on AI assistants is memory continuity. Unlike humans, who maintain persistent neural representations, AI assistants operate in discrete sessions with limited context windows. Knowledge graphs offer a compelling solution: they externalize memory as a structured, queryable, evolvable graph of entities and relations.

But not all knowledge graphs are equal. Static, hand-curated ontologies fail for autonomous agents: manual curation cannot keep pace with continuous ingestion, contradictory facts accumulate without a resolution mechanism, and knowledge silently goes stale as the world changes.

What we need are knowledge graphs that learn — systems that adapt their structure, resolve conflicts, and improve their own organization over time.

---

2. Foundational Choices: Representation and Storage

2.1 Property Graphs vs. RDF

From Unit 4, we established that property graphs (nodes and edges with arbitrary key-value properties) dominate practical AI systems: properties attach directly to nodes and edges without RDF-style reification, the model maps naturally onto both application code and LLM-extracted facts, and mainstream graph tooling supports it out of the box.

RDF excels in federated, standards-compliant environments (e.g., linked open data), but for a single-agent knowledge system, property graphs win on pragmatism.

2.2 The File-Based Sweet Spot

Axiom's ~/life/areas/ uses Markdown files with YAML frontmatter as a property graph: each file is a node, its frontmatter keys are node properties, and wiki-style links between files are the edges.

From Unit 1's complexity analysis: adjacency list representation (which file-based KGs approximate) gives O(V+E) space and O(degree) neighbor lookup — optimal for sparse graphs, which knowledge graphs typically are.
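
As a concrete sketch, a minimal loader can build this adjacency list straight from the files. It assumes entities reference each other with [[wiki-style]] links and that file stems serve as node IDs; both are illustrative assumptions, not Axiom's confirmed conventions:

```python
import re
from pathlib import Path

def load_adjacency(root: str) -> dict[str, list[str]]:
    """Build an adjacency list from a directory of Markdown entity files.

    Each file is a node (keyed by its stem); each [[wiki-link]] in the
    body is an outgoing edge. Space is O(V + E), matching the analysis.
    """
    adj: dict[str, list[str]] = {}
    for path in Path(root).rglob("*.md"):
        text = path.read_text(encoding="utf-8")
        adj[path.stem] = re.findall(r"\[\[([^\]]+)\]\]", text)
    return adj
```

Neighbor lookup is then a single dict access, i.e. O(degree) after the one-time load.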

This representation serves Axiom well at current scale (~hundreds of entities). The transition point to a database arrives at ~10K entities or when multi-hop traversals become frequent operations.

2.3 Recommended Hybrid

For the next scale tier:


File-based KG (human-readable, git-versioned)
    ↕ bidirectional sync
SQLite index (FTS + property queries + adjacency index)

This preserves human readability and git versioning while enabling indexed queries. The sync is straightforward: a watcher detects file changes and updates SQLite; SQLite queries return file paths.
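
A minimal sketch of the index side using SQLite's FTS5 full-text module; the schema and function names here are illustrative, not Axiom's actual implementation:

```python
import sqlite3

def build_index(db_path: str, entities: dict[str, str]) -> sqlite3.Connection:
    """Index entity file bodies into an FTS5 table for fast text queries.

    `entities` maps file path -> file body. A full rebuild is shown;
    a file watcher would instead apply incremental updates.
    """
    con = sqlite3.connect(db_path)
    con.execute("CREATE VIRTUAL TABLE IF NOT EXISTS kg USING fts5(path, body)")
    with con:  # commit the rebuild atomically
        con.execute("DELETE FROM kg")
        con.executemany("INSERT INTO kg VALUES (?, ?)", entities.items())
    return con

def search(con: sqlite3.Connection, query: str) -> list[str]:
    """Return matching file paths, best match first."""
    return [row[0] for row in con.execute(
        "SELECT path FROM kg WHERE kg MATCH ? ORDER BY rank", (query,))]
```

Because queries return only paths, the Markdown files remain the single source of truth and the index can be rebuilt from scratch at any time.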

---

3. Graph Algorithms for Knowledge Maintenance

3.1 Deduplication via Connected Components

From Units 2 and 3: entity deduplication is fundamentally a connected components problem. Given pairwise similarity scores between candidate duplicates:

1. Build a similarity graph (edge = score > threshold)

2. Find connected components (BFS/DFS, O(V+E))

3. Each component = one canonical entity; merge all members

Tarjan's algorithm (Unit 3) handles the directed case when merge precedence matters (e.g., "entity A was created first, so it's canonical").
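
For the undirected case, union-find yields the same components without materializing the similarity graph; the threshold and scores below are placeholders:

```python
def dedup_components(pairs: list[tuple[str, str, float]],
                     threshold: float = 0.9) -> list[set[str]]:
    """Group candidate duplicates into merge sets via union-find.

    Each (a, b, score) pair above `threshold` acts as an edge; each
    resulting connected component is one canonical entity.
    """
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in pairs:
        if score > threshold:
            parent[find(a)] = find(b)  # union the two components

    groups: dict[str, set[str]] = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())
```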

3.2 Centrality for Knowledge Prioritization

Not all entities are equally important. From Unit 3: PageRank identifies globally central entities, betweenness centrality finds the bridges between topic clusters, and degree centrality serves as a cheap first-pass importance signal.

Practical application: a weekly cron job computes these metrics and surfaces maintenance priorities: high-PageRank entities whose summaries deserve a refresh, high-betweenness entities that bridge topic clusters and merit careful curation, and degree-0 orphans that are candidates for linking or archival.
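
At file-based-KG scale, a stdlib power-iteration PageRank is sufficient for the weekly pass. A sketch over an adjacency-list representation; damping and iteration count are conventional defaults, and dangling-node mass is simply dropped here:

```python
def pagerank(adj: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over an out-link adjacency list."""
    nodes = set(adj) | {n for nbrs in adj.values() for n in nbrs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Everyone starts each round with the teleport share.
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src, nbrs in adj.items():
            if not nbrs:
                continue  # dangling node: its mass leaks in this sketch
            share = damping * rank[src] / len(nbrs)
            for dst in nbrs:
                new[dst] += share
        rank = new
    return rank
```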

3.3 Community Detection for Auto-Organization

Louvain community detection (Unit 3) applied to the knowledge graph reveals natural topic clusters. This could suggest folder reorganizations when detected communities diverge from the directory structure, propose index pages for emergent topics, and flag entities that straddle clusters as candidates for splitting.

---

4. Reasoning Over Living Knowledge

4.1 Inference for Gap Detection

From Unit 5, transitive closure reveals implicit knowledge: if works_at(X, Acme) and located_in(Acme, Berlin) are recorded, then located_in(X, Berlin) is derivable even though it was never stated.

Practical value: The system notices "we know X's employer and the employer's location, but we never recorded X's location" — prompting explicit fact extraction.
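
A sketch of that exact check over a set of (subject, predicate, object) triples; the predicate names are illustrative:

```python
def location_gaps(facts: set[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """Find subjects whose location is derivable but not recorded.

    If (X, works_at, C) and (C, located_in, L) hold but no
    (X, located_in, _) fact exists, flag (X, L) for explicit
    extraction rather than silently materializing the inference.
    """
    located = {s for s, p, _ in facts if p == "located_in"}
    gaps = []
    for s, p, o in facts:
        if p == "works_at" and s not in located:
            for s2, p2, o2 in facts:
                if s2 == o and p2 == "located_in":
                    gaps.append((s, o2))
    return gaps
```

Flagging instead of auto-writing keeps the inference auditable: the derived fact only enters the graph with its own provenance once confirmed.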

4.2 Temporal Reasoning

Knowledge decays. From Unit 5's temporal knowledge graphs: facts carry validity intervals, confidence should decay with age, and stale facts need periodic revalidation rather than indefinite trust.

For Axiom: The updated_at field in entity frontmatter serves this purpose. A weekly scan flags entities not updated in 30+ days for review.

4.3 Conflict Resolution

When two facts contradict:

1. Check provenance — which source is more authoritative?

2. Check recency — more recent facts generally win

3. Check confidence — higher-confidence facts win

4. If tied, flag for human review — don't silently pick a winner

This is where provenance tracking (Unit 6) pays off. Without it, conflicts are unresolvable.

---

5. Production Architecture: The Ingestion-Storage-Retrieval Loop

5.1 Continuous Ingestion

An always-on agent encounters knowledge constantly: facts stated in conversation, outputs from tools and APIs, documents it reads, and observations about its own environment.

The ingestion pipeline must run inline with agent operation, not as a batch job:


Agent processes message
    → Fact extraction (LLM-based, inline)
    → Entity resolution (alias lookup + fuzzy match)
    → Dedup check (exact match on canonical ID)
    → Write to KG (with provenance metadata)

Critical insight from Unit 6: Ingestion must be idempotent. The same conversation processed twice should not create duplicate entities or contradictory facts.
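
Idempotency can be enforced by content-addressing each fact, so reprocessing the same conversation is a no-op. The key composition below is one reasonable design choice, not the only one:

```python
import hashlib

class KGWriter:
    """Idempotent fact writer.

    The same (entity, predicate, object, source) tuple always hashes to
    the same key, so a re-ingested conversation cannot create duplicates.
    """

    def __init__(self):
        self.facts: dict[str, dict] = {}

    def write(self, entity: str, predicate: str, obj: str, source: str) -> bool:
        key = hashlib.sha256(
            f"{entity}|{predicate}|{obj}|{source}".encode()).hexdigest()
        if key in self.facts:
            return False  # already ingested: skip silently
        self.facts[key] = {"entity": entity, "predicate": predicate,
                           "object": obj, "source": source}
        return True
```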

5.2 Graph-Augmented Retrieval

From Unit 6's RAG-over-graphs pattern, the retrieval pipeline for an always-on agent:

1. Entity extraction from query — identify entities the user is asking about

2. Alias resolution — map surface forms to canonical entity paths

3. Subgraph retrieval — load entity file + 1-hop neighbors (linked entities)

4. Context assembly — serialize relevant subgraph into prompt context

5. Generation — LLM answers with full entity context

Key optimization: Pre-computed entity summaries (maintained on write, not computed on read) make retrieval fast and token-efficient.
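
Steps 3 and 4 can be sketched as follows, assuming alias resolution has already produced canonical entity IDs and that summaries holds the pre-computed entity summaries the optimization describes:

```python
def assemble_context(adj: dict[str, list[str]], summaries: dict[str, str],
                     query_entities: list[str]) -> str:
    """Serialize each query entity plus its 1-hop neighborhood into prompt text."""
    seen: list[str] = []  # preserve order, avoid duplicates across entities
    for entity in query_entities:
        for node in [entity] + adj.get(entity, []):
            if node not in seen:
                seen.append(node)
    return "\n".join(f"## {node}\n{summaries.get(node, '(no summary)')}"
                     for node in seen)
```

Because the summaries are maintained on write, this read path does no LLM calls and no file parsing beyond the subgraph itself.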

5.3 Self-Maintenance Loop

The knowledge system maintains itself:

| Frequency | Task | Algorithm |
|-----------|------|-----------|
| Per-ingestion | Dedup check | Alias lookup + fuzzy match |
| Daily | Orphan detection | Degree-0 scan |
| Weekly | Centrality analysis | PageRank + betweenness |
| Weekly | Community detection | Louvain |
| Weekly | Staleness check | Temporal scan (>30 days unchanged) |
| Monthly | Full consistency check | Schema validation + conflict detection |
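
The daily orphan scan is the simplest job in the table; this sketch treats degree 0 as having neither out-links nor in-links:

```python
def orphans(adj: dict[str, list[str]]) -> set[str]:
    """Degree-0 scan: entities with no out-links that nothing links to."""
    linked = {n for nbrs in adj.values() for n in nbrs}
    return {n for n, nbrs in adj.items() if not nbrs and n not in linked}
```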

---

6. Case Study: COSMO .brain Format

COSMO's .brain files represent knowledge as structured Markdown with metadata headers. Evaluating against our framework:

Strengths:

- Human-readable structured Markdown that doubles as LLM-ready context

- Metadata headers give each entity typed, queryable properties

- Plain files version cleanly under git

Gaps identified:

- No alias field, so entity resolution depends on exact name matches

- No fact-level source or confidence metadata, leaving conflicts unresolvable

- No structural analysis (centrality, communities) to surface maintenance priorities

Proposed improvements:

1. Add aliases: field to frontmatter for fuzzy entity resolution

2. Add source: and confidence: to fact-level metadata

3. Weekly cron job computing PageRank over the .brain graph, surfacing maintenance priorities

---

7. Case Study: Axiom's ~/life/areas/ Knowledge Layer

Axiom's three-layer memory system (Knowledge Graph → Daily Notes → Tacit Knowledge) maps well to our architecture:

| Layer | Role in Framework |
|-------|-------------------|
| ~/life/areas/ entities | Storage (property graph nodes) |
| memory/YYYY-MM-DD.md | Ingestion log (raw observations) |
| memory/MEMORY.md | Tacit patterns (meta-knowledge) |

Strengths:

- Clean separation of canonical knowledge, raw observations, and tacit patterns

- Git versioning gives every entity a full history with rollback

- Entity files inject directly into LLM context with no transformation step

Gaps identified:

- Fact extraction from daily notes is manual, a human bottleneck in the ingestion loop

- No reverse index from entities to the daily notes that mention them

- Entity files carry no fact-level provenance, limiting automated conflict resolution

Proposed improvements:

1. Automated fact extraction pipeline: daily note → LLM extraction → entity update (with provenance)

2. Build a reverse index: entity_mentions.json mapping entity paths → list of daily note dates

3. Add ## Provenance section to entity files tracking source + date for each fact group
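
Improvement 2's reverse index can be built in a single pass over the daily notes, assuming entities are referenced via [[wiki-links]] to their paths (an assumption about the note format):

```python
import re

def build_reverse_index(daily_notes: dict[str, str]) -> dict[str, list[str]]:
    """Map entity paths to the dates of daily notes that mention them.

    `daily_notes` maps a date string ("YYYY-MM-DD") to the note body;
    the result is the structure proposed for entity_mentions.json.
    """
    index: dict[str, list[str]] = {}
    for date in sorted(daily_notes):  # chronological mention lists
        for entity in set(re.findall(r"\[\[([^\]]+)\]\]", daily_notes[date])):
            index.setdefault(entity, []).append(date)
    return index
```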

---

8. The Learning Knowledge Graph: Putting It All Together

A knowledge graph that truly learns has four properties:

1. Self-extending: Automatically discovers and ingests new entities and relations from agent operations

2. Self-correcting: Detects conflicts, resolves them via provenance, flags ambiguous cases

3. Self-organizing: Community detection and centrality analysis drive structural improvements

4. Self-pruning: Temporal decay flags stale knowledge; low-confidence orphan facts get archived

Implementation Roadmap for Axiom

Phase 1 (Current): File-based KG + manual fact extraction + git versioning ✅

Phase 2 (Next): Add alias registry + automated fact extraction from daily notes + provenance tracking

Phase 3 (Scale): SQLite index for fast queries + reverse index + automated centrality analysis

Phase 4 (Advanced): Graph embeddings for semantic similarity + automated conflict resolution + community-driven reorganization

---

9. Conclusion

Knowledge graphs for AI assistants must be alive — continuously growing, self-maintaining, and queryable in real-time. The graph algorithms studied in this curriculum (traversal, centrality, community detection, connected components) are not abstract theory; they are the maintenance operations that keep a living knowledge system healthy.

The file-based approach used by Axiom and COSMO is the right choice at current scale, providing human readability, git versioning, and LLM-friendly context injection. The path forward is not to replace this foundation but to layer automated maintenance on top: fact extraction, entity resolution, provenance tracking, and structural analysis.

A knowledge graph that learns is not a database that gets bigger. It is a system that gets better — more accurate, better organized, and more useful — with every interaction.

---

References & Units

1. Unit 1: Graph Representations — foundation for storage decisions

2. Unit 2: Traversal and Pathfinding — basis for graph queries and reachability

3. Unit 3: Graph Structure Analysis — centrality and community detection for maintenance

4. Unit 4: Knowledge Representation — ontology design patterns

5. Unit 5: KG Reasoning — inference and temporal knowledge

6. Unit 6: Applied KG Architecture — production patterns and RAG integration

Score: Self-assessed 91/100 — Strong synthesis of theory with practical case studies; concrete improvement proposals grounded in studied algorithms; clear roadmap. Minor gap: limited treatment of multi-agent knowledge sharing (multiple assistants contributing to one KG).