---
name: qmd-memory-search
description: >
  Search Rob's memory system using qmd — a local hybrid search engine combining BM25, vector embeddings,
  and LLM reranking. Covers the indexed collection, context configuration, search modes, GPU/CPU behavior,
  reindexing, and integration patterns. Use this skill when searching or querying Rob's memory via qmd.
compatibility: Created for Zo Computer
metadata:
  author: rob.zo.computer
---

## Prerequisites

This skill builds on top of the `zo-memory` skill (`Skills/zo-memory/SKILL.md`), which documents
the memory directory structure, update mechanism, and schema principles. Read that first if you
need to understand what's being indexed.

## What is qmd

[qmd](https://github.com/tobi/qmd) is a local search engine for markdown documents. It combines:

- **BM25 full-text search** (SQLite FTS5)
- **Vector semantic search** (embeddinggemma-300M, GGUF)
- **LLM reranking** (qwen3-reranker-0.6b, GGUF)
- **Query expansion** (qmd-query-expansion-1.7B, fine-tuned GGUF)

All models run locally via node-llama-cpp. No external API calls.

## Installation

Installed from source at `tools/qmd/`:

```bash
cd tools/qmd && npm install && npm run build && npm link
```

The binary is then available globally as `qmd`.
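
A quick smoke test after linking:

```bash
# Should print index/model status rather than "command not found"
qmd status
```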

### Models

Three GGUF models are auto-downloaded on first use to `~/.cache/qmd/models/`:

| Model | Purpose | Size |
|-------|---------|------|
| embeddinggemma-300M-Q8_0 | Vector embeddings | ~300MB |
| qwen3-reranker-0.6b-q8_0 | Reranking | ~640MB |
| qmd-query-expansion-1.7B-q4_k_m | Query expansion | ~1.1GB |
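
To check which models have already been fetched (for example, before going offline), listing the cache directory is enough; this is plain shell, nothing qmd-specific:

```bash
# Models land here on first use; sizes should roughly match the table above
ls -lh ~/.cache/qmd/models/
```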

## Collection Setup

One collection indexes the entire memory directory:

```
Collection: memory
Path:       /home/workspace/memory
Pattern:    **/*.md
```
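
If the collection ever needs to be recreated (fresh machine, wiped cache), registration looks roughly like the sketch below. The `collection add` subcommand and its flags are an assumption modeled on the `context add` shape; confirm with `qmd --help` before relying on it.

```bash
# ASSUMPTION: exact subcommand/flags unverified; check `qmd --help` first
qmd collection add /home/workspace/memory --name memory --mask "**/*.md"
qmd status   # confirm the collection is registered and document counts look right
```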

### Contexts

qmd contexts provide descriptive metadata that gets returned with search results, helping LLMs
make better contextual choices. Current configuration:

| Path | Description |
|------|-------------|
| `*` (global) | Personal knowledge base for Rob's Zo AI assistant |
| `qmd://memory/` (root) | Full memory system description |
| `qmd://memory/identity` | Core identity: personality, preferences, work style |
| `qmd://memory/goals` | Active goals and aspirations |
| `qmd://memory/projects` | Projects — technical details and status |
| `qmd://memory/interests` | Intellectual interests, hobbies, research threads |
| `qmd://memory/knowledge` | Insights, lessons learned, experiences |
| `qmd://memory/people` | People, relationships, relevant details |

**When the memory directory structure changes** (new categories added, directories renamed), update
contexts to match:

```bash
qmd context add qmd://memory/<new-category> "Description of what this category contains"
qmd context list   # verify
```

## Search Modes

Three modes, in order of increasing quality (and cost):

### `qmd search` — BM25 keyword search
```bash
qmd search "fundraise series A" -n 5
```
Fast. Good for exact terms and known phrases. No GPU needed.

### `qmd vsearch` — Vector semantic search
```bash
qmd vsearch "how does rob approach new projects" -n 5
```
Finds conceptually related content even without keyword overlap. Requires embeddings to exist
(see `qmd embed` under Reindexing).

### `qmd query` — Hybrid with reranking (best quality)
```bash
qmd query "what motivates rob" -n 5
```
Full pipeline: query expansion → parallel BM25 + vector → RRF fusion → LLM reranking.
Uses all three models. Best results, ~2-5s on GPU.

### Useful flags

```bash
-n <num>            # number of results (default 5)
-c memory           # restrict to memory collection (only one, so optional)
--json              # structured JSON output (good for piping to scripts)
--full              # include full document content
--files             # output as docid,score,filepath,context
--min-score <num>   # minimum score threshold
--all               # return all matches (combine with --min-score)
```
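
For scripting, `--json` pairs naturally with `jq`. The field names in qmd's JSON output aren't documented here, so treat the `.score`/`.file` selectors below as assumptions; pretty-print the raw output first to see the real shape:

```bash
# Inspect the raw structure first
qmd query "what motivates rob" --json -n 3 | jq .

# ASSUMPTION: field names (.score, .file) may differ; adjust after inspecting
qmd query "what motivates rob" --json -n 3 | jq -r '.[] | "\(.score)\t\(.file)"'
```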

### Score interpretation

| Score | Meaning |
|-------|---------|
| 0.8–1.0 | Highly relevant |
| 0.5–0.8 | Moderately relevant |
| 0.2–0.5 | Somewhat relevant |
| < 0.2 | Low relevance |
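
Combining `--all` with `--min-score` turns this table into a filter. For example, to keep only hits at "moderately relevant" or above:

```bash
# Everything scoring 0.5 or higher, as docid,score,filepath,context lines
qmd search "fundraising" --all --files --min-score 0.5
```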

## GPU vs CPU

qmd works with or without a GPU. The behavior depends on what hardware is active.

### With GPU (e.g. H100)

All three models run on CUDA. `qmd query` completes in ~2-5s.

**Requirements for CUDA:**
- CUDA toolkit must be installed (`nvcc` in PATH)
- node-llama-cpp must be built with CUDA support:
  ```bash
  export PATH=/usr/local/cuda-12.8/bin:$PATH
  cd /home/workspace/tools/qmd
  npx --no-install node-llama-cpp source build --gpu cuda
  ```
- On this machine, CUDA 12.8 is installed at `/usr/local/cuda-12.8/`

Verify with `qmd status` — look for `GPU: cuda (offloading: yes)`.
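
A quick way to check the CUDA prerequisites before attempting a rebuild:

```bash
# Is a GPU visible to the driver at all?
nvidia-smi

# Is the CUDA toolkit on PATH? (needed for the node-llama-cpp source build)
export PATH=/usr/local/cuda-12.8/bin:$PATH
command -v nvcc && nvcc --version
```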

### Without GPU (CPU fallback)

node-llama-cpp falls back to CPU automatically. All commands still work, just slower:
- `qmd search` — unaffected (BM25 is pure SQLite)
- `qmd vsearch` — slower embedding inference but still functional
- `qmd query` — noticeably slower due to reranking + query expansion

**If you switch away from GPU hardware**, node-llama-cpp will detect the missing CUDA and fall
back to its CPU build. No reconfiguration needed — it handles this transparently.

**If you switch back to GPU**, you may need to rebuild:
```bash
export PATH=/usr/local/cuda-12.8/bin:$PATH
cd /home/workspace/tools/qmd
npx --no-install node-llama-cpp source build --gpu cuda
```

### Practical guidance

- For quick keyword lookups (`qmd search`), CPU is fine — always fast
- For semantic queries when no GPU is available, `qmd vsearch` is the best tradeoff
- `qmd query` (full hybrid) is best reserved for when a GPU is active, or when you need
  the highest quality results and can tolerate a few extra seconds; a helper encoding this
  policy is sketched below
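
A small wrapper that encodes this policy. Using `nvidia-smi` as the GPU probe is an assumption; any detection method works:

```bash
# msearch: pick the best affordable qmd mode automatically.
# GPU detection via nvidia-smi is an assumption; swap in your own probe.
msearch() {
  if command -v nvidia-smi >/dev/null && nvidia-smi >/dev/null 2>&1; then
    qmd query "$@"      # GPU active: the full hybrid pipeline is cheap
  else
    qmd vsearch "$@"    # CPU only: semantic search is the best tradeoff
  fi
}

msearch "what motivates rob" -n 5
```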

## Reindexing

After memory files are added, edited, or restructured:

```bash
qmd update          # re-scan files, update FTS index
qmd embed           # regenerate vector embeddings for changed docs
qmd embed -f        # force re-embed everything (after major restructuring)
```

The memory `_update.py` pipeline (see `Skills/zo-memory/SKILL.md`) modifies files asynchronously.
If you need fresh search results immediately after a memory update, run `qmd update && qmd embed`.
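
A convenience wrapper for the "update then embed" pair (the name `memory-refresh` is just a suggestion):

```bash
# memory-refresh: make search results reflect the latest memory writes
memory-refresh() {
  qmd update && qmd embed
}
```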

## MCP Server

qmd exposes an MCP server for tighter integration:

```bash
qmd mcp              # stdio transport (subprocess per client)
qmd mcp --http       # HTTP transport at localhost:8181 (shared, long-lived)
qmd mcp --http --daemon  # background daemon
```

Tools exposed: `qmd_search`, `qmd_vector_search`, `qmd_deep_search`, `qmd_get`, `qmd_multi_get`,
`qmd_status`.
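
For stdio-based MCP clients, registration typically looks like the sketch below. The `mcpServers` config shape is an assumption (it is common across MCP clients but not universal); consult your client's docs:

```bash
# ASSUMPTION: "mcpServers" is your client's config shape; the filename is illustrative
cat > mcp-config.json <<'EOF'
{
  "mcpServers": {
    "qmd": { "command": "qmd", "args": ["mcp"] }
  }
}
EOF
```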

## Data Storage

- Index: `~/.cache/qmd/index.sqlite`
- Models: `~/.cache/qmd/models/`
- Source code: `tools/qmd/`
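
Everything lives under one cache directory, so disk-usage checks and backups are simple. The backup destination below is just an example path:

```bash
# How much space the index and models take
du -sh ~/.cache/qmd ~/.cache/qmd/index.sqlite ~/.cache/qmd/models

# The index is a single SQLite file; copying it while qmd is idle is a valid backup
cp ~/.cache/qmd/index.sqlite /home/workspace/qmd-index.backup.sqlite
```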

## Example Usage from Zo

When a question about Rob's background, preferences, or history comes up and context is unclear:

```bash
# Quick keyword check
qmd search "pharmacogenomics" -n 3

# Semantic question
qmd vsearch "what is rob's communication style" -n 3

# Best quality answer
qmd query "what projects is rob actively working on" --json -n 5
```

When integrating with scripts or agentic workflows:

```bash
# Get structured results
qmd query "rob's fundraising strategy" --json -n 10

# Get full document content for a known file
qmd get "memory/goals/active.md" --full

# All results above a threshold
qmd search "stripe" --all --files --min-score 0.3
```
