---
name: qmd-memory-search
description: >
  Search Rob's memory system using qmd — a local hybrid search engine combining BM25, vector embeddings,
  and LLM reranking. Covers the indexed collection, context configuration, search modes, GPU/CPU behavior,
  reindexing, and integration patterns. Use this skill when searching or querying Rob's memory via qmd.
compatibility: Created for Zo Computer
metadata:
  author: rob.zo.computer
---

## Prerequisites

This skill builds on top of the `zo-memory` skill (`Skills/zo-memory/SKILL.md`), which documents
the memory directory structure, update mechanism, and schema principles. Read that first if you
need to understand what's being indexed.

## What is qmd

[qmd](https://github.com/tobi/qmd) is a local search engine for markdown documents. It combines:

- **BM25 full-text search** (SQLite FTS5)
- **Vector semantic search** (embeddinggemma-300M, GGUF)
- **LLM reranking** (qwen3-reranker-0.6b, GGUF)
- **Query expansion** (qmd-query-expansion-1.7B, fine-tuned GGUF)

All models run locally via node-llama-cpp. No external API calls.

## Installation

Installed from source at `tools/qmd/`:

```bash
cd tools/qmd && npm install && npm run build && npm link
```

The binary is then available globally as `qmd`.
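
A quick smoke test after linking:

```bash
# Should print index/model status rather than "command not found"
qmd status
```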

### Models

Three GGUF models are auto-downloaded on first use to `~/.cache/qmd/models/`:

| Model | Purpose | Size |
|-------|---------|------|
| embeddinggemma-300M-Q8_0 | Vector embeddings | ~300MB |
| qwen3-reranker-0.6b-q8_0 | Reranking | ~640MB |
| qmd-query-expansion-1.7B-q4_k_m | Query expansion | ~1.1GB |
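
To check which models have already been fetched (for example, before going offline), listing the cache directory is enough; this is plain shell, nothing qmd-specific:

```bash
# Models land here on first use; sizes should roughly match the table above
ls -lh ~/.cache/qmd/models/
```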

## Collection Setup

One collection indexes the entire memory directory:

```
Collection: memory
Path:       /home/workspace/memory
Pattern:    **/*.md
```
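
If the collection ever needs to be recreated (fresh machine, wiped cache), registration looks roughly like the sketch below. The `collection add` subcommand and its flags are an assumption modeled on the `context add` shape; confirm with `qmd --help` before relying on it.

```bash
# ASSUMPTION: exact subcommand/flags unverified; check `qmd --help` first
qmd collection add /home/workspace/memory --name memory --mask "**/*.md"
qmd status   # confirm the collection is registered and document counts look right
```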

### Contexts

qmd contexts provide descriptive metadata that gets returned with search results, helping LLMs
make better contextual choices. Current configuration:

| Path | Description |
|------|-------------|
| `*` (global) | Personal knowledge base for Rob's Zo AI assistant |
| `qmd://memory/` (root) | Full memory system description |
| `qmd://memory/identity` | Core identity: personality, preferences, work style |
| `qmd://memory/goals` | Active goals and aspirations |
| `qmd://memory/projects` | Projects — technical details and status |
| `qmd://memory/interests` | Intellectual interests, hobbies, research threads |
| `qmd://memory/knowledge` | Insights, lessons learned, experiences |
| `qmd://memory/people` | People, relationships, relevant details |

**When the memory directory structure changes** (new categories added, directories renamed), update
contexts to match:

```bash
qmd context add qmd://memory/<new-category> "Description of what this category contains"
qmd context list   # verify
```

## Search Modes

Three modes, in order of increasing quality (and cost):

### `qmd search` — BM25 keyword search
```bash
qmd search "fundraise series A" -n 5
```
Fast. Good for exact terms and known phrases. No GPU needed.

### `qmd vsearch` — Vector semantic search
```bash
qmd vsearch "how does rob approach new projects" -n 5
```
Finds conceptually related content even without keyword overlap. Requires embeddings to exist
(see `qmd embed` under Reindexing).

### `qmd query` — Hybrid with reranking (best quality)
```bash
qmd query "what motivates rob" -n 5
```
Full pipeline: query expansion → parallel BM25 + vector → RRF fusion → LLM reranking.
Uses all three models. Best results, ~2-5s on GPU.

### Useful flags

```bash
-n <num>            # number of results (default 5)
-c memory           # restrict to memory collection (only one, so optional)
--json              # structured JSON output (good for piping to scripts)
--full              # include full document content
--files             # output as docid,score,filepath,context
--min-score <num>   # minimum score threshold
--all               # return all matches (combine with --min-score)
```
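
For scripting, `--json` pairs naturally with `jq`. The field names in qmd's JSON output aren't documented here, so treat the `.score`/`.file` selectors below as assumptions; pretty-print the raw output first to see the real shape:

```bash
# Inspect the raw structure first
qmd query "what motivates rob" --json -n 3 | jq .

# ASSUMPTION: field names (.score, .file) may differ; adjust after inspecting
qmd query "what motivates rob" --json -n 3 | jq -r '.[] | "\(.score)\t\(.file)"'
```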

### Score interpretation

| Score | Meaning |
|-------|---------|
| 0.8–1.0 | Highly relevant |
| 0.5–0.8 | Moderately relevant |
| 0.2–0.5 | Somewhat relevant |
| < 0.2 | Low relevance |
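
Combining `--all` with `--min-score` turns this table into a filter. For example, to keep only hits at "moderately relevant" or above:

```bash
# Everything scoring 0.5 or higher, as docid,score,filepath,context lines
qmd search "fundraising" --all --files --min-score 0.5
```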

## GPU vs CPU

qmd works with or without a GPU. The behavior depends on what hardware is active.

### With GPU (e.g. H100)

All three models run on CUDA. `qmd query` completes in ~2-5s.

**Requirements for CUDA:**
- CUDA toolkit must be installed (`nvcc` in PATH)
- node-llama-cpp must be built with CUDA support:
  ```bash
  export PATH=/usr/local/cuda-12.8/bin:$PATH
  cd /home/workspace/tools/qmd
  npx --no-install node-llama-cpp source build --gpu cuda
  ```
- On this machine, CUDA 12.8 is installed at `/usr/local/cuda-12.8/`

Verify with `qmd status` — look for `GPU: cuda (offloading: yes)`.
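
A quick way to check the CUDA prerequisites before attempting a rebuild:

```bash
# Is a GPU visible to the driver at all?
nvidia-smi

# Is the CUDA toolkit on PATH? (needed for the node-llama-cpp source build)
export PATH=/usr/local/cuda-12.8/bin:$PATH
command -v nvcc && nvcc --version
```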

### Without GPU (CPU fallback)

node-llama-cpp falls back to CPU automatically. All commands still work, just slower:
- `qmd search` — unaffected (BM25 is pure SQLite)
- `qmd vsearch` — slower embedding inference but still functional
- `qmd query` — noticeably slower due to reranking + query expansion

**If you switch away from GPU hardware**, node-llama-cpp will detect the missing CUDA and fall
back to its CPU build. No reconfiguration needed — it handles this transparently.

**If you switch back to GPU**, you may need to rebuild:
```bash
export PATH=/usr/local/cuda-12.8/bin:$PATH
cd /home/workspace/tools/qmd
npx --no-install node-llama-cpp source build --gpu cuda
```

### Practical guidance

- For quick keyword lookups (`qmd search`), CPU is fine — always fast
- For semantic queries when no GPU is available, `qmd vsearch` is the best tradeoff
- `qmd query` (full hybrid) is best reserved for when a GPU is active, or when you need
  the highest quality results and can tolerate a few extra seconds; a helper encoding this
  policy is sketched below
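
A small wrapper that encodes this policy. Using `nvidia-smi` as the GPU probe is an assumption; any detection method works:

```bash
# msearch: pick the best affordable qmd mode automatically.
# GPU detection via nvidia-smi is an assumption; swap in your own probe.
msearch() {
  if command -v nvidia-smi >/dev/null && nvidia-smi >/dev/null 2>&1; then
    qmd query "$@"      # GPU active: the full hybrid pipeline is cheap
  else
    qmd vsearch "$@"    # CPU only: semantic search is the best tradeoff
  fi
}

msearch "what motivates rob" -n 5
```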

## Reindexing

After memory files are added, edited, or restructured:

```bash
qmd update          # re-scan files, update FTS index
qmd embed           # regenerate vector embeddings for changed docs
qmd embed -f        # force re-embed everything (after major restructuring)
```

The memory `_update.py` pipeline (see `Skills/zo-memory/SKILL.md`) modifies files asynchronously.
If you need fresh search results immediately after a memory update, run `qmd update && qmd embed`.
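
A convenience wrapper for the "update then embed" pair (the name `memory-refresh` is just a suggestion):

```bash
# memory-refresh: make search results reflect the latest memory writes
memory-refresh() {
  qmd update && qmd embed
}
```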

## MCP Server

qmd exposes an MCP server for tighter integration:

```bash
qmd mcp              # stdio transport (subprocess per client)
qmd mcp --http       # HTTP transport at localhost:8181 (shared, long-lived)
qmd mcp --http --daemon  # background daemon
```

Tools exposed: `qmd_search`, `qmd_vector_search`, `qmd_deep_search`, `qmd_get`, `qmd_multi_get`,
`qmd_status`.
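
For stdio-based MCP clients, registration typically looks like the sketch below. The `mcpServers` config shape is an assumption (it is common across MCP clients but not universal); consult your client's docs:

```bash
# ASSUMPTION: "mcpServers" is your client's config shape; the filename is illustrative
cat > mcp-config.json <<'EOF'
{
  "mcpServers": {
    "qmd": { "command": "qmd", "args": ["mcp"] }
  }
}
EOF
```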

## Data Storage

- Index: `~/.cache/qmd/index.sqlite`
- Models: `~/.cache/qmd/models/`
- Source code: `tools/qmd/`
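
Everything lives under one cache directory, so disk-usage checks and backups are simple. The backup destination below is just an example path:

```bash
# How much space the index and models take
du -sh ~/.cache/qmd ~/.cache/qmd/index.sqlite ~/.cache/qmd/models

# The index is a single SQLite file; copying it while qmd is idle is a valid backup
cp ~/.cache/qmd/index.sqlite /home/workspace/qmd-index.backup.sqlite
```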

## Example Usage from Zo

When a question about Rob's background, preferences, or history comes up and context is unclear:

```bash
# Quick keyword check
qmd search "pharmacogenomics" -n 3

# Semantic question
qmd vsearch "what is rob's communication style" -n 3

# Best quality answer
qmd query "what projects is rob actively working on" --json -n 5
```

When integrating with scripts or agentic workflows:

```bash
# Get structured results
qmd query "rob's fundraising strategy" --json -n 10

# Get full document content for a known file
qmd get "memory/goals/active.md" --full

# All results above a threshold
qmd search "stripe" --all --files --min-score 0.3
```
