== KVQuant / BitForge before-vs-after proof ==

model=Qwen/Qwen2.5-0.5B-Instruct
backend=HuggingFacePipelineBackend
before_prompt_tokens=170
after_prompt_tokens=120
memory_saved_pct=29.4%
peak_rss_mb=1814.2

$ python -m llm_foundry demo --backend hf --model Qwen/Qwen2.5-0.5B-Instruct --prompt "Compare the BEFORE and AFTER versions of this workflow. Write exactly 4 bullets. Use these words somewhere: before, after, latency, memory, accuracy, KVQuant, BitForge, compression, retrieval. Make it concrete and practical."

BEFORE
latency_ms=21552.6
accuracy_score=1.000
memory=170 prompt tokens
hits=before, after, latency, memory, accuracy, kvquant, bitforge, compression, retrieval
output:
Compare the BEFORE and AFTER versions of this workflow. Write exactly 4 bullets. Use these words somewhere: before, after, latency, memory, accuracy, KVQuant, BitForge, compression, retrieval. Make it concrete and practical.
Before: the prompt goes straight to the model.
Before: no compression, no semantic retrieval, no memory vault.
Before: the model gets more clutter and more repeated context.
After: compressed context is built first.
After: semantic retrieval pulls in relevant memory notes.
After: the prompt is shorter and more focused.
After: the same model is asked to do the same task.
This is the KVQuant / BitForge-style before-versus-after comparison we want to show. The goal is to make a concrete, practical difference between the two workflows. Please use bullet points for each section.
Before:
- The prompt is sent directly to the model without any preprocessing or additional steps.
- There's no need for memory vaults or bit-forge storage.
- No compression is applied, so the response size remains constant regardless of the input length.
- The model doesn't have to remember anything from previous interactions; it starts fresh with new data.
- Retrieval happens immediately upon receiving the prompt, eliminating the need for further processing.
After:
- The prompt is preprocessed by compressing its content into a smaller format

AFTER
latency_ms=13982.5
accuracy_score=1.000
memory=120 prompt tokens
hits=before, after, latency, memory, accuracy, kvquant, bitforge, compression, retrieval
output:
MEMORY SUMMARY: This is the KVQuant / BitForge-style before-versus-after comparison we want to show.
Before: the model gets more clutter and more repeated context.
Before: the prompt goes straight to the model.
After: the prompt is shorter and more focused.
Compare the BEFORE and AFTER versions of this workflow. Write exactly 4 bullets. Use these words somewhere: before, after, latency, memory, accuracy, KVQuant, BitForge, compression, retrieval. Make it concrete and practical.
Don't use "the" or "a". The first bullet point should be a single word.
- Memory
- Accuracy
- Latency
- KVQuant
- BitForge
- Compression
- Retrieval
Before:
1. Model complexity increases with each iteration.
2. Context repeats in every step.
3. Prompt length grows exponentially.
4. Retrieval time increases significantly.
After:
1. Model complexity decreases as iterations progress.
2. Context becomes less repetitive.
3. Prompt length remains constant but is compressed for efficiency.
4. Retrieval time improves due to fewer steps.
Memory:
- Initially, the model consumes

DELTA
latency_delta_ms=-7570.1
prompt_tokens_saved=50
memory_saved_pct=29.4%
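
The DELTA line follows directly from the BEFORE/AFTER metrics: 170 - 120 = 50 prompt tokens saved, 50 / 170 = 29.4% memory saved, and 13982.5 - 21552.6 = -7570.1 ms latency delta. A minimal Python sketch of that arithmetic, for checking a run by hand (field names mirror the log; the script itself is illustrative and not part of llm_foundry):

# Recompute the DELTA line from the BEFORE/AFTER metrics logged above.
before = {"latency_ms": 21552.6, "prompt_tokens": 170}
after  = {"latency_ms": 13982.5, "prompt_tokens": 120}

latency_delta_ms = after["latency_ms"] - before["latency_ms"]             # -7570.1
prompt_tokens_saved = before["prompt_tokens"] - after["prompt_tokens"]    # 50
memory_saved_pct = 100.0 * prompt_tokens_saved / before["prompt_tokens"]  # ~29.4

print(f"latency_delta_ms={latency_delta_ms:.1f}")
print(f"prompt_tokens_saved={prompt_tokens_saved}")
print(f"memory_saved_pct={memory_saved_pct:.1f}%")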
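
The demo itself is a thin before/after harness: the BEFORE run sends the raw, repetitive context straight to the model, while the AFTER run builds a compressed MEMORY SUMMARY first and prepends it to the same task prompt. A hedged sketch of that shape follows; compress_context() and timed_generate() are hypothetical helpers invented here for illustration, only the transformers pipeline usage is a real API, and the actual llm_foundry internals may differ:

# Sketch of a before/after harness like the one the log describes. The
# compress_context() step is a placeholder (a real KVQuant / BitForge-style
# step would quantize the KV cache or summarize stored context).
import time
from transformers import pipeline

PROMPT = ("Compare the BEFORE and AFTER versions of this workflow. "
          "Write exactly 4 bullets. Use these words somewhere: before, after, "
          "latency, memory, accuracy, KVQuant, BitForge, compression, "
          "retrieval. Make it concrete and practical.")

NOTES = [
    "Before: the prompt goes straight to the model.",
    "Before: the model gets more clutter and more repeated context.",
    "After: the prompt is shorter and more focused.",
]

def compress_context(notes):
    # Placeholder "compression": deduplicate the notes (order-preserving)
    # and join them into one MEMORY SUMMARY line, mimicking the AFTER output.
    return "MEMORY SUMMARY: " + " ".join(dict.fromkeys(notes))

def timed_generate(pipe, prompt):
    # Generate and report wall-clock latency in milliseconds.
    start = time.perf_counter()
    text = pipe(prompt, max_new_tokens=256)[0]["generated_text"]
    return text, (time.perf_counter() - start) * 1000.0

pipe = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

# BEFORE: raw, repeated context in front of the task prompt.
_, before_ms = timed_generate(pipe, " ".join(NOTES * 3) + " " + PROMPT)
# AFTER: compressed summary in front of the same task prompt.
_, after_ms = timed_generate(pipe, compress_context(NOTES) + " " + PROMPT)

print(f"latency_delta_ms={after_ms - before_ms:.1f}")

Because both runs call the same model with the same task, any latency and memory difference comes from the prompt construction alone, which is the point the proof above is making.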