---
name: refresh-apple-health
description: Skill for refreshing Apple Health data export, rebuilding the DuckDB database, and regenerating all downstream health visualizations and notebooks in rc-data.
compatibility: Created for Zo Computer
metadata:
  author: rob.zo.computer
---

# Refresh Apple Health Data

This skill handles the complete workflow for updating Apple Health data when you get a fresh export from your phone.

## Motivation

Apple Health exports are created manually on your iPhone (in the Health app: Health → Profile → Export All Health Data). Each export contains all your historical health data up to that point. To keep the analysis notebooks up to date, you need to:

1. **Rebuild the database** — The ingest script transforms the XML export into a queryable DuckDB database
2. **Regenerate visualizations** — The rc-data notebooks query DuckDB and generate static JSON files that are bundled with the site

This skill automates both steps, ensuring fresh data flows from your iPhone to all health analysis notebooks.

---

## Prerequisites

- A new Apple Health export zip file (from iPhone)
- The dataset directory exists at `zo-data/apple_health/`
- The rc-data site exists at `rc-data/`

---

## Workflow

### Step 1: Replace the source zip

Copy the newest export zip into the dataset source location:

```bash
cp "/path/to/new-export.zip" /home/workspace/zo-data/apple_health/source/export.zip
```

### Step 2: Clear and re-extract the export

The dataset expects the extracted XML under `zo-data/apple_health/source/extracted/apple_health_export/`.

```bash
rm -rf /home/workspace/zo-data/apple_health/source/extracted
mkdir -p /home/workspace/zo-data/apple_health/source/extracted
unzip -o /home/workspace/zo-data/apple_health/source/export.zip -d /home/workspace/zo-data/apple_health/source/extracted
```

The export structure typically contains:

```text
apple_health_export/
├── export.xml
├── export_cda.xml
└── workout-routes/
    └── route_*.gpx
```
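A quick pre-flight check can confirm that a zip has this layout before the previous extraction is removed. The following is a minimal sketch using only the Python standard library; the helper name `export_xml_present` is hypothetical and is not part of `ingest.py`.

```python
import zipfile

# Path inside the archive that the dataset expects after extraction.
EXPECTED_XML = "apple_health_export/export.xml"


def export_xml_present(zip_path: str) -> bool:
    """Return True if the archive contains apple_health_export/export.xml."""
    with zipfile.ZipFile(zip_path) as archive:
        return EXPECTED_XML in archive.namelist()
```

For example, `export_xml_present("/home/workspace/zo-data/apple_health/source/export.zip")` should return `True` for a standard export. A `False` result suggests the zip was repackaged or is not an Apple Health export.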

### Step 3: Rebuild the DuckDB database

Run the ingest script from anywhere using its absolute path:

```bash
python3 /home/workspace/zo-data/apple_health/ingest/ingest.py
```

Notes:
- The script rebuilds `/home/workspace/zo-data/apple_health/data.duckdb` from scratch on every run.
- Large exports may take a few minutes.

**Expected output:**

```text
Parsing /home/workspace/zo-data/apple_health/source/extracted/apple_health_export/export.xml...
Extracting records...
  Found X quantity records
  Found Y sleep records
Extracting workouts...
  Found Z workouts
Extracting activity summaries...
  Found N activity summaries
Creating records table...
Creating sleep table...
Creating workouts table...
Creating activity_summaries table...
Created /home/workspace/zo-data/apple_health/data.duckdb
```

### Step 4: Regenerate rc-data health JSON

```bash
python3 /home/workspace/rc-data/src/data/health/overview.py
```

This reads the refreshed DuckDB database and rewrites:
- `rc-data/src/data/health/overview.json`

**Expected output:**

```text
Connecting to /home/workspace/zo-data/apple_health/data.duckdb...
Latest data date: YYYY-MM-DD
Generating overview data (7 days)...
Generating overview data (14 days)...
Generating overview data (30 days)...
Generating overview data (365 days)...
Generating sleep stages data (7 days)...
Generating sleep stages data (14 days)...
Generating sleep stages data (30 days)...
Done! Wrote /home/workspace/rc-data/src/data/health/overview.json
```

### Step 5: Verify the refresh

Quick checks:

```bash
duckdb /home/workspace/zo-data/apple_health/data.duckdb -c "SHOW TABLES"
python3 - <<'PY'
import json
from pathlib import Path
p = Path('/home/workspace/rc-data/src/data/health/overview.json')
data = json.loads(p.read_text())
print(data['latestDataDate'])
print(data['generatedAt'])
PY
```

Success means:
- `data.duckdb` contains `records`, `sleep`, `workouts`, and `activity_summaries`
- `overview.json` has a fresh `generatedAt`
- `latestDataDate` matches the new export's most recent day

---

## Database Schema

### `records` Table

| Column | Type | Description |
|--------|------|-------------|
| type | VARCHAR | Measurement type (step_count, heart_rate, etc.) |
| source_name | VARCHAR | Device/app that recorded it |
| unit | VARCHAR | Unit of measurement |
| value | DOUBLE | The numeric value |
| start_date | TIMESTAMP WITH TIME ZONE | When measurement started |
| end_date | TIMESTAMP WITH TIME ZONE | When measurement ended |
| creation_date | TIMESTAMP WITH TIME ZONE | When record was created |

Common record types:
- `step_count` — Steps taken
- `distance_walking_running` — Distance in meters
- `flights_climbed` — Floors climbed
- `active_energy_burned` — Calories burned
- `resting_heart_rate` — Heart rate at rest
- `heart_rate` — Real-time heart rate samples

### `sleep` Table

| Column | Type | Description |
|--------|------|-------------|
| stage | VARCHAR | Sleep stage (core, deep, rem, awake, asleep, in_bed) |
| source_name | VARCHAR | Device (Eight Sleep, Connect, iPhone) |
| start_date | TIMESTAMP WITH TIME ZONE | Sleep stage start |
| end_date | TIMESTAMP WITH TIME ZONE | Sleep stage end |
| hours | DOUBLE | Duration in hours |

Sleep stage values:
- `core` — Light sleep
- `deep` — Deep/slow-wave sleep
- `rem` — Rapid eye movement sleep
- `awake` — Periods of wakefulness
- `asleep` — Unspecified sleep
- `in_bed` — Time in bed before, during, and after sleep
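To make the `stage` and `hours` columns concrete, here is a hedged sketch of how one sleep entry from `export.xml` could be turned into a row. The mapping from HealthKit category values to stage names is an assumption inferred from the list above, and `sleep_row` is a hypothetical helper; the real `ingest.py` may normalize values differently.

```python
from datetime import datetime

# Date layout used by export.xml attributes, e.g. "2024-01-15 23:10:00 -0800".
DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"

# Assumed mapping from HealthKit values to the stage names listed above.
STAGE_NAMES = {
    "HKCategoryValueSleepAnalysisInBed": "in_bed",
    "HKCategoryValueSleepAnalysisAsleep": "asleep",
    "HKCategoryValueSleepAnalysisAsleepUnspecified": "asleep",
    "HKCategoryValueSleepAnalysisAsleepCore": "core",
    "HKCategoryValueSleepAnalysisAsleepDeep": "deep",
    "HKCategoryValueSleepAnalysisAsleepREM": "rem",
    "HKCategoryValueSleepAnalysisAwake": "awake",
}


def sleep_row(value: str, start_date: str, end_date: str) -> tuple[str, float]:
    """Return (stage, hours) for one sleep-analysis entry."""
    start = datetime.strptime(start_date, DATE_FORMAT)
    end = datetime.strptime(end_date, DATE_FORMAT)
    hours = (end - start).total_seconds() / 3600
    # Unrecognized values are labeled "unknown" rather than dropped.
    return STAGE_NAMES.get(value, "unknown"), hours
```

Under these assumptions, a deep-sleep entry running from 23:00 to 00:30 the next day yields `("deep", 1.5)`.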

### `workouts` Table

| Column | Type | Description |
|--------|------|-------------|
| activity_type | VARCHAR | Type of workout (running, walking, swimming, etc.) |
| duration | DOUBLE | Duration in minutes |
| total_distance | DOUBLE | Distance in meters |
| total_energy_burned | DOUBLE | Calories burned |
| source_name | VARCHAR | Device/app |
| start_date | TIMESTAMP | Workout start |
| end_date | TIMESTAMP | Workout end |

### `activity_summaries` Table

Daily Apple Watch ring data (Move, Exercise, Stand).

---

## Troubleshooting

### Ingest script fails with XML parsing errors

- Ensure the export is fully extracted and `export.xml` exists
- Check that the export is a standard Apple Health export (not modified)
- Large exports may take several minutes to parse, so a slow run is not necessarily a failure

### No sleep data in the database

- Newer iOS versions include sleep stage data (Core/Deep/REM)
- Older exports may only have basic InBed/Asleep entries
- Check the source XML for `HKCategoryTypeIdentifierSleepAnalysis` entries
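One way to run that check without opening the full XML in an editor is to stream the file and count sleep entries by value. This is a minimal sketch using the standard library; `count_sleep_values` is a hypothetical helper, and it assumes sleep entries are `Record` elements with `type` and `value` attributes, as in a standard export.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SLEEP_TYPE = "HKCategoryTypeIdentifierSleepAnalysis"


def count_sleep_values(xml_path: str) -> Counter:
    """Count sleep-analysis Record elements by their raw value attribute."""
    counts = Counter()
    # iterparse streams the file, so a large export is never fully loaded.
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == "Record" and elem.get("type") == SLEEP_TYPE:
            counts[elem.get("value")] += 1
        elem.clear()  # drop the element's content once it has been inspected
    return counts
```

An empty result means the export has no sleep entries at all. A result containing only `InBed` or `Asleep` values points to an older export without stage data.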

### rc-data site shows old data

- Confirm `overview.py` ran successfully
- Rebuild the site if needed, since the JSON is bundled at build time, then clear the browser cache
- Check the `generatedAt` timestamp and `latestDataDate` in the JSON file

### Missing record types

Apple Health records depend on which apps and devices are connected. Make sure:
- Apple Watch is paired and syncing
- Health app has permission to read from connected apps
- The export includes data from the full date range needed
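To see which record types an export actually contains and what date range each one covers, the XML can be scanned directly. The sketch below is illustrative only: `record_type_ranges` is a hypothetical helper, and it compares the local calendar date in each `startDate` attribute as a string, ignoring time zone offsets.

```python
import xml.etree.ElementTree as ET


def record_type_ranges(xml_path: str) -> dict[str, tuple[int, str, str]]:
    """Map each Record type to (count, earliest day, latest day)."""
    ranges: dict[str, tuple[int, str, str]] = {}
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == "Record":
            record_type = elem.get("type")
            # "YYYY-MM-DD" prefixes sort correctly as plain strings.
            day = elem.get("startDate", "")[:10]
            count, first, last = ranges.get(record_type, (0, day, day))
            ranges[record_type] = (count + 1, min(first, day), max(last, day))
        elem.clear()  # keep memory use low on large exports
    return ranges
```

A type that is absent from the result was never written to the export, which usually points to a device or permission issue rather than an ingest problem.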

---

## Related Files

| File | Purpose |
|------|---------|
| `zo-data/apple_health/ingest/ingest.py` | Ingest script that rebuilds the database |
| `rc-data/src/data/health/overview.py` | Generates JSON for health notebooks |
| `rc-data/src/pages/health/Overview.tsx` | Health overview notebook page |
| `Skills/personal-data/SKILL.md` | Overall personal data documentation |

---

## Future Enhancements

Potential additions to this skill:
- Auto-detect new exports in a watched directory
- Add sleep analysis notebooks (sleep quality, trends)
- Add heart rate variability analysis
- Add workout performance tracking
- Integration with other health datasets (Function Health, 23andMe)