# SNAP & WIC Database Schema

**Database File:** `snap-wic-data.db`  
**Format:** SQLite 3  
**Size:** ~92 KB  
**Tables:** 7

---

## Table Summaries

### 1. **snap_annual_national** (60 rows, 1969-2024)
National-level SNAP participation and costs by fiscal year.

**Columns:**
- `fiscal_year` (TEXT) - Fiscal year (1969-2024)
- `average_participation_thousands` (REAL) - Average monthly participants in thousands
- `average_benefit_per_person_1_dollars` (REAL) - Average monthly benefit per person in dollars
- `total_benefits_millions_of_dollars` (REAL) - Total benefits paid in millions
- `all_other_costs_2_millions_of_dollars` (REAL) - Administrative and other costs in millions
- `total_costs_millions_of_dollars` (REAL) - Total program costs in millions

**Use for:** Historical trends in SNAP participation and spending over 55+ years

**Sample query:**
```sql
SELECT fiscal_year, average_participation_thousands, 
       average_benefit_per_person_1_dollars
FROM snap_annual_national 
WHERE fiscal_year >= '2020'
ORDER BY fiscal_year;
```

---

### 2. **wic_annual_national** (55 rows, 1974-2024)
National-level WIC participation and costs by fiscal year since program inception.

**Columns:**
- `fiscal_year` (TEXT) - Fiscal year (1974-2024)
- `total_participation_thousands` (REAL) - Average monthly participants in thousands
- `food_costs_millions_of_dollars` (REAL) - Cost of WIC food packages in millions
- `nsa_costs_millionsof_dollars` (REAL) - Nutrition services and administration costs in millions
- `total_costs_1_millions_of_dollars` (REAL) - Total program costs in millions
- `average_monthly_food_cost_per_person_dollars` (REAL) - Average monthly food cost per participant

**Use for:** Long-term WIC trends, program growth/decline, cost per participant analysis

**Sample query:**
```sql
SELECT fiscal_year, total_participation_thousands,
       average_monthly_food_cost_per_person_dollars
FROM wic_annual_national
WHERE fiscal_year >= '2010'
ORDER BY fiscal_year;
```

---

### 3. **snap_monthly_state_fy2024** (121 rows)
State-level SNAP data for fiscal year 2024. Contains detailed monthly participation and cost data.

**Columns:**
- `snap_monthly_state_participation_and_benefit_summary_pebtother_excluded_public_data_fiscal_year_2024` (TEXT) - State name
- `col_1` through `col_5` - Monthly data columns (participation, benefits, costs by month)

**Note:** Column headers need additional parsing - they contain monthly metrics. This table structure reflects how USDA provides state-level data with months as column headers rather than rows.

**Use for:** State-by-state comparisons for FY2024, geographic analysis

**Sample query:**
```sql
SELECT * FROM snap_monthly_state_fy2024 
WHERE snap_monthly_state_participation_and_benefit_summary_pebtother_excluded_public_data_fiscal_year_2024 
LIKE '%New York%';
```

---

### 4. **snap_state_latest** (64 rows)
Most recent snapshot of SNAP participation by state (May 2025 vs May 2024).

**Columns:**
- `state_territory` (TEXT) - State or territory name
- `may_2024` (TEXT) - Participation count in May 2024
- `april_2025_preliminary` (TEXT) - Preliminary count for April 2025
- `may_2025_initial` (TEXT) - Initial count for May 2025
- `percent_change_may_2025_vs_april_2025` (TEXT) - Month-over-month percent change
- `percent_change_may_2025_vs_may_2024` (TEXT) - Year-over-year percent change

**Use for:** Most current state participation data, recent trends, YoY comparisons

**Sample query:**
```sql
SELECT state_territory, may_2024, may_2025_initial,
       percent_change_may_2025_vs_may_2024
FROM snap_state_latest
WHERE state_territory NOT LIKE '%Total%'
ORDER BY CAST(percent_change_may_2025_vs_may_2024 AS REAL) DESC
LIMIT 10;
```

---

### 5. **snap_spending_history** (28 rows, 2000-2024+)
Historical SNAP participation and inflation-adjusted spending from ERS.

**Columns:**
- `fiscal_year` (TEXT) - Fiscal year
- `average_monthly_participation_millions_of_people` (TEXT) - Avg monthly participants in millions
- `inflationadjusted_annual_spending_billions_2024_dollars` (TEXT) - Inflation-adjusted spending in billions (2024 dollars)

**Use for:** Long-term spending trends adjusted for inflation, comparing "real" costs over time

**Sample query:**
```sql
SELECT fiscal_year, 
       CAST(average_monthly_participation_millions_of_people AS REAL) as participants_millions,
       CAST(inflationadjusted_annual_spending_billions_2024_dollars AS REAL) as spending_billions
FROM snap_spending_history
WHERE fiscal_year >= '2010'
ORDER BY fiscal_year;
```

---

### 6. **wic_state_annual** (40 rows)
**Status:** Needs additional parsing - contains metadata about WIC state files rather than actual data.

This table loaded the introductory text from the WIC state file rather than the data itself. The actual WIC state data is complex with multiple sheets per category (pregnant women, infants, children).

**Current state:** Not usable for analysis without reprocessing the source Excel files.

---

### 7. **food_security_annual** (9 rows)
**Status:** Needs additional parsing - contains file documentation rather than actual data.

This table loaded the documentation/description sheet from the food security data file rather than the actual statistics. The source file has multiple tabs with different food security metrics.

**Current state:** Not usable for analysis without reprocessing the source Excel files.

---

## Data Type Notes

Many columns are stored as TEXT rather than proper numeric types because:
1. Source Excel files have inconsistent formatting
2. Some cells contain notes, annotations, or '--' for missing data
3. Header rows and metadata are mixed with data

### Converting to Numeric Types in Queries

When querying, cast TEXT columns to numeric types:

```sql
-- Cast TEXT to REAL (float)
SELECT fiscal_year, 
       CAST(average_participation_thousands AS REAL) as participants
FROM snap_annual_national;

-- Cast TEXT to INTEGER
SELECT state_territory,
       CAST(may_2025_initial AS INTEGER) as current_participants
FROM snap_state_latest;

-- Handle percent strings (convert "0.15" to 15%)
SELECT state_territory,
       CAST(percent_change_may_2025_vs_may_2024 AS REAL) * 100 as pct_change
FROM snap_state_latest;
```

---

## Suggested Analyses

### 1. SNAP Growth Since 1969
```sql
SELECT fiscal_year, average_participation_thousands,
       total_costs_millions_of_dollars
FROM snap_annual_national
ORDER BY fiscal_year;
```

### 2. WIC vs SNAP Participation Trends (2010-2024)
```sql
SELECT 
    s.fiscal_year,
    CAST(s.average_participation_thousands AS REAL) as snap_participants,
    CAST(w.total_participation_thousands AS REAL) as wic_participants
FROM snap_annual_national s
JOIN wic_annual_national w ON s.fiscal_year = w.fiscal_year
WHERE s.fiscal_year >= '2010'
ORDER BY s.fiscal_year;
```

### 3. Cost Per Participant Over Time
```sql
SELECT fiscal_year,
       average_benefit_per_person_1_dollars as snap_monthly_per_person,
       average_monthly_food_cost_per_person_dollars as wic_monthly_per_person
FROM snap_annual_national
JOIN wic_annual_national USING(fiscal_year)
WHERE fiscal_year >= '2010'
ORDER BY fiscal_year;
```

### 4. States with Largest YoY Changes
```sql
SELECT state_territory,
       CAST(may_2024 AS INTEGER) as may_2024_participants,
       CAST(may_2025_initial AS INTEGER) as may_2025_participants,
       CAST(percent_change_may_2025_vs_may_2024 AS REAL) as pct_change
FROM snap_state_latest
WHERE state_territory NOT LIKE '%Total%'
  AND may_2024 != '--'
ORDER BY ABS(CAST(percent_change_may_2025_vs_may_2024 AS REAL)) DESC
LIMIT 10;
```

### 5. Peak SNAP Participation
```sql
SELECT fiscal_year, 
       CAST(average_participation_thousands AS REAL) as participants
FROM snap_annual_national
ORDER BY CAST(average_participation_thousands AS REAL) DESC
LIMIT 5;
```

---

## Known Limitations

1. **WIC state data** - Not fully parsed. Source file has 16 separate sheets for different categories (pregnant women, infants, children by month).

2. **Food security data** - Not fully parsed. Source file has multiple sheets for different metrics (overall food security, households with children, child food security trends).

3. **SNAP monthly state data** - Column headers encode month names and require additional parsing to use effectively.

4. **Data types** - Most numeric columns stored as TEXT for safety. Cast to REAL/INTEGER in queries.

5. **Missing values** - Represented as '--', 'NA', or NULL depending on source file.

6. **Fiscal year definitions** - FY = October 1 to September 30 (e.g., FY2024 = Oct 2023 - Sep 2024).

---

## Extending the Database

To add more comprehensive data:

1. **WIC state monthly data** - Parse all 16 sheets from wic-state-fy2023.xlsx and wic-state-fy2024.xlsx
2. **Food security details** - Parse the actual data sheets from ers-food-security-complete-data.xlsx
3. **Historical SNAP state data** - Process FY89.xls through FY19.xlsx files
4. **SNAP characteristics** - Extract tables from snap-characteristics-fy2023.pdf
5. **WIC coverage rates** - Extract tables from wic-eligibility-coverage-2022.pdf

---

## Accessing the Database

### Python
```python
import sqlite3
import pandas as pd

conn = sqlite3.connect('snap-wic-data.db')
df = pd.read_sql("SELECT * FROM snap_annual_national", conn)
conn.close()
```

### Command Line
```bash
sqlite3 snap-wic-data.db "SELECT * FROM snap_annual_national LIMIT 5"
```

### R
```r
library(DBI)
conn <- dbConnect(RSQLite::SQLite(), "snap-wic-data.db")
df <- dbGetQuery(conn, "SELECT * FROM snap_annual_national")
dbDisconnect(conn)
```

---

## Update Schedule

To refresh with new data:
- SNAP annual data: Updated monthly (2-3 month lag)
- WIC annual data: Updated monthly (2-3 month lag)  
- Latest state snapshot: Updated monthly
- Food security: Updated annually (September/October)
- Source: https://www.fns.usda.gov/pd/

---

## Related Files

- `overview-statistics.md` - Statistical summary and context
- `DATA-README.md` - Comprehensive guide to all raw data files
- `*.xlsx` files - Source Excel data from USDA
- `*.pdf` files - Research reports and detailed analyses
