# HealthData.gov MCP

> HealthData.gov (HHS Open Data) connects your agent directly to thousands of public health datasets from the U.S. Department of Health & Human Services. Use this MCP to discover datasets, run complex queries using SoQL, and pull up-to-date metrics on anything from hospital capacity to Medicare utilization.

## Overview
- **Category:** data-analytics
- **Price:** Free
- **Tags:** open-data, public-health, datasets, data-query, research, soql

## Description

This connector gives your AI client access to the open data HHS publishes on HealthData.gov. You don't need to navigate agency portals or download huge CSV files. Instead, you tell your agent what you need, for example "Which states currently report the highest adult ICU utilization?", and it finds the right dataset and writes the query for you.

You can list the catalog to see what exists, then filter and sort down to just the records you need. Data scientists and public health officials can get answers without working directly in a database interface. Access is managed through Vinkius.

## Tools

### get_catalog
Lists datasets in the HealthData.gov catalog with their names, details, and dataset IDs. It accepts a search term (`q`) to narrow the list to a topic.

### query_dataset
Runs a SoQL query against a dataset identified by its dataset ID (for example, `6xf2-c3ie`) and returns filtered, sorted, or aggregated records.
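HealthData.gov runs on the Socrata platform (the FAQ refers to a "HealthData.gov/Socrata App Token"). Socrata serves each dataset through its SODA API at a `/resource/<dataset_id>.json` path. The connector's internals are not documented here, so treat what follows as a sketch only. It assumes `query_dataset` turns its arguments into a standard SODA GET request. `build_query_url` is a hypothetical helper, not part of the connector.

```python
from urllib.parse import urlencode

# Assumption: datasets are served at https://healthdata.gov/resource/<id>.json
BASE_URL = "https://healthdata.gov/resource"


def build_query_url(dataset_id: str, **soql: str) -> str:
    """Build a SODA GET URL for one dataset (hypothetical helper).

    Keyword arguments are SoQL clauses without the leading '$', so
    where="state='NY'" becomes $where=state%3D%27NY%27 in the URL.
    """
    params = {f"${clause}": value for clause, value in soql.items()}
    # Keep '$' readable in parameter names; everything else is percent-encoded.
    query = urlencode(params, safe="$")
    url = f"{BASE_URL}/{dataset_id}.json"
    return f"{url}?{query}" if query else url


print(build_query_url("6xf2-c3ie", select="state", limit="5"))
```

Because SoQL clauses travel as ordinary query parameters, the same request can be reproduced in a browser or with any HTTP client when you want to double-check what the agent returned.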

## Prompt Examples

**Prompt:** 
```
Search the HealthData.gov catalog for datasets related to 'COVID-19 hospitalizations'.
```

**Response:** 
```
I found several datasets. The most relevant is 'COVID-19 Reported Patient Impact and Hospital Capacity by State' (ID: 6xf2-c3ie). Would you like me to query the latest records from this dataset?
```

**Prompt:** 
```
Query dataset '6xf2-c3ie' to show the top 5 states with the highest adult ICU utilization.
```

**Response:** 
```
I've queried the dataset sorting by ICU utilization. The top 5 states currently reporting the highest percentages are: 1. Alabama, 2. Georgia, 3. Texas, 4. Florida, 5. Mississippi. Would you like the specific percentage values?
```

**Prompt:** 
```
List the first 10 datasets available in the health catalog.
```

**Response:** 
```
Fetching the first 10 entries from the HealthData.gov catalog... I've retrieved datasets including 'Medicare Provider Utilization', 'Community Health Status Indicators', and 'National Health and Nutrition Examination Survey'. Which one would you like to explore?
```

## Capabilities

### Discover available datasets
Lists thousands of public health and social service datasets hosted by HHS for you to review.

### Filter specific data records
Narrows large tables with SoQL filters (the `$where` clause) so that only the relevant rows are returned.

### Aggregate public health metrics
Calculates summaries, counts, and averages from selected datasets—like finding the top five states by utilization rate.
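SoQL can rank groups such as states by an aggregated metric. It does this by combining `$select` (with an aggregate function and an alias), `$group`, `$order`, and `$limit`. The helper below is a hypothetical sketch that builds those clauses. It assumes the dataset has a `state` column. `adult_icu_bed_utilization` is a placeholder metric name, so check the dataset's column list for the real fields. Note that `avg()` averages over however many rows each state has. For a one-row-per-state snapshot, that is simply the reported value.

```python
def top_states_params(metric: str, n: int = 5) -> dict[str, str]:
    """SoQL clauses ranking states by the average of `metric` (hypothetical helper).

    Both `state` and `metric` are assumed column names.
    """
    return {
        "$select": f"state, avg({metric}) AS avg_value",  # one row per state
        "$group": "state",
        "$order": "avg_value DESC",  # highest average first
        "$limit": str(n),  # keep only the top n states
    }


# "adult_icu_bed_utilization" is a placeholder field name.
print(top_states_params("adult_icu_bed_utilization"))
```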

## Use Cases

### Tracking state-level ICU capacity changes
A public health official needs to know which states reported the highest adult ICU utilization last month. They ask their agent, and it uses `query_dataset` with sorting parameters to pull a ranked list of the top 5 states immediately.
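To answer a "last month" question, the query needs a date window in its `$where` clause. This sketch computes the previous calendar month from a given day. It assumes the dataset has a date column that SoQL compares against ISO 8601 timestamps. `report_date` is a hypothetical column name. You would pair the resulting clause with `$order` on the utilization field (descending) and `$limit=5`.

```python
from datetime import date, timedelta


def last_month_where(today: date, date_field: str) -> str:
    """Build a SoQL $where clause covering the previous calendar month.

    `date_field` is a placeholder: pass the dataset's real date column.
    """
    first_this_month = today.replace(day=1)
    # Step back one day to land in the previous month, then snap to day 1.
    first_prev_month = (first_this_month - timedelta(days=1)).replace(day=1)
    return (
        f"{date_field} >= '{first_prev_month.isoformat()}T00:00:00' "
        f"AND {date_field} < '{first_this_month.isoformat()}T00:00:00'"
    )


# "report_date" is a hypothetical column name.
print(last_month_where(date(2024, 3, 15), "report_date"))
```

Using a half-open range (`>=` start, `<` next month) avoids having to know how many days the previous month had.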

### Finding datasets about community wellness
A researcher needs data on nutrition surveys but isn't sure which dataset name to use. They ask their agent to run `get_catalog` first. It lists relevant options like 'National Health and Nutrition Examination Survey', and the researcher selects one.

### Comparing multiple health metrics
A developer wants to build a dashboard comparing provider utilization and Medicare costs. They have their agent query two different datasets in sequence, then combine both result sets into one analysis for immediate review.
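Once the two result sets are back, combining them usually means joining on a shared key such as the state. This is a minimal client-side sketch of an inner join over lists of JSON records. It is not something the connector is documented to do. The field names `provider_count` and `total_payment` and their values are made up for illustration.

```python
def join_by_key(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Inner-join two lists of JSON records on `key`.

    Rows missing from either side are dropped. If `right` repeats a key,
    the last row wins; on field-name clashes, `right` overwrites `left`.
    """
    index = {row[key]: row for row in right if key in row}
    return [
        {**row, **index[row[key]]}
        for row in left
        if key in row and row[key] in index
    ]


# Illustrative records only; these are not real HealthData.gov values.
providers = [
    {"state": "NY", "provider_count": "120"},
    {"state": "TX", "provider_count": "95"},
]
costs = [
    {"state": "TX", "total_payment": "4.2"},
    {"state": "NY", "total_payment": "5.1"},
    {"state": "FL", "total_payment": "3.9"},
]
print(join_by_key(providers, costs, "state"))
```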

## Benefits

- Stop downloading massive files. The `query_dataset` tool filters and sorts data in real time and returns only the records you need, so there are no gigabytes of irrelevant rows to sift through.
- You don't have to remember complex IDs. Use the `get_catalog` tool first. It lets your agent search and list datasets by topic, making discovery straightforward for anyone.
- Analyze diverse public metrics in one conversation. You can track everything from COVID-19 stats to provider directories through a single interface, so you don't have to switch between government websites.
- Speed up research prototyping. Data scientists can try out SoQL queries directly in the agent's chat window, which shortens the write-and-test loop.
- Access Medicare/Medicaid data easily. This connector gives you direct access to health service utilization records without having to learn the portal's developer API yourself.

## How It Works

In short, you ask questions in natural language instead of writing database commands, and you get structured data back.

1. Subscribe to this MCP. Optionally, enter your HealthData.gov app token for higher rate limits.
2. Your agent uses `get_catalog` to search and list relevant datasets based on a topic or ID.
3. You ask your agent to query a specific dataset, and it calls `query_dataset` with the sorting and filtering criteria your request implies.
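Your agent handles steps 2 and 3 for you, but it can help to see what they look like on the wire. MCP clients invoke tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds one request per step. The `q` and `dataset_id` argument names come from the FAQ. Passing SoQL clauses such as `$where` and `$select` as tool arguments is an assumption about this connector's input schema.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique per session


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` JSON-RPC request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Step 2: discover datasets by topic.
print(tool_call("get_catalog", {"q": "COVID-19 hospitalizations"}))
# Step 3: query the chosen dataset (argument names are assumed).
print(tool_call("query_dataset", {
    "dataset_id": "6xf2-c3ie",
    "$select": "state",
    "$where": "state='TX'",
}))
```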

## Frequently Asked Questions

**How do I find a specific dataset about 'Medicare'?**
Use the `get_catalog` tool and provide 'Medicare' in the `q` parameter. This will return a list of relevant datasets along with their unique identifiers (dataset_id).
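It is not documented how `get_catalog` searches. One public option on Socrata portals is the Socrata Discovery API (`https://api.us.socrata.com/api/catalog/v1`), which accepts `domains`, `q`, and `limit`. The sketch below assumes that API and its response shape, with a `results` list whose items carry a `resource` object holding `id` and `name`. It builds a search URL and pulls out `(dataset_id, name)` pairs. The sample response is illustrative and reuses the dataset named in the prompt examples.

```python
from urllib.parse import urlencode

# Assumption: the Socrata Discovery API; not confirmed as what get_catalog uses.
DISCOVERY_URL = "https://api.us.socrata.com/api/catalog/v1"


def catalog_search_url(keyword: str, limit: int = 10) -> str:
    """Build a Discovery API search restricted to healthdata.gov."""
    params = {"domains": "healthdata.gov", "q": keyword, "limit": limit}
    return f"{DISCOVERY_URL}?{urlencode(params)}"


def dataset_ids(response: dict) -> list[tuple[str, str]]:
    """Extract (dataset_id, name) pairs from a Discovery API response."""
    return [
        (item["resource"]["id"], item["resource"]["name"])
        for item in response.get("results", [])
    ]


# Illustrative response fragment, trimmed to the fields used above.
sample = {"results": [{"resource": {
    "id": "6xf2-c3ie",
    "name": "COVID-19 Reported Patient Impact and Hospital Capacity by State",
}}]}
print(catalog_search_url("Medicare"))
print(dataset_ids(sample))
```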

**Can I filter results to only show data from a specific state?**
Yes! When using `query_dataset`, use the `$where` parameter with a SoQL filter like `state='NY'`. You can also use `$select` to pick specific columns.
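When a filter value comes from user input, it is worth quoting it safely before placing it in `$where`. This sketch follows the SQL convention of wrapping string literals in single quotes and doubling any embedded single quote. Check Socrata's SoQL documentation to confirm that it matches your dataset. The `county` field is a hypothetical column name.

```python
def soql_literal(value: str) -> str:
    """Quote a string for SoQL, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def where_equals(**filters: str) -> str:
    """Build a $where clause that ANDs together field = 'value' tests."""
    return " AND ".join(
        f"{field} = {soql_literal(value)}" for field, value in filters.items()
    )


print(where_equals(state="NY"))
# "county" is a hypothetical column; the apostrophe is escaped by doubling.
print(where_equals(state="IA", county="O'Brien"))
```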

**Is an API key required to access this data?**
No, the data is public. However, providing a `HEALTHDATA_APP_TOKEN` is recommended for higher rate limits if you plan to perform many queries.
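Socrata accepts app tokens in an `X-App-Token` request header. It is not documented whether the connector sends `HEALTHDATA_APP_TOKEN` this way. The sketch below shows the general pattern for your own scripts: read the token from that environment variable and attach it only when it is set.

```python
import os
from urllib.request import Request


def soda_request(url: str) -> Request:
    """Build a GET request, attaching an app token if one is configured."""
    headers = {"Accept": "application/json"}
    token = os.environ.get("HEALTHDATA_APP_TOKEN")
    if token:
        # Requests without a token still work, but under tighter rate limits.
        headers["X-App-Token"] = token
    return Request(url, headers=headers)


req = soda_request("https://healthdata.gov/resource/6xf2-c3ie.json")
print(req.full_url)
```

Pass the result to `urllib.request.urlopen(req)` to send it. Keeping the token in an environment variable keeps it out of source code and shell history.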

**What happens if my queries exceed the default rate limit when I use `query_dataset`?**
The query fails with a rate-limiting error. To handle high volumes, enter your HealthData.gov/Socrata App Token in the setup options. The token raises your query allowance, so bursts of queries are less likely to be throttled.
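If you script your own calls against the same data, the usual way to cope with throttling is to retry with exponential backoff. This sketch assumes the rate-limit error arrives as HTTP 429 (Too Many Requests), the standard status for throttling. `fetch` is a hypothetical callable returning `(status, body)`. The `sleep` function is injectable so that the logic can be tested without waiting.

```python
import time
from typing import Callable


def fetch_with_backoff(
    fetch: Callable[[], tuple[int, str]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, str]:
    """Retry `fetch` while it returns HTTP 429, doubling the wait each time.

    Returns the first non-429 (status, body); other errors are left to the caller.
    """
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"still rate-limited after {max_retries} retries")
```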

**Does running `get_catalog` show every single dataset published by the HHS?**
No, it provides access to the main catalog of datasets managed via HealthData.gov. While this is a massive collection, certain specialized or internal departmental data may exist outside of the scope shown here.

**If I get an error when using `query_dataset`, how can I debug my SoQL syntax?**
Start with the dataset's documentation page and column list. Every field name in the query must exactly match the dataset's API field name, which often differs from the human-readable column label. Every clause must also be valid SoQL.
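A misspelled field name is one of the most common causes of a failed query. Socrata publishes each dataset's metadata at `/api/views/<dataset_id>.json`, and the `columns` entries there carry each column's API `fieldName`. Assuming you have collected those names, the sketch below flags any field your query uses that the dataset lacks and suggests near matches with `difflib`. The column names in the example are hypothetical.

```python
from difflib import get_close_matches


def check_fields(used: list[str], available: list[str]) -> dict[str, list[str]]:
    """Map each unknown field name to up to three close matches among real fields."""
    known = set(available)
    return {
        name: get_close_matches(name, available, n=3, cutoff=0.6)
        for name in used
        if name not in known
    }


# Hypothetical column names for illustration.
available = ["state", "adult_icu_bed_utilization", "inpatient_beds_used"]
print(check_fields(["state", "adult_icu_utilization"], available))
```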

**When I use `query_dataset`, what format does the returned public health data take?**
The tool returns structured, machine-readable data—typically JSON. This format makes it easy for your AI client to parse and analyze the metrics immediately within the conversation.
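In Socrata's JSON output, numeric columns are typically serialized as strings (for example `"0.81"`), and fields with no value may be absent from a row altogether. If you process the returned records yourself, convert numbers before doing arithmetic. This sketch does that. The field name and values in the example are illustrative.

```python
import json


def parse_numeric(payload: str, fields: list[str]) -> list[dict]:
    """Decode a SODA-style JSON array, converting the named fields to float.

    Missing or empty values are set to None (the key is added if absent).
    """
    rows = json.loads(payload)
    for row in rows:
        for field in fields:
            raw = row.get(field)
            row[field] = float(raw) if raw not in (None, "") else None
    return rows


# Illustrative payload; "adult_icu_bed_utilization" is a placeholder field name.
payload = '[{"state": "TX", "adult_icu_bed_utilization": "0.81"}, {"state": "VT"}]'
print(parse_numeric(payload, ["adult_icu_bed_utilization"]))
```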

**Can I use the information from `get_catalog` in an application outside of Vinkius or my AI agent?**
The MCP is designed specifically for integration with your AI agent. While the underlying data is public, you must connect through a compatible client to execute the catalog retrieval tools.