# Data.gov Catalog MCP

> Data.gov Catalog MCP connects your AI agent directly to the official US Government open data catalog. You can search thousands of datasets from agencies like NASA or NOAA, list publishing organizations, and retrieve GeoJSON boundaries for spatial filtering, all without browsing the website manually.

## Overview
- **Category:** data-analytics
- **Price:** Free
- **Tags:** open-data, federal-data, dataset-discovery, spatial-data, public-records, data-catalog

## Description

Need specific government data? This connection lets you talk to the Data.gov Catalog, giving your AI agent access to a massive repository of federal data. You don't have to navigate dozens of agency websites; you just ask for what you need, whether it's a list of all publishing agencies or datasets related to climate modeling.

For example, if you are building a mapping tool, you can first look up a city's location ID, then retrieve its precise geographic boundary. You can then use that geometry to narrow thousands of available records down to the data relevant to that area. You can also trace how a dataset entered the catalog by inspecting its harvest record, the original source payload, and the transformed version.

When you connect this MCP through Vinkius, your agent treats the entire government data ecosystem as one searchable pool. This means whether you need a simple keyword count or complex spatial filtering, it's all available right in your chat window.

## Tools

### get_location_geometry
Returns the precise GeoJSON boundary coordinates for a given location identifier.

### get_harvest_record_raw
Retrieves the original, unmodified source data payload for inspection.

### get_harvest_record
Pulls metadata detailing how a specific dataset was originally added to the catalog.

### get_harvest_record_transformed
Gets a cleaned-up version of the record in a standardized format ready for use.

### get_keywords
Lists popular keywords and counts how many datasets reference each term across the catalog.

### get_organizations
Provides a complete list of every publishing organization in the catalog.

### search_locations
Suggests location names and IDs that can be used to accurately narrow down a search area.

### search_datasets
Searches the entire government data catalog using specific keywords, filters, and sorting options.

## Prompt Examples

**Prompt:** 
```
Search for NASA datasets related to climate change.
```

**Response:** 
```
I found several NASA datasets. Notable results include 'Global Surface Temperature Analysis' and 'Climate Model Outputs'. Would you like the full metadata for any of these?
```

**Prompt:** 
```
List all government organizations that publish data here.
```

**Response:** 
```
I've retrieved the list of publishing organizations. There are over 1,200 entities, including major agencies like NOAA, NASA, and the Department of Education. Which one would you like to explore?
```

**Prompt:** 
```
Get the GeoJSON boundary for 'Los Angeles' to filter my search.
```

**Response:** 
```
I've found the location ID for Los Angeles. Retrieving the GeoJSON geometry now... Done. You can now use this boundary to filter datasets using the `search_datasets` tool.
```

## Capabilities

### Search and filter datasets
Find specific public datasets by using keywords, organization names, and defined filters.

### Identify publishing organizations
Get a complete list of all government agencies that publish data to the catalog.

### Map geographic boundaries
Retrieve precise GeoJSON coordinates for any known location ID, allowing you to filter other datasets spatially.

### Inspect dataset metadata
View the original source data payload and the processed, structured version of a record.

### Analyze data trends
Check which keywords are most common across the entire catalog and how many datasets use them.

## Use Cases

### Mapping a specific environmental issue
A researcher needs all water quality reports for Chicago, Illinois. They first run `search_locations` to get the location ID, then use `get_location_geometry` with that ID. Finally, they pass the resulting GeoJSON boundary into `search_datasets` to filter only relevant datasets.
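
The three-step chain above can be sketched in Python. This is a minimal illustration, not the MCP's actual client code: `call_tool` is a hypothetical callable standing in for whatever your MCP client exposes, and the `q` argument of `search_locations`, the `id` field of its results, and the `location_id` argument are all assumed names. Only `q` and `spatial_geometry` on `search_datasets` are named in this document.

```python
from typing import Any, Callable

# Hypothetical: takes a tool name and an arguments dict, returns the parsed result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def datasets_in_area(call_tool: ToolCaller, place: str, query: str) -> Any:
    """Chain search_locations -> get_location_geometry -> search_datasets."""
    # Step 1: resolve the place name to a location ID.
    # The "q" argument and the "id" result field are assumed names.
    suggestions = call_tool("search_locations", {"q": place})
    if not suggestions:
        raise LookupError(f"no location suggestions for {place!r}")
    location_id = suggestions[0]["id"]

    # Step 2: fetch the GeoJSON boundary ("location_id" is an assumed argument name).
    geometry = call_tool("get_location_geometry", {"location_id": location_id})

    # Step 3: restrict the dataset search to that boundary.
    return call_tool("search_datasets", {"q": query, "spatial_geometry": geometry})
```

Passing `call_tool` in as a parameter keeps the sketch independent of any particular MCP client library and makes it easy to exercise with a stub.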

### Auditing data source reliability
A developer needs to know if a dataset's metadata is complete. They find a promising record and use `get_harvest_record_raw` to check the original, unmodified payload before building their application logic.

### Comparing federal focus areas
A policy analyst wants to know if climate change is more frequently discussed than economic development. They run `get_keywords` and then compare the resulting dataset counts for 'climate' versus 'economy'.
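
As a rough sketch of that comparison, the snippet below looks up dataset counts for a few terms. It assumes `get_keywords` returns a list of objects with `keyword` and `count` fields; that response shape is a guess for illustration, and the sample numbers are made up rather than taken from the catalog.

```python
def keyword_counts(keywords: list[dict], terms: list[str]) -> dict[str, int]:
    """Return the dataset count for each requested term (0 if the term is absent)."""
    # Assumed response shape: [{"keyword": "climate", "count": 1234}, ...]
    lookup = {entry["keyword"].lower(): entry["count"] for entry in keywords}
    return {term: lookup.get(term.lower(), 0) for term in terms}


# Made-up numbers for illustration only, not real catalog counts.
sample = [
    {"keyword": "Climate", "count": 120},
    {"keyword": "economy", "count": 85},
]
counts = keyword_counts(sample, ["climate", "economy", "housing"])
most_common = max(counts, key=counts.get)
```

Matching is case-insensitive here so that 'Climate' and 'climate' are treated as the same term; whether the catalog normalizes keyword casing is not something this document states.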

### Listing all data providers
Someone building a directory of government open resources needs to know who publishes what. They simply call `get_organizations` to get an instant list of every contributing agency.

## Benefits

- Find relevant data without sifting through thousands of links. Use the search functionality to pull datasets based on keywords or specific organization filters.
- Pinpoint exact areas using geometry. You can run a location ID through `get_location_geometry` and immediately use that boundary to filter your searches, making results hyper-local.
- Understand data lineage. Don't just take the metadata; check the raw source payload or the transformed version of a record to verify data integrity.
- Build knowledge maps quickly. Run `get_organizations` to get an exhaustive list of agencies, letting you target your research scope immediately.
- Spot trends in government focus. Use `get_keywords` to see which topics are referenced by the most datasets in the catalog.

## How It Works

Your agent handles the sequence of API calls needed to combine multiple pieces of federal data into one readable answer.

1. Subscribe to this MCP on Vinkius and provide your API key or credentials.
2. Ask your AI client for a specific data task, like 'Find all water quality data near Miami.'
3. The agent calls the necessary functions (e.g., first locating the geometry, then searching datasets) and returns structured results directly to you.

## Frequently Asked Questions

**How do I use get_location_geometry with search_datasets?**
Run `get_location_geometry` with a location ID (obtainable from `search_locations`) to retrieve the GeoJSON boundary. Then pass that boundary to `search_datasets` in the `spatial_geometry` parameter. Results are limited to datasets relevant to that area.

**What is the difference between get_harvest_record and get_harvest_record_raw?**
The raw record gives you the original, untouched source data payload. The standard harvest record provides metadata about how that dataset was initially ingested into the catalog.

**Can I use get_keywords to find a specific type of dataset?**
No, `get_keywords` only tells you which topics are popular and how many datasets mention them. To actually *find* those datasets, you must run the results through `search_datasets`.

**How do I list all available government agencies?**
Use `get_organizations`. It returns a complete list of every publishing organization that contributes data to the catalog. This is your starting point for scoping research.

**When using search_datasets, what do I need regarding API keys or authentication?**
An API key is only needed if your proxy requires one. Connect the MCP via Vinkius and supply any required credentials at the connection step, so your agent can access the full US Government repository.

**If I run get_location_geometry and receive an error, what does that usually mean?**
An error typically means the provided location ID is invalid or hasn't been fully indexed. Double-check the ID against the output of `search_locations` first. If the ID is correct, you might be hitting a temporary service limit, in which case retrying after a short wait may help.
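
Both checks can be sketched in a few lines of Python. The helper names are hypothetical, the `id` field on each suggestion is an assumed response shape, and the retry policy (exponential backoff on any exception) is a generic pattern rather than documented behavior of this MCP.

```python
import time
from typing import Any, Callable


def is_known_location_id(location_id: str, suggestions: list[dict[str, Any]]) -> bool:
    """Check an ID against search_locations output (assumed: dicts with an "id" field)."""
    return any(str(s.get("id")) == str(location_id) for s in suggestions)


def fetch_with_retry(fetch: Callable[[], Any], attempts: int = 3, base_delay: float = 1.0) -> Any:
    """Call fetch(), retrying with exponential backoff; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next try.
            time.sleep(base_delay * 2 ** attempt)
```

Validate the ID first, then wrap only the `get_location_geometry` call in `fetch_with_retry`, so an invalid ID fails immediately instead of being retried.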

**What structure does get_harvest_record_transformed provide for my data?**
It returns a standardized DCAT-US payload structure. This transformed format makes it easy to parse common metadata fields like publication date and spatial bounding boxes, regardless of the original source schema.
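
A minimal sketch of reading those fields, assuming the tool returns a single DCAT-US dataset object as a plain dictionary. The field names `title`, `issued`, `modified`, and `spatial` come from the DCAT-US v1.1 schema; how this MCP wraps the object in its response is an assumption.

```python
from typing import Any


def summarize_dcat_record(record: dict[str, Any]) -> dict[str, Any]:
    """Pull common DCAT-US v1.1 fields, using None for any that are missing."""
    return {
        "title": record.get("title"),
        "issued": record.get("issued"),      # release (publication) date
        "modified": record.get("modified"),  # last update date
        "spatial": record.get("spatial"),    # bounding box or place name
    }
```

Using `.get()` rather than indexing keeps the summary robust when optional fields such as `issued` or `spatial` are absent from a record.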

**How can I filter search_datasets using multiple criteria simultaneously?**
State all of your filters in the same prompt. For instance, you can specify both a keyword and an organization slug, which the agent passes to `search_datasets` as the `q` and `org_slug` parameters. The combined parameters narrow the results together.

**Can I search for datasets within a specific geographic area?**
Yes! Use `search_locations` to find a location ID, then `get_location_geometry` to get the GeoJSON. Finally, pass that to `search_datasets` with the `spatial_geometry` parameter.

**How do I find datasets from a specific agency like NASA?**
Use the `search_datasets` tool and provide 'nasa' in the `org_slug` parameter. You can combine this with a search query `q` for more specific results.
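
Your AI client normally builds the tool call for you, but for reference, here is a sketch of what the underlying MCP `tools/call` request could look like. The JSON-RPC envelope follows the MCP specification; `build_search_request` is a hypothetical helper, and only the `q` and `org_slug` argument names come from this document.

```python
import json
from typing import Optional


def build_search_request(
    request_id: int,
    q: Optional[str] = None,
    org_slug: Optional[str] = None,
) -> str:
    """Serialize an MCP tools/call request for search_datasets."""
    # Include only the filters that were actually supplied.
    arguments = {}
    if q:
        arguments["q"] = q
    if org_slug:
        arguments["org_slug"] = org_slug

    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_datasets", "arguments": arguments},
    }
    return json.dumps(payload)


request = build_search_request(1, q="climate change", org_slug="nasa")
```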

**What is the difference between raw and transformed harvest records?**
The `get_harvest_record_raw` tool returns the original metadata from the source agency, while `get_harvest_record_transformed` returns the data mapped to the standard DCAT-US schema used by Data.gov.