# Dades Obertes Catalunya MCP

> Dades Obertes Catalunya lets your agent search and query thousands of public datasets from Catalonia's official open data portal. Use natural language to find relevant dataset IDs via `search_catalog`, then use those identifiers with `query_dataset` to filter, select specific columns, and extract structured information on topics like health, transport, or demographics.

## Overview
- **Category:** data-analytics
- **Price:** Free
- **Tags:** open-data, public-sector, dataset-discovery, soql, catalonia-data, administrative-data

## Description

Need to pull public statistics from the Government of Catalonia? This MCP lets your agent browse thousands of government datasets without you having to navigate dozens of websites by hand. You start by searching the entire catalog with keywords, which points you to the right dataset ID. Once you have that identifier, you can query the dataset in natural language. For instance, you can ask for all records about electric vehicle charging stations within a specific postal code and show only the location and power rating. The whole process happens through your AI client, which handles the underlying data structure. Vinkius makes connecting to this open data platform easy, letting any MCP-compatible client run queries that used to require a database expert. You just talk to it.

## Tools

### query_dataset
Runs advanced queries against a specific dataset using complex filters and selection parameters.

### search_catalog
Searches the full open data catalog to find relevant datasets by keyword or category name.

## Prompt Examples

**Prompt:** 
```
Search the catalog for datasets related to 'hospitals' in Catalonia.
```

**Response:** 
```
I found several datasets. The most relevant is 'Centres sanitaris de Catalunya' with ID `8u93-v9az`. Would you like me to query its first few records?
```

**Prompt:** 
```
Query dataset 'abcd-1234' and show me the top 5 results where population is over 10000.
```

**Response:** 
```
Using `query_dataset` with `$where: "population > 10000"` and `$limit: 5`... Here are the results: 1. Barcelona (1.6M), 2. L'Hospitalet (264k), 3. Terrassa (223k)...
```

**Prompt:** 
```
Run a full-text search for 'electric' in dataset 'xyz1-2345' and list the columns in the matching records.
```

**Response:** 
```
I've queried `xyz1-2345` with `$q: "electric"`. I found records related to electric vehicle charging stations, including columns for 'location', 'power_kw', and 'status'.
```
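
The `$where`, `$limit`, and `$q` parameters in the responses above are SoQL query parameters. As a rough sketch of what they look like once encoded, the snippet below builds the query string by hand. The base URL and the `build_query_url` helper are assumptions for illustration; the document does not say which endpoint `query_dataset` calls internally.

```python
from urllib.parse import urlencode

# Assumed Socrata domain for the portal; not confirmed by this document.
BASE_URL = "https://analisi.transparenciacatalunya.cat/resource"


def build_query_url(dataset_id: str, **soql: object) -> str:
    """Hypothetical helper: turn keyword arguments into `$`-prefixed parameters."""
    params = {f"${name}": value for name, value in soql.items() if value is not None}
    return f"{BASE_URL}/{dataset_id}.json?{urlencode(params)}"


# Second prompt example: a filter plus a row limit.
filtered = build_query_url("abcd-1234", where="population > 10000", limit=5)

# Third prompt example: a full-text search.
full_text = build_query_url("xyz1-2345", q="electric")

print(filtered)
print(full_text)
```

In normal use the agent fills in these parameters for you; the sketch only shows how a prompt maps onto them.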

## Capabilities

### Search for public datasets
Find the unique identifiers needed to query by searching the Dades Obertes Catalunya catalog using keywords or categories.

### Query specific data records
Filter and retrieve structured rows from any dataset by applying complex criteria, selecting only the columns you need, and sorting results.

### Analyze grouped statistics
Group raw data results to calculate or summarize metrics across entire datasets on demand.
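
A grouped query in SoQL typically combines `$select`, `$group`, and optionally `$where` and `$order`. The sketch below counts rows per value of one column. The helper name is hypothetical, and the `location` and `power_kw` columns are borrowed from the examples in this document rather than taken from a real dataset schema.

```python
from typing import Optional


def grouped_count_params(group_by: str, where: Optional[str] = None) -> dict:
    """Build SoQL parameters that count rows per distinct value of `group_by`."""
    params = {
        "$select": f"{group_by}, count(*) AS total",
        "$group": group_by,
        "$order": "total DESC",  # largest groups first
    }
    if where:
        params["$where"] = where  # filter rows before they are grouped
    return params


# Count charging stations above 50 kW, grouped by location.
params = grouped_count_params("location", where="power_kw > 50")
print(params)
```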

## Use Cases

### Analyzing urban growth trends
A researcher asks their agent to find datasets related to 'urban planning' using `search_catalog`. They then use that ID with `query_dataset` to filter for records where the year is between 2010 and 2015, getting a clean list of growth metrics.

### Checking infrastructure capacity
A developer needs data on electric vehicle charging stations. They use `search_catalog` to find the relevant dataset ID, then run a query to see all records where 'power_kw' is above 50 and group them by location.

### Comparing public health metrics
A data analyst wants to compare hospital capacity across regions. They use `search_catalog` for 'health centers,' get the ID, and then query it using `query_dataset` with complex filtering on location and date ranges.

### Finding demographic information
A student needs to know population trends. They ask their agent to search the catalog for 'demographics,' get the ID, and then query it to show them the top 5 records sorted by the largest population count.

## Benefits

- Stop manually downloading spreadsheets. Use `search_catalog` to find the right dataset ID, then use `query_dataset` to pull exactly the columns and rows you need in one go.
- Filter massive datasets instantly. Instead of writing SoQL filters by hand, ask your agent to 'show me records where population is over 10k' and let it call `query_dataset`.
- Cross-reference multiple data points easily. You can find the identifiers for transport and demographic datasets, query each one, and compare the results in a single workflow.
- Get structured results right away. The system handles grouping and limiting, giving you clean, actionable lists without needing to write any aggregation code yourself.
- Save research time. Instead of wasting hours navigating government portals, you let your agent do the searching and querying for you.

## How It Works

Your AI client handles all the back-and-forth of data discovery and querying, so you just get the final, filtered list.

1. First, subscribe to this MCP. You can optionally add a Socrata App Token if you plan on making heavy queries.
2. Next, tell your AI client what kind of data you need (e.g., 'datasets about hospital populations'). The agent uses the `search_catalog` tool to find relevant IDs.
3. Finally, give the system the target ID and the specific filters or metrics you want. The agent executes a query using `query_dataset` and returns the results.
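
Steps 2 and 3 can be sketched as a two-call sequence. This is only an illustration: `call_tool` stands in for whatever your MCP client exposes, and the argument names (`query`, `dataset_id`) and the shape of the search result are assumptions, not the server's documented schema.

```python
from typing import Any, Callable

ToolCaller = Callable[[str, dict], Any]


def find_and_query(call_tool: ToolCaller, topic: str, filters: dict) -> Any:
    """Search the catalog for `topic`, then query the first dataset found."""
    matches = call_tool("search_catalog", {"query": topic})  # assumed argument name
    if not matches:
        raise LookupError(f"no dataset found for {topic!r}")
    dataset_id = matches[0]["id"]  # assumed result shape
    return call_tool("query_dataset", {"dataset_id": dataset_id, **filters})


def fake_call_tool(name: str, arguments: dict) -> Any:
    """Stand-in for a real MCP client; returns canned data for illustration."""
    if name == "search_catalog":
        return [{"id": "abcd-1234", "name": "Example dataset"}]
    return {"tool": name, "arguments": arguments}


result = find_and_query(fake_call_tool, "hospitals", {"$limit": 5})
print(result)
```

Passing the tool caller in as a parameter keeps the sketch independent of any particular client library.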

## Frequently Asked Questions

**How do I find the unique identifier for a specific dataset?**
Use the `search_catalog` tool with a descriptive query. The response will include the 4x4 alphanumeric ID (e.g., 'abcd-1234') which you can then use with the `query_dataset` tool.
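
If you want to sanity-check an identifier before passing it on, a small pattern match is enough. The sketch assumes IDs use lowercase letters and digits, as in the examples in this document; the helper name is hypothetical.

```python
import re

# Four characters, a hyphen, four characters (assumed lowercase alphanumeric).
DATASET_ID = re.compile(r"[a-z0-9]{4}-[a-z0-9]{4}")


def is_dataset_id(value: str) -> bool:
    """Return True when `value` has the 4x4 shape described above."""
    return DATASET_ID.fullmatch(value) is not None


print(is_dataset_id("abcd-1234"))
print(is_dataset_id("abcd1234"))
```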

**Can I filter data by specific values like a city or a date range?**
Yes! When using `query_dataset`, use the `$where` parameter to apply SoQL filters. For example, `$where: "poblacio > 5000"` or `$where: "comarca = 'Barcelonès'"`.
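
When a filter value contains an apostrophe, as many Catalan place names do, the text literal needs escaping. The sketch below assumes SoQL follows the SQL convention of doubling single quotes inside a string literal; the helper names and the `municipi` column are hypothetical.

```python
def soql_string(value: str) -> str:
    """Quote a text value for SoQL, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def equals(column: str, value: str) -> str:
    """Build a `column = 'value'` condition with a safely quoted value."""
    return f"{column} = {soql_string(value)}"


def all_of(*conditions: str) -> str:
    """Join conditions with AND, parenthesizing each one."""
    return " AND ".join(f"({c})" for c in conditions)


where = all_of("poblacio > 5000", equals("comarca", "Barcelonès"))
escaped = equals("municipi", "L'Hospitalet de Llobregat")
print(where)
print(escaped)
```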

**Is a Socrata App Token mandatory to use this server?**
No, it is optional. However, providing a `DADES_APP_TOKEN` allows for higher rate limits and prevents throttling during intensive data exploration.
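
For reference, Socrata-based APIs accept an app token in the `X-App-Token` HTTP header. The sketch below shows one way a client could attach it only when `DADES_APP_TOKEN` is set. How this server reads and forwards the variable internally is not described here, so treat the helper as an assumption.

```python
import os
from typing import Mapping


def request_headers(environ: Mapping[str, str] = os.environ) -> dict:
    """Return HTTP headers, adding X-App-Token only when a token is configured."""
    headers = {"Accept": "application/json"}
    token = environ.get("DADES_APP_TOKEN")
    if token:
        headers["X-App-Token"] = token
    return headers


# Without a token the request is anonymous; with one, the header is added.
print(request_headers({}))
print(request_headers({"DADES_APP_TOKEN": "example-token"}))
```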

**When I use `query_dataset`, how do I handle extremely large result sets?**
Use the `$limit` and `$offset` parameters. You can request results in controlled batches to manage memory and ensure you process every record systematically without overwhelming your client.
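
A minimal paging loop might look like the sketch below. `fetch_page` is a hypothetical callable that runs one query with the given parameters and returns a list of rows. The `$order` on `:id` is an assumption added to keep the row order stable between pages.

```python
from typing import Callable, Iterator, List


def iter_rows(
    fetch_page: Callable[[dict], List[dict]], page_size: int = 1000
) -> Iterator[dict]:
    """Yield every row by advancing `$offset` until a short page is returned."""
    offset = 0
    while True:
        page = fetch_page({"$limit": page_size, "$offset": offset, "$order": ":id"})
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += page_size


# Stand-in data source with seven rows, used only to exercise the loop.
rows = [{"n": i} for i in range(7)]


def fake_fetch(params: dict) -> List[dict]:
    start = params["$offset"]
    return rows[start:start + params["$limit"]]


collected = list(iter_rows(fake_fetch, page_size=3))
print(len(collected))
```

Because the function is a generator, only one page is held in memory at a time.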

**What kind of metadata does `search_catalog` return so I know if a dataset is right for me?**
`search_catalog` provides the dataset's official identifier, its associated categories, and a brief description. This lets you verify the data source before attempting any complex queries.

**How do I perform a general keyword search across all columns using `query_dataset`?**
Use the `$q` parameter instead of filtering with `$where`. It runs a full-text search, matching records by keyword across multiple fields.

**If my AI client fails when using `query_dataset`, what is the first thing I should check?**
Check your usage against the API's standard rate limits. Basic access works without a token, but supplying a Socrata App Token raises the rate limit and reduces throttling-related failures.

**Before running `query_dataset`, how do I use `search_catalog` to narrow down my research area?**
When using `search_catalog`, specify keywords or categories in your prompt. This narrows the catalog from thousands of datasets to a short list, so you can pick the precise dataset ID you need.