# Veraset MCP

> Veraset connects your AI agent directly to billions of points of interest and mobility datasets. You can run complex geospatial SQL, inspect dataset structures, monitor long-running queries, and generate secure download links for massive location data drops.

## Overview
- **Category:** data-analytics
- **Price:** Free
- **Tags:** geolocation, mobility-data, geospatial-sql, poi-data, data-extraction, s3-integration

## Description

This MCP lets you treat huge, raw geolocation datasets like a local database. Instead of exporting data into your own environment just to run an aggregate query, your agent executes the SQL directly against Veraset's cloud cluster. You can ask it to find patterns, such as POI clusters or movement flows across cities, and get the results back in the same conversation without switching tools.

If you're building automated pipelines that rely on this location intelligence, Vinkius AI Analytics shows you which datasets are being queried and how much of your budget is tied up in background job runs. A typical session inspects dataset schemas first, then runs custom SQL queries that compute geolocation aggregates on demand. When the query finishes, you generate a temporary, secure link to download the resulting tables for final analysis.

## Tools

### cancel_running_query
Immediately stops an SQL query that is currently running in the background.

### execute_sql_query
Starts a new, complex SQL task against Veraset's massive dataset pool.

### generate_download_link
Creates a temporary, secure URL for downloading data stored in S3 buckets.

### get_dataset_metadata
Retrieves high-level technical information about an entire mobility dataset package.

### get_query_results
Retrieves the final result rows from an SQL query that has already finished processing.

### get_query_status
Checks and reports the current status, progress percentage, and estimated completion time of a running job.

### get_dataset_sample
Provides a small, immediate preview of the first few rows in any given dataset.

### get_dataset_schema
Returns the precise column names and data types for a specific mobility dataset.

### list_mobility_datasets
Identifies all available mobility dataset packages that Veraset has provisioned for your account.

### list_s3_delivery_folders
Lists the specific S3 cloud folders where scheduled data drops are delivered to your organization.

## Prompt Examples

**Prompt:** 
```
List all our provisioned delivery folder buckets for S3 mobility packets.
```

**Response:** 
```
I've fetched 3 attached S3 delivery prefixes from your Veraset Organization profile:
- `veraset-delivery-east-1/org/xyz/weekly-visits/`
- `veraset-delivery-east-1/org/xyz/movement-v3/`
- `veraset-delivery-east-1/org/xyz/poi-base/`
Would you like me to instantly generate a secure, temporary download link for a parquet file in the 'weekly-visits' tier?
```

**Prompt:** 
```
Get a basic preview 10-row sample from the dataset 'movement_global'.
```

**Response:** 
```
Here is the sample of `movement_global`:

The schema primarily includes `id_type`, `accuracy`, `timestamp`, `latitude`, `longitude`, `country`.

Sample Row 1: `ad_id`, `14.5m`, `2023-11-20T10:11:00Z`, `37.7749`, `-122.4194`, `USA`
Sample Row 2: `ad_id`, `8.0m`, `2023-11-20T10:14:32Z`, `51.5074`, `-0.1278`, `GBR`
...
This validates the coordinate systems are populated.
```

**Prompt:** 
```
Execute an aggregation query on 'dataset-v5' grouping total foot traffic by 'store_id' and get the current execution status.
```

**Response:** 
```
I've fired off the custom SQL directly to the cloud execution cluster (Job ID: `veraset-sql-job-4299b`). 

I then checked `get_query_status` for you. The job is currently `IN_PROGRESS` at approximately `15%`, with about `14.2 GB` scanned so far.

We can fetch the results with `get_query_results` in a few moments. Hang tight.
```

## Capabilities

### Run Custom Geo-SQL
Send complex ANSI SQL queries to calculate specific metrics across mobility datasets.
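
As a rough illustration, here is one way to assemble the kind of aggregate you might pass to `execute_sql_query`. The helper names, the `total_visits` alias, and the double-quoted identifier style are assumptions for this sketch; check which quoting rules the Veraset SQL dialect actually accepts before relying on it.

```python
import re

# Conservative allow-list for table and column names (an assumption, not a Veraset rule).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_-]*$")


def quote_identifier(name: str) -> str:
    """Reject anything that is not a plain identifier, then wrap it in double quotes."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return f'"{name}"'


def build_visit_count_sql(table: str, group_column: str) -> str:
    """Build a GROUP BY query that counts rows per value of group_column."""
    quoted_table = quote_identifier(table)
    quoted_column = quote_identifier(group_column)
    return (
        f"SELECT {quoted_column}, COUNT(*) AS total_visits "
        f"FROM {quoted_table} GROUP BY {quoted_column} "
        f"ORDER BY total_visits DESC"
    )


sql = build_visit_count_sql("dataset-v5", "store_id")
print(sql)
```

Validating identifiers up front keeps a typo or a pasted fragment from turning into an expensive job on the cluster.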

### Inspect Dataset Structure
Retrieve the column definitions and data types for any available dataset before writing a query.

### Sample Data Preview
Fetch quick samples of rows to confirm the expected format and content of a dataset.

### Monitor Job Progress
Check the real-time status, progress percentage, and total bytes scanned for long-running queries.
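
If you consume status reports in your own code, a small wrapper can make them easier to log. The sketch below assumes a payload with `job_id`, `status`, `progress_pct`, and `bytes_scanned` keys, and the terminal state names are guesses; the real `get_query_status` response may use different names.

```python
from dataclasses import dataclass


@dataclass
class JobStatus:
    job_id: str
    state: str
    progress_pct: float
    bytes_scanned: int

    @classmethod
    def from_payload(cls, payload: dict) -> "JobStatus":
        """Build a JobStatus from a (hypothetical) get_query_status payload."""
        return cls(
            job_id=payload["job_id"],
            state=payload["status"],
            progress_pct=float(payload.get("progress_pct", 0.0)),
            bytes_scanned=int(payload.get("bytes_scanned", 0)),
        )

    def is_terminal(self) -> bool:
        """True once the job can no longer change state (assumed state names)."""
        return self.state in {"COMPLETED", "FAILED", "CANCELLED"}

    def summary(self) -> str:
        """One-line, human-readable report using decimal gigabytes."""
        gigabytes = self.bytes_scanned / 1e9
        return (
            f"{self.job_id}: {self.state} at {self.progress_pct:.0f}% "
            f"({gigabytes:.1f} GB scanned)"
        )


status = JobStatus.from_payload({
    "job_id": "veraset-sql-job-4299b",
    "status": "IN_PROGRESS",
    "progress_pct": 15,
    "bytes_scanned": 14_200_000_000,
})
print(status.summary())
```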

### Manage Results & Cleanup
Pull the final result rows from a completed query or immediately abort an intensive, stalled job.

### Generate Download Links
Create temporary, secure links for bulk downloads of structured data delivered to S3.

## Use Cases

### Figuring out what data is available
A new analyst needs to know if Veraset has movement data for the last quarter. They simply ask their agent to run `list_mobility_datasets`, getting a list of all accessible packages without having to navigate complex AWS or internal portal menus.

### Validating raw column names
A geospatial engineer needs to write a join query but isn't sure if the 'accuracy' field is stored in meters or feet. They run `get_dataset_schema` on the target dataset to confirm the column's name and data type, then check a few rows with `get_dataset_sample` to see the unit before writing any code.

### Getting final structured reports
A retail lead runs a complex aggregation query to calculate total foot traffic per store ID. Once the job is done, they use `get_query_results` to pull the exact table needed for their presentation slide deck.

### Getting data into an external system
The team has completed a massive analysis and needs the raw parquet files. They instruct their agent to call `generate_download_link`, getting a temporary, time-limited URL they can pass directly to their BI tool.
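
If you want to fetch the file yourself rather than hand the URL to a BI tool, a pre-signed link can usually be read with a plain HTTP GET. This is a minimal sketch using only the Python standard library; `download_to_file` is a hypothetical helper, and the demo at the bottom uses a local `file://` URL as a stand-in for the real link so it runs offline.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_to_file(url: str, destination: Path, timeout: float = 60.0) -> int:
    """Stream the URL to disk in chunks and return the number of bytes written."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination.stat().st_size


# Offline demo: a temporary local file plays the role of the pre-signed URL.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "part-0000.parquet"
    source.write_bytes(b"example bytes")
    target = Path(tmp) / "downloaded.parquet"
    size = download_to_file(source.as_uri(), target)
    same_content = target.read_bytes() == source.read_bytes()

print(size, same_content)
```

Because the link is temporary, download soon after generating it and request a fresh one if the transfer fails late.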

## Benefits

- Stop guessing what data you have. Use `list_mobility_datasets` to see every available package, saving time spent clicking through confusing developer consoles.
- Before writing a single line of SQL, check the structure using `get_dataset_schema`. This saves massive headaches when column names change or types are unexpected.
- Not sure what a dataset actually contains? Use `get_dataset_sample` first. You get a quick preview of a handful of rows, so you know what kind of data you're about to query.
- Big jobs can be unpredictable. Instead of waiting blindly, use `get_query_status` to track progress and see whether the job is stalled or still running.
- When your query finishes, don't manually copy-paste results. Use `get_query_results` for paginated access, or `generate_download_link` to get a single secure file you can use immediately.
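
For the paginated access mentioned above, a generator keeps memory use flat while you walk through a large result set. The `page_size`, `page_token`, and `next_page_token` names are assumptions for this sketch, as is the `call_tool(name, arguments)` callable standing in for your MCP client; the fake caller only exists so the example runs on its own.

```python
from typing import Any, Callable, Dict, Iterator, Optional

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_result_rows(call_tool: ToolCaller, job_id: str,
                     page_size: int = 1000) -> Iterator[dict]:
    """Yield result rows one by one, requesting pages until none remain."""
    token: Optional[str] = None
    while True:
        arguments: Dict[str, Any] = {"job_id": job_id, "page_size": page_size}
        if token is not None:
            arguments["page_token"] = token
        page = call_tool("get_query_results", arguments)
        yield from page["rows"]
        token = page.get("next_page_token")
        if not token:
            break


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """In-memory stand-in that serves five rows in slices of page_size."""
    data = [{"store_id": f"s{i}"} for i in range(5)]
    start = int(arguments.get("page_token", 0))
    end = start + arguments["page_size"]
    return {
        "rows": data[start:end],
        "next_page_token": str(end) if end < len(data) else None,
    }


all_rows = list(iter_result_rows(fake_call_tool, "job-1", page_size=2))
print(len(all_rows))
```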

## How It Works

You guide your agent through three stages: discovering the data structure, running the calculation in the cloud, and retrieving or saving the finished result.

1. First, ask your agent to list the available datasets or inspect a specific dataset's schema using `get_dataset_schema` to figure out what fields you can query.
2. Next, instruct it to construct and execute the full SQL logic via `execute_sql_query`. You'll get a unique Job ID back. If the job takes time, use `get_query_status` to track its progress.
3. Once the status shows the job as completed, ask for the results using `get_query_results`, or if you need the files themselves, generate a secure link with `generate_download_link`.
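
The three steps above can be sketched as a small driver. Everything beyond the tool names is an assumption: `call_tool` stands in for whatever MCP client you use, and the argument names (`dataset`, `sql`, `job_id`), response fields, and status strings are illustrative rather than Veraset's documented schema. A fake caller is included so the sketch runs without an account.

```python
import time
from typing import Any, Callable, Dict, List

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_query_workflow(call_tool: ToolCaller, dataset: str, sql: str,
                       poll_seconds: float = 5.0, max_polls: int = 120) -> List[dict]:
    """Inspect the schema, start a query, wait for it, and return its rows."""
    # Step 1: discover which fields exist before running anything.
    schema = call_tool("get_dataset_schema", {"dataset": dataset})
    print("columns:", [column["name"] for column in schema["columns"]])

    # Step 2: start the job; the job ID is assumed to come back here.
    job_id = call_tool("execute_sql_query", {"sql": sql})["job_id"]

    # Step 3: poll until the job reaches a terminal state.
    for _ in range(max_polls):
        status = call_tool("get_query_status", {"job_id": job_id})["status"]
        if status == "COMPLETED":
            return call_tool("get_query_results", {"job_id": job_id})["rows"]
        if status in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"job {job_id} ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


def make_fake_caller() -> ToolCaller:
    """In-memory stand-in that completes the job on the second status check."""
    state = {"polls": 0}

    def call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        if name == "get_dataset_schema":
            return {"columns": [{"name": "store_id", "type": "string"}]}
        if name == "execute_sql_query":
            return {"job_id": "job-1"}
        if name == "get_query_status":
            state["polls"] += 1
            done = state["polls"] >= 2
            return {"status": "COMPLETED" if done else "IN_PROGRESS"}
        if name == "get_query_results":
            return {"rows": [{"store_id": "s1", "visits": 3}]}
        raise ValueError(f"unknown tool {name}")

    return call_tool


rows = run_query_workflow(
    make_fake_caller(),
    "dataset-v5",
    'SELECT store_id, COUNT(*) AS visits FROM "dataset-v5" GROUP BY store_id',
    poll_seconds=0.0,
)
print(rows)
```

The poll limit is there so a stuck job surfaces as an error instead of an endless wait.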

## Frequently Asked Questions

**How do I find out what Veraset datasets are available using `list_mobility_datasets`?**
The agent calls `list_mobility_datasets` and returns a list of all dataset packages you have access to. This tells you the names you need before running any queries.

**What happens if my SQL query is too big for Veraset? Can I cancel it using `cancel_running_query`?**
Yes. If a job is taking longer than expected or hitting limits, use `get_query_status` first to check its progress, then issue `cancel_running_query` to abort the task.
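
That check-then-cancel sequence might look like the sketch below. The "no progress since the last reading" rule is just one possible definition of stalled, and the `call_tool` callable, argument names, and response fields are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def cancel_if_stalled(call_tool: ToolCaller, job_id: str,
                      last_progress: float) -> bool:
    """Cancel the job if it is still running but has not advanced; return True if cancelled."""
    status = call_tool("get_query_status", {"job_id": job_id})
    still_running = status["status"] == "IN_PROGRESS"
    no_progress = status.get("progress_pct", 0.0) <= last_progress
    if still_running and no_progress:
        call_tool("cancel_running_query", {"job_id": job_id})
        return True
    return False


# Fake caller that records which tools were invoked, so the sketch runs offline.
calls: List[str] = []


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    calls.append(name)
    if name == "get_query_status":
        return {"status": "IN_PROGRESS", "progress_pct": 15.0}
    return {"status": "CANCELLED"}


cancelled = cancel_if_stalled(fake_call_tool, "query-xx9", last_progress=15.0)
print(cancelled, calls)
```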

**Can I get metadata for an S3 bucket using this MCP?**
No. While you can list available S3 folders with `list_s3_delivery_folders`, you must use a separate tool or process to manage the actual cloud storage structure.

**After running an SQL query, how do I get the final data out?**
You first check the status with `get_query_status`. Once it's done, you either request the results using `get_query_results` or ask for a temporary download link via `generate_download_link`.

**Before I run a complex query, how do I check the column definitions using `get_dataset_schema`?**
You use `get_dataset_schema` to pull the technical definition of any dataset. This shows you all available columns and their data types (like text, date, or float). It's useful for checking the column references in your SQL before running a job.
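
A quick way to apply this is to compare the columns your query needs against the schema before you submit anything. The response shape below (`columns` with `name` and `type`) is an assumption; the column names come from the sample output earlier in this document and the types are illustrative guesses.

```python
from typing import Dict, List


def missing_columns(schema: Dict, required: List[str]) -> List[str]:
    """Return the required column names that the schema does not define."""
    available = {column["name"] for column in schema["columns"]}
    return [name for name in required if name not in available]


# Hypothetical get_dataset_schema response for 'movement_global'.
schema = {
    "columns": [
        {"name": "id_type", "type": "string"},
        {"name": "accuracy", "type": "string"},
        {"name": "timestamp", "type": "timestamp"},
        {"name": "latitude", "type": "float"},
        {"name": "longitude", "type": "float"},
        {"name": "country", "type": "string"},
    ]
}

missing = missing_columns(schema, ["latitude", "longitude", "store_id"])
print(missing)  # ['store_id']
```

An empty list means every column you plan to reference exists, so the query is at least worth submitting.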

**I want to quickly validate if a dataset is relevant without querying everything; how does `get_dataset_sample` help?**
`get_dataset_sample` retrieves the first few rows of data, giving you an immediate look at what it actually contains. This lets you confirm the format and general quality of the location records before writing a full aggregation query.

**If my SQL job is running for hours, how do I check its progress using `get_query_status`?**
`get_query_status` allows you to track long-running jobs without re-executing the query. It reports metrics like percentage completion and total bytes scanned, letting you know if the process is still active.

**After I pull results using `get_query_results`, how do I generate a secure download link with `generate_download_link`?**
`generate_download_link` creates a temporary, pre-signed URL for bulk downloads. This is the easiest way to pull the full dataset into your own environment without having to export it manually.

**Can the AI really create and download a pre-signed link from Veraset's S3 directly?**
Yes. When the agent calls `generate_download_link`, it authenticates to Veraset with your API token and retrieves a time-limited, pre-signed download link, so you can pull the dataset files securely.

**What happens if a SQL statement to Veraset starts taking too long?**
You don't need to panic or swap tools. Instruct your agent, for example `cancel the running query id 'query-xx9'`, and it calls `cancel_running_query` to abort the job immediately. The computation stops, which avoids further cost without leaving the conversation.

**How can I preview geolocation signals before compiling expensive queries?**
Ask for `get_dataset_schema` followed by `get_dataset_sample`. The agent returns the dataset's column definitions and a few preview rows right in the conversation, so you can confirm the structure and formatting before running anything expensive.