# Olostep MCP

> Olostep handles large-scale web scraping using a headless browser API that renders JavaScript and returns structured data. You run complex, automated data extraction workflows through natural conversation with your AI client. It manages everything from single URL scrapes to full batch orchestration.

## Overview
- **Category:** developer-tools
- **Price:** Free
- **Tags:** headless-browser, data-extraction, web-automation, scraping-batches, structured-data, api-integration

## Description

**Olostep MCP Server - Web Scraping & Data Extraction**

This server lets you run large automated scrapes through your AI client. It is built for workflows that need JavaScript rendering and structured output, not just basic HTML grabs. The tools cover everything from checking connectivity to running large batch jobs.

You start by verifying the link between your agent and Olostep using `check_olostep_status`. This confirms you’re connected and ready to go.

You manage scraping agents with three tools: you can see every configured profile by calling `list_agents`, pull all the specific details for one agent using `get_agent`, or initialize a brand new agent setup through `create_agent`.

For quick data pulls, you use `scrape_url`. This scrapes content from just one web page and lets you specify exactly what format you need: markdown, plain text, or full HTML. If you're running a massive job, you initiate it by calling `create_batch`, passing multiple URLs separated by commas to start the bulk scrape.
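
Under the hood, an MCP client sends each tool invocation as a JSON-RPC `tools/call` request. The sketch below builds one for `scrape_url` and rejects formats other than the three named above. The argument keys `url` and `format` are assumptions for illustration, since this listing does not document the tool's input schema; check the schema your client reports before relying on them.

```python
import json
from itertools import count

# Output formats this listing names for scrape_url.
SUPPORTED_FORMATS = ("markdown", "html", "text")

_request_ids = count(1)


def build_scrape_request(url: str, fmt: str = "markdown") -> dict:
    """Build an MCP `tools/call` request for scrape_url.

    The argument keys "url" and "format" are hypothetical; the real
    input schema is whatever the server advertises via tools/list.
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {fmt!r}; expected one of {SUPPORTED_FORMATS}"
        )
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": "scrape_url",
            "arguments": {"url": url, "format": fmt},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_scrape_request("https://example.com"), indent=2))
```

In normal use your AI client builds this request for you; the sketch only shows what a prompt such as "Scrape this page as markdown" is likely to turn into.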

Olostep keeps a record of your batch jobs. `list_batches` lists every batch you have run, with its ID and status. To check on a specific running or finished job, use `get_batch`, which returns its job ID, operational status, and progress. When the batch is finished, pull the final structured results (Markdown or JSON) with `get_batch_results`.
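
The check-then-fetch pattern for batches can be sketched as a small polling loop. This is a minimal sketch, not the server's documented contract: the `batch_id` argument, the `status` field, and the values `"completed"` and `"failed"` are all assumptions, and `call_tool` stands in for whatever function your MCP client exposes for invoking a tool.

```python
import time
from typing import Any, Callable, Dict

# A callable that invokes an MCP tool by name and returns its parsed result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def wait_for_batch(
    call_tool: ToolCaller,
    batch_id: str,
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_batch until the job stops, then fetch its results.

    Assumptions (not documented in this listing): both tools take a
    "batch_id" argument, and get_batch returns a "status" field that
    reads "completed" or "failed" once the job has stopped.
    """
    for _ in range(max_polls):
        status = call_tool("get_batch", {"batch_id": batch_id}).get("status")
        if status == "completed":
            return call_tool("get_batch_results", {"batch_id": batch_id})
        if status == "failed":
            raise RuntimeError(f"batch {batch_id} failed")
        sleep(poll_seconds)
    raise TimeoutError(f"batch {batch_id} still running after {max_polls} polls")
```

Capping the number of polls keeps a stuck job from blocking the agent indefinitely, and passing `sleep` as a parameter makes the loop easy to exercise without real delays.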

To keep tabs on your usage, run `get_usage`. It reports how many pages you have scraped and how much bandwidth you have used in the current cycle.


## Tools

### check_olostep_status
Verifies your API connection to Olostep and confirms connectivity status.

### create_agent
Initializes a new scraping agent profile within the system.

### create_batch
Starts a bulk scrape job, taking multiple URLs separated by commas as input.

### get_agent
Retrieves the full details and configuration for a specific scraping agent.

### get_batch_results
Fetches the final structured results (Markdown/JSON) once a scraping batch is finished.

### get_batch
Checks the operational status, job ID, and progress of a running or completed scrape batch.

### get_usage
Provides current metrics on API usage, including pages scraped and bandwidth consumed this cycle.

### list_agents
Lists all available scraping agents configured under your account.

### list_batches
Provides a list of every scrape batch job you've run, including IDs and status.

### scrape_url
Scrapes the content from a single web page; lets you specify if you want markdown, html, or plain text output.

## Prompt Examples

**Prompt:** 
```
Scrape the homepage of example.com as markdown.
```

**Response:** 
```
Page scraped successfully. The content is 2,400 words with 15 images and 8 links. Here's the markdown output. Would you like to save it or scrape additional pages?
```

**Prompt:** 
```
Create a batch scrape for 5 competitor product pages.
```

**Response:** 
```
Batch created with 5 URLs. Job ID: BATCH-7291. Processing has started — 2 of 5 pages already completed. Would you like me to check back when all are done?
```

**Prompt:** 
```
Show my Olostep API usage this month.
```

**Response:** 
```
This month: 1,245 pages scraped, 82 batch jobs, 340 MB bandwidth used. You have 8,755 credits remaining on your plan. Would you like to see agent-level breakdown?
```

## Capabilities

### Check Connection Status
Verify the connection between your AI client and the Olostep service using `check_olostep_status`.

### Manage Scraping Agents
List, retrieve details, or create new scraping agents using `list_agents`, `get_agent`, and `create_agent`.

### Create Batch Scrapes
Initiate a large-scale scrape job by passing multiple URLs as comma-separated values to `create_batch`.
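
Because the URLs travel as one comma-separated string, it helps to clean the list before handing it over. The helper below is a hypothetical pre-processing step, not part of Olostep: it validates each entry, drops duplicates, and refuses URLs that contain a comma themselves, on the assumption that the server splits the input on commas.

```python
from typing import Iterable
from urllib.parse import urlsplit


def to_batch_argument(urls: Iterable[str]) -> str:
    """Join URLs into a comma-separated string for create_batch.

    Rejects anything that is not an absolute http(s) URL, and any URL
    that itself contains a comma, since a comma-splitting server could
    read it as two entries. Duplicates are dropped, first-seen order kept.
    """
    cleaned = []
    seen = set()
    for raw in urls:
        url = raw.strip()
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {raw!r}")
        if "," in url:
            raise ValueError(f"URL contains a comma: {raw!r}")
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    if not cleaned:
        raise ValueError("at least one URL is required")
    return ",".join(cleaned)
```

For example, `to_batch_argument(["https://a.example/x", "https://b.example"])` yields `"https://a.example/x,https://b.example"`.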

### Monitor Job Progress
Get the status and details of existing scraping jobs using `get_batch` or retrieve the final structured data via `get_batch_results`.

### Scrape a Single URL
Perform a quick scrape on one web page, specifying the output format (markdown, html, or text) with `scrape_url`.

### Track API Consumption
Check your current usage statistics—pages scraped and bandwidth used—using the `get_usage` tool.

## Use Cases

### Tracking Competitor Pricing
A product manager needs to track pricing changes across 50 competitor pages. Instead of writing a script, they ask their agent: 'Create a batch scrape for these 50 URLs.' The agent calls `create_batch`, and once the job finishes, the PM asks for `get_batch_results` to get the structured data ready for analysis.

### Building a Knowledge Base from Articles
A researcher needs the text of articles from 10 different scientific journals. Since `scrape_url` handles one page per call, the agent runs it once per article, specifying 'markdown' format. This gives the researcher clean text to feed into their RAG pipeline.

### Auditing API Costs
An operations engineer wants to know whether their scraping volume is already too high before running a major campaign. They ask the agent to call `get_usage`, which shows the pages scraped and bandwidth used so far this cycle. Comparing those figures with the plan's limits helps prevent unexpected overages.
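
The pre-campaign check comes down to simple arithmetic on the usage figures. In the sketch below the `Usage` attribute names are hypothetical, and the page limit is a number you supply from your own plan, since this listing only describes `get_usage` as reporting consumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Usage figures as this listing describes them for get_usage.

    The attribute names are hypothetical; map them from whatever
    fields the tool actually returns.
    """
    pages_scraped: int
    bandwidth_mb: float


def pages_left(usage: Usage, page_limit: int) -> int:
    """Pages still available this cycle under a limit you supply."""
    return max(page_limit - usage.pages_scraped, 0)


def can_run(usage: Usage, planned_pages: int, page_limit: int) -> bool:
    """True if the planned job fits within the remaining page budget."""
    return planned_pages <= pages_left(usage, page_limit)


if __name__ == "__main__":
    usage = Usage(pages_scraped=1245, bandwidth_mb=340.0)
    print(pages_left(usage, page_limit=10_000))  # 8755
    print(can_run(usage, planned_pages=500, page_limit=10_000))  # True
```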

### Managing Scraping Efforts
A development team needs to ensure all their scraping jobs are accounted for. They use `list_agents` to see what agents are active, then run `list_batches` to get a complete history of every job ID.

## Benefits

- Scale data acquisition instantly. Instead of manually scraping pages, running `create_batch` lets you manage hundreds of URLs in a single job.
- Get rendered content, not just raw markup. Olostep uses a headless browser that renders JavaScript, so the output reflects the page as a visitor sees it rather than only the initial HTML.
- Keep track of every job from the conversation. Use `list_batches` and `get_batch` to check the current status of your scraping jobs.
- Monitor cost and limits easily. The `get_usage` tool shows you exactly how many pages you've scraped and the bandwidth used, keeping you accountable.
- Control your data sources with agents. You can use `create_agent` to build specific scraping profiles tailored for different sites or domains.

## How It Works

You talk to your agent in plain language, and it makes the API calls needed to get clean data from the web.

1. First, connect your AI client to the Olostep MCP Server using your API Key.
2. Next, tell your agent what data you need: 'Create a batch scrape for these 10 URLs' or 'Scrape this single page in markdown format.'
3. The server processes the request (rendering JavaScript if needed) and returns structured results—either immediate output from `scrape_url` or a job ID to monitor with `get_batch`.
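
The choice in step 3 between an immediate result and a trackable job can be sketched as a small router. This is an illustration of the decision an agent might make, not Olostep's own logic; `call_tool` is a stand-in for your MCP client's tool-invocation function, and the argument keys `url`, `format`, and `urls` are assumptions.

```python
from typing import Any, Callable, Dict, List

# A callable that invokes an MCP tool by name and returns its parsed result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_request(
    call_tool: ToolCaller, urls: List[str], fmt: str = "markdown"
) -> Dict[str, Any]:
    """Route a request the way step 3 describes.

    One URL goes to scrape_url, which returns content right away.
    Several URLs go to create_batch, which returns a job to monitor
    with get_batch. All argument keys here are hypothetical.
    """
    if not urls:
        raise ValueError("no URLs given")
    if len(urls) == 1:
        return call_tool("scrape_url", {"url": urls[0], "format": fmt})
    return call_tool("create_batch", {"urls": ",".join(urls)})
```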

## Frequently Asked Questions

**How do I scrape multiple pages with Olostep's `create_batch`?**
You provide the URLs as a comma-separated list. The agent submits them as a single batch job and gives you a job ID to track with `get_batch`.

**What format does `get_batch_results` return?**
It returns structured data, typically Markdown or JSON, depending on what was requested during the batch creation. This makes it immediately usable for processing.

**Can I check my API usage with Olostep's `get_usage` tool?**
Yes, calling `get_usage` gives you a clear breakdown of your current month's activity—total pages scraped and bandwidth consumed.

**Is Olostep good for scraping JavaScript-heavy sites using `scrape_url`?**
Absolutely. The server uses a headless browser that renders JavaScript first, meaning the content you get back is what the live user sees, not just the initial HTML.

**If I run into API connectivity issues, how do I use `check_olostep_status`?**
It verifies your direct connection to the Olostep service. Running this tool confirms that your AI client can communicate with the server endpoint before you initiate any large-scale scraping operations. This saves time debugging simple authentication failures.

**How do I track my historical scrapes and manage multiple jobs using `list_batches`?**
It provides an overview of all your completed or running batches. You get each job's ID and current status for quick reference without needing to download individual result sets. This is useful for auditing.

**When I call `scrape_url`, how do I specify that the data should be in JSON format?**
`scrape_url` takes the output format as a parameter, and the formats listed for it are markdown, html, and text; JSON is not among them. If you need structured JSON, submit the URLs with `create_batch` and retrieve the output with `get_batch_results`, which returns Markdown or JSON.

**After setting up agents with `create_agent`, how do I view and manage my existing scraping resources?**
You use `list_agents` to retrieve a directory of all the agents you have created. You can then pick a specific agent and use the `get_agent` tool to pull its detailed configuration.

**How do I scrape a web page via AI?**
Use the `scrape_url` tool with the target URL and an optional format (markdown, html, or text). The extracted content comes back directly in the response.

**Can I scrape multiple URLs at once?**
Yes. Use `create_batch` with comma-separated URLs to submit a batch job. Track progress with `get_batch` and retrieve results with `get_batch_results`.

**What are scraping agents and how do I use them?**
Agents are reusable scraping configurations with custom extraction rules. Use `create_agent` to set one up, `list_agents` to see the ones you have, and `get_agent` to inspect a specific one.