# PubChem PUG REST API MCP

> The PubChem PUG REST API MCP Server lets your AI agent access authoritative molecular data from PubChem, a resource managed by the National Library of Medicine. It allows you to audit chemical compounds, retrieve detailed metadata, and query specific formulas and Compound Identifiers (CIDs) without ever visiting a scientific portal. You can search for CIDs using keywords or look up full compound details instantly just by providing a name or an ID.

## Overview
- **Category:** the-unthinkable
- **Price:** Free
- **Tags:** chemistry, molecular-data, compound-search, biomedical-data, research-tools, api-integration

## Description

Listen up. This server hooks your AI agent right into PubChem, one of the go-to public sources of molecular data, run by the National Library of Medicine. Your agent can audit chemical compounds, pull detailed metadata, and query molecular formulas or Compound Identifiers (CIDs) without you ever having to visit a clunky scientific website yourself. You're getting direct access to authoritative chemistry info.

When you use this tool set, your agent doesn't just search. It can chain lookups into structured chemical audits: find candidate CIDs, pull each record, and compare formulas. You don't have to babysit the database manually, and that speeds up your workflow big time.

Here’s how your AI client uses the tools:

Before running anything else, your agent checks service health with `check_api_status`. That confirms the PubChem PUG REST service is online and responding, so a lookup doesn't fail just because the endpoint was down.
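
How the server actually performs this check isn't documented here. As a minimal sketch, assume it probes one known PUG REST record (CID 2244, aspirin) and treats HTTP 200 as healthy. The probe URL and the body of `check_api_status` below are illustrative assumptions, not this server's implementation.

```python
import urllib.error
import urllib.request

# Assumption: a cheap request for one known record (CID 2244) serves as the health probe.
PROBE_URL = (
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/MolecularFormula/JSON"
)


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the host can't be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return 0  # DNS failure, refused connection, timeout, etc.


def check_api_status(get_status=http_status) -> dict:
    """Hypothetical health check: report whether the probe request succeeded."""
    status = get_status(PROBE_URL)
    return {"online": status == 200, "http_status": status}
```

Passing `get_status` in as a parameter keeps the check testable without network access. PUG REST uses HTTP 503 to signal that the server is busy, so a 503 here may mean "try again shortly" rather than "down".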

To pull details on a compound you already know, you've got two options. If you know its exact Compound Identifier (CID), use `get_compound_by_cid`. This tool retrieves the technical metadata, molecular formula, and detailed specs for that chemical record. If you only have a common or IUPAC name (say, 'Aspirin'), use `get_compound_by_name` instead. It returns comprehensive records and formulas based purely on the name you feed it.
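
Both lookups map onto PUG REST's URL pattern of input (`cid/...` or `name/...`), operation (`property/...`), and output format (`JSON`). A minimal sketch, assuming the server requests a `MolecularFormula` and `IUPACName` property table; the function names mirror the tools, but their bodies and the chosen property list are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
PROPERTIES = "MolecularFormula,IUPACName"  # assumption: the fields this sketch asks for


def cid_url(cid: int) -> str:
    return f"{BASE}/compound/cid/{int(cid)}/property/{PROPERTIES}/JSON"


def name_url(name: str) -> str:
    # Names like "acetylsalicylic acid" contain spaces, so percent-encode the whole name.
    return f"{BASE}/compound/name/{urllib.parse.quote(name, safe='')}/property/{PROPERTIES}/JSON"


def parse_properties(payload: dict) -> list:
    """Extract the rows of a PUG REST PropertyTable response."""
    return payload["PropertyTable"]["Properties"]


def fetch_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def get_compound_by_cid(cid: int, fetch=fetch_json) -> dict:
    # A CID identifies a single record, so take the one row returned.
    return parse_properties(fetch(cid_url(cid)))[0]


def get_compound_by_name(name: str, fetch=fetch_json) -> list:
    # A name can resolve to more than one CID, so keep every row.
    return parse_properties(fetch(name_url(name)))
```

For example, `get_compound_by_cid(2244)` should come back with a row whose `MolecularFormula` is `C9H8O4`, matching the Aspirin example further down.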

If you don't know which CID to start with, you search first. Run `search_compound_cids` with a general keyword or phrase. It queries the database and returns the CIDs that match your term, which you can then pass to `get_compound_by_cid`.
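
The matching rules the server uses for "search" aren't spelled out here. One plausible sketch assumes it calls PUG REST's name-to-CID endpoint with the `name_type=word` option, so names that merely contain the keyword also match. The `limit` cap is a hypothetical parameter added to keep result lists manageable.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def search_url(term: str, whole_name_only: bool = False) -> str:
    encoded = urllib.parse.quote(term, safe="")
    url = f"{BASE}/compound/name/{encoded}/cids/JSON"
    # Assumption: word-level matching, so "glucose" also matches longer names containing it.
    if not whole_name_only:
        url += "?name_type=word"
    return url


def parse_cids(payload: dict) -> list:
    """Pull the CID list out of an IdentifierList response (empty if absent)."""
    return payload.get("IdentifierList", {}).get("CID", [])


def fetch_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def search_compound_cids(term: str, limit: int = 20, fetch=fetch_json) -> list:
    """Hypothetical search: return up to `limit` CIDs matching `term`."""
    return parse_cids(fetch(search_url(term)))[:limit]
```

Word matching trades precision for recall; the exact-name endpoint (`whole_name_only=True`) is the tighter choice when the user already has a full compound name.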

Together, these tools let your agent handle everything from quick lookups to detailed technical audits. `search_compound_cids` turns a term like 'glucose' into a list of matching CIDs, `get_compound_by_name` returns records (including IUPAC names and molecular formulas) for a known name, and `get_compound_by_cid` pulls all available details once you have the unique ID.

With this setup, your agent never has to wait on another person to run a search; it queries core molecular data directly. Hook up your preferred AI client, point it at this server, and it handles the status checks, name lookups, CID lookups, and keyword searches, so the chemical data you work with comes straight from PubChem.

## Tools

### check_api_status
Verifies if the PubChem PUG REST service is currently running and accessible.

### get_compound_by_cid
Retrieves detailed information for a chemical compound using its specific CID number.

### get_compound_by_name
Gets comprehensive details and formulas for a chemical compound when you provide its name.

### search_compound_cids
Searches the database to find multiple CIDs that match a given keyword or search term.

## Prompt Examples

**Prompt:** 
```
Get details for compound 'Aspirin' using PubChem.
```

**Response:** 
```
I've retrieved the details for Aspirin! It has a CID of 2244 and its molecular formula is identified as C9H8O4. Would you like the full IUPAC name or other technical metadata for this compound?
```

**Prompt:** 
```
What is the molecular formula for CID '2519'?
```

**Response:** 
```
I've identified the compound for CID 2519! It is Caffeine, and its molecular formula is identified as C8H10N4O2. I can assist you with more technical properties for this record if you'd like.
```

**Prompt:** 
```
Search for compound CIDs matching 'glucose'.
```

**Response:** 
```
I've scanned the PubChem database for glucose! I've identified several matching CIDs, including 5793 and 107526. I can provide the full metadata for any of these identifiers to help you in your research.
```

## Capabilities

### Validate API Health
Checks the operational status of the PubChem PUG REST service to ensure data retrieval won't fail.

### Lookup by Compound CID
Retrieves all technical metadata, formulas, and details for a chemical compound when provided with its unique Identifier (CID).

### Lookup by Common Name
Finds detailed records, including molecular formulas and IUPAC names, using the known name of a chemical compound.

### Search for CIDs via Keyword
Scans the database to find multiple Compound Identifiers (CIDs) that match a general keyword or search phrase.

### Retrieve Technical Specifications
Accesses the properties and formulas needed for scientific auditing of any chemical record.

## Use Cases

### Auditing a Formula Sheet
A chemist has a printed list of 50 compounds. Instead of searching for each one on PubChem's website, they ask their agent to run `get_compound_by_name` for every entry in the batch. The agent works through the list and returns a structured table with the molecular formula of each compound.
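
Because `get_compound_by_name` takes one name per call, a batch audit is really a loop. A minimal sketch, assuming a hypothetical `lookup` callable that wraps one `get_compound_by_name` call and returns that compound's first record; the 0.25 s spacing is an assumed polite pace, not a documented limit.

```python
import time
from typing import Callable, Iterable


def audit_formula_sheet(
    names: Iterable[str],
    lookup: Callable[[str], dict],
    delay_s: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Look up each name in turn, recording failures instead of aborting the batch."""
    rows = []
    for i, name in enumerate(names):
        if i:
            sleep(delay_s)  # space out requests so a 50-item batch doesn't burst the API
        try:
            record = lookup(name)
            rows.append({"name": name, "cid": record.get("CID"),
                         "formula": record.get("MolecularFormula"), "error": None})
        except Exception as exc:  # keep going; one bad name shouldn't sink the audit
            rows.append({"name": name, "cid": None, "formula": None, "error": str(exc)})
    return rows
```

Recording each failure in its row means the chemist gets a complete table back, with unresolved names flagged for manual review rather than silently dropped.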

### Following an Unknown Lead
A researcher finds a compound mentioned in an old paper but doesn't have its CID. They use `search_compound_cids` with the vague keyword from the paper. The agent returns several potential CIDs, allowing the researcher to then run `get_compound_by_cid` on the most likely candidate.

### Cross-Checking IDs
You have two sources for the same compound: one gives its common name, the other its CID. You ask your agent to run both `get_compound_by_name` and `get_compound_by_cid`. If the name resolves to that same CID and the formulas agree, you can be confident both sources describe the same compound.
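
The comparison step can be made explicit. A minimal sketch, assuming the name lookup yields a list of records (a name may map to several CIDs) and the CID lookup yields one record, each with `CID` and `MolecularFormula` keys as in PUG REST property tables; `cross_check` is a hypothetical helper.

```python
def cross_check(name_rows: list, cid_record: dict) -> bool:
    """True if the CID record appears among the name's matches with the same formula."""
    cid = cid_record.get("CID")
    if cid is None:
        return False
    for row in name_rows:
        if row.get("CID") == cid:
            # Same CID should mean the same record; the formula is a cheap sanity check.
            return row.get("MolecularFormula") == cid_record.get("MolecularFormula")
    return False  # the name never resolved to this CID
```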

### Building a Data Pipeline
Before starting research, you ask your agent to run `check_api_status` to confirm the service is up. This quick step catches an outage before you launch a long series of `get_compound_by_cid` calls, so the run is less likely to fail partway through.
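
A minimal sketch of that gate, assuming a hypothetical `check_status` callable that returns a dict with `online` and `http_status` keys, and a hypothetical `lookup` that wraps `get_compound_by_cid`. A passing check only means the service was up a moment ago; individual lookups can still fail and deserve their own error handling.

```python
from typing import Callable, Iterable


def run_pipeline(
    cids: Iterable[int],
    check_status: Callable[[], dict],
    lookup: Callable[[int], dict],
) -> dict:
    """Gate a batch of CID lookups on a successful status check (hypothetical helper)."""
    status = check_status()
    if not status.get("online"):
        # Stop before the first lookup rather than failing partway through the batch.
        raise RuntimeError(f"PubChem unavailable (HTTP {status.get('http_status')}); batch not started")
    return {cid: lookup(cid) for cid in cids}
```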

## Benefits

- Verify compound details on the fly. Instead of opening a web portal, use `get_compound_by_name` to pull IUPAC names and molecular formulas directly into your chat window.
- Accelerate discovery with broad searches. Use `search_compound_cids` when you only know a general chemical class; it returns a list of potential CIDs for follow-up.
- Fail fast on outages. Run `check_api_status` first to confirm the PubChem service is reachable before starting an audit.
- Deep dive into specifics. When you have a target CID, running `get_compound_by_cid` pulls its available technical properties, from formula to metadata, in one call.
- Automate data collection. Your agent handles the tedious process of cross-referencing chemical identifiers (CIDs) and common names, giving you structured results instantly.

## How It Works

The bottom line is that you get verified chemical data pushed directly into your workflow without needing to open or navigate a web portal.

1. Subscribe to the PubChem PUG REST API server.
2. Connect your AI client (e.g., Claude, Cursor) over the Model Context Protocol (MCP).
3. Ask your agent to run a tool. For example, ask it to run `get_compound_by_name` with 'Aspirin' as the name.

## Frequently Asked Questions

**How do I find the CID for a compound using PubChem PUG REST API?**
Use `search_compound_cids` and input the common name or keyword. This tool scans the database and returns multiple potential CIDs, which you can then use with `get_compound_by_cid`.

**Is `get_compound_by_name` better than `get_compound_by_cid`?**
No, they fit different inputs. If you have the CID, prefer `get_compound_by_cid`: a CID points to exactly one compound record, while a name can be ambiguous and may resolve to more than one CID.

**What should I run if I want to know if PubChem is working?**
Run the `check_api_status` tool. This confirms that the entire data service is up, which is critical before running any compound lookups or searches.

**Can I find a formula using only my agent and PubChem PUG REST API?**
Yes. You can run `get_compound_by_name` (if you know the name) or `get_compound_by_cid` (if you have the ID). Both tools return the molecular formula as part of the comprehensive metadata.

**How do I authenticate my agent when using PubChem PUG REST API?**
You don't need to worry about authentication. The service is free and open, so your AI client connects immediately without needing an API key or credentials.

**What should I do if my call to `search_compound_cids` fails due to rate limiting?**
If the request fails because of rate limiting, your agent should retry with exponential backoff instead of resending immediately. Check PubChem's published usage policy for the current rate limits and quotas.
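
A minimal sketch of retry with exponential backoff and jitter, written with Python's standard `urllib` errors. Treating 503 (PUG REST's server-busy status) and 429 as retryable is an assumption of this sketch, as are the default attempt count and delays; PubChem's usage policy has asked clients to stay at or below roughly five requests per second, but confirm the current figures before relying on them.

```python
import random
import time
import urllib.error

RETRYABLE = {429, 503}  # assumption: "slow down" / "server busy" responses worth retrying


def with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                 sleep=time.sleep, jitter=random.random):
    """Run `call()`, retrying retryable HTTP errors with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE or attempt == max_attempts - 1:
                raise  # non-retryable (e.g. 404) or out of attempts
            delay = min(max_delay, base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            sleep(delay * (0.5 + jitter() / 2))  # wait 50-100% of delay to avoid lockstep retries
```

Usage: wrap the network call in a lambda, e.g. `with_backoff(lambda: search_compound_cids("glucose"))`, where `search_compound_cids` stands in for whatever function actually issues the request.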

**Does `get_compound_by_name` return the full molecular formula or just basic details?**
It returns comprehensive metadata: structured data including the IUPAC name, the molecular formula, and technical properties needed for deep auditing.

**If a chemical name I pass to `get_compound_by_name` doesn't exist, how does PubChem PUG REST API handle the error?**
The tool will return an explicit error code or an empty dataset rather than crashing. Your agent must check the response structure for null values before attempting to process results.
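
At the PUG REST level, an unknown name comes back as an HTTP 404 whose JSON body carries a `Fault` object (for example with code `PUGREST.NotFound`); how this server wraps that for the agent is an assumption here. A minimal sketch of the defensive check, written against raw PUG REST-style payloads:

```python
from typing import Optional


def extract_records(payload: Optional[dict]):
    """Return (records, error) from a lookup result; error is None only when rows exist."""
    if not payload:
        return [], "empty response"
    fault = payload.get("Fault")
    if fault:
        # PUG REST-style fault, e.g. {"Code": "PUGREST.NotFound", "Message": "No CID found"}
        return [], f"{fault.get('Code', 'UnknownFault')}: {fault.get('Message', '')}".strip()
    rows = payload.get("PropertyTable", {}).get("Properties") or []
    return rows, (None if rows else "no matching records")
```

Returning a `(records, error)` pair forces the caller to look at the error before touching the records, which is exactly the null check the answer above recommends.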

**Is an API Key required for PubChem API?**
No. PubChem is a free and open service provided by the NIH. This server works out of the box without any static credentials required.

**What is a CID?**
CID stands for Compound Identifier. It is a unique numerical ID assigned to each chemical compound in the PubChem database.

**Can the agent show molecular formulas?**
Yes. The compound details retrieved by your agent include the molecular formula and IUPAC name metadata where available.