# PubChem MCP

> PubChem connects your AI agent to the world's largest open chemistry database, containing over 116 million compound records. This server lets you search for chemicals using common names, IUPAC nomenclature, or molecular formulas; it also retrieves deep data like SMILES strings, molecular weight (MW), and XLogP scores directly into your workflow.

## Overview
- **Category:** the-unthinkable
- **Price:** Free
- **Tags:** chemistry, molecular-data, drug-discovery, scientific-database

## Description

This server connects your AI client straight to PubChem, the largest open chemistry database, with over 116 million compound records. You don't need an API key to get at this data; you just use the tools listed here.

**How it works:** Your agent calls one of three tools and gets structured molecular data back, ready for your workflow. No wading through web pages; you get what you need directly.

When you need to find a compound by its common name or any known synonym, use the `search_pubchem` tool. It takes a text identifier, such as 'aspirin' or an IUPAC name, and returns key data points for the matching compounds, including molecular weight (MW) and XLogP values.

If a name won't cut it, use the `search_pubchem_formula` tool. Give it a molecular formula, like C8H10N4O2, and the server returns the known compounds that match that exact composition. Isomers share a formula, so expect more than one hit; this is how you narrow the haystack when all you know is the atom counts.

When you already have a PubChem Compound ID (CID), you don't need to search; you just want the details on that molecule. That's where `get_pubchem_compound` comes in. You feed it the CID, and it returns the detailed record: the molecular formula, the molecular weight, the SMILES notation (the structure string), and further physicochemical properties. It's a complete profile for that compound.

The upshot: if your workflow needs to reach compounds by name, by formula, or by ID, you don't need multiple integrations or custom database lookups. Use `search_pubchem` for common names and get MW and XLogP right away. Run `search_pubchem_formula` to check a formula against every matching record. Call `get_pubchem_compound` when you already hold a CID and want the complete profile, SMILES string included. Each tool is a direct pipeline to PubChem data, so your agent skips the manual lookup steps.
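
The FAQ below names PubChem PUG REST as the underlying API. How this server maps its three tools onto PUG REST is not documented here, so treat the mapping in this sketch as an assumption; the URL patterns themselves are standard PUG REST routes. The helper names and the chosen property list are illustrative only.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
# A small subset of PUG REST property names, picked for illustration.
PROPERTIES = "MolecularFormula,MolecularWeight,XLogP"


def name_url(name):
    """Properties for compounds matching a name (assumed route for search_pubchem)."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/property/{PROPERTIES}/JSON"


def formula_url(formula):
    """CIDs matching a molecular formula (assumed route for search_pubchem_formula)."""
    return f"{PUG_REST}/compound/fastformula/{quote(formula, safe='')}/cids/JSON"


def cid_url(cid):
    """Properties for one CID (assumed route for get_pubchem_compound)."""
    return f"{PUG_REST}/compound/cid/{int(cid)}/property/{PROPERTIES}/JSON"


print(name_url("aspirin"))
print(formula_url("C8H10N4O2"))
print(cid_url(2244))
```

The functions only build URLs and make no network calls, so you can read the request shape without touching PubChem.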

## Tools

### get_pubchem_compound
Retrieves full molecular data—including formula, weight, SMILES, etc.—for a specific PubChem Compound ID (CID).

### search_pubchem_formula
Finds all matching compounds when given a specific molecular formula (e.g., C8H10N4O2).

### search_pubchem
Searches for chemical compounds using common names or synonyms and returns key properties such as MW and XLogP.
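
If you are wiring up a client by hand, each tool is invoked through the standard MCP `tools/call` request. The sketch below builds those JSON-RPC messages. The tool names come from this page, but the argument names (`query`, `formula`, `cid`) are assumptions, since the input schemas are not listed here; check the schema your client reports before relying on them.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC expects a distinct id per request


def build_tool_call(tool_name, arguments):
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Argument names below are hypothetical, not confirmed by this server.
by_name = build_tool_call("search_pubchem", {"query": "aspirin"})
by_formula = build_tool_call("search_pubchem_formula", {"formula": "C8H10N4O2"})
by_cid = build_tool_call("get_pubchem_compound", {"cid": 2244})

print(by_name)
```

Most AI clients build these messages for you; the sketch is mainly useful for debugging what your agent actually sent.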

## Prompt Examples

**Prompt:** 
```
What are the molecular properties of aspirin?
```

**Response:** 
```
Aspirin (CID 2244): Formula C9H8O4, MW 180.16, SMILES CC(=O)OC1=CC=CC=C1C(O)=O, XLogP 1.2, H-bond donors: 1, H-bond acceptors: 4. Passes Lipinski's Rule of Five — classified as drug-like.
```

**Prompt:** 
```
Search for compounds with the molecular formula C8H10N4O2.
```

**Response:** 
```
Found compounds matching C8H10N4O2: Primary result is Caffeine (CID 2519), the alkaloid found in coffee, tea, and chocolate. MW: 194.19, SMILES: CN1C=NC2=C1C(=O)N(C(=O)N2C)C. Other isomers that share this formula are listed after it.
```

**Prompt:** 
```
Get the full chemical details for PubChem compound CID 4091.
```

**Response:** 
```
CID 4091: Metformin. Formula C4H11N5, MW 129.16, SMILES CN(C)C(=N)NC(=N)N. XLogP: -1.4 (highly water-soluble). H-bond donors: 3, acceptors: 3. Widely used as first-line treatment for type 2 diabetes. Molecular complexity: 95.
```

## Capabilities

### Search by Common Name
Find chemical compounds using their common names or synonyms.

### Search by Molecular Formula
Identify all known compounds that match a specific molecular formula (e.g., C8H10N4O2).

### Retrieve Full Compound Data
Pull detailed chemical information, including SMILES notation and physicochemical properties, using the PubChem Compound ID.

## Use Cases

### Validating a Drug Candidate's Structure
A chemist suspects a new lead compound has the formula C8H10N4O2. They run `search_pubchem_formula` to confirm candidates, identifying Caffeine (CID 2519). Next, they use `get_pubchem_compound` on CID 2519 to pull its full data payload, including MW and SMILES, for immediate comparison against known benchmarks.

### Comparing Known Compounds
You need to compare the properties of Aspirin vs. Caffeine. You run `search_pubchem` twice—once for 'Aspirin' and once for 'Caffeine'. The agent retrieves both sets of data, allowing you to see key differences in XLogP or H-bond counts side-by-side without leaving your coding environment.
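
Here is a rough sketch of the side-by-side step once both results are back. The dictionary keys (`cid`, `formula`, `mw`, `xlogp`) are hypothetical field names, and the values are the ones quoted in the prompt examples on this page. Caffeine's XLogP is not quoted here, so it is left out to show how a missing field surfaces.

```python
def compare(a, b, fields):
    """Pair up the same field from two compound records; missing values become None."""
    return [(field, a.get(field), b.get(field)) for field in fields]


# Values copied from the example responses on this page.
aspirin = {"name": "Aspirin", "cid": 2244, "formula": "C9H8O4", "mw": 180.16, "xlogp": 1.2}
caffeine = {"name": "Caffeine", "cid": 2519, "formula": "C8H10N4O2", "mw": 194.19}

rows = compare(aspirin, caffeine, ["cid", "formula", "mw", "xlogp"])
print(f"{'field':<8} {aspirin['name']:<12} {caffeine['name']:<12}")
for field, left, right in rows:
    print(f"{field:<8} {str(left):<12} {str(right):<12}")
```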

### Handling Ambiguous Inputs
You only know the compound by a common name, like 'Glucose'. You use `search_pubchem` to query it. This returns initial data and IDs. If you need more than just the basic properties, you take those IDs and feed them into `get_pubchem_compound` for maximum detail.

### Filtering a Large Library by Composition
A researcher is building a library of possible compounds. Instead of manually checking thousands of names, they use `search_pubchem_formula` with the target formula (e.g., C4H11N5) to pull every matching structure and then fetch their specific properties.

## Benefits

- You get the full molecular picture in one go. Instead of searching for a compound name and then having to find its SMILES string somewhere else, `get_pubchem_compound` delivers everything—structure, weight, formula—in a single call.
- Need to verify a structure by formula? Use `search_pubchem_formula`. You simply input the molecular recipe (like C9H8O4), and it returns all matching candidates. No manual filtering required.
- Stop relying on vague text searches. The `search_pubchem` tool handles common names and synonyms, immediately giving you key metrics like XLogP and hydrogen bond counts for fast screening.
- The data is structured for code. When your agent runs a query, the output isn't just text; it’s organized molecular records you can pass directly to other scripts or databases.
- It draws on real-world drug knowledge. Records come straight from PubChem, the public database maintained by the U.S. National Institutes of Health, so your agent works from an established reference source and you can focus on chemistry, not data hygiene.

## How It Works

The bottom line is: you use a search tool first to get candidate IDs, then pass those IDs to the detail retrieval tool for the final payload.

1. Start by directing your agent to use `search_pubchem` when you know a compound's common name or synonym.
2. If the name search returns nothing useful, use `search_pubchem_formula` with the molecular formula (e.g., C9H8O4) to narrow down the possibilities.
3. Once an ID is confirmed, call `get_pubchem_compound` using that CID to pull every available data point.
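
The three steps above can be sketched as one helper. Everything outside the tool names is an assumption: `call_tool` stands in for whatever function your MCP client exposes, the argument names and the `cid` key in the results are hypothetical, and the stub exists only so the example runs without a live server.

```python
from typing import Callable


def resolve_compound(call_tool: Callable, name=None, formula=None):
    """Search by name, fall back to formula, then fetch full details by CID."""
    hits = call_tool("search_pubchem", {"query": name}) if name else []
    if not hits and formula:
        hits = call_tool("search_pubchem_formula", {"formula": formula})
    if not hits:
        raise LookupError("no candidate compounds found")
    # Simplification: take the first hit. In practice, confirm the CID first.
    cid = hits[0]["cid"]
    return call_tool("get_pubchem_compound", {"cid": cid})


def fake_call_tool(tool, args):
    """Offline stub that imitates the three tools for a single compound."""
    if tool == "search_pubchem":
        return [{"cid": 2244}] if args["query"].lower() == "aspirin" else []
    if tool == "search_pubchem_formula":
        return [{"cid": 2244}] if args["formula"] == "C9H8O4" else []
    if tool == "get_pubchem_compound":
        return {"cid": args["cid"], "formula": "C9H8O4", "mw": 180.16}
    raise ValueError(f"unknown tool: {tool}")


print(resolve_compound(fake_call_tool, name="aspirin"))
print(resolve_compound(fake_call_tool, name="mystery powder", formula="C9H8O4"))
```

Taking the first hit is the weakest part of the sketch. A formula search can return several isomers, so a real workflow should let the user or the agent pick the CID.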

## Frequently Asked Questions

**How do I find a compound by name using PubChem MCP Server?**
You use `search_pubchem`. Just pass the common or IUPAC name you're looking for. It returns the matching CIDs along with key properties like MW, which is usually enough to confirm what you're working with.

**What if I only have a molecular formula? Can PubChem MCP Server help?**
Yes, use `search_pubchem_formula`. You pass the exact formula (e.g., C8H10N4O2), and it finds all known compounds that match that composition.

**What is the best tool to get deep data for a specific compound?**
`get_pubchem_compound` is your go-to. You must feed it a PubChem Compound ID (CID) first. This tool pulls the deepest set of properties, including SMILES and InChI.

**Can I use `search_pubchem` for more than one compound?**
You can mention several names in a single prompt, but `search_pubchem` takes one query at a time. The agent typically runs the tool once per name and gathers the individual results before processing them together.

**Do I need an API key to use the `search_pubchem` tool?**
No, you don't need an API key for any of the tools. Just connect your AI client and start running searches immediately. The server handles all authentication on Vinkius.

**What happens if I use `get_pubchem_compound` with a non-existent CID?**
The tool will return an error message indicating that the specified PubChem Compound ID (CID) could not be found. You'll need to verify the ID or try searching by name instead.
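
If you handle tool results in your own code, one way to catch this case is to check the `isError` flag that the MCP specification defines on tool results. Whether this server sets that flag for a missing CID is an assumption, and the error text in the sample is invented for illustration.

```python
class ToolError(Exception):
    """Raised when an MCP tool result is flagged as an error."""


def text_or_raise(result):
    """Return the text content of a tool result, or raise if it is an error."""
    text = "\n".join(
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    )
    if result.get("isError"):
        raise ToolError(text or "tool call failed")
    return text


# Hypothetical result for a CID that does not exist.
missing = {"isError": True, "content": [{"type": "text", "text": "CID not found"}]}

try:
    text_or_raise(missing)
except ToolError as err:
    print(f"lookup failed: {err}; try search_pubchem with a name instead")
```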

**Does `search_pubchem` handle ambiguous common names?**
It searches names and synonyms across 116M+ compounds, so well-known names like 'aspirin' or 'caffeine' resolve cleanly. When a name could refer to more than one compound, check the returned CID and formula before relying on the result.

**Can I use `search_pubchem_formula` to find organic molecules outside of drug discovery?**
The tool finds compounds by any valid molecular formula, making it useful for general biochemistry research. It covers both pharmaceutical leads and foundational biological metabolites.

**Do I need an API key to use PubChem?**
No. PubChem PUG REST is completely free and open without any authentication. The only limitation is a rate limit of 5 requests per second and 400 requests per minute, which is more than sufficient for conversational AI usage.
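
If you script many lookups in a loop, a client-side throttle keeps you under both limits. This is a minimal sliding-window sketch, not something the server is known to do internally; the class name and its parameters are made up for the example. The clock and sleep functions are injectable so the demo runs instantly against a fake clock.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Block until a request fits under every (max_calls, window_seconds) limit."""

    def __init__(self, limits=((5, 1.0), (400, 60.0)), clock=time.monotonic, sleep=time.sleep):
        self.limits = limits
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of past requests, oldest first

    def _wait_needed(self, now):
        wait = 0.0
        for max_calls, window in self.limits:
            recent = [t for t in self.sent if t > now - window]
            if len(recent) >= max_calls:
                # Wait until the oldest of the last max_calls requests leaves the window.
                wait = max(wait, recent[-max_calls] + window - now)
        return wait

    def acquire(self):
        longest = max(window for _, window in self.limits)
        while True:
            now = self.clock()
            # Forget timestamps that are older than the longest window.
            while self.sent and self.sent[0] <= now - longest:
                self.sent.popleft()
            wait = self._wait_needed(now)
            if wait <= 0:
                self.sent.append(now)
                return
            self.sleep(wait)


# Demo with a fake clock: six back-to-back requests force one 1-second pause.
fake_now = [0.0]
naps = []


def fake_sleep(seconds):
    naps.append(seconds)
    fake_now[0] += seconds


limiter = SlidingWindowLimiter(clock=lambda: fake_now[0], sleep=fake_sleep)
for _ in range(6):
    limiter.acquire()
print(naps)  # [1.0]
```

In real use, create the limiter with the defaults and call `limiter.acquire()` before each request.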

**What molecular properties are returned for each compound?**
Each compound includes: CID, IUPAC name, molecular formula, molecular weight, canonical SMILES, InChI identifier, XLogP (lipophilicity), hydrogen bond donor count, hydrogen bond acceptor count, and molecular complexity score. Molecular weight, XLogP, and the two hydrogen bond counts are the four inputs to Lipinski's Rule of Five for drug-likeness assessment.
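
To make the Rule of Five concrete, here is a small check using the usual thresholds: molecular weight at most 500, logP at most 5, at most 5 hydrogen bond donors, and at most 10 hydrogen bond acceptors. The function name is illustrative, and XLogP is a computed estimate standing in for a measured logP. The aspirin values are the ones quoted in the prompt example above.

```python
def lipinski_violations(mw, xlogp, h_donors, h_acceptors):
    """Return the list of Rule of Five criteria that a compound breaks."""
    checks = {
        "molecular weight > 500": mw > 500,
        "XLogP > 5": xlogp > 5,
        "H-bond donors > 5": h_donors > 5,
        "H-bond acceptors > 10": h_acceptors > 10,
    }
    return [rule for rule, violated in checks.items() if violated]


# Aspirin, using the figures from the example response on this page.
aspirin = lipinski_violations(mw=180.16, xlogp=1.2, h_donors=1, h_acceptors=4)
print(aspirin)  # []
```

An empty list means no violations. The rule is commonly read as tolerating at most one violation, so treat a single hit as a flag and not an automatic rejection.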

**Can I search by molecular formula instead of name?**
Yes! Use `search_pubchem_formula` with standard notation (e.g., C8H10N4O2 for caffeine, C9H8O4 for aspirin, H2O for water). PubChem will return all compounds matching that exact formula with their names and properties.
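
If you want to sanity-check a formula before sending it, a few lines of parsing go a long way. This sketch handles only simple formulas (element symbol followed by an optional count) and carries atomic weights for just four elements; charges, parentheses, and isotopes are out of scope. It is a local helper for illustration, not part of the server.

```python
import re

# Standard atomic weights, limited to the elements used in this page's examples.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Split a simple formula such as 'C8H10N4O2' into element counts."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    # Reject anything the pattern could not fully consume (e.g. lowercase input).
    if not tokens or "".join(sym + num for sym, num in tokens) != formula:
        raise ValueError(f"not a simple molecular formula: {formula!r}")
    counts = {}
    for symbol, number in tokens:
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts


def molecular_weight(formula):
    """Sum atomic weights; raises KeyError for elements missing from the table."""
    return sum(ATOMIC_WEIGHTS[sym] * n for sym, n in parse_formula(formula).items())


print(parse_formula("C8H10N4O2"))
print(round(molecular_weight("C8H10N4O2"), 2))  # 194.19
```

The computed weight for C8H10N4O2 matches the MW reported for caffeine in the example response, which is a quick way to confirm a formula and a record belong together.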