# UniProt MCP

> UniProt MCP connects your AI agent directly to UniProt, the knowledge base for protein sequences and functional annotation. It lets you search 250M+ entries by keyword, find proteins related to a specific gene, or pull full functional details using an accession ID. If you work with proteomics, this is where the data lives.

## Overview
- **Category:** the-unthinkable
- **Price:** Free
- **Tags:** protein-sequences, bioinformatics, genomics, functional-annotation, biological-data, research-database

## Description

UniProt MCP hooks your AI agent right into UniProt, the giant knowledge base for protein sequences. If you work in proteomics, this is where you go to get the data. You can search over 250 million entries by keyword, pull out proteins linked to a specific gene, or grab full functional details from an accession ID.

When your agent uses these tools, it doesn't just give you vague summaries; it runs precise queries against the database and returns structured JSON. You get sequence data, function descriptions, gene names, and subcellular locations, all ready to use.

If you're starting broad, run `search_uniprot` with general keywords such as 'hemoglobin' or the name of a disease pathway. This tool pulls up basic protein info (name, organism, function) along with the amino acid sequence, so you know what you're looking at right away.

Need to compare proteins across different species? Use `search_uniprot_gene`. Just drop in a gene name, and the agent pulls the known entries for that gene across multiple organisms, including isoforms and their functional annotations. That makes comparative analysis much easier.

When you've narrowed it down and you know exactly which protein you want, grab its accession ID and use `get_uniprot_protein`. This is your direct lookup: it returns the full record for that single protein, including function summaries, location details, and the complete amino acid sequence. It's the deep dive you need without any guesswork.

## Tools

### get_uniprot_protein
Gets the full protein details, including sequence and function, when you provide a specific UniProt accession ID.

### search_uniprot_gene
Finds all protein isoforms and functional annotations associated with a specific gene name across different species.

### search_uniprot
Searches for proteins using general keywords like name or function. It returns basic data plus the amino acid sequence.
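
To make the three tools concrete, here is a minimal sketch of the requests an MCP client might send. The `tools/call` method and JSON-RPC 2.0 envelope come from the MCP specification; the argument names (`query`, `gene`, `accession`) are assumptions for illustration, since this page does not list each tool's input schema.

```python
import json
from itertools import count

_ids = count(1)  # incrementing JSON-RPC request ids


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names below are hypothetical; check the server's tool schema.
example_calls = [
    tool_call("search_uniprot", {"query": "hemoglobin"}),
    tool_call("search_uniprot_gene", {"gene": "BRCA1"}),
    tool_call("get_uniprot_protein", {"accession": "P04637"}),
]

for call in example_calls:
    print(json.dumps(call))
```

In practice your AI client builds these messages for you; the sketch only shows what each tool needs as input.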

## Prompt Examples

**Prompt:** 
```
Tell me about the p53 tumor suppressor protein and its function.
```

**Response:** 
```
P53_HUMAN (UniProt P04637): Gene TP53, Homo sapiens. 393 amino acids. Swiss-Prot reviewed ✅. Function: Acts as a tumor suppressor in many tumor types by inducing growth arrest or apoptosis. Activated in response to DNA damage, oncogene activation, and hypoxia. Located in nucleus and cytoplasm. Critical note: TP53 is the most frequently mutated gene in human cancers.
```

**Prompt:** 
```
Find all proteins encoded by the BRCA1 gene.
```

**Response:** 
```
Found 10 entries for gene BRCA1. Top result: BRCA1_HUMAN (P38398, Swiss-Prot reviewed ✅) — Homo sapiens, 1863 amino acids. Function: E3 ubiquitin ligase, key role in DNA repair via homologous recombination. Mutations associated with increased risk of breast and ovarian cancer. Also found orthologs in mouse (P48754), rat, and zebrafish.
```

**Prompt:** 
```
Look up UniProt accession Q9BYF1 and show me its full details.
```

**Response:** 
```
Q9BYF1: ACE2_HUMAN — Angiotensin-converting enzyme 2. Gene: ACE2, Homo sapiens. 805 amino acids. Swiss-Prot reviewed ✅. Function: Metalloprotease that converts angiotensin II to angiotensin-(1-7). Notably serves as the functional receptor for SARS-CoV and SARS-CoV-2 spike proteins. Located in cell membrane.
```

## Capabilities

### Search by Keyword or Function
Run a broad query using `search_uniprot` to find proteins associated with general terms like 'hemoglobin' or 'p53'.

### Retrieve Full Protein Record
Use the accession ID in `get_uniprot_protein` to get every piece of functional and sequence data for a single protein.

### Compare Gene Orthologs
Run `search_uniprot_gene` with a gene name; the agent returns the known entries for that gene across different organisms, including isoforms and orthologs.

## Use Cases

### Comparing Orthologs
A bioinformatician needs to compare the sequence and annotations of p53 in human and mouse. Instead of searching two databases manually, they run `search_uniprot_gene` with 'TP53'. The agent returns multiple entries, including ortholog IDs for both species, allowing immediate comparison.
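
Once the entries come back, grouping them by organism makes the human/mouse pairing explicit. A minimal sketch, assuming each entry is a dict with hypothetical `accession` and `organism` keys; the server's actual JSON field names may differ, and the mouse accession below is a placeholder rather than a real ID.

```python
from collections import defaultdict


def group_by_organism(entries: list[dict]) -> dict[str, list[str]]:
    """Map organism name -> list of accessions, preserving input order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        grouped[entry["organism"]].append(entry["accession"])
    return dict(grouped)


# Hypothetical result shape for a 'TP53' gene search.
entries = [
    {"accession": "P04637", "organism": "Homo sapiens"},
    {"accession": "PLACEHOLDER_MOUSE", "organism": "Mus musculus"},  # placeholder
]

grouped = group_by_organism(entries)
for organism in ("Homo sapiens", "Mus musculus"):
    print(organism, grouped.get(organism, []))
```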

### Target Identification
A drug discovery team is looking for metalloproteases that convert angiotensin II. They use `search_uniprot` with 'metalloprotease' and 'angiotensin'. The agent pulls candidate proteins (like ACE2) and their functional annotations, narrowing down the list of viable targets.

### Deep Dive Lookup
A user finds a promising protein ID (Q9BYF1). They don't want the summary; they need *everything*. Running `get_uniprot_protein` returns the full record (function, gene, location, and sequence) in one call.

### Understanding Gene Family Relationships
A researcher suspects a novel protein belongs to the BRCA1 family. They run `search_uniprot_gene` on 'BRCA1'. The agent returns ten entries, immediately showing all known isoforms and their associated roles in DNA repair.

## Benefits

- Get full protein records instantly. Use `get_uniprot_protein` with the accession ID to pull every known function, location, and amino acid sequence for a single target.
- Search by context, not just name. Running `search_uniprot_gene` on 'BRCA1' lets you compare all related isoforms across different organisms without multiple API calls.
- Avoid manual database browsing. The `search_uniprot` tool handles broad queries using keywords (like 'spike protein'), giving you an immediate set of candidate proteins and their sequences.
- Work with diverse data types. You get functional annotations, gene names, subcellular locations, and the raw amino acid sequence all in one structured output.
- Minimize redundancy. Instead of running three different searches for a single target, use `get_uniprot_protein` to pull the complete, verified record.

## How It Works

You send a specific biological query, and the agent gets back structured data from the UniProt database.

1. First, tell your AI client which protein data you need: do you have an ID (use `get_uniprot_protein`), a general term (use `search_uniprot`), or a gene name (use `search_uniprot_gene`)?
2. The agent executes the specific tool call, sending parameters like 'BRCA1' or 'P04637' to the server.
3. The UniProt MCP Server runs the query against the database and returns structured JSON containing all requested details: sequence, function, location, etc.
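
The choice in step 1 can be sketched as a small routing function. This is an illustration, not the server's own logic: the regular expression is the accession pattern documented by UniProt, while `choose_tool` and its `is_gene` flag are hypothetical helpers, needed because a gene symbol cannot be reliably told apart from a keyword by shape alone.

```python
import re

# UniProt accession pattern (6 or 10 characters), anchored to the whole string.
ACCESSION_RE = re.compile(
    r"^([OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def choose_tool(text: str, is_gene: bool = False) -> str:
    """Pick which of the three tools fits the input."""
    value = text.strip().upper()
    if is_gene:
        # The caller says this is a gene symbol, so trust that first.
        return "search_uniprot_gene"
    if ACCESSION_RE.match(value):
        return "get_uniprot_protein"
    return "search_uniprot"


print(choose_tool("P04637"))               # accession lookup
print(choose_tool("BRCA1", is_gene=True))  # gene search
print(choose_tool("hemoglobin"))           # keyword search
```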

## Frequently Asked Questions

**How do I use search_uniprot to find proteins by function?**
You pass the functional keyword directly into `search_uniprot`. For example, if you want all enzymes that process lipids, you'd input 'lipid metabolizing enzyme'. The tool returns candidates and their sequences.

**What is the difference between search_uniprot and get_uniprot_protein?**
Use `search_uniprot` when you are guessing or researching, using a keyword like 'spike protein'. Use `get_uniprot_protein` only if you have a precise UniProt accession ID (like P04637) and need the complete record.

**Can I find all proteins from a gene name with search_uniprot_gene?**
Yes. `search_uniprot_gene` is designed for exactly that. Give it the gene symbol (e.g., 'BRCA1'), and it returns every known isoform across multiple species, which is crucial for evolutionary comparison.

**Does UniProt MCP Server require an API key?**
No. You don't need to worry about managing keys or endpoints; Vinkius handles the connection when you subscribe and connect your AI client.

**When I use get_uniprot_protein, what detailed information do I get besides the sequence?**
You receive the full amino acid sequence alongside critical annotations. This includes functional descriptions, subcellular location data, and whether the entry is manually curated (Swiss-Prot) or auto-annotated (TrEMBL). The tool provides context necessary for experimental design.

**How does search_uniprot handle non-protein keywords or general terms?**
It returns results based on matching names, functions, and gene symbols. If you use a general term like 'oxidative stress,' the tool finds proteins associated with that function, rather than requiring an exact match.

**If I need to check multiple related genes, is it better to run search_uniprot_gene or search_uniprot?**
You should use `search_uniprot_gene`, run once per gene. This tool specifically compiles all protein isoforms and their annotations for a single gene across different organisms, giving you a more complete comparison set.

**Are there any rate limits when I run multiple searches using the search_uniprot tool?**
While no specific limit is published here, running many requests in quick succession may trigger throttling. For large-scale comparative analyses, batching your calls or incorporating delays between API invocations is advisable.
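
A minimal sketch of spacing out calls, assuming you drive the tool from your own script. `call` stands in for whatever function actually invokes `search_uniprot` in your client; it and the 0.5 second default are illustrative choices, not documented values.

```python
import time
from typing import Any, Callable, Iterable


def call_with_delay(
    call: Callable[[str], Any],
    queries: Iterable[str],
    delay_s: float = 0.5,
) -> list[Any]:
    """Run `call` once per query, sleeping between consecutive calls."""
    results = []
    for i, query in enumerate(queries):
        if i > 0:
            time.sleep(delay_s)  # pause before every call except the first
        results.append(call(query))
    return results


# Stand-in for a real tool invocation, used here only to show the flow.
def fake_search(query: str) -> dict:
    return {"query": query, "hits": []}


print(call_with_delay(fake_search, ["hemoglobin", "spike protein"], delay_s=0.01))
```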

**What is the difference between Swiss-Prot and TrEMBL entries?**
Swiss-Prot contains 570K+ entries that have been manually reviewed and curated by expert biologists — the gold standard for protein annotation. TrEMBL contains 250M+ entries that are computationally annotated from gene sequences. Swiss-Prot entries are marked as 'reviewed' and are highly reliable; TrEMBL entries are automatically generated and may contain errors.
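
If you want curated entries first, you can sort results on the review status. A minimal sketch, assuming each result carries a hypothetical boolean `reviewed` field; the actual field name in the server's JSON may differ, and the second accession is a placeholder.

```python
def reviewed_first(entries: list[dict]) -> list[dict]:
    """Return entries with Swiss-Prot (reviewed) ones ahead of TrEMBL ones.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(entries, key=lambda e: not e.get("reviewed", False))


# Hypothetical result shape.
results = [
    {"accession": "PLACEHOLDER_TREMBL", "reviewed": False},  # placeholder ID
    {"accession": "P38398", "reviewed": True},
]

for entry in reviewed_first(results):
    label = "Swiss-Prot" if entry["reviewed"] else "TrEMBL"
    print(entry["accession"], label)
```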

**Do I need to register or pay for an API key?**
No. The UniProt REST API is free and open, with no authentication required. Reasonable usage patterns are not expected to hit a limit, though bursts of many requests may be throttled. UniProt is funded by the National Institutes of Health (NIH), European Molecular Biology Laboratory (EMBL), and the Swiss Institute of Bioinformatics (SIB).
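
For reference, here is a sketch of what unauthenticated access to the public REST API looks like. The `rest.uniprot.org/uniprotkb` endpoints and the `query`, `format`, and `size` parameters are part of UniProt's public API; whether the MCP server uses exactly these calls is an assumption, and the helper names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://rest.uniprot.org/uniprotkb"


def entry_url(accession: str) -> str:
    """URL for a single entry as JSON."""
    return f"{BASE}/{accession}.json"


def search_url(query: str, size: int = 5) -> str:
    """URL for a search; `query` uses UniProt query syntax, e.g. 'gene:BRCA1'."""
    params = urlencode({"query": query, "format": "json", "size": size})
    return f"{BASE}/search?{params}"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Fetch and decode a JSON response; no API key or headers needed."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


print(entry_url("P04637"))
print(search_url("gene:BRCA1", size=3))
# fetch_json(entry_url("P04637"))  # uncomment to make a live request
```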

**Can I retrieve full amino acid sequences for proteins?**
Yes. Every protein entry includes the full amino acid sequence with length information. The sequence is returned in standard one-letter amino acid code. For very large proteins (10,000+ residues), the sequence may be truncated in the response but the full accession data is always provided for direct download.
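
Because a response can carry a shortened sequence, it is worth comparing the sequence you received against the reported length. A minimal sketch, assuming hypothetical `sequence` and `length` fields in the returned record; the letter set covers the 20 standard residues plus U, O, and the ambiguity codes B, Z, X, and J.

```python
# One-letter codes: 20 standard amino acids, U (Sec), O (Pyl),
# and ambiguity codes B, Z, X, J.
VALID_LETTERS = set("ACDEFGHIKLMNPQRSTVWY" "UOBZXJ")


def check_sequence(record: dict) -> dict:
    """Report whether the sequence looks well-formed and complete."""
    seq = record.get("sequence", "")
    reported = record.get("length")
    return {
        "valid_letters": set(seq.upper()) <= VALID_LETTERS,
        # Truncated if the record reports more residues than we received.
        "truncated": reported is not None and len(seq) < reported,
    }


# A short fragment paired with a longer reported length, for illustration.
print(check_sequence({"sequence": "MEEPQ", "length": 393}))
```

If `truncated` comes back true, fetch the full sequence by accession rather than working from the partial one.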