# EBI Proteins API MCP

> The EBI Proteins API MCP connects your AI client to UniProt data through the EMBL-EBI Proteins API, giving you access to millions of protein entries and their functional annotations. You can pull sequences, map genetic variants, find binding sites, and check post-translational modifications from a single interface.

## Overview
- **Category:** the-unthinkable
- **Price:** Free
- **Tags:** proteins, uniprot, bioinformatics, genomics, variants, proteomics, embl-ebi

## Description

Working with protein data shouldn't mean juggling five different databases. This MCP lets you query the UniProt knowledge base for protein data ranging from basic sequence retrieval to detailed functional annotations. Need to know whether a variant is clinically significant? Pull the aggregated information from sources like ClinVar and gnomAD. Want to map a protein to its position on the genome, or list every annotated binding site and domain? Those queries are covered too.

Connected via Vinkius, your agent acts as a molecular biology assistant with direct access to the data served by EMBL-EBI, down to details such as mass spectrometry evidence for specific modifications or descriptions of mutagenesis experiments. The result is a single entry point that replaces hours of manual searching across multiple bioinformatics platforms.

## Tools

### get_antigen
Retrieves peptide regions used for antibody generation, useful when targeting specific protein parts.

### get_coordinates
Returns the precise genomic location (chromosome, start/end) and Ensembl IDs for a given protein.

### get_protein_features
Retrieves detailed annotations about the sequence, including domains, binding sites, active sites, and signal peptides.

### get_genecentric
Provides a view of how many related proteins exist for a specific gene ID within a proteome.

### get_mutagenesis
Lists known mutagenesis experiments, detailing the wild-type residue, mutant residues, and observed functional effects.

### get_protein
Fetches a complete record for any protein using its standard UniProt accession code.

### get_proteome
Gets high-level information about an entire organism's protein set, such as total protein count or taxonomy details.

### get_proteomics
Provides mass spectrometry data to show which peptides were experimentally detected for a specific protein.

### get_proteomics_ptm
Specifies residue-level positions and evidence counts for post-translational modifications found via mass spec.

### get_taxonomy
Looks up scientific names, ranks, and lineage connections using an NCBI taxon ID.

### get_variation
Gathers genetic variant data for a protein from multiple sources, noting clinical significance and consequence type.

### search_features_by_type
Searches across proteins to find specific types of features like binding sites or transmembrane regions.

### search_proteins
Finds a summarized list of proteins by searching based on gene name, organism, or keyword.

### search_proteomes
Searches for entire proteomes using general terms like 'homo sapiens' or 'escherichia coli'.

### search_taxonomy
Finds correct taxonomy entries by name, providing the necessary IDs to start a protein query.

### search_variation
Searches for clinically relevant genetic variants based on their consequence type or source study.

## Prompt Examples

**Prompt:** 
```
Get all known genetic variants for the TP53 tumor suppressor protein.
```

**Response:** 
```
I've retrieved the variant data for TP53 (P04637) from the EBI Proteins API. The protein has hundreds of documented variants aggregated from UniProtKB, ClinVar, gnomAD, COSMIC, and other sources. These include missense mutations in the DNA-binding domain (roughly residues 102-292), which are the most clinically significant, along with their pathogenicity classifications and population frequencies.
```

**Prompt:** 
```
Show me the domain architecture and binding sites of the EGFR protein.
```

**Response:** 
```
I've retrieved the sequence features for EGFR (P00533). The protein contains multiple annotated domains including the extracellular ligand-binding domain, furin-like cysteine-rich domains, a transmembrane region, and the intracellular tyrosine kinase domain. Key binding sites include the ATP-binding site in the kinase domain and the EGF-binding regions of the extracellular domain. I also found a signal peptide, glycosylation sites, and disulfide bonds.
```

**Prompt:** 
```
Map the BRCA1 protein to its genome coordinates on GRCh38.
```

**Response:** 
```
I've mapped BRCA1 (P38398) to genome coordinates. The protein maps to chromosome 17 on the GRCh38 assembly, with Ensembl gene ID ENSG00000012048 and transcript ID ENST00000357654. The coding region spans a large genomic interval on the reverse strand, reflecting BRCA1's complex exon structure. This mapping allows you to cross-reference protein-level variant annotations with genomic positions.
```

## Capabilities

### Retrieve complete protein records
Fetch an entire protein entry using its UniProt accession, including names, organism data, and cross-references.

### Identify structural features
Get annotated details on domains, binding sites, active sites, and transmembrane regions for any given sequence.

### Analyze genetic mutations
Access curated variant data aggregated from multiple large-scale studies, assessing clinical significance and consequence type.

### Map protein locations to the genome
Find the exact chromosome coordinates, Ensembl gene IDs, and transcript mapping for a specific protein.

### Check structural modifications
Query data on post-translational modifications (PTMs) or mass spectrometry peptide evidence for validation.

## Use Cases

### Assessing drug targets for cancer research
A geneticist finds a promising mutation in TP53. Instead of checking ClinVar, then running a separate search, they use get_variation and search_variation to immediately cross-reference the variant's clinical significance and population frequency across multiple sources.
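The cross-referencing step in this use case can be sketched as a small filter over variant records. This is only an illustration: the `Variant` shape, its field names, and the sample rows are hypothetical placeholders, not the actual output format of get_variation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # Hypothetical record shape; the real tool output may differ.
    position: int
    change: str
    consequence: str
    clinical_significance: str
    sources: tuple


def filter_variants(variants, significance=None, source=None):
    """Keep variants matching the given significance and/or source."""
    kept = []
    for v in variants:
        if significance and v.clinical_significance.lower() != significance.lower():
            continue
        if source and source not in v.sources:
            continue
        kept.append(v)
    return kept


# Placeholder rows for illustration only, not real TP53 data.
sample = [
    Variant(10, "A>V", "missense", "Pathogenic", ("ClinVar", "COSMIC")),
    Variant(20, "G>S", "missense", "Benign", ("gnomAD",)),
    Variant(30, "R>*", "stop gained", "Pathogenic", ("ClinVar",)),
]

hits = filter_variants(sample, significance="pathogenic", source="ClinVar")
print([v.position for v in hits])
```

In practice the agent would apply this kind of filter to whatever records the tool returns, so the field names would need to be adapted to the real response.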

### Designing an antibody against a novel protein
A structural biologist wants to know which part of the target protein is best for binding. They first use get_protein_features to identify the annotated domains, then run get_antigen to see which peptide regions have already been used for antibody generation.

### Integrating proteomics into a metabolomics pipeline
A researcher needs supporting evidence that a protein is actually expressed. They retrieve the full protein record using get_protein and then call get_proteomics to see which of its peptides have been experimentally detected by mass spectrometry.

### Understanding species differences for a conserved pathway
A bioinformatician needs to compare human and mouse versions of the same enzyme. They use search_taxonomy to find both taxon IDs, then search_proteomes and get_proteome to compare each organism's protein count and taxonomy details before looking up the individual proteins.

## Benefits

- You stop guessing which database holds the variant info. By calling get_variation, you pull aggregated data from sources like ClinVar and gnomAD in one step.
- Forget manually tracing protein IDs across genomic maps. Use get_coordinates to map a protein directly to its chromosome location (GRCh38) with Ensembl IDs.
- Need context for an entire species? Instead of searching piece by piece, use search_proteomes and get_proteome to view the total count and taxonomy status immediately.
- Structural analysis is faster than ever. The get_protein_features tool automatically pulls domains, binding sites, and signal peptides, saving you hours of manual annotation review.
- Confirming that a protein or modification has been observed experimentally used to be a nightmare. Now, calling get_proteomics or get_proteomics_ptm delivers mass spectrometry evidence, down to the residue level for modifications.

## How It Works

The bottom line is that you ask a complex biological question once, and the MCP handles all the data sourcing and structuring for you.

1. First, connect your AI client to this MCP via Vinkius and confirm access.
2. Next, give your agent a specific biological question—for instance, 'What are the known variants for TP53?' or 'Show me the domain structure of EGFR.'
3. Finally, your agent executes the necessary queries using the specialized tools and delivers structured data directly to you.
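Under the hood, step 3 amounts to the agent sending MCP tool calls. The sketch below builds one such request as a JSON-RPC 2.0 `tools/call` message, which is the general shape the Model Context Protocol uses. The argument name `accession` is an assumption made for illustration; this document does not list the tools' parameter schemas.

```python
import json
from itertools import count

_request_ids = count(1)


def build_tool_call(tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "accession" is a hypothetical argument name for get_variation.
request = build_tool_call("get_variation", {"accession": "P04637"})
print(json.dumps(request, indent=2))
```

Your AI client normally builds and sends these messages for you, so the sketch is mainly useful for understanding or debugging what the agent does.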

## Frequently Asked Questions

**How do I find all known variants for a protein?**
Run get_variation with the protein's UniProt accession. The tool pulls variant data from multiple sources such as ClinVar and gnomAD and returns aggregated results, including clinical significance.

**What is the difference between search_proteins and get_protein?**
Use search_proteins when you only know a keyword or an organism name. Use get_protein when you already have the precise UniProt accession ID for the specific protein record.
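One way to automate that choice is to test whether the input looks like a UniProt accession. The sketch below uses the accession pattern published in the UniProt help pages. The argument names `accession` and `query` are assumptions, since the tools' parameter schemas are not given here.

```python
import re

# Accession format as documented by UniProt (6 or 10 characters).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def choose_tool(user_input):
    """Return (tool_name, arguments) for an accession or a free-text query."""
    text = user_input.strip()
    if UNIPROT_ACCESSION.fullmatch(text.upper()):
        # Hypothetical argument name.
        return "get_protein", {"accession": text.upper()}
    # Hypothetical argument name.
    return "search_proteins", {"query": text}


print(choose_tool("P04637"))
print(choose_tool("TP53"))
```

A pattern match only shows that the string is well formed; it does not guarantee that the accession exists.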

**Can I find out where a gene is located on a chromosome?**
Yes, use get_coordinates to map a protein to the genome. It returns the genomic location on a reference assembly such as GRCh38, including the chromosome, the start/end positions, and the Ensembl IDs.

**How do I check for post-translational modifications?**
Use get_proteomics_ptm. This tool delivers residue-level positions and evidence counts, showing exactly where post-translational modifications were detected in mass spectrometry data.

**How do I find the correct species ID before querying?**
Start with search_taxonomy. This tool searches taxonomy entries by name and returns the NCBI taxon ID you need before querying that organism's proteome. If you already have a taxon ID, get_taxonomy looks up its scientific name, rank, and lineage.

**When using the tool `search_features_by_type`, what types of protein characteristics can I filter for?**
It accepts a predefined set of feature categories such as DOMAIN, BINDING, ACTIVE_SITE, and SIGNAL. You can narrow your search to one category, for instance looking only for TRANSMEM or CARBOHYD features, and the tool returns matching features across proteins.

**What is the benefit of using `get_genecentric` over just fetching a full protein entry?**
It provides a gene-centric view by showing both the canonical protein count and the related protein counts for that gene. This context shows how many entries in the proteome trace back to that one gene, which a single protein record does not tell you.

**If I want an overview of all available data for an organism, should I use `search_proteomes`?**
Yes, using `search_proteomes` gives you essential summary metrics. It provides the proteome IDs, protein counts, and gene counts, letting you quickly gauge the scope of the reference data available for a given species.

**Do I need an API key to use this server?**
No. The EMBL-EBI Proteins API is completely public and requires no authentication. Simply subscribe to this server and enter any placeholder value in the API key field to start querying protein data immediately.
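Because the underlying API is public, the same kind of record can also be requested without any credentials. The sketch below assumes the REST base URL `https://www.ebi.ac.uk/proteins/api` and its `/proteins/{accession}` route, and it bypasses the MCP server entirely; the helper names are illustrative.

```python
import json
import urllib.request

# Assumed public base URL of the EMBL-EBI Proteins API.
BASE_URL = "https://www.ebi.ac.uk/proteins/api"


def protein_url(accession):
    """Build the URL for a single protein entry."""
    return f"{BASE_URL}/proteins/{accession}"


def fetch_protein(accession, timeout=30):
    """Fetch one entry as JSON; no API key or auth header is sent."""
    request = urllib.request.Request(
        protein_url(accession), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


print(protein_url("P04637"))
# Example (performs a network request):
# entry = fetch_protein("P04637")
# print(sorted(entry))
```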

**What kind of variant data is available?**
The server aggregates genetic variants from multiple authoritative sources: UniProtKB curated variants, ClinVar clinical significance data, gnomAD population frequencies, 1000 Genomes Project, COSMIC somatic mutations, TOPMed whole-genome sequencing, ExAC exome data, and TCGA cancer variants. Each variant includes consequence type, clinical significance, and source cross-references.

**Can I map protein positions to genome coordinates?**
Yes. The get_coordinates tool maps any UniProt protein to reference genome coordinates on GRCh38 and GRCh37 assemblies. It returns Ensembl gene, transcript, and translation identifiers along with chromosome, start/end positions, and strand orientation. This bridges the gap between protein-level annotations and genomic-level analyses.
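Once you have a chromosome, start/end positions, and strand, a common next step is to turn them into a region string for a genome browser or a variant lookup. This is a minimal sketch under assumptions: the parameter names are hypothetical, the coordinates are placeholders rather than real BRCA1 positions, and it assumes a reverse-strand mapping may report its start as the larger number.

```python
def to_region(chromosome, start, end, reverse_strand):
    """Return a 'chrom:low-high' region string and a strand symbol."""
    # Order the interval so the smaller coordinate always comes first.
    low, high = sorted((start, end))
    strand = "-" if reverse_strand else "+"
    return f"{chromosome}:{low}-{high}", strand


# Placeholder coordinates for illustration only.
region, strand = to_region("17", 2000, 1000, reverse_strand=True)
print(region, strand)
```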