# EBI InterPro MCP

> EBI InterPro MCP connects your AI agent to the world's central resource for protein classification. This MCP allows you to classify proteins by predicting functional domains, mapping structural data, and exploring evolutionary relationships across every species. It lets you go beyond simple sequence matching, giving deep insight into a protein's function based on its domain profile.

## Overview
- **Category:** the-unthinkable
- **Price:** Free
- **Tags:** interpro, pfam, protein-domains, protein-families, bioinformatics, embl-ebi, functional-annotation

## Description

This connector lets your agent analyze proteins at the domain level. Give it a protein, and it reports which functional domains that protein contains, whether those are kinase domains or something else entirely. It pulls data from member databases such as Pfam and CDD through one unified interface. Need to know which species carry a given domain, or which 3D structures exist in the PDB archive? The MCP can look up both. You can also check how well an entire organism's proteome is annotated with domains, giving you a picture of its functional coverage. Because Vinkius hosts this resource, your AI client treats InterPro like a single domain expert, making complex biological questions easy to ask.

## Tools

### get_cdd_entry
Retrieves detailed information about a specific Conserved Domain Database (CDD) entry using an accession number.

### get_clan
Gets super-family grouping details for Pfam clans, including the name and member count of the group.

### get_entry
Fetches general metadata for any InterPro entry, listing its family type, GO terms, and cross-references to other databases.

### get_entry_proteins
Returns a list of all protein accessions that match a given InterPro entry across the UniProt database.

### get_entry_structures
Finds all PDB structures associated with an InterPro entry, giving names and experimental resolution data.

### get_entry_taxonomy
Returns the taxonomic distribution of an InterPro entry, showing which organisms contain that specific domain or family.

### get_pfam_entry
Gets detailed information for a Pfam domain or family using its accession number (e.g., PF00069).

### get_protein
Retrieves comprehensive details for a protein, including counts of associated InterPro entries, structures, and taxa.

### get_protein_entries
Lists every InterPro entry that matches a specific protein, identified by its UniProt accession. This is the key step in characterizing a protein.

### get_proteome
Gets domain coverage statistics and basic details for an entire organism's proteome using its UniProt proteome ID (e.g., UP000005640 for human).

### get_structure
Retrieves a PDB structure file along with mapped InterPro annotations based on a 4-character PDB ID.

### get_taxonomy
Returns the taxonomic node details for an organism, including its rank and counts of associated entries and proteins.

### list_entry_databases
Lists all member databases supported by InterPro and shows how many total entries each one contains.

### search_entries
Searches across InterPro entries using keywords (like 'kinase') or filtering by type, returning accessions and counts.

### search_proteins
Searches for proteins within InterPro using a name or keyword, giving back UniProt IDs and annotation counts.

### search_taxonomy
Searches the organism database by common names (like 'human') to find relevant taxon IDs and associated entries.
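
Each lookup tool above expects a different kind of identifier. As a rough illustration, the sketch below picks a tool name from the shape of an accession. The tool names come from the list above; the `pick_tool` helper and its routing table are hypothetical and not part of the MCP, and the patterns cover only the common accession formats.

```python
import re

# Hypothetical routing table: accession pattern -> tool name from the Tools list.
# Patterns cover the usual formats only and are matched case-insensitively.
ACCESSION_ROUTES = [
    (r"IPR\d{6}", "get_entry"),                     # InterPro, e.g. IPR000719
    (r"PF\d{5}", "get_pfam_entry"),                 # Pfam, e.g. PF00069
    (r"CL\d{4}", "get_clan"),                       # Pfam clan
    (r"cd\d{5}", "get_cdd_entry"),                  # CDD
    (r"UP\d{9}", "get_proteome"),                   # UniProt proteome, e.g. UP000005640
    (r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
     r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}", "get_protein"),  # UniProt accession
    (r"\d[A-Za-z0-9]{3}", "get_structure"),         # 4-character PDB ID
]


def pick_tool(accession: str) -> str:
    """Return the tool name whose accession format matches the input."""
    cleaned = accession.strip()
    for pattern, tool_name in ACCESSION_ROUTES:
        if re.fullmatch(pattern, cleaned, flags=re.IGNORECASE):
            return tool_name
    raise ValueError(f"Unrecognized accession format: {accession!r}")


for example in ["IPR000719", "PF00069", "P04637", "UP000005640"]:
    print(example, "->", pick_tool(example))
```

In practice the agent usually makes this choice itself; a helper like this is mainly useful for validating identifiers before a batch run.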

## Prompt Examples

**Prompt:** 
```
What domains and families does the human TP53 protein belong to?
```

**Response:** 
```
I've retrieved all InterPro annotations for TP53 (P04637). The protein is classified into the p53 tumor suppressor family (IPR002117), contains a p53 DNA-binding domain (IPR011615), a p53 tetramerization domain (IPR010991), and a transactivation domain. These domains are conserved across vertebrates and define the functional architecture of p53 as a transcription factor and tumor suppressor.
```

**Prompt:** 
```
Show me all member databases in InterPro and how many entries each has.
```

**Response:** 
```
I've retrieved the InterPro database overview. The platform integrates 13+ member databases: Pfam leads with over 20,000 families and domains, followed by PANTHER, Gene3D, and CDD with thousands of entries each. SMART, Prosite, HAMAP, PRINTS, PIRSF, SFLD, NCBIfam, and AntiFam complete the collection. InterPro itself has over 46,000 integrated entries that unify signatures across these databases.
```

**Prompt:** 
```
Which organisms have the kinase domain PF00069?
```

**Response:** 
```
I found the Pfam entry PF00069 (protein kinase domain) and retrieved its InterPro integration (IPR000719). The taxonomic distribution shows this domain is universally conserved across all kingdoms of life — from bacteria and archaea to plants, fungi, and animals. The human genome alone contains hundreds of kinase domain-containing proteins, making this one of the most expanded protein families in eukaryotic evolution.
```

## Capabilities

### Classify protein domains
Predicts and retrieves detailed information about known functional domains (like Pfam or CDD) associated with a given protein sequence.

### Map structural data
Finds specific PDB IDs and resolutions for proteins that match an InterPro entry, connecting function to physical structure.

### Trace evolutionary lineage
Determines which organisms or taxonomic groups contain a protein matching a specific domain or family. This helps map evolution across life.

### Identify proteome coverage gaps
Assesses how completely the proteins of an entire organism (a proteome) are annotated with known domains.

### Retrieve all protein members
Lists every known protein in a database that shares a specific domain or family annotation, across different organisms.

## Use Cases

### Investigating an unknown sequence
A molecular biologist is handed an uncharacterized protein with a UniProt accession. Instead of guessing, they ask their agent to run `get_protein_entries`. The MCP returns every associated domain (Pfam, CDD), letting them predict the function and family type right away.

### Mapping functional changes across species
An evolutionary biologist wants to know if a specific kinase domain is conserved. They use `get_entry_taxonomy` on the relevant InterPro entry, quickly seeing which kingdoms—from archaea to mammals—contain that critical domain.

### Building a functional pipeline
A bioinformatician needs to process thousands of proteins. They use `search_proteins` by name (e.g., 'insulin') to get UniProt IDs, then run `get_protein_entries` on each ID to batch-process all associated domain assignments.
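
A minimal sketch of that search-then-annotate flow is shown below. It uses `get_protein_entries` for the per-protein step, since that is the tool documented as taking a protein accession. The `call_tool` callable stands in for whatever your MCP client uses to invoke a tool, and the argument names (`query`, `accession`) and result shapes are assumptions, not the connector's documented schema. The stub returns canned data so the example runs offline.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: (tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def annotate_by_keyword(
    call_tool: ToolCaller, keyword: str, limit: int = 5
) -> Dict[str, List[str]]:
    """Search proteins by keyword, then list InterPro entries for each hit."""
    hits = call_tool("search_proteins", {"query": keyword})
    annotations: Dict[str, List[str]] = {}
    for hit in hits[:limit]:
        accession = hit["accession"]
        entries = call_tool("get_protein_entries", {"accession": accession})
        annotations[accession] = [entry["accession"] for entry in entries]
    return annotations


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Any:
    """Offline stub with canned results; a real client would call the MCP."""
    if name == "search_proteins":
        return [{"accession": "P04637"}]
    if name == "get_protein_entries":
        return [{"accession": "IPR002117"}, {"accession": "IPR011615"}]
    raise KeyError(f"Unknown tool: {name}")


print(annotate_by_keyword(fake_call_tool, "p53"))
# {'P04637': ['IPR002117', 'IPR011615']}
```

Passing the caller in as a parameter keeps the pipeline logic independent of any particular MCP client library and makes it easy to test.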

### Checking database scope
A student needs to know the full scale of InterPro. They execute `list_entry_databases`, which immediately provides a count breakdown for every member database, proving the depth and breadth of the resource.

## Benefits

- You can immediately determine a protein's domain profile by calling `get_protein_entries`, answering the core question: 'What domains does my protein have?'
- Instead of checking multiple sites, you use `list_entry_databases` to see exactly how many entries are available across all 13+ member databases in one API call.
- To understand evolution, run a query with `get_entry_taxonomy`; this tells you which organisms share that domain or family, essential for conservation studies.
- When you need structural context, use `get_entry_structures` to pull the PDB IDs linked to an InterPro entry, or `get_structure` to see the InterPro annotations mapped onto a given PDB ID. No manual database cross-referencing needed.
- You can assess how completely a proteome is annotated by running `get_proteome`, which provides domain coverage statistics for the entire organism, helping pinpoint annotation gaps.

## How It Works

You don't have to jump between more than a dozen bioinformatics websites to get a full picture of a protein.

1. Tell your AI client what biological question you're asking—like 'What domains does this protein have?'
2. The MCP sends the query to InterPro and pulls data from multiple sources, consolidating domain details, structures, and species information.
3. Your agent receives a structured report detailing the protein's classification across various databases (Pfam, CDD) and its evolutionary context.
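
Under the hood, MCP clients invoke tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for `get_protein_entries`. The envelope follows the MCP convention, but the `accession` argument name is an assumption; the connector's actual input schema is not shown in this document.

```python
import json
from itertools import count

# Simple incrementing request IDs, as JSON-RPC requires a unique id per request.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# "accession" is a hypothetical argument name used for illustration.
payload = build_tool_call("get_protein_entries", {"accession": "P04637"})
print(payload)
```

Your AI client normally builds this message for you; seeing the envelope is mostly helpful when debugging a connection.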

## Frequently Asked Questions

**How do I find all domains for one protein using get_protein_entries?**
You pass the UniProt accession to `get_protein_entries`. It returns every associated InterPro entry, giving you a complete list of its functional domain assignments.

**Can I use search_taxonomy to find an organism's full profile?**
Yes. You run `search_taxonomy` with the name (e.g., 'human'), and then use `get_taxonomy` on the returned ID to get its rank, lineage, and associated protein counts.

**What is the difference between get_protein and get_proteome?**
`get_protein` returns details for a single protein, including counts of its associated InterPro entries, structures, and taxa. `get_proteome` takes the proteome ID of an entire organism and reports how many of its proteins are annotated with domains.

**How do I check whether Pfam is included in the database scope?**
Running `list_entry_databases` shows you all member databases. You'll see Pfam listed there, along with its current entry count, confirming it's part of the unified resource.

**When using `search_entries`, how should I filter my results to look only at domains, excluding entire families?**
Set the entry type filter in `search_entries` to the domain type. The search then returns only domain entries and leaves out families and other entry types, so you do not have to sift through unrelated results.

**If I run `get_entry` for an InterPro ID, how do I use `get_entry_structures` to find its associated 3D models?**
You pass the primary InterPro accession ID into `get_entry_structures`. This function links the abstract annotation metadata directly to physical structural data. It returns PDB IDs and resolution details for comparison.

**When I use `get_taxonomy`, how do I explore the evolutionary lineage or parent nodes of an organism?**
The `get_taxonomy` tool includes rank information in its results. This lets you trace ancestry and see which broader groups contain a specific species ID. It maps hierarchical relationships, essential for understanding conservation.

**Can I use `get_pfam_entry` to query multiple Pfam accessions (like PF00069 and another one) in a single request?**
Yes. The tool accepts an array or comma-separated list of Pfam IDs for batch querying. This is the most efficient way to retrieve domain details for several known protein families at once.

**Do I need an API key?**
No. The InterPro API is completely public and requires no authentication. Enter any placeholder value in the API key field to activate the server immediately.
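
To illustrate the no-authentication point, here is a minimal sketch that queries the public InterPro REST API directly with the Python standard library. The base URL and the `search` and `type` query parameters reflect my understanding of that API and should be checked against the official InterPro API documentation. The MCP tools may not map one-to-one onto these endpoints.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://www.ebi.ac.uk/interpro/api"


def interpro_url(*path_parts: str, **query: str) -> str:
    """Join path segments under the API root and append optional query parameters."""
    path = "/".join(part.strip("/") for part in path_parts)
    url = f"{BASE_URL}/{path}/"
    if query:
        url += "?" + urlencode(query)
    return url


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """GET a URL with no API key or auth header and parse the JSON body."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        if response.status == 204:  # request succeeded but there is no content
            return {}
        return json.load(response)


print(interpro_url("entry", "interpro", "IPR000719"))
# https://www.ebi.ac.uk/interpro/api/entry/interpro/IPR000719/
print(interpro_url("entry", "interpro", search="kinase", type="domain"))
# https://www.ebi.ac.uk/interpro/api/entry/interpro/?search=kinase&type=domain
```

Calling `fetch_json` on either URL performs a live request, so it is left out of the printed example.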

**What databases does InterPro integrate?**
InterPro integrates 13+ member databases including Pfam (protein families), CDD (conserved domains from NCBI), SMART (signalling domains), PROSITE (patterns and profiles), PANTHER (evolutionary classification), Gene3D (structural domains from CATH), HAMAP (microbial families), PRINTS (fingerprints), PIRSF (classification system), SFLD (superfamilies), and NCBIfam. This gives you a unified view of protein domain and family annotations from the world's leading classification resources.

**Can I find which organisms have a specific protein domain?**
Yes. Use the `get_entry_taxonomy` tool with any InterPro accession to see the taxonomic distribution of that domain or family. This shows which organisms, from bacteria to humans, contain proteins with that specific domain. It is one of the most powerful tools for evolutionary biology, revealing how protein domains have been conserved or diversified across the tree of life.