Translating the code of life using the human proteome

The Human Proteome Organization launched the Human Proteome Project (HPP) in 2010. The project has produced the first high-stringency HPP blueprint, which now covers >90% of the human proteome and is used by scientists worldwide to better understand human biology and improve clinical decision-making.

The problem

Human biology is driven by genetic and protein components that can lead to disease when disrupted, but the number of genes and proteins involved in human health is massive, and their interactions create a complex, difficult-to-translate network. Diseases are difficult to understand, let alone treat, without the right tools, knowledge, and data-sharing practices.

The solution

The Human Proteome Organization launched the Human Proteome Project (HPP) in 2010. The project generated the first high-stringency HPP blueprint, which covers >90% of the human proteome. The HPP consists of many scientific teams and collaborators working towards two common goals:

Catalogue the human proteome and discover its complexity by:

Establishing consensus on stringent and reliable standards
Identifying at least one protein product from each protein-coding gene
Detecting expression of missing proteins

Integrate proteomics into multi-omics studies to advance life sciences, medical sciences and precision medicine.

The HPP high-stringency blueprint explained

“High-stringency” refers to the rigorous HPP standards applied to post-acquisition processing and to any protein inference made from raw mass spectrometry (MS) peptide data. HPP relies on a system developed by neXtProt and UniProtKB that assigns each protein one of five levels of supporting evidence for its existence (PE1 to PE5). HPP uses four of these levels to evaluate claims of new protein detection.

The human genome contains roughly 19,700 protein-coding genes. The latest neXtProt HPP reference release (data release: 17-01-2020) designates credible evidence for 90.4% of the human proteome, up from 69.8% of neXtProt entries in 2011. The HPP has two strategic initiatives: chromosome-centric (C-HPP; 25 scientific teams) and biology/disease-centric (B/D-HPP; 19 scientific teams).
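To make the coverage figures above concrete, the short sketch below turns the quoted percentages into approximate protein counts. It uses only the numbers stated in the text. The gene total is the rounded "roughly 19,700" figure, so the resulting counts are estimates rather than the exact neXtProt tallies, and the function and variable names are purely illustrative.

```python
# Figures quoted in the text; the gene count is a rounded approximation.
PROTEIN_CODING_GENES = 19_700
COVERAGE_2020 = 0.904  # fraction with credible evidence, 2020 release
COVERAGE_2011 = 0.698  # fraction of neXtProt entries in 2011


def estimate_counts(total_genes, coverage):
    """Return (detected, missing) counts implied by a coverage fraction."""
    detected = round(total_genes * coverage)
    return detected, total_genes - detected


detected_2020, missing_2020 = estimate_counts(PROTEIN_CODING_GENES, COVERAGE_2020)

# Gain in coverage, expressed in percentage points (no denominator needed).
gain_points = round((COVERAGE_2020 - COVERAGE_2011) * 100, 1)

print(f"Approx. proteins with credible evidence: {detected_2020}")  # 17809
print(f"Approx. proteins still missing: {missing_2020}")            # 1891
print(f"Coverage gain since 2011: {gain_points} percentage points")  # 20.6
```

The gain is reported in percentage points because the 2011 and 2020 releases need not share the same denominator, so converting the 2011 figure into a protein count with the 2020 gene total would be misleading.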

Chromosome-centric (C)-HPP

Annotates all genome-encoded proteins
Explores proteins not yet observed with high confidence by analytical methods such as MS

Biology/disease-centric (B/D)-HPP

Measures and interprets human proteome data under diverse physiological and pathological conditions
Elucidates hallmark protein drivers of biology/disease
Promotes the development of new proteomics analytical tools, such as antibody-based approaches

Key resource pillars support the C-HPP and B/D-HPP, ensuring effective data generation, integration, implementation, and the establishment of metrics and guidelines.

HPP resource pillars

Antibodies

Supports a comprehensive understanding of biology, health and disease.
Utilizes antibody (Ab)-based strategies to analyse spatio-temporal aspects of the proteome.
Links protein identification with real-time localization at tissue, cell and subcellular levels.

Mass spectrometry

Improves depth and accuracy of proteome identification, quantification and modification.
Informs the community about MS technology/workflow advances and appropriate high-stringency standards.

Knowledge bases

Enables a cohesive and seamless knowledge transfer system.
Captures, collects, collates, analyses and re-distributes all human proteome data.

Pathology

Promotes professional application of proteomics in pathology.
Coordinates the identification of unmet clinical needs, develops clinical assay guidelines/standards, and coordinates access to quality clinical samples and metadata.
Liaises with pathology organizations, diagnostic companies and regulatory agencies.

The impact

A high-coverage human proteome blueprint that is constantly updated and analyzed by a large scientific community makes proteomics useful for tackling medical challenges. Reliable and robust proteomic data is translatable to precision medicine and is critical for understanding, diagnosing, and treating a variety of complex diseases such as cancer and cardiovascular disease, as well as infectious diseases, including those caused by the pandemic virus SARS-CoV-2. Furthermore, integrating genomic and proteomic data (i.e., proteogenomics) has the potential to unmask the causes and mechanisms underlying diseases, including the hallmarks of cancer, and so facilitate effective therapeutic intervention.

Use Cases and Applications of the HPP Blueprint of the Human Proteome

Characterizing cancer at the proteogenomics level

NCI’s Clinical Proteomic Tumor Analysis Consortium (CPTAC), in collaboration with the B/D-HPP cancer team, established guidelines, data-sharing practices and workflow standards, producing a more comprehensive understanding of cancer biology, and implemented the new processes in clinical research studies.
This led to the creation of the International Cancer Proteogenome Consortium (ICPC).
CPTAC and ICPC collaborators have comprehensively characterized 13 cancer types at the proteogenomics level (datasets publicly accessible).

Tackling pandemic-causing SARS-CoV-2

Building on the data generated from numerous omics studies on coronavirus infection, recent proteomics studies have focused on SARS-CoV-2, revealing promising therapeutic targets, potential biomarkers and protective antibodies.
The HPP teams and collaborators continue to investigate viral, bacterial, fungal and parasitic diseases to understand pathogenic infection and to develop diagnostics and therapeutics.

Sub-classifying Cardiovascular Diseases (CVD)

Proteogenomics allows scientists to assess interactions, pathways and networks to inform diagnosis and treatment of complex CVDs.
Since the launch of the HPP, CVD proteomics has expanded from identifying single canonical proteins to mapping proteoforms, enabling CVD sub-classification and assay development.

You can access and analyze the Human Proteome Reference Library (HPRL), which links all HPP-associated PubMed searches.

References

Adhikari, Subash, et al. "A high-stringency blueprint of the human proteome." Nature Communications 11.1 (2020): 1-16.
