Case Studies and Examples
Clustergrammer was developed to visualize high-dimensional biological data (e.g. genome-wide expression data), but it can be applied to any high-dimensional data. Below are links to several case studies and examples that use Clustergrammer to explore high-dimensional data. All examples below are publicly available through GitHub.
Cancer Cell Line Encyclopedia Gene Expression Data
The Cancer Cell Line Encyclopedia (CCLE) is a publicly available project that has characterized (e.g. genetically) over 1,000 cancer cell lines. We used Clustergrammer to re-analyze and visualize CCLE’s gene expression data in the CCLE Explorer. The CCLE Explorer allows users to explore the CCLE by tissue type and visualize the most commonly differentially expressed genes for each tissue type as an interactive heatmap. The CCLE Jupyter Notebook generates an overview of the CCLE gene expression data, investigates specific tissues, and explains how to use Enrichrgram to understand the biological functions of differentially expressed genes.
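As a rough illustration of what "differentially expressed genes for each tissue type" can mean in practice, the sketch below Z-scores each gene across cell lines and ranks genes per tissue by the absolute value of their mean Z-score. This is an assumption made for illustration, not necessarily the method used by the CCLE Explorer; the function name `top_genes_by_tissue`, the gene and cell line labels, and the values are all hypothetical toy data.

```python
from collections import defaultdict
from statistics import mean, pstdev


def top_genes_by_tissue(expression, tissue_of, n_top=1):
    """Rank genes per tissue by |mean Z-score| (illustrative assumption).

    expression: {gene: {cell_line: value}}
    tissue_of:  {cell_line: tissue}
    """
    scores = defaultdict(dict)  # tissue -> gene -> mean Z-score
    for gene, by_line in expression.items():
        values = list(by_line.values())
        m, s = mean(values), pstdev(values)
        per_tissue = defaultdict(list)
        for line, value in by_line.items():
            z = (value - m) / s if s else 0.0  # Z-score across all cell lines
            per_tissue[tissue_of[line]].append(z)
        for tissue, zs in per_tissue.items():
            scores[tissue][gene] = mean(zs)
    return {
        tissue: sorted(gene_scores, key=lambda g: abs(gene_scores[g]), reverse=True)[:n_top]
        for tissue, gene_scores in scores.items()
    }


# Toy, made-up values (not CCLE data).
toy_expression = {
    "GENE_A": {"L1": 9.0, "L2": 9.0, "L3": 1.0, "L4": 1.0, "L5": 1.0, "L6": 1.0},
    "GENE_B": {"L1": 1.0, "L2": 1.0, "L3": 9.0, "L4": 9.0, "L5": 1.0, "L6": 1.0},
}
toy_tissue_of = {
    "L1": "lung", "L2": "lung",
    "L3": "skin", "L4": "skin",
    "L5": "blood", "L6": "blood",
}
top = top_genes_by_tissue(toy_expression, toy_tissue_of)
```

In this toy input, `GENE_A` stands out in the lung cell lines and `GENE_B` in the skin cell lines, so each is ranked first for its tissue.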
Lung Cancer Post-Translational Modification and Gene Expression Regulation
Lung cancer is a complex disease that is known to be regulated at the post-translational modification (PTM) level, e.g. phosphorylation driven by kinases. Our collaborators at Cell Signaling Technology Inc used Tandem Mass Tag (TMT) mass spectrometry to measure differential phosphorylation, acetylation, and methylation in a panel of 42 lung cancer cell lines compared to non-cancerous lung tissue. Gene expression data from 37 of these lung cancer cell lines was also independently obtained from the publicly available Cancer Cell Line Encyclopedia (CCLE). In the Jupyter notebook CST_Data_Viz.ipynb we:
- Visualize PTM data, gene expression data, and merged PTM/gene-expression data
- Identify co-regulated clusters of PTMs/genes in distinct lung cancer cell line subtypes
- Perform enrichment analysis to understand the biological processes involved in PTM/expression clusters
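Merging PTM and gene expression data requires restricting both tables to the cell lines they have in common (37 of the 42 cell lines have expression data) and putting the measurements on a comparable scale. The sketch below shows one way this could be done in plain Python, Z-scoring each row after subsetting to the shared columns. The table layout, the `merge_tables` helper, the row and column labels, and the values are hypothetical and are not taken from CST_Data_Viz.ipynb.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-score a list of numbers (all zeros if the values are constant)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s else 0.0 for v in values]


def merge_tables(tables):
    """Stack several tables over the columns (cell lines) they all share.

    tables: {label: (columns, {row_name: [values aligned with columns]})}
    Returns (shared_columns, {"label: row_name": z_scored_values}).
    """
    shared = set.intersection(*(set(cols) for cols, _ in tables.values()))
    shared_cols = sorted(shared)
    merged = {}
    for label, (cols, rows) in tables.items():
        idx = [cols.index(c) for c in shared_cols]  # positions of shared cell lines
        for name, values in rows.items():
            merged[f"{label}: {name}"] = zscore([values[i] for i in idx])
    return shared_cols, merged


# Toy, made-up measurements (not the CST or CCLE data).
toy_tables = {
    "PTM": (["LINE_1", "LINE_2", "LINE_3"], {"PROT_X_phospho": [1.0, 2.0, 3.0]}),
    "Expression": (["LINE_2", "LINE_3"], {"GENE_X": [10.0, 30.0]}),
}
shared_cols, merged = merge_tables(toy_tables)
```

Prefixing each row name with its data type keeps PTM and expression rows distinguishable once they sit in the same matrix, which is what makes co-regulated PTM/gene clusters readable in a merged heatmap.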
CyTOF Data: Single Cell Immune Response to PMA Treatment
White blood cells are a key component of the immune system, and kinase signaling is known to play an important role in immune cell function (see Isakov and Altman 2013). Our collaborators in the Giannarelli Lab and the Icahn School of Medicine Human Immune Monitoring Core used mass cytometry (CyTOF, Fluidigm) to investigate the phosphorylation response of peripheral blood mononuclear cells (PBMCs) exposed to PMA (phorbol 12-myristate 13-acetate), a tumor promoter and activator of protein kinase C (PKC). A total of 28 markers (18 surface markers and 10 phosphorylation markers) were measured in over 200,000 single cells. In the Jupyter notebook Plasma_vs_PMA_Phosphrylation.ipynb we semi-automatically identify cell types using surface markers and cluster cells based on phosphorylation to identify cell-type-specific behavior at the phosphorylation level. See the Plasma_vs_PMA_Phosphrylation.ipynb Jupyter notebook for more information.
Large Network: Kinase Substrate Similarity Network
Clustergrammer can be used to visualize large networks without the formation of ‘hairballs’. In the Kinase Substrate Similarity Network example we use Clustergrammer to visualize a network of kinases based on shared substrates that includes 404 kinases and 163,216 links. Kinases are shown as both rows and columns. For more information see the Kinase Substrate Similarity Network example.
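A network with kinases as both rows and columns is a square similarity matrix, and 404 kinases give 404 × 404 = 163,216 cells, which matches the link count above. The sketch below builds such a matrix from kinase-to-substrate sets using the Jaccard index. The Jaccard index is an assumption chosen for illustration (the similarity measure used in the example is not stated here), and the kinase names and substrate sets are made-up toy data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(substrates):
    """Return (names, matrix), where matrix[i][j] is the similarity of kinases i and j."""
    names = sorted(substrates)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # each kinase is treated as fully similar to itself
    for i, j in combinations(range(n), 2):
        sim = jaccard(substrates[names[i]], substrates[names[j]])
        matrix[i][j] = matrix[j][i] = sim  # the matrix is symmetric
    return names, matrix


# Toy, made-up kinase -> substrate sets (not real annotations).
toy_substrates = {
    "KIN_A": {"S1", "S2", "S3"},
    "KIN_B": {"S2", "S3", "S4"},
    "KIN_C": {"S5"},
}
names, matrix = similarity_matrix(toy_substrates)
n_cells = len(names) ** 2  # one matrix cell per ordered pair of kinases
```

Showing the network as a matrix rather than a node-link diagram is what avoids the hairball: every pairwise similarity gets its own cell, and clustering the rows and columns brings kinases with similar substrates next to each other.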
Machine Learning and Miscellaneous Datasets
Clustergrammer was used to visualize several widely used machine learning datasets and other miscellaneous datasets.
These examples demonstrate the generality of heatmap visualizations and enable users to interactively explore familiar datasets.
Zika Virus RNA-Seq Data Visualization
Clustergrammer was used to visualize the results of an RNA-Seq data analysis pipeline within a Jupyter notebook: An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study (Wang et al.).