Biology-Specific Features

Clustergrammer was developed to visualize high-dimensional biological data (e.g. genome-wide expression data), but it can also generally be applied to any high-dimensional data (e.g. a matrix). Clustergrammer has several biology-specific features that facilitate the analysis of gene-level biological data, such as: gene-expression data, proteomics-data, etc. To take advantage of these features, row names must be genes. See the CCLE Explorer for examples of gene-expression data. These optional biology-specific features are available in the Clustergrammer Web App as well as the Clustergrammer Jupyter Widget and will automatically activate if the row-names are genes.

Mouseover Gene Name and Description

The human genome consists of over 20,000 genes and modern high-throughput measurements are capable of making measurements across the entire genome (e.g. genome-wide expression studies). Human genes have official gene symbols (e.g. EGFR) that are frequently used to label genes in these datasets. Since no biologist can be knowledgeable about every gene in the genome a common and repetitive task for biologists is looking up the names and descriptions of genes in a dataset or visualization. To streamline this activity, Clustergrammer automatically displays the full name and description of a gene (provided by data aggregated through the Harmonizome) as a tooltip when a user mouses over a gene label (see screenshot below). This simple feature speeds up analysis of large gene-level datasets.

Gene Info

Mousing over a gene name row brings up the full gene name and description (provided by data aggregated through the Harmonizome).

The JavaScript file hzome_functions.js provides this functionality by utilizing Harmonizome’s RESTful API to obtain gene names and descriptions on mouseover events. hzome_functions.js is passed to Clustergrammer-JS as a callback function. See load_clustergram.js for an example use case. Mouseover callback functions can be used by developers to extend functionality for other domain-specific problems.

Enrichment Analysis

The field of biology has amassed an enormous amount of information about the genes in living organisms such as: function, disease-association, up-stream regulators, protein-level binding partners, etc. Integration of this information can help biologists understand patterns in their data. For instance, enrichment analysis a popular method to identify biological information specific to a list of genes – e.g. a biologist may use enrichment analysis to identify up-stream regulatory transcription factors that specifically target their measured set of up-regulated genes and thereby form hypotheses about potential up-stream regulators in their system.

Export to Enrichr

When a user visualizes a matrix with genes as rows, Clustergrammer automatically enables integration with the enrichment analysis tool Enrichr. Users can export a set of clustered genes to Enrichr using the interactive dendrogram (see screenshot) or import enriched terms into the visualization using Enrichrgram.

Send to Enrichr Modal

Clicking a row dendrogram cluster opens a modal window with cluster information, row names, and a ‘Send genes to Enrichr’ link that allows users to export their gene list (e.g. cluster of row-genes) to Enrichr.

Enrichrgram

Users can also import biological information about their genes directly into the visualization (see screenshot below). Simply click the Enrichr-logo at the top-left of the heatmap to bring up a list of libraries from Enrichr, then click on a library to obtain enriched terms for your genes of interest. For instance, clicking on ‘ChEA 2016’ will enrich for up-stream transcription factors. The enriched terms are shown as row categories, which enables users to see which genes are associated with each term. The row-category titles give the enriched term name, and the red-bars represent the significance of the enrichment (see Enrichr combined score). Users can run enrichment analysis on specific clusters of genes by filtering the matrix to only show only their genes of interest: e.g. use the dendrogram Crop buttons or Brush-Crop buttons to select a subset of genes for analysis.

Enrichrgram Menu

Users can perform enrichment analysis to find biological information specific to their genes (e.g. a cluster of genes). Users can select from several enrichment libraries, and the top 10 enriched terms will be shown as rows categories. The combined scores for the enriched terms will be shown as red bars behind the row category titles.

Enrichrgram.js provides this functionality and works with the Clustergrammer-JS API to depict enriched terms and their associated genes as row categories. The update-row-category functionality can be extended by developers for other domain-specific problems.