Clustergrammer was developed to visualize high-dimensional biological data (e.g. genome-wide expression data), but it can be applied to any high-dimensional data in matrix form. Clustergrammer has several biology-specific features that facilitate the analysis of gene-level biological data, such as gene-expression and proteomics data. To take advantage of these features, row names must be genes. See the CCLE Explorer for examples of gene-expression data. These optional biology-specific features are available in the Clustergrammer Web App and the Clustergrammer Jupyter Widget, and they activate automatically when the row names are genes.
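As a rough illustration of the row-name requirement, the sketch below builds a small tab-separated matrix whose row names are official gene symbols. The layout (column names on the first line, row names in the first column), the helper name `matrix_to_tsv`, the sample names, and the values are all assumptions made for this example; consult the matrix format documentation for the exact layout your Clustergrammer version expects.

```python
import csv
import io

def matrix_to_tsv(row_names, col_names, values):
    """Serialize a labeled matrix as tab-separated text, one row per gene."""
    if len(values) != len(row_names):
        raise ValueError("one row of values is needed per row name")
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Header line: empty corner cell followed by the column (sample) names.
    writer.writerow([""] + list(col_names))
    for name, row in zip(row_names, values):
        if len(row) != len(col_names):
            raise ValueError(f"row {name!r} has the wrong number of values")
        # The gene symbol goes in the first column so it becomes the row name.
        writer.writerow([name] + list(row))
    return buffer.getvalue()

# Illustrative data only: real gene symbols, made-up samples and values.
genes = ["EGFR", "TP53", "MYC"]
samples = ["sample_1", "sample_2"]
expression = [[2.5, -1.0], [0.3, 1.8], [-0.7, 0.4]]
tsv_text = matrix_to_tsv(genes, samples, expression)
```

Keeping the gene symbols as row names, rather than internal identifiers or probe IDs, is what lets the gene-level features described in this section recognize the rows.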
Mouseover Gene Name and Description
The human genome contains over 20,000 genes, and modern high-throughput methods can take measurements across the entire genome (e.g. genome-wide expression studies). Human genes have official gene symbols (e.g. EGFR) that are frequently used to label genes in these datasets. Since no biologist can be knowledgeable about every gene in the genome, looking up the names and descriptions of genes in a dataset or visualization is a common and repetitive task. To streamline this task, Clustergrammer automatically displays the full name and description of a gene (provided by data aggregated through the Harmonizome) as a tooltip when a user mouses over a gene label (see screenshot below). This simple feature speeds up the analysis of large gene-level datasets.
The field of biology has amassed an enormous amount of information about the genes in living organisms, such as function, disease association, up-stream regulators, and protein-level binding partners. Integrating this information can help biologists understand patterns in their data. For instance, enrichment analysis is a popular method for identifying biological information specific to a list of genes. A biologist may use enrichment analysis to identify up-stream regulatory transcription factors that specifically target their measured set of up-regulated genes, and thereby form hypotheses about potential up-stream regulators in their system.
Export to Enrichr
When a user visualizes a matrix with genes as rows, Clustergrammer automatically enables integration with the enrichment analysis tool Enrichr. Users can export a set of clustered genes to Enrichr using the interactive dendrogram (see screenshot) or import enriched terms into the visualization using Enrichrgram.
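To make the export step concrete, here is a minimal sketch of turning the genes in a selected cluster into a plain-text gene list with one symbol per line, the form in which gene lists are pasted into Enrichr. It does not reproduce Clustergrammer's own export code; the helper name `genes_to_enrichr_list` and the example cluster are hypothetical.

```python
def genes_to_enrichr_list(genes):
    """Return unique, non-empty gene symbols joined by newlines, order kept."""
    seen = set()
    unique = []
    for gene in genes:
        symbol = gene.strip()
        # Skip blank labels and symbols that were already added.
        if symbol and symbol not in seen:
            seen.add(symbol)
            unique.append(symbol)
    return "\n".join(unique)

# Hypothetical cluster selected from the row dendrogram.
cluster_genes = ["EGFR", "ERBB2", " EGFR ", "", "KRAS"]
gene_list_text = genes_to_enrichr_list(cluster_genes)
```

The original case of each symbol is left untouched here, since changing it could alter gene symbols from organisms whose naming conventions differ from the human one.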
Users can also import biological information about their genes directly into the visualization (see screenshot below). Click the Enrichr logo at the top-left of the heatmap to bring up a list of libraries from Enrichr, then click on a library to obtain enriched terms for your genes of interest. For instance, clicking on ‘ChEA 2016’ will enrich for up-stream transcription factors. The enriched terms are shown as row categories, which lets users see which genes are associated with each term. The row-category titles give the enriched term name, and the red bars represent the significance of the enrichment (see Enrichr combined score). Users can run enrichment analysis on specific clusters of genes by filtering the matrix to show only their genes of interest: e.g. use the dendrogram Crop buttons or Brush-Crop buttons to select a subset of genes for analysis.
Enrichrgram.js provides this functionality and works with the Clustergrammer-JS API to depict enriched terms and their associated genes as row categories. The update-row-category functionality can be extended by developers for other domain-specific problems.
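The idea of showing enriched terms as row categories can be sketched independently of the JavaScript implementation. The example below assumes a hypothetical enrichment result that maps each term name to the input genes overlapping that term, and marks for every row gene whether it belongs to each term. The function name, the term names, and the data structure are illustrative assumptions, not the Enrichrgram.js or Clustergrammer-JS API.

```python
def enriched_terms_to_row_categories(row_genes, term_to_genes):
    """Map each row gene to a {term: bool} dict marking term membership."""
    categories = {}
    for gene in row_genes:
        categories[gene] = {
            term: gene in members for term, members in term_to_genes.items()
        }
    return categories

row_genes = ["EGFR", "TP53", "MYC"]
# Hypothetical enrichment result: term name -> genes from the input list
# that overlap with that term's gene set.
term_to_genes = {
    "term_A": {"EGFR", "MYC"},
    "term_B": {"TP53"},
}
row_categories = enriched_terms_to_row_categories(row_genes, term_to_genes)
```

Each term becomes one category per row, which mirrors how the visualization lets a user read off which genes are associated with which enriched term. The same pattern could carry other domain-specific annotations in place of enrichment terms.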