Clustergrammer-PY

Clustergrammer-PY is the back-end Python library used to hierarchically cluster data and generate the Visualization-JSON for the front-end Clustergrammer-JS visualization library. Clustergrammer-PY is compatible with Python 2 and 3. The library is free and open-source and can be found on GitHub.

Clustergrammer-PY Dependencies

Installation

Clustergrammer-PY can be installed from the Python Package Index using pip:

pip install --upgrade clustergrammer

or the source code can be obtained from the GitHub repo.

Python Workflow Examples

This workflow shows how to cluster a matrix of data from a file (see Matrix Formats and Input/Output) and generate a Visualization-JSON (for use by Clustergrammer-JS):

# make network object and load file
from clustergrammer import Network
net = Network()
net.load_file('your_matrix.txt')

# calculate clustering using default parameters
net.cluster()

# save visualization JSON to file for use by front end
net.write_json_to_file('viz', 'mult_view.json')

The file mult_view.json will be loaded by the front end and used to build the interactive visualization. See clustergrammer.py for an additional example.
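To try the workflow without an existing dataset, a small tab-separated matrix can be written first. This sketch assumes the simplest layout (column names on the first line, row names in the first column); the gene and sample names are made up, and the clustergrammer calls are left as comments so the snippet runs with only the standard library.

```python
import csv
import os
import tempfile

# Hypothetical toy data: three rows (genes) by three columns (samples).
col_names = ["sample-1", "sample-2", "sample-3"]
rows = {
    "ROS1": [1.0, -2.5, 0.3],
    "AAK1": [0.0, 4.1, -1.2],
    "PDK4": [-3.3, 0.7, 2.2],
}

# Write into a temporary directory so the working directory is left alone.
matrix_path = os.path.join(tempfile.mkdtemp(), "your_matrix.txt")
with open(matrix_path, "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t")
    writer.writerow([""] + col_names)  # empty corner cell, then column names
    for name, values in rows.items():
        writer.writerow([name] + values)

# With clustergrammer installed, the file could then be clustered as above:
# net = Network()
# net.load_file(matrix_path)
# net.cluster()
```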

Clustergrammer can also load data from a Pandas DataFrame and perform normalization and filtering. In this example, we will load data from a DataFrame, normalize the rows, and filter the columns:

# make network object and load DataFrame, df
net = Network()
net.load_df(df)

# Z-score normalize the rows
net.normalize(axis='row', norm_type='zscore', keep_orig=True)

# filter for the top 100 columns based on their absolute value sum
net.filter_N_top('col', 100, 'sum')

# cluster using default parameters
net.cluster()

# save visualization JSON to file for use by front end
net.write_json_to_file('viz', 'mult_view.json')

Note that filtering done on the Network object before clustering is permanent, unlike the filtering done within cluster, which can be toggled on and off in the front-end visualization. The keep_orig parameter in the normalize function allows us to show the un-normalized data when a user mouses over a matrix cell in the visualization. See the Clustergrammer-PY API documentation below for more information.
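The arithmetic behind row-wise Z-scoring can be sketched in plain Python. This is an illustration rather than the library's implementation: zscore_rows is a made-up helper, and the use of the sample standard deviation is an assumption.

```python
import statistics

def zscore_rows(matrix):
    """Return a copy of matrix with each row shifted to mean 0 and scaled by its spread."""
    normalized = {}
    for name, values in matrix.items():
        mean = statistics.mean(values)
        spread = statistics.stdev(values)  # sample standard deviation (assumption)
        normalized[name] = [(v - mean) / spread for v in values]
    return normalized

original = {"ROS1": [1.0, 2.0, 3.0], "AAK1": [10.0, 20.0, 60.0]}
normalized = zscore_rows(original)

# Keeping both versions side by side mirrors the idea behind keep_orig=True:
# the heatmap is drawn from the normalized values while the original values
# remain available for display.
```

After normalization every row has mean zero, so rows with very different magnitudes become comparable by shape.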

Clustergrammer-PY API

Clustergrammer-PY generates a Network object (see Network class definition), which is used to load a matrix (e.g. from a Pandas DataFrame), optionally normalize or filter the matrix, cluster the matrix, and finally generate the visualization JSON for the front end Clustergrammer.js.

When a matrix is loaded into an instance of Network (e.g. net.load_file('your_file.txt')) it is stored in the dat (data) attribute. Normalization and filtering permanently modify the dat representation of the matrix. Clustering the matrix (by calling cluster) produces the Visualization-JSON, which is stored in the viz attribute. This JSON can then be exported as a string using net.export_net_json('viz') or saved to a file using net.write_json_to_file('viz', filename).

The function cluster calculates hierarchical clustering of your data and hierarchical clustering of successive-row-filtered versions of your data. These alternate filtered-views are stored as views within the Visualization-JSON.

class clustergrammer_py.Network(widget=None)

version 1.13.5

Clustergrammer.py takes a matrix as input (either from a file or a Pandas DataFrame), normalizes/filters, hierarchically clusters, and produces the Visualization-JSON for Clustergrammer-JS.

Networks have two states:

  1. the data state, where they are stored as a matrix and nodes
  2. the viz state where they are stored as viz.links, viz.row_nodes, and viz.col_nodes.

The goal is to start in a data-state and produce a viz-state of the network that will be used as input to clustergram.js.
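The difference between the two states can be pictured with a toy matrix. In this sketch the key names inside each node and link (name, source, target, value) are assumptions chosen for illustration; only links, row_nodes, and col_nodes come from the description above, and the real viz object holds more information (such as clustering order).

```python
# Data state: a matrix plus the names of its rows and columns (toy values).
mat = [[1.0, 0.0], [0.5, -2.0]]
row_names = ["ROS1", "AAK1"]
col_names = ["sample-1", "sample-2"]

# Viz state: the same information flattened into node and link lists.
# The keys "name", "source", "target", and "value" are assumed for illustration.
viz = {
    "row_nodes": [{"name": n} for n in row_names],
    "col_nodes": [{"name": n} for n in col_names],
    "links": [
        {"source": i, "target": j, "value": mat[i][j]}
        for i in range(len(row_names))
        for j in range(len(col_names))
    ],
}
```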

add_cats(axis, cat_data)

Add categories to rows or columns using cat_data, an array of objects. Each object in cat_data is a dictionary with a title (the category title) and cats, which maps each category value to the row/column names that have this category. The categories are added onto any existing categories, in the order of the objects in the array.
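A cat_data list can be assembled from an ordinary mapping. The sketch below uses the "title" and "cats" keys of the documented example; make_cat_data is a hypothetical helper, not part of the library.

```python
import json

def make_cat_data(categories):
    """Turn {title: {category_value: [names]}} into a list of cat_data objects."""
    return [
        {"title": title, "cats": {value: list(names) for value, names in cats.items()}}
        for title, cats in categories.items()
    ]

cat_data = make_cat_data({
    "First Category": {"true": ["ROS1", "AAK1"]},
    "Second Category": {"something": ["PDK4"]},
})
print(json.dumps(cat_data, indent=2))

# With a Network instance holding these rows, the categories could be added with:
# net.add_cats('row', cat_data)
```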

Example cat_data:

[
  {
    "title": "First Category",
    "cats": {
      "true": [
        "ROS1",
        "AAK1"
      ]
    }
  },
  {
    "title": "Second Category",
    "cats": {
      "something": [
        "PDK4"
      ]
    }
  }
]

clip(lower=None, upper=None)

Trim values at the input thresholds using a pandas function.
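Clipping replaces values outside the thresholds with the thresholds themselves. A minimal pure-Python sketch of that behavior for a single row (clip_value is a hypothetical helper; the method itself operates on the whole matrix):

```python
def clip_value(value, lower=None, upper=None):
    """Clamp value into [lower, upper]; a bound of None is not applied."""
    if lower is not None and value < lower:
        return lower
    if upper is not None and value > upper:
        return upper
    return value

row = [-12.0, -0.5, 0.0, 3.2, 40.0]
clipped = [clip_value(v, lower=-5, upper=5) for v in row]
```

Clipping is typically used to keep a few extreme values from dominating the color scale of a heatmap.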

cluster(dist_type='cosine', run_clustering=True, dendro=True, views=['N_row_sum', 'N_row_var'], linkage_type='average', sim_mat=False, filter_sim=0.1, calc_cat_pval=False, run_enrichr=None, enrichrgram=None)

The main function: it performs hierarchical clustering, optionally generates filtered views (e.g. row-filtered views), and generates the Visualization-JSON.
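The default dist_type, 'cosine', compares the direction of two rows (or columns) rather than their magnitude. The sketch below computes that distance by hand on made-up data; it only illustrates the metric and does not perform the clustering itself.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

rows = {
    "ROS1": [1.0, 2.0, 3.0],
    "AAK1": [2.0, 4.0, 6.0],     # same direction as ROS1, twice the magnitude
    "PDK4": [-1.0, -2.0, -3.0],  # opposite direction
}

# Distance for every unordered pair of rows.
names = list(rows)
distances = {
    (a, b): cosine_distance(rows[a], rows[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
```

Rows that differ only by a positive scale factor are at distance 0, so this metric groups profiles by shape.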

dat_to_df()

Export Pandas DataFrame (will be deprecated).

dendro_cats(axis, dendro_level)

Generate categories from dendrogram groups/clusters. The dendrogram has 11 levels to choose from, 0 to 10. dendro_level can be given as an integer or a string.

df_to_dat(df, define_cat_colors=False)

Load Pandas DataFrame (will be deprecated).

downsample(df=None, ds_type='kmeans', axis='row', num_samples=100, random_state=1000)

Downsample the matrix rows or columns (currently supporting kmeans only). Users can optionally pass in a DataFrame to be downsampled (and this will be incorporated into the network object).

enrichrgram(lib, axis='row')

Add Enrichr gene enrichment results to your visualization (where your rows are genes). Run enrichrgram before clustering to include enrichment results as row categories. Enrichrgram can also be run on the front-end using the Enrichr logo at the top left.

Set lib to the Enrichr library that you want to use for enrichment analysis. Libraries included:

  • ChEA_2016
  • KEA_2015
  • ENCODE_TF_ChIP-seq_2015
  • ENCODE_Histone_Modifications_2015
  • Disease_Perturbations_from_GEO_up
  • Disease_Perturbations_from_GEO_down
  • GO_Molecular_Function_2015
  • GO_Biological_Process_2015
  • GO_Cellular_Component_2015
  • Reactome_2016
  • KEGG_2016
  • MGI_Mammalian_Phenotype_Level_4
  • LINCS_L1000_Chem_Pert_up
  • LINCS_L1000_Chem_Pert_down

export_df()

Export Pandas DataFrame.

export_net_json(net_type='viz', indent='no-indent')

Export dat or viz JSON.
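The indent argument suggests a choice between compact and indented output; the accepted values other than the default 'no-indent' are not listed here. The difference between the two styles can be seen with the standard json module on a made-up stand-in for the viz object:

```python
import json

# Hypothetical, heavily simplified stand-in for an exported object.
viz = {"row_nodes": [{"name": "ROS1"}], "col_nodes": [{"name": "sample-1"}]}

compact = json.dumps(viz)           # single line, smaller to store or transmit
pretty = json.dumps(viz, indent=2)  # multi-line, easier to read and diff

# Both strings decode to the same object.
same = json.loads(compact) == json.loads(pretty)
```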

export_viz_to_widget(which_viz='viz')

Export viz JSON for use with clustergrammer_widget. This method was formerly named widget.

filter_N_top(inst_rc, N_top, rank_type='sum')

Filter the matrix rows or columns based on sum/variance, and only keep the top N.
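The ranking step can be sketched in plain Python for rows. Two points are assumptions here: that 'sum' ranks by the absolute value of the sum, and that variance ranking is selected with 'var' (inferred from the N_row_var view name). top_n_rows is a hypothetical helper.

```python
import statistics

def top_n_rows(matrix, n_top, rank_type="sum"):
    """Keep the n_top rows ranked by |sum| or by variance, preserving row order."""
    if rank_type == "sum":
        score = {name: abs(sum(vals)) for name, vals in matrix.items()}
    elif rank_type == "var":
        score = {name: statistics.variance(vals) for name, vals in matrix.items()}
    else:
        raise ValueError("rank_type must be 'sum' or 'var'")
    keep = set(sorted(score, key=score.get, reverse=True)[:n_top])
    return {name: vals for name, vals in matrix.items() if name in keep}

matrix = {
    "ROS1": [5.0, 5.0, 5.0],   # large sum, zero variance
    "AAK1": [-4.0, 0.0, 4.0],  # zero sum, large variance
    "PDK4": [0.1, 0.2, 0.3],   # small sum, small variance
}
by_sum = top_n_rows(matrix, 2, "sum")
by_var = top_n_rows(matrix, 2, "var")
```

The two rankings keep different rows, which is why the choice of rank_type matters for data with strong positive and negative values.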

filter_cat(axis, cat_index, cat_name)

Filter the matrix rows or columns based on their category. cat_index is the index of the category; the first category has index 1.

filter_names(axis, names)

Filter the visualization using row/column names. The function takes axis ('row' or 'col') and names, a list of strings.

filter_sum(inst_rc, threshold, take_abs=True)

Filter a network’s rows or columns based on the sum across rows or columns.

filter_threshold(inst_rc, threshold, num_occur=1)

Filter the matrix rows or columns based on num_occur values being above a threshold (in absolute value).
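In other words, a row (or column) survives if enough of its entries are large in absolute value. A pure-Python sketch for rows, assuming a strict "greater than" comparison; rows_passing_threshold is a hypothetical helper:

```python
def rows_passing_threshold(matrix, threshold, num_occur=1):
    """Keep rows with at least num_occur entries whose absolute value exceeds threshold."""
    kept = {}
    for name, values in matrix.items():
        hits = sum(1 for v in values if abs(v) > threshold)
        if hits >= num_occur:
            kept[name] = values
    return kept

matrix = {
    "ROS1": [0.1, -3.0, 0.2],   # one entry beyond 2.0 in absolute value
    "AAK1": [2.5, -2.5, 0.0],   # two entries beyond 2.0
    "PDK4": [0.3, 0.4, -0.5],   # none
}
one_hit = rows_passing_threshold(matrix, threshold=2.0)
two_hits = rows_passing_threshold(matrix, threshold=2.0, num_occur=2)
```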

load_data_file_to_net(filename)

Load Clustergrammer’s dat format (saved as JSON).

load_df(df)

Load Pandas DataFrame.

load_file(filename)

Load TSV file.

load_file_as_string(file_string, filename='')

Load file as a string.

load_stdin()

Load stdin TSV-formatted string.

load_tsv_to_net(file_buffer, filename=None)

This will load a TSV matrix file buffer; this is exposed so that it will be possible to load data without having to read from a file.
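A file buffer in this sense is any file-like object, for example a string wrapped in io.StringIO. The sketch below parses such a buffer with the standard csv module to show that no file on disk is needed; the TSV content is made up, and the clustergrammer call is left as a comment.

```python
import csv
import io

tsv_text = "\tsample-1\tsample-2\nROS1\t1.0\t-2.5\nAAK1\t0.0\t4.1\n"

# io.StringIO wraps a string in a file-like object (a "file buffer").
file_buffer = io.StringIO(tsv_text)

# Parse the buffer the same way a file on disk would be parsed.
reader = csv.reader(file_buffer, delimiter="\t")
header = next(reader)
col_names = header[1:]
matrix = {row[0]: [float(x) for x in row[1:]] for row in reader}

# With clustergrammer installed, a fresh buffer could be passed directly:
# net.load_tsv_to_net(io.StringIO(tsv_text))
```

This is useful when the matrix arrives over a network connection or is generated in memory.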

load_vect_post_to_net(vect_post)

Load data in the vector format JSON.

make_clust(dist_type='cosine', run_clustering=True, dendro=True, views=['N_row_sum', 'N_row_var'], linkage_type='average', sim_mat=False, filter_sim=0.1, calc_cat_pval=False, run_enrichr=None, enrichrgram=None)

Will be deprecated; the method has been renamed cluster. The main function: it performs hierarchical clustering, optionally generates filtered views (e.g. row-filtered views), and generates the Visualization-JSON.

normalize(df=None, norm_type='zscore', axis='row', keep_orig=False)

Normalize the matrix rows or columns using Z-score (zscore) or Quantile Normalization (qn). Users can optionally pass in a DataFrame to be normalized (and this will be incorporated into the Network object).
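Quantile normalization is the less familiar of the two options: it forces every column (or row) to share the same distribution of values. The sketch below follows the textbook procedure for columns and is not the library's code; quantile_normalize_columns is a hypothetical helper, and ties are simply broken by position.

```python
def quantile_normalize_columns(columns):
    """Give every column the same distribution: the mean of the sorted columns."""
    names = list(columns)
    n_rows = len(columns[names[0]])
    sorted_cols = [sorted(columns[name]) for name in names]
    # Reference distribution: average of the k-th smallest values across columns.
    reference = [
        sum(col[k] for col in sorted_cols) / len(names) for k in range(n_rows)
    ]
    normalized = {}
    for name in names:
        values = columns[name]
        # order[k] is the row position of the k-th smallest value in this column.
        order = sorted(range(n_rows), key=lambda i: values[i])
        result = [0.0] * n_rows
        for rank, position in enumerate(order):
            result[position] = reference[rank]
        normalized[name] = result
    return normalized

columns = {
    "sample-1": [5.0, 2.0, 3.0],
    "sample-2": [4.0, 1.0, 6.0],
}
qn = quantile_normalize_columns(columns)
```

Each value keeps its rank within its column, but the values themselves are replaced by the shared reference distribution.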

produce_view(requested_view=None)

This function is under development and will produce a single view on demand.

random_sample(num_samples, df=None, replace=False, weights=None, random_state=100, axis='row')

Return random sample of matrix.
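A fixed random_state generally makes a random sample reproducible. The idea, sketched with the standard random module on made-up row names (sample_rows is a hypothetical helper, and this draws without replacement, as with replace=False):

```python
import random

row_names = ["ROS1", "AAK1", "PDK4", "EGFR", "TP53"]

def sample_rows(names, num_samples, random_state=100):
    """Draw num_samples distinct names using a private, seeded generator."""
    rng = random.Random(random_state)
    return rng.sample(names, num_samples)

# Two calls with the same seed select the same rows.
first = sample_rows(row_names, 3)
second = sample_rows(row_names, 3)
```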

reset()

This re-initializes the Network object.

set_cat_color(axis, cat_index, cat_name, inst_color)

Set row/column category colors using index, name and specified color.

swap_nan_for_zero()

Swaps all NaN (numpy NaN) instances for zero.
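The replacement itself is simple, with one pitfall: NaN never compares equal to itself, so it must be detected with a dedicated check rather than with ==. A minimal sketch on a single made-up row:

```python
import math

row = [1.5, float("nan"), -2.0, float("nan")]

# math.isnan is needed because float("nan") == float("nan") is False.
cleaned = [0.0 if math.isnan(v) else v for v in row]
```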

widget(which_viz='viz')

Generate a widget visualization using the widget. The export_viz_to_widget method passes the visualization JSON to the instantiated widget, which is returned and visualized on the front-end.

widget_df()

Export a DataFrame from the front-end visualization. For instance, a user can filter to show only a single cluster using the dendrogram and then get a DataFrame of this cluster using the widget_df method.

write_json_to_file(net_type, filename, indent='no-indent')

Save dat or viz as a JSON to file.

write_matrix_to_tsv(filename=None, df=None)

Export data-matrix to file.

Clustergrammer-PY Development

Clustergrammer-PY’s source code can be found in the clustergrammer-py GitHub repo. The Clustergrammer-PY library is utilized by the Clustergrammer-Web and the Clustergrammer-Widget.

Please contact Nicolas Fernandez and Avi Ma’ayan with questions or use the GitHub issues feature to report an issue.