Clustergrammer-PY

Clustergrammer-PY is the back-end Python library that hierarchically clusters the data and generates the Visualization-JSON for the front-end Clustergrammer-JS visualization library. Clustergrammer-PY is compatible with Python 2 and 3.

Clustergrammer-PY Dependencies

Installation

Clustergrammer-PY can be installed with pip from the Python Package Index (PyPI):

pip install --upgrade clustergrammer

or the source code can be obtained from the GitHub repo.

Python Workflow Examples

This workflow shows how to cluster a matrix of data from a file (see Matrix Formats and Input/Output) and generate a Visualization-JSON (for use by Clustergrammer-JS):

# make network object and load file
from clustergrammer import Network
net = Network()
net.load_file('your_matrix.txt')

# calculate clustering using default parameters
net.make_clust()

# save visualization JSON to file for use by front-end
net.write_json_to_file('viz', 'mult_view.json')

The file mult_view.json will be loaded by the front-end and used to build the interactive visualization. See make_clustergrammer.py for an additional example.

Clustergrammer can also load data from a Pandas DataFrame and perform normalization and filtering. In this example, we will load data from a DataFrame, normalize the rows, and filter the columns:

# make network object and load an existing Pandas DataFrame, df
from clustergrammer import Network
net = Network()
net.load_df(df)

# Z-score normalize the rows
net.normalize(axis='row', norm_type='zscore', keep_orig=True)

# filter for the top 100 columns based on their absolute value sum
net.filter_N_top('col', 100, 'sum')

# cluster using default parameters
net.make_clust()

# save visualization JSON to file for use by front-end
net.write_json_to_file('viz', 'mult_view.json')

Note that filtering done on the Network object before clustering is permanent, unlike the filtering done within make_clust, which can be toggled on and off in the front-end visualization. The keep_orig parameter in the normalize function allows the visualization to show the un-normalized data when a user mouses over a matrix-cell. See the Clustergrammer-PY API documentation below for more information.

Clustergrammer-PY API

Clustergrammer-PY generates a Network object (see Network class definition), which is used to load a matrix (e.g. from a Pandas DataFrame), optionally normalize or filter the matrix, cluster the matrix, and finally generate the Visualization-JSON for the front-end Clustergrammer-JS.

When a matrix is loaded into an instance of Network (e.g. net.load_file('your_file.txt')) it is stored in the data attribute, dat. Normalization and filtering will permanently modify the dat representation of the matrix. When the matrix is clustered (by calling make_clust), this produces the Visualization-JSON, which is stored in the viz attribute. This JSON can then be exported as a string using net.export_net_json('viz') or saved to a file using net.write_json_to_file('viz', filename).

The function make_clust calculates hierarchical clustering of your data and hierarchical clustering of successive-row-filtered versions of your data. These alternate filtered-views are stored as views within the Visualization-JSON.

class clustergrammer_py.Network

version 1.10.0

Clustergrammer.py takes a matrix as input (either from a file or a Pandas DataFrame), normalizes/filters, hierarchically clusters, and produces the Visualization-JSON for Clustergrammer-JS.

Networks have two states:

  1. the data state, where they are stored as a matrix and nodes
  2. the viz state where they are stored as viz.links, viz.row_nodes, and viz.col_nodes.

The goal is to start in a data-state and produce a viz-state of the network that will be used as input to clustergram.js.
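
The relationship between the two states can be illustrated with a small plain-Python sketch. This is not the library's code: the function name data_to_viz and the field names name, source, target, and value are assumptions made for illustration, and the real Visualization-JSON carries additional information (such as clustering order) that is omitted here.

```python
def data_to_viz(mat, row_names, col_names):
    """Turn a matrix plus row/column names into node lists and a link list."""
    row_nodes = [{"name": name} for name in row_names]
    col_nodes = [{"name": name} for name in col_names]
    links = []
    for i, row in enumerate(mat):
        for j, value in enumerate(row):
            # One link per matrix cell, addressed by row and column position.
            links.append({"source": i, "target": j, "value": value})
    return {"row_nodes": row_nodes, "col_nodes": col_nodes, "links": links}


# Hypothetical 2 x 3 matrix in the "data state".
mat = [[1.0, 0.0, -2.5],
       [0.5, 3.0, 0.0]]
viz = data_to_viz(mat, ["ROS1", "AAK1"], ["col-1", "col-2", "col-3"])
print(len(viz["links"]), "links for", len(viz["row_nodes"]), "rows")
```

The point of the sketch is only that the viz state is a node-and-link description of the same matrix held in the data state.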

add_cats(axis, cat_data)

Add categories to rows or columns using cat_data, an array of objects. Each object in cat_data is a dictionary with a title (the category title) and cats, which maps each category name to the row/column names that have this category. New categories are added onto the existing categories, in the order of the objects in the array.

Example cat_data:

[
  {
    "title": "First Category",
    "cats": {
      "true": [
        "ROS1",
        "AAK1"
      ]
    }
  },
  {
    "title": "Second Category",
    "cats": {
      "something": [
        "PDK4"
      ]
    }
  }
]
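
A structure like the one above is usually easier to build from a name-to-category mapping than to write by hand. The following is a minimal sketch under that assumption; build_cat_data is a hypothetical helper, not part of Clustergrammer-PY.

```python
from collections import defaultdict


def build_cat_data(title, name_to_cat):
    """Group row/column names by category value under one category title."""
    cats = defaultdict(list)
    for name, cat in name_to_cat.items():
        cats[cat].append(name)
    return {"title": title, "cats": dict(cats)}


cat_data = [
    build_cat_data("First Category", {"ROS1": "true", "AAK1": "true"}),
    build_cat_data("Second Category", {"PDK4": "something"}),
]
# With the library installed, this list would be passed on as:
#   net.add_cats('row', cat_data)
```

The resulting list has the same shape as the example cat_data shown above.
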

clip(lower=None, upper=None)

Trim values at the input thresholds using a pandas function.
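
As a rough illustration of what trimming at thresholds means, here is a plain-Python sketch. It assumes the behavior matches the usual clipping convention (values below lower are raised to lower, values above upper are lowered to upper, and None disables a bound); clip_matrix is a hypothetical name, and the library itself delegates to pandas rather than looping like this.

```python
def clip_matrix(mat, lower=None, upper=None):
    """Return a copy of mat with every value limited to [lower, upper]."""
    clipped = []
    for row in mat:
        new_row = []
        for value in row:
            if lower is not None and value < lower:
                value = lower
            if upper is not None and value > upper:
                value = upper
            new_row.append(value)
        clipped.append(new_row)
    return clipped


clipped = clip_matrix([[-5.0, 0.2, 9.0], [1.5, -0.1, 3.0]], lower=-1.0, upper=2.0)
print(clipped)
```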

dat_to_df()

Export Pandas DataFrames (will be deprecated).

dendro_cats(axis, dendro_level)

Generate categories from dendrogram groups/clusters. The dendrogram has 11 levels to choose from (0 through 10). dendro_level can be given as an integer or a string.

df_to_dat(df, define_cat_colors=False)

Load Pandas DataFrame (will be deprecated).

downsample(df=None, ds_type='kmeans', axis='row', num_samples=100)

Downsample the matrix rows or columns (currently supporting kmeans only). Users can optionally pass in a DataFrame to be downsampled (and this will be incorporated into the network object).

enrichr(req_type, gene_list=None, lib=None, list_id=None, max_terms=None)

Under development; get enrichment results from Enrichr and add them to clustergram.

export_df()

Export Pandas DataFrame.

export_net_json(net_type='viz', indent='no-indent')

Export dat or viz JSON.

filter_N_top(inst_rc, N_top, rank_type='sum')

Filter the matrix rows or columns based on sum/variance, and only keep the top N.
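
The ranking step can be sketched in plain Python. This sketch works on rows only, reads "sum" as the absolute value of the row sum, and uses the population variance for "var"; those readings, the "var" string, and the name top_n_rows are assumptions for illustration rather than the library's code.

```python
from statistics import pvariance


def top_n_rows(mat, names, n_top, rank_type="sum"):
    """Keep the n_top rows ranked by absolute sum or by variance."""
    if rank_type == "sum":
        scores = [abs(sum(row)) for row in mat]
    elif rank_type == "var":
        scores = [pvariance(row) for row in mat]
    else:
        raise ValueError("rank_type must be 'sum' or 'var'")
    # Rank row indices from highest to lowest score and keep the first n_top.
    ranked = sorted(range(len(mat)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:n_top])
    # Preserve the original row order among the rows that survive.
    kept_names = [name for i, name in enumerate(names) if i in keep]
    kept_rows = [row for i, row in enumerate(mat) if i in keep]
    return kept_names, kept_rows


mat = [[1.0, -1.0, 0.5],
       [4.0, 4.0, 4.0],
       [-3.0, -3.0, -3.0],
       [0.0, 10.0, -10.0]]
names = ["a", "b", "c", "d"]
by_sum, _ = top_n_rows(mat, names, 2, "sum")
by_var, _ = top_n_rows(mat, names, 2, "var")
```

The two rankings can disagree sharply: a row whose values cancel out has a small sum but may have the largest variance.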

filter_cat(axis, cat_index, cat_name)

Filter the matrix rows or columns based on their category. cat_index is the index of the category; the first category has index=1.

filter_names(axis, names)

Filter the visualization using row/column names. The function takes axis ('row'/'col') and names, a list of strings.

filter_sum(inst_rc, threshold, take_abs=True)

Filter a network’s rows or columns based on the sum across rows or columns.

filter_threshold(inst_rc, threshold, num_occur=1)

Filter the matrix rows or columns based on num_occur values being above a threshold (in absolute value).
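
A plain-Python sketch of this rule for rows follows. It assumes a row is kept when at least num_occur of its values exceed the threshold in absolute value, using a strict comparison; both details, and the name filter_threshold_rows, are assumptions rather than confirmed library behavior.

```python
def filter_threshold_rows(mat, names, threshold, num_occur=1):
    """Keep rows with at least num_occur values whose |value| > threshold."""
    kept = []
    for name, row in zip(names, mat):
        hits = sum(1 for value in row if abs(value) > threshold)
        if hits >= num_occur:
            kept.append(name)
    return kept


mat = [[0.1, -2.0, 0.3],
       [1.5, 1.6, -0.2],
       [0.0, 0.4, 0.5]]
names = ["g1", "g2", "g3"]
once = filter_threshold_rows(mat, names, threshold=1.0, num_occur=1)
twice = filter_threshold_rows(mat, names, threshold=1.0, num_occur=2)
```

Raising num_occur makes the filter stricter: g1 passes with a single large value, but only g2 has two values above the threshold.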

load_data_file_to_net(filename)

Load Clustergrammer’s dat format (saved as JSON).

load_df(df)

Load Pandas DataFrame.

load_file(filename)

Load TSV file.

load_stdin()

Load stdin TSV-formatted string.

load_tsv_to_net(file_buffer, filename=None)

This will load a TSV matrix file buffer; this is exposed so that it will be possible to load data without having to read from a file.
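
To show what a file buffer looks like in practice, the sketch below wraps a TSV string in io.StringIO and parses it with the standard library. It assumes the simplest matrix layout (column names in the first row, row names in the first column); parse_tsv_matrix is a hypothetical helper and does not reproduce the library's own parser.

```python
import csv
import io

# A small TSV matrix held in memory instead of on disk.
tsv_text = "\tcol-1\tcol-2\nROS1\t1.0\t-2.5\nAAK1\t0.5\t3.0\n"
file_buffer = io.StringIO(tsv_text)


def parse_tsv_matrix(buffer):
    """Read row names, column names, and float values from a TSV buffer."""
    reader = csv.reader(buffer, delimiter="\t")
    header = next(reader)
    col_names = header[1:]  # the first header cell is the empty corner
    row_names, mat = [], []
    for fields in reader:
        row_names.append(fields[0])
        mat.append([float(x) for x in fields[1:]])
    return row_names, col_names, mat


row_names, col_names, mat = parse_tsv_matrix(file_buffer)
```

With the library installed, the same kind of buffer would be handed to net.load_tsv_to_net(file_buffer); a buffer that has already been read needs file_buffer.seek(0) first.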

load_vect_post_to_net(vect_post)

Load data in the vector format JSON.

make_clust(dist_type='cosine', run_clustering=True, dendro=True, views=['N_row_sum', 'N_row_var'], linkage_type='average', sim_mat=False, filter_sim=0.1, calc_cat_pval=False, run_enrichr=None)

The main function performs hierarchical clustering, optionally generates filtered views (e.g. row-filtered views), and generates the Visualization-JSON.
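
For readers unfamiliar with the defaults dist_type='cosine' and linkage_type='average', the following naive plain-Python sketch shows what those two choices mean. It is a teaching aid only, not the library's implementation: the function names are hypothetical, the algorithm is quadratic-or-worse, and it assumes no row is all zeros (cosine distance is undefined there).

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 minus the cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(rows):
    """Naive agglomerative clustering; returns the merges in order."""
    # Pairwise distances between the original rows, keyed by (i, j) with i < j.
    dist = {(i, j): cosine_distance(rows[i], rows[j])
            for i, j in combinations(range(len(rows)), 2)}
    clusters = [[i] for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            pairs = [(min(i, j), max(i, j))
                     for i in clusters[a] for j in clusters[b]]
            d = sum(dist[p] for p in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


rows = [[1.0, 0.0, 0.0],
        [0.9, 0.1, 0.0],
        [0.0, 1.0, 1.0],
        [0.0, 0.9, 1.1]]
merges = average_linkage(rows)
for left, right, d in merges:
    print(left, right, round(d, 4))
```

Rows that point in nearly the same direction merge first regardless of their magnitude, which is the practical consequence of choosing cosine distance.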

normalize(df=None, norm_type='zscore', axis='row', keep_orig=False)

Normalize the matrix rows or columns using Z-score (zscore) or Quantile Normalization (qn). Users can optionally pass in a DataFrame to be normalized (and this will be incorporated into the Network object).
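
Row-wise Z-scoring can be sketched in plain Python as follows. The sketch uses the population standard deviation and assumes no row is constant; the library may use a different convention (pandas, for example, defaults to the sample standard deviation), so treat the exact numbers as illustrative. zscore_rows is a hypothetical name.

```python
from statistics import mean, pstdev


def zscore_rows(mat):
    """Subtract each row's mean and divide by its standard deviation."""
    normalized = []
    for row in mat:
        mu = mean(row)
        sigma = pstdev(row)  # population standard deviation (an assumption)
        normalized.append([(value - mu) / sigma for value in row])
    return normalized


original = [[1.0, 2.0, 3.0],
            [10.0, 10.0, 40.0]]
z = zscore_rows(original)
```

After normalization each row has mean 0 and unit spread, so rows measured on very different scales become comparable. The un-normalized matrix (original here) is a separate object, which is the idea behind keep_orig=True.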

produce_view(requested_view=None)

This function is under development and will produce a single view on demand.

random_sample(num_samples, df=None, replace=False, weights=None, random_state=100, axis='row')

Return random sample of matrix.
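
The role of random_state can be illustrated with the standard library: a fixed seed makes the sample reproducible. This sketch samples rows without replacement and keeps them in their original order; sample_rows is a hypothetical helper, and the rows it picks will not match those picked by the library for the same seed.

```python
import random


def sample_rows(mat, names, num_samples, random_state=100):
    """Pick num_samples distinct rows, reproducibly for a given seed."""
    rng = random.Random(random_state)
    picked = sorted(rng.sample(range(len(names)), num_samples))
    return [names[i] for i in picked], [mat[i] for i in picked]


mat = [[float(i), float(i) * 2] for i in range(10)]
names = ["row-%d" % i for i in range(10)]
first_names, first_rows = sample_rows(mat, names, 4)
second_names, second_rows = sample_rows(mat, names, 4)
```

Calling the function twice with the same random_state returns the same rows, which keeps downstream clustering results repeatable.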

reset()

This re-initializes the Network object.

swap_nan_for_zero()

Swaps all NaN (numpy NaN) instances for zero.

widget(which_viz='viz')

Export viz JSON, for use with clustergrammer_widget.

write_json_to_file(net_type, filename, indent='no-indent')

Save dat or viz as a JSON to file.

write_matrix_to_tsv(filename=None, df=None)

Export data-matrix to file.

Clustergrammer-PY Development

Clustergrammer-PY’s source code can be found in the clustergrammer-py GitHub repo. The Clustergrammer-PY library is utilized by the Clustergrammer Web App and the Clustergrammer Jupyter Widget.

Please contact Nicolas Fernandez or Avi Ma’ayan (see Funding and Contact) with questions, or use the GitHub issues feature to report an issue.