hclust

Version 1.00

introduction

hclust demonstrates the usage of Hopfield networks for clustering, feature selection and network inference. The software reads expression data with sample annotation and creates plots showing the weight matrix of the network, the relaxation of the state matrix and the energy landscape.

installation

The following software components are required to run hclust:

For background information see the following links:

The installation of Python and the correct versions of compatible Python libraries can be difficult and time consuming. The simplest solution is the installation of the Enthought Python Distribution (EPD) that contains Python and all required libraries in a single package. The EPD can be downloaded from here.

hclust was developed under Windows 7 using Python 2.7. In principle, hclust should run on any platform that supports Python 2.7, but it has not been tested on other platforms or with other versions of Python.

Once hclust and all required libraries are installed, run the command test.bat (Windows) or ./test.sh (Linux) to confirm that everything is working properly. Compare the program output with the example output provided here.

usage

hclust is invoked from the command line using the following format:

python hclust.py filepath [-p -n -f]

filepath must be the path to a data file containing expression data in tab-separated values (TSV) format. The first row must contain the sample ids and the second row the sample labels (e.g. cancer subtypes). The first column must contain gene or probe identifiers. See the file yeoh_reduced.tsv for the required format. If the expression data are not normalized, the option -n is required to normalize them.
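The layout described above can be checked with a few lines of Python before a file is passed to hclust. The following is a minimal sketch, not code taken from hclust.py: the function name read_expression_tsv and the sample data are made up for illustration, and it assumes that the first cell of each of the two header rows is a placeholder that can be ignored.

```python
import csv
import io


def read_expression_tsv(handle):
    """Split a TSV file in the described layout into ids, labels, genes, values.

    Hypothetical helper for illustration; hclust.py may read files differently.
    """
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    # Assumption: the first cell of both header rows is a placeholder.
    sample_ids = rows[0][1:]      # first row: sample ids
    sample_labels = rows[1][1:]   # second row: sample labels, e.g. subtypes
    gene_ids = []
    values = []
    for row in rows[2:]:
        gene_ids.append(row[0])                     # first column: gene/probe id
        values.append([float(x) for x in row[1:]])  # remaining cells: expression
    if any(len(v) != len(sample_ids) for v in values):
        raise ValueError("every data row needs one value per sample")
    return sample_ids, sample_labels, gene_ids, values


# Tiny made-up example, not an excerpt from yeoh_reduced.tsv.
example = (
    "id\ts1\ts2\ts3\n"
    "label\tA\tA\tB\n"
    "gene1\t0.5\t-1.2\t2.0\n"
    "gene2\t1.5\t0.0\t-0.3\n"
)
ids, labels, genes, matrix = read_expression_tsv(io.StringIO(example))
print(ids, labels, genes, matrix)
```

To check a real file, open it and pass the file object in place of the io.StringIO wrapper.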

examples

python hclust.py dc_alizadeh-2000-v1.tsv -n
python hclust.py GSE7553.tsv -n -f
python hclust.py yeoh_reduced.tsv
python hclust.py yeoh_reduced.tsv -p

options

-p enables pruning of the network

-n normalizes the input data

-f performs feature selection

example

Here are the plots generated by hclust when running the test example via test.bat or ./test.sh.

This graph shows the true Rand index (TRI), the estimated Rand index (ERI) and the density of the network for different pruning thresholds. The best threshold is marked by the dashed line.
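For readers unfamiliar with the quantities in this graph, the sketch below computes a plain Rand index between two labelings and the density of a weight matrix after thresholding. It is an illustration under assumptions, not hclust's code: how hclust defines the true and the estimated variant of the index is not described here, and pruning by absolute weight is an assumed rule. The function names and the example numbers are made up.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs that both labelings treat the same way."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:  # both "together" or both "apart"
            agree += 1
    return agree / float(len(pairs))


def prune(weights, threshold):
    """Assumed pruning rule: zero every weight whose magnitude is below threshold."""
    n = len(weights)
    return [[weights[i][j] if i != j and abs(weights[i][j]) >= threshold else 0.0
             for j in range(n)] for i in range(n)]


def density(weights):
    """Fraction of off-diagonal weights that are non-zero."""
    n = len(weights)
    nonzero = sum(1 for i in range(n) for j in range(n)
                  if i != j and weights[i][j] != 0)
    return nonzero / float(n * (n - 1))


# Made-up example values.
known = ["A", "A", "B", "B"]
found = [0, 0, 1, 0]
w = [[0.0, 0.9, 0.1],
     [0.9, 0.0, -0.4],
     [0.1, -0.4, 0.0]]
print(rand_index(known, found), density(prune(w, 0.3)))
```

Sweeping the threshold over a range of values and recording both numbers at each step gives curves of the kind shown in the graph.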

This heat map shows the weight matrix after pruning.

This sequence of plots shows the relaxation of the sample/state matrix over eight recall steps.
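The idea of relaxation can be illustrated with a textbook Hopfield network. The sketch below is a simplified illustration, not hclust's implementation: it assumes states of +1/-1, Hebbian weights with a zero diagonal, asynchronous sign updates and the standard energy function E = -1/2 * sum_ij w_ij * s_i * s_j. All function names and the example pattern are made up.

```python
def hebbian_weights(patterns):
    """Hebbian weight matrix for +1/-1 patterns, with a zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / float(n)
    return w


def energy(w, s):
    """Standard Hopfield energy of state s under weights w."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j]
                      for i in range(n) for j in range(n))


def recall_step(w, s):
    """One asynchronous sweep: each unit in turn takes the sign of its field."""
    s = list(s)
    for i in range(len(s)):
        field = sum(w[i][j] * s[j] for j in range(len(s)))
        if field != 0:  # leave the unit unchanged when the field is zero
            s[i] = 1 if field > 0 else -1
    return s


# Made-up example: store one pattern, then relax a noisy copy of it.
pattern = [1, 1, 1, -1, -1, -1]
noisy = [1, -1, 1, -1, -1, -1]
w = hebbian_weights([pattern])

state = noisy
energies = [energy(w, state)]
for _ in range(8):  # eight recall steps, as in the plots
    state = recall_step(w, state)
    energies.append(energy(w, state))
print(state, energies)
```

With symmetric weights and a zero diagonal, asynchronous updates never increase the energy, which is why the state settles into an attractor.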

Here is a 3D plot of the surface of the energy function.

This is a similar plot of the energy landscape, drawn as a mesh together with the sample data and the attractor states.

Finally, here is a 2D plot of the energy function with the trajectories of the samples converging toward their attractor states.

history


version  date      description
1.00     06.09.13  First public version

data

The data suites used in the paper and created by Wang et al. and de Souto et al. can be downloaded here:

contact

name        email
Mark Ragan  m.ragan@uq.edu.au