Installation

This section describes how to install the DeepProg package. We assume a local installation using pip's --user flag. Alternatively, the package can be installed in a virtual environment or globally with sudo. Either Python 2.7 or Python 3.6 (or higher) can be used. We tested the installation on Linux, OSX and Windows.

Requirements

  • Python 2 or 3 (Python3 is recommended)

  • Either theano, tensorflow or CNTK (tensorflow is recommended)

  • theano (version 0.8.2 was used for the manuscript)

  • tensorflow as a more robust alternative to theano

  • cntk: CNTK is another DL library that can offer some advantages over tensorflow or theano. See https://docs.microsoft.com/en-us/cognitive-toolkit/

  • scikit-learn (>=0.18)

  • numpy, scipy

  • lifelines

  • (if using python3) scikit-survival

  • (For distributed computing) ray (ray >= 0.8.4) framework

  • (For hyperparameter tuning) scikit-optimize

Tested python package versions

Python 3.8 (tested on Linux and OSX. On Windows, Visual C++ is required and LongPathsEnabled should be set to 1 in the Windows registry)

  • tensorflow == 2.4.1 (2.4.1 currently doesn’t seem to work with python3.9)

  • keras == 2.4.3

  • ray == 0.8.4

  • scikit-learn == 0.23.2

  • scikit-survival == 0.14.0 (currently doesn’t seem to work with python3.9)

  • lifelines == 0.25.5

  • scikit-optimize == 0.8.1 (currently doesn’t seem to work with python3.9)

  • mpld3 == 0.5.1

Since ray and tensorflow are rapidly evolving libraries, newer versions might unfortunately break DeepProg’s API. To avoid dependency issues, we recommend working inside a Python 3 virtual environment (virtualenv) and installing the tested packages.
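To see how far an existing environment is from the tested versions listed above, the installed packages can be compared with those pins. The following is a minimal sketch, not part of DeepProg: the helper name compare_with_tested is hypothetical, and it relies only on importlib.metadata from the standard library (Python 3.8 or higher).

```python
from importlib import metadata

# Versions copied from the "Tested python package versions" list above.
TESTED_VERSIONS = {
    "tensorflow": "2.4.1",
    "keras": "2.4.3",
    "ray": "0.8.4",
    "scikit-learn": "0.23.2",
    "scikit-survival": "0.14.0",
    "lifelines": "0.25.5",
    "scikit-optimize": "0.8.1",
    "mpld3": "0.5.1",
}


def compare_with_tested(tested=TESTED_VERSIONS):
    """Map each package name to (installed_version, tested_version, matches).

    installed_version is None when the package is not installed.
    (Hypothetical helper, not provided by DeepProg.)
    """
    report = {}
    for name, wanted in tested.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (installed, wanted, installed == wanted)
    return report


def print_report(report):
    """Print one line per package, flagging versions that differ from the pins."""
    for name, (installed, wanted, ok) in report.items():
        status = "ok" if ok else "differs"
        print(f"{name:<16} installed={installed} tested={wanted} [{status}]")
```

Calling print_report(compare_with_tested()) inside the activated virtual environment lists every package whose version differs from the tested one. A mismatch does not necessarily mean that DeepProg will fail, only that the combination was not the tested one.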

Installation (local)

# Downloading can take a few minutes because of the size of the git repository
git clone https://github.com/lanagarmire/DeepProg.git
cd DeepProg

# (RECOMMENDED) install with conda
conda env create -n deepprog -f ./environment.yml python=3.8
conda activate deepprog
pip install -e .

# (RECOMMENDED) to install the tested python library versions
pip install -e . -r requirements_tested.txt

# Basic installation (under python3/pip3)
pip3 install -e . -r requirements.txt
# To install the distributed frameworks
pip3 install -e . -r requirements_distributed.txt
# Installing scikit-survival (python3 only)
pip3 install -r requirements_pip3.txt
# Install ALL required dependencies with the most up to date packages
pip install -e . -r requirements_all.txt


# **Ignore this if you are working under python3**
# Python 3 is strongly preferred. DeepProg also works with python2/pip2, but scikit-survival is not supported under python2
pip2 install -e . -r requirements.txt
pip2 install -e . -r requirements_distributed.txt

Installation with docker

We have created a docker image (opoirion/deepprog_docker:v1) with all the dependencies already installed. For the docker (and singularity) instructions, please refer to the docker tutorial.

Alternative deep-learning packages installation

The required python packages can be installed using pip:

pip install theano --user # Original backend used OR
pip install tensorflow --user # Alternative backend for keras and default
pip install keras --user

Alternative support for CNTK / theano / tensorflow

We originally used Keras with theano as the backend platform. However, Tensorflow (currently the default backend DL framework) and CNTK are more recent DL frameworks that can be faster or more stable than theano. Because keras supports these 3 backends, any of them can be used as an alternative. To install CNTK, please refer to the official guidelines. To change the backend, edit the $HOME/.keras/keras.json file (see the official Keras instructions).

The default configuration file: ~/.keras/keras.json looks like this:

{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
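Switching the backend amounts to changing the "backend" entry of this JSON file. The sketch below shows one way to do it with the standard library only. The function name set_keras_backend is hypothetical, and the default values are simply those of the configuration shown above. Note that, to our knowledge, multi-backend support applies to Keras releases up to 2.3; from Keras 2.4 onwards the package delegates to tensorflow, so choosing theano or cntk presumably requires an older Keras release.

```python
import json
from pathlib import Path

# Defaults taken from the configuration file shown above.
DEFAULT_CONFIG = {
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow",
}

SUPPORTED_BACKENDS = ("tensorflow", "theano", "cntk")


def set_keras_backend(backend, config_path=None):
    """Write `backend` into keras.json and return the resulting config.

    Existing entries of the file are preserved; missing ones are filled
    with the defaults above. (Hypothetical helper, not part of Keras.)
    """
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    if config_path is None:
        path = Path.home() / ".keras" / "keras.json"
    else:
        path = Path(config_path)
    config = dict(DEFAULT_CONFIG)
    if path.exists():
        config.update(json.loads(path.read_text()))
    config["backend"] = backend
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4))
    return config
```

For example, set_keras_backend("theano") rewrites ~/.keras/keras.json with theano as the backend. The change takes effect the next time keras is imported.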

R installation (Alternative to Python lifelines)

In its first implementation, DeepProg used the R survival toolkits to fit the survival functions (Cox-PH models) and compute the concordance indexes. These functions have been replaced with the Python toolkits lifelines and scikit-survival for convenience and to avoid compatibility issues. However, the c-indexes computed with the Python and R libraries can differ. To use the original R functions, the following components must be installed:

  • R

  • the R “survival” package

  • rpy2 3.4.4 (for python2, rpy2 can be installed with: pip install rpy2==2.8.6; for python3: pip3 install rpy2==2.8.6).

install.packages("survival")
install.packages("glmnet")
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("survcomp")

Then, when instantiating a SimDeep or a SimDeepBoosting object, the option use_r_packages needs to be set to True.
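Before setting use_r_packages to True, it can be useful to confirm that rpy2 and the R libraries installed above can actually be loaded from Python. The following is a minimal sketch under that assumption: the helper r_packages_available is hypothetical (it is not part of DeepProg), and it uses the importr function from rpy2.robjects.packages.

```python
def r_packages_available(names=("survival", "glmnet", "survcomp")):
    """Return True only if rpy2 is importable and every R package loads.

    Any failure (rpy2 missing, R not found, R package not installed)
    is reported as False. (Hypothetical helper, not part of DeepProg.)
    """
    try:
        from rpy2.robjects.packages import importr
        for name in names:
            importr(name)
    except Exception:
        return False
    return True


# Only enable the R code path when the R toolchain is usable.
use_r_packages = r_packages_available()
```

The resulting boolean can then be passed as the use_r_packages option when instantiating a SimDeep or SimDeepBoosting object.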

Visualisation module (Experimental)

To visualise test sets projected into the multi-omic survival space, the mpld3 module must be installed. Note that the mpld3 version installed with pip on my computer presented a bug: TypeError: array([1.]) is not JSON serializable. The newest version of mpld3 available on GitHub fixes this issue. Rather than executing pip install mpld3 --user, it is therefore recommended to install the newest version directly from the GitHub repository:

git clone https://github.com/mpld3/mpld3
cd mpld3
pip install -e . --user

Distributed computation

  • The Python ray framework (https://github.com/ray-project/ray) can be used to control the parallel computation of the multiple models. To use it, install it first: pip install ray

  • Alternatively, the models can be created one by one, without the ray framework
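The two options above can be illustrated with a generic pattern: run the fits in parallel through ray when it is installed, and fall back to a plain loop otherwise. This is only a sketch of the general idea, not DeepProg's internal mechanism. The names fit_all and toy_fit are hypothetical, and toy_fit merely stands in for the fitting of one model.

```python
import importlib.util


def ray_is_installed():
    """Return True if the ray package can be imported."""
    return importlib.util.find_spec("ray") is not None


def fit_all(fit_one, seeds, distribute=None):
    """Call fit_one(seed) for every seed, in parallel with ray if requested.

    distribute=None means: use ray only when it is installed.
    (Hypothetical helper, not part of DeepProg.)
    """
    if distribute is None:
        distribute = ray_is_installed()
    if not distribute:
        # Sequential fallback: the models are created one by one.
        return [fit_one(seed) for seed in seeds]

    import ray

    ray.init(ignore_reinit_error=True)
    try:
        remote_fit = ray.remote(fit_one)
        # Launch all tasks, then block until every result is available.
        return ray.get([remote_fit.remote(seed) for seed in seeds])
    finally:
        ray.shutdown()


def toy_fit(seed):
    """Hypothetical stand-in for fitting one model with a given seed."""
    return seed * 2
```

For instance, fit_all(toy_fit, [1, 2, 3], distribute=False) runs sequentially and returns the results in the order of the seeds. With distribute=True the same call is dispatched to ray workers.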

Usage

  • Test that simdeep is functional (i.e. that all the software is correctly installed): go to the main folder (./DeepProg/) and run the following

  python3 test/test_simdeep.py -v

  • All the default parameters are defined in the config file ./simdeep/config.py, but they can also be passed dynamically. Three types of parameters must be defined:

    • The training dataset (omics + survival input files)

      • In addition, the parameters of the test set, i.e. the omic dataset and the survival file

    • The parameters of the autoencoder (the default parameters work, but they may benefit from fine-tuning)

    • The parameters of the classification procedures (the defaults are usually adequate)

Example scripts

Example scripts are available in ./examples/ to help you build a model from scratch with test and real data:

examples
├── example_hyperparameters_tuning.py
├── example_hyperparameters_tuning_with_test_dataset.py
├── example_with_dummy_data_distributed.py
├── example_with_dummy_data.py
└── load_3_omics_model.py