Tutorial: use DeepProg from the docker image
We created a docker image with the DeepProg Python dependencies installed. The docker image (opoirion/deepprog_docker:v1) can be downloaded using docker pull and used to analyse a multi-omic dataset. Alternatively, the DeepProg image can be installed using the Singularity container engine.
Installation with Docker or Singularity
Docker or Singularity needs to be installed first.
# Using docker
docker pull opoirion/deepprog_docker:v1
# Using Singularity
singularity pull docker://opoirion/deepprog_docker:v1
# After Singularity finishes pulling the image, a SIF image (deepprog_docker_v1.sif) should have been created in the local folder
The versions of the packages installed correspond to those described in requirements_tested.txt. Thus, they are NOT the most up-to-date Python packages, especially regarding the installed ray package (version 0.8.4). Since ray is used to configure the nodes, memory, and CPUs when distributing DeepProg on a cluster, the API to use might differ from the most up-to-date ray API.
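For instance, a script written against a newer ray release can first be checked against the API available inside the image. The sketch below is only illustrative: it uses the 0.8.4-style webui_host argument shown later in this tutorial, and notes that newer ray releases renamed this argument (e.g. dashboard_host), which may differ depending on the exact release.
# Minimal sketch: check the ray version shipped with the image and initialise
# it with the 0.8.4-style arguments used throughout this tutorial.
import ray

print(ray.__version__)  # expected to be 0.8.4 inside the DeepProg image

# ray 0.8.4 exposes the dashboard host as `webui_host`; more recent ray
# releases renamed this argument (e.g. `dashboard_host`), so scripts written
# against a newer API may need adjusting before running in the image.
ray.init(webui_host='127.0.0.1', num_cpus=2)
ray.shutdown()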
Alternative Image with R libraries
We also created a docker image with R and the R survival dependencies (survival, survcomp, and glmnet) installed. This image can be used with the option use_r_packages=True. However, this version is significantly larger (1.3 GiB) to install.
# Using alternative docker image with R libraries installed
docker pull opoirion/deepprog_docker:RVersion1
# Using Singularity
singularity pull docker://opoirion/deepprog_docker:RVersion1
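With this image, the R-based survival fitting is enabled through the use_r_packages option. The sketch below assumes the option is passed directly to SimDeepBoosting, with the other arguments mirroring the STAD example shown later in this tutorial; check the DeepProg API documentation if the option is accepted elsewhere.
# Hedged sketch: enabling the R survival packages (survival, survcomp, glmnet)
# when running inside the R-enabled image. Other arguments mirror the STAD
# example shown later in this tutorial.
from simdeep.simdeep_boosting import SimDeepBoosting

boosting = SimDeepBoosting(
    nb_it=10,
    training_tsv={'RNA': 'rna_mapped_STAD.tsv'},
    survival_tsv='surv_mapped_STAD.tsv',
    path_data='/input/',
    path_results='/output/',
    project_name='STAD_docker_R',
    use_r_packages=True,  # use the R implementations for survival modelling
    seed=3)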
Usage (Docker)
The docker container needs to have access to three folders:
the input folder, containing the matrices and the survival data
the output folder, where the output files will be generated
the folder containing the DeepProg python code to launch
# --rm removes the container once the computation is finished
# --name sets the name of the temporary docker container to create
# opoirion/deepprog_docker:v1 is the name of the DeepProg docker image to invoke
docker run \
    -v <ABSOLUTE PATH FOR INPUT DATA>:/input \
    -v <ABSOLUTE PATH FOR OUTPUT DATA>:/output \
    -v <ABSOLUTE PATH FOR THE SCRIPT>:/code \
    --rm \
    --name greedy_beaver \
    opoirion/deepprog_docker:v1 \
    python3.8 /code/<NAME OF THE PYTHON SCRIPT FILE>
Example
Create three folders for input, output, and scripts
cd $HOME
mkdir local_input
mkdir local_output
mkdir local_code
Go to local_input and download the matrices and survival data of the STAD cancer from http://ns102669.ip-147-135-37.us/DeepProg/matrices/STAD/
cd local_input
wget http://ns102669.ip-147-135-37.us/DeepProg/matrices/STAD/meth_mapped_STAD.tsv
wget http://ns102669.ip-147-135-37.us/DeepProg/matrices/STAD/mir_mapped_STAD.tsv
wget http://ns102669.ip-147-135-37.us/DeepProg/matrices/STAD/rna_mapped_STAD.tsv
wget http://ns102669.ip-147-135-37.us/DeepProg/matrices/STAD/surv_mapped_STAD.tsv
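Optionally, the downloaded files can be quickly checked to confirm that the omic matrices and the survival table share the same sample identifiers. This is only an illustrative sketch; it assumes tab-separated files with one sample per row, which is the layout DeepProg expects.
# Illustrative sketch: check that all downloaded files share the same sample IDs
# (assumes tab-separated tables with one sample per row, as expected by DeepProg).
import pandas as pd

files = ['rna_mapped_STAD.tsv', 'meth_mapped_STAD.tsv',
         'mir_mapped_STAD.tsv', 'surv_mapped_STAD.tsv']

sample_sets = [set(pd.read_csv(fname, sep='\t', index_col=0).index) for fname in files]
print('samples shared by all files:', len(set.intersection(*sample_sets)))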
Go to local_code and open a text editor to create the following script, named processing_STAD.py
### script: processing_STAD.py

# Import the DeepProg boosting class
from simdeep.simdeep_boosting import SimDeepBoosting

# Import ray, the library that will distribute our model computation across different nodes
import ray

# Define global variables for the input and output paths
# (the folders mounted from the docker image)
PATH_DATA = '/input/'      # virtual folder; if using Singularity, this should be an existing path on the machine
PATH_RESULTS = '/output/'  # virtual folder; if using Singularity, this should be an existing path on the machine


def main():
    """
    Processing of the STAD multi-omic cancer dataset
    """
    # Downloaded matrix files
    TRAINING_TSV = {
        'RNA': 'rna_mapped_STAD.tsv',
        'METH': 'meth_mapped_STAD.tsv',
        'MIR': 'mir_mapped_STAD.tsv'
    }

    # Survival file
    SURVIVAL_TSV = 'surv_mapped_STAD.tsv'

    # Survival flag describing the column names of the survival file
    survival_flag = {
        'patient_id': 'SampleID',
        'survival': 'time',
        'event': 'event'
    }

    # Output folder and project names
    OUTPUT_NAME = 'STAD_docker'
    PROJECT_NAME = 'STAD_docker'

    ray.init(
        webui_host='127.0.0.1',  # This option is required when using ray from the docker image
        num_cpus=10,             # Number of CPUs allocated to the local ray instance
    )

    # Random seed defining how the input dataset will be split
    SEED = 3
    # Number of DeepProg submodels to create
    nb_it = 10
    # Number of epochs used to train each autoencoder
    EPOCHS = 10

    boosting = SimDeepBoosting(
        nb_it=nb_it,
        split_n_fold=3,
        survival_flag=survival_flag,
        survival_tsv=SURVIVAL_TSV,
        training_tsv=TRAINING_TSV,
        path_data=PATH_DATA,
        project_name=PROJECT_NAME,
        path_results=PATH_RESULTS,
        epochs=EPOCHS,
        distribute=True,  # Option to use the ray cluster scheduler
        seed=SEED)

    # Fit the model
    boosting.fit()

    # Save the labels of each submodel
    boosting.save_models_classes()
    boosting.save_cv_models_classes()

    # Predict labels on the full (training + CV splits) dataset
    boosting.predict_labels_on_full_dataset()

    # Compute the consistency of the cluster labels
    boosting.compute_clusters_consistency_for_full_labels()

    # Performance indexes
    boosting.evalutate_cluster_performance()
    boosting.collect_cindex_for_test_fold()
    boosting.collect_cindex_for_full_dataset()

    # Feature scores
    boosting.compute_feature_scores_per_cluster()
    boosting.collect_number_of_features_per_omic()
    boosting.write_feature_score_per_cluster()

    # Close the ray cluster and free memory
    ray.shutdown()


# Execute the main function if this file is launched as a script
if __name__ == '__main__':
    main()
After saving this script, we are now ready to launch DeepProg using the docker image:
docker run \
-v ~/local_input:/input \
-v ~/local_output:/output \
-v ~/local_code:/code \
--rm \
--name greedy_beaver \
opoirion/deepprog_docker:v1 \
python3.8 /code/processing_STAD.py
After the execution, a new output folder should have been created inside ~/local_output:
ls -lh ~/local_output/STAD_docker
# Output
-rw-r--r-- 1 root root 22K Mar 30 07:37 STAD_docker_KM_plot_boosting_full.pdf
-rw-r--r-- 1 root root 830K Mar 30 07:37 STAD_docker_features_anticorrelated_scores_per_clusters.tsv
-rw-r--r-- 1 root root 812K Mar 30 07:37 STAD_docker_features_scores_per_clusters.tsv
-rw-r--r-- 1 root root 16K Mar 30 07:37 STAD_docker_full_labels.tsv
-rw-r--r-- 1 root root 22K Mar 30 07:37 STAD_docker_proba_KM_plot_boosting_full.pdf
drwxr-xr-x 2 root root 4.0K Mar 30 07:37 saved_models_classes
drwxr-xr-x 2 root root 4.0K Mar 30 07:37 saved_models_cv_classes
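The per-sample cluster assignments can then be inspected, for example with pandas. This is only an illustrative sketch: the exact header of STAD_docker_full_labels.tsv may differ from what is assumed here.
# Illustrative sketch: load the inferred cluster labels produced by DeepProg.
# The file is assumed to be a tab-separated table indexed by sample ID; adjust
# the parsing to the actual header of STAD_docker_full_labels.tsv if needed.
import os
import pandas as pd

path = os.path.expanduser('~/local_output/STAD_docker/STAD_docker_full_labels.tsv')
labels = pd.read_csv(path, sep='\t', index_col=0)

print(labels.head())
print(labels.iloc[:, 0].value_counts())  # number of samples assigned to each cluster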
Usage (Singularity)
Contrary to Docker, Singularity does not require mounting specific volumes for data sharing and can access the host file system directly. The DeepProg image can then be invoked using the following command:
# deepprog_docker_v1.sif is the path toward the downloaded Singularity SIF image
singularity run \
    deepprog_docker_v1.sif \
    python3.8 <PYTHON SCRIPT>
If we want to use the example script processing_STAD.py described above with Singularity, we just need to replace PATH_DATA and PATH_RESULTS with the corresponding paths on the machine.
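For example, assuming the local_input and local_output folders created earlier in this tutorial, the two globals of processing_STAD.py could be pointed directly at the host directories (the paths below are illustrative):
# When running through Singularity, replace the docker mount points by existing
# folders on the host machine (illustrative paths matching the example folders above).
import os

PATH_DATA = os.path.expanduser('~/local_input/')      # instead of the '/input/' docker mount
PATH_RESULTS = os.path.expanduser('~/local_output/')  # instead of the '/output/' docker mount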
The same methodology can be followed to add more analyses, such as predicting a test dataset, computing embeddings, or performing hyperparameter tuning. A more detailed description of the different DeepProg options is available in the other sections of this tutorial.
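As an illustration of the first of these extensions, a fitted ensemble can, in the standard DeepProg workflow, be applied to an additional test dataset. The file names below are hypothetical, and the exact signature of load_new_test_dataset should be checked against the DeepProg API documentation before use.
# Hedged sketch: label prediction on an additional (hypothetical) test dataset
# after boosting.fit(). Check the DeepProg API documentation for the exact
# signature of load_new_test_dataset before using it.
boosting.load_new_test_dataset(
    {'RNA': 'rna_test_dataset.tsv'},  # test omic matrix (features matching the training matrices)
    'TEST_STAD',                      # name used to label this test dataset in the outputs
    'survival_test_dataset.tsv')      # optional survival file for the test samples

boosting.predict_labels_on_test_dataset()
boosting.compute_c_indexes_for_test_dataset()
boosting.compute_clusters_consistency_for_test_labels()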