DM3Loc

DM3Loc is a novel Deep-learning framework with multi-head self-attention for multi-label mRNA subcellular localization prediction and analyses, which provide prediciton for six subcellular compartments, including nucleus, exosome, cytoplasm, ribosome, membrane, and endoplasmic reticulum.

Installation

Installation has been tested in Ubuntu 16.04.5 LST and Mac OS HighSierra with python 3.5.2 and python 3.7.13 Installation with conda and pip.

conda deactivate
conda create --name dm3loc python=3.7.13
conda activate dm3loc
pip install dm3loc
python -c "import DM3Loc; print(DM3Loc.__file__)"
#The DM3Loc will be installed in the above folder which contains the __init__.py.

You can also install the dependent packages by the following commands: sh sudo apt-get install -y python3.5 python3-pip python3 -m pip install numpy python3 -m pip install scipy python3 -m pip install scikit-learn python3 -m pip install pillow python3 -m pip install h5py python3 -m pip install keras==2.2.4 python3 -m pip install tensorflow==1.12.0 (or install the GPU supported tensorflow by pip3 install tensorflow-gpu==1.12.0 refer to https://www.tensorflow.org/install/ for instructions) Download the stand-alone tool by: sh git clone https://github.com/duolinwang/DM3Loc

Running on GPU or CPU

If you want to use GPU, you also need to install CUDA and cuDNN; refer to their websites for instructions. CPU is only suitable for prediction not training.

For general users who want to perform mRNA subcellular localization prediction by our provided model :

can directly call Multihead_predict.py

python3 Multihead_predict.py --dataset [custom mRNA data in FASTA format] --outputpath [custom specified output folder for the prediction results]

For details of the parameters, use the -h or --help parameter. For example, to predict for test.fasta in the testdsata folder and output the results to the folder ./Results/, run the following command:

python3 Multihead_predict.py --dataset "./testdata/test.fasta" --outputpath ./Results/

The result folder contains: 1) prediction_results.txt, the prediction result file. 2) attention_weights.txt, attention weight file, it contains attention weight vector per mRNA seuqnece.

For advanced users like to perform training by using their own data

Call Multihead_train.py. Multihead_train.py contains the process of spliting user's custom seuqneces into 5-fold cross-validation data and training 5 folds of models. The final model can be an ensemble of these 5 folds of models. To train customized models, users can run the following commands and replace with their own data and parameters.

python3 Multihead_train.py --normalizeatt --classweight --dataset [custom training data in FASTA format] --epochs 500 --message [output custom model keywords]

The training data should be in the FASTA format. The labels should be presented by onehot encoding and in the first place of the mRNA sequence description, seperated with other parts by a comma ',', as in the following example:

>010000,ACCNUM:NM_001507,Gene_ID:2862,Gene_Name:MLNR
ATGGGCAGCCCCTGGAACGGCAGCGACGGC.....................

Parameters --normalizeatt --classweight were required to train the model in the paper. For details of other parameters, use the -h or --help parameter.

Examples of commands used to train the models for 5-fold cross-validation:

python3 Multihead_train.py --normalizeatt --foldnum 5 --classweight --dataset ./testdata/modified_multilabel_seq_nonredundent.fasta --epochs 500 --message cnn64_smooth_l1

Note that the final model used in DM3Loc webserver was an ensemble model trained by 8-folds cross-validation data, to obtain that model, users should set the parameter --foldnum 8 and use the whole dataset "modified_multilabel_seq", which contains the redundant sequences, to train the model.

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
__pycache__		__pycache__
model		model
testdata		testdata
Genedata.py		Genedata.py
LICENSE		LICENSE
Multihead_predict.py		Multihead_predict.py
Multihead_train.py		Multihead_train.py
README.MD		README.MD
hier_attention_mask.py		hier_attention_mask.py
multihead_attention_model.py		multihead_attention_model.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

DM3Loc

Installation

Running on GPU or CPU

For general users who want to perform mRNA subcellular localization prediction by our provided model :

For advanced users like to perform training by using their own data

Examples of commands used to train the models for 5-fold cross-validation:

The DM3Loc webserver can be accessed at http://dm3loc.lin-group.cn/

About

Uh oh!

Releases

Packages

Languages

License

duolinwang/DM3Loc

Folders and files

Latest commit

History

Repository files navigation

DM3Loc

Installation

Running on GPU or CPU

For general users who want to perform mRNA subcellular localization prediction by our provided model :

For advanced users like to perform training by using their own data

Examples of commands used to train the models for 5-fold cross-validation:

The DM3Loc webserver can be accessed at http://dm3loc.lin-group.cn/

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages