EnsembleNPPred

A Robust Approach to Neuropeptide Prediction and Recognition Using Ensemble Learning with Machine Learning and Deep Learning Methods

Abstract: Neuropeptides (NPs) are a diverse class of signaling molecules that play a critical role in regulating numerous physiological processes and behaviors, including pain perception, stress response, mood regulation, appetite, and circadian rhythms. They function as neurotransmitters, neuromodulators, or neurohormones, modulating neuronal activity and fine-tuning the brain's signaling networks. However, screening and identifying NPs through experimental techniques can be labor-intensive, time-consuming, and resource-intensive. To address this, computational methods based on machine learning have been developed to screen potential NP candidates before experimental bioassay verification.

We introduce EnsembleNPPred, an ensemble learning method that combines traditional machine learning (ML) and deep learning (DL) models. This approach leverages the strengths of both ML and DL, reducing variance and potentially enhancing performance and robustness. We evaluated several well-known single and ensemble machine learning approaches on benchmark datasets and independent testing datasets. The final model uses a voting mechanism to combine the results of three classifiers (SVM, ET, and DL), yielding a final probability prediction.
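The voting step can be illustrated with a minimal sketch. It assumes equal-weight soft voting (a plain average of the three per-model NP probabilities) and a 0.5 decision threshold; the description above does not state the weighting or threshold EnsembleNPPred actually uses, and the function name `soft_vote` and the example scores are hypothetical.

```python
from statistics import fmean


def soft_vote(probabilities, threshold=0.5):
    """Average per-model NP probabilities and apply a decision threshold.

    `probabilities` maps a model name to its predicted probability that a
    peptide is a neuropeptide. Equal weights and the 0.5 threshold are
    assumptions made for illustration only.
    """
    if not probabilities:
        raise ValueError("need at least one model probability")
    for name, p in probabilities.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} probability out of range: {p}")
    mean_p = fmean(list(probabilities.values()))
    label = "NP" if mean_p >= threshold else "non-NP"
    return mean_p, label


# Hypothetical scores for one peptide from the three classifiers.
scores = {"SVM": 0.9, "ET": 0.6, "DL": 0.3}
mean_p, label = soft_vote(scores)
print(f"{mean_p:.2f} {label}")
```

Averaging probabilities rather than counting hard votes keeps a confident model from being outvoted by two marginal ones, which is one common reason to prefer soft voting when a final probability is needed.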

Compared with existing methods, the proposed model with hybrid features achieved improved accuracy and sensitivity in differentiating NPs from non-NPs on the testing data. The model was also tested on various neuropeptide families from the NeuroPep database, achieving a classification accuracy of 89.75% across all family testing datasets.

Download Link: Training and Testing data, EnsembleNPPred_models

 

Command: perl EnsembleNPPred.pl input_fasta_file

 

***** Each FASTA sequence must be longer than 5 aa and contain only standard amino acids *****
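Input files can be pre-checked against this requirement before running the script. The sketch below is not part of EnsembleNPPred: the helper names `parse_fasta` and `is_valid` and the example peptides are hypothetical, and it assumes "standard amino acids" means the 20 canonical one-letter codes and that lowercase residues may be upper-cased before checking.

```python
# The 20 canonical amino acid one-letter codes (assumed meaning of "standard").
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH = 6  # "longer than 5 aa"


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_valid(seq):
    """Return True if seq is long enough and uses only standard residues."""
    seq = seq.upper()  # assumption: case is not significant
    return len(seq) >= MIN_LENGTH and set(seq) <= STANDARD_AA


# Hypothetical input: one valid peptide, one too short, one with residue B.
example = """>pep1
FMRFGGKR
>pep2
ACDX
>pep3
GLWBAAAA
"""
valid = [h for h, s in parse_fasta(example) if is_valid(s)]
print(valid)  # ['pep1']
```

Filtering up front makes it easier to see which records were rejected and why, instead of discovering the problem from the script's output.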

### Example:

perl EnsembleNPPred.pl ./DATA/Testing_Data_by_Neuropeptide_family/7B2

perl EnsembleNPPred.pl ./DATA/Testing_Data_by_Neuropeptide_family/CCAP

perl EnsembleNPPred.pl ./DATA/Testing_Data_by_Neuropeptide_family/Adrenomedullin