Protein is a primary nutrient and is crucial for understanding biochemical processes and normal biological function in living cells. Each protein typically performs one or a few functions, which are largely determined by its family. Identifying and classifying proteins by structure and family is therefore essential. In this work, we built a model to classify protein sequences into families. We used a protein sequence dataset consisting of various macromolecules of biological significance. The classifier is a deep bidirectional long short-term memory (Bi-LSTM) network. The workflow consisted of collecting the dataset from the Protein Data Bank of the Research Collaboratory for Structural Bioinformatics (RCSB PDB), pre-processing the sequences by tokenization, and training the Bi-LSTM classifier. After obtaining the trained model with the best accuracy, we evaluated its performance using the learning curve, accuracy, and loss. The results show that the deep Bi-LSTM performs well, with a well-fitted learning curve, 99% accuracy, and a loss of 0.042.
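The abstract does not say exactly how the sequences were tokenized. A common choice for protein sequences is character-level encoding: each amino-acid letter is mapped to an integer, and every sequence is padded or truncated to a fixed length so the resulting matrix can feed an embedding layer followed by the Bi-LSTM. The sketch below assumes that scheme. The alphabet, the padding index 0, the fallback to `X` for unknown symbols, and the names `build_vocab`, `encode`, and `encode_batch` are hypothetical illustrations, not the paper's actual pre-processing.

```python
from typing import Dict, List

# Assumed alphabet: 20 standard amino acids plus common ambiguity/rare codes.
# The paper's actual vocabulary is not stated.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWYXBZUO"


def build_vocab(alphabet: str = AMINO_ACIDS) -> Dict[str, int]:
    # Index 0 is reserved for padding; residues get ids 1..len(alphabet).
    return {ch: i + 1 for i, ch in enumerate(alphabet)}


def encode(seq: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    # Map each residue to its integer id; unknown symbols fall back to 'X'.
    unk = vocab["X"]
    # Keep only the first max_len residues (truncation at the end).
    ids = [vocab.get(ch, unk) for ch in seq.upper()[:max_len]]
    # Post-pad with zeros so every sequence has length max_len.
    return ids + [0] * (max_len - len(ids))


def encode_batch(seqs: List[str], max_len: int) -> List[List[int]]:
    # Encode a list of sequences into a fixed-width integer matrix.
    vocab = build_vocab()
    return [encode(s, vocab, max_len) for s in seqs]


if __name__ == "__main__":
    print(encode_batch(["MKT", "ACDEFG"], max_len=5))
    # [[11, 9, 17, 0, 0], [1, 2, 3, 4, 5]]
```

Reserving 0 for padding lets a downstream embedding layer mask padded positions, so the Bi-LSTM does not treat them as residues. Whether to truncate from the start or the end of long sequences is a design choice this sketch makes arbitrarily.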
Copyright © 2024