Viruses and bacteria continue to evolve alongside humans. Viruses can spread rapidly and cause substantial loss of life worldwide, and they remain dangerous pathogens responsible for a range of infectious diseases. Metagenomics is the application of large-scale sequencing technology to genetic material obtained directly from one or more environmental samples, yielding at least 50 Mb of randomly sampled sequence and multiple long sequences. Identifying the origin of a virus is important for preventing the spread of outbreaks, and understanding the biology of viruses and how they affect their ecosystems depends on knowing which hosts they infect. Metagenomic features derived from the viral genome can be used to determine the type of virus host. Traditionally, predicting virus hosts has required considerable time and effort. Machine learning is one technology that can reduce this effort. We chose the support vector machine (SVM) algorithm to predict viral hosts from three metagenomic features, namely genome size, GC content (GC%), and number of coding sequences (CDS), using a dataset of 7326 viral genomes. The SVM model was further optimised with grid search (GS) and k-fold cross-validation (K-CV). This optimisation increased the accuracy of the model in predicting virus hosts from 80% to 84%.
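The workflow summarised above (an SVM trained on genome size, GC% and CDS count, then tuned with grid search and k-fold cross-validation) can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy, which the abstract does not name. The data are synthetic stand-ins, not the 7326 genomes from the study. The two host classes, the parameter grid, the feature scaling step and the helper names `make_synthetic_genomes` and `tune_svm` are all hypothetical choices made for illustration, so the printed accuracy says nothing about the reported 80% and 84% figures.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_genomes(n_per_class=150, seed=0):
    """Hypothetical stand-in data: genome size (bp), GC%, CDS count."""
    rng = np.random.default_rng(seed)
    # Two made-up host classes with different feature distributions.
    class_a = np.column_stack([
        rng.normal(40_000, 8_000, n_per_class),  # genome size
        rng.normal(45.0, 5.0, n_per_class),      # GC%
        rng.normal(60, 12, n_per_class),         # number of CDS
    ])
    class_b = np.column_stack([
        rng.normal(12_000, 4_000, n_per_class),
        rng.normal(52.0, 5.0, n_per_class),
        rng.normal(10, 4, n_per_class),
    ])
    X = np.vstack([class_a, class_b])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


def tune_svm(X, y, n_splits=5, seed=0):
    """Grid search over SVM hyperparameters, scored by k-fold cross-validation."""
    # Hold out a test set that the search never sees.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Scaling matters here because genome size is orders of magnitude
    # larger than GC% or CDS count (an assumption of this sketch).
    pipeline = make_pipeline(StandardScaler(), SVC())
    # Assumed grid; the abstract does not list the values searched.
    param_grid = {
        "svc__C": [0.1, 1, 10],
        "svc__gamma": ["scale", 0.1, 1],
        "svc__kernel": ["rbf"],
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
    search.fit(X_train, y_train)
    # Accuracy of the best configuration on the held-out test set.
    return search, search.score(X_test, y_test)


X, y = make_synthetic_genomes()
search, test_accuracy = tune_svm(X, y)
print("best parameters:", search.best_params_)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

With real data, `X` would hold one row per viral genome and `y` the known host label. Comparing an untuned `SVC` against `search.best_estimator_` on the same held-out set is one way to measure the kind of improvement the abstract reports.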
Copyrights © 2024