Protein secondary structure prediction using support vector machine and hierarchical clustering

dc.contributor.author Atasever, Sema
dc.contributor.author Aydın, Zafer
dc.contributor.author Erbay, Hasan
dc.date.accessioned 2021-09-06T10:35:04Z
dc.date.available 2021-09-06T10:35:04Z
dc.date.issued 2018-04-30
dc.identifier.uri http://hdl.handle.net/20.500.11787/4527
dc.description.abstract Predicting secondary structure from the protein sequence plays a crucial role in predicting the 3D structure and understanding the function of proteins. As new genes and proteins are discovered, the protein databases and datasets that can be used for training prediction models grow considerably. A two-stage hybrid classifier that employs dynamic Bayesian networks and a support vector machine (SVM) has been shown to provide state-of-the-art prediction accuracy. However, SVM training scales poorly to large datasets because of the quadratic optimization involved. In this paper, we implemented two techniques on the CB513 benchmark for reducing the number of samples in the training set of the SVM. The first method randomly selects a fraction of the data samples from the training set using a stratified selection strategy. This approach can remove approximately 50% of the data samples from the training set and reduce the model training time by 82.38% without decreasing the prediction accuracy significantly. The second method clusters the data samples with a hierarchical clustering algorithm and replaces the training set samples with the nearest neighbors of the cluster centers. We employed single linkage clustering, average linkage clustering and Ward’s method for clustering the feature vectors. We optimized the number of clusters and the maximum number of nearest neighbors by computing the prediction accuracy on validation sets. We observed that clustering can also reduce the size of the training set by 50% without sacrificing prediction accuracy. Among the clustering techniques, Ward’s method provided the best accuracy on test data. tr_TR
dc.language.iso eng tr_TR
dc.rights info:eu-repo/semantics/openAccess tr_TR
dc.subject Protein tr_TR
dc.subject Protein Secondary Structure Prediction tr_TR
dc.subject Support Vector Machine (SVM) tr_TR
dc.subject Multi-class Classification tr_TR
dc.subject Stratified Sampling tr_TR
dc.subject Hierarchical Clustering tr_TR
dc.subject Bayesian Network tr_TR
dc.title Protein secondary structure prediction using support vector machine and hierarchical clustering tr_TR
dc.type other tr_TR
dc.relation.journal 3rd World Conference on BIG DATA, (BIGDATA-2018) tr_TR
dc.contributor.department Nevşehir Hacı Bektaş Veli Üniversitesi, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü tr_TR
dc.contributor.authorID 0000-0002-2295-7917 tr_TR
dc.contributor.authorID 40206 tr_TR
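
The two training-set reduction strategies named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`stratified_subsample`, `ward_clusters`, `reduce_by_clustering`), the toy data, and the choice to pick nearest neighbors only among each cluster's own members are assumptions made for illustration. The Ward step is a naive pure-Python version suited only to tiny inputs; the SVM training itself is omitted.

```python
import random
from collections import defaultdict

def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices keeping roughly `fraction` of every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    kept = []
    for idxs in by_class.values():
        n_keep = max(1, round(len(idxs) * fraction))  # keep at least one per class
        kept.extend(rng.sample(idxs, n_keep))
    return sorted(kept)

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_clusters(X, n_clusters):
    """Naive agglomerative clustering using Ward's merge cost.

    Repeatedly merges the pair of clusters whose union gives the smallest
    increase in within-cluster variance: n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2.
    """
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            ca = centroid([X[i] for i in clusters[a]])
            for b in range(a + 1, len(clusters)):
                cb = centroid([X[i] for i in clusters[b]])
                na, nb = len(clusters[a]), len(clusters[b])
                cost = na * nb / (na + nb) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def reduce_by_clustering(X, n_clusters, max_neighbors):
    """Keep at most `max_neighbors` samples closest to each cluster center.

    Assumption: neighbors are searched among the cluster's own members.
    """
    kept = []
    for members in ward_clusters(X, n_clusters):
        c = centroid([X[i] for i in members])
        nearest = sorted(members, key=lambda i: sq_dist(X[i], c))
        kept.extend(nearest[:max_neighbors])
    return sorted(kept)

# Toy data (hypothetical): three classes, two well-separated groups of points.
labels = ["H"] * 6 + ["E"] * 4 + ["C"] * 10
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
kept_by_sampling = stratified_subsample(labels, 0.5)
kept_by_clustering = reduce_by_clustering(X, n_clusters=2, max_neighbors=2)
```

In both cases the returned indices would select the reduced training set passed to the SVM. The abstract states that the number of clusters and the maximum number of neighbors were tuned on validation sets, so the values used here are placeholders.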

