Protein secondary structure prediction using support vector machine and hierarchical clustering

dc.contributor.author	Atasever, Sema
dc.contributor.author	Aydın, Zafer
dc.contributor.author	Erbay, Hasan
dc.date.accessioned	2021-09-06T10:35:04Z
dc.date.available	2021-09-06T10:35:04Z
dc.date.issued	2018-04-30
dc.identifier.uri	http://hdl.handle.net/20.500.11787/4527
dc.description.abstract	Predicting the secondary structure from a protein sequence plays a crucial role in predicting the 3D structure and understanding the function of proteins. As new genes and proteins are discovered, the size of the protein databases and datasets that can be used for training prediction models grows considerably. A two-stage hybrid classifier that employs dynamic Bayesian networks and a support vector machine (SVM) has been shown to provide state-of-the-art prediction accuracy. However, an SVM does not scale well to large datasets because of the quadratic optimization involved in model training. In this paper, we implemented two techniques on the CB513 benchmark for reducing the number of samples in the training set of the SVM. The first method randomly selects a fraction of the data samples from the training set using a stratified selection strategy. This approach can remove approximately 50% of the data samples from the training set and reduce the model training time by 82.38% without decreasing the prediction accuracy significantly. The second method clusters the data samples with a hierarchical clustering algorithm and replaces the training set samples with the nearest neighbors of the cluster centers. We employed single linkage clustering, average linkage clustering and Ward's method for clustering the feature vectors. We optimized the number of clusters and the maximum number of nearest neighbors by computing the prediction accuracy on validation sets. We observed that clustering can also reduce the size of the training set by 50% without sacrificing prediction accuracy. Among the clustering techniques, Ward's method provided the best accuracy on test data.	tr_TR
dc.language.iso	eng	tr_TR
dc.rights	info:eu-repo/semantics/openAccess	tr_TR
dc.subject	Protein	tr_TR
dc.subject	Protein Secondary Structure Prediction	tr_TR
dc.subject	Support Vector Machine (SVM)	tr_TR
dc.subject	Multi-class Classification	tr_TR
dc.subject	Stratified Sampling	tr_TR
dc.subject	Hierarchical Clustering	tr_TR
dc.subject	Bayesian Network	tr_TR
dc.title	Protein secondary structure prediction using support vector machine and hierarchical clustering	tr_TR
dc.type	other	tr_TR
dc.relation.journal	3rd World Conference on BIG DATA, (BIGDATA-2018)	tr_TR
dc.contributor.department	Nevşehir Hacı Bektaş Veli Üniversitesi, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü	tr_TR
dc.contributor.authorID	0000-0002-2295-7917	tr_TR
dc.contributor.authorID	40206	tr_TR
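
The two training-set reduction strategies described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`stratified_subsample`, `ward_clusters`, `reduce_by_clustering`), the toy data, and the choice to pick nearest neighbors from within each cluster are all assumptions made for illustration. The record does not say whether clustering was run per class or on the whole training set, and the SVM and dynamic Bayesian network stages are left out entirely. The clustering routine is a naive version of Ward's method that is only practical for small inputs.

```python
import random
from collections import defaultdict


def stratified_subsample(samples, labels, fraction, seed=0):
    """Randomly keep about `fraction` of the samples from each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    kept = []
    for indices in by_class.values():
        n_keep = max(1, round(fraction * len(indices)))
        kept.extend(rng.sample(indices, n_keep))
    kept.sort()
    return [samples[i] for i in kept], [labels[i] for i in kept]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ward_clusters(points, n_clusters):
    """Naive agglomerative clustering using Ward's merge cost.

    Repeatedly merges the pair of clusters whose union increases the
    within-cluster sum of squares the least.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            ca = centroid([points[i] for i in clusters[a]])
            for b in range(a + 1, len(clusters)):
                cb = centroid([points[i] for i in clusters[b]])
                na, nb = len(clusters[a]), len(clusters[b])
                cost = na * nb / (na + nb) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def reduce_by_clustering(points, labels, n_clusters, max_neighbors):
    """Keep, per cluster, up to `max_neighbors` members nearest its center.

    Assumption: neighbors are drawn from the cluster's own members.
    """
    kept = set()
    for members in ward_clusters(points, n_clusters):
        center = centroid([points[i] for i in members])
        nearest = sorted(members, key=lambda i: sq_dist(points[i], center))
        kept.update(nearest[:max_neighbors])
    kept = sorted(kept)
    return [points[i] for i in kept], [labels[i] for i in kept]
```

In this sketch `fraction`, `n_clusters` and `max_neighbors` play the role of the quantities that the abstract says were tuned on validation sets. The reduced samples and labels would then be passed to whatever SVM trainer is in use.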

This item appears in the following collection(s):

Bilgisayar Mühendisliği Bölümü Koleksiyonu [33]
Contains publications belonging to the Bilgisayar Mühendisliği Bölümü (Department of Computer Engineering) collection.
