booktitle = "2015 IEEE International Conference on Bioinformatics
and Biomedicine (BIBM)",
title = "A multi-stage protein secondary structure prediction
system using machine learning and information theory",
year = "2015",
pages = "1304--1309",
abstract = "In this paper, we evaluate the performance of a
multi-stage protein secondary structure (PSS)
prediction model. The proposed classifier uses
statistical information and protein profiles. The
statistical information is derived from protein
sequences and structures by using a k-means clustering
technique and information theory. In the first stage, a
feed-forward artificial neural network maps a sequence
fragment to a region in the Ramachandran plot
(2D-plot). A score vector is constructed with the
mapped region using clustering and statistical
information. The score vector represents the tendency
of pairing an identified region in the 2D-plot and
secondary structures for a residue. The score vectors,
which are used in the second stage, have fewer
dimensions than the input vectors commonly derived
from protein sequences or profile information.
In the second stage, a two-tier classifier is employed
based on an artificial neural network and a genetic
programming (GP) method. The GP method uses IF rules
for a three-state classification. The two-tier
classifier's performance is compared to that of
two-tier artificial neural networks (ANNs) and support
vector machines (SVMs). The prediction method is
examined with a common protein dataset, RS126. The
performance of the proposed classification model is
measured based on Q3 and segment overlap (SOV) scores.
The proposed PSS prediction model improves the Q3
score by over 3 percent and the SOV score by 2 percent
compared with two-tier ANN and SVM
architectures.",