Protein Secondary Structure Prediction Evaluation and a Novel Transition Site Model with New Encoding Schemes
Created by W.Langdon from
gp-bibliography.bib Revision:1.8081
@PhdThesis{Zamani_Masood_201705_PhD,
author = "Masood Zamani",
title = "Protein Secondary Structure Prediction Evaluation and
a Novel Transition Site Model with New Encoding
Schemes",
school = "Computer Science, The University of Guelph",
year = "2017",
address = "Guelph, Ontario, Canada",
month = may,
keywords = "genetic algorithms, genetic programming, Protein
Structure, PSS, Machine Learning, ANN, SVM",
URL = "https://atrium.lib.uoguelph.ca/xmlui/handle/10214/10441",
URL = "http://hdl.handle.net/10214/10441",
URL = "https://atrium.lib.uoguelph.ca/xmlui/bitstream/handle/10214/10441/Zamani_Masood_201705_PhD.pdf",
size = "198 pages",
abstract = "Rapid progress in genomics has led to the discovery of
millions of protein sequences, while less than
0.2 percent of the sequenced proteins' structures have
been resolved by X-ray crystallography or NMR
spectroscopy, which are complex, time-consuming, and
expensive. Employing advanced computational techniques
for protein structure prediction at secondary and
tertiary levels provides alternative ways to accelerate
the prediction process and overcome the extremely low
percentage of protein structures that have been
determined. State-of-the-art protein secondary
structure (PSS) prediction methods employ machine
learning (ML) techniques, compared to early approaches
based on statistical information and sequence homology.
In this research, we develop a two-stage PSS prediction
model based on Artificial Neural Networks (ANNs) and
Genetic Programming (GP) through a novel framework of
PSS transition sites, and new amino acid encoding
schemes derived from the genetic Codon mappings,
Clustering and Information theory. PSS transition sites
represent structural information of protein backbones,
and reduce the input space and learning parameters in
the PSS prediction model. PSS transition sites can be
used in Homology Modelling (HM) to define the boundary
of secondary structure elements. The prediction
performance of the proposed method is evaluated by
using Q3 and segment overlap (SOV) scores on two
standard datasets, RS126 and CB513, and the latest
protein dataset, PISCES, compiled with very strict
homology measures by which each sequence pair has a
similarity below the twilight zone or less than
25 percent. The experimental results and statistical
analyses of the proposed PSS model indicate
statistically significant improvements in PSS
prediction accuracy compared to the state-of-the-art ML
techniques which commonly employ cascaded ANNs and
SVMs. The proposed encoding schemes show advantages in
extracting sequence and profile information, reducing
input parameters and training performances. A
successful PSS prediction model can be used in homology
detection tools for distant protein sequences and
protein tertiary structure prediction methods to reduce
the complexity of protein structure prediction,
which has important applications in medicine,
agriculture and the biological sciences.",
notes = "Supervisor: Stefan C. Kremer",
}