2019/20, Semester I
Árpád tér 2, 2nd floor, Room 220
15:15-16:00
Peter Juma Ochieng
A Clustering Model for Identification of Time Course Gene Expression Patterns

Identifying gene expression patterns is critical when studying complex and dynamic biological
processes such as gene regulatory functions. Gene expression is a continuous biological
phenomenon and can be represented by a continuous function (curve). Genes whose expression follows
such continuous functions often share similar functional forms. However, the number and shapes of
these forms, and the identities of the genes sharing them, remain unknown. To identify such functional
forms, we introduce a clustering model for the identification of time course gene expression patterns. The
method uses an S-spline approach to model the functional curves and a penalized log-likelihood
approach to fit the model. In addition, a rejection-controlled EM algorithm is designed to minimize the error
and computational cost of mean curve estimation. Furthermore, the method uses generalized
cross-validation to select smoothing parameters and measures clustering uncertainty using the Bayesian
information criterion. The utility of the method is illustrated by its application to D. melanogaster life
cycle datasets. Simulation results indicate that our method accurately recovers the true functional
forms: it assigns each gene to a cluster, predicts the mean expression curve, and provides the associated
95% confidence bands for each cluster. Based on Gene Ontology term descriptions, the estimated mean curve in
each cluster reflects true gene functional annotations, with biologically meaningful gene expression patterns.
Finally, a comparison of clustering performance indicates that our method outperforms fuzzy c-means and
k-means, with a misclassification rate of 0.1289 and an overall success rate of 98.71%.
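
The general idea of clustering time course curves with a penalized spline fit inside an EM loop can be sketched as follows. This is only an illustration under stated assumptions, not the speaker's implementation: it swaps in a cubic truncated-power spline basis with a ridge penalty for the S-spline model, uses plain EM rather than the rejection-controlled variant, fixes the smoothing parameter `lam` instead of choosing it by cross-validation, and uses the BIC only as a score for comparing numbers of clusters. All function names, parameters, and the synthetic data are hypothetical.

```python
import numpy as np

def spline_basis(t, knots):
    """Cubic truncated-power basis: 1, t, t^2, t^3 and (t - k)_+^3 per knot."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def _e_step(Y, mu, sigma2, pi):
    """Responsibilities and total log-likelihood for spherical Gaussian curves."""
    m = Y.shape[1]
    resid2 = ((Y[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    logp = np.log(pi) - 0.5 * m * np.log(2 * np.pi * sigma2) - 0.5 * resid2 / sigma2
    top = logp.max(axis=1, keepdims=True)          # log-sum-exp for stability
    lse = top + np.log(np.exp(logp - top).sum(axis=1, keepdims=True))
    return np.exp(logp - lse), float(lse.sum())

def cluster_curves(Y, t, n_clusters, knots, lam=1e-4, n_iter=100):
    """Cluster rows of Y (genes x time points); returns labels, mean curves, BIC."""
    Y = np.asarray(Y, dtype=float)
    n, m = Y.shape
    B = spline_basis(t, knots)
    p = B.shape[1]
    # Symmetric smoother matrix of the ridge-penalized spline fit (assumed penalty).
    S = B @ np.linalg.solve(B.T @ B + lam * np.eye(p), B.T)
    # Deterministic farthest-point initialisation of the cluster curves.
    centers = [Y[0]]
    for _ in range(n_clusters - 1):
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[d.argmax()])
    mu = np.array(centers)
    sigma2 = np.full(n_clusters, Y.var())
    pi = np.full(n_clusters, 1.0 / n_clusters)
    for _ in range(n_iter):
        R, _ = _e_step(Y, mu, sigma2, pi)          # E-step
        w = R.sum(axis=0) + 1e-12                  # M-step: cluster weights
        pi = w / w.sum()
        mu = ((R.T @ Y) / w[:, None]) @ S          # smoothed weighted mean curves
        resid2 = ((Y[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        sigma2 = (R * resid2).sum(axis=0) / (w * m) + 1e-8
    R, loglik = _e_step(Y, mu, sigma2, pi)
    # Parameter count (assumed): spline coefficients, variances, mixing weights.
    n_params = n_clusters * p + n_clusters + (n_clusters - 1)
    bic = -2.0 * loglik + n_params * np.log(n)
    return R.argmax(axis=1), mu, bic

# Synthetic example: two groups of noisy curves with different shapes.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 12)
truth = np.repeat([0, 1], 30)
shapes = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
Y = shapes[truth] + 0.1 * rng.standard_normal((60, 12))
labels, mean_curves, bic = cluster_curves(Y, t, n_clusters=2, knots=[0.25, 0.5, 0.75])
```

Running `cluster_curves` for several values of `n_clusters` and comparing the returned BIC values is one simple way to choose the number of clusters in a sketch like this; confidence bands and cross-validated smoothing are left out for brevity.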