2019/20, 1st semester
Árpád tér 2, 2nd floor, room 220
15:15-16:00
Gergely Pap
Neural Networks and DNA binding

Since the introduction of Next Generation Sequencing, genetics and bioinformatics have entered the scene of big data science and machine learning. Modern sequencers can produce huge amounts of unlabelled data; however, giving meaningful interpretation to the sequences and annotating them remain two serious issues. Various unsupervised and semi-supervised machine learning methods are employed to address these problems by trying to establish clusters, recognize patterns and classify biologically significant elements. A hotly discussed instance of the latter is the identification of Transcription Factor Binding Sites (TFBS). Traditionally, a sequence of 6, 8 or 10 base pairs would mark such locations, but there are many examples and observations of TFBS that are exceptions to this sequence-based rule. Until recently, only the immediate vicinity of such binding sites was examined. Novel experiments, however, suggest that more distant parts of the genome might play significant roles in this selective binding of transcription factors (TFs).

Furthermore, protein-DNA binding sites also present interesting problems for AI and machine learning. Many new deep learning models are capable of achieving unprecedented accuracy and prediction rates in the verification, discovery and exploration of such sites. The binding affinity of proteins to DNA is still explained by unique base-pair sequences, but recent results imply that other physicochemical properties carry considerable weight in these binding events. My presentation will explain a few machine learning approaches to problems such as the pattern recognition of TFBS and protein-DNA binding sites, and will introduce a novel method that is still under active development in cooperation with the bioinformatics group of BRC Szeged.