Machine learning approach to the study of chromatin

Cuscó Pons, Pol

dc.contributor

Universitat Pompeu Fabra. Departament de Ciències Experimentals i de la Salut

dc.contributor.author

Cuscó Pons, Pol

dc.date.accessioned

2018-05-28T16:34:56Z

dc.date.available

2018-05-28T16:34:56Z

dc.date.issued

2017-12-13

dc.identifier.uri

http://hdl.handle.net/10803/565685

dc.description.abstract

Since the advent of high-throughput sequencing technologies, biological data sets have become ever larger and more complex, which makes them practically impossible to interpret manually. The machine learning paradigm allows a systematic analysis of the relationships and patterns present in data sets, taking advantage of the enormous volume of data available. However, a careless application of the basic principles of machine learning can lead to overly optimistic estimates, a prevalent problem known as overfitting. In the field of protein folding, we found examples of this in published models that claimed high predictive power but performed poorly on new data. In the field of epigenetics, problems such as lack of reproducibility, heterogeneous quality and conflicts between replicates become evident when different ChIP-seq data sets are compared. To overcome these limitations we developed Zerone, a machine learning-based ChIP-seq discretizer that is able to combine information from different experimental replicates and to automatically identify low-quality or irreproducible data.

en_US

dc.description.abstract

Since the appearance of high-throughput sequencing technologies, biological data sets have become increasingly large and complex, which renders them practically impossible for a human to interpret directly. The machine learning paradigm allows a systematic analysis of relationships and patterns within data sets, making it possible to extract information by leveraging the sheer amount of data available. However, violations of basic machine learning principles may lead to overly optimistic estimates, a prevalent problem known as overfitting. In the field of protein folding, we found examples of this in published models that claimed high predictive power but performed poorly on new data. A different problem arises in epigenetics. Issues such as lack of reproducibility, heterogeneous quality and conflicts between replicates become evident when comparing ChIP-seq data sets. To overcome these limitations we developed Zerone, a machine learning-based ChIP-seq discretizer capable of merging information from several experimental replicates and automatically identifying low-quality or irreproducible data.

en_US

dc.format.extent

149 p.

en_US

dc.format.mimetype

application/pdf

dc.language.iso

eng

en_US

dc.publisher

Universitat Pompeu Fabra

dc.rights.license

Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by/4.0/

dc.rights.uri

http://creativecommons.org/licenses/by/4.0/

dc.source

TDX (Tesis Doctorals en Xarxa)

dc.subject

Machine learning

en_US

dc.subject

HMM

en_US

dc.subject

SVM

en_US

dc.subject

Epigenetics

en_US

dc.subject

ChIP-seq

en_US

dc.subject

Aprenentatge automàtic

en_US

dc.subject

Epigenètica

en_US

dc.title

Machine learning approach to the study of chromatin

en_US

dc.type

info:eu-repo/semantics/doctoralThesis

dc.type

info:eu-repo/semantics/publishedVersion

dc.subject.udc

575

en_US

dc.contributor.authoremail

polcusco@gmail.com

en_US

dc.contributor.director

Filion, Guillaume

dc.embargo.terms

none

en_US

dc.rights.accessLevel

info:eu-repo/semantics/openAccess

dc.description.degree

Programa de doctorat en Biomedicina

Files

tpcp_v2.pdf

1.133Mb PDF

This item appears in the following collection(s)

Programa de Doctorat en Biomedicina [1388]