Semantic-based approach for the discovery of Life Sciences web resources driven by rich user’s requirements

dc.contributor
Universitat Jaume I. Departament d'Enginyeria i Ciència dels Computadors
dc.contributor.author
Pérez Catalán, María
dc.date.accessioned
2020-06-04T08:06:20Z
dc.date.available
2020-06-04T08:06:20Z
dc.date.issued
2013-09-12
dc.identifier.uri
http://hdl.handle.net/10803/669104
dc.description.abstract
Web resources have been gaining popularity as providers of relevant data, whether data stored in datasets or data resulting from the execution of complex functions such as the alignment of protein sequences. Although the discovery of web resources has been extensively studied, it remains a challenging research task because current search engines depend heavily on the characteristics of the available metadata. In some domains, such as the Life Sciences, the heterogeneity of data makes this dependency even more problematic. Current web resource registries allow users to search for resources that fulfill their information needs. Discovery in these registries is mainly based on well-defined metadata, which is usually limited and very specific, and on string matching of the user's query keywords, which is hampered by the heterogeneity of data. The main objective of this thesis is to assist users in discovering the most appropriate resources for their information needs, specifically in the Life Sciences domain. Achieving this objective requires addressing the main limitations of current web resource registries. Firstly, web resource discovery is driven by the user's requirements and, therefore, the precision of its results depends on how well the user's information needs are described in the requirements specification. Thus, richer requirements specifications are assumed to yield more precise results. In the proposed approach, the requirements specification consists of a rich description of both the functionality and the relevant features of the required resource. Additionally, users can customize the discovery parameters to improve the accuracy of the process. Secondly, discovery depends heavily on the characteristics of the resources' metadata.
In many registries, resources are described with well-defined metadata, e.g., categories, and with textual descriptions, which provide richer information but are harder to process automatically. To alleviate this dependency, this thesis proposes a normalization process that addresses the heterogeneity of data and automatically identifies relevant information implicitly described in the resources' metadata. The discovery of web resources then relies on the normalized data, which reduces word mismatches, alleviates the problem of different vocabularies, and improves the characterization of resources. Finally, whereas current registries provide the user with a list of resources without any information about their relevance to her requirements, the proposed approach presents the user with a list of resources ranked according to how well they fulfill her information needs and satisfy the user-defined features. In this way, the system assists the user until the end of the discovery process, providing her with information relevant to the selection of the best-suited resource. The experimental evaluation performed on each phase of the discovery method demonstrates that the proposed techniques achieve good results. Moreover, the discovery method has been implemented as part of BioUSeR, an online tool for the discovery of Life Sciences web resources. In BioUSeR, the results of each phase of the discovery process are visualized, and users can easily customize the parameters and the data involved in the process. We have used BioUSeR to demonstrate the usefulness of our approach on real usage examples.
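The abstract describes normalization and ranking only at a high level. The following Python (3.9+) sketch illustrates the general idea under stated assumptions and is not the thesis's actual method: a tiny hypothetical synonym table (SYNONYMS) stands in for the domain vocabularies used for normalization, and resources are ranked by a weighted sum of functional term coverage and user-defined feature fulfillment. The weights w_func and w_feat are hypothetical stand-ins for the user-customizable discovery parameters, and the names Resource, normalize, score and rank are illustrative, not taken from BioUSeR.

```python
import re
from dataclasses import dataclass, field

# Hypothetical synonym table standing in for domain vocabularies; it is an
# illustrative assumption, not the normalization resource used in the thesis.
SYNONYMS = {
    "align": "alignment",
    "aligns": "alignment",
    "aligner": "alignment",
    "proteins": "protein",
    "sequences": "sequence",
}


def normalize(text: str) -> set[str]:
    """Lowercase, split on non-alphanumeric characters, map synonyms to canonical terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {SYNONYMS.get(token, token) for token in tokens}


@dataclass
class Resource:
    name: str
    description: str                                   # free-text metadata
    features: set[str] = field(default_factory=set)    # e.g. {"rest", "free"}


def score(resource: Resource, required_terms: set[str], required_features: set[str],
          w_func: float = 0.7, w_feat: float = 0.3) -> float:
    """Weighted sum of functional term coverage and feature fulfillment, each in [0, 1]."""
    terms = normalize(resource.description)
    func = len(required_terms & terms) / len(required_terms) if required_terms else 0.0
    # With no required features, every resource trivially fulfills them.
    feat = (len(required_features & resource.features) / len(required_features)
            if required_features else 1.0)
    return w_func * func + w_feat * feat


def rank(resources: list[Resource], query: str, required_features: set[str],
         w_func: float = 0.7, w_feat: float = 0.3) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from best to worst match."""
    required_terms = normalize(query)
    scored = [(r.name, score(r, required_terms, required_features, w_func, w_feat))
              for r in resources]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    catalog = [
        Resource("ToolA", "Aligns proteins and DNA sequences", {"rest", "free"}),
        Resource("ToolB", "Protein structure viewer", {"soap"}),
    ]
    for name, value in rank(catalog, "align protein sequences", {"rest"}):
        print(f"{name}: {value:.2f}")
```

With these two hypothetical resources, the query "align protein sequences" and the required feature "rest", normalization maps "Aligns", "proteins" and "sequences" onto the query terms, so ToolA scores 1.00 and ranks above ToolB (0.23). Changing w_func and w_feat shifts the ranking toward functionality or toward features, which is one simple way to model user-customizable discovery parameters.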
dc.format.extent
168 p.
dc.format.mimetype
application/pdf
dc.language.iso
eng
dc.publisher
Universitat Jaume I
dc.rights.license
NOTICE. All rights reserved. Access to the contents of this doctoral thesis and their use must respect the rights of the author. It may be used for personal consultation or study, as well as in research or teaching activities or materials under the terms established in art. 32 of the Consolidated Text of the Intellectual Property Law (RDL 1/1996). Any other use requires the prior and express authorization of the author. In all cases, when using its contents, the full name of the author and the title of the doctoral thesis must be clearly indicated. Reproduction or any other form of exploitation for profit is not authorized, nor is its public communication from a site outside the TDX service. Presenting its content in a window or frame outside TDX (framing) is not authorized either. These reserved rights apply to the contents of the thesis as well as to its abstracts and indexes.
dc.source
TDX (Tesis Doctorals en Xarxa)
dc.subject
Computer science
dc.subject
Design and components of information systems
dc.subject.other
Computer systems
dc.title
Semantic-based approach for the discovery of Life Sciences web resources driven by rich user’s requirements
dc.type
info:eu-repo/semantics/doctoralThesis
dc.type
info:eu-repo/semantics/publishedVersion
dc.subject.udc
004
dc.contributor.director
Berlanga Llavorí, Rafael
dc.contributor.codirector
Sanz Blasco, Ismael
dc.embargo.terms
none
dc.rights.accessLevel
info:eu-repo/semantics/openAccess


Documents

2013_Tesis_Pérez Catalán_María.pdf

2.478Mb PDF
