Semantic-based approach for the discovery of Life Sciences web resources driven by rich user’s requirements

Author

Pérez Catalán, María

Supervisor

Berlanga Llavorí, Rafael

Co-supervisor

Sanz Blasco, Ismael

Defense date

2013-09-12

Pages

168 p.



Department/Institute

Universitat Jaume I. Departament d'Enginyeria i Ciència dels Computadors

Abstract

Web resources have been gaining popularity as providers of relevant data, whether those stored in datasets or those resulting from the execution of complex functions such as the alignment of protein sequences. Although the discovery of web resources has been widely studied, it is still a challenging research task because current search engines depend heavily on the characteristics of the available metadata. In some domains, such as the Life Sciences, this dependency becomes even worse due to the heterogeneity of data. Current web resource registries allow users to search for resources that fulfill their information needs. Discovery in these registries is mainly based on the use of well-defined metadata, which is usually limited and very specific, and on string matching of the user's query keywords, which is hampered by the heterogeneity of data. The main objective of this thesis is to assist users in the discovery of the most appropriate resources for their information needs, specifically in the Life Sciences domain. Achieving this objective implies addressing the main limitations of current web resource registries. Firstly, web resource discovery is driven by the user's requirements and, therefore, the precision of its results depends on how well the user's information needs are described in the requirements specification. Thus, rich requirements specifications are assumed to yield more precise results. In the proposed approach, the requirements specification consists of a rich description of both the functionality and the relevant features of the required resource. Additionally, discovery parameters are customizable by the users in order to improve the accuracy of the process. Secondly, the discovery depends heavily on the characteristics of the resources' metadata.
In many registries, resources are described with well-defined metadata, e.g., categories, and with textual descriptions, which provide richer information but are harder to process automatically. In order to alleviate this dependency, this thesis proposes a normalization process that addresses the heterogeneity of data and automatically identifies relevant information implicitly described in the resources' metadata. The discovery of web resources then works on the normalized data, reducing word mismatches, alleviating the problem of using different vocabularies, and improving the characterization of resources. Finally, whereas current registries provide the user with a list of resources without any information about their relevance to her requirements, in the proposed approach the user is presented with a list of resources ranked according to how well they fulfill her information needs and satisfy the user-defined features. In this way, the system assists the user until the end of the discovery process, providing her with information relevant to the selection of the best-suited resource. The experimental evaluation performed on each phase of the discovery method demonstrates that the proposed techniques obtain good results. Moreover, the discovery method has been implemented as part of BioUSeR, an online tool for the discovery of Life Sciences web resources. In BioUSeR, the results of each phase of the discovery process are visualized, and the parameters and the data involved in the process are easily customized by the user. We have used BioUSeR to demonstrate the usefulness of our approach using real usage examples.
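The idea of normalizing metadata before matching, and then ranking resources against both the functional requirement and the user-defined features, can be illustrated with a minimal Python sketch. This is not the method implemented in the thesis or in BioUSeR, whose details the abstract does not give. The synonym table `CANONICAL`, the functions `normalize`, `score` and `rank`, the `feature_weight` parameter, and the two example resources are all hypothetical, chosen only to show why normalized matching tolerates vocabulary differences that plain string matching does not.

```python
import re

# Hypothetical synonym table mapping surface terms to one canonical concept.
# The actual normalization resources used in the thesis are not described here.
CANONICAL = {
    "alignment": "align",
    "aligning": "align",
    "aligns": "align",
    "proteins": "protein",
    "sequences": "sequence",
    "seq": "sequence",
}


def normalize(text):
    """Lowercase, tokenize, and map each token to its canonical concept."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {CANONICAL.get(tok, tok) for tok in tokens}


def score(requirement, description, features, wanted_features, feature_weight=0.5):
    """Combine functional overlap with the share of user-defined features met."""
    req = normalize(requirement)
    functional = len(req & normalize(description)) / len(req) if req else 0.0
    met = (
        len(wanted_features & features) / len(wanted_features)
        if wanted_features
        else 0.0
    )
    return (1 - feature_weight) * functional + feature_weight * met


def rank(requirement, wanted_features, resources, feature_weight=0.5):
    """Return (score, name) pairs, best first; ties are broken by name."""
    scored = [
        (
            score(requirement, r["description"], r["features"],
                  wanted_features, feature_weight),
            r["name"],
        )
        for r in resources
    ]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical registry entries, invented for illustration only.
RESOURCES = [
    {"name": "AlignTool", "description": "Aligns protein sequences",
     "features": {"rest", "free"}},
    {"name": "GeneFetch", "description": "Retrieves gene records by identifier",
     "features": {"free"}},
]

ranking = rank("protein sequence alignment", {"rest"}, RESOURCES)
for value, name in ranking:
    print(f"{name}: {value:.2f}")
```

In this toy example, the query word "alignment" and the description word "Aligns" share no exact string match, yet both map to the same canonical term, so the first resource is still recognized as functionally relevant. The `feature_weight` parameter stands in for the user-customizable discovery parameters mentioned above.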

Keywords

Computer science; Design and components of information systems

Subjects

004 - Computer science

Knowledge area

Computer systems

Documents

2013_Tesis_Pérez Catalán_María.pdf

2.478Mb

 

Rights

WARNING. All rights reserved. Access to the contents of this doctoral thesis and its use must respect the rights of the author. It may be used for consultation or personal study, as well as in research and teaching activities or materials, under the terms established in Art. 32 of the Revised Text of the Intellectual Property Law (RDL 1/1996). Any other use requires the prior and express authorization of the author. In any case, when its contents are used, the full name of the author and the title of the doctoral thesis must be clearly indicated. Reproduction or other forms of exploitation for profit are not authorized, nor is public communication of the thesis from a site outside the TDX service. Presenting its content in a window or frame external to TDX (framing) is also not authorized. This reservation of rights covers both the contents of the thesis and its abstracts and indexes.

This item appears in the following collection(s)