On distributing the analysis process of a broad-coverage unification-based grammar of Spanish

Marimon Felipe, Montserrat

dc.contributor

Universitat Politècnica de Catalunya. Institut de Ciències de l'Educació

dc.contributor.author

Marimon Felipe, Montserrat

dc.date.accessioned

2011-04-12T15:01:47Z

dc.date.available

2003-09-03

dc.date.issued

2003-03-28

dc.date.submitted

2003-07-31

dc.identifier.isbn

8468835226

dc.identifier.uri

http://www.tdx.cat/TDX-0731103-075511

dc.identifier.uri

http://hdl.handle.net/10803/5930

dc.description.abstract

This thesis describes research into the development and deployment of an engineered large-scale unification-based grammar to provide more robust and efficient deep grammatical analysis of linguistic expressions in real-world applications, while maintaining the accuracy of the grammar (i.e. the percentage of input sentences that receive the correct analysis) and keeping its precision at a reasonable level (i.e. the percentage of input sentences that receive no superfluous analysis). In tackling the efficiency problem, our approach has been to prune the search space of the parser by integrating shallow and deep processing. We propose and implement an NLP system which integrates a Part-of-Speech (PoS) tagger and a chunker as a pre-processing module of a broad-coverage unification-based grammar of Spanish. This allows us to release the parser from certain tasks that may be efficiently and reliably dealt with by these computationally less expensive processing techniques. On the one hand, by integrating the morpho-syntactic information delivered by the PoS tagger, we reduce the number of morpho-syntactic ambiguities of the linguistic expression to be analyzed. On the other hand, by integrating the chunk mark-ups delivered by the partial parser, we not only avoid generating irrelevant constituents which do not contribute to the final parse tree, but also provide part of the structure that the analysis component has to compute, thus avoiding a duplication of effort. In addition, we want our system to be able to maintain the accuracy of the high-level grammar. In the integrated architecture we propose, we leave the ambiguities which cannot be reliably resolved by the PoS tagger to be dealt with by the linguistic components of the grammar performing deep analysis. Besides improving the efficiency of the overall analysis process and maintaining the accuracy of the grammar, our system provides both structural and lexical robustness to the high-level processing.
Structural robustness is obtained by integrating into the linguistic components of the high-level grammar the structures which have already been parsed by the chunker, so that they do not need to be re-built by phrase structure rules. This allows us to extend the coverage of the grammar to deal with very low-frequency constructions whose treatment would drastically increase the parsing search space and create spurious ambiguity. To provide lexical robustness to the system, we have implemented default lexical entries. Default lexical entries are lexical entry templates that are activated when the system cannot find a particular lexical entry to apply. Here, the integration of the tagger, which supplies the PoS information to the linguistic processing modules of our system, allows us to increase robustness while avoiding an increase in morphological ambiguity. Better precision is achieved by extending the PoS tags of our external lexicon so that they include syntactic information, for instance subcategorization information.

dc.format.mimetype

application/pdf

dc.language.iso

eng

dc.publisher

Universitat Politècnica de Catalunya

dc.rights.license

WARNING. Access to the contents of this doctoral thesis and its use must respect the rights of the author. It may be used for consultation or personal study, as well as in research and teaching activities or materials, under the terms established in Art. 32 of the Consolidated Text of the Intellectual Property Law (RDL 1/1996). Other uses require the prior and express authorization of the author. In any case, when using its contents, the author's full name and the title of the doctoral thesis must be clearly indicated. Its reproduction or other forms of exploitation for profit are not authorized, nor is its public communication from a site other than the TDX service. Presenting its content in a window or frame external to TDX (framing) is also not authorized. This reservation of rights covers both the contents of the thesis and its abstracts and indexes.

dc.source

TDX (Tesis Doctorals en Xarxa)

dc.subject

computational linguistics

dc.subject

natural language processing

dc.subject.other

5701. Applied Linguistics

dc.title

On distributing the analysis process of a broad-coverage unification-based grammar of Spanish

dc.type

info:eu-repo/semantics/doctoralThesis

dc.type

info:eu-repo/semantics/publishedVersion

dc.subject.udc

004

dc.contributor.director

Bel Rafecas, Núria

dc.contributor.codirector

Theofilidis, Axel

dc.contributor.tutor

Martín Rioja, Josep Andreu

dc.rights.accessLevel

info:eu-repo/semantics/openAccess

dc.identifier.doi

https://dx.doi.org/10.5821/dissertation-2117-93245

dc.identifier.dl

B.44817-2003

dc.description.degree

DOCTORAT EN FORMALITZACIÓ DEL LLENGUATGE

Documents

TESI.pdf

2.150Mb PDF

This item appears in the following Collection(s)

Programa de Doctorat en Formalització del Llenguatge [1]