Data Reduction Techniques in Classification Processes

dc.contributor
Universitat Jaume I. Departament de Llenguatges i Sistemes Informàtics
dc.contributor.author
Lozano Albalate, Maria Teresa
dc.date.accessioned
2011-04-12T20:03:43Z
dc.date.available
2007-09-18
dc.date.issued
2007-07-25
dc.date.submitted
2007-09-18
dc.identifier.isbn
9788469084823
dc.identifier.uri
http://www.tdx.cat/TDX-0918107-132936
dc.identifier.uri
http://hdl.handle.net/10803/10479
dc.description.abstract
The learning process consists of several steps: building a Training Set (TS), training the system, testing its behaviour and finally classifying unknown objects. When a distance-based rule such as 1-Nearest Neighbour (1-NN) is used as the classifier, the first step (building the training set) includes editing and condensing the data. The main reason is that distance-based rules need a lot of time to classify each unlabelled sample, x, because the distance from x to every point in the training set must be calculated. So, the smaller the training set, the shorter the time needed for each new classification. This thesis focuses mainly on building a training set from already given data, and especially on condensing it; however, different classification techniques are also compared.
The aim of any condensing technique is to obtain a reduced training set so that classification takes as little time as possible, without a significant loss in classification accuracy. Some new approaches to training set size reduction based on prototypes are presented. These schemes basically consist of defining a small number of prototypes that represent all the original instances. They include approaches that select among the already existing examples (selective condensing algorithms) and approaches that generate new representatives (adaptive condensing algorithms).
These new reduction techniques are experimentally compared with some traditional ones, for data represented in feature spaces. To test them, the classical 1-NN rule is applied. However, other (fast) classifiers have also been considered, such as linear and quadratic ones constructed in dissimilarity spaces based on prototypes, in order to see how the editing and condensing concepts work for this different family of classifiers.
Although the goal of the algorithms proposed in this thesis is to obtain a strongly reduced set of representatives, their performance is empirically evaluated over eleven real data sets by comparing not only the reduction rate but also the classification accuracy with those of other condensing techniques. Therefore, the ultimate aim is to find a set that is not only strongly reduced but also balanced.
The same problem can be solved in several ways. When a distance-based rule is used as the classifier, reducing the training set is not the only option available. A different family of approaches consists of applying several search methods. Therefore, the results obtained with the algorithms presented here are compared, in terms of classification accuracy and time, with several efficient search techniques.
Finally, the main contributions of this PhD report can be briefly summarised in four principal points. Firstly, two selective algorithms based on the idea of surrounding neighbourhood are proposed; they obtain better results than the other algorithms presented here, as well as better than other traditional schemes. Secondly, a generative approach based on mixtures of Gaussians is presented; it gives better classification accuracy and size reduction than traditional adaptive algorithms, and results similar to those of LVQ. Thirdly, it is shown that classification rules other than 1-NN can be used, even leading to better results. And finally, it is deduced from the experiments carried out that, with some databases (such as the ones used here), the approaches presented here execute the classification processes in less time than the efficient search techniques.
eng
dc.format.mimetype
application/pdf
dc.language.iso
eng
dc.publisher
Universitat Jaume I
dc.rights.license
NOTICE. Access to the contents of this doctoral thesis and its use must respect the rights of the author. It may be used for consultation or personal study, as well as in research and teaching activities or materials, under the terms established in Art. 32 of the Text Refós de la Llei de Propietat Intel·lectual (Consolidated Text of the Intellectual Property Law, RDL 1/1996). For other uses, the prior and express authorisation of the author is required. In any case, when using its contents, the full name of the author and the title of the doctoral thesis must be clearly indicated. Its reproduction or other forms of exploitation for profit are not authorised, nor is its public communication from a site other than the TDX service. Presenting its content in a window or frame external to TDX (framing) is also not authorised. This reservation of rights covers both the contents of the thesis and its abstracts and indexes.
dc.source
TDX (Tesis Doctorals en Xarxa)
dc.subject
NCN
dc.subject
dissimilarity
dc.subject
condensing
dc.subject
reduction
dc.subject
mixtures of Gaussians
dc.subject
surrounding neighbourhood
dc.subject
Gaussians
dc.subject
NN
dc.subject.other
Llenguatges i Sistemes Informàtics
dc.title
Data Reduction Techniques in Classification Processes
dc.type
info:eu-repo/semantics/doctoralThesis
dc.type
info:eu-repo/semantics/publishedVersion
dc.subject.udc
004
cat
dc.contributor.authoremail
lozano.mayte@gmail.com
dc.contributor.director
Sánchez Garreta, José Salvador
dc.contributor.director
Pla Bañón, Filiberto
dc.rights.accessLevel
info:eu-repo/semantics/openAccess


Documents

lozano.pdf

2.168Mb PDF
