Data Reduction Techniques in Classification Processes

Lozano Albalate, Maria Teresa

dc.contributor

Universitat Jaume I. Departament de Llenguatges i Sistemes Informàtics

dc.contributor.author

Lozano Albalate, Maria Teresa

dc.date.accessioned

2011-04-12T20:03:43Z

dc.date.available

2007-09-18

dc.date.issued

2007-07-25

dc.date.submitted

2007-09-18

dc.identifier.isbn

9788469084823

dc.identifier.uri

http://www.tdx.cat/TDX-0918107-132936

dc.identifier.uri

http://hdl.handle.net/10803/10479

dc.description.abstract

The learning process consists of several steps: building a Training Set (TS), training the system, testing its behaviour and, finally, classifying unknown objects. When a distance-based rule such as 1-Nearest Neighbour (1-NN) is used as the classifier, the first step (building the training set) includes editing and condensing the data. The main reason is that distance-based rules need a long time to classify each unlabelled sample, x, because the distance from x to every point in the training set has to be computed. Hence, the smaller the training set, the shorter the time needed to classify each new sample. This thesis focuses mainly on building a training set from already given data, and especially on condensing it; however, different classification techniques are also compared. The aim of any condensing technique is to obtain a reduced training set so that classification takes as little time as possible, without a significant loss in classification accuracy. Some new prototype-based approaches to training set size reduction are presented. These schemes basically consist of defining a small number of prototypes that represent all the original instances. They include approaches that select among the already existing examples (selective condensing algorithms) and approaches that generate new representatives (adaptive condensing algorithms). The new reduction techniques are compared experimentally with some traditional ones, for data represented in feature spaces. The classical 1-NN rule is used to test them. In addition, other (fast) classifiers have been considered, such as linear and quadratic classifiers built in dissimilarity spaces based on prototypes, in order to understand how the editing and condensing concepts work for this different family of classifiers.
Although the goal of the algorithms proposed in this thesis is to obtain a strongly reduced set of representatives, their performance is evaluated empirically on eleven real data sets by comparing not only the reduction rate but also the classification accuracy with those of other condensing techniques. The ultimate aim is therefore to find a set that is not only strongly reduced but also balanced. The same problem can be solved in several ways: when a distance-based rule is used as the classifier, reducing the training set is not the only available option. A different family of approaches applies several search methods. The results obtained with the algorithms presented here are therefore compared, in terms of classification accuracy and time, with several efficient search techniques. Finally, the main contributions of this PhD thesis can be summarised in four points. Firstly, two selective algorithms based on the idea of surrounding neighbourhood are proposed; they obtain better results than the other algorithms presented here and than other traditional schemes. Secondly, a generative approach based on mixtures of Gaussians is proposed; it achieves better classification accuracy and size reduction than traditional adaptive algorithms, and results similar to those of LVQ. Thirdly, it is shown that classification rules other than 1-NN can be used, even leading to better results. Fourthly, the experiments indicate that, for some databases (such as the ones used here), the approaches presented here perform classification in less time than the efficient search techniques.
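
The cost argument in the abstract can be illustrated with a minimal sketch. It uses brute-force 1-NN together with Hart's classical Condensed Nearest Neighbour rule as a stand-in for a traditional selective condensing scheme. It is not one of the algorithms proposed in the thesis, and the function names and the synthetic two-class data are assumptions made purely for illustration.

```python
import math
import random


def nearest_label(prototypes, x):
    """Brute-force 1-NN: return the label of the prototype closest to x."""
    best_label, best_dist = None, math.inf
    for point, label in prototypes:
        d = math.dist(point, x)  # one distance computation per stored prototype
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def hart_condense(training_set):
    """Hart's Condensed Nearest Neighbour (a classical selective scheme).

    Starts from the first sample and adds every sample that the current
    subset misclassifies, repeating until a full pass adds nothing.
    """
    condensed = [training_set[0]]
    changed = True
    while changed:
        changed = False
        for point, label in training_set:
            if nearest_label(condensed, point) != label:
                condensed.append((point, label))
                changed = True
    return condensed


def make_two_class_set(n_per_class, seed=0):
    """Hypothetical toy data: two 2-D Gaussian clusters, shuffled."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            point = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
            data.append((point, label))
    rng.shuffle(data)
    return data


if __name__ == "__main__":
    training_set = make_two_class_set(100)
    condensed = hart_condense(training_set)
    # Each query now costs len(condensed) distances instead of len(training_set).
    print(len(training_set), "->", len(condensed), "prototypes")
    print("label for (2.5, 2.5):", nearest_label(condensed, (2.5, 2.5)))
```

Because each query computes one distance per stored prototype, classification time in this sketch scales directly with the size of the set passed to `nearest_label`, which is why a smaller condensed set shortens every new classification.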

dc.format.mimetype

application/pdf

dc.language.iso

eng

dc.publisher

Universitat Jaume I

dc.rights.license

NOTICE. Access to the contents of this doctoral thesis and its use must respect the rights of the author. It may be used for consultation or personal study, as well as in research and teaching activities or materials, under the terms established in Article 32 of the Consolidated Text of the Intellectual Property Law (RDL 1/1996). Any other use requires the prior and express authorisation of the author. In all cases, use of its contents must clearly indicate the full name of the author and the title of the doctoral thesis. Reproduction or other forms of exploitation for profit are not authorised, nor is public communication from a site other than the TDX service. Presenting its content in a window or frame external to TDX (framing) is not authorised either. This reservation of rights applies both to the contents of the thesis and to its abstracts and indexes.

dc.source

TDX (Tesis Doctorals en Xarxa)

dc.subject

NCN

dc.subject

dissimilarity

dc.subject

condensing

dc.subject

reduction

dc.subject

mixtures of Gaussians

dc.subject

surrounding neighbourhood

dc.subject

Gaussians

dc.subject.other

Llenguatges i Sistemes Informàtics

dc.title

Data Reduction Techniques in Classification Processes

dc.type

info:eu-repo/semantics/doctoralThesis

dc.type

info:eu-repo/semantics/publishedVersion

dc.subject.udc

004

dc.contributor.authoremail

lozano.mayte@gmail.com

dc.contributor.director

Sánchez Garreta, José Salvador

dc.contributor.director

Pla Bañón, Filiberto

dc.rights.accessLevel

info:eu-repo/semantics/openAccess

Documents

lozano.pdf

2.168Mb PDF

This item appears in the following Collection(s)

Programa de Doctorat en Informàtica [94]