CLUSTERING ON DISSIMILARITY REPRESENTATIONS FOR DETECTING MISLABELLED SEISMIC SIGNALS AT NEVADO DEL RUIZ VOLCANO

erevistas - Mauricio Orozco-Alzate




EARTH SCIENCES
RESEARCH JOURNAL
Earth Sci. Res. J. Vol. 11, No. 2 (December 2007): 131-138
CLUSTERING ON DISSIMILARITY REPRESENTATIONS FOR DETECTING
MISLABELLED SEISMIC SIGNALS AT NEVADO DEL RUIZ VOLCANO
Mauricio Orozco-Alzate, and César Germán Castellanos-Domínguez
Universidad Nacional de Colombia Sede Manizales, Grupo de Control y Procesamiento Digital de
Señales, Campus La Nubia, km 7 vía al Magdalena, Manizales, Colombia.
Corresponding author: Mauricio Orozco-Alzate, email: morozcoa@unal.edu.co
ABSTRACT
Classification of seismic signals at Colombian volcanoes has been carried out manually by visual
inspection. In order to reduce the workload for the seismic analysts and to make classification reliable
and objective, the use of supervised learning algorithms has been explored, particularly classifiers
built in dissimilarity spaces. Nonetheless, the performance of such learning methods is subject to the
availability of a representative and a priori well-classified training set. To detect mislabeled events,
the use of clustering techniques on the dissimilarity representations is proposed. Our experiments,
performed on re-analyzed seismic signals, show a significant improvement with respect to the recognition
accuracies obtained for the original data sets.
Key words: Clustering, dissimilarity, mislabeling, seismic signals.
RESUMEN
La clasificación de las señales sísmicas en los volcanes de Colombia ha sido llevada a cabo manualmente
mediante inspección visual. Con el fin de reducir la carga de trabajo de los analistas y para tornar la
clasificación confiable y objetiva, se ha explorado el uso de algoritmos de aprendizaje supervisado;
particularmente, clasificadores construidos en espacios de disimilitud. No obstante, el desempeño
de dichos métodos de aprendizaje está sujeto a la disponibilidad de un conjunto de entrenamiento
representativo y, a priori, bien clasificado. Para detectar eventos mal clasificados, se propone el uso
de técnicas de agrupamiento sobre las representaciones de disimilitud. Los experimentos, realizados
sobre las señales sísmicas verificadas, muestran una mejora significativa respecto a las tasas de
reconocimiento para los datos originales.
Palabras claves: Agrupamiento, disimilitud, etiquetado incorrecto, señales sísmicas.
Manuscript received September 9 2007.
Accepted for publication November 30 2007.

INTRODUCTION

In many applications of pattern recognition, it is extremely difficult or expensive, or even impossible, to reliably label a training sample with its true category (Jain et al., 2000). In the classification of seismic-volcanic signals in particular, night and rotating shift work schedules, tedious evaluations, and changes of personnel make recognition by visual inspection susceptible to human error. Besides, analysts often disagree about the interpretation of doubtful signals.

In order to reduce the workload for the seismic analyst and the risks associated with subjective judgments, a number of supervised classification methods have been used (Scarpetta et al., 2005; Langer et al., 2006; Orozco-Alzate et al., 2006a). Those supervised classification techniques assume that a well-labeled data set is available. However, for the reasons cited above, it is highly likely that training sets include mislabeled events.

In Langer et al. (2006), an automatic classification of seismic events at Soufrière Hill volcano was carried out. In addition, a careful manual revision of the original a priori classification was carried out by an expert not involved in the previous labeling of the data set. It was found that a considerable number of the events had been erroneously attributed to other classes. As a result, a remarkable improvement in classification accuracy was obtained when the revised data set was used.

The Nevado del Ruiz Volcano is monitored by the Volcanological and Seismological Observatory at Manizales (VSOM). Because of the considerable amount of data, the labelling of the recorded seismic signals is distributed among several analysts (e.g. one trainee per volcanic station). A second or third opinion is requested only in case of serious doubt. As a result, classifications performed by different experts are not available, and an analysis of concordance for such a priori labels has not been conducted. In this study, a revision of the originally labelled Nevado del Ruiz volcano (Ruiz) data set is conducted. In contrast to the approach followed by Langer et al. (2006), the revision is automated by using clustering techniques.

Several clustering algorithms were applied to the data set because no single clustering algorithm is appropriate for every problem (Jain et al., 2000). Experiments were therefore conducted with the most popular clustering approaches, which belong to two basic strategies: hierarchical and partitioning methods. In addition, the Ruiz data set was arranged into two separate problems: the Ruiz-VT,LP (two classes) and the Ruiz-all (three classes) data sets. The revised data sets were classified with our previous dissimilarity-based classification approach (Orozco et al., 2006a, Orozco et al., 2006b), and the results were compared against the performances obtained with the original data sets.

DISSIMILARITY REPRESENTATION AND CLASSIFIER

Differences in spectral content allow a visual discrimination of different types of volcanic earthquakes. Therefore, spectra of seismic records are commonly used for classification and monitoring of seismic activity (Zobin, 2003). In addition, recent studies have claimed that the dissimilarity-based classification approach is a feasible and sometimes advantageous alternative to the feature-based method (Duin et al., 1998, Pękalska et al., 2001, Pękalska and Duin, 2002, Paclík and Duin, 2003b, Pękalska and Duin, 2005). Accordingly, a dissimilarity representation for the Ruiz data set can be derived as follows: (i) the power spectral density (PSD) for each record is estimated via the Yule-Walker autoregressive method, after the DC bias has been removed; (ii) a dissimilarity measure between normalized spectra is calculated as the area of the non-overlapping parts ($L_1$-norm) between spectra, see Fig. 1.

Figure 1. Dissimilarity measure as the difference between normalized spectra.

A dissimilarity matrix D(T,T) was constructed from those pairwise measures. Each entry $d_{ij}$ of D corresponds to the dissimilarity between a pair of seismic records from the training set T. Then, a proper classifier can be defined on such a dissimilarity representation, either by using the entire training set T or a representation set R⊆T.

Linear Normal Density Based Classifier

A number of studies have shown that normal density based classifiers perform well in dissimilarity spaces (Pękalska et al., 2001, Pękalska and Duin, 2002, Paclík and Duin, 2003b, Paclík and Duin, 2003a, Pękalska et al., 2004, Orozco et al., 2006a). Particularly, in our previous study with the Nevado del Ruiz volcano data set (Orozco et al., 2006b), the linear normal density based classifier (BayesNL) outperformed the nearest neighbor rule (1-NN) and the quadratic normal density based classifier (BayesNQ). For a two-class problem, the BayesNL classifier is given by

$$f(D(x,R)) = \left[ D(x,R) - \tfrac{1}{2}\left(\mathbf{m}_{(1)} + \mathbf{m}_{(2)}\right) \right]^{T} C^{-1} \left(\mathbf{m}_{(1)} - \mathbf{m}_{(2)}\right) + \log\frac{P_{(1)}}{P_{(2)}} \qquad (1)$$

where $C$ is the sample covariance matrix; $\mathbf{m}_{(1)}$, $\mathbf{m}_{(2)}$ are the mean vectors and $P_{(1)}$, $P_{(2)}$ are the class prior probabilities. If $C$ is singular, a regularized version must be used. The following regularization is typically used, with $\lambda$ equal to 0.01 or less (Pękalska et al., 2006):

$$C_{reg} = (1 - \lambda)\, C + \lambda\, \mathrm{diag}(C). \qquad (2)$$

CLUSTERING TECHNIQUES

Unsupervised classification refers to situations where the objective is to construct decision boundaries based on unlabeled training data (Jain et al., 2000). Hierarchical and partitioning methods are the two basic strategies to find clusters. In this study, the following clustering techniques are used: single linkage (SL), average linkage (AL), com

Partitioning clustering

Partitioning methods group the objects into k clusters, usually by using representatives or by assuming a specific geometrical structure. Objects are assigned to the clusters, new representatives are estimated and the process is repeated until a stable solution is