An effective biometric discretization approach to extract highly discriminative, informative, and privacy-protective binary representation

biomed - Lim Meng-Hui , Teoh


16 pages

English



Subjects

Quantization

Feature selection

Information

Published by	biomed
Published on	1 January 2011
Number of reads	431
Language	English

Excerpt

Lim and Teoh EURASIP Journal on Advances in Signal Processing 2011, 2011:107
http://asp.eurasipjournals.com/content/2011/1/107
RESEARCH Open Access
An effective biometric discretization approach to
extract highly discriminative, informative, and
privacy-protective binary representation
Meng-Hui Lim and Andrew Beng Jin Teoh*
Abstract
Biometric discretization derives a binary string for each user based on an ordered set of biometric features. This
representative string ought to be discriminative, informative, and privacy protective when it is employed as a
cryptographic key in various security applications upon error correction. However, it is commonly believed that
satisfying the first and the second criteria simultaneously is not feasible, and a tradeoff between them is always
definite. In this article, we propose an effective fixed bit allocation-based discretization approach which involves
discriminative feature extraction, discriminative feature selection, unsupervised quantization (quantization that does
not utilize class information), and linearly separable subcode (LSSC)-based encoding to fulfill all the ideal properties
of a binary representation extracted for cryptographic applications. In addition, we examine a number of
discriminative feature-selection measures for discretization and identify the proper way of setting an important
feature-selection parameter. Encouraging experimental results vindicate the feasibility of our approach.
Keywords: biometric discretization, quantization, feature selection, linearly separable subcode encoding
1. Introduction
Binary representation of biometrics has received increasing attention and demand over the last decade, ever since biometric security schemes were widely proposed. Security applications such as biometric-based cryptographic key generation schemes [1-7] and biometric template protection schemes [8-13] require biometric features to be present in binary form before they can be implemented in practice. However, where security is a concern, these applications require the binary biometric representation to be:
• Discriminative: The binary representation of each user ought to be highly representative and distinctive so that it can be derived as reliably as possible upon every query request of a genuine user, and will neither be misrecognized as that of another user nor be extractable by any non-genuine user.
• Informative: The information or uncertainty contained in the binary representation of each user should be made adequately high. In fact, the use of a huge number of equiprobable binary outputs creates a huge key space, which could render an attacker clueless in guessing the correct output during a brute force attack. This is essential in security provision, as a malicious impersonation could take place in a straightforward manner if the correct key can be obtained by the adversary with an overwhelming probability. Entropy is a common measure of uncertainty, and it is usually a biometric system specification. Denoting the entropy of a binary representation as $L$, it can be related to the $N$ outputs with probability $p_i$ for $i \in \{1, \ldots, N\}$ by $L = -\sum_{i=1}^{N} p_i \log_2 p_i$. If the outputs are equiprobable, then the resultant entropy is maximal, that is, $L = \log_2 N$. Note that the strongest setting of the current encryption standard, the Advanced Encryption Standard (AES), is specified to be 256-bit entropy, signifying that at least $2^{256}$ possible outputs are required to withstand a brute force attack at the current state of the art. With steady technological advancement, adversaries will become more and more powerful as the capability of computers grows. Hence, it is of utmost importance to derive highly informative binary strings to cope with rising encryption standards in the future.

* Correspondence: bjteoh@yonsei.ac.kr
School of Electrical and Electronic Engineering, College of Engineering, Yonsei University, Seoul, South Korea
© 2011 Lim and Teoh; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.
• Privacy-protective: To avoid devastating consequences upon compromise of the irreplaceable biometric features of every user, the auxiliary information used for bit-string regeneration must not be correlated with the raw or projected features. In the case of system compromise, such non-correlation of the auxiliary information should be guaranteed in order to impede any adversarial reverse-engineering attempt to obtain the raw features. Otherwise, it is no different from storing the biometric features in the clear in the system database.
To date, only a handful of biometric modalities such as iris [14] and palm print [15] have their features represented in binary form upon an initial feature-extraction process. Many others remain represented in the continuous domain upon feature extraction. Therefore, an additional process in a biometric system is needed to transform these inherently continuous features into a binary string (per user), known as the biometric discretization process. Figure 1 depicts the general block diagram of a biometric discretization-based binary string generator that employs a biometric discretization scheme.

Figure 1 A biometric discretization-based binary string generator.

In general, most biometric discretization schemes can be decomposed into two essential components, which can be alternatively described as a two-stage mapping process:
• Quantization: The first component can be seen as a continuous-to-discrete mapping process. Given a set of feature elements per user, every one-dimensional feature space is initially constructed and segmented into a number of non-overlapping intervals, each of which is associated with a decimal index.
• Encoding: The second component can be regarded as a discrete-to-binary mapping process, where the resultant index of each dimension is mapped to a unique n-bit binary codeword of an encoding scheme. Next, the codeword output of every feature dimension is concatenated to form the final bit string of a user. The discretization performance is finally evaluated in the Hamming domain.
These two components are governed by a static or a dynamic bit allocation algorithm, which determines whether the quantity of binary bits allocated to every dimension is fixed or varied, respectively. Besides, if the (genuine and/or imposter) class information is used in determining the cut points (interval boundaries) of the non-overlapping quantization intervals, the discretization is known as supervised discretization [1,3,16]; otherwise, it is referred to as unsupervised discretization [7,17-19].
On the other hand, information about the constructed intervals of each dimension is stored as the helper data during enrolment so as to assist in reproducing the same binary string of each genuine user during the verification phase. However, similar to the security and privacy requirements of the binary representation, it is important that such helper data, upon compromise, should leak helpful information about neither the output binary string (security concern) nor the biometric feature itself (privacy concern).

1.1 Previous works
Over the last decade, numerous biometric discretization techniques for producing a binary string from a given
set of features of each user have been reported. These schemes are based on either a fixed-bit allocation principle (assigning a fixed number of bits to each feature dimension) [4-7,10,13,16,20] or a dynamic-bit allocation principle (assigning a different number of bits to each feature dimension) [1,3,17-19,21].
Monrose et al. [4,5], Teoh et al. [6], and Verbitsky et al. [13] partition each feature space into two intervals (labeled '0' and '1') based on a prefixed threshold. Tuyls et al. [12] and Kevenaar et al. [9] have used a similar 1-bit discretization technique, but instead of fixing the threshold, the mean of the background probability density function (for modeling inter-class variation) is selected as the threshold in each dimension. Further, reliable components are identified based on either the

direct binary representation (DBR) encoding elements (i.e., $3_{10} \to 011_2$, $4_{10} \to 100_2$, $5_{10} \to 101_2$). On the other hand, Chang et al. extend each feature space to account for the extra equal-width intervals to form $2^n$ intervals in accordance with the entire $2^n$ codeword labels from each $n$-bit DBR encoding scheme.
Although both of these schemes are able to generate binary strings of arbitrary length, they turn out to be greatly inefficient, since the ad-hoc interval handling strategies may result in considerable leakage of entropy, which will jeopardize the security of the users. In particular, the non-feasible labels of all extra intervals (including the boundary intervals) would allow an adversary to eliminate the corresponding codeword labels from her or his output-guessing range after obse