
http://asp.eurasipjournals.com/content/2011/1/107

RESEARCH Open Access

An effective biometric discretization approach to extract highly discriminative, informative, and privacy-protective binary representation

Meng-Hui Lim and Andrew Beng Jin Teoh*

Abstract

Biometric discretization derives a binary string for each user based on an ordered set of biometric features. This representative string ought to be discriminative, informative, and privacy protective when it is employed as a cryptographic key in various security applications upon error correction. However, it is commonly believed that satisfying the first and the second criteria simultaneously is not feasible, and a tradeoff between them is always definite. In this article, we propose an effective fixed bit allocation-based discretization approach which involves discriminative feature extraction, discriminative feature selection, unsupervised quantization (quantization that does not utilize class information), and linearly separable subcode (LSSC)-based encoding to fulfill all the ideal properties of a binary representation extracted for cryptographic applications. In addition, we examine a number of discriminative feature-selection measures for discretization and identify the proper way of setting an important feature-selection parameter. Encouraging experimental results vindicate the feasibility of our approach.

Keywords: biometric discretization, quantization, feature selection, linearly separable subcode encoding

1. Introduction
Binary representation of biometrics has been receiving an increased amount of attention and demand in the last decade, ever since biometric security schemes were widely proposed. Security applications such as biometric-based cryptographic key generation schemes [1-7] and biometric template protection schemes [8-13] require biometric features to be present in binary form before they can be implemented in practice. However, as security is in concern, these applications require the binary biometric representation to be:

• Discriminative: Binary representation of each user ought to be highly representative and distinctive so that it can be derived as reliably as possible upon every query request of a genuine user and will neither be misrecognized as others nor extractable by any non-genuine user.

• Informative: Information or uncertainty contained in the binary representation of each user should be made adequately high. In fact, the use of a huge number of equal-probable binary outputs creates a huge key space which could render an attacker clueless in guessing the correct output during a brute force attack. This is extremely essential in security provision as a malicious impersonation could take place in a straightforward manner if the correct key can be obtained by the adversary with an overwhelming probability. Entropy is a common measure of uncertainty, and it is usually a biometric system specification. By denoting the entropy of a binary representation as L, it can then be related to the N outputs with probability p_i for i = {1, ..., N} by L = -Σ_{i=1}^{N} p_i log_2 p_i. If the outputs are equal-probable, then the resultant entropy is maximal, that is, L = log_2 N. Note that the current encryption standard based on the advanced encryption standard (AES) is specified to be 256-bit entropy, signifying that at least 2^256 possible outputs are required to withstand a brute force attack at the current state of the art. With the consistent technology advancement, adversaries will become more and more powerful, resulting from the growing capability of computers. Hence, it is of utmost importance to derive highly informative binary strings in coping with the rising encryption standard in the future.

* Correspondence: bjteoh@yonsei.ac.kr
School of Electrical and Electronic Engineering, College of Engineering, Yonsei University, Seoul, South Korea
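The entropy relation above can be illustrated with a short sketch (not part of the original article; the probabilities are made up for the example):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: L = -sum_i p_i * log2(p_i)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# N equal-probable outputs attain the maximal entropy L = log2(N).
N = 16
uniform = [1.0 / N] * N
print(entropy(uniform))  # 4.0, i.e., log2(16)

# Any skew in the output distribution lowers the entropy and thus
# shrinks the effective key space an attacker must search.
skewed = [0.5] + [0.5 / (N - 1)] * (N - 1)
print(entropy(skewed) < math.log2(N))  # True
```

This is why the 256-bit AES benchmark translates to a requirement of 2^256 equal-probable outputs.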

© 2011 Lim and Teoh; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Lim and Teoh EURASIP Journal on Advances in Signal Processing 2011, 2011:107

• Privacy-protective: To avoid devastated consequences upon compromise of the irreplaceable biometric features of every user, the auxiliary information used for bit-string regeneration must not be correlated to the raw or projected features. In the case of system compromise, such non-correlation of the auxiliary information should be guaranteed to impede any adversarial reverse engineering attempt in obtaining the raw features. Otherwise, it has no difference from storing the biometric features in the clear in the system database.

To date, only a handful of biometric modalities such as iris [14] and palm print [15] have their features represented in the binary form upon an initial feature-extraction process. Instead, many remain being represented in the continuous domain upon the feature extraction. Therefore, an additional process in a biometric system is needed to transform these inherently continuous features into a binary string (per user), known as the biometric discretization process. Figure 1 depicts the general block diagram of a biometric discretization-based binary string generator that employs a biometric discretization scheme.

In general, most biometric discretization can be decomposed into two essential components, which can be alternatively described as a two-stage mapping process:

• Quantization: The first component can be seen as a continuous-to-discrete mapping process. Given a set of feature elements per user, every one-dimensional feature space is initially constructed and segmented into a number of non-overlapping intervals, each of which is associated with a decimal index.

• Encoding: The second component can be regarded as a discrete-to-binary mapping process, where the resultant index of each dimension is mapped to a unique n-bit binary codeword of an encoding scheme. Next, the codeword output of every feature dimension is concatenated to form the final bit string of a user. The discretization performance is finally evaluated in the Hamming domain.

These two components are governed by a static or a dynamic bit allocation algorithm, determining whether the quantity of binary bits allocated to every dimension is fixed or varied, respectively. Besides, if the (genuine or/and imposter) class information is used in determining the cut points (intervals' boundaries) of the non-overlapping quantization intervals, the discretization is thus known as supervised discretization [1,3,16]; otherwise, it is referred to as unsupervised discretization [7,17-19].

On the other hand, information about the constructed intervals of each dimension is stored as the helper data during enrolment so as to assist reproducing the same binary string of each genuine user during the verification phase. However, similar to the security and the privacy requirements of the binary representation, it is important that such helper data, upon compromise, should neither leak any helpful information about the output binary string (security concern), nor the biometric feature itself (privacy concern).

Figure 1 A biometric discretization-based binary string generator.

1.1 Previous works
Over the last decade, numerous biometric discretization techniques for producing a binary string from a given

set of features of each user have been reported. These schemes base upon either a fixed-bit allocation principle (assigning a fixed number of bits to each feature dimension) [4-7,10,13,16,20] or a dynamic-bit allocation principle (assigning a different number of bits to each feature dimension) [1,3,17-19,21].

Monrose et al. [4,5], Teoh et al. [6], and Verbitsky et al. [13] partition each feature space into two intervals (labeled by '0' and '1') based on a prefix threshold. Tuyls et al. [12] and Kevenaar et al. [9] have used a similar 1-bit discretization technique, but instead of fixing the threshold, the mean of the background probability density function (for modeling inter-class variation) is selected as the threshold in each dimension. Further, reliable components are identified based on either the training bit statistics [12] or a reliability (RL) function [9] so that unreliable dimensions can be eliminated from bits' extraction.

Kelkboom et al. have analytically expressed the genuine and imposter bit error probability [22] and subsequently modeled a discretization framework [23] to analytically estimate the genuine and imposter Hamming distance probability mass functions (pmf) of a biometric system. This model is based upon a static 1-bit equal-probable discretization under the assumption that both intra-class and inter-class variations are Gaussian distributed.

Han et al. [20] proposed a discretization technique to extract a 9-bit pin from each user's fingerprint impressions. The discretization derives the first 6 bits from six pre-identified reliable/stable minutiae: if a minutia belongs to a bifurcation, a bit "0" is assigned; otherwise, if it is a ridge ending, a bit "1" is assigned. The derivation of the last 3 bits is constituted by a single-bit discretization on each of three triangular features. If the biometric password/pin is used directly as a cryptographic key in security applications, it will be too short to survive brute force attacks, as an adversary would only require at most 512 attempts to crack the biometric password.

Hao and Chan [3] and Chang et al. [1] employed a multi-bit supervised user-specific biometric discretization scheme, each with a different interval-handling technique. Both schemes initially fix the position of the genuine interval of each dimension around the modeled pdf of the jth user: [μ_j - kσ_j, μ_j + kσ_j], and then construct the remaining intervals based on a constant width of 2kσ_j within every feature space. Here, μ_j and σ_j denote the mean and standard deviation (SD) of the user pdf, respectively, and k is a free parameter. As for the boundary portions at both ends of each feature space, Hao and Chan unfold every feature space arbitrarily to include all the remaining possible feature values in forming the leftmost and rightmost boundary intervals. Then, all the constructed intervals are labeled with direct binary representation (DBR) encoding elements (i.e., 3_10 → 011_2, 4_10 → 100_2, 5_10 → 101_2). On the other hand, Chang et al. extend each feature space to account for the extra equal-width intervals to form 2^n intervals in accordance with the entire 2^n codeword labels from each n-bit DBR encoding scheme.

Although both these schemes are able to generate binary strings of arbitrary length, they turn out to be greatly inefficient, since the ad-hoc interval handling strategies may probably result in considerable leakage of entropy which will jeopardize the security of the users. In particular, the non-feasible labels of all extra intervals (including the boundary intervals) would allow an adversary to eliminate the corresponding codeword labels from her or his output-guessing range after observing the helper data, or after reliably identifying the "fake" intervals. Apart from this security issue, another critical problem with these two schemes is the potential exposure of the exact location of each genuine user pdf. Based on the knowledge that the user pdf is located at the center of the genuine interval, the constructed intervals thus serve as a clue to the adversary at which the user pdf could be located. As a result, the possible locations of the user pdf could be reduced to the amount of quantization intervals in that dimension, thus potentially facilitating malicious privacy violation attempts.

Chen et al. [16] demonstrated a likelihood-ratio-based multi-bit biometric discretization scheme which is likewise supervised and user specific. The quantization scheme first constructs the genuine interval to accommodate the likelihood ratio (LR) detected in that dimension and creates the remaining intervals in an equal-probable (EP) manner so that the background probability mass is equally distributed within every interval. The leftmost and rightmost boundary intervals with insufficient background probability mass are wrapped into a single interval that is tagged with a common codeword label from the binary reflected gray code (BRGC)-encoding scheme [24] (i.e., 3_10 → 010_2, 4_10 → 110_2, 5_10 → 111_2). This discretization scheme suffers from the same privacy problem as the previous supervised schemes owing to the fact that the genuine interval is constructed based on the user-specific information.

Yip et al. [7] presented an unsupervised, non-user-specific, multi-bit discretization scheme based on equal-width intervals' quantization and BRGC encoding. This scheme adopts the entire BRGC code for labeling, and therefore, it is free from the entropy loss problem. Furthermore, since it does not make use of the user pdf to determine the cut points of the quantization intervals, this scheme does not seem to suffer from the aforementioned privacy problem.

Teoh et al. [18,19] developed a bit-allocation approach based on an unsupervised equal-width quantization with

a BRGC-encoding scheme to compose a long binary string per user by assigning a different number of bits to each feature dimension according to the SD of each estimated user pdf. Particularly, the intention is to assign a larger quantity of binary bits to discriminative dimensions and a smaller quantity otherwise. In other words, the larger the SD of a user pdf is detected to be, the lesser the quantity of bits that will be assigned to that dimension, and vice versa. Nevertheless, the length of the binary string is not decided based on the actual position of the pdf itself in the feature space. Although this scheme is invulnerable to the privacy weakness, such a deciding strategy gives a less accurate bit allocation: a user pdf falling across an interval boundary may result in an undesired intra-class variation in the Hamming domain and thus should not be prioritized for bit extraction. Another concern is that pure SD might not be a promising discriminative measure.

Chen et al. [17] introduced another dynamic bit-allocation approach by considering detection rate (DR) (user probability mass captured by the genuine interval) as their bit-allocation measure. The scheme, known as DR-optimized bit-allocation (DROBA), employs an equal-probable quantization intervals construction with BRGC encoding. Similar to Teoh et al.'s dynamic bit allocation scheme, this scheme assigns more bits to more discriminative feature dimensions and vice versa. Recently, Chen et al. [21] developed a similar dynamic bit-allocation algorithm based on optimizing a different bit-allocation measure: the area under the FRR curve. Given the bit-error probability, the scheme allocates bits dynamically to every feature component in a similar way as DROBA, except that the analytic area under the FRR curve for Hamming distance evaluation is minimized instead of the DR being maximized.

1.2 Motivation and contributions
It has been recently justified that DBR- and BRGC-encoding-based discretization could not guarantee a discriminative performance when a large per-dimensional entropy requirement is imposed [25]. The reason lies in the underlying indefinite feature mapping of DBR and BRGC codes from a discrete to a Hamming space, causing the actual distance dissimilarity in the Hamming domain to be unable to be maintained. As a result, feature points from multiple different intervals may be mapped to DBR or BRGC codewords which share a common Hamming distance away from a reference codeword, as illustrated by the 3-bit discretization instance in Figure 2. For this reason, regardless of how discriminative the extracted (real-valued) features could be, deriving discriminative and informative binary strings with DBR or BRGC encoding will not be practically feasible.

Figure 2 An indefinite discrete-to-binary mapping from each discrete-labelled quantization interval to a 3-bit BRGC codeword. The label g(b) in each interval on the continuous feature space can be understood as "index number (associated codeword)".

Linearly separable subcode (LSSC) [25] has been put forward to resolve such a performance-entropy tradeoff by introducing bit redundancy to maintain the performance accuracy when a high entropy requirement is imposed. Although the resultant LSSC-extracted binary strings require a larger bit length in addressing an 8-interval discretization problem as exemplified in Figure 3, mapping discrete elements to the Hamming space becomes completely definite.

This article focuses on discretization based upon the fixed bit-allocation principle. We extend the study of [25] to tackle the open problem of generating desirable binary strings that are simultaneously highly discriminative, informative, and privacy-protective by means of discretization based on LSSC. Specifically, we adopt a discriminative feature extraction with a further feature selection to extract discriminative feature components; an unsupervised quantization approach to offer promising privacy protection; and an LSSC encoding to achieve large entropy without having to sacrifice the actual classification performance accuracy of the discriminative feature components. Note that the preliminary idea of this article has appeared in the context of global discretization [26] for achieving strong security and privacy protection with high training efficiency.

In general, the significance of our contribution is three-fold:

a) We propose a fixed bit-allocation-based discretization approach to extract a binary representation which is able to fulfill all the required criteria from each given set of user-specific features.

b) Required by our approach, we study empirically various discriminative measures that have been put forward for feature selection and identify the reliable ones among them.

c) We identify and analyze factors that influence improvements resulting from the discriminative selection based on the respective measures.

The structure of this article is organized as follows. In the next section, the efficiency of using LSSC over BRGC and DBR for encoding is highlighted. In Section 3, detailed descriptions of our approach to generating a desirable binary representation are given and elaborated. In Section 4, experimental results justifying the effectiveness of our approach are presented. Finally, concluding remarks are provided in Section 5.

2. The emergence of LSSC
2.1 The security-performance tradeoff of DBR and BRGC
Two common encoding schemes adopted for discretization, before LSSC was introduced, are DBR and BRGC. DBR has each of its decimal indices directly converted into its binary equivalent, while BRGC is a special code that restricts the Hamming distance between every consecutive pair of codewords to unity. Depending on the required size S of a code, the lengths of both DBR and BRGC are commonly selected to be n_DBR = n_BRGC = ⌈log_2 S⌉. Instances of DBR and BRGC with different lengths (n_DBR and n_BRGC, respectively) and sizes S are shown in Table 1. Here, the length of a code refers to the number of bits in which the codewords are represented, while the size of a code refers to the number of elements in the code. The codewords are indexed from 0 to S-1. Note that each codeword index corresponds to the quantization interval index as well.

Conventionally, a tradeoff between discretization performance and entropy is inevitable when DBR or BRGC is adopted as the encoding scheme. The rationale behind this was identified to be the indefinite discrete-to-binary mapping behavior during the discretization process, since the employment of an encoding scheme in general affects only how each index of the quantization intervals is mapped to a unique binary codeword. More precisely, one may notice that multiple DBR as well as BRGC codewords share a common Hamming distance with respect to any reference codeword in the code for n_DBR and n_BRGC ≥ 2, mapping possibly most initially well-separated imposter feature elements from a genuine feature element in the index space much nearer than they should be in the Hamming space.

Figure 3 A definite discrete-to-binary mapping from each discrete-labelled quantization interval to a 7-bit LSSC codeword. The label g(b) in each interval on the continuous feature space can be understood as "index number (associated codeword)".

Table 1 A collection of n_DBR-bit DBRs and n_BRGC-bit BRGCs for S = 8 and 16, with [τ] indicating the codeword index.

    Direct binary representation (DBR)       Binary reflected gray code (BRGC)
    n_DBR = 3   n_DBR = 4                    n_BRGC = 3   n_BRGC = 4
    S = 8       S = 16                       S = 8        S = 16
    [0] 000     [0] 0000   [8]  1000         [0] 000      [0] 0000   [8]  1100
    [1] 001     [1] 0001   [9]  1001         [1] 001      [1] 0001   [9]  1101
    [2] 010     [2] 0010   [10] 1010         [2] 011      [2] 0011   [10] 1111
    [3] 011     [3] 0011   [11] 1011         [3] 010      [3] 0010   [11] 1110
    [4] 100     [4] 0100   [12] 1100         [4] 110      [4] 0110   [12] 1010
    [5] 101     [5] 0101   [13] 1101         [5] 111      [5] 0111   [13] 1011
    [6] 110     [6] 0110   [14] 1110         [6] 101      [6] 0101   [14] 1001
    [7] 111     [7] 0111   [15] 1111         [7] 100      [7] 0100   [15] 1000
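The codes in Table 1 can be generated with a few lines of code. The sketch below (not from the article) builds the codewords and demonstrates the indefinite index-to-Hamming-distance mapping of DBR and BRGC, contrasted with the LSSC construction detailed later in Section 2.2:

```python
def dbr(i, n):
    """Direct binary representation: the n-bit binary equivalent of index i."""
    return format(i, f"0{n}b")

def brgc(i, n):
    """Binary reflected gray code: consecutive codewords differ in one bit."""
    return format(i ^ (i >> 1), f"0{n}b")

def lssc(i, n):
    """Linearly separable subcode: index i maps to i ones filled in from the
    rightmost bit position (S = n + 1 codewords of length n)."""
    return "0" * (n - i) + "1" * i

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# Reproduce a few S = 16 entries of Table 1.
assert [dbr(i, 4) for i in range(3)] == ["0000", "0001", "0010"]
assert [brgc(i, 4) for i in (8, 9, 10)] == ["1100", "1101", "1111"]

# Indefinite mapping: with reference codeword [0], the interval 8 intervals
# away under DBR (and 15 intervals away under BRGC) is only 1 bit away.
assert hamming(dbr(0, 4), dbr(8, 4)) == 1
assert hamming(brgc(0, 4), brgc(15, 4)) == 1

# Definite mapping: LSSC Hamming distance always equals the index difference.
assert all(hamming(lssc(a, 15), lssc(b, 15)) == abs(a - b)
           for a in range(16) for b in range(16))
```

The last assertion is exactly the distance-preservation property that motivates the bit redundancy of LSSC: 15 bits instead of 4 for S = 16.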

Taking 4-bit DBR-based discretization as an example, the interval labelled with "1000", located 8 intervals away from the reference interval "0000", is eventually mapped to one Hamming distance away in the Hamming space. Worse for BRGC, interval "1000" is located even further (15 intervals away) from interval "0000". As a result, imposter feature components might be misclassified as genuine in the Hamming domain, and eventually, the discretization performance would be greatly impeded by such an imprecise discrete-to-binary map. In fact, this defective phenomenon gets more critical as the required entropy increases, or as S increases [25].

2.2 LSSC
Linearly separable subcode (LSSC) [25] was put forward to tackle the aforementioned inabilities of DBR and BRGC effectively in fully preserving the separation of feature points in the index domain when the eventual distance evaluation is performed in the Hamming domain. This code particularly utilizes redundancy to augment the separability in the Hamming space for enabling a one-to-one correspondence between every non-reference codeword and the Hamming distance incurred with respect to every possible reference codeword.

Let n_LSSC denote the code length of LSSC. An LSSC contains S = (n_LSSC + 1) codewords, which is a subset of the 2^{n_LSSC} codewords in total. The construction of LSSC can be given as follows: beginning with an arbitrary n_LSSC-bit codeword, say, an all-zero codeword, the next n_LSSC codewords can be sequentially derived by complementing a bit at a time from the lowest-order (rightmost) to the highest-order (leftmost) bit position. The resultant n_LSSC-bit LSSCs fulfilling S = 4, 8, and 16 are shown in Table 2.

The amount of bit disagreement, or equivalently the Hamming distance, between any pair of codewords happens to be the same as the corresponding positive index difference. For a 3-bit LSSC, as an example, the Hamming distance between codewords "111" and "001" is 2, which appears to be equal to the difference between the codeword indices "3" and "1". It is in general not difficult to observe that neighbour codewords tend to have a smaller Hamming distance compared to any distant codewords. Thus, unlike DBR and BRGC, LSSC ensures every distance in the index space is thoroughly preserved in the Hamming space, despite the large bit redundancy a system might need to afford. As reported in [25], increasing the entropy per dimension has a trivial effect on discretization performance through the employment of LSSC, with the condition that the quantity of quantization intervals constructed in each dimension is not too few. Instead, the entropy now becomes a function of the bit redundancy incurred.

Table 2 A collection of n_LSSC-bit LSSCs for S = 4, 8 and 16, where [τ] denotes the codeword index.

    n_LSSC = 3   n_LSSC = 7     n_LSSC = 15
    S = 4        S = 8          S = 16
    [0] 000      [0] 0000000    [0] 000000000000000   [8]  000000011111111
    [1] 001      [1] 0000001    [1] 000000000000001   [9]  000000111111111
    [2] 011      [2] 0000011    [2] 000000000000011   [10] 000001111111111
    [3] 111      [3] 0000111    [3] 000000000000111   [11] 000011111111111
                 [4] 0001111    [4] 000000000001111   [12] 000111111111111
                 [5] 0011111    [5] 000000000011111   [13] 001111111111111
                 [6] 0111111    [6] 000000000111111   [14] 011111111111111
                 [7] 1111111    [7] 000000001111111   [15] 111111111111111

3. Desirable bit string generation and the appropriate discriminative measures
In the literature review, we have seen that user-specific information (i.e., the user pdf) should not be utilized to define cut points of the quantization intervals, to avoid reduction of the possible locations of the user pdf to the quantity of intervals in each dimension. Therefore, strong privacy protection basically limits the choice of quantization to unsupervised techniques. Furthermore, the entropy-performance independence aspect of LSSC encoding allows promising performance to be preserved regardless of how large the entropy is augmented per dimension, and correspondingly how large the quantity of feature-space segmentations in each dimension would be. Therefore, if we are able to extract discriminative feature components for discretization, deriving discriminative, informative, and privacy-protective bit strings can thus be absolutely possible. Our strategy can generally be outlined in the four following fundamental steps:

i. [Feature Extraction] - Employ a discriminative feature extractor ℑ(·) (e.g., Fisher's linear discriminant analysis (FDA) [27], Eigenfeature regularization and extraction (ERE) [28]) to ensure D quality features being extracted from R raw features;

ii. [Feature Selection] - Select the D_fs (D_fs < D < R) most discriminative feature components from a total of D dimensions according to a discriminative measure c(·);

iii. [Quantization] - Adopt an unsupervised equal-probable quantization scheme Q(·) to achieve strong privacy protection; and

iv. [Encoding] - Employ LSSC for encoding ℰ_LSSC(·) to maintain such discriminative performance, while satisfying an arbitrary entropy requirement imposed on the resultant binary string.

This approach initially obtains a set of discriminative feature components in steps (i) and (ii); and produces

an informative user-specific binary string (with large entropy) while maintaining the prior discriminative performance in steps (iii) and (iv). The privacy protection is offered by the unsupervised quantization in step (iii), where the correlation of the helper data with the user-specific data is insignificant. This makes our four-step approach capable of producing discriminative, informative, and privacy-protective binary biometric representations.

Among the steps, implementations of (i), (iii), and (iv) are pretty straightforward. The only uncertainty lies in the appropriate discriminative measure and the corresponding parameter D_fs in step (ii) for attaining absolute superiority. Note that step (ii) is embedded particularly to supplement the restrictive performance led by the employment of unsupervised quantization. Here, we introduce a couple of discriminative measures that can be adopted for discretization and perform a study on the superiority of such measures in the next section.

3.1 Discriminative measures c(·) for feature selection
The discriminativeness of each component is closely related to the well-known Fisher's linear discriminant criterion [27], where the discriminant criterion is defined to be the ratio of between-class variance (inter-class variation) to within-class variance (intra-class variation).

Suppose that we have J users enrolled in a biometric system, where each of them is represented by a total of D ordered feature elements v_ji^1, v_ji^2, ..., v_ji^D upon feature extraction from each measurement. In view of potential intra-class variation, the dth feature element of the jth user can be modeled from a set of measurements by a user pdf, denoted by f_j^d(v), where d ∈ {1, 2, ..., D}, j ∈ {1, 2, ..., J} and v ∈ feature space V^d. On the other hand, owing to inter-class variation, the dth feature element of the measurements of the entire population can be modeled by a background pdf, denoted by f^d(v). Both distributions are assumed to be Gaussian according to the central limit theorem. That is, the dth-dimensional background pdf has mean μ^d and SD σ^d, while the jth user's dth-dimensional user pdf has mean μ_j^d and SD σ_j^d.

3.1.1. Likelihood ratio (c = LR)
The idea of using LR to achieve optimal FAR/FRR performance in static discretization was first exploited by Chen et al. [16]. The LR of the jth user in the dth-dimensional feature space is generally defined as

    LR_j^d = f_j^d(v) / f^d(v)    (1)

with the assumption that the entire population is sufficiently large (excluding a single user should not have any significant effect in changing the background distribution). In their scheme, the cut points v_1^d, v_2^d ∈ V^d of the jth user's genuine interval int_j^d in the dth-dimensional feature space are chosen based on a prefix threshold t, such that

    int_j^d = {[v_1^d, v_2^d] ∈ V^d | LR_j^d ≥ t}    (2)

The remaining intervals are then constructed equal-probably, that is, with reference to the portion of the background distribution captured by the genuine interval. Since different users will have different intervals constructed in each feature dimension, this discretization approach turns out to be user specific.

In fact, the LR could be used to assess the discriminativity of each feature component efficiently, since max(f_j^d(v)) is inversely proportional to (σ_j^d)^2 (because ∫ f_j^d(v) dv = 1), or equivalently to the dth-dimensional intra-class variation; and f^d(v) is inversely proportional to the dth-dimensional inter-class variation, which imply

    LR_j^d = max( f_j^d(v) / f^d(v) ) ∝ inter-class variation / intra-class variation,   j ∈ {1, 2, ..., J}, d ∈ {1, 2, ..., D}    (3)

Therefore, adopting the D_fs dimensions with maximum LR would be equivalent to selecting the D_fs feature elements with maximum inter- over intra-class variation.

3.1.2. Signal-to-noise ratio (c = SNR)
Signal-to-noise ratio (SNR) could possibly be another alternative for discriminative measurement, since it is a measure that captures both intra-class and inter-class variations. This measure was first used in feature selection by a user-specific 1-bit RL-based discretization scheme [12] to sort the feature elements which are identified to be reliable. However, instead of using the default average intra-class variance to define the SNR, we adopt the user-specific intra-class variance to compute the user-specific SNR for each feature component to obtain an improved precision:

    SNR_j^d = (σ^d)^2 / (σ_j^d)^2 = inter-class variance / intra-class variance,   j ∈ {1, 2, ..., J}, d ∈ {1, 2, ..., D}    (4)

3.1.3. Reliability (c = RL)
Reliability was employed by Kevenaar et al. [9] to sort the discriminability of the feature components in their user-specific 1-bit-discretization scheme. Thus, it can be implemented in a straightforward manner in our study. The definition of this measure is given by

    RL_j^d = 1/2 ( 1 + erf( |μ_j^d - μ^d| / sqrt( 2(σ_j^d)^2 ) ) ) ∝ inter-class variation / intra-class variation,   j ∈ {1, 2, ..., J}, d ∈ {1, 2, ..., D}    (5)

where erf is the error function. This RL measure imposed on the binary output of the discretization

would produce a higher value when a feature element scheme. Based on a fixed-bit-allocation principle, L is

dd equally divided by D dimensions for typical equal-prob-has a larger difference between μ and μ relative toj

able discretization schemes and by D dimensions forfsdσ . As a result, a high RL measurement indicates aj our feature selection approach. Since the entropy per

high discriminating power of a feature component. dimension l is logarithmically proportional to the num-
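For concreteness, the three Gaussian-based measures above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: function and variable names are our own, and each pdf is taken to be univariate Gaussian as stated in the text.

```python
import math

def gaussian_pdf(v, mean, sd):
    """Univariate Gaussian pdf evaluated at v."""
    return math.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def likelihood_ratio(v, user_mean, user_sd, bg_mean, bg_sd):
    """LR of (1): user pdf over background pdf at a feature value v."""
    return gaussian_pdf(v, user_mean, user_sd) / gaussian_pdf(v, bg_mean, bg_sd)

def snr(user_sd, bg_sd):
    """SNR of (4): inter-class variance over user-specific intra-class variance."""
    return (bg_sd ** 2) / (user_sd ** 2)

def reliability(user_mean, user_sd, bg_mean):
    """RL of (5): 1/2 * (1 + erf(|mu_j^d - mu^d| / sqrt(2 (sigma_j^d)^2)))."""
    return 0.5 * (1.0 + math.erf(abs(user_mean - bg_mean) / math.sqrt(2.0 * user_sd ** 2)))
```

As expected, RL grows with the gap between the user mean and the background mean relative to the user's own spread.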

3.1.4. Standard deviation (χ = SD)
In dynamic discretization, the amount of bits allocated to a feature dimension indicates how discriminative the user-specific feature component is detected to be. Usually, a more discriminative feature component is assigned a larger quantity of bits, and vice versa. The pure user-specific SD measure σ_j^d, signifying the intra-class variation, was adopted by Teoh et al. as a bit-allocation measure [18,19] and hence may serve as a potential discriminative measure.

3.1.5. Detection rate (χ = DR)
Finally, unlike all the above measures that depend solely on the statistical distribution in determining the discrimination of the feature components, DR could be another efficient discriminative measure for discretization, one that takes into account an additional factor: the position of the user pdf with reference to the constructed genuine interval (the interval that captures the largest portion of the user pdf) in each dimension. This measure, as adopted by Chen et al. in their dynamic bit-allocation scheme [17], is defined as the area under the curve of the user pdf enclosed by the genuine interval upon the respective interval construction in that dimension. It can be described mathematically by

δ_j^d(S^d) = ∫_{int_j^d} f_j^d(v) dv    (6)

where δ_j^d denotes the jth user's DR in the dth dimension and S^d denotes the number of constructed intervals in the dth dimension.
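Under the same Gaussian assumption, the DR of (6) reduces to a difference of Gaussian CDF values over the genuine interval. A minimal sketch follows; the helper names are ours, and the interval cut points would in practice come from the quantizer (e.g., equal-probable boundaries of the background pdf):

```python
import math

def normal_cdf(v, mean, sd):
    """Gaussian CDF; math.erf also handles the +/-inf edge values."""
    return 0.5 * (1.0 + math.erf((v - mean) / (sd * math.sqrt(2.0))))

def detection_rate(user_mean, user_sd, cut_points):
    """DR of (6): mass of the user pdf captured by its genuine interval.

    cut_points are the S - 1 inner boundaries of the S constructed
    intervals; the genuine interval is, by definition, the one holding
    the largest portion of the user pdf, so DR is the maximum mass."""
    edges = [-math.inf] + sorted(cut_points) + [math.inf]
    masses = [normal_cdf(b, user_mean, user_sd) - normal_cdf(a, user_mean, user_sd)
              for a, b in zip(edges[:-1], edges[1:])]
    return max(masses)
```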

To select D_fs discriminative feature dimensions properly, schemes employing the LR, SNR, RL, and DR measures should take the dimensions with the D_fs largest measurements,

{d_i | i = 1,...,D_fs} = argmax_{D_fs max values} [χ(v_j1^1, v_j2^1,...,v_jI^1),...,χ(v_j1^D, v_j2^D,...,v_jI^D)],  d_1,...,d_D_fs ∈ [1, D], D_fs < D,    (7)

while schemes employing the SD measure should adopt the dimensions with the D_fs smallest measurements:

{d_i | i = 1,...,D_fs} = argmin_{D_fs min values} [χ(v_j1^1, v_j2^1,...,v_jI^1),...,χ(v_j1^D, v_j2^D,...,v_jI^D)],  d_1,...,d_D_fs ∈ [1, D], D_fs < D.    (8)

We shall empirically identify the discriminative measures that can be reliably employed in the next section.
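The selection rules (7) and (8) amount to ranking the per-dimension measurements and keeping the D_fs best. A small illustrative sketch (names are ours; `smallest=True` corresponds to the SD measure of (8)):

```python
def select_dimensions(measurements, d_fs, smallest=False):
    """Implement (7)/(8): keep the d_fs dimensions with the largest
    measurements (or the smallest ones for the SD measure)."""
    order = sorted(range(len(measurements)),
                   key=lambda d: measurements[d], reverse=not smallest)
    return sorted(order[:d_fs])  # selected dimension indices, in order
```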

3.2 Discussions and a summary of our approach
In a biometric-based cryptographic key generation application, there is usually an entropy requirement L imposed on the binary output of the discretization scheme. Based on a fixed-bit-allocation principle, L is equally divided by D dimensions for typical equal-probable discretization schemes and by D_fs dimensions for our feature-selection approach. Since the entropy per dimension l is logarithmically proportional to the number of equal-probable intervals S (or l_fs and S_fs for our approach) constructed in each dimension, this can be written as

l = L/D = log2 S  for a typical EP discretization scheme    (9)

or

l_fs = L/D_fs = lD/D_fs = log2 S_fs  for our approach.    (10)

By denoting n as the bit length of each one-dimensional binary output, the actual bit length N of the final bit string is simply N = Dn. For LSSC-encoding-based schemes, where n_LSSC = (2^l − 1) bits, and for our LSSC approach, where n_LSSC(fs) = (2^{l_fs} − 1) bits, the actual bit lengths N_LSSC and N_LSSC(fs) can respectively be described by

N_LSSC = D n_LSSC = D(2^l − 1)    (11)

and

N_LSSC(fs) = D_fs n_LSSC(fs) = D_fs(2^{l_fs} − 1).    (12)

With the above equations, we illustrate the algorithmic description of our approach in Figure 4. Here, g and d* are dimensional variables, and || denotes the binary concatenation operator.
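The bookkeeping in (9)-(12) is easy to check numerically. The sketch below also constructs LSSC-style codewords under the assumption, consistent with n = 2^l − 1, that the ith of S intervals is labeled by i ones followed by zeros, so that codeword Hamming distances grow linearly with the index distance (an illustrative reading of LSSC, not a verbatim reproduction of the paper's encoder):

```python
def lssc_lengths(L, D, D_fs):
    """Entropy and bit-length bookkeeping of (9)-(12) for EP + LSSC."""
    l = L // D                 # (9): per-dimension entropy, no selection
    l_fs = L // D_fs           # (10): per-dimension entropy with selection
    n_lssc = 2 ** l - 1        # LSSC codeword length per dimension
    n_lssc_fs = 2 ** l_fs - 1
    return D * n_lssc, D_fs * n_lssc_fs   # (11) N_LSSC, (12) N_LSSC(fs)

def lssc_codeword(i, S):
    """Label the ith of S intervals with i ones followed by zeros, so the
    Hamming distance between codewords i and k is exactly |i - k|
    (one linearly separable construction; 2**l - 1 bits for S = 2**l)."""
    return '1' * i + '0' * (S - 1 - i)
```

For example, with L = 400, D = 100, and D_fs = 50, the sketch reproduces the bit lengths used in the experiments below.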

Figure 4 Our fixed-bit-allocation-based discretization approach.

4. Experiments and analysis
4.1. Experiment set-up
Two popular face datasets are selected to evaluate the experimental discretization performance in this section:

FERET
The employed dataset is a subset of the FERET face dataset [29], in which the images were collected under varying illumination conditions and face expressions. It contains a total of 1800 images, with 12 images for each of 150 users.

FRGC
The adopted dataset is a subset of the FRGC dataset (version 2) [30], containing a total of 2124 images, with 12 images for each of the 177 identities. The images were taken under a controlled illumination condition.

For both datasets, proper alignment is applied to the images based on standard face landmarks. Owing to possible strong variation in hair style, only the face region is extracted for recognition by cropping the images to the size of 30 × 36 for the FERET dataset and 61 × 73 for the FRGC dataset. Finally, histogram equalization is applied to the cropped images.

Half of each identity's images are used for training, while the remaining half are used for testing. For measuring the system's false acceptance rate (FAR), each image of the corresponding user is matched against that of every other user according to its corresponding image index, while for the false rejection rate (FRR) evaluation, each image is matched against every other image of the same user, for every user. In the subsequent experiments, the equal error rate (EER), the error rate at which FAR = FRR, is used for comparing the discretization performance among the different discretization schemes, since it is a quick and convenient way to compare their accuracy. Basically, the performance is considered to be better when the EER is lower.

The experiments can be divided into three parts. The first part identifies the reliable discriminative feature-selection measures among those listed in the previous section.
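The EER criterion above can be approximated by sweeping a decision threshold over genuine (intra-class) and impostor (inter-class) Hamming distances and locating the point where FAR and FRR cross. A minimal sketch, with hypothetical score lists:

```python
def eer(genuine, impostor):
    """Approximate the EER: sweep an integer Hamming-distance threshold t
    (accept when distance <= t) and return the error rate where FAR and
    FRR are closest; genuine/impostor are intra-/inter-class distances."""
    best_gap, best_rate = None, None
    for t in range(max(genuine + impostor) + 2):
        frr = sum(d > t for d in genuine) / len(genuine)     # rejected genuine attempts
        far = sum(d <= t for d in impostor) / len(impostor)  # accepted impostor attempts
        if best_gap is None or abs(far - frr) < best_gap:
            best_gap, best_rate = abs(far - frr), (far + frr) / 2
    return best_rate
```

When the genuine and impostor distances are perfectly separated, the sketch returns an EER of zero, matching the intuition that lower EER means better discretization performance.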

The second part examines the performance of our approach and illustrates that replacing LSSC with a DBR- or BRGC-encoding scheme in our approach would achieve a much poorer performance when high entropy is imposed, because of the conventional performance-entropy tradeoff of DBR- and BRGC-encoding-based discretization. The last part scrutinizes and reveals how one could attain reliable parameter estimation, i.e., of D_fs, in achieving the highest possible discretization performance.

The experiments were carried out based on two different dimensionality-reduction techniques, ERE [28] and FDA [27], and two different datasets, FRGC and FERET. In the first two parts of the experiments, the 4453 raw dimensions of the FRGC images and the 1080 raw dimensions of the FERET images were both reduced to D = 100 dimensions, while for the last part, the raw dimensions of the images from both datasets were reduced to D = 50 and 100 dimensions for analytic purposes. Note that EP quantization was employed in all parts of the experiments.

4.2. Performance assessment
4.2.1. Experiment Part I: Identification of reliable feature-selection measures
Based on the fixed-bit-allocation principle, n bits are assigned equally to each of the D feature dimensions. A Dn-bit binary string is then extracted for each user through concatenating the n-bit binary outputs of the individual dimensions. Since DBR as well as BRGC is a code which comprises the entire set of 2^n n-bit codewords for labeling S = 2^n intervals in every dimension, the single-dimensional entropy l can be deduced from (9) as

l = log2 S = log2 2^n = n.    (13)

The total entropy L is then equal to the length of the binary string:

L = Σ_{d=1}^{D} l = Σ_{d=1}^{D} n = Dn.    (14)

Note that L = 100, 200, 300 and 400 correspond to n = 1, 2, 3 and 4, respectively, for each baseline scheme (D = 100). For the feature-selection-based discretization schemes to provide the same amount of entropy (with n_fs and l_fs denoting the number of bits and the entropy of each selected dimension, respectively), we have

L = Σ_{d=1}^{D_fs} l_fs = Σ_{d=1}^{D_fs} n_fs = D_fs n_fs.    (15)

With this, L = 100, 200, 300 and 400 correspond to l_fs = n_fs = 2, 4, 6 and 8, respectively, for D_fs = 50. This implies that the number of segments in each selected feature dimension is now larger than in the usual case by a factor of 2^{n_fs − n}.

For the LSSC encoding scheme, which utilizes longer codewords than DBR and BRGC in each dimension to fulfil a system-specified entropy requirement, the relation between the bit length n_LSSC and the single-dimensional entropy l can be described by

n_LSSC = S − 1 = 2^l − 1,    (16)

and for our approach, from (10), we have

n_LSSC(fs) = 2^{l_fs} − 1 = 2^{L/D_fs} − 1.    (17)

For the baseline discretization scheme of EP + LSSC with D = 100, L = Dl = D log2(n_LSSC + 1) = 100 log2(n_LSSC + 1). Thus, L = {100, 200, 300, 400} corresponds to l = {1, 2, 3, 4} and n_LSSC = {1, 3, 7, 15}, and the actual length of the extracted bit string is Dn_LSSC = {100, 300, 700, 1500}. For the feature-selection schemes with D_fs = 50, where L = D_fs l_fs = D_fs log2(n_LSSC(fs) + 1) = 50 log2(n_LSSC(fs) + 1), L = {100, 200, 300, 400} corresponds to l_fs = {2, 4, 6, 8} and n_LSSC(fs) = {3, 15, 63, 255}, and the actual length of the extracted bit string becomes D_fs n_LSSC(fs) = {150, 750, 3150, 12750}. The implication here is that when a particularly large entropy specification is imposed on a feature-selection scheme, a much longer LSSC-generated bit string will always be required.

Figure 5 illustrates the EER performance of (I) EP + DBR, (II) EP + BRGC, and (III) EP + LSSC discretization schemes adopting the different discriminative-measure-based feature selections, with respect to that of the baseline (discretization without feature selection, where D_fs = D), based on the (a) FERET and (b) FRGC datasets. "Max" and "Min" in each subfigure refer to whether the D_fs largest or smallest measurements were adopted for each feature-selection method, as illustrated in (7) and (8).

A great discretization performance achieved by a feature-selection scheme basically implies a reliable measure for estimating the discriminativity of the features. In all the subfigures, it is noticed that the discretization schemes that select features based on the LR, RL, and DR measures give the best performance among the feature-selection schemes. RL seems to be the most reliable discriminative measure, followed by LR and DR. In contrast, SNR and SD turn out to be poor discriminative measures that could not guarantee any improvement over the baseline scheme.

When LSSC encoding in our 4-step approach (see Section 3) is replaced with DBR in Figure 5Ia, Ib, and with BRGC in Figure 5IIa, IIb, the RL-, LR-, and DR-based feature-selection schemes manage to outperform the respective baseline scheme at low L. However, in most cases, these DBR- and BRGC-encoding-based