Regular coding partitions

Shaum

Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres

8 pages

English

Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres

A propos
Informations
Extrait

Description

Regular coding partitions
y yMarie-Pierre Beal Fabio Burderi Antonio Restivo
June 11, 2006
Abstract
The canonical coding partition of a set of words is the nest partition such that the words contained
in at least two factorizations of a same sequence belong to a same class. In the case the set is not
uniquely decipherable, it partitions the set into one unambiguous class and other parts that localize
the ambiguities in the factorizations of nite sequences.
We prove that the canonical coding partition of a regular set contains a nite number of regular
classes. We give an algorithm for computing this partition.
1 Introduction
In this paper, we call code a set of nite words. An important class of codes is the class of uniquely
decipherable codes. This property allows the decoding of a sequence of concatenated codewords. Never-
theless, some classes of codes are used in information theory although they are not uniquely decipherable
(see for instance [6], [7] and [8]). The condition of unique decipherability can also be weakened by consid-
ering that it applies only to codes with constraints (see [1]) or to codes with a constraint source (see [4],
[5]). In [5], the classi cation of ambiguities of codes is investigated in the study of natural languages.
From a combinatorial point of view, the study of ambiguities helps to understand the structure of a code.
To this purpose, the notions of coding partition and canonical coding partition of a code were intro-
duced in [3] to ...

Sujets

Statesboro

Edge of Sanity

EFA

Algorithme de Prim

Label rouge

BCE

Informations

Publié par	Shaum
Nombre de lectures	115
Langue	English

Extrait

Regular coding partitions

Marie-PierreBeal

† Fabio Burderi

June 11, 2006

† Antonio Restivo

Abstract Thecanonicalcodingpartitionofasetofwordsisthenestpartitionsuchthatthewordscontained in at least two factorizations of a same sequence belong to a same class. In the case the set is not uniquely decipherable, it partitions the set into one unambiguous class and other parts that localize theambiguitiesinthefactorizationsofnitesequences. Weprovethatthecanonicalcodingpartitionofaregularsetcontainsanitenumberofregular classes. We give an algorithm for computing this partition.

Introduction

Inthispaper,wecallcodeasetofnitewords.Animportantclassofcodesistheclassofuniquely decipherable codes. This property allows the decoding of a sequence of concatenated codewords. Never-theless, some classes of codes are used in information theory although they are not uniquely decipherable (see for instance [6], [7] and [8]). The condition of unique decipherability can also be weakened by consid-ering that it applies only to codes with constraints (see [1]) or to codes with a constraint source (see [4], [5]).In[5],theclassicationofambiguitiesofcodesisinvestigatedinthestudyofnaturallanguages. From a combinatorial point of view, the study of ambiguities helps to understand the structure of a code. To this purpose, the notions of coding partition and canonical coding partition of a code were intro-duced in [3] to study some decipherability conditions for codes weaker than the unique decipherability. In the case the code is not uniquely decipherable, the code is partitioned into one unambiguous class, andotherpartsthatlocalizetheambiguitiesinfactorizationsofnitesequences.Inthecasethecodeis uniquely decipherable, the canonical coding partition is trivial and contains a unique class equal to the code itself. In [3] is also given a Sardinas-Patterson like algorithm for computing the canonical coding partitionofanitecode. Inthispaper,weprovethatthecanonicalcodingpartitionofaregularcodehasanitenumberof classes, each one being regular. This result was conjectured in [3]. We give an exponential time algorithm for computing all classes of the partition which is based on automata constructions. In Section 2 we recall the notions of coding partition and canonical coding partition of a code intro-duced in[3]. The main result is proved in Section 3. We end the article with some concluding remarks.

Partitions of a code

+ LetAybetoainebabphalteened.WetAalphabetosevtrehinetowdrtfotesehA, and byAthe setofnonemptynitewords.AcodeXis here a subset ofAelements are called. Its codewords. InstitutGaspard-Monge,Laboratoired’informatiqueUMR8049,UniversitedeMarne-la-Vallee,77454Marne-la-Vallee Cedex 2, France. † Dipartimento di Mathematica ed Applicazioni, Universita degli studi di Palermo, Via Archira 34, 90123 Palermo, Italy.

Regular coding partitions

A codeXis said to beuniquely decipherable(UD) if every word has at most one factorization into codewords,i.e.the equality x1x2. . . xn=y1y2. . . ym, xi, yj∈X, impliesn=mandxi=yifor 1inrefer to [2] for the theory of uniquely. We decipherable codes. In [3] is introduced the notion of coding partition of a codeXgroups the codewords contained in. It at least two factorizations of a same word in the case whereXis not UD. LetIdeneithotesinehtreteset{1,2, . . . , r}, whereris a positive integer, orNpartition. A + + P= (Xi)i∈Iof a codeXisconcatenatively independentif, fori6=j,X∩X=∅. IfPis concatenatively i j + + independent, aP-factorizationof a wordz∈Xis a factorizationz=z1z2. . . znwhere eachzi∈X k + + for somek∈I, andzi∈Ximplieszi+1∈/ Xifi < n. k k Acoding partitionof a codeXis a concatenatively independent partition such that each wordzin + Xhas at exactly oneP-factorization,i.e.the equality

z=x1x2. . . xn=y1y2. . . ym,

wherexi, yjareP-factorizations ofz, impliesn=mandxi=yifor 1in. + We say that a wordz∈Xhas aminimal 2-factorizationin codewordsx1, x2, . . . xn, y1, y2, . . . , ym 0 ifz=x1x2. . . xn=y1y2. . . ymwith (xi)1in6= (yj)1jm, and there are not positive integersn < n, 0 0 andmm < , such thatx1x2. . . xn=y1y2. . . ym. 0 0 Thecanonical coding partitionofXtsehitparnesontitiP= (Xi)i∈Isuch that whenever a word + z∈Xhas a minimal 2-factorization in codewordsx1, x2, . . . xn, y1, y2, . . . , ym, these codewords belong to the same set of the partition, and such that all words not contained in any minimal 2-factorization belong to the same set of the partition. Thecanonicalcodingpartitioniswelldened,uptoabijectionofI. We can chooseIsuch thatX0 + is the set of all codewords that never appear in any factorization of a word ofXwhich has at least two factorizations. It is straightforward to see that the canonical coding partition is a coding partition (see [3]). In [3] is given a Sardinas-Patterson like algorithm for computing the canonical coding partition of a nite codeX. Example1.We consider the codeX={00,0010,1000,11}. Clearly the words 00,0010 and 1000 belong to a same set in the canonical coding partition since

001000 = 00

1000 = 0010

is a minimal 2-factorization. The canonical partition ofXis{00,0010,1000},{11}. Note thatXis not a UD code. Example2.The canonical coding partition of a UD code has a single set, the code itself.

Coding partition of a regular code

In this section, we consider a regular codeX. We say that a coding partition of a code isntienidoghtyacatatssees.Wmbnuoferinetsaaifhsi partition of a code isregularif all sets of the partitions are regular. Proposition1.nacecinohTnoiorefapaalitrtinetiescrdougalar.egulandr Inordertoprovethisproposition,wegiveanalgorithmforcomputingtheniteautomataaccepting thesetsofthepartitionfromaniteautomatonacceptingthecodeX. AniteautomatonA= (Q, I , E, Tatstestaesofteafoetini)damsQ, a set of edgesElabelled on an alphabetA, a set of initial statesIftosedaansatetanslT. We shall also consider automata

M-P.Beal,F.Burderi,A.Restivo

labelled inA. Asuccessful pathis a path going from a state ofIto a state ofT. The set of labels of successful paths is thelanguage acceptedby the automaton. An automaton isunambiguousif for any wordz, any statesp, q, there is at most one path going from ptoqand labelled byz. 0 0 0 0 LetA= (Q, I , E, Teb)inauaetotamehuaotnton.tomanetWedeA A= (I , E , Q , T) called a a a 0 0 0 0 0 0 thesquareofA, whereQ=QQ,E={(p, q)→(qp , )|p→pandq→q∈E}set of initial. The   0 0p stateIesehtdnastetalsnaftoTstate (later. A will be specied p, q.) will be also denoted by q Proof of Proposition 1.LetA= (Q, I , E, Tcode)ebainetgubiamunomutsaoueccanotaehtgnitpX such thatI={i},T={t}, and which has no edge coming iniand no edge going out oft. Such an automaton, called a normalized automaton, can be obtained by standard constructions (see for instance [2]). By mergingiandtinto a single state denoted by 0, we get an automatonB= (Q,0, E,0) accepting the setXthat. Note Bis no more unambiguous unlessXis UD.       0 0 0 We build the square automatonBB and such thatand replace the state by two states 0 0s0t           0 0 0 0 0 the edges going out of go out of and the edges coming in come in . Note that has 0 0s0 0t0s   0 no incoming edges and has no outgoing edges. We only keep inBB the states belonging to paths 0t       0 0p from to and going at least one time through a state withp= 0, q6= 0 orp6= 0, qBy= 0. 0s0t q   p using the state-elimination technique (see for instance [9]), we remove the states withpandqdistinct q       0 0p from 0 and get an automatonClabelled in regular subsets ofAand with, , whose states are 0s0t q p= 0, q6= 0 orp6= 0, qThere is at most one edge between two states and each label is a regular= 0. non-empty subset ofA.     p p States withp= 0 are called upper-zero states while states withq= 0 are called lower-zero q q     0 0 states. Hence and are both upper and lower-zero states. 0s0t  0   0 p p p p We denote byE0the regular set label of the edge→0a slight abuse of language,. With q q q q   0 p p we sometimes say that there is an edge labelled by a wordwto statefrom a state 0whenever q q  0 p p w∈E0. q q Letpi, qi, pj, qjbe states inQwithqiandqjdistinct from 0. Lete, fbe the edges         0qiqj0 e=→andf=→ pi0 0pj

(i.e.respectively an edge from an upper-zero state to a lower-zero state and an edge from lower-zero state to an upper-zero state). We denote by        qiqjqiqj Lto withthe regular set of labels of paths from all its states being lower-zero states. 0 0 0 0        qiqjqiqj Sto withthe union of the labels of all edges contained in a path from all its states 0 0 0 0 being lower-zero states.       qiqjqiqj Note that we may haveqi=qj. In this case,Lcontains the empty word andSmay be the 0 0 0 0 empty set. Wedenetheregularsets             0qiqiqjqj0qiqj Y=ELE+S , pi0 0 0 0pj0 0  Yifpi= 0, pj= 0,     0qi Y+Eifpi= 0, pj6= 0, 0s0 Sef=  qj0 Y+Eifpi6= 0, pj= 0, 0 0t         0qiqj0 Y+E+Eifpi=pj= 0, 0s0 0 0t

Regular coding partitions

where the symbol + is the union symbol and the dot symbol is the concatenation symbol. Letpi, qi, pj, qj, pk, qkbe states inQwithqi, qj, pj, pkLetdistinct from 0. e, f, gbe the edges             0qiqj0 0qk e=→, f=→andg=→. pi0 0pjpk0

Wedene

the regular set    0qi Sef g=E pi0

   qiqj L 0 0

      qj0qj0 E+E 0pj0pj

   0 0 L pjpk

   0qk E . pk0

WedenesimilarsetsSefandSef gwhene, gare edges from a lower-zero state to an upper-zero state andfis an edge from an upper-zero state to a lower-zero state, by exchanging the roles played by the upper and lower states. WegetanitenumberofregularsubsetsofX. Some of these states may have a nonempty intersection. Wereplacetwopartshavinganon-emptyintersectionbytheirunion.Afteranitenumberofstepswe getanitenumberofregularsubsetsofXwhose two by two intersections are empty. We denote these S r sets byX1, X2, . . . , Xr.tsehetenedeWX0=XXiclaim that (. We Xi)0iris the canonical i=1 coding partition ofX, which proves the proposition. To prove our claim, we show that any two words which belong to a same minimal 2-factorization belong to a same setXi. Letz=x1x2. . . xn=y1y2. . . ymbe a minimal 2-factorization wherexi, yjare codewords. The existence of such a factorization is equivalent to the existence of a path inC:  (e1)   (e2)   (e3)   (e4)  0q01q0j0 0q11q1j0 0 1 →. . . →. . . →. . . →. . . 0s0 0p11p1i0 0p21 1 (ek2)   (ek1)   (ek)  0 0qr1qrj0 r →. . . →. . . →. pr1pri0 0 0t r In this path, we denote byeithe edges going from an upper-zero state to a lower-zero one or the converse. Note that this path encodes two paths in the automatonAis read on the upper track, the other one. One on the lower track. The label of any path read on the upper (or lower track) going from 0 to 0 without going through 0 inbetween belongs toX. Hence

i1+ +ir+ 1 =n j0+j1+ +jr+ 1 =m.

Byrenumberingthelowercoecientspijof the upper-zero states of this pathp1topn, and the upper coecien tsqijof the lower-zero states of this pathq1toqm, the label of each part of this path going     0 0 from a state to a state is labelled byxi. The label of each part of this path going from a state pi1pi     qj1qj to a state is labelled byyj. 0 0 BythedenitionofthesetsSeiei+1and the setsSeiei+1ei+2, we get that allxiand allyjbelong to a same part of the canonical coding partition. Conversely, we prove that if two wordsxandybelong to a same set of the partition, then there isanitechainofwordsx=w0, w1, . . . , wn=ysuch thatwiandwi+1belong to a same minimal 2-factorization for 0i < n.    0q1q2 Letq1,q2be two non null states inQdrowowtfstsrse.Wtihawthoy, y∈S, then there is 0 0 0 anitechainofwordsy=w0, w1, . . . , wn=ysuch thatwiandwi+1such thatwiandwi+1belong to a same minimal 2-factorization for 0i < n.    0q1q20 0 0 0 0 Sincey, y∈S, there are inCtwo paths labelledxyzandx y z, with, z,x, x ∈A, containing 0 0 0 respectively an edge labelled byyand an edge labelled byy, with the following form:     q x y q2 z 1q11q12 → → →, 0 0 0 0       0 0 0 q x0y0z q2 1q q 11 12 → → →. 0 0 0 0