11 pages

English

Daniel Hart

pinfou

Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres

11 pages

English

Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres

A propos
Informations
Extrait

Description

cours magistral
exposé

Daniel Hart Education Highest Earned Degree Ed.D., Harvard University, Human Development, 1982. Other Earned Degrees, Graduate and Undergraduate Ed.M., Harvard University, Human Development, 1979. B.A., Bates College, Psychology, 1978. Honors and Awards Professional Awards and Honors Daniel Gorenstein Memorial Award (2011). Camden Hero Award, Greater Camden Partnerships (2009). Alumni Community Service Award, Bates College (2009). Anna M. Sample Advocacy Award, Community Planning and Advocacy Council (2007). Hometown Hero Award, Philadelphia 76ers (2006).

refereed journals marmorstein
professor ii department of psychology faculty of arts and sciences - camden
awareness and self-understanding in cultural context
psychology 325 cooper street camden
development
journals

Sujets

Cours magistral

Exposé

Development

Journals

Psychology

HIStory

Address

University

Employment

Informations

Publié par	pinfou
Nombre de lectures	22
Langue	English

Extrait

CS521 Essay - Graph

Property Testing

Duc Le

June 6, 2010

Abstract Property testing, ﬁrst proposed by Rubinfeld and Sudan as program testing, is an interesting ﬁeld in computational theory, dealing with the task of determining whether an object has a property by reading a signiﬁ-cantly small portion of the whole object. This essay gives an introduction to property testing, focusing on testing properties of graphs. In addition, some currently open questions for this ﬁeld are also included.

Introduction

According to Dana Ron, property testing can be classiﬁed as the study of a speciﬁc class of problems:

“Given the ability to perform (local) queries concerning a particular object the problem is to determine whether the object has a pre-determined (global) property or diﬀers signiﬁcantly from any object that has the property. In the latter case we say it is far from (having) the property. The algorithm is allowed a small probability of failure, and typically it inspects only a small part of the whole object.”

An example of a problem in property testing is checking bipartiteness of graphs. Recall that a graph is bipartite if its set of vertices can be divided into two subsets, and there are no edges connecting any two vertices in each subset. In bipartiteness testing, the object is the graph and the property is bipartiteness. This problem will be described in detail in later sections of this essay. There are many reasons to motivate research on property testing. One of them is because the object may be too large or it is stored in a database with slow connection, making it hard to fully scan the entire content of the object. Thus, we need a mechanism that reads only a small part of the object and decides whether it has the tested property. Another reason arises when the size of the object is not too large, but the problem of exactly deciding the property is too hard (i.e. the problem is in NP-hard), so an approximation scheme is needed, though there is a risk of accepting some objects that do not have the property.

There has been much research done on property testing, and many inter-esting problems are solved. Some of the most notable problems are testing algebraic properties of functions, testing properties of strings, and testing graph properties. In the scope of this essay, we primarily focus on the testing of graph properties. This essay is organized as follows. Section 2 contains some preliminaries to property testing and graph property testing. Next, section 3 explains some graph testing algorithms. Following is section 4, which describes future direc-tions and open questions of this ﬁeld. Finally, section 5 summarizes the essay.

Preliminaries

This section contains two parts. The that can be applied to any property presents speciﬁc deﬁnitions for graph

ﬁrst part establishes common deﬁnitions testing algorithm, while the second part property testing.

2.1 Common Deﬁnitions for Property Testing 1 Because property testing algorithms do not read the whole input (or object) , they must be probabilistic, that is, the algorithm must still consider every single component with some probability. Still, random algorithms do not guarantee completely correct results. For example, one cannot know if a set contains only negative integers until he reads the entire set. Therefore, we need a new notation of closeness that is deﬁned in Deﬁnition 1. This notation distinguishes the case that the set contains only negative integers and the case that at leastfraction of the set are non-negative integers. Deﬁnition 1:An input is considered to be-close to satisfying a property 20 Pgiven a functionf:D→Fif there exists a functionf:D→Fsatisfying Pthat diﬀers fromfin less than or equal to|D|places. An input that is not -close to satisfyingPis called-far from satisfyingP. As reading this deﬁnition, we have some observations. First, this deﬁnition is general enough that it can be applied to any testing problems. Second, this deﬁnition deals with problems with ﬁxed input sizes, or one always needs to provide input sizes to the testing algorithm in advance. In addition, to illustrate the idea that a function satisﬁes a property, we can look at this small example: The functionabs, which calculates the absolute value of a number, is a function satisfying the property “always returns non-negative numbers.” The idea of closeness in Deﬁnition 1 leads to Deﬁnition 2 deﬁning an-test of a property. Deﬁnition 2:LetPbe a property, and letnbe the ﬁxed input size. An -test withq=q(, n)queries forPis a probabilistic algorithm that reads the input in up toqplaces and decides whether the input satisﬁesPwith probability 2 of at least or the input is-far from satisfyingP. 3 1 In this essay, the terms input and object are used interchangably 2 D is the set of all possible inputs, while F is the set of all possible outputs

In Deﬁnition 2, a query is indeed a read from the input. This implies that the algorithm can read from any location of the input, or random access to 2 the input is allowed. In addition, the term can be replaced by any ﬁxed 3 1 number greater than and smaller than 1. Furthermore, the deﬁnition does 2 not mention any information about the running time of the testing algorithm. However, as experience suggested, in most cases the testing algorithms also have small running time, and thus they are suitable for fast approximation of the input’s properties. Another point that should be noticed is Deﬁnition 2 allows testing algorithms to have 2-sided error probability, which means that the output of those algorithms can be wrong in all cases, whether the object satisﬁes the property or not. Some testing algorithms are 1-sided, which means that they always accept if the object satisﬁes the tested property.

2.2 Graph Property Testing Deﬁnitions 2.2.1 Graph Testing Models The deﬁnition of testing graph properties depends on the way graphs are rep-resented. There are two ways to represent graphs,adjacency matricesandin-cidence liststhere are two diﬀerent models for testing graph properties,. Thus, which are mentioned below. Note that here we only consider undirected graphs. -Adjacency-Matrix Modeladjacency matrix representation is appro-. As priate for dense graph, this model and its testing results are meaningful when being applied to such graphs. In this model, any algorithm used can query if an edge exists between two arbitrary vertices, and the distance between graphs is the fraction of entries that the matrices representing them diﬀer. Therefore, given a distance parameter, any graph that re-2 quires more than∙ |V|edge modiﬁcations to achieve the tested property 2 3 will be rejected. 4 -Incidence-Lists Modelmodel is suitable for sparse graphs or graphs. This that are not dense. The representation of those graphs contains lists of lengthd, wheredIn otheris at least the highest degree of all vertices. words,dis a bound of the the graph’s degree. Using this representation, any algorithm can query thei’th neighbor of any vertexv, fori∈1, ..., d. This query returns ‘0’ if no such neighbor exists. Similar to the adjacency-matrix model, the distance between graphs is the fraction of entries that their representations diﬀer. A graph is rejected when the number of edge  modiﬁcations to make it have the tested property is greater than∙d|V|. 2 1 Note that the factor appears in both models because both representations 2 are symmetric. Besides diﬀerent models, diﬀerent techniques are used for diﬀerent choices of representation. For adjacency-matrix representation, the technique used is 3 V is the set of vertices in the graph 4 In fact, there are two types of incidence-lists models, the bounded-degree one and the unbounded-degree one. In this essay, we only consider the bounded-degree model.

random samplingthe algorithm chooses a random small set. Speciﬁcally, U of vertices of graphG, ﬁnds all the edges interconnecting those vertices, and ﬁnally checks if the small subgraph induced byUsatisﬁes the tested property. If so, the algorithm returns accept; otherwise, it returns reject. It is usually very straightforward to prove that any graph possessing the property will likely be accepted. The hard and interesting task is to prove that graphs being far from having the property are rejected with high probability. For incidence-lists representation, a diﬀerent set of techniques that are more applicable for sparse graphs must be employed. The reason is, because sparse graphs contain few edges, a small random sample of vertices often has no internal edges, and thus, is an empty graph. Therefore, pure random sampling does not suﬃce for incidence-lists representation. Some techniques used are exhaustive local search, random walks, etc.

2.2.2 Deﬁnitions for Graph Property Testing Based on the models mentioned in the previous section, we derive the following deﬁnitions. First, we only consider undirected, simple graphsG= (V, E) with Vis the set of vertices andEis the set of edges. Let|V|=N, then we can assume thatV(G) ={1, ..., N}without loss of generality. For adjacency matrix model, because graphs are represented by adjacency matrices, they are associated with the boolean functionfG:fG(u, v) = 1 if (u, v)∈E, andfG(u, vIn addition, if) = 0 otherwise. X1andX2are two (not necessarily disjoined) sets of vertices, we deﬁne

E(X1, X2) ={(u, v)∈E:u∈X1, v∈X2}

. The distance between twoN-vertex graphsG1andG2is computed by dividing 2 the number of unordered pairs (u, v)∈[Nthat] such fG1(u, v)6=fG2(u, v) by 2 the total number of pairs,N. For incidence-list models, the distance between graphs is computed by divid-ing the number of pairs of vertices having an edge between them in one graph but not in the other byd∙Nthat. Recall dis a bound of the graph’s degree. As analogies to Deﬁnition 1 and 2 for common property testing problems, we provide Deﬁnition 3 and 4 for graph property testing problems. Deﬁnition 3:For any graph propertyP, and0≤≤1, graphGis said to be-far from having propertyPif it has distance greater thanfor every graph having that property. Otherwise,Gis said to be-close. Deﬁnition 4:A property testing algorithm for propertyPtakes a distance 2 parameterand returns accept with probability at least if the graph has the 3 2 property, and returns reject with probability at least if the tested graph is-far 3 from having the property. Note that this algorithm can perform queries regarding the existence of edges between any arbitrary pairs of vertices.

Testing Graph Properties

After illustrating the background for property testing, now we can talk specif-ically about a graph testing problem, testing bipartiteness. This bipartiteness testing algorithm is applied for adjacency-matrix model and will give readers an idea of how complex property testing problems are. In addition to the bi-partiteness testing algorithm, this section also mentions a summary of recent results in graph property testing.

3.1 Bipartiteness Testing for Adjacency-Matrix Model Recall that a graph is bipartite if its set of vertices can be divided into two subsets, and there are no edges connecting any two vertices in each subset. Traditionally, one can use Breadth First Search (BFS) algorithm to check if a graph is bipartite or not. This algorithm takes linear time in the number of edges and vertices. However, we want to do better by reading only a small portion of the graph and decide the bipartiteness by processing only that portion. To derive the testing algorithm, one needs to deﬁne what is-far from bipar-tite. Suppose the set of verticesVis partitioned into two setsV1andV2, that is, (V1, V2) is called a two-way partition ofV, edge (u, v)∈Eis called aviolating edgewith respect to (V1, V2) if eitheru, v∈V1oru, v∈V2. If the number of 2 violating edges with respect to (V1, V2) is greater thanN, (V1, V2) is called (-bad. Otherwise, V1, V2) ison these deﬁnitions, a graph-good. Based Gis -far from bipartite if and only if every partition (V1, V2) of the set of vertices Vis-bad. The algorithm for testing bipartiteness is quite simple, yet its analysis is much more complex.

Algorithm 1: Testing bipartiteness log(1/) 1. For graphG= (V, E), select the set ofm= Θ(2) vertices from  the set of verticesVand induce the subgraph by selecting all the edges in Ehaving two end points in the set of selected vertices.

2. Check the bipartiteness of the induced subgraph by BFS. If the induced subgraph is bipartite, output accept. Otherwise, output reject.

Theorem 1:If a graphGis bipartite, then Algorithm 1 will always accept it. Proof:If graphThe proof is quite straightforward: Gis bipartite, every subgraph of it is also bipartite. Thus, the output of step 1 is always a bipartite subgraph, and step 2 always returns accept.

Theorem 2:If a graphGis-far from bipartite, then Algorithm 1 will reject it with probability of at least 2/3. In addition, Algorithm 1 will output an evidence by in the form of a non-bipartite subgraph to prove the non-bipartiteness ofG.

Proof:To prove Theorem 2, the following deﬁnitions are needed:

 Deﬁnition 1:A vertexvis of high-degree if its degree inGis at leastN, 4 withN=|V|is not called high-degree.. Otherwise, it

Deﬁnition 2:For a subset of verticesUand any vertexv,Ucoversvif at least one ofv’s neighbors is inU.

Firstly, the set of vertices selected by Algorithm 1 can be divided into two 2 parts,UandS:|U|= Θ((log(1/))/) =m1,|S|= Θ((log(1/))/) =m2. For the ﬁrst partU, we derive the following lemma:

Lemma 1:With probability at least 5/6 over the choice ofU,Ucan cover all but at most(/4)Nhigh-degree vertices.

 Proof of lemma 1that a high-degree vertex: Recall vhas at leastN 4 neighbors, so the probability that a vertexu∈Uis not a neighbor ofvis at  N−N 4 most = 1−. Thus, the probability that the setUdoes not coverv, i.e. N4 Ucontains no neighbors ofvis at most   m1 (1−)<exp(− ∙m1) 4 4 withm1=|U|. 4 If we choosem1=∙ln(24/), then that probability is at most (/24). Be- cause at mostNhigh-degree verticesvexist in graphG, the expected number of high-degree vertices not being covered byUis at most (/24)Nto. According Markov’s inequality, the probability that there are at least (/4)Nhigh-degree vertices not being covered byUis at most 1/6. Therefore, with probability at least 5/6,Ucan cover all but at most (/4)Nhigh-degree vertices.

For now, let’s assume thatUcan cover all but at most (/4)Nhigh-degree vertices, we deﬁne the setCto be the set of vertices being covered byU, and the setR=V\C. BecauseUdoes not cover any vertices inR, the number of high-degree vertices inRis at most (/4)NFor a ﬁxedby our assumption. partition (U1, U2) ofU, induce the partition (C1, C2) ofCby putting all the neighbors ofU1inC2and the rest ofCinC1(note thatC1contains vertices that must have a neighbor inU2the set). For R, choose (R1, R2) as an arbitrary partition ofR. From the above settings, we can see that (C1∪R1, C2∪R2) is a partition ofVgraph. Because Gis-far from bipartite, all of its partitions, including 2 (C1∪R1, C2∪R2), are-bad, that is, they contain at leastNviolating edges. Recall thatRcan have at most (/4)Nhigh-degree vertices and at mostNnon-high-degree vertices, so the number of edges adjacent toRis at most (/4)N∗ 2 2 2 2 N+N∗(/4)N= (/2)Nthere must be. Therefore, N−(/2)N= (/2)N violating edges incident to only vertices inC1orC2. Figure 1 presents these partitions and violating edges graphically.

Figure 1: The partitions induced by (U1, U2). Every vertex inC2is adjacent toU1, and every vertex inC1is adjacent toU2are no edges between. There R andU1∪U2.

2 The next lemma states that if we select the setScontaining Θ((log(1/))/) vertices, then for every partition (S1, S2) ofS, with high probability, there is some violating edge with respect to (U1∪S1, U2∪S2):

Lemma 2:LetG= (V, E)be a graph being-far from bipartite, and let U=U1∪U2be a subset ofVthat covers all but at most(/4)Nhigh-degree vertices inG. LetS⊂Vbe an independently selected set,|S|=m2= Θ(|U|/). −|U| Then, with probability at least1−2/6, for every partition(S1, S2)ofS, there is at least one edge having both end points inU∪Sthat violates with respect to (U1∪S1, U2∪S2).

Proof of lemma 2:Because the setScontainsm2vertices, it can be m2 considered as pairs of vertices. Based on previous discussion, there are at 2 2 least (/2)Nviolating edges with respect to the partition (C1, C2); thus, for any pair (v, w) of vertices ofS, the probability that (v, w) is a violating edge 2 (/2)N with respect to (C1, C2) is at least2=/the probability that2. Thus, N m2 the pairs ofSdo not contain any violating edge is at most 2

ifm2= (16|U|/).

m2/2−|U| (1−(/2))<2/6

The second step of this proof is proving that ifSdoes contain a violating pair (v, w), then for every partition (S1, S2) ofS, there is always some edge violating (U1∪S1, U2∪S2). Recall that if (v, w) is a violating edge with respect to (C1, C2), either bothv, w∈C1or bothv, w∈C2loss of generality,. Without suppose thatv, w∈C2. There are two cases: - Bothvandware either inS1orS2: (v, w) is a violating edge with respect to (U1∪S1, U2∪S2). -vandware not in the same setS1orS2: In this case, sincev, w∈C2, by deﬁnition,vandwhave some neighbors inU1. Letu∈U1bev’s 0 neighbor andu∈U1bewIf’s neighbor. v∈S1andw∈S2then (u, v) will be a violating edge with respect to (U1∪S1, U2∪S2). Otherwise, if 0 v∈S2andw∈S1then (wu , ) will be a violating edge with respect to (U1∪S1, U2∪S2). In both cases, there exists a violating edge. Thus, the lemma is proven.

According to Lemma 2, if (U1, U2) is a ﬁxed partition ofU, then for ev-ery partition (S1, S2) ofS, the probability that there is no edge violating −|U| |U| (U1∪S1, U2∪S2) is at most 2/there are 2 6. Since partitions ofU, the probability that for every partition (U1, U2) ofU, for every partition (S1, S2) of −|U| |U| S, the partition (U1∪S1, U2∪S2) has no violating edge is at most 2/6∗2 = 1/the probability that the set6. Therefore, U∪Shas violating edges is at least 5/6 with respect to any partition of this set.

In summary, we have proved that: - With probability at least 5/6 over the choice ofU, it can be used to induce some partitions that are consistent withU, that is, whenUchanges, those partitions change accordingly.

- Using the induced partitions in the previous step, we can prove that with probability at least 5/6 over the choice ofS,U∪Swill contain violating edges with respect to any partition of this set. These two conclusions suggest that with probability at least 2/3, the induced graph is not bipartite. This concludes the proof of theorem 2.

Query and time complexities: When analyzing running time of property testing algorithms, two factors are considered, query complexity and time complexities. Query complexity is the time needed to query data from the object, and time complexity is the total running time of the algorithms. In the case of the described algorithm, the query and time complexity are identical because its running time is dominated by the time used to compute the subgraph based on the vertices selected. Since there 2 log(1/)log(1/) are Θ(2)) vertices, the running of Algorithm 1 is Θ(4)). However,   a more detailed analysis based on the above proof reveals that this bound can

be improved. Suppose the algorithm actually divides the set of selected vertices log(1/)log(1/) into two partsUandSof sizesm1= Θ( )) andm2= Θ(2)), it   only has to check whether there exists an edge between all pairs of vertices in U×Sand betweenm2/2 pairs of vertices inS. Thus the running time of the 2 log(1/) algorithm is Θ(3)). 

3.2 Summary of Recent Results in Graph Property Test-ing Some graph properties were studied by GoldReich et al. [5] have shown to have testing algorithms having query complexity of poly(1/) and time complexity at most exp(poly(1/)). Those algorithms are described below:

-k-Colorability,k≥proposed algorithm was ﬁrst thought to have3: The 4 6 2 3 5 e e query complexity ofO(k /) and running time exp(O(k /However,)) . as Alon and Krivelevich enhanced the analysis of the algorithm, they e e 2 4 2 found a new bound ofO(k /) on query complexity and exp(O(k/)) on running time.

-ρ-Clique: The proposed algorithm checks whether a graph has a clique of 2 6 e sizeρ∙N, where 0< ρ <Its query complexity is1 is a constant. O(ρ /) e 2 and running time isO(ρ/).

-ρproposed algorithm checks if a graph has a 2-way cut with-Cut: The e 2−7 ρNIts query complexity iscrossing edges. O() and running time is −3 e O(). This algorithm is interesting in the sense that it generalizes to 2 thek-way cuts problem, at the cost ofO(log (k)) in the query complexity and running time. In addition, it can also be used to testρ-Bisection. In e e −8−3 this case, the query complexity isO() and running time is expO().

Future Directions and Open Questions

Because graph property testing is a relatively new ﬁeld in Computer Science, there are many open problems that need to be solved. This section discusses some open questions, which aﬀect the future directions of graph property testing research.

4.1 Query Complexity of Testing Algorithms as a Func-tion of Distance Parameter When ﬁnding a testing algorithm, we expect that its query complexity only depends on the distance parameterrather than the size of the object being tested. In fact, many algorithms satisfy this condition. One example is the 5 e TheO(∙) notation is used for the sake of succinctness, which hides logarithmic factors that are not important in the analysis.

aforementioned bipartiteness testing algorithm. In previous years, researchers did not pay enough attention to the dependence of the testing algorithm’s query complexity on the distance parameter; therefore, there is a need for more study about this dependence. Oded Goldreich suggested four types of dependencies that can be further investigated:

1. The query complexity has a linear relation to the distance parameter, that is, the number of queries needed isO(1/): Many non-trivial graph p properties can be tested in Ω( 1/) for adjacency-matrix models, which is conjectured to be the lower-bound of this model. In addition, for the bounded-degree incidence-lists model, it is easy to prove that the lower-bound of testing algorithms is Ω(1/).

2. The query complexity is polynomial to the distance parameter, that is, the number of queries is poly(1/): As mentioned later in item 3, there are no testing algorithm having the query complexity of exponential related to the distance parameter, one might expect that polynomial is the upper-bound of query complexity. However, it is still quite far from the currently known upper-bound mentioned in item 4.

3. The query complexity is exponential to the distance parameter, that is, the number of queries is exp(1/): There are currently no natural testing problem possessing this complexity.

4. The query complexity is a function of the distance parameter that grows extremely fast: One example is the tower functiontfthat can be deﬁned inductively: tf(n) = exp(tf(n−1)) with tf(1) = 2.

4.2 General Graph as a Model for Property Testing Most of algorithmic research on graphs deal with general graphs. However, research in graph property testing started with dense graphs by using the ad-jacency matrix representation, then moved to bounded-degree graphs. Only recently the study of general graph testing has been considered. A new model for treating general graphs was developed by Parnas and Ron. This model can provide the same functionalities of the two older models. In addition, it is proved to be suﬃcient for the study of testing the properties of arbitrary graphs and to be the most relevant to computer science applications. Therefore, using this model allows researchers to reuse algorithmic techniques that are applied in other areas of computer science research. Research on this model will deﬁnitely open a new era for graph property testing algorithms.

Summary

Property testing is an interesting new ﬁeld in Computer Science, which is in-creasingly getting more attention from scientists globally. As the amount of

information on the Internet is increasing tremendously, we get more and more beneﬁts from the applications of this ﬁeld. This essay gave a short introduction to property testing. Though graph property testing is the main focus of this essay, it is still broad enough to give readers a good overview of this exciting ﬁeld.

References

[1] Dana Ron,Property DORDRECHT, 2001

testing,

COMBINATORIAL

OPTIMIZATION-

[2] Dana Ron,Property testing: A learning theory perspective, Learning Theory, 2007

[3] Eldar Fischer,The art of uninformed decisions, Bulletin of the EATCS, 2001

[4] Oded Goldreich,Contemplations on Testing Graph Properties, Manuscript, August, 2005

[5] O Goldreich, S Goldwasser, D Ron,Property testing and its connection to learning and approximation, Journal of the ACM (JACM), 1998

[6] Ronitt Rubinfeld,Sublinear time Algorithms, Proc. Intern. Congress of Mathematicians, 2006