Failure, Disconnection and Partition Detection in Mobile Environment

9 pages

English

Failure, Disconnection and Partition Detection in Mobile Environment

Nisig - Denis Conan , Pierre Sens , Luciana Arantes , Mathieu Bouillaguet

Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres

9 pages

English

Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres

A propos
Informations
Extrait

Description

Seventh IEEE International Symposium on Network Computing and ApplicationsFailure, Disconnection and Partition Detection in Mobile EnvironmentDenis ConanInstitut TELECOM, SudParis, UMR CNRS Samovar9 rue Charles Fourier, 91011 Évry, FranceDenis.Conan@it-sudparis.euPierre Sens, Luciana Arantes, and Mathieu BouillaguetLIP6 — Université de Paris 6 — INRIA4 Place Jussieu, 75252 Paris Cedex 05, FrancePierre.Sens,Luciana.Arantes,Mathieu.Bouillaguet@lip6.frAbstract ting than for receiving messages on wireless networks, thusleading to non-uniform radio range.In mobile environment, nodes can move around and vol- As the geographic extent of the system grows or its con-untarily leave or join the network. Furthermore, they can nectivity weakens, network partitions tend to be more fre-crash or be disconnected from the network due to the ab- quent. They may result in a reduction or degradation of ser-sence of network signals. Therefore, failure, disconnec- vices but not necessarily render the application completelytionandmobilitymaycreatepartitionsinwirelessnetworks unavailable. Partitions should keep working as autonomouswhich should be detected for fault and disconnection toler- distributed systems offering services to their clients as far asancereasons. possible. Algorithms that should beneﬁt from a partition de-Wepresentinthisarticleanarchitectureoflocalanddis- tector module are for instance distributed consensus in par-tributed detectors for mobile networks ...

Informations

Publié par	Nisig
Nombre de lectures	184
Langue	English

Extrait

Seventh IEEE International Symposium on Network Computing and Applications

Failure, Disconnection and Partition Detection in Mobile Environment

Denis Conan Institut TELECOM, SudParis, UMR CNRS Samovar 9 rue Charles Fourier, 91011 Évry, France Denis.Conan@itsudparis.eu Pierre Sens, Luciana Arantes, and Mathieu Bouillaguet LIP6 — Université de Paris 6 — INRIA 4 Place Jussieu, 75252 Paris Cedex 05, France Pierre.Sens,Luciana.Arantes,Mathieu.Bouillaguet@lip6.fr

Abstract

In mobile environment, nodes can move around and vol untarily leave or join the network. Furthermore, they can crash or be disconnected from the network due to the ab sence of network signals. Therefore, failure, disconnec tion and mobility may create partitions in wireless networks which should be detected for fault and disconnection toler ance reasons. We present in this article an architecture of local and dis tributed detectors for mobile networks that detect failures, disconnections, and partitions. It is basically composed of three unreliable detectors: a heartbeat failure detector, a vectorbased disconnection detector, and an eventually per fect partition detector.

1. Introduction

Recent advancements in wireless data networking and portable information appliances have given rise to the con cept of mobile computing. Users can access information and services irrespective of their movement and physical location. However, such an environment is extremely dy namic: Nodes can voluntarily disconnect themselves or move around; absence of wireless network signals can dis connect nodes from the network; nodes can fail and mes sages can be lost. Consequently, failure, disconnection, or mobility may cause a node or several of them to detach from the rest of the network, creating one or more network par titions. Another particularity of mobile environment is the fact that links are not bidirectional because, in practice, the two processes cannot rely on the same physical and logi cal resources in both directions. For argument’s sake, small devices like PDAs consume more power energy for emit

978-0-7695-3192-2/08 $25.00  2008 Crown Copyright DOI 10.1109/NCA.2008.18

ting than for receiving messages on wireless networks, thus leading to nonuniform radio range. As the geographic extent of the system grows or its con nectivity weakens, network partitions tend to be more fre quent. They may result in a reduction or degradation of ser vices but not necessarily render the application completely unavailable. Partitions should keep working as autonomous distributed systems offering services to their clients as far as possible. Algorithms that should beneﬁt from a partition de tector module are for instance distributed consensus in par titionable networks, and resources allocation and placement in dynamic networks. Therefore, a mechanism for provid ing information to the application about network partition is highly important in wireless environments, and is the focus of this paper. We propose an eventually perfect unreliable partition detector for wireless systems. Similarly to an un reliable failure detector [5], an unreliable partition detector can be considered as a per process oracle, which periodi cally provides, for each processp, a list of processes sus pected to be unreachable, that is those processes which are suspected of being in another partition thanp’s one. A parti tion detector is unreliable in the sense that it can make mis takes. Two properties characterise a failure detector:com pletenessandaccuracy. Roughly speaking,completeness sets requirements in respect to crashed processes, whileac curacyrestricts the number of false suspicions. By analogy, these two properties also characterise our partition detector, but with respect to reachable processes. Thus, our partition detector assures the followingcompletenessandaccuracy properties: A processp, which is correct, eventually detects every process that does not take part inp’s partition; andp eventually stops suspecting correct processes that belong to its partition. Our partition detector is able to detect partitions due to disconnections as well as failures. The ultimate goal

119

Authorized licensed use limited to: UR Rocquencourt. Downloaded on October 2, 2009 at 12:42 from IEEE Xplore. Restrictions apply.

of characterising the nature of the partition is to help the decisionmaking process of applying countermeasures for fault tolerance and disconnection tolerance:e.g, remove a faulty participant from a vote and wait for disconnected ones. Hence, in order to build our partition detector, a fail ure detector and a disconnection detector are required. Both detectors participate in our solution and the partition detec tor exploits information provided by them. In our approach, we consider that there is a local con nectivity module at each mobile node which is responsi ble for informing whether that node can send messages or not [9]. It monitors resources such as energy power, mem ory space and wireless link quality by controlling one of their attributes such that, when the raw value of the attribute is below some threshold, the mobile node is disconnected. The objective of a connectivity module is to establish a con nectivity mode (from strongly connected to disconnected) in a stabilised manner. However, such connectivity infor mation needs to be spread over the network. Hence, when a node is locally notiﬁed of its disconnection, the disconnec tion detector that we propose will “try” to spread the dis connection information over the network, through its neigh bours, by calling the broadcast primitive mentioned above. The contribution of our paper is then threefold:(1)a modiﬁed version of theHBfailure detector of [1, 2] which besides offering information about failure suspicions and the possibility of building a quiescent stubborn reliable broadcast primitive, provides information about the reacha bility of nodes;(2)an unreliable disconnection detector that broadcasts disconnection information through the network; and(3)an eventually perfect partition detector that, based on the information given by the two previous detectors, de tects network partitions. The remainder of this paper is organised as follows. In Section 2, we set out the distributed system model. Sec tion 3 presents our global architecture and the basic prim itives used throughout the paper. Section 4 describes the heartbeat failure detector for partitionable networks with terminal mobility and explains how the original algorithm was modiﬁed. The disconnection detector is presented in Section 5, and the partition detector in Section 6. We com pare our contribution with related work in section 7 while section 8 concludes our work.

2. Distributed System Model

We consider a partially synchronous distributed system in which there are bounds on process speeds and on mes sage transmission delays, these bounds are unknown, but they hold after some unknown time, which is called GST for Global Stabilisation Time [5]. The system consists of a set ofnprocessesΠ ={p1, p2..., pn}network of. The processes is a directed graphG= (Π,Λ)whereΛ⊂Π×Π.

The topology of the graph changes due to node movements and node failures, but the set of participantsΠto the dis tributed application is known. Without lack of generality, we assume that there is one process per mobile terminal. Processqis a neighbour of processpif and only if there is an unidirectional link fromptoq.

Failure modelProcesses and links can fail by crashing, that is by prematurely halting and then stopping performing any further action for ever. During the execution, by deﬁ nition, processes and links that have not crashed are said to be correct. In addition, correct links are fair lossy. Afair (lossy) linkmay lose messages, but if a processprepeatedly sends a messagemto processq, thenqeventually receives m.

1 Disconnection modeland reProcesses can disconnect connect. In connected mode, a process may send a mes sage to its neighbours, while in disconnected mode, the re sources of the process terminal are too low to send any ap plication message but control messages may be transmitted for a while. We assume that every process ends its execu tion while being connected and does not crash while being disconnected. In practice, the assumption means that the disconnected, and then terminating or faulty process does not succeed in leaving the set of participantsΠa. Then, mechanism of leases at the application level will make the incriminated process leaving the set of participants. In the sequel, this translates into the assumption that a terminal that disconnects eventually reconnects. A moving node ﬁrst disconnects from the network then it moves to a new loca tion and ﬁnally reconnect to the network. We assume that mobile terminals eventually stop moving.

Partition modelFollowing the terminology given in [1, 2], the network is said to bepartitionable, that is a network in which some links may be unidirectional and may crash. By deﬁnition, afair pathbetween processespandqis a path containing only fair links and correct processes, and asimple fair pathis a fair path in which no process ap pears more than once. In addition, processqis said to be reachablefrom processpif there exists a fair path between pandq, otherwise it isunreachable.pandqaremutually reachableif there exists a fair path betweenpandq, and a fair path betweenqandp. Then, thep’s partition, denoted partition(p), is the set of correct processes mutually reach able by processp.

1 Note that we consider that when a node disconnects, it disconnects from all the nodes, not only from a particular node. The topology changes are gathered in the concept of neighbourhood.

120

Authorized licensed use limited to: UR Rocquencourt. Downloaded on October 2, 2009 at 12:42 from IEEE Xplore. Restrictions apply.

3. Unreliable detection modules and basic com munication primitives

On each node, we provide a basic layer (BL). The func tion of this layer is twofold. Firstly, it establishes a con nectivitymode(from strongly connected to disconnected) in a stabilised manner. Secondly, it provides a list of current neighbours (nghset) by periodically calling the networking layer. By analogy with the participant detector of [4], the neighbourhood detector does not make any mistake: It is perfect. The very reason is that we can’t verify whether log ical connections are correctly managed, or whether network conﬁguration data are correct. For instance, if the underly ing operating system makes a mistake, the mobile termi nal will ﬁnd the links faulty (false positive) or will not use opened connections (false negative). Each change inmode andnghsetis notiﬁed to the upper layers. On each nodep, two detectors are plugged ontoBL: The heartbeat failure detector (HBF D) and the vectorbased disconnection detector (V BDD).HBF Doutputs a vec torH Bof heartbeat counters, one entry per process, and a set of mutually reachable processesmreachable.V BDD outputs a vectordvof disconnection counters, one entry per process. Disconnections and (re)connections are num bered: Disconnection events are oddnumbered and recon nection events are evennumbered. At the upper level of nodep, the eventually perfect par tition detectorE P P Duses information provided by both HBF DandV BDDto compute the setout, that is the set of processes not inp’s partition. Based onHBF D, we pro vide aquiescent stubborn broadcastprimitiveQSBused byVBDDto broadcast disconnection and reconnection in formation. HBF D,VBDDandE PP Dare characterised by both completeness and accuracy properties deﬁned as follows:

• HBCompletenesseach correct process: At p, the heartbeat counter of every process not inpartition(p) is bounded.

• HBAccuracyeach correct process: At p, the heart beat counter of every process is nondecreasing. The heartbeat counter of every process inpartition(p)is unbounded.

• VBDDCompletenessall disconnections: Eventually and reconnections of correct processpare seen by ev ery correct process inpartition(p).

• VBDDAccuracyprocess sees a disconnection: No (resp.reconnection) before the disconnection (resp. reconnection) effectively occurs.

• E PP DCompleteness (Strong partition complete ness): If some processqremains unreachable from a

correct processp, then eventuallypwill always sus pectqof not belonging topartition(p).

• E P PDAccuracy (Eventual strong partition accu racy): If some processqremains reachable from a cor rect processp, then eventuallypwill no longer suspect qof not belonging topartition(p).

Each process can use the following primitives to com municate:

•send(dest, m)/receive(f rom, m): Two basic point topoint communication functions to send (resp.re ceive) messagemto (resp.from) its neighbourdest (resp.f rom). When information is locally exchanged between local detectors,local_send(dest, m)and local_receive(f rom, m)functions are used where f romanddestare the name of the component (HBF D,V BDD, orBL).

•broadcast(m): This function calledQS Bbroadcasts messagemover fair links to all the correct processes in the partition of the correct sender. This primitive pro vides the abstraction of stubborn links hiding the re transmission mechanisms used to make somewhat re liable the transmission of messages. A formulation of the stubborn delivery property is as follows [7]: If the senderp, which does not crash, sends a messagem toqthat is correct, andpis able to indeﬁnitely delay the sending of any further message, thenqeventually receivesm. An important practical consideration is that stubborn links require only a bounded buffer space (minimum of one message). The quiescence property ensures that only a ﬁnite number of messages are sent when broadcast is invoked a ﬁnite number of times, even if processes involved in broadcasting move to other partitions (only a ﬁnite number of messages are sent in the latter partitions).QS BusesHBF D. Due to the lack of space, we do not present the broadcast al gorithm in this paper; the algorithm is a direct modiﬁ cation of the quiescent broadcast primitive given in [2] (Figure 3) in order to add the stubborn property as pre sented in [7].

4. Failure detection

Our failure detectorHBF Dis based on the class of heartbeat failure detectors proposed by [1, 2]. Such a choice is ﬁrstly explained by the need to build quiescent algo rithms, that is algorithms that eventually stop sending mes sages in partitionable networks. In [2], the authors prove that quiescent reliable communication are impossible with classical failure detectors whose implementation provides output of bounded size (e.g., the list of suspects has bounded

121

Authorized licensed use limited to: UR Rocquencourt. Downloaded on October 2, 2009 at 12:42 from IEEE Xplore. Restrictions apply.

size). Hence, they propose in the paper the class of heart beat failure detectors which can be used to circumvent this impossibility result. Another reason that justiﬁes our choice is that heartbeat failure detectors are not timeoutbased. Heartbeat failure detectors provide for each processpa vector of countersH B= [n1, n2, ..., nk]where eachnj is a positive integer corresponding to the number of heart beats received by processpfrom processpj. Thus,njis the “heartbeat value ofpjatp”. Intuitively,njincreases as long aspjis correct, not disconnected, and inpartition(p). No tice that heartbeat failure detectors provide the vectorH B without any treatment or interpretation. Then, other detec tors, as our partition detectorE P P D, can periodically ob tain the current value ofHBvector fromHBF Din order to deduce lists of suspected processes. Beside the heartbeat vectorH B, our failure detector HBF Dgives information about the topology of the net work since each process keeps information about which processes can be reachable through its neighbours. For each neighbourrof processp,HBF Dbuilds the set of processes mutually reachable frompthroughrset is called the. This reachablilityset ofpthroughrand the vectormreachable gathers the set ofreachabilitysets of all the neighbours of pproperty of mutual reachability can be expressed as. The follows: At each correct processp, for each neighbourr, the reachabilityset forr(mreachable[r]) eventually contains all the correct processes (e.g.,q), such that there is a simple fair path fromptoqthroughrand a simple fair path from qtop. Furthermore,HBF Dcan also accept requests for emptying some of the reachability sets in order to restart an accumulation phase of topology discovery. This function ality is used by our partition detectionE P P D, described in Section 6, when a failure or a disconnection is detected. HBF Dwhich runs on each nodepis presented in Al gorithm 1. It is based on the algorithm for partitionable networks described in [1]. The changes we have made are related to the addition of nodes mobility and the discovery of the network topology through neighbours. The variables HBandmreachablerespectively store the per process heartbeat counters and the per process mutualreachability sets, as previously described. The setnghbrscontrols the current neighbours ofp, while the setpathsgathers all the paths of whichpis aware since its last heartbeat sending. Algorithm 1 is executed by processp(p∈Π), and it is di vided into ﬁve parallel tasks. It provides to the upper client, e.g.the partition detectorE P PD, the heartbeat vectorH B and the reachability sets (sets ofmreacheable) (line 16). The principle of the algorithm is the piggybacking of fair paths in heartbeat messages. The ﬁrst task (lines 1–5) corresponds to the code block executed at the creation of the heartbeat failure detector. The second task (lines 6–9) is triggered when the neigh bourhood changes. Such an information,nghset, is pro

vided byBL(cf. Section 3). This task controls the mobility of nodes and therefore the current setnghbrsof neighbours ofp(line 9). Furthermore, the entries ofmreachablecorre sponding to those processes that are no longer neighbours of pare set to empty (line 7) since they cannot be reached any more frompthrough old neighbours. However, new neigh bours ofpare seen as reachable (line 8).

Algorithm 1:Heartbeat Failure DetectorHBF D 1upon initialisation do 2nghbrs← ∅{neighbourhood atp} 3HB[1..|Π|]← {0, ...,0}{heartbeat vector atp} 4mreachable[1..|Π|]← {∅, ...,∅}{preach. through neigh. fromp} 5paths← ∅{set of paths received in heartbeats during last period of time} 6upon local_receive(BL,nghset)do 7for allq∈nghbrs\nghsetdomreachable[q]← ∅ 8for allq∈nghset\nghbrsdomreachable[q]← {q} 9nghbrs←nghset 10periodically do 11HB[p]←HB[p] + 1 12paths←paths∪ {{p}} 13for allpath∈paths: (∃r∈path:rappears more than twice inpath) dopaths←paths\path 14for allq∈nghbrsdo send(q,HBFD, paths) 15paths← ∅ 16local_send(EP P D,HBFD, HB, mreachable) 17upon receive(q,HBFD, pathsq)do 18for allpath∈pathsqdo 19for allr∈Π :rappears afterpinpathdoHB[r]←HB[r] + 1 20if∃r∈Π :rappears right next topinpaththen 21for alls∈Π :sappears afterrinpathdomreachable[r]← mreachable[r]∪ {s} 22endif 23paths←paths∪ {(path∙p)} 24endfor 25upon local_receive(E PP D,EPPD, procset)do 26for alls∈procsetdomreachable[s]← ∅

In the third task (lines 10–16), processpperiodically in crements its own heartbeat and adds itself topaths, which already contains all paths received in heartbeat messages during the last period of time. However, before sending to all its neighbours a new heartbeat message which includes such a variable (line 14),pveriﬁes in line 13 if its pre vious heartbeat messages have not already completed two cycles. In this case, such a path will be removed from pathsAs shown in the example of the execution(line 13). of the algorithm described below, some topology requires that heartbeat messages complete two cycles in order to en sure that processes correctly update their mutually reach able sets. At the end of the third task, the heartbeat failure detector notiﬁes its clients,E P PDin our case, about new updated information concerning the heartbeat vectors and reachability sets (line 16). The fourth task (lines 17–24) handles the reception of messages bypof the formHBFD, paths. Upon receiving it from processq, for eachpath∈pathswithpath= (p1∙...∙pi∙p∙r∙pj∙...∙pk∙q),padds the processes (pj∙...∙pk∙ q), which appears after its neighbourr, tomreachable[r] (lines 20–21). Therefore,mreachable[r]contains a list of processes that can be mutually reached frompthroughr. In addition, processpincreases the heartbeat counters of

122

Authorized licensed use limited to: UR Rocquencourt. Downloaded on October 2, 2009 at 12:42 from IEEE Xplore. Restrictions apply.

all the processes that appear afterpinpath, that is all the processes of the sequence (r∙pj∙...∙pk∙q) (line 19), since they are not suspected byp. Processpappends then itself to pathand stores the new path inpaths(line 23). Notice that in this case,pis also reachable from (pj∙...∙pk∙q) through their respective neighbours. Finally, the ﬁfth task (lines 25–26) empties some entries ofmreachable. As previously explained, this functionality is used by the partition detectorE P PD, described in sec tion 6.

Example of execution ofHBF DIn order to explain how nodes dynamically discover which are the other nodes reachable through their respective neighbours, we show in Figure 1 the scenario of an execution of Algorithm 1, con sidering a topology with ﬁve nodes. Node1starts by sending to its neighbour node2a heart beat message that contains the variablepaths1which, in this case, includes just itself, as shown in Figure 1.(a) (line 12). Upon receiving it (cf. Figure 1.(b)), node2ap pends itself to all the received paths, adding the latter to its variablepaths2(line 23). Notice that it does not update its variablemreachable2since it is not included in any of the received paths. Next, the set{2}is being added topaths2 (line 12) and a new heartbeat message is sent to its neigh bours. Both nodes1and3, outgoing neighbours of2, re ceive it. In Figure 1.(c), both nodes1and3receive the above heartbeat message, while in Figures 1.(d) and 1.(e), node4 receives the heartbeat messages sent by node3, and node5 receives the heartbeat messages sent by node4, respec tively. Next, when node2receives the heartbeat message from node5(cf. Figure 1.(f)), it ﬁnds itself in some of the received paths. Therefore, line 21 of the algorithm is exe cuted and node2adds nodes4and5tomreachable2[3], that is these nodes are mutually reachable from node2 through its neighbour node3. Finally, Figure 1.(g) shows that the content of variable paths1of node1after having received the second heartbeat message from node2the scenario, we consider that. In node1has not sent any new heartbeat message after the reception of the ﬁrst heartbeat message from node2. Node1 then updates its variablemreachable(mreachable1[2] = {2,3,4,5}) since these nodes appear after its neighbour2 in path{1,2,3,4,5,2,1}.

Sketch of proofHBcompleteness: The proof is by con tradiction. Letqbe a process that is not in the parti tion ofp(p=q). Assumes thatH B[q]is not bounded. Then,preceives an inﬁnite number of times messages HBFD, paths, where there exists a pathPinpathswhich containsqafterp. This path is of the formP= (p1∙. . .∙ p∙. . .∙q∙. . .∙pk). Sincepreceives an inﬁnite number

of messages frompk, the linkpk→pBy repeatedis fair. application, for eachj=k−1, . . . ,1, the linkpj→pj+1 is fair. Thus, inP,(p∙. . .∙q)is a fair path fromptoqand (q∙. . .∙pk∙p)is a fair path fromqtop. Therefore,pandq are in the same partition —a contradiction. HBaccuracy: The ﬁrst part (the heartbeat counter of ev ery process is nondecreasing) is obvious sinceH B[q]can only be changed in lines 11 and 19. For the second part (the heartbeat counter of every process in the partition ofp is unbounded), two cases are possible. Letqbe a process in the partition ofp. Ifq=p, then line 11 is executed inﬁnitely often (sincepis correct) andH B[p]atpis un bounded. Now, assumeq=pand let(p1∙. . .∙pi)be a simple fair path fromptoq, and(pi∙. . .∙pk)be a simple fair path fromqtop, so thatp1=pk=pandpi=q. Forj= 1, . . . , k−1, letPj= (p1∙. . .∙pj)induc. By tion onj, we can show that, for eachj= 1, . . . , k−1, pjsends messagesHBFD, pathstopj+1an inﬁnite num ber of times, where there is a pathPinpathssuch that P=subpath∙Pj. Forj=k−1, this claim shows that a neighbour ofpsends messagesM=HBFD, pathsto  pan inﬁnite number of times, where there is a pathPin  pathssuch thatP=subpath∙Pk−1. Sincepis correct and by the fairness property of the links, p receives mes sages of the form ofMSincean inﬁnite number of times. qappears afterpinPk−1,H B[q]is incremented an inﬁnite number of times (line 19). Therefore,H B[q]is unbounded.

5. Disconnection detection

The connectivity information provided byBL(cf. Sec tion 3) remains local to the mobile node. Hence, as we want in our approach to make the difference between a discon nection and a failure, the disconnection/reconnection infor mation of nodes should be spread over the network. We consider that when a node is disconnected from the network, its does not send application messages anymore. However, this does not mean that control messages sent by fair links cannot be transmitted; in other words, physical transmission may be still possible for a while. Contrary to failures which are unexpected, there is a lapse of time between the connectivity detection of the mode “discon nected” and the effective physical disconnection. Such a lapse of time can be used for alerting remote processes of a node disconnection. Clearly, in the case of a sudden dis connection, no disconnection message can be sent and the disconnection will be detected as a failure by the failure de tector that runs on correct and connected processes. This false suspicion will last for the duration of the disconnec tion and will be corrected when the disconnected process reconnects. On the other hand, in the case in which the end user disconnects theirself voluntarily, we consider that the middleware service responsible for isolating the user’s node

123

Authorized licensed use limited to: UR Rocquencourt. Downloaded on October 2, 2009 at 12:42 from IEEE Xplore. Restrictions apply.

paths = {{1,2,1},{2,1}} 1 1

4 (b)

paths = {{1,2,1},{2,1}} 1 1

paths = {{1,2,1},{2,1},{1,2,3,4,5,2,1},{2,3,4,5,2,1}, 1 {3,4,5,2,1},{4,5,2,1},{5,2,1},{2,1}} mreachable [2] ={1,2,3,4,5}1 1

paths = {{1,2,3,4,5,2},{2,3,5,2}, 2 {2,3,4,5,2},{4,5,2},{5,2},{2}} 2 mreachable [3] ={2,4,5} 2 3

(f)

4 paths = {{1,2,3,4}{2,3,4},{3,4},{4}} 4 (d)

(g)

paths = {{1,2,1},{2,1}} 1 1

untary disconnection, connectivity mode change, or deliv ery of aVBDDThe lomessage with new information. cal vectordvkeeps information about process disconnec tion/reconnection, as previously described. The local vari ablevoluntaryDiscindicates whether the enduser has requested a voluntary disconnection, andmodeis a vari able which is updated with the information provided byBL about the connectivity of nodepitself, the latter information being inferred from raw data from the execution context. Thus, by considering the information about both voluntary disconnection and local connectivity,VBDDinfers the log ical connectivity ofp, which it stores indv[p].

(e)

4 (a)

The ﬁrst task (lines 1–4) corresponds to the code block executed at the creation of the disconnection detector. Ev ery process is considered to be connected at the beginning of the execution (lines 2–4). The next task (lines 5–10) allows the enduser to voluntarily disconnect or reconnect (by opposition to involuntary disconnections or reconnec tions detected byBL). The assignment of the variable voluntaryDisc(line 6) is followed by the propagation of this new disconnection event to every neighbour (line 9). Naturally, voluntary disconnections outdo involuntary dis connections/reconnections. Thus, whenpis not already dis connected, either voluntarily or involuntarily (mode=‘d’), a voluntary disconnection effectively disconnects the pro cess. Similarly, whenpis currently voluntarily discon nected and involuntarily connected (mode=‘d’), a volun tary reconnection effectively reconnects the process. The third task (lines 12–18) is executed when there is a change in the connectivity mode which is detected byBLthe. If

Figure 1. Example of reachability set dynamic construction

Authorized licensed use limited to: UR Rocquencourt. Downloaded on October 2, 2009 at 12:42 from IEEE Xplore. Restrictions apply.

(c)

paths = {{1,2},{2}} 2

waits for a short while before actually performing the dis connection, thus allowing the transmission of control mes sages before the interruption of communication. Then, we introduce the concept of unreliable vector based disconnection detector,V BDD, similar to the one of unreliable failure detection. When a process is notiﬁed of a disconnection either byBLor voluntarily by the enduser, VBDD“tries” to transmit the disconnection information to all the processes by callingQSB.V BDDbuilds thus a co herent distributed view of disconnection events. By analogy with heartbeat failure detectors, the discon nection detector does not output a list of disconnected pro cesses, but provides a per process vector, nameddv, of dis connection/reconnection event counters. Ifdv[q]of pro cesspcontains an even value,qis considered to be seen as connected byp, otherwise it is considered to be discon nected. Notice that such an interpretation of the discon nection vector’s entries is done afterwards by the partition detectorE PP D. It is worth mentioning thatV BDDconsid ers only disconnection/reconnection of correct processes. Indeed, by construction, the disconnection detector is not able to suspect processes of being faulty. So, as mentioned in Section 2, we assume that every process does not crash while being disconnected. The algorithm for processpof our disconnection detec torV BDDfor partitionable networks which supports node mobility is presented in Algorithm 2. It has four tasks. The principle of the algorithm is to broadcast viaQS Bthe dis connection vectordvintoVBDDmessages when one of the following events is triggered: New neighbourhood, vol

paths = {{1,2,3}{2,3},{3}} 3 3

paths = {{1,2,3,4,5}{2,3,4,5}, 5 5 {3,4,5},{4,5},{5}}

1 paths = {{1}} 1

124