Reliable server pooling [Electronic resource]: evaluation, optimization and extension of a novel IETF architecture / by Thomas Dreibholz
267 pages
English

Description

Reliable Server Pooling: Evaluation, Optimization and Extension of a Novel IETF Architecture. Dissertation to obtain the academic grade doctor rerum naturalium (dr. rer. nat.) in Computer Science, submitted to the Faculty of Economics, Institute for Computer Science and Business Information Systems, University of Duisburg-Essen, by Dipl.-Inform. Thomas Dreibholz, born on 29.09.1976 in Bergneustadt, Germany. President of the University of Duisburg-Essen: Prof. Dr. Lothar Zechlin. Dean of the Faculty of Economics: Prof. Dr. Hendrik Schröder. Reviewers: 1. Prof. Dr.-Ing. Erwin P. Rathgeb, 2. Prof. Dr. Klaus Echtle. Submitted on: November 28, 2006. Date of Disputation: March 07, 2007.

Declaration of Independent Work (Selbständigkeitserklärung): I hereby declare that I have written this thesis independently, without outside help, and that I have used only the cited literature and aids. Thomas Dreibholz, November 28, 2006.

Abstract: The Reliable Server Pooling (RSerPool) architecture currently under standardization by the IETF RSerPool Working Group is an overlay network framework to provide server replication and session failover capabilities to applications using it. These functionalities as such are not new, but their combination into one generic, application-independent framework is. The initial goal of this thesis is to gain insight into the complex RSerPool mechanisms by performing experimental and simulative proof-of-concept tests. The further goals are to systematically validate the RSerPool architecture and its protocols, provide improvements and optimizations where necessary and propose extensions if useful.

Information

Published on: January 1, 2007
Language: English
File size: 8 MB

Excerpt

Reliable Server Pooling

Evaluation, Optimization and Extension of a Novel IETF Architecture

DISSERTATION

to obtain the academic grade doctor rerum naturalium (dr. rer. nat.)
in Computer Science

Submitted to the Faculty of Economics
Institute for Computer Science and Business Information Systems
University of Duisburg-Essen

by Dipl.-Inform. Thomas Dreibholz
born on 29.09.1976 in Bergneustadt, Germany

President of the University of Duisburg-Essen: Prof. Dr. Lothar Zechlin
Dean of the Faculty of Economics: Prof. Dr. Hendrik Schröder
Reviewers:
1. Prof. Dr.-Ing. Erwin P. Rathgeb
2. Prof. Dr. Klaus Echtle

Submitted on: November 28, 2006
Date of Disputation: March 07, 2007

Declaration of Independent Work (Selbständigkeitserklärung)

I hereby declare that I have written this thesis independently, without outside help, and that I have used only the cited literature and aids.

Thomas Dreibholz
November 28, 2006

Abstract

The Reliable Server Pooling (RSerPool) architecture currently under standardization by the IETF RSerPool Working Group is an overlay network framework to provide server replication and session failover capabilities to applications using it. These functionalities as such are not new, but their combination into one generic, application-independent framework is.
The initial goal of this thesis is to gain insight into the complex RSerPool mechanisms by performing experimental and simulative proof-of-concept tests. The further goals are to systematically validate the RSerPool architecture and its protocols, provide improvements and optimizations where necessary and propose extensions if useful. Based on these evaluations, recommendations to implementers and users of RSerPool should be provided, giving guidelines for the tuning of system parameters and the appropriate configuration of application scenarios. In particular, it is also a goal to transfer insights, optimizations and extensions of the RSerPool protocols from simulation to reality and also to bring the achievements from research into application by supporting and contributing relevant results to the IETF's ongoing RSerPool standardization process.
To achieve the described goals, a prototype implementation as well as a simulation model are designed and realized first. Using a generic application model and appropriate performance metrics, the performance of RSerPool systems in failure-free and server failure scenarios is systematically evaluated in order to identify critical parameter ranges and problematic protocol behaviour. Improvements developed as a result of these performance analyses are evaluated and finally contributed to the standardization process of RSerPool.

Keywords: Reliable Server Pooling, Evaluation, Optimization, Extension

Acknowledgements

This thesis is the result of my work as research associate in the Computer Networking Technology Group of the Institute for Experimental Mathematics at the University of Duisburg-Essen. At this point, I would like to express my acknowledgement to everybody who has supported me during my RSerPool research.
In particular, I would like to thank my primary advisor, Erwin P. Rathgeb, for his support of my RSerPool research and this thesis, as well as my secondary advisor Klaus Echtle for his thesis review.
Furthermore, I would like to express my special thanks to my former colleague Andreas Jungmaier for his great cooperation in the SCTPLIB/SOCKETAPI project (which is the basis of my RSPLIB prototype), as well as Michael Tüxen from the Münster University of Applied Sciences for his support of the RSPLIB project and our IETF standardization contributions. I would also like to thank my student assistants Sebastian Rohde (for writing the initial version of the Demonstration System GUI for the RSPLIB prototype) and Jobin Pulinthanath (for testing, as well as drawing a lot of my hand-written RSerPool sketches using the DIA drawing application program).
In addition, I would like to thank the Siemens ICN Information and Communication Networks department for supporting my RSerPool project during its first three years, as well as the Deutsche Forschungsgemeinschaft (DFG) for sponsoring the project for another four years.
Finally, I would like to thank my father Ernst Günter Dreibholz and my mother Annelore Dreibholz for the years of encouragement and support.

Contents

1 Introduction
1.1 Motivation
1.2 Scope and Related Work
1.2.1 Availability
1.2.2 Load Balancing
1.2.3 Fault Tolerance
1.2.4 Reliable Server Pooling
1.3 Goals of this Thesis
1.4 Organization of this Thesis
2 Networking Basics
2.1 The OSI and TCP/IP Networking Models
2.2 The Standardization of Network Protocols
2.3 The Network Layer
2.3.1 IP Version 4
2.3.2 IP Version 6
2.4 The Transport Layer
2.4.1 UDP
2.4.2 TCP
2.4.3 SCTP
2.4.3.1 Introduction
2.4.3.2 Packet Format
2.4.3.3 Association Establishment
2.4.3.4 Multi-Homing
2.4.3.5 Congestion Control
2.4.3.6 User Data Transport
2.4.3.7 Extensions
2.5 Summary
3 Reliable Server Pooling
3.1 Introduction
3.2 The Requirements for RSerPool
3.3 The RSerPool Architecture
3.4 A Migration Path for Legacy Applications
3.5 The Protocol Stack
3.6 The Application Scenarios
3.6.1 Telephone Signalling
3.6.2 Session Initiation Protocol (SIP)
3.6.3 IP Flow Information Export (IPFIX)
3.6.4 Load Balancing
3.6.5 Real-Time Distributed Computing
3.6.5.1 Requirements
3.6.5.2 Applicability of RSerPool
3.6.6 Mobility Support for SCTP
3.6.7 Other Application Scenarios
3.7 The RSerPool Components
3.7.1 Registrar
3.7.1.1 Announces
3.7.1.2 Pool Management
3.7.1.3 Pool Monitoring
3.7.1.4 Server Selection and Failure Reporting
3.7.1.5 Handlespace Audit and Handling of PR Failures
3.7.2 Pool Element
3.7.3 Pool User
3.8 The Protocol Design
3.9 The Aggregate Server Access Protocol
3.9.1 Overview
3.9.2 Pool Element Functionality
3.9.2.1 Registration and Reregistration
3.9.2.2 Monitoring
3.9.2.3 Deregistration
3.9.3 Pool User Functionality
3.9.3.1 Handle Resolution
3.9.3.2 Failure Report
3.9.4 Automatic Configuration Functionality
3.9.5 Session Layer Functionality
3.9.5.1 Data Channel and Control Channel
3.9.5.2 ASAP Cookies
3.9.5.3 ASAP Business Cards
3.10 The Endpoint Handlespace Redundancy Protocol
3.10.1 Overview
3.10.2 Automatic Configuration
3.10.2.1 The ENRP Presence Message
3.10.2.2 Dynamic and Static Peer Table Configuration
3.10.2.3 Maintaining Connections to Peer Registrars
3.10.2.4 Obtaining the Peer Table from Peer Registrars
3.10.3 Registrar Initialization
3.10.4 Handle Update
3.10.5 Handlespace Audit and Synchronization
3.10.6 Takeover Procedure
3.11 The Pool Member Selection Policies
3.11.1 Basics
3.11.2 Non-Adaptive Policies
3.11.2.1 Round Robin and Weighted Round Robin
3.11.2.2 Random and Weighted Random
3.11.3 Adaptive Policies
3.11.3.1 Least Used
3.11.3.2 Priority Least Used
3.12 The Mechanisms for Service Reliability and Availability
3.12.1 Failure Model
3.12.2 Mechanisms of the Transport Layer
3.12.3 Mechanisms of the Session Layer
3.12.4 Support for Redundancy Models
3.13 Security Considerations
3.14 Summary
4 The Handlespace Management
4.1 Introduction
4.2 Implementation History and Lessons Learned
4.3 An Abstract Handlespace Management Datatype
4.3.1 Handlespace Structure
4.3.2 Operations for the Pool Element Functionalities
4.3.2.1 Registration Handling
4.3.2.2 Reachability Timers
4.3.3 Operations for the Pool User Functionalities
4.3.4 Operations for the Registrar Functionalities
4.4 The Handlespace Management Design
4.4.1 Handlespace Data Structure
4.4.2 Policy Realizations
4.4.2.1 Helper Constructs
4.4.2.2 Round Robin
4.4.2.3 Weighted Round Robin
4.4.2.4 Least Used and Priority Least Used
4.4.2.5 Random and Weighted Random
4.4.3 Timer Management
4.4.4 Checksum Handling
4.4.5 Synchronization
4.4.6 Pool Handle Management
4.4.7 Storage Structures and Algorithms
4.4.8 Node-Linkage Implementation
4.5 The Handlespace Management Validation
4.5.1 Assertions
4.5.2 Consistency Checking Functions
4.5.3 Regression Tests
4.5.4 Validation Software
4.6 Summary
5 The RSPLIB Prototype Implementation
5.1 Introduction
5.2 The Requirements for the Prototype
5.3 The Design Decisions
5.4 The Foundation Components
5.4.1 Network Utilities
5.4.2 Timer Management and Event Callback Handling
5.5 The Registrar
5.6 The PU/PE Library
5.6.1 Building Blocks
5.6.2 The RSerPool API
5.6.2.1 The Basic Mode API
5.6.2.2 The Enhanced Mode API
5.7 The Demonstration System
5.8 The Prototype Implementation Validation
5.9 A Survey of Other RSerPool Implementations
5.9.1 Motorola
5.9.2 Cisco
5.9.3 Münster University of Applied Sciences
5.9.4 Interoperability Tests
5.10 Summary
6 The RSPSIM Simulation Model
6.1 Introduction
6.2 The Requirements for the Simulation Model
6.3 The Simulation Framework
6.3.1 A Discussion of Simulation Packages
6.3.1.1 LISP-based Simulation Package
6.3.1.2 OPNET MODELER
6.3.1.3 NS-2
6.3.1.4 OMNET++
6.3.1.5 Conclusion
6.3.2 Statistical Post-Processing
6.3.3 An Overview of the OMNET++ Discrete Event Simulator
6.3.4 The Simulation Tool Chain
6.4 The Simulation Model
6.4.1 The Network
6.4.2 The Foundation Modules
6.4.2.1 The Controller Module
6.4.2.2 The Transport Node Module
6.4.2.3 The Registrar Table Module
6.4.3 The RSerPool Modules
6.4.3.1 The Registrar Module
6.4.3.2 The Pool Element Module
6.4.3.3 The Pool User Module
6.5 The Simulation Model Validation
6.6 Summary
7 Handlespace Management Performance
7.1 Introduction
7.2 The Performance Metric
7.3 The Measurement Setup
7.4 The Operations Throughput of Different Storage Algorithms
7.4.1 Registration and Deregistration Operations
7.4.2 Re-Registration Operations
7.4.3 Timer Handling
7.4.4 Handle Resolution Operations
7.4.4.1 Introduction
7.4.4.2 An Unlimited Setting of MaxIncrement
7.4.4.3 A Reduced Setting of MaxIncrement
7.4.4.4 Summary
7.4.5 Synchronization Operations
7.4.6 Summary and Conclusion
7.5 The Scalability of the Number of Pool Elements
7.5.1 Registration and Deregistration Operations
7.5.2 Re-Registration Operations
7.5.3 Timer Handling
7.5.4 Handle Resolution Operations
7.5.5 Synchronization Operations
7.5.6 Summary
7.6 The Scalability of the Number of Pools
7.6.1 Analysis and Evaluation
7.6.2 Outlook on Future Scalability Enhancements
7.7 Using Leaf-Linked Trees
7.8 Summary
8 RSerPool Performance Results
8.1 Introduction
8.2 The Requirements for the Application Model
8.3 The Design of the Application Model
8.4 The Implementation of the Application Model
8.5 The Performance Metrics
8.5.1 Performance from the Service Provider's Perspective
8.5.2 Performance from the Service User's Perspective
8.6 The Simulation Scenario Setup and Results Presentation
8.7 Understanding the Impact of the Load Parameters
8.7.1 Quantifying Workload
8.7.2 Variation of PU:PE Ratio and Request Size
8.7.2.1 The Impact on the System Utilization
8.7.2.2 The Impact on the Handling Speed
8.7.3 Variation of the Request Size and Request Interval
8.7.4 Variation of the Request Interval and PU:PE Ratio
8.7.5 Summary
8.8 The Fallacies and Pitfalls of Round Robin Selection
8.8.1 Pitfalls of the Round Robin List Pointer
8.8.2 Pitfalls of the Weighted Round Robin Policy
8.8.3 Summary
8.9 The Impact of the Number of Registrars
8.10 The Challenge of Network Delay
8.10.1 General Effects of Network Delay
8.10.2 Creating Distance-Aware Policies
8.10.2.1 How to Quantify Distance?
8.10.2.2 An Environment for Distance-Aware Policies
8.10.2.3 The Definition of Distance-Aware Policies
8.10.3 A Proof of Concept
8.10.4 The Appropriate Choice of Parameters
8.10.5 Experimental Validation using the PLANETLAB
8.10.5.1 The PLANETLAB Environment
8.10.5.2 The Measurement Setup
8.10.5.3 The Results of a First Trial
8.10.5.4 The Results of a Long-Term Measurement
8.10.6 Summary
8.11 Configuring the PU-Side Cache Parameters
8.11.1 General Effects of the PU-Side Cache
8.11.2 When to Use the PU-Side Cache?
8.11.3 Summary
8.12 The Effects of Heterogeneous Server Capacities
8.12.1 Server Capacity Distribution Scenarios
8.12.1.1 A Single Powerful Server
8.12.1.2 Multiple Powerful Servers
8.12.1.3 Linear Capacity Distribution
8.12.1.4 Uniform Random Capacity Distribution
8.12.1.5 Truncated Normal Random Capacity Distribution
8.12.1.6 Summary
8.12.2 A Load-Increment-Aware Policy
8.12.2.1 A Single Powerful Server
8.12.2.2 Multiple Powerful Servers
8.12.2.3 Linear Capacity Distribution
8.12.2.4 Summary
8.13 Summary
9 RSerPool Failure Scenario Results
9.1 Introduction
9.2 Using Dynamic Pools
9.3 The Handling of Pool Element Failures
9.3.1 Introducing Server Failures
9.3.2 Session Monitoring by the Pool User
9.3.3 Pool Element Monitoring by the Home Registrar
9.3.3.1 General Effects of the Endpoint Keep-Alive Monitoring
9.3.3.2 How Costly is the Endpoint Keep-Alive Monitoring?
9.3.4 Reducing the Pool Element Monitoring Overhead
9.4 The Evaluation of Session Failover Mechanisms
9.4.1 Abort and Restart
9.4.2 Client-Based State Sharing
9.4.2.1 General Effects of a Cookie-Based Failover
9.4.2.2 How Costly is a State Cookie?
9.4.2.3 Limiting the Amount of Cookie Traffic
9.4.2.4 Summary
9.5 The Handling of Registrar Failures
9.6 Summary
10 Outlook and Conclusion
10.1 Achieved Goals and Obtained Results
10.1.1 The Prototype Implementation and the Simulation Model
10.1.2 The Handlespace Management
10.1.3 The Performance of RSerPool Systems
10.1.3.1 Workload Parameters
10.1.3.2 Fallacies and Pitfalls of the Round Robin Policies
10.1.3.3 Coping with Network Delay
10.1.3.4 Making Use of the PU-Side Cache
10.1.3.5 Scenarios of Heterogeneous Server Capacities
10.1.3.6 Pool Element Failure Detection Mechanisms
10.1.3.7 Session Failover Mechanisms
10.1.4 Standardization and Deployment of RSerPool
10.2 Outlook and Future Work
A RSerPool Message Types and Parameters
List of Algorithms
List of Figures
List of Tables
Bibliography
Index
Curriculum Vitae

Glossary

API: Application Programming Interface
AROS: Amiga Research Operating System
ASAP: Aggregate Server Access Protocol
ASCII: American Standard Code for Information Interchange
ATM: Asynchronous Transfer Mode
BMBF: Bundesministerium für Bildung und Forschung (German Federal Ministry of Education and Research)
BSD: Berkeley Software Distribution
CORBA: Common Object Request Broker Architecture
CRC: Cyclic Redundancy Check
CRC-32: Cyclic Redundancy Check, 32 bits
DFG: Deutsche Forschungsgemeinschaft (German Research Foundation)
DHCP: Dynamic Host Configuration Protocol
DiffServ: Differentiated Services
DNS: Domain Name System
DoD: U.S. Department of Defense
DoS: Denial of Service
DPF: Distance Penalty Factor
ENRP: Endpoint Handlespace Redundancy Protocol
FCS: Future Combat System
FDDI: Fibre Distributed Data Interface
FES: Future Event Set
FSM: Finite State Machine
FTP: File Transfer Protocol
GK: Gatekeeper
GNU: GNU's Not Unix
GPL: GNU General Public License
GPS: Global Positioning System
GUI: Graphical User Interface
HTTP: Hypertext Transfer Protocol
ICMP: Internet Control Message Protocol
ICMPv4: Internet Control Message Protocol, Version 4
ICMPv6: Internet Control Message Protocol, Version 6
ID: Identifier
IEEE: Institute of Electrical and Electronics Engineers
IETF: Internet Engineering Task Force
IGMP: Internet Group Management Protocol
IntServ: Integrated Services
IOS: (Cisco) Internet Operating System
IP: Internet Protocol
IPFIX: IP Flow Information Export
IPsec: IP Security
IPv4: Internet Protocol, Version 4
IPv6: Internet Protocol, Version 6
ISDN: Integrated Services Digital Network
ISO: International Organization for Standardization
ITU: International Telecommunication Union
LAN: Local Area Network
LGPL: GNU Lesser General Public License
LLC: Logical Link Control
LU: Least Used (Pool Policy)
LU-DPF: Least Used with Distance Penalty Factor (Pool Policy)
MAC: Media Access Control
MG: Media Gateway
MGC: Media Gateway Controller
MIB: Management Information Base
MMU: Memory Management Unit
MTBF: Mean Time Between Failures
MTU: Maximum Transmission Unit
NAM: Network Animator (NS-2)
NED: Network Definition Language (OMNeT++)
NGW: Network Gateway
NS-2: Network Simulator 2
OMNeT++: Objective Modular Network Testbed in C++
OSI: Open Systems Interconnection
PC: Personal Computer
PDA: Personal Digital Assistant
PDF: Portable Document Format
PE: Pool Element
PH: Pool Handle
PLU: Priority Least Used (Pool Policy)
PPE: Proxy Pool Element
PPID: Payload Protocol Identifier
PPP: Point-to-Point Protocol
PPU: Proxy Pool User
PR: Pool Registrar
PSTN: Public Switched Telephone Network
PU: Pool User
QoS: Quality of Service
RAND: Random (Pool Policy)
RFC: Request for Comments
RR: Round Robin (Pool Policy)
RSerPool: Reliable Server Pooling
RSVP: Resource Reservation Protocol
RTT: Round-Trip Time
SASP: Server/Application State Protocol
SCTP: Stream Control Transmission Protocol
SG: Signalling Gateway
SHA: Secure Hash Algorithm
SHA-1: Secure Hash Algorithm 1, 160 bits
SHA-256: Secure Hash Algorithm, 256 bits
SIGTRAN: Signalling Transport Working Group (IETF)
SIP: Session Initiation Protocol
SLA: Service Level Agreement
SLP: Service Location Protocol
SNMP: Simple Network Management Protocol
SOE: SCTP Offload Engine
SPoF: Single Point of Failure
SS7: Signalling System No. 7
SSH: Secure Shell
TCP: Transmission Control Protocol
TLV: Type-Length-Value
TOE: TCP Offload Engine
UDP: User Datagram Protocol
VoIP: Voice over IP
VPN: Virtual Private Network
WAN: Wide Area Network
WG: Working Group (IETF)
WRAND: Weighted Random (Pool Policy)
WRAND-DPF: Weighted Random with Distance Penalty Factor (Pool Policy)
WRR: Weighted Round Robin (Pool Policy)

Chapter 1

Introduction

Services requiring a certain minimum availability are becoming increasingly important. In this chapter, the necessity to ensure service availability is motivated first. After that, the scope of Reliable Server Pooling, the IETF's novel framework to achieve such availability, is presented. This is followed by a short introduction to Reliable Server Pooling itself and an overview of the goals of this thesis.

1.1 Motivation

The Internet is becoming increasingly important for all kinds of applications. In particular, there is also a growing number of availability-critical services being provided in the Internet. Consider an e-commerce scenario offering a virtual shop: as long as the Internet is working, the service provider and its customers can be satisfied. For an e-shop provider, the Internet provides an inexpensive platform to offer products without geographical limitations. Furthermore, the Internet allows a customer to conveniently visit shops online. But in case of a service failure, the provider cannot gain revenue and, even worse, the annoyed customers simply use another service and never come back again. Clearly, the availability of a service like e-commerce is critical, and appropriate actions have to be taken to keep such a service running, even in the case of server failures and network problems.
Formally, the availability of services can be classified as in Gray and Siewiorek (1991): so-called fault-tolerant systems have an availability of 99.99%, i.e. an average downtime of about 50 minutes per year. Systems in the high-availability category reduce the downtime by a further order of magnitude, to an availability of 99.999%, i.e. the average downtime is reduced to about 5 minutes per year. To actually achieve a certain minimum availability, mechanisms from multiple research areas are combined. These basics are introduced in the following section.
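The downtime figures follow directly from the availability values: the unavailable fraction 1 - A multiplied by the length of a year. The short Python sketch below does this arithmetic, assuming a non-leap year of 525,600 minutes; the function name is hypothetical and only serves as an illustration.

```python
# Assumption: a non-leap year of 365 days, i.e. 525,600 minutes.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Expected yearly downtime in minutes for an availability given as a fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, availability in [("fault-tolerant", 0.9999), ("high availability", 0.99999)]:
    minutes = downtime_minutes_per_year(availability)
    print(f"{label}: {availability:.3%} -> {minutes:.1f} minutes per year")
```

For 99.99% and 99.999% this yields roughly 52.6 and 5.3 minutes per year, which the classification rounds to about 50 and 5 minutes.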

1.2 Scope and Related Work

Although availability has become increasingly important and there are various application-specific solutions to ensure the availability of certain services, there had been no activities to standardize a generic, application-independent framework until the IETF started to define such an approach in 2001: Reliable Server Pooling (RSerPool). While the primary motivation of RSerPool has been the availability of SS7-based telephone signalling components, the goal of RSerPool has been set to define an application-independent framework for availability-critical services that is usable for practically any kind of application.

Figure 1.1: Reliable Server Pooling and Related Concepts

While availability solutions are not entirely new, their combination into a single, unified and common framework (the RSerPool architecture) is. RSerPool combines the mechanisms of three different research areas, as illustrated in figure 1.1: availability, load balancing and fault tolerance. These three areas are described in the following, before RSerPool itself is introduced in more detail.

1.2.1 Availability

A basic method to improve the availability of a service is server replication. Instead of having one server representing a single point of failure, servers are simply duplicated. Most approaches like Linux Virtual Server (see LVS Project (2003)), Cisco™ Distributed Director (see Cisco Systems (2000)) or application-layer anycast (see Bhattacharjee et al. (1997)) simply map a client's session to one of the servers and use the Abort-and-Restart Principle in case of a server failure (i.e. the sessions of a failed server are lost and have to be restarted from scratch). While this approach is sufficient for its main application, web server farms, it causes unacceptable delays for long-lasting sessions and is useless for applications like video conferences or real-time transactions (see Uyar et al. (2004)). More sophisticated approaches like FT-TCP (Fault-Tolerant TCP, see Alvisi et al. (2001)), M-TCP (Migratory TCP, see Sultan et al. (2002)) or RSerPool (see Tüxen et al. (2006)) provide a Session Layer that allows the interrupted session to be resumed on a new server. A survey of methods for the necessary server state replication can be found in Wiesmann et al. (2000); a very handy technique is the client-based state sharing described in Dreibholz (2002).

1.2.2 Load Balancing

The existence of multiple servers for redundancy reasons automatically leads to the issues of Load Distribution and Load Balancing. While load distribution (see Berger and Browne (1999)) only refers to the assignment of work to a processing element, load balancing refines this definition by requiring the assignment to maintain a balance across the processing elements. This balance refers to an application-specific parameter like CPU load or memory usage.
A classification of load distribution algorithms can be found in Gupta and Bepari (1999); the two classes that are important for this thesis are adaptive and non-adaptive ones. Adaptive strategies base their assignment decisions on the processing elements' current status (e.g. CPU load) and therefore require up-to-date information. Non-adaptive algorithms, on the other hand, do not require such status data. An analysis of adaptive load distribution algorithms can be found in Kremien and Kramer (1992); performance evaluations for web server systems using different algorithms are presented in Colajanni and Yu (2002), Dykes et al. (2000) and Cardellini et al. (2000).
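The difference between the two classes can be illustrated with a minimal Python sketch, assuming a hypothetical `Server` record that carries a current load value between 0 and 1. Round robin stands in for a non-adaptive strategy and least-used selection for an adaptive one; both are simplified illustrations, not the selection procedures as specified for RSerPool.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Server:
    """Hypothetical server record: an identifier and its current load (0.0 to 1.0)."""
    name: str
    load: float


class RoundRobinSelector:
    """Non-adaptive: cycles through the servers and never looks at their status."""

    def __init__(self) -> None:
        self._counter = count()

    def select(self, servers: list[Server]) -> Server:
        return servers[next(self._counter) % len(servers)]


class LeastUsedSelector:
    """Adaptive: relies on current load values and picks the least-loaded server."""

    def select(self, servers: list[Server]) -> Server:
        return min(servers, key=lambda server: server.load)


pool = [Server("A", 0.9), Server("B", 0.2), Server("C", 0.5)]
rr = RoundRobinSelector()
lu = LeastUsedSelector()
print([rr.select(pool).name for _ in range(4)])  # ['A', 'B', 'C', 'A']
print(lu.select(pool).name)                      # B
```

The round robin selector keeps sending requests to the heavily loaded server A, whereas the least-used selector only works well as long as the load values it reads are up to date.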

1.2.3 Fault Tolerance

Fault tolerance denotes the property of a system that enables it to continue operating properly in the event of the failure of some of its components. While a general introduction to the concepts of this topic can be found in Echtle (1990), two sub-topics have to be emphasized: redundancy concepts and checkpointing. Important redundancy concepts are explained in Engelmann and Scott (2005): using the simple Active/Standby concept, one component is active while one or more backup components are activated if the primary one becomes unavailable. Furthermore, the standby component can be classified into cold-standby (the backup system has to be initialized first), warm-standby (there is a regular state replication between the active and the standby system) and hot-standby (the system state is replicated on every change, i.e. the backup system can losslessly take over the service of the primary system). The Active/Active approach, on the other hand, provides multiple active systems, allowing for a better utilization of the server resources.
Another important sub-topic of fault tolerance is the checkpointing mechanism. Checkpointing is a technique for rollback recovery: an application can set a checkpoint for its current state. In case of problems during service continuation, it can return to the saved state and try again. Checkpointing not only works for local applications (see Plank et al. (1995), where the state is simply saved to disk), but also for distributed networking applications (see León et al. (1993), Seligman and Beguelin (1994)). In this case, the state information is transmitted to other servers or, as proposed by Dreibholz (2002), to the client in the form of a state cookie.
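Client-based state sharing can be sketched as checkpointing into a cookie that the client stores. The Python sketch below assumes a hypothetical secret shared by all servers of a pool and protects the cookie with an HMAC so that a client cannot alter the saved state unnoticed; this integrity check, the JSON encoding and the function names are illustrative assumptions rather than the mechanism defined by Dreibholz (2002).

```python
import hashlib
import hmac
import json

# Assumption: all servers of the pool share this (hypothetical) secret, so any
# of them can verify a cookie that another one created.
POOL_SECRET = b"hypothetical-shared-pool-secret"


def make_state_cookie(state: dict) -> bytes:
    """Checkpoint: serialize the session state and append an HMAC-SHA256 tag."""
    payload = json.dumps(state, sort_keys=True).encode()
    tag = hmac.new(POOL_SECRET, payload, hashlib.sha256).hexdigest().encode()
    return payload + b"." + tag


def restore_from_cookie(cookie: bytes) -> dict:
    """Rollback recovery on another server: verify the tag, then reload the state."""
    # The hex tag contains no '.', so the last '.' separates payload and tag.
    payload, _, tag = cookie.rpartition(b".")
    expected = hmac.new(POOL_SECRET, payload, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("state cookie was modified or created with another key")
    return json.loads(payload)


# Server 1 processes part of a (hypothetical) job and sends a cookie to the client.
cookie = make_state_cookie({"job": 42, "items_done": 3})

# Server 1 fails; the client connects to server 2 and presents the stored cookie.
state = restore_from_cookie(cookie)
print(state["items_done"])  # 3: continue after item 3 instead of starting at 0
```

Instead of restarting the session from scratch, the new server continues from the last checkpoint; any work done after that checkpoint is still lost and has to be repeated.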

1.2.4 Reliable Server Pooling
The scope of RSerPool (see Tüxen et al. (2006)) is to provide an open, application-independent and highly available framework for the management of server pools and the handling of a logical communication (session) between a client and a pool. Essentially, RSerPool constitutes a communications-oriented overlay network, where its Session Layer allows for session migration comparable to Sultan et al. (2002) and Alvisi et al. (2001). While server state replication is highly application-dependent and out of the scope of RSerPool, RSerPool provides mechanisms to support arbitrary schemes¹. RSerPool is not restricted to a certain redundancy concept. Its pool management provides sophisticated server selection strategies² for load balancing, both adaptive and non-adaptive. Custom algorithms for new applications can be added easily, as described in Dreibholz and Rathgeb (2005b).
¹ See Tüxen and Dreibholz (2006b), Dreibholz (2002), Conrad and Lei (2005a).
² See Dreibholz and Rathgeb (2005d), Dreibholz and Rathgeb (2005e,c), Dreibholz, Rathgeb and Tüxen (2005).


Figure 1.2: The RSerPool Project Concept

1.3 Goals of this Thesis

As already mentioned in section 1.2, the research areas on which RSerPool is based are not entirely new, but their combination into a single, unified framework is. When our university's RSerPool project (and therefore the work on the contents of this thesis) was started in 2002, the IETF had already defined the requirements and goals of RSerPool as RFC in Tüxen et al. (2002). Several existing server pooling solutions had already been compared by the IETF (see Loughney et al. (2005)), but since none of the existing technologies had been found capable of satisfyingly fulfilling the requirements, it had been necessary to create a new framework. The description of this new framework (RSerPool) as well as basic definitions of the protocols had been published as Internet Drafts (an archive of all historic versions of these drafts can be found at Dreibholz (2006c)). But since nobody had developed an implementation, these documents still included various open issues, flaws and errors. For example, the protocol drafts had only provided the names of a few server selection procedures, but did not actually define them. Some necessary or useful fields in the protocol messages had been missing, and the auto-configuration had required users (many, possibly untrustworthy) to send multicast messages (a waste of bandwidth and a security flaw). Furthermore, without possibilities to perform proof-of-concept tests or at least to evaluate the new protocols simulatively, it had not been clear that the RSerPool framework would work as intended.
The initial goal of this thesis therefore has been to gain insight into the complex RSerPool mechanisms by performing proof-of-concept tests using lab experiments and by performing simulations covering a more comprehensive range of parameter values. Further goals have been to systematically validate the RSerPool architecture and its protocols, provide improvements and optimizations where necessary and propose extensions if useful for the applicability of RSerPool. Based on these evaluations, recommendations to implementers and users of RSerPool should be provided, giving guidelines for the tuning of system parameters and the appropriate configuration of application scenarios. In particular, it is also a goal to transfer insights, optimizations and extensions of the RSerPool protocols from simulation to reality, and to bring the achievements from research into application by supporting and contributing relevant results to the IETF's ongoing RSerPool standardization process.


In order to fulfil the goals of this thesis, the following work items have been identified:
• Design and implementation of an RSerPool prototype for tests in real networks and to support standardization;
• Design and implementation of an RSerPool simulation model;
• Development of an efficient and flexible handlespace management component, required for both the prototype implementation and the simulation model;
• Definition of a generic application model and performance metric, as well as its realization for both the prototype implementation and the simulation model;
• Simulative evaluation of the RSerPool and related application parameters in order to identify critical parameter ranges and problematic protocol behaviour;
• Finding improvements and validating their effectiveness, as well as
• Finally contributing the improvements to the standardization process.
An illustration of the coherence of the work areas is provided in figure 1.2.

1.4 Organization of this Thesis

This thesis is structured as follows: first, a short overview of the necessary networking basics is provided in chapter 2. After that, a detailed description of the RSerPool framework is given in chapter 3, including design decisions, application scenarios, the two RSerPool protocols ASAP and ENRP, and a detailed overview of the availability mechanisms used in RSerPool systems.
Since an efficient management of the handlespace (i.e. the set of pools and their servers) is crucial for both the prototype implementation and the simulation model, chapter 4 presents the developed handlespace management approach first. Afterwards, the actual implementation of the prototype RSPLIB is introduced in chapter 5, and the simulation model RSPSIM is described in chapter 6. Chapter 7 deals with the performance evaluation of the handlespace management approach. The actual evaluation of the RSerPool system performance is presented in chapter 8 (for failure-free scenarios) and in chapter 9 (for server failure scenarios). The final chapter of this thesis (chapter 10) provides a conclusion and an outlook on future work.


Chapter 2

Networking Basics

In this chapter, the networking basics important for this thesis are presented. The intention of the following sections is to outline the context of Reliable Server Pooling, as well as to determine important definitions and terminology. Since the particular networking basics are well documented, only a short introduction is given here. For a detailed description, see the provided references and computer networks textbooks like Tanenbaum (1996).

2.1 The OSI and TCP/IP Networking Models

The two most important networking models are the Open Systems Interconnection Model (OSI Model) by the International Organization for Standardization (ISO) and the TCP/IP Reference Model (TCP/IP Model) by the U.S. Department of Defense (DoD); they are presented in figure 2.1. The idea of both models is to divide up the functionalities of a complex networking system into Layers, each one built upon the one below it. The purpose of each layer is to provide a defined Service to the upper layers, shielding them from any details of how this service is actually realized.
The entities on different machines that comprise the corresponding layers are denoted as Peers. Two peers communicate using rules and conventions called a Protocol. A Protocol Stack is a set of protocols used by a certain system, defining one protocol per layer. In order to transmit a message from layer n on one machine to layer n on another one, each layer passes its data and control information to its lower layer, until the lowest one is reached. On the other machine, this procedure is repeated in reverse order. The operations and services a layer offers to its upper one define a layer's Interface. Due to the defined interface, an implementation for a layer can simply be replaced by another one without any change to the other layers. This makes it easy to adapt a complex networking stack to changing requirements.

Figure 2.1: The OSI and TCP/IP Networking Models

The seven layers of the OSI model have the following functionalities:
Physical Layer: This layer handles the physical transmission of data over a certain medium. In particular, it defines how to send bits, e.g. via copper or fibre wires, or how to transmit them via a radio link.
Data Link Layer: The second layer provides the functionalities to transfer data between network entities. This incorporates physical addressing as well as data framing and error correction. A data unit handled by this layer is denoted as Frame.
The Data Link Layer is often separated into two sub-layers: the Medium Access Control (MAC) sub-layer handles the ordered access to the physical medium; the Logical Link Control (LLC) sub-layer copes with upper-layer protocol identification and data framing as well as error detection and recovery.
Network Layer: Functionalities for transferring variable-length data sequences from a source to a destination endpoint via one or more networks are provided by this layer. In particular, it provides a logical, hierarchical addressing scheme and network routing. The data entity handled by this layer is denoted as Packet.
Transport Layer: The fourth layer provides the transfer of user data and includes the segmentation and reassembly of large data blocks, reliable transport, flow control and congestion handling. A data entity of this layer is called Segment.
Session Layer: In the fifth layer, dialogue control between end-user applications is realized. This includes mechanisms for duplex or half-duplex operation and the definition of checkpoints, adjournment and restart of a Session.
Presentation Layer: The Presentation Layer is responsible for translating data encodings between different end-user systems. For example, this could mean converting data between ASCII and Unicode representations. In particular, this layer also provides encryption and decryption as well as compression and decompression of data.
Application Layer: This layer provides the actual service of the end-user application.
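The passing of data down one stack and up the other, as described above, can be pictured as encapsulation: each layer wraps the data of the layer above with its own control information, and the receiving side strips this information in reverse order. The following Python sketch uses made-up textual headers such as `[Transport]`; they are illustrative only and do not correspond to any real protocol format (real Data Link frames, for instance, also carry a trailer).

```python
# Toy headers only; real protocols use binary formats (and, e.g., frame trailers).
LAYERS = ["Application", "Presentation", "Session", "Transport",
          "Network", "Data Link"]          # the Physical Layer just transmits bits


def header(layer: str) -> bytes:
    return f"[{layer}]".encode()


def send(payload: bytes) -> bytes:
    """Pass data down the stack; each layer prepends its control information."""
    pdu = payload
    for layer in LAYERS:
        pdu = header(layer) + pdu
    return pdu


def receive(pdu: bytes) -> bytes:
    """Repeat the procedure in reverse order: each layer strips its own header."""
    for layer in reversed(LAYERS):
        h = header(layer)
        if not pdu.startswith(h):
            raise ValueError(f"missing {layer} header")
        pdu = pdu[len(h):]
    return pdu
```

The outermost header belongs to the lowest layer, which is why the receiver has to start stripping at the Data Link Layer.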
The seven-layer model is sometimes extended by two more layers: the Financial Layer (layer 8) and the Political Layer (layer 9). While this can mainly be seen as humour, it is not too far away from reality.
The TCP/IP model simplifies the OSI model by a reduction from seven to four layers:
Host-to-Network Layer: This layer combines the functionalities of the OSI model's Physical and Data Link layers. That is, it includes the physical transmission as well as the controlled access to a transport medium.


Internetwork Layer: The Internetwork Layer corresponds to the Network Layer of the OSI model. Therefore, each Internetwork Layer protocol can also be seen as a Network Layer protocol. Important protocols of this layer are IPv4 (see subsection 2.3.1) and IPv6 (see subsection 2.3.2).

Transport Layer: The Transport Layer of the TCP/IP model directly maps to the OSI model's corresponding layer. The most important Transport Layer protocols are TCP (see subsection 2.4.2), UDP (see subsection 2.4.1) and SCTP (see subsection 2.4.3).

Application Layer: The Application Layer of the TCP/IP model combines the functionalities of the OSI model's Session, Presentation and Application Layers. Some important protocols of this layer are the HyperText Transfer Protocol (HTTP, see Fielding et al. (1999)), the File Transfer Protocol (FTP, see Postel and Reynolds (1985)) and the Simple Network Management Protocol (SNMP, see Harrington et al. (2002)).
Note that it is not possible to uniquely map Application Layer protocols of the TCP/IP model to the OSI model. For example, FTP provides the conversion of character encodings, which can be seen as a Presentation Layer functionality. Therefore, mappings in literature may vary.

Since Reliable Server Pooling in particular provides Session Layer functionalities, the OSI model provides a more fine-grained description. Therefore, it is used throughout this thesis. To overcome the ambiguity of the Application Layer mappings described above, the following convention is used: all protocols directly interacting with the service provided for the user are mapped to the OSI model's Application Layer.

2.2 The Standardization of Network Protocols

Next to their OSI model, the ISO has also defined its own protocols for each of the layers. Due to various reasons, none of these protocols achieved a wide distribution. For a detailed discussion of the reasons, see section 1.4.4 of Tanenbaum (1996). Probably one of the most important shortcomings of the OSI standards has been that a huge fee has to be paid for each (!) of their standards documents. This clearly did not motivate many people to deal with the ISO standards.
The creation of open standards, free to the public, has been the goal of the Internet Engineering Task Force (IETF). Unlike ISO standardization, the activities of the IETF are based on public mailing lists and meetings, open to every person interested in them. Standards are created by discussions and rough consensus; a resulting standards document is published as Request for Comments (RFC). All RFCs can be accessed on the IETF website (see IETF (2006)) by anybody interested, free of charge. Due to this conception of the IETF, in contrast to ISO standardization, the IETF protocols have become highly successful. Today, nearly all significant protocols of layer 3 and above are standardized by the IETF as RFCs, e.g. IPv4 and IPv6, TCP, SCTP and HTTP.

2.3 The Network Layer

In this section, the important Network Layer protocols IPv4 and IPv6 are shortly introduced. The main intention of this section is to show the aspects of these protocols that are relevant for this thesis. In particular, this includes address scoping and multicast handling. For a detailed introduction into both protocols, see the provided references.


2.3.1 IP Version 4

Version 4 of the Internet Protocol (IP), therefore denoted as IPv4, has been defined in September 1981 as RFC in Postel (1981b). Until today (2006), it is still the basis of the current Internet. IPv4 provides 32-bit addresses; therefore, the number of hosts is theoretically limited to 2^32 = 4,294,967,296. In practice, however, the number of possible hosts is of course significantly lower, due to reserved address spaces and inefficient address assignment.
Unlike global addresses, the scope of the following address spaces is limited: all addresses from 127.0.0.0 to 127.255.255.255 belong to a host itself. That is, these addresses can only be used between endpoints on the same host. 10.0.0.0 to 10.255.255.255, 172.16.0.0 to 172.31.255.255 and 192.168.0.0 to 192.168.255.255 are private address spaces that can be freely used within local networks. Addresses within these private spaces are not routed within the Internet.
Together with IPv4, the Internet Control Message Protocol (ICMP), version 4, therefore denoted as ICMPv4, has been defined in Postel (1981a). It provides functionalities for testing as well as for reporting errors like the unreachability of a destination address.
IPv4 also provides the possibility to send multicast messages. For this purpose, the address range from 224.0.0.0 to 239.255.255.255 is reserved. While the first four bits of the address are fixed, the remaining 28 bits specify the destination multicast group. All endpoints having joined the corresponding multicast group should receive a message addressed to the group's multicast address. The functionalities for joining and leaving a multicast group are provided by the Internet Group Management Protocol (IGMP). The latest version of the IGMP protocol is version 3; it is defined in Cain et al. (2002).
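The scoped and multicast IPv4 ranges listed above correspond to the CIDR prefixes 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 and 224.0.0.0/4. The following sketch uses Python's standard `ipaddress` module to classify an address and to extract the 28 group bits of a multicast address; the function names and the fallback label "global" (meaning merely "none of the listed ranges", other special-purpose ranges are ignored) are assumptions made for illustration.

```python
import ipaddress

# Limited-scope and multicast IPv4 ranges from the text, as CIDR prefixes.
HOST_LOCAL = ipaddress.ip_network("127.0.0.0/8")
PRIVATE = [ipaddress.ip_network(p)
           for p in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
MULTICAST = ipaddress.ip_network("224.0.0.0/4")


def classify_ipv4(text: str) -> str:
    addr = ipaddress.IPv4Address(text)
    if addr in HOST_LOCAL:
        return "host-local"
    if any(addr in net for net in PRIVATE):
        return "private"
    if addr in MULTICAST:
        return "multicast"
    return "global"   # i.e. none of the ranges listed above


def multicast_group_bits(text: str) -> int:
    """Return the 28 bits following the fixed 4-bit prefix 1110."""
    addr = ipaddress.IPv4Address(text)
    if addr not in MULTICAST:
        raise ValueError(f"{text} is not an IPv4 multicast address")
    return int(addr) & 0x0FFFFFFF
```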

2.3.2 IP Version 6

Mainly due to the address space limitations of IPv4, version 6 of the Internet Protocol (IPv6) has been developed and standardized¹ as RFC in Deering and Hinden (1998b). In contrast to IPv4, IPv6 uses a simplified header format and 128-bit addresses. This provides the theoretical possibility for up to 2^128 ≈ 3.402824 · 10^38 addresses. Even in pessimistic mapping scenarios, this provides sufficient space for millions of addresses per square metre of the earth's surface.
The textual address representation is defined in section 2.2 of Hinden and Deering (2006): an IPv6 address is written as eight colon-separated groups of four hexadecimal digits (i.e. 16 bits). Leading zeros in a group may be omitted. Example: 2001:638:501:4ef1:204:23ff:fe9f:1087. In order to simplify the representation, a double colon "::" indicates one or more groups of zeros. This compression may occur at most once in an address. It may also be used to compress leading or trailing zeros. Examples: 2001:5c0:0:2::24, 2001:638:501:4ef1::, ::1. Subnet masks are written in prefix notation (i.e. the length of the subnet mask in bits). Examples: 2001:638:501:4ef1::/64, 2001::/16.
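These representation rules can be checked with Python's standard `ipaddress` module, whose `exploded` and `compressed` attributes produce the full and the shortened form of an address. The helper names below are hypothetical; the example addresses are the ones from the text.

```python
import ipaddress


def expand(text: str) -> str:
    """Full form: eight colon-separated groups of four hexadecimal digits."""
    return ipaddress.IPv6Address(text).exploded


def compress(text: str) -> str:
    """Short form: leading zeros omitted, one run of zero groups replaced by '::'."""
    return ipaddress.IPv6Address(text).compressed


def prefix_length(text: str) -> int:
    """Length of the subnet mask in bits for a prefix such as '2001::/16'."""
    return ipaddress.IPv6Network(text).prefixlen
```

For instance, `compress("2001:05c0:0000:0002:0000:0000:0000:0024")` yields `2001:5c0:0:2::24`: the longest run of zero groups is replaced by "::", while the single zero group is kept.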
Like IPv4, IPv6 also provides different address scopes (defined in Hinden and Deering (2006)):
• ::1/128 is the loopback address and can only be used by an endpoint itself.
• fe80::/10 denotes the Link-Local address space. It is only valid for the local physical link, e.g. an Ethernet segment.
• fec0::/10 denotes the Site-Local address space. It is equivalent to a private address space in IPv4. The use of site-local addresses is deprecated as defined in the RFC Huitema and Carpenter (2004). That is, they should not be used anymore.
¹ The version number 5 had already been assigned to an experimental protocol, the Internet Stream Protocol (ST). Therefore, the successor of IPv4 is IPv6.


• fc00::/7 denotes Local IPv6 Unicast Addresses. This is the replacement for site-local addresses, defined in Hinden and Haberman (2005). Although only routed locally (e.g. in the network of a company), they are globally unique with a very high probability (due to a pseudo-random global ID part). The property of uniqueness solves multiple problems described in Huitema and Carpenter (2004).

Together with IPv6, the Internet Control Message Protocol, version 6, therefore denoted as ICMPv6, has been defined in Deering and Hinden (1998a). Like ICMPv4, it provides functionalities for testing as well as for reporting errors like the unreachability of a destination address.
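Analogously to IPv4, IPv6 also provides multicast capabilities. The address space ff00::/8 has been reserved for this purpose. To join or leave a multicast group, IPv6 does not use IGMP itself but its counterpart, the Multicast Listener Discovery (MLD) protocol, which is carried in ICMPv6 messages; its version 2 is derived from IGMPv3 (see Cain et al. (2002)).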
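The address scopes listed above, together with the multicast prefix ff00::/8, can be told apart by a simple prefix lookup. The sketch below does this with Python's standard `ipaddress` module; the scope labels and the fallback "global" are illustrative assumptions, and further special-purpose ranges are ignored.

```python
import ipaddress

# Scoped prefixes from the text; they do not overlap, so the order is irrelevant.
SCOPES = [
    ("loopback", ipaddress.ip_network("::1/128")),
    ("link-local", ipaddress.ip_network("fe80::/10")),
    ("site-local (deprecated)", ipaddress.ip_network("fec0::/10")),
    ("unique local", ipaddress.ip_network("fc00::/7")),
    ("multicast", ipaddress.ip_network("ff00::/8")),
]


def ipv6_scope(text: str) -> str:
    addr = ipaddress.IPv6Address(text)
    for name, net in SCOPES:
        if addr in net:
            return name
    return "global"   # i.e. none of the prefixes listed above
```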
An important difference to IPv4 is that IPv6 mandatorily requires the support of IPsec. That is, it supports authentication and confidentiality in the Network Layer by default. For details on IPsec, see the RFCs Kent and Atkinson (1998c,a,b). IPsec is optional for IPv4.
Another important property of IPv6 is the availability of a so-called Flow Label in the IPv6 header. A flow label is a 20-bit number that is, together with its IPv6 source address, network-unique for a flow to a certain destination. If the upper-layer data is encrypted using IPsec and therefore unreadable for network nodes other than the receiver, the flow label can be used to identify certain flows, e.g. to ensure Quality of Service (QoS, see Dreibholz (2001) for details) properties like an assured bandwidth. An idea to add a flow label option to IPv4 is proposed in Dreibholz (2005a).
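The flow label occupies the low 20 bits of the first 32-bit word of the IPv6 header, after the 4-bit version field and the 8-bit traffic class. The sketch below packs and unpacks this word with Python's `struct` module; the function names are hypothetical and the rest of the header is omitted. A network node identifying flows would use the pair (source address, flow label) as key, which works even if the upper-layer data is encrypted.

```python
import struct


def pack_first_word(traffic_class: int, flow_label: int) -> bytes:
    """First 32 bits of an IPv6 header: version (4), traffic class (8), flow label (20)."""
    if not 0 <= traffic_class < 2 ** 8:
        raise ValueError("traffic class must fit into 8 bits")
    if not 0 <= flow_label < 2 ** 20:
        raise ValueError("flow label must fit into 20 bits")
    return struct.pack("!I", (6 << 28) | (traffic_class << 20) | flow_label)


def unpack_first_word(header: bytes):
    """Return (version, traffic class, flow label) from the first 4 bytes."""
    (word,) = struct.unpack("!I", header[:4])
    return word >> 28, (word >> 20) & 0xFF, word & 0xFFFFF


def flow_key(source_address: str, header: bytes):
    """A flow is identified by the source address together with the flow label."""
    return source_address, unpack_first_word(header)[2]
```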

2.4 The Transport Layer

In this section, the important Transport Layer protocols UDP, TCP and SCTP are introduced. Since UDP and TCP are well-known standards and ample documentation is available, the introduction here is quite short and only outlines the important facts necessary to understand their differences to SCTP. The SCTP protocol, being very relevant for Reliable Server Pooling, is explained in more detail.

2.4.1 UDP

The User Datagram Protocol (UDP), defined as RFC in Postel (1980), provides a connection-less, message-oriented transmission of data. A message is also denoted as Datagram, hence the protocol's name. To provide the possibility of multiple UDP instances per endpoint, UDP uses a 16-bit Port number to identify source and destination instance in the Transport Layer.
The message framing is preserved by UDP. That is, if the upper layer transmits a message of size n, the destination endpoint will provide the full message to its upper layer. The 16-bit Internet Checksum (defined in Braden et al. (1988)) is used to detect transmission errors. In case of an error, the whole message is dropped. UDP does not ensure the sequence of the datagrams. That is, they may be delivered in any order at the receiver side. This is denoted as unordered delivery. Furthermore, the transport of UDP is unreliable. There is no mechanism in UDP to detect the loss of messages. If lossless transmission is required, this has to be ensured by the upper layer. Finally, UDP does not provide any flow and congestion control.
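The Internet Checksum mentioned above is the 16-bit one's complement of the one's complement sum of the data, taken as 16-bit words. The Python sketch below computes it in the straightforward way; it is a simplification in that the real UDP checksum also covers a pseudo-header containing the IP addresses, which is omitted here.

```python
def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A receiver can verify a message by computing the checksum over the data including the transmitted checksum: the result is zero for an undamaged message.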


2.4.2 TCP

Unlike UDP, the Transmission Control Protocol (TCP), defined as RFC in Postel (1981c), provides a connection-oriented, stream-oriented transmission of byte streams. Before any user data can be transferred, a connection has to be established using a 3-way handshake. TCP does not preserve message framing, e.g. a sender may send messages of n bytes while the receiver side receives them byte-wise. For the receiver, it is not possible to determine the original blocks given by the sender's upper layers to the Transport Layer. If the preservation of message framing is required, this has to be realized by the upper layers. TCP provides a reliable service: all data sent by the sender will be received on the receiver side in exactly the same order. Lost segments will be detected and retransmitted. Furthermore, TCP provides window-based flow and congestion control. That is, the transmission speed is controlled based on the receiver's and the network's current capabilities. Like UDP, the TCP protocol uses 16-bit port numbers to identify instances on the same endpoint. Furthermore, the data correctness is ensured using the 16-bit Internet Checksum (see Braden et al. (1988)).
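Since TCP delivers an unstructured byte stream, an upper layer that needs message boundaries has to add them itself. A common approach, shown here as a Python sketch, is to prefix each message with its length; the 32-bit length field and the names `frame` and `Deframer` are assumptions for illustration, not part of TCP.

```python
import struct


def frame(message: bytes) -> bytes:
    """Prefix a message with its length as a 32-bit big-endian integer."""
    return struct.pack("!I", len(message)) + message


class Deframer:
    """Reassembles complete messages from arbitrarily split byte-stream data."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list:
        self._buffer += chunk
        messages = []
        while len(self._buffer) >= 4:
            (length,) = struct.unpack_from("!I", self._buffer)
            if len(self._buffer) < 4 + length:
                break                          # wait for the rest of the message
            messages.append(bytes(self._buffer[4:4 + length]))
            del self._buffer[:4 + length]
        return messages
```

The receiver may be fed the stream in arbitrary pieces, even byte-wise, and still returns the original messages. With UDP or SCTP, this extra layer is unnecessary because message framing is preserved.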

2.4.3 SCTP

The Stream Control Transmission Protocol (SCTP) was originally intended to transport SS7 (Signalling System No. 7)² telephone signalling over IP networks. Since SS7 networks have strict availability requirements, these demands also have to be fulfilled by the transport over IP. Nevertheless, various other application scenarios have been found. This section gives an introduction to the SCTP protocol.

2.4.3.1 Introduction

The SCTP protocol has been defined in 2000 by the IETF Signalling Transport Working Group (SIGTRAN) as RFC in Stewart et al. (2000), with some updates in Stone et al. (2002) and Stewart, Arias-Rodriguez, Poon, Caro and Tüxen (2006). An overview of SCTP is provided in Jungmaier (2005), Jungmaier et al. (2001, 2000a) and Ong and Yoakum (2002); application scenarios are described in Coene (2002).
SCTP provides a connection-oriented, message-oriented transport of data. To provide the identification of Transport Layer instances, a 16-bit port number is used, as for TCP and UDP. All SCTP messages are protected against transmission errors using the 32-bit CRC32c checksum³ (a CRC-32 variant based on the Castagnoli polynomial) as defined in Stone et al. (2002). The originally used Adler-32 checksum has been replaced due to its error detection weaknesses. In the following, the important features of SCTP are shortly explained.
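The 32-bit checksum introduced by Stone et al. (2002) is CRC32c, which can be computed bit by bit with the reflected Castagnoli polynomial 0x82F63B78. The Python sketch below is deliberately simple rather than fast (real implementations use lookup tables or dedicated processor instructions); for SCTP, it would be computed over the whole packet with the checksum field set to zero.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32c (Castagnoli), using the reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF
```

The standard check value for the input `123456789` is 0xE3069283, which distinguishes CRC32c from the IEEE CRC-32 used by Ethernet (check value 0xCBF43926).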

2.4.3.2 Packet Format

The format of an SCTP packet is presented in figure 2.2. It always includes source and destination port numbers, the checksum and a Verification Tag. The verification tag is a 32-bit random number, negotiated at connection setup for each transmission direction. It has to be equal for all following packets in the same direction. This is used as a security mechanism against blind attacks: an attacker being unable to read the packet stream not only has to guess the port numbers (which are in many cases fixed), but also has to guess the verification tag.
² See Gradischnig and Tüxen (2001), Gradischnig et al. (2000), Jungmaier et al. (2000a), Jungmaier (2005).
³ A CRC-32 checksum does not denote a sum in the mathematical sense. Nevertheless, this is the terminology used by the standards documents.

Figure 2.2: The SCTP Packet Format

Each SCTP packet includes one or more Chunks, which may include user data (Data Chunk) or control data (Control Chunk). Extensions to the SCTP protocol (see subsubsection 2.4.3.7) can be easily realized by defining new types of control chunks.
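The 12-byte common header (source port, destination port, verification tag, checksum) and the chunks following it can be decoded as sketched below in Python. The sketch assumes the generic chunk header of 8-bit type, 8-bit flags and 16-bit length, with chunks padded to a multiple of four bytes; chunk contents are not interpreted, checksum verification is omitted, and all names are hypothetical. The `accept` function models the receiver's check against blind attacks: a packet is only considered if its verification tag matches the one negotiated for the association.

```python
import struct
from typing import NamedTuple


class CommonHeader(NamedTuple):
    source_port: int
    destination_port: int
    verification_tag: int
    checksum: int


def parse_common_header(packet: bytes) -> CommonHeader:
    """Decode the 12-byte common header that precedes all chunks."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the SCTP common header")
    return CommonHeader(*struct.unpack("!HHII", packet[:12]))


def iter_chunks(packet: bytes):
    """Yield (type, flags, value) for each chunk; values are not interpreted."""
    offset = 12
    while offset + 4 <= len(packet):
        ctype, flags, length = struct.unpack_from("!BBH", packet, offset)
        if length < 4:
            raise ValueError("invalid chunk length")
        yield ctype, flags, packet[offset + 4:offset + length]
        offset += (length + 3) & ~3      # chunks are padded to 4-byte boundaries


def accept(packet: bytes, local_port: int, expected_tag: int) -> bool:
    """Blind-attack check: port and verification tag must match the association."""
    hdr = parse_common_header(packet)
    return hdr.destination_port == local_port and hdr.verification_tag == expected_tag
```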

2.4.3.3 Association Establishment

Before any user data can be sent, it is first necessary to establish a connection, called Association in SCTP terminology. This is realized by a 4-way handshake as shown in figure 2.3. To set up an association, endpoint A first sends an INIT control chunk to the peer instance B. If B accepts the setup request, it stores all information necessary for the association setup into a so-called Cookie, which is then cryptographically signed using a secret key. The cookie is sent back to A using an INIT-ACK chunk. After that, B deallocates all resources related to the new association. A handles the cookie as an arbitrary byte vector and must return it without any changes to B using a COOKIE-ECHO chunk. B can verify its authenticity using its secret key and can subsequently establish the association using the information from the cookie. After that, it confirms the successful association setup using a COOKIE-ACK chunk.
The 4-way handshake solves an important problem of TCP's 3-way handshake: for TCP, an attacker could use a spoofed source address to send SYN requests (see Postel (1981c)) to a TCP server. Upon reception of the SYN, the TCP instance will reserve resources for the new connection. Even if the sender does not exist, the resources remain reserved for several minutes (depending on the TCP implementation). If an attacker sends large numbers of SYNs (known as SYN Flooding attack), this will lead to a Denial of Service (DoS), preventing legitimate users of the server from connecting to it due to a lack of resources.
On the other hand, using the 4-way handshake of SCTP, the sender of the INIT must receive the INIT-ACK to be able to reply with a COOKIE-ECHO. That is, its endpoint address must be valid. Otherwise, the SCTP server will not reserve any resources for the association.
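The essential trick of the 4-way handshake is that server B keeps no state between INIT and COOKIE-ECHO: everything it needs travels inside the signed cookie. The Python sketch below illustrates this idea; the cookie layout (a JSON body prefixed by an HMAC-SHA256 tag), the 60-second lifetime and all names are assumptions for illustration, since the concrete cookie format is chosen by each SCTP implementation.

```python
import hashlib
import hmac
import json
import os

SECRET_KEY = os.urandom(32)      # hypothetical secret key of server B
TAG_LEN = 32                     # length of an HMAC-SHA256 tag in bytes


def make_cookie(association_params: dict, now: float) -> bytes:
    """On INIT: put everything needed into a signed cookie and keep no state."""
    body = json.dumps({"params": association_params, "created": now},
                      sort_keys=True).encode()
    tag = hmac.new(SECRET_KEY, body, hashlib.sha256).digest()
    return tag + body            # sent to A inside the INIT-ACK


def open_cookie(cookie: bytes, now: float, lifetime: float = 60.0):
    """On COOKIE-ECHO: verify authenticity and age, then return the parameters."""
    tag, body = cookie[:TAG_LEN], cookie[TAG_LEN:]
    expected = hmac.new(SECRET_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None              # forged or modified cookie: allocate nothing
    state = json.loads(body)
    if now - state["created"] > lifetime:
        return None              # stale cookie
    return state["params"]       # only now are association resources allocated
```

An attacker with a spoofed source address never receives the INIT-ACK and therefore cannot produce a valid COOKIE-ECHO, while B has spent no memory on the attempt.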

Figure 2.3: The SCTP Association Setup

Figure 2.4: The Multi-Homing Feature of SCTP

2.4.3.4 Multi-Homing

Presumably the most important feature of SCTP is Multi-Homing. That is, endpoints of an association can utilize more than one Network Layer address, e.g. due to multiple network cards. Figure 2.4 illustrates the principle: SCTP supports this fact by viewing each Network Layer address of an association peer endpoint as a possible Path to this endpoint. A Network Layer address can in principle belong to an arbitrary Network Layer protocol. In particular, an endpoint may also use multiple Network Layer protocols, e.g. IPv4 and IPv6. That is, a peer endpoint can be reached under the address of each path. If there are n disjoint paths within the network, any n-1 paths may fail without interrupting the association. SCTP monitors each of an association's paths using HEARTBEAT chunks. An endpoint receiving such a chunk has to reply to it using a HEARTBEAT-ACK chunk. By receiving the HEARTBEAT-ACK chunk upon its HEARTBEAT chunk, an endpoint has verified that a path is actually usable.
For the transport of user data, one of the possible paths is chosen as the so-called Primary Path. If the primary path fails, another one is selected and a Path Failover is performed, transparently for the upper layers. Using the Add-IP extension described in subsubsection 2.4.3.7, it is furthermore possible to add new Network Layer addresses (and therefore paths) or remove them during association runtime.
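The path monitoring and failover described above can be sketched as a per-path error counter that is reset by each HEARTBEAT-ACK and incremented on each timeout. In the following Python sketch, the threshold `max_retrans` stands for SCTP's Path.Max.Retrans parameter (a default of 5 is assumed here), and the class and method names are hypothetical. The sender keeps using the primary path while it is active and otherwise switches to another active path.

```python
class PathManager:
    """Per-path error counters with primary path failover (simplified)."""

    def __init__(self, addresses, max_retrans=5):
        # max_retrans stands for Path.Max.Retrans; 5 is an assumed default.
        self.error_count = {addr: 0 for addr in addresses}
        self.max_retrans = max_retrans
        self.primary = addresses[0]

    def is_active(self, addr) -> bool:
        return self.error_count[addr] <= self.max_retrans

    def on_heartbeat_ack(self, addr):
        # A HEARTBEAT-ACK proves that the path is usable again.
        self.error_count[addr] = 0

    def on_timeout(self, addr):
        # An unanswered HEARTBEAT (or retransmission timeout) counts as an error.
        self.error_count[addr] += 1

    def current_path(self):
        """Primary path if active; otherwise fail over to another active path."""
        if self.is_active(self.primary):
            return self.primary
        for addr in self.error_count:
            if self.is_active(addr):
                return addr
        return None          # no usable path left
```

The mixed IPv4/IPv6 addresses in the usage below reflect that one endpoint may use several Network Layer protocols at once.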

2.4.3.5 Congestion Control

Using exactly one primary path, instead of balancing the traffic among multiple paths and utilizing multiple networks, has been chosen for congestion control reasons: the connection-less paradigm of IP networks provides no way to ensure that multiple paths are disjoint. In particular, it is not possible to decide whether congestion on two paths is caused by a single common link. SCTP realizes a per-path TCP-friendly congestion control. That is, utilizing n paths over the same bottleneck link would appear like n TCP-controlled flows. Clearly, since all traffic is caused by the same association, this would be quite unfair to other TCP-based flows. However, work is in progress to cope with this problem. See Jungmaier (2005) for details.

2.4.3.6 User Data Transport

Another interesting feature of SCTP is its support for multiple streams over one association, the so-called Multi-Streaming. A Stream is a uni-directional channel between the two endpoints; the number of streams in each direction is negotiated at association setup. Figure 2.5 illustrates the principle of multi-streaming: Endpoint A sends data packets on different streams. The SCTP layer segments the data blocks into data chunks and transmits them over the association; the peer endpoint B combines the data chunks back into the full messages and provides them to the upper layer (i.e. SCTP preserves the message framing). There is no Head-of-Line Blocking for data messages on different streams. That is, if a message in one stream is fully recombined at endpoint B, it is given to the upper layer, regardless of messages of other streams. This means that messages of different streams may overtake each other. However, by default, the order of messages within the same stream is preserved (so-called Ordered Delivery). If ordered delivery is not necessary, it can be turned off on a per-message basis. In this case, unordered delivery similar to UDP is provided.
To support the communication of different upper-layer protocols over the same stream, a per-message Payload Protocol Identifier (PPID) is provided. This identifier denotes a 32-bit number specifying the upper-layer protocol.
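The absence of head-of-line blocking between streams can be illustrated by a receiver that keeps a separate reorder buffer and expected sequence number for every stream. In this Python sketch, `ssn` denotes a per-stream sequence number; the class name and the message representation are assumptions, and the segmentation into chunks is left out.

```python
from collections import defaultdict


class MultiStreamReceiver:
    """Ordered delivery per stream, without blocking across streams."""

    def __init__(self):
        self.next_ssn = defaultdict(int)     # next expected sequence number per stream
        self.pending = defaultdict(dict)     # stream -> {ssn: message} awaiting delivery
        self.delivered = []                  # (stream, message) in delivery order

    def receive(self, stream: int, ssn: int, message: str, unordered: bool = False):
        if unordered:
            # Unordered delivery: hand the message up at once; ssn is ignored.
            self.delivered.append((stream, message))
            return
        self.pending[stream][ssn] = message
        # Deliver all consecutive messages of this stream that are now available.
        while self.next_ssn[stream] in self.pending[stream]:
            self.delivered.append(
                (stream, self.pending[stream].pop(self.next_ssn[stream])))
            self.next_ssn[stream] += 1
```

A missing message on stream 0 holds back only the later messages of stream 0; stream 1 continues to be delivered.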


Figure 2.5: The Multi-Streaming Feature of SCTP

2.4.3.7 Extensions

As explained in subsubsection 2.4.3.2, the protocol design of SCTP provides the possibility to easily add protocol extensions by defining new chunk types. This subsubsection describes the currently defined extensions.

Partial Reliability (Pr-SCTP): The Partial Reliability extension for SCTP, defined in Stewart et al. (2004), allows specifying a timeout for the transmission of a message on a per-message basis. Retransmissions of a message are only triggered until its given timeout expires. After expiration of this timeout, the message is simply dropped.
By using a timeout of 0 ms, SCTP behaves like UDP, in the way that it only tries to send the message once. On the other hand, SCTP still keeps providing congestion and flow control for every message. That is, unlike UDP, a sender will not overload the network or the receiver with its data.
Pr-SCTP is supported by all major SCTP implementations; both endpoints tell their peer instance on association setup whether or not Pr-SCTP is available. If one of the endpoints is not capable of Pr-SCTP, a message timeout will simply be ignored, i.e. the transport will become reliable.
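The timed-reliability rule above amounts to a simple check before each retransmission. The Python sketch follows the behaviour described in the text (each message is tried once, retransmissions happen only until the lifetime expires, and the transfer becomes reliable if the peer lacks Pr-SCTP); the names and the millisecond clock are assumptions, and acknowledgements as well as informing the peer about dropped messages are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutgoingMessage:
    payload: bytes
    created_ms: int
    lifetime_ms: Optional[int] = None    # None: fully reliable transfer


def may_transmit(msg: OutgoingMessage, now_ms: int, first_attempt: bool,
                 peer_supports_prsctp: bool = True) -> bool:
    """Decide whether sending msg (again) is still allowed."""
    if msg.lifetime_ms is None or not peer_supports_prsctp:
        return True          # reliable: retransmit until acknowledged
    if first_attempt:
        return True          # as described above, every message is tried once
    # Retransmissions only until the timeout expires; afterwards drop the message.
    return now_ms - msg.created_ms < msg.lifetime_ms
```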

Authentication Chunk: The authentication of chunks is provided by the Authentication Chunk extension defined in Tüxen et al. (2005). Using a pre-shared key plus a random key negotiated at association setup, it is possible to ensure that an endpoint keeps communicating with the peer endpoint it has established the association to. That is, an attacker will be unable to hijack an association.

Dynamic Address Reconfiguration (Add-IP): The Dynamic Address Reconfiguration extension (called Add-IP in short) is defined in Ramalho et al. (2006) and provides the possibility to add Network Layer addresses to and remove them from an association during its runtime, without interrupting the association or bothering the upper layer. The primary reason for this extension has been to support seamless IPv6 renumbering in telecommunications signalling scenarios. That is, it allows for replacing the IPv6 prefix on a provider change without interrupting any established signalling relation. In particular, it would also be possible to seamlessly upgrade an IPv4 network to IPv6, as long as the endpoints already have support for both protocols. Other interesting application scenarios include the support of mobility (Mobile SCTP, see Riegel and Tüxen (2006)). To ensure that an attacker may not easily change an association's endpoint to himself, Add-IP mandatorily requires the usage of the Authentication Chunk extension.

Packet Drop: The Packet Drop extension defined in Stewart, Lei and Tüxen (2006a) can be used to inform a sender of packet drops not related to congestion. In particular, it can be used if error-prone transmission channels like satellite links are used for the transmission and the receiver detects a checksum difference in a received SCTP packet. Then, a packet drop report can tell the sender about the reception error and hereby force an immediate retransmission. Otherwise, the sender would assume a packet loss due to congestion and reduce its data rate.

Stream Reset: The Stream Reset extension defined in Stewart, Lei and Tüxen (2006b) is used to reset the streams of an association, in order to allow their usage for a different application. In particular, it resets the message sequence number for a certain stream in order to allow the new application of this stream to use it for keeping track of the message sequence.

Secure-SCTP (S-SCTP): While SCTP can be used with IPsec or TLS⁴ (see Jungmaier, Rescorla and Tüxen (2002)) to ensure authenticity, integrity and confidentiality of the transmissions, neither of the two solutions is optimal. While IPsec lacks efficiency, TLS does not support the security of control chunks as well as unordered delivery and Pr-SCTP (for a detailed discussion of the problems, see Unurkhaan (2005)). The Secure-SCTP extension (S-SCTP) provides a security extension for SCTP, supporting both authentication and encryption on a per-message basis and providing full support for ordered and unordered delivery features as well as for the Pr-SCTP extension. This proposed extension is described as Internet Draft in Hohendorf et al. (2006).

2.5 Summary

In this chapter, a short summary of the important networking basics has been given: the OSI and TCP/IP models, the Network Layer protocols IPv4 and IPv6 (including ICMPv4, ICMPv6 and IGMP), as well as the Transport Layer protocols TCP, UDP and SCTP. The SCTP protocol provides some important features making it superior to TCP and UDP: its security against attacks due to the verification tag and the 4-way handshake, its resilience due to multi-homing, the preservation of message framing and its support for multiple streams.

⁴ Transport Layer Security, see Dierks and Allen (1999), Blake-Wilson et al. (2003).


Chapter 3

Reliable Server Pooling

This chapter explains the Reliable Server Pooling concept, its functionalities and its two protocols: ASAP (Aggregate Server Access Protocol) and ENRP (Endpoint Handlespace Redundancy Protocol). The first part of this chapter describes the motivation for RSerPool and its architecture (section 3.1 to section 3.5), including application scenarios for RSerPool in section 3.6. After that, an overview of the RSerPool component types is given in section 3.7. In particular, this section illustrates the interaction among its different classes of components. This is followed by the description of the two RSerPool protocols themselves in section 3.9 and section 3.10, as well as of the currently defined server selection procedures in section 3.11. Based on the functionalities and protocol descriptions of RSerPool, the failure model as well as a summary of the mechanisms for service reliability and availability are presented in section 3.12. Finally, a short introduction to the security concept of RSerPool is given in section 3.13.

3.1 Introduction

According to Echtle (1990), Reliability denotes the ability of a system to operate correctly for a given time under valid operating conditions. An approach to ensure the reliability of a system is to ensure the availability of its components. As defined in Grams (1999), the term Availability denotes the probability of a correct function within the meaning of reliability. In classical telecommunications-oriented networks, component availability is ensured by redundant links and devices. An example is presented in Rathgeb (1999), where several mechanisms to increase the availability of a commercial ATM switch are described.
The focus of the Reliable Server Pooling architecture, which will be presented and evaluated in this thesis, is, despite its name, the availability of services. Nevertheless, the overall goal of its users is to achieve the required reliability of a service being built on top of this architecture. The description of the failure model as well as details on the provided mechanisms to ensure service reliability and availability will be presented in section 3.12; but first, it is necessary to introduce the concept of Reliable Server Pooling and its protocols.
The term Server Pooling denotes the management of redundant server resources for the same service in order to ensure its availability and/or to realize load distribution among different servers. Various solutions for the realization of server pooling are available, both proprietary (like Cisco™ Distributed Director, see Cisco Systems (2000)) and Open Source (like Linux Virtual Server, see LVS Project (2003)). Most of the available solutions focus on very specific services; the service most frequently addressed by server pooling systems is HTTP (see Fielding et al. (1999)). But with the growing demand and number of provided services, it became more and more difficult to develop and maintain a specific server pooling solution for each service; instead of re-inventing the wheel again and again, a common, unified and standardized framework is required.
thegoalTheofdevtheelopmentIETFandReliableServstandardizationerPoolingofWsuchorkinganGrouparchitecture(IETFforservRSerPoolerpoolingWG,seehasIETFbeensetRSeras-
PoolabbreWGviated(2005as)).RSerPool,Asaresult,whichtheatwtheorkingmomentgroupconsistshasofcreatedonetheirRFC,seconceptveralReliableInternetServerDraftsPandoolingour,
prototypeimplementationRSPLIB(seechapter5)asreferenceimplementation.

3.2 The Requirements for RSerPool

As key requirements for the Reliable Server Pooling architecture to be defined, the IETF RSerPool WG had identified the following points:

Lightweight: The RSerPool solution must not require a significant amount of resources (e.g. CPU power or memory). In particular, it should be possible to realize RSerPool-based systems also on low-power devices like mobile phones, PDAs and embedded devices.

Real-Time: Services like telephone signalling (see subsection 3.6.1) have very strict limitations on the duration of failovers. In the case of component failures, it may be necessary to re-establish a normal system state within a time frame of only a few hundred milliseconds. In telephone signalling, such a feature is particularly crucial when dealing with emergency calls.

Scalability: For providing services like Distributed Computing (see subsection 3.6.5), it is necessary to manage pools of many hundreds or even thousands of servers (e.g. animation rendering pools). The RSerPool architecture must be able to handle such pools efficiently. However, the number and size of pools are limited to a company or organization. In particular, it is not a goal of RSerPool to manage the complete Internet in one set of pools.

Extensibility: It must be possible to easily adapt the RSerPool architecture to future applications. In particular, this means having the possibility to add new server selection procedures. That is, new applications can define special rules specifying which server of the pool is the most appropriate to use for the processing of a request (e.g. the least-used server).

Simplicity: The effort necessary to configure RSerPool components (i.e. to add or remove servers) should be as small as possible. In the ideal case, the configuration should happen automatically, i.e. it should only be necessary to turn on a new server.

A more detailed description of the requirements for the server pooling architecture can be found in Tüxen et al. (2002). The draft Loughney et al. (2005) discusses the applicability of existing technologies like the Domain Name System (DNS), the Common Object Request Broker Architecture (CORBA), the Service Location Protocol (SLP) and Layer 4/Layer 7 switching to achieve the requirements described above. In summary, no existing technology has been found capable of sufficiently fulfilling all of the requirements. Therefore, the definition of a new framework, Reliable Server Pooling (RSerPool), is justified.


Figure 3.1: The RSerPool Concept

3.3 The RSerPool Architecture


Figure 3.1 shows the building blocks of the RSerPool architecture, which has been defined by the IETF RSerPool WG in Tüxen et al. (2006). In the terminology of RSerPool, a server is denoted as a Pool Element (PE). Within its Pool, which groups servers providing the same service¹, it is identified by its Pool Element Identifier (PE ID). The PE ID is a 32-bit random number, which is chosen upon a PE's registration into its pool. The set of all pools is denoted as the Handlespace. In older literature, it may be denoted as Namespace; this denomination has been dropped in order to avoid confusion with the DNS. Each pool in a handlespace is identified by a unique Pool Handle (PH), which is represented by an arbitrary byte vector. Usually, this is an ASCII or Unicode name of the pool, e.g. "DownloadPool" or "WebServerPool". Each handlespace has a certain scope (e.g. an organization or company), denoted as Operation Scope. As explained in section 3.2, it is explicitly not a goal of RSerPool to manage the pools of the global Internet within a single handlespace. Due to the limitation of operation scopes, it is possible to keep the handlespace "flat". That is, in contrast to the DNS with its top-level and sub-domains, PHs do not have any hierarchy. This constraint results in a significant simplification of the handlespace management (see also chapter 4 for details on handlespace management). Figure 3.2 presents an example of the information stored in a handlespace. In this example, a pool denoted by its PH "eShop Database" consists of 3 PEs with their IPv4 and IPv6 transport addresses as well as their policy information (to be explained in detail later; here: information on server load).
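The following minimal Python sketch makes the flat handlespace structure more concrete. Pools are keyed directly by their PH, and each pool holds its PEs keyed by a random 32-bit PE ID. The class names, the field layout and the documentation addresses (192.0.2.x, 2001:db8::x) are illustrative assumptions. They do not reproduce the contents of figure 3.2 or of any RSerPool implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class PoolElement:
    pe_id: int          # 32-bit random PE ID, chosen upon registration
    addresses: list     # transport addresses (IPv4 and/or IPv6)
    policy_info: dict   # policy-specific state, e.g. {"load": 0.2} for Least Used


@dataclass
class Pool:
    policy: str                                   # pool policy, e.g. "LeastUsed"
    elements: dict = field(default_factory=dict)  # PE ID -> PoolElement


def new_pe_id(rng):
    """Draw a random 32-bit PE ID, as a PE does upon its registration."""
    return rng.getrandbits(32)


# Flat handlespace: each Pool Handle (an arbitrary byte vector) maps directly
# to one pool; there are no top-level or sub-domains as in the DNS.
handlespace = {}

rng = random.Random(42)
pool = handlespace.setdefault(b"eShop Database", Pool(policy="LeastUsed"))
for addresses, load in [(["192.0.2.1", "2001:db8::1"], 0.2),
                        (["192.0.2.2"], 0.7),
                        (["2001:db8::3"], 0.4)]:
    pe = PoolElement(new_pe_id(rng), addresses, {"load": load})
    pool.elements[pe.pe_id] = pe
```

Because the handlespace is flat, looking up a pool is a single dictionary access on the PH. No hierarchical name traversal as in the DNS is needed.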
Within an operation scope, the handlespace is managed by redundant Registrars (Pool Registrar, PR). In literature, this component is sometimes also denoted as ENRP Server or Name Server (NS). Since "registrar" is the most expressive term, this denotation is used throughout this document. PRs have to be redundant in order to prevent a PR from becoming a single point of failure (SPoF). Each PR of an operation scope is identified by its Registrar ID (PR ID), which is a 32-bit random number. It is not necessary to ensure uniqueness of PR IDs². A PR maintains a complete copy of the operation scope's handlespace. PRs of an operation scope synchronize their view of the handlespace using the Endpoint haNdlespace Redundancy Protocol (ENRP, defined in Xie et al. (2006), Stewart, Xie, Stillman and Tüxen (2006b)). Older versions of this protocol use the term Endpoint Namespace Redundancy Protocol; this naming has been replaced to avoid confusion with DNS, but the abbreviation has remained. Due to handlespace synchronization by ENRP, the PRs of an operation scope are functionally equal. That is, if any of the PRs fails, every other PR is able to seamlessly replace it.

¹ Therefore, a server providing multiple services is registered in multiple pools.

Figure 3.2: The Handlespace
Using the Aggregate Server Access Protocol (ASAP, defined in Stewart, Xie, Stillman and Tüxen (2006a,b)), a PE can add itself to or remove itself from a pool by requesting a registration or deregistration at an arbitrary PR of the operation scope. In case of a successful registration, the PR chosen for registration becomes the PE's Home-PR (PR-H). A PR-H not only informs the other PRs of the operation scope about the registration or deregistration of its PEs, it also monitors the availability of its PEs by ASAP Endpoint Keep-Alive messages. Such a monitoring message has to be acknowledged by the PE within a certain time interval. If the PE fails to answer within a certain timeout, it is assumed to be out of service and is immediately removed from the handlespace. Furthermore, a PE is expected to re-register regularly. At a re-registration, it is also possible for the PE to change its list of transport addresses or its policy information (to be explained later).
To use the service of a pool, a client (called Pool User, PU, in RSerPool terminology) first has to request the resolution of the pool's PH to a list of PE identities at an arbitrary PR of the operation scope. This selection procedure is denoted as Handle Resolution. If the requested pool exists, the PR will select a list of PE identities according to the pool's Pool Member Selection Policy, also simply denoted as Pool Policy.
Possible pool policies are, for example, a random selection (Random) or the selection of the least-loaded PE (Least Used). In the first case, no selection information is necessary (PEs are selected randomly). In the second case, up-to-date load information has to be maintained in order to select the least-loaded PE. This case is also illustrated in figure 3.2. Using an appropriate selection policy, it is possible, for example, to distribute the request load equally onto the pool's PEs. Details will be presented in section 3.11.
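The following sketch illustrates the two policies named above by showing how a PR might pick PE identities during handle resolution. For illustration only, it assumes that each PE is represented by its PE ID together with a single numeric load value. The function names are hypothetical, and the actual policy definitions follow in section 3.11.

```python
import random


def select_random(pe_loads, count, rng):
    """Random policy: no selection information is needed."""
    candidates = list(pe_loads)
    return rng.sample(candidates, min(count, len(candidates)))


def select_least_used(pe_loads, count):
    """Least Used policy: return the PE IDs with the lowest reported load."""
    return sorted(pe_loads, key=pe_loads.get)[:count]


# Hypothetical PE IDs with their current load (0.0 = idle, 1.0 = fully loaded).
loads = {0x1A2B3C4D: 0.2, 0x5E6F7081: 0.7, 0x92A3B4C5: 0.4}
print([hex(pe) for pe in select_least_used(loads, 2)])  # ['0x1a2b3c4d', '0x92a3b4c5']
print(len(select_random(loads, 2, random.Random(7))))   # 2
```

The Random policy ignores the load values entirely. Least Used depends on them being kept up to date, which is why PEs report their load as policy information at (re-)registration.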
² As discussed in section 3.2.1 of Xie et al. (2006), the chance of identical PR IDs is very low when good random numbers are used. Furthermore, no severe consequence of such a conflict has been identified. Therefore, a PR ID negotiation in ENRP has been omitted to avoid its complexity.
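The claim of footnote 2 can be quantified with the birthday bound. This goes beyond the text above by assuming that PR IDs are drawn independently and uniformly from the 32-bit space. Under that assumption, the probability that at least two of n PRs share an ID can be computed as follows; the function name is hypothetical.

```python
def pr_id_collision_probability(n, bits=32):
    """Probability that at least two of n independently, uniformly drawn IDs coincide."""
    space = 2 ** bits
    p_all_distinct = 1.0
    for i in range(n):
        # The i-th ID must differ from the i IDs drawn before it.
        p_all_distinct *= (space - i) / space
    return 1.0 - p_all_distinct


for n in (3, 10, 100):
    print(n, pr_id_collision_probability(n))
```

For an operation scope with 10 PRs, this yields roughly 1.05 × 10⁻⁸, close to the approximation n(n-1)/(2 · 2³²).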


After receiving a list of PE identities from a PR, a PU writes the PE information into its local cache. This cache is denoted as PU-side Cache. The PU then selects exactly one PE from its cache, again using the pool's selection policy. It establishes a connection to this PE using the protocol of the application, e.g. HTTP over SCTP or TCP in case of a web server. The PU then uses the service provided by the server over this connection. If the establishment of the connection fails or the connection is aborted during service usage, a new PE can be selected by repeating the described selection procedure. If the information in the PU-side cache is not outdated, a PE identity may be selected directly from the cache, skipping the effort of asking a PR for handle resolution. After re-establishing a connection with a new PE, the state of the application session has to be re-instantiated on the new PE. The procedure necessary for session resumption is denoted as Failover Procedure and is of course application-specific. For an FTP download, for example, the failover procedure could mean telling the new FTP server the file name and the last received data position. This enables the FTP server to resume the download session. Since the failover procedure is highly application-dependent, it is not part of RSerPool itself. However, RSerPool provides far-reaching support for the implementation of arbitrary failover schemes by its Session Layer mechanisms, which are explained in subsection 3.9.5.
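The PU-side procedure described above (resolve, cache, select, and on failure select again) can be sketched as follows. This is an illustrative model built on several assumptions:

• `resolve` stands in for a handle resolution at a PR.
• `connect` stands in for establishing the application connection.
• The fixed cache lifetime `max_age` is a hypothetical parameter.
• Taking the first remaining cache entry replaces a real pool policy.

The application-specific failover procedure itself is only indicated by a comment.

```python
class PUSideCache:
    """PU-side cache of handle resolution results (illustrative sketch)."""

    def __init__(self, resolve, max_age):
        self.resolve = resolve    # callable: pool handle -> PE identities (asks a PR)
        self.max_age = max_age    # hypothetical cache lifetime in seconds
        self.entries = {}         # pool handle -> (time of resolution, PE identities)

    def candidates(self, handle, now):
        entry = self.entries.get(handle)
        if entry is None or now - entry[0] > self.max_age:
            # Cache empty or outdated: perform a handle resolution at a PR.
            entry = (now, list(self.resolve(handle)))
            self.entries[handle] = entry
        return entry[1]


def connect_to_pool(cache, handle, connect, now, max_attempts=3):
    """Select a PE and connect; if that fails, select another PE from the cache."""
    failed = set()
    for _ in range(max_attempts):
        remaining = [pe for pe in cache.candidates(handle, now) if pe not in failed]
        if not remaining:
            return None
        pe = remaining[0]   # stand-in for selection by the pool policy
        if connect(pe):
            # Here the application-specific failover procedure would restore
            # the session state (e.g. file name and offset for an FTP download).
            return pe
        failed.add(pe)
    return None
```

As long as the cached entry is younger than `max_age`, repeated selections and failovers are served without contacting a PR.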
To make it possible for RSerPool components to configure themselves automatically, as requested in section 3.2, PRs can announce themselves via UDP over IP multicast (details can be found in subsection 3.9.4 and subsection 3.10.2). These announces can be received by PEs, PUs and other PRs, allowing them to learn the list of PRs currently available in the operation scope. Using IP multicast instead of broadcast has two advantages. First, this mechanism also works across routers (e.g. LANs connected via a VPN). Second, in a switched Ethernet, for example, the announces are only heard and processed by stations actually interested in this information. If IP multicast is not available, it is of course possible to statically configure PR addresses.
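A PE or PU could keep track of the PRs it has learned from announces roughly as in the following sketch. The expiry time for announces and the fallback to statically configured addresses are assumptions made for illustration. The class name is hypothetical, and the actual announce handling is described in subsection 3.9.4 and subsection 3.10.2.

```python
class KnownRegistrars:
    """PRs known to a PE or PU, learned from multicast announces (illustrative sketch)."""

    def __init__(self, static_addresses=(), announce_timeout=5.0):
        self.static_addresses = list(static_addresses)  # statically configured PRs
        self.announce_timeout = announce_timeout        # hypothetical expiry in seconds
        self.heard = {}                                 # PR ID -> (address, last announce time)

    def on_announce(self, pr_id, address, now):
        # Every announce refreshes the entry of the announcing PR.
        self.heard[pr_id] = (address, now)

    def current(self, now):
        live = [address for address, seen in self.heard.values()
                if now - seen <= self.announce_timeout]
        # Without recent announces (e.g. no multicast), fall back to static configuration.
        return live if live else list(self.static_addresses)
```

With working multicast, a newly started component needs no configuration at all. The static list only matters where announces do not arrive.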

3.4 A Migration Path for Legacy Applications

RSerPool is a completely new protocol framework. To make it possible for existing specialized or proprietary server pooling solutions to migrate iteratively to an RSerPool-based solution, it is mandatory to provide a migration path. For clients without support for RSerPool, the RSerPool concept (see figure 3.1) provides the possibility of a Proxy Pool User (PPU). A PPU handles requests of non-RSerPool clients and acts as an intermediary between them and the RSerPool-based server pool. A detailed example will be provided in subsection 3.6.4. From a PE's perspective, PPUs behave like regular PUs. Similarly, just as a PPU allows the usage of a non-RSerPool client, a Proxy Pool Element (PPE) can be installed to continue using a non-RSerPool server in an RSerPool environment.

3.5 The Protocol Stack

Figure 3.3 shows the protocol stack of PR, PE and PU. The ENRP protocol is only used for the handlespace synchronization among PRs. All communication between PE and PR (registration, re-registration, deregistration, monitoring) as well as between PU and PR (handle resolution, failure reporting) is based on the ASAP protocol. The failover support, based on an optional Session Layer between PU and PE (to be explained in detail in subsection 3.9.5), also uses ASAP. In this case, the ASAP protocol data (Control Channel) is multiplexed with the application protocol's data (Data Channel) over the same connection. Using the Session Layer functionality of ASAP, a pool can be viewed as a single, highly available server from the perspective of the PU's Application Layer. Failure detection and handling are mainly performed automatically by the Session Layer, transparently for the Application Layer.

Figure 3.3: The RSerPool Protocol Stack

In most cases, SCTP is used as the transport protocol for RSerPool (see subsection 2.4.3). The important properties of SCTP that require its usage instead of TCP are the following:
• Multi-homing and path monitoring by heartbeat messages for improved availability (see subsubsection 2.4.3.4) and verification of transport addresses (see subsection 3.9.2 for details);
• Address reconfiguration (Add-IP, see subsubsection 2.4.3.7 and Ramalho et al. (2006)) to enable mobility and interruption-free address changes (e.g. adding a new network interface for enhanced redundancy);
• Message framing (see subsubsection 2.4.3.6) for simplified message handling (especially for the Session Layer, to be explained in subsection 3.9.5);
• Security against blind flooding attacks, due to the 4-way handshake and the verification tag (see subsubsection 2.4.3.3) and
• Protocol identification by PPID (see subsubsection 2.4.3.6) for protocol multiplexing (required for the ASAP Session Layer functionality, to be explained in subsection 3.9.5).
For the transport of PR announces by ASAP (explained in subsection 3.9.4) and ENRP (explained in subsection 3.10.2) via IP multicast, UDP is used as transport protocol. The usage of SCTP is mandatory for all ENRP communication among PRs and for the ASAP communication between PEs and PRs. For the ASAP communication between PU and PR as well as for the Session Layer communication between PE and PU, SCTP is recommended. However, TCP can be used together with an adaptation layer defined in Conrad and Lei (2005b). This adaptation layer adds functionalities like heartbeats, message framing and protocol identification on top of a TCP connection. Nevertheless, some important advantages of SCTP are missing. These are especially the high immunity against flooding attacks (see subsubsection 2.4.3.3) and the multi-homing property (see subsubsection 2.4.3.4). The only meaningful reason to use TCP is that the PU implementation cannot be equipped with an SCTP stack, e.g. when using a proprietary embedded system that provides only a TCP stack.
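The following sketch illustrates what message framing and protocol identification on top of a TCP byte stream mean. Every message is prefixed with a protocol identifier and a length field. The header layout and the identifier values are hypothetical and are not the format defined in Conrad and Lei (2005b). The point is only that the receiver can recover message boundaries and separate the Control Channel from the Data Channel, which SCTP provides natively.

```python
import struct

# Hypothetical header: 32-bit protocol identifier and 32-bit payload length,
# both in network byte order (NOT the header of the actual adaptation layer).
HEADER = struct.Struct("!II")
CONTROL_CHANNEL, DATA_CHANNEL = 1, 2   # hypothetical protocol identifiers


def frame(protocol_id, payload):
    """Prefix a message with its protocol identifier and length."""
    return HEADER.pack(protocol_id, len(payload)) + payload


def deframe(buffer):
    """Split received TCP bytes into (protocol_id, payload) messages.

    Returns the complete messages and the bytes of a not yet complete message."""
    messages = []
    while len(buffer) >= HEADER.size:
        protocol_id, length = HEADER.unpack_from(buffer)
        end = HEADER.size + length
        if len(buffer) < end:
            break   # wait for the rest of this message
        messages.append((protocol_id, buffer[HEADER.size:end]))
        buffer = buffer[end:]
    return messages, buffer
```

A receiver calls `deframe` on everything received so far and keeps the returned remainder for the next read, since TCP may deliver a message in several pieces.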

3.6 The Application Scenarios

In this section, the application scenarios of RSerPool are explained.


Figure 3.4: Telephone Signalling with Separated GK and MGC


3.6.1 Telephone Signalling

Telephone signalling over IP networks using the SCTP protocol is the application scenario that originally motivated the development of RSerPool. A Public Switched Telephone Network (PSTN) defines strict requirements for the availability of the components used. If signalling components fail, a stable system state must be reached again within only a few hundred milliseconds (see also Gradischnig and Tüxen (2001)). The SCTP protocol (see subsection 2.4.3) has been developed to cope with these requirements for the transport of telephone signalling (SS7 protocol) over IP networks. Its multi-homing and path monitoring features protect a signalling network against network problems, but they cannot improve the availability of the signalling endpoints themselves. The protection against component failures by redundancy has been the motivation for RSerPool. In the following, a short introduction to the application of RSerPool for telephone signalling is given. Details can be found in Tüxen et al. (2006).
In telephone networks, signalling and media transport (e.g. a telephone call) are handled separately. A Network Gateway (NGW) in a telephone network usually consists of Signalling Gateways (SG), Media Gateway Controller (MGC), Media Gateways (MG) and Gatekeeper (GK). The SG receives signalling information (e.g. a call setup) and transfers it to the MGC, which is responsible for the connection signalling. The MGC can ask the GK for access control (e.g. "May this user establish a call to that destination?") and address resolution (e.g. of a 0800 number to the actual endpoint). After that, the MGC can send the signalling messages to the next NGW. Furthermore, the MGC ensures that an MG is configured to handle the data stream (e.g. to convert ISDN voice data to the VoIP format for an NGW between ISDN and VoIP networks).
Obviously, MGC and GK have very high availability requirements, and protection against failures is mandatory. Using RSerPool, these components are realized by pools as illustrated in figure 3.4. For an SG, it is therefore first necessary to perform a handle resolution and server selection as explained in section 3.3. That is, an SG becomes a PU and the MGCs of the MGC pool are the PEs.
But GK and MGC are both realized as pools. That is, from the perspective of an MGC PE, the MGC is itself a PU of the GK pool. On the other hand, a GK PE is itself a PU of the MGC pool. In this symmetric case, a GK can replace a failed MGC by another PE of the MGC pool, just like an MGC can replace a failed GK by another PE of the GK pool. For the RSerPool architecture, the symmetric case is only of importance for the communication between PE and PU. This will be explained in detail in subsubsection 3.9.5.3.


Figure 3.5: Telephone Signalling with Combined GK and MGC

Figure 3.6: RSerPool-based SIP Proxies

A special case of the described scenario is shown in figure 3.5: here, MGC and GK functionalities are combined into a single component, again made redundant by RSerPool. This results in a simplified management of the redundancy.

3.6.2 Session Initiation Protocol (SIP)

Just like the usage of RSerPool for classical telephone signalling via SS7 over IP, it is also possible to apply RSerPool mechanisms to SIP-based communications (see Rosenberg et al. (2002)) like Voice over IP (VoIP). Services like connection signalling, user localization, accounting and forwarding are realized by SIP proxies. Clearly, these proxies are crucial for the operation of the system. In order to achieve an availability comparable to classical telephone systems (e.g. to forward emergency calls), it is mandatory to keep these components redundant. The application scenario for RSerPool-based SIP communications is presented in figure 3.6: SIP proxies consist of a pool of redundant SIP proxy PEs. For more details, see Conrad et al. (2002), Renier et al. (2005), Bozinovski, Gavrilovska, Prasad and Schwefel (2004), Bozinovski (2004), Bozinovski, Schwefel and Prasad (2004), Bozinovski, Gavrilovska and Prasad (2003), Bozinovski, Renier, Schwefel and Prasad (2003).


Figure 3.7: RSerPool with IPFIX


3.6.3 IP Flow Information Export (IPFIX)

In the IP Flow Information Export (IPFIX) framework defined in Sadasivan et al. (2006), so-called Observation Points in a network gather information about currently running flows and forward it to so-called Collector Points. The collector points provide the stored flow information data to applications, e.g. for the analysis of the network utilization.

Observation points may, for example, be routers, which usually possess only a limited amount of memory. Therefore, if a collector point becomes unreachable, another one must be found quickly. Otherwise, flow information can no longer be stored and data loss will occur. Furthermore, it should be ensured that the observation points are distributed equally over the set of collector points, in order to avoid overloading a single collector while others remain idle.

The RSerPool architecture can fulfil both requirements: a quick failover to a new collector point and balancing of the observation point load. In Dreibholz, Coene and Conrad (2006), the concept of using RSerPool for IPFIX is described. This concept is also illustrated in figure 3.7: the collector points are PEs of a collector pool, and the observation points are in the role of PUs. An observation point PU can choose a collector by handle resolution and server selection as described in section 3.3. An appropriate pool policy, e.g. Round Robin or Least Used, can ensure that the observation point load is distributed equally over the set of collector PEs.
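The following is a sketch of Round Robin selection as an observation point PU might apply it to the collector PEs returned by handle resolution. The class name and the `is_reachable` callback are hypothetical; in practice, an unreachable collector would show up as a failed connection attempt. Skipping an unreachable collector and continuing with the next one covers both requirements in a simplified way: a quick failover and an equal distribution of observation points over the collectors.

```python
class RoundRobinSelector:
    """Round Robin selection among the collector PEs of a pool (illustrative sketch)."""

    def __init__(self, collectors):
        self.collectors = list(collectors)
        self.next_index = 0

    def select(self, is_reachable=lambda collector: True):
        # Try each collector at most once, continuing after the previous choice;
        # unreachable collectors are skipped, so the failover is immediate.
        for _ in range(len(self.collectors)):
            collector = self.collectors[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.collectors)
            if is_reachable(collector):
                return collector
        return None
```

Successive calls cycle through the collectors in order, so with n collectors each one receives every n-th observation point.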

3.6.4 Load Balancing

Load balancing is an RSerPool application scenario currently very actively discussed by the IETF RSerPool WG. It is described in Tüxen et al. (2006) and Coene et al. (2004). Figure 3.8 illustrates the concept of load balancing using RSerPool for the example of a web server pool: the load caused by the clients (web browsers) has to be balanced equally among the web servers of the pool. A Load Balancer component realizes this distribution using RSerPool methods. This distribution is completely transparent for the web browser (client). The users can continue using their existing web browser software, while the load balancer component adopts the role of a PPU (Proxy PU, see also section 3.4).

Currently, the IETF RSerPool WG is discussing whether to combine the load balancer protocol SASP (Server/Application State Protocol) with the RSerPool architecture. The SASP protocol has been developed by IBM and described as Internet Draft in Bivens (2006).

Figure 3.8: Load Balancing with RSerPool

Figure 3.9: Distributed Computing with RSerPool

3.6.5 Real-Time Distributed Computing

Another application scenario having some overlap with load balancing is Distributed Computing. Ideas for this application scenario have been described in Dreibholz and Tüxen (2003), Zhang (2004), Dreibholz, Rathgeb and Tüxen (2005) and refined in Dreibholz and Rathgeb (2005e,c,d). The concept of using RSerPool for real-time distributed computing has also been described as Internet Draft in Dreibholz (2006a). Figure 3.9 illustrates this concept: at the user side, a large computation request is generated. For example, this could be a simulation (composed of multiple independent runs) or the rendering of an animation sequence. The generated computation request is split up into smaller parts. These parts are distributed by parallel PU instances among the computation PEs of the pool using RSerPool mechanisms. The processed partial results are finally combined into the result of the complete computation.
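The split, distribute and combine pattern of figure 3.9 can be sketched as follows. The sketch assumes independent work items (e.g. simulation runs) and a combination step that does not depend on how the items were split. It also makes three illustrative substitutions:

• Threads stand in for the parallel PU instances.
• `compute` stands in for the service of a computation PE.
• A round-robin assignment of parts to PEs replaces RSerPool server selection.

```python
from concurrent.futures import ThreadPoolExecutor


def run_distributed(runs, parts, pes, compute, combine):
    """Split a computation request, process the parts in parallel, combine the results.

    runs:    independent pieces of work (e.g. simulation runs)
    pes:     identities of computation PEs (hypothetical; normally from handle resolution)
    compute: callable (pe, chunk) -> partial result, standing in for the PE's service
    combine: callable (list of partial results) -> complete result
    """
    chunks = [runs[i::parts] for i in range(parts)]
    # One PU instance (here: one thread) per part; PEs are assigned round robin.
    with ThreadPoolExecutor(max_workers=parts) as executor:
        futures = [executor.submit(compute, pes[i % len(pes)], chunk)
                   for i, chunk in enumerate(chunks)]
        partial_results = [f.result() for f in futures]
    return combine(partial_results)


# Example: sum of squares of 1..10, split into three parts over two PEs.
total = run_distributed(list(range(1, 11)), 3, ["PE #1", "PE #2"],
                        lambda pe, chunk: sum(x * x for x in chunk), sum)
print(total)  # 385
```

In a real deployment, a part whose PE fails would be handed to another PE by the failover mechanism discussed in the following subsections, instead of aborting the whole computation.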


This thesis proposes the idea of real-time distributed computing. It differs from already existing applications like SETI@HOME (see SETI Project (2003)), where clients download computation jobs from a central server and may upload their completed results possibly some days later. In real-time distributed computing, if a computation request cannot be handled in a proper manner (e.g. in a timely manner), an immediate failover to another PE is performed.

3.6.5.1 Requirements

The basic idea of distributed computing (see also Davies (2004)) is to utilize currently unused computation capacity for useful things. For example, this could mean using the capacity of office PCs during night-time, when their users do not utilize them. But when a PC's capacity is required back by its user, it has to be revoked from the computation pool immediately. That is, pools may be very dynamic.

Furthermore, computation requests for distributed computing usually have a long runtime of many minutes or hours. If a computation server becomes unavailable, an appropriate failover mechanism has to ensure that already computed partial results do not have to be re-computed. Instead, a computation session should be resumed by another PE. This is particularly relevant if the distributed computing capacity is provided by user PCs: unlike servers in a computing centre, they have a significantly lower availability. For example, a user could simply turn the PC off or the network connection could break.

Finally, distributed computing pools can be very large, consisting of many hundreds or even thousands of computation servers. For example, a large company could decide to add all of its office PCs to a simulation pool. In such pools, highly differing capacities are very likely. That is, the computation servers are very heterogeneous, and the server selection procedure has to incorporate this aspect to utilize the pool effectively.

3.6.5.2 Applicability of RSerPool

RSerPool is equipped with all functionalities necessary to fulfil the described requirements. By providing the possibilities for registration and deregistration, highly dynamic pools can be realized. The features for automatic configuration also make setting up computation server processes easy (see subsubsection 3.7.1.1). RSerPool allows the implementation of arbitrary failover procedures (see subsection 3.9.5) to cope with the necessity of session resumption. Furthermore, it provides monitoring of registered PEs (see subsubsection 3.7.1.3), so that failed elements are quickly removed from the handlespace (e.g. when a PC is turned off without deregistering from the pool). RSerPool is also able to handle large pools of heterogeneous resources, for two reasons: its handlespace management is simple (flat, i.e. without hierarchy), and special server selection policies can be defined. For further details on this subject, see Dreibholz and Rathgeb (2005b), Dreibholz (2004d).

3.6.6 Mobility Support for SCTP

As already explained in subsection 2.4.3, the SCTP protocol in combination with its Dynamic Address Reconfiguration extension (Add-IP, see Ramalho et al. (2006)) provides the possibility to add transport addresses to and remove them from a running association. Using this feature, it is possible for a mobile endpoint to perform a handover into a new network (Mobile SCTP, see Riegel and Tüxen (2006)). During handover, the following two cases can occur:


Figure 3.10: SCTP Mobility with RSerPool

Break before Make: The connection to the old network has been broken before a new network is reached. Using a radio connection, this would mean that the current network gets out of reach before a new network can be contacted.

Make before Break: A new network can already be contacted while the connection to the current network is still working. For a radio connection, this would mean that a new network is already usable while the old one is still in reach.

As long as at most one endpoint at a time performs a break before make handover, Add-IP will work as expected. The endpoint that has been disconnected from all networks can tell the other side its new address right after being connected to the new network. But if both endpoints perform a break before make handover simultaneously, neither one can inform its peer about its address change, since no endpoint knows under which new address its peer is reachable.
In Dreibholz, Jungmaier and Tüxen (2003), we have proposed to utilize RSerPool for the storage of transport addresses and validated its applicability using our prototype implementation. The result of this work has been the Internet Draft Dreibholz and Pulinthanath (2006). An illustration of the proposed mobility approach is shown in figure 3.10: at least one endpoint has to register as PE under a PH known to its peer. As soon as the PE's transport addresses change (i.e. in case of a handover), it has to perform a re-registration that propagates the address change into the handlespace. Then, the peer endpoint can, in the role of a PU, ask a PR to resolve the PH into the new transport addresses of the PE endpoint and use the mechanisms of Add-IP to continue the association.
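The mobility approach relies only on the fact that a re-registration overwrites the PE's transport addresses in the handlespace. The following sketch models just that behaviour with a hypothetical in-memory handlespace. The PH, PE ID and addresses are made up, and neither ASAP messaging nor the Add-IP step is modelled.

```python
class Handlespace:
    """Minimal handlespace: pool handle -> {PE ID: transport addresses} (sketch)."""

    def __init__(self):
        self.pools = {}

    def register(self, handle, pe_id, addresses):
        # A re-registration overwrites the PE's entry, e.g. with new addresses.
        self.pools.setdefault(handle, {})[pe_id] = list(addresses)

    def resolve(self, handle):
        return dict(self.pools.get(handle, {}))


hs = Handlespace()
MOBILE_PH = b"mobile-node"   # hypothetical PH known to the peer
hs.register(MOBILE_PH, 0x1234, ["192.0.2.7"])

# Handover (break before make): the endpoint re-registers its new address.
hs.register(MOBILE_PH, 0x1234, ["198.51.100.7"])

# The peer, in the role of a PU, resolves the PH and learns the new address,
# which it can then use with Add-IP to continue the association.
print(hs.resolve(MOBILE_PH)[0x1234])   # ['198.51.100.7']
```

Since the handlespace is reachable independently of both endpoints, the simultaneous break before make case no longer leaves the peers without a way to find each other.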

3.6.7 Other Application Scenarios

Finally, there are some further application scenarios for RSerPool, which will only be summarized briefly. Dreibholz (2002) describes an e-commerce scenario, where an e-shop service is provided to users. If one of the PEs providing the shop service fails, RSerPool ensures a failover to another one, transparently for the customer. The failover mechanism described in subsubsection 3.9.5.2 furthermore makes the failover handling transparent for the Application Layer on the user side.

In Uyar, Zheng, Fecko, Samtani and Conrad (2003), Uyar et al. (2004), Uyar, Zheng, Fecko and Samtani (2003a,b), the usage of RSerPool for Future Combat Systems (FCS) in battlefield networks is suggested. That is, mobile and non-mobile servers provide military services. The RSerPool mechanisms ensure a continuous service, despite PEs being targets of destruction.

Figure 3.11: The Building Blocks of a Registrar

3.7 The RSerPool Components

In this section, the functionalities and behaviours of the RSerPool components PR, PE and PU are described. In particular, this section focuses on the interaction among ASAP, ENRP and the component functionalities. ASAP and ENRP messages and the detailed functionalities of both protocols are explained in the subsequent sections 3.9 (ASAP protocol) and 3.10 (ENRP protocol).

3.7.1 Registrar

The functionalities of a PR are:
1. The transmission of announces for automatic configuration,
2. Registration, re-registration and deregistration of PEs,
3. Monitoring the availability of PEs by keep-alive messages,
4. Handle resolution (i.e. PE selection by pool policy) for PUs,
5. Handlespace consistency audit and synchronization among the PRs of the operation scope and
6. Takeover of the PR-H functionality for PEs of a failed PR.
The building blocks of a PR are illustrated in figure 3.11. Clearly, the local handlespace copy is the central element of a PR. Using the ENRP protocol (described in section 3.10), this handlespace copy is kept consistent with those of the other PRs of the operation scope. The ASAP protocol (described in section 3.9) is used to communicate with PEs and PUs. A PR management layer above both protocols coordinates the access to the handlespace copy. The necessary PR management functionality of this layer is described in the following.


Figure 3.12: Automatic Configuration of ASAP

3.7.1.1 Announces

For a PR to be found automatically by other PRs as well as by PEs and PUs, it can send out ENRP and ASAP announce messages via UDP over IP multicast. Figure 3.12 illustrates the concept by using the example of ASAP for the configuration of PEs and PUs; for detailed information about the ENRP configuration behaviour for PRs, see subsection 3.10.2. IP multicast is used to allow announces across routers. Furthermore, a router or switch only forwards the announces into segments where stations are really interested in their reception (by having joined a certain multicast group). This saves both bandwidth and processing power. In the example shown, the stations within the multicast domain (PE #1 and PE #2 as well as PU #1 and PU #2) are automatically informed about the existence of the PR. Multicast may not be possible in the network or in certain parts of it (e.g. due to security policies that do not allow multicast). In that case, PR addresses can always be configured statically into other PRs as well as into PEs and PUs. Of course, it is then no longer possible to simply turn on a component without any further configuration effort. In the example scenario, this is the case for PE #3 and PU #3.

3.7.1.2 Pool Management

A main task of the PR is to register PEs into the handlespace, to re-register them (i.e. update their registration information) and finally to deregister them from the handlespace on request via ASAP. In order to register a PE into the handlespace, the following information is required:

Pool Handle: The PH of the pool into which the PE desires to register.

PE ID: A 32-bit random number chosen by the PE as its identification number within the pool.

Transport Addresses: The list of transport addresses under which the PE is reachable for its application. This includes the Network Layer protocol addresses (usually IPv4 and/or IPv6) as well as the transport protocol used (e.g. SCTP, TCP or UDP) and the Transport Layer protocol's port number (e.g. SCTP port).

Policy Information: A description of the pool's policy (e.g. Least Used or Round Robin) along with the PE's state information necessary for the specified policy. If the policy is Least Used, for example, the PE's current load state has to be given. For details on pool policies, see section 3.11.
Before the PE can be registered into the pool, multiple verification steps are necessary:
1. It is necessary to check whether the PE is permitted to register (authentication and authorization).
2. It has to be verified which Network Layer addresses of the PE's given transport addresses are actually part of the SCTP association between PR and PE. Due to the multi-homing property of SCTP (see subsubsection 2.4.3.4), there should be an SCTP path to each of the PE's provided Network Layer addresses if the PE is really reachable under the corresponding address. Due to wrong configuration settings, a PE could, for example, specify its localhost address (e.g. 127.0.0.1 for IPv4 or ::1 for IPv6) or a private address (e.g. 192.168.x.y), which is unreachable for the PR. All Network Layer addresses that are not part of the SCTP association are useless and are therefore dropped from the PE information. The PR intersects the set of given transport addresses with the set of addresses that are part of the SCTP association. If this intersection is empty, the PE's registration request is rejected.
3. After address verification, a PR has to check whether the PE's requested pool already exists. If this is the case, it has to be checked whether the PE's requested policy settings (policy type and the information required by the policy) are consistent with the pool's settings. If an incompatibility is detected, the registration request is denied. For example, it is not possible to register a PE requesting Round Robin selection into a pool using the Least Used policy.
4. All PEs of a pool must use the same Transport Layer protocol for the transport of application data (e.g. SCTP or TCP). If the PE's settings are incompatible with the existing pool, the PR rejects the registration. For example, a TCP-based PE cannot be registered into a pool of SCTP-based PEs.
5. If the requested pool does not exist yet, it is created using the policy requested by the PE.
If all steps have been passed, the registration is granted and the PE is finally added to the pool.
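The five verification steps can be condensed into the following sketch of a PR's registration handling. It is a simplified model under these assumptions:

• Authentication is reduced to a predicate.
• The policy check compares only the policy type, not the policy information.
• The set of addresses of the SCTP association is passed in rather than queried from an SCTP stack.
• All names are hypothetical.

```python
def handle_registration(handlespace, request, association_addresses, is_authorized):
    """Sketch of the PR's checks before granting a registration.

    handlespace: PH -> {"policy": str, "transport": str, "elements": {PE ID: info}}
    request:     dict with "handle", "pe_id", "addresses", "transport", "policy", "policy_info"
    association_addresses: addresses actually part of the SCTP association to the PE
    """
    # 1. Authentication and authorization (reduced to a predicate here).
    if not is_authorized(request):
        return False, "not authorized"
    # 2. Drop addresses that are not part of the SCTP association (e.g. 127.0.0.1).
    addresses = [a for a in request["addresses"] if a in association_addresses]
    if not addresses:
        return False, "no verified addresses"
    pool = handlespace.get(request["handle"])
    if pool is not None:
        # 3. Policy consistency (only the policy type is compared in this sketch).
        if pool["policy"] != request["policy"]:
            return False, "incompatible policy"
        # 4. All PEs of a pool must use the same transport protocol.
        if pool["transport"] != request["transport"]:
            return False, "incompatible transport protocol"
    else:
        # 5. Create the pool with the policy requested by the PE.
        pool = {"policy": request["policy"], "transport": request["transport"], "elements": {}}
        handlespace[request["handle"]] = pool
    pool["elements"][request["pe_id"]] = {"addresses": addresses,
                                          "policy_info": request["policy_info"]}
    return True, "granted"
```

A re-registration is equivalent to a registration. Calling the same function again for an existing PE ID therefore repeats all checks and then overwrites the stored entry.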
Upon successful registration into the handlespace, the registering PR becomes the PR-H of the newly registered PE. By using an update message via ENRP (to be explained in detail in subsection 3.10.4), it tells all other PRs of the operation scope about the new registration. These PRs will also add the new PE into their local handlespace copies.
A re-registration is equivalent to a registration: the PE information in the handlespace is updated analogously. In particular, this means that the addresses, pool policy consistency and protocol compatibility have to be verified again. Like a registration, the update is propagated by the PR-H to the other PRs of the operation scope by using an update message via ENRP.

To perform a deregistration, it is sufficient for the PE to specify its PH and PE ID. The PR-H distributes the information about the deregistration to the other PRs, again by using an ENRP update message.

3.7.1.3 Pool Monitoring

Another important task of a PR is the monitoring of all PEs for which it has the role of a PR-H. This functionality is illustrated in the form of a message sequence diagram in figure 3.13. A PE has been registered at a PR using an ASAP Registration message, and the PR has confirmed the granted request by an ASAP Registration Response message (to be explained in detail in subsubsection 3.9.2.1). Due to the successful registration, the PR became the PE's PR-H. It propagates the registration to the other PRs of the operation scope using an ENRP Update message.

Figure 3.13: The Registration and Monitoring of a Pool Element
Furthermore, the PR-H starts to send ASAP Endpoint Keep-Alive messages to its owned PEs in regular intervals (see subsubsection 3.9.2.2). A PE is required to answer such a message by an immediate ASAP Endpoint Keep-Alive Ack, which has to be received by the PR within a given time interval. Upon reception, the PR assumes that the PE is still reachable and working properly. If the timeout for the reception of the keep-alive acknowledgement expires, the PR removes the PE from its handlespace copy. Furthermore, it tells the other PRs of the operation scope that the PE has been removed (using an ENRP Update message).
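The keep-alive supervision by a PR-H can be modelled as a small state machine. In the following sketch, time is passed in explicitly instead of using real timers, and no messages are actually sent. The timeout value is a hypothetical parameter, and the ENRP Update to the other PRs is only recorded in a list.

```python
class HomeRegistrar:
    """Keep-alive monitoring by a PR-H (timing simulated; no real messages)."""

    def __init__(self, ack_timeout):
        self.ack_timeout = ack_timeout  # hypothetical timeout in seconds
        self.pending = {}               # owned PE ID -> send time of unanswered Keep-Alive, or None
        self.enrp_updates = []          # simulated ENRP Update messages to the other PRs

    def add_owned_pe(self, pe_id):
        self.pending[pe_id] = None

    def send_keep_alives(self, now):
        for pe_id in self.pending:
            if self.pending[pe_id] is None:
                self.pending[pe_id] = now   # ASAP Endpoint Keep-Alive sent at time `now`

    def on_keep_alive_ack(self, pe_id):
        if pe_id in self.pending:
            self.pending[pe_id] = None      # PE answered in time: still in service

    def check_timeouts(self, now):
        for pe_id, sent in list(self.pending.items()):
            if sent is not None and now - sent > self.ack_timeout:
                del self.pending[pe_id]     # remove the PE from the handlespace copy
                self.enrp_updates.append(("remove", pe_id))
```

A PE that acknowledges every Keep-Alive stays registered indefinitely. A PE that misses a single acknowledgement deadline is removed, and the removal is announced to the other PRs.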
Since the connection between PR and PE uses SCTP, and SCTP itself also monitors the reachability of its peer endpoints by SCTP heartbeat messages (see subsubsection 2.4.3.4), a PE failure may already be detected by the Transport Layer itself. In this case, of course, the PE can be removed from the handlespace immediately, without using the ASAP Endpoint Keep-Alive mechanism.
This double safeguarding – on the Transport Layer by SCTP and on the Application Layer by ASAP – has the following reason: the SCTP protocol is usually implemented in the kernel of the PE's operating system, but ASAP is assumed to be realized as part of the application (e.g. as a shared library) in userspace. While the application could become unusable due to a Denial of Service attack or simply a programming bug or configuration mistake, the kernel's SCTP implementation may still correctly answer all SCTP heartbeat messages – although the application itself has stopped working. Due to the Application Layer keep-alives of ASAP, the application itself is forced to respond. This leads to maximum operational safety for the PEs listed in the handlespace.
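The keep-alive supervision by the PR-H can be sketched in Python as follows. This is a minimal model and not the ASAP implementation. The handlespace is reduced to a set of (PH, PE ID) pairs, and message transmission and the ENRP Update appear only as comments. `KEEP_ALIVE_TIMEOUT` is a hypothetical value chosen for illustration. Time is passed in explicitly, so the logic can be exercised without real timers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical timeout (in seconds) for the Endpoint Keep-Alive Ack;
# the value is an assumption for illustration only.
KEEP_ALIVE_TIMEOUT = 1.0

PoolKey = Tuple[bytes, int]  # (pool handle, PE ID)


@dataclass
class KeepAliveMonitor:
    """PR-H side supervision of its own PEs via ASAP Endpoint Keep-Alives."""

    handlespace: Set[PoolKey] = field(default_factory=set)
    pending: Dict[PoolKey, float] = field(default_factory=dict)  # PE -> send time

    def send_keep_alive(self, key: PoolKey, now: float) -> None:
        # Record the send time (transmission itself is omitted); an already
        # outstanding Keep-Alive keeps its original send time.
        if key in self.handlespace and key not in self.pending:
            self.pending[key] = now

    def on_keep_alive_ack(self, key: PoolKey) -> None:
        # A timely Ack means the PE is still reachable and working properly.
        self.pending.pop(key, None)

    def on_sctp_failure(self, key: PoolKey) -> None:
        # SCTP already detected the failure: remove the PE without waiting.
        self.remove(key)

    def expire(self, now: float) -> List[PoolKey]:
        """Remove every PE whose Ack did not arrive within the timeout."""
        expired = [k for k, sent in self.pending.items()
                   if now - sent > KEEP_ALIVE_TIMEOUT]
        for key in expired:
            self.remove(key)
        return expired

    def remove(self, key: PoolKey) -> None:
        self.handlespace.discard(key)
        self.pending.pop(key, None)
        # Here the PR-H would inform the other PRs using an ENRP Update message.
```

A periodic timer would call `send_keep_alive` for each owned PE and `expire` afterwards. A failure notification from the SCTP layer short-cuts this via `on_sctp_failure`.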

3.7. THE RSERPOOL COMPONENTS

3.7.1.4 Server Selection and Failure Reporting
For PUs, a PR provides the resolution of a PH into a list of PE identities selected from the handlespace according to the selection policy of the pool. A detailed description of certain pool policies is given in section 3.11.
The monitoring of PEs by their PR-H and the verification of the PE addresses by their SCTP association to the PR-H ensure a high degree of reliability for the provided handlespace information. Nevertheless, it may happen that a PU cannot reach a PE anymore. In this case, the PU should inform its PR about the unreachability of this PE (using an ASAP Endpoint Unreachable message, to be explained in detail in subsection 3.9.3). The PR keeps the number of unreachability reports for each PE in its handlespace management. If the threshold MAX-BAD-PE-REPORT is reached, the PE is removed from the handlespace. The default setting for this threshold is 3 (see section 4.2 of Xie et al. (2006)). If the PU's PR is also the PR-H of the PE concerned, it could immediately send an ASAP Endpoint Keep-Alive message to verify the PE's reachability.
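A minimal Python sketch of the report counting follows. The threshold of 3 is the default quoted above. The class and method names are hypothetical. The sketch leaves out any resetting of the counter over time, which the text does not specify.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

MAX_BAD_PE_REPORT = 3  # default threshold (section 4.2 of Xie et al. (2006))

PoolKey = Tuple[bytes, int]  # (pool handle, PE ID)


class UnreachabilityCounter:
    """Counts ASAP Endpoint Unreachable reports per PE at a PR."""

    def __init__(self, handlespace: Set[PoolKey],
                 threshold: int = MAX_BAD_PE_REPORT) -> None:
        self.handlespace = handlespace
        self.threshold = threshold
        self.reports: DefaultDict[PoolKey, int] = defaultdict(int)

    def on_endpoint_unreachable(self, pool_handle: bytes, pe_id: int) -> bool:
        """Register one report; return True if the PE has been removed."""
        key = (pool_handle, pe_id)
        if key not in self.handlespace:
            return False  # unknown or already removed PE
        self.reports[key] += 1
        if self.reports[key] >= self.threshold:
            self.handlespace.discard(key)
            del self.reports[key]
            return True
        return False
```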

3.7.1.5 Handlespace Audit and Handling of PR Failures
A further task of a PR is the audit of the handlespace consistency among the PRs of the operation scope, as well as handlespace synchronization to resolve inconsistencies. Furthermore, in case of a PR failure, the remaining PRs have to negotiate which PR takes over the PR-H functionality for the PEs of the failed PR. Both functionalities are strongly related to ENRP; therefore, they are explained in the description of ENRP in subsection 3.10.5 (handlespace audit and synchronization) and subsection 3.10.6 (takeover procedure).

3.7.2 Pool Element
Next to its main task, which is clearly the provision of its application, a PE has to perform registration, re-registration and finally deregistration. Furthermore, it has to answer ASAP Endpoint Keep-Alives from its PR-H (see subsubsection 3.7.1.3) to confirm its availability.
Before a PE can register at a PR, it has to choose a PE ID. That is, a good, non-zero, pseudo-random number of 32 bits has to be computed according to the guidelines defined in Eastlake et al. (1994) and section 4.2.1 of Xie et al. (2006). Furthermore, a connection to a PR has to be established. The addresses of possible PRs may be learned dynamically from the PRs' announce messages via IP multicast (see subsubsection 3.7.1.1) or can also be statically configured. After that, the PE can register into its desired pool (given by its PH) under its PE ID, with a list of transport addresses (including Network Layer addresses, transport protocol and port) and policy information.
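Choosing a PE ID can be sketched in Python as follows. The sketch uses the standard `secrets` module as one way to obtain cryptographically strong randomness. This is a substitution for illustration, not the procedure prescribed by Eastlake et al. (1994) or Xie et al. (2006). Zero is rejected because the PE ID has to be non-zero.

```python
import secrets


def choose_pe_id() -> int:
    """Return a non-zero pseudo-random 32-bit PE identifier."""
    while True:
        pe_id = secrets.randbits(32)  # uniformly distributed in [0, 2**32 - 1]
        if pe_id != 0:  # zero is not allowed as a PE ID
            return pe_id
```

The PE ID does not change on re-registration. A PE would therefore call such a function once at start-up and keep the result.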
In case of successful registration, the PE has to renew its registration regularly by performing a re-registration. Using a re-registration, it is also possible to renew the PE's list of transport addresses or policy information (e.g. to update the PE's load state in case of the Least Used policy). Furthermore, the PE is required to answer ASAP Endpoint Keep-Alive messages by ASAP Endpoint Keep-Alive Acks.
To protect a RSerPool system against PR failures, a double safeguarding is applied:
1. If the PR-H fails, the PE can simply use another PR for re-registration. Upon re-registration at another PR, this new PR automatically becomes the PE's new PR-H. Since the PE ID does not change on re-registration, the new PR knows that the registration request has to be handled as an update (i.e. the PR-H has changed).
2. If a PR failure is not immediately detected by the PE, the ENRP protocol provides features to negotiate the takeover of the PR-H functionality by other PRs of the operation scope (details of the takeover procedure will be explained in subsection 3.10.6). In this case, the new PR-H tells the PE that it has become its PR-H using a special ASAP Endpoint Keep-Alive message (see subsubsection 3.9.2.2 for details). The PE then has to adopt the PR as its PR-H.

Figure 3.14: The Server Selection by Registrar and Pool User

Another important task of a PE is to support the resumption of sessions started by another PE, as part of the failover procedure of the PU. That is, the new PE has to find out the session state of the old PE and seamlessly continue the application session. The realization of such mechanisms and their support by RSerPool are strongly related to the ASAP protocol. Therefore, they are introduced in detail as part of the ASAP description in subsection 3.9.5.

3.7.3 Pool User
In order to use the service of a pool, a PU first has to request the resolution of the pool's PH into a list of policy-selected PE identities from a PR. Analogously to the PE, it is necessary to connect to an arbitrary PR of the operation scope. The PR addresses are either dynamically learned by multicast announces (see subsubsection 3.7.1.1) or statically configured. After handle resolution at the PR, the PU writes the received list of selected PE identities into its local cache (the PU-side cache). An illustration of this process is given in figure 3.14. The cache is a local, temporary, partial copy of the handlespace. Its entries expire after a configured timeout, if they are not updated by the results of a subsequent handle resolution at a PR. This timeout is called Stale Cache Value. Depending on the accuracy requirements of the application and pool policy for the cached handlespace data (e.g. having up-to-date load information for the Least Used policy), the stale cache value can be configured smaller (more accurate, but more handle resolutions at a PR) or higher (less accurate, fewer handle resolutions at a PR). A stale cache value of zero specifies that the cache's content may only be used for the current server selection and is deleted afterwards. That is, each server selection mandatorily requires a query to a PR. To finally connect to a PE, a single PE identity is selected out of the cache, again according to the pool's policy. A transport connection to this selected PE can then be established, over which the PE's service can be used with the application protocol.

3.8. THE PROTOCOL DESIGN

Figure 3.15: The Message Structure

Figure 3.16: The Parameter Structure

If the selected PE is unreachable or fails during the usage of its service, the PU should notify its PR about the failure using an ASAP Endpoint Unreachable message (details will be explained in subsection 3.9.3). After that, a new PE has to be selected. Depending on the setting of the stale cache value, a new PE may be immediately selected from the cache. In this case, the overhead (bandwidth, time) of querying a PR is avoided. Otherwise, the complete procedure of first querying a PR is repeated as explained above. After successfully establishing a connection to the new PE, this PE has to resume the interrupted session. Since the failover support mechanisms of RSerPool are strongly related to the ASAP protocol, a detailed depiction is provided as part of the ASAP description in subsection 3.9.5. Upon successful execution of the application-specific failover procedure, the application session is resumed.
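The behaviour of the PU-side cache can be sketched in Python as follows. The resolver callback is hypothetical and stands in for an ASAP Handle Resolution at a PR. `random.choice` merely stands in for whatever pool policy is configured. Times are passed explicitly in seconds.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical hook performing an ASAP Handle Resolution at some PR; it returns
# the policy-selected PE IDs for the given pool handle.
Resolver = Callable[[bytes], List[int]]


class PUSideCache:
    """PU-side cache whose entries expire after the Stale Cache Value (seconds)."""

    def __init__(self, stale_cache_value: float, resolver: Resolver) -> None:
        self.stale_cache_value = stale_cache_value
        self.resolver = resolver
        self.entries: Dict[bytes, Tuple[List[int], float]] = {}

    def select(self, pool_handle: bytes, now: float) -> int:
        entry = self.entries.get(pool_handle)
        if entry is None or not entry[0] or now - entry[1] >= self.stale_cache_value:
            # Missing, empty or stale entry: query a PR and refresh the cache.
            entry = (list(self.resolver(pool_handle)), now)
            self.entries[pool_handle] = entry
        if not entry[0]:
            raise LookupError("handle resolution returned no PE")
        choice = random.choice(entry[0])  # stand-in for the pool's policy
        if self.stale_cache_value == 0:
            # Stale Cache Value of zero: use the result for this selection only.
            del self.entries[pool_handle]
        return choice

    def report_unreachable(self, pool_handle: bytes, pe_id: int) -> None:
        # After sending an ASAP Endpoint Unreachable, drop the PE from the cache
        # so that the next selection picks a different candidate.
        entry = self.entries.get(pool_handle)
        if entry is not None and pe_id in entry[0]:
            entry[0].remove(pe_id)
```

With a Stale Cache Value of 30 s, two selections 10 s apart cause only one query to a PR. A value of zero forces a query for every selection, as stated above.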

3.8 The Protocol Design

Each ASAP and ENRP message is structured as shown in figure 3.15. This structure is defined in Stewart, Xie, Stillman and Tüxen (2006b) and includes the following fields:
Message Type: The type number of the message.
Message Flags: Type-specific message flags.
Message Length: The length of the complete message, including its header.
Message Value: The type-specific content of the message. If necessary, the message is padded with zeros to the next larger size being a multiple of 4 bytes.
Since the common header structure (type, flags and length) is identical for every message type of the ASAP and ENRP protocols, the description of the protocols in the following sections 3.9 and 3.10 will not explicitly mention these fields.
The message content consists of a type-specific field set having a constant length. After that, parameters of variable length may follow. Parameters are structured as shown in figure 3.16:
Parameter Type: The type of the parameter.
Parameter Length: The total length of the parameter, including its header.
Parameter Value: The type-specific content of the parameter. If necessary, the parameter is padded with zeros to the next larger size being a multiple of 4 bytes.
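The following Python sketch encodes messages and parameters in this layout and decodes a sequence of parameters. The field widths are taken from the RSerPool specifications rather than from this text: 8-bit message type and flags, a 16-bit message length, and a 16-bit parameter type and length, all in network byte order. The length fields are assumed to exclude the padding, following SCTP. Both points are assumptions of the sketch. The type values in the example are arbitrary.

```python
import struct
from typing import Iterator, Tuple


def pad4(length: int) -> int:
    """Round a length up to the next multiple of 4 bytes."""
    return (length + 3) & ~3


def encode_message(msg_type: int, flags: int, content: bytes) -> bytes:
    """Common message header: 8-bit type, 8-bit flags, 16-bit length."""
    length = 4 + len(content)  # includes the header, excludes padding
    padding = b"\x00" * (pad4(length) - length)
    return struct.pack("!BBH", msg_type, flags, length) + content + padding


def encode_parameter(param_type: int, value: bytes) -> bytes:
    """TLV block: 16-bit type, 16-bit length, value, zero padding."""
    length = 4 + len(value)  # includes the header, excludes padding
    padding = b"\x00" * (pad4(length) - length)
    return struct.pack("!HH", param_type, length) + value + padding


def decode_parameters(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, value) pairs from a sequence of padded TLV blocks."""
    offset = 0
    while offset + 4 <= len(data):
        param_type, length = struct.unpack_from("!HH", data, offset)
        if length < 4 or offset + length > len(data):
            raise ValueError("malformed parameter")
        yield param_type, data[offset + 4 : offset + length]
        offset += pad4(length)  # skip the padding to the next block
```

For example, a 5-byte value gives a parameter length of 9 plus 3 bytes of padding, so the next TLV block starts 12 bytes later.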
Due to the three fields which are common for all parameters, parameters are also denoted as TLV (type-length-value) blocks. This common structure allows an easy extension of the protocol: it is possible to add new parameters without changing every existing implementation. A protocol implementation confronted with unknown parameter types can simply skip them. The exact behaviour on the processing of unknown parameters is defined by the first two bits of the parameter type:
00: The message including the unknown parameter is ignored; the sender will not be notified.
01: The message is ignored, but the sender will get an error message.
10: The unknown parameter is silently skipped, i.e. the sender will not be informed.
11: The unknown parameter is skipped, but the sender will be notified.
For the notification on ignored messages and skipped parameters, error messages are defined (ASAP Error for ASAP and ENRP Error for ENRP). These messages also have to be used if no answer is generated in response to the received message. Otherwise, the response message can be extended by an Error Parameter including the skipped parameter(s). See Stewart, Xie, Stillman and Tüxen (2006b) for details on this parameter type.
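The two-bit rule can be expressed as a small lookup. The sketch assumes a 16-bit parameter type field, so the "first two bits" are its two most significant bits. The function name is hypothetical.

```python
from typing import Tuple


def unknown_parameter_handling(param_type: int) -> Tuple[bool, bool]:
    """Return (ignore_whole_message, notify_sender) for an unknown parameter type."""
    bits = (param_type >> 14) & 0b11  # two most significant bits of a 16-bit type
    ignore_whole_message = bits in (0b00, 0b01)  # 00 and 01: drop the message
    notify_sender = bits in (0b01, 0b11)  # 01 and 11: send an error
    return ignore_whole_message, notify_sender
```

For example, an unknown type 0x8005 (bits 10) would be skipped silently. An unknown type 0x4005 (bits 01) would cause the message to be ignored and the sender to be notified.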

3.9 The Aggregate Server Access Protocol
In this section, the Aggregate Server Access Protocol (ASAP), defined in Stewart, Xie, Stillman and Tüxen (2006a,b), is explained. First, an overview of ASAP's functionalities is given. The communication to fulfil these tasks is explained in detail afterwards.

3.9.1 Overview
The ASAP protocol is used for the communication between PR and PE or PU, as well as for the communication between PU and PE (Session Layer). For a classification into the protocol stack of PR, PE and PU see section 3.5. ASAP has the following tasks:
Registration, re-registration and deregistration as well as monitoring of PEs by their PR-H for PEs,
Handle resolution (i.e. server selection by pool policy) for PUs,
Transmission of PR announces for the automatic configuration of PEs as well as of PUs and
The support of failover procedures (Session Layer) between PUs and PEs.
The ASAP communication for these tasks is described in the following.

3.9. THE AGGREGATE SERVER ACCESS PROTOCOL

Figure 3.17: The ASAP Registration Message

3.9.2 Pool Element Functionality
In this section, the PE functionalities of the ASAP protocol – registration, re-registration, monitoring and deregistration – are explained.

3.9.2.1 Registration and Reregistration
In order to register into a pool or to renew its registration by performing a re-registration, a PE sends an ASAP Registration message to an arbitrary PR of the operation scope as described in subsubsection 3.7.1.2. Figure 3.17 shows the structure of this message type. It includes the PH of the pool to register into in form of a Pool Handle Parameter and all information about the PE in form of a Pool Element Parameter.
The structure of the Pool Handle Parameter is depicted in figure 3.18; it simply includes the PH as a byte vector. Figure 3.19 shows the structure of the Pool Element Parameter. It includes the following entries:
Pool Element Identifier: This field represents the PE ID of the PE to be registered.
Home Registrar Identifier: Here, the PR ID of the PE's PR-H is given. If it is unknown, 0 is specified.
Registration Life: The Registration Life parameter specifies the registration lifetime in milliseconds. For example, a value of 300,000 ms means that the registration will be valid for at least 5 minutes. This value is also a measure of the PE's reliability and in particular used to calculate the frequency of re-registrations (to be explained below).
User Transport Parameter: This is the definition of the application's transport endpoint. Depending on the application's transport protocol, this is an SCTP Transport Parameter for SCTP, a TCP Transport Parameter for TCP or a UDP Transport Parameter for UDP. Depending on the protocol, this parameter includes one (TCP, UDP) or at least one (SCTP) Network Layer address (e.g. a set of IPv4 and IPv6 Address Parameters) as well as the Transport Layer protocol's port number. See Stewart, Xie, Stillman and Tüxen (2006b) for the definitions of these parameters. Furthermore, the SCTP and TCP Transport Parameters allow specifying a Transport Use. The connection between PU and PE may either consist of a Data Channel for the application protocol only (i.e. without RSerPool support for failover; Transport Use is 0x0000) or of a multiplexed Data and Control Channel (i.e. with an ASAP-based Session Layer to support a failover, to be explained in subsection 3.9.5; Transport Use is 0x0001).
ASAP Transport Parameter: The PE's ASAP endpoint transport address is specified here in form of an SCTP Transport Parameter. The specification of this address is necessary to allow a new PR after a takeover procedure (see subsection 3.7.2) to tell the PE that it has become its new PR-H. The takeover procedure will be explained in detail in subsection 3.10.6.
Member Selection Policy Parameter: In this entry, the requested pool policy as well as its policy-specific parameters are provided in form of a Policy Parameter. The structure of the Policy Parameter is illustrated in figure 3.20; it includes at least the type of the policy (e.g. Round Robin or Least Used). The rest of the parameter is policy-specific and e.g. includes the PE's load state for the Least Used policy. Policies and their parameters will be explained in detail in section 3.11.

Figure 3.18: The Pool Handle Parameter

Figure 3.19: The Pool Element Parameter

Figure 3.20: The Policy Parameter

Figure 3.21: The ASAP Registration Response Message

After sending a Registration message to a PR, a PE waits at most the period of time specified by the timer T2-Registration. The default value for this timer is 30 s (see section 5.1 in Stewart, Xie, Stillman and Tüxen (2006a)). If no response from the PR is received within the timeout interval, a new PR has to be selected and the registration has to be retried.
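The retry rule for T2-Registration amounts to trying one PR after another. In the sketch below, `send` is a hypothetical transport hook. It transmits the Registration message and returns the Registration Response, or None if nothing arrives within the given timeout. The list of PR addresses would come from multicast announces or static configuration. The same pattern applies to T1-ENRPRequest and T3-Deregistration, which are described later.

```python
from typing import Callable, Iterable, Optional

T2_REGISTRATION = 30.0  # default value of timer T2-Registration, in seconds

# Hypothetical hook: send an ASAP Registration to `pr` and wait at most
# `timeout` seconds; return the response (a dict here) or None on timeout.
SendFn = Callable[[str, float], Optional[dict]]


def register(prs: Iterable[str], send: SendFn,
             timeout: float = T2_REGISTRATION) -> Optional[dict]:
    """Try the PRs in turn until one answers within the timeout."""
    for pr in prs:
        response = send(pr, timeout)
        if response is not None:
            return response  # granted or rejected; the Reject flag tells which
    return None  # no PR answered; the caller may try again later
```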
presentsThethePRsstructureansweroftothisaRemessagegistrationtype:itmessageincludesisatheRePEsgistrationPHinformResponseofaPoolmessage.HandlePFigurearameter3.21
of(seethefigurePEID3.18)only).andtheBit0PEofsPEtheIDinmessageformagsofaisPooldenotedElementastheIdentifierRejectPag.arameterIfitis(whichsetto0,consiststhe
registrationrequesthasbeengrantedandsuccessful;otherwise,ithasbeenrejected.Incaseofa
rejectedauthorization,registration,inconsistentanoptionalpolicysetErrortingsPorarameterinvalidmaytransportincludetheaddresses).reasonsSeeforStethewart,rejectionXie,(e.g.Stillmanno
andT¨uxen(2006b)fordetailsonthisparametertype.

Figure 3.22: The Pool Element Monitoring by its Home Registrar

After successful registration, the timer T4-Reregistration has to be started. Its interval is defined in section 5.1 of Stewart, Xie, Stillman and Tüxen (2006a) and set to 10 minutes or the Registration Life minus 20 seconds, whichever value is smaller. Each time this timer expires, the registration is repeated as re-registration and the timer is restarted.
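The T4-Reregistration rule is a simple minimum. The sketch works in milliseconds to match the Registration Life field. The guard against lifetimes shorter than 20 s is an addition of this sketch, since the text does not cover that case.

```python
def t4_reregistration_ms(registration_life_ms: int) -> int:
    """Interval of T4-Reregistration: min(10 min, Registration Life - 20 s)."""
    ten_minutes_ms = 10 * 60 * 1000
    interval = min(ten_minutes_ms, registration_life_ms - 20 * 1000)
    return max(interval, 0)  # guard added by this sketch: never negative
```

For the Registration Life of 300,000 ms used as an example above, the PE would re-register every 280 s. For a Registration Life of one hour, the 10-minute cap applies.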

3.9.2.2 Monitoring
After a PE has been registered into a pool, its availability is monitored by its PR-H using ASAP Keep-Alive messages, as illustrated in figure 3.22: in regular intervals, an Endpoint Keep-Alive is sent to the PE. The PE has to respond using an ASAP Endpoint Keep-Alive Ack message. If the PE fails to answer within a certain timeout, the PR-H removes it from the handlespace.
The Endpoint Keep-Alive message (see figure A.1 for an illustration) includes the PR ID of the sending PR and the PH of the PE's pool in form of a Pool Handle Parameter (see figure 3.18). Bit 0 of the flags field is denoted as Home flag. If it is set and the sending PR differs from the PE's current PR-H, the PE should adopt the sending PR as its new PR-H. This flag is used for the takeover procedure of failed PRs (see subsection 3.7.2; the takeover procedure itself is described in subsection 3.10.6).
The Endpoint Keep-Alive Ack message (see figure A.2 for an illustration) includes the PE's PE ID and the PH of the PE's pool in form of a Pool Handle Parameter (see figure 3.18).

3.9.2.3 Deregistration
To deregister from its pool, a PE sends an ASAP Deregistration message to its PR-H (see figure A.4 for an illustration of this message type). It includes the PH of the PE's pool in form of a Pool Handle Parameter (see figure 3.18) and the PE's PE ID in form of a Pool Element Identifier Parameter.
After transmission of a Deregistration message, a PE waits for a time interval defined by the timer T3-Deregistration for the reception of an ASAP Deregistration Response message. The default value for this timer is 30 s (see section 5.1 of Stewart, Xie, Stillman and Tüxen (2006a)). If no response of the PR is received within the timeout interval, a new PR has to be selected and the deregistration has to be retried.
The Deregistration Response message (see figure A.5 for an illustration of this message type) includes the PH of the PE's pool in form of a Pool Handle Parameter (see figure 3.18) and the PE's PE ID in form of a Pool Element Identifier Parameter. Furthermore, an optional Error Parameter is included if the deregistration has been rejected. In this case, the contents of the Error Parameter give the reasons for the rejection. See Stewart, Xie, Stillman and Tüxen (2006b) for details on this parameter type.

Figure 3.23: The ASAP Handle Resolution Response Message

3.9.3 Pool User Functionality
In this section, the PU functionalities of the ASAP protocol – handle resolution and failure reports – are explained.

3.9.3.1 Handle Resolution
The main functionality of the PR provided to PUs is handle resolution including the PE selection by policy as explained in subsection 3.7.3. In order to request a handle resolution, a PU sends an ASAP Handle Resolution message to an arbitrary PR of the operation scope. This message type (see figure A.9 for an illustration of the structure) simply includes the PH of the requested pool in form of a Pool Handle Parameter (see figure 3.18).
After sending a Handle Resolution message to a PR, a PU waits at most the period of time specified by the timer T1-ENRPRequest. The default value for this timer is 15 s (see section 5.1 in Stewart, Xie, Stillman and Tüxen (2006a)). If no response of the PR is received within the timeout interval, a new PR has to be selected and the handle resolution has to be retried.
The PR's response to a handle resolution request is an ASAP Handle Resolution Response message (see figure 3.23 for an illustration of this message type). It includes the PH of the requested pool in form of a Pool Handle Parameter (see figure 3.18). Furthermore, if the handle resolution has been successfully processed, the response includes a list of selected PE identities in form of Pool Element Parameters (see figure 3.19). In case of an error, the message includes an Error Parameter describing the problem (e.g. missing authorization or a requested pool that does not exist). For details on the Error Parameter, see Stewart, Xie, Stillman and Tüxen (2006b).

Figure 3.24: The ASAP Server Announce Message

3.9.3.2 Failure Report

If a selected PE is not reachable or if it fails during the usage of its service (of course, it depends on the application how a failure is defined, see also subsection 3.12.1 for details), the PU should report this failure to the PR. The message type for reporting a PE failure is the ASAP Endpoint Unreachable (see figure A.10 for an illustration of this message type). It includes the PH of the PE's pool in form of a Pool Handle Parameter (see figure 3.18) and the PE's PE ID in form of a Pool Element Identifier Parameter. The PR does not send any response to a failure report.

3.9.4 Automatic Configuration Functionality

As explained in subsubsection 3.7.1.1, a PR can announce its existence using ASAP Server Announce messages sent over UDP via IP multicast. The transmission interval of the Server Announce messages is defined by the timer T6-Serverannounce. Its default value is one second (see section 5.1 in Stewart, Xie, Stillman and Tüxen (2006a)). Figure 3.24 shows the structure of the Server Announce message type: it includes the PR ID of the sending PR. Optionally, the Server Announce can include a set of Transport Parameters defining under which transport addresses the PR is reachable. Allowed are SCTP and TCP Transport Parameters. If no Transport Parameter is specified, the PR is reachable under the source Network Layer address of the announce message, using SCTP as transport protocol and the SCTP port number equal to the UDP source port number of the Server Announce message.
By listening to Server Announce messages and noting the obtained PR information into its Server Table, a PE or PU can dynamically learn the identities of the currently available PRs. If no new Server Announce is received for an entry of the Server Table within an interval given by the timer T7-ENRPoutdate, the PR entry is removed. The default value for this timer is 5 s (see section 5.1 in Stewart, Xie, Stillman and Tüxen (2006a)).
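A Server Table on a PE or PU can be sketched as a dictionary keyed by PR ID. Modelling transport addresses as (protocol, address, port) tuples is an assumption of this sketch. The default entry for announces without Transport Parameters and the 5 s T7-ENRPoutdate follow the description above. The addresses and ports in the example are arbitrary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

T7_ENRP_OUTDATE = 5.0  # default value of timer T7-ENRPoutdate, in seconds

Transport = Tuple[str, str, int]  # (transport protocol, address, port)


@dataclass
class ServerTable:
    """PRs learned from ASAP Server Announce messages."""

    outdate: float = T7_ENRP_OUTDATE
    entries: Dict[int, Tuple[List[Transport], float]] = field(default_factory=dict)

    def on_server_announce(self, pr_id: int, src_addr: str, src_port: int,
                           transports: Sequence[Transport], now: float) -> None:
        if not transports:
            # No Transport Parameter: SCTP at the announce's source address,
            # with the SCTP port equal to the UDP source port.
            transports = [("sctp", src_addr, src_port)]
        self.entries[pr_id] = (list(transports), now)

    def purge(self, now: float) -> List[int]:
        """Remove PRs not announced within T7-ENRPoutdate; return their IDs."""
        stale = [pr_id for pr_id, (_, seen) in self.entries.items()
                 if now - seen > self.outdate]
        for pr_id in stale:
            del self.entries[pr_id]
        return stale
```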


3.9.5 Session Layer Functionality
In this section, the ASAP Session Layer is explained. First, the concept of the Data and Control Channel is introduced. After that, the two Session Layer mechanisms to support session failovers are described: cookies and business cards.

3.9.5.1 Data Channel and Control Channel
If a PU has detected a failure of its currently used PE and therefore has selected a new PE and successfully established a new connection, the session state between the PU and the failed PE has to be re-instantiated on the new server. The failover mechanism to actually resume the session is application-specific and therefore cannot be a direct part of RSerPool. Nevertheless, RSerPool provides powerful support for realizing arbitrary failover schemes. This support functionality is defined in Conrad and Lei (2005a): the ASAP Session Layer based on the Data Channel/Control Channel concept.
As already explained in the introduction to the RSerPool protocol stack in section 3.5, the protocol used for the connection between PU and PE is the application's protocol (e.g. HTTP for access to a web server PE). This communication channel is denoted as the Data Channel; all messages transmitted over this channel completely belong to the application itself and are out of RSerPool's scope. Clearly, to provide any useful service, a Data Channel is mandatory.
Optionally, ASAP offers the possibility of an ASAP communication channel between PU and PE. This communication channel is completely controlled by the ASAP Session Layer and denoted as the Control Channel. It provides the following functionalities:
1. Cookies for the failover support by client-based state sharing (to be described in the subsequent subsubsection 3.9.5.2), as well as
2. Business Cards for the signalling of specific PEs for failover and the signalling of symmetric PE/PE communications (to be described in the subsequent subsubsection 3.9.5.3).
If a Control Channel is used, both Data and Control Channel are multiplexed over a single connection between PU and PE. The management of this connection is the task of the ASAP Session Layer, which is responsible for the establishment and teardown of the connection and the detection of failures and breaks. The Application Layer of the PU then establishes a session to a certain pool (given by its PH) instead of performing a handle resolution and connecting to a selected PE. ASAP's Session Layer takes care of all related tasks:
Handle resolution and selection of a PE,
Establishment of a connection to the selected PE,
Monitoring the connection to detect failures and breaks,
Selecting a new PE and establishing a new connection if the current PE fails and
Invoking the application-specific failover procedure by notifying the Application Layer if necessary.
The connection between PU and PE is usually realized by an SCTP association (see section 3.5). The multiplexing of Data and Control Channel is realized using different PPIDs (i.e. 11 for ASAP and a different one for the application protocol). The reason for multiplexing by different PPIDs instead of using multiple SCTP stream numbers is explained in the subsequent subsubsection 3.9.5.2.
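Demultiplexing by PPID can be sketched as follows. Only PPID 11 for ASAP comes from the text. `APP_PPID` is a hypothetical value for the application protocol. Each incoming message is modelled as a (PPID, payload) pair received in order on one SCTP stream.

```python
from typing import Iterable, List, Tuple

ASAP_PPID = 11     # PPID of ASAP (Control Channel)
APP_PPID = 0x1000  # hypothetical PPID of the application protocol (Data Channel)


def demultiplex(messages: Iterable[Tuple[int, bytes]]) -> List[Tuple[str, bytes]]:
    """Assign (PPID, payload) pairs from one SCTP stream to their channel, keeping order."""
    tagged: List[Tuple[str, bytes]] = []
    for ppid, payload in messages:
        if ppid == ASAP_PPID:
            tagged.append(("control", payload))
        elif ppid == APP_PPID:
            tagged.append(("data", payload))
        else:
            raise ValueError(f"unexpected PPID {ppid}")
    return tagged
```

Both channels share a single stream, so the relative order of data and control messages is preserved exactly as sent.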


3.9.5.2 ASAP Cookies
In many application scenarios, a failover mechanism described in Dreibholz (2002) can be used: the Client-based State Sharing. The principle of this mechanism is depicted in figure 3.25. During usage of the PE's service, the PE regularly sends its current session state in form of a State Cookie to the PU. This may be realized on a regular basis, on important changes or on every state update. For the PU, a state cookie can be viewed as an arbitrary byte vector. However, it is also suggested in Dreibholz (2002) that for efficiency reasons the PU could be allowed to read certain parts of the cookie or even be allowed to modify them. Cryptographic mechanisms at the PE side can ensure that confidential information in parts of the state cookie cannot be read by the PU (by using encryption) or be altered (by using a digital signature). In the simplest case, all PEs of a pool can use a shared secret key for encryption and authentication.
The PU is only required to store the latest state cookie (i.e. the most up-to-date server state). In case of a session failover, this stored cookie will be sent to the new PE. Then, the new PE can verify the state cookie by checking its signature, decrypt any encrypted parts and finally re-instantiate the session state. After that, the session can be continued.
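The integrity protection of a state cookie can be sketched with Python's standard `hmac` module. For the shared-secret case mentioned above, the sketch uses an HMAC-SHA256 tag as a symmetric stand-in for the digital signature. Encryption of confidential parts is left out. The cookie layout (state followed by a 32-byte tag) is an assumption. Such a tag does not prevent the replay of older cookies, which is the limitation discussed in the text.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def make_state_cookie(state: bytes, key: bytes) -> bytes:
    """PE side: append an HMAC-SHA256 tag so the PU cannot alter the state unnoticed."""
    tag = hmac.new(key, state, hashlib.sha256).digest()
    return state + tag


def open_state_cookie(cookie: bytes, key: bytes) -> Optional[bytes]:
    """New PE: verify the tag and return the session state, or None if invalid."""
    if len(cookie) < TAG_LEN:
        return None
    state, tag = cookie[:-TAG_LEN], cookie[-TAG_LEN:]
    expected = hmac.new(key, state, hashlib.sha256).digest()
    return state if hmac.compare_digest(tag, expected) else None
```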
SincethePUsonlytasksaretostorethelateststatecookieandtosendthiscookietoanewPE
uponfailover,therealizationofthedescribedfailovermechanismisquitesimple.Itisfurthermore
easilyscalablewiththenumberofPUs.ThemainlimitationofitsapplicabilityisthatthePUmay
havethepossibilitytorestoreolderstatesbyusinganout-of-datecookie.Thismaybeasecurity
threatforcertainapplications(seeDreibholz(2002)fordetails).
Duetothesimplicityanduniversalapplicabilityofclient-basedstatesharing,ithasbeenpossible
tocontributeitintotheASAPprotocolstandarddocumentsStewart,Xie,StillmanandT¨uxen(2006a),
ConradandLei(2005a),inthewaythatASAPsupportsthetransmissionofstatecookiesoverthe
ControlChannel.Themessagesequenceisillustratedinfigure3.26:thePEsendsitscurrentstate
viatheControlChanneltothePUusinganASAPCookiemessage(seefigureA.12foranillustration
ofthismessagetype).ItincludesthestatecookieitselfinformofaCookieParameter.TheCookie
Parametersimplyconsistsofthecookieinformofabytevector(seeStewart,Xie,StillmanandT¨uxen
(2006b)forthecompletedefinitionandfigureA.14foranillustrationofthisparametertype).
The ASAP Session Layer on the PU side memorizes the latest cookie. If PE #1 fails, the currently stored cookie is sent to the newly selected PE #2 using an ASAP Cookie Echo message (see figure A.13 for an illustration of this message type) as the first message of the communication. The format of this message type is equal to the Cookie message. After PE #2 has re-instantiated the stored session state, the application is continued and the new PE again starts sending Cookie messages including the up-to-date session states.
The PE-side application decides when there are appropriate points to send a Cookie message. That is, the Cookie messages are tightly linked to the communication over the Data Channel. For a distributed computing application as described in subsection 3.6.5, a partial result could be returned which is followed by a cookie to define a checkpoint for a possible session resumption. Clearly, in this case it is expected that the PU receives the partial result before the cookie, since the obtained partial result will not be re-generated by a new PE if using the sent cookie for failover. This implies the necessity to preserve the message sequence of the Data and Control Channel communication. It is therefore not possible to use a separate SCTP stream for each channel. In this case, overtaking of messages would be possible (as this is actually the feature of no head-of-line blocking). But since two different protocols are multiplexed over the same association, it is simply possible to set different PPIDs for the messages of each protocol (the standard PPID for ASAP is 11) while both channels share one stream. In this case, the message order is ensured.
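The PU-side behaviour can be sketched as a small class. It processes Data and Control Channel messages in arrival order, keeps only the latest cookie, and produces the Cookie Echo for a newly selected PE. The channel labels and the `("CookieEcho", ...)` tuple are hypothetical representations, not ASAP encodings.

```python
from typing import List, Optional, Tuple


class PUCookieStore:
    """PU-side ASAP Session Layer state for client-based state sharing."""

    def __init__(self) -> None:
        self.latest_cookie: Optional[bytes] = None
        self.results: List[bytes] = []  # application data received so far

    def receive(self, channel: str, payload: bytes) -> None:
        # Messages are handled strictly in arrival order, as on one SCTP stream.
        if channel == "control":
            self.latest_cookie = payload  # an ASAP Cookie replaces any older one
        else:
            self.results.append(payload)  # application data (Data Channel)

    def failover_messages(self) -> List[Tuple[str, bytes]]:
        """Messages to send first to the newly selected PE."""
        if self.latest_cookie is None:
            return []  # no cookie received yet: nothing to echo
        return [("CookieEcho", self.latest_cookie)]
```

In the test trace, partial result 1 is covered by the echoed cookie. Partial result 2 arrived after the last checkpoint, so a new PE resuming from that cookie would produce it again. Had the cookie overtaken result 1, that result would be lost on failover.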

Figure 3.25: The Concept of Client-Based State Sharing

Figure 3.26: Session Failover using Client-Based State Sharing

Figure 3.27: The ASAP Business Card Message

Using client-based state sharing, a completely automatic failover by the ASAP Session Layer is possible. For the PU's Application Layer, it is only necessary to establish a session to a server pool. The mapping to a PE and failovers using state cookies may be completely transparent to the Application Layer.³

3.9.5.3 ASAP Business Cards
Depending on the application, there may be constraints restricting the set of PEs usable for failover. The ASAP Business Card message sent via the Control Channel is used to inform peer components about such constraints.

Restricted PE Sets for Failover: The first case to use a Business Card is if only a restricted set of PEs in the pool may be used for failover. For example, in a large pool, each PE can share its complete set of session states with a few other PEs only. This keeps the system scalable, i.e. a PE in a pool of n servers does not have to synchronize all session states with the other n-1 PEs. In this case, a PE has to tell its PU the set of PE identities being candidates for a failover using an ASAP Business Card message over the Control Channel.
Figure 3.27 presents the structure of this message type: it includes the PH of the PE's pool in form of a Pool Handle Parameter (see figure 3.18) and the list of failover candidates as a sequence of PE identities in the form of Pool Element Parameters (see figure 3.19).
A PE may update the list of possible failover candidates at any time by sending another Business Card. The PU has to store the latest list of failover candidates. Of course, if a failover becomes necessary, the PU has to select from this list using the appropriate pool policy – instead of performing the regular PE selection by handle resolution at a PR. Therefore, some literature also denotes the Business Card by the more expressive term Last Will.
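Storing the latest Business Card and selecting a failover candidate from it can be sketched as follows. Round Robin is assumed as the pool policy purely for illustration, and the failed PE is excluded from the candidates. Returning None when no candidate remains is a choice of this sketch that leaves the further reaction to the caller.

```python
from typing import List, Optional


class FailoverCandidates:
    """PU-side storage of the latest ASAP Business Card ("Last Will")."""

    def __init__(self) -> None:
        self.pool_handle: Optional[bytes] = None
        self.candidates: List[int] = []
        self._next = 0  # Round Robin position

    def on_business_card(self, pool_handle: bytes, pe_ids: List[int]) -> None:
        # Only the latest Business Card counts: it replaces the previous list.
        self.pool_handle = pool_handle
        self.candidates = list(pe_ids)
        self._next = 0

    def select_for_failover(self, failed_pe_id: int) -> Optional[int]:
        """Round Robin over the candidates, skipping the failed PE."""
        usable = [pe for pe in self.candidates if pe != failed_pe_id]
        if not usable:
            return None  # no candidate left; the caller decides what to do
        choice = usable[self._next % len(usable)]
        self._next += 1
        return choice
```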

Symmetric Communication: In particular for the telecommunications signalling example presented in subsection 3.6.1 using separate MGC and GK, a PU using the service of a pool B may itself be a PE of pool A. In this case, a PE of pool B may perform – in the role of a PU using the service of pool A – a failover to another PE of pool A. In the described telecommunications example, this scheme is used between the MGC pool and the GK pool.
In such symmetric scenarios, a PU has to tell its PE that it is itself a PE of another pool. This is realized by sending an ASAP Business Card message to the PE over the Control Channel, providing the PH of its pool. Optionally, specific PE identities for failover may also be provided. The format remains the same as explained in the previous paragraph. If the PE detects a failure of its PU, the PE may – now in the role of a PU – use the provided PH for a handle resolution to find a new PE or use the provided PE identities to select one. After that, it can perform a failover to that PE.

³ Of course, the Application Layer can request to be notified on failovers.

3.10. THE ENDPOINT HANDLESPACE REDUNDANCY PROTOCOL

3.10 The Endpoint Handlespace Redundancy Protocol
In this section, the Endpoint Handlespace Redundancy Protocol (ENRP) defined in Xie et al. (2006) and Stewart, Xie, Stillman and Tüxen (2006b) is described. First, an overview of its functionalities is given. After that, these functionalities are explained in detail.

3.10.1 Overview
The ENRP protocol is used to synchronize the handlespace among the PRs of an operation scope. For a classification into the protocol stack of a PR see section 3.5. The following items describe ENRP's tasks:
Automatic configuration of a PR,
Initialization of a PR,
Propagating PE registration changes (registration, re-registration, deregistration) to other PRs by the PR-H of the PE,
Auditing the handlespace consistency among PRs and re-synchronizing their views if necessary, and
Orderly taking over the PEs of failed PRs.
The ENRP communication for these tasks is described in the following.

3.10.2 Automatic Configuration
PRs need to know the list of all PRs currently active in the operation scope. To obtain this list, which is denoted as Peer Table, ENRP provides three kinds of configuration mechanisms: static configuration by the administrator, announces via multicast and learning from peer PRs. Since all mechanisms are based on the ENRP Presence message, this message type is explained first. After that, the configuration mechanisms themselves are introduced.

3.10.2.1 The ENRP Presence Message
The automatic configuration of PRs is strongly based on the transmission of ENRP Presence messages. Figure 3.28 shows the structure of this message type. It includes the following information:
Sender Registrar ID: This is the PR ID of the sending PR.
Receiver Registrar ID: The PR ID of the message's destination PR is given in this field. If the message is used for multicast announces, this ID is set to 0 (i.e. the receiver is not specified).
Checksum Parameter: The Checksum Parameter (see Stewart, Xie, Stillman and Tüxen (2006b) for its definition and figure A.11 for an illustration of this parameter type) includes a checksum of the PE entries for which the PR is the PR-H. It is used for the handlespace consistency audit explained in subsection 3.10.5.
Server Information Parameter: This optional entry is a Server Information Parameter describing the ENRP transport endpoint of the particular PR. The structure of the Server Information Parameter is illustrated in figure 3.29. It includes the PR's ID, as well as a description of the transport endpoint using an SCTP Transport Parameter (see Stewart, Xie, Stillman and Tüxen (2006b) for its definition). That is, it specifies under which Network Layer addresses and SCTP port number the PR is reachable for ENRP communication. The M-bit (Multicast bit) denotes that the PR uses ENRP over UDP via IP multicast for its ENRP communication. Since this possibility is under discussion for removal by the IETF RSerPool WG, it is not further explained here. See Xie et al. (2006) for details.
The R-bit in the flags field (Response Required flag) signals that the sender of the Presence message requires the receiver's immediate response in form of an own Presence message. Using a message with this flag set, a PR can verify that another PR is still alive.

Figure 3.28: The ENRP Peer Presence Message

Figure 3.29: The Server Information Parameter

Figure 3.30: The Peer Tables of Registrars

3.10.2.2 Dynamic and Static Peer Table Configuration
Analogously to the ASAP Server Announce messages sent via IP multicast (see subsection 3.9.4), a PR can announce its availability by sending Presence messages over UDP via IP multicast. In this case, the Reply Required flag is never set, and the Receiver Registrar ID is unspecified (i.e. set to 0).
As for PEs and PUs, it is of course also possible to statically configure PR addresses into the Peer Table. Figure 3.30 illustrates the content of a PR's Peer Table: PR #1 to PR #4 are located within the multicast domain and are able to automatically detect the existence of each other (these entries in the Peer Table are denoted as dynamic). Since PR #5 is located outside of the multicast domain, it has a statically configured entry for PR #4 (the entry is denoted as static). On the other hand, PR #5 also has a static entry for PR #4. Furthermore, both Peer Tables include a static entry for a dead peer PR #?. This PR has never been reached; therefore, only its transport addresses are known, but not its PR ID.
PR #5 only has a static entry for PR #4 of the multicast domain. In order to avoid statically configuring all PR identities into every PR outside of a multicast domain, a third method for configuration is provided by ENRP. This method will be explained after introducing the ENRP connection setup and maintenance.

3.10.2.3 Maintaining Connections to Peer Registrars
In a regular interval given by the constant PEER-HEARTBEAT-CYCLE⁴, a PR tries to send a Presence message to every other PR in its Peer Table over an SCTP association to the corresponding PR. If such an association does not exist yet, the PR tries to establish it. The Reply Required flag of these Presence messages is not set. In the opposite direction, the peer PR also sends Presence messages over the SCTP association. In this way, it is ensured on the Application Layer that each peer PR is active. The time elapsed since the last Presence message received from a peer⁵ is kept as part of the corresponding peer PR's entry in the Peer Table and denoted as Last Heard.
If no Presence message is received from a peer within the timeout MAX-TIME-LAST-HEARD (see section 3.9.3 of Xie et al. (2006); the default value is 61 s as defined in section 4.1 of Xie et al. (2006)), a Presence message having the Reply Required flag set is sent. This forces the peer to reply immediately. If no response is received within the timeout MAX-TIME-NO-RESPONSE (see section 3.9.3 of Xie et al. (2006); the default value is 5 s as defined in section 4.1 of Xie et al. (2006)), the peer must be considered as dead and therefore be removed. Furthermore, the takeover procedure (see subsection 3.10.6) is started, in order to negotiate which other PR takes over the PR-H functionality for the PEs owned by the failed PR.
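The timer handling of this subsubsection can be illustrated by a minimal C sketch (C being the language also used for Algorithm 1). The names peer_entry, peer_action and check_peer() are hypothetical, time stamps are assumed to be positive values in microseconds, and, following footnote 5, only time stamps are stored, so that Last Heard is derived as now - last_heard. The sketch only decides what to do; actually sending messages and starting the takeover are left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Timer constants in microseconds, using the default values given in the
   text: PEER-HEARTBEAT-CYCLE 30 s, MAX-TIME-LAST-HEARD 61 s and
   MAX-TIME-NO-RESPONSE 5 s. */
#define PEER_HEARTBEAT_CYCLE  (30ULL * 1000000ULL)
#define MAX_TIME_LAST_HEARD   (61ULL * 1000000ULL)
#define MAX_TIME_NO_RESPONSE  (5ULL * 1000000ULL)

/* Hypothetical result of one maintenance check for a peer PR. */
enum peer_action {
   PEER_OK,              /* nothing to do */
   PEER_SEND_PRESENCE,   /* regular Presence, Reply Required flag not set */
   PEER_SEND_PROBE,      /* Presence with the Reply Required flag set */
   PEER_DEAD             /* remove the entry and start a takeover */
};

/* Hypothetical Peer Table entry: only time stamps are stored. */
struct peer_entry {
   uint32_t pr_id;
   uint64_t last_heard;   /* time stamp of the last Presence received */
   uint64_t last_sent;    /* time stamp of the last Presence sent */
   uint64_t probe_sent;   /* time stamp of a pending probe, 0 if none */
};

enum peer_action check_peer(struct peer_entry* peer, uint64_t now)
{
   assert(peer != NULL && now >= peer->last_heard);

   if(peer->probe_sent != 0) {
      if(peer->last_heard > peer->probe_sent) {
         peer->probe_sent = 0;            /* the peer has answered the probe */
      }
      else if(now - peer->probe_sent >= MAX_TIME_NO_RESPONSE) {
         return PEER_DEAD;                /* MAX-TIME-NO-RESPONSE expired */
      }
      else {
         return PEER_OK;                  /* still waiting for the reply */
      }
   }
   if(now - peer->last_heard >= MAX_TIME_LAST_HEARD) {
      peer->probe_sent = now;
      peer->last_sent  = now;             /* the probe is a Presence, too */
      return PEER_SEND_PROBE;
   }
   if(now - peer->last_sent >= PEER_HEARTBEAT_CYCLE) {
      peer->last_sent = now;
      return PEER_SEND_PRESENCE;
   }
   return PEER_OK;
}
```

Calling check_peer() periodically for every Peer Table entry is an assumed integration; an actual registrar may instead arm one timer per peer.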

3.10.2.4 Obtaining the Peer Table from Peer Registrars
Besides the possibilities to configure the Peer Table dynamically by multicast announces and statically by manually inserting entries, ENRP provides a third way for its configuration: the acquisition of Peer Table entries from peer PRs. After an SCTP association to a peer PR has been successfully established as described in the previous subsubsection 3.10.2.3, the peer PR's Peer Table can be requested using an ENRP Peer List Request message. Figure 3.31 shows the structure of this message type. It only includes the PR IDs of the sending and the receiving PRs.
Upon reception of the Peer List Request, the peer PR responds using an ENRP Peer List Response message. The format of this message type is shown in figure 3.32; it includes the PR IDs of sending and receiving PRs, as well as the requested Peer Table in form of Server Information Parameters (see figure 3.29; the contents of this parameter have been explained in subsubsection 3.10.2.1). Entries of the response that do not already exist in the receiver's Peer Table have to be added. After that, connections to the new entries can be established as described in subsubsection 3.10.2.3.
In the example provided in figure 3.30, PR #5 has learned the entries for PR #1 to PR #3 by asking PR #4 (whose identity is known due to static configuration). The entries received by the Peer List Response of PR #4 are marked with from Peer.
⁴ See section 3.9.2 of Xie et al. (2006); the default value is 30 s as defined in section 4.1 of Xie et al. (2006).
⁵ Of course, from an implementer's perspective, only the time stamp of the last Presence message is stored into the table. The Last Heard value can be calculated as the difference between the current time stamp and the stored one.
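A minimal C sketch of the merge step described in subsubsection 3.10.2.4 follows. It assumes that each received Peer Table entry is reduced to its PR ID (a real Server Information Parameter also carries the transport addresses) and that the receiver skips its own ID; peer_table, merge_peer_list() and MAX_PEERS are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PEERS 64   /* hypothetical fixed table size */

/* How an entry got into the Peer Table (cf. figure 3.30). */
enum peer_origin { ORIGIN_DYNAMIC, ORIGIN_STATIC, ORIGIN_FROM_PEER };

struct peer_table_entry {
   uint32_t         pr_id;
   enum peer_origin origin;
};

struct peer_table {
   struct peer_table_entry entry[MAX_PEERS];
   size_t                  count;
};

static bool peer_table_contains(const struct peer_table* t, uint32_t pr_id)
{
   for(size_t i = 0; i < t->count; i++) {
      if(t->entry[i].pr_id == pr_id) {
         return true;
      }
   }
   return false;
}

/* Adds every PR ID of a received Peer List Response that is not yet in
   the table (skipping the own ID) and returns the number of new entries,
   to which connections can then be established. */
size_t merge_peer_list(struct peer_table* t, uint32_t own_id,
                       const uint32_t* received, size_t n)
{
   size_t added = 0;
   for(size_t i = 0; i < n; i++) {
      assert(received[i] != 0);   /* a valid PR ID is non-zero */
      if((received[i] == own_id) || peer_table_contains(t, received[i])) {
         continue;
      }
      if(t->count >= MAX_PEERS) {
         break;                   /* table full */
      }
      t->entry[t->count].pr_id  = received[i];
      t->entry[t->count].origin = ORIGIN_FROM_PEER;
      t->count++;
      added++;
   }
   return added;
}
```

In the scenario of figure 3.30, PR #5 (own ID 5, static entry for PR #4) merging the list {1, 2, 3, 4, 5} obtains three new entries marked as from Peer.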

3.10. THE ENDPOINT HANDLESPACE REDUNDANCY PROTOCOL

Figure 3.31: The ENRP Peer List Request Message

Figure 3.32: The ENRP Peer List Response Message

Figure 3.33: The ENRP Handle Table Request Message

Figure 3.34: The ENRP Handle Table Response Message

3.10.3 Registrar Initialization

When a PR is started up, it first has to select its PR ID. This is a non-zero, 32-bit random number. Uniqueness within the operation scope does not have to be enforced (see section 3.2.1 of Xie et al. (2006)). After that, the PR has to obtain the list of other PRs in the operation scope using the three methods explained in subsection 3.10.2 (i.e. statically configured, learned via multicast announces and obtained from peers). If at least one other PR has been found and an SCTP association to it has been successfully established, a so-called Mentor PR is chosen from the list of available PRs (e.g. by randomly selecting an entry). The Mentor PR is used for obtaining the complete handlespace. For this purpose, the handlespace is requested using an ENRP Handle Table Request message. The structure of this message type is presented in figure 3.33; it includes the PR IDs of sending and receiving PRs. The W-bit in the flags field denotes the Own Children Only flag. If this flag is set, only the PE entries for which the receiving PR is the PR-H are requested. This flag is used for the handlespace synchronization in case of an inconsistency (to be explained in subsection 3.10.5); since the complete handlespace is required from the Mentor PR, it is not set here.

Figure 3.35: The ENRP Handle Update Message
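The PR ID selection and the random Mentor PR choice of subsection 3.10.3 might be sketched as follows. rand() is used only to keep the example self-contained and is not a suitable source of randomness for a real registrar; choose_pr_id() and choose_mentor() are hypothetical names, and the association states are assumed to be given as a simple flag array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Chooses a non-zero 32-bit PR ID. rand() only guarantees 15 random bits,
   so three calls are combined to cover all 32 bits. */
uint32_t choose_pr_id(void)
{
   uint32_t id;
   do {
      id = ((uint32_t)(rand() & 0x7fff) << 30) ^
           ((uint32_t)(rand() & 0x7fff) << 15) ^
            (uint32_t)(rand() & 0x7fff);
   } while(id == 0);   /* 0 means "unspecified" and must not be used */
   return id;
}

/* Randomly picks a Mentor PR among the peers with an established SCTP
   association; returns its index, or -1 if there is no such peer. */
int choose_mentor(const int* association_up, size_t n_peers)
{
   assert((association_up != NULL) || (n_peers == 0));
   size_t candidates = 0;
   for(size_t i = 0; i < n_peers; i++) {
      if(association_up[i]) {
         candidates++;
      }
   }
   if(candidates == 0) {
      return -1;
   }
   size_t k = (size_t)rand() % candidates;   /* k-th candidate wins */
   for(size_t i = 0; i < n_peers; i++) {
      if(association_up[i]) {
         if(k == 0) {
            return (int)i;
         }
         k--;
      }
   }
   return -1;   /* not reached */
}
```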
Upon reception of the Handle Table Request, the Mentor PR responds using an ENRP Handle Table Response message. The format of this message type is presented in figure 3.34; it includes the PR IDs of sending and receiving PRs. Optionally, the message includes handlespace data as a sequence of PHs (in form of Pool Handle Parameters, see figure 3.18) and PE identities belonging to the corresponding pool denoted by the PH (in form of Pool Element Parameters, see figure 3.19). See subsubsection 3.9.2.1 for a detailed description of these two parameter types. If the R-bit in the flags field (denoted as Reject flag) is set, the Handle Table Request has been rejected. This could be the case if the chosen Mentor PR is also in an initialization phase. In this case, another PR has to be selected as Mentor PR and the handlespace acquisition has to be repeated. The M-bit in the flags field is denoted as More flag. If it is set, the response does not include the complete handlespace data but only a partial result (e.g. due to the limited size of a message). A subsequent Handle Table Request will return the next block (i.e. the Mentor PR has to hold a state, remembering where to continue). On the other hand, if the More flag is not set, obtaining the handlespace has been accomplished.
As soon as the complete handlespace has been transmitted from the Mentor PR, the PR is ready and can start announcing its availability to PUs and PEs by sending ASAP Server Announce messages (see subsection 3.9.4). If it has not been possible to obtain the handlespace within a configured timeout, the PR assumes that it is currently alone within the operation scope and also goes from the initialization phase into normal operation.
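The reaction to the Reject and More flags of a Handle Table Response can be expressed as a small decision function. FLAG_REJECT and FLAG_MORE are hypothetical constants (the actual bit positions are defined in Xie et al. (2006)), and replay_acquisition() merely replays a given sequence of response flags to show the resulting control flow; handlespace data processing and the initialization timeout are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_REJECT 0x01   /* hypothetical bit value of the R-bit */
#define FLAG_MORE   0x02   /* hypothetical bit value of the M-bit */

enum acquisition_step {
   ACQ_REQUEST_NEXT,          /* partial result: request the next block */
   ACQ_CHOOSE_OTHER_MENTOR,   /* rejected: mentor is initializing itself */
   ACQ_DONE                   /* complete handlespace received */
};

enum acquisition_step on_handle_table_response(uint8_t flags)
{
   if(flags & FLAG_REJECT) {
      return ACQ_CHOOSE_OTHER_MENTOR;
   }
   if(flags & FLAG_MORE) {
      return ACQ_REQUEST_NEXT;
   }
   return ACQ_DONE;
}

/* Replays a sequence of response flags, one response per Handle Table
   Request. Returns the number of requests sent until completion, or -1 if
   the sequence ends before the handlespace is complete. */
int replay_acquisition(const uint8_t* response_flags, size_t n,
                       int* mentor_changes)
{
   assert(mentor_changes != NULL);
   int requests = 0;
   *mentor_changes = 0;
   for(size_t i = 0; i < n; i++) {
      requests++;
      switch(on_handle_table_response(response_flags[i])) {
         case ACQ_DONE:
            return requests;
         case ACQ_REQUEST_NEXT:
            break;
         case ACQ_CHOOSE_OTHER_MENTOR:
            (*mentor_changes)++;   /* restart with another Mentor PR */
            break;
      }
   }
   return -1;
}
```

For example, a rejection followed by two partial responses and a final one results in four requests and one change of the Mentor PR.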

3.10.4 Handle Update
When a PR-H performs the registration, re-registration or deregistration of a PE, it is responsible for also informing the other PRs of the operation scope about the changed handlespace content. The message type for this purpose is the ENRP Handle Update message. As shown in figure 3.35, it includes the PR IDs of sending and receiving PRs, as well as the PE's PH in form of a Pool Handle Parameter (see figure 3.18) and the PE information in form of a Pool Element Parameter (see figure 3.19). A detailed description of the contents of these two parameters can be found in subsubsection 3.9.2.1. Finally, the Handle Update includes the field Update Action, defining whether the given PE should be (re-)registered (PE_ADD, 0x0000) or deregistered (PE_DEL, 0x0001).

Algorithm 1 The 16-Bit Internet Checksum Algorithm
#include <stddef.h>   /* size_t */

unsigned short calculateInternetChecksum16(const void* data, size_t size)
{
   const unsigned short* addr = (const unsigned short*)data;
   unsigned int          sum  = 0;   /* no overflow for data up to 128 KiB */

   while(size >= sizeof(*addr)) {    /* Main calculation loop */
      sum += *addr++;
      size -= sizeof(*addr);
   }
   if(size > 0) {                    /* Add left-over byte, if any */
      sum += *(const unsigned char*)addr;
   }
   while(sum >> 16) {                /* Fold 32-bit sum to 16 bits */
      sum = (sum & 0xffff) + (sum >> 16);
   }
   return (unsigned short)~sum;
}
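Applying a received Handle Update (subsection 3.10.4) to the local handlespace copy could look like the following sketch. The Update Action values are taken from the text; the flat array handlespace, its fixed sizes and the function names are assumptions for illustration, and the sender of the update is assumed to be the PR-H of the PE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum update_action { PE_ADD = 0x0000, PE_DEL = 0x0001 };

#define MAX_PES 32   /* hypothetical capacity */

struct pe_entry {
   char     pool_handle[32];
   uint32_t pe_id;
   uint32_t home_pr_id;   /* PR-H of this PE */
};

struct handlespace {
   struct pe_entry pe[MAX_PES];
   size_t          count;
};

static int find_pe(const struct handlespace* hs, const char* ph, uint32_t pe_id)
{
   for(size_t i = 0; i < hs->count; i++) {
      if((hs->pe[i].pe_id == pe_id) && (strcmp(hs->pe[i].pool_handle, ph) == 0)) {
         return (int)i;
      }
   }
   return -1;
}

/* Applies one Handle Update; returns false if it cannot be applied. */
bool apply_handle_update(struct handlespace* hs, uint16_t action,
                         const char* ph, uint32_t pe_id, uint32_t sender_pr_id)
{
   assert((hs != NULL) && (ph != NULL));
   int idx = find_pe(hs, ph, pe_id);
   switch(action) {
      case PE_ADD:   /* registration or re-registration */
         if(idx < 0) {
            if((hs->count >= MAX_PES) ||
               (strlen(ph) >= sizeof(hs->pe[0].pool_handle))) {
               return false;
            }
            idx = (int)hs->count++;
            strcpy(hs->pe[idx].pool_handle, ph);
            hs->pe[idx].pe_id = pe_id;
         }
         hs->pe[idx].home_pr_id = sender_pr_id;
         return true;
      case PE_DEL:   /* deregistration */
         if(idx < 0) {
            return false;
         }
         hs->pe[idx] = hs->pe[--hs->count];   /* entry order is irrelevant */
         return true;
      default:
         return false;   /* unknown Update Action */
   }
}
```

A re-registration of an already known PE only refreshes its entry, so the handlespace size stays unchanged.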

3.10.5 Handlespace Audit and Synchronization
If a PR has temporarily lost the connectivity to other PRs of the operation scope, its view of the handlespace may be inconsistent with the views of the other PRs. During the interruption of the ENRP connections, registrations, re-registrations or deregistrations may have occurred whose updates have been missed by the disconnected PR. Therefore, it is necessary to regularly check the consistency of the handlespace (this check is denoted as Handlespace Audit) and to trigger a re-synchronization if necessary.
The approach used for the handlespace audit is to calculate a checksum over the PE identities owned by a PR, that is, the set of PEs for which a PR is the PR-H. For this purpose, a 16-bit Internet checksum (as also used e.g. for the IPv4 header and TCP checksums) is calculated over the sequence of each PE's PH and PE ID as defined in section 3.1.1 of Xie et al. (2006). The algorithm for the checksum calculation is defined in Braden et al. (1988) and Rijsinghani (1994); a pseudo-code representation can be found in algorithm 1. Important properties of this algorithm are that it is simple and efficient to implement. Furthermore, it can be calculated incrementally. That is, on the deregistration of a PE its checksum contribution can be subtracted, and on a registration the new PE's contribution can be added, without having to re-compute the checksums of all other PEs. See subsection 4.4.4 as well as Rijsinghani (1994) for details on the computation of the incremental checksum.
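The incremental property can be demonstrated with one's complement arithmetic: a PR keeps the uncomplemented sum over all owned PEs, adds a PE's partial sum on registration and adds its complement (i.e. subtracts it) on deregistration. This sketch assumes that each PE's data (PH and PE ID) has already been encoded into an even number of bytes, represented here as 16-bit words, since separately computed sums only combine correctly when every block starts on a 16-bit boundary. All function names are hypothetical; subsection 4.4.4 describes the approach actually used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's complement addition with end-around carry. */
static uint16_t ones_add(uint16_t a, uint16_t b)
{
   uint32_t sum = (uint32_t)a + b;
   return (uint16_t)((sum & 0xffff) + (sum >> 16));
}

/* Uncomplemented one's complement sum of a PE's data, i.e. the value that
   Algorithm 1 computes before its final inversion (for even-length data). */
uint16_t partial_sum(const uint16_t* words, size_t n)
{
   assert((words != NULL) || (n == 0));
   uint16_t sum = 0;
   for(size_t i = 0; i < n; i++) {
      sum = ones_add(sum, words[i]);
   }
   return sum;
}

/* Running per-PR total: add on registration, subtract on deregistration. */
uint16_t on_register(uint16_t total, uint16_t pe_sum)
{
   return ones_add(total, pe_sum);
}

uint16_t on_deregister(uint16_t total, uint16_t pe_sum)
{
   return ones_add(total, (uint16_t)~pe_sum);   /* a - b == a + ~b */
}

/* The checksum as published in the Presence message. */
uint16_t checksum_of(uint16_t total)
{
   return (uint16_t)~total;
}
```

Note that zero has two representations in one's complement arithmetic (0x0000 and 0xffff); Rijsinghani (1994) discusses how incremental updates deal with this corner case.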
The choice of the 16-bit Internet checksum is the result of some lengthy discussions on the IETF RSerPool WG's mailing list and at the 63rd IETF meeting: Motorola might have a software patent on checksums longer than 16 bits. Taking the 16-bit Internet checksum, which is already used for the IPv4 header, the TCP header and many other applications, has been considered the safest solution to avoid running into unforeseeable trouble with the U.S. software patent system.
Having computed the checksum for its PE entries, a PR-H can publish it as part of its Presence message (see subsection 3.10.2). This allows the receiver of a Presence message to compare the included checksum with the checksum it expects for the sending PR (by referring to its own view of the handlespace). Note that a PR can efficiently maintain a checksum for each other known PR of the operation scope by using incremental checksum updates. More details on efficient handlespace management can be found in chapter 4 as well as in Dreibholz and Rathgeb (2005b) and Dreibholz (2004d). If the checksum from a Presence message is equal to the expected one, the handlespace referring to the peer PR's PE entries is assumed to be consistent. Due to the 16-bit checksum, the probability of an undetected inconsistency is about 1:65,536 (i.e. about 0.0015%), assuming that the checksums of differing handlespace contents are uniformly distributed.

Figure 3.36: An Example for Handlespace Audit and Resynchronization
If the two checksums differ, a re-synchronization of the handlespace is necessary. First, all PE entries owned by the remote PR (i.e. the PEs for which the remote PR is the PR-H) are marked in the local handlespace copy. After that, the PE information for all PEs owned by the remote PR is requested from it by using the ENRP Handle Table Request message already introduced in subsection 3.10.3. In this case, the W-flag (oWn children only) is set, in order to request only the PR's own PE entries. The entries received in the response of the remote PR are used to update the local handlespace copy. Finally, all marked entries that have not been updated are removed (these entries are obsolete). After that, the handlespace view of the PEs owned by the remote PR is consistent again.
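The mark, update and sweep steps of the re-synchronization can be sketched on a flat array of entries. For brevity, a PE is identified by its PE ID only (a real handlespace also distinguishes the pool handle), and hs_entry, mark_owned(), apply_response_entry() and sweep_marked() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct hs_entry {
   uint32_t pe_id;
   uint32_t home_pr_id;   /* PR-H of the PE */
   bool     marked;
};

/* Step 1: mark all entries owned by the remote PR. */
void mark_owned(struct hs_entry* e, size_t n, uint32_t remote_pr)
{
   for(size_t i = 0; i < n; i++) {
      e[i].marked = (e[i].home_pr_id == remote_pr);
   }
}

/* Step 2: apply one PE entry of a Handle Table Response (Own Children Only
   flag set). Existing entries are updated and unmarked, unknown ones are
   added. Returns the new number of entries. */
size_t apply_response_entry(struct hs_entry* e, size_t n, size_t capacity,
                            uint32_t pe_id, uint32_t remote_pr)
{
   for(size_t i = 0; i < n; i++) {
      if(e[i].pe_id == pe_id) {
         e[i].home_pr_id = remote_pr;
         e[i].marked     = false;
         return n;
      }
   }
   assert(n < capacity);
   e[n] = (struct hs_entry){ pe_id, remote_pr, false };
   return n + 1;
}

/* Step 3: after the last response (More flag not set), remove all entries
   that are still marked, since they are obsolete. */
size_t sweep_marked(struct hs_entry* e, size_t n)
{
   size_t kept = 0;
   for(size_t i = 0; i < n; i++) {
      if(!e[i].marked) {
         e[kept++] = e[i];
      }
   }
   return kept;
}
```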
To make the handlespace audit and re-synchronization procedure clearer, figure 3.36 presents an example of the message sequence. PR #2 receives a Presence message from PR #1 and detects a handlespace inconsistency due to differing checksums. Therefore, PR #2 requests the handlespace data for the PEs owned by PR #1 using a Handle Table Request (the Own Children Only flag is set). PR #1 answers with the first part of its handlespace data. The Handle Table Response does not include the full data, since the message capacity would be too small. Therefore, the M-flag (More) is set. PR #2 requests the subsequent data packet by another Handle Table Request. Finally, the last Handle Table Response is received (the M-flag is not set). After removing the obsolete entries, the handlespace information of PR #1's PEs is again consistent in PR #2's view of the handlespace.

3.10.6 Takeover Procedure
If a failure of a PR is detected, the remaining PRs have to negotiate which PR takes over the ownership of the PEs currently owned by the failed PR. This negotiation procedure, together with the actual ownership change, is denoted as Takeover Procedure or simply Takeover.

Figure 3.37: The ENRP Init Takeover Message

To initiate a takeover, the PR which has detected a failure of another PR first sends an ENRP Init Takeover message to all of its peers, including the putatively failed PR itself. Figure 3.37 shows the structure of this message type. It includes the PR IDs of sending and receiving PRs, as well as the PR ID of the putatively failed PR (the target of the takeover procedure, therefore denoted as Target). After transmission of the Init Takeover messages, a consent to the takeover is expected from all other PRs, except the target itself (which is presumably dead, of course).
Upon reception of an Init Takeover message, there are three possible cases for a PR to react:
1. The PR itself is the target of the takeover procedure. In this case, it immediately sends a Presence message (see subsection 3.10.2) to all other PRs in order to avoid being taken over.
2. The PR has already started its own takeover procedure for the target. In this case, it checks whether its own PR ID is smaller than the PR ID of the PR sending the Init Takeover message. If so, it aborts its own takeover procedure and confirms the takeover by the other PR using an ENRP Init Takeover Ack message (see below).
3. If neither of the other two cases applies, the PR gives its consent to the takeover, using an ENRP Init Takeover Ack message (see below).
The consent to a takeover is given by an ENRP Init Takeover Ack message. Its structure is equal to the Init Takeover, except for a different message type. If the takeover is confirmed by all other PRs except the takeover target, a PR has won the takeover negotiation and can finally complete the takeover. For this purpose, it sends a Takeover Server message to all remaining PRs of the operation scope. The format of this message type is equal to Init Takeover and Init Takeover Ack, except for the message type. Upon reception of the Takeover Server message, a PR changes the ownership of all PEs owned by the takeover target PR to the takeover sender PR.
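The three reaction cases and the collection of consents can be sketched as below. The text does not state what a PR does when it has started its own takeover for the same target and has the larger PR ID; the sketch assumes that it simply does not acknowledge (REACT_IGNORE) and continues its own takeover. It further assumes that every peer acknowledges at most once. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum takeover_reaction {
   REACT_SEND_PRESENCE,   /* case 1: this PR is the target */
   REACT_ACK,             /* cases 2 (yielding) and 3: Init Takeover Ack */
   REACT_IGNORE           /* assumed: own takeover continues */
};

struct takeover_state {
   bool     active;          /* own takeover running? */
   uint32_t target;          /* target of the own takeover */
   size_t   acks_needed;     /* number of other PRs except the target */
   size_t   acks_received;
};

enum takeover_reaction on_init_takeover(struct takeover_state* own,
                                        uint32_t own_id, uint32_t sender_id,
                                        uint32_t target_id)
{
   assert(own != NULL);
   if(target_id == own_id) {                               /* case 1 */
      return REACT_SEND_PRESENCE;
   }
   if(own->active && (own->target == target_id)) {         /* case 2 */
      if(own_id < sender_id) {
         own->active = false;   /* abort the own takeover and yield */
         return REACT_ACK;
      }
      return REACT_IGNORE;      /* assumption, not stated in the text */
   }
   return REACT_ACK;                                       /* case 3 */
}

/* Counts an Init Takeover Ack. Returns true once all other PRs (except the
   target) have consented, i.e. the Takeover Server messages can be sent. */
bool on_init_takeover_ack(struct takeover_state* own, uint32_t target_id)
{
   assert(own != NULL);
   if(!own->active || (own->target != target_id)) {
      return false;
   }
   own->acks_received++;
   return own->acks_received >= own->acks_needed;
}
```

With PR IDs 1 to 3 and a failed PR 4, this reproduces the order of events of figure 3.38: PR #1 yields to PR #2, which wins after the consents of PR #1 and PR #3.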
Finally, the new PR-H of the PEs taken over has to tell the PEs themselves about being taken over. For this purpose, an ASAP Endpoint Keep-Alive message (see subsubsection 3.9.2.2) is sent to each of the PEs. The Home flag of these messages is set; therefore, the PEs adopt the new PR as their PR-H. After that, the takeover procedure is accomplished.
To make the order of events in a takeover procedure clear, an example is illustrated in figure 3.38. Here, PR #2 detects the failure of a fourth PR (not shown in the figure). It therefore starts a takeover by sending Init Takeover messages to PR #1 and PR #3. PR #1, on its part, has also detected the failure and started a takeover. Since PR #1 has a smaller PR ID than PR #2, it aborts its own takeover and gives PR #2 its consent to the requested takeover in form of an Init Takeover Ack message. PR #3 also confirms PR #2's takeover request. After receiving the consents of PR #1 and PR #3, PR #2 has won the takeover and can actually complete it by sending Takeover Server messages to PR #1 and PR #3, as well as sending ASAP Endpoint Keep-Alives with the Home flag set to the PEs taken over (not shown in the figure). Upon reception of the Takeover Server messages, PR #1 and PR #3 change the ownership of all PEs currently belonging to the failed PR to their new PR-H (i.e. to PR #2).

Figure 3.38: A Takeover Example

It has to be noted that if a takeover fails for whatever reason, the PEs of the failed PR themselves detect the unreachability of their PR-H upon their next re-registration. That is, after at most the re-registration interval (T4-Reregistration) plus the registration timeout (T2-Registration), a PE is induced to look for a new PR-H. See also subsection 3.7.2 for details on this double safeguarding mechanism.

3.11 The Pool Member Selection Policies

The pool member selection policies (also simply denoted as pool policies) define the load distribution and balancing mechanisms of RSerPool systems. In this section, their basics are explained first. After that, the actual policies defined in the Working Group Draft Tüxen and Dreibholz (2006b) are explained. The Working Group document is the result of our contributions to the standardization of RSerPool: the initial RSerPool documents had specified only a few policy names, but an exact specification had been missing. Therefore, as part of the policy performance evaluations described in chapter 8, a complete specification of a set of policies has been created. These policies have been evaluated in Dreibholz, Rathgeb and Tüxen (2005); the resulting guidelines for their implementation as well as their specifications have been submitted to the IETF as Individual Submission Internet Draft Tüxen and Dreibholz (2005). This document has then become the Working Group Draft Tüxen and Dreibholz (2006b) of the IETF RSerPool WG (see also Dreibholz (2004b,c)).


3.11.1 Basics
The existence of multiple PEs for redundancy automatically leads to Load Distribution and Load Balancing. While load distribution according to Berger and Browne (1999) only refers to the assignment of work to a processing element, load balancing refines this definition by requiring the assignment to maintain a balance across the PEs. The balance refers to an application-specific parameter like CPU load or memory usage. A pool policy determines the distribution of load created by PUs among the PEs of a pool, also providing the possibility to balance it.
Before it is possible to define rules for load distribution and load balancing, the meaning of the term load in the context of RSerPool has to be defined exactly: here, load denotes a value in the interval from 0% (unloaded) to 100% (fully loaded). For storage reasons, the load value is encoded as a 32-bit unsigned integer; therefore, 0% maps to 0x00000000 and 100% maps to 0xffffffff. For each application, it is therefore necessary to map the application-specific meaning of Utilization to the view of RSerPool. Formally, this mapping is defined as follows:
$$ m: U_{\mathrm{Application}} \rightarrow [0,1] \subset \mathbb{R}, \qquad u \mapsto m(u) = \frac{u - \min(U_{\mathrm{Application}})}{\max(U_{\mathrm{Application}}) - \min(U_{\mathrm{Application}})} \tag{3.1} $$
In this formula, $U_{\mathrm{Application}}$ denotes the application's set of utilization values. For an FTP server, this could be the number of users, e.g. from 0 to 10. That is, $U_{\mathrm{FTP}} = \{0, 1, \ldots, 10\}$, and at a utilization of 7 users, its load can be calculated as $m(7) = \frac{7-0}{10-0} = 70\%$. A gatekeeper for telephone signalling (see subsection 3.6.1) could define its utilization as the number of MGC requests in its queue. If there are currently 10 requests for a maximum number of 50, $U_{\mathrm{GK}} = \{0, 1, \ldots, 50\}$ and the load is $m(10) = 20\%$.
In the Working Group Draft Tüxen and Dreibholz (2006b), pool policies are classified into two categories: adaptive and non-adaptive ones. For an Adaptive Policy, a regular update of the policy information is necessary. An example for such a policy is selecting the PE currently having the least load. Since a PE's load is variable, it has to be propagated into the handlespace regularly or on every change. On the other hand, a Non-Adaptive Policy does not require such information. An example for such a policy is random selection.
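Equation (3.1) and the 32-bit load encoding of subsection 3.11.1 can be combined in a short C sketch. map_utilization() and encode_load() are hypothetical names, and rounding to the nearest integer is an assumption (the text only fixes the two end points 0x00000000 and 0xffffffff).

```c
#include <assert.h>
#include <stdint.h>

/* Equation (3.1): maps a utilization value u from [u_min, u_max]
   to a load in [0, 1]. */
double map_utilization(double u, double u_min, double u_max)
{
   assert((u_max > u_min) && (u >= u_min) && (u <= u_max));
   return (u - u_min) / (u_max - u_min);
}

/* Encodes a load in [0, 1] as 32-bit unsigned integer:
   0% maps to 0x00000000, 100% maps to 0xffffffff. */
uint32_t encode_load(double load)
{
   assert((load >= 0.0) && (load <= 1.0));
   return (uint32_t)(load * 4294967295.0 + 0.5);   /* round to nearest */
}
```

For the FTP example, encode_load(map_utilization(7, 0, 10)) yields the 32-bit value that represents a load of 70%.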
In the following two subsections, the relevant policies defined in the Working Group Draft Tüxen and Dreibholz (2006b)⁶ are explained.

3.11.2 Non-Adaptive Policies
The relevant non-adaptive policies from Tüxen and Dreibholz (2006b) are introduced in this subsection.

3.11.2.1 Round Robin and Weighted Round Robin
Using the Round Robin (RR) policy, the PEs are selected in turn. Round Robin is the so-called Default Policy of RSerPool. That is, every RSerPool implementation must support it; all other policies are optional. The choice of Round Robin as default policy is based on the recommendations of Dreibholz, Rathgeb and Tüxen (2005).
A generalization of Round Robin is Weighted Round Robin (WRR). For this policy, each PE provides a positive integer weight constant, describing its processing power (depending on the application, of course, e.g. CPU power, memory, disk space etc.) proportional to the other PEs. Analogously to Round Robin, the PEs are selected in turn. But for each round, a PE is selected as many times as its weight constant specifies. That is, a PE of weight 5 is selected 5 times, while a PE of weight 1 is chosen only once.
It is important to note that Round Robin as well as Weighted Round Robin are stateful. That is, the selection of a certain PE depends on the previous selections. Since in RSerPool multiple components (PRs and PU-side caches) may perform selections independently of each other, the resulting overall selection behaviour could differ significantly from the desired round robin scheme. Fallacies and pitfalls of these policies are described in section 8.8.

⁶ This document also defines the corresponding Policy Parameters.
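A possible Weighted Round Robin selector is shown below; with all weights set to 1, it degenerates to plain Round Robin. The state structure makes the statefulness discussed above explicit: every component that selects PEs independently holds its own wrr_state, which is why independent selectors may deviate from the global round robin order. Names are hypothetical, and the weights are assumed to be positive as required by the policy definition.

```c
#include <assert.h>
#include <stddef.h>

struct wrr_state {
   size_t   index;   /* PE currently being served */
   unsigned used;    /* selections of that PE in the current round */
};

/* Weighted Round Robin over n PEs with positive integer weights. */
size_t wrr_select(struct wrr_state* s, const unsigned* weight, size_t n)
{
   assert((s != NULL) && (weight != NULL) && (n > 0));
   if(s->index >= n) {              /* the pool has shrunk meanwhile */
      s->index = 0;
      s->used  = 0;
   }
   while(s->used >= weight[s->index]) {   /* share of this PE exhausted */
      s->index = (s->index + 1) % n;
      s->used  = 0;
   }
   s->used++;
   return s->index;
}
```

For the weights 5 and 1 from the text, seven consecutive selections return PE #1 five times, then PE #2 once, and then PE #1 again.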

3.11.2.2 Random and Weighted Random
The Random (RAND) policy selects PEs (as expected) at random. The probability of choosing a PE is equal for all elements of the pool. A generalization of Random is Weighted Random (WRAND). In this case, each PE provides a positive weight constant which specifies its desired selection probability proportional to the other PEs. That is, a powerful PE of weight 9.75 is on average selected 9.75 times more frequently than a PE of weight 1.
In contrast to the Round Robin and Weighted Round Robin selection, the Random policies are stateless. That is, the choice of the next PE does not depend on the previous selections. Depending on the application scenario, this could lead to a significantly improved load distribution. For more details, see section 8.8.
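Weighted Random can be implemented by mapping a uniformly distributed number onto the cumulative weights. Passing the random number r as a parameter keeps wrand_select() deterministic and testable; wrand_select_rand() uses rand() only for illustration. With equal weights, the sketch reduces to plain Random. Names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Weighted Random: r must be uniformly distributed in [0, 1). */
size_t wrand_select(const double* weight, size_t n, double r)
{
   assert((weight != NULL) && (n > 0) && (r >= 0.0) && (r < 1.0));
   double total = 0.0;
   for(size_t i = 0; i < n; i++) {
      total += weight[i];
   }
   double point = r * total;   /* position within the cumulative weights */
   for(size_t i = 0; i < n; i++) {
      if(point < weight[i]) {
         return i;
      }
      point -= weight[i];
   }
   return n - 1;   /* guards against floating-point rounding */
}

/* Convenience wrapper using rand(); adequate for a sketch only. */
size_t wrand_select_rand(const double* weight, size_t n)
{
   return wrand_select(weight, n, (double)rand() / ((double)RAND_MAX + 1.0));
}
```

With the weights 9.75 and 1 from the text, any r below about 0.907 (9.75/10.75) selects the first PE, so it is chosen 9.75 times as often on average.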

3.11.3 Adaptive Policies
This subsection describes the relevant adaptive policies defined in Tüxen and Dreibholz (2006b).

3.11.3.1 Least Used
The Least Used (LU) pool policy bases its selection decision on the current load states of the PEs, i.e. it selects the least-loaded server. Load in this case is defined as explained in subsection 3.11.1. If multiple PEs are on the same lowest load level, round robin or random selection should be performed among these least-loaded elements.
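A Least Used sketch over 32-bit encoded loads follows. The text allows round robin or random selection among equally loaded PEs; this sketch chooses round robin by starting each scan at a rotating offset. lu_select() and rr_offset are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Least Used: returns the index of the PE with the lowest load. Ties are
   broken round-robin style: the scan starts at *rr_offset, and the offset
   is moved behind the selected PE afterwards. */
size_t lu_select(const uint32_t* load, size_t n, size_t* rr_offset)
{
   assert((load != NULL) && (rr_offset != NULL) && (n > 0));
   size_t best = *rr_offset % n;
   for(size_t k = 1; k < n; k++) {
      size_t i = (*rr_offset + k) % n;
      if(load[i] < load[best]) {   /* strict: first tied PE in scan order wins */
         best = i;
      }
   }
   *rr_offset = (best + 1) % n;
   return best;
}
```

With loads 10, 5 and 5, successive calls alternate between the two equally least-loaded PEs.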

3.11.3.2 Priority Least Used
Even if a pool consists of PEs providing very heterogeneous capacities, the selection performed by Least Used is always based on the PEs' actual load. But for such scenarios, selecting a low-loaded but low-capacity PE may result in a worse performance than using a higher-loaded but also more powerful PE. For example, PE #1 is 10% loaded and a new request would increase its load by another 2%. On the other hand, PE #2 is 8% loaded, but an additional request to handle would increase its load by an additional 8%. Clearly, it would be beneficial to choose PE #1, since its load would increase to 12% instead of 16% for PE #2.
The Priority Least Used (PLU) policy, introduced and evaluated in Dreibholz, Rathgeb and Tüxen (2005), lets a PE specify a Load Increment constant. This value, given in the units of load, specifies how much an additional request will increase the PE's load. The PE having the currently lowest sum of load plus load increment is then chosen. That is, while LU selects PEs based on the current load, Priority Least Used bases its choice on the load level that a PE will have after a new request has been assigned to it. In case of multiple PEs on the same lowest sum level, round robin or random selection can be applied among them. In the example above, PE #1 would be chosen: its sum of load and load increment is 12%, instead of the 16% for PE #2.
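Priority Least Used differs from Least Used only in the compared value, load plus load increment. In the sketch, the sum is computed with 64 bits so that it cannot overflow the 32-bit load encoding, and ties are resolved by taking the lowest index for brevity (the text suggests round robin or random instead). plu_select() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Priority Least Used: chooses the PE with the lowest load + increment. */
size_t plu_select(const uint32_t* load, const uint32_t* increment, size_t n)
{
   assert((load != NULL) && (increment != NULL) && (n > 0));
   size_t   best     = 0;
   uint64_t best_sum = (uint64_t)load[0] + increment[0];
   for(size_t i = 1; i < n; i++) {
      uint64_t sum = (uint64_t)load[i] + increment[i];
      if(sum < best_sum) {
         best     = i;
         best_sum = sum;
      }
   }
   return best;
}
```

With the example values (loads 10 and 8, increments 2 and 8, here in percent for readability), PLU selects PE #1, whereas zero increments make the function behave like Least Used and select PE #2.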

3.12 The Mechanisms for Service Reliability and Availability
An RSerPool system, based on the ENRP and ASAP protocols described in the previous sections, includes various mechanisms to detect and handle component failures, in order to support applications in providing a reliable service. In this section, these mechanisms are summarized to make their functionalities clear.

3.12.1 Failure Model
For the analysis of RSerPool's component failure handling performance, as well as for the correct understanding of the following description of RSerPool's mechanisms for service reliability and availability, it is first necessary to define a Failure Model. In particular, the failure model specifies what the term failure actually means.
In the context of this thesis, a Failure of a PR, PE or PU component occurs in form of a so-called Silent Failure: a faulty component simply disappears, i.e. it no longer responds to requests according to its protocol specifications (e.g. ASAP, ENRP or an application protocol). In the context of this thesis, the reason for a failure (denoted as Fault) is irrelevant: for the user of a service, it does not make a difference whether the interruption of its service is caused e.g. by a crash of the component, a broken network connection or the failure of an intermediate network device like a router. It is important to note that the used failure model does not cover the following aspects:
The output of wrong calculation results (e.g. due to faulty memory or CPU faults) is out of the scope of RSerPool. From the perspective of RSerPool, a faulty component could simply halt itself if it detects its own malfunction by appropriate mechanisms (see Echtle (1990) for details on this subject). In this case, such a shutdown becomes equal to a silent failure.
Intentional distribution of misinformation by an attacker is out of this thesis's scope. However, some security considerations are presented in section 3.13.
The number of tolerable component failures depends on the application providing its service by a pool: while the RSerPool functionality itself can be provided with at least one PR, the minimum number of necessary PEs is application-dependent (but also at least one, of course).
Having defined the failure model, it is now possible to explain RSerPool's mechanisms for service reliability and availability in detail. Since both RSerPool protocols strongly rely on the features of the SCTP protocol, the role of the SCTP mechanisms is described first. This is followed by the explanation of the actual RSerPool mechanisms.

3.12.2 Mechanisms of the Transport Layer
RSerPool systems are based on the SCTP Transport Layer protocol (see subsection 2.4.3). Therefore, problems concerning the underlying layers may already be solved in the Transport Layer. Table 3.1 summarizes the problems which can occur on these layers and the mechanisms provided to detect them:
Congestion: Probably the most frequent problem in a network is the overload of bottleneck links. Clearly, an overloaded link leads to packet loss. In case of SCTP (as well as for TCP), lost packets are detected by sequence number differences. The sender retransmits packets that have not been acknowledged within a certain timeout (in case of reliable transport). Furthermore, the congestion control algorithm ensures that the available bandwidth is fairly shared among the associations utilizing a bottleneck link.
Transmission Errors: Bit errors due to unreliable transmission are detected by the CRC-32 checksum⁷ within each SCTP packet. In case of a wrong checksum, the packet is simply dropped and retransmitted by the sender (due to the missing acknowledgement; in case of reliable transport). Furthermore, using the Packet Drop extension, a receiver can explicitly notify its sender about packet rejection due to transmission errors (see subsubsection 2.4.3.6 and Stewart, Lei and Tüxen (2006a)). Therefore, a transmission error can be differentiated from a packet loss due to congestion.
Link and Router Problems: Links (as well as routers) may fail, e.g. due to damage or simply power loss. The multi-homing feature of SCTP (see subsubsection 2.4.3.4) allows an endpoint to be connected to different (and, for logical reasons, independent) networks. In this case, a problem within one network, detected by path monitoring using heartbeat chunks, can simply be handled by using another path for the transport of data. That is, the multi-homing feature of SCTP provides link redundancy. For the upper layers, the switching of the primary path is transparent (see also Jungmaier, Rathgeb and Tüxen (2002) and Jungmaier (2005) for details).

3.12.3 Mechanisms of the Session Layer
The handling of server failures is the main purpose RSerPool has been created for. But a PE is not the only component type which can fail: a failure is possible for a PU and a PR as well. Table 3.2 summarizes the mechanisms by which each component type detects the failures of a peer component type:
Pool User Failure: If a PU fails, the association to its PR is broken. This is detected by the transport protocol (i.e. usually by SCTP). Except for cleaning up the resources allocated for the association maintenance, nothing else has to be done by the PR. The same happens for an association with a PE. Only in case of a symmetric scenario, i.e. if the PE is also a PU (see subsubsection 3.9.5.3), the PE has to perform a failover (now in the role of a PU).
Pool Element Failure: A PU has to detect the failure of its PE by application-specific mechanisms. In the usual case, this is realized by application timeouts: if the PE does not answer within a given timeout, the PU assumes that the PE is dead and performs a failover. For the application model introduced later in this thesis, the failure detection functionality is realized by the CalcAppKeepAlive and CalcAppKeepAliveAck messages (for details see section 8.3).
For the PR, there are two mechanisms to detect a PE failure:
1. First, if the PR is the PR-H of the PE, a failure is detected by an ASAP Endpoint Keep-Alive message (see subsubsection 3.7.1.3): if the PE fails to answer using an ASAP Endpoint Keep-Alive Ack message, it is assumed to be dead. The speed of this failure detection mechanism depends on the settings of the Keep-Alive Transmission and Timeout intervals (see subsubsection 4.3.2.2).
2. The second detection mechanism for PE failures is the reporting of a PE unreachability by PUs, using ASAP Endpoint Unreachable messages (see subsubsection 3.9.3.2). A PR counts the number of unreachability reports for each PE; if the number reaches the configured limit MAX-BAD-PE-REPORT (see subsubsection 3.7.1.4), the PE is assumed to be dead.
Clearly, upon detection of a PE failure, the PR's action is to remove the corresponding PE identity from the handlespace.
Registrar Failure: PEs and PUs detect a failure of their PR by request timeouts, i.e. a request for registration, deregistration or handle resolution is not answered within the specific timeout of the corresponding request type. In this case, another PR has to be contacted and used. For the PE, this also means to re-register at the new PR.
The remaining PRs also perform a takeover procedure for the PEs of the failed PR (see subsection 3.10.6). The PR winning the takeover sends an ASAP Endpoint Keep-Alive message with the Home flag set to each PE whose ownership has been taken over, in order to notify it about its new owner.

⁷ Note again that a CRC-32 checksum does not denote a sum in the mathematical sense. Nevertheless, this is the terminology used by the standards documents.

Problem                    Detection Mechanism
Congestion                 Sequence Numbers and Acknowledgements
Transmission Error         Checksum, Packet Drop Extension
Link and Router Problems   Path Monitoring, Multi-Homing
Table 3.1: The Network and Component Failure Detection Mechanisms of SCTP

Failed Component   Detection by    Detection Mechanism
Pool User          Pool Element    Application-Specific (Timeout or Broken Connection)
Pool User          Registrar       Broken Connection
Pool Element       Pool User       Application-Specific (Timeout or Broken Connection)
Pool Element       Registrar       ASAP Endpoint Keep-Alive Timeout
Pool Element       Registrar       ASAP Endpoint Unreachable(s) by Pool User(s)
Registrar          Pool User       ASAP Request Timeout
Registrar          Pool Element    ASAP Request Timeout
Registrar          Registrar       ENRP Presence Timeout
Registrar          Registrar       ENRP Request Timeout
Table 3.2: The Component Failure Detection Mechanisms of RSerPool
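The second PE failure detection mechanism of the PR, counting ASAP Endpoint Unreachable reports, reduces to a per-PE counter. The limit value 3 below is an assumption for illustration only (the actual limit MAX-BAD-PE-REPORT is configurable, see subsubsection 3.7.1.4), and the sketch does not model whether or when the counter is reset. Names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limit for illustration; the real value is configurable. */
#define MAX_BAD_PE_REPORT 3

struct pe_report_state {
   uint32_t pe_id;
   unsigned unreachability_reports;
};

/* Called for every ASAP Endpoint Unreachable received for this PE.
   Returns true when the PE has to be assumed dead, i.e. when its identity
   has to be removed from the handlespace. */
bool on_endpoint_unreachable(struct pe_report_state* pe)
{
   assert(pe != NULL);
   pe->unreachability_reports++;
   return pe->unreachability_reports >= MAX_BAD_PE_REPORT;
}
```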
For the RSerPool failure handling evaluations of this thesis, only failures of PEs and PRs are relevant; PU restarts are application-specific and out of scope. The usual procedure of PU failure handling is to restart the whole session. For example, if the user has rebooted the PU host after a system crash, he simply starts the PU application again. Another approach is to regularly save the application state and allow a PU restart from the latest stored state; see Plank et al. (1995) for more details on this subject.

3.12.4 Support for Redundancy Models
Clearly, the provision of component redundancy concepts leads to the usage of different redundancy models (see also Engelmann and Scott (2005)). For the RSerPool application scenarios described in section 3.6, it is assumed that the majority of the services use the active/active model (see also section 1.2.3). That is, all servers of the pool should be utilized to make best use of the available resources. By configuring an appropriate pool policy for the PE selection (see section 3.11), load balancing (see section 1.2.2) can be used to reasonably distribute the workload among the servers of the pool.
But RSerPool is not restricted to the active/active model. Using an appropriate policy, e.g. the redundancy model policy defined in Xie and Yarrol (2004), it is easily possible to apply the active/standby model as well. That is, some PEs are in standby mode and only used in case of a failure of the active ones. The actual behaviour of the standby component, i.e. whether it provides cold-standby, warm-standby or hot-standby (see section 1.2.3 for the definitions), is solely the responsibility of the application and out of the scope of RSerPool.

3.13 Security Considerations
Considerations for the security of RSerPool are discussed in detail in Stillman et al. (2005). Therefore, only a short summary of the important points is given here. In general, RSerPool endpoints are protected quite well against blind flooding attacks, due to the usage of SCTP as transport protocol (see subsection 2.4.3 for details on the security mechanisms of SCTP). The main threat for RSerPool systems is that an attacker may bring misinformation into the handlespace, either by ASAP or by ENRP. If an attacker is able to masquerade as an authorized PE, it can create faked registrations and, by artful choice of policy parameters (e.g. the lowest load or the highest weight), cause the selection of unsuitable or non-existing PEs. This may lead to a denial of service, at least for the pool for which the compromised PE authorization is valid. If an attacker is even able to masquerade as a PR, it can fill arbitrary misinformation into the handlespace and cause a denial of service for the whole RSerPool network. Furthermore, PEs may choose the attacker's PR as PR-H.
To cope with the described threats, RSerPool mandatorily requires bilateral authenticity and integrity protection, among PRs as well as between PRs and PEs or PUs. Both requirements can be achieved by applying existing standard security technologies: either IPsec⁸ on the Network Layer or TLS⁹ on the Transport Layer. Alternatively, the Secure-SCTP extension for SCTP (see subsubsection 2.4.3.7) also provides a suitable solution.
Optionally, all three techniques also offer the possibility of encryption to ensure confidentiality. That is, depending on the user's requirements, ASAP and ENRP traffic can also be encrypted.

⁸ See Kent and Atkinson (1998c,a,b).
⁹ See Dierks and Allen (1999), Blake-Wilson et al. (2003).

3.14 Summary
In this chapter, the Reliable Server Pooling (RSerPool) framework for the management of and access to server pools has been presented. The provision of component redundancy for SS7 telephone signalling over IP networks had been the initial motivation of RSerPool. Due to its generic applicability, various new application scenarios have shown up, e.g. the usage for real-time distributed computing and load balancing. In the first part of the introduction to RSerPool, its components, namely registrars (PR), servers (pool elements, PE) and clients (pool users, PU), and their interaction have been presented. After that, the two RSerPool protocols and their behaviour have been introduced in detail: the Aggregate Server Access Protocol (ASAP) for the communication of pool elements and pool users with registrars, and the Endpoint Handlespace Redundancy Protocol (ENRP) for the synchronization of the handlespace among registrars. Finally, server selection procedures (pool policies) and security considerations have been explained.

Chapter 4

The Handlespace Management

This chapter describes the design and implementation of an efficient handlespace management. At first, a motivation for the effort on optimizing the handlespace management is provided. This is followed by the requirements for the handlespace management. After that, the design and implementation of the handlespace management approach used for both the RSerPool prototype implementation RSPLIB and the simulation model RSPSIM are described. This chapter is concluded by remarks on the validation of the implementation.

4.1 Introduction
As explained in section 3.6, RSerPool is applicable for a wide variety of applications. Each application may have its own requirements with respect to server selection, that is, it may specify its own pool policy. Furthermore, pools of certain applications like real-time distributed computing (see subsection 3.6.5) and load balancing (see subsection 3.6.4) may become very large. For example, a big international company could decide to add all of its office PCs to a computation pool for simulation processing. A pool of e.g. 10,000 to 100,000 PCs can provide an enormous computation capacity: as of 2006, a single usual office PC provides a 2.5+ GHz CPU including FPU and vector unit, 512+ MBytes of memory and 80+ GBytes of hard disk space. Scaled by 4 to 5 orders of magnitude, the overall computation capacity of such a pool can not only compete with a supercomputer, it is furthermore extraordinarily inexpensive: usually, an office PC is only utilized during working hours – and even then it waits most of the time for user input.
In summary, pools may become large and there are many possible pool policies which have to be supported by PRs and PUs. It is therefore necessary to think about how to efficiently manage such handlespaces.

4.2 Implementation History and Lessons Learned
For the first, fast-track version of the RSPLIB prototype implementation (see also chapter 5), the naïve way of realizing the handlespace has been taken by using linear lists. That is, the handlespace has been realized as a list of pools, where each pool included a list of PEs. As list implementation, the functions provided by the GLIB library (see GNOME Project (2001)) have been used. This naïve approach has been rather simple, but it worked reasonably well for small pools of fewer than 10 PEs using the Round Robin, Least Used or Random policies (the standards documents at this time only defined the Least Used and Round Robin policies). However, tests have shown that more pool policies


CHAPTER 4. THE HANDLESPACE MANAGEMENT

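Handlespace
   Pool PH $h_1$, Policy $\pi_1$
      Pool Element ID #$e_{11}$ | Policy Info $\hat{\pi}_{11}$ | Addresses $a_{11}$
      ...
      Pool Element ID #$e_{1m_1}$ | Policy Info $\hat{\pi}_{1m_1}$ | Addresses $a_{1m_1}$
   ...
   Pool PH $h_n$, Policy $\pi_n$
      Pool Element ID #$e_{n1}$ | Policy Info $\hat{\pi}_{n1}$ | Addresses $a_{n1}$
      ...
      Pool Element ID #$e_{nm_n}$ | Policy Info $\hat{\pi}_{nm_n}$ | Addresses $a_{nm_n}$

Table 4.1: The Handlespace Datatype Structure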

may be useful and there have been ideas for application scenarios requiring much larger pools (e.g. real-time distributed computing, see subsection 3.6.5).
The necessity to evaluate functionalities of RSerPool for research purposes has led to the creation of the RSPSIM simulation model (see also chapter 6). Due to the limitations of the prototype's handlespace management, it has been decided to redesign it: pools have been realized as objects of an abstract class; the actual implementation of a pool, including its storage functionality, has been provided by a derived class which has also implemented the pool's selection policy. After that, it has been possible, for example, to realize Round Robin pools using circular lists, Random pools using an array of PEs and Least Used pools using trees sorted by the PEs' load values. For the actual storage of lists or arrays, functions provided by the OMNET++ library (see Varga (2005b,a)) – on which the simulation is based – have been used. While the new approach addressed scalability as well as extensibility, it became increasingly difficult to maintain and verify the variety of different policy and storage implementations. Furthermore, all new policies also had to be supported by the RSPLIB prototype, which still used the old handlespace management implementation. Since the new handlespace management approach has been based on the OMNET++ framework written in C++, it also has not been possible to simply port it back to the C-based prototype.
As a result of the handlespace management experiences obtained from prototype and simulation model, it has been decided to redesign it again – and finally do it the right way! The key requirements for the new handlespace management have been the necessity for only one storage mechanism, which is common for all policies, and the usability of the new system for the simulation model as well as for the prototype implementation. In the following section 4.3, the handlespace management will be defined in the form of an abstract datatype. The implementation design will be described in section 4.4.

4.3 An Abstract Handlespace Management Datatype
In this section, the abstract datatype of the new handlespace management approach is described.

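4.3.1 Handlespace Structure
Table 4.1 presents the structure of the abstract handlespace management datatype: the handlespace consists of a set of $n$ pools, identified by their unique PHs $h_1$ to $h_n$ ($h_i \in \mathcal{P}$ for $i \in \{1,\dots,n\}$ and $\mathcal{P}$ the set of all possible PHs). Each pool $i$ uses a pool policy $\pi_i \in \Pi$, where $\Pi$ denotes the set of all pool policies (e.g. $\Pi = \{\mathrm{LU}, \mathrm{RR}, \mathrm{RAND}\}$). Furthermore, each pool $i$ includes a non-empty set of $m_i$ PE entries, denoted by their PE IDs $e_{i1}$ to $e_{im_i}$ ($e_{ij} \in \{1,\dots,2^{32}-1\} \subset \mathbb{N}$ for all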

4.3. AN ABSTRACT HANDLESPACE MANAGEMENT DATATYPE


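$i \in \{1,\dots,n\}$ and $j \in \{1,\dots,m_i\}$). Every PE also includes its policy information $\hat{\pi}_{ij} \in \hat{\Pi}_{\pi_i}$, where $\hat{\Pi}_{\pi_i}$ denotes the set of all valid policy information settings for policy $\pi_i$. Finally, each PE entry includes its transport address $a_{ij} \in T_{P_i} \times T_A \times \mathfrak{P}(N_A)$, where $N_A$ denotes the set of network-layer addresses (i.e. usually the union of IPv4 and IPv6 addresses), $\mathfrak{P}(N_A)$ its power set, and $T_A$ denotes the set of transport-layer addresses (i.e. port numbers) of the transport protocol $T_{P_i}$ chosen for pool $i$. Here, $T_{P_i} \in T_P$ for $i \in \{1,\dots,n\}$, where $T_P$ denotes the set of all supported transport protocols and transport uses (e.g. $T_P = \{$SCTP with Control Channel, SCTP with Data Channel only$\}$).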

4.3.2 Operations for the Pool Element Functionalities
For the PE functionalities of a PR, the handlespace management has to provide registration handling and reachability-based timer monitoring.

4.3.2.1 Registration Handling
Upon registration, re-registration and deregistration, it is necessary to find an existing PE structure in the handlespace by its PH and PE ID: it is first necessary to find the pool structure the PE belongs to; after that, a lookup for the PE in the given pool can be performed. Furthermore, it is necessary to insert a new PE into the pool upon registration and to remove it from the pool upon deregistration. Therefore, the handlespace management to be defined has to provide the two methods hsMgtRegisterPE(PH, PE) and hsMgtDeregisterPE(PH, PE) to provide registration and deregistration functionalities. Since a re-registration is simply a registration of an existing PE – and therefore the place where it is possible to decide whether a registration is actually a re-registration is the registration method itself – no separate method is necessary here.

4.3.2.2 Reachability Timers
A PR-H has to monitor the availability of its PEs by sending keep-alive messages (see subsubsection 3.7.1.3). That is, it must be possible to schedule a Keep-Alive Transmission Timer for each PE entry. Clearly, it must also be possible to cancel such a scheduled timer if the PE is being removed. Furthermore, after an ASAP Keep-Alive message is sent, it is necessary to schedule a Keep-Alive Timeout Timer. If such a timer expires, no ASAP Keep-Alive Ack message has been received and the PE is assumed to be dead. Clearly, a Keep-Alive Timeout timer has to be cancelled if the PE either answers the Keep-Alive message or is being removed.
A PR not being a PR-H for a certain PE has to pay attention to the PE's Registration Lifetime parameter (see subsubsection 3.9.2.1): the PE entry has to expire if it is not updated by a subsequent ENRP Update message within the given time span. Therefore, it is necessary to maintain a Lifetime Expiry Timer. This timer is scheduled when a PE entry is created (or updated) upon an ENRP Update message and cancelled (and usually re-scheduled) upon the following Update.

4.3.3 Operations for the Pool User Functionalities
To provide the PU functionalities, the handlespace management first has to perform the PE selection: one or more PE identities have to be selected from a pool given by its PH using the pool's selection policy. That is, after a successful lookup of the pool, a list of PE identities has to be obtained by applying the selection procedure specified by the pool's policy.


Furthermore, the PU functionality of the handlespace management includes the maintenance of the PU-side cache. That is, a Cache Expiry Timer has to be scheduled for each PE entry in the cache. When the timer expires, the corresponding PE identity has to be flushed from the cache.
For the pool user functionality of the PE selection, the handlespace management has to provide a function hsMgtHandleResolution(PH) which selects a certain number of PE identities from the handlespace.

4.3.4 Operations for the Registrar Functionalities
As explained in subsection 3.10.5, PRs synchronize their views of the handlespace by sequences of ENRP Handle Table Request and Handle Table Response messages. A Handle Table Response message only includes a limited fraction of handlespace data, due to the message size limitation (65,535 bytes) and to avoid overloading endpoints and network. That is, if a PR has requested a handlespace copy from one of its peers, the serving PR remembers the place in the handlespace where to continue from when the requesting PR asks for the subsequent part using another Handle Table Request message. Therefore, the handlespace management to be defined has to provide a function hsMgtTraverse(PRID, lastPH&, lastPEID&) which returns a certain number of PE identities owned by the PR given by PRID (or all, if no PR is specified) and stores PH and PE ID of the last returned PE identity into the provided reference variables lastPH and lastPEID. A subsequent call to this function will resume at the nearest PE after the provided one.
Finally, for the PR synchronization functionality as described in subsection 3.10.5, it must be possible to obtain the handlespace checksum for
1. All PE identities handled by the PR itself (i.e. the PEs for which it is the PR-H) and
2. All PE identities belonging to a certain peer PR of the operation scope.

4.4 The Handlespace Management Design
In this section, the implementation design for the abstract handlespace management datatype introduced in section 4.3 is described. While first ideas for this implementation design have already been suggested in Dreibholz (2004d), a more formal presentation has been provided in Dreibholz and Rathgeb (2005b). The presentation of the handlespace management design in this section is structured as follows: First, the data structure for the handlespace and the realization of pool policies are explained. After that, the management of timer events and checksums is described, followed by the handlespace synchronization handling. Finally, possible algorithms to manage the presented structures and design decisions for the implementation are introduced.

4.4.1 Handlespace Data Structure
The main task of the handlespace management is clearly the storage of pool and PE information. Therefore, it is first necessary to define how these data entities are stored. The handlespace data structure design is shown in figure 4.1 and is quite straightforward: the handlespace consists of a Set of pools – denoted as Pool Set – sorted by the pools' PHs; each pool consists of a Set of PE references sorted by PE ID (this set is denoted as Index Set) and a second Set of PE references sorted by a policy-specific Sorting Order (this set is denoted as Selection Set). It is intentionally not defined how

4.4. THE HANDLESPACE MANAGEMENT DESIGN

Figure 4.1: The Handlespace Data Structure Design

Pool PH $h_i$ | Policy $\pi_i$ | $S_i = 1 + \max\{s_{ij} \mid j \in \{1,\dots,m_i\}\}$
   Pool Element ID #$e_{i1}$: $s_{i1}$
   ...
   Pool Element ID #$e_{im_i}$: $s_{im_i}$

Table 4.2: Sequence Numbers for Pools and Pool Elements


a Set is actually implemented (e.g. by using a linear list); possible choices for actually realizing a Set datatype are presented later in subsection 4.4.7.
Clearly, the effort of implementing insertion, removal and lookup of PEs is reduced to the corresponding operations of the Set datatype. A policy-based selection of PEs is reduced to selecting elements of the Selection Set (sorted by the policy-specific sorting order) using a policy-specific Selection Procedure. The next step therefore is to explain how sorting order and selection procedure are defined for certain pool policies. These definitions are the subject of the following subsection.
The Default Selection Procedure is simply to take PEs from the beginning of the Selection Set. The reason and a detailed example for this choice will be presented in subsubsection 4.4.2.2.

4.4.2 Policy Realizations
In this subsection, sorting orders and selection procedures for pool policies are presented. To simplify this task, some important helper constructs are defined first. After that, it is shown how to realize some important policies – accompanied by detailed examples.

4.4.2.1 Helper Constructs
Sequence Numbers The first helper construct to be introduced is the definition of sequence numbers for PEs and pools, as shown in table 4.2: each pool element $e_{ij}$ of pool $i$ gets a sequence number $s_{ij} \in \mathcal{S}$ which is unique in pool $i$; $\mathcal{S} \subset \mathbb{N}_0$ denotes the set of all possible sequence numbers. The global


Pool PH $h_i$ | Policy $\pi_i$ | $W_i = \sum_{1 \le j \le m_i} w_{ij}$
   Pool Element ID #$e_{i1}$: $w_{i1}$
   ...
   Pool Element ID #$e_{im_i}$: $w_{im_i}$
Table 4.3: Pool Element Weights and Weight Sum

sequence number of pool $i$ is defined as
$$S_i = 1 + \max\{s_{ij} \mid j \in \{1,\dots,m_i\}\}, \tag{4.1}$$
i.e. the largest PE sequence number of the pool plus one. The uniqueness property of the PEs' sequence numbers within their pool will become very handy for the policy definitions later in this section. Each time a new PE $j$ is inserted into pool $i$ upon a registration, its registration is updated by a re-registration or it is selected by a handle resolution, PE $j$'s sequence number $s_{ij}$ is set to $S_i$, and therefore $S_i$ is increased by one according to its definition in equation 4.1. This guarantees that each sequence number $s_{ik}$ for $1 \le k \le m_i$ is unique within pool $i$. It is obvious that the operations necessary to maintain the sequence number are realizable in $O(1)$ time¹.
While in theory $\mathcal{S} = \mathbb{N}_0$, $\mathcal{S}$ must be of finite size for any practical implementation. That is, $\mathcal{S}$ is usually a 32-bit or 64-bit unsigned integer datatype. Therefore, care has to be taken to avoid a sequence number overflow when $S_i$ is incremented. In this case, a re-sequencing is necessary: the PEs have to be traversed in their current order and their sequence numbers have to be set starting from 0. Then, $S_i = m_i$. Obviously, such a re-sequencing of pool $i$ can be performed in $O(m_i)$ time. In practice, however, using a 64-bit unsigned integer datatype for the sequence number, a re-sequencing will never happen in realistic runtime scenarios².
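The sequence number rules above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the RSPLIB code: the pool is a plain array (the names seq_pe, seq_pool, pool_touch and the seq_limit field are hypothetical), only updates of existing entries are shown, and seq_limit stands in for the end of the 64-bit range so that the re-sequencing path can actually be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified pool: PE entries kept in a plain array. */
struct seq_pe {
   uint32_t pe_id;
   uint64_t seq;              /* s_ij, unique within the pool */
};

struct seq_pool {
   struct seq_pe* pes;
   size_t         count;      /* m_i */
   uint64_t       next_seq;   /* S_i = 1 + max{s_ij} */
   uint64_t       seq_limit;  /* UINT64_MAX in practice; smaller to test re-sequencing */
};

static int compare_by_seq(const void* a, const void* b)
{
   const struct seq_pe* pa = (const struct seq_pe*)a;
   const struct seq_pe* pb = (const struct seq_pe*)b;
   return (pa->seq > pb->seq) - (pa->seq < pb->seq);
}

struct seq_pe* pool_find(struct seq_pool* pool, uint32_t pe_id)
{
   for(size_t k = 0; k < pool->count; k++) {
      if(pool->pes[k].pe_id == pe_id) {
         return &pool->pes[k];
      }
   }
   return NULL;
}

/* Renumber all PEs from 0 in their current sequence order, then S_i = m_i.
   The sketch sorts the array first (O(m_i log m_i)); a Set that is already
   kept in sequence order needs only one O(m_i) traversal. */
static void pool_resequence(struct seq_pool* pool)
{
   assert(pool->count < pool->seq_limit);
   qsort(pool->pes, pool->count, sizeof(struct seq_pe), compare_by_seq);
   for(size_t k = 0; k < pool->count; k++) {
      pool->pes[k].seq = (uint64_t)k;
   }
   pool->next_seq = (uint64_t)pool->count;
}

/* Re-registration or selection of an existing PE: s_ij := S_i, S_i := S_i + 1. */
int pool_touch(struct seq_pool* pool, uint32_t pe_id)
{
   if(pool->next_seq >= pool->seq_limit) {
      pool_resequence(pool);   /* avoid overflowing the sequence number type */
   }
   struct seq_pe* pe = pool_find(pool, pe_id);
   if(pe == NULL) {
      return -1;
   }
   pe->seq = pool->next_seq++;
   return 0;
}
```

With three PEs numbered 0, 1 and 2 and an artificially small limit of 5, the third touch triggers a re-sequencing; afterwards the numbers are again unique and $S_i$ is once more the largest sequence number plus one, as required by equation 4.1.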

Weights and Weight Sum As another helper construct for the definition of pool policies, a weight constant $w_{ik} \in \mathbb{N}$ is introduced for each PE $k$ of pool $i$ ($1 \le k \le m_i$), as illustrated in table 4.3. Furthermore, the weight sum $W_i$ of pool $i$ is defined as follows:
$$W_i = \sum_{1 \le j \le m_i} w_{ij}. \tag{4.2}$$
It is obvious that the operations necessary to maintain the weight sum are realizable in $O(1)$ time. Now, for any number $r \in \{1,\dots,W_i\} \subset \mathbb{N}$, exactly one PE $k$ of pool $i$ fulfils the condition
$$\sum_{1 \le j \le k-1} w_{ij} < r \le w_{ik} + \sum_{1 \le j \le k-1} w_{ij}. \tag{4.3}$$
This property will be utilized for the easy selection of random elements, as presented later in subsubsection 4.4.2.5.

4.4.2.2 Round Robin
The first policy to be defined is the default policy of Tüxen and Dreibholz (2006b): Round Robin. Using the sequence numbers helper construct defined in subsubsection 4.4.2.1, its definition becomes
¹ $O(f) := \{g : \mathbb{N} \to \mathbb{N} \mid \exists c > 0, n_0\ \forall n \ge n_0 : g(n) \le c \cdot f(n)\}$
² Nevertheless, the implementation introduced in this chapter correctly handles the re-sequencing.


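Pool PH "Example", Policy RR, $S_1 = 4$ (before the selection)
   Pool Element ID #1: $s_{11} = 1$
   Pool Element ID #2: $s_{12} = 2$
   Pool Element ID #3: $s_{13} = 3$
Pool PH "Example", Policy RR, $S_1 = 5$ (after selecting PE #1)
   Pool Element ID #2: $s_{12} = 2$
   Pool Element ID #3: $s_{13} = 3$
   Pool Element ID #1: $s_{11} = 4$

Table 4.4: A Round Robin Policy Example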


rather simple: the sorting order is to sort the PEs by sequence number in ascending order. The selection procedure to select a PE is to simply take the first PE of the Selection Set.
An example for using the Round Robin selection is provided in table 4.4. The upper part shows the pool before the selection: PEs #1, #2 and #3 have the sequence numbers 1, 2 and 3. Therefore, the pool's global sequence number $S_1$ is $3+1 = 4$, according to equation 4.1. Selecting a PE means taking the first PE of the Selection Set, i.e. PE #1 is selected. After that, as defined in subsubsection 4.4.2.1, the sequence number of PE #1 is set to the pool's global sequence number (4) and the global sequence number is incremented by one (to 5). Since PE #1's new sequence number now is the highest one in the pool, this PE goes to the end of the Selection Set. A subsequent selection will choose PE #2, then PE #3, then PE #1 again, etc. That is, the desired round robin behaviour is provided.
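The Round Robin rule can be sketched in a few lines of C. This is an illustrative sketch, not the RSPLIB implementation: the names rr_pe and rr_select are hypothetical, and instead of a sorted Selection Set it scans an unsorted array for the smallest sequence number, which yields the same PE as taking the first element of a Set sorted by ascending sequence number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal PE entry for the Round Robin sketch. */
struct rr_pe {
   uint32_t pe_id;
   uint64_t seq;   /* s_ij */
};

/* Round Robin: the first element of the Selection Set is the PE with the
   smallest sequence number. A real Set keeps this order; the sketch scans. */
uint32_t rr_select(struct rr_pe* pes, size_t count, uint64_t* next_seq)
{
   assert(count > 0);
   size_t first = 0;
   for(size_t k = 1; k < count; k++) {
      if(pes[k].seq < pes[first].seq) {
         first = k;
      }
   }
   /* As for any selection: s_ij := S_i and S_i := S_i + 1, which moves the
      selected PE to the end of the Selection Set. */
   pes[first].seq = (*next_seq)++;
   return pes[first].pe_id;
}
```

Starting from the upper part of table 4.4 (sequence numbers 1, 2, 3 and $S_1 = 4$), four consecutive calls return PE #1, #2, #3 and #1 again; after the first call, PE #1 holds sequence number 4 and $S_1 = 5$, matching the lower part of the table.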

4.4.2.3 Weighted Round Robin
Implementing the Weighted Round Robin policy is slightly more complicated than a simple Round Robin selection: since PEs may be selected multiple times per round, it is first necessary to introduce a counter $v_{ij} \in \mathbb{N}$ for each PE $j$ ($1 \le j \le m_i$) of pool $i$. The introduced variable has been called Virtual Counter (since PEs may have multiple virtual occurrences in the round robin list). Furthermore, it must be possible to decide whether a PE has already left the current round. Therefore, a Round Counter $r_{ij} \in \mathbb{N}_0$ is introduced for each PE $j$ ($1 \le j \le m_i$) of pool $i$. This round counter $r_{ij}$ denotes the number of the round robin round in which the PE awaits its next selection.
Each time a PE entry $j$ of pool $i$ is selected, its virtual counter $v_{ij}$ is updated as follows:
$$v_{ij} = \begin{cases} v_{ij} - 1 & (v_{ij} > 1) \\ w_{ij} & (\text{else}) \end{cases}$$
In the second case ($v_{ij} = 1$), the PE has left the current round robin round. Therefore, its round counter $r_{ij}$ must be incremented by one and its virtual counter $v_{ij}$ be reset to the PE's weight constant $w_{ij}$. Care has to be taken of an overflow of the round counter datatype when actually implementing it: in this case, the PEs' round counters in pool $i$ can be renumbered in $O(m_i)$ time. However, using a sufficiently large datatype, this practically never happens³.
The sorting order of the Selection Set is defined as sorting by the compound key of
• The round counter in ascending order,
• The virtual counter in ascending order and
³ Nevertheless, the implementation introduced in this chapter correctly handles the round counter renumbering.


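Pool PH $h_1$, Policy WRR, $S_1 = 8$ (initial state)
   Pool Element ID #5: $w_{11} = 2$, $r_{11} = 20$, $v_{11} = 2$, $s_{11} = 5$
   Pool Element ID #1: $w_{12} = 1$, $r_{12} = 20$, $v_{12} = 1$, $s_{12} = 6$
   Pool Element ID #9: $w_{13} = 1$, $r_{13} = 20$, $v_{13} = 1$, $s_{13} = 7$
Pool PH $h_1$, Policy WRR, $S_1 = 9$ (after selecting PE #5)
   Pool Element ID #1: $w_{12} = 1$, $r_{12} = 20$, $v_{12} = 1$, $s_{12} = 6$
   Pool Element ID #9: $w_{13} = 1$, $r_{13} = 20$, $v_{13} = 1$, $s_{13} = 7$
   Pool Element ID #5: $w_{11} = 2$, $r_{11} = 20$, $v_{11} = 1$, $s_{11} = 8$
Pool PH $h_1$, Policy WRR, $S_1 = 10$ (after selecting PE #1)
   Pool Element ID #9: $w_{13} = 1$, $r_{13} = 20$, $v_{13} = 1$, $s_{13} = 7$
   Pool Element ID #5: $w_{11} = 2$, $r_{11} = 20$, $v_{11} = 1$, $s_{11} = 8$
   Pool Element ID #1: $w_{12} = 1$, $r_{12} = 21$, $v_{12} = 1$, $s_{12} = 9$

Table 4.5: A Weighted Round Robin Policy Example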

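Pool PH "Example", Policy LU, $S_1 = 8$ (before the selection)
   Pool Element ID #7: $l_{11} = 10\%$, $s_{11} = 6$
   Pool Element ID #2: $l_{12} = 10\%$, $s_{12} = 7$
   Pool Element ID #11: $l_{13} = 40\%$, $s_{13} = 3$
Pool PH "Example", Policy LU, $S_1 = 9$ (after selecting PE #7)
   Pool Element ID #2: $l_{12} = 10\%$, $s_{12} = 7$
   Pool Element ID #7: $l_{11} = 10\%$, $s_{11} = 8$
   Pool Element ID #11: $l_{13} = 40\%$, $s_{13} = 3$

Table 4.6: A Least Used Policy Example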

• The sequence number in ascending order.
As for the Round Robin policy, the default selection procedure (i.e. taking the PE identities from the top of the list) is applied.
An example for the Weighted Round Robin policy is provided in table 4.5. The upper part of the table shows the initial state of the pool: all three PEs are in round 20 ($r_{11} = 20$; $r_{12} = 20$; $r_{13} = 20$), PE #5 is weighted by 2 ($w_{11} = 2$) and still has to be selected 2 times ($v_{11} = 2$) in its current round (20); the other PEs are weighted by 1 ($w_{12} = 1$; $w_{13} = 1$) and still have to be selected once ($v_{12} = 1$; $v_{13} = 1$). The first selection results in taking PE #5; the pool state after this selection is shown in the middle part of table 4.5. The virtual counter $v_{11}$ of PE #5 has been decreased by 1, i.e. this PE still has to be selected once in the current round robin round (20). A subsequent selection – taking PE #1 – results in the pool state shown in the lower part of table 4.5: PE #1 has left the current round 20 (hence $r_{12} = 21$); the next time this PE will be selected again is within the following round 21.
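The Weighted Round Robin bookkeeping can be sketched in C as shown below. It is a hedged illustration with hypothetical names (wrr_pe, wrr_compare, wrr_select), using an unsorted array scanned with the comparison function in place of a sorted Selection Set. It applies the compound key exactly as listed above (round counter, virtual counter, sequence number, all ascending), the virtual counter update and the sequence number update of subsubsection 4.4.2.1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PE entry for the Weighted Round Robin sketch. */
struct wrr_pe {
   uint32_t pe_id;
   uint32_t weight;            /* w_ij >= 1 */
   uint64_t round;             /* r_ij */
   uint32_t virtual_counter;   /* v_ij */
   uint64_t seq;               /* s_ij */
};

/* Compound sorting key: round counter, virtual counter and sequence number,
   each in ascending order. Returns <0, 0 or >0. */
int wrr_compare(const struct wrr_pe* a, const struct wrr_pe* b)
{
   if(a->round != b->round) {
      return (a->round < b->round) ? -1 : 1;
   }
   if(a->virtual_counter != b->virtual_counter) {
      return (a->virtual_counter < b->virtual_counter) ? -1 : 1;
   }
   if(a->seq != b->seq) {
      return (a->seq < b->seq) ? -1 : 1;
   }
   return 0;
}

/* Default selection procedure: take the first element in the sorting order
   (found here by a linear scan), then update its counters. */
uint32_t wrr_select(struct wrr_pe* pes, size_t count, uint64_t* next_seq)
{
   assert(count > 0);
   size_t first = 0;
   for(size_t k = 1; k < count; k++) {
      if(wrr_compare(&pes[k], &pes[first]) < 0) {
         first = k;
      }
   }
   struct wrr_pe* pe = &pes[first];
   if(pe->virtual_counter > 1) {
      pe->virtual_counter--;              /* still within the current round */
   }
   else {
      pe->virtual_counter = pe->weight;   /* PE leaves the current round */
      pe->round++;
   }
   pe->seq = (*next_seq)++;
   return pe->pe_id;
}
```

With weights 2, 1 and 1, every complete round of four selections returns the doubly weighted PE twice and each other PE once, and all round counters advance together.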

4.4.2.4 Least Used and Priority Least Used
Using the Least Used policy, a load state $l_{ij} \in [0,1] \subset \mathbb{R}$ is provided for each PE $j$ ($1 \le j \le m_i$) of pool $i$. Clearly, the sorting order of the Selection Set is defined by a sorting key composed of


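Pool PH "Example", Policy WRAND, $S_1 = 5$, $W_1 = 7$
   Pool Element ID #7: $s_{11} = 1$, $w_{11} = 1$
   Pool Element ID #2: $s_{12} = 2$, $w_{12} = 3$
   Pool Element ID #8: $s_{13} = 3$, $w_{13} = 2$
   Pool Element ID #6: $s_{14} = 4$, $w_{14} = 1$

Table 4.7: A Weighted Random Policy Example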


• The load state in ascending order and
• The sequence number in ascending order.
While the load state obviously ensures that a least utilized PE is selected, the sequence number not only ensures uniqueness of the composed sorting key, but also provides a round robin selection among multiple least-loaded PEs. This property will also be demonstrated by the example below. The selection procedure is the default one, i.e. the PEs are taken from the top of the Selection Set.
Table 4.6 provides an example for the Least Used policy; the upper part shows the pool state before the selection. In this initial state, PE #7 and PE #2 are loaded by 10%, while PE #11 is loaded by 40%. When PE #7 is selected (since it is the first PE of the Selection Set), its sequence number is updated according to subsubsection 4.4.2.1. The lower part of table 4.6 shows that the selected PE #7 is re-inserted into the Selection Set as the last element of the 10% load section, due to its updated sequence number. That is, using the sequence number as the second part of the composed sorting key ensures round robin selection among multiple least-used PEs.
The definition of Priority Least Used is quite similar to Least Used. Next to the definition of the load state, a load increment value $\hat{l}_{ij} \in [0,1] \subset \mathbb{R}$ is defined for each PE $j$ ($1 \le j \le m_i$) of pool $i$. The load increment $\hat{l}_{ij}$ describes the additional load of the PE caused by the handling of a further session. Then, the composed load can be defined as $l^*_{ij} = \min(1.0, l_{ij} + \hat{l}_{ij})$, i.e. the sum of both the load state and the load increment, but not exceeding 100%. Finally, the sorting order of the Selection Set is defined using a sorting key composed of
• The composed load in ascending order and
• The sequence number in ascending order.
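Both load-based policies can share one selection routine, sketched below under stated assumptions: the names lu_pe, lu_key and lu_select and the priority flag are hypothetical, loads are stored as doubles in [0, 1], and an array scan replaces the sorted Selection Set. The first key part is $l_{ij}$ for Least Used and the capped composed load $l^*_{ij}$ for Priority Least Used; ties are broken by the sequence number, which is then renewed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PE entry for the (Priority) Least Used sketch. */
struct lu_pe {
   uint32_t pe_id;
   double   load;             /* l_ij in [0, 1] */
   double   load_increment;   /* l^_ij in [0, 1]; ignored by plain Least Used */
   uint64_t seq;              /* s_ij */
};

/* First key part: l_ij for Least Used, l*_ij = min(1.0, l_ij + l^_ij)
   for Priority Least Used. */
static double lu_key(const struct lu_pe* pe, int priority)
{
   if(!priority) {
      return pe->load;
   }
   const double composed = pe->load + pe->load_increment;
   return (composed < 1.0) ? composed : 1.0;
}

/* Take the first PE in (load key, s_ij) ascending order and renew its s_ij. */
uint32_t lu_select(struct lu_pe* pes, size_t count, uint64_t* next_seq, int priority)
{
   assert(count > 0);
   size_t first = 0;
   for(size_t k = 1; k < count; k++) {
      const double key_k     = lu_key(&pes[k], priority);
      const double key_first = lu_key(&pes[first], priority);
      if((key_k < key_first) ||
         ((key_k == key_first) && (pes[k].seq < pes[first].seq))) {
         first = k;
      }
   }
   pes[first].seq = (*next_seq)++;
   return pes[first].pe_id;
}
```

Fed with the upper part of table 4.6, the Least Used variant selects PE #7, then PE #2, then PE #7 again, i.e. round robin among the two 10% PEs. With the priority flag set, a PE at 30% load with a 50% increment loses against one at 50% load with a 10% increment, because 0.8 > 0.6.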

4.4.2.5 Random and Weighted Random
Random selections cannot simply take PEs from the top of the Selection Set. Therefore, the weight sum helper construct defined in subsubsection 4.4.2.1 is utilized for the selection procedure: the weight constant $w_{ij}$ of each PE $j$ ($1 \le j \le m_i$) of pool $i$ corresponds to PE $j$'s proportional selection probability. Then, the selection procedure is simply to choose a random number $r \in_R \{1,\dots,W_i\}$ (see also equation 4.2) and take the element $k$ that uniquely fulfils equation 4.3. Using a uniform distribution for the choice of $r$, the Weighted Random policy provides the desired property of choosing elements with probabilities proportional to the PEs' provided weight constants. Random selection is only a special case of Weighted Random selection, where $w_{ij} = 1$ for each PE $j$ ($1 \le j \le m_i$) of pool $i$. Since it is necessary to define a unique sorting order on the Selection Set for storage purposes, simply the PE ID is taken as sorting key.
An example for the selection procedure of the Weighted Random policy is provided in table 4.7: the PEs #7, #2, #8 and #6 are weighted by 1, 3, 2 and 1; therefore, $W_1 = 1+3+2+1 = 7$.


#   Timer                     Used where?
1   Keep-Alive Transmission   PR-H
2   Keep-Alive Timeout        PR-H
3   Lifetime Expiry           PR (but not PR-H)
4   Cache Expiry              PU-side Cache

Table 4.8: The Timers of the Handlespace Management

Figure 4.2: The Timer Schedule Structure

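To select a PE, a random number $r \in_R \{1,\dots,7\}$ is chosen. Let $r = 5$. In this case, only $k = 3$ satisfies the condition of equation 4.3:
$$\sum_{1 \le j \le k-1} w_{1j} < 5 \le w_{1k} + \sum_{1 \le j \le k-1} w_{1j},$$
that is, $1 + 3 < 5 \le 2 + (1 + 3)$, and therefore the selected PE is the third ($k$-th) of the list: PE #8.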
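The Weighted Random procedure can be sketched in C as follows. This is a minimal, hypothetical illustration (wrand_find and wrand_select are not RSPLIB functions): wrand_find() locates the unique $k$ of equation 4.3 with a linear prefix-sum scan, and wrand_select() draws $r$ from $\{1,\dots,W_i\}$. The standard rand() is only a placeholder; mapping it with a modulo is slightly biased unless RAND_MAX + 1 is a multiple of $W_i$.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns the 0-based index k of the PE fulfilling equation 4.3 for a given
   r in [1, W_i]: sum_{j<k} w_j < r <= w_k + sum_{j<k} w_j. */
size_t wrand_find(const uint32_t* weights, size_t count, uint64_t r)
{
   uint64_t prefix = 0;   /* sum of the weights in front of position k */
   for(size_t k = 0; k < count; k++) {
      if((prefix < r) && (r <= prefix + weights[k])) {
         return k;
      }
      prefix += weights[k];
   }
   assert(0 && "r must be within [1, W_i]");
   return count;
}

/* Weighted Random: draw r uniformly from [1, W_i] (equation 4.2), then
   locate the PE via equation 4.3. rand() is a placeholder generator. */
size_t wrand_select(const uint32_t* weights, size_t count)
{
   uint64_t weight_sum = 0;   /* W_i */
   for(size_t k = 0; k < count; k++) {
      weight_sum += weights[k];
   }
   assert(weight_sum > 0);
   const uint64_t r = 1 + ((uint64_t)rand() % weight_sum);
   return wrand_find(weights, count, r);
}
```

For the weights 1, 3, 2, 1 of table 4.7, wrand_find(weights, 4, 5) returns index 2, i.e. the third PE (#8), reproducing the worked example above. A tree-based Set could replace the linear scan with subtree weight sums, but that is beyond this sketch.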

4.4.3 Timer Management
Table 4.8 lists the set of handlespace management timers for PE structures and the places where the specific timers are used. As shown, the two Keep-Alive timers are only used by PR-Hs, while the Lifetime Expiry timer is used by non-PR-Hs only and the Cache Expiry Timer is used by the PU-side cache only. Furthermore, a PR-H never uses both Keep-Alive timers simultaneously: if the Keep-Alive Transmission timer is scheduled, the PR-H does not wait for a timeout of a sent ASAP Endpoint Keep-Alive message (using the Keep-Alive Timeout timer), and if it has sent such a message, it does not schedule another transmission until the managed PE answers using an ASAP Endpoint Keep-Alive Ack. The fact that only one of the timers can be scheduled at any point in time simplifies the timer management: for any PE structure, it is only necessary to store the type of timer and its schedule timestamp. It is then possible to realize the handlespace timer management as shown in figure 4.2: the Timer Schedule can be designed as a handlespace-global set of PE references, sorted primarily by the timer schedule timestamp in ascending order. To enforce uniqueness of the sorting order, the PE references are stored with a sorting order defined by the sorting key composed of timestamp, PE ID and PH. Upon expiry of a PE's scheduled timer, the type-specific handling procedure has to be called.

4.4.4 Checksum Handling
As explained in subsection 3.10.5, the 16-bit Internet Checksum algorithm defined in Braden et al. (1988), Rijsinghani (1994) is used to calculate the handlespace checksum. This checksum algorithm provides an important property: it allows incremental updates. That is, it is not necessary to recalculate the checksum of the complete handlespace upon a change of the handlespace data. If a new PE is added, its checksum can be added to the current handlespace checksum. On removal, the checksum of the PE can simply be subtracted. If a PE's registration information is updated with information affecting its checksum, the PE checksum before the change can be subtracted and the new PE checksum can be added. In summary, the checksum maintenance can be realized in O(1) time.

Algorithm 2 The Two-Part 16-Bit Internet Checksum Algorithm
#include <stddef.h>

unsigned int beginCalculateInternetChecksum16(const void* data, size_t size)
{
   const unsigned short* addr = (const unsigned short*)data;
   unsigned int          sum  = 0;

   while(size >= sizeof(*addr)) {   /* Main calculation loop */
      sum  += *addr++;
      size -= sizeof(*addr);
   }
   if(size > 0) {                   /* Add left-over byte, if any */
      sum += *(const unsigned char*)addr;
   }
   return(sum);                     /* Full 32-bit accumulator */
}

unsigned short finishCalculateInternetChecksum16(unsigned int sum)
{
   while(sum >> 16) {               /* Fold 32-bit sum to 16 bits */
      sum = (sum & 0xffff) + (sum >> 16);
   }
   return((unsigned short)~sum);    /* Invert to obtain the checksum */
}
However, to actually implement an incremental checksum update, it is insufficient to only store the 16-bit checksum as computed by the function presented in algorithm 1 (see subsection 3.10.5). The reason requires a slightly closer analysis of the checksum algorithm: at first, the memory block over which the checksum has to be calculated is viewed as an array of 16-bit unsigned integer values. These 16-bit values, plus a possible left-over byte, are summed up using a 32-bit accumulator (lines 6 to 12). In the final step (lines 13 to 16), the number of 16-bit overflows (the carry part in the upper 16 bits) is added to the accumulator, which then fits into 16 bits. After that, the accumulator is inverted and returned as the 16-bit checksum value.
In order to perform incremental updates, it is necessary to keep the full accumulator value as


Figure 4.3: The Ownership Set Structure

provided by the first part of the algorithm. Using this value, checksum additions and removals are simply performed by regular 32-bit additions and subtractions. The actual checksum value can then be obtained by applying the carry addition and inversion. The resulting two-part calculation algorithm for the 16-bit Internet Checksum is shown in algorithm 2. Its function beginCalculateInternetChecksum16() calculates the 32-bit accumulator value. This value is used in all places of the handlespace storage structures. Therefore, the addition and subtraction of checksums become trivial. The actual 16-bit Internet Checksum value is obtained from the accumulator value by calling finishCalculateInternetChecksum16(). This 16-bit value is only used if the checksum has to be exported from the handlespace management.
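The incremental update can be sketched as follows. The two checksum functions mirror the two parts of algorithm 2 with fixed-width types (the names checksum_begin, checksum_finish, hs_checksum and its add/remove helpers are hypothetical). The sketch assumes that each PE's data is an even number of bytes, so that the 16-bit words line up, and that the handlespace-wide 32-bit accumulator itself never wraps around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First part (as in algorithm 2): 32-bit accumulator over 16-bit words. */
uint32_t checksum_begin(const void* data, size_t size)
{
   const uint16_t* addr = (const uint16_t*)data;
   uint32_t        sum  = 0;
   while(size >= sizeof(*addr)) {
      sum  += *addr++;
      size -= sizeof(*addr);
   }
   if(size > 0) {
      sum += *(const uint8_t*)addr;   /* left-over byte */
   }
   return sum;
}

/* Second part: fold the carries into 16 bits and invert. */
uint16_t checksum_finish(uint32_t sum)
{
   while(sum >> 16) {
      sum = (sum & 0xffff) + (sum >> 16);
   }
   return (uint16_t)~sum;
}

/* Hypothetical handlespace-wide accumulator, maintained in O(1). */
struct hs_checksum {
   uint32_t accumulator;
};

void hs_checksum_add(struct hs_checksum* hs, uint32_t pe_accumulator)
{
   hs->accumulator += pe_accumulator;
}

void hs_checksum_remove(struct hs_checksum* hs, uint32_t pe_accumulator)
{
   hs->accumulator -= pe_accumulator;
}
```

Adding the accumulators of two PEs gives exactly the accumulator of their concatenated data, so checksum_finish() yields the same 16-bit value as a full recalculation; subtracting one PE's accumulator restores the previous state.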

4.4.5 Synchronization
For the handlespace synchronization procedure as defined in subsection 3.10.5, it is required to be able to resume the traversal of the handlespace at a certain remembered point. The naïve way to remember the handlespace position could be a pointer to the last PE structure delivered to the requesting PR in an ENRP Handle Table Response message. The difficulty of this approach is that the handlespace content may change between two successive calls of the hsMgtTraverse() function: the remembered PE may already be gone. Therefore, such a pointer requires active maintenance – in particular processing time – on every handlespace change. Furthermore, multiple synchronization operations may be in progress simultaneously (with different peer PRs, of course). This means that pointer maintenance would actually mean keeping a set of such pointers up-to-date. In summary, the pointer approach would possibly put significant costs on frequent operations to keep the infrequent synchronization operation simple. Therefore, a superior approach seemed to be appropriate.
The resulting synchronization implementation approach simply remembers PH and PE ID of the last PE obtained by the traversal function. A subsequent call can then find the PE in the handlespace which has the next-nearest identity after the remembered one, according to the sorting order of pools (by PH) and PEs (by PE ID). After the lookup has been performed, the traversal can be continued from the obtained PE. That is, only the synchronization operation itself carries the burden of making


Figure 4.4: An Overview of Storage Structures


its own resumption possible – all other handlespace operations remain unaffected.
To simplify traversing the PEs owned by a specific PR, a further set denoted as Ownership Set is introduced. In this set, the PE references are stored with a sorting order defined by the sorting key composed of Home-PR ID, PE ID and PH. Figure 4.3 presents an example Ownership Set structure.
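The resumption idea can be sketched in C as shown below. This is a hedged sketch, not hsMgtTraverse() itself: the handlespace is modelled as an array already sorted by (PH, PE ID), PHs are simplified to NUL-terminated strings of at most 32 characters, the owner filter via the Ownership Set is omitted, and all names (pe_identity, traversal_state, traverse) are hypothetical. The key point is that the resume state holds copies of PH and PE ID, so a binary search finds the next-nearest identity even if the remembered PE has been removed meanwhile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PH_LENGTH 32

/* Hypothetical PE identity; the PH is simplified to a NUL-terminated string. */
struct pe_identity {
   const char* ph;
   uint32_t    pe_id;
};

/* Resume point: copies of PH and PE ID, not a pointer into the handlespace. */
struct traversal_state {
   int      started;                     /* 0: begin with the first identity */
   char     last_ph[MAX_PH_LENGTH + 1];
   uint32_t last_pe_id;
};

/* Handlespace order: pools by PH, PEs within a pool by PE ID. */
static int identity_compare(const char* ph_a, uint32_t id_a,
                            const char* ph_b, uint32_t id_b)
{
   const int c = strcmp(ph_a, ph_b);
   if(c != 0) {
      return c;
   }
   return (id_a > id_b) - (id_a < id_b);
}

/* Hands out up to max_count identities following the remembered one and
   updates the resume point. entries[] must be sorted in handlespace order. */
size_t traverse(const struct pe_identity* entries, size_t count,
                struct traversal_state* state,
                const struct pe_identity** out, size_t max_count)
{
   size_t low = 0, high = count;
   if(state->started) {
      /* Binary search: first entry strictly after (last_ph, last_pe_id). */
      while(low < high) {
         const size_t mid = low + (high - low) / 2;
         if(identity_compare(entries[mid].ph, entries[mid].pe_id,
                             state->last_ph, state->last_pe_id) <= 0) {
            low = mid + 1;
         }
         else {
            high = mid;
         }
      }
   }
   size_t n = 0;
   for(size_t k = low; (k < count) && (n < max_count); k++) {
      out[n++] = &entries[k];
   }
   if(n > 0) {
      assert(strlen(out[n - 1]->ph) <= MAX_PH_LENGTH);
      strcpy(state->last_ph, out[n - 1]->ph);
      state->last_pe_id = out[n - 1]->pe_id;
      state->started    = 1;
   }
   return n;
}
```

Only the traversal pays for the lookup; insertions and removals elsewhere in the handlespace need no bookkeeping for pending synchronizations, which is the trade-off argued for in the text.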

4.4.6 Pool Handle Management
The RSerPool standards documents do not set a limit for the size of a pool handle. While in fact the length of a pool handle is only limited by the message size (therefore, it could be as long as about 65,000 bytes; see section 3.8), it is impractical to support overly long handles. In this case, it would be necessary to dynamically allocate memory for a PH in addition to the pool structure itself. Therefore, it has been decided to limit the PH size to 32 bytes and to reserve a fixed size of 32 bytes for the PH as part of the pool structure itself. This results in a simplification of the handlespace management. 32 bytes seem to be sufficient for any reasonable ASCII or Unicode representation of a pool and are also sufficient for storing a 256-bit SHA-256 hash value⁴ or a random value of the same size. Such hash values could be used to create PHs in the scenario of RSerPool-based Mobility Support for SCTP (see subsection 3.6.6). Last but not least, a reasonable limit for the PH size also enhances security: the PH is the only field of an ASAP Registration message (see subsubsection 3.9.2.1) which can be freely set by the registering PE. That is, an attacker having gained the permission to register PEs would be able to quickly let PRs allocate large portions of memory in order to store PHs.
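A fixed-size PH embedded in the pool structure could look like the following sketch. The names pool_handle, pool_handle_set and pool_handle_compare are hypothetical; rejecting empty handles and ordering PHs bytewise with the shorter handle first on a common prefix are assumptions of this sketch, since the text only states that the Pool Set is sorted by PH.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_POOL_HANDLE_SIZE 32   /* limit chosen in the text */

/* PH stored inside the pool structure: no separate allocation is needed. */
struct pool_handle {
   size_t        size;
   unsigned char handle[MAX_POOL_HANDLE_SIZE];
};

/* Copies a PH (e.g. from an ASAP Registration); returns -1 to reject an
   empty or overlong handle instead of allocating memory for it. */
int pool_handle_set(struct pool_handle* ph, const void* data, size_t size)
{
   assert(ph != NULL);
   if((size == 0) || (size > MAX_POOL_HANDLE_SIZE)) {
      return -1;
   }
   memcpy(ph->handle, data, size);
   ph->size = size;
   return 0;
}

/* Pool Set order (assumed): bytewise, shorter handle first on a common prefix. */
int pool_handle_compare(const struct pool_handle* a, const struct pool_handle* b)
{
   const size_t common = (a->size < b->size) ? a->size : b->size;
   const int c = memcmp(a->handle, b->handle, common);
   if(c != 0) {
      return c;
   }
   return (a->size > b->size) - (a->size < b->size);
}
```

A 33-byte handle is refused outright, so a registering attacker cannot make the PR allocate more than the fixed 32 bytes per pool for the PH.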

4.4.7 Storage Structures and Algorithms
The previous subsections frequently used the term Set to denote a storage class being able to keep objects in a certain order. However, it has not been explained how a Set is actually implemented. This
⁴ See NIST (2002), Eastlake and Jones (2001), Rivest (1992).


Algorithm        Operation   Average Runtime   Worst Case Runtime
Linear List      Insertion   O(n)              O(n)
                 Removal     O(n)              O(n)
                 Lookup      O(n)              O(n)
Binary Tree      Insertion   O(log n)          O(n)
                 Removal     O(log n)          O(n)
                 Lookup      O(log n)          O(n)
Treap            Insertion   O(log n)          O(n)
                 Removal     O(log n)          O(n)
                 Lookup      O(log n)          O(n)
Red-Black Tree   Insertion   O(log n)          O(log n)
                 Removal     O(log n)          O(log n)
                 Lookup      O(log n)          O(log n)

Table 4.9: Storage Structures and their Computational Complexity

task is the final step to complete the handlespace management. Clearly, since the efficiency of the handlespace management heavily relies on the performance of the underlying storage structures and algorithms, it is crucial to carefully consider their choice and implementation.
The naïve solution to actually realize a Set is obviously a linear list; a more appropriate storage structure is based on binary trees. To explain and compare various possible structures would exceed the scope of this chapter; such descriptions can be found in computer science literature like Cormen et al. (1998). In short, possible solutions for the Set datatype are linear lists and binary trees. The average and worst case access times of both structures are presented in table 4.9, and an illustration of the structures is provided in figure 4.4. While in the average case operations on a binary tree have a runtime of O(log n), their worst case runtime is still in O(n), as for the linear list. However, balanced trees like the AVL tree (see Adelson-Velskii and Landis (1962)) ensure that this worst case (almost) never occurs. The state-of-the-art techniques for balanced trees are the following structures:
Treap: A treap is a binary tree, where each node n includes a priority n.Priority. This priority is a random number, which is chosen when the node is inserted. The insertion and removal operations on a treap are similar to a regular binary tree, except for enforcing the following condition by applying appropriate rotations of the tree's nodes: if node c is a child of node p, then c.Priority ≥ p.Priority. The operations' runtime of the treap is O(log n) on average, but still O(n) in the worst case. However, the randomization makes this case very unlikely. For a detailed introduction to treaps, see Seidel and Aragon (1996), Aragon and Seidel (1989).
Red-Black Tree: Unlike the treap, the red-black tree uses a deterministic strategy to keep it balanced. This results in a guaranteed operations runtime of O(log n). A red-black tree is defined as follows: each node n includes a colour n.Colour, which may be either red or black (hence the name red-black tree). By appropriately rotating the nodes of the tree, the following constraints are enforced upon insertion and removal of nodes:
• The root node is black,
• All leaves are black,


Figure 4.5: A Linkage Implementation using Separate Node Structures

Figure 4.6: A Linkage Implementation using Integrated Node Structures


• Both children of a red node are black and
• All paths from any given node to its leaf nodes include the same number of black nodes.
Based on these constraints, it can be proven that the longest possible path from the root to a leaf is at most twice as long as the shortest possible path. That is, a logarithmic height is assured. Details on red-black trees can be found in Guibas and Sedgewick (1978), Cormen et al. (1998).
An important observation of the handlespace management's usage of the Set datatype is that accesses to the next or previous element of a given element are frequent. A useful optimization when using a tree-based implementation might therefore be to additionally link the elements using a doubly-linked linear list. This strategy is denoted as Leaf-Linking. Assuming an access runtime of O(log n), this additional linking does not increase the overall complexity.
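Leaf-Linking can be illustrated with a plain (unbalanced) binary search tree whose nodes additionally carry prev/next pointers. This is a hypothetical sketch (ll_node, ll_tree and ll_insert() are invented names). It relies on the fact that a newly inserted leaf has its parent as in-order successor if it becomes a left child, and as in-order predecessor if it becomes a right child, so the list can be updated in O(1) once the descent has found the insertion point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical leaf-linked node: search tree links plus sorted list links. */
struct ll_node {
   int             key;
   struct ll_node* left;
   struct ll_node* right;
   struct ll_node* prev;   /* in-order predecessor, O(1) access */
   struct ll_node* next;   /* in-order successor,   O(1) access */
};

struct ll_tree {
   struct ll_node* root;
   struct ll_node* first;  /* smallest element (list head) */
   struct ll_node* last;   /* largest element (list tail)  */
};

void ll_insert(struct ll_tree* t, struct ll_node* n)
{
   struct ll_node*  parent = NULL;
   struct ll_node** link   = &t->root;

   /* Ordinary binary search tree descent to a free leaf position. */
   while(*link != NULL) {
      parent = *link;
      link   = (n->key < parent->key) ? &parent->left : &parent->right;
   }
   n->left = n->right = NULL;
   *link = n;

   /* The parent is the new leaf's in-order neighbour on one side. */
   if(parent == NULL) {
      n->prev = n->next = NULL;
   }
   else if(link == &parent->left) {
      n->next = parent;
      n->prev = parent->prev;
   }
   else {
      n->prev = parent;
      n->next = parent->next;
   }

   /* Splice n into the doubly-linked list. */
   if(n->prev != NULL) { n->prev->next = n; } else { t->first = n; }
   if(n->next != NULL) { n->next->prev = n; } else { t->last  = n; }
}
```

Since rotations do not change the in-order sequence, the same prev/next links would remain valid if rebalancing rotations (as in a treap or red-black tree) were added on top of this sketch.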
Since the implementation of the Set datatype is crucial for the handlespace management performance and it is not obvious which implementation option actually provides superior performance for realistic handlespace sizes and access scenarios, performance tests have been made. Their results will be presented in chapter 7.

4.4.8 Node-Linkage Implementation
Independent from the storage algorithm used to implement the Set datatype for keeping the handlespace structures and timers, it is worth considering how objects are stored in such sets. Well-known libraries such as GLIB (see GNOME Project (2001)) or the classes provided by OMNET++ (see Varga (2005b,a)) work as follows: whenever an object is added to a storage class like a list or tree, a node structure is created first. This node structure is then inserted into the list itself and references the actual object. Figure 4.5 illustrates this design for the handlespace management structures. The advantage of this approach is that the actual object which is managed by the list does not need to know any details of the list it is stored in; in particular, an arbitrary object may be kept in a virtually unlimited number of lists.
However, the approach of using separate nodes has some severe disadvantages: First, the insertion of objects may fail due to insufficient resources to allocate the node structure. While handling this exception is no significant problem if e.g. a new PE is inserted (in this case, the PE registration could be rejected with an appropriate error code), failing e.g. to schedule a timer becomes a severe problem: should a PE be removed if it is impossible to schedule its Keep-Alive Timeout Timer? What should the RSerPool protocols do? ASAP does not provide any means to meaningfully signal such an event to the PE.
Another problem of using separate node structures is memory fragmentation⁵: a modern operating system utilizes the memory management unit (MMU) of a processor to map virtual address space to physical memory; therefore, a contiguous memory block in virtual address space may actually map to physical memory blocks that are widely distributed in the physical address space (see Eisele (2002) for more details). However, lightweight operating systems like AmigaOS/AROS (Amiga Research Operating System, see AROS Development Team (2005a)) or in particular router operating systems like the Cisco™ Internet Operating System (IOS, see Cisco Systems (2004)) do not support an MMU – the hardware for which these systems are designed simply does not provide it. Over time, the memory of such operating systems gets fragmented: due to consecutive allocations and deallocations of small memory blocks, the number of large contiguous memory blocks decreases, making it more and more difficult to find such a block. After some time, allocations of larger memory blocks may fail due to the lack of contiguous memory blocks – although ample free memory (but in small, scattered pieces) may be available. The approach to cope with such problems is to keep per-application pools of pre-allocated memory and to take the application memory out of such pools (see AROS Development Team (2005b) for how this is realized for AmigaOS/AROS).
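The per-application pool idea can be sketched as a fixed-size block allocator carved out of one pre-allocated buffer. This is only one simple variant, assumed here for illustration; it is not the AmigaOS/AROS pool API, and the names pool_init(), pool_alloc() and pool_free() as well as the block size and count are hypothetical. Allocation and deallocation are O(1) free-list operations and never involve the global memory management.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BLOCK_SIZE 64   /* assumed block size in bytes */
#define POOL_BLOCKS     16   /* assumed number of blocks    */

/* A block is either free (then it stores the free-list link) or in use. */
union pool_block {
   union pool_block* next_free;
   unsigned char     payload[POOL_BLOCK_SIZE];
   long double       align_ld;   /* members only present for alignment */
   long long         align_ll;
   void*             align_ptr;
};

struct block_pool {
   union pool_block  blocks[POOL_BLOCKS];   /* the pre-allocated memory */
   union pool_block* free_list;
};

/* Links all blocks into the free list (lowest address first). */
void pool_init(struct block_pool* p)
{
   p->free_list = NULL;
   for(size_t i = POOL_BLOCKS; i > 0; i--) {
      p->blocks[i - 1].next_free = p->free_list;
      p->free_list = &p->blocks[i - 1];
   }
}

/* O(1) allocation; returns NULL when the pool is exhausted. */
void* pool_alloc(struct block_pool* p)
{
   union pool_block* b = p->free_list;
   if(b != NULL) {
      p->free_list = b->next_free;
   }
   return b;
}

/* O(1) deallocation; the block must stem from this pool. */
void pool_free(struct block_pool* p, void* ptr)
{
   union pool_block* b = ptr;
   assert(b >= &p->blocks[0] && b < &p->blocks[POOL_BLOCKS]);
   b->next_free = p->free_list;
   p->free_list = b;
}
```

Since all blocks have the same size, freed blocks can always be reused for any later request, so this kind of pool cannot fragment internally.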
Finally, allocations and deallocations require runtime – regardless of whether the memory is taken from a dedicated memory pool or from a global memory management. The more node structures are created and destroyed, the more runtime is wasted for memory management.
In summary, the approach of separate node structures seems to be unsuitable and inefficient for the handlespace management. Therefore, a superior approach has been chosen: the objects to be stored are directly equipped with the node structures necessary to keep them in their sets. For example, PE references are usually kept in the Index Set, Selection Set, Ownership Set and the Timer Schedule. Therefore, the PE structure includes one integrated node structure for each of the four sets, as shown in figure 4.6. As part of the PE structure, the memory is already available with the PE structure itself. No further memory management operations for this PE are necessary until it is finally removed. In particular, no Set-related operation for this PE can fail due to a lack of resources. And finally, the approach saves some⁶ amount of memory, since it is more efficient to manage one block of allocated memory than to manage five (the PE structure itself plus four separate node structures).
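The integrated node approach corresponds to what is often called an intrusive container. The following C sketch is hypothetical: set_node, pool_element and the set functions are invented names, and a circular doubly-linked list stands in for the actual Set implementations. It shows a PE structure carrying one embedded node per set, together with a container_of() macro (in the style known from the Linux kernel) that maps a node pointer back to its PE. Inserting the PE into a set needs no allocation and therefore cannot fail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive node, embedded directly in the stored object. */
struct set_node {
   struct set_node* prev;
   struct set_node* next;
};

/* Recovers the enclosing structure from a pointer to an embedded member. */
#define container_of(ptr, type, member) \
   ((type*)((char*)(ptr) - offsetof(type, member)))

/* Hypothetical PE structure: one embedded node per set it is kept in. */
struct pool_element {
   unsigned int    identifier;
   struct set_node index_node;       /* Index Set      */
   struct set_node selection_node;   /* Selection Set  */
   struct set_node ownership_node;   /* Ownership Set  */
   struct set_node timer_node;       /* Timer Schedule */
};

/* Circular doubly-linked list with a sentinel head node. */
void set_init(struct set_node* head)
{
   head->prev = head->next = head;
}

/* Inserts n after head; no memory is allocated, so this cannot fail. */
void set_insert_front(struct set_node* head, struct set_node* n)
{
   n->next          = head->next;
   n->prev          = head;
   head->next->prev = n;
   head->next       = n;
}

/* Unlinks n from whatever set it is currently in. */
void set_remove(struct set_node* n)
{
   n->prev->next = n->next;
   n->next->prev = n->prev;
   n->prev = n->next = n;
}
```

The same embedding works for tree nodes; only the node type and the set operations change.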

⁵ See Neely (1996), Berger et al. (2001), Berger (2002), Berger et al. (2002) for details on this subject.
⁶ The amount of saved memory is highly dependent on the CPU architecture and the used operating system.


4.5 The Handlespace Management Validation
The code actually implementing the handlespace management design introduced in section 4.4 includes more than 11,500 lines of C code, of which about 5,500 lines actually implement the handlespace management and about 3,500 lines realize four classes of storage implementations: linear list, binary tree, treap and red-black tree. The remaining lines provide helper functions for various tasks – like random number generation, string handling, timestamp management and conversion as well as the handling of transport addresses. Although the code has been carefully created, it is to be expected that code of this size contains errors; it has therefore been necessary to take adequate care of its validation before using it for simulations and tests. The validation strategy is based on four building blocks: assertions, functions for consistency checking, regression tests and the usage of debugging software. These four items are described in the following.

4.5.1 Assertions
An important experience from the implementation of the RSPLIB and former projects has been that wrong parameters passed to functions may often result in problems at completely different parts of the code. Since such errors are difficult to track down, the handlespace management implementation checks assertions at all crucial parts of the code. In particular, this means checking the validity of important parameters passed to internal functions of the handlespace management. For public functions, such checks are mandatory in order to provide a robust implementation. Further important assertions are e.g. whether or not a timer is already scheduled when an attempt is made to schedule or cancel it. Together with the following part of the validation strategy – the consistency checking functions – assertion checking has proven to be a very useful technique in the development of the handlespace management.
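As an illustration of such assertions, the following sketch guards a timer schedule against double scheduling and against cancelling a timer that is not scheduled. The structure and function names are hypothetical, and the schedule is a simple sorted linked list rather than the Set datatype. Note that assert() is compiled out when NDEBUG is defined, so checks like these serve debugging builds; a public API function would rather return an error code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical timer; 'scheduled' records whether it is in a schedule. */
struct timer {
   unsigned long long expiry;      /* absolute time stamp */
   int                scheduled;
   struct timer*      next;        /* schedule is sorted by expiry */
};

struct timer_schedule {
   struct timer* first;
};

void timer_schedule_add(struct timer_schedule* s, struct timer* t,
                        unsigned long long expiry)
{
   assert(s != NULL && t != NULL);   /* validity of internal parameters */
   assert(!t->scheduled);            /* scheduling twice is a programming error */
   t->expiry = expiry;
   struct timer** link = &s->first;
   while(*link != NULL && (*link)->expiry <= expiry) {
      link = &(*link)->next;
   }
   t->next      = *link;
   *link        = t;
   t->scheduled = 1;
}

void timer_schedule_cancel(struct timer_schedule* s, struct timer* t)
{
   assert(s != NULL && t != NULL);
   assert(t->scheduled);             /* cancelling an idle timer is an error */
   struct timer** link = &s->first;
   while(*link != t) {
      assert(*link != NULL);         /* t must be part of this schedule */
      link = &(*link)->next;
   }
   *link        = t->next;
   t->next      = NULL;
   t->scheduled = 0;
}
```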

4.5.2 Consistency Checking Functions
The handlespace management consists of a complex structure of pools, PEs and timers as well as a connection to the peer list management. Therefore, a function which can verify the consistency of the handlespace management has been realized. This function includes checks like the following ones:
Are the storage structures (e.g. red-black tree) valid?
Is each PE referenced in the Index Set also referenced by the Selection Set and vice versa?
Does each PE in a pool use the same policy, transport protocol and options?
Are the policy parameters of each PE valid?
Does each PR entry in the peer list have the correct checksum referring to the managed handlespace?
The handlespace management validation function is invoked in form of an assertion check after each operation on the handlespace. Since this operation is rather expensive, this assertion check is turned on by a compile-time option for testing purposes. Production versions of the implementation do not apply the consistency checks.
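A check of this kind can be sketched for the first item of the list above, the validity of a red-black tree. The sketch below is hypothetical: it verifies only the four colour constraints listed earlier in this chapter (not the search order), and it wraps the check in a macro controlled by a compile-time option, mirroring the approach described above. The option name HANDLESPACE_CHECK_CONSISTENCY and all other names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum rb_colour { RB_RED, RB_BLACK };

/* Hypothetical red-black tree node; NULL children represent black leaves. */
struct rb_node {
   int             key;
   enum rb_colour  colour;
   struct rb_node* left;
   struct rb_node* right;
};

/* Returns the number of black nodes on every path from n down to a leaf
   (leaf included), or -1 if a red-black constraint is violated below n. */
int rb_black_height(const struct rb_node* n)
{
   if(n == NULL) {
      return 1;                                /* all leaves are black */
   }
   if(n->colour == RB_RED &&
      ((n->left  != NULL && n->left->colour  == RB_RED) ||
       (n->right != NULL && n->right->colour == RB_RED))) {
      return -1;                               /* red node with red child */
   }
   const int lh = rb_black_height(n->left);
   const int rh = rb_black_height(n->right);
   if(lh < 0 || rh < 0 || lh != rh) {
      return -1;                               /* unequal black counts */
   }
   return lh + ((n->colour == RB_BLACK) ? 1 : 0);
}

/* Checks the colour constraints only; search order is not checked here. */
int rb_tree_valid(const struct rb_node* root)
{
   return (root == NULL) ||
          (root->colour == RB_BLACK && rb_black_height(root) > 0);
}

/* Expensive check, enabled by a compile-time option in test builds only. */
#ifdef HANDLESPACE_CHECK_CONSISTENCY
#define CHECK_CONSISTENCY(root) assert(rb_tree_valid(root))
#else
#define CHECK_CONSISTENCY(root) ((void)0)
#endif
```

In a test build, compiling with -DHANDLESPACE_CHECK_CONSISTENCY turns every CHECK_CONSISTENCY() call into a full tree traversal; production builds pay nothing.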


4.5.3 Regression Tests
To verify the correct handling of different inputs by the handlespace management operations, regression tests have been applied. A set of different input data together with the expected results has been collected in form of test routines, which are invoked by the regression test program with the consistency checking functions enabled. For example, there is a test routine checking that the registration function rejects an attempt to register a PE using the Least Used policy into an already existing pool using Weighted Random. The regression test program has been used after every change of the implementation to verify that it is still working correctly for the collected test cases. During development, the test cases have been continuously extended each time new, previously uncovered errors have been detected, so that the current version – including about 1,900 lines of test code – can be expected to cover the cases of different inputs quite reliably.
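The structure of such a table-driven regression test can be sketched as follows. Everything here is a strongly simplified, hypothetical stand-in (struct pool, pool_register() and the result codes are invented for illustration, and the consistency checks are omitted); it only encodes the rule from the example above, namely that all PEs of a pool must use the same policy, so registering a Least Used PE into a Weighted Random pool is rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Strongly simplified, hypothetical stand-ins for the handlespace types. */
enum policy_type { POLICY_NONE, POLICY_ROUND_ROBIN, POLICY_WEIGHTED_RANDOM, POLICY_LEAST_USED };
enum reg_result  { REG_OK, REG_INCOMPATIBLE_POLICY, REG_POOL_FULL };

struct pool {
   enum policy_type policy;     /* defined by the first registered PE */
   size_t           elements;
   size_t           capacity;
};

enum reg_result pool_register(struct pool* p, enum policy_type policy)
{
   if(p->elements > 0 && p->policy != policy) {
      return REG_INCOMPATIBLE_POLICY;   /* all PEs of a pool share one policy */
   }
   if(p->elements >= p->capacity) {
      return REG_POOL_FULL;
   }
   p->policy = policy;
   p->elements++;
   return REG_OK;
}

/* One test case: a sequence of registrations and the expected results. */
struct reg_test_case {
   const char*      name;
   enum policy_type policies[3];
   enum reg_result  expected[3];
};

static const struct reg_test_case reg_tests[] = {
   { "same policy is accepted",
     { POLICY_WEIGHTED_RANDOM, POLICY_WEIGHTED_RANDOM, POLICY_WEIGHTED_RANDOM },
     { REG_OK, REG_OK, REG_OK } },
   { "Least Used PE into Weighted Random pool is rejected",
     { POLICY_WEIGHTED_RANDOM, POLICY_LEAST_USED, POLICY_WEIGHTED_RANDOM },
     { REG_OK, REG_INCOMPATIBLE_POLICY, REG_OK } },
};

/* Runs every case on a fresh pool; returns the number of failed cases. */
size_t run_regression_tests(void)
{
   size_t failures = 0;
   for(size_t i = 0; i < sizeof(reg_tests) / sizeof(reg_tests[0]); i++) {
      struct pool p = { POLICY_NONE, 0, 8 };
      for(size_t j = 0; j < 3; j++) {
         if(pool_register(&p, reg_tests[i].policies[j]) != reg_tests[i].expected[j]) {
            fprintf(stderr, "FAILED: %s (step %zu)\n", reg_tests[i].name, j);
            failures++;
            break;
         }
      }
   }
   return failures;
}
```

New cases are added by appending rows to reg_tests[], which matches the practice of extending the collection whenever a new error is found.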

4.5.4 Validation Software
To further enhance the correctness of the handlespace management implementation, the memory debugging software VALGRIND⁷ has been used intensively, in particular in combination with the regression test program. In short, VALGRIND is an x86 binary code interpreter that actually executes a program and keeps track of all memory accesses. In particular, it does not only remember which byte belongs to an allocated chunk of memory but also keeps track of which bit is still uninitialized. That is, VALGRIND does not only detect accesses to invalid or already deallocated memory blocks but also warns if an uninitialized bit is used, e.g. in a conditional branch. Especially the second category of errors is otherwise very difficult to detect: for example, consider an uninitialized 16-bit variable, i.e. it contains a random value. While the odds are 65,535:1 that it contains a non-zero value, it may just contain 0 at an inappropriate moment and lead to a severe and almost untraceable (since difficult to reproduce) malfunction of the system.

4.6 Summary
In this chapter, the design and implementation of the handlespace management – used for both the RSerPool prototype implementation RSPLIB and the simulation model RSPSIM – have been introduced. The main requirements for the handlespace management approach have been the support and extensibility for various pool policies as well as the efficient handling of large pools. These requirements are achieved by reducing the effort of maintaining the handlespace to the management of sorted sets, while pool policies are defined by a certain sorting order as well as a selection procedure.
Based on these ideas, a set of policy realizations has been presented, together with detailed examples. After that, the handling of timers, the checksum handling, the synchronization procedure and the pool handle management have been explained. Finally, some general optimizations of the handlespace data storage have been presented. The last open issue is how to actually realize the Set datatype on which the handlespace management is based. This question will be answered as part of the handlespace management performance evaluations in chapter 7.

⁷ See Seward and Nethercote (2005), Valgrind Developers (2005).

Chapter 5

The RSPLIB Prototype Implementation

This chapter describes the design and implementation of the RSPLIB prototype. First, a short overview of the prototype's history is given. After that, the requirements for the prototype and its important design decisions are explained. This is followed by a description of the prototype parts: the PR, the PU/PE library and the demonstration system. A survey of other RSerPool implementations concludes this chapter.

5.1 Introduction
The beginning of our RSerPool activities has been a three-year cooperation project between our group (i.e. the Computer Networking Technology Group of the Institute for Experimental Mathematics at the University of Duisburg-Essen) and Siemens AG, Munich, which has also been supported¹ by the Bundesministerium für Bildung und Forschung (BMBF, i.e. the German Ministry for Education and Research). The goal of this cooperation, started on October 01, 2001, has been to realize the world's first Open Source prototype implementation of the upcoming RSerPool standard in order to
verify that the protocols defined by the IETF drafts are actually useful and working,
perform research on the capabilities of RSerPool and suggest improvements of the standard,
as well as bring changes and improvements into the IETF standardization process.
The RSerPool project – called the RSPLIB prototype – has been the continuation of another successful cooperation with Siemens AG: the implementation of the Open Source, userland SCTP prototype SCTPLIB together with its API library SOCKETAPI (see Jungmaier (2005) and Tüxen (2001)). Since October 01, 2004, our RSerPool activities including the RSPLIB prototype have been supported by the Deutsche Forschungsgemeinschaft (DFG, the German Research Foundation), after the cooperation project with Siemens AG had been successfully finished the day before. Since the beginning² of the project, the RSPLIB prototype has been publicly available under the GNU General Public License (GPL) and can be downloaded at Dreibholz (2006c). It is now used by the IETF RSerPool WG as reference implementation. The prototype implementation consists of three separate parts – a PR, a library for PUs and PEs, as well as a demonstration system. These parts are described after first defining the requirements for the prototype and providing a survey of the important design decisions.

¹ Grant number (Förderkennzeichen) 01AK045.
² See Free Software Foundation (1991).

5.2 The Requirements for the Prototype

An important requirement for the RSerPool prototype implementation has been to make it Open Source. That is, everybody interested in the deployment of RSerPool is able to test the prototype and possibly to contribute an extension or improvement. Therefore, we have chosen the GPL license, which on the one hand allows modifications, but on the other hand also requires that modified versions, when distributed, are again made available under the GPL, including their source code.
The most fundamental requirement for the prototype has been the independence from the underlying hardware architecture and operating system. In particular, the prototype had to be designed with applications on embedded devices in mind. That is, it should be possible to use the prototype in networking and telecommunications equipment (e.g. routers and telephone signalling devices) built by our project partner Siemens, as well as on mobile phones or PDAs.

5.3 The Design Decisions

Since platform and operating system independence is one of the main requirements for the RSPLIB prototype, it has been necessary to choose a common programming language first. Although an implementation in C++ would have somewhat simplified the implementation and maintainability of the code, it has been decided to use ANSI-C instead: ANSI-C compilers are available for almost any kind of exotic platform, while C++ compilers are scarce.
The applicability of the RSPLIB prototype on exotic systems – in particular on systems without MMU – has also had a significant impact on the design and implementation of the handlespace management. See section 4.4 for further details on this subject.
In 2002, IPv6 support was already available in all major operating systems; therefore, providing the required IPv6 support has been easy. However, the SCTP protocol was still a novelty. While kernel implementations of SCTP for operating systems like Linux (LKSCTP, see LKSCTP (2006)), FreeBSD (KAME, see KAME (2006)), MacOS X and Solaris were already under development, none of these SCTP stacks had by then reached the maturity necessary to base a large software project like an RSerPool implementation on them. The only SCTP implementation powerful and stable enough to base the RSPLIB development on has been our own SCTP prototype SCTPLIB (see Tüxen (2001)). Together with its API library SOCKETAPI, it includes an API compatible with the upcoming standard defined in Stewart, Xie, Yarroll, Wood, Poon and Tüxen (2006). That is, basing the RSPLIB prototype on this standard SCTP API should ensure independence from the SCTP implementation used. However, not all functionalities defined in the API draft are currently supported by all implementations, so some kind of wrapper functionality is required for these implementations as part of the RSPLIB.
The RSPLIB prototype consists of a PR (described in section 5.5), a PU/PE library (described in section 5.6) and a demonstration system (described in section 5.7). Since all three parts require a common abstraction layer for the underlying operating system functionalities, this layer is described before the actual parts in the following section.

5.4 The Foundation Components

Figure 5.1: The Dispatcher Component

To cope with the requirement that the RSPLIB prototype should be easily portable to new operating systems, all system-dependent functionalities have been encapsulated into an abstraction layer. This abstraction layer has been called Dispatcher; its building blocks are presented in figure 5.1. These sub-components – Net Utilities, TimerMgt and EventCallback – are described in the following.

5.4.1 Network Utilities
The duties of the Net Utilities sub-component of the Dispatcher include functionalities like transport address handling and conversion, byte order conversion and socket management. Mainly, these functionalities are simply provided by small wrapper functions for the appropriate operations of the underlying operating systems. But one of the most important tasks of the Net Utilities sub-component is to build wrappers for missing SCTP functions of different SCTP implementations, as explained in section 5.3: as of 2006, only the SCTPLIB implementation provides full support of the complete SCTP API defined in Stewart, Xie, Yarroll, Wood, Poon and Tüxen (2006); for all other implementations, the Net Utilities component has to provide some wrapper functionalities. However, it is only a question of time until the functionalities of these implementations catch up and the wrapper functions become obsolete.

5.4.2 Timer Management and Event Callback Handling
Timers and events on network sockets (e.g. input on an SCTP association) are handled by the two sub-components TimerMgt and EventCallback. The EventCallback component is responsible for managing event notifications on network sockets (e.g. notifying about incoming data). Furthermore, its duties also include timer event notifications in cooperation with the TimerMgt sub-component. The TimerMgt sub-component itself is responsible for managing, setting and cancelling timers.
Using the Dispatcher component as base, the three RSPLIB parts have been realized. These parts are described in the following three sections.
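The cooperation of EventCallback and TimerMgt can be illustrated by a single dispatcher iteration built on POSIX poll(): the timeout passed to poll() is derived from the earliest scheduled timer, socket handlers are called for readable descriptors, and expired timers are fired afterwards. This is a hypothetical sketch and not the RSPLIB Dispatcher API; the type and function names are invented, timers are kept in a plain array, at most 16 descriptors are supported, and with neither descriptors nor timers the call would block forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical EventCallback registration: descriptor plus handler. */
struct event_callback {
   int    fd;
   void (*handler)(int fd, void* userData);
   void*  userData;
};

/* Hypothetical TimerMgt entry: one-shot timer, absolute expiry in ms. */
struct timer_entry {
   long long expiry_ms;                /* < 0: not scheduled */
   void    (*handler)(void* userData);
   void*     userData;
};

static long long now_ms(void)
{
   struct timespec ts;
   clock_gettime(CLOCK_MONOTONIC, &ts);
   return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

void dispatcher_iterate(struct event_callback* ecs, size_t ecCount,
                        struct timer_entry* timers, size_t timerCount)
{
   struct pollfd pfds[16];
   assert(ecCount <= 16);

   /* Wait at most until the earliest timer expires (-1: no timer). */
   long long timeout = -1;
   const long long now = now_ms();
   for(size_t i = 0; i < timerCount; i++) {
      if(timers[i].expiry_ms >= 0) {
         long long left = timers[i].expiry_ms - now;
         if(left < 0) { left = 0; }
         if(timeout < 0 || left < timeout) { timeout = left; }
      }
   }

   for(size_t i = 0; i < ecCount; i++) {
      pfds[i].fd      = ecs[i].fd;
      pfds[i].events  = POLLIN;
      pfds[i].revents = 0;
   }
   if(poll(pfds, (nfds_t)ecCount, (int)timeout) > 0) {
      for(size_t i = 0; i < ecCount; i++) {
         if(pfds[i].revents & POLLIN) {
            ecs[i].handler(ecs[i].fd, ecs[i].userData);
         }
      }
   }

   /* Fire expired one-shot timers (unscheduled before the callback). */
   const long long after = now_ms();
   for(size_t i = 0; i < timerCount; i++) {
      if(timers[i].expiry_ms >= 0 && timers[i].expiry_ms <= after) {
         timers[i].expiry_ms = -1;
         timers[i].handler(timers[i].userData);
      }
   }
}

/* Example handlers: count readable events and timer expirations. */
static void on_readable(int fd, void* userData)
{
   char buf[64];
   if(read(fd, buf, sizeof(buf)) > 0) {
      (*(int*)userData)++;
   }
}
static void on_timer(void* userData) { (*(int*)userData)++; }
```

A complete event loop would simply call dispatcher_iterate() repeatedly, which is the role the Main Loop Thread plays on the PU/PE side.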

5.5 The Registrar

A PR is a basic requirement for an RSerPool system; therefore, the first task of creating the RSPLIB prototype has been to design and implement a PR. Figure 5.2 shows the building blocks of the PR implementation. Clearly, the Dispatcher component provides the foundation and encapsulates the system-dependent functionalities (see section 5.4). The central element of a PR is the handlespace management. The handlespace management design and implementation are described in detail in chapter 4. As expected, the two components ASAP Protocol and ENRP Protocol provide the PR-side implementation of the two RSerPool protocols. This obviously includes message encapsulation and decapsulation but also the handling of keep-alive and expiry timers in cooperation with the handlespace management (see also subsection 4.4.3).

Figure 5.2: The RSPLIB Registrar
The Registrar Management component is the mediation layer between the two protocols and the handlespace management. It verifies that all operations on the handlespace requested via one of the protocols are allowed and rejects requests if necessary. For example, the Registrar Management checks whether the transport addresses under which a PE desires to register match the addresses it uses for the ASAP association with the PR (see also subsubsection 3.7.1.2). Another responsibility of the Registrar Management is the authentication and authorization of requests.
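The address check described above can be sketched as a subset test: every IP address a PE wants to register must also be an address of the ASAP association over which the request has been received. The function names are hypothetical, ports are ignored for brevity, and the actual Registrar Management rules may differ in detail.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Compares only the IP address parts of two socket addresses. */
static int same_ip(const struct sockaddr_storage* a, const struct sockaddr_storage* b)
{
   if(a->ss_family != b->ss_family) {
      return 0;
   }
   if(a->ss_family == AF_INET) {
      return ((const struct sockaddr_in*)a)->sin_addr.s_addr ==
             ((const struct sockaddr_in*)b)->sin_addr.s_addr;
   }
   if(a->ss_family == AF_INET6) {
      return memcmp(&((const struct sockaddr_in6*)a)->sin6_addr,
                    &((const struct sockaddr_in6*)b)->sin6_addr,
                    sizeof(struct in6_addr)) == 0;
   }
   return 0;
}

/* Hypothetical check: accept a registration only if all requested
   addresses are also addresses of the ASAP association. */
int registration_addresses_valid(const struct sockaddr_storage* requested, size_t nRequested,
                                 const struct sockaddr_storage* association, size_t nAssociation)
{
   if(nRequested == 0) {
      return 0;
   }
   for(size_t i = 0; i < nRequested; i++) {
      int found = 0;
      for(size_t j = 0; (j < nAssociation) && !found; j++) {
         found = same_ip(&requested[i], &association[j]);
      }
      if(!found) {
         return 0;
      }
   }
   return 1;
}

/* Helper for the usage example: builds an IPv4 socket address. */
static struct sockaddr_storage make_ipv4(const char* text)
{
   struct sockaddr_storage ss;
   memset(&ss, 0, sizeof(ss));
   struct sockaddr_in* sin = (struct sockaddr_in*)&ss;
   sin->sin_family = AF_INET;
   inet_pton(AF_INET, text, &sin->sin_addr);
   return ss;
}
```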
Currently, the PR prototype does not implement its own security mechanisms and relies on IPsec³ instead. A future version may also support TLS⁴ or our own approach Secure-SCTP (see subsubsection 2.4.3.7).

³ See Kent and Atkinson (1998c,a), Deering and Hinden (1998b).
⁴ See Dierks and Allen (1999), Blake-Wilson et al. (2003).

5.6 The PU/PE Library
The implementation of the PE and PU functionalities has been realized as a function library called RSPLIB. This library has also given our project its name. The RSPLIB library is described in this section. A short overview of its API can be found in subsection 5.6.2.

5.6.1 Building Blocks

Figure 5.3: The RSPLIB Library

The building blocks of the PU/PE library are presented in figure 5.3. As for the PR implementation, the Dispatcher component (see section 5.4) is reused to encapsulate the system-dependent functionalities. On top of the Dispatcher, the ASAP Instance component realizes the core ASAP functionalities on the PU and PE sides. It consists of the following sub-components:

ASAP Protocol: Clearly, this sub-component realizes the PU and PE side of the ASAP protocol and provides ASAP message encapsulation and decapsulation.
ASAP Cache: The ASAP Cache reuses the handlespace management implementation described in chapter 4 to realize the PU-side cache. That is, each time a handle resolution is performed, its results are propagated into this cache and may be reused for further handle resolutions (see also subsubsection 3.7.1.1).
Registrar Table: PR identities – configured statically by the administrator or learned by listening to the corresponding PRs' multicast ASAP Announces – are stored in the Registrar Table (see also subsection 3.9.4). It is also the responsibility of the Registrar Table to flush expired entries and to establish an association to a randomly selected PR on request.
Main Loop Thread: The Main Loop Thread is an event loop that handles timer events (e.g. flushing out-of-date PR entries in the Registrar Table) and socket events (e.g. answering ASAP Endpoint Keep-Alives as described in subsubsection 3.9.2.2). In order to simplify the usage of the RSerPool functionalities, the Main Loop Thread has been realized as a separate thread. That is, it runs in the background, so that the application using the library does not have to take care of the RSerPool event processing.
The ASAP Instance cannot be directly accessed by the application itself. Instead, two levels of API layers are built on top of the ASAP Instance: the Basic Mode API and the Enhanced Mode API. They are shortly described in the following subsection.

5.6.2 The RSerPool API
As mentioned in the previous subsection, the PU/PE Library provides two levels of API: the Basic Mode and the Enhanced Mode. While the Basic Mode API only provides the basic functionalities for registration, re-registration and deregistration for PEs, as well as for handle resolution and failure reporting for PUs, the Enhanced Mode API provides the complete ASAP Session Layer functionalities. The Basic Mode and Enhanced Mode APIs have been suggested as part of our research results and of our experience with the RSPLIB prototype. They are now under standardization by the IETF, following discussions⁵ at IETF meetings, and are described in detail in the Working Group Draft Silverton et al. (2005).

Algorithm 3: An Example for a Pool User using the Basic Mode API
 1  /* Select a pool element from pool "MyPool" */
 2  rsp_getaddrinfo("MyPool", &eai);
 3  ...
 4
 5  /* Create a socket */
 6  sd = socket(eai->ai_family, eai->ai_socktype, eai->ai_protocol);
 7  if(sd >= 0) {
 8     /* Connect to pool element */
 9     if(connect(sd, eai->ai_addr, eai->ai_addrlen) == 0) {
10        ...
11        if(failure) {
12           /* Failure occurred -> report it */
13           rsp_pe_failure("MyPool", eai->ai_identifier);
14           ...
15        }
16        ...
17     }
18  }

5.6.2.1 The Basic Mode API
As mentioned in the introduction, the Basic Mode API only provides the basic functionalities for registration, re-registration and deregistration for PEs, as well as handle resolution and failure reporting for PUs. The main reason for using this API is to provide RSerPool support in existing applications which do not require the support of the Session Layer (i.e. the Control Channel between PU and PE, see subsection 3.9.5). In the following two paragraphs, the PU and PE sides of the Basic Mode API are shortly explained; more details can be found in Dreibholz (2005b), Dreibholz and Tüxen (2003).
Pool User Side
A non-RSerPool client application usually connects to a server by first resolving its hostname into a transport address using DNS and then creating and connecting a socket. The Unix function to resolve a hostname into a transport address is called getaddrinfo()⁶.
Since the main intention of the Basic Mode API is to add RSerPool support to existing applications, it has been decided to mimic the DNS resolution API of Unix for the handle resolution call. Algorithm 3 presents the principle: instead of resolving a hostname via getaddrinfo() into a transport address, the RSPLIB function rsp_getaddrinfo() (line 2) resolves a PH into the transport address of a policy-selected PE. The structures including the transport address are compatible with the standard getaddrinfo() call. In case of a PE failure, it is the duty of the PU to report this failure (using rsp_pe_failure() in line 13) as well as to perform a failover after selecting a new PE and connecting to it (not shown here).
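For comparison, the conventional client-side sequence that rsp_getaddrinfo() mimics looks as follows in plain POSIX C: resolve with getaddrinfo(), then try the returned addresses until connect() succeeds. The helper name connect_to_server() is hypothetical. With the Basic Mode API, the resolution step would be replaced by the handle resolution of algorithm 3, and a failed connection would additionally be reported via rsp_pe_failure() before a new PE is selected.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Conventional (non-RSerPool) client: resolve host and service, then try
   each returned address until connect() succeeds. Returns a socket or -1. */
int connect_to_server(const char* host, const char* service)
{
   struct addrinfo  hints;
   struct addrinfo* result;
   memset(&hints, 0, sizeof(hints));
   hints.ai_family   = AF_UNSPEC;     /* IPv4 or IPv6 */
   hints.ai_socktype = SOCK_STREAM;

   if(getaddrinfo(host, service, &hints, &result) != 0) {
      return -1;
   }
   int sd = -1;
   for(struct addrinfo* ai = result; ai != NULL; ai = ai->ai_next) {
      sd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
      if(sd < 0) {
         continue;
      }
      if(connect(sd, ai->ai_addr, ai->ai_addrlen) == 0) {
         break;                       /* connected */
      }
      close(sd);                      /* failed: try the next address */
      sd = -1;
   }
   freeaddrinfo(result);
   return sd;
}
```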

⁵ See Tüxen et al. (2004) and Silverton et al. (2004).
⁶ The older gethostbyname() call is similar.

Algorithm 4: An Example for a Pool Element using the Basic Mode API
1  void registrationLoopThread()
2  {
3     while(!shuttingDown) {
4        rsp_pe_register("MyPool", ...);
5        usleep(reregistrationInterval);
6     }
7     rsp_pe_deregister("MyPool", ...);
8  }

Algorithm 5: An Example for a Pool User using the Enhanced Mode API
 1  /* Create session */
 2  session = rsp_socket(0, SOCK_STREAM, IPPROTO_SCTP);
 3  rsp_connect(session, "MyPool", ...);
 4
 5  /* Run application: file download */
 6  rsp_send(session, "GET Linux-CD.iso HTTP/1.0\r\n\r\n");
 7  while((length = rsp_recv(session, buffer, ...)) > 0) {
 8     doSomething(buffer, length, ...);
 9  }
10
11  /* Close session */
12  rsp_close(session);

Pool Element Side
A PE-side code example for the Basic Mode API is presented in algorithm 4: in line 4, the PE is registered by calling rsp_pe_register(). This function is called as part of a loop (line 3 to 6) in the interval given by the variable reregistrationInterval. The function usleep() waits for the corresponding time span. Finally, the PE is deregistered by a call to rsp_pe_deregister().
Since a PE should not only take care of its registration but in particular provide its actual application service, it is assumed that the registration loop function shown in algorithm 4 is executed by a separate thread. This ensures that the PE's application service will never be interrupted by any RSerPool task.
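The registration loop of algorithm 4 can be sketched so that the library calls are injected as function pointers, which keeps the loop testable without an RSerPool stack. The callback structure and the fake_* helpers are hypothetical stand-ins; they do not reproduce the signatures of rsp_pe_register() or rsp_pe_deregister(). The shutdown flag is a C11 atomic because, as described above, the loop is meant to run in its own thread, e.g. started via pthread_create() with registration_thread() as start routine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical indirection for the library calls used in algorithm 4. */
struct registration_ops {
   int  (*do_register)(void* ctx);     /* e.g. wraps rsp_pe_register()   */
   void (*do_deregister)(void* ctx);   /* e.g. wraps rsp_pe_deregister() */
   void (*wait_interval)(void* ctx);   /* e.g. usleep(reregistrationInterval) */
   void* ctx;
};

/* Set by the application to end the loop; atomic since it is shared
   between the application thread and the registration thread. */
static atomic_bool shuttingDown = false;

/* Loop of algorithm 4: re-register periodically until shutdown, then
   deregister once. Returns the number of successful registrations. */
unsigned int registration_loop(const struct registration_ops* ops)
{
   unsigned int registrations = 0;
   while(!atomic_load(&shuttingDown)) {
      if(ops->do_register(ops->ctx) == 0) {
         registrations++;
      }
      ops->wait_interval(ops->ctx);
   }
   ops->do_deregister(ops->ctx);
   return registrations;
}

/* Start routine with the signature expected by pthread_create(). */
void* registration_thread(void* arg)
{
   registration_loop(arg);
   return NULL;
}

/* Fake operations for a dry run: request shutdown after three intervals. */
struct fake_ctx { int registered; int deregistered; int waits; };
static int  fake_register(void* c)   { ((struct fake_ctx*)c)->registered++; return 0; }
static void fake_deregister(void* c) { ((struct fake_ctx*)c)->deregistered++; }
static void fake_wait(void* c)
{
   if(++((struct fake_ctx*)c)->waits == 3) {
      atomic_store(&shuttingDown, true);
   }
}
```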

5.6.2.2 The Enhanced Mode API
While the Basic Mode API only provides the basic functionalities for supporting RSerPool in applications, the Enhanced Mode API includes full support of the ASAP Session Layer (i.e. the Control Channel as well as connection maintenance and failover). The next two paragraphs shortly describe the Enhanced Mode API for the PU and PE side; more details can be found in Dreibholz (2005b).

Pool User Side
In a non-RSerPool client application, a connection to a server is usually established by resolving the server's hostname into a transport address using DNS, creating a socket (using the Unix socket() call) and connecting this socket to the resolved transport address (using the Unix connect() call). After that, the calls send() and recv() can be used to send and receive data via the socket. Finally, the socket is removed by a call to close(). For more details, see Stevens et al. (2003).


To reduce the effort for a programmer to adapt a program to RSerPool, the approach for the Enhanced Mode API has been to mimic the Unix sockets API. Algorithm 5 presents an example: first, a session is created using the rsp_socket() call in line 2 (for compatibility reasons, a session is also denoted as RSerPool Socket). After that, the session is connected to a pool by calling rsp_connect() in line 3. The pool is given by its PH (here: MyPool).
After establishment of the session, it can be used for the application protocol. In line 6 to 9, the example application downloads a file by sending a request (using rsp_send()) and receiving the file (using a sequence of rsp_recv() calls). Note that as soon as a PE has received the download request and has sent a cookie including file name and current position to the PU, the RSPLIB can transparently handle failovers. That is, no additional application code is necessary. After completion of the file download, the session is finally removed using the rsp_close() call (line 12).

Pool Element Side
A non-RSerPool server application usually creates a socket (again using the socket() call), binds it to a specific port number (using the bind() call, e.g. TCP port 80 for a web server), puts the socket into listen mode (using the listen() call) and accepts incoming connections using the accept() call. For serving the newly connected client, a new thread may be created. It handles the application protocol on the new connection using send() and recv() calls. The connection is closed by a call to close(). For more details, see Stevens et al. (2003).
As for the PU, the PE-side Enhanced Mode API also mimics the Unix sockets API. An example is given in algorithm 6: first, an RSerPool socket is created (line 21) and the PE is registered to a pool given by its PH (line 22, here: MyPool). The RSPLIB library will automatically take care of re-registrations. After registering the PE, a loop waits for incoming sessions (using rsp_poll() in line 27), accepts them (using rsp_accept() in line 31) and creates new threads to serve them.
A thread function which handles a session is presented in line 1 to 16: first, it checks whether the first message received by rsp_recv() (line 3) is a state cookie. In this case, a saved session state is restored and the first command is read (line 6 to 7). After that, the loop from line 9 to 14 handles commands and saves the current session states as state cookies (using rsp_send_cookie() in line 12; the RSPLIB library sends the cookies via ASAP Cookie messages over the Control Channel to the PU's Session Layer). Finally, the session is shut down using rsp_close().
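What a PE might store in such a state cookie can be sketched for the file download of algorithm 5: the file name and the current position. The byte layout, the tag "DLv1" and the function names are assumptions made for this illustration; the contents of an ASAP Cookie are application-defined and opaque to the PU. The position is encoded in big-endian order so that a PE on a different platform can restore it after a failover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical session state of a file download service. */
struct download_state {
   char     fileName[128];
   uint64_t position;        /* bytes already delivered to the PU */
};

/* Hypothetical cookie layout: 4-byte tag, 8-byte big-endian position,
   2-byte big-endian name length, then the name (no terminating NUL). */
#define COOKIE_TAG "DLv1"

/* Returns the cookie size, or 0 if the buffer is too small. */
size_t cookie_encode(const struct download_state* s, unsigned char* buf, size_t size)
{
   const size_t nameLen = strlen(s->fileName);
   const size_t total   = 4 + 8 + 2 + nameLen;
   if(total > size || nameLen > 0xffff) {
      return 0;
   }
   memcpy(buf, COOKIE_TAG, 4);
   for(int i = 0; i < 8; i++) {
      buf[4 + i] = (unsigned char)(s->position >> (56 - 8 * i));
   }
   buf[12] = (unsigned char)(nameLen >> 8);
   buf[13] = (unsigned char)(nameLen & 0xff);
   memcpy(buf + 14, s->fileName, nameLen);
   return total;
}

/* Returns 1 and fills *s if buf holds a valid cookie, 0 otherwise. */
int cookie_decode(const unsigned char* buf, size_t size, struct download_state* s)
{
   if(size < 14 || memcmp(buf, COOKIE_TAG, 4) != 0) {
      return 0;
   }
   uint64_t position = 0;
   for(int i = 0; i < 8; i++) {
      position = (position << 8) | buf[4 + i];
   }
   const size_t nameLen = ((size_t)buf[12] << 8) | buf[13];
   if(14 + nameLen != size || nameLen >= sizeof(s->fileName)) {
      return 0;
   }
   memcpy(s->fileName, buf + 14, nameLen);
   s->fileName[nameLen] = '\0';
   s->position = position;
   return 1;
}
```

Since the cookie is stored by the PU and handed to another PE on failover, a production service would probably also want to authenticate its contents, for example with a keyed hash; this is omitted here.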
The Enhanced Mode API also allows the UDP-like programming model and poll()/select()-based implementations. Due to space limitations, these programming schemes are not explained here. Details can be found at Dreibholz (2006c) and in Silverton et al. (2005).

Algorithm 6: An Example for a Pool Element using the Enhanced Mode API
 1  void serviceThread(session)
 2  {
 3     rsp_recv(session, command, ...);
 4     if(command is a cookie) {
 5        /* Got a cookie -> restore session state */
 6        Restore state;
 7        rsp_recv(session, command, ...);
 8     }
 9     do {
10        /* Handle commands from pool user */
11        Handle command;
12        rsp_send_cookie(session, current state);
13        rsp_recv(session, command, ...);
14     } while(session is active);
15     rsp_close(session);
16  }
17
18  int main(...)
19  {
20     /* Create and register pool element */
21     poolElement = rsp_socket(0, SOCK_STREAM, IPPROTO_SCTP);
22     rsp_register(poolElement, "MyPool", ...);
23
24     /* Handle incoming session requests */
25     while(server is active) {
26        /* Wait for events */
27        rsp_poll(poolElement, ...);
28
29        if(incoming session) {
30           /* Accept new session */
31           session = rsp_accept(poolElement, ...);
32           Create service thread to handle session;
33        }
34     }
35
36     /* Deregister pool element */
37     rsp_deregister(poolElement);
38     rsp_close(poolElement);
39  }

5.7 The Demonstration System
As part of the RSPLIB prototype, a demonstration system has been created in order to illustratively present the functionalities of RSerPool in a real-time distributed computing scenario (see subsection 3.6.5). It has been introduced in Dreibholz and Rathgeb (2005a), Dreibholz (2004a) and described in more detail in Dreibholz (2005b, 2006b). A screenshot of the demonstration system can be found in figure 5.4. The system consists of the following three parts:
Fractal Pool Element: A Fractal PE provides a computation service for Mandelbrot fractal graphics. It can be switched into a so-called unreliable mode, which breaks a transport connection after a certain number of transmitted packets to simulate a session failure.
Fractal Pool User: The Fractal PU is a graphical application which requests the computation of fractal graphics from the pool. The result is presented in a window (see the right-hand side of figure 5.4), where the image is continuously building up. Failovers are exemplified by a slight colour change. That is, the observer will see a continuous calculation progress and will notice the possibly frequent failures of the calculating PEs only by the colour changes.
Demo Tool: A user interface to start and stop PEs, PUs and PRs is provided by the Demo Tool (see the left-hand side of figure 5.4). It also presents the states (active or inactive) of each component and the connection status and runtime for each association.

Figure 5.4: The RSPLIB Demonstration System
A special feature of the demonstration system is that each component may run on its own host. That is, a demonstration scenario may consist of different PCs; component failures and dynamic pool reconfiguration can be presented by disconnecting PCs from the network and reconnecting them again.

5.8 The Prototype Implementation Validation
To validate the correctness of the prototype implementation, the debugging software VALGRIND⁷ – which has already been used to debug the handlespace management implementation as described in section 4.5 – has been used again. In order to detect problems, long-term tests (i.e. up to several days) have been performed using the demonstration system described in section 5.7; each component has been executed by VALGRIND. In case of errors reported by VALGRIND, the tests have been restarted after bug-fixing.
In order to validate the message flow and the functional correctness of the components, simple tests of specific functionalities (like aborting a PR and observing the takeover procedure) have been performed. The resulting log output as well as the contents of the transmitted RSerPool messages have been verified manually. For the purpose of viewing the contents of messages, the network sniffer software WIRESHARK⁸ (formerly called ETHEREAL) – which includes packet dissectors for various protocol types including SCTP as well as ASAP and ENRP – has been used intensively.

5.9 A Survey of Other RSerPool Implementations

After having explained the RSPLIB prototype implementation in detail, this section gives a short survey of other RSerPool implementations which are currently available or under development. It concludes with some notes on interoperability tests.

5.9.1 Motorola
As described by Silverton and Tüxen (2005), the Networks Business Unit of Motorola Inc. is developing a closed-source implementation of RSerPool. It currently includes an ASAP library running under Linux, FreeBSD and Solaris. Support for automatic configuration and the Enhanced Mode functionality is still under development. The Motorola PR implementation currently realizes only the ASAP part (without automatic configuration).

5.9.2 Cisco
An implementation of the RSerPool protocols by Cisco – for inclusion into their Cisco™ Internet Operating System IOS for routers – is currently under development. As of July 2006, the implementation is still in an early development stage and includes an ASAP library (running under FreeBSD) without support for the automatic configuration feature. The PR implementation – which will be part of future IOS versions – has not been started yet. However, quick progress is expected, due to the growing interest of router vendors in the RSerPool protocol framework.

⁷ See Seward and Nethercote (2005), Valgrind Developers (2005).
⁸ See Lamping et al. (2006), Wireshark (2006).

Figure 5.5: The First Interoperability Tests at the 60th IETF Meeting

5.9.3 Münster University of Applied Sciences
As part of a Bachelor's thesis, an Open Source PR implementation under the Berkeley Software Distribution⁹ (BSD) license is currently under development at the Münster University of Applied Sciences in Münster, Germany. According to Tüxen and Dreibholz (2006a), the goal of this project is to develop a PR for research on the synchronization behaviour of the ENRP protocol. It is therefore not planned to realize the automatic configuration functionalities of the ASAP and ENRP protocols. The PR will run under different operating systems with kernel SCTP implementations, including Linux, MacOS X and FreeBSD.

5.9.4 Interoperability Tests
Protocol interoperability tests are an important step in IETF standardization, since the IETF relies on running code. That is, before a Working Group Draft defining a protocol can become an RFC, two independently developed protocol implementations have to prove their interoperability.
The very first interoperability tests of two ASAP implementations, between our own prototype RSPLIB and the proprietary Motorola implementation, took place at the 60th IETF Meeting in San Diego, California/U.S.A. on August 4 and August 6, 2004. These historic moments in RSerPool standardization have been captured in pictures and are presented in figure 5.5.
In July/August 2006, the first official RSerPool interoperability test, called Bakeoff, has been realized in conjunction with the 8th SCTP Bakeoff in Vancouver, Canada (see also Wagner (2006)). At this meeting, the RSPLIB prototype implementation has been successfully tested for interoperability with the Cisco ASAP library. In particular, the RSPLIB has also been tested with different SCTP implementations, including our own userland implementation SCTPLIB as well as the kernel implementations of Linux, FreeBSD and MacOS X. Rules for the evaluation of RSerPool implementation interoperability are specified as an Internet Draft in Dreibholz and Tüxen (2006).

⁹ See Open Source Initiative (1999).

5.10 Summary

In this chapter, the RSPLIB prototype implementation has been presented. Based on a research project between our group and Siemens, it has become the world's first complete, Open Source, multi-platform RSerPool implementation and is now the reference implementation of the IETF RSerPool WG. In the first part of the presentation, the important design decisions of the RSPLIB prototype have been shown: the independence from the underlying operating system and easy portability to new platforms. Then, the three main parts of the implementation have been introduced: the PR implementation, the PU/PE library and the demonstration system. Since the PU/PE library provides the connection between RSerPool and its applications, it has been explained in more detail. In particular, the two-part API has been presented with some pseudo-code examples: the Basic Mode API, providing only the core RSerPool functionalities (e.g. PE registration handling and handle resolutions), and the Enhanced Mode API, including the full Session Layer features. Finally, a survey of other RSerPool implementations has been given, together with some notes on the interoperability testing among these implementations.


Chapter 6

The RSPSIM Simulation Model

An introduction to the RSerPool simulation model RSPSIM is provided in this chapter. At first, a short overview of the model's motivation and history is given; this is followed by the definition of the requirements for the model. After that, the simulation environment, including the reasons for the choice of simulation and post-processing tools, is described, followed by a presentation of the simulation model itself.

6.1 Introduction

After the development of the RSerPool prototype implementation RSPLIB (see chapter 5) had been started, there has been a growing demand for research on the performance aspects of RSerPool. In particular, the intention has been to evaluate the behaviour of different pool policies as part of the IETF standardization activities and to verify the ideas for using RSerPool for real-time distributed computing (see subsection 3.6.5). But performing such analyses using a prototype implementation is very difficult, since discovered effects and possible problems are not easy to reproduce. Therefore, it has been necessary to develop a simulation model.
The simulation model, which is simply called RSPSIM for RSerPool Simulation, is described in the following sections. Its development has been supported by the Deutsche Forschungsgemeinschaft (DFG) since October 01, 2004.

6.2 The Requirements for the Simulation Model

Before designing a simulation model, it is first necessary to define its goals. For the RSerPool simulation RSPSIM, the goal has been a performance evaluation of the RSerPool functionalities. That is, it has been necessary to model the RSerPool protocols and component functionalities. This has included the handlespace management of the PR and the PU-side cache, the policies (see section 3.11) and the Session Layer behaviour, in particular the client-based state sharing as described in subsubsection 3.9.5.2. Furthermore, the requirements for the handlespace management have included the possibility to maintain large pools, e.g. for the application of real-time distributed computing as described in subsection 3.6.5. In order to compare policies and evaluate improvements, it has also been an important demand to be able to add new policies easily. Since the handlespace requirements have been similar for the RSPLIB prototype as well as for the RSPSIM simulation model, it has been decided to create a common handlespace management implementation to be used for both systems. This handlespace management approach is described separately in chapter 4.
When the goals of the RSPSIM simulation model have been defined, it has also been useful to explicitly state the non-goals: it has not been a goal to simulate the Transport Layer protocol, i.e. SCTP. Evaluating the effects of SCTP and IP (like multi-homing for network path redundancy) would have exceeded the scope of this thesis, which focuses on RSerPool performance. Related work on the subject of Transport Layer resilience with SCTP can be found in Jungmaier (2005), Jungmaier, Rathgeb and Tüxen (2002), Conrad et al. (2002) and Jungmaier et al. (2000b,a).

6.3 The Simulation Framework
In this section, the simulation framework is introduced. This includes the reasoning behind the choice of a discrete event simulation toolkit as well as of a package to perform the statistical post-processing of the obtained results. The actual description of the simulation model RSPSIM follows in section 6.4.

6.3.1 A Discussion of Simulation Packages
Before a simulation model can be created, it is obviously necessary to choose a simulation system. The candidate discrete event simulation packages have been a LISP-based simulation package as well as OPNET MODELER, NS-2 and OMNET++. In the following subsubsections, the features of these packages are briefly summarized. This is followed by the motivation for the choice made for the RSPSIM simulation model.

6.3.1.1 LISP-based Simulation Package
The first possible choice for a simulation system has been an Open Source simulation package for Common LISP, which is maintained by the University of Applied Sciences at Münster, Germany (see Tüxen (2003)). This package had already been used by us for simulating a lightweight QoS device¹, therefore some experience with this package had already been available. On the other hand, the simulation package is very rudimentary; it does not provide any functionalities beyond the discrete event simulator itself, some random number generators and classes to collect statistics. For a large project like RSPSIM, using this package would have required developing fundamental functionalities like vector and scalar output as well as scenario and configuration management ourselves. That is, the LISP-based package has not been a realistic option for the RSPSIM simulation model.

6.3.1.2 OPNET MODELER
Due to an academic contract, we also had access to the commercial simulation system OPNET MODELER (see OPnet Technologies (2003)). It does not only include a discrete event simulation system with a graphical user interface, but also provides powerful functionalities for the post-processing of obtained results data. It includes a huge library of complete models for all kinds of networking protocols and even models of existing hardware (like router and switch models of various vendors); own models are realized in ANSI-C. OPNET MODELER is also a well-known simulation system which is in particular frequently used in ITU standardization.
On the other hand, licensing has been an issue. OPNET MODELER licenses are highly expensive and furthermore time-limited. That is, after expiration of the license the simulator refuses to run and the developed model becomes useless. The requirement of an expensive license to run a simulation also prevents many interested people from actually using the simulation model. This is in particular an important issue in IETF standardization, where Open Source simulation systems are preferred for such reasons. A particularly negative experience of working with OPNET MODELER has been that, despite its huge licensing costs, it is far from being bug-free. In multiple cases, bugs causing program crashes have been experienced in the Closed Source parts of the simulation system. Since such bugs are impossible to track down (there is no source code), it has been necessary to find workarounds by trial-and-error tests and by asking for help in support forums. This has been a very annoying and time-consuming procedure.

¹ See Dreibholz, IJsselmuiden and Adams (2005), Dreibholz et al. (2004), Dreibholz, Smith and Adams (2003).

6.3.1.3 NS-2
The NS-2 simulation package (Network Simulator 2, see NS-2 (2003)), which is well-known and also frequently used in IETF standardization, is an Open Source discrete event simulation system. It already includes models for a variety of network protocols, and many more models are available separately as Open Source. Unlike OPNET MODELER, it does not include specific models of existing hardware like routers or switches. NS-2 simulations are usually written in a combination of C++ modules and oTcl (Object-Tcl, see Wetherall and Lindblad (1995)) scripts, where the oTcl part is usually used to define the simulation scenario as well as for parametrization. The performance-critical simulation objects are realized in C++. The choice of object-oriented programming languages simplifies the creation of simulation models: it is easily possible to create derived classes of models and thereby to realize specialised variants of the simulation objects.
All simulation output is written to so-called trace files; these files are used to post-process the simulation results. Optionally, the program NAM (Network Animator, included in the NS-2 package) can display an animation of the simulation scenario, including packet flow and queue lengths.

6.3.1.4 OMNET++
OMNET++ (Objective Modular Network Testbed in C++, see Varga (2005b,a)) is another Open Source discrete event simulation system, but significantly newer than NS-2 and therefore not as widespread. Nevertheless, models for the most important networking protocols are already available, although the library of models for OMNET++ is still significantly smaller than for NS-2 or OPNET MODELER. Own simulation objects are implemented in C++.
A very interesting and useful feature of OMNET++ is the possibility to create either a command-line version (Cmdenv environment) or a GUI version (Tkenv environment) of a simulation model by simply linking it to the appropriate environment library. During model development, this allows the user to run the model using the GUI and view an animation of the packet flow, as well as to inspect things like the contents of queues during the run. It is furthermore possible to interrupt the simulation and perform single steps to check the behaviour of the model. Finally, to perform the actual sets of simulation runs using scripts, it is sufficient to simply link the model with the command-line environment.
Another useful feature of the OMNET++ package is the availability of code and Makefile generators. In OMNET++, these generators take over frequently recurring tasks like the creation of classes for messages and modules (i.e. objects of a simulation scenario, see subsection 6.3.3 for details) from their definitions, as well as the creation of a Makefile for the simulation's complete sources. These functionalities allow a simulation developer to concentrate on the simulation subject, rather than on how to create the appropriate structures for the underlying simulation system.
Due to its powerful code and Makefile generators and its excellent documentation, the time required for initial training before being able to actually realize a simulation model is short. Therefore, we have already used OMNET++ for student training sessions and gained experience with this simulation system. In particular, the experience has been that it is also very stable: since we started to use it, no bugs in its simulator classes have been discovered.

6.3.1.5 Conclusion
After intensively testing all four simulation systems presented in the previous subsubsections, the only applicable choices of a simulation package to realize the RSPSIM simulation model have been NS-2 and OMNET++: while the LISP simulation package clearly lacks features, the licensing issue of OPNET MODELER would have put too many restrictions on the usability of the model.
From the remaining set of possible choices, NS-2 and OMNET++, the OMNET++ package has finally been selected (starting with version 2.3, now using version 3.2), due to its powerful code generators as well as its more advanced user interface and configuration options. While the code generators allow quick development progress, the GUI-based user interface is a great help during debugging and validation of models (see also section 6.5). In particular, the animation and object monitoring features of OMNET++ can be used during the simulation run, in contrast to NS-2's NAM program, which can only process a trace file.

6.3.2 Statistical Post-Processing
For the statistical post-processing of simulation data, there had been the choice between the two Open Source packages GNU OCTAVE (see Eaton (2003)) and GNU R (see R Development Core Team (2005)). Post-processing in the case of the RSPSIM simulation model mainly means to collect a large number of statistics from the simulation output, to perform some kind of data aggregation and manipulation (like the calculation of confidence intervals), and finally to plot figures as PDF files.
For the task of collecting, aggregating and manipulating statistics data, the features of both packages, GNU OCTAVE and GNU R, are quite similar. The main difference lies in the possibilities for creating graphical output. While GNU OCTAVE relies on GNUPLOT (see Williams and Kelley (2003)) to create graphical output, GNU R provides its own set of graphics primitives and high-level plotting functions. GNU R offers a wide variety of directly usable functions to plot graphs; it is also possible to use the graphics primitives to create own plotting functions with customized features (like calculating the confidence intervals) and appearance. This possibility to highly customize plotting has been the reason to choose GNU R (version 2 series) for the statistical post-processing tasks.

6.3.3 An Overview of the OMNET++ Discrete Event Simulator
To understand the description of the simulation model in the following sections, it is first necessary to introduce some basic terminology and concepts of the OMNET++ discrete event simulator. For a detailed introduction and tutorials of OMNET++, see its documentation in Varga (2005b).
The central element of an OMNET++ simulation model is a Network. A network consists of Modules; each module may be either a Compound Module or a Simple Module. A Compound Module consists of at least one sub-module; each sub-module may again be a Compound Module or a Simple Module. The Simple Module is atomic; its functionalities require an implementation (in the C++ language) by the user. Interfaces between modules are called Gates. A gate may be connected by a Connection to another module's gate or to the gate of its parent Compound Module. Gates are used for the directed transport of Messages, i.e. a gate can be used either for the transmission or for the reception of messages (depending on its type: input gate or output gate). Messages follow the path given by the gates' connections. A connection may introduce delay, bandwidth limitation and bit errors².
Timers are realized by a module scheduling the transmission of a message to itself. A timer (i.e. a scheduled message) is stored in the Future Event Set (FES); it arrives at the module when the simulation time reaches its scheduled timestamp.
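A minimal Python sketch can illustrate the timer mechanism described above. It is not OMNET++ code and does not use the OMNET++ API. The classes `Simulator` and `TimerModule` and their methods are hypothetical names, chosen only to show two ideas: a Future Event Set ordered by timestamp, and a timer realized as a message that a module schedules to itself.

```python
import heapq
import itertools

class Simulator:
    """Minimal discrete event core: the Future Event Set (FES) is a heap of
    (timestamp, sequence number, module, message) entries."""

    def __init__(self):
        self.now = 0.0
        self._fes = []
        self._seq = itertools.count()  # tie-breaker: FIFO order for equal timestamps

    def schedule_at(self, time, module, message):
        if time < self.now:
            raise ValueError("cannot schedule an event in the past")
        heapq.heappush(self._fes, (time, next(self._seq), module, message))

    def run(self, until):
        while self._fes and self._fes[0][0] <= until:
            time, _, module, message = heapq.heappop(self._fes)
            self.now = time  # simulation time jumps to the event's timestamp
            module.handle_message(message)

class TimerModule:
    """A module realizing a periodic timer by sending a message to itself."""

    def __init__(self, sim, interval):
        self.sim = sim
        self.interval = interval
        self.expiries = []
        sim.schedule_at(sim.now + interval, self, "Timer")

    def handle_message(self, message):
        if message == "Timer":
            self.expiries.append(self.sim.now)
            # Re-arm the timer: schedule the same kind of message to ourselves.
            self.sim.schedule_at(self.sim.now + self.interval, self, "Timer")

sim = Simulator()
timer = TimerModule(sim, interval=2.5)
sim.run(until=10.0)
print(timer.expiries)  # [2.5, 5.0, 7.5, 10.0]
```

The sequence counter breaks ties between events with equal timestamps in insertion order. Without it, the heap would try to compare module objects.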
The definition of messages is done in an OMNET++-specific definition language. Networks as well as modules with their gates and connections are specified in OMNET++'s NEtwork Description language (NED). The code generators of OMNET++ use these definitions and create C++ classes. The only task of the user is to implement the custom functionalities of Simple Modules in derived classes.
Implementing the actual behaviour of Simple Modules can be realized in two ways: the first possibility is a Finite State Machine (FSM), where the module handles incoming messages according to its current state. The handling of a message possibly changes the module's state. States which are only left upon the reception of a message (e.g. when a timer expires) are denoted as Stable States; on the other hand, a Transient State is left immediately. The second possibility to implement the functionalities of a Simple Module is by using cooperative threads. Since the first approach (FSMs) has been taken to implement the simulation model, cooperative threads are not described here. For more details on this subject, see Varga (2005b).
A compiled simulation can use parameters read from an input file (called .ini file, due to its suffix) to parametrize a network with its modules. For example, a separate .ini file for each simulation run could specify different seeds and load parameters for a simulated client application. The output of a simulation run is a vector file including all recorded vectors as well as a scalar file including all written scalars.

6.3.4 The Simulation Tool Chain
Based on a script executed by GNU R and the RSPSIM simulation model linked with the Cmdenv environment, a tool chain to actually run a simulation and perform the post-processing has been created. At first, the GNU R script defines the parameter space to simulate. That is, a set of values is specified for each simulation parameter (e.g. parameter1Set = {1, 2, 5}, parameter2Set = {LeastUsed, Random}, ...) and the number of runs for each combination of parameters is defined. For each run and parameter combination, an .ini file as well as an entry in a Makefile is written. The Makefile entry actually executes the simulation model program using the corresponding .ini file as input and compresses the results (scalars, vectors, debug output) using the BZIP2 compression software (see Seward (2005)) to save disk space. Finally, an all entry is created in the Makefile which includes all simulation runs as well as the creation of a set of GNU R input files (to be explained later).
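The generation step can be sketched as follows, in Python instead of GNU R. This illustrates the idea under assumptions and is not the actual RSPSIM script. The .ini keys, the run naming scheme, the simulation binary name `./rspsim` and its `-f` option are all hypothetical placeholders. The sketch expands the parameter sets into their Cartesian product, gives each combination a fixed number of runs, and emits one .ini body per run plus a Makefile with one target per run and an `all` target.

```python
import itertools

# Parameter space as in the example above: a set of values per parameter.
PARAMETER_SETS = {
    "parameter1": [1, 2, 5],
    "parameter2": ["LeastUsed", "Random"],
}
RUNS_PER_COMBINATION = 3

def generate_runs(parameter_sets, runs_per_combination):
    """List (run name, parameters, seed) for every combination and run."""
    names = sorted(parameter_sets)
    combinations = itertools.product(*(parameter_sets[n] for n in names))
    runs = []
    for combination_id, values in enumerate(combinations, start=1):
        for run in range(1, runs_per_combination + 1):
            name = f"run-{combination_id:03d}-{run:02d}"
            runs.append((name, dict(zip(names, values)), run))
    return runs

def ini_text(name, parameters, seed):
    """Body of one .ini file; all keys are hypothetical placeholders."""
    lines = ["[General]", f"output-scalar-file = {name}.sca", f"seed = {seed}"]
    lines += [f"{key} = {value}" for key, value in sorted(parameters.items())]
    return "\n".join(lines) + "\n"

def makefile_text(run_names, binary="./rspsim"):
    """One target per run (simulate, then compress) and an 'all' target."""
    out = ["all: " + " ".join(f"{name}.sca.bz2" for name in run_names), ""]
    for name in run_names:
        out.append(f"{name}.sca.bz2: {name}.ini")
        out.append(f"\t{binary} -f {name}.ini && bzip2 -f {name}.sca")
        out.append("")
    return "\n".join(out)

runs = generate_runs(PARAMETER_SETS, RUNS_PER_COMBINATION)
ini_files = {name: ini_text(name, params, seed) for name, params, seed in runs}
makefile = makefile_text([name for name, _, _ in runs])
print(len(runs))  # 18 = 3 values x 2 values x 3 runs
```

Because every run is an independent Makefile target, `make -j 2 all` would let MAKE execute two runs concurrently.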
Actually performing the simulation runs simply means invoking MAKE (see Free Software Foundation (2003)) on the created Makefile. A useful property of MAKE is its option to parallelize the simulation runs on multi-CPU and/or multi-core machines. In particular, the simulations of this thesis have been run on dual-core Pentium IV CPUs, where MAKE has been able to utilize both cores with different simulation runs.

² Bit errors are introduced by setting an error flag in the message object. The actual message remains unmodified; the receiver has to take care of the error bit and implement appropriate reactions.


Figure 6.1: The Simulation Scenario Network

After all simulation runs have been performed successfully, the generated and compressed scalar output files are collected by a self-developed collection tool called createsummary. It stores the collected result statistics (e.g. for system utilization and request handling speed, to be explained in section 8.5) in separate input files in the GNU R data format (i.e. simple ASCII tables). All output files are BZIP2-compressed on-the-fly using LIBBZIP2 (see Seward (2005)), since they may otherwise also become quite large. Taking the compressed data files as input, other GNU R scripts actually plot the results data.
For plotting, custom output procedures have been created which not only generate multi-page, multi-column and multi-row two-dimensional graphs with multiple curves per graph, but also allow specifying parameters for creating title and legend texts as well as choosing among coloured, greyscale or black-and-white drawing. In particular, the plotting procedures also take care of computing and plotting the confidence intervals. The created plots are exported as PDF files.
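The text does not specify how the GNU R procedures compute confidence intervals. As a hedged illustration only, the following Python sketch computes one common choice: a 95% Student-t interval over the results of independent runs of one parameter combination. The function name is hypothetical. The t quantiles are hard-coded for up to 30 degrees of freedom, and beyond that the normal quantile serves as an approximation. The example input values are made up.

```python
import math
import statistics

# Two-sided 95% quantiles t(0.975, df) of Student's t distribution, df = 1..30.
T_975 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
         2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
         2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042]

def confidence_interval_95(samples):
    """Return (lower bound, mean, upper bound) over independent run results."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two runs are needed")
    mean = statistics.mean(samples)
    standard_error = statistics.stdev(samples) / math.sqrt(n)
    df = n - 1
    if df <= len(T_975):
        quantile = T_975[df - 1]
    else:
        # For many runs, the normal quantile (about 1.96) is a close approximation.
        quantile = statistics.NormalDist().inv_cdf(0.975)
    half_width = quantile * standard_error
    return mean - half_width, mean, mean + half_width

# Made-up utilization values from five runs of one parameter combination.
low, mean, high = confidence_interval_95([0.61, 0.64, 0.59, 0.63, 0.62])
```

A plotting procedure would then draw `mean` as the curve point and `low`/`high` as the interval bars.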

6.4 The Simulation Model
In this section, the RSerPool simulation model RSPSIM is presented.

6.4.1 The Network
As explained in subsection 6.3.3, the first step to define an OMNET++ simulation scenario is to set up a network. The network used for the RSPSIM simulation model is shown in figure 6.1; it consists of a Controller Module (which will be explained in subsubsection 6.4.2.1) and an interconnected array of LAN Modules. A LAN module is a Compound Module consisting of the sub-modules presented in figure 6.2: PUs, PEs and PRs are attached to a switch. The switch itself is connected to input and output gates of the LAN module. That is, a LAN can be connected to other modules; in particular, it can be connected to other LANs as shown in the scenario network in figure 6.1.
Before providing the actual description of the modules for the PR, PE and PU components as well as for the switch, it is first necessary to introduce some basic modules which are the building blocks.

Figure 6.2: The LAN Module

6.4.2 The Foundation Modules
The building blocks of the RSPSIM simulation components are the Controller Module, the Transport Node module and the Registrar Table module. These three modules are described in the following.

6.4.2.1 The Controller Module
The Controller Module is a Simple Module which is responsible for the following tasks:
• Collecting global statistics (like the global PE utilization),
• Resetting statistics data after an initial startup phase,
• Triggering the writing of statistics (scalars) to the output file after reaching a predefined simulation time and
• Finally stopping the simulation.
Every RSPSIM simulation scenario requires exactly one instance of the Controller module, i.e. it is a singleton object³.
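As a hedged illustration of these four duties, the following Python sketch models the Controller as a singleton. The class, method and scalar names are hypothetical and not taken from RSPSIM. The sketch discards statistics recorded during the startup phase and computes one global scalar at the end of the run. It assumes that global PE utilization means the accumulated PE busy time divided by the number of PEs times the length of the measured interval.

```python
class Controller:
    """Hypothetical sketch of the Controller Module's duties (not RSPSIM code)."""

    _instance = None

    def __new__(cls, *args, **kwargs):
        # Singleton: a second Controller in one scenario is a configuration error.
        if cls._instance is not None:
            raise RuntimeError("a scenario requires exactly one Controller")
        cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, pe_count):
        self.pe_count = pe_count
        self.busy_time = 0.0      # global PE busy time reported by all PEs
        self.reset_time = 0.0     # start of the measurement interval
        self.scalars = {}
        self.running = True

    def report_busy(self, duration):
        """Collect global statistics: called by PE models after serving a request."""
        self.busy_time += duration

    def end_of_startup_phase(self, now):
        """Discard everything recorded during the initial startup phase."""
        self.busy_time = 0.0
        self.reset_time = now

    def end_of_simulation(self, now):
        """Write the scalars and stop the simulation."""
        measured = now - self.reset_time
        self.scalars["GlobalPEUtilization"] = self.busy_time / (self.pe_count * measured)
        self.running = False

controller = Controller(pe_count=2)
controller.report_busy(8.0)            # still in the startup phase: discarded
controller.end_of_startup_phase(10.0)
controller.report_busy(60.0)
controller.report_busy(40.0)
controller.end_of_simulation(110.0)
print(controller.scalars)  # {'GlobalPEUtilization': 0.5}
```

Resetting after a startup phase keeps the transient behaviour of an initially empty system out of the steady-state statistics.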

6.4.2.2 The Transport Node Module
As explained in section 6.2, it is not the goal of the RSPSIM simulation model to build IP or SCTP networks. Therefore, realizing a fully-featured SCTP/IP stack including a routing protocol would be unnecessary and inefficient. Instead, the Transport Node Module realizes a simple Layer 4 transport of messages via a network of interconnected Transport Node modules, based on Dijkstra's shortest path algorithm (see Cormen et al. (1998)) provided by OMNET++. This module therefore becomes an integral part of all network device models.
A Transport Node consists of an array of Network Layer gates, a local Network Layer address (i.e. a unique ID for the Transport Node) and an array of Application Layer gates. Application Modules connected to Application Layer gates can bind their corresponding gate to a certain port number.
³ The singleton design pattern is used to restrict the instantiation of a class to one object.

Figure 6.3: The Registrar Module

Incoming messages are either passed to the corresponding Application Layer port (if the message's destination address is the local Transport Node's address) or forwarded out of the appropriate Network Layer gate according to the Dijkstra algorithm. For efficiency, shortest path computations are stored in a cache, so that the algorithm is only called if the destination is still unknown.
The switch of the LAN module presented in subsection 6.4.1 (see also figure 6.2) is simply an instance of the Transport Node module providing an appropriate number of input and output gates.
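OMNET++ provides its own topology and shortest path support, which is not reproduced here. Instead, the following stand-alone Python sketch illustrates the forwarding rule described above. The class name, method names and topology format are hypothetical. A message for the local address is delivered to the application side. Otherwise, a cached next hop is used, and Dijkstra's algorithm runs only on a cache miss; that single run fills the cache for all reachable destinations.

```python
import heapq

class TransportNode:
    """Hypothetical sketch of next-hop forwarding with a route cache."""

    def __init__(self, address, topology):
        self.address = address
        self.topology = topology        # {node: {neighbour: link cost}}
        self.next_hop = {}              # cache: destination -> neighbour
        self.dijkstra_runs = 0

    def _compute_next_hops(self):
        """Dijkstra from this node; records the first hop towards every node."""
        self.dijkstra_runs += 1
        distance = {self.address: 0}
        first_hops = {}
        visited = set()
        heap = [(0, self.address, None)]
        while heap:
            dist, node, hop = heapq.heappop(heap)
            if node in visited:
                continue                # stale heap entry
            visited.add(node)
            if hop is not None:
                first_hops[node] = hop
            for neighbour, cost in self.topology[node].items():
                candidate = dist + cost
                if neighbour not in visited and candidate < distance.get(neighbour, float("inf")):
                    distance[neighbour] = candidate
                    # Leaving the source, the first hop is the neighbour itself.
                    first = neighbour if hop is None else hop
                    heapq.heappush(heap, (candidate, neighbour, first))
        return first_hops

    def route(self, destination):
        """'local' for the own address, otherwise the neighbour to forward to."""
        if destination == self.address:
            return "local"              # hand over to the bound Application Layer port
        if destination not in self.next_hop:
            # Cache miss: run Dijkstra once and remember all results.
            self.next_hop.update(self._compute_next_hops())
        return self.next_hop[destination]   # KeyError if unreachable

topology = {"A": {"B": 1, "C": 5}, "B": {"A": 1, "C": 1}, "C": {"A": 5, "B": 1}}
node_a = TransportNode("A", topology)
print(node_a.route("C"))  # B (the path A-B-C costs 2, the direct link 5)
```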
It is important to note that, although implementing a full SCTP/IP stack is not a goal of the RSPSIM simulation model, it should be easily possible to replace the Transport Node module with an SCTP/IP module.

6.4.2.3 The Registrar Table Module
The third foundation module of the RSPSIM simulation model is the Registrar Table Module. It realizes the Registrar Table (see subsubsection 3.7.1.1) for PE and PU modules. It includes a gate to a Transport Node to receive PR announces as well as a gate to an ASAP PU/PE Process module (to be explained later). The PU or PE Process module can request a PR identity using a ServerHuntRequest message; a randomly selected PR is returned by a ServerHuntResponse message.
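The server hunt answer described above amounts to a uniform random choice among the known PRs. The following minimal Python sketch illustrates it; the class and method names are hypothetical and do not reproduce the RSPSIM module. PR identities learned from announces are kept in a set, and a server hunt request returns one of them at random, or nothing if no PR has been heard of yet.

```python
import random

class RegistrarTable:
    """Hypothetical sketch: collects PR identities from announces and answers
    a server hunt with a randomly selected PR."""

    def __init__(self, rng=None):
        self.registrars = set()
        self.rng = rng or random.Random()

    def handle_announce(self, pr_identity):
        self.registrars.add(pr_identity)

    def server_hunt_request(self):
        """Return a random PR identity, or None if no PR is known yet."""
        if not self.registrars:
            return None
        # Sorting gives a stable order, so a seeded generator is reproducible.
        return self.rng.choice(sorted(self.registrars))

table = RegistrarTable(random.Random(42))
for pr in ("PR1", "PR2", "PR3"):
    table.handle_announce(pr)
picks = [table.server_hunt_request() for _ in range(300)]
```

Passing a seeded `random.Random` instance keeps simulation runs repeatable, which matters when results of different runs are compared.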

6.4.3 The RSerPool Modules
Using the foundation modules described in subsection 6.4.2, it is now possible to define the actual RSerPool component modules for PR, PE and PU. These modules are introduced in the following. Since a detailed description of the RSerPool component interaction and the protocols has already been provided in chapter 3, these descriptions focus on the models' workflows.

6.4.3.1 The Registrar Module
An illustration of the compound Registrar Module is presented in figure 6.3. Clearly, it includes a Transport Node module (see subsubsection 6.4.2.2) for its network communication functionalities. Connected to the Transport Node module is the Registrar Process Module, which provides the actual PR functionalities and therefore has to be introduced in more detail. Since the management of a handlespace is obviously the main task of a PR, the Registrar Process Module requires a handlespace management implementation. As already explained in section 6.2, the RSPSIM simulation model reuses the implementation introduced in chapter 4.

Figure 6.4: The Registrar Process Finite State Machine
The actual PR functionalities of the Registrar Process module are realized as an FSM, which is illustrated in figure 6.4. In order to enhance the clarity of the presentation, the following notation is used for this and the following FSMs: stable states are shown in angular boxes, while transient states use rounded ones. A Registrar Process goes into the STARTUP SERVICE state upon expiry of its Startup Timer. This state initializes the Registrar module's Transport Node sub-module to receive ASAP and ENRP traffic. Upon startup, the PR has to find a mentor PR and download its peer list and the handlespace (see subsubsection 3.10.2.4 and subsection 3.10.3 for details). Finally, the PR goes into the RUN SERVICE state, i.e. the PR goes into normal operation. Upon expiry of the Shutdown Timer, the PR deactivates the reception of ASAP and ENRP traffic and waits for a restart in the state WAIT FOR RESTART. A newly expiring Startup Timer repeats the whole procedure.
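The startup/shutdown cycle just described can be sketched as a small state machine in Python. This is a simplified illustration, not the RSPSIM implementation. The state `WAIT_FOR_STARTUP` (before the first Startup Timer expiry) is a hypothetical addition. The mentor hunt and the download of the peer list and handlespace are collapsed into a placeholder method, although in the model they involve further message exchanges.

```python
from enum import Enum, auto

class PRState(Enum):
    WAIT_FOR_STARTUP = auto()    # hypothetical: before the first Startup Timer expiry
    STARTUP_SERVICE = auto()
    RUN_SERVICE = auto()
    WAIT_FOR_RESTART = auto()

class RegistrarProcess:
    """Hypothetical sketch of the Registrar Process FSM's main transitions."""

    def __init__(self):
        self.state = PRState.WAIT_FOR_STARTUP
        self.accepting_traffic = False   # reception of ASAP and ENRP traffic
        self.trace = []

    def _enter(self, state):
        self.state = state
        self.trace.append(state.name)

    def startup_timer_expired(self):
        if self.state not in (PRState.WAIT_FOR_STARTUP, PRState.WAIT_FOR_RESTART):
            return                          # the timer is ignored in other states
        self._enter(PRState.STARTUP_SERVICE)
        self.accepting_traffic = True       # initialize the Transport Node sub-module
        self.synchronize_with_mentor()
        self._enter(PRState.RUN_SERVICE)    # normal operation

    def synchronize_with_mentor(self):
        # Placeholder: mentor PR hunt and download of peer list and handlespace,
        # which in the real model involve further ENRP message exchanges.
        pass

    def shutdown_timer_expired(self):
        if self.state is PRState.RUN_SERVICE:
            self.accepting_traffic = False
            self._enter(PRState.WAIT_FOR_RESTART)

pr = RegistrarProcess()
pr.startup_timer_expired()
pr.shutdown_timer_expired()
pr.startup_timer_expired()   # a newly expiring Startup Timer repeats the procedure
```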

6.4.3.2 The Pool Element Module
The building modules of the Pool Element Module are presented in figure 6.5. Clearly, a PE requires network access. Therefore, a Transport Node module (see subsubsection 6.4.2.2) is the basis of the PE model. Furthermore, each PE requires a Registrar Table to maintain its list of PRs. The Registrar Table module provides its service to the Pool Element ASAP Module, which offers the actual PE-side ASAP functionalities to the Application Server Process Module. This Application Server Process module performs the actual service of the PE, like the compute service being presented later in section 8.3.

Figure 6.5: The Pool Element Module
Figure 6.6: The Pool Element ASAP Finite State Machine

Since the main ASAP functionality is located in the Pool Element ASAP Process module, it is necessary to show this module in more detail. Therefore, figure 6.6 presents an overview of its FSM. Upon startup, the Pool Element ASAP Process goes into the WAIT FOR APPL state and waits for commands from the Application Server Process module. The Application Server Process module can initiate the registration of the PE using a RegisterPoolElement message. The first step of the registration process is to tell the Transport Node module to allocate a new port (to be explained below) for the ASAP communication in the BIND state. After that, the registration at a PR is processed; this possibly requires a server hunt procedure by asking the Registrar Table module for a PR identity (using a ServerHuntRequest message). After successful registration (i.e. reception of an ASAP Registration Response from the PR), the Pool Element ASAP Process confirms the registration to the Application Server Process using a RegisterPoolElementAck message and goes into the REGISTERED state.
The registration procedure is repeated every re-registration interval (controlled by the Reregistration Timer). Upon reception of a DeregisterPoolElement message from the Application Server Process module, a deregistration is performed. This deregistration process possibly includes choosing a new PR by a server hunt procedure. After successful deregistration (i.e. reception of an ASAP Deregistration Response from the PR), the Pool Element ASAP Process confirms the deregistration to the Application Server Process using a DeregisterPoolElementAck message, and the ASAP communication port is released by the Transport Node module (UNBIND state).
An Application Server Process module can update its policy information using a PolicyUpdate message. This event may also happen while waiting for a registration response from the PR (state WAIT FOR REG RESP) or during a server hunt in order to register (REG WAIT SRVHU RESP). Therefore, it is necessary in these cases to reschedule the re-registration to take place immediately.
It is furthermore possible for an Application Server Process to fail at any time by sending a ResetPoolElement message. In this case, no deregistration is processed; instead, the Pool Element ASAP Process confirms the reset using a ResetPoolElementAck message and goes immediately into the UNBIND state. That is, the ASAP endpoint of the PE becomes unreachable, and the PR-H and PUs have to detect the unavailability of the PE by their own mechanisms. Since a restart of the PE means binding the Pool Element ASAP Process, as well as the Application Server Process, to new ports, it is ensured that reincarnations of the PE can be distinguished. This scheme models the behaviour of the RSPLIB prototype implementation, where the ports for the ASAP instance and the application service are randomly chosen. If a PE is restarted, it will almost certainly become reachable under different port numbers.
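The following Python sketch is a hedged, simplified illustration of the PE-side behaviour described above. All class and method names, the starting port 1000 and the policy labels are hypothetical. It shows three behaviours:

- a new port is bound for every registration after a (re)start;
- a policy update arriving while a registration is pending triggers an immediate re-registration;
- a reset releases the port without any deregistration.

The message exchanges with the PR are reduced to method calls. The sketch also assumes that the process returns to WAIT FOR APPL after UNBIND.

```python
import itertools

class TransportNodeStub:
    """Stand-in for the Transport Node module: allocates fresh port numbers."""

    def __init__(self, first_port=1000):          # hypothetical port range
        self._next_port = itertools.count(first_port)
        self.bound = set()

    def bind(self):
        port = next(self._next_port)
        self.bound.add(port)
        return port

    def unbind(self, port):
        self.bound.discard(port)

class PoolElementASAPProcess:
    """Hypothetical sketch of the PE-side ASAP states described in the text."""

    def __init__(self, transport):
        self.transport = transport
        self.state = "WAIT FOR APPL"
        self.port = None
        self.policy = None
        self.reregister_immediately = False
        self.registrations_sent = 0

    def register_pool_element(self, policy):
        self.policy = policy
        self.port = self.transport.bind()         # BIND: new port per (re)start
        self.registrations_sent += 1              # ASAP Registration to a PR
        self.state = "WAIT FOR REG RESP"

    def registration_response(self):
        self.state = "REGISTERED"                 # RegisterPoolElementAck to the application
        if self.reregister_immediately:
            # The registration carried the old policy: re-register right away.
            self.reregister_immediately = False
            self.registrations_sent += 1
            self.state = "WAIT FOR REG RESP"

    def policy_update(self, policy):
        self.policy = policy
        if self.state in ("WAIT FOR REG RESP", "REG WAIT SRVHU RESP"):
            self.reregister_immediately = True

    def deregistration_response(self):
        self._unbind()                            # DeregisterPoolElementAck, then UNBIND

    def reset_pool_element(self):
        self._unbind()                            # no deregistration: the PE just vanishes

    def _unbind(self):
        self.transport.unbind(self.port)
        self.port = None
        self.state = "WAIT FOR APPL"              # assumption: wait for new commands

transport = TransportNodeStub()
pe = PoolElementASAPProcess(transport)
pe.register_pool_element("RoundRobin")
pe.policy_update("LeastUsed")        # arrives while waiting for the response
pe.registration_response()           # triggers the immediate re-registration
pe.registration_response()
first_port = pe.port
pe.reset_pool_element()              # failure: port released, no deregistration
pe.register_pool_element("LeastUsed")
```

After the restart, the PE is reachable under a different port (1001 instead of 1000), so that its new incarnation can be told apart from the old one.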

6.4.3.3 The Pool User Module
The last RSerPool module is the Pool User Module, which provides the PU model. Figure 6.7 shows its building modules. As for the Pool Element Module, it consists of a Transport Node module (see subsubsection 6.4.2.2) for its network communication and a Registrar Table module (see subsubsection 6.4.2.3) for maintaining its list of PRs. The actual application of the PU (e.g. the compute service being presented later in section 8.3) is performed by the Application Client Process Module. All PU-side ASAP functionalities are provided by the Pool User ASAP Module.
Figure 6.8 presents the FSM of the Pool User ASAP module. Upon creation, the Pool User ASAP Process immediately goes into the state WAIT FOR APPL, which waits for requests from the Application Client Process module. Commands for purging the PU-side cache (CachePurge message) and reporting an unreachable PE (EndpointUnreachable message) can be handled immediately. A request for a handle resolution (ServerSelection message) is processed by the PU-side cache in the SEL PE FROM CACHE state. If the cache is empty, querying a PR is required. If currently no PR is available, a server hunt procedure (i.e. requesting a PR from the Registrar Table module using a ServerHuntRequest message) is necessary first. Finally, another attempt is made to select a PE in the SEL PE state.
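The order in which the PU resolves a pool handle (cache first, then a PR, with a server hunt if no PR is known) can be sketched in Python as follows. The sketch is illustrative and deliberately simplified, and all names are hypothetical. Two simplifications apply: a uniform random choice stands in for the pool policy, and an unreachable PE is only removed from the local cache, without reporting it to a PR. The server hunt and the PR handle resolution are passed in as callables.

```python
import random

class PoolUserASAPProcess:
    """Hypothetical sketch of the PU-side handle resolution order."""

    def __init__(self, server_hunt, handle_resolution, rng=None):
        self.cache = {}                             # pool handle -> list of PEs
        self.server_hunt = server_hunt              # () -> PR identity or None
        self.handle_resolution = handle_resolution  # (PR, handle) -> list of PEs
        self.current_pr = None
        self.rng = rng or random.Random()

    def cache_purge(self):
        self.cache.clear()

    def endpoint_unreachable(self, handle, pe):
        entries = self.cache.get(handle, [])
        if pe in entries:
            entries.remove(pe)                      # reporting to a PR omitted

    def server_selection(self, handle):
        entries = self.cache.get(handle)            # SEL PE FROM CACHE
        if not entries:
            if self.current_pr is None:
                self.current_pr = self.server_hunt()   # ServerHuntRequest
                if self.current_pr is None:
                    return None                     # no PR available yet
            entries = list(self.handle_resolution(self.current_pr, handle))
            self.cache[handle] = entries
        # SEL PE: a uniform random choice stands in for the pool policy.
        return self.rng.choice(entries) if entries else None

queries = []

def resolve(pr, handle):
    """Stand-in for an ASAP handle resolution at a PR (hypothetical)."""
    queries.append((pr, handle))
    return ["PE1", "PE2"]

pu = PoolUserASAPProcess(lambda: "PR1", resolve, random.Random(7))
first = pu.server_selection("CalcAppPool")    # cache empty: queries PR1
second = pu.server_selection("CalcAppPool")   # answered from the cache
```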

Figure 6.7: The Pool User Module
Figure 6.8: The Pool User ASAP State Machine


6.5 The Simulation Model Validation
To validate the correctness of the simulation model's implementation code, the debugging software VALGRIND⁴, which has already been used to debug the handlespace management implementation as described in section 4.5 and the RSPLIB prototype implementation as described in section 5.8, has again been used intensively. That is, some simulation runs have been performed with the simulation model executed by VALGRIND, verifying that no errors are detected.
In order to validate the functional correctness of the models, simple simulations of specific functionalities (like registering a PE in a certain scenario setup) have been performed. Then, the output of the simulation model has been checked manually for correctness. For these tests, the GUI-based Tkenv environment of the OMNET++ system has been very handy: it allows stopping the simulation at arbitrary points to check the state of the modules and the message flow. Afterwards, the simulation can be resumed until the next interesting point for validation.

6.6 Summary
In this chapter, the RSerPool simulation model RSPSIM has been described. The motivation to design and create this model has been the demand for performance evaluations, which are, due to a wide parameter range, difficult to realize in a lab setup using the RSPLIB prototype. The first part of this chapter has described the requirements for the simulation model and provided a survey of possible simulation systems and post-processing packages. After that, the choice of the packages actually used for the RSPSIM simulation model, OMNET++ and GNU R, has been motivated. Then, a short introduction to OMNET++ has been given before the actual description of the simulation model itself. In particular, the actual application model for the PE and PU modules is replaceable; the application model used for the performance simulations will be described in section 8.4. Finally, a short overview of the model validation has been given.

⁴ See Seward and Nethercote (2005), Valgrind Developers (2005).


Chapter 7

Handlespace Management Performance

When managing the handlespace using the approach presented in chapter 4, the implementation effort is mainly reduced to the storage of sorted sets. However, it is not obvious which data structure is most suitable to manage such sets. Therefore, the goal of this chapter is first to measure and evaluate the handlespace management performance using different set implementations. After that, the scalability of the handlespace management to high numbers of PEs and pools is evaluated.

7.1 Introduction

Handlespace management is a crucial part of every RSerPool system: it is not only an integral duty of every PR (see subsection 3.7.1), it is also required for maintaining the PU-side cache (see subsection 3.7.3). Optimizing the performance of its implementation therefore becomes advantageous for many components of an RSerPool system.

7.2 The Performance Metric

To evaluate the performance of the handlespace management implementation described in chapter 4, it is first necessary to define a performance metric. The chosen metric is simply the time a realistic CPU (to be discussed below) requires to perform the basic operations on a handlespace of a certain size, referring to the number of pools and PEs. These basic operations are defined as follows:

Registration/Deregistration Operations: Registration of a PE (due to explicit registration by a PR-H or learned via ENRP, see subsubsection 3.9.2.1) and deregistration of a PE (due to explicit deregistration, but also due to an expired Lifetime or a Keep-Alive timeout, see subsubsection 3.9.2.3).

Re-Registration Operations: Update of a PE's registration, in particular also of its policy information (see subsubsection 3.7.1.2).

Handle Resolution Operations: Handle resolution for PUs (see subsubsection 3.7.1.4).

Timer Operations: The handling of the Keep-Alive timer (PR-H, see subsubsection 3.7.1.3) and the Lifetime Expiry timer (non-PR-H, see subsubsection 3.9.2.1).


Synchronization Operations: The handlespace synchronization procedure (i.e. a step-wise traversal as described in subsection 3.10.5).
Every RSerPool scenario requires at least two PRs to make it operational and to provide a certain level of redundancy. Usually, every network of a certain size also contains routers; therefore, it seems to be realistic that routers also host PR processes, avoiding the costs of installing and maintaining dedicated devices. The routers which are available today already provide many more services than just IP routing: from DHCP (Dynamic Host Configuration Protocol) over DNS and HTTP proxies to VoIP gateways. Therefore, it is quite realistic that router vendors like Cisco will also offer a PR service in future devices.
Routers strongly rely on hardware support to cope with performance-critical tasks like the routing itself and checksum calculations. Furthermore, it is also possible to have TCP provided in hardware by a so-called TCP Offload Engine (TOE). For details on TOEs and performance comparisons, see Mogul (2003), Rangarajan et al. (2002) and Tomonori and Masanori (2003). As for TCP, it is therefore also realistic to assume that SCTP Offload Engines (SOE) will become widespread.
Since the intention of this work is to evaluate RSerPool and the performance of its handlespace management, the influence of the message transport is completely neglected and the focus is laid on evaluating the throughput of the handlespace operations.

7.3 The Measurement Setup

In order to perform a handlespace performance evaluation, it has been necessary to choose a realistic setup. Since the intention is to possibly run PR processes on routers, the selected system should provide a CPU power that is realistic for a router. That is, instead of using the newest and most powerful desktop PC, it has been decided to use an AMD Athlon-based PC containing a 1.3 GHz CPU, a system which was state of the art about five years ago (in 2001). It is assumed that the computation power of such a CPU is realistic for the upcoming access router generation.
The handlespace management approach explained in chapter 4 mainly reduces the effort of maintaining a handlespace to the efficient handling of sorted sets. Therefore, the most performance-critical part is the efficiency of the set storage algorithms (see also subsection 4.4.7). In order to evaluate the handlespace management approach, the following data structures and algorithms for the maintenance of sorted sets have been realized:
Linear List: A linear list is the most obvious structure to store a set. For the implementation, a doubly-linked ring list is used, i.e. it supports the traversal in forward and backward directions. Furthermore, it allows for the insertion of a new node before or after a known node, and for the removal of a known node, in O(1) time.
Binary Tree: The implementation of the binary tree uses iterative implementations of the insertion, lookup and removal functions. That is, no recursion (which would obviously be less efficient) is used here. The general implementation of the three operations is similar to the functions provided in LIBDICT (see Mela (2005)), with some important modifications for improving the handlespace management performance:

To improve the speed of finding the next or previous node according to the sorting order, all nodes are also back-linked. That is, a node does not only link to its child nodes but also back to its parent node.

Figure 7.1: Finding the Successor of a Node in a Binary Tree

In particular, the back-linking simplifies the lookup of successor nodes in the tree. When looking for the successor of a node n, the following two cases can occur:
1. Node n possesses a right subtree. In this case, the node looked for is the leftmost node of this right subtree. An example is provided on the left-hand side of figure 7.1.
2. Node n has no right subtree (e.g. it has no child nodes at all). In this case, the chain of ancestor nodes must be traversed, using the back-linked references, until a node is reached which is the left child of its parent; this parent is the successor. The right-hand side of figure 7.1 provides an example for this case.
In order to efficiently handle the lookup of nodes based on the Weight Sum construct defined in subsubsection 4.4.2.1, it is possible to specify a positive integer weight constant $w_i$ for each node $i$. Each node also includes a weight sum $W_i$, which is the sum of its own weight and the weight sums of its child nodes. Therefore, the root node's weight sum $W_{\text{root}} = \sum_i w_i$ is equal to the sum of all nodes' weights, as required by equation 4.2. Now, for any number $r \in [1, \ldots, W_{\text{root}}] \subset \mathbb{N}$, a uniquely identified node of the tree can be reached in a maximum number of steps equal to the depth of the node. An example is provided in figure 7.2, where $r = 19$ chooses a node out of the tree with $W_{\text{root}} = 25$. Note that the realized weight sum maintenance requires back-linking as described above (as is the case for the implementation with non-recursive insertion and removal functions).

Treap: The treap implementation is basically equal to the binary tree, except for ensuring the treap constraint, as defined in subsection 4.4.7, upon insertion and removal by the rotation of nodes.
Red-Black Tree: As for the treap implementation, the red-black tree realization is also basically equal to the binary tree. Again, the only exception is the enforcement of the red-black tree constraints, as defined in subsection 4.4.7, by applying rotations upon insertion and removal of nodes.
Finally, to evaluate whether the idea of leaf-linking the tree's nodes (see subsection 4.4.7) provides a performance improvement, appropriately modified versions of the binary tree, treap and red-black tree implementations have been provided. To link the nodes, the implementation of the doubly-linked ring list has been used. That is, the tree structure becomes an index into the list; inserting before or after a known node into the doubly-linked ring list, or removing a known node from it, is possible in O(1) time.
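The back-linked successor lookup described above for the binary tree (the two cases illustrated in figure 7.1) can be sketched as follows. This is a minimal, unbalanced sketch assuming integer keys; the names Node, insert, find, successor and in_order are hypothetical and not taken from the original implementation, and no treap or red-black rotations are shown.

```python
from typing import List, Optional


class Node:
    """Binary tree node with a back-link to its parent (hypothetical sketch)."""

    __slots__ = ("key", "left", "right", "parent")

    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional[Node] = None
        self.right: Optional[Node] = None
        self.parent: Optional[Node] = None  # back-link to the parent node


def insert(root: Optional[Node], key: int) -> Node:
    """Iterative (non-recursive) insertion; returns the root of the tree."""
    node = Node(key)
    if root is None:
        return node
    current = root
    while True:
        if key < current.key:
            if current.left is None:
                current.left = node
                break
            current = current.left
        else:
            if current.right is None:
                current.right = node
                break
            current = current.right
    node.parent = current  # set the back-link
    return root


def find(root: Optional[Node], key: int) -> Optional[Node]:
    """Iterative lookup of the node holding the given key."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root


def successor(n: Node) -> Optional[Node]:
    """Next node in sorting order, using the back-links instead of a stack."""
    if n.right is not None:
        # Case 1: the leftmost node of the right subtree.
        s = n.right
        while s.left is not None:
            s = s.left
        return s
    # Case 2: climb up until arriving from a left child; that parent is the
    # successor. Leaving the root from its right side means there is none.
    child, parent = n, n.parent
    while parent is not None and parent.right is child:
        child, parent = parent, parent.parent
    return parent


def in_order(root: Optional[Node]) -> List[int]:
    """Full traversal by repeated successor() calls."""
    if root is None:
        return []
    first = root
    while first.left is not None:
        first = first.left
    keys: List[int] = []
    n: Optional[Node] = first
    while n is not None:
        keys.append(n.key)
        n = successor(n)
    return keys
```

Inserting the keys 50, 30, 70, 20, 40, 60, 80 and 35 and calling in_order() yields them in ascending order. Since case 2 only follows parent links, a complete traversal passes every edge at most twice (once down, once up), so iterating over all n nodes costs O(n) in total.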

Figure 7.2: Using Selection by Weight Sum in a Binary Tree
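The selection by weight sum illustrated in figure 7.2 can be sketched as below. This is a hedged sketch: for brevity, the tree is built balanced from a list that is already sorted, whereas the actual implementation maintains the weight sums during insertion and removal; the names WNode, build, select and weighted_random_select are hypothetical. At each node, r is compared against the weight sum of the left subtree plus the node's own weight, so the loop performs one step per tree level.

```python
import random
from typing import Optional, Sequence, Tuple


class WNode:
    """Node carrying its weight w_i and the weight sum W_i of its subtree."""

    __slots__ = ("key", "weight", "wsum", "left", "right")

    def __init__(self, key: str, weight: int) -> None:
        if weight <= 0:
            raise ValueError("weights must be positive integers")
        self.key = key
        self.weight = weight  # w_i
        self.wsum = weight    # W_i = w_i + W_left + W_right (set by build)
        self.left: Optional[WNode] = None
        self.right: Optional[WNode] = None


def build(items: Sequence[Tuple[str, int]]) -> Optional[WNode]:
    """Build a balanced tree from (key, weight) pairs already sorted by key."""
    if not items:
        return None
    mid = len(items) // 2
    node = WNode(*items[mid])
    node.left = build(items[:mid])
    node.right = build(items[mid + 1:])
    node.wsum += (node.left.wsum if node.left else 0) + (
        node.right.wsum if node.right else 0)
    return node


def select(root: WNode, r: int) -> WNode:
    """Return the unique node identified by r in [1, W_root]."""
    if not 1 <= r <= root.wsum:
        raise ValueError("r must lie in [1, W_root]")
    node = root
    while True:
        left_sum = node.left.wsum if node.left else 0
        if r <= left_sum:
            node = node.left            # target lies in the left subtree
        elif r <= left_sum + node.weight:
            return node                 # target is this node
        else:
            r -= left_sum + node.weight
            node = node.right           # target lies in the right subtree


def weighted_random_select(root: WNode, rng: random.Random) -> WNode:
    """Weighted Random: draw r uniformly from [1, W_root] (both inclusive)."""
    return select(root, rng.randint(1, root.wsum))
```

For the weights A=1, B=3, C=2 and D=4, the values r = 1 to 10 map to A, B, B, B, C, C, D, D, D, D; drawing r uniformly therefore chooses node i with probability $w_i / W_{\text{root}}$.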

The measurement program developed for the analysis of the handlespace management performance first creates a handlespace consisting of the given number of pools and pool elements. After that, the actual operations as described in section 7.2 are processed and their throughput is measured.
All measurements have been performed using the already mentioned 1.3 GHz Athlon system, running Mandriva Linux 10.2 in runlevel 3 (i.e. without graphical user interface and any other services, in order to perform the measurements without disturbance by other processes). The sources have been compiled with GCC version 4.0.1, with all possible optimizations turned on (option -O3). The actual throughput measurement runtime for each parameter setting has been 5 s. To ensure the necessary statistical accuracy, all measurements have been repeated 18 times. The plots show the average values of these measurement runs, together with their 95% confidence intervals. The legend in the plots provides the variable settings for each curve of the plot. Variable bindings can be found right-justified above the plots. Different settings of the first variable result in different colours/shades of the corresponding curves; different settings of a second variable are represented by different line styles (e.g. solid and dotted). The axis colour is unique for each output unit, in order to differentiate between comparable and non-comparable plots. More details on the post-processing of the obtained results are presented in subsection 6.3.2.

7.4 The Operations Throughput of Different Storage Algorithms

In the first set of performance measurements, the behaviour of the handlespace management under different set storage implementations is evaluated for small handlespace sizes. In particular, inappropriate algorithms are also presented for comparison. The scalability of the handlespace management to large scenarios, for a choice of applicable set storage algorithms, is shown later in section 7.5.

Figure 7.3: The Complete Handlespace Structure

7.4.1 Registration and Deregistration Operations
The first operation evaluated is the combination of registrations and deregistrations as explained in section 7.3. Since it is not possible to run only registrations or only deregistrations (the handlespace would grow or shrink), the performance of both operations together is observed. Each time a registration/deregistration operation is invoked, the size of the pool is checked:
• If the number of PEs is equal to the configured pool size, a randomly selected PE will be deregistered. Otherwise, a new PE will be registered.
That is, the handlespace size remains steady. This behaviour is also realistic, since there is no deregistration without registration and every registered PE will (sooner or later) also be deregistered.
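The steady-state registration/deregistration operation defined above can be sketched as a small measurement driver. This Python sketch only mirrors the control flow: the pool is a plain dictionary instead of the handlespace structure of figure 7.3, the 32-bit random PE IDs are an assumption, and the names Pool, reg_dereg_operation and throughput_per_pe are hypothetical. As in the plots, the throughput is reported in operations per PE and second.

```python
import random
import time
from typing import Dict


class Pool:
    """Hypothetical stand-in for a pool: maps PE ID -> policy information."""

    def __init__(self) -> None:
        self.pes: Dict[int, str] = {}

    def register(self, pe_id: int, policy: str) -> None:
        self.pes[pe_id] = policy

    def deregister(self, pe_id: int) -> None:
        del self.pes[pe_id]


def reg_dereg_operation(pool: Pool, pool_size: int, rng: random.Random) -> str:
    """One combined operation: deregister a random PE if the pool is full,
    otherwise register a new PE. The pool size therefore stays steady."""
    if len(pool.pes) >= pool_size:
        # list() costs O(n); acceptable for a sketch, not for real timing
        pool.deregister(rng.choice(list(pool.pes)))
        return "deregistration"
    pe_id = rng.getrandbits(32)
    while pe_id in pool.pes:  # PE IDs must be unique within the pool
        pe_id = rng.getrandbits(32)
    pool.register(pe_id, "RoundRobin")
    return "registration"


def throughput_per_pe(pool_size: int, duration_s: float = 0.1,
                      seed: int = 1) -> float:
    """Run operations for duration_s seconds; return ops per PE and second."""
    rng = random.Random(seed)
    pool = Pool()
    while len(pool.pes) < pool_size:  # create the initial handlespace
        reg_dereg_operation(pool, pool_size, rng)
    operations = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        reg_dereg_operation(pool, pool_size, rng)
        operations += 1
    elapsed = time.perf_counter() - start
    return operations / elapsed / pool_size
```

Once the pool is full, operations alternate between deregistration and registration. In the setup described in section 7.3, each run lasted 5 s and was repeated 18 times; a sketch like this would need the same repetitions to obtain confidence intervals.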
Upon a registration, the handlespace management has to perform the following tasks on the handlespace structure (illustrated in figure 7.3):
• Creation of the PE structure,
• Finding the pool given by the PH (a pool will be created if it does not already exist),
• Checking whether the new PE is compatible to the pool (referring to its policy information and transport parameters, see subsubsection 3.7.1.2),
• Linking the PE structure into the pool's Index Set (sorted by PE ID),
• Linking the PE structure into the pool's Selection Set (sorted by the sorting order of the pool's policy) and
• Linking the PE structure into the Ownership Set (sorted by Home-PR ID/PE ID/PH, see subsection 4.4.5).
A registration operation is equivalent for both, registration via an ASAP Registration or an ENRP Handle Update message (see also subsubsection 3.7.1.2). The necessary timer handling (Keep-Alive or Lifetime Expiry) is evaluated separately in subsection 7.4.3.
Figure 7.4: The Throughput of the Registration/Deregistration Operations
The deregistration operation requires the following tasks:
• Finding the pool given by the PH,
• Finding the PE given by the PE ID within the pool,
• Unlinking the PE structure from the Index Set, Selection Set and Ownership Set,
• Disposing the PE structure and
• Removing the pool if it has become empty.
As for the registration operation, a deregistration operation is equivalent for both, deregistration using an ASAP Deregistration or an ENRP Handle Update message (see also subsubsection 3.7.1.2). Note again that the timer handling is evaluated separately in subsection 7.4.3.
Figure 7.4 presents the throughput of the combined registration/deregistration operations (as explained above) per PE and second for deterministic policies (left-hand side; here: Round Robin) and randomized policies (right-hand side; here: Weighted Random) in a handlespace consisting of a single pool and the given number of PEs.
The obvious assumption on the performance of the registration/deregistration operation would be that there is no significant difference for distinct policies. While this can be verified for the linear list and the balanced trees (red-black and treap), there is a significant gap between the performance of the binary tree for a deterministic policy and a randomized policy: for example, at 500 PEs, the randomized policy side achieves about 75 operations per PE and second, while less than 50 operations can be handled on the deterministic policy side. The reason for this behaviour is the Selection Set: for the randomized Weighted Random policy, the order of this set is irrelevant. Therefore, the set is sorted by the PE ID to ensure uniqueness (see subsubsection 4.4.2.3). Since the PE ID is a random number, the nodes are inserted into the binary tree unsystematically. In this case, the depth of the binary tree can be assumed to be within O(log n). On the other hand, in the case of the deterministic Round Robin policy, the PEs are inserted systematically: new nodes go to the end of the set, causing the binary tree to degenerate to a linear list. Clearly, the effort to manage a tree is slightly higher than for a linear list, which results in the lower performance of the degenerated tree.
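The degeneration effect can be reproduced with a few lines of code. The sketch below uses a plain unbalanced binary search tree with a hypothetical function name; it inserts distinct keys either in ascending order (like new Round Robin entries, which always go to the end of the Selection Set) or in random order (like random PE IDs) and reports the resulting depth.

```python
import random
from typing import Dict, List, Optional


def bst_depth(keys: List[int]) -> int:
    """Insert distinct keys into an unbalanced binary search tree and
    return its depth (number of nodes on the longest root-to-leaf path)."""
    # children[k] = [left child key, right child key]
    children: Dict[int, List[Optional[int]]] = {}
    root: Optional[int] = None
    depth = 0
    for k in keys:
        if root is None:
            root, depth = k, 1
            children[k] = [None, None]
            continue
        current, level = root, 1
        while True:
            side = 0 if k < current else 1
            level += 1
            nxt = children[current][side]
            if nxt is None:
                children[current][side] = k
                children[k] = [None, None]
                break
            current = nxt
        depth = max(depth, level)
    return depth
```

With 500 keys in ascending order, the depth equals 500 (the tree is a linear list), while a random insertion order keeps the depth within a small multiple of log2(500), roughly 9.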
As expected, the linear list and the balanced trees show no significant performance differences for varying policies, and the performance of the balanced trees is significantly better than the operations throughput of the linear list. Furthermore, the performance of the balanced red-black tree is quite constantly about 25 operations per PE and second better than the performance of the treap. That is, instead of only trying to keep the probability of being unbalanced as low as possible (as realized by the treap using randomization), it is useful to add some more complexity to the storage operations in order to guarantee a balanced tree structure.

7.4.2 Re-Registration Operations
The next operation to be evaluated is the re-registration. A re-registration operation is equivalent for both, re-registration using an ASAP Registration or an ENRP Handle Update message. The necessary timer handling (Keep-Alive or Lifetime Expiry) is evaluated separately in subsection 7.4.3. For a re-registration, the handlespace management has to perform the following tasks on the handlespace structure (see figure 7.3 for an illustration):
• Finding the pool given by the PH,
• Finding the PE structure given by the PE ID,
• Checking whether the updated information is still compatible to the pool (referring to its policy information and transport parameters, see subsubsection 3.7.1.2) and
• If the update changes the policy information:
– Unlinking the PE structure from the Selection Set,
– Updating the PE information (including policy information and transport addresses) and
– Re-linking the PE structure into the Selection Set again.
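For an adaptive policy such as Least Used, the last three steps above form an unlink-update-re-link cycle on the Selection Set. The following hedged sketch models the Selection Set as a sorted list of (load, PE ID) tuples maintained with the standard bisect module; a real implementation would use one of the trees from section 7.3, and breaking load ties by PE ID is an assumption made here. SelectionSet and its methods are hypothetical names.

```python
import bisect
from typing import Dict, List, Tuple


class SelectionSet:
    """Least Used Selection Set: entries sorted by (load, PE ID)."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, int]] = []
        self.load_of: Dict[int, int] = {}  # PE ID -> current load value

    def link(self, pe_id: int, load: int) -> None:
        bisect.insort(self.entries, (load, pe_id))
        self.load_of[pe_id] = load

    def unlink(self, pe_id: int) -> None:
        key = (self.load_of.pop(pe_id), pe_id)
        i = bisect.bisect_left(self.entries, key)
        if i == len(self.entries) or self.entries[i] != key:
            raise KeyError(pe_id)
        del self.entries[i]

    def reregister(self, pe_id: int, load: int) -> bool:
        """Apply a re-registration; return True if the entry was re-linked."""
        if self.load_of[pe_id] == load:
            return False           # policy information unchanged: lookup only
        self.unlink(pe_id)         # unlink from the Selection Set
        self.link(pe_id, load)     # update the load value and re-link
        return True

    def least_used(self) -> int:
        """Take from the top: the PE with the lowest load."""
        return self.entries[0][1]
```

A non-adaptive re-registration only performs the lookup, which is why the three tree variants show no difference in that case, while an adaptive one always pays for a removal and a re-insertion.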
In the case of a changed ownership (i.e. the PE has changed its PR-H), it is furthermore necessary to update the PE's location in the Ownership Set (i.e. by unlinking, updating and re-linking). Since the ownership change occurs quite rarely (only in case of a takeover) and the corresponding results do not provide any new insights compared to the adaptive policy case, they are omitted in the following analysis. Since an updated policy information implies a removal from and re-insertion into the Selection Set, it is necessary for the performance evaluation to distinguish between adaptive policies (i.e. the policy information changes) and non-adaptive policies. Figure 7.5 presents the throughput of re-registration operations per PE and second for non-adaptive policies (left-hand side; here: Round Robin) and adaptive policies (right-hand side; here: Least Used). For the adaptive Least Used policy, each re-registration updates the policy information with a randomly chosen load value.
As expected, the linear list provides the worst performance for both classes of policies. Since the non-adaptive policy case only means a PE structure lookup, there is no performance difference for the three tree-based variants. Furthermore, since the load value for the adaptive Least Used policy is chosen randomly, this results in an unsystematic insertion into the Selection Set. Therefore, no degeneration of the binary tree as described in subsection 7.4.1 can be observed. For the adaptive policy case, the binary and red-black trees provide almost the same and best performance. The somewhat lower operations throughput for the treap is a result of trying to optimize the structure of the tree where it is not necessary (otherwise, the performance of the binary tree would have been significantly worse). This results in time consumed for actually unnecessary node rotations.
Figure 7.5: The Throughput of the Re-Registration Operation
In summary, the best performance is provided by the red-black tree, as has already been observed for the registration/deregistration operation in subsection 7.4.1.

7.4.3 Timer Handling
As described in subsection 4.4.3, all timers are stored in the same Timer Schedule. While a non-PR-H only has to schedule a PE entry's expiration (Lifetime Expiry Timer), a PR-H has to schedule a Keep-Alive Transmission Timer. Upon its expiration, an ASAP Endpoint Keep-Alive message is sent to the PE and a Keep-Alive Timeout Timer is scheduled. Usually, the PE will acknowledge the keep-alive message and the timer can be unscheduled. After that, a new transmission timer has to be scheduled.
To evaluate the timer handling performance of the handlespace management, it is useful to define the timer operation as follows:
• For owned PEs, a timer operation consists of:
– Linking the PE structure into the Timer Schedule with the timer type set to Keep-Alive Transmission Timer,
– Unlinking the PE structure from the Timer Schedule,
– Linking the PE structure into the Timer Schedule with the timer type set to Keep-Alive Timeout Timer and
– Unlinking the PE structure from the Timer Schedule again.
• For not-owned PEs, a timer operation consists of:
– Linking the PE structure into the Timer Schedule with the timer type set to Lifetime Expiry Timer and
– Unlinking the PE structure from the Timer Schedule.
Figure 7.6: The Throughput of the Timer Handling Operation
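The two kinds of timer operation defined above can be sketched on top of a Timer Schedule kept as a sorted list of (expiry time, PE ID, timer type) entries. This hedged sketch uses the standard bisect module instead of a tree; the intervals (60 s transmission interval, 10 s timeout, 300 s Registration Life) are illustrative assumptions, and TimerSchedule and timer_operation are hypothetical names. Every newly linked timer lies in the future, so it lands near the end of the schedule: exactly the systematic insertion pattern that lets an unbalanced binary tree degenerate.

```python
import bisect
from typing import List, Tuple

KEEP_ALIVE_TRANSMISSION = "Keep-Alive Transmission"
KEEP_ALIVE_TIMEOUT = "Keep-Alive Timeout"
LIFETIME_EXPIRY = "Lifetime Expiry"

Entry = Tuple[float, int, str]  # (expiry time, PE ID, timer type)


class TimerSchedule:
    """All timers in one schedule, sorted by expiry time."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def link(self, when: float, pe_id: int, timer_type: str) -> Entry:
        entry = (when, pe_id, timer_type)
        bisect.insort(self.entries, entry)
        return entry

    def unlink(self, entry: Entry) -> None:
        i = bisect.bisect_left(self.entries, entry)
        if i == len(self.entries) or self.entries[i] != entry:
            raise KeyError(entry)
        del self.entries[i]


def timer_operation(schedule: TimerSchedule, now: float, pe_id: int,
                    owned: bool, interval_s: float = 60.0,
                    timeout_s: float = 10.0,
                    lifetime_s: float = 300.0) -> int:
    """One timer operation as defined above; returns the link/unlink count."""
    if owned:
        # PR-H: the Keep-Alive Transmission Timer is scheduled and expires ...
        tx = schedule.link(now + interval_s, pe_id, KEEP_ALIVE_TRANSMISSION)
        schedule.unlink(tx)
        # ... then the Keep-Alive Timeout Timer, removed upon acknowledgement.
        to = schedule.link(now + interval_s + timeout_s, pe_id,
                           KEEP_ALIVE_TIMEOUT)
        schedule.unlink(to)
        return 4
    # Non-PR-H: only the Lifetime Expiry Timer is handled.
    le = schedule.link(now + lifetime_s, pe_id, LIFETIME_EXPIRY)
    schedule.unlink(le)
    return 2
```

An owned PE costs two link/unlink pairs per operation and a not-owned PE one pair; both leave the schedule in its original state.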

Figure 7.6 presents the throughput of timer operations per PE and second in a handlespace consisting of a single pool and the given number of PEs, for varying set storage algorithms and fractions of owned PEs. As fractions of owned PEs, 0% (none) and 100% (all) have been chosen, in order to make the difference between these two extremes visible.
For both fractions of owned PEs, the balanced trees (treap and red-black) clearly provide a significantly better performance than linear list and binary tree. Again, the operations throughput of the treap is somewhat lower compared to the red-black tree. However, the most interesting observation is the performance of the binary tree: it is even outperformed by the linear list. The reason for this effect is the systematic insertion and removal behaviour of the Timer Schedule: timers are scheduled for a future event, i.e. they are usually appended to the end of the schedule. On the other hand, expiring timers are removed from the top of the schedule. This results in a degeneration of the binary tree to (nearly) a linear list. Since the management of a binary tree is somewhat more complex than simply holding a linear list, the performance of the binary tree used for the Timer Schedule is worse than for a linear list itself. And unlike the registration/deregistration case, where the performance drop in the Selection Set is compensated by faster PE lookups in the Index Set (see subsection 7.4.1), no such compensation effect is available here.
Obviously, the timer operation for owned PEs is slightly more costly than for not-owned PEs, since in fact two timers have to be managed. While the throughput difference for the red-black tree between 0% and 100% of owned PEs remains smaller than 25 operations per PE and second for 200 to 1,000 PEs, the gap grows to up to about 50 operations per PE and second for the three other algorithms. The reason for this small difference, despite having to perform double the work when dealing with the two Keep-Alive timers, is the CPU's cache: after unlinking the Keep-Alive Transmission Timer, its former structure is turned into the Keep-Alive Timeout Timer and vice versa. Therefore, the structure and parts of the Timer Schedule's nodes can be expected to be still situated in the CPU's cache.
Figure 7.7: The Throughput of the Handle Resolution Operation

7.4.4 Handle Resolution Operations
The next operation to be evaluated is the handle resolution.

7.4.4.1 Introduction
As described in subsection 4.4.1, the default selection procedure is to simply take the first element from the Selection Set. This is suitable for deterministic policies like Least Used or Round Robin. On the other hand, this simple selection procedure cannot be used for the randomized policies as defined in subsubsection 4.4.2.5. In this case, the weight-sum based selection is used. Clearly, the computational complexity of the take-from-the-top selection procedure is O(1), while the randomized variant depends on the size of the Selection Set (i.e. O(n), where n is the size of the Selection Set). Using the tree-based approach presented in section 7.3, the processing complexity is O(log n), under the assumption of a balanced tree.
In order to realize the handle resolution operation, it is necessary to introduce two variables:
• MaxHResItems defines the number of PE identities which should be returned upon a handle resolution. That is, if MaxHResItems=3, a handle resolution should return 3 PE identities. Of course, if the pool only consists of 2 PEs, only 2 PE identities can actually be returned.
• MaxIncrement specifies the maximum number of selected PE identities for which a status update is performed upon selection. As defined in subsubsection 4.4.2.1, a selection usually means to unlink the PE entry from the Selection Set, perform e.g. a sequence number update and re-link the PE into the Selection Set again. Limiting the number of PE entries which have to pass the unlink-update-re-link cycle is in particular useful for the Round Robin policies, as will be described in subsection 8.8.1.
To process a handle resolution, the handlespace management has to perform the following tasks on the handlespace structure (see figure 7.3):
• Finding the pool given by the PH,
• Selecting up to MaxHResItems PE structures from the Selection Set,
• Unlinking up to MaxIncrement of the selected PE structures from the Selection Set,
• Updating the selection information (in particular the PE sequence number as defined in subsubsection 4.4.2.1) of the unlinked (and only of these) PE structures and
• Re-linking the removed PE structures into the Selection Set again.
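For a deterministic policy, the tasks above can be sketched as follows. This hedged sketch assumes a Round Robin Selection Set sorted by (sequence number, PE ID), in which an update assigns a fresh, strictly increasing sequence number so that the re-linked PE moves to the end of the set; RoundRobinPool and handle_resolution are hypothetical names, a sorted list stands in for a tree, and max_increment=None stands for an unlimited setting.

```python
import bisect
from typing import Iterable, List, Optional, Tuple


class RoundRobinPool:
    """Selection Set of (sequence number, PE ID); smallest number first."""

    def __init__(self, pe_ids: Iterable[int]) -> None:
        self.next_seq = 0
        self.entries: List[Tuple[int, int]] = []
        for pe_id in pe_ids:
            self._link(pe_id)

    def _link(self, pe_id: int) -> None:
        # A fresh sequence number exceeds all others: the PE goes to the end.
        bisect.insort(self.entries, (self.next_seq, pe_id))
        self.next_seq += 1

    def handle_resolution(self, max_hres_items: int,
                          max_increment: Optional[int] = None) -> List[int]:
        """Return up to max_hres_items PE IDs; only the first max_increment
        of them pass through the unlink-update-re-link cycle."""
        selected = self.entries[:max_hres_items]  # take from the top (a copy)
        limit = len(selected) if max_increment is None else max_increment
        for entry in selected[:limit]:
            del self.entries[bisect.bisect_left(self.entries, entry)]  # unlink
            self._link(entry[1])  # update the sequence number and re-link
        return [pe_id for _, pe_id in selected]
```

With four PEs 10, 20, 30 and 40 and MaxHResItems=3, an unlimited MaxIncrement returns [10, 20, 30] followed by [40, 10, 20], while MaxIncrement=1 returns [10, 20, 30] followed by [20, 30, 40]. A setting of 0 would return the same list every time.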
Due to the diversity of the selection procedures, the handle resolution results shown in figure 7.7 have been separated into a deterministic policy part (left-hand side; here: Round Robin) and a randomized policy part (right-hand side; here: Weighted Random). In the presented scenario, the case of MaxHResItems ≤ MaxIncrement is evaluated, i.e. all selected PE structures have to be unlinked, updated and re-linked. Advice on how to appropriately configure the MaxIncrement parameter is given later in section 8.8 and section 8.9.

7.4.4.2 An Unlimited Setting of MaxIncrement
As expected, the worst performance in the deterministic policy scenario is provided by the binary tree and the linear list. Again, the binary tree's operations throughput is worse than the performance of the linear list, due to the degeneration effect already observed for the timer and registration operations (see subsection 7.4.3 and subsection 7.4.1). However, for the randomized scenario, the performance of the binary tree is nearly as high as for the best algorithm: the red-black tree. In this scenario (Weighted Random policy), the Selection Set is ordered by the randomly chosen PE ID and no degeneration occurs. Comparing the curve for MaxHResItems=1 to the results of MaxHResItems=3, the operations throughput for the deterministic policy only slightly decreases: from about 300 to about 250 operations per PE and second for a pool of 500 PEs and the red-black tree. On the other hand, the performance nearly halves (from about 200 to only about 100 operations per PE and second) for the randomized policy. Clearly, this is the result of the more expensive selection procedure.

7.4.4.3 A Reduced Setting of MaxIncrement
As will be explained later in section 8.8 and section 8.9, the parameter MaxIncrement can be smaller than MaxHResItems, i.e. more PE identities are returned than are actually passed through the unlink-update-re-link cycle. This is usually the case if multiple PE identities should be returned to support the PU's caching functionality, under the assumption that only a limited number of PEs (e.g. only one) is actually used for processing a request. Figure 7.8 shows the operations throughput using MaxHResItems=10 for a deterministic policy (left-hand side; here: Round Robin) and a randomized policy (right-hand side; here: Weighted Random). Since linear list and binary tree do not provide a reasonable performance, their results are omitted here. The settings of MaxIncrement have been unlimited and the minimum useful value. For the deterministic policy scenario, the smallest value is 1; otherwise, always the same list of PE identities would be returned. In case of a randomized policy, the minimal setting is 0.
Figure 7.8: The Handle Resolution Throughput for a Variation of MaxIncrement
For the deterministic policy, a reduction of MaxIncrement results in a significant performance gain: from about 140 to almost 300 operations per PE and second at 500 PEs for the red-black tree. The throughput increment is larger for the treap: from about 80 to 240 operations per PE and second. Clearly, a reduction of the unlink-update-re-link cycles, which are less efficient for the treap, provides a higher speed-up compared to the red-black tree. Nevertheless, the performance using a treap is always lower.
Comparing the results of the deterministic policy and the randomized policy, the choice of the parameter MaxIncrement only slightly changes the operations throughput of the randomized policy. The reason is the behaviour of the Weighted Random policy: the unlink-update-re-link cycle does not change the position of the entry. Therefore, it is re-inserted at the same place. Since all nodes in the tree leading to a just selected node can be assumed to be still in the CPU's cache, the unlinking and re-linking procedure is quite fast.

7.4.4.4 Summary
In summary, the red-black tree again achieves the best performance, while the treap's operations throughput is somewhat lower. A performance gain can be achieved by configuring the parameter MaxIncrement appropriately (i.e. smaller than MaxHResItems), in particular for deterministic policies. The handle resolution operation for randomized policies is more costly, due to the more complex selection procedure (O(log n) vs. O(1)).

7.4.5 Synchronization Operations
The last operation to be evaluated is the synchronization. A synchronization operation denotes the step-wise traversal of the complete handlespace in order to obtain the necessary references to the PE structures for creating ENRP Handle Table Response messages as described in subsection 3.10.3. A synchronization operation performs the following tasks:
• Obtaining the first up to MaxElementsPerHTRequest references to PE structures.
• Saving PH and PE ID of the last PE reference obtained to resume the handlespace traversal.
• As long as the end of the handlespace data has not been reached yet:
– Finding the subsequent PE entry of the PE whose identity (PH and PE ID) has been stored.
– Obtaining the next up to MaxElementsPerHTRequest references to PE structures.
– Saving PH and PE ID of the last PE reference obtained to resume the handlespace traversal.
There are two possibilities to select the PE identities to be included: all PEs or only the PEs owned by the PR itself. Since, from the perspective of the synchronization operation, the own-PEs-only option is merely a traversal of a subset of the PEs (based on the Ownership Set, see subsection 4.4.5), this case is neglected here. Instead, the performance analysis concentrates on the most time-consuming task: the full traversal.
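The step-wise traversal listed above can be sketched as below, assuming the handlespace is available as a list of (PH, PE ID) identities sorted in traversal order. Only the identity of the last PE is carried from one step to the next; resuming is a lookup of the first identity greater than it, done here in O(log n) with the standard bisect module, much as a tree lookup would. The function name synchronize is hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

Identity = Tuple[bytes, int]  # (PH, PE ID)


def synchronize(handlespace: List[Identity],
                max_elements_per_ht_request: int) -> List[List[Identity]]:
    """Return the steps of a full traversal; each step would be packed into
    one ENRP Handle Table Response."""
    steps: List[List[Identity]] = []
    last: Optional[Identity] = None
    while True:
        # Find the entry following the saved identity (or start at the top).
        start = 0 if last is None else bisect.bisect_right(handlespace, last)
        chunk = handlespace[start:start + max_elements_per_ht_request]
        if not chunk:
            break  # end of the handlespace reached
        steps.append(chunk)
        last = chunk[-1]  # save PH and PE ID to resume the traversal
    return steps
```

With a linear list, by contrast, the resume lookup has to walk from the head again, so step k first passes about k times the step size entries; this is why small step sizes hurt the list so much for large pools.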
First, it is necessary to specify the step size MaxElementsPerHTRequest for the traversal. Since the Handle Table Response message (see figure 3.34 for an illustration), like all messages as defined in section 3.8, only allows a size of 64 KBytes, the maximum useful step value is limited by the minimum size of the Pool Element Parameter as defined in subsubsection 3.9.2.1 (an illustration of this parameter is presented in figure 3.19): the minimum size of this parameter type is 60 bytes, consisting of 20 bytes for header, PE ID, Home-PR ID and Registration Life as well as at least 16 bytes for a User Transport Parameter (with a single IPv4 address), at least 16 bytes for an ASAP Transport Parameter (with a single IPv4 address) and at least 8 bytes for a Policy Parameter (e.g. Round Robin; see also figure 3.20 and Tüxen and Dreibholz (2006b)). Assuming that all PEs are in the same pool (i.e. only a single Pool Handle Parameter is needed; see also figure 3.18), an ENRP Handle Table Response may consist of up to about 1,050 PE identities. Clearly, it is useful to fill a message up to the limit. Therefore, a step size of 1,024 and, for comparison, a size of 128 are evaluated. The smaller value should cover realistic cases of multi-homed IPv4+IPv6 PEs and also the need for multiple PHs.
Figure 7.9 presents the operations throughput in operations per second for a varying number of PEs in a single pool. Note that the synchronization operation is a global operation; therefore, it is not useful to show an operations per PE and second value here. Since a number of 1 to 1,000 PEs would not be very interesting (the first step of size 1,024 already returns all PE references), the number of PEs is varied between 1 and 5,000 here. Clearly, for less than 1,000 PEs the operations throughput for all storage algorithms significantly exceeds 15,000 synchronizations per second (except for the linear list and the small step size, with still more than 6,000 operations per second). This is by orders of magnitude more than sufficient for any realistic scenario of that handlespace size.
However, for larger scenarios the linear list becomes quite inefficient: for each additional step necessary, all PE structures traversed before have to be visited again and again. This effect is confirmed by the curve for a step size of 128: there is a high performance drop compared to the large step size. On the other hand, the tree-based algorithms are quite fast and achieve more than 850 synchronizations per second at 5,000 PEs. Furthermore, the difference between the step sizes of 128 and 1,024 is small. The performance of the binary tree is only very slightly worse than for treap and red-black tree: the Index Set, which is used for the PE ID lookup, is ordered by the random PE ID (see also subsection 7.4.1). An additional reason for the high performance of the tree-based implementations is the back-linking of the nodes as described in section 7.3: iterating to the next node is quite efficient, especially if the tree is balanced.
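The step-size bound derived at the beginning of this subsection can be checked with simple arithmetic. The sketch below takes 64 KBytes as 65,536 bytes and uses the 60-byte minimum Pool Element Parameter from the text; the overhead argument, which would account for the message header and the Pool Handle Parameter, defaults to zero because its exact size is not given here. The helper names are hypothetical.

```python
MAX_MESSAGE_SIZE = 64 * 1024  # 64 KBytes, taken as 65,536 bytes

# Minimum Pool Element Parameter size in bytes (subsubsection 3.9.2.1):
MIN_PE_PARAMETER = (
    20    # header, PE ID, Home-PR ID and Registration Life
    + 16  # User Transport Parameter with a single IPv4 address
    + 16  # ASAP Transport Parameter with a single IPv4 address
    + 8   # Policy Parameter, e.g. Round Robin
)


def max_pe_identities(pe_parameter_size: int, overhead: int = 0) -> int:
    """Upper bound of PE identities fitting into one message."""
    return (MAX_MESSAGE_SIZE - overhead) // pe_parameter_size


def bytes_per_pe(step_size: int, overhead: int = 0) -> int:
    """Space available per PE identity when step_size PEs are packed."""
    return (MAX_MESSAGE_SIZE - overhead) // step_size


print(MIN_PE_PARAMETER)                        # 60
print(max_pe_identities(MIN_PE_PARAMETER))     # 1092
print(bytes_per_pe(1024), bytes_per_pe(128))   # 64 512
```

Without any overhead, at most 1,092 minimal PE parameters fit; the space still needed for the message header and the Pool Handle Parameter presumably accounts for the somewhat lower figure of about 1,050. A step size of 128 leaves about 512 bytes per PE, which leaves room for several addresses per PE.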

Figure 7.9: The Throughput of the Synchronization Operation

7.4.6 Summary and Conclusion

In summary, the first set of performance measurements has shown that the handlespace management approach is quite efficient in scenarios of up to 1,000 PEs, as long as an appropriate set storage algorithm is used: a balanced tree. While the treap (a randomized tree) implementation is somewhat simpler, the more complex red-black tree can guarantee balancing and therefore provides the best operations throughput. While it is obvious that the naïve approach of using linear lists results in a low performance, simple binary trees have proven to be surprisingly inefficient: due to systematic insertion and removal in the Selection Set and the Timer Schedule, these binary trees can degenerate to linear lists. Such a degeneration results in an even worse performance than for a linear list itself.
After these interesting results, the next step of the performance analysis is to evaluate the scalability of the handlespace management approach to much larger pools.

7.5 The Scalability of the Number of Pool Elements

In this section, the performance of the handlespace management approach is evaluated for scenarios containing up to 100,000 PEs. Such scenarios seem to be realistic for large-scale real-time distributed computing pools as described in subsection 3.6.5. As already shown in section 7.4, a linear list or an unbalanced binary tree is completely unsuitable as storage implementation. Therefore, only the useful implementations, based on red-black tree and treap, are evaluated here.

Figure 7.10: The Scalability of the Registration/Deregistration Operation

7.5.1 Registration and Deregistration Operations
The first operation to be evaluated in the scalability analysis is the combined registration/deregistration operation as defined in subsection 7.4.1. Again, it is useful to differentiate between deterministic and randomized policies. Figure 7.10 presents the operations per PE and second for a range of PEs between 1 and 100,000. As expected from the results in subsection 7.4.1, no significant difference between deterministic policies (here: Round Robin) and randomized policies (here: Weighted Random) can be observed. Furthermore, as has already been observed for the smaller scenarios, the red-black tree, with its guarantee of being balanced, provides a somewhat better operations throughput than the treap: about 0.55 operations per PE and second vs. about 0.4 operations per PE and second for 50,000 PEs.
In summary, the handlespace management approach is applicable even in a scenario of 100,000 PEs: at this number of PEs, each PE could register or deregister about every 3 seconds (more than 0.3 operations per PE and second, using a red-black tree). Assuming a realistic PE lifetime of as short as a few minutes, the registrations and deregistrations do not impose a significant load on the CPU.

7.5.2 Re-Registration Operations
The next operation of the scalability analysis is the re-registration operation as defined in subsection 7.4.2. Again, it is useful to differentiate between adaptive and non-adaptive policies. Figure 7.11 shows the operations per PE and second for a range of PEs between 1 and 100,000. For the adaptive policy (here: Least Used), each re-registration updates a PE's policy information with a randomly chosen load value. On the other hand, the non-adaptive policy (here: Weighted Random) does not modify the policy information.
As expected, the performance difference between treap and red-black tree is small, but the red-black tree is a little bit more efficient. For the non-adaptive policies, both algorithms achieve an operations throughput of about 0.8 operations per PE and second for a pool of 100,000 PEs. Assuming a PE lifetime of as short as a few minutes, the achieved throughput for the non-adaptive policies is clearly sufficient: according to section 5.1 of Stewart, Xie, Stillman and Tüxen (2006a), the re-registration interval is defined as the lower value of 10 minutes or the PE's Registration Life parameter (see subsubsection 3.9.2.1) minus 20 s. However, re-registrations may occur more frequently in case of an adaptive policy: whenever the PE decides that its policy information changes have to be propagated into the handlespace (e.g. on a load change for the Least Used policy), the PE performs a re-registration. As shown in figure 7.11, the operations throughput for 50,000 PEs still achieves about 0.8 operations per PE and second, while it drops to about 0.3 operations per PE and second for 100,000 PEs. That is, care has to be taken that the number of re-registrations does not overload the handlespace management if very large pools use an adaptive policy.
Figure 7.11: The Scalability of the Re-Registration Operation
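To put the measured throughput into perspective, the re-registration interval rule cited above can be compared with the throughput per PE. The sketch below reads the rule as min(10 minutes, Registration Life − 20 s). That is one possible reading of the sentence and therefore an assumption. The sketch also converts a throughput in operations per PE and second into the shortest sustainable interval between two operations of the same PE. All function names are hypothetical.

```python
def reregistration_interval_s(registration_life_s: float) -> float:
    """Assumed reading: the lower of 10 minutes and Registration Life - 20 s."""
    return min(600.0, registration_life_s - 20.0)


def min_interval_s(ops_per_pe_and_second: float) -> float:
    """Shortest interval per PE that a given throughput can sustain."""
    return 1.0 / ops_per_pe_and_second


def load_fraction(registration_life_s: float,
                  ops_per_pe_and_second: float) -> float:
    """Fraction of the measured capacity used by periodic re-registrations."""
    return (min_interval_s(ops_per_pe_and_second)
            / reregistration_interval_s(registration_life_s))
```

For a Registration Life of 300 s and the 0.8 operations per PE and second measured for non-adaptive policies, periodic re-registrations use well under 1% of the capacity. At 0.3 operations per PE and second (adaptive policy, 100,000 PEs), each PE can re-register at most about every 3.3 s, which bounds how often load changes may be propagated.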

7.5.3 Timer Handling
The scalability of the operations throughput for the timer operation as defined in subsection 7.4.3 is presented in figure 7.12. Again, the results for the two extreme settings of the fraction of owned PEs, 0% (none) and 100% (all), are shown.
As expected from the results obtained for the re-registration operation shown in the previous subsection 7.5.2, the best performance is provided by the red-black tree. The difference between 0% and 100% of owned PEs is quite small. However, it can be assumed that the timer frequency for owned PEs is higher, because in this case the two Keep-Alive timers are managed (see also subsection 7.4.3). The frequency of the timer handling depends on the timeout settings of the Keep-Alive Transmission and Timeout timers. The timer of a not-owned PE, on the other hand, is the Lifetime Expiry Timer, which depends on the PE's Registration Life setting (see subsubsection 3.9.2.1) and is obviously longer. Nevertheless, the timer handling operation is fast enough for scenarios of 100,000 PEs: it still achieves about 0.6 operations per PE and second. This is clearly sufficient for not-owned PEs (e.g. a realistic Registration Life could be in the range of a few minutes). It is also sufficient for owned PEs (e.g. the Keep-Alive Transmission timer could be set to 60 s and the timeout to 10 s in a pool of 100,000 PEs). Note further that in large scenarios the owned PEs are usually distributed among the PRs, i.e. a PR is only the PR-H of a fraction of the PEs in the operation scope.

Figure 7.12: The Scalability of the Timer Handling Operation
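The example settings above can be checked with simple arithmetic. The sketch makes three illustrative assumptions:

- An owned PE causes one Keep-Alive Transmission and one Keep-Alive Timeout timer event per Keep-Alive cycle.
- A not-owned PE causes one Lifetime Expiry timer event per Registration Life.
- The 10 s timeout only bounds how long a timer is pending, not how often it fires.

The function names are hypothetical.

```python
# Hypothetical back-of-the-envelope check of the timer handling load per PE.

MEASURED_TIMER_THROUGHPUT = 0.6  # ops per PE and second at 100,000 PEs (red-black tree)


def owned_pe_timer_rate(keepalive_interval_s: float) -> float:
    """Assumption: one Keep-Alive Transmission and one Keep-Alive Timeout
    timer event per Keep-Alive cycle for each owned PE."""
    return 2.0 / keepalive_interval_s


def not_owned_pe_timer_rate(registration_life_s: float) -> float:
    """Assumption: one Lifetime Expiry timer event per Registration Life."""
    return 1.0 / registration_life_s


if __name__ == "__main__":
    print(owned_pe_timer_rate(60.0))       # about 0.033 events per PE and second
    print(not_owned_pe_timer_rate(300.0))  # about 0.0033 events per PE and second
```

Both rates lie well below the measured 0.6 operations per PE and second. If a PR owns only a fraction of the PEs, its share of the Keep-Alive timer load shrinks accordingly.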

7.5.4 Handle Resolution Operations
The performance of the handle resolution operation as defined in subsection 7.4.4 is presented in figure 7.13. Again, a distinction is made between two kinds of policies:

- deterministic policies, which use the default take-from-the-top selection procedure as introduced in subsection 4.4.1;
- randomized policies, which use the weight sum based selection as described in section 7.3.

In the presented scenario, MaxIncrement has been set to MaxHResItems.
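To make the two selection procedures concrete, the following simplified sketch stores a pool's PE entries in a plain Python list instead of a red-black tree or treap.

- **Deterministic case:** it takes the first MaxHResItems entries from the top of the selection order. It then moves up to MaxIncrement of them to the end, which models a Round Robin-like unlink-update-re-link cycle.
- **Randomized case:** it draws PEs by the weight sum, i.e. by locating a uniform random value in the cumulative weights.

The function names and the exact update rule are assumptions for illustration, not the handlespace implementation.

```python
import bisect
import itertools
import random


def take_from_top(order: list, max_hres_items: int, max_increment: int) -> list:
    """Deterministic selection sketch (Round Robin-like).

    `order` holds PE identifiers in selection order. The first
    max_hres_items entries are selected; only the first max_increment of
    them run through the unlink-update-re-link cycle, modelled here as
    removing them and re-appending them at the end.
    """
    selected = order[:max_hres_items]  # slice copy, unaffected by the updates below
    for pe in selected[:max_increment]:
        order.remove(pe)   # unlink
        order.append(pe)   # update + re-link behind all other entries
    return selected


def weighted_random(pes: list, weights: list, count: int, rng: random.Random) -> list:
    """Randomized selection sketch using the weight sum.

    Draws `count` distinct PEs; each draw picks a uniform value in
    [0, total weight) and locates it in the cumulative weight sums.
    Weights are assumed to be positive.
    """
    candidates = list(zip(pes, weights))
    selected = []
    for _ in range(min(count, len(candidates))):
        cumulative = list(itertools.accumulate(w for _, w in candidates))
        value = rng.random() * cumulative[-1]
        # guard against floating-point rounding up to the total weight
        index = min(bisect.bisect_right(cumulative, value), len(cumulative) - 1)
        selected.append(candidates.pop(index)[0])
    return selected
```

In this model, MaxIncrement bounds how many selected entries have to be re-positioned per handle resolution. The measurements show this re-positioning is the expensive part for large values.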
As already expected from the results in subsection 7.4.4, the best operations throughput is reached using the red-black tree. The following values apply to 50,000 PEs:

- **Deterministic policy (here: Round Robin):** about 2.5 operations per PE and second are achieved for MaxHResItems=1. There is still a throughput of more than 1.75 operations per PE and second for MaxHResItems=3.
- **Randomized policy (here: Weighted Random):** the performance drops from more than 1.0 operations per PE and second for MaxHResItems=1 to about 0.5 operations per PE and second for MaxHResItems=3.

For an increased number of PEs, the operations throughput becomes significantly smaller.
The results for MaxIncrement < MaxHResItems and MaxHResItems=10 are presented in figure 7.14. They confirm the expectations from subsection 7.4.4 for large pools.

For the deterministic policy (shown on the left-hand side; here: Round Robin), there is a significant performance difference between the curves for different settings of MaxIncrement. In a pool of 50,000 PEs, a setting of 1 still achieves 2.5 operations per PE and second. An unlimited setting drops the operations throughput to about 0.5 (red-black tree).

The results for the randomized policy (presented on the right-hand side of figure 7.14) show the expected results. Requiring all selected PE identities to run through the unlink-update-re-link cycle drops the operations throughput from about 0.3 to about 0.2 operations per PE and second. Again, since the locations of the PE structures within the tree do not change here, the CPU's cache has accelerated the unlink-update-re-link cycle (see also subsection 7.4.4).

Figure 7.13: The Scalability of the Handle Resolution Operation

Figure 7.14: The Handle Resolution Scalability of a Variation of MaxIncrement
Clearly, the operations throughput for the randomized policy is significantly lower than for the deterministic one. For example, with MaxHResItems=10 and the lowest possible MaxIncrement, a pool of 50,000 PEs using red-black trees achieves 2.5 vs. 0.3 operations per PE and second.

However, despite a low operations throughput in scenarios of large pools (e.g. more than 10,000 PEs), it is easily possible to scale an RSerPool system by adding PRs. That is, a pool of 100,000 PEs managed by 10 PRs can reduce the per-PR handle resolution load to 1/10th. This assumes that the request load of the PUs is approximately equally distributed among the PRs of the operation scope.

Finally, a comparison between red-black tree and treap again shows the expected result: the red-black tree is somewhat more efficient than the treap.
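The scaling argument above is a simple division. The hypothetical helpers below assume that PU requests and PE ownership are spread evenly over the PRs.

- They reproduce the 1/10th figure for 10 PRs.
- They turn a measured per-PE throughput into a per-PR handle resolution capacity and a minimum number of PRs.

The request rate used in the example is an invented value for illustration.

```python
import math


def per_pr_share(total: float, num_prs: int) -> float:
    """Share of a total load (or of the owned PEs) per PR, assuming an even split."""
    if num_prs < 1:
        raise ValueError("at least one PR is required")
    return total / num_prs


def pr_capacity(ops_per_pe_and_second: float, pool_size: int) -> float:
    """Handle resolutions per second one PR sustains for a pool of the given size."""
    return ops_per_pe_and_second * pool_size


def prs_needed(request_rate: float, ops_per_pe_and_second: float, pool_size: int) -> int:
    """Smallest number of PRs whose combined capacity covers the request rate."""
    return math.ceil(request_rate / pr_capacity(ops_per_pe_and_second, pool_size))
```

For example, per_pr_share(100_000, 10) gives 10,000 PEs per PR. With the Weighted Random result of about 0.5 operations per PE and second at 50,000 PEs (MaxHResItems=3), one PR sustains about 25,000 handle resolutions per second. A hypothetical total of 60,000 requests per second would therefore need 3 PRs.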

7.5.5 Synchronization Operations
The last operation to be evaluated is the synchronization operation as defined in subsection 7.4.5. For the smaller realistic scenarios evaluated in that subsection (up to 5,000 PEs), only a small performance difference between the two step sizes 128 and 1,024 has been observed. As shown in figure 7.15, this observation also holds for pool sizes of up to 100,000 PEs. As already observed for the previous operations, the red-black tree provides the best performance.
For a scenario of 100,000 PEs, it is still feasible to perform about 30 full synchronizations per second. This is orders of magnitude more than necessary in any realistic scenario. A well-planned RSerPool system of that size probably has at least 10 PRs, for redundancy reasons and to distribute the handle resolution and monitoring load.

The usual case is therefore to request only the owned PEs, and only when the handlespace audit procedure detects a handlespace checksum difference (see subsection 3.10.5). Assuming that 100,000 PEs are approximately distributed equally among 10 PRs, such a handle table response may include about 10,000 PEs. For this table size, more than 500 operations per second are possible.

The full handlespace synchronization procedure is only necessary on a PR startup, to request the complete handlespace content from a Mentor PR (see subsection 3.10.3). In a realistic scenario, this case might occur e.g. once within several days.

7.5.6 Summary
As a result of the scalability analysis in this section, it has been shown that the red-black tree is the appropriate data structure to store the handlespace data. A treap always has a reduced performance, since it can only approximate a balanced tree structure instead of assuring it.
For the handlespace management, it has been shown that it is possible to manage a large-scale pool of up to 100,000 PEs if carefully deployed:
For adaptive policies in large pools, the rate of re-registrations has to be kept under control, e.g. by a PR parameter or (the better solution) by a reasonable application design.
Distributing the number of PEs and PUs among all PRs of the operation scope is useful: this reduces the per-PR workload of dealing with handle resolution requests and timer handling (Keep-Alive timers).

Figure 7.15: The Scalability of the Synchronization Operation

7.6 The Scalability of the Number of Pools

Section 7.4 and section 7.5 have evaluated only a single large pool. It is furthermore necessary to analyse the scalability of the handlespace management to a higher number of pools.

7.6.1 Evaluation and Analysis
Figure 7.16 presents the logarithm (base 10) of the number of handle resolution operations (see also subsection 7.4.4) per PE and second. The number ratio between pools and PEs is varied, in handlespaces consisting of 100, 1,000 and 10,000 PEs. The logarithm is shown instead of the actual values in order to fit the operations throughput of these very different pool sizes into a single plot. The handle resolution parameters have been MaxHResItems=1 and MaxIncrement=1; the policy has been Round Robin.
Clearly, the results mainly reflect what can be expected from the previous results: using treaps leads to a reduced performance compared to the usage of red-black trees. Furthermore, distributing the PEs into a larger number of pools does not result in a severe performance degradation:

- **100 PEs:** the operations throughput remains almost constant at slightly above 1,600 handle resolutions per PE and second. This holds from all PEs in a single pool up to a separate pool for each PE (red-black tree).
- **1,000 PEs stored in red-black trees:** the throughput is reduced from about 150 operations per PE and second (single pool) to about 140 (a separate pool for each PE). This is a reduction of less than 7%.

Figure 7.16: The Scalability of the Number of Pools
However, the reduction becomes larger for 10,000 PEs: from about 13.5 operations per PE and second to about 10.0, i.e. a reduction of about 26% (red-black tree). This is a result of the differences between PoolSet and IndexSet.

As explained in subsection 4.4.6, the maximum PH size has been limited to 32 bytes. Sorting the entries of the PoolSet requires a byte-wise comparison of PHs. For the measurements, randomized PHs of the maximum size have been constructed. Clearly, the PH comparison is more expensive than simply comparing 32-bit PE sequence numbers, as needed in the SelectionSet for the Round Robin policy (see subsubsection 4.4.2.2).
The results for registration, re-registration, timer and synchronization operations are very similar to the handle resolution results provided in this subsection. Since they do not provide any new insights, they are omitted here.

7.6.2 Outlook on Future Scalability Enhancements

For realistic RSerPool scenarios, the number of pools in an operation scope can be expected to be proportional to the number of deployed RSerPool applications. From the current perspective, values of up to 10 seem to be realistic, while cases of up to 100 might sometimes be useful. Currently, the only described application scenario requiring a significantly larger number of pools is the RSerPool-based mobility support for SCTP, as described in subsection 3.6.6.
Since the number of pools is assumed to be very small, no additional effort to speed up the pool lookup has been made yet. Future work on the optimization of the handlespace management might consider using a hash table for the pool lookup. In this case, however, care has to be taken to choose an appropriate hash function. A suitable hash function must meet the following requirements:

1. It has to be fast (of course) and

2. The hashing must not degenerate, even if an attacker creates pools with specially-crafted PHs.

The second requirement refers to so-called computational complexity attacks, which are described in Crosby and Wallach (2003). The solutions presented there are so-called universal hash functions.
A reasonable limitation of the current handlespace management is the maximum PH length of 32 bytes, which simplifies PH storage (see also subsection 4.4.6 for more details). This limit currently seems justified and sufficient, but special applications in the future might wish to use larger PHs. In this case, a hash function might also become useful for a smaller number of pools, due to the cost of comparing long PHs.

Figure 7.17: Using Leaf-Linked Trees for the Synchronization Operation
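The two hash function requirements for a hashed pool lookup can be illustrated with a small sketch.

- **The hash:** a polynomial over the PH bytes, evaluated modulo the prime 2^61 − 1 at a random point drawn once per table. This is one well-known almost-universal family. For two distinct PHs of at most L bytes, the collision probability before the final reduction to a bucket is at most about L/2^61.
- **Why this family:** it was chosen here for brevity. It is not claimed to be the construction recommended by Crosby and Wallach (2003).
- **Names and limits:** PoolTable and its methods are hypothetical. A production version would also resize the table.

```python
import secrets

MERSENNE_61 = (1 << 61) - 1  # prime modulus of the polynomial hash


class PoolTable:
    """Hash table for pool lookup, keyed by pool handle (PH) bytes.

    The hash is a polynomial over the PH bytes, evaluated modulo a prime at
    a secret random point chosen once per table. An attacker who does not
    know that point cannot precompute PHs that all land in one bucket.
    Collisions are resolved by chaining.
    """

    def __init__(self, bucket_count: int = 64) -> None:
        self._point = secrets.randbelow(MERSENNE_61 - 1) + 1
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, ph: bytes) -> list:
        h = 0
        for byte in ph:
            # byte + 1 keeps every coefficient non-zero, so PHs of different
            # lengths never correspond to the same polynomial
            h = (h * self._point + byte + 1) % MERSENNE_61
        return self._buckets[h % len(self._buckets)]

    def insert(self, ph: bytes, pool: object) -> None:
        bucket = self._bucket(ph)
        for i, (key, _) in enumerate(bucket):
            if key == ph:
                bucket[i] = (ph, pool)  # replace the existing entry
                return
        bucket.append((ph, pool))

    def lookup(self, ph: bytes):
        for key, pool in self._bucket(ph):
            if key == ph:
                return pool
        return None
```

The per-byte cost of this hash grows with the PH length, just like a byte-wise comparison. The benefit appears only when many pools exist, because a lookup then compares against a short bucket chain instead of descending a tree of PHs.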

7.7 Using Leaf-Linked Trees

The measurements presented in section 7.4 and section 7.5 have shown that the handlespace management approach, in combination with an appropriate set storage algorithm, already achieves a good performance. As explained in section 7.3, the back-linking technique allows the predecessor or successor of any given node to be found easily. However, as explained in subsection 4.4.7, leaf-linking may achieve a further performance gain. Leaf-linking means using the tree only for indexing into a doubly-linked linear list. Inserting a new node before or after a known node into a doubly-linked linear list, or removing a known node from it, is possible in O(1) time. A balanced tree can ensure the quick lookup of nodes to insert before or after, as well as of nodes to be removed.
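The leaf-linking idea can be sketched as follows: the index is only used to locate a node, and all ordering changes happen in a doubly-linked list in O(1).

To keep the example short, a sorted Python list maintained with bisect stands in for the balanced tree. Index updates here therefore cost O(n), unlike in a red-black tree or treap. All names are hypothetical.

```python
import bisect


class _Node:
    __slots__ = ("key", "prev", "next")

    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None


class LeafLinkedSet:
    """Ordered set: an index locates nodes, a doubly-linked list keeps the order.

    A sorted Python list (maintained with bisect) stands in for the balanced
    tree index, so index updates cost O(n) here; the list (re)linking itself
    is O(1), and an in-order walk needs no tree traversal at all.
    """

    def __init__(self):
        self._keys = []    # sorted keys: stand-in for the tree index
        self._nodes = {}   # key -> list node
        self.head = None
        self.tail = None

    def insert(self, key) -> None:
        if key in self._nodes:
            return
        pos = bisect.bisect_left(self._keys, key)
        successor = self._nodes[self._keys[pos]] if pos < len(self._keys) else None
        self._keys.insert(pos, key)
        node = _Node(key)
        self._nodes[key] = node
        if successor is None:            # O(1) append at the tail
            node.prev = self.tail
            if self.tail is not None:
                self.tail.next = node
            else:
                self.head = node
            self.tail = node
        else:                            # O(1) link before the successor
            node.next = successor
            node.prev = successor.prev
            if successor.prev is not None:
                successor.prev.next = node
            else:
                self.head = node
            successor.prev = node

    def remove(self, key) -> None:
        node = self._nodes.pop(key)
        self._keys.pop(bisect.bisect_left(self._keys, key))
        if node.prev is not None:        # O(1) unlink
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev

    def successor(self, key):
        nxt = self._nodes[key].next
        return None if nxt is None else nxt.key

    def __iter__(self):
        """In-order walk along the list, e.g. for a synchronization."""
        node = self.head
        while node is not None:
            yield node.key
            node = node.next
```

The cost shows up in insert() and remove(), which must maintain the extra prev/next pointers on every update. The gain is the in-order walk in __iter__, which a full synchronization needs.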
The main benefit of the leaf-linking technique can be expected for the synchronization operation (see subsection 7.4.5). A minor speed-up may also be expected for the handle resolution operation (see subsection 7.4.4) in combination with a deterministic policy.

Figure 7.17 shows the speed-up for the synchronization, in % compared to the conventional implementation. Clearly, a significant performance improvement can be observed for scenarios of up to 10,000 PEs: up to about 85% at 2,500 PEs.

The improvement is higher in small scenarios because the ratio of average tree depth to PE number, $\frac{\text{TreeDepth}}{\text{PENumber}}$, is higher if the pool size is small. For example, it is $\frac{10}{1{,}000} = 0.01$ for 1,000 PEs, but only $\frac{16}{65{,}000} \approx 0.000246$ for 65,000 PEs. For rising ratios (i.e. more shallow trees), the number of nodes to visit for the back-linking technique described in section 7.3 can be expected to rise as well. This explains the better performance if leaf-linking is used.
For more than 7,500 PEs, the speed improvement for leaf-linked trees is still about 10%. As expected, no significant difference between treap and red-black tree is observable.
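The depth-to-size ratios quoted above can be reproduced with a one-line helper. It approximates the average depth of a balanced tree by log2 of the number of nodes, rounded to an integer, which yields the values 10 and 16 used in the text. This approximation is an assumption of the sketch: the actual depth of a red-black tree can be up to about 2 · log2(n + 1), and a treap's depth is only logarithmic on average.

```python
import math


def depth_to_size_ratio(pe_count: int) -> float:
    """TreeDepth/PENumber, approximating the tree depth by round(log2(n))."""
    return round(math.log2(pe_count)) / pe_count


if __name__ == "__main__":
    for n in (1_000, 2_500, 10_000, 65_000):
        print(n, depth_to_size_ratio(n))
```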
While leaf-linking clearly provides a significant speed-up for the synchronization operation, there is unfortunately a performance drop for the other four operations. The operations throughput decreases by about 2% for the treap and by about 5% for the red-black tree. The treap's decrease is smaller because of its less optimal balancing. That is, the depth of the treap is somewhat higher, so without leaf-linking a few more node visits are required to find a predecessor or successor node.
In summary, leaf-linking decreases the overall performance for both the treap and the red-black tree. It only speeds up the synchronization operation (which is rare), at the cost of an increased runtime for the other operations (which occur frequently).

7.8 Summary

As the measurements have shown, the appropriate algorithm for the data storage in the handlespace is the red-black tree. Maintaining the handlespace is easily feasible for realistic scenarios of up to 100,000 PEs under two conditions:

- the PR runs on a router CPU, i.e. a CPU with a computation power comparable to the 1.3 GHz Athlon used here;
- PE lifetime, re-registration and timer intervals are chosen realistically.

In this case, the handlespace management would not impose an overly significant load on the CPU.
The handlespace management approach is scalable for both the number of PEs and the number of pools. However, for a large number of pools and long PHs, adding a hash table for the pool lookup may be considered.

Chapter 8

RSerPool Performance Results

For a well-designed RSerPool scenario, component failures can usually be expected to occur very rarely. That is, the system is in normal operation in 99.9...% of its runtime. Clearly, the performance of the system in this important case is most crucial for a system's cost-benefit ratio. The goal of this chapter is therefore to evaluate the performance of RSerPool systems in scenarios without failures. Failure cases are analysed in chapter 9.

8.1 Introduction

The evaluations of an RSerPool system are based on the RSPSIM simulation model described in chapter 6. The first step is clearly to design and realize an appropriate application model, which is introduced in the subsequent sections. After that, suitable and realistic performance metrics for service provider and service user are defined (see section 8.5).

To understand the effects of various network and RSerPool system parameters, it is crucial to know the implications of different workload compositions: request frequency and duration as well as parallelism. Therefore, the initial set of simulations presented in section 8.7 evaluates these parameters first. Further simulations can then answer important questions on the RSerPool system performance:

The Policy Performance: Which pool policy achieves the best system performance, under which circumstances? Are there any pitfalls in policy configuration?

The PU-Side Handle Resolution Cache: What are the general effects of the PU-side handle resolution cache on the system performance? Under which circumstances can it provide a benefit, and how is it configured appropriately?

Coping with Network Delay: How does network delay affect the system performance? Is it possible to improve it in scenarios of globally distributed pools?

Handling Heterogeneous Server Capacities: What happens if the pool consists of PEs having different capacities? Is it possible to improve the system performance in such scenarios?

In particular, the simulations and evaluations also aim to identify critical parameter spaces, in order to provide guidelines for dimensioning efficient RSerPool systems.

8.2 The Requirements for the Application Model
Before any performance evaluations can be performed, it is first necessary to create an application model. The application model to be designed has to meet the following requirements:
1. It has to model typical availability-sensitive Internet applications and
2. Client-based state sharing (see subsubsection 3.9.5.2) for failover support has to be integrated.
To design an application model fulfilling these requirements, it is necessary to look at some common Internet applications and identify their generic behaviour.
The most common Internet application protocol is HTTP (see also Fielding et al. (1999)). A file is downloaded with the GET command, which specifies the file name and optionally a byte range (e.g. to only get the data from the 10,000th byte to the end of the file). Using the byte range option, it is possible to continue an interrupted download. By applying the concept of client-based state sharing as described in subsubsection 3.9.5.2, this task could be handled by the Session Layer. The main resources consumed by the HTTP download are clearly I/O and network bandwidth; they are shared among all currently running sessions.
Another typical availability-sensitive application is real-time distributed computing (see subsection 3.6.5). An example is the fractal graphics computation service provided by the demonstration system of the RSPLIB prototype implementation (see section 5.7 for a detailed description). It requests the computation of a fractal image from a server. In case of a server failure, the calculation can be resumed using a state cookie. That is, it already includes the concept of client-based state sharing. The main resource consumed by a distributed computing system is computation power, hence the name. This resource is shared among all currently active sessions.
In summary, most Internet applications request a certain amount of resources (e.g. bandwidth or computation power) from a server. These resources are shared among all sessions. Session resumption in case of a failover could be realized by client-based state sharing.

8.3 The Design of the Application Model
The application model has to capture the generic application properties identified in section 8.2:

- a resource is requested from a server and shared among all active sessions;
- checkpoints set by state cookies make it possible to resume an interrupted session on another server.

Since only the control path (i.e. the handling of sessions) is of interest, the data path is out of scope and therefore not included in the application model. That is, the actual transmission of result data (e.g. web pages or calculated graphics data) is omitted.
Figure 8.1 presents the message sequence of the application model protocol, the so-called CalcAppProtocol, during normal operation. A client can request the processing of a request from a server using a CalcAppRequest message.

Each request has a certain Request Size (RequestSize), which is given in the abstract unit of Calculations. A calculation denotes an arbitrary, application-specific unit. For the HTTP service the request size could denote a file size, while for a real-time distributed computing job it could mean a number of certain operations to be performed.

Requests are generated in a certain Request Interval (RequestInterval) and are appended to the Request Queue as illustrated in figure 8.2. The requests in the request queue are processed sequentially, like print jobs in a printer's queue. That is, each time the processing of a request is completed, the next one in the queue (if there is any) is provided to a newly selected server for processing.

Figure 8.1: The CalcAppProtocol Message Sequence for Normal Operation

Figure 8.2: The Request Generation and Handling at the Client Side

Figure 8.3: The Multi-Tasking Behaviour of the Server Side

A server can accept a new request if the number of currently active sessions does not exceed its configured limit MaxRequests. In this case, the acceptance of the request is signalled to the client using a CalcAppAccept message. If, on the other hand, the client receives a CalcAppReject message, the request has been rejected. Another server then has to be chosen after waiting a certain time span given by the parameter RequestRetryDelay.

This waiting period is a consequence of the experience with the real-time distributed computing system described in Dreibholz, Rathgeb and Tüxen (2005), Zhang (2004). If currently no server can accept the request, the network would otherwise be flooded with server selection (i.e. handle resolution), request and reject messages.
The Capacity (Capacity) of a server is shared among all currently active sessions in a Multi-Tasking manner. Clearly, the unit for the capacity is calculations per second, i.e. depending on the application this could e.g. mean bandwidth (bytes per second) or processing operations per second.

Figure 8.3 illustrates the handling of requests in a real-time distributed computing system. At first, request #1 is the only active request, so it can utilize the server's full capacity. When request #2 is started, it has to share the capacity, and each request is handled with half of the server's CPU power. After a third session has started, each request can only be handled with one third of the capacity, and so on.
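The Multi-Tasking behaviour of figure 8.3 corresponds to processor sharing: with n active sessions, each session is served at Capacity/n. The sketch below computes completion times for requests on one server, given as (start time, RequestSize) pairs. It ignores MaxRequests, Keep-Alives and state cookies, and the function name is hypothetical.

```python
def processor_sharing(jobs, capacity):
    """Completion times of requests on one server that splits its capacity equally.

    jobs: list of (start_time [s], size [calculations]) tuples
    capacity: server capacity [calculations/s]
    Returns the completion time of each job, in input order.
    """
    order = sorted(range(len(jobs)), key=lambda i: jobs[i][0])
    remaining = {}                     # active job index -> calculations left
    finished = [0.0] * len(jobs)
    now, k = 0.0, 0
    while k < len(order) or remaining:
        if not remaining:              # idle server: jump to the next request
            i = order[k]
            k += 1
            now = max(now, jobs[i][0])
            remaining[i] = float(jobs[i][1])
            continue
        rate = capacity / len(remaining)           # Capacity / number of sessions
        first = min(remaining, key=remaining.get)  # next job to finish
        t_done = now + remaining[first] / rate
        t_start = jobs[order[k]][0] if k < len(order) else float("inf")
        t_next = min(t_done, t_start)
        served = (t_next - now) * rate
        for j in remaining:
            remaining[j] = max(0.0, remaining[j] - served)
        now = t_next
        if t_done <= t_start:
            del remaining[first]
            finished[first] = now
        else:
            i = order[k]
            k += 1
            remaining[i] = float(jobs[i][1])
    return finished
```

Take a server with 10^6 calculations/s and two requests of 10^6 calculations each, started at 0 s and 0.5 s. processor_sharing([(0.0, 1e6), (0.5, 1e6)], 1e6) returns [1.5, 2.0]: the first request runs alone for 0.5 s and then shares the server until it finishes at 1.5 s.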
During the processing of a request, both sides (i.e. the client and the server) send CalcAppKeepAlive messages in a regular interval given by the parameter SessionKeepAliveInterval. The peer side has to acknowledge a CalcAppKeepAlive using a CalcAppKeepAliveAck message. If there is no answer within the time interval given by the parameter SessionKeepAliveTimeout, the peer side is assumed to be dead. For the server side, this simply means removing the session; the client side has to perform a failover.
To actually support a failover, client-based state sharing is used. That is, the server side sends a state cookie each time one of the two following conditions applies:
1. The number of calculations consumed by the session after sending the last cookie or starting the session has exceeded CookieMaxCalculations calculations.
2. The time elapsed after sending the last cookie or starting the session has exceeded the number of seconds given by CookieMaxTime.
Figure 8.4 presents the message sequence of a failover. After the failure of server #1 has been detected, the new server #2 is chosen. The client then attempts to resume the session by sending the state cookie to the new server in an ASAP Cookie Echo message. Server #2 accepts the request and resumes the session at the checkpoint specified by the state cookie. Clearly, the calculations processed by the old server #1 between sending its last cookie and the failure are lost and have to be processed again by the new server.
A server can perform a so-called clean shutdown by sending an ASAP Cookie directly followed by a CalcAppAbort message. That is, the client is told that the server cannot continue processing the request because it is shutting down. Due to the up-to-date state cookie, a new server does not need to re-process any calculation.
Finally, when the processing of a request is completed, it is signalled to the client by a CalcAppComplete message.

Figure 8.4: The CalcAppProtocol Message Sequence for a Failover
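The cookie rule and the resulting loss on a failover can be sketched as follows. The hypothetical SessionCheckpointing class checks both cookie conditions whenever the server accounts for newly processed calculations; a real server would additionally use a timer for the CookieMaxTime condition. On a failover, the new server resumes at the progress stored in the last cookie. Everything processed after that cookie is lost and must be repeated.

```python
from dataclasses import dataclass


@dataclass
class SessionCheckpointing:
    """Tracks state-cookie checkpoints of one CalcApp session (sketch)."""

    cookie_max_calculations: float     # CookieMaxCalculations [calculations]
    cookie_max_time: float             # CookieMaxTime [s]
    processed: float = 0.0             # calculations processed so far
    cookie_calculations: float = 0.0   # progress stored in the last state cookie
    cookie_time: float = 0.0           # time of the last cookie (or session start)

    def advance(self, now: float, calculations: float) -> bool:
        """Account for newly processed calculations; True if a cookie is sent."""
        self.processed += calculations
        if (self.processed - self.cookie_calculations > self.cookie_max_calculations
                or now - self.cookie_time > self.cookie_max_time):
            self.cookie_calculations = self.processed
            self.cookie_time = now
            return True
        return False

    def fail_over(self, now: float) -> float:
        """Resume on a new server from the last cookie; return the lost calculations."""
        lost = self.processed - self.cookie_calculations
        self.processed = self.cookie_calculations
        self.cookie_time = now  # the new server starts counting from the resumption
        return lost
```

Take CookieMaxCalculations = 5 · 10^6 and a processing rate of 10^6 calculations/s. The first cookie is sent after 6 s. A failure at 7.5 s then costs 10^6 calculations that the new server has to repeat. A clean shutdown avoids this loss because it sends a fresh cookie first.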

8.4 The Implementation of the Application Model

To actually realize a PU and PE for the calculation application, the server and client sides have been implemented as two modules for the RSPSIM simulation model:

- a Server Application Process module, as described in subsubsection 6.4.3.2 (PE module);
- a Client Application Process module, as described in subsubsection 6.4.3.3 (PU module).

That is, the server registers into a pool using the Pool Element ASAP Process module, and the client uses the Pool User ASAP Process module for server selection. The application-related parameters of PU and PE are summarized in table 8.1.

8.5 The Performance Metrics

Given the design of the application model introduced in section 8.3, there are two viewpoints on performance. The service provider's perspective refers to the pool of PEs, and the service users' perspective concerns the PUs. Therefore, the following sections define two performance metrics: one for the service provider side and one for the service user side.

Parameter                  Component  Description
Capacity                   PE         Capacity of the server [Calculations/s]
MaxRequests                PE         Maximum number of simultaneous requests
CookieMaxCalculations      PE         Maximum calculations before sending a state cookie [Calculations]
CookieMaxTime              PE         Maximum time before sending a state cookie [s]
RequestSize                PU         Size of requests [Calculations]
RequestInterval            PU         Interval of requests [s]
RequestRetryDelay          PU         Delay before retrying request distribution [s]
SessionKeepAliveInterval   both       Session Keep-Alive Interval [s]
SessionKeepAliveTimeout    both       Session Keep-Alive Timeout [s]

Table 8.1: The Calculation Application Pool Element and Pool User Parameters

8.5.1 Performance from the Service Provider's Perspective
For a service provider, the primary goal, and therefore the main performance metric, is obviously to optimize the System Utilization (see also Dreibholz and Rathgeb (2005c,e)), which is defined as
$$\text{SystemUtilization} = \frac{\text{UsedResources}}{\text{AvailableResources}} \tag{8.1}$$
That is, how many of the available calculations have actually been used to process work? In the usual case, a server pool is designed for a certain average Target System Utilization of e.g. 80%. If the utilization is lower, expensive server capacity is wasted. An excessively high utilization, on the other hand, reduces the system's capability to cope with temporary overload and its resilience in case of component failures.

A secondary goal of the service provider is to keep the pool management cost low. As described in chapter 7 and Dreibholz and Rathgeb (2005b), pool management mainly means handlespace management. The secondary performance metric of the service provider side is therefore to keep the runtime of the handlespace operations in an acceptable range.

8.5.2 Performance from the Service User's Perspective
The goal of the service user is clearly to get its requests handled as quickly as possible. That is, the performance metric on the service user side is to maximize the Handling Speed (see also Dreibholz and Rathgeb (2005c)), which is influenced by the items shown in figure 8.5:

Queuing Delay: After a request has been generated, it is enqueued into the request queue to wait until all previously created requests have been processed (see section 8.3).
Startup Delay: When the request is dequeued from the request queue, a PE has to be selected, possibly requiring an RTT between PU and PR for a handle resolution. The request then has to be proffered to the selected PE and be accepted, which requires an RTT between PU and PE. The PE may reject the request; in that case the procedure has to be repeated until a PE accepts the request.
Processing Time: The processing time is the time span between the acceptance of a request by a PE (reception of the CalcAppAccept at the PU) and its completion (reception of the CalcAppComplete message at the PU). In case of failovers, the processing time incorporates two parts:
- the actual time required to process the request, the so-called Goodput Time;
- the time for failover, i.e. selecting and contacting a new PE and re-processing calculations not saved by the latest state cookie, the so-called Failover Time.

Figure 8.5: The Definition of the Request Handling Time

The Handling Time $d_{\text{Handling}}$ for a request is defined as the sum of three components:

- the queuing delay $d_{\text{Queuing}}$ (stay in the request queue);
- the startup delay $d_{\text{Startup}}$ (dequeuing until reception of the acceptance acknowledgement from the PE);
- the processing time $d_{\text{Processing}}$ (acceptance until finish).

$$d_{\text{Handling}} = d_{\text{Queuing}} + d_{\text{Startup}} + d_{\text{Processing}} \tag{8.2}$$
Given a request's size, it is possible to calculate the request's Handling Speed (in calculations/s) as follows:
$$\text{HandlingSpeed} = \frac{\text{RequestSize}}{d_{\text{Handling}}} \tag{8.3}$$
The performance metric for the pool user is clearly to maximize the average handling speed for the requests in the system. That is, the sum of all completed requests' sizes divided by the total time consumed for their handling should be as high as possible.
To also measure the speed at which a PE processes a request of a given size, the Processing Speed is defined as follows:

$$\text{ProcessingSpeed} = \frac{\text{RequestSize}}{d_{\text{Processing}}} \tag{8.4}$$
To make handling speed and processing speed values independent of the average PE capacity, it is practical to normalize them. Therefore, the processing and handling speeds (in calculations/s) are divided by the average PE capacity (in calculations/s), and the resulting ratio is represented in %. For example, a processing speed of 100% means that the request has been processed at the full average rate. If the request has only got half of the average capacity, the processing speed is 50%.
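The definitions in equations 8.1 to 8.4 and the normalization translate directly into code. The helpers below are a minimal sketch with hypothetical names. Delays are in seconds, sizes in calculations and capacities in calculations/s; in the simulations these values would come from the collected statistics.

```python
def system_utilization(used: float, available: float) -> float:
    """Equation 8.1: fraction of the available calculations actually used."""
    return used / available


def handling_time(d_queuing: float, d_startup: float, d_processing: float) -> float:
    """Equation 8.2: queuing delay + startup delay + processing time [s]."""
    return d_queuing + d_startup + d_processing


def handling_speed(request_size: float, d_handling: float) -> float:
    """Equation 8.3 [calculations/s]."""
    return request_size / d_handling


def processing_speed(request_size: float, d_processing: float) -> float:
    """Equation 8.4 [calculations/s]."""
    return request_size / d_processing


def normalized_percent(speed: float, average_pe_capacity: float) -> float:
    """Speed relative to the average PE capacity, in percent."""
    return 100.0 * speed / average_pe_capacity


def average_handling_speed(completed: list) -> float:
    """Sum of completed request sizes divided by the sum of their handling times.

    `completed` is a list of (request_size, d_handling) pairs.
    """
    total_size = sum(size for size, _ in completed)
    total_time = sum(d for _, d in completed)
    return total_size / total_time
```

A request of 10^6 calculations that waits 0.5 s, needs 0.25 s to start and 1.25 s to process has a handling time of 2 s. At an average PE capacity of 10^6 calculations/s, its handling speed is 5 · 10^5 calculations/s, i.e. 50%. Note that average_handling_speed follows the definition above (total size over total time), not the plain mean of per-request speeds.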

Figure 8.6: The Performance Simulation Scenario

Furthermore, the size of requests can be represented by the ratio between the request size in calculations and the average PE capacity (in calculations/s), denoted as $\frac{\text{RequestSize}}{\text{peCapacity}}$. This request size:PE capacity ratio (in seconds) is the runtime needed to process a request of the given size exclusively on a PE providing the average capacity.

8.6 The Simulation Scenario Setup and Results Presentation
The simulation scenario setup used for the following performance simulations is presented in figure 8.6. It consists of one or more LAN modules (see subsection 6.4.1). Each LAN module includes a variable number of PR, PE and PU modules connected to a switch (i.e. a Transport Node module, see subsubsection 6.4.2.2). Unless otherwise specified, the following system parameters are used for the simulations presented in the subsequent sections:
An actual simulated real-time of 120 minutes, after a startup phase of 15 minutes for initializing and starting up all components (all statistics are reset after the startup phase);
No component latencies (the latency of the handlespace management is negligible, as shown in section 7.4 as well as in section 7.5 and in Dreibholz and Rathgeb (2005b));
Latency-free network links (the effects of network delay are evaluated in subsection 8.10.1);
A network consisting of a single LAN module (LANs interconnected via WAN links are evaluated in subsection 8.10.1);
1 PR (since failure scenarios are not evaluated here; see section 8.9 for the effects of varying the number of PRs);
10 PEs, each providing a capacity of 10^6 calculations/s (heterogeneous server capacity scenarios are evaluated in section 8.12);
The number of simultaneous requests per PE is not limited, i.e. requests will not be rejected;
A target system utilization (see subsection 8.5.1) of 80%;
Negative exponential distribution for request intervals and request sizes, since the goal is a generic parameter sensitivity analysis which is independent of specific application types (see subsection 8.5.1 and …);
The PU-side handle resolution cache is turned off (i.e. the stale cache value is set to 0 s; the cache parameters are evaluated in subsection 8.11.1);
For the load-based pool policies, Load is defined as the current number of simultaneously handled requests; a re-registration is performed on every load change.
Every simulation consists of 24 runs, each using a different seed value, to ensure sufficient statistical accuracy. The resulting plots have been generated using GNU R as described in subsection 6.3.2 and show the average values and their 95% confidence intervals. Unless otherwise specified, the results presentation in this and the following chapter is structured as follows:
Each results figure contains two plots: the left-hand plot shows the average system utilization curves (provider's performance metric), and the right-hand plot presents the average request handling speed (user's performance metric). The axis colour is unique for each output unit, in order to differentiate between comparable and non-comparable plots.
The handling speed is normalized by the average PE capacity and presented in percent. For example, a request handled with a speed of 7.5 · 10^5 calculations/s at an average PE capacity of 10^6 calculations/s yields 75%.
The legend shows the variable settings for each curve of the plot. The bindings of the variables and their units in square brackets (if useful) can be found right-justified above the plot.
To enhance clarity, different settings of the first variable result in different colours/shades of the corresponding curves. Different settings of a second variable are represented by different line styles (e.g. solid and dotted).
For example, the plots of figure 8.8 include the following two variables:
1. The pool policy p (with the policy name, of course) and
2. The request size:PE capacity ratio s (in seconds).

8.7 Understanding the Impact of the Load Parameters
Evaluating the performance of RSerPool systems requires a good understanding of the influence of the different load parameters. Therefore, this section first quantifies load and then shows initial simulation results to identify critical parameter ranges. These results have also been published in Dreibholz and Rathgeb (2005c).

8.7.1 Quantifying Workload
As described in section 8.3, two PU-side parameters define the load created by a PU:
The request size (RequestSize) and
The request interval (RequestInterval).

Figure 8.7: The Coherence of the Three Workload Parameters

The number ratio between PUs and PEs is called the PU:PE Ratio (puToPERatio). Given this ratio and the (average) capacity per PE, it is possible to calculate the system's utilization:
$$\text{SystemUtilization} = \text{puToPERatio} \cdot \frac{\text{RequestSize}}{\text{RequestInterval} \cdot \text{peCapacity}} \tag{8.5}$$
The load fraction generated by a single PU is given by the following formula:
$$\text{puLoad} = \frac{\text{RequestSize}}{\text{RequestInterval} \cdot \text{peCapacity}} \tag{8.6}$$
A well-designed RSerPool system is provisioned for a certain target system utilization (denoted as TargetSystemUtilization, e.g. 60% or 80%) as described in section 8.5. Assume a fixed average PE capacity (i.e. peCapacity) and given values for any two of the three workload parameters (i.e. puToPERatio, RequestSize and RequestInterval). The third parameter can then be calculated using equation 8.5. This leads to the following three equations:
$$\text{RequestInterval} = \frac{\text{puToPERatio} \cdot \text{RequestSize}}{\text{TargetSystemUtilization} \cdot \text{peCapacity}} \tag{8.7}$$
$$\text{puToPERatio} = \frac{\text{RequestInterval} \cdot \text{TargetSystemUtilization} \cdot \text{peCapacity}}{\text{RequestSize}} \tag{8.8}$$
$$\text{RequestSize} = \frac{\text{RequestInterval} \cdot \text{TargetSystemUtilization} \cdot \text{peCapacity}}{\text{puToPERatio}} \tag{8.9}$$
Figure 8.7 provides examples that make the coherence among the three workload parameters clear:

- the left-hand side shows the request interval based on equation 8.7;
- the middle part shows the PU:PE ratio based on equation 8.8;
- the right part shows the request size based on equation 8.9.

Note that the request size has been normalized by the (average) PE capacity to make it independent of the capacity setting.
In the following three subsections, the performance of RSerPool systems is evaluated for different policies and variations of the workload parameters described above.
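Equations 8.5 to 8.9 can be written down as plain functions. This is useful, for example, to derive the request interval for each (r, s) pair of the following simulation. The function names are hypothetical. Units follow the text: sizes in calculations, intervals in seconds, capacities in calculations/s, utilizations as fractions.

```python
def workload_utilization(pu_to_pe_ratio: float, request_size: float,
                         request_interval: float, pe_capacity: float) -> float:
    """Equation 8.5: system utilization created by the PUs."""
    return pu_to_pe_ratio * request_size / (request_interval * pe_capacity)


def pu_load(request_size: float, request_interval: float, pe_capacity: float) -> float:
    """Equation 8.6: load fraction generated by a single PU."""
    return request_size / (request_interval * pe_capacity)


def request_interval_for(pu_to_pe_ratio: float, request_size: float,
                         target_utilization: float, pe_capacity: float) -> float:
    """Equation 8.7."""
    return pu_to_pe_ratio * request_size / (target_utilization * pe_capacity)


def pu_to_pe_ratio_for(request_interval: float, request_size: float,
                       target_utilization: float, pe_capacity: float) -> float:
    """Equation 8.8."""
    return request_interval * target_utilization * pe_capacity / request_size


def request_size_for(request_interval: float, pu_to_pe_ratio: float,
                     target_utilization: float, pe_capacity: float) -> float:
    """Equation 8.9."""
    return request_interval * target_utilization * pe_capacity / pu_to_pe_ratio
```

For example, consider r = 10 PUs per PE, a request size of 10^6 calculations (s = 1 s at 10^6 calculations/s) and a target utilization of 80%. Then request_interval_for(10, 1e6, 0.8, 1e6) gives a request interval of 12.5 s per PU. Plugging that interval back into workload_utilization yields 0.8 again.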

8.7.2 Variation of PU:PE Ratio and Request Size
For the first simulation, the PU:PE ratio r has been varied from 1 to 20 for request size:PE capacity ratios s from 1 to 100. For each pair of values, the request interval has been calculated using equation 8.7 from subsection 8.7.1 (see the left part of figure 8.7 for an illustration). The results of the simulation are presented in figure 8.8.

Figure 8.8: The Variation of PU:PE Ratio and Request Size

8.7.2.1TheImpactontheSystemUtilization
Asshown,thePU:PEratior–givingthedegreeofparallelisminrequesthandling–hasasignificant
impactontheutilization:atr=1,theutilizationisat53%fortheRandompolicyandbetween60%and
for65%theforpoliciesRoundRobin.becomesUsingsignificantlytheLeastsmallerUsedifpolicry,itincreases:nearlyforrreaches=5,the80%.difTheferenceutilizationisaboutdif6%ferenceand
forr=10,itdecreasestoabout3%.
The reason for this behaviour (i.e. a decreasing utilization difference for a rising PU:PE ratio r) is the number of requests processed simultaneously by the PEs: at a PU:PE ratio of r=1, there should be exactly one PE for every PU. That is, each PU expects to occupy a PE exclusively, which processes its requests during 80% (target utilization) of its runtime (see also equation 8.5 and equation 8.6). Each time a bad PE is selected for a request, one PE remains idle while another one has to split its capacity to handle two requests simultaneously. Obviously, this behaviour is most frequent if PEs are chosen randomly. For Round Robin selection, the PE just selected should be chosen again only after every other PE has been used. This method already achieves a significant improvement over the Random policy. Finally, Least Used has knowledge of the PEs' current load states; therefore, except for the rare cases of simultaneous selection, the least-loaded PE can be used. This is the reason for the superior performance of the Least Used policy.
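The effect of simultaneous selections at r=1 can be illustrated with a deliberately simplified snapshot model in Python. This is an assumption for illustration, not the thesis simulation: n PUs each pick a PE at the same moment, and only the number of PEs left without any request is counted. With uniform random choice, the expected idle fraction is (1 − 1/n)ⁿ, which approaches 1/e ≈ 37%; a single Round Robin selector leaves no PE idle. The function names are hypothetical.

```python
"""Toy snapshot: n PUs pick a PE at the same instant (PU:PE ratio r = 1).

Simplified model, not the thesis simulation: it only counts how many PEs
are left idle by one round of simultaneous selections.
"""
import random


def idle_pes_random(n: int, rng: random.Random) -> int:
    """Each of n PUs picks one of n PEs uniformly at random."""
    chosen = {rng.randrange(n) for _ in range(n)}
    return n - len(chosen)


def idle_pes_round_robin(n: int) -> int:
    """A single selector hands out PEs in list order, wrapping around."""
    pointer = 0
    chosen = set()
    for _ in range(n):
        chosen.add(pointer)
        pointer = (pointer + 1) % n
    return n - len(chosen)


if __name__ == "__main__":
    rng = random.Random(42)
    n = 1000
    print(idle_pes_random(n, rng) / n)  # close to (1 - 1/n)**n, about 0.37
    print(idle_pes_round_robin(n))      # 0
```

The snapshot ignores queuing and load feedback; it only shows why collisions (an idle PE next to a doubly loaded one) are most frequent with random selection.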
Observing the utilization for a variation of the request size:PE capacity ratio s, only minor differences are shown. Even for a change of two orders of magnitude as presented in figure 8.8 (left-hand side), the utilization between s=1 (solid lines) and s=100 (dotted lines) only decreases by about 3% for the Least Used policy, up to about 4% for Round Robin and about 1% for Random. The reason for the utilization decrement is that longer requests increase the impact duration of the selection decision: for example, a heavily-loaded PE may become idle within the next few seconds, while a 75%-loaded one (putatively the appropriate choice) may stay at this load level for quite some time. This effect is illustrated in figure 8.9: bad selections for large requests (left-hand side) may last for quite some time, while the effect of such a selection for small requests (right-hand side) is short. An analogy is presented in figure 8.10: it is more wasteful to store large stones in a bucket than to store small pebbles, although the volumes of both sets of stones are equal. Clearly, the probability of a


CHAPTER 8. RSERPOOL PERFORMANCE RESULTS

Figure 8.9: A Request Scheduling Example

Figure 8.10: An Analogy for Request Scheduling

non-optimal assignment is highest for the Random policy and lowest for Least Used, explaining the performance differences among these policies.

8.7.2.2 The Impact on the Handling Speed

The results for the handling speed shown in figure 8.8 (right-hand side) reflect the results for the system utilization: since the per-PU load (see equation 8.6) becomes higher with a lower PU:PE ratio, a bad selection decision leads to queuing of requests. Clearly, this request jam contributes significantly to the handling speed loss.
While the utilization decreases for higher request sizes s, the handling speed increases: for example, 100 requests of size s=1 are affected 100 times by the queuing delay, while one large request of size s=100 is affected only once, for the same number of calculations to be processed. Clearly, this results in a significant handling speed gain.
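The queuing argument can be made concrete with a back-of-the-envelope calculation. The Python sketch below assumes, as a simplification in the spirit of equations 8.2 and 8.3, that each request pays a fixed overhead (queuing plus startup delay) in addition to its processing time, and that handling speed is request size divided by handling time. The overhead of 0.5s is a made-up value and the function name is hypothetical.

```python
"""Why larger requests gain handling speed: the per-request overhead is paid once.

Assumption (simplification of equations 8.2/8.3): every request suffers the
same fixed overhead, and handling speed = request size / handling time.
"""


def handling_speed(request_size: float, pe_capacity: float,
                   overhead_s: float) -> float:
    """Request size divided by (overhead + processing time)."""
    processing_s = request_size / pe_capacity
    return request_size / (overhead_s + processing_s)


if __name__ == "__main__":
    capacity = 1.0  # normalized: a request of size s takes s seconds
    overhead = 0.5  # hypothetical queuing + startup delay per request
    # 100 requests of s = 1 versus one request of s = 100: same total work
    print(handling_speed(1.0, capacity, overhead))    # about 0.667
    print(handling_speed(100.0, capacity, overhead))  # about 0.995
```

The same total work is handled at roughly two thirds of the PE speed when split into small requests, but at almost full speed as one large request.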
Unlike the utilization curves for the different policies, the handling speed does not converge to a certain value for all policies if the PU:PE ratio r becomes high enough. Instead, a significant gap remains among the handling speeds of the Random, Round Robin and Least Used policies: Random and Round Robin frequently make bad assignments, leading to low handling speeds and therefore to longer request queues of the PUs. While the per-PU load (see equation 8.6) decreases with r, and therefore lowers the penalty of queuing delay (leading to an improved handling speed), the probability to make non-optimal selection decisions remains significantly higher for Random and Round Robin than for the Least Used policy.


Figure 8.11: The Variation of Request Size and Request Interval


8.7.3 Variation of the Request Size and Request Interval
In the next simulation, the request size:PE capacity ratio s has been varied between 5 and 100 for values of the request interval i between 1s and 500s. Using equation 8.8 described in subsection 8.7.1, the PU:PE ratio has been calculated. Note that the smaller the request size, the higher the PU:PE ratio (see also the middle part of figure 8.7). The results of the simulation are shown in figure 8.11.
Obviously, the utilization is highest for small values of the request size:PE capacity ratio s, since here the PU:PE ratio r is highest. As already observed in the analysis of the PU:PE ratio in subsection 8.7.2, the difference among the policies becomes small for high values of r, due to the parallelism of the request handling. Setting the request interval i to a higher value leads to a higher PU:PE ratio r. Therefore, the utilization decrement rate for rising s becomes smaller. Compare the results for i=50 (solid lines) with the curves for i=500 (dotted lines) at s=25: the PU:PE ratio at s=25 has already reached 1 for i=50 and therefore the gap among the three policies as explained in subsection 8.7.2 can be observed. On the other hand, the PU:PE ratio r is 16 for i=500; therefore, the difference among the policies is fairly small at only about 2%.
The handling speed results shown on the right-hand side of figure 8.11 reflect the observations for the utilization curves. The PU:PE ratio r sinks with a rising request size s, i.e. the possibility to compensate bad selections by parallelism becomes smaller and the handling speed decreases. Obviously, this effect becomes smaller for higher request intervals i. Again, as explained in subsection 8.7.2, there is a huge difference among the three policies, due to their different selection decision qualities.

8.7.4 Variation of the Request Interval and PU:PE Ratio
The request interval is the third and last workload parameter. It has been evaluated in the range from 1s to 500s for PU:PE ratios from 1 to 10. The request size has been calculated using equation 8.9 described in subsection 8.7.1; the higher the request interval, the higher the request size (see also the right-hand side of figure 8.7). Figure 8.12 presents the results of the simulation.
With increasing request interval i, the request size:PE capacity ratio s also increases. As already explained in subsection 8.7.2 and subsection 8.7.3, this leads to a small decrement of the utilization,



Figure 8.12: The Variation of Request Interval and PU:PE Ratio

due to the longer durability of selection decisions; in particular, it extends the impact of bad choices. Obviously, the most influential factor is the PU:PE ratio r: while the policies' utilizations for r=1 differ in the range of about 20% to 25%, the variation already sinks to about 10% for r=3, due to the parallelism explained in subsection 8.7.2.
The handling speed curves presented on the right-hand side of figure 8.12 mainly reflect the results for the system utilization. But while the utilization slightly decreases, the handling speed slowly increases. The reason is that s increases with i. Therefore, as explained in detail in subsection 8.7.2, the number of requests decreases and the processing time $d_{\mathit{Processing}}$ in equation 8.2 gains more importance over the queuing delay $d_{\mathit{Queuing}}$. This implies a handling speed improvement (see also equation 8.3). Note that the handling speed curves for Round Robin and Random at r=1 exceed the curves at r=3 for sufficiently high values of i. Here, the gain by larger (and therefore fewer) requests at r=1 outweighs the gain by parallelism at r=3.

8.7.5 Summary
In summary, three important workload parameters have been identified: the PU:PE ratio, the request size and the request interval. For a given target system utilization and PE capacity, any one of these parameters can be calculated if values for the other two are provided (using equation 8.5).
It has been shown that the PU:PE ratio is the most critical workload parameter. A small PU:PE ratio means a high per-PU load; appropriately scheduling the requests in this case is most difficult and bad PE selections lead to a reduced system utilization and handling speed. Varying the request size:PE capacity ratio together with the PU:PE ratio, it can be observed that the utilization for longer requests is worse (since it is even more difficult to achieve a good scheduling), but the handling

8.8. THE FALLACIES AND PITFALLS OF ROUND ROBIN SELECTION


speed increases (since a single but long request is affected by queuing delay only once, while n small requests are affected n times). The performance results for the two other workload parameter combinations can be deduced.
Comparing the three evaluated pool policies, it has been shown that usually the adaptive Least Used policy provides a better performance than the non-adaptive Round Robin, and Round Robin still achieves a better performance than Random. Since Round Robin is a very simple, non-adaptive policy, it seems to be a good choice to make it the default policy of RSerPool (i.e. the policy that must be supported by every RSerPool implementation). However, the Round Robin policy has some pitfalls that must be taken care of.

8.8 The Fallacies and Pitfalls of Round Robin Selection
This section presents effects of the Round Robin based policies that lead to a severe service degradation in case of unsuitable parameter choices.

8.8.1 Pitfalls of the Round Robin List Pointer
The first performance degradation effect, which has been published by us in Dreibholz, Rathgeb and Tüxen (2005), occurs if the Round Robin policy is used with an inappropriate setting of the MaxIncrement parameter. As explained in subsection 7.4.4, this parameter specifies (in case of the Round Robin policy) by how many steps the selection pointer into the Round Robin list is advanced when k PE identities of the pool are returned by the PR upon a handle resolution request. That is, if the PR returns k ≤ MaxHResItems PE identities, the pointer is advanced by only l = min(k, MaxIncrement). Considering Round Robin selection, the naïve assumption is clearly to use a configuration of MaxIncrement = MaxHResItems and therefore to advance the list pointer by as many elements as actually returned. However, this can lead to a severe performance degradation, as shown in figure 8.13 for varying MaxIncrement in pools of 10 and 24 PEs with a setting of MaxHResItems = ∞ (i.e. all PE identities of the pool are returned upon a handle resolution request). This example simulation has used a target system utilization of 60%, a request size of 10⁷ calculations (negative exponential distribution) and a PU:PE ratio of 2.
The reason for the observed systematic performance drops is the behaviour of the PUs: they do not use their cache, i.e. their stale cache value has been set to 0s. Then, only the first PE of the list returned by the PR is actually used for processing a request. That is, if the number of PEs in the pool is an integer multiple of the number of entries in the list sent back by the PR, specific PEs will be systematically overloaded while the others will hardly be used. Assume, e.g., that the pool consists of the pool elements PE #1 to PE #6 and that the configured number of PE identities delivered by the PR in a handle resolution response (MaxHResItems) is 3. Now, the first handle resolution query to the PR returns the set {PE #1, PE #2, PE #3}, the following one will return {PE #4, PE #5, PE #6}. Then, new handle resolutions again start with {PE #1, PE #2, PE #3} and so on. From the list received from the PR, the PU selects one PE to establish the application connection to. According to Round Robin selection, this selected PE will always be the first one of the list. That is, only PE #1 and PE #4 will be used to handle requests while the four other PEs remain idle. In the worst case, the pool size is smaller than MaxHResItems. Then, the reply is always the same (i.e. the complete pool). The example shown in figure 8.13 makes this degradation effect clear: for the pool of 10 PEs,



Figure 8.13: The Performance for a Fixed Step Size

the worst case is clearly MaxIncrement ≥ 10. Note that MaxIncrement settings above 10 cause no changes, since it is not possible to return more than 10 elements. A local performance minimum can be observed at MaxIncrement=5. In this case, only the first and the sixth PE are used. Setting MaxIncrement to an even number results in using a set of 5 PEs only (the odd-numbered PEs). The same effect can be observed for the pool of 24 PEs if MaxIncrement shares a common divisor greater than 1 with 24. In particular, local performance minima can be found at a setting of 12 (only two PEs are used) as well as 6 and 18 (only four PEs will be used).
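The pointer arithmetic above can be reproduced with a short model. This Python sketch is an illustration under stated assumptions, not RSPLIB code: a single PR, PUs that ignore their cache and always contact the first PE of a response, and PEs numbered from 1 in list order; the function `used_pes` and its parameters are hypothetical. The optional `rng` argument draws the step uniformly from {1, ..., min(k, MaxIncrement)}, which is one possible way to realize the randomized step size described in the following paragraph.

```python
"""Model of the PR-side Round Robin list pointer (subsection 8.8.1).

Assumptions: one PR; PUs do not cache and always contact the first PE of a
handle resolution response; PEs are numbered 1..pool_size in list order.
"""
import math
import random
from typing import Optional, Set


def used_pes(pool_size: int, max_hres_items: Optional[int], max_increment: int,
             resolutions: int = 1000,
             rng: Optional[random.Random] = None) -> Set[int]:
    """Return the PE numbers that PUs actually contact.

    max_hres_items=None stands for MaxHResItems = infinity. If rng is given,
    the step is drawn uniformly from 1..min(k, MaxIncrement) per resolution.
    """
    k = pool_size if max_hres_items is None else min(max_hres_items, pool_size)
    limit = min(k, max_increment)  # l = min(k, MaxIncrement)
    pointer = 0
    used = set()
    for _ in range(resolutions):
        response = [(pointer + j) % pool_size for j in range(k)]
        used.add(response[0] + 1)  # the PU takes the first entry
        step = rng.randint(1, limit) if rng is not None else limit
        pointer = (pointer + step) % pool_size
    return used


if __name__ == "__main__":
    print(sorted(used_pes(6, 3, 3)))      # [1, 4]
    print(sorted(used_pes(10, None, 5)))  # [1, 6]
    print(len(used_pes(24, None, 18)))    # 4
    print(len(used_pes(24, None, 1)))     # 24
    # Deterministic stepping contacts pool_size / gcd(pool_size, l) PEs
    print(24 // math.gcd(24, 18))         # 4
    print(len(used_pes(10, None, 10, rng=random.Random(1))))
```

In this model, the number of contacted PEs is pool_size / gcd(pool_size, l), which is why 6, 12 and 18 are bad settings for 24 PEs, while a step of 1 always reaches every PE.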
There are two possible approaches to solve the problem of the Round Robin degradation: to set MaxIncrement=1, i.e. to always advance the list pointer by one, or to randomize the number of steps. The first case is already shown in figure 8.13: here, it can be observed that a setting of 1 for MaxIncrement solves the problem. That is, the system utilization is at 60% as desired and the best possible handling speed is reached. In the second approach, the pointer advancement is randomized by choosing a random step size σ ∈ {1, ..., MaxIncrement} for each handle resolution call. As shown by the results presented in figure 8.14, the utilization becomes independent of the MaxIncrement setting. However, the handling speed is negatively affected by settings larger than 1. The reason for this effect is that for σ > 1, at least one PE is skipped in the current Round Robin round. That is, the round finishes earlier, which increases the probability that an already loaded PE of the list gets another request. Due to the increased load of this PE, all of its requests receive a decreased processing speed. This leads to a reduction of the average handling speed. It is important to note that using the PU-side cache creates additional instances which independently perform Round Robin selections. That is, it is not possible to achieve a global Round Robin selection in this case. Nevertheless, setting MaxIncrement=1 or randomizing the number of steps ensures that each selection component performs its own, local Round Robin selection, using a starting point different from the others.
At the time of the simulations, the RSerPool standards documents had only specified some policy names; the details on how to implement and appropriately configure them had been missing. As a result of the experience in implementing the policies (as part of the handlespace management approach


Figure 8.14: The Performance for a Randomized Step Size


described in chapter 4) and configuring them, the Internet Draft Tüxen and Dreibholz (2005) (Individual Submission) has been submitted to the IETF. This document has included an exact definition of the policies as well as of the pitfalls to avoid (in particular, the Round Robin problem described above). After a discussion at the 60th IETF Meeting 2004, this document has become the Working Group Draft Tüxen and Dreibholz (2006b) of the IETF RSerPool WG and is now an official part of the RSerPool standards documents.

8.8.2 Pitfalls of the Weighted Round Robin Policy
The degradation effect of the Round Robin policy can also be transferred to the Weighted Round Robin policy. But this degradation effect is not the only problem of Weighted Round Robin; this policy induces additional problems which have been explained by us in Dreibholz and Rathgeb (2005e): considering the policy performance results for homogeneous server capacity scenarios as shown in section 8.7, Round Robin clearly provides a better performance than the Random policy. Unfortunately, it is a fallacy to assume that the Weighted Round Robin policy provides a better performance than Weighted Random in scenarios of heterogeneous server capacities (to be evaluated in section 8.12).
The first problem of Weighted Round Robin is its systematic selection. Consider a pool of 3 PEs with weights $w_{PE1}=2$, $w_{PE2}=3$ and $w_{PE3}=10$. In this case, the selection order of a Weighted Round Robin round would be as follows:
$$\underbrace{PE_1 \Rightarrow PE_2 \Rightarrow PE_3}_{2\ \text{times}} \Rightarrow \underbrace{PE_2 \Rightarrow PE_3}_{\text{once}} \Rightarrow \underbrace{PE_3}_{7\ \text{times}}$$
Obviously, after PE1 and PE2 have been selected their given number of times, the Weighted Round Robin selection overloads PE3. That is, if a PR selects PE3 for 8 different PUs in a row, 8 PUs simultaneously use PE3 while other PEs may remain idle.
The second problem of the Weighted Round Robin policy is that the weights have to be integers: let PE5 have 1.75 times and PE6 3.5 times the capacity of PE4. In this case, the



Figure 8.15: The Effects of Varying the Number of Registrars for the Round Robin Policy

resulting weights could be $w_{PE4}=4$, $w_{PE5}=7$ and $w_{PE6}=14$. At the end of the selection round, PE6 would be selected 8 times consecutively. Clearly, the higher the factor needed to extend the weights to integers, the worse the selection problem described above.
Simulations for scenarios of heterogeneous server capacity distributions, which will be presented later in section 8.12, have shown that the performance of Weighted Round Robin is much worse than the results for Round Robin and even for Random (except for the homogeneous case, where Weighted Round Robin is equal to Round Robin). Since Weighted Round Robin is completely useless in the described scenarios, curves for this policy have been omitted.
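Both problems can be reproduced with a few lines of Python. The sketch assumes one particular interleaved Weighted Round Robin scheme (visit the PEs in list order and skip a PE once it has used up its weight), which reproduces the selection order shown above; real implementations may differ. The helper names are hypothetical, and `math.lcm` with several arguments requires Python 3.9 or later.

```python
"""Interleaved Weighted Round Robin order and integer weight scaling.

Assumption: a WRR round visits the PEs in list order and skips a PE once it
has been chosen `weight` times. This reproduces the order shown above, but
it is only one of several WRR variants. Requires Python 3.9+ (math.lcm).
"""
import math
from fractions import Fraction
from typing import List, Sequence


def wrr_round(weights: Sequence[int]) -> List[int]:
    """Return the 1-based PE numbers selected during one WRR round."""
    remaining = list(weights)
    order = []
    while any(left > 0 for left in remaining):
        for index, left in enumerate(remaining):
            if left > 0:
                order.append(index + 1)
                remaining[index] -= 1
    return order


def longest_run(order: Sequence[int]) -> int:
    """Longest stretch of consecutive selections of the same PE."""
    if not order:
        return 0
    best = current = 1
    for previous, item in zip(order, order[1:]):
        current = current + 1 if item == previous else 1
        best = max(best, current)
    return best


def integer_weights(capacity_ratios: Sequence[float]) -> List[int]:
    """Scale relative capacities to the smallest proportional integers."""
    exact = [Fraction(str(ratio)) for ratio in capacity_ratios]
    scale = math.lcm(*(f.denominator for f in exact))
    scaled = [int(f * scale) for f in exact]
    divisor = math.gcd(*scaled)
    return [value // divisor for value in scaled]


if __name__ == "__main__":
    order = wrr_round([2, 3, 10])
    print(order)               # [1, 2, 3, 1, 2, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3]
    print(longest_run(order))  # 8
    weights = integer_weights([1.0, 1.75, 3.5])  # PE4 : PE5 : PE6
    print(weights)                          # [4, 7, 14]
    print(longest_run(wrr_round(weights)))  # 8
```

In both examples, the PE with the largest weight ends each round with a run of 8 consecutive selections, which is exactly the burst that overloads it.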

8.8.3 Summary
As a guideline on the usage of the Round Robin policy, the MaxIncrement parameter of the PR should be set to 1. This not only avoids the degradation problem, but also results in a performance gain for the handlespace management, as shown in subsection 7.4.4 and subsection 7.5.4. Appropriately configured, the Round Robin policy usually provides a better performance than Random, as shown in section 8.7. Therefore, it is justified to make Round Robin the default policy of RSerPool, i.e. the only policy whose implementation is mandatorily required by the policies draft Tüxen and Dreibholz (2006b).
Although the Round Robin policy provides a better performance than Random, it is a pitfall to assume that Weighted Round Robin performs better than Weighted Random. If Weighted Round Robin is to be used, extreme care has to be taken of the weight settings (which must be integers) and of its systematic selection behaviour.

8.9 The Impact of the Number of Registrars
An important parameter for redundancy is the number of PRs. In section 8.6, it has been noted that this number does not significantly influence the system performance. This assertion has to be substantiated.

8.10. THE CHALLENGE OF NETWORK DELAY


PRs synchronize their handlespace copies using the ENRP protocol. Neglecting the network delay (analysed later in section 8.10), it is obvious that the number of PRs does not affect the Least Used and Random policies, since they are stateless (see also subsection 3.11.1). However, Round Robin is stateful, i.e. the selection of the next PE depends on the previous choices. As already shown in subsection 8.8.1, the system performance of the Round Robin policy highly depends on the setting of MaxIncrement: some PEs may be systematically skipped, while others are overloaded.
In order to show the impact of the MaxIncrement setting when the number of PRs is increased, a simulation has been performed. The number of PRs has been varied between 1 and 10 for different settings of MaxIncrement. The PU:PE ratio has been 25, the request size:PE capacity ratio has been 10 and the simulated real-time has been 60 minutes. PEs and PUs have been distributed equally among the PRs. The results of the simulation for the Round Robin policy are presented in figure 8.15.
Obviously, setting MaxIncrement to 1 is also useful to avoid the Round Robin problem (explained in subsection 8.8.1) when scaling the number of PRs. In this case, the utilization remains unaffected, while only the handling speed slightly decreases with the number of PRs. The reason for this decrease is that the round robin selection on independent components differs from a global round robin selection. On the other hand, using inappropriate MaxIncrement settings, the performance severely suffers: PEs are again systematically skipped, depending on the value of MaxIncrement and the number of PRs. An interesting observation is the peak for MaxIncrement=2 and 2 PRs: each PR has its own Selection Set, ordered by PE sequence number (see subsubsection 4.4.2.2). That is, the 2 PRs use distinctly sorted round robin lists. So, PEs skipped by the first PR can be selected by the other one. This results in the good performance of this parameter set. However, this is not a common case, as shown for other settings of the number of PRs and MaxIncrement.
The performance results of Least Used and Random are not affected by the number of PRs. Therefore, plots for these policies have been omitted.

8.10 The Challenge of Network Delay
After having evaluated the performance of different pool policies in scenarios without network delay, latency is now added to the links as described in section 8.6 and the effects are evaluated.

8.10.1 General Effects of Network Delay
The first simulation with network delay enabled (based on our paper Dreibholz and Rathgeb (2005c)) shows the general effects introduced by the latency. Network latency has the highest impact on the system performance in the following two cases:
1. The PU:PE ratio r is small, so that the per-PU load (see equation 8.6) is large and
2. The ratio between delay and request size:PE capacity ratio s is high.
The first case reduces the system's capability to absorb the impact of bad selection decisions (see subsection 8.7.2); in the second case, the delay decreases the handling speed (see equation 8.3). A simulation for a simulated real-time of 60 minutes has been performed in order to present these effects: the ratio between component RTT and request size:PE capacity ratio, RTT/s, has been varied between 0.0 and 1.0, for the PU:PE ratio r between 1 and 3. The request size:PE capacity ratio s has been 1. Figure 8.16 presents the results.



Figure 8.16: The General Effects of Network Delay

Obviously, the PU:PE ratio r has the main impact on the utilization for a rising delay: while the curves only slightly decrease for r=3, there is a steep descent for r=1. As explained in subsection 8.7.2, the lower the PU:PE ratio r, the higher the capacity share a single PU expects from its PE. That is, at r=1 a PE is expected to exclusively provide 80% of its capacity to request processing. Only 20% remain to absorb both the delay of communications and the speed degradation due to simultaneously handled requests.
An important observation can be made for Least Used: while the utilization of Least Used converges to the curve of Round Robin for r=1, it converges to the curve of Random for higher values of r. The reason is clear: while for r=1 Round Robin provides the best chance to get an unloaded PE, the probability of Round Robin to make a good choice decreases with rising parallelism (see subsection 8.7.2).
The handling speed results, presented on the right-hand side of figure 8.16, reflect the observations for the utilization. Clearly, the network delay has a significant impact on the handling speed for small settings of s (here: s=1), since it affects the handling speed (see equation 8.2 and equation 8.3) by the communication of PU and PR (handle resolution) and of PU and PE (request handling).
In summary, the impact of network delay on the system utilization is small, except for the critical parameter range of a small PU:PE ratio. However, there is a significant effect on the handling speed in case of request sizes having a processing time (see equation 8.4) of about or less than the latency. For this parameter range, counter-measures seem to be useful.

8.10.2 Creating Distance-Aware Policies
In general, if the requests are short, it is useful to keep the network latency between PU and PE as small as possible. But critical applications may require their pools to be distributed globally, in order to continue their service even in case of localized disasters (e.g. earthquakes or tsunamis). An example is provided in figure 8.17: if some PEs in Asia become unavailable (here: 2 PEs), it is still possible to compensate the reduced capacity by PEs in the other regions. Furthermore, it is possible to dynamically increase the capacity in the other regions (here: one PE in America and one PE in Europe).


Figure 8.17: A Scenario of Globally Distributed RSerPool Components


Clearly, it is necessary to define policies which incorporate a distance measure. But in order to define such policies, it is obviously necessary to quantify distance first.

8.10.2.1 How to Quantify Distance?
For the description of the distance, two approaches have been considered: the usage of geographical position information and delay measurements. Both approaches will be discussed in the following.

Geographical Distance
Of course, the most obvious approach to describe a distance is to obtain the geographical coordinates of the components and calculate the distances among them. That is, given two components A and B with their latitudes δA and δB as well as their longitudes λA and λB, it is simply possible to calculate the orthodrome L (i.e. the length of the shortest path on the earth's surface) as follows:
$$\zeta = \arccos\left(\sin\delta_A \sin\delta_B + \cos\delta_A \cos\delta_B \cos(\lambda_B - \lambda_A)\right)$$
$$L = \frac{\zeta}{360^{\circ}} \cdot 40{,}000\,\mathrm{km}$$
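The two formulas translate directly into code. A minimal Python sketch, assuming coordinates in degrees and the 40,000km circumference used above; the function name and test coordinates are illustrative only. The argument of arccos is clamped to [−1, 1], since floating-point rounding can push it marginally outside that range.

```python
"""Great-circle (orthodrome) distance as in the two formulas above.

Assumptions: coordinates in degrees; the earth's circumference is
approximated by 40,000 km as in the text.
"""
import math


def orthodrome_km(lat_a: float, lon_a: float, lat_b: float, lon_b: float) -> float:
    """Length of the shortest path on the earth's surface between A and B."""
    da, db = math.radians(lat_a), math.radians(lat_b)
    dlon = math.radians(lon_b - lon_a)
    cos_zeta = math.sin(da) * math.sin(db) + math.cos(da) * math.cos(db) * math.cos(dlon)
    # Clamp against rounding errors slightly outside [-1, 1]
    zeta_deg = math.degrees(math.acos(max(-1.0, min(1.0, cos_zeta))))
    return zeta_deg / 360.0 * 40000.0


if __name__ == "__main__":
    print(round(orthodrome_km(0.0, 0.0, 0.0, 90.0)))    # 10000 (quarter of the equator)
    print(round(orthodrome_km(90.0, 0.0, -90.0, 0.0)))  # 20000 (pole to pole)
```

As the following paragraph argues, such a distance says little about network latency, so it is shown here only to make the formula concrete.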
While such a distance calculation is rather simple, this approach has some decisive disadvantages. First, a small geographical distance does not imply a small network latency. For example, a satellite link between two systems in French Polynesia may have a higher latency than a wired, inter-continental connection between Europe and North America, although the geographical distance is smaller by an order of magnitude. For more details on the relationship between distance and latency in the Internet, see also Subramanian et al. (2002).
A further disadvantage of using geographical coordinates and distances as a metric for server selection is the need for a GPS (Global Positioning System) device built into every component in order to obtain its position. Otherwise, the system would drop the support for mobility (which might be acceptable) and also introduce a possible point of misconfiguration (which may be critical). Consider e.g. servers claiming to be located nearby to a set of far-away clients: the result would possibly be a severe performance degradation.

Measured Delay
The appropriate way to describe distances for server selection would clearly be based on up-to-date measurements of the network delay. This approach has the disadvantage that these



measurements have to be performed continuously. However, the SCTP protocol (see subsection 2.4.3) already calculates a smoothed RTT for its paths. Furthermore, it is possible to query the RTT via the standard SCTP API (using the SCTP_STATUS option, see Stewart, Xie, Yarroll, Wood, Poon and Tüxen (2006)).
By default, the calculated RTT has a small restriction: an SCTP endpoint waits up to 200ms before acknowledging a packet, in order to piggyback the acknowledgement chunk with payload data. In this case, the RTT would include this latency. Using the option SCTP_DELAYED_ACK_TIME, the maximum delay before acknowledging a packet can be set to 0ms (i.e. acknowledge as soon as possible). After that, the RTT approximately consists of the network latency only. Using the RTT, the end-to-end delay between two associated components is approximately RTT/2.
In real networks, there may be negligible delay differences: for example, the delay between a PU and PE #1 is 5ms and the latency between the PU and PE #2 is 6ms. From the service user's perspective, such minor delay differences are negligible and furthermore unavoidable in Internet scenarios. Therefore, the distance parameter between two components A and B can be defined as follows:
$$\mathit{Distance}_{A \leftrightarrow B} = \mathit{DistanceStep} \cdot \operatorname{round}\left(\frac{\mathit{RTT}/2}{\mathit{DistanceStep}}\right) \tag{8.10}$$
That is, the distance parameter rounds the measured delay to the nearest integer multiple of the constant DistanceStep.

8.10.2.2 An Environment for Distance-Aware Policies
In order to define a distance-aware policy, it is necessary to define a basic rule: PEs and PUs choose nearby PRs. Since the operation scope of RSerPool is restricted to a single organization, this condition can be met easily by appropriately locating PRs. A PR-H can measure the delay of the ASAP associations to each of its PEs. As part of its ENRP updates to other PRs, it can report this measured delay together with the PE information. A non-PR-H receiving such an update simply adds the delay of the ENRP association with the PR-H to the PE's reported delay. Now, each PR can approximate the distance to every PE in the operation scope using equation 8.10. Note that delay changes are propagated to all PRs upon PE re-registrations, i.e. the delay information (and the approximated distance) adapts dynamically to the state of the network.
An example for the described distance propagation environment is provided in figure 8.18: PE #12 is connected to its PR #1 with a delay of 8ms. PR #1 is connected to PR #3 via ENRP with a delay of 75ms. For PR #3, the delay to PE #12 is therefore 8ms + 75ms = 83ms. PR #3 is the PR-H of PE #7 and PE #24, i.e. they appear in PR #3's handlespace with their connection delays of 10ms and 2ms. Since the PU asks its nearby PR #3 for handle resolution, it will receive the view of PR #3. Using equation 8.10 and a setting of DistanceStep=50ms, the distance of PE #7 and PE #24 is 0ms, while it is 100ms for PE #12.
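Equation 8.10 and the delay propagation example can be sketched as follows. The Python code assumes one-way delays in milliseconds and rounds exact halves up via floor(x + 0.5), because Python's built-in round() rounds halves to the nearest even integer and the text does not specify how ties are handled; the dictionary layout is hypothetical.

```python
"""Distance parameter (equation 8.10) and the delay propagation of figure 8.18.

Assumptions: delays in ms; exact halves are rounded up (floor(x + 0.5)),
since the text does not say how ties are handled.
"""
import math


def delay_from_rtt(rtt_ms: float) -> float:
    """Approximate one-way delay from an SCTP RTT measurement."""
    return rtt_ms / 2.0


def distance_ms(one_way_delay_ms: float, distance_step_ms: float) -> float:
    """Round the delay to the nearest integer multiple of DistanceStep."""
    return distance_step_ms * math.floor(one_way_delay_ms / distance_step_ms + 0.5)


if __name__ == "__main__":
    # PR #3's view: its own PEs, plus PE #12 as reported by PR #1 via ENRP
    delay_at_pr3 = {
        "PE #7": 10.0,         # PR #3 is the PR-H
        "PE #24": 2.0,         # PR #3 is the PR-H
        "PE #12": 8.0 + 75.0,  # ASAP delay at PR #1 + ENRP delay PR #1 -> PR #3
    }
    for pe, delay in delay_at_pr3.items():
        print(pe, distance_ms(delay, 50.0))  # 0.0, 0.0 and 100.0
```

With DistanceStep=50ms, the two local PEs collapse to the same distance of 0ms, so small delay differences no longer influence the selection.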

8.10.2.3 The Definition of Distance-Aware Policies
The policies suitable to be extended with distance information are Least Used and Weighted Random. Trying to adapt Weighted Round Robin would lead to the problem of non-integer weights, as described in subsection 8.8.2.


Figure 8.18: An Example for the Distance-Aware Policy Environment


Least Used with Distance Penalty Factor
The Least Used policy can be adapted as follows: instead of only taking the load value into account for server selection, a new load value Load* is computed by increasing the PE's reported value Load by a distance-dependent Distance Penalty Factor (DPF) as follows:
$$\mathit{Load}^{*} = \mathit{Load} + \underbrace{\mathit{Distance} \cdot \mathit{LoadDPF}}_{\text{Distance Penalty Factor}}$$
The constant LoadDPF describes the load units per millisecond by which a PE's reported load value is incremented for every millisecond of the measured network delay. That is, the unit for LoadDPF is ms⁻¹. Due to the DPF parameter, the new policy is denoted as Least Used with DPF (LU-DPF). It simply selects the PE with the lowest value of Load*. If there are multiple lowest-valued PEs, round robin selection is applied among them.
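A minimal Python sketch of the LU-DPF selection rule follows. It assumes loads expressed as fractions of full load and distances in milliseconds (as produced by equation 8.10). The class and field names are hypothetical, and the rotating tie counter is a simple stand-in for the round robin selection among equally loaded PEs, not the Selection Set mechanism of the thesis.

```python
"""Least Used with DPF (LU-DPF) selection, sketched from the definition above.

Assumptions: load is a fraction of full load, distance is the equation 8.10
value in ms, and a rotating counter breaks ties among equally loaded PEs.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class PoolElement:
    name: str
    load: float         # reported load, e.g. 0.5 for 50%
    distance_ms: float  # distance parameter (equation 8.10)


class LeastUsedDPF:
    def __init__(self, load_dpf: float) -> None:
        self.load_dpf = load_dpf  # LoadDPF in 1/ms
        self.tie_counter = 0      # rotates among tied PEs

    def adapted_load(self, pe: PoolElement) -> float:
        """Load* = Load + Distance * LoadDPF."""
        return pe.load + pe.distance_ms * self.load_dpf

    def select(self, pool: List[PoolElement]) -> PoolElement:
        """Pick a PE with the lowest Load*, rotating among ties."""
        lowest = min(self.adapted_load(pe) for pe in pool)
        tied = [pe for pe in pool if self.adapted_load(pe) == lowest]
        chosen = tied[self.tie_counter % len(tied)]
        self.tie_counter += 1
        return chosen


if __name__ == "__main__":
    pool = [PoolElement("local", 0.5, 0.0), PoolElement("remote", 0.5, 150.0)]
    policy = LeastUsedDPF(load_dpf=1e-5)
    print(policy.select(pool).name)  # local (equal load, nearer PE preferred)
    pool[1].load = 0.2
    print(policy.select(pool).name)  # remote (0.2015 < 0.5)
```

The example mirrors the guideline of a small LoadDPF: distance only breaks near-ties, while a clearly less loaded remote PE is still chosen.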

Weighted Random with Distance Penalty Factor
The Weighted Random with DPF (WRAND-DPF) policy can be defined by calculating an adapted weight constant Weight* from the actual weight value Weight, again using a distance penalty factor, as follows:
$$\mathit{Weight}^{*} = \max\left\{1,\ \mathit{Weight} \cdot \left(1 - \underbrace{\mathit{Distance} \cdot \mathit{WeightDPF}}_{\text{Distance Penalty Factor}}\right)\right\}$$
Clearly, the constant WeightDPF defines the fraction by which the actual weight is decreased for each millisecond of the measured network delay. The unit for WeightDPF is therefore ms⁻¹. Since it does not make sense¹ to have a weight of less than one, Weight* is defined as the maximum of 1 and the adapted weight. The actual PE selection is analogous to Weighted Random, using Weight* as weight.
Note that, for both new DPF-based policies, the sorting of the PEs in the Selection Set is still per-PR rather than per-PU, as it has been realized for the other policies presented in subsection 4.4.2. That is, the handlespace and policy management approach described in section 4.4, which is based on keeping sorted sets of PEs, can still be used. This allows a PR to efficiently store the handlespace data and perform operations on it.
¹ A weight of 0 would result in a probability of 0% for a PE to be selected. Clearly, if the PE is the only one in the pool, this would cause a problem.
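The WRAND-DPF weight adaptation can likewise be sketched in Python. The pool is given as hypothetical (name, weight, distance in ms) tuples, and `random.choices` stands in for the Weighted Random draw. With WeightDPF = 3·10⁻³, a PE 150ms away keeps 55% of its weight, while WeightDPF = 8·10⁻³ clamps it to the minimum weight of 1.

```python
"""Weighted Random with DPF (WRAND-DPF), sketched from the definition above.

Assumptions: the pool is a list of hypothetical (name, weight, distance_ms)
tuples; random.choices performs the weighted random draw.
"""
import random
from typing import List, Tuple


def adapted_weight(weight: float, distance_ms: float, weight_dpf: float) -> float:
    """Weight* = max{1, Weight * (1 - Distance * WeightDPF)}."""
    return max(1.0, weight * (1.0 - distance_ms * weight_dpf))


def select(pool: List[Tuple[str, float, float]], weight_dpf: float,
           rng: random.Random) -> str:
    """Draw one PE name with probability proportional to Weight*."""
    names = [name for name, _, _ in pool]
    weights = [adapted_weight(w, d, weight_dpf) for _, w, d in pool]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    print(adapted_weight(10.0, 150.0, 3e-3))  # about 5.5
    print(adapted_weight(10.0, 150.0, 8e-3))  # 1.0 (clamped)
    pool = [("local", 10.0, 10.0), ("remote", 10.0, 150.0)]
    rng = random.Random(7)
    picks = [select(pool, 3e-3, rng) for _ in range(10000)]
    # Expected share of "local": 9.7 / (9.7 + 5.5), i.e. about 0.64
    print(picks.count("local") / len(picks))
```

Unlike LU-DPF, the remote PE is still picked with a fixed probability regardless of its current load, which is consistent with the limited ability to cope with localized disasters reported in subsection 8.10.4.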



Figure 8.19: The Distance-Aware Policies Proof of Concept Simulation Setup

Figure 8.20: A Proof of Concept for the Least Used with DPF Policy

8.10.3 A Proof of Concept
In order to provide a proof of concept for the new distance-sensitive policies, simulations have been performed for a scenario consisting of 3 LANs, as shown in figure 8.19. Each LAN has included 1 PR and 10 PEs. While the inter-component LAN delay has used a fixed setting of 10ms, the WAN delay has been varied from 0ms to 500ms (these settings are based on PLANETLAB measurements and will be motivated in detail in subsubsection 8.10.5.2). The PUs have used a request size:PE capacity ratio of s=1 (i.e. if processed exclusively, the processing time of an average request is 1s); the PU:PE ratio r has been varied between 1 and 10 for a target system utilization of 60%. DistanceStep has been 1ms, and the simulated real-time has been 15 minutes.
The performance results of the Least Used with DPF policy for the LoadDPF settings 0 (i.e. the policy is similar to plain Least Used) and 1·10⁻⁵ (this parameter will be evaluated in detail in subsection 8.10.4) are shown in figure 8.20. Clearly, for LoadDPF=0, the utilization is negatively affected by the WAN delay if the PU:PE ratio r is small (1 in this case). This corresponds to the results observed and explained in subsection 8.10.1. However, for LoadDPF=1·10⁻⁵, the utilization is not affected by the delay anymore. A similar effect can also be observed for the request handling speed: while for LoadDPF=0 the handling speed severely suffers due to the WAN delay (as expected from the results in subsection 8.10.1), the network latency only very slightly decreases the handling speed for LoadDPF=1·10⁻⁵. The small descent is caused by the selection of remote PEs if all local PEs have a higher load.
Figure 8.21 presents the results of the Weighted Random with DPF policy for varying


Figure 8.21: A Proof of Concept for the Weighted Random with DPF Policy


WeightDPF between 0 (i.e. the policy behaves like regular Weighted Random) and 8·10⁻³ (this parameter will be evaluated in detail in subsection 8.10.4). Note that a curve for a PU:PE ratio of 1 has been omitted, since this policy is not useful for such a setting. Clearly, the differences in utilization are insignificant if the PU:PE ratio is high enough. However, the handling speed decreases if the PE selection does not take the network latency into account (i.e. WeightDPF=0). The higher the WeightDPF, the smaller the influence of the delay on the handling speed. That is, a sufficiently large WeightDPF setting is required in order to achieve a significant effect.
In summary, the proof-of-concept simulations have shown that the new policies, Least Used with DPF and Weighted Random with DPF, are useful. But how should their parameters LoadDPF and WeightDPF be configured appropriately?

8.10.4 The Appropriate Choice of Parameters
Since the use cases of the new policies are globally distributed pools which have to cope with localized disasters, the configurations of the parameters LoadDPF (for Least Used with DPF) and WeightDPF (for Weighted Random with DPF) have been evaluated with respect to situations where PEs of a region become unavailable and remote PEs should be used instead. Therefore, the DPF parameters have been varied in a scenario of 3 LANs (again, as illustrated in figure 8.19), with each LAN containing 1 PR and 12 PEs. In order to simulate a localized disaster, the number of PEs in the third LAN has been varied between 100% (i.e. all 12 PEs) and 25% (only 3 PEs). To compensate the capacity loss in the third LAN, additional PEs have been distributed equally between the other 2 LANs. That is, the overall capacity of the pool has always remained the same. In the scenario, an inter-component LAN delay of 10ms and a WAN delay of 150ms² have been used (to be justified in subsubsection 8.10.5.2). All other parameters (except for the PU:PE ratio r, see below) have been set as for the proof-of-concept simulation described in subsection 8.10.3.
The performance results for varying the LoadDPF parameter of the Least Used with DPF policy for a PU:PE ratio of 3 are shown in figure 8.22. While there is no significant impact on the
² This is a realistic delay for an inter-continental WAN connection; see subsubsection 8.10.5.2.



Figure 8.22: Finding a Reasonable LoadDPF Setting for the Least Used with DPF Policy

forautilizationvery(ssmallincetheLoadDPFparametersettingsetis(e.g.not1*10critical),−5).Ifthethereishandlingachoicspeedeisamongincreasedmultiplesignificantlyleast-loaded–evPEs,en
therewillbeapreferenceforanearbyone.Asexpected,ahighervalueoftheLoadDPFsettinghas
noimpactifallPEsinthethirdLANareavailable.Inthiscase,thereissufficientcapacitywithinthe
LANstohandletherequestslocally.Thatis,theserverselectionmainlybehavesasforthreeseparate
pools.However,decreasingthenumberofPEsinthethirdLANandincreasingthenumberintheother
LANs,thescenariobecomesheterogeneous.ForsettingtheLoadDPFparametertoohigh,thehan-
dlingspeeddecreasesand–forasufficientlyhighsetting(here:15*10−5for≤50%PEsinthethird
LAN)–thehandlingspeedisevenoutperformedbyplainLeastUsed(i.e.aLoadDPFof0).The
higherthesettingofLoadDPF,thelesslikelytheusageofremotePEs–evenifthiswouldbede-
sirableinthecaseofalocalizeddisaster.Thatis,theresultingguidelineforsettingtheLoadDPF
parameterisrathersimple:setittoavalueslightlyabove0–e.g.1*10−5.Thissettinggivestheserver
selectionapreferenceforthenearestPE–incaseofhavingmultipleleast-loadedPEs.Furthermore,
ittionalsoofproremotevidesPEsanifimprovnecessaryed.Thatperformanceis,inLeastUsedscenarioswithofDPFlocalizedcanachiedisastersvea–bysignificantenablingtheperformanceselec-
benefitoverplainLeastUsed.
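The tie-breaking role of a small LoadDPF can be illustrated with a short Python sketch. It assumes, for illustration only, that a PE is ranked by load + LoadDPF · delay, with the load normalized to [0, 1] and the delay given in milliseconds; the PoolElement record and the least_used_dpf() helper are hypothetical and are not taken from the RSPLIB implementation.

```python
from dataclasses import dataclass


@dataclass
class PoolElement:
    name: str
    load: float      # assumed normalized load in [0, 1]
    delay_ms: float  # assumed PR-measured delay towards the PU, in ms


def least_used_dpf(pes, load_dpf=1e-5):
    """Pick the PE with the smallest distance-penalized load.

    Assumed (illustrative) scoring rule: load + load_dpf * delay_ms.
    With load_dpf = 0 this degenerates to plain Least Used, and min()
    then simply keeps the first of several equally loaded PEs.
    """
    return min(pes, key=lambda pe: pe.load + load_dpf * pe.delay_ms)


pes = [
    PoolElement("remote", load=0.25, delay_ms=150.0),
    PoolElement("local", load=0.25, delay_ms=10.0),
    PoolElement("busy-local", load=0.50, delay_ms=10.0),
]
print(least_used_dpf(pes).name)              # local: equal load, nearer PE wins
print(least_used_dpf(pes, load_dpf=0).name)  # remote: first minimum, distance ignored
```

With such a small setting, the delay term only separates PEs of (nearly) equal load; a clearly overloaded local PE still loses against a lightly loaded remote one, which corresponds to the localized disaster case discussed above.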
Figure 8.23 presents the performance results for having varied WeightDPF between 0 and 8·10⁻³ for the Weighted Random with DPF policy and a PU:PE ratio of 10 (otherwise, the scenario would become critical too quickly to illustratively show the desired effects). The other parameters have remained as described for the other policy. As expected, WeightDPF has to be sufficiently large to show an improvement: for WeightDPF ≥ 3·10⁻³, the handling speed starts rising for 75% and 100% of the PEs in the third LAN. However, for a smaller number of PEs, the handling speed decreases quickly. In this case, the probability to select a remote PE becomes too small and a local request jam occurs in the third LAN. This effect can be verified by the reduced system utilization. That is, while Weighted Random with DPF provides a performance improvement in sufficiently homogeneous scenarios, its ability to cope with localized disasters is limited.
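A corresponding sketch for Weighted Random with DPF shows why a larger WeightDPF shifts selection probability away from remote PEs. The reduction rule weight · (1 − WeightDPF)^delay (delay in milliseconds) is an assumption chosen for illustration, not a definition taken from this thesis; the function names are hypothetical.

```python
import random


def wrand_dpf_probabilities(pes, weight_dpf):
    """Selection probabilities of Weighted Random under an assumed DPF rule.

    pes: list of (name, weight, delay_ms) tuples.
    Assumed (illustrative) rule: effective weight =
    weight * (1 - weight_dpf) ** delay_ms, so distant PEs lose weight
    faster as weight_dpf grows; weight_dpf = 0 is plain Weighted Random.
    """
    effective = {name: w * (1.0 - weight_dpf) ** d for name, w, d in pes}
    total = sum(effective.values())
    return {name: e / total for name, e in effective.items()}


def wrand_dpf_select(pes, weight_dpf, rng=random):
    """Draw one PE name according to wrand_dpf_probabilities()."""
    probs = wrand_dpf_probabilities(pes, weight_dpf)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


lan = [("local", 1.0, 10.0), ("remote", 1.0, 150.0)]
for dpf in (0.0, 3e-3, 8e-3):
    # probability of picking the remote PE shrinks as WeightDPF rises
    print(dpf, round(wrand_dpf_probabilities(lan, dpf)["remote"], 2))
```

Unlike the tie-breaking term of Least Used with DPF, the distance here changes the selection probabilities of all PEs, which is why a setting large enough to help in homogeneous scenarios also starves remote PEs when they are needed.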
In summary, the simulative results for the Least Used with DPF policy are quite promising. But is it also useful in reality? To answer this question, a performance evaluation using the RSPLIB prototype in a PLANETLAB scenario has been performed.

8.10. THE CHALLENGE OF NETWORK DELAY

Figure 8.23: Finding a Reasonable WeightDPF Setting for the Weighted Random with DPF Policy

8.10.5 Experimental Validation using the PLANETLAB
The PLANETLAB is a set of globally distributed, Linux-based research hosts in the Internet. Its intention is to allow the test and validation of network software in real-world setups. A detailed introduction to the PLANETLAB can be found in Peterson and Roscoe (2006), Peterson et al. (2005) and Peterson (2004); the basic ideas and concepts are explained in Chun et al. (2003), Roscoe (2002), Peterson and Roscoe (2002) and Peterson et al. (2002). Therefore, the following subsubsection only gives a brief introduction to the PLANETLAB environment used for the following measurements.

8.10.5.1 The PLANETLAB Environment
Having access to a set of PLANETLAB nodes (which is called a slice), it is possible to log into these hosts using the Secure Shell (SSH) software. SSH provides interactive and script-based execution of remote programs, as well as the transfer of files between hosts. Based on SSH, shell scripts have been developed for the following tasks:
create-binaries copies an archive of the RSPLIB prototype implementation to a PLANETLAB node, compiles it and sends an archive of the created executables back.
install-binaries simultaneously installs the executables on a given set of PLANETLAB nodes.
create-hosts-files tests a given set of PLANETLAB nodes for usability (to be explained below).
A set of scenario scripts can set up certain configurations of PRs, PEs and PUs. Furthermore, these scripts collect the created statistics and use the tool createsummary (which has been developed for the simulation model, see subsection 6.3.4) to create input files for GNU R.

Again, GNU R (see subsection 6.3.2) has been used for the statistical post-processing and plotting of the results. While it is possible to assign all of the about 650 PLANETLAB nodes to a slice, only about 30% of the nodes are really accessible at a given time. Since the resources of a node (CPU power, memory and hard disk space) are shared among all slices, there are usually some nodes which are highly overloaded (e.g. they are extremely slow or spend most of their time swapping memory to and from hard disk). In order to find suitable nodes for an experiment, the following tests are performed by the script create-hosts-files:
1. Logging into the node (i.e. it is verified that the node is accessible),
2. Invoking ping to test the network connection to http://www.planet-lab.org (i.e. it includes a DNS resolution of the hostname) with a timeout of 5 s,
3. Creating a file consisting of 16·10⁶ bytes of random numbers (i.e. it is verified that the file system is writeable) and
4. Computing the SHA-1³ hash of the test file (i.e. the computation power of the node is tested).
If these four operations are successfully completed within 36 s, the node is considered to be usable (for comparison: the Athlon 1.3 GHz PC used for the handlespace management performance evaluation in chapter 7 performs these tasks in about 9 s).
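The usability test can be sketched in Python as follows. Only checks 3 and 4 are implemented concretely; the SSH login and ping checks are expected as caller-supplied callables (for example, wrappers around the ssh and ping commands). All function names are hypothetical, since create-hosts-files itself is a shell script.

```python
import hashlib
import os
import tempfile
import time

TIME_BUDGET_S = 36.0          # all four checks must complete within 36 s
TEST_FILE_BYTES = 16 * 10**6  # 16*10^6 bytes of random data (check 3)


def file_and_hash_check(size=TEST_FILE_BYTES):
    """Checks 3 and 4: write a file of random bytes, then compute its SHA-1.

    Returns the hex digest on success, or None if the file system is not usable.
    """
    try:
        fd, path = tempfile.mkstemp()
    except OSError:
        return None
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(size))          # check 3: file system is writeable
        sha1 = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                sha1.update(chunk)             # check 4: exercises the CPU
        return sha1.hexdigest()
    except OSError:
        return None
    finally:
        os.remove(path)


def node_is_usable(checks, budget_s=TIME_BUDGET_S):
    """Run the checks in order. The node counts as usable only if every
    check returns a truthy value and the whole sequence fits the budget."""
    start = time.monotonic()
    for check in checks:
        if not check():
            return False
    return time.monotonic() - start <= budget_s
```

For example, node_is_usable([login_check, ping_check, file_and_hash_check]) would apply the 36 s budget to all four steps, where login_check and ping_check stand for such hypothetical wrappers.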
Since the PLANETLAB nodes do not support SCTP directly (i.e. in their Linux kernel), our userland implementation SCTPLIB (see section 5.3) has been used. Due to technical limitations of the PLANETLAB software, all SCTP traffic has to be tunnelled via UDP. This SCTP over UDP encapsulation is defined in Tüxen and Stewart (2005). Except for a slight increase in overhead (additional 8 bytes for the UDP header per packet), it does not impose any functional restriction.
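To illustrate where the 8 bytes of overhead come from, the following sketch prepends a plain UDP header to an SCTP packet that has already been built. It only shows the header layout: the ports are arbitrary example values and the checksum is left at zero (permitted for UDP over IPv4), so this is not a complete SCTP over UDP implementation.

```python
import struct

UDP_HEADER_LEN = 8  # 2 bytes each: source port, destination port, length, checksum


def udp_encapsulate(sctp_packet: bytes, src_port: int, dst_port: int) -> bytes:
    """Prefix an already-built SCTP packet with a UDP header.

    The checksum field is set to 0 ("no checksum", allowed for UDP over
    IPv4); a real stack would normally compute it.
    """
    udp_length = UDP_HEADER_LEN + len(sctp_packet)  # header plus payload
    header = struct.pack("!HHHH", src_port, dst_port, udp_length, 0)
    return header + sctp_packet


def relative_overhead(sctp_packet_len: int) -> float:
    """Extra bytes caused by the tunnel, relative to the SCTP packet size."""
    return UDP_HEADER_LEN / sctp_packet_len
```

For a 1000-byte SCTP packet, relative_overhead(1000) is 0.008, i.e. 0.8%.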
In order to perform PLANETLAB-based performance evaluations of the new policy, the RSPLIB prototype (see chapter 5) has been used to realize PE and PU application implementations. These implementations provide and use the same compute service as described for the simulation model in section 8.3. Using the PU and PE application implementations as well as the PR of the RSPLIB prototype package, systematic tests in different PLANETLAB scenarios have been performed in order to validate the correctness of the programs. While the RSPLIB prototype as well as the underlying SCTPLIB and SOCKETAPI implementations had already been shown to work quite reliably in long-term lab tests using the demonstration system (see also section 5.8), an Internet scenario is different. Here, unforeseeable packet losses and transmission delays may occur. Therefore, a significant fraction of the time required to realize the measurement setup has also been spent on debugging the SCTPLIB and SOCKETAPI implementations. For example, there have been bugs like a forgotten unscheduling of timers for already removed associations or wrong reactions to unexpected packet retransmissions, both of which caused segmentation faults. Such bugs, which occur quite rarely and are therefore difficult to reproduce, have been identified by long-term PLANETLAB tests with more than 100 components over several days, collecting the core dumps created on segmentation faults and analysing the core dumps and log files. After bug-fixing, these tests have been repeated. The result of these time-consuming debugging sessions, which have taken weeks, has been a significantly improved robustness of all three components: the RSPLIB prototype as well as the SCTPLIB and SOCKETAPI packages.
³ Secure Hash Algorithm 1 (160 bits), see Eastlake and Jones (2001).

Interval   Network State           Least Used with DPF   Least Used      Improvement
1m-14m     Normal Operation I      2.22s ± 0.07          2.55s ± 0.06    12.9%
16m-29m    Failure in Asia         2.44s ± 0.07          2.55s ± 0.06     4.3%
31m-45m    Added Backup Capacity   2.32s ± 0.06          2.46s ± 0.06     5.7%
46m-49m    Failure Resolved        2.23s ± 0.13          2.46s ± 0.12     9.3%
51m-64m    Normal Operation II     2.21s ± 0.06          2.69s ± 0.07    17.8%

Table 8.2: The Average Request Handling Time Results of a First Trial

8.10.5.2 The Measurement Setup
The measurement setup for the following performance evaluations has consisted of components distributed into three regions (see also figure 8.17): Europe (mainly Germany), America (U.S.A., mainly West Coast) and Asia (mainly Japan). Each region has contained one PR, which has been used by the 5 PEs and 15 PUs of the region. As for the simulation, the PEs have used a capacity of 10⁶ calculations/s. Tests using ping and traceroute have shown latencies between 5 ms and 15 ms within the regions; the inter-region delay varies between about 75 ms and 150 ms between Europe and America as well as between America and Asia, and is about 250 ms to 350 ms between Europe and Asia (routed via the U.S.A.). The delays between any two endpoints have not shown a significant variation. That is, it can be assumed that there has been sufficient bandwidth available. This is also realistic for RSerPool scenarios, since all components belong to a single operation scope (e.g. a company) and QoS mechanisms can therefore be applied easily (e.g. WAN connections via DiffServ-based VPN links using an appropriate SLA). Based on the delay experiments, DistanceStep has been set to 75 ms.

8.10.5.3 The Results of a First Trial
In order to validate the setup functionality and to show the essential effects of a PLANETLAB measurement, the results of a first trial are presented in this subsubsection. A long-term experiment will be analysed in the following subsubsection 8.10.5.4.
The measurements of the first trial have had a runtime of 65 minutes, with the following actions: at t₁ = 15 min, 2 of the 5 Asian PEs have been turned off; at t₂ = 30 min, two additional PEs have been turned on (one in America, the other one in Europe). At t₃ = 45 min, the failure in Asia has been repaired and both PEs have been added to the pool again, increasing its total capacity. The two additional PEs in Europe and America have been turned off at t₄ = 50 min. The application has used the following workload parameters: an average request size of 10 calculations and an average request interval of 7.5 s (both using negative exponential distributions).
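The workload of a PU instance can be reproduced with a small generator that draws both the request interval and the request size from negative exponential distributions. The function name and the use of Python's random module are illustrative assumptions; both means are passed in as parameters.

```python
import random


def workload(mean_request_size, mean_request_interval_s, n, seed=None):
    """Yield n (inter-request time in s, request size) pairs, both drawn
    from negative exponential distributions with the given means."""
    rng = random.Random(seed)
    for _ in range(n):
        interval = rng.expovariate(1.0 / mean_request_interval_s)
        size = rng.expovariate(1.0 / mean_request_size)
        yield interval, size
```

For instance, list(workload(mean_size, 7.5, n)) yields n request descriptions whose intervals average about 7.5 s; note that expovariate() expects the rate, i.e. the inverse of the mean.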
In order to make the performance results of the Least Used policy and the Least Used with DPF policy (using a LoadDPF setting of 1·10⁻⁵) comparable, the runs for both policies have been performed simultaneously. That is, two pools have been set up: one for Least Used, the other one for Least Used with DPF. On each of the PE hosts, two PE instances have been started: one registered into the Least Used pool, the other one registered into the Least Used with DPF pool. Analogously, each PU host has run two PU instances: one using the Least Used pool, the other one using the Least Used with DPF pool. Due to the simultaneous execution, it has been ensured that both measurements have been affected equally by temporal variations of the Internet's QoS conditions. Therefore, the results of both policies are comparable.

Interval   Network State           Least Used with DPF   Least Used      Improvement
1m-14m     Normal Operation I      2.17s ± 0.05          2.63s ± 0.05    17.5%
16m-29m    Failure in Asia         2.55s ± 0.02          2.78s ± 0.02     8.3%
31m-45m    Added Backup Capacity   2.54s ± 0.02          2.71s ± 0.02     6.3%
46m-49m    Failure Resolved        2.35s ± 0.06          2.55s ± 0.03     7.8%
51m-64m    Normal Operation II     2.08s ± 0.02          2.47s ± 0.02    15.8%

Table 8.3: The Average Request Handling Time Results of 22 Measurements

The resulting average request handling times and their 95% confidence intervals are presented in table 8.2. Note that the table shows the average over intervals beginning one minute after and ending one minute before a system condition change, since the latency to log into a PLANETLAB node to start or stop a component may take up to about 30 s. Furthermore, small deviations of the hosts' clocks are possible. Comparing the results for Least Used and Least Used with DPF, the Least Used with DPF policy provides a significant handling speed gain: between 12.9% and 17.8% for the two phases of normal operation and still around 5% during the failure in Asia. When the two PEs in Asia come back again, the gain rises to 9.3%.
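The Improvement column of table 8.2 is consistent with the relative reduction of the average handling time, (t_LU − t_LU-DPF) / t_LU, where t_LU and t_LU-DPF denote the average handling times under Least Used and Least Used with DPF. The following sketch is a recomputation from the table values, not part of the original evaluation tooling.

```python
def improvement(t_lu_s, t_lu_dpf_s):
    """Relative reduction of the average handling time, in percent."""
    return (t_lu_s - t_lu_dpf_s) / t_lu_s * 100.0


# (Least Used, Least Used with DPF) average handling times from table 8.2, in s
table_8_2 = {
    "Normal Operation I": (2.55, 2.22),
    "Failure in Asia": (2.55, 2.44),
    "Added Backup Capacity": (2.46, 2.32),
    "Failure Resolved": (2.46, 2.23),
    "Normal Operation II": (2.69, 2.21),
}
for state, (t_lu, t_lu_dpf) in table_8_2.items():
    print(f"{state}: {improvement(t_lu, t_lu_dpf):.1f}%")
```

The printed values are 12.9%, 4.3%, 5.7%, 9.3% and 17.8%, matching the table; the same computation also reproduces the Improvement column of table 8.3.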
To further illustrate the behaviour of the two policies and the effects of temporal network QoS changes in the Internet, figure 8.24 shows the handling speed of each handled request for the Least Used with DPF policy; figure 8.25 presents the same for Least Used. Each request is represented by a line starting at the request's queuing time and ending at its completion time; for each PU, a different colour/shade is used. The position of a request on the y-axis shows its handling speed; the average handling speed (presented by the thick line) has been computed as

$$\text{AverageHandlingSpeed} = \frac{\sum_i \text{RequestSize}_i}{\sum_i \text{HandlingTime}_i},$$

where both sums run over all requests i completed in the corresponding interval. Comparing the plots for Least Used with DPF and plain Least Used, it is clearly observable that there are significantly more requests in the range of higher handling speeds. Furthermore, the varying network QoS conditions in the Internet are observable by small dents and peaks (e.g. 23 min to 26 min or 43 min to 45 min); they appear for both policies in the same time intervals.
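The average handling speed above is a ratio of sums, not a mean of per-request speeds. A short sketch with made-up numbers shows the difference; the function name is hypothetical.

```python
def average_handling_speed(requests):
    """Sum of request sizes divided by the sum of handling times.

    requests: (request_size, handling_time_s) pairs of all requests
    completed within the considered interval.
    """
    requests = list(requests)  # allow generators, since we iterate twice
    total_size = sum(size for size, _ in requests)
    total_time = sum(t for _, t in requests)
    return total_size / total_time


requests = [(2e6, 1.0), (1e6, 4.0)]        # illustrative values only
print(average_handling_speed(requests))   # 600000.0
# For comparison, averaging the per-request speeds would give 1125000.0:
print(sum(s / t for s, t in requests) / len(requests))
```

The ratio of sums weights each request by its handling time, so a few slowly handled requests pull the average down more than in a plain mean of per-request speeds.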

8.10.5.4 The Results of a Long-Term Measurement
In order to achieve a more accurate analysis, the measurement run described in subsubsection 8.10.5.3 has been repeated 22 times, covering a total experiment runtime of about 35 hours. The resulting average values of these measurements are presented in table 8.3. These results have also been published in Dreibholz and Rathgeb (2007).
Comparing the results for Least Used and Least Used with DPF, the Least Used with DPF policy provides a significant handling speed gain: between 15.8% and 17.5% for the two phases of normal operation and still around 8% during the failure and its resolution in Asia. That is, the expectations from the first measurement in subsubsection 8.10.5.3 are met.
It is important to note that the performance in the failure resolved state is lower than in the normal operation states, although there are additional PEs in America and Europe: the over-capacity in these regions attracts the assignment of requests from Asia. This effect (the assignment of requests to slightly-loaded servers) is a property of all load-based policies; trying to avoid it by simply using a high LoadDPF setting would not lead to a performance improvement, as shown in subsection 8.10.4.

Figure 8.24: Experimental Results of the Least Used with DPF Policy

Figure 8.25: Experimental Results of the Least Used Policy
In summary, the measurements have shown that the new Least Used with DPF policy also works as intended under realistic conditions in the real Internet. However, as part of future work, additional measurements with different workload parameter sets should be performed to validate the usefulness of the new policy over a broader parameter range.

8.10.6 Summary
In summary, it has been shown that distance-aware policies can significantly improve the performance of highly distributed RSerPool systems. Using PR-based delay measurements as distance metric allows reusing the efficient handlespace management approach presented in chapter 4. The new policies Least Used with DPF and Weighted Random with DPF are realizations of distance-aware policies. While Least Used with DPF only has to introduce a small preference for local PEs (by a low setting of LoadDPF) to improve the performance, even for scenarios of localized disasters, Weighted Random with DPF requires a significantly higher influence of the distance (i.e. a higher WeightDPF setting) to achieve a performance benefit. This results in a reduced performance in a localized disaster situation. Therefore, the usage of this policy has to be planned carefully. Finally, tests in the PLANETLAB have shown that the new Least Used with DPF policy also provides the expected results in the real Internet.

8.11 Configuring the PU-Side Cache Parameters
Depending on the frequency of handle resolutions and their costs (RTT between PU and PR, network bandwidth), it can be beneficial to use the PU-side handle resolution cache as described in subsection 3.7.3. That is, after querying a PR for a handle resolution, the content of the PR's response is stored in the cache for the time given by the stale cache value parameter. Subsequent handle resolutions may be directly satisfied from the cache, avoiding the overhead of querying the PR again.
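The cache behaviour just described can be sketched as a small Python class. The class, its method names and the query_pr callback standing in for the actual PR request are hypothetical; how the PU then selects a PE from the cached list (according to the pool policy) is left out.

```python
import time


class HandleResolutionCache:
    """PU-side handle resolution cache (hypothetical sketch).

    A PR response for a pool handle is reused until it is at least
    stale_cache_value_s seconds old; a value of 0 disables the cache.
    """

    def __init__(self, stale_cache_value_s, clock=time.monotonic):
        self.stale_cache_value_s = stale_cache_value_s
        self.clock = clock
        self.entries = {}  # pool handle -> (time of PR response, PE identities)

    def resolve(self, pool_handle, query_pr):
        """Return PE identities; query_pr(pool_handle) is only called on a
        cache miss or when the cached entry has become stale."""
        now = self.clock()
        entry = self.entries.get(pool_handle)
        if entry is not None and now - entry[0] < self.stale_cache_value_s:
            return entry[1]                    # hit: no PR round trip
        pe_identities = query_pr(pool_handle)  # miss: ask the PR
        self.entries[pool_handle] = (now, pe_identities)
        return pe_identities
```

Injecting the clock keeps the sketch testable; the stale cache value : request interval ratio c used below then roughly corresponds to the number of resolutions served per PR query.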

8.11.1 General Effects of the PU-Side Cache
To be effective, the stale cache value has to be configured to a value greater than the request interval i. A simulation to present the general effects of the cache (again based on our paper Dreibholz and Rathgeb (2005c)) has been performed using i = 1 s and PU:PE ratios r between 1 and 3, for a target system utilization of 60% (for a larger setting, the effects would be too strong). The stale cache value : request interval ratio c is a measure for how many times the cache can be utilized for handle resolution before having to query a PR again. It has been varied from 0 to 10. MaxHResItems has been ∞ (i.e. all 10 PE identities are returned upon a handle resolution) and the simulated real-time has been 60 minutes. Figure 8.26 presents the results of the simulation.
As expected, the cache affects neither the utilization nor the handling speed of the Random policy. Clearly, the utilization of the Least Used policy quickly decays for rising c at r = 1, due to a high per-PU load (see equation 8.6) and therefore a high penalty of bad selection decisions. Caching results for larger values of c means using out-of-date load information. Therefore, inappropriate decisions become more likely. For higher values of the PU:PE ratio r, the situation becomes much better, since the per-PU penalty of a bad choice is reduced by parallelism (see subsection 8.7.2). The reason for the decaying utilization of the Round Robin policy is the same as for the number of PRs (see section 8.9): each component selects independently using the round robin strategy. But viewed globally, the selections differ from the desired round robin behaviour. For r = 1, it can be observed that the utilization of the Round Robin policy even slightly falls short of the Random policy's utilization for c greater than 4: the higher the value of c, the more independent the round robin selections are, and the more the global view of the selections resembles random selection.

8.11. CONFIGURING THE PU-SIDE CACHE PARAMETERS

Figure 8.26: The General Effects of using the PU-Side Cache
Although the utilization penalty for using the cache becomes small if the PU:PE ratio r is high enough, the request handling speed is severely affected for policies other than Random: using the Least Used or Round Robin policies, there remains a significant loss in handling speed. This is caused by long request queues due to inappropriate PE selections.
As a conclusion, using the cache for a policy other than Random does not make much sense. The gain of saving some overhead messages on handle resolution comes at a high price in system performance. So, in which cases can the cache become valuable?

8.11.2 When to Use the PU-Side Cache?
The PU-side cache can become useful in the following situations:
1. The request size:PE capacity ratio s is small; therefore, the network delay for the handle resolution at the PR significantly contributes to a reduction of the handling speed.

2. A PE may reject a request with a certain probability a_PE, implying the need for an additional handle resolution to find a new PE. This is e.g. the case in telephone signalling, if the PE's request queue is full and it currently cannot accept any more requests.

For a given rejection probability a_PE, the total rejection probability after having performed n trials is a_Total = (a_PE)^n. To reach a given value of a_Total (e.g. 0.05 for an acceptance probability of 95%), the number of trials n can be computed as follows (rounded up to a whole number of trials):

$$n(a_{\text{Total}}, a_{\text{PE}}) = \max\left(1, \left\lceil \log_{a_{\text{PE}}}(a_{\text{Total}}) \right\rceil\right) \tag{8.11}$$

Figure 8.27: An Example for an Effective Cache Usage

It is now possible to configure the cache to cover the calculated number of n trials. That is, the time required to perform n trials can be computed as follows:

$$d(a_{\text{Total}}, a_{\text{PE}}) = \text{RTT}_{\text{PU}\leftrightarrow\text{PR}} + \left(n(a_{\text{Total}}, a_{\text{PE}}) - 1\right) \cdot \text{RTT}_{\text{PU}\leftrightarrow\text{PE}} \tag{8.12}$$

Using the resulting delay as stale cache value, the trials are covered by the cache.
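Equations 8.11 and 8.12 translate directly into code. The sketch assumes 0 ≤ a_PE < 1 and RTTs given in seconds; with the component RTT of 200 ms used in the following example, it yields the stale cache value of 400 ms quoted there for a_PE = 0.2. The function names are hypothetical.

```python
import math


def trials_needed(a_total, a_pe):
    """Equation 8.11: n = max(1, ceil(log_{a_pe}(a_total))), for 0 <= a_pe < 1."""
    if a_pe == 0.0:
        return 1   # a PE never rejects, so the first trial always succeeds
    n = math.log(a_total) / math.log(a_pe)
    return max(1, math.ceil(n - 1e-9))   # small tolerance against rounding noise


def stale_cache_value(a_total, a_pe, rtt_pu_pr, rtt_pu_pe):
    """Equation 8.12: one PR query plus (n - 1) further PE contacts."""
    return rtt_pu_pr + (trials_needed(a_total, a_pe) - 1) * rtt_pu_pe


print(trials_needed(0.05, 0.20), stale_cache_value(0.05, 0.20, 0.2, 0.2))  # 2 0.4
```

The ceiling matters: without it, log_0.2(0.05) ≈ 1.86 would give a non-integer trial count and a stale cache value below the 400 ms needed to cover the second trial.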
To show the effectiveness of the cache, an example simulation using a PU:PE ratio of r = 3, a request interval of i = 1 s and a target system utilization of 60% has been performed. The resulting request size:PE capacity ratio, based on equation 8.9, has been s = 0.2 (such frequent and short transactions are typical for telecommunications signalling) for a component RTT of 200 ms. Again, MaxHResItems has been ∞ and the simulated real-time has been 60 minutes.
Figure 8.27 presents the resulting queuing delay (see subsection 8.5.2 for the definition) for stale cache values varying from 0 s (no cache) to 1 s (equal to the request interval) and request rejection probabilities a_PE from 0.0 (0%) to 0.2 (20%). The system utilization is stable at 60% for all settings; therefore, a figure is omitted. Plots for the Round Robin policy are also not shown, since the results are quite similar. Obviously, the cache has a huge impact on the reduction of the queuing delay (and therefore on an improvement of the handling speed). In the extreme case of using the Random policy and a_PE = 0.2, a stale cache value of only 400 ms (d(0.05, 0.20) = 400 ms) reduces the average queuing delay from 28 s to less than 7 s; for a_PE = 0.1, the queuing delay of Random drops from 8 s to 4 s at the same stale cache value (d(0.05, 0.10) = 400 ms). Clearly, the reductions for the Least Used policy are smaller, since the initial delay of the Least Used policy without cache is much smaller than for Random. Furthermore, the load values become inaccurate due to the cache. Nevertheless, there is still a significant queuing delay reduction by several seconds.

8.11.3 Summary
In summary, the general guideline on the PU-side cache usage is to avoid it, except for the Random policy, of course. However, under the conditions of
costly handle resolutions (referring to network delay) at a PR, as well as
a certain probability that the handle resolution has to be repeated (due to rejection of the request by the chosen PE, or if the selected PE is unreachable),
the handle resolution cache, with a reasonably configured stale cache value, can achieve a significant performance benefit.

8.12. THE EFFECTS OF HETEROGENEOUS SERVER CAPACITIES

8.12 The Effects of Heterogeneous Server Capacities
After having analysed the behaviour of RSerPool in scenarios of homogeneous server capacities, it is clearly necessary to also have a look at scenarios where some PEs are more powerful than others. The goal of this section is therefore to provide more insights into the implications of such scenarios and to identify critical configurations and parameter ranges. A subset of these results has been published in Dreibholz and Rathgeb (2005e).

8.12.1 Server Capacity Distribution Scenarios
In the following subsubsections, the effects of heterogeneous server capacities are analysed for different capacity distributions.

8.12.1.1 A Single Powerful Server
The most obvious scenario of heterogeneous server capacities is probably to have a designated powerful server. Such a scenario is likely if there is a high-capacity server to do the usual work and a set of older and slower ones to provide failure protection by redundancy. In order to present the effects introduced by heterogeneous server capacities, it is useful to vary the degree of heterogeneity. Therefore, the variable κ is defined as the capacity ratio between the powerful server and a slow server:

$$\kappa = \frac{\text{Capacity}_{\text{FastPE}}}{\text{Capacity}_{\text{SlowPE}}} \tag{8.13}$$

For example, a value of κ = 5 means that the fast server has five times the capacity of a slow one. For the following simulation, κ is varied between 1 and 8. In order to keep the overall pool capacity constant, the resulting server capacities are normalized (scenarios of an increased overall pool capacity are evaluated later in subsection 8.12.2). That is:

$$\text{PoolCapacity} = \underbrace{1 \cdot (\kappa \cdot \text{Capacity}_{\text{SlowPE}})}_{\text{Capacity of the Fast PE}} + \underbrace{(\text{PEs} - 1) \cdot \text{Capacity}_{\text{SlowPE}}}_{\text{Overall Capacity of the Slow PEs}}$$

Using a fixed setting of PoolCapacity, the PE capacities can be calculated as follows:

$$\text{Capacity}_{\text{SlowPE}}(\kappa) = \frac{\text{PoolCapacity}}{1 \cdot (\kappa - 1) + \text{PEs}} \tag{8.14}$$

$$\text{Capacity}_{\text{FastPE}}(\kappa) = \kappa \cdot \text{Capacity}_{\text{SlowPE}} \tag{8.15}$$

Figure 8.28: A Single Powerful Server: Using the Least Used Policy

Figure 8.29: A Single Powerful Server: Using the Weighted Random Policy

Figure 8.30: A Single Powerful Server: Using Inappropriate Policies

Results for the Least Used Policy
Figure 8.28 presents the performance results for the Least Used policy. κ has been varied for different settings of the PU:PE ratio r (from 1 to 10) and of the request size:PE capacity ratio s (between 1 and 100). The target system utilization has been 80%. Clearly, the utilization is only negatively affected for a critical PU:PE ratio (r = 1) in conjunction with a large request size:PE capacity ratio (s = 100). As already observed and explained for the workload parameter analysis in subsection 8.7.2, mapping a long request to an inappropriate PE results in a request jam at the corresponding PU and therefore leads to a decreased system utilization. In the case of a dedicated fast server, a bad PE selection means mapping a request to a slow server, which obviously takes longer to process the request. The results for the utilization are reflected by the average handling speed results: the handling speed decreases with κ, since the capacity (and therefore the processing speed) of the slow servers decreases with rising κ (the overall pool capacity remains constant). But while the reduction for r = 1 is highest (e.g. from 20% at κ = 1 to about 5% at κ = 8), the descent becomes significantly smaller with a higher PU:PE ratio r: for r = 10, the handling speed only degrades from 68% at κ = 1 to about 65% at κ = 8. That is, as already observed for the workload parameter analysis in subsection 8.7.2, the PU:PE ratio has the most significant influence on the system performance. And again, the influence of the request size:PE capacity ratio s is fairly small if r is high enough: while at r = 1 a handling speed difference of 8% to 10% can be observed between s = 1 and s = 100, the difference is only about 1% for r = 10. For a sufficiently high PU:PE ratio r, it is easy to compensate bad selections by using a fast PE for the next request. Furthermore, using longer requests results in a better handling speed, since the influence of the queuing delay is reduced (see also subsubsection 8.7.2.2).

Results for the Weighted Random Policy
The performance results for the Weighted Random policy are presented in figure 8.29. Clearly, the weight w_i of each PE i is its capacity Capacity_i (see also subsubsection 4.4.2.5). As expected from the workload parameter analysis in subsection 8.7.2, the PU:PE ratio r is critical for the non-adaptive (Weighted) Random policy. If r is low, the utilization is lowest. This effect is amplified by a large request size:PE capacity ratio s, since mapping large requests is more difficult. A rising heterogeneity of the capacities (i.e. an increased value of κ) simplifies the distribution of requests: there is a single powerful server which concentrates most of the pool's capacity. However, while this observation holds for the critical parameter range (low r), it can be observed for the curves of r = 10 and s between 1 and 100 that the utilization decreases until about κ = 4 and then starts rising again. The reason for this effect is that the capacity of the slower servers becomes low, but there is still a certain probability that they get requests.
In order to further evaluate this effect, it is necessary to have a look at the request handling speed. Here, the request size:PE capacity ratio s becomes the important parameter: as already observed in subsection 8.7.2, larger requests are better for the handling speed, since a single large request is affected by the queuing delay only once, while splitting it up into 100 short requests results in 100 occurrences of queuing delay. Using small requests (here: s = 1 and s = 10), the system performance does not reach an acceptable value until a certain PU:PE ratio r is reached (here: r = 10). However, the handling speed still quickly decays with rising κ (see the plots for r = 10 and s = 1 or s = 10). That is, using Weighted Random in heterogeneous scenarios requires a reasonably high PU:PE ratio r in order to provide an acceptable performance. In this case, a handling speed increasing slightly with κ can be expected: the more powerful the fast PE, the faster it can process its requests.

Figure 8.31: One Third Powerful Servers

Results for the Round Robin and Random Policies
For comparison, figure 8.30 shows the performance results for the Round Robin and Random policies. Since these policies have no information about different server capacities (as used by Weighted Random) or load states (as used by Least Used), the performance of both policies severely degrades in heterogeneous scenarios (i.e. for κ > 1). Therefore, the Round Robin and Random policies are unsuitable for heterogeneous server capacity scenarios.

Summary
In summary, it has been shown that the policies Least Used and Weighted Random are able to cope with the existence of a dedicated fast server. Again, the PU:PE ratio is the most critical workload parameter for the system performance. In particular, if the Weighted Random policy is used, the PU:PE ratio has to be sufficiently high to achieve a reasonable performance.

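8.12.1.2 Multiple Powerful Servers
After having evaluated the scenario of a single dedicated fast server in the previous subsubsection 8.12.1.1, it is clearly useful to analyse a scenario containing multiple powerful servers. To show the essential results, a scenario of 9 PEs including 3 fast ones has been used. Again, κ defines the capacity ratio between fast and slow servers. That is, the overall capacity PoolCapacity of the pool is given by:

$$\text{PoolCapacity} = \underbrace{\text{PEs}_{\text{Fast}} \cdot (\kappa \cdot \text{Capacity}_{\text{SlowPE}})}_{\text{Overall Capacity of the Fast PEs}} + \underbrace{(\text{PEs} - \text{PEs}_{\text{Fast}}) \cdot \text{Capacity}_{\text{SlowPE}}}_{\text{Overall Capacity of the Slow PEs}}$$

Using a fixed setting of PoolCapacity, the PE capacities can be calculated as follows:

$$\text{Capacity}_{\text{SlowPE}}(\kappa) = \frac{\text{PoolCapacity}}{\text{PEs}_{\text{Fast}} \cdot (\kappa - 1) + \text{PEs}} \tag{8.16}$$

$$\text{Capacity}_{\text{FastPE}}(\kappa) = \kappa \cdot \text{Capacity}_{\text{SlowPE}} \tag{8.17}$$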

The other parameters have remained as for the scenario of a single fast server described in subsubsection 8.12.1.1. Figure 8.31 shows the results of the simulation for the Weighted Random and Least Used policies. As already shown and explained in subsubsection 8.12.1.1, the Random and Round Robin policies (as well as Weighted Random for a too small PU:PE ratio) are useless for heterogeneous scenarios. Therefore, the results for these policies and parameter settings have been omitted.
As expected from the utilization results of the previous simulation, only a small PU:PE ratio r in combination with a high request size:PE capacity ratio s is critical: r = 1 and s = 100 for Least Used, as well as r = 10 and s = 100 for Weighted Random. Comparing these results for the critical parameter settings to the results of the single fast server scenario (see figure 8.28 for the Least Used policy and figure 8.29 for the Weighted Random policy; note the different axis scaling for the handling speed), a slight utilization decrease for Least Used and a utilization increase for Weighted Random can be observed. These effects are the results of the changed capacity distribution: with multiple fast servers, most of the pool's capacity is concentrated on these servers. This leads to a smaller capacity of a slow server for a fixed setting of κ.
For example, the slow server capacity for κ = 5 is about 455,000 calculations/s if there are three fast servers (see equation 8.16), while a slow server still has a capacity of about 714,000 calculations/s in the single fast server scenario (see equation 8.14).
Due to the smaller capacities of the slow servers, it is more difficult for Least Used to reach a high utilization in the three fast servers scenario: a slow server will be selected whenever it has the lowest load value. This leads to a decreased handling speed. The higher the heterogeneity of the pool, the more significant the effect. In a critical parameter range (i.e. a low PU:PE ratio and a high request size:PE capacity ratio), this leads to queuing of requests (request jam) and therefore to a reduced utilization. On the other hand, having multiple fast servers is beneficial for the Weighted Random policy: since w_i = Capacity_i, the probability of mapping a request to the very slow PEs becomes small, resulting in an improved request handling speed and utilization compared to the single fast server scenario. Furthermore, as already observed in subsubsection 8.12.1.1, the speed is improved with an increased heterogeneity of the pool (i.e. a higher value of κ).
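Equations 8.14 to 8.17, together with the probability that Weighted Random (w_i = Capacity_i) selects one of the slow PEs, can be evaluated with the following sketch. The values PoolCapacity = 10⁷ calculations/s and PEs = 10 are assumptions made here because they reproduce the capacities of about 714,000 and 455,000 calculations/s quoted above; the function names are hypothetical.

```python
def slow_pe_capacity(pool_capacity, pes, pes_fast, kappa):
    """Equation 8.16; with pes_fast = 1 it reduces to equation 8.14."""
    return pool_capacity / (pes_fast * (kappa - 1) + pes)


def fast_pe_capacity(pool_capacity, pes, pes_fast, kappa):
    """Equation 8.17 (equation 8.15 for pes_fast = 1)."""
    return kappa * slow_pe_capacity(pool_capacity, pes, pes_fast, kappa)


def wrand_slow_share(pes, pes_fast, kappa):
    """Probability that Weighted Random with w_i = Capacity_i picks any slow PE.

    The slow PEs hold (pes - pes_fast) * C of the total capacity
    C * (pes_fast * kappa + pes - pes_fast); the slow capacity C cancels out.
    """
    return (pes - pes_fast) / (pes_fast * kappa + pes - pes_fast)


POOL_CAPACITY = 10**7  # assumed total capacity in calculations/s
PES = 10               # assumed number of PEs
for pes_fast in (1, 3):
    slow = slow_pe_capacity(POOL_CAPACITY, PES, pes_fast, kappa=5)
    print(pes_fast, round(slow), round(wrand_slow_share(PES, pes_fast, kappa=5), 3))
```

Under these assumptions, the share of requests that Weighted Random maps to slow PEs at κ = 5 drops from about 64% with one fast server to about 32% with three, which illustrates why the multiple fast servers scenario favours this policy.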

8.12.1.3 Linear Capacity Distribution
The next capacity scenario to be analysed is a linear distribution. Such a distribution is likely if the pool consists of multiple generations of servers. For example, a pool could consist of 10 PEs, where the first one had been installed five years ago and a new one, containing state-of-the-art hardware, has been added every 6 months. The capacities of the other servers are linearly distributed between the capacity of the oldest (i.e. slowest) PE (Capacity_SlowestPE) and the capacity of the newest (i.e. fastest) PE (Capacity_FastestPE). The ratio between these two capacities defines the scaling factor γ:

$$\gamma = \frac{\text{Capacity}_{\text{FastestPE}}}{\text{Capacity}_{\text{SlowestPE}}}$$

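That is, for γ = 3 the fastest server has three times the capacity of the slowest one. Again, while scaling γ, the resulting server capacities are normalized to keep the overall pool capacity constant (non-normalized scenarios are analysed later in subsection 8.12.2). That is, the overall capacity of the pool, PoolCapacity, can be calculated as follows:

$$\begin{aligned}
\text{PoolCapacity} &= \underbrace{\sum_{i=1}^{\text{PEs}} \frac{(\gamma-1)\cdot\text{Capacity}_{\text{SlowestPE}}}{\text{PEs}-1}\cdot(i-1)}_{\text{Additional Capacity Gradient}} + \underbrace{\sum_{i=1}^{\text{PEs}} \text{Capacity}_{\text{SlowestPE}}}_{\text{Base PE Capacities}} \\
&= \frac{(\gamma-1)\cdot\text{Capacity}_{\text{SlowestPE}}}{\text{PEs}-1}\cdot\sum_{i=1}^{\text{PEs}-1} i + \text{Capacity}_{\text{SlowestPE}}\cdot\text{PEs} \\
&= \text{Capacity}_{\text{SlowestPE}}\cdot\left(\frac{(\gamma-1)\cdot\text{PEs}}{2} + \text{PEs}\right).
\end{aligned}$$

That is, given a fixed value of PoolCapacity, Capacity_SlowestPE can be calculated as follows:

$$\text{Capacity}_{\text{SlowestPE}} = \frac{\text{PoolCapacity}}{\frac{(\gamma-1)\cdot\text{PEs}}{2} + \text{PEs}} = \frac{\text{PoolCapacity}}{\text{PEs}\cdot\left(\frac{\gamma-1}{2} + 1\right)}.$$

Finally, the capacity of PE i is:

$$\text{Capacity}_i = \underbrace{\frac{(\gamma-1)\cdot\text{Capacity}_{\text{SlowestPE}}}{\text{PEs}-1}\cdot(i-1)}_{\text{Gradient Capacity}} + \underbrace{\text{Capacity}_{\text{SlowestPE}}}_{\text{Base Capacity}}.$$

Figure 8.32: Using a Linear Capacity Distribution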
In order to perform the simulation, the setup described in subsubsection 8.12.1.1 has been used with the PE capacity distribution as described above. The results are presented in figure 8.32. Clearly, the distribution of the capacities in a linear scenario is very systematic: for each slower PE, there will also be a faster one. A PU being served by a slow PE will soon be served by a fast one. On average, the difference to a homogeneous scenario (i.e. γ = 1) becomes almost insignificant. Again, as already observed and explained for the fast server scenarios in subsubsection 8.12.1.1 and subsubsection 8.12.1.2, the only case of a critical parameter range is a small PU:PE ratio r in combination with a large request size:PE capacity ratio s.
After having evaluated scenarios using deterministic capacity distributions, it is also useful to examine randomized ones. Such analyses are provided by the following two subsubsections.

Figure 8.33: Using a Uniform Capacity Distribution

8.12.1.4 Uniform Random Capacity Distribution
For the first randomized capacity distribution scenario, the server capacities have been uniformly randomized. Such a scenario is likely if currently unused PCs are added to a compute pool for real-time distributed computing (see subsection 3.6.5). Clearly, there is a certain lower bound for the minimum capacity (set by the administrator) and an upper bound defined by the state of the art in PC technology. The ratio between the maximum and minimum capacities is given by the factor ϑ, which is defined as follows:

$$\vartheta = \frac{\text{Capacity}_{\text{FastestPE}}}{\text{Capacity}_{\text{SlowestPE}}}$$

Then, for each PE i, its capacity Capacity_i is randomly chosen (using a uniform distribution):

$$\text{Capacity}_i \in_R \left[\text{Capacity}_{\text{SlowestPE}}, \ldots, \vartheta \cdot \text{Capacity}_{\text{SlowestPE}}\right] \subset \mathbb{N}$$

As for the previous simulations, the resulting capacities have to be normalized, i.e. the average system capacity remains constant for all settings of ϑ (see subsection 8.12.2 for non-normalized scenarios).
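A sketch of the capacity generation follows. The text above only states that the capacities are normalized; proportional rescaling to a fixed average capacity is an assumption here, as are the function name and the base value C_min, which has no influence after rescaling.

```python
import random


def uniform_capacities(pes, theta, average_capacity, seed=None):
    """Draw integer capacities uniformly from [C_min, theta * C_min] (theta >= 1),
    then rescale them so that the average PE capacity equals average_capacity.

    C_min is an arbitrary base value, because the rescaling step removes
    its influence; only the ratio theta matters.
    """
    rng = random.Random(seed)
    c_min = 10**6                                  # assumed base value
    raw = [rng.randint(c_min, int(theta * c_min)) for _ in range(pes)]
    scale = pes * average_capacity / sum(raw)      # proportional normalization (assumed)
    return [c * scale for c in raw]
```

For example, uniform_capacities(10, 4, 10**6, seed=1) returns ten capacities with an average of 10⁶ calculations/s whose largest-to-smallest ratio is at most ϑ = 4; a fixed seed plays the role of one of the 64 simulation runs mentioned below.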
Inordertoshowtheessentialeffects,asimulationusinganuniformcapacitydistributionhas
beenperformed.Foreachparametersetting,64runshavebeenperformedwithdifferentseedsto
achieveasufficientstatisticalaccuracy.Allotherparametershaveremainedasdescribedinsubsub-
section8.12.1.1.Theresultsarepresentedinfigure8.33.
Again,asobservedandexplainedforthesimulationsoftheprevioussubsubsections,theutiliza-
tionisaffectedbycriticalparameterrangesofthePU:PEratior(small)andtherequestsize:PE
capacityratios(large)only.Inacriticalparameterrange,theutilizationsinksslightlywitharisingϑ.

178

CHAPTER8.RSERPOOLPERFORMANCERESULTS

Figure8.34:UsingaTruncatedNormalCapacityDistribution

On the handling speed side, however, the curves for Least Used are obviously negatively affected by a rising value of ϑ (e.g. from 59% at ϑ=1 to 52% at ϑ=8 for r=3, s=100), while an increased heterogeneity slightly increases the performance of Weighted Random (e.g. from 22% at ϑ=1 to about 26% at ϑ=8 for r=10, s=100). The reason is clear: the higher the heterogeneity (i.e. a large setting of ϑ), the higher the probability for the existence of very slow PEs. While for Weighted Random the probability for the selection of such a slow PE sinks with its capacity, Least Used will select such a PE if it just has a small load. Clearly, such a selection will lead to a low handling speed.
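To make the contrast concrete, the following Python sketch implements both selection rules in their simplest form. It assumes that Weighted Random uses the PE capacity as its weight and that Least Used breaks ties by the lowest index; the pool values are made up for illustration and are not simulation parameters from the text.

```python
import random


def least_used(loads):
    """Least Used: index of the PE with the lowest current load (ties -> lowest index)."""
    return min(range(len(loads)), key=lambda i: loads[i])


def weighted_random(capacities, rng):
    """Weighted Random: pick a PE with probability proportional to its weight,
    here assumed to be the PE capacity."""
    return rng.choices(range(len(capacities)), weights=capacities, k=1)[0]


# Hypothetical pool: PE 0 is very slow but currently the least loaded one.
capacities = [1e4, 1e6, 1e6, 1e6]
loads = [0.05, 0.30, 0.40, 0.35]

print(least_used(loads))  # 0: Least Used always picks the slow PE here

rng = random.Random(1)
picks = [weighted_random(capacities, rng) for _ in range(100_000)]
print(picks.count(0) / len(picks))  # small: about 1e4 / 3.01e6, i.e. roughly 0.3%
```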

8.12.1.5 Truncated Normal Random Capacity Distribution
Finally, the last capacity distribution scenario has used a truncated normal random distribution with a given average PE capacity $\text{Capacity}_{\text{AveragePE}}$ ($10^6$ calculations/s) and a standard deviation of $\eta \cdot \text{Capacity}_{\text{AveragePE}}$. Obviously, capacities cannot be negative. Therefore, the capacity selection is truncated by enforcing a lower limit ($10^4$ calculations/s). Finally, the PE capacities have to be normalized in order to keep the overall system capacity constant.
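A possible realization of this capacity selection is sketched below in Python. The text does not state how the lower limit is enforced; this sketch assumes truncation by re-drawing (rejection sampling), followed by normalization with one common scale factor. The helper name is hypothetical.

```python
import random


def truncated_normal_capacities(num_pes, eta, average=1e6, lower_limit=1e4, rng=None):
    """Hypothetical helper: normal(average, eta * average) capacities,
    truncated at lower_limit by re-drawing, then normalized to `average`."""
    if rng is None:
        rng = random.Random()
    raw = []
    for _ in range(num_pes):
        c = rng.gauss(average, eta * average)
        while c < lower_limit:  # truncation by rejection (an assumption)
            c = rng.gauss(average, eta * average)
        raw.append(c)
    # Normalization keeps the overall pool capacity constant.
    scale = average / (sum(raw) / num_pes)
    return [c * scale for c in raw]


caps = truncated_normal_capacities(10, eta=0.5, rng=random.Random(7))
print(sum(caps) / len(caps))  # approximately 1e6
```

Note that the final rescaling can move an individual capacity slightly below the lower limit if the drawn values happen to be above average; with η=0 the sketch reduces to a homogeneous pool.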
Figure 8.34 shows the results of the simulation for the truncated normal distribution. The setup has remained the same as for the uniform distribution (see subsubsection 8.12.1.4). The results for the utilization are quite similar to the uniform results: in the critical parameter range (low PU:PE ratio r, large request size:PE capacity ratio s), the utilization decreases slightly with η. However, the handling speed curves for Least Used are obviously negatively affected by a rising value of η (e.g. from 59% at η=0 to 52% at η=0.5 for r=3, s=100), while an increased heterogeneity slightly increases the performance of Weighted Random (e.g. from 22% at η=0 to about 26% at η=0.5 for r=10, s=100). Again, the higher the heterogeneity of the pool (a large setting of η), the more probable the existence of low-capacity PEs. And, as explained in the previous subsubsection, Least Used may map requests to such PEs (if they have the lowest load), while for Weighted Random the weights and therefore the selection probabilities of such PEs become small.

8.12. THE EFFECTS OF HETEROGENEOUS SERVER CAPACITIES


8.12.1.6 Summary
In summary, the simulations presented in the previous subsubsections have outlined the following important effects:
The policies Round Robin and Random are obviously unsuitable in heterogeneous server capacity scenarios, since they have information neither on PE load states nor on PE capacities.
Weighted Random is capable of coping with heterogeneous server capacities, although the PU:PE ratio has to be sufficiently high. For a rising heterogeneity of the pool, Weighted Random is also able to make use of very fast PEs to increase the overall request handling speed.
Least Used clearly achieves the best performance, but it can be observed that the overall request handling speed decreases with a rising heterogeneity of the pool, in particular if the capacities are highly unbalanced (e.g. in the scenarios of dedicated fast servers or for the truncated normal distribution). This is a result of the fact that Least Used selects by PE load rather than by PE capacity: even a very slow PE will get requests as long as its load state is the lowest.
In particular, the observation for the Least Used handling speed leads to the following important question: which policies are able to make the best use of an increased pool capacity (i.e. in a non-normalized capacity scenario)? This question will be answered in the following subsection.

8.12.2 A Load-Increment-Aware Policy
An important feature of RSerPool is the possibility to dynamically add servers to a pool or remove them from it in order to adapt a pool's capacity to a changed demand. In particular, this allows an administrator to add capacity if the pool is highly utilized. But what happens with additional capacity if the pool is (temporarily) only slightly loaded and removing some servers is not useful? Clearly, it is usually desirable that the spare capacity results in an improvement of the request handling speed. But are the policies sufficiently equipped to handle such a scenario? Some basic ideas have already been proposed in Dreibholz, Rathgeb and Tüxen (2005), but a more general evaluation has still been missing. Therefore, the goal of this subsection is to provide an appropriate analysis.

8.12.2.1 A Single Powerful Server
Probably the most obvious scenario of adding capacity to a pool is to increase the capacity of a single dedicated server. This could, e.g., mean adding more memory, increasing the CPU speed or adding another CPU. The ratio between the new and the original capacity of the pool can be given by the following equation:
$$\varphi = \frac{\text{PoolCapacity}_{\text{New}}}{\text{PoolCapacity}_{\text{Original}}}. \tag{8.18}$$
The variable ϕ is denoted as the Capacity Scale Factor; a value of ϕ=1 stands for no capacity change, while ϕ=4 denotes a quadrupled capacity.
In case of a single powerful server, changing ϕ results in varying the capacity of the designated PE only. That is, the capacity increment of the whole pool, denoted as $\Delta_{\text{Pool}}(\varphi)$, can be calculated as follows:
$$\Delta_{\text{Pool}}(\varphi) = \underbrace{\varphi \cdot \text{PoolCapacity}_{\text{Original}}}_{\text{PoolCapacity}_{\text{New}}} - \text{PoolCapacity}_{\text{Original}}. \tag{8.19}$$



Figure 8.35: Server Capacities of the Single Powerful Server Scenario

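Then, the capacity of the powerful PE, $\text{Capacity}_{\text{FastPE}}(\varphi)$, can simply be calculated as:
$$\text{Capacity}_{\text{FastPE}}(\varphi) = \frac{\text{PoolCapacity}_{\text{Original}}}{\text{PEs}} + \Delta_{\text{Pool}}(\varphi).$$
The capacities of all other PEs remain unchanged, i.e.:
$$\text{Capacity}_{\text{SlowPE}}(\varphi) = \frac{\text{PoolCapacity}_{\text{Original}}}{\text{PEs}}.$$
Figure 8.35 illustrates the capacity of each of the 10 PEs for varied values of ϕ.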
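The following Python sketch evaluates equation 8.19 together with the two capacity definitions above for a pool of 10 PEs. It assumes that PEs denotes the number of PEs and that every PE originally has the average capacity of $10^6$ calculations/s from section 8.6; the function name is hypothetical.

```python
def single_powerful_server(num_pes, per_pe_capacity, phi):
    """Hypothetical helper: capacities after scaling the pool by phi,
    with the whole increment given to PE 1 (index 0)."""
    pool_original = num_pes * per_pe_capacity
    delta_pool = phi * pool_original - pool_original  # equation (8.19)
    slow = pool_original / num_pes                    # Capacity_SlowPE(phi)
    fast = pool_original / num_pes + delta_pool       # Capacity_FastPE(phi)
    return [fast] + [slow] * (num_pes - 1)


caps = single_powerful_server(10, 1e6, phi=5)
print(caps[0] / caps[1])  # 41.0, i.e. 4,100% of a slow PE's capacity
print(sum(caps))          # 50000000.0 = phi * PoolCapacity_Original
```

With ϕ=5, this reproduces the 4,100% ratio quoted in subsubsection 8.12.2.2 for the dedicated fast server.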
To show the essential effects of varying the fast server's capacity, simulations have been performed using the policies Round Robin, Random, Weighted Random, Least Used and Priority Least Used (to be introduced below) in a scenario of PU:PE ratios r from 1 to 10 and a request size:PE capacity ratio of s=10 for a target system utilization of 90% in the heterogeneous case. Note that the request load being created by the PUs remains constant. The performance results are presented in figure 8.36.
The results for the average system utilization are not very surprising: the higher the capacity of the pool, the lower the utilization. Since the requested capacity remains constant and the policies are not used in critical parameter ranges (i.e. in particular, the PU:PE ratio for the non-adaptive policies is sufficiently high, see section 8.7), no significant utilization differences among the used policies can be observed. Clearly, as can be expected from the results in subsubsection 8.12.1.1, Round Robin and Random provide the lowest handling speed. That is, these policies are only useful in homogeneous scenarios. However, the performance of Least Used is surprisingly bad: at ϕ=5 and for r=3, the handling speed is only 90% for Least Used, while it is already about 500% for the Weighted Random policy.
What is the problem of the Least Used policy in this scenario?
Figure 8.36: Increasing the Capacity of a Single Server

The reason for the bad performance of the Least Used policy is the fact that the selection decision is based on the current PE load only. Consider a powerful PE #1 loaded by 11% and a slow PE #2 loaded by 10%. Clearly, the Least Used policy would select PE #2, because it has the lowest load. But the new request may increase the slow PE #2's load by another 10%, while using PE #1 would only have increased its load by 2%. That is, PE #1 would have been able to handle the request more quickly.
The inability of Least Used to incorporate the aspect of different load increments for different servers has led to the definition of the Priority Least Used policy (see subsubsection 3.11.3.2 for its definition and subsubsection 4.4.2.4 for its implementation as part of the handlespace management), which has been defined by us in Dreibholz, Rathgeb and Tüxen (2005) as a result of the described problem statement. Using Priority Least Used, each PE can specify its load increment, i.e. the number of load units by which the server's load is increased by an additional request. Upon selection, the PE having the lowest sum of load and load increment is chosen. That is, while Least Used takes the PE currently having the lowest load, Priority Least Used selects the PE having the lowest load after accepting a new request.
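The selection rule can be illustrated with the two-PE example from the paragraph above. This Python sketch is a simplification (assumed: loads and increments expressed as fractions, ties broken by the lowest index); it is not the handlespace implementation from subsubsection 4.4.2.4.

```python
def least_used(loads):
    """Least Used: PE with the lowest current load."""
    return min(range(len(loads)), key=lambda i: loads[i])


def priority_least_used(loads, increments):
    """Priority Least Used: PE with the lowest load after accepting the request,
    i.e. the lowest sum of current load and load increment."""
    return min(range(len(loads)), key=lambda i: loads[i] + increments[i])


loads = [0.11, 0.10]       # PE #1 (powerful), PE #2 (slow)
increments = [0.02, 0.10]  # load added by one more request
print(least_used(loads) + 1)                       # 2: Least Used picks PE #2
print(priority_least_used(loads, increments) + 1)  # 1: Priority Least Used picks PE #1
```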
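For the simulations in figure 8.36, the load increment $\hat{l}_i$ for each PE i is defined as:
$$\hat{l}_i = \frac{2.5 \cdot 10^5}{\text{Capacity}_i(\varphi)}. \tag{8.20}$$
That is, using a PE of the average capacity ($10^6$ calculations/s, see section 8.6), a new request raises a PE's load by $\hat{l}_{\text{AveragePE}} = 25\%$. If the fast PE's capacity were $5 \cdot 10^6$ calculations/s, the load increment would only be $\hat{l}_{\text{FastPE}} = 5\%$.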
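Equation 8.20 and the two example values can be checked with a short Python sketch; the function and parameter names are hypothetical, and the result is expressed as a fraction of full load.

```python
def load_increment(capacity, constant=2.5e5):
    """Load increment of equation (8.20) as a fraction of full load:
    2.5 * 10^5 divided by the PE capacity (calculations/s)."""
    return constant / capacity


print(load_increment(1e6))  # 0.25 -> 25% for a PE of average capacity
print(load_increment(5e6))  # 0.05 -> 5% for a PE with 5 * 10^6 calculations/s
```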
Using the new Priority Least Used policy with the described setting of the load increment, a significant handling speed gain can be observed (see figure 8.36): about 3,250% for ϕ=5 (r=3 and r=10) vs. 600% for Weighted Random (r=10) and about 160% for Least Used (r=10). That is, the new Priority Least Used policy provides the desired functionality of increasing the handling speed in the scenario of a single designated server. But what about other scenarios of increased PE capacity?

8.12.2.2 Multiple Powerful Servers
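Figure 8.37: Increasing the Capacity of One Third of the Servers

Another realistic capacity increment scenario is to tune multiple servers instead of a single designated one. Therefore, the following simulation has evaluated the effect of equally increasing the power of one third (3) of 9 PEs. All other parameters have remained as for the previous simulation in subsubsection 8.12.2.1. Accordingly, the capacity $\text{Capacity}_i(\varphi)$ of PE i has been defined as follows (for the pool capacity increment $\Delta_{\text{Pool}}(\varphi)$ as defined by equation 8.19):
$$\Delta_{\text{FastPE}}(\varphi) = \frac{\Delta_{\text{Pool}}(\varphi)}{\text{PEs}_{\text{Fast}}},$$
$$\text{Capacity}_i(\varphi) = \begin{cases} \frac{\text{PoolCapacity}_{\text{Original}}}{\text{PEs}} + \Delta_{\text{FastPE}}(\varphi) & (i \leq \text{PEs}_{\text{Fast}}) \\ \frac{\text{PoolCapacity}_{\text{Original}}}{\text{PEs}} & (i > \text{PEs}_{\text{Fast}}) \end{cases}$$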
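The piecewise capacity definition can be sketched in Python for the 9-PE pool with 3 fast servers. As before, PEs is taken to be the number of PEs and each PE is assumed to start at $10^6$ calculations/s; the helper name is hypothetical.

```python
def multiple_powerful_servers(num_pes, num_fast, per_pe_capacity, phi):
    """Hypothetical helper: PEs 1..num_fast share the pool increment equally."""
    pool_original = num_pes * per_pe_capacity
    delta_pool = phi * pool_original - pool_original  # equation (8.19)
    delta_fast = delta_pool / num_fast                # Delta_FastPE(phi)
    base = pool_original / num_pes                    # unchanged PE capacity
    return [base + delta_fast if i <= num_fast else base
            for i in range(1, num_pes + 1)]


caps = multiple_powerful_servers(9, 3, 1e6, phi=5)
print(caps[0] / caps[-1])  # 13.0, i.e. 1,300% of a slow PE's capacity
```

For ϕ=5, this yields the 1,300% ratio for each of the three fast servers that is discussed for figure 8.37.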
Figure 8.37 shows the simulation results for the scenario using 9 PEs, including 3 fast servers. Since the utilization results are quite similar to the previous simulation, they have been omitted here. Instead, the left-hand side of figure 8.37 presents the capacity of each server for varying settings of ϕ; the right-hand side shows the plots for the average request handling speed. As a general result for the handling speed, it can be observed that the behaviour of the policies is quite similar to the scenario of one designated powerful server. That is, while Round Robin and Random are in fact useless, Least Used is significantly outperformed by Weighted Random (e.g. a handling speed of more than 700% vs. about 125% for Least Used for a PU:PE ratio r=10; the request size:PE capacity ratio s is 10 for all the simulations of this subsection). Again, a significant performance improvement is achievable by using the Priority Least Used policy with a load increment setting defined by equation 8.20.
Comparing the results with the observations for the single fast server scenario of subsubsection 8.12.2.1, the effects of the changed capacity distribution can be observed: while the single powerful PE incorporates almost the pool's complete capacity (e.g. 4,100% of a slow server's capacity for ϕ=5, see figure 8.35), the additional capacity is now divided up among three servers (i.e. 3 servers having 1,300% of a slow server's capacity for ϕ=5, see the left-hand side of figure 8.37). In this case, the top handling speed is lower (e.g. 1,250% vs. 3,250% for Priority Least Used at ϕ=5), since the maximum possible request handling speed is limited by the processing speed of a fast PE. For the Least Used policy and also for Round Robin and Random, the fact that now one third of the servers are powerful ones becomes beneficial: their handling speed is increased, since the probability of mapping a request to a powerful PE is increased significantly (e.g. 160% vs. 125% for Least Used at ϕ=5 and r=10).


Figure 8.38: Increasing the Server Capacities Linearly


In summary, while it has been shown that an appropriate policy can make use of additional capacity to improve the handling speed in scenarios containing a set of powerful PEs, it is furthermore necessary to have a look at a scenario containing a less extreme capacity distribution.

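8.12.2.3 Linear Capacity Distribution
In the final capacity scenario, the PE capacities are increased linearly. That is, while the capacity of the first PE remains constant, the capacities of the following PEs are increased with a linear gradient, so that the pool reaches its desired capacity $\text{PoolCapacity}_{\text{New}}$ (see also equation 8.18). Therefore, the capacity $\text{Capacity}_i(\varphi)$ of PE i is defined as follows (for the additional pool capacity $\Delta_{\text{Pool}}(\varphi)$ as defined by equation 8.19):
$$\Delta_{\text{FastestPE}}(\varphi) = \frac{2 \cdot \Delta_{\text{Pool}}(\varphi)}{\text{PEs}},$$
$$\text{Capacity}_i(\varphi) = \underbrace{\underbrace{\frac{\Delta_{\text{FastestPE}}(\varphi)}{\text{PEs} - 1}}_{\text{Gradient}} \cdot (i-1)}_{\text{Additional Capacity for PE } i} + \frac{\text{PoolCapacity}_{\text{Original}}}{\text{PEs}}.$$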
A simulation using the linear capacity distribution has been performed. All other parameters have remained as for the fast server scenario of subsubsection 8.12.2.1. The results are presented in figure 8.38; the left-hand side shows the capacity of each of the 10 PEs for varied settings of ϕ, and the right-hand side presents the average handling speed results. Again, a plot for the system utilization has been omitted, since it does not provide any new insights.
While the general ranking behaviour of the policies remains as observed for the two scenarios of fast servers (see subsubsection 8.12.2.1 and subsubsection 8.12.2.2), it is clearly visible that a linear capacity distribution results in smaller performance differences among the policies: for ϕ=5, the capacity of the fastest PE is only 900% of the slowest PE's capacity (see the left-hand side of figure 8.38), while it is 1,300% in the three-fast-servers scenario and even 4,100% for the dedicated fast server (see the left-hand sides of figure 8.37 and figure 8.35). That is, while the top request handling speed is significantly lower (e.g. only about 800% for Priority Least Used and a PU:PE ratio r=10), the chance that a less-performing policy can reach a high handling speed is significantly improved. At ϕ=5 and r=10, the Random policy already achieves a handling speed of about 175%, while Round Robin even exceeds 200%. Least Used is able to reach about 330%, but is still outperformed by Weighted Random with about 450%. Note that Weighted Random only outperforms Least Used for ϕ>3; for smaller values of ϕ, its handling speed is significantly lower. Compared to one dedicated fast server (where Least Used is already outperformed at ϕ=2), the performance of Least Used for a linear capacity distribution is significantly better.
As the main result, it has been observed that the linear scenario is significantly less critical: even inappropriate policies like Round Robin and Random are able to make use of the improved capacity. While it is still possible to map a request to a slower PE, the probability of mapping the next request to a faster one is the same (due to the linear capacity distribution).

8.12.2.4 Summary
In summary, it has been shown that the Weighted Random policy is able to make use of increased server capacities by increasing the request handling speed, while Round Robin and Random, as expected, are not useful in a heterogeneous capacity scenario. However, the performance of the Least Used policy falls significantly behind Weighted Random if the scenario is sufficiently heterogeneous: Least Used only takes the load states into account, but not the PEs' capacities. The new Priority Least Used policy solves this problem by allowing a PE to define a load increment constant.
As a result of suggesting the Priority Least Used policy as part of our paper Dreibholz, Rathgeb and Tüxen (2005) as well as of the Internet Draft Tüxen and Dreibholz (2005) (Individual Submission) including the definition of pool policies, performance results have been presented at the 60th IETF Meeting (see Dreibholz (2004c)). After that, the draft has become the Working Group Draft Tüxen and Dreibholz (2006b) of the IETF RSerPool WG.

8.13 Summary

In this chapter, a generic application model as well as performance metrics have been introduced first. After that, simulative performance evaluations of RSerPool systems have been performed in order to identify critical parameter ranges. Important results of these analyses have been:
An estimation of the workload parameter influence on the pool policy performance,
The identification of fallacies and pitfalls for the Round Robin policies,
The definition of distance-sensitive pool policies,
Guidelines for the usage of the PU-side cache and
An optimized Least Used policy for heterogeneous server capacity scenarios.
Results of these evaluations, optimizations and extensions have been contributed to the IETF's RSerPool standardization process.

Chapter 9

RSerPool Failure Scenario Results

The goal of this chapter is the evaluation of failure scenarios, i.e. the use case for which RSerPool has been designed. While in practice PEs as well as PUs and PRs can fail, the focus of the analyses in this chapter is the most important case: the failure of PEs.

9.1 Introduction

Fromtheusersperspective(i.e.intheviewofaPU),aPEfailureishandledbythestepsillustratedin
figure9.1:duringnormaloperation,theapplicationsetscheckpointsforthesessionstatetoreturnto
incaseofproblems(e.g.bysendingstatecookiesifapplyingclient-basedstatesharingforthistask,
seealsosubsubsection3.9.5.2andDreibholz(2002)).However,allworkperformedbytheserver
betweenthelatestcheckpointandaPEfailurewillbelost1.
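The relation between checkpoints and lost work can be sketched as follows. This minimal Python model assumes that checkpoints are set after every fixed amount of processed work (similar to the cookie transmission interval used in section 9.3) and measures work in calculations; the function name and numbers are hypothetical.

```python
def lost_work(processed, checkpoint_interval=None):
    """Work (in calculations) processed since the latest checkpoint, which is
    lost when the PE fails. checkpoint_interval=None models 'no checkpoint':
    then all processed work is lost."""
    if checkpoint_interval is None:
        return processed
    latest_checkpoint = (processed // checkpoint_interval) * checkpoint_interval
    return processed - latest_checkpoint


# A PE fails after 4,350,000 calculations of a request; checkpoints are set
# every 1,000,000 calculations.
print(lost_work(4_350_000, 1_000_000))  # 350000 calculations must be re-processed
print(lost_work(4_350_000))             # 4350000: without checkpoints, all work is lost
```

Under this model, the lost work is always smaller than one checkpoint interval, which is why a shorter interval keeps the re-processing effort small at the price of more frequent state transfers.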
The failure of the PE will be detected by the PU side using mechanisms appropriate for the application. In the usual case, the failure detection is provided by timeouts, which could e.g. be based on SCTP heartbeats, features of the application protocol itself, or both. In case of the simulated calculation application, it is provided by the session keep-alive mechanism using CalcAppKeepAlive and CalcAppKeepAliveAck messages (see section 8.3).

¹ Of course, all work is lost if there is no checkpoint.

Figure 9.1: The Detection and Handling of a Pool Element Failure



CHAPTER 9. RSERPOOL FAILURE SCENARIO RESULTS

Figure 9.2: The Failover Performance of Dynamic Pools

After the detection of a server failure, a new PE has to be chosen (by applying the handle resolution procedure, see subsubsection 3.9.3.1), a connection to the selected PE has to be established, the session state saved by the latest checkpoint has to be restored and the lost work (i.e. the work not saved by a checkpoint) has to be re-processed. After that, the session can continue its normal operation. The delay between leaving the normal operation state and reaching it again is denoted as Failover Delay. Clearly, the goal of a well-provisioned RSerPool system is to minimize the failover delay. The approaches to reach this goal are to
Reduce the time required for failure detection as well as failover initiation and
Minimize the amount of lost work (and therefore keep the re-processing effort as small as possible).
While the last item is related to the Application Layer and tightly connected to the mechanism used for session failover (e.g. the client-based state sharing as described in subsubsection 3.9.5.2), the first item is related to the Session Layer only. Therefore, it is useful to differentiate between these two parts of the failover procedure. In the following sections 9.2 and 9.3, the failure detection mechanisms are analysed. This is followed by an evaluation of the two most important failover mechanisms: the abort and restart principle in subsection 9.4.1 and client-based state sharing in subsection 9.4.2.

9.2 Using Dynamic Pools

A basic feature of the RSerPool architecture is its support for dynamic pools. That is, new PEs can register into a pool and registered PEs can deregister from their pool. In particular, this functionality allows for dynamically adapting a pool's capacity to the current user requirements. This feature is highly useful for application cases like real-time distributed computing, as described in subsection 3.6.5. While the deregistration of a PE in a dynamic pool is not actually a failure, it is nevertheless necessary to perform a failover: in case of a long-lasting session, it is usually not possible for the PE to finish all of its sessions. That is, a PE change due to an intentional deregistration can be viewed as a controlled failure of a PE.

9.2. USING DYNAMIC POOLS
Such a controlled failure is the simplest case for a failover: the PE can tell its PUs about the oncoming shutdown, and the PUs can take the appropriate actions. In the usual case, this in particular means setting a checkpoint to seamlessly resume the session on a different PE. This avoids any lost work and re-processing effort. That is, a controlled failure only results in the costs (i.e. usually time) for handle resolution and resumption of the session on the new PE. Since the endpoint latencies are negligible, the system performance in such scenarios is mainly influenced by the network delay.
To make the effects clear, an example simulation based on the scenario setup described in section 8.6 has been performed. The following parameters have been used: a target system utilization of 60%, a simulated real-time of 60 minutes, a PU:PE ratio of 10 and a request size:PE capacity ratio of 10 (i.e. an average-sized request is processed within 10 s). The Mean Time Between Failure (MTBF) of the PEs has been varied between 0.1 and 5 times the request size:PE capacity ratio, using a negative exponential distribution. That is, a value of 1 means that, on average, a PE goes down after an interval equal to the time required to exclusively process a request of the average size.
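The failure model can be sketched in Python with the standard library's `random.expovariate`, which takes the rate (the inverse of the mean). The helper below is hypothetical: it only generates the failure instants of one PE within the simulated time and ignores how quickly a replacement PE appears.

```python
import random


def failure_times(mtbf_factor, request_time, duration, rng):
    """Failure instants (in s) with negative exponentially distributed
    inter-failure times; the mean is mtbf_factor * request_time."""
    mean = mtbf_factor * request_time
    t, times = 0.0, []
    while True:
        t += rng.expovariate(1.0 / mean)  # expovariate expects a rate, not a mean
        if t > duration:
            return times
        times.append(t)


rng = random.Random(3)
# MTBF factor 0.1 and a request size:PE capacity ratio of 10 (10 s per request)
# give a mean time between failures of 1 s.
times = failure_times(0.1, 10.0, 3600.0, rng)
print(len(times))  # roughly 3600 failures during one simulated hour
```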
Figure 9.2 presents the performance results of the simulation. Clearly, the effects of PE failures are almost negligible for high settings of the MTBF. Therefore, this simulation has in particular used very low settings. A use case where small MTBF values may be likely is real-time distributed computing, where user PCs provide their capacity to a pool if idle and remove themselves when utilized again (see also subsection 3.6.5).
The left-hand side of figure 9.2 contains the plots for the system utilization. Clearly, the effects of dynamic pools are small, except in the case of very low MTBF settings occurring in combination with network delay (here: an MTBF setting of 0.1 and an inter-component network latency of 50 ms). In this case, the PUs spend a significant fraction of the handling time on handle resolution and session resumption (i.e. startup delay, see figure 9.1), leaving the PEs idle. Note that the negative effect for the Least Used policy is smaller than for the Round Robin and Random policies (e.g. 57% vs. 44% utilization for an MTBF setting of 0.1 and a network delay of 50 ms): as already observed and described in section 8.7, the load distribution performance of Least Used is significantly better. Therefore, it is easier to utilize the currently available resources.
However,fromtheusersperformanceperspectiveonthehandlingspeedside–shownonthe
right-handsideoffigure9.22–theeffectsofdynamicpoolscanbeobservedearlier(i.e.evenfor
highersettingsoftheMTBF):asexpected,thehandlingspeeddecreaseswithasmallerMTBFand
highernetworkdelay(i.e.morecostlyPEchanges).However,aninterestingobservationisthebe-
haviouroftheRoundRobinpolicy:whilethispolicyprovidesabetterperformancethanRandom
intheusualcase(e.g.ahandlingspeedof40%vs.about30%forRandomforaMTBFvalueof5;
seealsosection8.7foradetailedevaluationofthetwopolicies),thehandlingspeedofRoundRobin
convergestothevalueofRandomforlowerMTBFvalues.Theexplanationisquiteclear:theideaof
RoundRobinistochoosePEsinturn.However,foralowMTBFvalue,PEsappearanddisappear
frequently.Thatis,thereisnostablelisttoselectthePEsfrom.Instead,thechoicebecomesmore
random.moreandInsummary,thecontrolledfailoverfordynamicpoolsisclearlytheidealcase.Anideato
furtherenhancethefailoverperformanceincaseofverydynamicpoolsisthataPEgoingtoshut
downshortlyissuesashutdownsoonnotificationtoitsPUssomesecondsbeforeactuallygoingout
ofservice.ThisallowsforthePUstoselectanewPEandestablishaconnectiontoitinparallel,while
theoldPEisstillactive.Then,anewPEisalreadyinstand-bywhentheoldPEisactuallyfinishing
2Thelegendhasbeenomittedtoenhancereadability.Itisequaltothelegendoftheplotontheleft-handside.

188

CHAPTER9.RSERPOOLFAILURESCENARIORESULTS

Figure9.3:TheImpactofCleanShutdowns

itsshutdownbysettingthelastcheckpointandtearingdownthetransportconnection.

9.3 The Handling of Pool Element Failures
A short overview of the impacts of real PE failures has already been presented in Dreibholz and Rathgeb (2005d). However, this subject involves multiple parameters to be tuned; therefore, a more detailed look is necessary.

9.3.1 Introducing Server Failures
The first parameter to be evaluated is the probability that a PE performs a clean shutdown before disappearing (see also section 8.3). Such a proper shutdown includes a deregistration at its PR-H and the transmission of a state cookie for each session. In case of an unclean shutdown, the PE simply disappears without notification. In order to illustrate the essential effects, a simulation based on the scenario setup described in section 8.6 has been performed. Figure 9.3 shows the results of this simulation, which has used the following parameters:
A simulated real-time of 60 minutes,
An average inter-component network delay of 10 ms,
A target system utilization of 60%,
A PU:PE ratio of 10 and a request size:PE capacity ratio of 10,
An average PE MTBF of 5 times the request size:PE capacity ratio (i.e. 50 s here; negative exponential distribution),
A MAX-BAD-PE-REPORT setting of 1 (to be evaluated later in subsection 9.3.4),

9.3. THE HANDLING OF POOL ELEMENT FAILURES


A session keep-alive interval of 1 s (to be evaluated in detail in subsection 9.3.2) and a session keep-alive timeout of 1 s,
An ASAP Endpoint Keep-Alive transmission interval of 1 s (to be evaluated in detail in subsection 9.3.3) and an ASAP Endpoint Keep-Alive Ack reception timeout of 1 s and
A cookie-based failover using a cookie transmission interval of 10% of the average request size (i.e. at most about 10% of the work is lost in case of a PE failure; to be evaluated in detail in subsection 9.4.2).
The right-hand side of figure 9.3 shows the average request handling speed for varied probabilities of a clean shutdown, for the policies Least Used, Round Robin and Random. As a second parameter, the session keep-alive interval has been varied. This keep-alive interval denotes the transmission interval of the CalcAppKeepAlive messages, as defined in section 8.3. It is given as a fraction of the request size:PE capacity ratio. That is, an interval of 0.1 denotes a session keep-alive interval of 10% of the request processing time if the request is processed exclusively.
Clearly, the results for the handling speed are as expected: the higher the fraction of PEs silently disappearing, the lower the handling speed. Furthermore, the ranking of the three policies remains as observed in section 8.7. That is, Least Used is able to better distribute the workload, leading to a better compensation for lost work (e.g. a handling speed of still more than 70% vs. about 10% for Round Robin and 5% for Random at 0% probability of a clean shutdown). This effect can be verified by the results for the average system utilization shown on the left-hand side of figure 9.3: while the average system utilization for Least Used rises from 60% (i.e. the desired target utilization) for clean shutdowns to only about 66% for no clean shutdowns, the utilization for Round Robin reaches 80%, while the Random policy already exceeds 84%. The slower the handling speed, the higher the probability that a resumed session is interrupted again, and the more calculations are lost and have to be re-processed. Furthermore, the session keep-alive monitoring interval (here: in form of the CalcAppKeepAlive message transmission) is a crucial parameter for the handling speed: a too high value (e.g. 10) extends the time the PU requires to detect the PE failure. That is, while the PU is waiting to detect the failure, it does not cause server load (the increase of the system utilization remains small), but the handling speed is strongly affected. For example, an interval of 10 instead of 1 decreases the handling speed of Least Used from more than 65% to less than 20% in the scenario of no clean shutdowns.
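The influence of the keep-alive interval on failure detection can be estimated with a simple model, sketched below in Python. The model is an assumption of this example, not taken from RSPSIM: keep-alives are sent at multiples of the interval, a PE failure is noticed when the next keep-alive remains unanswered for the timeout, and the failure instant is uniformly distributed within an interval. Following the definition above, the relative interval is converted to seconds by multiplying it with the processing time of an average request (10 s for a request size:PE capacity ratio of 10).

```python
import math


def detection_delay(failure_time, interval, timeout):
    """Time from a PE failure until the PU notices it: wait for the next
    keep-alive transmission, then for the unanswered timeout."""
    next_keep_alive = math.ceil(failure_time / interval) * interval
    return next_keep_alive - failure_time + timeout


def mean_detection_delay(interval, timeout):
    """Expected delay if the failure instant is uniform within an interval."""
    return interval / 2 + timeout


request_time = 10.0  # s; request size:PE capacity ratio of 10
for relative_interval in (0.1, 1, 10):
    interval = relative_interval * request_time
    print(relative_interval, mean_detection_delay(interval, timeout=1.0))
```

Under these assumptions, a relative interval of 10 yields a mean detection delay of about 51 s, which is in the order of the 50 s MTBF used in this simulation and may help to explain the strong handling speed decrease reported above.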
In summary, it has been observed that the ranking of the policies (Least Used is better than Round Robin, which is better than Random) is also valid in failure scenarios. Furthermore, the monitoring granularity of the session (here: the CalcAppKeepAlives) is crucial for the handling speed, due to its influence on the timeliness of the PE failure detection.

9.3.2 Session Monitoring by the Pool User
The process of monitoring a session directly influences the speed of detecting a PE failure (see also figure 9.1). Any suitable mechanism can be used to actually realize such a monitoring functionality (e.g. transaction timeouts or SCTP heartbeats); for the RSPSIM simulation model and its calculation application, it is provided by the session keep-alive mechanism using CalcAppKeepAlive messages (see section 8.3). In order to present the general effects introduced by a variation of the session keep-alive interval, the results of an example simulation are presented in figure 9.4. The simulation has been based on the scenario setup described in section 8.6; the following parameter settings have been used:



Figure 9.4: The Impact of the Session Keep-Alive Interval for Session Monitoring

A simulated real-time of 60 minutes,
An average inter-component network delay of 10 ms,
A target system utilization of 40% (in order to illustratively show the effect on the handling speed; for a larger value, it would already be too strong),
A PU:PE ratio of 10 and a request size:PE capacity ratio of 10,
An average PE MTBF of 5 times the request size:PE capacity ratio (i.e. 50 s here; negative exponential distribution),
A probability for a clean shutdown of 0%,
MAX-BAD-PE-REPORT=∞ (to be evaluated later in subsection 9.3.4),
A session keep-alive timeout of 1 s,
An ASAP Endpoint Keep-Alive Ack reception timeout of 1 s and
A cookie transmission interval of 10% of the average request size (to be evaluated in detail in subsection 9.4.2).
The effect of varying the session keep-alive interval (i.e. the transmission interval of CalcAppKeepAlive messages) on the average system utilization, presented on the left-hand side of figure 9.4, is fairly small: neither the session monitoring granularity nor the transmission interval of the ASAP Endpoint Keep-Alive messages from the PR-H to the PEs has a significant influence on the system utilization. The reason is obvious: a PU waiting for the failure detection does not consume PE capacity. Only the ranking of the policies can be observed, i.e. while Least Used achieves a utilization of 44%, the utilization of Round Robin is about 45.5% and about 47% for the Random policy. Again, the Least Used policy achieves a better load distribution, reducing the amount of lost work and therefore the need for an extensive re-processing effort.



While the results for the average handling speed, shown on the right-hand side of figure 9.4, obviously reflect the ranking of the policies, a significant performance loss is induced by a larger session monitoring granularity. Clearly, the more time a PU requires to detect a PE failure, the more time is needed to handle a request. Having a look at the transmission interval of the ASAP Endpoint Keep-Alive messages, only a very small influence can be observed. This parameter therefore needs a more detailed evaluation, which will be presented in the following subsection 9.3.3.
In summary, the frequent monitoring of a session is extremely crucial for the performance of an RSerPool system: the quicker a PE failure is detected, the shorter the failover delay. Note that while sending keep-alive messages is a common mechanism to realize such a monitoring functionality, the application protocol itself may provide features to realize monitoring much more efficiently. Consider, for example, a database transaction application or the simple download of web pages. Here, monitoring could simply be realized as a timeout for issued transactions or download requests. That is, monitoring does not necessarily impose any additional traffic on the network for such applications.
While the monitoring of sessions is a must, RSerPool provides another feature to observe the health of PEs: the ASAP Endpoint Keep-Alive monitoring. But is this additional monitoring feature really required?

9.3.3 Pool Element Monitoring by the Home Registrar
The PE monitoring using ASAP Endpoint Keep-Alive messages (see subsubsection 3.7.1.3), performed by a PE's PR-H, is used to ensure that the handlespace content (i.e. the set of usable PEs) reflects reality with a high probability. While a finer session keep-alive monitoring granularity (as evaluated in subsection 9.3.2) reduces the time a PU requires to detect a PE failure, a shorter Endpoint Keep-Alive monitoring interval reduces the startup time, i.e. the time required to start or resume a request on a selected PE (see figure 9.1). The startup time becomes negligible for a large request size:PE capacity ratio (i.e. the processing time dominates the startup time; see also the results of the simulation in subsection 9.3.2). But as the request size gets smaller, a fast startup gains increasing importance.

9.3.3.1 General Effects of the Endpoint Keep-Alive Monitoring
To illustrate the influence of the Endpoint Keep-Alive transmission interval, the results of an example simulation are presented in figure 9.5. This simulation has been based on the scenario setup described in section 8.6; the parameter settings have been as follows:
A simulated real-time of 60 minutes,
An average inter-component network delay of 10 ms,
A target system utilization of 25% (in order to have sufficient over-capacity to illustratively present the effects),
A PU:PE ratio of 10 and a request size:PE capacity ratio of 1,
An average PE MTBF of 2, 5 and 10 times the request size:PE capacity ratio (negative exponential distribution),
A probability for a clean shutdown of 0%,
MAX-BAD-PE-REPORT=∞ (to be evaluated later in subsection 9.3.4),



Figure 9.5: The Impact of the Endpoint Keep-Alive Interval for Pool Element Monitoring

A session keep-alive interval of 1 s and a session keep-alive timeout of 1 s,
An ASAP Endpoint Keep-Alive Ack reception timeout of 1 s and
A cookie transmission interval of 10% of the average request size (to be evaluated in detail in subsection 9.4.2).
The results for the average system utilization are presented on the left-hand side of figure 9.5. Again, the ranking of the policies (Least Used is better than Round Robin, which is better than Random), as already observed for the simulations of the previous subsections, can be noticed. The smaller the average MTBF and the worse the load distribution quality of the policy, the higher the system utilization due to the necessity to re-process lost work. Clearly, the impact of the ASAP Endpoint Keep-Alive transmission interval becomes insignificant if the average PE MTBF is sufficiently high (here: 5 or 10 times the request size:PE capacity ratio).
However,asignificantimpactontheutilizationsoftheRoundRobinandRandompoliciescan
beobservedforanaverageMTBFof2timestherequestsize:PEcapacityratio:theutilizationis
fractionsinkingofwiththeirarisingruntimewmonitoringaitingforinterval.serviceThestartup:reasontheforthisprobabilityeffectisofthatthetheRoundPUsRobinspendaandsignificantRandom
policiestobeaffectedbyaPEfailureissignificantlyhigherthanforLeastUsed(sincetherequest
processingtakeslongerduetonon-optimaldistribution).So,afterdetectingaPEfailure,therewillbe
asecondstartupdelayforcontactinganewlychosenPE.IfthenewPEisstillinservice,itwillstart
re-processing(i.e.theutilizationisincreased).Clearly,theprobabilityforthiscaseismuchhigherif
thePRsmonitoringintervalissmall.Otherwise,thenewPEmaybealreadyoutofservice.Then,the
PUwaitsforitssessionkeep-alivetimeout(nocapacityisconsumed,i.e.lowutilization)andfinally
ain.agtriesTheeffectsobservedfortheutilizationarereectedbytheresultsfortheaveragehandlingspeed,
shownontheright-handsideoffigure9.5:thehandlingspeeddecreaseswiththeASAPEndpoint
Keep-Alivetransmissioninterval.ThelongerittakestodetectaPEfailure,thehighertheresulting
startuptime.Thisdirectlyimpliesanincreasedhandlingtimeandthereforeadecreasedhandling
speed.Furthermore,itcanbeobservedthatthehandlingspeedofRoundRobinconvergesintothe

9.3. THE HANDLING OF POOL ELEMENT FAILURES


direction of the Random speed (because there is no stable list to select from), as already observed and explained for the results in section 9.2.
In summary, the Endpoint Keep-Alive mechanism is clearly necessary if the request size:PE capacity ratio is small and therefore the selection of an unavailable PE significantly decreases the handling speed due to an increased service startup time.

9.3.3.2 How Costly is the Endpoint Keep-Alive Monitoring?
Obviously, the PR-H-based monitoring causes network overhead: each ASAP Endpoint Keep-Alive message also has to be answered by an ASAP Endpoint Keep-Alive Ack. Furthermore, while the session keep-alive monitoring may utilize features of the Application Layer protocol, a smaller monitoring interval for the PR-H directly implies additional ASAP overhead traffic.
The per-second cost for the monitoring of a pool clearly depends on the size of the pool and the Endpoint Keep-Alive transmission interval. It can be described by the following formula:
$$c_{\text{EKAMonitoring}}(\text{PEs}, \text{EKAInterval}) = \underbrace{2 \cdot c_{\text{Packet}} \cdot \frac{1}{\text{EKAInterval}}}_{\text{Costs per PE Keep-Alive and Acknowledgement}} \cdot \text{PEs}$$
In this formula, EKAInterval denotes the transmission interval (in seconds) and $c_{\text{Packet}}$ the per-packet cost. Since each Endpoint Keep-Alive also has to be answered by an ASAP Endpoint Keep-Alive Ack message, this cost factor is doubled.
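To make the cost formula concrete, the following minimal Python sketch evaluates it directly. The function name is hypothetical, and the example values (a pool of 10 PEs, cost counted in bytes at 100 bytes per packet, i.e. the upper end of the 80 to 100 byte estimate for Keep-Alive packets) are assumptions chosen for illustration only.

```python
def eka_monitoring_cost(pes: int, eka_interval_s: float, packet_cost: float) -> float:
    """Per-second cost of ASAP Endpoint Keep-Alive monitoring for one pool.

    Each PE receives one Endpoint Keep-Alive per interval and answers it with
    one Endpoint Keep-Alive Ack, hence the factor 2.
    """
    if eka_interval_s <= 0:
        raise ValueError("EKAInterval must be positive")
    return 2 * packet_cost * (1.0 / eka_interval_s) * pes


# ASSUMPTIONS for this example only: 10 PEs, cost measured in bytes and
# 100 bytes per packet (upper end of the 80..100 byte estimate).
for interval in (1.0, 30.0):
    cost = eka_monitoring_cost(pes=10, eka_interval_s=interval, packet_cost=100)
    print(f"EKAInterval = {interval:4.0f} s -> {cost:7.1f} bytes/s")
```

Under these assumptions, the pool causes 2000 bytes/s of monitoring traffic at a 1 s interval and about 66.7 bytes/s at a 30 s interval, i.e. 1/30th of the traffic.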
The size (and therefore the bandwidth cost) of an Endpoint Keep-Alive or Endpoint Keep-Alive Ack message depends on the size of the pool's PH (see subsubsection 3.9.2.2 for a detailed description of the message contents, as well as figure A.1 and figure A.2 for illustrations of the message formats). Assuming an upper limit of 32 bytes for a PH (see subsection 4.4.6 for a detailed discussion of this limit), the size of each Keep-Alive packet type could be in a range of 80 to 100 bytes (including IP header and SCTP overhead). Under the assumption that PEs choose a nearby PR-H (e.g. by automatic configuration as described in subsubsection 3.7.1.1), the actual distance between PR-H and PE should usually be small (see also subsubsection 8.10.2.2 for a detailed reasoning of this assumption). Therefore, in a well-designed RSerPool network, monitoring traffic usually stays local and is rather inexpensive.
In reality, however, there are scenarios which differ from the usual case. For the Endpoint Keep-Alive traffic, such an exception could mean that the transport becomes costly (e.g. if the monitoring interval has to be very small, or if bandwidth is scarce). So, is there a possibility to reduce the ASAP monitoring overhead while keeping the handlespace content up-to-date?

9.3.4 Reducing the Pool Element Monitoring Overhead
As shown in subsection 9.3.2, a PU must monitor its session with a PE in order to detect a PE failure. That is, if the PE is already monitored by its PUs anyway, a PR may utilize this feature to save on its own ASAP bandwidth: PUs can report PE failures to their PR using an ASAP Endpoint Unreachable message (see also subsubsection 3.9.3.2). A PR counts the number of failure reports for each PE; if the threshold MAX-BAD-PE-REPORT is reached (see subsubsection 3.7.1.4; the default value is 3), the PE is removed from the handlespace.
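The counting rule for failure reports can be sketched as a small, registrar-side toy model. This is a hypothetical Python sketch, not RSPLIB code and not the ASAP message API: the class `Handlespace`, its methods and the pool handle "ExamplePool" are illustrative names, and re-registrations, timers and message encoding are ignored.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

# Default threshold as stated in the text (see subsubsection 3.7.1.4).
DEFAULT_MAX_BAD_PE_REPORT = 3


class Handlespace:
    """Hypothetical toy model of a PR's handlespace with failure report counting."""

    def __init__(self, max_bad_pe_report: int = DEFAULT_MAX_BAD_PE_REPORT) -> None:
        self.max_bad_pe_report = max_bad_pe_report
        self.pools: DefaultDict[str, Set[int]] = defaultdict(set)
        self.bad_reports: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def register(self, pool_handle: str, pe_id: int) -> None:
        self.pools[pool_handle].add(pe_id)
        self.bad_reports[(pool_handle, pe_id)] = 0

    def endpoint_unreachable(self, pool_handle: str, pe_id: int) -> bool:
        """Count one failure report; return True if the PE has been removed."""
        if pe_id not in self.pools.get(pool_handle, set()):
            return False  # unknown or already removed PE: ignore the report
        key = (pool_handle, pe_id)
        self.bad_reports[key] += 1
        if self.bad_reports[key] >= self.max_bad_pe_report:
            self.pools[pool_handle].discard(pe_id)
            del self.bad_reports[key]
            return True
        return False


hs = Handlespace()
hs.register("ExamplePool", 1)
print([hs.endpoint_unreachable("ExamplePool", 1) for _ in range(3)])  # [False, False, True]
```

With the default threshold of 3, the third report removes the PE. A lower threshold removes failed PEs faster but also makes it easier for a single untrustworthy PU to remove working PEs.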
To show the effectiveness and essential effects of the failure reporting mechanism, the results of an example simulation are presented in figure 9.6. This simulation has again been based on the scenario setup described in section 8.6, and the following parameters have been used:


Figure 9.6: Utilizing Failure Reports from Pool Users for Pool Element Monitoring

A simulated real-time of 60 minutes,
An average inter-component network delay of 10 ms,
A target system utilization of 25% (for the results to be comparable to the simulation in subsection 9.3.3),
A PU:PE ratio of 10 and a request size:PE capacity ratio of 1,
An average PE MTBF of 5 times the request size:PE capacity ratio (negative exponential distribution),
A probability for a clean shutdown of 0%,
A session keep-alive interval of 1 s and a session keep-alive timeout of 1 s,
An ASAP Endpoint Keep-Alive Ack reception timeout of 1 s and
A cookie transmission interval of 10% of the average request size (to be evaluated in detail in subsection 9.4.2).
Clearly, the results for the utilization (shown on the left-hand side of figure 9.6) are not very surprising: while the policy ranking as explained in subsection 9.3.1 can be observed again, there is no significant impact introduced by the variations of MAX-BAD-PE-REPORT and the ASAP Endpoint Keep-Alive transmission interval. This coincides with the expectations from the simulation of subsection 9.3.3 for the non-critical MTBF setting (here: 5 times the request size:PE capacity ratio).
However, the setting of MAX-BAD-PE-REPORT has a significant impact on the average handling speed, as shown on the right-hand side of figure 9.6: while MAX-BAD-PE-REPORT has no impact if the ASAP Endpoint Keep-Alive interval is low (here: 1 s), a higher setting of MAX-BAD-PE-REPORT in combination with a longer monitoring interval causes a significant performance drop. For example, the Least Used handling speed sinks from about 41% at MAX-BAD-PE-REPORT=1 to less than 12% at MAX-BAD-PE-REPORT=10.


But comparing the average handling speed results for monitoring intervals of 30 s and 1 s at MAX-BAD-PE-REPORT=1, only small differences can be noticed: 41% vs. 44% for Least Used, 27% vs. 28% for Round Robin and 18% vs. 21% for the Random policy. That is, although the ASAP monitoring traffic has been reduced to 1/30th, there is only a small drop in performance.
Therefore, using MAX-BAD-PE-REPORT can greatly reduce the number of ASAP monitoring packets. In particular, if the application protocol already includes features for session monitoring (e.g. transaction timeouts), the PU-based monitoring may actually come for free. However, it would be a fallacy to generally recommend using MAX-BAD-PE-REPORT in favour of ASAP-based endpoint monitoring: the failure reports give PUs the power to have PEs removed from the handlespace. This is clearly a major security threat in scenarios of non-trustworthy PUs. Therefore, it is necessary to carefully plan the usage of failure reports for PE monitoring.

9.4 The Evaluation of Session Failover Mechanisms
After having evaluated the mechanisms to detect a PE failure in the previous section, two important mechanisms for the session failover are analysed: the abort-and-restart principle and client-based state sharing.

9.4.1 Abort and Restart
Clearly, the easiest mechanism for a session failover is to simply restart the session. This mechanism is therefore denoted as the Abort and Restart principle. The most important application using this mechanism is the download of web pages using the HTTP protocol (see Fielding et al. (1999)). Since the abort and restart approach simply restarts the session from scratch, its performance obviously depends on the request size:PE capacity ratio (i.e. it is proportional to the amount of work lost in case of a PE failure) and, of course, on the average MTBF of the PEs. To show the essential effects of applying abort and restart, the results of an example simulation are presented in figure 9.7. Again, the simulation has been based on the scenario setup described in section 8.6, using the following parameters:
A simulated real-time of 60 minutes,
An average inter-component network delay of 10 ms,
A target system utilization of 25% (to have over-capacity for the handling of PE failures),
A PU:PE ratio of 10,
An average PE MTBF of 2, 5 and 100 times the time required to exclusively process a request having a request size:PE capacity of 10 (here: 20 s, 50 s and 1000 s; negative exponential distribution),
A probability for a clean shutdown of 0%,
MAX-BAD-PE-REPORT has been set to 1,
A session keep-alive interval of 1 s and a session keep-alive timeout of 1 s,
An ASAP Endpoint Keep-Alive transmission interval of 1 s and an ASAP Endpoint Keep-Alive Ack reception timeout of 1 s.


Figure 9.7: Using the Abort and Restart Principle for the Session Failover

As expected, the impact of using abort and restart on the average system utilization (shown on the left-hand side of figure 9.7) is small, as long as the PEs are sufficiently available (i.e. the MTBF is high enough): for an MTBF of 100 times the time required to exclusively process a request having a request size:PE capacity of 10, the utilization increment is almost invisible. Furthermore, the decrease of the handling speed (presented on the right-hand side of figure 9.7) is also small. Clearly, the rare necessity to restart a session has no significant impact on the system performance.
However, for a sufficiently small MTBF, the results change: at an MTBF of 5, a significant utilization rise (as well as a handling speed decrease) can be observed for the Round Robin and Random policies if the request size:PE capacity ratio is increased. The effect on the Least Used policy is smaller: as explained for the workload parameter evaluations in section 8.7 and as expected from the dynamic pool performance results of section 9.2, this policy is able to provide a better processing speed due to superior request load distribution. That is, the probability for a request of a fixed size to be affected by PE failures is smaller if using Least Used instead of Round Robin and Random. Note furthermore that the utilization of Least Used almost reaches 95% for an MTBF of 2 and larger request size:PE capacity ratios, due to its better load distribution capabilities. In the same situation (i.e. high overload, of course), the Round Robin and Random policies only achieve a utilization of less than 85%.
In summary, while the abort-and-restart mechanism is fairly simple and useful in case of short transactions and sufficiently available servers, the longer the requests being processed, the more inefficient a failover using the abort-and-restart approach becomes. In such cases, it is instead useful to define checkpoints and allow for the session to be resumed from the latest checkpoint.

9.4.2 Client-Based State Sharing
Client-based state sharing, using state cookies as introduced in subsubsection 3.9.5.2, provides a simple mechanism for a server to set checkpoints from which the session state can be recovered: the server sends its state in the form of a state cookie to the client side. Clearly, the interval in which checkpoints are set (and therefore cookies are transmitted) limits the re-processing effort.
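The checkpoint principle of client-based state sharing can be illustrated with a minimal Python sketch. It is a simplified model under stated assumptions, not the ASAP Session Layer implementation: a state cookie here only carries a hypothetical `position` (number of calculations completed), the PU keeps just the latest cookie, and `serve`, `PoolUserSession` and `StateCookie` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StateCookie:
    position: int  # hypothetical content: number of calculations completed


class PoolUserSession:
    """PU side: keeps only the latest state cookie received from its PE."""

    def __init__(self) -> None:
        self.latest_cookie: Optional[StateCookie] = None

    def on_cookie(self, cookie: StateCookie) -> None:
        self.latest_cookie = cookie

    def failover_start_position(self) -> int:
        # No cookie yet: the request has to restart from scratch
        # (abort and restart); otherwise resume from the last checkpoint.
        return self.latest_cookie.position if self.latest_cookie else 0


def serve(session: PoolUserSession, start: int, end: int,
          cookie_interval: int, fail_at: Optional[int] = None) -> int:
    """PE side: process calculations from start to end, sending a cookie
    after every cookie_interval calculations. If fail_at is given, the PE
    fails when reaching that position. Returns the position reached."""
    position = start
    while position < end:
        if position == fail_at:
            return position  # failure: work since the last cookie is lost
        position += 1
        if position % cookie_interval == 0:
            session.on_cookie(StateCookie(position))
    return position


session = PoolUserSession()
reached = serve(session, 0, 100, cookie_interval=10, fail_at=37)
resume = session.failover_start_position()
print(f"failed at {reached}, resuming at {resume}, lost {reached - resume}")
serve(session, resume, 100, cookie_interval=10)  # a new PE finishes the request
```

In the example, the PE fails after 37 calculations with the last cookie sent at 30, so 7 calculations are lost and have to be re-processed by the new PE. Without any cookie, the sketch falls back to abort and restart.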


Figure 9.8: Using State Cookies for the Session Failover


9.4.2.1 General Effects of a Cookie-Based Failover
In order to present the essential effects of applying client-based state sharing for session failover, the results of an example simulation are presented in figure 9.8. As for the previous simulations, it has again been based on the scenario setup described in section 8.6 and has used the following parameter settings:
A simulated real-time of 60 minutes,
An average inter-component network delay of 10 ms,
A target system utilization of 60%,
A PU:PE ratio of 10 and a request size:PE capacity ratio of 10,
An average PE MTBF of 2, 5 and 10 times the request size:PE capacity ratio (negative exponential distribution),
A probability for a clean shutdown of 0%,
MAX-BAD-PE-REPORT has been set to 1,
A session keep-alive interval of 1 s and a session keep-alive timeout of 1 s,
An ASAP Endpoint Keep-Alive transmission interval of 1 s and an ASAP Endpoint Keep-Alive Ack reception timeout of 1 s.
Having a look at the results for the average system utilization (left-hand part of figure 9.8) and the average request handling speed (right-hand part of figure 9.8), the expected effects can be observed: the less frequent the transmission of cookies, the higher the system utilization due to the re-processing of lost work and the lower the request handling speed. Again, the ranking of the policies as explained in subsection 9.3.1 is shown: the Least Used policy is clearly better able to cope with critical parameter ranges (here: a small MTBF and a high cookie interval). Due to its better request distribution


Figure 9.9: The Number of State Cookies for the Session Failover

capabilities, the resulting handling speed is higher. This reduces the probability that, for a fixed MTBF, a request is actually affected by a failure.
The most interesting curves are presented in figure 9.9; they show the number of cookies per request (only for the MTBF set to 2 times the request size:PE capacity ratio; the results for other MTBF settings are almost similar and have been omitted to enhance readability). Comparing the number of cookies per request with the system performance, it is clearly visible that an overly small cookie interval provides no benefit and instead only causes a significant number of overhead packets.
So, the resulting question clearly is: how should the cookie interval be set while maintaining a certain level of service resilience in PE failure scenarios?

9.4.2.2 How Costly is a State Cookie?
The cost of transporting a state cookie clearly depends on the cookie size and therefore on the application storing its state into the cookie. In order to give an estimate of realistic cookie sizes, some applications are briefly analysed:
For the fractal graphics application of the RSPLIB demonstration system (see section 5.7), the state cookie contains the image and calculation parameters, the latest calculation position and some identification information to verify the validity of the cookie. The resulting size is about 100 bytes.
A download application could store a file name (e.g. up to about 250 bytes for a file in a deep directory hierarchy), some user authentication information (e.g. a user name and password), a cookie signature for security (e.g. 16 bytes for a SHA-256-based authentication code) and of course the latest file position. Therefore, cookie sizes of 300 or even more bytes are realistic.
Dreibholz (2002) proposes to use state cookies in an e-shop system. In this case, the cookie contains a possibly long order list, authentication information and various other data related to the customer management. Cookie sizes of up to a few thousand bytes seem to be realistic here.


In addition to the actual payload size of the cookie, the overhead for the ASAP message structure (see figure A.12 for an illustration) has to be added. Depending on the SCTP primary path's Maximum Transmission Unit (MTU), the ASAP Cookie message may have to be segmented (by the SCTP protocol) into several SCTP packets, each including an IPv4 or IPv6 header. The resulting costs for a cookie are therefore a function of the cookie size itself, the path MTU and the SCTP and Network Layer overhead.
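A rough estimate of the per-cookie cost as a function of cookie size, path MTU and header overhead can be computed as follows. The IPv4 (20 bytes, without options), IPv6 (40 bytes, fixed header), SCTP common header (12 bytes) and SCTP DATA chunk header (16 bytes) sizes are standard; the 8 bytes of ASAP message overhead are an assumption, as are the default MTU of 1500 bytes and the omission of chunk padding and bundling. The function name is hypothetical.

```python
import math
from typing import Tuple

IPV4_HEADER = 20             # bytes, without options
IPV6_HEADER = 40             # bytes, fixed header without extension headers
SCTP_COMMON_HEADER = 12      # bytes
SCTP_DATA_CHUNK_HEADER = 16  # bytes
ASAP_OVERHEAD = 8            # ASSUMPTION: ASAP message header + cookie parameter header


def cookie_cost(cookie_size: int, path_mtu: int = 1500,
                ipv6: bool = False) -> Tuple[int, int]:
    """Return (number of SCTP packets, bytes on the wire) for one ASAP Cookie
    message. Ignores chunk padding, bundling and link-layer framing."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    per_packet_overhead = ip_header + SCTP_COMMON_HEADER + SCTP_DATA_CHUNK_HEADER
    max_payload = path_mtu - per_packet_overhead
    if max_payload <= 0:
        raise ValueError("path MTU too small")
    message = ASAP_OVERHEAD + cookie_size
    packets = math.ceil(message / max_payload)  # SCTP fragments the message
    return packets, message + packets * per_packet_overhead


for size in (100, 300, 3000):
    print(size, "bytes:", "IPv4", cookie_cost(size), "IPv6", cookie_cost(size, ipv6=True))
```

For the cookie sizes discussed above, this sketch yields 156 bytes in one packet for a 100-byte cookie, 356 bytes for a 300-byte cookie, and 3152 bytes in three packets for a 3000-byte cookie over IPv4.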

9.4.2.3 Limiting the Amount of Cookie Traffic
As already shown in subsubsection 9.4.2.1, the performance of client-based state sharing depends on the pool policy. That is, sending cookies in a certain time interval results in different performance results for each policy. Using the Least Used policy, a request is (for a given MTBF) less frequently affected by a failure than when using Round Robin or Random, due to the fact that the handling speed of the Least Used policy is better. To recommend a cookie transmission interval, it is therefore useful to specify it in terms of calculations (multiples of the average request size) rather than time. Given the cookie transmission interval parameter CookieMaxCalculations (in units of calculations as multiple of the average request size; see also section 8.3), the average loss of calculations (and therefore the re-processing effort) per failure, denoted as AverageLossPerFailure, can be calculated as follows:
$$\text{AverageLossPerFailure} = \frac{\text{CookieMaxCalculations}}{2} \tag{9.1}$$
That is, on average the server fails in the middle of a cookie interval. For example, using an interval of half of the average request size, calculations of 25% of an average request size are lost upon a failover and have to be re-processed.
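Equation 9.1 rests on the failure instant being, on average, in the middle of a cookie interval. A small Monte Carlo sketch in Python can check this under the explicit assumption that the failure point is uniformly distributed within the interval (a reasonable approximation when the MTBF is much longer than the cookie interval). The function name is hypothetical.

```python
import random


def simulated_average_loss(cookie_max_calculations: float,
                           failures: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the work lost per failure, ASSUMING the
    failure instant is uniformly distributed within a cookie interval."""
    rng = random.Random(seed)
    lost = 0.0
    for _ in range(failures):
        # Calculations done since the last cookie when the PE fails.
        lost += rng.uniform(0.0, cookie_max_calculations)
    return lost / failures


# Cookie interval of half an average request (request size normalized to 1).
print(f"simulated: {simulated_average_loss(0.5):.3f}, equation 9.1: {0.5 / 2:.3f}")
```

With a cookie interval of half an average request, the estimate comes out close to 0.25, i.e. the 25% of an average request size stated above.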
Knowing the average loss per failure as well as the MTBF and average PE capacity, the goodput ratio of all performed calculations (i.e. the percentage of calculations not lost due to failures) can be calculated as follows:
$$\text{GoodputRatio} = \frac{(\text{MTBF} \cdot \text{AverageCapacity}) - \text{AverageLossPerFailure}}{\text{MTBF} \cdot \text{AverageCapacity}} \tag{9.2}$$
Using equation 9.1 and equation 9.2, it is possible to compute the cookie transmission interval CookieMaxCalculations (in units of calculations, given as multiple of the average request size) necessary to reach a desired goodput ratio (e.g. 95%):
$$\text{CookieMaxCalculations} = -2 \cdot \text{MTBF} \cdot \text{AverageCapacity} \cdot (\text{GoodputRatio} - 1) \tag{9.3}$$
A plot showing the cookies per request (i.e. 1/CookieMaxCalculations) is presented in figure 9.10. Clearly, the expectations of the initial simulation in subsubsection 9.4.2.1 are met: even for an MTBF as low as 2 times the request size:PE capacity ratio, about 10 cookies per request are sufficient to reach a goodput ratio of 97.5%. However, trying to improve the goodput ratio only slightly results in a vast increase in the number of cookies.
The resulting general guideline to set the cookie transmission interval is therefore to estimate the PE MTBF and calculate the cookie interval using equation 9.3 for a reasonable goodput ratio in the range of about 97.5%.
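Equations 9.1 to 9.3 translate directly into code. The following Python sketch applies the guideline with hypothetical numbers: a PE capacity of 10^6 calculations/s, an average request size of 10^7 calculations (i.e. a request size:PE capacity ratio of 10 s), an MTBF of 2 times that ratio and a target goodput ratio of 97.5%. Only the ratios matter for the result. The function names are illustrative.

```python
def average_loss_per_failure(cookie_max_calculations: float) -> float:
    """Equation 9.1: on average, half a cookie interval is lost."""
    return cookie_max_calculations / 2.0


def goodput_ratio(mtbf: float, average_capacity: float,
                  cookie_max_calculations: float) -> float:
    """Equation 9.2: share of calculations not lost due to failures."""
    work_between_failures = mtbf * average_capacity
    loss = average_loss_per_failure(cookie_max_calculations)
    return (work_between_failures - loss) / work_between_failures


def required_cookie_max_calculations(mtbf: float, average_capacity: float,
                                     target_goodput_ratio: float) -> float:
    """Equation 9.3: cookie interval (in calculations) for a target goodput."""
    return -2.0 * mtbf * average_capacity * (target_goodput_ratio - 1.0)


# Hypothetical example values (only the ratios matter):
capacity = 1e6                      # calculations/s per PE
request_size = 1e7                  # calculations per average request
mtbf = 2 * request_size / capacity  # 2 times the request size:PE capacity ratio (20 s)

interval = required_cookie_max_calculations(mtbf, capacity, 0.975)
print(f"CookieMaxCalculations: {interval:.0f} calculations")
print(f"cookies per request:   {request_size / interval:.1f}")
print(f"check, goodput ratio:  {goodput_ratio(mtbf, capacity, interval):.3f}")
```

The result, a cookie interval of about 10^6 calculations, corresponds to about 10 cookies per request, in line with the statement above. Raising the target goodput ratio towards 1 drives CookieMaxCalculations towards 0 and the number of cookies per request towards infinity.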

9.4.2.4 Summary
In summary, it has been shown that the client-based state sharing approach provides an efficient mechanism for a failover. If used with an appropriate cookie transmission interval, the overhead remains low while a good system performance is achieved.


Figure 9.10: The Number of Cookies per Request

9.5 The Handling of Registrar Failures

The failure of a PR affects both its PEs and its PUs. In the usual case, both components detect the failure of a PR and simply select another one. Here, the automatic configuration feature of RSerPool becomes important: PUs and PEs can maintain an up-to-date list of PRs by listening to their announces. If the selection of a new PR is required, it is quickly possible to connect to another one and continue normal operation. Detailed evaluations of the PR hunt mechanism can be found in Uyar et al. (2004) and Uyar, Zheng, Fecko and Samtani (2003b,a). In particular, these papers analyse the performance of PR changes in various scenarios. In summary, it is shown that the mechanism is quite effective and that PR failures (which can be assumed to be significantly less frequent than PE failures) do not impose a significant performance loss in the usual case.
However, special cases may occur. In particular, the following scenarios need a more detailed evaluation:

ENRPsauditingandsynchronizationfeaturesincaseofPRshavingverydifferentviewsofthe
handlespace(e.g.duetonetworkproblemslikecongestionforENRPtraffic);

Theeffectsoftakeoversonthesystemperformanceand

Theseparationandreunionofnetworkparts.

At first sight, the number of ENRP items to be evaluated looks small. But due to the dependency on other system parameters, there is a significant number of cases which have to be evaluated in order to completely cover the topic. For example, a takeover negotiation could fail for some reason. In this case, the PEs themselves detect their PR-H failure at their next re-registration (parameter: re-registration interval) or deregistration. If the policy is adaptive, a re-registration is also applied on policy information update (parameter: choice of policy). Nevertheless, the PEs of the failed PR still remain in the handlespace and are found and usable by PUs. PUs are affected by the PR failure only if PEs of the unavailable PR also fail. In this case, the PEs may be removed upon failure reports (parameter: MAX-BAD-PE-REPORT) or finally upon expiry of their Lifetime Expiry Timer (parameter: registration life). Due to space and time limitations for this thesis, it is not possible to cover all cases which would clearly be necessary for a meaningful analysis of the ENRP topic. Therefore, these items will be the subject of future work on the topic of RSerPool, to be processed in the continuation of our RSerPool project.

9.6 Summary

In this chapter, failure scenarios have been evaluated with the focus on PE failures. The performance of handling a server failure is influenced by two factors:

The time required to detect a PE failure (and to find and contact a new PE), as well as

The effort to re-process lost work using an appropriate session failover mechanism.

On the subject of the first item, it has been shown that RSerPool systems include efficient features to detect PE failures. After that, two important session failover mechanisms have been evaluated: the abort and restart principle and client-based state sharing.


Chapter 10

Outlook and Conclusion

Finally, in the very last chapter of this thesis, the achieved goals and results (concerning handlespace management, performance evaluations, the prototype implementation and standardization) are summarized in the first part. After that, an outlook on interesting future work and open issues still to be evaluated is presented.

10.1 Achieved Goals and Obtained Results

10.1.1 The Prototype Implementation and the Simulation Model
As the first goal of this thesis, the Open Source RSerPool prototype implementation RSPLIB (see chapter 5) has been designed and realized. In order to systematically evaluate the performance of RSerPool systems in a reasonable manner, the RSPSIM simulation model has also been created (see chapter 6). Furthermore, a generic application model used for both the prototype implementation and the simulation model has been developed (see section 8.3). Based on this application model, reasonable performance metrics for both the service provider's perspective (system utilization) and the user's perspective (request handling speed) have been defined (see section 8.5) in order to evaluate RSerPool systems and proposed improvements.
But first, it has been found necessary to design an efficient handlespace management component as a foundation of both the RSPLIB prototype and the RSPSIM simulation model.

10.1.2 The Handlespace Management
The developed handlespace management approach reduces the effort of maintaining a handlespace to the management of sorted sets, where PE structures are linked according to certain sorting orders (see section 4.4). Pool policies can be realized easily by defining appropriate sorting orders and selection procedures. By performance evaluations using a runtime-based performance metric, it has been shown in chapter 7 that the proposed handlespace management approach is even scalable to large pools of up to 100,000 PEs, with moderate CPU power requirements. Crucial for the efficiency of the handlespace management is the data structure which is actually used to realize the sets. It has been shown that red-black trees are the most suitable structure, while simple binary trees can degenerate and lead to an even worse performance than taking the naïve approach of using linear lists.
Using the new, sophisticated handlespace management approach, the RSPSIM simulation model has become capable of efficiently handling the scenarios necessary to perform reasonable evaluations of the RSerPool system performance.
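The idea of realizing pool policies purely as sorting orders can be sketched in Python. This is a simplified illustration, not the RSPLIB handlespace: a list kept sorted with the standard `bisect` module stands in for the red-black tree (so insertion is O(n) rather than O(log n)), and the class and key function names are hypothetical. In this sketch, Round Robin sorts by a selection sequence number, Least Used sorts by load with the sequence number as tie-breaker, and selection takes the first entry and re-links it with a new sequence number.

```python
import bisect
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class PEEntry:
    pe_id: int
    load: float = 0.0
    seq: int = 0  # selection sequence number (Round Robin order, tie-breaker)


# A policy is just a sorting order; the first entry of the set is selected.
SortKey = Callable[[PEEntry], Tuple]


def round_robin_key(pe: PEEntry) -> Tuple:
    return (pe.seq,)


def least_used_key(pe: PEEntry) -> Tuple:
    return (pe.load, pe.seq)


class PoolSelectionSet:
    """Sorted set of PEs; a bisect-maintained list replaces the red-black tree."""

    def __init__(self, key: SortKey) -> None:
        self.key = key
        self.pes: Dict[int, PEEntry] = {}
        self.entries: List[Tuple[Tuple, int]] = []  # sorted (sort key, pe_id)
        self.next_seq = 0

    def _link(self, pe: PEEntry) -> None:
        pe.seq = self.next_seq
        self.next_seq += 1
        bisect.insort(self.entries, (self.key(pe), pe.pe_id))

    def _unlink(self, pe: PEEntry) -> None:
        index = bisect.bisect_left(self.entries, (self.key(pe), pe.pe_id))
        del self.entries[index]

    def add(self, pe_id: int, load: float = 0.0) -> None:
        pe = PEEntry(pe_id, load)
        self.pes[pe_id] = pe
        self._link(pe)

    def update_load(self, pe_id: int, load: float) -> None:
        pe = self.pes[pe_id]
        self._unlink(pe)  # unlink with the old sort key before changing it
        pe.load = load
        self._link(pe)

    def select(self) -> int:
        pe = self.pes[self.entries[0][1]]
        # Re-link with a new sequence number: Round Robin moves the PE to the
        # end of the list; for Least Used this only rotates among equal loads.
        self._unlink(pe)
        self._link(pe)
        return pe.pe_id


rr = PoolSelectionSet(round_robin_key)
for pe_id in (1, 2, 3):
    rr.add(pe_id)
print([rr.select() for _ in range(4)])  # [1, 2, 3, 1]
```

The same set class serves both policies; only the key function differs, which mirrors the idea of defining policies through sorting orders and selection procedures.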


10.1.3 The Performance of RSerPool Systems
The first part of the RSerPool system performance evaluations has focused on failure-free scenarios, which are (hopefully) the usual operating condition of a system for most of its runtime. Therefore, performance in this situation becomes crucial for the cost-benefit ratio of the provided service.

10.1.3.1 Workload Parameters
Using a generic application model and well-defined performance metrics, it has first been useful to evaluate the three basic workload parameters: the number ratio of PUs to PEs (PU:PE ratio), the request interval and the request size. Given a target system utilization and PE capacities, any one of the three workload parameters can be calculated if values for the other two are provided (see equation 8.5 in subsection 8.7.1).
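The interdependency of the three workload parameters can be sketched as follows. Equation 8.5 itself is not reproduced in this chapter; the Python sketch below ASSUMES the straightforward relation that the target utilization equals the offered load per PE (PU:PE ratio times request size divided by request interval) divided by the PE capacity, and solves it for each parameter. Function names and example values are hypothetical.

```python
def utilization(pu_pe_ratio: float, request_size: float,
                request_interval: float, pe_capacity: float) -> float:
    """ASSUMED relation: offered load per PE divided by PE capacity.
    Each PU issues request_size calculations every request_interval seconds."""
    return pu_pe_ratio * request_size / (request_interval * pe_capacity)


def request_interval_for(pu_pe_ratio: float, request_size: float,
                         pe_capacity: float, target: float) -> float:
    return pu_pe_ratio * request_size / (pe_capacity * target)


def request_size_for(pu_pe_ratio: float, request_interval: float,
                     pe_capacity: float, target: float) -> float:
    return target * pe_capacity * request_interval / pu_pe_ratio


def pu_pe_ratio_for(request_size: float, request_interval: float,
                    pe_capacity: float, target: float) -> float:
    return target * pe_capacity * request_interval / request_size


# Hypothetical values: PE capacity 10^6 calculations/s, request size 10^7
# calculations (request size:PE capacity ratio 10), PU:PE ratio 10, 60% target.
interval = request_interval_for(10, 1e7, 1e6, 0.60)
print(f"required average request interval: {interval:.1f} s")
```

Each solver is the same relation rearranged, so computing any one parameter from the other two and feeding the result back into `utilization` returns the target value.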
The PU:PE ratio is the most critical workload parameter (see section 8.7): a small value means a high per-PU load. Appropriately scheduling the requests in this case is most difficult, and inappropriate PE selections lead to a reduced system utilization and handling speed. Varying the request size together with the PU:PE ratio, it can be observed that the utilization for longer requests is worse (since it is even more difficult to achieve a good scheduling), but the handling speed increases (since a single but long request is affected by queuing delay only once, while n small requests are affected n times). The performance results for the two other workload parameter combinations (request size/request interval and request interval/PU:PE ratio) can be deduced analogously.
A general observation on the pool policies is that the Least Used policy provides the best performance, in particular if the parameters are in a critical range, due to its knowledge about PE states (adaptive policy). Non-adaptive policies have a lower performance: since Round Robin tries to select PEs in turn, i.e. the recently selected PE will be chosen again after all other PEs have been used, its performance is usually better than simply applying Random selection. However, in critical parameter ranges, the performance of Round Robin tends to converge to the Random results.

10.1.3.2 Fallacies and Pitfalls of the Round Robin Policies
While Round Robin usually achieves a better performance than Random, some fallacies and pitfalls have been discovered for the Round-Robin-based policies (see section 8.8): a PR replying with multiple PE identities must not advance the Round Robin list pointer by more than one entry. Otherwise, PEs may be systematically skipped, leading to a severe performance degradation. For the Weighted Round Robin policy, the impossibility of specifying non-integer weights adds a further problem. Expanding weights leads to an even worse performance than simply using Weighted Random selection. The guideline for this policy is therefore not to use it, except in special, carefully planned cases.
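The Round Robin pitfall can be reproduced with a few lines of Python. The sketch ASSUMES a PR that returns two PE identities per handle resolution from a pool of four PEs, and a PU that always uses the first identity of the reply; both the function and the PE names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def first_choices(pes: Sequence[str], items: int, advance: int,
                  resolutions: int) -> List[str]:
    """Simulate a PR answering handle resolutions with `items` PE identities
    taken from its Round Robin list, moving the list pointer by `advance`
    entries per resolution. Returns the PE actually used for each request,
    ASSUMING the PU always uses the first identity of the reply."""
    pointer = 0
    used = []
    for _ in range(resolutions):
        reply = [pes[(pointer + i) % len(pes)] for i in range(items)]
        used.append(reply[0])
        pointer = (pointer + advance) % len(pes)
    return used


pool = ["PE1", "PE2", "PE3", "PE4"]
print("advance by 1:", Counter(first_choices(pool, items=2, advance=1, resolutions=100)))
print("advance by 2:", Counter(first_choices(pool, items=2, advance=2, resolutions=100)))
```

Advancing the pointer by one entry spreads the requests evenly over all four PEs, whereas advancing it by the number of returned identities (two) means that only PE1 and PE3 are ever used: PE2 and PE4 are systematically skipped.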

10.1.3.3 Coping with Network Delay
Network delay becomes crucial if it is large in comparison to the request processing time: in this case, the delay severely affects the handling speed. In order to cope with this problem, two distance-sensitive policies have been defined by adding a delay measurement functionality to the PR, based on the RTT already obtained by the SCTP protocol (see subsection 8.10.2). While the ability of the distance-sensitive Weighted Random policy to cope with localized disaster scenarios is limited, it has been shown for the distance-sensitive Least Used policy that it is also able to handle such scenarios (see subsection 8.10.2). After that, the applicability and usefulness of this new Least Used policy has been successfully validated in the real Internet by measurements using the RSPLIB prototype in a PLANETLAB scenario (see subsection 8.10.5).

10.1.3.4 Making Use of the PU-Side Cache
Except for the Random policy, using the PU-side cache results in a decreased system performance. While the general guideline is therefore to avoid it altogether (except for the Random policy, of course), the cache becomes very useful if the startup time of requests (i.e. the time between starting a handle resolution and finally getting the request accepted by a server) contributes significantly to the overall handling time. The usefulness of the cache in such a situation has been demonstrated by an example simulation (see subsection 8.11.2).

10.1.3.5 Scenarios of Heterogeneous Server Capacities
Using heterogeneous server capacities (see subsection 8.12.1), the Round Robin and Random policies clearly achieve a low performance, since the server selection for these policies has no knowledge about the different capacities. The Weighted Random policy can cope with such scenarios if the PU:PE ratio is high enough. Furthermore, it is also able to take advantage of very fast PEs to increase the request handling speed. However, the performance of Least Used (especially for the handling speed in critical parameter ranges) is significantly better. Since Least Used selects PEs based on the current server load, there is a slight performance degradation if the scenario becomes very heterogeneous. In this case, a very slow PE may be selected merely because its load state is sufficiently low.
The latter observation has led to the following question: what happens in scenarios where the capacity is increased (see subsection 8.12.2), in particular if the pool contains one or more dedicated powerful servers? For the Least Used policy, it has been shown that the request handling speed is significantly exceeded even by Weighted Random if the scenario contains sufficiently fast servers, due to the selection of slow PEs with a low load value. This deficiency has been overcome with the new Priority Least Used policy, which incorporates a PE load increment constant defining how much a new request increases the load of a PE. Using this additional information, a significant performance gain is achieved.
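The difference between Least Used and Priority Least Used can be illustrated with a minimal Python sketch. Loads are modelled here as fractions of capacity and the load increment as the load added by one more request; these representations, the PE names and the numbers are assumptions, and details of the standardized policies such as tie-breaking are omitted.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PoolElement:
    name: str
    load: float            # current load, as a fraction of the PE capacity
    load_increment: float  # load added by one more request (PE-specific)


def least_used(pes: Sequence[PoolElement]) -> PoolElement:
    return min(pes, key=lambda pe: pe.load)


def priority_least_used(pes: Sequence[PoolElement]) -> PoolElement:
    # Rank PEs by the load they would have after accepting the new request.
    return min(pes, key=lambda pe: pe.load + pe.load_increment)


pool = [PoolElement("slow", load=0.20, load_increment=0.50),
        PoolElement("fast", load=0.30, load_increment=0.05)]
print(least_used(pool).name)           # slow
print(priority_least_used(pool).name)  # fast
```

Least Used picks the slow PE because its current load is lower, although one additional request would raise its load to 0.70; Priority Least Used accounts for the increment and picks the fast PE, whose load would only rise to 0.35.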

10.1.3.6 Pool Element Failure Detection Mechanisms
In the second part of the performance evaluations, PE failure scenarios have been analysed, which have been the main motivation for the creation of RSerPool. In the ideal case, a PE tells its PU that it is disappearing and sets a checkpoint (see section 9.2). Then, it is only necessary to find a new PE to continue the session. This scenario only becomes critical if the PE MTBF is very low and the time to find and connect to a new PE becomes overly long.
However, such an ideal case is unfortunately very rare, and PEs may fail without notification. In this case, it is crucial that a PU is able to quickly detect the death of its PE. This detection mechanism (see subsection 9.3.2) can be based on SCTP heartbeats, keep-alive messages or on an application feature like the timeout for a transaction. In the latter case, this mechanism actually does not impose any additional overhead traffic on the network.
The PR-H-based ASAP Endpoint Keep-Alive monitoring of its PEs is important if the request startup time significantly contributes to the request handling time (see subsection 9.3.3), i.e. if there is a significant chance that a just selected PE is already out of service. In this case, a small Endpoint Keep-Alive interval becomes crucial for the system performance. However, the smaller the interval, the larger the number of overhead messages. Using failure reports by PUs, a PR can remove a PE from the handlespace as soon as a certain threshold of failure reports has been reached. This mechanism is almost as effective as a frequent ASAP Endpoint Keep-Alive monitoring, but saves significant overhead bandwidth. Unfortunately, this mechanism can only be applied if the PUs are considered trustworthy. Otherwise, it would allow a malicious PU to have PEs removed from the handlespace and easily cause at least a severe service degradation.
The ranking of the policies, which has already been observed and explained for the failure-free case, also remains in failure scenarios: the Least Used policy provides the best performance, due to its knowledge of the PEs' load states. This leads to higher request handling speeds and reduces the probability that, for a given average PE MTBF, a request is affected by a failure. While the performance of Round Robin is usually still better than for the Random policy, the handling speed of Round Robin converges in the direction of the Random speed in extreme cases (i.e. if there is no stable list of PE identities to select servers from in turn).

10.1.3.7 Session Failover Mechanisms
In case of a PE failure, all work performed by the PE after having set the last checkpoint is lost. The session failover mechanism used by the application has to ensure that this lost work is re-processed by a newly chosen PE in order to resume the session. The simplest session failover mechanism is the abort and restart principle (see subsection 9.4.1): no checkpoints are set, all work is lost and the session has to start from scratch. Clearly, this approach becomes inefficient for long requests in combination with a small PE MTBF.
Using client-based state sharing, which is provided by the ASAP Session Layer (see subsection 9.4.2), state cookies are used to set checkpoints. Using a reasonable interval for the transmission of state cookies (i.e. the definition of checkpoints), it has been shown in subsection 9.4.2 that an acceptable failover performance can be achieved while keeping the number of overhead packets low.

10.1.4 Standardization and Deployment of RSerPool
Finally, the last goal of this thesis has been to bring results of the RSerPool implementation experiences and research into the standardization process of the IETF. The first step to reach this goal has been fulfilled by the development of the RSPLIB prototype implementation. This prototype is the world's first fully-featured and Open Source (under GPL license) implementation of the RSerPool framework. It has become the reference implementation of the IETF RSerPool Working Group. Furthermore, it has been possible to make contributions to the IETF standardization documents in the form of three Working Group drafts (which will become RFCs, i.e. official standards documents, in the near future) and multiple Individual Submission drafts:

Tüxen and Dreibholz (2006b) (Working Group draft) provides a specification of the RSerPool policies, in particular also including implementation guidelines for the Round Robin policies (to avoid the fallacies and pitfalls found as part of this thesis) and the Priority Least Used policy (a result of evaluating the heterogeneous server capacity scenarios).

Silverton et al. (2005) (Working Group draft) describes the RSerPool API. This document is a direct result of the experiences made by implementing the RSPLIB prototype, as well as of discussions and interoperability tests at IETF meetings and the RSerPool Bakeoff event.


Dreibholz, Mulik, Conrad and Pinzhoffer (2006) (Working Group draft) describes the SNMP Management Information Base (MIB) for RSerPool. It is also a result of experiences made with the RSPLIB prototype.
Dreibholz, Coene and Conrad (2006) (Individual Submission draft) describes the usage of RSerPool for IP Flow Information Export (IPFIX) (see subsection 3.6.3). IPFIX is still under standardization, and this application scenario might gain more interest as soon as some IPFIX implementations are available and deployed.
Dreibholz (2006a) (Individual Submission draft) proposes to use RSerPool in Distributed Computing scenarios (see subsection 3.6.5). This application scenario is an idea originating from the development of the RSPLIB demonstration system (see section 5.7) in conjunction with client-based state sharing (see subsubsection 3.9.5.2).
Dreibholz and Pulinthanath (2006) (Individual Submission draft) proposes to use RSerPool for providing SCTP-based mobility in combination with the Dynamic Address Reconfiguration (Add-IP) extension (see subsection 3.6.6) of SCTP. It is partially also a result of our SCTP research (see also Dreibholz, Jungmaier and Tüxen (2003)).
Dreibholz and Tüxen (2006) (Individual Submission draft) finally describes guidelines for the RSerPool implementation interoperability testing at RSerPool Bakeoff meetings. It is a direct result of the RSPLIB implementation experience and various discussions on the IETF's RSerPool mailing list and at IETF meetings.
As part of our standardization activities, the RSPLIB prototype implementation has been successfully tested for interoperability at the 60th IETF meeting as well as at the first official RSerPool Bakeoff.
Finally, to support the deployment of RSerPool, the RSPLIB prototype and its demonstration system have been presented at some leading conferences and events, including Dreibholz and Tüxen (2003) (Linux Conference Australia), Dreibholz (2004a) (IEEE International Conference on Network Protocols), Dreibholz and Rathgeb (2005a) (IEEE INFOCOM), Dreibholz and Rathgeb (2005b) (IEEE International Conference on Telecommunications), Dreibholz (2005b) (LinuxTag), Dreibholz and Rathgeb (2005c) (IEEE Local Computer Networks Conference) and Dreibholz and Rathgeb (2005e) (IEEE TENCON).

10.2 Outlook and Future Work
While this thesis has evaluated many important aspects of RSerPool, a number of further questions remain to be analysed in detail as part of future work.
For the handlespace management, there may be application cases in which the number of pools becomes large. As shown in section 7.6, the lookup time for a pool currently depends on the time needed to look up its PH in a red-black tree. For a larger number of pools, an additional hash function to speed up the pool lookup might be considered. Furthermore, the implementation currently limits the PH size to 32 bytes; for larger PHs, a hash table may also be useful for a smaller number of pools.
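The idea of adding a hash table next to the red-black tree can be illustrated with the following minimal sketch. It assumes Python's built-in dict as the hash table and a sorted list maintained with bisect as a stand-in for the red-black tree. PoolIndex and its methods are hypothetical names and are not taken from the RSPLIB code. With this structure, an exact-match lookup of a PH costs one hash computation and, on average, one key comparison, instead of O(log n) comparisons of possibly long PHs. The ordered view remains available for operations that need to traverse the pools in order.

```python
import bisect
from typing import Dict, List, Optional


class PoolIndex:
    """Hypothetical pool index combining an ordered view of the pool
    handles (stand-in for the red-black tree) with a hash table."""

    def __init__(self) -> None:
        self._ordered: List[bytes] = []        # sorted PHs for in-order traversal
        self._table: Dict[bytes, dict] = {}    # hash table: PH -> pool record

    def add_pool(self, pool_handle: bytes) -> dict:
        """Return the pool for pool_handle, creating it if necessary."""
        pool = self._table.get(pool_handle)
        if pool is None:
            pool = {"handle": pool_handle, "elements": {}}
            self._table[pool_handle] = pool
            bisect.insort(self._ordered, pool_handle)  # keep ordered view in sync
        return pool

    def lookup(self, pool_handle: bytes) -> Optional[dict]:
        """Exact-match lookup via the hash table (O(1) on average)."""
        return self._table.get(pool_handle)

    def remove_pool(self, pool_handle: bytes) -> None:
        """Remove a pool from both structures; unknown handles are ignored."""
        if self._table.pop(pool_handle, None) is not None:
            del self._ordered[bisect.bisect_left(self._ordered, pool_handle)]

    def handles_in_order(self) -> List[bytes]:
        """Ordered traversal of all pool handles."""
        return list(self._ordered)
```

Because pool handles here are arbitrary byte strings, the sketch needs no 32-byte limit. Inserting into a list with insort is O(n), so the list only stands in for the balanced tree that would keep the ordered view efficient. Python also randomizes the hash of bytes objects per process. This is one way to counter the hash-flooding attacks discussed by Crosby and Wallach (2003).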
The main subject of future work is clearly the ENRP protocol and its handlespace synchronization features. Open questions on this topic are the following:
Is it possible to reduce the ENRP Handle Update message overhead, e.g. by aggregating updates or by performing selective updates only?


What happens if the ENRP bandwidth is scarce (due to congestion) and the handlespace views of different PRs become heavily unsynchronized?

How efficiently are the separation and reunion of RSerPool networks handled, e.g. if WAN links between LAN clouds become unavailable?

Which timeout parameter settings are useful in real-world network scenarios?

Can the number of ENRP associations be reduced? For example, instead of maintaining a separate association between every pair of PRs of an operation scope, a PR-H could connect only to some designated PRs, which would then forward the received information.
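The following rough count shows the potential saving. It assumes a topology that ENRP does not define. In a full mesh, every pair of the n PRs of an operation scope maintains one association. Suppose instead that k designated PRs stay fully meshed and each of the remaining n − k PRs connects only to m ≤ k of them. The number of associations then becomes the second expression:

```latex
A_{\text{mesh}}(n) = \binom{n}{2} = \frac{n(n-1)}{2}

A_{\text{desig}}(n,k,m) = \frac{k(k-1)}{2} + (n-k)\,m, \qquad 1 \le m \le k

n = 20,\; k = m = 3:\quad A_{\text{mesh}} = \frac{20 \cdot 19}{2} = 190, \qquad
A_{\text{desig}} = \frac{3 \cdot 2}{2} + 17 \cdot 3 = 54
```

Under these assumptions, the reduction would cost an extra forwarding hop for handlespace updates between non-designated PRs. The designated PRs would also become more critical for synchronization.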

In particular, it is also necessary to validate the simulation results for these questions in real-life networks, i.e. by measurements using the RSPLIB prototype in PLANETLAB scenarios.
A further subject of future work is to analyse additional state sharing approaches for the Application Layer: different approaches have to be evaluated for certain application scenarios in order to find improvements and possibly to define extensions of the RSerPool protocols that support new mechanisms.

Finally, an important but still open topic of RSerPool is its security. While the current RSerPool standards documents simply delegate security to existing lower-layer technologies like TLS or IPsec (see section 3.13), this topic should receive more attention at the RSerPool level itself. In particular, there is a need to clearly identify and evaluate weak points of the protocols and to define mechanisms that reduce the effectiveness of attacks. For example, the PE failure report feature gives PUs the power to impeach PEs. Simply limiting the acceptance rate of such failure reports could already make an attack significantly more difficult.
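The following minimal sketch shows how such a rate limit might work. It assumes that a PR keeps one token bucket per reporting PU and counts only accepted reports towards the removal of a PE. FailureReportLimiter, PoolElementImpeachment, rate, burst and threshold are hypothetical names and values, not part of the RSerPool specifications or of RSPLIB. Reports that exceed a PU's rate are dropped. As a result, a single PU can contribute at most burst reports at once, plus rate reports per second afterwards.

```python
import time
from collections import defaultdict
from typing import Callable, Dict


class FailureReportLimiter:
    """Token bucket per reporting PU: at most `burst` reports at once,
    refilled at `rate` reports per second (hypothetical parameters)."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = float(burst)
        self.clock = clock
        self._tokens: Dict[str, float] = {}
        self._last: Dict[str, float] = {}

    def accept(self, pu_id: str) -> bool:
        now = self.clock()
        # New PUs start with a full bucket; known PUs are refilled by elapsed time.
        elapsed = now - self._last.get(pu_id, now)
        tokens = min(self.burst, self._tokens.get(pu_id, self.burst) + elapsed * self.rate)
        self._last[pu_id] = now
        accepted = tokens >= 1.0
        self._tokens[pu_id] = tokens - 1.0 if accepted else tokens
        return accepted


class PoolElementImpeachment:
    """Counts only accepted failure reports; a PE is flagged for removal
    once `threshold` accepted reports have been received for it."""

    def __init__(self, limiter: FailureReportLimiter, threshold: int) -> None:
        self.limiter = limiter
        self.threshold = threshold
        self.reports: Dict[str, int] = defaultdict(int)

    def on_failure_report(self, pu_id: str, pe_id: str) -> bool:
        """Return True if the PE should now be removed from the pool."""
        if not self.limiter.accept(pu_id):
            return False  # report dropped: this PU exceeded its rate
        self.reports[pe_id] += 1
        return self.reports[pe_id] >= self.threshold
```

A per-PU limit does not stop an attacker who controls many PU identities. An additional limit per PE, or a global limit, may therefore be needed as well.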

Appendix A

RSerPool Message Types and Parameters

Figure A.1: The ASAP Endpoint Keep-Alive Message

Figure A.2: The ASAP Endpoint Keep-Alive Ack Message

Figure A.3: The Pool Element Identifier Parameter

Figure A.4: The ASAP Deregistration Message

Figure A.5: The ASAP Deregistration Response Message

Figure A.6: The ASAP Error Message

Figure A.7: The Error Parameter

Figure A.8: The Error Cause Parameter

Figure A.9: The ASAP Handle Resolution Message

Figure A.10: The ASAP Endpoint Unreachable Message

Figure A.11: The Checksum Parameter

Figure A.12: The ASAP Cookie Message

Figure A.13: The ASAP Cookie Echo Message

Figure A.14: The Cookie Parameter

Figure A.15: The ENRP Error Message

Figure A.16: The ENRP Init Takeover Ack Message

Figure A.17: The ENRP Takeover Server Message

List of Algorithms

1 The 16-Bit Internet Checksum Algorithm ... 56
2 The Two-Part 16-Bit Internet Checksum Algorithm ... 77
3 An Example for a Pool User using the Basic Mode API ... 90
4 An Example for a Pool Element using the Basic Mode API ... 91
5 An Example for a Pool User using the Enhanced Mode API ... 91
6 An Example for a Pool Element using the Enhanced Mode API ... 93

List of Figures

1.1 Reliable Server Pooling and Related Concepts ... 2
1.2 The RSerPool Project Concept ... 4
2.1 The OSI and TCP/IP Networking Models ... 7
2.2 The SCTP Packet Format ... 13
2.3 The SCTP Association Setup ... 14
2.4 The Multi-Homing Feature of SCTP ... 14
2.5 The Multi-Streaming Feature of SCTP ... 16
3.1 The RSerPool Concept ... 21
3.2 The Handlespace ... 22
3.3 The RSerPool Protocol Stack ... 24
3.4 Telephone Signalling with Separated GK and MGC ... 25
3.5 Telephone Signalling with Combined GK and MGC ... 26
3.6 RSerPool-based SIP Proxies ... 26
3.7 IPFIX with RSerPool ... 27
3.8 Load Balancing with RSerPool ... 28
3.9 Distributed Computing with RSerPool ... 28
3.10 SCTP Mobility with RSerPool ... 30
3.11 The Building Blocks of a Registrar ... 31
3.12 Automatic Configuration of ASAP ... 32
3.13 The Registration and Monitoring of a Pool Element ... 34
3.14 The Server Selection by Registrar and Pool User ... 36
3.15 The Message Structure ... 37
3.16 The Parameter Structure ... 37
3.17 The ASAP Registration Message ... 39
3.18 The Pool Handle Parameter ... 40
3.19 The Pool Element Parameter ... 40
3.20 The Policy Parameter ... 40
3.21 The ASAP Registration Response Message ... 41
3.22 The Pool Element Monitoring by its Home Registrar ... 42
3.23 The ASAP Handle Resolution Response Message ... 43
3.24 The ASAP Server Announce Message ... 44
3.25 The Concept of Client-Based State Sharing ... 47
3.26 A Session Failover using Client-Based State Sharing ... 47
3.27 The ASAP Business Card Message ... 48
3.28 The ENRP Peer Presence Message ... 50
3.29 The Server Information Parameter ... 50
3.30 The Peer Tables of Registrars ... 51
3.31 The ENRP Peer List Request Message ... 53
3.32 The ENRP Peer List Response Message ... 53
3.33 The ENRP Handle Table Request Message ... 54
3.34 The ENRP Handle Table Response Message ... 54
3.35 The ENRP Handle Update Message ... 55
3.36 An Example for Handlespace Audit and Resynchronization ... 57
3.37 The ENRP Init Takeover Message ... 58
3.38 A Takeover Example ... 59
4.1 The Handlespace Data Structure Design ... 71
4.2 The Timer Schedule Structure ... 76
4.3 The Ownership Set Structure ... 78
4.4 An Overview of Storage Structures ... 79
4.5 A Linkage Implementation using Separate Node Structures ... 81
4.6 A Linkage Implementation using Integrated Node Structures ... 81
5.1 The Dispatcher Component ... 87
5.2 The RSPLIB Registrar ... 88
5.3 The RSPLIB Library ... 89
5.4 The RSPLIB Demonstration System ... 94
5.5 The First Interoperability Tests at the 60th IETF Meeting ... 96
6.1 The Simulation Scenario Network ... 104
6.2 The LAN Module ... 105
6.3 The Registrar Module ... 106
6.4 The Registrar Process Finite State Machine ... 107
6.5 The Pool Element Module ... 108
6.6 The Pool Element ASAP Finite State Machine ... 108
6.7 The Pool User Module ... 110
6.8 The Pool User ASAP State Machine ... 110
7.1 Finding the Successor of a Node in a Binary Tree ... 115
7.2 Using Selection by Weight Sum in a Binary Tree ... 116
7.3 The Complete Handlespace Structure ... 117
7.4 The Throughput of the Registration/Deregistration Operations ... 118
7.5 The Throughput of the Re-Registration Operation ... 120
7.6 The Throughput of the Timer Handling Operation ... 121
7.7 The Throughput of the Handle Resolution Operation ... 122
7.8 The Handle Resolution Throughput for a Variation of MaxIncrement ... 124
7.9 The Throughput of the Synchronization Operation ... 126
7.10 The Scalability of the Registration/Deregistration Operation ... 127
7.11 The Scalability of the Re-Registration Operation ... 128
7.12 The Scalability of the Timer Handling Operation ... 129
7.13 The Scalability of the Handle Resolution Operation ... 130
7.14 The Handle Resolution Scalability of a Variation of MaxIncrement ... 130
7.15 The Scalability of the Synchronization Operation ... 132
7.16 The Scalability of the Number of Pools ... 133
7.17 Using Leaf-Linked Trees for the Synchronization Operation ... 134
8.1 The CalcApp Protocol Message Sequence for Normal Operation ... 139
8.2 The Request Generation and Handling at the Client Side ... 139
8.3 The Multi-Tasking Behaviour of the Server Side ... 140
8.4 The CalcApp Protocol Message Sequence for a Failover ... 141
8.5 The Definition of the Request Handling Time ... 143
8.6 The Performance Simulation Scenario ... 144
8.7 The Coherence of the Three Workload Parameters ... 146
8.8 The Variation of PU:PE Ratio and Request Size ... 147
8.9 A Request Scheduling Example ... 148
8.10 An Analogon for Request Scheduling ... 148
8.11 The Variation of Request Size and Request Interval ... 149
8.12 The Variation of Request Interval and PU:PE Ratio ... 150
8.13 The Performance for a Fixed Step Size ... 152
8.14 The Performance for a Randomized Step Size ... 153
8.15 The Effects of Varying the Number of Registrars for the Round Robin Policy ... 154
8.16 The General Effects of Network Delay ... 156
8.17 A Scenario of Globally Distributed RSerPool Components ... 157
8.18 An Example for the Distance-Aware Policy Environment ... 159
8.19 The Distance-Aware Policies Proof of Concept Simulation Setup ... 160
8.20 A Proof of Concept for the Least Used with DPF Policy ... 160
8.21 A Proof of Concept for the Weighted Random with DPF Policy ... 161
8.22 Finding a Reasonable Load DPF Setting for the Least Used with DPF Policy ... 162
8.23 Finding a Reasonable Weight DPF Setting for the Weighted Random with DPF Policy ... 163
8.24 Experimental Results of the Least Used with DPF Policy ... 167
8.25 Experimental Results of the Least Used Policy ... 167
8.26 The General Effects of using the PU-Side Cache ... 169
8.27 An Example for an Effective Cache Usage ... 170
8.28 A Single Powerful Server: Using the Least Used Policy ... 172
8.29 A Single Powerful Server: Using the Weighted Random Policy ... 172
8.30 A Single Powerful Server: Using Inappropriate Policies ... 172
8.31 One Third Powerful Servers ... 174
8.32 Using a Linear Capacity Distribution ... 176
8.33 Using a Uniform Capacity Distribution ... 177
8.34 Using a Truncated Normal Capacity Distribution ... 178
8.35 Server Capacities of the Single Powerful Server Scenario ... 180
8.36 Increasing the Capacity of a Single Server ... 181
8.37 Increasing the Capacity of One Third of the Servers ... 182
8.38 Increasing the Server Capacities Linearly ... 183
9.1 The Detection and Handling of a Pool Element Failure ... 185
9.2 The Failover Performance of Dynamic Pools ... 186
9.3 The Impact of Clean Shutdowns ... 188
9.4 The Impact of the Session Keep-Alive Interval for Session Monitoring ... 190
9.5 The Impact of the Endpoint Keep-Alive Interval for Pool Element Monitoring ... 192
9.6 Utilizing Failure Reports from Pool Users for Pool Element Monitoring ... 194
9.7 Using the Abort and Restart Principle for the Session Failover ... 196
9.8 Using State Cookies for the Session Failover ... 197
9.9 The Number of State Cookies for the Session Failover ... 198
9.10 The Number of Cookies per Request ... 200
A.1 The ASAP Endpoint Keep-Alive Message ... 209
A.2 The ASAP Endpoint Keep-Alive Ack Message ... 209
A.3 The Pool Element Identifier Parameter ... 209
A.4 The ASAP Deregistration Message ... 210
A.5 The ASAP Deregistration Response Message ... 210
A.6 The ASAP Error Message ... 211
A.7 The Error Parameter ... 211
A.8 The Error Cause Parameter ... 211
A.9 The ASAP Handle Resolution Message ... 212
A.10 The ASAP Endpoint Unreachable Message ... 212
A.11 The Checksum Parameter ... 212
A.12 The ASAP Cookie Message ... 213
A.13 The ASAP Cookie Echo Message ... 213
A.14 The Cookie Parameter ... 213
A.15 The ENRP Error Message ... 214
A.16 The ENRP Init Takeover Ack Message ... 214
A.17 The ENRP Takeover Server Message ... 214

List of Tables

3.1 The Network and Component Failure Detection Mechanisms of SCTP ... 64
3.2 The Component Failure Detection Mechanisms of RSerPool ... 64
4.1 The Handlespace Datatype Structure ... 68
4.2 Sequence Numbers for Pools and Pool Elements ... 71
4.3 Pool Element Weights and Weight Sum ... 72
4.4 A Round Robin Policy Example ... 73
4.5 A Weighted Round Robin Policy Example ... 74
4.6 A Least Used Policy Example ... 74
4.7 A Weighted Random Policy Example ... 75
4.8 The Timers of the Handlespace Management ... 76
4.9 Storage Structures and their Computational Complexity ... 80
8.1 The Calculation Application Pool Element and Pool User Parameters ... 142
8.2 The Average Request Handling Time Results of a First Trial ... 165
8.3 The Average Request Handling Time Results of 22 Measurements ... 166

Bibliography

Adelson-Velskii, G. M. and Landis, E. M.: 1962, An algorithm for the organization of information, Soviet Mathematics Doklady, Vol. 3, pp. 1259–1263. 4.4.7
Alvisi, L., Bressoud, T. C., El-Khashab, A., Marzullo, K. and Zagorodnov, D.: 2001, Wrapping Server-Side TCP to Mask Connection Failures, Proceedings of the IEEE Infocom 2001, Vol. 1, Anchorage, Alaska/U.S.A., pp. 329–337. ISBN 0-7803-7016-3.
URL: http://citeseer.ist.psu.edu/alvisi00wrapping.html 1.2.1, 1.2.4
Aragon, C. and Seidel, R.: 1989, Randomized search trees, Proceedings of the 30th IEEE Symposium on Foundations of Computer Science, pp. 540–545. 4.4.7
AROS Development Team: 2005a, AROS: Amiga Research Operating System.
URL: http://www.aros.org 4.4.8
AROS Development Team: 2005b, AROS Application Development Manual.
URL: http://www.aros.org/documentation/developers/application-development.php 4.4.8
Berger, E.: 2002, Memory Management for High-Performance Applications, PhD thesis, The University of Texas at Austin.
URL: http://citeseer.ist.psu.edu/berger02memory.html 5
Berger, E. and Browne, J. C.: 1999, Scalable Load Distribution and Load Balancing for Dynamic Parallel Programs, Proceedings of the International Workshop on Cluster-Based Computing 99, Rhodes/Greece.
URL: http://citeseer.ist.psu.edu/berger99scalable.html 1.2.2, 3.11.1
Berger, E., Zorn, B. and McKinley, K.: 2001, Composing High-Performance Memory Allocators, SIGPLAN Conference on Programming Language Design and Implementation, Snowbird, Utah/U.S.A., pp. 114–124. ISBN 1-58113-414-2.
URL: http://citeseer.ist.psu.edu/berger01composing.html 5
Berger, E., Zorn, B. and McKinley, K.: 2002, Reconsidering Custom Memory Allocation, Proceedings of the Conference on Object-Oriented Programming: Systems, Languages, and Applications (OOPSLA) 2002, Seattle, Washington/U.S.A., pp. 1–12.
URL: http://citeseer.ist.psu.edu/698680.html 5
Bhattacharjee, S., Ammar, M. H., Zegura, E. W., Shah, V. and Fei, Z.: 1997, Application-Layer Anycasting, Proceedings of the IEEE Infocom 97, Kobe/Japan, pp. 1388–1396. ISBN 0-8186-7780-5.
URL: http://citeseer.ist.psu.edu/bhattacharjee97applicationlayer.html 1.2.1


Bivens, A.: 2006, Server/Application State Protocol v1, Technical Report Version 03, IETF, Individual Submission. draft-bivens-sasp-03.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-bivens-sasp-03.txt 3.6.4
Blake-Wilson, S., Nystrom, M., Hopwood, D., Mikkelsen, J. and Wright, T.: 2003, Transport Layer Security (TLS) Extensions, Standards Track RFC 3546, IETF.
URL: http://www.ietf.org/rfc/rfc3546.txt 4, 9, 4
Bozinovski, M.: 2004, Fault-tolerant platforms for IP-based Session Control Systems, PhD thesis, Aalborg University, Aalborg/Denmark.
URL: http://kom.aau.dk/~marjanb/thesis/PhDthesis.pdf 3.6.2
Bozinovski, M., Gavrilovska, L. and Prasad, R.: 2003, A State-sharing Mechanism for Providing Reliable SIP Sessions, Proceedings of the 6th International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Services, Vol. 1, Nis/Serbia and Montenegro, pp. 384–387.
URL: http://kom.aau.dk/~marjanb/papers/telsiks03.pdf 3.6.2
Bozinovski, M., Gavrilovska, L., Prasad, R. and Schwefel, H.-P.: 2004, Evaluation of a Fault-tolerant Call Control System, Facta Universitatis Series: Electronics and Energetics 17(1), 33–44.
URL: http://kom.aau.dk/~marjanb/papers/facta04.pdf 3.6.2
Bozinovski, M., Renier, T., Schwefel, H.-P. and Prasad, R.: 2003, Transaction Consistency in Replicated SIP Call Control Systems, Proceedings of the Communications & Signal Processing and Fourth Pacific-Rim Conference on Multimedia (ICICS-PCM 2003), Vol. 1, pp. 314–318.
URL: http://kom.aau.dk/~marjanb/papers/icics03.pdf 3.6.2
Bozinovski, M., Schwefel, H.-P. and Prasad, R.: 2004, Maximum Availability Server Selection Policy for Session Control Systems based on 3GPP SIP, Proceedings of Seventh International Symposium on Wireless Personal Multimedia Communications, Padova/Italy.
URL: http://kom.aau.dk/~marjanb/papers/wpmc04.pdf 3.6.2
Braden, R., Borman, D. and Partridge, C.: 1988, Computing the Internet Checksum, Standards Track RFC 1071, IETF.
URL: http://www.ietf.org/rfc/rfc1071.txt 2.4.1, 2.4.2, 3.10.5, 4.4.4
Cain, B., Deering, S., Kouvelas, I., Fenner, B. and Thyagarajan, A.: 2002, Internet Group Management Protocol, Standards Track RFC 3376, IETF.
URL: http://www.ietf.org/rfc/rfc3376.txt 2.3.1, 2.3.2
Cardellini, V., Colajanni, M. and Yu, P. S.: 2000, Geographic load balancing for scalable distributed Web systems, Proceedings of the 8th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, San Francisco, California/U.S.A., pp. 20–27.
URL: http://citeseer.ist.psu.edu/cardellini00geographic.html 1.2.2
Chun, B., Culler, D., Roscoe, T., Bavier, A., Peterson, L., Wawrzoniak, M. and Bowman, M.: 2003, PlanetLab: An Overlay Testbed for Broad-Coverage Services, ACM SIGCOMM Computer Communication Review 33(3).
URL: http://www.planet-lab.org/PDN/PDN-03-009/pdn-03-009.pdf 8.10.5


Cisco Systems: 2000, Cisco™ Distributed Director.
URL: http://www.cisco.com/cpropart/salestools/cc/pd/cxsr/dd/prodlit/ddpa.pdf 1.2.1, 3.1
Cisco Systems: 2004, Cisco™ IOS Reference Guide.
URL: http://www.cisco.com/warp/public/620/1.pdf 4.4.8
Coene, L.: 2002, Stream Control Transmission Protocol Applicability Statement, Informational RFC 3257, IETF.
URL: http://www.ietf.org/rfc/rfc3257.txt 2.4.3.1
Coene, L., Conrad, P. and Lei, P.: 2004, Reliable Server pool applicability Statement, Internet-Draft Version 02, IETF, RSerPool Working Group. draft-ietf-rserpool-applic-02.txt.
URL: http://www.watersprings.org/pub/id/draft-ietf-rserpool-applic-02.txt 3.6.4
Colajanni, M. and Yu, P. S.: 2002, A Performance Study of Robust Load Sharing Strategies for Distributed Heterogeneous Web Server Systems, IEEE Transactions on Knowledge and Data Engineering 14(2), 398–414.
URL: http://citeseer.ist.psu.edu/596241.html 1.2.2
Conrad, P., Jungmaier, A., Ross, C., Sim, W.-C. and Tüxen, M.: 2002, Reliable IP Telephony Applications with SIP using RSerPool, Proceedings of the State Coverage Initiatives 2002, Mobile/Wireless Computing and Communication Systems II, Vol. X, Orlando, Florida/U.S.A. ISBN 980-07-8150-1.
URL: http://www.exp-math.uni-essen.de/~ajung/papers/SCI2002ReliableIPTelephonywithSIPandRSerPool16072002.pdf 3.6.2, 6.2
Conrad, P. and Lei, P.: 2005a, Services Provided By Reliable Server Pooling, Internet-Draft Version 02, IETF, RSerPool Working Group. draft-ietf-rserpool-service-02.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-ietf-rserpool-service-02.txt 1, 3.9.5.1, 3.9.5.2
Conrad, P. and Lei, P.: 2005b, TCP Mapping for Reliable Server Pooling Enhanced Mode, Internet-Draft Version 03, IETF, RSerPool Working Group. draft-ietf-rserpool-tcpmapping-03.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-ietf-rserpool-tcpmapping-03.txt 3.5
Cormen, T., Leiserson, C. and Rivest, R.: 1998, Introduction to Algorithms, MIT Press, Cambridge, Massachusetts/U.S.A. ISBN 0-262-03141-8. 4.4.7, 6.4.2.2
Crosby, S. A. and Wallach, D. S.: 2003, Denial of service via Algorithmic Complexity Attacks, Proceedings of the 12th USENIX Security Symposium, Washington, DC/U.S.A., pp. 29–44.
URL: http://www.cs.rice.edu/~scrosby/hash/CrosbyWallachUsenixSec2003.pdf 7.6.2
Davies, A.: 2004, Computational Intermediation and the Evolution of Computation as a Commodity, Journal of Applied Economics 36(11), 1131–1142.
URL: http://www.business.duq.edu/faculty/davies/research/EconomicsOfComputation.pdf 3.6.5.1
Deering, S. and Hinden, R.: 1998a, Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification, Standards Track RFC 2463, IETF.
URL: http://www.ietf.org/rfc/rfc2463.txt 2.3.2


Deering, S. and Hinden, R.: 1998b, Internet Protocol, Version 6 (IPv6), Standards Track RFC 2460, IETF.
URL: http://www.ietf.org/rfc/rfc2460.txt 2.3.2, 3
Dierks, T. and Allen, C.: 1999, The TLS Protocol - Version 1.0, Standards Track RFC 2246, IETF.
URL: http://www.ietf.org/rfc/rfc2246.txt 4, 9, 4
Dreibholz, T.: 2001, Management of Layered Variable Bitrate Multimedia Streams over DiffServ with Apriori Knowledge, Masters Thesis, University of Bonn, Institute for Computer Science.
URL: http://www.exp-math.uni-essen.de/~dreibh/diplom/indexe.html 2.3.2
Dreibholz, T.: 2002, An Efficient Approach for State Sharing in Server Pools, Proceedings of the 27th IEEE Local Computer Networks Conference, Tampa, Florida/U.S.A., pp. 348–352. ISBN 0-7695-1591-6.
URL: http://citeseer.ist.psu.edu/dreibholz02efficient.html 1.2.1, 1.2.3, 1, 3.6.7, 3.9.5.2, 9.1, 9.4.2.2
Dreibholz, T.: 2004a, An Overview of the Reliable Server Pooling Architecture, Proceedings of the 12th IEEE International Conference on Network Protocols 2004, Berlin/Germany. Poster presentation.
URL: http://citeseer.ist.psu.edu/dreibholz04overview.html 5.7, 10.1.4
Dreibholz, T.: 2004b, draft-ietf-rserpool-policies-00.txt - Definition of Member Selection Policies, Proceedings of the 61st IETF Meeting, Washington, DC/U.S.A.
URL: http://tdrwww.iem.uni-due.de/dreibholz/rserpool/rserpool-publications/IETF61.pdf 3.11
Dreibholz, T.: 2004c, Member Selection Policies for the Reliable Server Pooling Protocol Suite, Proceedings of the 60th IETF Meeting, San Diego, California/U.S.A.
URL: http://tdrwww.iem.uni-due.de/dreibholz/rserpool/rserpool-publications/IETF60.pdf 3.11, 8.12.2.4
Dreibholz, T.: 2004d, Policy Management in the Reliable Server Pooling Architecture, Proceedings of the Multi-Service Networks Conference 2004, Abingdon, Oxfordshire/United Kingdom.
URL: http://tdrwww.iem.uni-due.de/dreibholz/rserpool/rserpool-publications/MSN2004-Final-with-Examples.pdf 3.6.5.2, 3.10.5, 4.4
Dreibholz, T.: 2005a, An IPv4 Flowlabel Option, Internet-Draft Version 04, IETF, Individual Submission. draft-dreibholz-ipv4-flowlabel-04.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-dreibholz-ipv4-flowlabel-04.txt 2.3.2
Dreibholz, T.: 2005b, Das rsplib–Projekt – Hochverfügbarkeit mit Reliable Server Pooling [The rsplib Project: High Availability with Reliable Server Pooling], Proceedings of the LinuxTag, Karlsruhe/Germany.
URL: http://tdrwww.iem.uni-due.de/dreibholz/rserpool/rserpool-publications/LinuxTag2005.pdf 5.6.2.1, 5.6.2.2, 5.7, 10.1.4
Dreibholz, T.: 2006a, Applicability of Reliable Server Pooling for Real-Time Distributed Computing, Internet-Draft Version 01, IETF, Individual Submission. draft-dreibholz-rserpool-applic-distcomp-01.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-dreibholz-rserpool-applic-distcomp-01.txt 3.6.5, 10.1.4


Dreibholz, T.: 2006b, rsplib – Eine Open Source Implementation von Reliable Server Pooling [rsplib: An Open Source Implementation of Reliable Server Pooling], Proceedings of the Linuxtage in Essen, Essen/Germany.
URL: http://tdrwww.iem.uni-due.de/dreibholz/rserpool/rserpool-publications/Linuxtage2006.pdf 5.7
Dreibholz, T.: 2006c, Thomas Dreibholz's RSerPool Page.
URL: http://tdrwww.exp-math.uni-essen.de/dreibholz/rserpool 1.3, 5.1, 5.6.2.2
Dreibholz, T., Coene, L. and Conrad, P.: 2006, Reliable Server pool use in IP flow information exchange, Internet-Draft Version 02, IETF, Individual Submission. draft-coene-rserpool-applic-ipfix-02.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-coene-rserpool-applic-ipfix-02.txt 3.6.3, 10.1.4
Dreibholz, T., IJsselmuiden, A. and Adams, J. L.: 2004, Simulation of an advanced QoS protocol for mass content, Second International Conference on Performance Modelling and Evaluation of Heterogeneous Networks (HET-NET), Ilkley, West Yorkshire/United Kingdom.
URL: http://tdrwww.iem.uni-due.de/dreibholz/flowrouting/flowrouting-publications/HET-NET2004-Paper.pdf 1
Dreibholz, T., IJsselmuiden, A. and Adams, J. L.: 2005, An Advanced QoS Protocol for Mass Content, Proceedings of the IEEE Conference on Local Computer Networks 30th Anniversary, Sydney/Australia, pp. 517–518. ISBN 0-7695-2421-4.
URL: http://citeseer.ist.psu.edu/dreibholz05advanced.html 1
Dreibholz, T., Jungmaier, A. and Tüxen, M.: 2003, A new Scheme for IP-based Internet Mobility, Proceedings of the 28th IEEE Local Computer Networks Conference, Königswinter/Germany, pp. 99–108. ISBN 0-7695-2037-5.
URL: http://citeseer.ist.psu.edu/dreibholz03new.html 3.6.6, 10.1.4
Dreibholz, T., Mulik, J., Conrad, P. and Pinzhoffer, K.: 2006, Reliable Server Pooling: Management Information Base using SMIv2, Internet-Draft Version 02, IETF, RSerPool Working Group. draft-ietf-rserpool-mib-02.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-ietf-rserpool-mib-02.txt 10.1.4
Dreibholz, T. and Pulinthanath, J.: 2006, Applicability of Reliable Server Pooling for SCTP-Based Endpoint Mobility, Internet-Draft Version 00, IETF, Individual Submission. draft-dreibholz-rserpool-applic-mobility-00.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-dreibholz-rserpool-applic-mobility-00.txt 3.6.6, 10.1.4
Dreibholz, T. and Rathgeb, E. P.: 2005a, An Application Demonstration of the Reliable Server Pooling Framework, Proceedings of the 24th IEEE INFOCOM, Miami, Florida/U.S.A. Demonstration and poster presentation.
URL: http://citeseer.ist.psu.edu/dreibholz05application.html 5.7, 10.1.4
Dreibholz, T. and Rathgeb, E. P.: 2005b, Implementing the Reliable Server Pooling Framework, Proceedings of the 8th IEEE International Conference on Telecommunications, Vol. 1, Zagreb/Croatia, pp. 21–28. ISBN 953-184-081-4.
URL: http://citeseer.ist.psu.edu/dreibholz05implementing.html 1.2.4, 3.6.5.2, 3.10.5, 4.4, 8.5.1, 8.6, 10.1.4


Dreibholz, T. and Rathgeb, E. P.: 2005c, On the Performance of Reliable Server Pooling Systems, Proceedings of the IEEE Conference on Local Computer Networks 30th Anniversary, Sydney/Australia, pp. 200–208. ISBN 0-7695-2421-4.
URL: http://citeseer.ist.psu.edu/dreibholz05performance.html 2, 3.6.5, 8.5.1, 8.5.2, 8.7, 8.10.1, 8.11.1, 10.1.4
Dreibholz, T. and Rathgeb, E. P.: 2005d, RSerPool – Providing Highly Available Services using Unreliable Servers, Proceedings of the 31st IEEE EuroMicro Conference on Software Engineering and Advanced Applications, Porto/Portugal, pp. 396–403. ISBN 0-7695-2431-1.
URL: http://citeseer.ist.psu.edu/dreibholz05rserpool.html 1, 3.6.5, 9.3
Dreibholz, T. and Rathgeb, E. P.: 2005e, The Performance of Reliable Server Pooling Systems in Different Server Capacity Scenarios, Proceedings of the IEEE TENCON 05, Melbourne/Australia. ISBN 0-7803-9312-0.
URL: http://citeseer.ist.psu.edu/733077.html 2, 3.6.5, 8.5.1, 8.8.2, 8.12, 10.1.4
Dreibholz, T. and Rathgeb, E. P.: 2007, On Improving the Performance of Reliable Server Pooling Systems for Distance-Sensitive Distributed Applications, Proceedings of the 15. ITG/GI Fachtagung Kommunikation in Verteilten Systemen, Bern/Switzerland.
URL: http://tdrwww.iem.uni-due.de/dreibholz/rserpool/rserpool-publications/KiVS2007.pdf 8.10.5.4
Dreibholz, T., Rathgeb, E. P. and Tüxen, M.: 2005, Load Distribution Performance of the Reliable Server Pooling Framework, Proceedings of the 4th IEEE International Conference on Networking, Vol. 2, Saint Gilles Les Bains/Reunion Island, pp. 564–574. ISBN 3-540-25338-6.
URL: http://citeseer.ist.psu.edu/dreibholz05load.html 2, 3.6.5, 3.11, 3.11.2.1, 3.11.3.2, 8.3, 8.8.1, 8.12.2, 8.12.2.1, 8.12.2.4
Dreibholz, T., Smith, A. and Adams, J. L.: 2003, Realizing a scalable edge device to meet QoS requirements for real-time content delivered to IP broadband customers, Proceedings of the 10th IEEE International Conference on Telecommunications, Vol. 2, Papeete/French Polynesia, pp. 1133–1139. ISBN 0-7803-7661-7.
URL: http://citeseer.ist.psu.edu/dreibholz03realizing.html 1
Dreibholz, T. and Tüxen, M.: 2003, High Availability using Reliable Server Pooling, Proceedings of the Linux Conference Australia, Perth/Australia.
URL: http://citeseer.ist.psu.edu/dreibholz03high.html 3.6.5, 5.6.2.1, 10.1.4
Dreibholz, T. and Tüxen, M.: 2006, Reliable Server Pooling (RSerPool) Bakeoff Scoring, Internet-Draft Version 00, IETF, Individual Submission. draft-dreibholz-rserpool-score-00.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-dreibholz-rserpool-score-00.txt 5.9.4, 10.1.4
Dykes, S. G., Robbins, K. A. and Jeffery, C. L.: 2000, An Empirical Evaluation of Client-Side Server Selection Algorithms, Proceedings of the IEEE Infocom 2000, Vol. 3, Tel Aviv/Israel, pp. 1361–1370. ISBN 0-7803-5880-5.
URL: http://citeseer.ist.psu.edu/dykes00empirical.html 1.2.2
Eastlake, D., Crocker, S. and Schiller, J.: 1994, Randomness Recommendations for Security, Informational RFC 1750, IETF.
URL: http://www.ietf.org/rfc/rfc1750.txt 3.7.2


Eastlake, D. and Jones, P.: 2001, US Secure Hash Algorithm 1 (SHA1), Informational RFC 3174, IETF.
URL: http://www.ietf.org/rfc/rfc3174.txt 4, 3
Eaton, J.: 2003, Octave Home Page.
URL: http://www.octave.org 6.3.2
Echtle, K.: 1990, Fehlertoleranzverfahren [Fault Tolerance Methods], Springer-Verlag, Heidelberg/Germany. ISBN 3-540-52680-3.
URL: http://dc.informatik.uni-essen.de/Echtle/all/buchftv/ 1.2.3, 3.1, 3.12.1
Eisele, K.: 2002, Design of a Memory Management Unit for System-on-a-Chip Platform LEON, Masters thesis, Universität Stuttgart, Institut für Informatik, Rechnerarchitektur.
URL: http://www.informatik.uni-stuttgart.de/cgi-bin/NCSTRL/NCSTRLview.pl?id=DIP-2013&engl= 4.4.8
Engelmann, C. and Scott, S. L.: 2005, Concepts for High Availability in Scientific High-End Computing, Proceedings of the High Availability and Performance Workshop (HAPCW) 2005, Santa Fe, New Mexico/U.S.A.
URL: http://citeseer.ist.psu.edu/745548.html 1.2.3, 3.12.4
Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P. and Berners-Lee, T.: 1999, Hypertext Transfer Protocol - HTTP/1.1, Standards Track RFC 2616, IETF.
URL: http://www.ietf.org/rfc/rfc2616.txt 2.1, 3.1, 8.2, 9.4.1
Free Software Foundation: 1991, GPL General Public License.
URL: http://www.gnu.org/copyleft/gpl.html 2
Free Software Foundation: 2003, GNU Make.
URL: http://www.gnu.org/software/make/ 6.3.4
GNOME Project: 2001, GLib Reference Manual.
URL: http://developer.gnome.org/doc/API/glib/ 4.2, 4.4.8
Gradischnig, K. D. and Tüxen, M.: 2001, Signaling transport over IP-based networks using IETF standards, Proceedings of the 3rd International Workshop on the Design of Reliable Communication Networks, Budapest/Hungary, pp. 168–174.
URL: http://www.sctp.de/papers/drcn2001.pdf 2, 3.6.1
Gradischnig, K., Kramer, S. and Tüxen, M.: 2000, Loadsharing – a key to the reliability of ss7-networks, Proceedings of the Second International Workshop on the Design of Reliable Communication Networks, Munich/Germany, pp. 216–221.
URL: http://citeseer.ist.psu.edu/gradischnig00loadsharing.html 2
Grams, T.: 1999, Reliability and Safety - Zuverlässigkeit und Sicherheit, University of Applied Sciences Fulda, Fulda/Germany.
URL: http://www2.hs-fulda.de/~grams/Reliability/R&S-Terms1.html 3.1
Gray, J. and Siewiorek, D.: 1991, High-Availability Computer Systems, IEEE Computer Magazine 24(9), 39–48.
URL: http://citeseer.ist.psu.edu/727599.html 1.1


Guibas, L. J. and Sedgewick, R.: 1978, A dichromatic framework for balanced trees, Proceedings of the 19th IEEE Symposium on Foundations of Computer Science, New York/U.S.A., pp. 8–21. 4.4.7
Gupta, D. and Bepari, P.: 1999, Load Sharing in Distributed Systems, Proceedings of the National Workshop on Distributed Computing.
URL: http://citeseer.ist.psu.edu/245239.html 1.2.2
Harrington, D., Presuhn, R. and Wijnen, B.: 2002, An Architecture for Describing Simple Network Management Protocol (SNMP) Management Frameworks, Standards Track RFC 3411, IETF.
URL: http://www.ietf.org/rfc/rfc3411.txt 2.1
Hinden, R. and Deering, S.: 2006, IP Version 6 Addressing Architecture, Standards Track RFC 4291, IETF.
URL: http://www.ietf.org/rfc/rfc4291.txt 2.3.2
Hinden, R. and Haberman, B.: 2005, Unique Local IPv6 Unicast Addresses, Standards Track RFC 4193, IETF.
URL: http://www.ietf.org/rfc/rfc4193.txt 2.3.2
Hohendorf, C., Dreibholz, T. and Unurkhaan, E.: 2006, Secure SCTP, Technical Report Version 01, IETF, Individual Submission. draft-hohendorf-secure-sctp-01.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-hohendorf-secure-sctp-01.txt 2.4.3.7
Huitema, C. and Carpenter, B.: 2004, Deprecating Site Local Addresses, Standards Track RFC 3879, IETF.
URL: http://www.ietf.org/rfc/rfc3879.txt 2.3.2
IETF: 2006, Internet Engineering Task Force.
URL: http://www.ietf.org/ 2.2
IETF RSerPool WG: 2005, IETF reliable server pooling working group.
URL: http://www.ietf.org/html.charters/rserpool-charter.html 3.1
Jungmaier, A.: 2005, Das Transportprotokoll SCTP [The Transport Protocol SCTP], PhD thesis, Universität Duisburg-Essen, Institut für Experimentelle Mathematik.
URL: http://miless.uni-duisburg-essen.de/servlets/DocumentServlet?id=12152 2, 2.4.3.1, 2.4.3.5, 3.12.2, 5.1, 6.2
Jungmaier, A., Rathgeb, E. P., Schopp, M. and Tüxen, M.: 2001, A multi-link end-to-end protocol for IP-based networks, AEÜ - International Journal of Electronics and Communications 55(1), 46–54.
URL: http://tdrwww.exp-math.uni-essen.de/inhalt/forschung/sctp-aeu.pdf 2.4.3.1
Jungmaier, A., Rathgeb, E. P. and Tüxen, M.: 2002, On the Use of SCTP in Failover-Scenarios, Proceedings of the State Coverage Initiatives 2002, Volume X, Mobile/Wireless Computing and Communication Systems II, Vol. X, Orlando, Florida/U.S.A. ISBN 980-07-8150-1.
URL: http://tdrwww.exp-math.uni-essen.de/inhalt/forschung/sctpfb/sctp-failover.pdf 3.12.2, 6.2
Jungmaier, A., Rescorla, E. and Tüxen, M.: 2002, Transport Layer Security over Stream Control Transmission Protocol, Standards Track RFC 3436, IETF.
URL: http://www.ietf.org/rfc/rfc3436.txt 2.4.3.7

Jungmaier, A., Schopp, M. and Tüxen, M.: 2000a, Das Simple Control Transmission Protocol (SCTP) – Ein neues Protokoll zum Transport von Signalisierungsmeldungen über IP-basierte Netze, Elektrotechnik und Informationstechnik – Zeitschrift des Österreichischen Verbandes für Elektrotechnik 117(6), 381–388.
URL: http://tdrwww.iem.uni-due.de/inhalt/forschung/sctpdeutsch.pdf
Jungmaier, A., Schopp, M. and Tüxen, M.: 2000b, Performance Evaluation of the Stream Control Transmission Protocol, Proceedings of the IEEE Conference on High Performance Switching and Routing, Heidelberg/Germany, pp. 141–148.
URL: http://tdrwww.iem.uni-due.de/inhalt/forschung/sctp/ppframe.htm
KAME: 2006, Web page of the KAME project.
URL: http://www.kame.net
Kent, S. and Atkinson, R.: 1998a, IP Authentication Header, Standards Track RFC 2402, IETF.
URL: http://www.ietf.org/rfc/rfc2402.txt
Kent, S. and Atkinson, R.: 1998b, IP Encapsulating Security Payload (ESP), Standards Track RFC 2406, IETF.
URL: http://www.ietf.org/rfc/rfc2406.txt
Kent, S. and Atkinson, R.: 1998c, Security Architecture for the Internet Protocol, Standards Track RFC 2401, IETF.
URL: http://www.ietf.org/rfc/rfc2401.txt
Kremien, O. and Kramer, J.: 1992, Methodical Analysis of Adaptive Load Sharing Algorithms, IEEE Transactions on Parallel and Distributed Systems 3(6).
URL: http://citeseer.ist.psu.edu/kremien92methodical.html
Lamping, U., Sharpe, S. and Warnicke, E.: 2006, Wireshark User's Guide.
URL: http://www.wireshark.org/download/docs/user-guide-a4.pdf
León, J., Fisher, A. L. and Steenkiste, P.: 1993, Fail-safe PVM: A Portable package for Distributed Programming with Transparent Recovery, Technical Report CMU-CS-93-124, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania/U.S.A.
URL: http://citeseer.ist.psu.edu/3289.html
LKSCTP: 2006, Linux Kernel SCTP.
URL: http://lksctp.sourceforge.net
Loughney, J., Stillman, M., Xie, Q., Stewart, R. and Silverton, A.: 2005, Comparison of Protocols for Reliable Server Pooling, Internet-Draft Version 10, IETF, RSerPool Working Group. draft-ietf-rserpool-comp-10.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-ietf-rserpool-comp-10.txt
LVS Project: 2003, Linux Virtual Server.
URL: http://www.linuxvirtualserver.org
Mela, F.: 2005, libdict.
URL: http://freshmeat.net/projects/libdict/
Mogul, J.: 2003, TCP offload is a dumb idea whose time has come, Proceedings of the 9th Workshop on Hot Topics in Operating Systems, Lihue, Hawaii/U.S.A., pp. 25–30. ISBN 1-931971-17-X.
URL: http://citeseer.ist.psu.edu/657249.html
Neely, M. S.: 1996, An Analysis of the Effects of Memory Allocation Policy on Storage Fragmentation, Master's thesis, University of Texas, Austin, Texas/U.S.A.
URL: http://citeseer.ist.psu.edu/58412.html
NIST: 2002, Secure Hash Standard (SHS), Technical Report Federal Information Processing Standards Publication FIPS 180-2, National Institute of Standards and Technology.
URL: http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf
NS-2: 2003, The Network Simulator NS-2.
URL: http://www.isi.edu/nsnam/ns/
Ong, L. and Yoakum, J.: 2002, An Introduction to the Stream Control Transmission Protocol (SCTP), Informational RFC 3286, IETF.
URL: http://www.ietf.org/rfc/rfc3286.txt
Open Source Initiative: 1999, BSD License.
URL: http://www.opensource.org/licenses/bsd-license.php
OPnet Technologies: 2003, OPnet Modeler.
URL: http://www.opnet.com
Peterson, L.: 2004, PlanetLab: Version 3.0, Technical Report PDN-04-023, PlanetLab Consortium.
URL: http://www.planet-lab.org/PDN/PDN-04-023/pdn-04-023.pdf
Peterson, L., Anderson, T., Culler, D. and Roscoe, T.: 2002, A Blueprint for Introducing Disruptive Technology into the Internet, Proceedings of HotNets-I, Princeton, New Jersey/U.S.A.
URL: http://www.planet-lab.org/PDN/PDN-02-001/pdn-02-001.pdf
Peterson, L., Bavier, A., Fiuczynski, M., Muir, S. and Roscoe, T.: 2005, Towards a Comprehensive PlanetLab Architecture, Technical Report PDN-05-030, PlanetLab Consortium.
URL: http://www.planet-lab.org/PDN/PDN-05-030/pdn-05-030.pdf
Peterson, L. and Roscoe, T.: 2002, PlanetLab Phase 1: Transition to an Isolation Kernel, Technical Report PDN-02-003, PlanetLab Consortium.
URL: http://www.planet-lab.org/PDN/PDN-02-003/pdn-02-003.pdf
Peterson, L. and Roscoe, T.: 2006, The Design Principles of PlanetLab, Operating Systems Review 40(1), 11–16.
URL: http://www.planet-lab.org/PDN/PDN-04-021/pdn-04-021.pdf
Plank, J. S., Beck, M., Kingsley, G. and Li, K.: 1995, Libckpt: Transparent Checkpointing under Unix, Proceedings of the USENIX Winter 1995 Technical Conference, New Orleans, Louisiana/U.S.A., pp. 213–224.
URL: http://citeseer.ist.psu.edu/plank95libckpt.html
Postel, J.: 1980, User Datagram Protocol, Standards Track RFC 768, IETF.
URL: http://www.ietf.org/rfc/rfc768.txt
Postel, J.: 1981a, Internet Control Message Protocol, Standards Track RFC 792, IETF.
URL: http://www.ietf.org/rfc/rfc792.txt
Postel, J.: 1981b, Internet Protocol, Standards Track RFC 791, IETF.
URL: http://www.ietf.org/rfc/rfc791.txt
Postel, J.: 1981c, Transmission Control Protocol, Standards Track RFC 793, IETF.
URL: http://www.ietf.org/rfc/rfc793.txt
Postel, J. and Reynolds, J.: 1985, File Transfer Protocol (FTP), Standards Track RFC 959, IETF.
URL: http://www.ietf.org/rfc/rfc959.txt
R Development Core Team: 2005, R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna/Austria. ISBN 3-900051-07-0.
URL: http://www.R-project.org
Ramalho, M., Xie, Q., Tüxen, M. and Conrad, P.: 2006, Stream Control Transmission Protocol (SCTP) Dynamic Address Reconfiguration, Technical Report Version 15, IETF, Transport Area Working Group. draft-ietf-tsvwg-addip-sctp-15.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-ietf-tsvwg-addip-sctp-15.txt
Rangarajan, M., Bohra, A., Banerjee, K., Carrera, E. and Bianchini, R.: 2002, TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance, Technical report, Rutgers University.
URL: http://citeseer.ist.psu.edu/rangarajan02tcp.html
Rathgeb, E. P.: 1999, The MainStreetXpress 36190: a scalable and highly reliable ATM core services switch, International Journal of Computer and Telecommunications Networking 31(6), 583–601.
URL: http://tdrwww.exp-math.uni-essen.de/inhalt/forschung/cnrathgeb.pdf
Renier, T., Schwefel, H.-P., Bozinovski, M., Larsen, K., Prasad, R. and Seidl, R.: 2005, Distributed redundancy or cluster solution? An experimental evaluation of two approaches for dependable mobile Internet services, Lecture Notes in Computer Science 3335. ISBN 978-3-540-24420-2.
URL: http://kom.aau.dk/~marjanb/papers/isasspringer05.pdf
Riegel, M. and Tüxen, M.: 2006, Mobile SCTP, Technical Report Version 06, IETF, Individual Submission. draft-riegel-tuexen-mobile-sctp-06.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-riegel-tuexen-mobile-sctp-06.txt
Rijsinghani, A.: 1994, Computing the Internet Checksum via Incremental Update, Informational RFC 1624, IETF.
URL: http://www.ietf.org/rfc/rfc1624.txt
Rivest, R.: 1992, The MD5 Message-Digest Algorithm, Informational RFC 1321, IETF.
URL: http://www.ietf.org/rfc/rfc1321.txt
Roscoe, T.: 2002, PlanetLab Phase 0: Technical Specification, Technical Report PDN-02-002, PlanetLab Consortium.
URL: http://www.planet-lab.org/PDN/PDN-02-002/pdn-02-002.pdf
Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M. and Schooler, E.: 2002, SIP: Session Initiation Protocol, Standards Track RFC 3261, IETF.
URL: http://www.ietf.org/rfc/rfc3261.txt
Sadasivan, G., Brownlee, N., Claise, B. and Quittek, J.: 2006, Architecture for IP Flow Information Export, Technical Report Version 10, IETF, IP Flow Information Export WG. draft-ietf-ipfix-architecture-10.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-ietf-ipfix-architecture-10.txt
Seidel, R. and Aragon, C.: 1996, Randomized search trees, Algorithmica 16(4), 464–497.
URL: http://citeseer.ist.psu.edu/seidel96randomized.html
Seligman, E. and Beguelin, A.: 1994, High-Level Fault Tolerance in Distributed Programs, Technical Report CMU-CS-94-223, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania/U.S.A.
URL: http://citeseer.ist.psu.edu/seligman94highlevel.html
SETI Project: 2003, SETI@home: Search for Extraterrestrial Intelligence at home.
URL: http://setiathome.ssl.berkeley.edu
Seward, J.: 2005, bzip2 - A program and library for data compression, Snowbird, Utah/U.S.A.
URL: http://www.bzip.org/1.0.3/bzip2-manual-1.0.3.html
Seward, J. and Nethercote, N.: 2005, Using Valgrind to detect undefined value errors with bit-precision, Proceedings of the USENIX'05 Annual Technical Conference, Anaheim, California/U.S.A., pp. 17–30.
URL: http://valgrind.org/docs/memcheck2005.pdf
Silverton, A., Dreibholz, T. and Tüxen, M.: 2004, Private communication at the 61st IETF meeting, Washington DC/U.S.A.
Silverton, A., Dreibholz, T., Tüxen, M. and Xie, Q.: 2005, Reliable Server Pooling Sockets API Extensions, Internet-Draft Version 00, IETF, RSerPool Working Group. draft-ietf-rserpool-api-00.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-ietf-rserpool-api-00.txt
Silverton, A. and Tüxen, M.: 2005, Reliable Server Pooling Implementations, Proceedings of the 63rd IETF Meeting, Paris, France.
URL: http://www3.ietf.org/proceedings/05aug/slides/rserpool-2/rserpool-3.ppt
Stevens, W., Fenner, B. and Rudoff, A.: 2003, Unix Network Programming, Addison-Wesley Professional. ISBN 0-131-41155-1.
Stewart, R., Arias-Rodriguez, I., Poon, K., Caro, A. and Tüxen, M.: 2006, Stream Control Transmission Protocol (SCTP) Specification Errata and Issues, Informational RFC 4460, IETF.
URL: http://www.ietf.org/rfc/rfc4460.txt
Stewart, R., Lei, P. and Tüxen, M.: 2006a, Stream Control Transmission Protocol (SCTP) Packet Drop Reporting, Technical Report Version 04, IETF, Individual Submission. draft-stewart-sctp-pktdrprep-04.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-stewart-sctp-pktdrprep-04.txt
Stewart, R., Lei, P. and Tüxen, M.: 2006b, Stream Control Transmission Protocol (SCTP) Stream Reset, Technical Report Version 02, IETF, Individual Submission. draft-stewart-sctpstrrst-02.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-stewart-sctpstrrst-02.txt
Stewart, R., Ramalho, M., Xie, Q., Tüxen, M. and Conrad, P.: 2004, Stream Control Transmission Protocol (SCTP) Partial Reliability Extension, Standards Track RFC 3758, IETF.
URL: http://www.ietf.org/rfc/rfc3758.txt
Stewart, R., Xie, Q., Morneault, K., Sharp, C., Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M., Zhang, L. and Paxson, V.: 2000, Stream Control Transmission Protocol, Standards Track RFC 2960, IETF.
URL: http://www.ietf.org/rfc/rfc2960.txt
Stewart, R., Xie, Q., Stillman, M. and Tüxen, M.: 2006a, Aggregate Server Access Protocol (ASAP), Technical Report Version 13, IETF, RSerPool Working Group. draft-ietf-rserpool-asap-13.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-ietf-rserpool-asap-13.txt
Stewart, R., Xie, Q., Stillman, M. and Tüxen, M.: 2006b, Aggregate Server Access Protocol (ASAP) and Endpoint Handlespace Redundancy Protocol (ENRP) Parameters, Internet-Draft Version 10, IETF, RSerPool Working Group. draft-ietf-rserpool-common-param-10.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-ietf-rserpool-common-param-10.txt
Stewart, R., Xie, Q., Yarroll, L., Wood, J., Poon, K. and Tüxen, M.: 2006, Sockets API Extensions for Stream Control Transmission Protocol (SCTP), Internet-Draft Version 12, IETF, Transport Area Working Group. draft-ietf-tsvwg-sctpsocket-12.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-ietf-tsvwg-sctpsocket-12.txt
Stillman, M., Gopal, R., Sengodan, S., Guttman, E. and Holdrege, M.: 2005, Threats Introduced by RSerPool and Requirements for Security, Internet-Draft Version 05, IETF, RSerPool Working Group. draft-ietf-rserpool-threats-05.txt.
URL: http://www.watersprings.org/pub/id/draft-ietf-rserpool-threats-05.txt
Stone, J., Stewart, R. and Otis, D.: 2002, Stream Control Transmission Protocol (SCTP) Checksum Change, Standards Track RFC 3309, IETF.
URL: http://www.ietf.org/rfc/rfc3309.txt
Subramanian, L., Padmanabhan, V. and Katz, R.: 2002, Geographic Properties of Internet Routing, Proceedings of the USENIX Annual Technical Conference, Monterey, California/U.S.A., pp. 243–259. ISBN 1-880446-00-6.
URL: http://citeseer.ist.psu.edu/subramanian02geographic.html
Sultan, F., Srinivasan, K., Iyer, D. and Iftode, L.: 2002, Migratory TCP: Highly available Internet services using connection migration, Proceedings of the ICDCS 2002, Vienna/Austria, pp. 17–26.
URL: http://citeseer.ist.psu.edu/sultan02migratory.html
Tanenbaum, A.: 1996, Computer Networks, Prentice Hall, Upper Saddle River, New Jersey/U.S.A. ISBN 0-13-349945-6.
Tomonori, F. and Masanori, O.: 2003, Performance optimized software implementation of iSCSI, Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques, New Orleans, Louisiana/U.S.A.
URL: http://citeseer.ist.psu.edu/685948.html
Tüxen, M.: 2001, The sctplib Prototype.
URL: http://www.sctp.de/sctp.html
Tüxen, M.: 2003, LISP Simulation Package.
URL: http://sctp.fh-muenster.de/sim.html
Tüxen, M. and Dreibholz, T.: 2005, Reliable Server Pooling Policies, Internet-Draft Version 00, IETF, Individual Submission. draft-tuexen-rserpool-policies-00.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-tuexen-rserpool-policies-00.txt
Tüxen, M. and Dreibholz, T.: 2006a, Private communication at the 8th SCTP Bakeoff, Vancouver/Canada.
Tüxen, M. and Dreibholz, T.: 2006b, Reliable Server Pooling Policies, Internet-Draft Version 02, IETF, RSerPool Working Group. draft-ietf-rserpool-policies-02.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-ietf-rserpool-policies-02.txt
Tüxen, M., Dreibholz, T., Silverton, A., Coene, L., Xie, Q., Stewart, R. and Lei, P.: 2004, Private communication at the 60th IETF meeting, San Diego/California, U.S.A.
Tüxen, M. and Stewart, R.: 2005, UDP Encapsulation of SCTP Packets, Technical Report Version 00, IETF, Individual Submission. draft-tuexen-sctp-udp-encaps-00.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-tuexen-sctp-udp-encaps-00.txt
Tüxen, M., Stewart, R. and Lei, P.: 2005, Authenticated Chunks for Stream Control Transmission Protocol (SCTP), Technical Report Version 03, IETF, Transport Area Working Group. draft-tuexen-sctp-auth-chunk-03.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-tuexen-sctp-auth-chunk-03.txt
Tüxen, M., Xie, Q., Stewart, R., Shore, M., Loughney, J. and Silverton, A.: 2006, Architecture for Reliable Server Pooling, Technical Report Version 11, IETF, RSerPool Working Group. draft-ietf-rserpool-arch-11.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-ietf-rserpool-arch-11.txt
Tüxen, M., Xie, Q., Stewart, R., Shore, M., Ong, L., Loughney, J. and Stillman, M.: 2002, Requirements for Reliable Server Pooling, Informational RFC 3237, IETF.
URL: http://www.ietf.org/rfc/rfc3237.txt
Unurkhaan, E.: 2005, Secure End-to-End Transport - A new security extension for SCTP, PhD thesis, University of Duisburg-Essen, Institute for Experimental Mathematics.
URL: http://miless.uni-duisburg-essen.de/servlets/DocumentServlet?id=11981
Uyar, Ü., Zheng, J., Fecko, M. A. and Samtani, S.: 2003a, Performance Study of Reliable Server Pooling, Proceedings of the IEEE NCA International Symposium on Network Computing and Applications, Cambridge, Massachusetts/U.S.A., pp. 205–212. ISBN 0-7695-1938-5.
URL: http://www.cis.udel.edu/~fecko/paperspdf/nca03rsprev.pdf
Uyar, Ü., Zheng, J., Fecko, M. A. and Samtani, S.: 2003b, Reliable Server Pooling in Highly Mobile Wireless Networks, Proceedings of the IEEE International Symposium on Computers and Communications, Kemer-Antalya/Turkey, pp. 627–632. ISBN 0-7695-1961-X.
URL: http://www.cis.udel.edu/~fecko/paperspdf/iscc03rev.pdf
Uyar, Ü., Zheng, J., Fecko, M. A., Samtani, S. and Conrad, P.: 2003, Reliable Server Pooling for Future Combat Systems, Proceedings of the IEEE MILCOM Military Communications Conference, Vol. 2, Boston, Massachusetts/U.S.A., pp. 927–932.
URL: http://www.cis.udel.edu/~fecko/paperspdf/milcom03mainrev.pdf
Uyar, Ü., Zheng, J., Fecko, M. A., Samtani, S. and Conrad, P.: 2004, Evaluation of Architectures for Reliable Server Pooling in Wired and Wireless Environments, IEEE JSAC Special Issue on Recent Advances in Service Overlay Networks 22(1), 164–175.
URL: http://www.cis.udel.edu/~fecko/paperspdf/jsac04rev.pdf
Valgrind Developers: 2005, Valgrind Home.
URL: http://www.valgrind.org
Varga, A.: 2005a, OMNeT++ Discrete Event Simulation System.
URL: http://www.omnetpp.org
Varga, A.: 2005b, OMNeT++ Discrete Event Simulation System User Manual - Version 3.2, Technical University of Budapest/Hungary.
URL: http://www.omnetpp.org/doc/manual/usman.html
Wagner, A.: 2006, Interoperability-Test.
URL: http://www.cs.ubc.ca/~wagner/SCTP/
Wetherall, D. and Lindblad, C.: 1995, Extending Tcl for dynamic object-oriented programming, Proceedings of the Tcl/Tk Workshop, Toronto/Canada, pp. 173–182. ISBN 1-880446-72-3.
URL: http://citeseer.ist.psu.edu/wetherall95extending.html
Wiesmann, M., Pedone, F., Schiper, A., Kemme, B. and Alonso, G.: 2000, Understanding Replication in Databases and Distributed Systems, Proceedings of the 20th International Conference on Distributed Computing Systems, Taipei/Taiwan, pp. 264–274.
URL: http://citeseer.ist.psu.edu/393991.html
Williams, T. and Kelley, C.: 2003, GNUPlot Homepage.
URL: http://www.gnuplot.info
Wireshark: 2006, Wireshark: The World's Most Popular Network Protocol Analyzer.
URL: http://www.wireshark.org
Xie, Q., Stewart, R., Stillman, M., Tüxen, M. and Silverton, A.: 2006, Endpoint Handlespace Redundancy Protocol (ENRP), Internet-Draft Version 13, IETF, RSerPool Working Group. draft-ietf-rserpool-enrp-13.txt, work in progress.
URL: http://www.watersprings.org/pub/id/draft-ietf-rserpool-enrp-13.txt
Xie, Q. and Yarroll, L.: 2004, RSerPool Redundancy-model Policy, Technical Report Version 02, IETF, Individual Submission. draft-xie-rserpool-redundancy-model-02.txt.
URL: http://www.watersprings.org/pub/id/draft-xie-rserpool-redundancy-model-02.txt
Zhang, Y.: 2004, Distributed Computing mit Reliable Server Pooling, Master's thesis, Universität Essen, Institut für Experimentelle Mathematik.

Curriculum Vitae

Name: Thomas Dreibholz

29.09.1976  born in Bergneustadt, Germany

08/1983-09/1987  Student at the Grundschule Bielstein (Primary School), Germany

09/1987-06/1993  Student at the Realschule Bielstein (Junior High School), Germany
http://www.gm.nw.schule.de/~rswiehl

08/1993-07/1996  Student at the Gymnasium Wiehl (High School), Germany
http://www.gm.nw.schule.de/~gymwiehl

08/1996-04/2001  Student of Computer Science at the University of Bonn, Germany
http://www.uni-bonn.de

10/1998  Vordiplom (Bachelor's Degree) of Computer Science at the University of Bonn, Germany

04/2001  Diplom (Master's Degree) of Computer Science at the University of Bonn, Germany

2000  Research Assistant in the Computer Networking Technology Group at the Department of Computer Science IV of the University of Bonn, Germany
http://web.informatik.uni-bonn.de/IV/

since 05/2001  Research Associate and Ph.D. Student in the Computer Networking Technology Group at the Institute for Experimental Mathematics of the University of Duisburg-Essen in Essen, Germany
http://tdrwww.exp-math.uni-essen.de

since 2002  Cisco™ Certified Network Associate (CCNA) and Cisco™ Certified Academy Instructor (CCAI) at the Cisco™ Networking Academy of the University of Duisburg-Essen in Essen, Germany
http://cna.uni-essen.de