Automatic regression benchmark system

Nicolas Desprès

Technical Report no. 0513, June 2005, revision 921

Regression benchmarking is a part of regression testing that aims at the automatic detection of performance regressions during application development. The goal is to detect the smallest change in performance as soon as possible. Achieving this requires precise measurement of small parts of the program. Although many benchmark systems already exist, none is fully automated and/or adapted to a wide range of applications. Yet automation is a crucial requirement for detecting regressions as soon as possible. This paper covers generalities about performance measurement, then gives the requirements of such a system, and finally proposes a model.

Evaluating the performance regressions of an application is an integral part of the regression testing phase. The goal is to express the performance differences between two versions as precisely as possible. By precision, we mean both the relevance of the time estimator used and the granularity of the parts evaluated. Although many comparative performance evaluation systems already exist, few are fully automated and/or adapted to different kinds of applications. Yet automating this phase is crucial in order to detect performance losses as early as possible. This presentation first covers the prerequisites for setting up such a system, then its architecture and its application to the Transformers project. Next, we compare different techniques for estimating execution time. Finally, we discuss how such an environment could be adapted to other kinds of applications.

Keywords: automatic, regression benchmark, performance analysis, visualization, database, data collection

Laboratoire de Recherche et Développement de l'Epita
14-16, rue Voltaire, F-94276 Le Kremlin-Bicêtre cedex, France
Tél. +33 1 53 14 59 47, Fax. +33 1 53 14 59 22
nicolas.despres@lrde.epita.fr, http://www.lrde.epita.fr/

Copying this document

Copyright © 2005 LRDE.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant Sections being just "Copying this document", no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is provided in the file COPYING.DOC.

Contents

1 Introduction
2 Requirements
  2.1 Overview
  2.2 Automatic data acquisition
  2.3 Unified results format
  2.4 Results repository
  2.5 Results analysis/visualization interface
  2.6 Platform-dependency
3 Design
  3.1 Overview of the regression benchmark system
  3.2 The benchmark suite
  3.3 The populate script
  3.4 The collect script
  3.5 The web interface
  3.6 The database
    3.6.1 Tables description
    3.6.2 Table alteration versus numerous records
  3.7 Typical use case scenario
  3.8 The unified results format
    3.8.1 Featured materials
    3.8.2 Chosen language
4 Conclusion
5 Bibliography

Chapter 1
Introduction

This technical report presents a regression benchmark system. Such a system aims at preventing performance regressions that may be introduced during the development of a project.

Between two major versions of a program, maintainers may continuously introduce many small performance losses while developing it. They often do not detect these small regressions, either because they do not run their benchmark suite for every patch they apply¹ or because their performance measurements are not accurate enough. The sum of all these small losses may amount to a significant performance regression. By the time maintainers notice a significant loss of efficiency in their program, hundreds of patches have already been committed. They are then unable to find when the regression happened, especially if it is the sum of several small regressions. This problem can be avoided if the benchmark suite is run continuously while the program is modified.

A development team frequently produces tens of patches per day (sometimes more) on its project. Running the benchmark suite manually after every patch is therefore very cumbersome, especially if the suite takes a long time to run². Moreover, the amount of benchmark results may grow quickly, since an average project often has around 500 revisions. A regression benchmark system must therefore be fully automatic so as not to overload the whole development cycle.

This report describes the regression benchmark system under development in our laboratory. This system aims at solving the problem introduced above. It is still a draft, but the specifications of the main modules, in terms of requirements and design, are defined. First, the requirements of the system are detailed. Then, the architecture of the whole project is described.

¹ Numerous projects do not even have a benchmark suite.
² This often happens since the test suite must be run before the benchmark suite, and both suites may grow larger and larger as the program grows.

Chapter 2
Requirements

This chapter covers the specification of the requirements of an automatic regression benchmark system. First, we give a short description of all of them as an overview. Then, we detail each of them individually.

2.1 Overview

The regression benchmark framework must fulfill the following requirements [Courson et al. (2000); Kalibera (2004); Kalibera et al. (2004)]:

• It must perform and collect the measurement results automatically.
• It must provide a unified result format.
• It must manage a result repository.
• It must feature a user-friendly interface for result analysis and/or visualization.
• It must be platform-independent.

All these requirements are detailed in the following sections.

2.2 Automatic data acquisition

As mentioned in the introduction, we aim at measuring performance for every revision of a project, in order to detect performance regressions as soon as possible. Because performance measurements may take a long time, and because it is common to commit changes more than ten times per day, acquiring the performance data for a given project is very time-consuming and thus cannot be performed manually. It is crucial that the entire benchmark process be performed automatically: from installing and running the program and the benchmark environment, to adding the results to the repository.

2.3 Unified results format

A benchmark consists in comparing two measurements. The comparison may be made against a competing program or against an older version of the same program. To make such a comparison, the results must be stored in the same format, which avoids the need for a converter. Moreover, we want to be able to benchmark subparts of a benchmark, so we need a result format that supports nested structures. However, writing a complex parser and a complex pretty-printer is out of the scope of our project. Thus, we need a format that is easy to read and write from a script.
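As an illustration of what "nested" and "easy to write" could mean in practice, the following C sketch prints a result tree as indented text, one line per measurement. It is only a sketch under stated assumptions: the `result` structure, the `print_result` function and the `name: value unit` line layout are all hypothetical and are not the format defined by this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical in-memory form of one result and its nested sub-results. */
struct result {
  const char *name;
  double value;
  const char *unit;
  const struct result *children; /* sub-benchmarks, NULL when there are none */
  size_t n_children;
};

/*
 * Print `r` and, recursively, its sub-results to `out`, indenting each
 * nesting level by two spaces. Returns the number of lines written.
 * The "name: value unit" layout is an assumption made for this sketch.
 */
int print_result(FILE *out, const struct result *r, int depth)
{
  assert(out != NULL && r != NULL && depth >= 0);
  int lines = 1;
  fprintf(out, "%*s%s: %g %s\n", 2 * depth, "", r->name, r->value, r->unit);
  for (size_t i = 0; i < r->n_children; ++i)
    lines += print_result(out, &r->children[i], depth + 1);
  return lines;
}
```

An indentation-based layout like this one can be produced with a single formatted print per line and read back by counting leading spaces, which is the kind of script-friendliness the requirement asks for.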

2.4 Results repository

The amount of collected data may grow quickly because we want to perform measurements for every revision. So, we need a robust storage system (e.g. not a regular file). Moreover, we need to be able to search and group benchmarks together when we analyze the data. These analyses must be fast enough and must scale.

2.5 Results analysis/visualization interface

The result analysis/visualization interface must provide an easy and user-friendly way to generate graphs based on the collected data. The most important graph we need in a regression benchmark system is the one representing the evolution of performance with respect to the revision number of the program. We also need to compare different programs: typically our program and its competitors.

2.6 Platform-dependency

The end user may wish to see how the performance of their program differs from one architecture to another. So, the benchmark environment must be able to run on different architectures. This constraint applies especially to the benchmark suite written by the project authors. Most of the time, if the project under test is compatible with an architecture, its benchmark suite is compatible too. The task of the regression benchmark system is only to run the benchmark suite and to collect the results.
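The performance-versus-revision view of section 2.5 can also be scanned mechanically. The C sketch below is one possible way to do so, not a rule prescribed by this report: the `measure` structure, the `first_regression` function and the relative-threshold policy are assumptions chosen for illustration, and scores are assumed to be durations where lower is better.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one measurement of one benchmark at one revision. */
struct measure {
  int revision;
  double seconds; /* duration, lower is better */
};

/*
 * Return the index of the first measurement that is slower than its
 * predecessor by more than `threshold` (0.05 means 5 %), or -1 if there
 * is none. `m` must be sorted by increasing revision number.
 */
long first_regression(const struct measure *m, size_t n, double threshold)
{
  assert(m != NULL || n == 0);
  for (size_t i = 1; i < n; ++i)
    if (m[i].seconds > m[i - 1].seconds * (1.0 + threshold))
      return (long)i;
  return -1;
}
```

Comparing each revision only with its predecessor misses the slow accumulation of small losses described in the introduction; comparing against a fixed baseline revision as well would catch that drift, at the cost of choosing and maintaining the baseline.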

Chapter 3
Design

In this chapter, we present the design chosen to develop the regression benchmark system. First, we give an overview of the whole system. Secondly, the structure of the database is detailed. Then, we describe the typical use case scenario. Next, we detail our suggestion for a unified result format. Finally, we briefly justify the tool set we have chosen to implement it.

3.1 Overview of the regression benchmark system

The system is composed of the components listed below:

• A benchmark suite.
• A populate script.
• A collect script.
• A web interface.
• A database.

The relationships between the components are shown in figure 3.1¹.

Figure 3.1: Regression benchmark system overview

¹ The CRUD abbreviation stands for the Create, Read, Update and Delete sequence of actions that is usually performed on a database.

3.2 The benchmark suite

The benchmark suite is the part of the system that actually performs the measurements. In figure 3.1, we call this part of the framework the bencher. Most existing projects implement their benchmark suite by writing a test suite dedicated to emphasizing the performance of the program rather than the correct behavior of its features.

Developers are generally interested in measuring the amount of time and/or memory their program needs. These two values are easily computed by means of a reusable external program (such as time or valgrind [Nethercote (2004)]). These tools are convenient because they are not intrusive in the program code. In other words, to be relevant they need meaningful test suites rather than code instrumentation.

Many projects also need more specific information. For instance, our project Olena [Duret-Lutz (2000)], an image processing library, can count the number of times an algorithm accesses a pixel of an image. This information is very useful for optimizing an algorithm. Unlike the time and memory usage values, computing such a value requires instrumenting the library code. This example illustrates that it is very hard, with common languages, to develop a generic tool that can help compute any measure one may need. That is why our regression benchmark system does not feature a generic tool to ease the writing of benchmark measurements. Nevertheless, this point is tackled in [Kalibera et al. (2004)].

However, as discussed in the previous chapter, we want to unify the format used by the benchmark suite to print its results. This format is not only a matter of layout: it guarantees that the necessary information is present. This information is the description of every benchmark of the project and their results. The description must be unambiguous, in order to ensure that inserting the results into the database does not need any human interaction during the entire process. The material provided to help developers print the right information in the right format is described in section 3.8, dedicated to the unified results format. This material is provided as a library, and it is generated from the information contained in the database.

3.3 The populate script

The goal of the populate script is to take the benchmark results (written using the unified results format) as input and insert them into the database. This script is called either by the collect script (see section 3.4) or by a human operator.

The benchmark suite of a given project may be run without our project installed on the machine. That is why the module that commits the information into our database is embedded in the populate script instead of the benchmark suite. Moreover, if an unexpectedly ambiguous benchmark description cannot be committed into the database for sanity reasons, the populate script keeps track of the result until a human intervenes. Thus, the automatic process is not interrupted and no data is lost.

3.4 The collect script

The goal of the collect script is to periodically run the benchmark suite for every revision of every project registered in the database, on every configuration mentioned. Thus, it fills the database and ensures that no revision's measurements are missing.
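The "no missing measurements" goal of the collect script amounts to finding the (revision, configuration) pairs that have no result yet. The C sketch below shows that search over an in-memory table; it is a stand-in under stated assumptions, since the real system would query the database, and the `job` structure, the `next_missing` function and the row-major `done` table are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one pending run of the benchmark suite. */
struct job {
  size_t revision_index;
  size_t config_index;
};

/*
 * `done` is a row-major table of n_revisions x n_configs flags, where
 * done[r * n_configs + c] is true once revision r has been measured on
 * configuration c. Store the first missing pair in `*next` and return
 * true, or return false when every pair has been measured.
 */
bool next_missing(const bool *done, size_t n_revisions, size_t n_configs,
                  struct job *next)
{
  assert(done != NULL && next != NULL);
  for (size_t r = 0; r < n_revisions; ++r)
    for (size_t c = 0; c < n_configs; ++c)
      if (!done[r * n_configs + c]) {
        next->revision_index = r;
        next->config_index = c;
        return true;
      }
  return false;
}
```

A periodic task could call such a function in a loop, run the benchmark suite for each returned pair and hand the output to the populate script, stopping when the function returns false.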

3.5 The web interface

The web interface allows the user to draw graphs and charts based on the measures stored in the database. We have chosen a web-based application instead of an X Window one for portability reasons.

3.6 The database

The database is the cornerstone of the system. It is designed to avoid duplicated fields and to support every benchmark type. Figure 3.2 represents the relationships between the tables of the database.

Figure 3.2: The results database description

3.6.1 Tables description

The central table, called benchs, stores every single benchmark. In this database, a benchmark represents one measure of one feature of one project. A measure is qualified by a type and a scale. The measurement type tells us, for instance, whether the benchmark measures the memory usage or the duration of the program. Thus, the type has a unit such as bytes or seconds. The measurement scale encodes the program input size used by the benchmark. This allows us to perform scalability benchmarks. A scale is also qualified by a unit.

The benchs table has the following fields: an id, a name, a project id, a type id, a scale id, etc. Because a benchmark may not be available from the beginning (first revision) of a project to its end, there are two more fields called start_revision and stop_revision. They indicate the revision interval in which the benchmark may be performed.

The executions table stores the result of a benchmark collected for every valid revision and every available configuration. The benchmark is referenced in this table by means of its id in the benchs table. The executions table allows us to easily check the performance regressions for a given benchmark of a project.
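The two tables can be mirrored by C structures to make the role of each field concrete. This is a sketch rather than the actual schema: the benchs fields follow the names given in the text, whereas the fields of `struct execution`, the `NO_STOP_REVISION` sentinel and the `bench_is_valid_at` helper are assumptions introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel: the benchmark is still valid at the latest revision. */
#define NO_STOP_REVISION (-1)

/* One row of the benchs table, using the fields named in the text. */
struct bench {
  int id;
  const char *name;
  int project_id;
  int type_id;        /* measurement type, e.g. duration or memory usage */
  int scale_id;       /* program input size used by the benchmark */
  int start_revision; /* first revision at which the benchmark can run */
  int stop_revision;  /* last such revision, or NO_STOP_REVISION */
};

/* One row of the executions table (these field names are assumptions). */
struct execution {
  int bench_id; /* refers to bench.id */
  int revision;
  int configuration_id;
  double score;
};

/* Return true if benchmark `b` may be performed at `revision`. */
bool bench_is_valid_at(const struct bench *b, int revision)
{
  assert(b != NULL);
  if (revision < b->start_revision)
    return false;
  return b->stop_revision == NO_STOP_REVISION || revision <= b->stop_revision;
}
```

With such a check, a collector only needs to produce executions rows for revisions inside the interval, which is what "every valid revision" means above.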

3.6.2 Table alteration versus numerous records

We have designed this database to avoid having to alter a table while the system is running. We have also taken care not to duplicate data. As a result, the table relationships may seem complex, but this does not matter since they are maintained by the system and not by the users. Currently, we prefer to have one table with many records rather than creating new tables on the fly.

3.7 Typical use case scenario

The typical use case scenario is shown in figure 3.3 and is detailed here:

1. Register a new benchmark in the database via the web interface. This includes adding any necessary new types, scales or units.
2. Ask the system to regenerate the benchmark configuration file. Basically, this file provides material to help the developer print the results in the expected format.
3. Write the code needed for the new benchmark in the project's benchmark suite.
4. Run the benchmark suite again and redirect the results to the populate script. This stage is optional since people may wait for the periodic benchmark execution.
5. Finally, once the population process is finished, you can view the results as charts using the web interface.

Figure 3.3: A typical use case scenario

3.8 The unified results format

This format aims at representing every type of benchmark result. It is called unified because every project may use it as the output format of its benchmark suite.

3.8.1 Featured materials

The developers of the benchmark suite should not need to know the unified format that we provide. First, it may be very cumbersome for them to print it properly. Secondly, if we change it, they would have to adapt their benchmark suite. Thirdly, they should not have to know all the information we require to describe a benchmark and its result without ambiguity. So, we provide a library which contains all the needed information registered in the database for a given project. Basically, the benchmark names, types and units are available. The library interface then mainly features two functions with the following prototypes:

    void begin_benchmark(const char *name, const char *type, const char *unit);
    void end_benchmark(double score);

For instance, the developers of the benchmark suite may use these functions this way:

    #include "benchmark.h"

    /* Run the measured code and return its score. */
    static double do_the_bench(void)
    {
      double score = 0.0;
      /* compute the value of the score variable ... */
      return score;
    }

    int main(void)
    {
      double score;

      begin_project(MY_PROJECT);
      begin_benchmark(MY_BENCH_FOO, MY_TYPE_BAR, MY_INPUT_BAZ);
      score = do_the_bench();
      end_benchmark(score);
      end_project();
      return 0;
    }

Figure 3.4: An example of a benchmark code

The begin_benchmark function prints the description of the benchmark which is going to be run. The description is the id of the benchmark in the benchs table of the database. This id is computed as the hash of the concatenation of the benchmark name, type and input strings. Since the type name and input name are kept unique in the database, as is the benchmark name for a given project, there are no ambiguities. After calling the begin_benchmark function, we actually run our benchmark and then pass the score as argument to the end_benchmark function call. The begin_project function call indicates that the benchmarks written until