Automatic regression benchmark system

Nicolas Després
Technical Report no. 0513, June 2005
revision 921

Regression benchmarking is the part of regression testing that aims to detect performance regressions automatically during application development. The goal is to detect the smallest change in performance as soon as possible, which requires precise measurement of small parts of the program. Although many benchmark systems already exist, none is fully automated and/or adapted to a wide range of applications. Yet automation is a crucial requirement for detecting regressions as soon as possible. This paper covers generalities about performance measurement, then gives the requirements of such a system, and finally proposes a model.

Evaluating the performance regressions of an application is an integral part of the regression testing phase. The goal is to express the performance differences between two versions as precisely as possible. By precision, we mean both the relevance of the time estimator used and the granularity of the parts evaluated. Although many benchmarking systems already exist, few are fully automated and/or adapted to different kinds of applications. Yet automating this phase is crucial for detecting performance losses as early as possible. This presentation first covers the prerequisites for setting up such a system, then its architecture and its application to the Transformers project. Next, we compare different techniques for estimating execution time. Finally, we discuss how such an environment could be adapted to other kinds of applications.

Keywords: automatic, regression benchmark, performance analysis, visualization, database, data collection

Laboratoire de Recherche et Développement de l'Epita
14-16, rue Voltaire, F-94276 Le Kremlin-Bicêtre cedex, France
Tel. +33 1 53 14 59 47, Fax +33 1 53 14 59 22
nicolas.despres@lrde.epita.fr
http://www.lrde.epita.fr/

Copying this document

Copyright © 2005 LRDE.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant Sections being just "Copying this document", no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is provided in the file COPYING.DOC.

Contents

1 Introduction
2 Requirements
  2.1 Overview
  2.2 Automatic data acquisition
  2.3 Unified results format
  2.4 Results repository
  2.5 Results analysis/visualization interface
  2.6 Platform-dependency
3 Design
  3.1 Overview of the regression benchmark system
  3.2 The benchmark suite
  3.3 The populate script
  3.4 The collect script
  3.5 The web interface
  3.6 The database
    3.6.1 Tables description
    3.6.2 Table alteration versus numerous records
  3.7 Typical use case scenario
  3.8 The unified results format
    3.8.1 Featured materials
    3.8.2 Chosen language
4 Conclusion
5 Bibliography

Chapter 1 Introduction

This technical report presents a regression benchmark system. Such a system aims to prevent the performance regressions that may be introduced during the development of a project.

Between two major versions of a program, maintainers may continuously introduce many small performance losses while developing it. They often fail to detect these small regressions, either because they do not run their benchmark suite for every patch they apply [1] or because their performance measurements are not accurate enough. The sum of all these small losses may amount to a significant performance regression. By the time maintainers notice a significant drop in the efficiency of their program, hundreds of patches have already been committed. They are then unable to find when the regression happened, especially if it is the sum of several small regressions. This problem can be avoided if the benchmark suite is run continuously while the program is modified.

A development team frequently commits tens of patches per day (sometimes more) to its project. Running the benchmark suite manually after every patch is therefore very cumbersome, especially if the suite takes a long time to run [2]. Moreover, the amount of benchmark results may grow quickly, since the number of revisions of an average project is often around 500. A regression benchmark system must therefore be fully automatic so that it does not overload the whole development cycle.

This report describes the regression benchmark system under development in our laboratory, which aims to solve the problem introduced above. It is still a draft, but the specification of the main modules, in terms of requirements and design, is defined. First, the requirements of the system are detailed. Then, the architecture of the whole project is described.

[1] Numerous projects do not even have a benchmark suite.
[2] This often happens because the test suite must be run before the benchmark suite, and both suites may grow larger and larger as the program grows.

Chapter 2 Requirements

This chapter specifies the requirements of an automatic regression benchmark system. We first give a short overview of all of them, then detail each one individually.

2.1 Overview

The regression benchmark framework must fulfill the following requirements [Courson et al. (2000); Kalibera (2004); Kalibera et al. (2004)]:

- It must perform the measurements and collect their results automatically.
- It must provide a unified result format.
- It must manage a result repository.
- It must feature a user-friendly interface for result analysis and/or visualization.
- It must be platform-independent.

All these requirements are detailed in the following sections.

2.2 Automatic data acquisition

As mentioned in the introduction, we aim to measure performance for every revision of a project, in order to detect performance regressions as soon as possible. Because performance measurements may take a long time, and because it is common to commit changes more than ten times per day, acquiring performance data for a given project is very time-consuming and thus cannot be done manually. It is crucial that the entire benchmark process be automatic: from installing and running the program and the benchmark environment to adding the results to the repository.

2.3 Unified results format

A benchmark consists of comparing two measurements. The comparison may be made against a competing program or against an older version of the same program. For such a comparison, the results must be stored in the same format, to avoid the need for a converter. Moreover, we want to be able to benchmark subparts of a benchmark, so we need a result format that supports nested structures. However, writing a complex parser and a complex pretty-printer is out of the scope of our project. We therefore need a format that is easy to read and write from a script.

2.4 Results repository

The amount of collected data may grow quickly because we want to take measurements for every revision. We therefore need a robust storage system (e.g., not a regular file). Moreover, we need to be able to search and group benchmarks when we analyze the data. These analyses must be fast enough and must scale.

2.5 Results analysis/visualization interface

The result analysis/visualization interface must provide an easy, user-friendly way to generate graphs from the collected data. The most important graph in a regression benchmark system is the one showing how performance evolves with the revision number of the program. We also need to compare different programs: typically our program and its competitors.

2.6 Platform-dependency

The end user may wish to see how the performance of their program differs from one architecture to another. The benchmark environment must therefore be able to run on different architectures. This constraint applies especially to the benchmark suite written by the project authors. In most cases, the benchmark suite is compatible with the same architectures as the project under test. The task of the regression benchmark system is only to run the benchmark suite and to collect the results.
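
As a rough illustration of the revision-by-revision analysis that sections 2.4 and 2.5 call for, the following C sketch scans the scores recorded for successive revisions and reports the first one that is noticeably worse than the best score seen so far. Comparing against the best earlier score, rather than only against the previous revision, is what lets an accumulation of small losses show up. The function name first_regression, the tolerance parameter and the convention that scores are positive and lower is better (as with durations) are assumptions made for this example; they are not part of the system described in this report.

```c
#include <stddef.h>

/* Hypothetical helper: scores[i] is the measurement recorded for the i-th
 * revision, in revision order. Scores are assumed positive, lower is better.
 * Returns the index of the first revision whose score exceeds the best
 * score seen so far by more than `tolerance` (a relative margin, e.g. 0.05
 * for 5%), or n if no such revision exists. */
size_t first_regression(const double *scores, size_t n, double tolerance)
{
    if (n == 0)
        return 0;

    double best = scores[0];
    for (size_t i = 1; i < n; ++i) {
        if (scores[i] > best * (1.0 + tolerance))
            return i;           /* noticeably worse than the best so far */
        if (scores[i] < best)
            best = scores[i];   /* an improvement becomes the new baseline */
    }
    return n;
}
```

With a 5% tolerance, a series that drifts by 2 to 3% per revision is not flagged at the first step but is flagged once the accumulated drift passes 5% of the baseline.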

Chapter 3 Design

In this chapter, we present the design chosen for the regression benchmark system. First, we give an overview of the whole system. Second, the structure of the database is detailed. Then, we describe the typical use case scenario. Next, we detail our suggestion for a unified result format. Finally, we briefly justify the toolset we have chosen to implement it.

3.1 Overview of the regression benchmark system

The system is composed of the components listed below:

- A benchmark suite.
- A populate script.
- A collect script.
- A web interface.
- A database.

The relationships between the components are shown in figure 3.1 [1].

Figure 3.1: Regression benchmark system overview

[1] CRUD stands for Create, Read, Update and Delete, the sequence of actions usually performed on a database.

3.2 The benchmark suite

The benchmark suite is the part of the system that actually performs the measurements. In figure 3.1, this part of the framework is called the bencher. Most existing projects implement their benchmark suite as a test suite dedicated to exercising the performance of the program rather than the correct behavior of its features.

Developers are generally interested in measuring the amount of time and/or memory their program needs. These two values are easy to compute with a reusable external program (such as time or valgrind [Nethercote (2004)]). These tools are convenient because they are not intrusive in the program code. In other words, to be relevant they need meaningful test suites rather than code instrumentation.

Many projects also need more specific information. For instance, our project Olena [Duret-Lutz (2000)], an image processing library, can count the number of times an algorithm accesses a pixel of an image. This information is very useful for optimizing an algorithm. Unlike time and memory usage, computing such a value requires instrumenting the library code. This example illustrates that, with common languages, it is very hard to develop a generic tool that can help compute any measure one may need. That is why our regression benchmark system does not feature a generic tool to ease the writing of benchmark measurements. Nevertheless, this point is tackled in [Kalibera et al. (2004)].

However, as discussed in the previous chapter, we want to unify the format used by the benchmark suite to print its results. This format is not only a matter of layout: it guarantees that the necessary information is present, namely the description of every benchmark of the project and its results. The description must be unambiguous, to ensure that inserting the results into the database requires no human interaction during the entire process. The material provided to help developers print the right information in the right format is described in section 3.8, dedicated to the unified results format. This material is provided as a library, and it is generated from the information contained in the database.

3.3 The populate script

The goal of the populate script is to take the benchmark results (written in the unified results format) as input and insert them into the database. This script is called either by the collect script (see section 3.4) or by a human operator.

The benchmark suite of a given project may be run without our project installed on the machine. That is why the module that commits the information into our database is embedded in the populate script instead of the benchmark suite. Moreover, if an unexpectedly ambiguous benchmark description cannot be committed into the database for sanity reasons, the populate script keeps track of the result until a human intervenes. Thus, the automatic process is not interrupted and no data is lost.

3.4 The collect script

The goal of the collect script is to periodically run the benchmark suite for every revision of every project registered in the database, on every configuration mentioned. It thus fills the database and ensures that no revision's measurements are missing.
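
The check that no revision is left unmeasured can be sketched as follows. This is a minimal C illustration, assuming that revisions are consecutive integers and that the revisions already measured for one benchmark on one configuration have been fetched into an array. The function names is_measured and missing_revisions are hypothetical and do not come from the actual collect script.

```c
#include <stddef.h>

/* Returns 1 if `revision` appears in measured[0..n), 0 otherwise. */
static int is_measured(const long *measured, size_t n, long revision)
{
    for (size_t i = 0; i < n; ++i)
        if (measured[i] == revision)
            return 1;
    return 0;
}

/* Hypothetical helper: finds the revisions in [first, last] that have no
 * recorded measurement. At most `cap` of them are written to `out`; the
 * return value is the total number of missing revisions, which may be
 * larger than `cap`. */
size_t missing_revisions(const long *measured, size_t n,
                         long first, long last,
                         long *out, size_t cap)
{
    size_t count = 0;
    for (long r = first; r <= last; ++r) {
        if (!is_measured(measured, n, r)) {
            if (count < cap)
                out[count] = r;
            ++count;
        }
    }
    return count;
}
```

A periodic job could call such a function for each registered benchmark and configuration, then run the benchmark suite only on the revisions it returns.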

3.5 The web interface

The web interface allows the user to draw graphs and charts based on the measures stored in the database. We chose a web-based application rather than an X Window one for portability reasons.

3.6 The database

The database is the cornerstone of the system. It is designed to avoid duplicated fields and to support every benchmark type. Figure 3.2 represents the relationships between the tables of the database.

Figure 3.2: The results database description

3.6.1 Tables description

The central table, called benchs, stores every single benchmark. In this database, a benchmark represents one measure of one feature of one project. A measure is qualified by a type and a scale. The measurement type tells us, for instance, whether the benchmark measures the memory usage or the duration of the program; the type therefore has a unit, such as bytes or seconds. The measurement scale encodes the program input size used by the benchmark, which allows us to perform scalability benchmarks. A scale is also qualified by a unit.

The benchs table has the following fields: an id, a name, a project id, a type id, a scale id, etc. Because a benchmark may not be available from the beginning (first revision) of a project to its end, there are two more fields called start_revision and stop_revision. They indicate the revision interval within which the benchmark may be performed.

The executions table stores the result of a benchmark collected for every valid revision and every available configuration. The benchmark is referenced in this table by its id in the benchs table. The executions table allows us to easily check for performance regressions of a given benchmark of a project.
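
To make the two tables more concrete, here is a small C sketch that mirrors one row of each as a struct. The fields id, name, project id, type id, scale id, start_revision and stop_revision come from the description above. Everything else is an assumption for illustration: the C types, the fields of the executions row, the inclusive reading of the revision interval, and the NO_STOP_REVISION marker for a benchmark that is still active.

```c
#include <stddef.h>

/* Assumed marker for "no stop revision yet". */
#define NO_STOP_REVISION (-1L)

/* In-memory mirror of one row of the benchs table. */
struct bench {
    long id;
    const char *name;
    long project_id;
    long type_id;         /* what is measured, e.g. duration or memory */
    long scale_id;        /* input size used by the benchmark */
    long start_revision;  /* first revision where the benchmark exists */
    long stop_revision;   /* last such revision, or NO_STOP_REVISION */
};

/* Assumed shape of one row of the executions table. */
struct execution {
    long bench_id;        /* id of the benchmark in the benchs table */
    long revision;
    long configuration_id;
    double score;
};

/* Returns 1 if the benchmark may be performed at `revision`. */
int bench_is_valid_at(const struct bench *b, long revision)
{
    if (revision < b->start_revision)
        return 0;
    return b->stop_revision == NO_STOP_REVISION
        || revision <= b->stop_revision;
}

/* Returns the first execution matching the given benchmark, revision and
 * configuration, or NULL if there is none. */
const struct execution *find_execution(const struct execution *rows, size_t n,
                                       long bench_id, long revision,
                                       long configuration_id)
{
    for (size_t i = 0; i < n; ++i)
        if (rows[i].bench_id == bench_id
            && rows[i].revision == revision
            && rows[i].configuration_id == configuration_id)
            return &rows[i];
    return NULL;
}
```

In a real deployment these lookups would be database queries; the structs only show which keys tie an execution back to its benchmark.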

3.6.2 Table alteration versus numerous records

We designed this database to avoid having to alter a table while the system is running. We also took care not to duplicate data. As a result, the table relationships may seem complex, but this does not matter since they are maintained by the system and not by the users. Currently, we prefer to have a table with many records rather than creating new tables on the fly.

3.7 Typical use case scenario

The typical use case scenario is shown in figure 3.3 and detailed here:

Figure 3.3: A typical use case scenario

1. Register a new benchmark in the database via the web interface. This includes adding any necessary new types, scales or units.
2. Ask the system to regenerate the benchmark configuration file. Basically, this file provides material to help the developer print the results in the expected format.
3. Write the code needed for the new benchmark in the project's benchmark suite.
4. Run the benchmark suite again and redirect the results to the populate script. This stage is optional since people may wait for the periodic benchmark execution.
5. Finally, once the population process is finished, you can view the results as charts using the web interface.

3.8 The unified results format

This format aims to represent every type of benchmark result. It is called unified because every project may use it as the output format of its benchmark suite.

3.8.1 Featured materials

The developers of the benchmark suite should not need to know the unified format that we provide. First, it may be very cumbersome for them to print it properly. Second, if we change it, they would have to adapt their benchmark suite. Third, they do not have to know all the information we require to describe a benchmark and its result without ambiguity. So we provide a library that contains all the needed information registered in the database for a given project. Basically, the benchmark names, types and units are available. The library interface mainly features two functions with the following prototypes:

    void begin_benchmark(const char *name, const char *type, const char *unit);
    void end_benchmark(double score);

For instance, the developers of the benchmark suite may use these functions this way:

    #include "benchmark.h"

    static double do_the_bench(void)
    {
      double score = 0.0;
      /* compute the value of the score variable... */
      return score;
    }

    int main(void)
    {
      double score;

      begin_project(MY_PROJECT);
      begin_benchmark(MY_BENCH_FOO, MY_TYPE_BAR, MY_INPUT_BAZ);
      score = do_the_bench();
      end_benchmark(score);
      end_project();
      return 0;
    }

Figure 3.4: An example of a benchmark code

The begin_benchmark function prints the description of the benchmark that is about to run. The description is the id of the benchmark in the benchs table of the database. This id is computed as the hash of the concatenation of the benchmark name, type and input strings. Since the type name and input name are kept unique in the database, as is the benchmark name for a given project, there are no ambiguities. After calling begin_benchmark, we actually compute our benchmark and then pass the score as the argument to the end_benchmark call. The begin_project function call indicates that the benchmarks written until