Bellman's GAP [electronic resource]: a 2nd generation language and system for algebraic dynamic programming / Georg Sauthoff. Technische Fakultät, AG Praktische Informatik (Faculty of Technology, Practical Computer Science group)


Dissertation submitted for the academic degree of Doctor of Natural Sciences (Dr. rer. nat.) at the Faculty of Technology (Technische Fakultät), Bielefeld University

Bellman's GAP: A 2nd Generation Language and System for Algebraic Dynamic Programming

Georg Sauthoff (gsauthof@techfak.uni-bielefeld.de)
March 4, 2011

Printed on ageing-resistant paper (ISO 9706)
Published: Saturday, 1 January 2011
Source: NBN-RESOLVING.DE/URN:NBN:DE:HBZ:361-18210
Number of pages: 156

Abstract

The dissertation describes the new Bellman's GAP, which is a programming system for writing dynamic programming algorithms over sequential data. It is the second generation implementation of the algebraic dynamic programming framework (ADP) [20]. The system includes the multi-paradigm language (GAP-L), its compiler (GAP-C), functional modules (GAP-M) and a web site (GAP Pages) to experiment with GAP-L programs. GAP-L includes declarative constructs, e.g. tree grammars to model the search space, and imperative constructs for programming advanced scoring functions. The syntax of GAP-L is similar to C/Java to lower usage barriers. GAP-C translates the high-level and index-free GAP-L programs into efficient C++ code, which is competitive with handwritten code. It includes a novel table design optimization algorithm, support for dynamic programming (DP) over multiple sequences (multi-track DP), sampling, optional top-down evaluation, various backtracing schemes etc. GAP-M includes modules for use in GAP-L programs. Examples are efficient representations of classification data types and sampling as well as filter helper functions. GAP Pages contain web dialogs for selected textbook dynamic programming algorithms implemented in GAP-L. The web dialogs allow interactive ad-hoc experiments with different inputs and combinations of algebras.
Several benchmarks and examples in the dissertation show the practical efficiency of Bellman's GAP in terms of program runtime and development time.

Acknowledgements

I thank my supervisor Prof. Dr. Robert Giegerich for his support during my PhD studies, his open-door policy and very interesting discussions.
I am grateful to the DFG (Deutsche Forschungsgemeinschaft) for funding my work through the GK 635.
Thanks are due to Jens Reeder for providing ADP versions of RNA folding algorithms and discussions. I thank Christian Lang for the excellent teamwork and fruitful discussions during the table design study project in 2005.
I thank Stefan Janssen, Alexander Kaiser and Marco Rüther for fruitful discussions, effective coffee breaks and proofreading efforts. Stefan also did careful alpha and beta testing of Bellman's GAP, which was quite helpful.
Thanks to fellow open-source enthusiasts who are providing an enormous amount of high quality software. I appreciate their efforts and I profited by that a lot. In particular the Linux Kernel, the GNU Compiler Collection, the Boost C++ library, the Flex and Bison parser generators, the LaTeX typesetting system and a lot of sophisticated LaTeX packages, including the KOMA classes, tikz and pgfplots, all with incredible documentation, have been very useful.
I am deeply grateful to my parents for their ongoing support.

Contents

1 Introduction
  1.1 Problem Statement
  1.2 Role of Dynamic Programming in Bioinformatics
  1.3 Related Dynamic Programming Frameworks
    1.3.1 Dynamite
    1.3.2 Staging DP
    1.3.3 Dyna
    1.3.4 Shortcut Fusion
2 Algebraic Dynamic Programming
  2.1 First Generation ADP
    2.1.1 Algebra Products
    2.1.2 Haskell Embedding of ADP
    2.1.3 The ADP Compiler
  2.2 Second Generation ADP
    2.2.1 Products
    2.2.2 Generalizations
    2.2.3 Algebra characteristics
3 Bellman's GAP Overview
  3.1 Limitations of Haskell-embedded ADP
4 Bellman's GAP Language
  4.1 Design Goals
  4.2 New ADP features
  4.3 Example
  4.4 Lexical Structure
    4.4.1 Keywords
    4.4.2 Comments
    4.4.3 Operators
    4.4.4 Constants
    4.4.5 Whitespace
    4.4.6 Identifiers
    4.4.7 Layout
  4.5 Program Structure
    4.5.1 Imports
    4.5.2 Input
    4.5.3 Types
    4.5.4 Signature
    4.5.5 Algebras
    4.5.6 Statements
    4.5.7 Variable Access
    4.5.8 Grammar
    4.5.9 Instances
  4.6 Selected Language Features
    4.6.1 Algebra extension
    4.6.2 Syntactic filtering
    4.6.3 Semantic instance filtering
    4.6.4 Multi-Track programs
    4.6.5 Alphabets
5 Bellman's GAP Compiler
  5.1 Compiler Architecture
  5.2 Example
  5.3 Semantic Analyses
    5.3.1 Unreachable Non-Terminals
    5.3.2 Productive Checking
    5.3.3 Yield Size Analysis
    5.3.4 Loop Checking
    5.3.5 Max size filter propagation
    5.3.6 Table Dimension Analysis
    5.3.7 Table Design
    5.3.8 Type Checking
    5.3.9 List analysis
    5.3.10 Dependency analysis
    5.3.11 Non-terminal inlining
    5.3.12 Index analysis
  5.4 Code Generation
    5.4.1 Parsing Schemes
    5.4.2 Parallelization
    5.4.3 Backtracing
    5.4.4 Window Mode
    5.4.5 Index Hacking
6 Bellman's GAP Modules
  6.1 Memory Pools
  6.2 Lists
  6.3 String Data Structures
  6.4 librna
7 Bellman's GAP Pages
  7.1 BiBiServ
8 Benchmarks
  8.1 RNAfold
  8.2 Thermodynamic matchers
  8.3 RNAshapes
  8.4 pknotsRG
9 Conclusion
10 Outlook
  10.1 Sparse ADP
  10.2 Knapsack style DP algorithms
  10.3 ADP over Trees
Bibliography

1 Introduction

Dynamic Programming (DP) is an optimization method developed by Richard Bellman in the 1950s [5]. It is used to solve optimization problems where overall solutions are computed from sub-solutions. Solutions and sub-solutions are computed using the same objective function, e.g. maximization or minimization. Applying the objective function to compute optimal sub-solutions and tabulating reused sub-solutions leads to an algorithm that evaluates a search space of usually exponential size in polynomial runtime and space. Bellman's Principle (Definition 5) specifies the properties an objective function has to satisfy such that dynamic programming can be used to solve the optimization problem.
Traditionally, dynamic programming algorithms are presented as matrix recurrences involving case distinctions. For example, the matrix entry M_{i,j} contains the sub-solution for the sub-problem (i, j), the complete problem instance is represented by (0, n), and the right-hand side of the matrix recurrence specifies for each (i, j) how to recursively compute the solution, taking solutions for some (k, l) with i ≤ k ≤ l ≤ j and (k, l) ≠ (i, j) into account. Implementing the matrix recurrence as a computer program means using an array as the matrix and a loop control structure that computes entries representing smaller sub-solutions before computing larger ones. Often, one is not only interested in the optimal score as specified by the matrix recurrence, but also in the candidate structure from the search space that represents the score. The structure is derived from the sequence of optimization steps during the computation of the score. Given a score matrix, computing the structure is called backtracing: starting from the global solution, the steps yielding that solution are recursively traced back.
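As a small illustration of such a matrix recurrence with backtracing, the following Python sketch computes a longest palindromic subsequence. The problem choice, the function name and the half-open index convention (sub-problem (i, j) stands for x[i:j]) are assumptions made for this example only; they do not come from the dissertation.

```python
def longest_palindromic_subsequence(x):
    """Score matrix plus backtracing for a one-table interval recurrence."""
    n = len(x)
    # M[i][j] holds the optimal score for the sub-problem (i, j), i.e. x[i:j].
    M = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        M[i][i + 1] = 1  # a single character is a palindrome of length 1
    # Smaller sub-problems are computed before larger ones.
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            if x[i] == x[j - 1]:
                M[i][j] = M[i + 1][j - 1] + 2
            else:
                M[i][j] = max(M[i + 1][j], M[i][j - 1])
    # Backtracing: start at the global solution (0, n) and retrace the
    # decisions that produced the optimal score.
    i, j = 0, n
    left, right = [], []
    while j - i >= 2:
        if x[i] == x[j - 1]:
            left.append(x[i])
            right.append(x[j - 1])
            i, j = i + 1, j - 1
        elif M[i + 1][j] >= M[i][j - 1]:
            i += 1
        else:
            j -= 1
    middle = x[i:j]  # empty or a single character
    return M[0][n], "".join(left) + middle + "".join(reversed(right))


if __name__ == "__main__":
    print(longest_palindromic_subsequence("character"))
```

Note how every access to M carries its own index arithmetic; a single wrong offset in either the fill loop or the backtrace would go unnoticed by the language.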
In textbooks, dynamic programming is usually introduced with example algorithms that only use one matrix recurrence (e.g. [8]). Such examples are easily understood and implemented in a straightforward way, but in practice dynamic programming algorithms that use several interdependent matrix recurrences are common, e.g. up to 30 matrix recurrences in manually developed algorithms.
Developing dynamic programming algorithms holds several challenges. One has to decide what the search space of the problem domain is, how many tables are needed, how to score elements of the search space and what structure a scoring scheme implies. When matrix recurrences are used as the specification tool for developing DP algorithms, all these concerns are not separated, but need to be solved in an interleaved fashion.
When learning dynamic programming only from small textbook-style examples, it is not obvious that these different concerns exist, e.g. one could assume that all matrix recurrences have to be tabulated in any case.
In addition, index computations in matrix recurrences are error-prone, which leads to tedious debugging sessions when implementing the recurrences as an imperative computer program. For example, writing (i+1, j+1) instead of (i+1, j−1) in one place may only once in a while yield an obviously wrong solution during optimization.
More difficulties arise when extending an existing DP algorithm. Changing, for example, the backtracing mode, making minor conceptual changes to the scoring scheme or the need to combine several objectives may require a completely new implementation of the DP algorithm.
Algebraic Dynamic Programming (ADP, Chapter 2), developed by Robert Giegerich around 2004, is a formal framework for developing dynamic programming algorithms over sequences that provides a clear separation of the described concerns and eliminates the use of indices, coining the slogan: “No subscripts, no errors!”
Bellman's GAP is a second generation implementation of ADP, which is the topic of this dissertation and is outlined in the next section. The role of dynamic programming in bioinformatics is discussed in Section 1.2 and several approaches that are similar to ADP in their motivation are reviewed in Section 1.3.

1.1 Problem Statement

The subject of this dissertation is the development of a novel programming system for ADP that provides the advantages and features of ADP as published in 2004, generalizes ADP for dynamic programming over multiple sequences (multi-track DP), provides table design that takes constant runtime factors into account and uses a new domain specific programming language that is easily accessible to ADP novices, efficient to use for ADP professionals and not based on embedding into a host language.
Additional goals of this work are the investigation of new product operations, the development of classification schemes for products and the compilation of products into efficient code. Further, alternative evaluation strategies, besides CYK-style parsing, are to be investigated.
The aim of this dissertation is to establish a new programming system for ADP, which is called Bellman's GAP in the following. The task is twofold: the development of GAP-L (Chapter 4), the novel domain specific programming language for ADP that has a Java/C-like syntax but also has declarative constructs, and the development of GAP-C from scratch (Chapter 5), an optimizing compiler that translates GAP-L programs into efficient C++ code. Besides the presentation of novel and generalized semantic analysis algorithms, e.g. for table design (Section 5.3.7) and table dimension analysis (Section 5.3.6), and the presentation of code generation techniques, e.g. an alternative evaluation scheme that exploits sparseness (Section 5.4.1) and parallelization (Section 5.4.2), Chapter 8 shows several benchmarks of the practical runtime and memory usage of GAP-L programs compiled with GAP-C, comparing them with hand-written versions and with versions using previous ADP tools.

s1 = darling     da-rling     DMIMMMMM
s2 = airline     -airline

Figure 1.1: Two example sequences and a possible alignment in two different notations.
I developed GAP-M (Chapter 6) as a runtime library that is part of GAP-C. It provides efficient implementations of internal data structures and reusable functions for the application domain of bioinformatics. GAP Pages (Chapter 7) is a web site project that presents the GAP-L syntax using GAP-L versions of well-known dynamic programming algorithms and provides an interactive interface for ad-hoc experiments with different inputs and objectives.
The ADP generalizations I developed during the design of GAP-L that are independent of GAP-L are presented in Section 2.2.
Chapter 2 gives an overview of the current ADP framework and Chapter 3 gives an overview of Bellman's GAP, including a discussion of the motivation.
Chapter 9 concludes the dissertation and Chapter 10 discusses open problems and further research opportunities.

1.2 Role of Dynamic Programming in Bioinformatics

Dynamic programming plays an important role in bioinformatics. Several optimization problems in bioinformatics can be solved with the help of dynamic programming. For example, in a textbook about sequence analysis [12], every presented algorithm uses the method of dynamic programming. In the following a few example use cases are described.
In the analysis of genomic sequence data, the optimal pairwise sequence alignment is an important building block that can be computed via dynamic programming. Given two sequences, the alignment is a sequence of edit operations that, applied to the first sequence, yields the second sequence. Figure 1.1 shows an example of an alignment. Examples of edit operations are deletion, insertion, match, mismatch or transposition of characters. Each edit operation has an associated score and the score of the alignment is the sum of all edit operation scores. An alignment is optimal if it has the best score. Depending on the scoring model, the objective function is to maximize the score, i.e. to maximize the similarities between the sequences, or to minimize the score, i.e. to minimize the dissimilarities or distance between the sequences. The best score is then the maximal or minimal one. The search space of all possible alignments is of exponential size, and with dynamic programming, it can be evaluated in polynomial time.
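A minimal Python sketch of the distance-minimizing variant may make this concrete. It assumes unit costs and only the edit operations deletion, insertion, match and mismatch (no transposition); the function name and cost parameters are illustrative choices, not part of the dissertation.

```python
def edit_distance(s1, s2, indel=1, mismatch=1):
    """Minimal alignment score (distance) of s1 and s2 in O(n*m) time."""
    n, m = len(s1), len(s2)
    # D[i][j] is the optimal score for aligning the prefixes s1[:i] and s2[:j].
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * indel  # only deletions
    for j in range(1, m + 1):
        D[0][j] = j * indel  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            replace = D[i - 1][j - 1] + (0 if s1[i - 1] == s2[j - 1] else mismatch)
            delete = D[i - 1][j] + indel
            insert = D[i][j - 1] + indel
            D[i][j] = min(replace, delete, insert)  # objective: minimization
    return D[n][m]


if __name__ == "__main__":
    # The sequences of Figure 1.1; the alignment shown there uses one deletion,
    # one insertion and one mismatch.
    print(edit_distance("darling", "airline"))  # 3
```

Switching the objective from minimizing a distance to maximizing a similarity would mean changing both the scores and the call to min, which are scattered through the recurrence.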


Figure 1.3: Example of an RNA motif, generated with the Locomotif GUI.

molecule is described by the set of base pairings. Figure 1.2 shows an example. In many processes in the cell, the structure of an RNA determines its function. Since different sequences may yield the same structure, only looking at the sequence data is not enough when comparing or searching RNA sequences. Analogous to the Viterbi algorithm for HMMs, the CYK algorithm [58] can be modified to get the most probable parse of a grammar for a given input sequence. To derive rule probabilities from a training set of inputs, the generalization of the Forward/Backward algorithm for SCFGs is the Inside/Outside algorithm.
Besides probabilistic modeling, CFGs are used to model the search space of possible secondary structures under a minimum free energy (MFE) model. Different structure elements have different free energy contributions to the complete structure of an RNA sequence. The energy parameters used are derived experimentally. The basis of this model is that, due to physicochemical laws, an RNA molecule in the cell folds itself such that the free energy is minimal. Minimization using dynamic programming (e.g. implemented as a CYK-style parser) computes the MFE value and the structure or structures that yield this MFE. Using Boltzmann statistics [54], the probability of structures or structure ensembles under the MFE model can be computed.
In RNA structure analysis, a CFG can be used to model the general search space of an RNA structure. In other use cases, a CFG restricts the search space to specific structural motifs, e.g. to the search space of all clover-leaf-like structures (basically three hairpins enclosed in a hairpin structure), which includes the family of tRNAs (transfer RNAs). Figure 1.3 shows another motif created with the graphical grammar generator Locomotif [38]. Grammars that model restricted structural motifs and use the MFE model are also called thermodynamic matchers (TDMs).
For searching new members of a family of sequences, covariance models are used. Such models use SCFGs, i.e. the use case is similar to profile HMMs, but context-free grammars are used instead of regular grammars.

1.3 Related Dynamic Programming Frameworks

The theoretical base of Bellman's GAP is Algebraic Dynamic Programming (ADP, Section 2). ADP is a formal framework for specifying dynamic programming algorithms over sequences on a high level. It eliminates the use of indices, a major source of error in dynamic programming. In ADP, the description of the candidate structure, the candidate evaluation, the search space description and the tabulation concerns are separated from each other.
The following sections present and discuss languages, frameworks and libraries for dynamic programming that are related to ADP.

1.3.1 Dynamite

Dynamite [6] is a domain specific language (DSL) for specifying pairwise sequence alignment style DP algorithms. Its compiler generates C code. The main formal concept in the Dynamite language is the finite-state machine (FSM). A Dynamite program defines states and transitions. Every transition is associated with a scoring function and two offsets that specify how many characters from each input sequence are consumed during that transition. The explicit specification of the offsets means that one major source of error in DP, the use of indices, is not completely eliminated from the Dynamite language. A state is directly mapped to a two-dimensional table, i.e. for each state all sub-solutions are tabulated.
For example, the Gotoh algorithm [23] for pairwise sequence alignment with affine gap costs is modelled in Dynamite as an FSM with three normal states: match, delete and insert. In addition, the two special states start and end are defined, which are not mapped to tables. A transition from the match state to the delete state corresponds to opening a deletion gap, i.e. the gap opening cost is associated with that transition. The first offset is 1 and the second offset is 0, since a deletion consumes one character from the first sequence and none from the second. Then a gap extension cost is associated with the transition from the delete state to the delete state, etc.
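The FSM view of the Gotoh algorithm described above can be sketched in Python as follows. This is not Dynamite syntax and not code generated by Dynamite; the transition table layout, the cost defaults and the convention that a gap of length k costs gap_open + (k − 1) · gap_extend are assumptions made for illustration.

```python
INF = float("inf")


def gotoh_fsm(s1, s2, mismatch=1, gap_open=3, gap_extend=1):
    """Affine gap alignment cost, driven by a table of FSM transitions."""

    def substitution(i, j):
        return 0 if s1[i - 1] == s2[j - 1] else mismatch

    def opening(i, j):
        return gap_open

    def extension(i, j):
        return gap_extend

    transitions = [
        # (source, target, offset1, offset2, score function)
        ("start", "match", 1, 1, substitution),
        ("match", "match", 1, 1, substitution),
        ("delete", "match", 1, 1, substitution),
        ("insert", "match", 1, 1, substitution),
        ("start", "delete", 1, 0, opening),
        ("match", "delete", 1, 0, opening),
        ("insert", "delete", 1, 0, opening),
        ("delete", "delete", 1, 0, extension),
        ("start", "insert", 0, 1, opening),
        ("match", "insert", 0, 1, opening),
        ("delete", "insert", 0, 1, opening),
        ("insert", "insert", 0, 1, extension),
    ]
    n, m = len(s1), len(s2)
    states = ["match", "delete", "insert"]
    # One two-dimensional table per normal state; start and end have no table.
    table = {s: [[INF] * (m + 1) for _ in range(n + 1)] for s in states}

    def lookup(state, i, j):
        if state == "start":
            return 0 if (i, j) == (0, 0) else INF
        return table[state][i][j]

    for i in range(n + 1):
        for j in range(m + 1):
            for src, dst, off1, off2, score in transitions:
                pi, pj = i - off1, j - off2
                if pi < 0 or pj < 0:
                    continue  # transition would consume more than is available
                candidate = lookup(src, pi, pj) + score(i, j)
                if candidate < table[dst][i][j]:
                    table[dst][i][j] = candidate
    if n == 0 and m == 0:
        return 0  # start goes directly to end
    # The end state takes the best value of any normal state at (n, m).
    return min(table[s][n][m] for s in states)


if __name__ == "__main__":
    print(gotoh_fsm("AAAA", "AA"))  # one gap of length 2: 3 + 1 = 4
```

The offsets in the transition table play the role of the explicit index arithmetic that the text identifies as a remaining source of error.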
The Dynamite compiler is coupled with a rich runtime library that includes data structures and tools needed for bioinformatics sequence analysis, e.g. for integrating database searching, reading FASTA files or using protein scoring matrices. The object system that is used in Dynamite is designed for extensibility, e.g. to integrate a protein HMM or custom sequence types into a sequence alignment algorithm.
As a special code generation feature, the Dynamite compiler supports the generation of linear space alignment computation code, which is a generalization of the Hirschberg algorithm [24].
The Dynamite language only supports optimization functions, e.g. minimization or maximization. Synoptic analyses of the search space, e.g. counting via summation, are not possible. The domain of the language comprises pairwise sequence style O(n^2) DP algorithms. There is no support for generic Needleman-Wunsch [35] O(n^3) style pairwise sequence alignment algorithms for using generic gap length functions. Likewise, sequence alignment algorithms for more than two sequences, generic single-track (e.g. O(n^3) RNA secondary structure folding), generic two-track (e.g. the Sankoff algorithm [42] for simultaneous folding and aligning) or generic multi-track DP algorithms over sequences cannot be formulated in Dynamite.

1.3.2 Staging DP

A combinator library for specifying dynamic programming algorithms as multistage programming (MSP) code is presented in [52]. It is implemented in MetaOCaml, which provides elementary MSP constructs. The MSP constructs used are operators to quote expressions and to instruct the compiler to inline quoted expressions.
The combinators of the library are implemented as monads. They do not abstract from the matrix recurrence style of dynamic programming that is usually used in textbooks to present dynamic programming algorithms. This means that one major source of error in dynamic programming is not eliminated with the use of this library. The details of MSP are not encapsulated in the library.
Example implementations and benchmark results are shown for simple one-table dynamic programming algorithms. They are specialized on input sizes from 7 to 34 characters. The task of printing optimal candidates or backtracing is not covered by the library.

1.3.3 Dyna

Dyna [13] is a Turing-complete declarative language for specifying weighted deductive programs, which are a generalization of probabilistic parsing. It was developed to simplify the development of algorithms and parsers in the field of natural language processing (NLP). Dynamic programming algorithms can be specified in Dyna, but Dyna is not restricted to dynamic programming.
The syntax of Dyna is Prolog based and some concepts originate from the world of deductive databases. A Dyna program is a set of deductive inference rules. The Dyna compiler generates optimized C++ code, whose main part is an agenda based parser. In NLP large grammars are not unusual and Dyna is able to handle grammars with over 100000 rules.
Since the scope of Dyna is not restricted to dynamic programming, dynamic programming related optimizations are not a main focus. For example, garbage collection is run in fixed intervals, even when an optimizing dynamic programming algorithm on elementary data types implies no need for any garbage collection, and in that case garbage collection only introduces overhead. Similarly, the generic agenda based parsing loop introduces some overhead in practice.
In the examples, the printing of optimal candidates or backtracing is not mentioned. One alternative is to forward-compute string representations of candidates with the scores or to use an available graph browser to introspect the derivation space. The Dyna language does not separate the search space description from the evaluation of candidates. For example, the use of minimization, maximization or something else as the optimizing function, or the scoring scheme, are embedded on the left-hand side of the inference rules.
In Dyna the optimal sub-solutions are tabulated for sharing of sub-solutions, which yields an asymptotically optimal runtime of dynamic programming algorithms. The Dyna compiler does not include optimizations to reduce unnecessary tabulation. The compiler supports optimizing program transformations like folding and unfolding, but they are not automatically applied by the compiler. The user has to instruct the compiler where such transformations have to be applied.
usionFrtcutSho1.3.4Theoptimalsequenceproblemsolvingframework[33]providesfunctionstowrite
dynamicprogrammingalgorithmsspecifications.ItisimplementedasaHaskell
[28]library.Itusesthefunctionalprogrammingtechniqueofshortcutfusion,which
isaprogramtransformationforeliminatingintermediatedata-structuresbetween
functioncalls.TheHaskelllibraryimplementsseveralshortcutfusionrulesas
GHC(GlasgowHaskellCompiler)rewriterules,whichareappliedbytheGHCto
theparsetreeoftheinputprogramduringcompiletime.
Adynamicprogrammingalgorithmisspecifiedintheoptimalsequenceframe-
workviaaHaskellprogramthatenumeratestheproblemsearchspaceandapplies
oneormoreobjectivefunctionstoallcandidatesofthesearchspace.Thesespeci-
ficationsareconstructedusinghigherorderfunctionsfromthelibrary.Whencom-
pilingtheprogramviaGHC,therewriterulesareappliedtotheinefficientsearch
spaceenumerationprogramanditisautomaticallytransformedintoanefficient
program.optimizationSincetheframeworkisimplementedasaHaskelllibrary,theuserisconfronted
witherrorsfromtheHaskellsystemsexposingimplementationdetails,whenanop-
timalsequencespecificationcontainsanerror.Whenaspecificationiswrittensuch
thattherewriterulescannotautomaticallybeapplied,aninefficientspecification
istransformedtoaninefficientprogramwithoutwarning.
Theframeworkdoesnoteliminatetheuseofindicesinsearchspacespecifications.
Theframeworkspecifiesrewriterulesforselectiveorscoringobjectivefunctions,as
e.g.minimumormaximum.Synopticobjectivefunctions,e.g.forcomputingthe
sizeofthesearchspace,arenotincluded.
Therewriterulesgeneratedynamicprogrammingprogramsthattabulateevery

16

derivcandidateedlistsrecurrence.sorted,Thwhicehnestedmayinframewtroduceorkanfunctionsasymptoticallyneedtosubkeepoptimalinruntermediatetime
factor.Similartothattheuseofabinarymapdatastructurefortabulatingsolution
may[33]presenasymptoticallytsseveralincreaseexampletherundynamictimeoftheprogramminggeneratedalgorithmsprogram.asoptimalse-
quencespecifications,e.g.Knapsackvariants,thelongestcommonsubsequence
algorithmandtheoptimalbinarysearchtreealgorithm.Inseveralbenchmarks
theruntimeoftranslatedspecifications,usingtheoptimalsequenceframework,are
comparedtotheruntimeofdirectlyhandwrittenHaskellimplementations.The
generatedprogrammingvprogramsersions,arebutasfasttheormanfasteruallythanimplemethenmanteduvallyersionsimplemenarenottedtuneddynamicfor
efficiency.Forexample,themanuallyimplementedversionforsomealgorithms
consumesalotofmemoryincomparisontothegeneratedversionsuchthatmuch
runtimeisspentduringgarbagecollection.

17

2 Algebraic Dynamic Programming

Algebraic Dynamic Programming (ADP) is a formal framework for specifying dynamic programming algorithms on sequences. It clearly separates the concerns of search space description, candidate description, candidate evaluation and tabulation. Tree grammars (G) specify the search space, algebras (E) evaluate candidate terms and signatures (Σ) declare the function reservoir which tree grammars and algebras are using. Tabulation is specified through non-terminal annotation in tree grammars. The use of tree grammars for search space description eliminates subscripts from the algorithm description, i.e. a major source of programming errors in developing DP algorithms.
Algebras are building blocks to wrap different scoring schemes or optimization strategies (h). With product operations they can be combined to more powerful analyses.
2.1 First Generation ADP

The ADP framework as published in 2004 is referred to as the first generation of ADP. This section defines the semantics of the basic GAP-L components. The definitions follow the semantic description of ADP in [20], Section 3.
To simplify the following definitions, we assume the case of one input track, one objective function and one sort. $\mathcal{A}$ denotes the alphabet of the input string. A signature $\Sigma$ over $\mathcal{A}$ is a set of function symbols and a data type placeholder (sort) $S$. The return type of each function symbol is $S$, each argument is of type $S$ or $\mathcal{A}$. $T_\Sigma$ denotes the term language described by the signature $\Sigma$ and $T_\Sigma(V)$ is the term language where each term may contain variables from the set $V$. The regular tree grammar $G$ is defined as a tuple $(V, \mathcal{A}, Z, P)$, where $V$ is the set of non-terminals, $Z \in V$ is the axiom, and $P$ is the set of productions. Each production is of the form of Equation 2.1:

$$v \rightarrow t \quad\text{with}\quad v \in V,\ t \in T_\Sigma(V) \tag{2.1}$$

The language generated by a tree grammar $G$ is defined by Equation 2.2:

$$L(G) = \{\, t \in T_\Sigma \mid Z \rightarrow^{*} t \,\} \tag{2.2}$$

Definition 1 (Yield Function). $y$ denotes the yield function and is of type $T_\Sigma \rightarrow \mathcal{A}^{*}$. It is defined as $y(a) = a$, where $a \in \mathcal{A}$, and $y(f(x_1, \dots, x_n)) = y(x_1) \dots y(x_n)$, with $n \ge 0$, for each function symbol $f$ from $\Sigma$.
The yield language $L(G, y)$ of a tree grammar $G$ is defined by Equation 2.3.

$$L(G, y) = \{\, y(t) \mid t \in L(G) \,\} \tag{2.3}$$

Definition 2 (Yield Parsing). Computing the inverse of the yield function $y$ is called yield parsing. The yield parser $Q$ of a tree grammar $G$ computes the search space of all possible yield parses:

$$Q(x) = \{\, t \mid t \in L(G),\ y(t) = x \,\} \tag{2.4}$$

Note that a context-free parser for a yield language returns parse trees, whereas a yield parser returns elements from $T_\Sigma$. A user of Bellman's GAP need not care about how yield parsing works.
Definition 3 (Evaluation Algebra). An evaluation algebra $E$ for a signature $\Sigma$ contains a function for every function symbol from $\Sigma$ with the same arity. The algebra substitutes the sort symbol with a concrete type $S$, i.e. the i-th argument of the algebra function $f$ is of type $S$ if the i-th argument of the function symbol $f$ is of type $S$. In addition, an evaluation algebra contains an objective function $h$ of type $[S] \rightarrow [S]$.
Definition 4 (ADP Problem Solution). An ADP problem instance is specified by a grammar $G$, an evaluation algebra $E$ and an input sequence $x \in \mathcal{A}^{*}$. Its solution is defined by Equation 2.5.

$$G(E, x) = h_E[\, E(t) \mid t \in L(G),\ y(t) = x \,] \tag{2.5}$$

The square brackets in Equation 2.5 denote multi-sets, which are used to allow for co-optimal solutions or a restricted class of sub-optimal solutions.
A GAP-L program encodes $G$ and $E$, and running on input $x$, it produces the solution defined in Equation 2.5. The actual execution of a translated GAP-L program does not enumerate the search space by building all candidate trees and evaluating them. For efficiency reasons the objective function application is interleaved with the evaluation of candidate trees, which are not explicitly constructed. In functional language terminology, this is a case of deforestation. Tabulation of sub-solutions is used to eliminate re-computations that lead to an exponential runtime.
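The difference between evaluating all candidates first and interleaving the objective function can be sketched as follows. The sketch hard-codes the base-pair grammar of the Nussinov example from Section 2.1.1.1 and passes the algebra as a dictionary of functions; this encoding, the names solve and SCORE, and the use of functools.lru_cache for tabulation are assumptions for illustration and do not reflect the code that GAP-C generates.

```python
from functools import lru_cache

# Complementary RNA bases that may pair with each other.
PAIRS = {("a", "u"), ("u", "a"), ("c", "g"), ("g", "c"), ("g", "u"), ("u", "g")}

# Scoring algebra: counts base pairings, objective function h = maximum.
SCORE = {
    "nil": lambda: 0,
    "right": lambda x, e: x,
    "split": lambda x, y: x + y,
    "pair": lambda e, x, f: x + 1,
    "h": lambda l: [max(l)] if l else [],
}


def solve(algebra, x, interleave=True):
    """Evaluate the grammar S -> nil | right(S, b) | split(S, pair(b, S, b^)).

    With interleave=False every candidate value is kept until the end and the
    objective function is applied once. With interleave=True the objective
    function is applied to every tabulated sub-solution.
    """
    h = algebra["h"] if interleave else (lambda l: l)

    @lru_cache(maxsize=None)  # tabulation of the sub-solutions S(i, j)
    def S(i, j):
        out = []
        if i == j:
            out.append(algebra["nil"]())
        else:
            out += [algebra["right"](s, x[j - 1]) for s in S(i, j - 1)]
            for k in range(i, j - 1):
                if (x[k], x[j - 1]) in PAIRS:
                    out += [
                        algebra["split"](l, algebra["pair"](x[k], r, x[j - 1]))
                        for l in S(i, k)
                        for r in S(k + 1, j - 1)
                    ]
        return tuple(h(out))

    return algebra["h"](list(S(0, len(x))))


if __name__ == "__main__":
    print(solve(SCORE, "ccugg", interleave=False))  # [2]
    print(solve(SCORE, "ccugg", interleave=True))   # [2]
```

Both calls give the same answer because maximization satisfies Bellman's Principle (Definition 5); the interleaved variant only ever stores one value per sub-problem, while the other one carries a list whose length grows with the size of the search space.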
The prerequisite for correct and efficient computation of the solution is Bellman's Principle of Optimality [5], which in the ADP framework is defined by Equations 2.6 and 2.7.
Definition 5 (Bellman's Principle of Optimality).

$$h_E[\, f_E(x_1, \dots, x_k) \mid x_1 \leftarrow Z_1, \dots, x_k \leftarrow Z_k \,] = h_E[\, f_E(x_1, \dots, x_k) \mid x_1 \leftarrow h_E(Z_1), \dots, x_k \leftarrow h_E(Z_k) \,] \tag{2.6}$$

$$h_E(Z_1 \cup Z_2) = h_E(h_E(Z_1) \cup h_E(Z_2)) \quad\text{and}\quad h_E[\,] = [\,] \tag{2.7}$$

Bellman's Principle states that applying the objective function to sub-solutions or applying it to sub-multi-sets does not change the computation of the global optimum. An example of an objective function that does not satisfy Bellman's Principle is a function that selects the second best solution, e.g. the second largest score. This function violates Equation 2.7.
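The violation can be checked on a tiny instance. In the following Python sketch multi-sets are modelled as lists and the helper names are hypothetical; it only tests the first part of Equation 2.7 for two concrete multi-sets, which is enough for a counterexample but of course no proof for the maximum case.

```python
def h_max(l):
    """Objective function selecting the largest score."""
    return [max(l)] if l else []


def h_second(l):
    """Objective function selecting the second largest score, if there is one."""
    ordered = sorted(l, reverse=True)
    return [ordered[1]] if len(ordered) >= 2 else []


def union_rule_holds(h, z1, z2):
    """Check h(Z1 u Z2) == h(h(Z1) u h(Z2)) for the given multi-sets."""
    return h(z1 + z2) == h(h(z1) + h(z2))


if __name__ == "__main__":
    z1, z2 = [1, 2], [3, 4]
    print(union_rule_holds(h_max, z1, z2))     # True
    print(union_rule_holds(h_second, z1, z2))  # False: [3] versus [1]
```

For the second largest score, pruning each sub-multi-set first discards the element 3 that the global answer needs, so the pruned computation returns 1 instead.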

2.1.1 Algebra Products

While products of evaluation algebras do not strictly enhance the ADP theory, they are of enormous practical value. Products allow the easy combination of multiple objectives, e.g. to choose the largest pizza among the cheapest ones or to get a list of the best scored pizza place in every district.
Without products these combinations of objectives would need to be manually re-implemented in a new evaluation algebra, which computes on l-tuples for l objectives.
The definition of the lexicographic product operation follows [48, 49].
Definition 6 (Lexicographic Product). Let $A$ and $B$ be evaluation algebras over $\Sigma$. The product $A * B$ is an evaluation algebra over $\Sigma$ and has the functions

$$f_{A*B}(x_1, \dots, x_k) = \bigl(f_A(a_1, \dots, a_k),\ f_B(b_1, \dots, b_k)\bigr) \tag{2.8}$$

where, for $1 \le i \le k$: if $x_i = (x_i^A, x_i^B)$ with $x_i^A \in S_A$ and $x_i^B \in S_B$, then $a_i = x_i^A$ and $b_i = x_i^B$; if $x_i \in \mathcal{A}$ (the alphabet), then $a_i = x_i = b_i$,

for each $f \in \Sigma$, and the objective function

$$h_{A*B}[(a_1, b_1), \dots, (a_m, b_m)] = [\,(l, r) \mid l \leftarrow \mathrm{set}(h_A[a_1, \dots, a_m]),\ r \leftarrow h_B[\, r' \mid (l', r') \leftarrow [(a_1, b_1), \dots, (a_m, b_m)],\ l' = l \,]\,] \tag{2.9}$$

The expression $\mathrm{set}(U)$ reduces the multi-set $U$ to a set. The product minPrice ∗ maxSize implements the first objective combination example, i.e. it selects the largest pizza from the cheapest ones, which is the lexicographic ordering of the two objectives, hence the name of the product. The second example is a classification as described in [48]. Every pizza candidate from the search space is classified into several districts. The product district ∗ best implements this classification, where the objective function of the district algebra is the identity.
2.1.1.1 Example

In this section the Nussinov algorithm [36] is introduced as a running example to apply the ADP definitions and concepts. The Nussinov algorithm is a historic dynamic programming algorithm that takes one character string as input and computes the maximal number of character pairings. Not every character pair can form a pairing, only certain complementary ones can do this. Each character can only be part of one pairing and two pairings are not allowed to cross each other, i.e. for every two pairings (i, j) and (k, l) with i < j < n, k < l < n and i < k it holds that either j < k or i < k < l < j, where n is the length of the input.
The first component of an ADP algorithm is the specification of the alphabet A. Since the Nussinov algorithm is a historical landmark (see for example [12]) in the field of RNA secondary structure prediction, an RNA alphabet is used in this example, i.e. A = {a, u, c, g}. In RNA sequence analysis a character from A is called a base and a string of characters is called a sequence. An RNA secondary structure is defined by the set of base pairings. Figure 1.2 shows an example structure.
Thesignaturecontainsthefunctionsymbolsweneedtodescribethecandidates
space:hsearctheof

    nil   :            → X
    right : X × A      → X
    pair  : A × X × A  → X
    split : X × X      → X

where X denotes the sort symbol. The function symbol nil describes the empty
structure, right describes an unpaired base on the right of a sub-structure, pair
describes a sub-structure enclosed by a base pairing and split describes two
sub-structures side by side.

For example for the input sequence x = ccugg the candidate term t1 describes a
structure with two base pairings:

    t1 = pair(c, pair(c, right(nil, u), g), g)

The yield string (Definition 1) y(t1) equals the input sequence x.

Combining the function symbols in every possible way yields an unnecessarily
large search space. The following grammar (Gnuss) restricts the search space to all
terms that represent well-formed RNA structures:

    S → nil | right(S, b) | split(S, pair(b, S, b̂))

The symbol b is shorthand for a base from the alphabet and b̂ is shorthand for
a complementary base. For RNA the bases (a, u), (u, a), (c, g), (g, c), (g, u) and
(u, g) are complementary and may pair with each other.

Algebras define how to evaluate the search space candidates. The following
algebra score maximizes the base pairings:

    nil           = 0
    right(x, e)   = x
    split(x, y)   = x + y
    pair(e, x, f) = x + 1
    h(l)          = [max l]
It substitutes the sort symbol with an integer type. For printing the candidates
as Vienna Strings [25], where each unpaired base is marked with a dot and each
pairing with parentheses, the algebra pretty substitutes the sort for a string data
type.

    nil           = ""
    right(x, e)   = x + "."
    split(x, y)   = x + y
    pair(e, x, f) = "(" + x + ")"
    h(l)          = l
where the + operator denotes string concatenation and the objective function is
the identity. Since every candidate from the search space yields a different pretty
print under this algebra the above Nussinov grammar is called semantically
non-ambiguous.

The following algebra uses an objective function which differs from the ones before.
It does not select elements from the input list, it possibly creates new elements.
The algebra counts the search space defined by the grammar.

    nil           = 1
    right(x, e)   = x
    split(x, y)   = x ∙ y
    pair(e, x, f) = x
    h(l)          = [Σ_{x ∈ l} x]

Solving the ADP problem for algebra score given an input sequence x, i.e. solving
G(score, x), yields a one element list of type int that contains the maximal number
of base pairings. Using algebra pretty, G(pretty, x) returns a list of all candidates
in Vienna String notation.

The product score ∙ pretty, i.e. solving G(score ∙ pretty, x), computes a list of tuples,
where the left component stores the maximal number of base pairings and the right
stores the Vienna String of the candidate that yields this score. Note that more
than one candidate may yield the maximal number of base pairings. The product
score ∙ count computes how many.
For example for x = ccaggg:

    Gnuss(score ∙ count, x)  = [(2, 3)]
    Gnuss(score ∙ pretty, x) = [(2, "((.))."), (2, "((..))")]
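
The grammar and the three algebras of this example can be tried out with a small stand-alone Haskell sketch. It is not Haskell-ADP: the record type Algebra, the function nussinov and the lazy array used for tabulation are assumptions of this sketch, and the count algebra is written in its usual form (nil counts as one candidate, pair passes the count through).

```haskell
import Data.Array (listArray, range, (!))

-- An evaluation algebra for the Nussinov signature as a record
-- (record and field names are this sketch's own convention).
data Algebra a = Algebra
  { nil   :: a
  , right :: a -> Char -> a
  , pair  :: Char -> a -> Char -> a
  , split :: a -> a -> a
  , h     :: [a] -> [a]
  }

score :: Algebra Int
score = Algebra 0 (\x _ -> x) (\_ x _ -> x + 1) (+)
                (\l -> if null l then [] else [maximum l])

pretty :: Algebra String
pretty = Algebra "" (\x _ -> x ++ ".") (\_ x _ -> "(" ++ x ++ ")") (++) id

-- Assumption: nil = 1 and pair = x, so that h sums up candidate counts.
count :: Algebra Integer
count = Algebra 1 (\x _ -> x) (\_ x _ -> x) (*)
                (\l -> if null l then [] else [sum l])

basepair :: Char -> Char -> Bool
basepair a b = (a, b) `elem`
  [('a','u'), ('u','a'), ('c','g'), ('g','c'), ('g','u'), ('u','g')]

-- S -> nil | right(S, b) | split(S, pair(b, S, b^)), evaluated on all
-- sub-words (i, j) with the objective function applied per table entry.
nussinov :: Algebra a -> String -> [a]
nussinov alg inp = table ! (0, n)
  where
    n     = length inp
    z     = listArray (0, n - 1) inp
    bnds  = ((0, 0), (n, n))
    table = listArray bnds [s i j | (i, j) <- range bnds]
    s i j
      | i > j     = []
      | otherwise = h alg $
          [nil alg | i == j]
          ++ [right alg x (z ! (j - 1)) | i < j, x <- table ! (i, j - 1)]
          ++ [split alg x y | k <- [i .. j - 2], x <- table ! (i, k), y <- p k j]
    p k j = [ pair alg (z ! k) x (z ! (j - 1))
            | basepair (z ! k) (z ! (j - 1))
            , x <- table ! (k + 1, j - 1) ]
```

For instance, nussinov score "cg" evaluates to [1], while nussinov pretty "cg" lists both candidates of the search space.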

2.1.2 Haskell Embedding of ADP

Haskell-ADP [20] is the implementation of ADP as an embedded domain specific
language (eDSL) in the purely functional programming language Haskell [28].

The signature of an ADP program is implemented in Haskell-ADP as a parametrized
type synonym for a tuple of function types, where the alphabet and sort are
the type parameters. An algebra is of the signature type, where the type parameters
are set to concrete types, i.e. an algebra is a tuple of functions. A tree grammar
is specified in Haskell-ADP with the help of parser combinators. Parser combinators
are higher order functions to build parsers. Haskell-ADP defines several yield
parsing combinators to parse the input. During the yield parsing the evaluation
algebra functions are applied such that the resulting program is efficient.
In the following the implementation of the Haskell-ADP combinators is exemplified.

> type Subword = (Int, Int)
> type Parser b = Subword -> [b]

A sub-word of the input is represented as a tuple of integers. It is not a Haskell
string to save space. The input is available as an array, i.e. (0, n) marks the complete
input string, where n is the length of the input. A parser combinator is of type
Parser, which is parametrized with the type of the results. A parser takes a
sub-word as input and returns a list of successes, i.e. if it cannot parse the input it
returns the empty list.

> infixr 6 |||
> (|||) :: Parser b -> Parser b -> Parser b
> (|||) r q (i,j) = r (i,j) ++ q (i,j)

The first line specifies the associativity and the priority of an operator in Haskell.
This combinator implements an alternative of two grammar rules. Both argument
parsers are called and the results are concatenated.

> infix 8 <<<
> (<<<) :: (b -> c) -> Parser b -> Parser c
> (<<<) f q (i,j) = map f (q (i,j))

The apply parser combinator applies an algebra function to the arguments parsed
by argument parsers.

> infixl 7 ~~~
> (~~~) :: Parser (b -> c) -> Parser b -> Parser c
> (~~~) r q (i,j) = [f y | k <- [i..j], f <- r (i,k),
>                          y <- q (k,j)]

The next parser combinator nests argument parsers.

> infix 5 ...
> (...) :: Parser b -> ([b] -> [b]) -> Parser b
> (...) r h (i,j) = h (r (i,j))

This parser combinator is used to specify an objective function application in grammar
rules. The objective function evaluates the parse result list of the parser on the left
hand side. Haskell-ADP contains further parser combinators for syntactic filtered
parsing and to specify tabulation of parse results. It also includes specializations of
the basic parser combinators and terminal parsers.
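
To see how the four combinators compose, the following stand-alone sketch repeats their definitions and adds two terminal parsers and a toy grammar. The names emptyP, charP and countAs are hypothetical and not taken from Haskell-ADP, and no tabulation is used, so the sketch is only suitable for short inputs. The toy grammar is deliberately right-recursive: with the plain ~~~ combinator a left-recursive rule would call itself on the same sub-word and not terminate.

```haskell
type Subword = (Int, Int)
type Parser b = Subword -> [b]

infixr 6 |||
(|||) :: Parser b -> Parser b -> Parser b
(|||) r q (i, j) = r (i, j) ++ q (i, j)

infix 8 <<<
(<<<) :: (b -> c) -> Parser b -> Parser c
(<<<) f q (i, j) = map f (q (i, j))

infixl 7 ~~~
(~~~) :: Parser (b -> c) -> Parser b -> Parser c
(~~~) r q (i, j) = [f y | k <- [i .. j], f <- r (i, k), y <- q (k, j)]

infix 5 ...
(...) :: Parser b -> ([b] -> [b]) -> Parser b
(...) r h (i, j) = h (r (i, j))

-- Terminal parser (hypothetical name): succeeds on the empty sub-word only.
emptyP :: Parser ()
emptyP (i, j) = [() | i == j]

-- Terminal parser (hypothetical name): succeeds on sub-words of length one.
charP :: String -> Parser Char
charP inp (i, j) = [inp !! i | i + 1 == j]

-- Toy grammar  s -> nil(empty) | cons(char, s)  with an algebra that
-- counts the occurrences of the character 'a' in the input.
countAs :: String -> [Int]
countAs inp = s (0, length inp)
  where
    s :: Parser Int
    s = nil <<< emptyP ||| cons <<< charP inp ~~~ s ... id
    nil () = 0
    cons c x = (if c == 'a' then 1 else 0) + x
```

The fixity declarations make the rule parse as ((nil <<< emptyP) ||| ((cons <<< charP inp) ~~~ s)) ... id, which mirrors how grammar rules are written in Haskell-ADP.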

2.1.2.1 Example

In this section the Nussinov algorithm from Section 2.1.1.1 is implemented in
Haskell-ADP.

> type Signature alphabet sort = (
>   () -> sort,                            -- nil
>   sort -> alphabet -> sort,              -- right
>   alphabet -> sort -> alphabet -> sort,  -- pair
>   sort -> sort -> sort,                  -- split
>   [sort] -> [sort]                       -- h
>   )

The signature is a parametrized type synonym with the parameters alphabet and
sort. The names of the signature function symbols are written as Haskell comments
for documentation purposes. The programmer has to be careful to use the symbol
names in algebra and grammar definitions in the same order.
> nussinov alg inp = axiom s where
>   (nil, right, pair, split, h) = alg
>   s = tabulated (
>         nil <<< empty |||
>         right <<< s ~~- base |||
>         split <<< s ~~+
>           ((pair <<< base -~~ s ~~- base)
>             `with` basepairing) ... h
>       )

In the grammar definition, variants of the next parser combinator are used that
reduce the number of considered sub-word splits for an efficient evaluation. All parse
results for all sub-words are tabulated and the second argument of the function
symbol split is only parsed, if the basepairing filter identifies two complementary
bases at that location. The filter is a direct translation of the b, b̂ shorthand notation
in Section 2.1.1.1.

The score and pretty algebras are implemented as tuples of Haskell functions:
> score :: Signature Char Int
> score = (nil, right, pair, split, h) where
>   nil _ = 0
>   right x _ = x
>   pair _ x _ = x + 1
>   split x y = x + y
>   h [] = []
>   h xs = [maximum xs]

> pretty :: Signature Char String
> pretty = (nil, right, pair, split, h) where
>   nil _ = ""
>   right l b = l ++ "."
>   pair a l b = '(' : l ++ ")"
>   split l1 l2 = l1 ++ l2
>   h = id

The implementation of the lexicographic product directly follows Definition 6:
> infix ***
> alg1 *** alg2 = (nil, right, pair, split, h) where
>   (nil1, right1, pair1, split1, h1) = alg1
>   (nil2, right2, pair2, split2, h2) = alg2
>   nil a = (nil1 a, nil2 a)
>   right (x1,x2) a = (right1 x1 a, right2 x2 a)
>   pair a (x1,x2) b = (pair1 a x1 b, pair2 a x2 b)
>   split (x1,x2) (y1,y2) = (split1 x1 y1, split2 x2 y2)
>   h xs = [(x1,x2) | x1 <- nub $ h1 [y1 | (y1,y2) <- xs],
>                     x2 <- h2 [y2 | (y1,y2) <- xs, y1 == x1]]

Table 2.1: Examples of Haskell-ADP code and the equivalent ADPC-ADP code,
           where l denotes a list of integers, u a string and c a character. In the
           first three lines the left column is not valid ADPC-ADP code and vice
           versa the right column is not valid Haskell code. In the last line the
           left column is not valid ADPC-ADP code, but the right column is valid
           Haskell code (it yields a runtime error if l is empty).

    Haskell-ADP                       | ADPC-ADP
    ----------------------------------+------------------
    h l = l                           | h x = [id x]
    u ++ [c]                          | u ++ c
    c : u                             | c ++ u
    h [] = []; h l = [minimum l]      | h l = [minimum l]
Given the described elements of the Nussinov algorithm implemented in Haskell-ADP,
the expression Gnuss(score ∙ pretty, x) is equivalent to the Haskell expression
nussinov (score *** pretty) x.

2.1.3 The ADP Compiler

The ADP compiler (ADPC) [47] was the first effort to compile ADP programs
to imperative code. It is written in Haskell and generates C code by default. It
contains an alternative backend that generates Java code [43]. The input language
of the ADPC is a Haskell-ADP dialect (in the following called ADPC-ADP). The
compiler implements semantic analyses, e.g. yield size analysis, table dimension
analysis and table design. ADPC includes a type-checker that checks the ADP
programs after a successful parse (see Section 5.3.8 for a discussion).

With the help of the ADPC, several RNA related bioinformatics tools were created,
e.g. RNAshapes [51], pknotsRG [40] and RNAhybrid [41]. The tools were
prototyped in Haskell-ADP which does not scale well, as benchmark results show
(Section 8). For a discussion of the reasons see Section 3.1. Using the ADPC to
compile these tools made it possible to create versions that are usable or perform
better on sequence input of real world sizes. The table-design heuristic derives good
results for some grammars.

ADPC-ADP uses the layout rules of literate Haskell: each code line has to begin
with a > character, all other lines are comments and a more indented line starts a
new block (off-side rule). Other similarities to Haskell-ADP are the three character
combinators and the Haskell tuple style notation of algebras. ADPC-ADP
contains several compiler directives (pragmas) that mark code blocks as signature,
algebras and grammars. The ADPC-ADP programs with pragmas and other constructs
are invalid Haskell programs. Basic data structure (e.g. list) operations in
ADPC-ADP are inspired by the Haskell or Haskell-ADP syntax, but show some
differences. Table 2.1 gives some examples. The differences complicate the type-checking
of ADPC-ADP programs. The language ADPC-ADP is not specified in a
document.

In case of a parse error the ADPC prints a generic error message with only the
line number and no further diagnostic. The ADPC is separated into two programs:
the actual compiler adpcompile and a frontend program adpc that inspects the
ADP source file and calls adpcompile with a set of low-level options. The frontend
generates a command-line interface, a makefile and does some post-processing of
the generated C code. The end-user of the compiler is expected to use the adpc
frontend. The frontend does not allow a selection of products, it calls adpcompile
for the algebras it finds and some simple products like score *** pretty. A side
effect of the frontend is that it does not check the status or error messages of
adpcompile. For example, a parse error is not printed to the console. Instead it is
written to a C module. The end-user is then confronted with cryptic error messages
of the C compiler if the generated makefile is executed.

The included heuristic table design algorithm does not derive good results for
some practical grammars, e.g. autogenerated thermodynamic matcher grammars.
A problem is that table configurations are derived which yield an asymptotically
optimal runtime, but with prohibitively large constant factors.

For the above tools, the ADPC created a basic coding structure which was then
heavily post-processed in a semi-automatic fashion, adding hand-written code to
implement features like stochastic backtracing or more complicated products. The
ADPC compiler cannot generate certain products for all grammars. For example,
support of shape classifying products is not available for the pknotsRG grammar [39].

ADPC is not advertised to support dynamic programming on multiple input
sequences (multi-track DP). Probably it includes limited two-track support to compile
RNAhybrid [41].

The benchmark chapter (Chapter 8) contains runtime and memory usage comparisons
of selected ADP grammars compiled with the ADPC and GAP-C.

2.2 Second Generation ADP

The notion Second Generation ADP subsumes extensions of the ADP framework
that were mainly developed after 2004 and are unpublished. The interleaved product
was developed by Robert Giegerich. The role classification scheme of algebras
and the generalization of ADP for multiple input tracks have been developed as
parts of this thesis.

2.2.1 Products

Definition 7 (Unitary Algebra). An objective function that returns lists of size
at most one is called a unitary objective function. An algebra containing only unitary
objective functions is called a unitary algebra.
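Definition 8 (Cartesian Product). Let A and B be unitary evaluation algebras
over Σ. The cartesian product A × B is an evaluation algebra over Σ and has the
functions

\[
f_{A \times B}(x_1, \ldots, x_k) = \bigl( f_A(a_1, \ldots, a_k),\; f_B(b_1, \ldots, b_k) \bigr) \tag{2.10}
\]

where, for 1 ≤ i ≤ k and with \mathcal{A} denoting the alphabet,

\[
(a_i, b_i) =
\begin{cases}
(x_i^A, x_i^B) & \text{if } x_i = (x_i^A, x_i^B),\ x_i^A \in S_A,\ x_i^B \in S_B \\
(x_i, x_i) & \text{if } x_i \in \mathcal{A}
\end{cases}
\]

for each f ∈ Σ, and the objective function

\[
h_{A \times B}[(a_1, b_1), \ldots, (a_m, b_m)] =
[\, (l, r) \mid l \leftarrow h_A[a_1, \ldots, a_m],\; r \leftarrow h_B[b_1, \ldots, b_m] \,] \tag{2.11}
\]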
Using the cartesian product and just computing G(A × B, x) provides no strong
advantages compared to computing G(A, x) and G(B, x) separately, except the
first is a bit faster in practice. However, the cartesian product is used as part
of larger products where the combination is advantageous to save several redundant
re-computations of other parts of the products.
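
A minimal Haskell sketch of the objective function in Equation 2.11 follows. The names cartH, maxH and minH are hypothetical, and both component objective functions are assumed to be unitary as Definition 8 requires.

```haskell
-- Objective function of the cartesian product (Equation 2.11): both
-- components are optimized independently of each other.
cartH :: ([a] -> [a]) -> ([b] -> [b]) -> [(a, b)] -> [(a, b)]
cartH hA hB xs = [(l, r) | l <- hA (map fst xs), r <- hB (map snd xs)]

-- Two unitary objective functions (lists of size at most one).
maxH :: Ord a => [a] -> [a]
maxH [] = []
maxH ys = [maximum ys]

minH :: Ord a => [a] -> [a]
minH [] = []
minH ys = [minimum ys]
```

Because the components are independent, the resulting tuple does not have to occur in the input list: for the input [(1, 5), (3, 7)] the maximum of the left and the minimum of the right components stem from different tuples.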
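Definition 9 (Generic Algebra). A generic evaluation algebra A(k) has a parameter
k and an objective function that returns the k best solutions.

Definition 10 (Interleaved Product). Let A be a Σ-algebra and B(k) a generic
Σ-algebra such that B(1) is unitary. The interleaved product (A ⊗ B)(k) is a generic
Σ-algebra and has the functions

\[
f_{A \otimes B} = f_{A \times B} \tag{2.12}
\]

for each f ∈ Σ, and the objective function

\[
h_{(A \otimes B)(k)}[(a_1, b_1), \ldots, (a_m, b_m)] =
[\, (l, r) \mid (l, r) \leftarrow U,\ p \leftarrow V,\ p = r \,] \tag{2.13}
\]

where

\[
U = h_{A * B(1)}[(a_1, b_1), \ldots, (a_m, b_m)], \qquad
V = \mathrm{set}\bigl(h_{B(k)}[\, v \mid (\_, v) \leftarrow U \,]\bigr)
\]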

The interleaved product is an extension of the lexicographic product for more
restricted classification purposes. Consider the product A ∙ B, where algebra A
classifies every solution into a class and algebra B scores and selects the best solution.
The result of the product is the best solution of every class. Using a k-best
algebra, an algebra that selects the k-best solutions, for B results in the k-best-scoring
solutions of every class. Using the lexicographic product, it is not possible to compute
the k-best classes with the best solution of every class. However, the interleaved
product (A ⊗ B)(k) does exactly this.
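
The following Haskell sketch spells out Equation 2.13 on plain lists. All names (lexH, kBest, interH) are hypothetical, the generic algebra B(k) is modelled as a function from k to an objective function, and "best" is assumed to mean largest.

```haskell
import Data.List (nub, sortBy)

-- Objective function of the lexicographic product (Equation 2.9).
lexH :: Eq a => ([a] -> [a]) -> ([b] -> [b]) -> [(a, b)] -> [(a, b)]
lexH hA hB xs =
  [(l, r) | l <- nub (hA [a | (a, _) <- xs]), r <- hB [b | (a, b) <- xs, a == l]]

-- Generic objective function B(k): the k largest values (assumption).
kBest :: Ord b => Int -> [b] -> [b]
kBest k = take k . sortBy (flip compare)

-- Objective function of the interleaved product (Equation 2.13):
-- U holds the best solution of every class, V the k best scores among
-- them, and the result keeps the classes whose best score is in V.
interH :: (Eq a, Eq b)
       => ([a] -> [a]) -> (Int -> [b] -> [b]) -> Int -> [(a, b)] -> [(a, b)]
interH hA hB k xs = [(l, r) | (l, r) <- u, p <- v, p == r]
  where
    u = lexH hA (hB 1) xs
    v = nub (hB k [b | (_, b) <- u])
```

With the identity as classifying objective function, interH id kBest 2 returns the two best-scoring classes, each with its single best solution.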

2.2.2 Generalizations

In the case of algebras containing more than one objective function, the objective
function h_E splits into several functions h_E1, h_E2, ... which all have to satisfy
Bellman's Principle of Optimality. In the case of more than one sort, the basic
evaluation semantics does not change. For multi-track programs, the return type
of the yield function y is the l-tuple of strings from A∗, where l is the number
of used tracks. If the GAP-L program computes on l tracks, then Equation 2.14
specifies the semantics.

\[
G(E, x_1, \ldots, x_l) = h_E[\, E(t) \mid t \in L(G),\ y(t) = (x_1, \ldots, x_l) \,] \tag{2.14}
\]

2.2.3 Algebra characteristics

In Bellman's GAP the compiler inspects the algebras and products and computes
two attributes of the objective functions. The first attribute is the role of an
objective function and the second one is the maximal length of the list the objective
function may return. The reason for this characterization is threefold. First, during
code generation this information implies that optimizations like list elimination or
the use of hash table data structures in classifying products are safely applicable.
Second, when analysing the roles, the compiler is able to print error messages in
cases where it is sure that the product does not satisfy Bellman's Principle. Third,
depending on the computed roles, the compiler is able to issue warnings about
products that will produce exponentially sized answer lists.

Consider e.g. a scoring algebra with one objective function that computes the
minimum of the input list. If the input list is empty, then the empty list is returned,
else the minimum value is returned. The ADP framework assumes that every
objective function returns a list of successes. In this example, the list is always of
size zero or one. Thus, in the generated code there is not really a list data structure
needed for dealing with the result of the objective function. Instead of a list of type
T, a variable of type T is sufficient to store the result of the objective function.
By convention a value from the domain, like e.g. 'infinity', encodes the empty list.
Doing such an optimization, the runtime of the resulting code is improved, because
list operations are more expensive than elementary data type operations, memory
is saved and the CPU's cache is more efficiently utilized.
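
The idea behind list elimination can be illustrated with a small Haskell sketch. GAP-C generates C++ code, so this is only a model of the technique; the names hList, hScalar, emptyAnswer and decode are hypothetical, and maxBound is assumed to be a value that no real score reaches.

```haskell
import Data.List (foldl')

-- List-based objective function as assumed by the ADP framework.
hList :: [Int] -> [Int]
hList [] = []
hList xs = [minimum xs]

-- Sentinel from the value domain that encodes the empty answer list
-- (assumption: real scores are always smaller than maxBound).
emptyAnswer :: Int
emptyAnswer = maxBound

-- List-free variant: every candidate value is folded into one scalar.
hScalar :: [Int] -> Int
hScalar = foldl' min emptyAnswer

-- Translate the scalar encoding back into the list view.
decode :: Int -> [Int]
decode v = [v | v /= emptyAnswer]
```

For every input list, decode (hScalar xs) and hList xs agree, so a table entry can store a single Int instead of a list.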

An example where a whole class of products does not satisfy Bellman's Principle
is count ∙ B, where the objective function of the algebra count sums over all elements
of the input list and B is an algebra that satisfies Bellman's Principle. Following
the definition of the lexicographic product (Equation 2.9), the objective function
h1 of the count algebra produces a new value which is not included in the first
components of the input list elements, unless the input list is empty or contains
just one element and the first component is set to 1. Thus, the answer list of
the product's objective function h is always the empty value, unless the input list
contains one tuple with a value of 1 in the first component. Consider an input list l,
where l = [(1, _), (1, _)] and a split of the input list l = l1 ++ l2, where l1 = [(1, _)]
and l2 = [(1, _)]. Then, h(l1 ++ l2) ≠ h(h(l1) ++ h(l2)), which violates Bellman's
Principle (Equation 2.7).

Examples of algebras that usually yield exponentially sized answer lists are the
algebra pretty or the product pretty ∙ score where the objective function of pretty
is the identity and the objective function of score returns the maximal or minimal
value of the input list. Both products enumerate the (usually) exponentially sized
search space.

If the algebra contains just one objective function, then the role of the algebra is
defined as the role of that objective function. Since GAP-L supports algebras with
multiple objective functions, the roles of each objective function can differ. In such
cases, the role of such algebras is only defined, if all objective functions have the
same role. The concept of a role of an algebra is just a shorthand notion, internally
for the optimizations the compiler works with the roles of the objective functions.
A product of two algebras is again an algebra.
GAP-C classifies each algebra into one of four roles: selective, enumerative,
set-valued and synoptic. A selective objective function selects one or more elements
from the input list according to an optimization criterion. An example for a selective
algebra is an algebra that computes scores and minimizes them. An enumerative
objective function just returns the input list. For example, a pretty printing algebra
is enumerative. A set-valued objective function removes all duplicates from the
input list. A synoptic objective function is a function that does not select elements
from the list, but computes new values from the input. An example of a synoptic
algebra is a counting algebra, where the search space of a GAP-L grammar is
counted. The objective function then is the sum of the input list. The following
definition summarizes the properties of the various roles:

Definition 11 (Algebra roles). An evaluation algebra A with objective function
hA, with hA([]) = [], is
• enumerative, if hA(X) = X,
• set-valued, if hA(X) = set(X),
• selective, if hA(X) ⊆ X,
• synoptic, if |hA(X)| = 1 and ∃X : hA(X) ⊄ X
for all multisets X ≠ [].

Table 2.2: The roles and the maximum return list sizes of objective functions
           detected by the compiler.

    objective function         | role        | maximum length
    ---------------------------+-------------+---------------
    return list(minimum(l))    | selective   | 1
    return list(maximum(l))    | selective   | 1
    return list(sum(l))        | synoptic    | 1
    return l                   | enumerative | n
    return unique(l)           | set-valued  | n
The compiler tries to detect the role of each evaluation objective. Table 2.2
shows the detected role of certain objective function expressions. If a user-defined
objective function is used, then the compiler assumes a scoring role with possibly
more than one optimal element returned. When defining an objective function in
GAP-L, it is possible to explicitly specify the role of the objective function (see
Section 4.5.5.1).

Combining two algebras with the cartesian, lexicographic or interleaved product
operation creates a new algebra. Table 2.3 shows the roles and worst-case list sizes
of the resulting objective functions depending on the left and right multiplicand.
Most cells in the cartesian product matrix are marked as not satisfying Bellman's
Principle, because the definition of the cartesian product operation requires that
each operand is a unitary algebra (Definition 8). Likewise, the first operand of the
interleaved product has to be an enumerative or set-valued algebra and the second
operand has to be a selective algebra with an answer list size greater than one
(Definition 10).

For the lexicographic product, combining two enumerative objective functions yields
a permutation of the input list, which follows directly from the product definition
(Definition 6). First, all the left components of the input list are extracted and
all duplicates are removed. Then for each extracted component the corresponding
tuples are selected that match the left component. Analogous to that, the product
of a set-valued objective function and an enumerative one is a permutation of the
input list, too. In the ADP framework a permutation of the answer list of an
objective function does not matter, since the lists are viewed as multi-sets, where
the order of elements is not important. Thus, during code generation the compiler
is able to eliminate the objective function application in enumerative products or
in enumerative sub-products, which leads to more efficient code.

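Table 2.3: Resulting roles of the different product operations (sel → selective, enum
           → enumerative, set → set-valued, syn → synoptic). The worst-case
           length of the returned lists is specified in parentheses, where n is the
           size of the input list. A - entry in the table means non-preservation of
           Bellman's Principle. Rows give the left operand, columns the right operand.

(a) cartesian

    ×        | sel(1)   sel(n)   enum(n)  set(n)   syn(1)
    ---------+-------------------------------------------
    sel(1)   | sel(1)   -        -        -        syn(1)
    sel(n)   | -        -        -        -        -
    enum(n)  | -        -        -        -        -
    set(n)   | -        -        -        -        -
    syn(1)   | syn(1)   -        -        -        syn(1)

(b) lexicographic

    ∙        | sel(1)   sel(n)   enum(n)  set(n)   syn(1)
    ---------+-------------------------------------------
    sel(1)   | sel(1)   sel(n)   sel(n)   sel(n)   syn(1)
    sel(n)   | sel(n)   sel(n)   sel(n)   sel(n)   syn(n)
    enum(n)  | sel(n)   sel(n)   enum(n)  set(n)   syn(n)
    set(n)   | sel(n)   sel(n)   enum(n)  set(n)   syn(n)
    syn(1)   | -        -        -        -        -

(c) interleaved

    ⊗        | sel(1)   sel(n)   enum(n)  set(n)   syn(1)
    ---------+-------------------------------------------
    sel(1)   | -        -        -        -        -
    sel(n)   | -        -        -        -        -
    enum(n)  | -        sel(n)   -        -        -
    set(n)   | -        sel(n)   -        -        -
    syn(1)   | -        -        -        -        -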

3 Bellman's GAP Overview

Bellman's GAP is a programming system for developing dynamic programming
algorithms over sequences in the ADP framework. It is named after the most
important ADP concepts:
• Bellman's Principle of Optimality (Definition 5)
• Grammars
• Algebras
• Products (Section 2)

Bellman's GAP consists of the following parts:
• GAP-L: a declarative language with C/Java like syntax for specifying ADP
  programs. Grammars are specified in a declarative style, where the right hand
  side of non-terminals contains tree patterns resembling function calls. The
  algebra code is written as imperative code blocks. Instance declarations allow
  the combination of algebras with product operations to new algebras.
• GAP-C: a novel optimizing compiler that translates GAP-L programs to
  efficient C++ code, which is competitive with handwritten code.
• GAP-M: a runtime library for compiled GAP-L programs including efficient
  implementations of internal data-structures and a module providing
  convenience functions for accessing energy parameters and computing energy
  contributions as used in RNA secondary structure prediction.
• GAP Pages: an educational website providing an interactive interface to a
  set of GAP-L examples.

The following section describes the motivation behind designing a new language
for ADP and choosing the route of compiling ADP programs instead of embedding
them into another language. Section 2.1.3 has discussed the previous ADPC compiler,
the first compiler translating ADP to C, and has mentioned its shortcomings.
GAP-L is presented in Chapter 4, where the design goals are discussed, the syntax
is specified and new ADP features are shown. Chapter 5 describes GAP-C,
presenting the overall architecture of the compiler, specifying several optimization
algorithms used in semantic analyses of the compiler, e.g. a novel table design
algorithm, and the used code generation techniques, e.g. for parsing or parallelization.
In Chapter 6 GAP-M modules are presented, the optimized implementation of several
internal data-structures, like e.g. memory pools and strings, are discussed and
the RNA domain specific helper library is specified. GAP Pages is presented in
Chapter 7 and Chapter 8 includes several benchmarks of GAP-L versions compiled
by GAP-C against ADPC-ADP versions, Haskell-ADP versions and manually coded
implementations.

3.1 Limitations of Haskell-embedded ADP

The first implementation of ADP was done as embedded domain specific language
(eDSL) in Haskell [21]. Haskell [28] is a general purpose purely functional programming
language that uses lazy-evaluation by default. In the following the ADP
embedding is called Haskell-ADP.

The motivation for designing a new language and a stand-alone compiler for
ADP is two-fold. First, compiled Haskell-ADP does not perform and scale well,
see Chapter 8 for benchmarks of Haskell-ADP versions. The reason for this is that
the Haskell system does not have knowledge of ADP to apply ADP-specific
optimizations. For example, the Haskell system cannot eliminate the use of lists if only
a scoring algebra is used. Other factors are the lazy-evaluation and garbage collection
of Haskell. Lazy-evaluation means that expressions in functional programs
are only evaluated if needed. Lazy-evaluation pays off in cases where the overhead
of bookkeeping, which expressions are or are not computed, is less than the strict
computation of all expressions. For example, if the structure of the search space
implies that only few entries of a DP table are used, only these entries are computed
with lazy-evaluation. However, if an expression triggers the computation of
all subexpressions, then lazy-evaluation is just overhead. In some cases a Haskell
compiler can optimize unnecessary lazy-evaluation away and the programmer can
place strictness annotations in the code. But in Haskell-ADP, the heavy use of
higher-order functions (parser combinators) makes it difficult to debug locations
where lazy-evaluation or strictness would improve the runtime.
Garbage collection is a runtime system that automatically deletes objects which
are not needed any more, and frees their memory. Doing an efficient garbage collection
is a hard problem. If e.g. an object is reused several times in the subsequent
program execution, then it makes sense to keep it around to avoid re-computations.
On the other hand, keeping objects too long wastes memory. Then there is the
design decision, how often the garbage collection should run and interrupt the
execution of the program. A garbage collection algorithm has to implement a policy if
there is memory pressure: which objects to delete first and which cannot be deleted
at all. The general trade-off in garbage collection is memory usage efficiency vs. the
runtime complexity of the garbage collection algorithm. In practice, the garbage
collection often leads to extensive memory consumption for Haskell-ADP programs
(see Chapter 8).

In addition, Haskell-ADP uses heavy-weight data-structures. For example, a
Haskell string is a list of character objects and even in the optimized case a character
uses 12 bytes [9] (using GHC on a 32 bit architecture).

    ERROR "Optbin.lhs":89 - Inferred type is not general enough
    *** Expression    : tree
    *** Expected type : Tree_Algebra Alphabet a -> [Alphabet] -> [a]
    *** Inferred type : Tree_Algebra Alphabet () -> [Alphabet] -> [()]

Figure 3.1: Error message example of an ADP error in Haskell-ADP issued by the
            Haskell interpreter Hugs: in the grammar definition a function symbol
            is used with the wrong number of arguments.

A dedicated ADP compiler can avoid the problems of the ADP Haskell embedding
in its code generation. The compiler can inspect the input program and implement
several ADP specific optimizations, e.g. list eliminations or table dimension reduction.
It is able to restrict lazy-evaluation to cases where it is likely to pay off. The
need for garbage collection can be eliminated for making dynamic programming
programs more efficient, by using explicit memory management code.

Second, the embedding of ADP in Haskell has some usability implications. Since
the Haskell system does not know the semantics of ADP, a (perhaps trivial) error in
an ADP program may yield generic and/or long type-inference error reports from
the Haskell system. Figure 3.1 shows an error message example. Such messages
suppose some knowledge of the Haskell programming language and the implementation
details of Haskell-ADP. The design of Haskell-ADP is restricted by its host
language. For example, Haskell-ADP has to use the off-side rule. The off-side rule
means that the indentation of lines defines where code blocks start and end. To
minimize overloading problems with existing operators, the used parser combinators
have a length of 3 characters.
A main target audience are bioinformaticians, who are usually not trained Haskell
programmers. For them the off-side rule is a new, complicated concept and
type-inference error messages are "opaque". In any case, an ADP programmer should
not need to know the implementation details of Haskell-ADP to understand the
error messages.

An ADP language implementation outside of Haskell does not need to consider
Haskell constraints. For example, the off-side rule is not used and the syntax of the
grammar declaration can be designed according to a function-like notation with the
well known single character for an alternative operator. Algebra functions do not
need to be implemented as Haskell functions. An ADP compiler is the right place to
implement an ADP specific type checker (Section 5.3.8) that generates more useful
error and warning messages.

4 Bellman's GAP Language

Bellman's GAP Language (GAP-L) is the 2nd generation domain specific language
for programming ADP. Its syntax is Java/C like but GAP-L includes declarative
constructs for several elements of an ADP algorithm, e.g. for specifying the grammar.
It is not embedded into a host language to avoid unnecessary constraints in
the design of the language. GAP-L implements the features of the first generation
ADP (Section 2.1) and later ADP extensions (Section 2.2). Several concepts, like
e.g. syntactic filtering, are generalized in GAP-L and advanced DP techniques, like
e.g. sampling, are available via the language in a general way.

The next section discusses the design goals of GAP-L. In Section 4.2 the new ADP
features in GAP-L are listed with forward references. The discussion of a GAP-L
version of the running Nussinov example in Section 4.3 exemplifies the main
syntactic elements of a GAP-L program. Section 4.4 describes the lexical structure
and Section 4.5 the syntactic structure using a CFG of a GAP-L program. Finally
Section 4.6 presents advanced language features, like e.g. filtering or multi-track DP,
that cover several language constructs.

4.1 Design Goals

The main design goal of GAP-L is: It should be easy to learn and to use for new
users and it should be usable effectively by ADP experts.

A main target audience of GAP-L are bioinformaticians. Most likely an undergraduate
bioinformatics student has some knowledge of an imperative programming
language with C-like syntax like Java or C/C++. Therefore the syntax of large
parts of GAP-L is C-like. Using known syntax elements and concepts lowers the
barrier of learning and using GAP-L for new users. The signature declaration
in GAP-L resembles an interface declaration in Java. An algebra implements a
signature like a Java class may implement a Java interface. To highlight the reuse
of algebra functions, GAP-L has the concept of algebra inheritance. An algebra may
extend another algebra and only overwrite single functions, as classes can extend
other classes in Java. The syntax of algebra function code in GAP-L is C-like,
too. The complexity of dealing with data structures is encapsulated in the runtime
library that provides high-level data structure operations and tools via a function
based API. Thus, GAP-L does not need to provide object oriented language features
or pointer-arithmetic. The language for defining algebra functions is basically a
mini-Java.

As a consequence of using a C-like syntax and avoiding the off-side rule, GAP-L
is easier to parse. Easier parsing results in easier generation of helpful warning
and error messages which helps to satisfy the above main goal. For example, the
tracking of exact locations is simplified if no pre-processing of the off-side layout is
done.

For the ADP expert, GAP-L provides advanced features like generic filtering
constructs in grammar rules and products, instance declarations, extended support
for products, parametrized non-terminals and the explicit manipulation of indices
where needed. See Section 4.2 for an overview.

Both the ADP beginner and the ADP expert profit from the GAP-L grammar
syntax. The tree grammar patterns on the right hand side of the non-terminal
definition are function like. There are no three character parser combinators. The
arguments of a function symbol are separated by commas. Special versions of
next-combinators, like in Haskell-ADP, which separate the arguments, are not needed,
because the compiler automatically optimizes the moving index boundaries between
the arguments (Section 5.3.12). Thus, a GAP-L grammar looks like pseudo-code
grammar notation. Again, this syntax is not only easier to read, but also easier to
parse and easier to type check.

4.2 New ADP features

GAP-L is an implementation of the ADP framework. The ADP concepts of alphabet,
signatures, evaluation algebras, tree grammars and products are available in
GAP-L. In addition to that, it introduces the following new concepts. The detailed
description in later sections is referenced in the listing of features.

• An algebra can extend other algebras, i.e. in the extended algebra, algebra
  functions can be overwritten or added (Section 4.5.5 and 4.6.1).
• Algebras that count the search space or print the candidate term as ASCII
  serialization are built mechanically after the structure of the signature. In
  GAP-L the user does not need to program them for each new signature.
  It is possible to declare them as automatic and the compiler automatically
  generates them (Section 4.5.5).
• A GAP-L program may contain several instance declarations that specify
  different products (Section 4.5.9). For example, instance names allow to
  reference complicated products when compiling a GAP-L program.
• GAP-L includes new product operations (Section 4.5.9). The take-one product
  is a specialization of the lexicographic product that ignores co-optimal
  candidates. The overlay product allows the specification of different algebras
  during the forward computation and backtracing. The cartesian and
  interleaved products are not new, but are recent innovations from the ADP
  community. All product operations are directly supported by GAP-C, i.e.
  the compiler automatically generates optimized code that has the semantics
  of the product operation. In comparison to that, in Haskell-ADP, the user is
  required to mechanically program the product operation for each signature
  from scratch as Haskell code.
• The concept of syntactic with-filters in ADP is generalized in GAP-L (Section
  4.5.8.5 and 4.6.2). The patterns of tree grammars can now contain syntactic
  filters that take more than one function symbol argument at once into account.
  In addition, grammar patterns can be restricted by semantic filters that filter
  on the values of the grammar parsers that result from the referenced grammar
  pattern.
• Semantic filtering is possible in products, as well (Section 4.5.9 and 4.6.3).
  This allows to reuse product filters for different products and separate filtering
  concerns from optimization concerns in the evaluation function of the algebra.
• Parametrized non-terminals (Section 4.5.8.1) allow to declare non-terminals
  in the tree grammar that have arguments. This allows, for example, to restrict
  the recursion depth of recursive rules or pass context dependent information
  around and feed it into algebra functions.
• The ADP framework is specified for dynamic programming over one input
  sequence (track). GAP-L generalizes ADP to multiple input tracks (Section
  4.5.8.2 and 4.6.4). For example, parsers in the tree grammar can read from
  one or multiple input tracks. It is possible to integrate single track grammars
  into multi track ones.
• ADP eliminates indices from the search space description. However, in some
  desperate cases the access and manipulation of moving index boundaries is
  necessary for efficiency reasons. In GAP-L some language constructs are
  provided for specifying explicit indices and moving index boundaries (Section
  4.5.8.4 and 5.4.5). This makes it possible to specify large parts of a dynamic
  programming algorithm high-level in ADP and have at the same time the
  possibility to manipulate low-level indices only in single locations.

4.3 Example

Before introducing the syntax of GAP-L in all detail in the next two sections,
the basic elements of an ADP algorithm are shown by implementing the Nussinov
example from Sections 2.1.1.1 and 2.1.2.1 in GAP-L.
The signature is implemented in GAP-L via the following declaration:

signature Nuss(alphabet, answer) {
  answer nil(void);
  answer right(answer, alphabet);
  answer pair(alphabet, answer, alphabet);
  answer split(answer, answer);

  choice [answer] h([answer]);
}

The alphabet is a placeholder for the type of the input characters and answer
is the sort, i.e. the placeholder for the value of a candidate under an
evaluation algebra. Function symbol signatures are explicitly named. The name
is not placed inside a comment as in Haskell-ADP. The order of the function
symbol signature declarations does not matter. The syntax of the function
symbol declarations is Java/C-like: the type of the return value is written
before the function symbol name and not at the end, as in Haskell. The choice
modifier marks the objective function symbol.
The following declaration shows the Nussinov grammar in GAP-L syntax:
grammar nussinov uses Nuss(axiom = struct) {
  struct = nil(EMPTY) |
           right(struct, CHAR) |
           split(struct,
                 pair(CHAR, struct, CHAR)
                 with char_basepairing) # h;
}

Tree patterns on the right hand side are written in a function-like notation
and there are no three-character wide parser-combinator-like operators. In
particular, there are no next-combinator variants. Arguments of a function
symbol application are separated by commas because the GAP-L compiler
automatically optimizes moving index boundaries in the generated code using
results from the yield size analysis (Section 5.3.12).
After the description of the search space and the signature, we need to define
algebras to assign a meaning to each candidate and specify how to optimize over
the different candidates. The easiest algebra specifications are automatic
ones:

algebra co auto count;
algebra en auto enum;

The first one generates an algebra that counts the search space under the
given tree grammar and the second enumerates the search space, where each
candidate is printed in a term representation. The latter is useful to check
whether a grammar matches its intention. The GAP-C analyzes the grammar and
automatically generates the code for the automatic algebra declaration. In
Haskell-ADP these algebras need to be manually implemented for every grammar.
Next, the score algebra, which computes the maximal number of base pairings,
is implemented in GAP-L syntax:
algebra score implements Nuss(alphabet = char,
                              answer = int)
{
  int nil(void) { return 0; }
  int right(int a, char c) { return a; }
  int pair(char c, int m, char d) { return m + 1; }
  int split(int l, int r) { return l + r; }
  choice [int] h([int] l) { return list(maximum(l)); }
}

The alphabet and sort placeholders in the signature are mapped to concrete
types in the header of the algebra declaration. The code of the algebra
functions is C/Java-like. The functions list and maximum are pre-defined, and
specialized versions for different types and products are part of the GAP-L
runtime library.
The implementation of the pretty algebra looks like this:
algebra pretty implements Nuss(alphabet = char,
                               answer = string)
{
  string nil(void)
  {
    string r;
    return r;
  }
  string right(string a, char c)
  {
    string r;
    append(r, a);
    append(r, '.');
    return r;
  }
  string pair(char c, string m, char d)
  {
    string r;
    append(r, '(');
    append(r, m);
    append(r, ')');
    return r;
  }
  string split(string l, string r)
  {
    // the result variable must not shadow the parameter r
    string res;
    append(res, l);
    append(res, r);
    return res;
  }

  choice [string] h([string] l)
  {
    return l;
  }
}

In this case, the evaluation function is the identity. The result of the
pretty print algebra evaluation is a list of Vienna-Strings [25]. The strings
are constructed via the built-in function append, which is overloaded for
different data-types and arguments. This example shows the imperative nature
of the GAP-L algebra code.
The example products from Section 2.1.1.1 are defined in GAP-L via instances:

instance scorepp = nussinov(score * pretty);
instance scoreco = nussinov(score * count);

The GAP-L programmer does not need to manually write the definition of the
lexicographic product for the above signature, as in Haskell-ADP; it is
automatically derived by GAP-C.
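
To make the semantics of the grammar and of the score and counting algebras
concrete, the following Python sketch evaluates the Nussinov grammar by
memoized recursion over sub-words (i, j). It is an illustration only and not
the code GAP-C generates. The pairing rule in char_basepairing (Watson-Crick
plus G-U pairs) and the dictionary encoding of the algebras are assumptions
made for this sketch.

```python
from functools import lru_cache

# Assumed pairing rule for the char_basepairing filter (illustrative only).
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}

def char_basepairing(seq, i, j):
    """Syntactic filter: can the first and last base of seq[i:j] pair?"""
    return j - i >= 2 and (seq[i], seq[j - 1]) in PAIRS

# Algebras as dictionaries: one entry per function symbol of the signature.
SCORE = {
    "nil": lambda: 0,
    "right": lambda a, c: a,
    "pair": lambda c, m, d: m + 1,
    "split": lambda l, r: l + r,
    "h": lambda xs: [max(xs)] if xs else [],
}
COUNT = {
    "nil": lambda: 1,
    "right": lambda a, c: a,
    "pair": lambda c, m, d: m,
    "split": lambda l, r: l * r,
    "h": lambda xs: [sum(xs)] if xs else [],
}

def nussinov(seq, alg):
    """Evaluate struct on the whole input under the given algebra."""
    @lru_cache(maxsize=None)
    def struct(i, j):
        cands = []
        if i == j:                                   # nil(EMPTY)
            cands.append(alg["nil"]())
        if j > i:                                    # right(struct, CHAR)
            for a in struct(i, j - 1):
                cands.append(alg["right"](a, seq[j - 1]))
        for k in range(i, j - 1):                    # split(struct, pair(...))
            if char_basepairing(seq, k, j):
                for l in struct(i, k):
                    for m in struct(k + 1, j - 1):
                        p = alg["pair"](seq[k], m, seq[j - 1])
                        cands.append(alg["split"](l, p))
        return tuple(alg["h"](cands))                # objective function h
    return list(struct(0, len(seq)))
```

Calling nussinov(seq, SCORE) yields the maximal number of base pairs, and
nussinov(seq, COUNT) the size of the search space, using the same recursion
for both algebras.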

4.4 Lexical Structure

The character set of Bellman’s GAP programs is ASCII. The lexing is case
sensitive.

4.4.1 Keywords

The following tokens are keywords:
algebra alphabet auto axiom choice classify else extends extern for grammar if
implements import input instance kscoring overlay parameters pretty return
scoring signature suchthat synoptic tabulated type uses void with

4.4.2 Comments

Comments are specified as in C++/Java. Everything between /* and */ and from
// to the end of line is ignored.

4.4.3 Operators

The following operators are supported:
+ - = * / % . < > == != <= >= && || ! ++ += -- -=

4.4.4 Constants

Character constants are enclosed by single ticks (') and string constants are
enclosed by double ticks (").
Numbers are encoded as integers or in the standard IEEE 754 floating point
notation.

4.4.5 Whitespace

Blanks, tabs and newlines outside of constants are considered as whitespace
and are ignored.

4.4.6 Identifiers

Identifiers are described by this regular expression: [A-Z_a-z][A-Z_a-z0-9]*

4.4.7 Layout

There is no special treatment of the source code layout, i.e. there is no
off-side rule (like in Haskell or Python). Statements are separated by
semicolons and code blocks are enclosed by curly braces.

4.5 Program Structure

The syntax of GAP-L is specified as a context-free grammar. Meta-symbols of
the grammar are ':' (separating left and right hand side of productions), '|'
(separating alternative right hand sides), and ';' (separating productions).
An empty alternative on the right hand side is always listed last.
Non-terminal symbols are written in plain typewriter style, terminal symbols
in italic-face style. The non-terminal symbol ident designates arbitrary
identifiers. For clarity, it will be written in the form algebra_ident,
var_ident etc. to indicate the semantic role of the identifier. However, these
identifiers for different roles are not distinguished syntactically.
A Bellman’s GAP program is structured into several sections. Some sections
are optional, but the order of the sections is fixed. The non-terminal program
is the start symbol of the syntax description. Some optional parts are
mandatory to generate target code, but are not needed for semantic analyses by
GAP-C.

program:
  imports_opt input_opt types_opt
  signature algebras_opt grammar instances_opt
  ;

4.5.1 Imports

imports_opt:
  imports |
  ;

imports:
  import |
  imports import
  ;

import:
  import module_ident
  ;

The module_ident can be a module from GAP-M; otherwise module names are
treated as names of a user defined module. The module rna is an example for a
module from GAP-M (see 6.4). It defines several functions for computing free
energy contributions of bases in different RNA secondary structure elements.

4.5.2 Input

input_opt:
  input input_specifier |
  input '<' inputs '>' |
  ;

inputs:
  input |
  inputs ',' input
  ;

input:
  input_specifier
  ;

The input declaration specifies a special input string conversion. By default
the input is read as is (raw). The input specifier rna signals an input
conversion from ASCII encoded nucleotide strings to strings which are encoded
in the interval [0..4]:

nucleotide            value
an unspecified base   0
A                     1
C                     2
G                     3
U                     4

The alphabet of the input string is specified in the algebra definition.
The input declaration also specifies the number of input tracks in a
multi-track Bellman’s GAP program. For example, input <raw, raw> means
two-track input and both tracks are read as is. In GAP-L the default is
single-track processing.
Multi-track dynamic programming algorithms work on more than one input
sequence. For example, the Needleman-Wunsch pairwise sequence alignment
algorithm [35] or the Sankoff fold and align algorithm [42] work on two input
sequences (two-track). RNA folding algorithms, like the Zuker minimum free
energy (MFE) algorithm [59], work on one input sequence (single-track).
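
The rna input conversion described above can be sketched in a few lines of
Python. This is only an illustration of the mapping in the table; the function
name convert_rna is hypothetical, and treating every character other than A,
C, G and U as "an unspecified base" is an assumption of this sketch.

```python
# Encoding from the table above: A, C, G, U map to 1..4.
RNA_CODE = {"A": 1, "C": 2, "G": 3, "U": 4}

def convert_rna(seq):
    """Map an ASCII nucleotide string to values in the interval [0..4].

    Any character that is not A, C, G or U is encoded as 0
    (assumed here to be the meaning of "an unspecified base").
    """
    return [RNA_CODE.get(ch, 0) for ch in seq]
```

With the default raw specifier no such mapping takes place and the parsers see
the characters of the input as they are.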

4.5.3 Types

types_opt:
  types |
  ;

types:
  type |
  types type
  ;

type:
  type ident '=' datatype |
  type ident '=' extern
  ;

Type declarations at the global level are either type synonyms or declarations
of datatypes in imported external modules.

datatype:
  type_specifier |
  alphabet |
  void |
  '[' type_specifier ']' |
  '(' named_datatypes ')'
  ;

A datatype is either an elementary datatype, the alphabet type, void, a list
or a (named) tuple.
The alphabet type can only be used in the signature declaration. It is a
placeholder for an actual datatype of an alphabet. An algebra implementing the
signature declares which datatype is the alphabet datatype.
Elementary datatypes are:
int, integer, float, string, char, bool, rational, bigint, subsequence, shape,
void.
float is in double precision, rational and bigint are of unlimited precision
and subsequence saves only the begin-/end-index of a substring. int is at
least 32 bit long and integer is at least 64 bit long.

named_datatypes:
  named_datatype |
  named_datatypes ',' named_datatype
  ;

named_datatype:
  datatype name_ident
  ;

Note that this syntax forces the programmer to name the components used in
tuples.

4.5.4 Signature

signature:
  signature ident '(' sig_args ')' '{' sig_decls '}'
  ;

sig_args:
  alphabet ',' args signtparas
  ;

The parameters of the signature declaration are the alphabet keyword and one
or more sorts. A sort is a name for a datatype which will be substituted in an
algebra that implements the signature. (In Haskell terminology, a Bellman’s
GAP sort is a type parameter.)

args:
  arg |
  args ',' arg
  ;

arg:
  ident
  ;

signtparas:
  ';' datatypes |
  ;

sig_decls:
  decl ';' |
  sig_decls decl ';'
  ;

decl:
  qual_datatype ident '(' multi_datatypes signtparas ')'
  ;

The signature contains one or more signature function declarations. The
qual_datatype indicates the result type of the function.

qual_datatype:
  datatype |
  choice datatype
  ;

The qualifier choice marks a signature function name as objective function.
The declaration of several objective functions is allowed. For the objective
function, argument and return types must be list types.

datatypes:
  datatype |
  datatypes ',' datatype
  ;

multi_datatype:
  '<' datatypes '>' |
  datatype
  ;

A multi_datatype is a tuple of datatypes. In an algebra function the i-th
component of this type comes from the i-th input track. In a single-track
context, datatype is equal to <datatype>.

multi_datatypes:
  multi_datatype |
  multi_datatypes ',' multi_datatype
  ;

4.5.5 Algebras

An algebra implements a signature. The algebra declaration specifies which
datatype is used for the alphabet and which datatype is used for each sort.
The body of the algebra contains a compatible function definition for each
signature function declaration, where alphabet and sort types are substituted
according to the head of the algebra declaration.

algebras_opt:
  algebras |
  ;

algebras:
  algebra |
  algebras algebra
  ;

algebra:
  algebra_head '{' fn_defs '}' |
  algebra ident automatic automatic_specifier ';'
  ;

automatic_specifier:
  count |
  enum
  ;

The automatic keyword specifies the autogeneration of the specified algebra.
The Bellman’s GAP compiler supports the autogeneration of an enumeration
(enum) algebra and a counting (count) algebra under an arbitrary signature. An
enumeration algebra prints each candidate term as a human readable string and
keeps all candidate strings in the objective function, i.e. running the
enumeration algebra alone prints the whole candidate search space. A counting
algebra counts how many candidates there are in the search space.

algebra_head:
  mode algebra ident parameters implements
    signature_ident '(' eqs ')' |
  algebra ident parameters implements
    signature_ident '(' eqs ')' |
  algebra ident parameters extends algebra_ident
  ;

An algebra is declared as an implementation of a signature or as an extension
of a previously defined algebra. If a signature is directly implemented, the
mapping between signature parameters (alphabet and sorts) and concrete
datatypes is specified. In the case of an extension, every already declared
algebra function can be overwritten.
The mode of an algebra is optional and either:
synoptic stringrep classify kscoring scoring
kscoring is the default mode for every objective function of the algebra and
can be overwritten by a declaration of an objective function.
In case of no mode specification, the compiler tries to derive the mode
automatically. If an objective function uses the generic list minimization
function, the objective function mode is autodetected as scoring.

parameters:
  parameter_block |
  ;

parameter_block:
  '(' var_decl_init_p |
  '(' var_decl_inits var_decl_init_p
  ;

var_decl_inits:
  var_decl_init_k |
  var_decl_inits var_decl_init_k
  ;

var_decl_init_p:
  datatype ident '=' expr ')'
  ;

var_decl_init_k:
  datatype ident '=' expr ','
  ;

Parameters of the algebra are optional. If present, they are supplied or
overwritten at runtime of the resulting Bellman’s GAP program, e.g. via
command line switches, and are intended to be supplied or overwritten by the
user of a generated Bellman’s GAP program.

eqs:
  eq |
  eqs ',' eq
  ;

eq:
  sig_var '=' datatype
  ;

sig_var:
  sort_ident |
  alphabet
  ;

4.5.5.1 Algebra Functions

fn_defs:
  fn_def |
  fn_defs fn_def
  ;

fn_def:
  mode_opt qual_datatype ident '(' para_decls fnntparas ')'
  '{' statements '}'
  ;

fnntparas:
  ';' para_decls |
  ;

mode_opt:
  mode |
  ;

para_decls:
  |
  para_decl |
  para_decls ',' para_decl
  ;

para_decl:
  datatype ident |
  '<' para_decls '>' |
  void
  ;

An algebra contains normal functions and one or more objective functions. A
function is marked as objective function by using the keyword choice (see
definition of qual_datatype). In each declaration of the objective function it
is possible to overwrite the default algebra mode. It is possible to declare
an algebra with two objective functions, where the first one is of scoring
mode and the second one is of kscoring mode.
A para_decl is either a single-track, a multi-track or a VOID parameter
declaration. A multi-track parameter declaration is the implementation of a
tuple type of the corresponding signature function parameter. If a multi-track
non-terminal parser evaluates a branching element, it feeds each branch result
into the corresponding declared parameter of a multi-track parameter
declaration.
An example of the multi-track parameter declaration syntax is the following
algebra function match:

int match(<char a, char b>, int rest)
{
  if (a == b)
    return 1 + rest;
  else
    return rest;
}

The corresponding signature function is:

answer match(<char, char>, answer);

The signature function symbol match may be used in a grammar rule, e.g.:

ali = match(<CHAR, CHAR>, ali)
4.5.6 Statements

statements:
  statement |
  statements statement
  ;

statement:
  continue |
  return |
  if |
  for |
  assign |
  var_decl |
  fn_call |
  '{' statements '}'
  ;

continue:
  continue ';'
  ;

fn_call:
  ident '(' exprs ')' ';'
  ;

return:
  return expr ';'
  ;

if:
  if '(' expr ')' statement %prec LOWER_THAN_ELSE |
  if '(' expr ')' statement else statement
  ;

The %prec LOWER_THAN_ELSE grammar description annotation specifies that the
else part of an if statement belongs to the last started if statement (like in
C/Java) while parsing nested conditionals.

for:
  for '(' var_decl_init expr ';' inc_stmt ')'
  statement
  ;

assign:
  var_access '=' expr ';'
  ;

4.5.7 Variable Access

var_access:
  ident |
  var_access '.' name_ident |
  var_access '[' expr ']'
  ;

A variable access is either an access to a simple variable, an access to a
component of a named tuple or an access to an array.

4.5.8 Grammar

4.5.8.1 Grammar rules

grammar:
  grammar ident uses signature_ident
  '(' axiom '=' nt_ident ')' '{' grammar_body '}'
  ;

grammar_body:
  tabulated productions |
  productions
  ;

The definition of a grammar specifies the name of the grammar, the used
signature and the name of the start symbol. The grammar is a regular tree
grammar. The right hand side contains function symbols from the signature as
tree nodes. The GAP-C checks whether the grammar is valid under the specified
signature.

tabulated:
  tabulated '{' args '}'
  ;

With the optional tabulated declaration it is possible to request the
tabulation of a list of non-terminals. In case of an increased optimization
level or a non-present tabulated declaration the compiler automatically
computes a good table configuration (see Section 5.3.7).

productions:
  production |
  productions production
  ;

production:
  ident ntargs '=' rhs ';'
  ;

ntargs:
  '(' para_decls ')' |
  ;

A non-terminal symbol can be defined with arguments (i.e. a parametrized
non-terminal). The arguments, or expressions including the arguments, can be
used on the right hand side as extra arguments of a function symbol, a filter
function or another parametrized non-terminal call. A parametrized
non-terminal cannot be tabulated, because for every combination of parameter
values a separate table would be needed.
An example for the use of parametrized non-terminals is the design of RNA
pattern matching algorithms in ADP [31], where a non-terminal models e.g. a
stack of base pairings and the argument of the non-terminal is the stack
length. The argument is then decremented, if greater than zero, and applied to
a recursive non-terminal call. Another example is pknotsRG [40], where
canonicalization information is supplied via non-terminal parameters (Section
5.4.5).

rhs:
  alts |
  alts '#' choice_fn_ident
  ;

The right hand side of a production is a set of alternatives with an optional
application of an objective function which was declared in the signature.

ntparas:
  ';' exprs |
  ;

filters:
  filters ',' filter_fn |
  filter_fn
  ;

tracks:
  track |
  tracks ',' track
  ;

track:
  alt
  ;

alts:
  alt |
  alts '|' alt
  ;

alt:
  '{' alts '}' |
  sig_fn_or_term_ident '(' rhs_args ntparas ')' |
  symbol_ident |
  alt filter_kw filter_fn |

An alternative is a block of enclosed alternatives, a function symbol from the
signature plus its arguments, a non-terminal/terminal parser call or a
conditional alternative.

4.5.8.2 Multi-Track Rules

  '<' tracks '>' |
  alt filter_kw '<' filters '>' |

For multi-track dynamic programming an alternative can also be a branching
from a multi-track context into several single-track contexts or a conditional
alternative guarded by different single-track filters for each track.
A multi-track context of n tracks may contain an n-fold branching
<a_1, ..., a_n>. Each a_i is then in a single-track context for each track i,
where a_i is a terminal- or non-terminal parser call.
For example, match(<CHAR, CHAR>, ali) is a grammar rule that calls two
character reading terminal parsers, which read a character from the first or
the second input track, respectively.
To apply a filter on different tracks in a multi-track context, a list of
filters has to be included in <> parentheses.
In multi-track mode the grammar may contain combinations of single-track and
multi-track productions. The following example contains two-track and
single-track productions:

foo = del(<CHAR, EMPTY>, foo) |
      ins(<EMPTY, CHAR>, foo) |
      x(<fold, REGION>, foo) # h;

fold = hl(BASE, REGION, BASE) # h’;

4.5.8.3 Non-terminal parameters

  alt '.' '(' exprs ')' '.' |

This alternative specifies the syntax for calling non-terminals that have
parameters. In case alt is not a link to another non-terminal, an error should
be signaled.

4.5.8.4 Index Hacking

  symbol_ident '[' exprs ']' |
  alt '.' '{' alt '}' '.' |
  '.' '[' statements ']' '.' '{' alt '}'
  ;

These index hacking related alternatives specify a non-terminal call with
explicit indices, an overlay of two alternatives and verbatim index
manipulation code before an alternative. The tree grammar search-space
specification mechanism from the ADP framework eliminates the need of using
explicit indices for most dynamic programming algorithms over sequences.
However, some algorithms, like for example pknotsRG [40], need to perform
their own index computations at selected non-terminal locations for efficiency
reasons. In the example of pknotsRG, canonicalization rules are applied to
reduce the number of moving index boundaries. In GAP-L, these rules are
implemented as verbatim index manipulation code in the grammar. The overlaying
of alternatives is used in the semantic analyses. The left alternative is a
fake rule that approximates the resulting index boundaries, such that the
runtime analysis computes more realistic results. The right alternative is
then used for code generation.
See Section 5.4.5 for more details on index hacking in the use case of
pknotsRG.

4.5.8.5 Grammar Filters

filter_kw:
  with |
  suchthat |
  with_overlay |
  suchthat_overlay
  ;

With P filter_kw f in case of the with keyword, the filter function f is
called before P is parsed, with the sub-word that should be parsed by P, as an
(additional) argument. With the suchthat keyword the filter function is called
after P is evaluated for each parse of P. with_overlay and suchthat_overlay
are variations of with and suchthat and are only defined if P uses a signature
function g. In the case of with_overlay the filter function is called with a
list of sub-words which correspond to the unparsed arguments of g, before P is
parsed. With suchthat_overlay the filter function is called after the
arguments of g are parsed and before the evaluation of g for each combination
of argument values.
Filtering through with and with_overlay clauses is called syntactic filtering,
since the filter function depends only on the input word. Filtering with
suchthat and suchthat_overlay is called semantic filtering, since the filter
does not depend on the input word, but on the used algebra.

filter_fn:
  ident |
  ident '(' exprs ')'
  ;

The filter function can be part of the signature and algebra definition or can
be included in a module. The filter function must return a boolean value. In
addition to the default arguments, it is possible to supply user defined
arguments.
If the return value is false, the left hand side of the filter keyword is not
used during parsing. In the case of syntactic filtering this means that the
left hand side is neither parsed nor evaluated. With semantic filtering, the
left hand side is parsed and evaluated, but the result is discarded.
The filters are used to reduce the search space which is described by the
grammar.
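
The difference between syntactic (with) and semantic (suchthat) filtering can
be illustrated with a small Python sketch. It models a parser as a function
from a sub-word to a list of answers; the names with_filter and
suchthat_filter and this parser model are assumptions of the sketch and not
part of GAP-L or of the code generated by GAP-C.

```python
def with_filter(parser, accept):
    """Syntactic filtering: accept() sees the sub-word before P is parsed."""
    def filtered(word):
        if not accept(word):
            return []            # P is neither parsed nor evaluated
        return parser(word)
    return filtered

def suchthat_filter(parser, accept):
    """Semantic filtering: accept() sees each answer after P is evaluated."""
    def filtered(word):
        return [value for value in parser(word) if accept(value)]
    return filtered

# A toy parser that records every sub-word it is actually run on.
parsed_words = []

def toy_parser(word):
    parsed_words.append(word)
    return [len(word)]
```

In this model a rejected with-filter leaves parsed_words untouched, whereas a
suchthat-filter always runs the parser first and only then discards answers.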

rhs_args:
  rhs_arg |
  rhs_args ',' rhs_arg
  ;

rhs_arg:
  alt |
  const
  ;

const:
  number |
  '\'' character_constant '\'' |
  '"' string_constant '"'
  ;

4.5.8.6 Terminal Symbols

The Bellman’s GAP language supports several terminal parsers or symbols. For a
terminal parser it is possible to have one or more arguments.
The yield size of a terminal parser is the number of characters it parses from
the input word. The STRING terminal parser parses some non-empty string, i.e.
its minimum yield size is 1 and its maximum yield size is n, where n is the
length of the input word.
The terminal symbols without arguments (including their return type) are
listed as follows:

                            yield size
Parser    Return type       min   max
EMPTY     [void]            0     0
LOC       [subsequence]     0     0
CHAR      [char]            1     1
BASE      [subsequence]     1     1
STRING0   [string]          0     n
STRING    [string]          1     n
REGION    [subsequence]     1     n
REGION0   [subsequence]     0     n
FLOAT     [float]           1     n
INT       [int]             1     n
SEQ       [int]             1     n

If a terminal parser cannot parse successfully, an empty list is returned. The
parser LOC is used to access the position in the input string, where the empty
word was parsed. INT reads an integer number and returns its value. SEQ parses
a sub-word from the input string and returns its length.
The list of terminal symbols with arguments is:

[alphabet]     CHAR(alphabet)
[int]          INT(int)
[int]          CONST_INT(int)
[subsequence]  STRING(string)
[float]        CONST_FLOAT(float)

The CONST_* terminal parsers have a maximum yield size of 0, i.e. they don’t
consume any sub-word of the input. Those terminal parsers can be used in a
grammar context to supply a constant argument to an algebra function.

4.5.9 Instances

An instance declaration specifies under which algebra (or product) a grammar
is evaluated.

instances_opt:
  instances |
  ;

instances:
  instances_
  ;

instances_:
  instance |
  instances_ instance
  ;

instance:
  instance i_lhs '=' ident '(' product ')' ';'
  ;

An instance is named. On the right hand side of the equal sign, the grammar
and the product are specified. See Section 2.1.1 for the semantics of the
products.

product:
  product '*' product |

The lexicographic product.

  product '/' product |

The interleaved product.

  product '%' product |

The cartesian product.

  product '.' product |

The take-one product. The difference to the lexicographic product is that only
one co-optimal result is chosen in the case of co-optimal results.

  product '|' product |

The overlay product. With A | B, A is used in the forward computation and B is
used during backtracing. A use case for this is stochastic backtracing
(Section 5.4.3.1), i.e. the sampling of shape strings under a partition
function:

((p_func | p_func_id) * shape5) suchthat sample_filter

The objective function of the p_func algebra is summation and the objective
function of the p_func_id algebra is identity. During the forward computation
only p_func is evaluated. In the backtracing phase the intermediate p_func
values are evaluated by the p_func_id algebra and value lists are filtered by
the sample_filter. The sample_filter interprets the value lists as discrete
probability distributions and randomly takes one element from the list under
this distribution. During the backtracing the shape representation is randomly
built according to the computed probability distribution, i.e. the repeated
stochastic backtracing samples shape strings according to their shape
probability (see Section 5.4.3.1).

  '(' product ')' |
  algebra_ident |

Singleton product.

  product suchthat filter_fn
  ;

Before evaluating the answers list with the product’s objective function, the
filter_fn is applied to each intermediate (candidate) answer list. The result
of the filter_fn is the input for the product’s objective function.
A use case for this feature is the probability mode in RNAshapes [51], where
in the computation of shape * pf every (sub-)candidate is removed during the
computation, if the left hand side is < 0.000001. This filter significantly
reduces the exponential number of classes, such that the computation of this
product for longer sequences is feasible (Section 4.6.2).
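
As an illustration of the product semantics, the following Python sketch
builds the objective function of the lexicographic product and of the take-one
product from the objective functions of two algebras. Candidates are modeled
as (a, b) pairs. The helper names are hypothetical, and choosing the first
co-optimal answer in the take-one product is an assumption of this sketch; the
source only states that one co-optimal result is chosen.

```python
def lexicographic_h(h_a, h_b):
    """Objective function of A * B built from the objective functions of A, B."""
    def h(cands):
        # Optimize over the first components, dropping duplicate optima.
        best = list(dict.fromkeys(h_a([a for a, _ in cands])))
        result = []
        for a in best:
            # Apply B's objective function within each optimal class of A.
            seconds = [b for first, b in cands if first == a]
            result.extend((a, b) for b in h_b(seconds))
        return result
    return h

def take_one_h(h_a, h_b):
    """Like the lexicographic product, but keep only one co-optimal answer."""
    lex = lexicographic_h(h_a, h_b)
    return lambda cands: lex(cands)[:1]

# Objective functions in the style of the example algebras.
maximum = lambda xs: [max(xs)] if xs else []
identity = lambda xs: list(xs)
summation = lambda xs: [sum(xs)] if xs else []
```

For example, with maximum on the left and identity on the right, the product
keeps all second components that belong to a maximal first component, which
corresponds to score * pretty in the example of Section 4.3.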

4.6 Selected Language Features

The previous section has presented the syntactic constructs of GAP-L, focusing
on the structure of GAP-L programs. In this section several language concepts
are described in more detail to show how different language features
cooperate.

4.6.1 Algebra extension

It is possible to define an algebra as an extension of an existing algebra.
This concept is similar to class inheritance in object oriented languages like
Java. See Section 4.5.5 for the syntax specification. In the extended algebra,
function definitions overwrite the ones of the base algebra. The extended
algebra has the same alphabet type and type assignments to the sorts as the
base algebra. The extension mechanism is e.g. used in the ElMamun example
where the buyer algebra and the seller algebra only differ in their evaluation
function (Figure 4.1).
Further examples are the different classification algebras of RNA-folding
programs, where the implementation of each shape level abstraction differs
only in a few function definitions.

algebra buyer implements Bill(alphabet = char,
                              answer = int) {
  ...

  choice [int] h([int] i)
  {
    return list(minimum(i));
  }
}

algebra seller extends buyer {
  choice [int] h([int] l)
  {
    return list(maximum(l));
  }
}

Figure 4.1: Example of an algebra extending another. The evaluation function
is overwritten.

4.6.2 Syntactic filtering

Filtering in the grammar means pruning the search space. Consider the
following example code.

closed = { stack | hairpin | leftB | rightB | iloop |
           multiloop }
         with stackpairing # h;

This means that each possible sub-word from the input string fed into the
non-terminal parser closed is checked by the filter stackpairing before it is
subject to be parsed by the non-terminal parsers stack, hairpin ... . Only if
the filter returns true, the sub-word is parsed. The filter checks, if the
first and second characters (or bases) are complementary to the last and
second-last characters.
Pruning the search space with filters may reduce computation time in practice,
but another advantage is the simplification of algebras and a clear separation
between search space description and evaluation. For example, if in a
maximization algebra the stackpairing would be checked and scored with minus
infinity, the grammar filter would not be needed. Then a counting or
probabilistic algebra would return wrong results. If extending the algebra for
minimization, it is not enough to just exchange the objective function.
Further practical problems with this approach would be range-overrun problems
when using integers. For more filter variants and the formal syntax see
Section 4.5.8.1. Syntactic filtering is a source of sparseness in dynamic
programming that is exploited in top-down evaluation of the search space
(Section 5.4.1).

4.6.3 Semantic instance filtering

In the instance declaration, there is optional support for semantic filtering
(see Section 4.5.9 for formal syntax). Semantic filtering means that the
answer list of a non-terminal parser is filtered with the specified filter
function after the objective function was applied. This language feature makes
it possible to cleanly separate filtering concerns from generic algebras.
A use case is found in classified DP [48, 49], where the classification
algebra computes an exponential number of classes with growing input length,
i.e. during the computation a filter heuristically removes those classes that
only contribute very small values to the solution. An example is this instance
declaration:

instance shape5pfx = fold((shape5 * p_func)
                          suchthat p_func_filter);

The product computes the probability of each shape in the search space (each
candidate has a shape, each shape embraces multiple candidates). The filter
function p_func_filter removes the shape classes with a very low contribution
(for example < 0.000001 percent) for all non-terminal parser results.
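
A cut-off filter of this kind can be sketched in Python as follows. The sketch
is not the p_func_filter from the source: the function name cutoff_filter, the
representation of an answer list as (shape, value) pairs and the decision to
measure each contribution relative to the sum of the list (as a fraction, not
in percent) are all assumptions made for illustration.

```python
def cutoff_filter(answers, cutoff=0.000001):
    """Drop shape classes whose share of the summed values is below cutoff.

    answers: list of (shape_string, partition_function_value) pairs.
    cutoff:  minimal fraction of the total that a class must contribute
             (assumed interpretation of the cut-off value).
    """
    total = sum(value for _, value in answers)
    if total == 0:
        return []
    return [(shape, value) for shape, value in answers
            if value / total >= cutoff]
```

Applied to every non-terminal parser result, such a filter keeps the answer
lists short, at the price of the small deviations discussed in this section.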
Figure 4.2 shows the distribution of shape probability deviations δ (Equation
5.29, page 122) using two different grammars and various cut-off filter values
for 2000 randomly generated sequences. RNAshapes and GAPC nonamb use the same
grammar that takes energy contributions of dangling bases unambiguously into
account. The δ between two sets of shape probabilities is computed using the
results of a computation without filtering and of one with a cut-off filter
applied during the computation. The results show that using a cut-off filter
of 0.0001 or less only introduces small errors in comparison to the exact
computation. For most comparisons the δ are less than 0.01. Only for RNAshapes
and a cut-off value of 0.0001 it shows that 5 percent of the sequences larger
than 110 bases yield δ between 0.01 and 0.05. This means for the tested
sequences that a filtered shape probability deviates 5 percent points from the
exactly computed shape-probability in the worst case. Figure 4.3 shows the
number of shape classes as a function of the sequence length for different
cut-off filter values for the same set of randomly generated sequences. Using
filtering the number of shapes is greatly reduced. The numbers of shapes
resulting from an unfiltered computation are up to 10^3 or 10^4 times of the
numbers resulting from filtering, depending on the used cut-off values and
grammars.
An alternative to this use case of semantic filtering is to sample shapes from
the grammar (which is also called stochastic backtracing, see Section 5.4.3.1
for details). Figure 5.41 shows the distribution of shape probability
deviations due to sampling for the set of sequences used in Figure 4.2. When
computing only the partition function (Equation 5.24, page 122) and then
sampling shape strings several times (e.g. 1000 iterations) under a shape
algebra, the number of samples that return the same shape is divided by the
number of samples, which approximates the probability of that shape class.
Comparing semantic filtering with stochastic backtracing shows that semantic
filtering yields smaller shape probability deviations, but stochastic
backtracing is more efficient, i.e. filtering is in O(k^2 n^3) and stochastic
backtracing is in O(n^3), where k is the number of classes.
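
The sampling idea can be sketched in Python. The function sample_filter below
mimics the behaviour described in Section 4.5.9 (interpret a value list as a
discrete distribution and take one element), and estimate approximates shape
probabilities by relative frequencies. The names, the (shape, weight) pair
representation and the flat list of alternatives are assumptions of this
sketch; real stochastic backtracing applies the choice recursively at every
non-terminal.

```python
import random
from collections import Counter

def sample_filter(answers, rng):
    """Take one (shape, weight) pair at random, proportional to its weight."""
    weights = [weight for _, weight in answers]
    return rng.choices(answers, weights=weights, k=1)

def estimate(answers, iterations, seed=0):
    """Approximate shape probabilities by repeated sampling."""
    rng = random.Random(seed)
    hits = Counter(sample_filter(answers, rng)[0][0]
                   for _ in range(iterations))
    # Number of samples per shape divided by the number of samples.
    return {shape: count / iterations for shape, count in hits.items()}
```

Each sample costs only one pass over the chosen alternatives, which is the
reason why sampling scales better than carrying all k classes through the
forward computation.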

[Figure 4.2, six stacked bar charts. Panels: (a) GAPC adpf (cut-off:
0.000001), (b) GAPC adpf (cut-off: 0.0001), (c) GAPC nonamb (cut-off:
0.000001), (d) GAPC nonamb (cut-off: 0.0001), (e) RNAshapes (cut-off:
0.000001), (f) RNAshapes (cut-off: 0.0001). x-axis: sequence length n in the
bins 0–28, 28–56, 56–84, 84–112, 112–140; y-axis: percent; legend: δ intervals
[0;0.01[, [0.01;0.05[, [0.05;0.1[, [0.1;0.25[, [0.25;0.5[, [0.5;1[.]

Figure 4.2: Distribution of shape probability deviations δ for two different
grammars and various filter cut-off values. The δ is computed between the
results of an unfiltered computation and a filtered one. Each program was run
on the same set of 2000 randomly generated sequences. RNAshapes and GAPC
nonamb use the same grammar (see text).

[Figure 4.3, line plot. Curves: GAPC adpf, GAPC nonamb and RNAshapes, each
with cut-off 0, 0.000001 and 0.0001. x-axis: sequence length n from 0 to 140;
y-axis: # shapes, logarithmic from 10^0 to 10^6.]

Figure 4.3: The maximal numbers of shapes as a function of the sequence length
n for various grammars and cut-off filter values and RNAshapes. The input are
2000 randomly generated sequences. A cut-off value of 0 means that no
filtering of shape classes was done during the computation. A cut-off value of
x means that all shape classes are filtered during the computation that have a
shape-probability of < x percent.

4.6.4 Multi-Track programs

The input declaration (Section 4.5.2) specifies on how many input tracks a
GAP-L program computes, e.g. for two input tracks:

input <raw, raw>

In most single-track GAP-L programs there is no input declaration, since the
default is the input <raw> declaration (raw means no pre-processing or
conversion of the input string). The alphabet of the input tracks is not
specified in the input declaration, but in the algebra declaration.
If the input declaration specifies l input tracks, the axiom non-terminal is
in an l-track context. An l-track context non-terminal can directly call other
l-track non-terminals or use the <x_1, ..., x_l> construct (see Section
4.5.8.1) to call for each track a single-track context non-terminal or
terminal x_i, where 0 ≤ i < l. A single-track context non-terminal can be
called for different tracks. In this case the non-terminal parser parses
different tracks in dependence of the caller’s track position. Using this
track context change feature, single-track sub-grammars are usable from a
multi-track grammar. A use case for this feature is e.g. the combination of
alignment and duplication history computation in one algorithm [1].
The <> parentheses are used in the definition of signature and algebra
functions according to their use in the grammar. See Section 4.5.5.1 for the
formal syntax.
In the general case, a two-track DP-program needs O(n^4) space, since there
are two indices for each track. However, the GAP-C analyzes the grammar and
eliminates the indices that are constant (Section 5.3.12). For example, in the
Pairwise Sequence Alignment algorithm, only one index changes for each track,
thus the space consumption is in O(n^2).
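
The effect of eliminating constant indices can be illustrated with a small
two-track sketch in Python. It follows the style of the rule
ali = match(<CHAR, CHAR>, ali) from Section 4.5.5.1, extended by del, ins and
nil alternatives that are assumptions of this sketch. Because every
alternative consumes characters only at the left end of each track, the right
index of each track is constant and one moving index per track suffices.

```python
from functools import lru_cache

def max_matches(x, y):
    """Maximal number of matching columns over all alignments of x and y.

    Returns the score and the number of table entries that were filled.
    """
    @lru_cache(maxsize=None)
    def ali(i, j):
        # (i, j) are the moving left indices; the right ends stay fixed.
        cands = []
        if i == len(x) and j == len(y):
            cands.append(0)                           # nil(<EMPTY, EMPTY>)
        if i < len(x) and j < len(y):                 # match(<CHAR, CHAR>, ali)
            cands.append((1 if x[i] == y[j] else 0) + ali(i + 1, j + 1))
        if i < len(x):
            cands.append(ali(i + 1, j))               # del(<CHAR, EMPTY>, ali)
        if j < len(y):
            cands.append(ali(i, j + 1))               # ins(<EMPTY, CHAR>, ali)
        return max(cands)                             # objective function
    score = ali(0, 0)
    return score, ali.cache_info().currsize
```

The table holds one entry per index pair (i, j), i.e. (|x| + 1) · (|y| + 1)
entries, instead of one entry per combination of two indices on each track.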

4.6.5 Alphabets

In GAP-L programs, the alphabet specifies the basic unit of parsing. The terminal parsers like CHAR, REGION etc. are alphabet polymorphic. That means that CHAR returns a char, if the alphabet is char or a float if the alphabet is float.

The alphabet can be changed for each algebra, but note that the algebras used in a product have to use the same alphabet. The formal syntax is specified in Section 4.5.5. An example:

algebra pretty implements Align(alphabet = single,
                                answer = spair) {
  ...
}

The algebra pretty works on input strings of single precision floating point numbers.

In the default command line frontend generated by the GAP-C, the user input is converted to the right alphabet. In alphabets other than char, whitespace is used to delimit the characters of the input.


5 Bellman’s GAP Compiler

The Bellman’s GAP Compiler (GAP-C) is the novel ADP compiler which translates GAP-L programs into efficient C++ code. Section 5.1 specifies the overall architecture of GAP-C. An example compile session using the running Nussinov example is presented in Section 5.2. The compiler implements several semantic analyses for optimization purposes, error reporting, type checking and automatic table design, which are specified in Section 5.3. Algorithms and techniques regarding the code-generation phase of GAP-C are reported in Section 5.4.

5.1 Compiler Architecture

The Bellman’s GAP compiler is written from scratch in C++. Object-orientation is extensively used for the different parts of the compiler. The architecture of the compiler consists of three modules: the frontend, the middle-end and the backend. See Figure 5.1 for an overview.

The frontend consists of a lexer and a parser. The lexer divides the input GAP-L program into a stream of tokens which are consumed by the parser. The parser creates an abstract syntax tree (AST). The elements of the AST are C++ objects. For example, there is an expression base class and the subtraction operation is a sub-class of the expression base class. Figure 5.2 shows a class diagram of the main classes an AST is made of. Both the lexer and the parser are auto-generated from abstract specifications. The lexer is specified as a Flex [37] description and the syntax is specified as a Bison [18] grammar. The lexer and parser implement a sophisticated location tracking scheme to be able to report the exact line and column in error messages. A generic error message printing component takes the location object into account in printing the specified line of the input program and highlighting the specified columns, such that a Bellman’s GAP programmer gets a more informative error message. Figure 5.3 shows an example message. Syntax parse errors, i.e. violations of the GAP-L grammar, and simple consistency checks are done interleaved with the AST construction. More sophisticated type-checking and reporting of semantic errors/warnings are done in the middle-end, since the whole AST and the output of semantic analyses are needed.
The middle-end takes the AST as input and applies several semantic analyses to it. Examples for semantic analyses are type-checking (Section 5.3.8), yield size analysis (Section 5.3.3) and table-design (Section 5.3.7). The results of the semantic analyses and the AST are the input for the code-generation. The result of the code generation is a data structure that represents the abstract target code. The abstract target code is an internal language. This language is imperative. It contains high-level statements, e.g. for-loops and functions, and it abstracts from implementation details of central data structures. They are better defined in a language dependent backend. For example, the way tables are stored and accessed efficiently, and the efficient representation of intermediate lists and backtrace structures depend highly on the output language. Thus the target language contains a set of built-in abstract data types (ADTs) and operations on them. This includes ADTs for lists, backtracing structures, arbitrary precision rationals/integers, classification data structures and tabulation. Table 5.1 shows an overview. As a consequence, the target language does not need to implement pointer arithmetic. This simplifies the implementation of backends for output languages that do not have pointers (as e.g. Java).

Table 5.1: Examples of ADTs used in code generated by GAP-C. They are provided by the runtime library GAP-M. For each ADT a few examples of access functions are listed.

ADT          functions
List         push_back, append, empty, is_empty, ...
String       append, empty, ...
Hash table   push_back, append, update_filter, finalize, ...

The backend takes the target code as input and generates output language code. The backend architecture is constructed for extensibility, i.e. to enable the addition of new output languages. There is an abstract base class that defines the interface, which each output language backend has to implement. Figure 5.4 sketches the interface of the backend classes. Currently, a language backend for C++ output is included. The C++ backend maps the ADT operations to generated C++ classes and to a Bellman’s GAP C++ runtime library. On the one hand, the backtracing target code is translated into generated classes, since the data structure sensitively depends on the signature and algebra of the GAP-L input program. On the other hand, the memory management and an efficient list data type are implemented as C++ template classes in the runtime library because they are independent of the translated program and C++ as output language is powerful enough to allow for an efficient runtime library implementation. The advantage of moving functionality, where it is appropriate, into a runtime library is the possibility to optimize or exchange parts of the runtime library without having to change the compiler, thus reducing the complexity of the C++ language backend.

void print(const Statement::For &stmt);
void print(const Statement::If &stmt);
void print(const Statement::Backtrace_Decl &stmt);
void print(const Statement::Hash_Decl &stmt);
void print(const Statement::Table_Decl &stmt);
...
void print(const Expr::Base &);
...
void print(const Type::List &expr);
void print(const Type::BigInt &expr);
...

Figure 5.4: Excerpts of the interface which a language backend of GAP-C has to implement.
Besides the translation of target code to output language code, the C++ backend generates a makefile and a generic command line interface. The makefile contains the build dependencies of the generated code and the generic interface code. By default, everything needed is built and linked into an executable. The command line interface is optional. The generated C++ code is enclosed in a C++ class which implements a common API. This makes it easy to integrate it into other C++ code. Examples are more sophisticated, specialized interface programs or a C++ program where a GAP-L program is just one step in a bigger pipeline.

5.2 Example

Before the internals of the compiler are described in the next sections, this section shows how the example from Section 4.3 is compiled with GAP-C.

A specific product of a GAP-L program is compiled with GAP-C with the following commands:

$ gapc -t nussinov.gap -p 'score * count'
$ make -f out.mf

The option -t instructs the compiler to automatically determine the non-terminal parsers whose solutions need to be tabulated [1] (Section 5.3.7) and the -p option specifies the product to compile. The compiler then produces optimized C++ code and a makefile. The program can be executed like this:

$ ./out ccaggg
( 2 , 3 )

[1] The option -t is enabled by default if the GAP-L program does not contain a table configuration.

In some use cases of dynamic programming algorithms, not only the optimal score for an input string is of interest, but also the structure of the optimal candidate. In this example, this is accomplished by using the instance scorepp, which is declared in the source program:

$ gapc -t nussinov.gap -i scorepp
$ make -f out.mf
$ ./out ccaggg
((2,2,((.)).((..))))

The standard product operation computes all co-optimal solutions.

In addition to the default compiler call for the scorepp example, GAP-C can be instructed with the --backtrace option to generate backtrace code for the product (Section 5.4.3). This is more efficient, since in the forward computation the score algebra is computed and only in a backtracing step the algebra pretty is used. In addition to that, we can choose to generate a CYK-style bottom-up evaluator with the --cyk option. The default is the generation of top-down evaluators (Section 5.4.1). Top-down evaluators do extra bookkeeping but have advantages if the algebra is expensive to compute and the grammar has a lot of search space restrictions such that the top-down evaluation only does a very sparse computation of the tables. In this example this is not the case. Thus the bottom-up evaluator leads to more efficient code.

5.3 Semantic Analyses

In the following sections several semantic analyses are described which are part of GAP-C. A semantic analysis is an algorithm that computes properties of the AST or does optimizations. The results are needed for the generation of efficient code, for error reporting and to warn the user of problematic constructs.

In most cases the pseudo-code of a semantic analysis is given as a complete specification of the algorithm. The algorithms are of recursive nature and traverse the grammar data-structure, i.e. the AST. The relationship of the classes of the AST is shown in a class diagram (Figure 5.38). Figure 5.5 shows an object diagram of the AST of the Nussinov example from Section 4.3. It exemplifies the mapping of the different syntax elements in the source program to concrete objects.

Not included in the diagrams are the attributes of the objects that are used in the pseudo-code since they follow an easy scheme, e.g. Alt::Block and Symbol::NT


grammar nonproductive uses Signature(axiom = start) {

  start = foo(CHAR('+'), start);

}

Figure 5.6: A non-productive GAP-L grammar.

grammar nonproductive2 uses Signature(axiom = S) {

  S = m(<CHAR, CHAR>, S) |
      ins(<fold, EMPTY>, S) |
      nil(<EMPTY, EMPTY>) # h;

  fold = f(CHAR, fold, CHAR);

}

Figure 5.7: A non-productive GAP-L multi-track grammar.

5.3.1 Unreachable Non-Terminals

An unreachable non-terminal is a non-terminal which cannot be part of any derivation tree: a non-terminal B is unreachable if axiom →* B does not hold.

The GAP-C automatically detects unreachable non-terminals. It issues warnings about them and removes them from the internal grammar data structure to simplify later grammar analyses and transformations.

To detect the unreachable non-terminals the transitive closure of the reachable-from-axiom relation is computed using a standard algorithm.
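One standard way to compute this closure is a worklist traversal from the axiom. The following sketch is illustrative only: the grammar is assumed to be given as a dictionary from each non-terminal to the set of symbols on its right hand side, and the rule names are hypothetical.

```python
def unreachable(grammar: dict[str, set[str]], axiom: str) -> set[str]:
    """Return the non-terminals that no derivation from the axiom can reach."""
    reached = {axiom}
    worklist = [axiom]
    while worklist:
        nt = worklist.pop()
        for callee in grammar.get(nt, ()):
            # Symbols without a rule of their own are terminal parsers.
            if callee in grammar and callee not in reached:
                reached.add(callee)
                worklist.append(callee)
    return set(grammar) - reached


rules = {
    "start": {"CHAR", "start", "helper"},
    "helper": {"EMPTY"},
    "orphan": {"CHAR", "orphan"},   # never called from start or helper
}
print(unreachable(rules, "start"))
```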

5.3.2 Productive Checking

A non-terminal is productive, if a parser of this non-terminal terminates. A derivation including a non-productive non-terminal runs indefinitely. Figure 5.6 shows a sample grammar with a non-productive non-terminal. Translating a GAP-L grammar with non-productive non-terminals results in GAP-L programs that do not terminate. Thus, GAP-C checks for non-productive non-terminals and issues error messages for each one found. The check is implemented using a simple fixed point iteration algorithm. At the beginning every non-terminal is initialized as non-productive and each terminal is initialized as productive. In each fixed point iteration each non-terminal data structure is traversed recursively until a call to another non-terminal or terminal symbol. During the traversal the objects of the non-terminal data structure are marked as productive if the linked objects are marked as productive, otherwise they are marked as non-productive. For example, a right hand side alternative data structure is only productive if all its alternative elements are productive. A multi-track non-terminal is only productive, if all its tracks are productive. See Figure 5.7 for an example of a non-productive multi-track grammar. The algorithm is finished if no object changes its productive state variable anymore. Figure 5.8 shows the pseudo-code of the algorithm.

Grammar::init_productive:
  changed = true
  while changed:
    changed = false
    foreach (nt in nts): changed = changed || nt->init_productive
Productive::start:
  changed = false; p = true
Productive::step(x):
  changed = changed || x->init_productive(); p = p && x->productive
Productive::set:
  if p != productive: productive = p; return true
  return changed
Symbol::Terminal::init_productive:
  productive = true; return false
Symbol::NT::init_productive:
  start()
  foreach (alt in alts): step(alt)
  return set()
Alt::Simple::init_productive:
  start()
  foreach (arg in args): step(arg)
  return set()
Alt::Block::init_productive:
  start()
  foreach (alt in alts): step(alt)
  return set()
Alt::Multi::init_productive:
  start()
  foreach (track in tracks): step(track)
  return set()
Alt::Link::init_productive:
  if nt->productive != productive:
    productive = nt->productive; return true
  return false

Figure 5.8: Pseudo-code of the productive check algorithm. Productive is a superclass of the Alt and Symbol classes.
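The fixed point iteration can be sketched compactly on a simplified grammar representation. The sketch below is not the GAP-C implementation. It assumes single-track grammars given as lists of alternatives, and it assumes the textbook rule that a non-terminal becomes productive as soon as one of its alternatives is productive, which may differ in detail from the marking scheme of Figure 5.8.

```python
def non_productive(grammar: dict[str, list[list[str]]]) -> set[str]:
    """Fixed point iteration: start pessimistic, re-mark until nothing changes."""
    productive = {nt: False for nt in grammar}

    def symbol_ok(sym: str) -> bool:
        # Symbols without a rule are terminal parsers, which always terminate.
        return productive.get(sym, True)

    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            # An alternative is productive only if all of its arguments are.
            ok = any(all(symbol_ok(s) for s in alt) for alt in alts)
            if ok != productive[nt]:
                productive[nt] = ok
                changed = True
    return {nt for nt, ok in productive.items() if not ok}


# Figure 5.6: start = foo(CHAR('+'), start)
fig_5_6 = {"start": [["CHAR", "start"]]}
# Single-track simplification of Figure 5.7: fold has no terminating alternative.
fig_5_7 = {
    "S": [["CHAR", "S"], ["fold", "S"], ["EMPTY"]],
    "fold": [["CHAR", "fold", "CHAR"]],
}
print(non_productive(fig_5_6), non_productive(fig_5_7))
```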

5.3.3 Yield Size Analysis

The yield or yield string of a derivation tree is the string of its concatenated leaves in a pre-order traversal. The yield function y is of type T_Σ → A* and it is defined as y(f(t1, ..., tn)) = y(t1) ... y(tn) and y(a) = a, where a ∈ A* and f is a function symbol from the signature Σ (see Definition 1, page 18).

The yield size of a non-terminal or terminal symbol A is defined as the tuple of the minimal and maximal size of a yield string of all possible derivation trees that start at that symbol (Equation 5.3). The minimal or maximal yield size of a symbol A is defined by Equation 5.1 or 5.2.

\[ ys_{min}(A) = \min\{\, size(y(t)) \mid t \in L(A) \,\} \tag{5.1} \]
\[ ys_{max}(A) = \max\{\, size(y(t)) \mid t \in L(A) \,\} \tag{5.2} \]
\[ ys(A) = (ys_{min}(A),\ ys_{max}(A)) \tag{5.3} \]

In the context of a non-terminal or terminal parser that parses the language of the non-terminal or terminal symbol A, the yield size of the parser is the minimal and maximal size of the input string the parser is able to consume.

The yield size analysis is important, as its result is used by several other subsequent semantic analyses, e.g. the table dimension analysis or the runtime analysis. For example, if a non-terminal has a constant maximal yield size of 30 then only a 30 × n table is needed for tabulating results for all parsed sub-words instead of an n^2 table in the general case, where n is the length of the input string. If in the grammar rule A = f(B, C) it holds that ys_min(B) = ys_max(B) and ys_min(C) = ys_max(C), then the resulting non-terminal parser of A only needs to consider one split of the input string between the parsers of B and C. In the general case it needs to consider O(n) splits of the input string.

The description of the yield size computation algorithm for single-track dynamic programming follows [21]. The basic algorithm is here extended for multi-track GAP-L programs.

The yield sizes of terminal symbols are known a priori. See Table 5.2 for an overview of the yield sizes of the pre-defined terminal parsers. The yield size of the non-terminal symbols is initialized with (1, n). From there the fixed point iteration is started and in every iteration the yield sizes of the elements of the grammar data structure are computed until nothing changes anymore. The termination of the analysis is guaranteed because the yield size interval length is monotonically decreasing. Each track has a yield size associated. In Equations 5.4 to 5.6 the basic operations are defined over yield sizes of single-track contexts and in Equation 5.7 over tuples of yield sizes of multi-track contexts, where ◦ is one of the +=, /= or |= operations, as used in the fixed point iteration algorithm.

\[ \mathtt{+=}(a, b) = (a.min + b.min,\ a.max + b.max) \tag{5.4} \]
\[ \mathtt{/=}(a, b) = (\min(a.min, b.min),\ \max(a.max, b.max)) \tag{5.5} \]
\[ \mathtt{|=}(a, b) = (\max(a.min, b.min),\ \min(a.max, b.max)) \tag{5.6} \]
\[ \circ(\langle a_1, \ldots, a_n \rangle, \langle b_1, \ldots, b_n \rangle) = \langle a_1 \circ b_1, \ldots, a_n \circ b_n \rangle \tag{5.7} \]

Note that yield size tuples are elements from the set B × B, where B = N ∪ {n} and n is a symbol. The arithmetic of minimal and maximal yield sizes has saturating semantics, i.e. 0 − c = 0 and n + c = n, where c ∈ B.

Table 5.2: Overview of the yield sizes of the terminal parsers.

Parser              Yield size (min, max)
STRING              (1, n)
REGION              (1, n)
INT                 (1, n)
FLOAT               (1, n)
SEQ                 (1, n)
STRING0             (0, n)
REGION0             (0, n)
CHAR, CHAR(arg)     (1, 1)
BASE                (1, 1)
CONST_INT           (0, 0)
EMPTY               (0, 0)
LOC                 (0, 0)
...

Figure 5.9 shows the pseudo-code for the computation of the yield size of a non-terminal symbol and Figure 5.10 shows the pseudo-code of the yield size computation of the data structure elements on the right hand side of the non-terminal symbol. The yield size analysis algorithm also considers the use of minimal and maximal yield size filters in the grammar. For example, the yield size of the grammar rule b = f(REGION) with minsize(2) with maxsize(2); is (2, 2).

Symbol::NT:
  foreach (alt in alts):
    alt.init_ys()
    ys /= alt.ys()
  ys |= (minsize, maxsize)

Figure 5.9: Yield-size computation pseudo-code for a non-terminal symbol.

Alt::Simple::init_ys:
  foreach (arg in args):
    arg.init_ys()
    ys += arg.ys()
  ys |= (minsize, maxsize)

Alt::Block::init_ys:
  foreach (alt in alts):
    alt.init_ys()
    ys /= alt.ys()
  ys |= (minsize, maxsize)

Alt::Link::init_ys:
  ys = nt.ys()
  ys |= (minsize, maxsize)

Alt::Multi::init_ys:
  i = 0
  foreach (track in tracks):
    track.init_ys
    ys[i++] = track.ys
  ys |= (minsize, maxsize)

Figure 5.10: Yield size analysis algorithm pseudo-code for the data structure elements on the right hand side of the non-terminal. The variables minsize and maxsize are minimal or maximal yield size restrictions given in the grammar filter. If no filter is present, minsize and maxsize are set to 0 and n.
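The three yield size operations of Equations 5.4 to 5.6 are small enough to try out directly. The sketch below is a single-track illustration, not GAP-C code. The names YS, seq, alt and restrict are chosen for this example, and the symbol n is modelled by floating point infinity so that n + c = n holds automatically.

```python
import math
from typing import NamedTuple

N = math.inf  # stands for the symbolic input length n; N + c == N saturates


class YS(NamedTuple):
    lo: float   # minimal yield size
    hi: float   # maximal yield size


def seq(a: YS, b: YS) -> YS:
    """The += operation (5.4): arguments of one alternative are concatenated."""
    return YS(a.lo + b.lo, a.hi + b.hi)


def alt(a: YS, b: YS) -> YS:
    """The /= operation (5.5): alternatives widen the interval."""
    return YS(min(a.lo, b.lo), max(a.hi, b.hi))


def restrict(a: YS, f: YS) -> YS:
    """The |= operation (5.6): a minsize/maxsize filter narrows the interval."""
    return YS(max(a.lo, f.lo), min(a.hi, f.hi))


REGION, CHAR, EMPTY = YS(1, N), YS(1, 1), YS(0, 0)

# b = f(REGION) with minsize(2) with maxsize(2)
b = restrict(REGION, YS(2, 2))
# Hypothetical rule x = g(CHAR, REGION, CHAR) | h(EMPTY)
x = alt(seq(seq(CHAR, REGION), CHAR), EMPTY)
print(b, x)
```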


grammar Loop uses Signature(axiom = A)
{

  A = f(B, A, C) | g(CHAR);

  B = STRING0;

  C = STRING0;

}

Figure 5.11: A tree grammar with a loop. Non-terminal A is part of a loop, because the minimal yield size of the STRING0 terminal parser is 0.

5.3.4 Loop Checking

A non-terminal symbol A is part of a loop if there is a derivation A →* A that does not consume characters of the input string. If a top-down parser is generated from A and enters this derivation, then it does not terminate. A bottom-up parser cannot be constructed, since the parse of (i, j) is needed to compute the parse for the same sub-word (i, j). Figure 5.11 shows a loop grammar example.

GAP-C implements a loop detection algorithm that is run for every non-terminal of the grammar. It takes the grammar data structure and the yield size information from the yield size analysis computation as input. In case of a detected loop a detailed error message is given and the compilation ends after this phase. Figure 5.12 shows the pseudo-code of the recursive algorithm. For each non-terminal the algorithm does a depth-first traversal of the grammar data structure as long as the left hand context and right hand context of the current structure element have both a minimal yield size of 0. A traversal stops if an element is reached which is still part of a running computation. If the non-terminal from the start of the traversal is reached, it is part of a loop.
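The idea can be sketched on a simplified single-track representation. This is an illustration under assumptions, not the GAP-C algorithm: the minimal yield sizes are passed in as a precomputed dictionary, and a non-terminal is followed only when all its siblings in the alternative have a minimal yield size of 0.

```python
def loop_nts(grammar: dict[str, list[list[str]]],
             ys_min: dict[str, int]) -> set[str]:
    """Return non-terminals A with a derivation A ->* A that consumes no input."""
    # Edge A -> B: B is called while everything left and right of it may be empty.
    edges: dict[str, set[str]] = {nt: set() for nt in grammar}
    for nt, alts in grammar.items():
        for alt in alts:
            for k, sym in enumerate(alt):
                siblings = alt[:k] + alt[k + 1:]
                if sym in grammar and all(ys_min[s] == 0 for s in siblings):
                    edges[nt].add(sym)

    def reaches_itself(start: str) -> bool:
        seen: set[str] = set()
        stack = list(edges[start])
        while stack:
            cur = stack.pop()
            if cur == start:
                return True
            if cur not in seen:
                seen.add(cur)
                stack.extend(edges[cur])
        return False

    return {nt for nt in grammar if reaches_itself(nt)}


# Figure 5.11: A = f(B, A, C) | g(CHAR); B = STRING0; C = STRING0
fig_5_11 = {"A": [["B", "A", "C"], ["CHAR"]],
            "B": [["STRING0"]],
            "C": [["STRING0"]]}
mins = {"A": 1, "B": 0, "C": 0, "STRING0": 0, "CHAR": 1}
print(loop_nts(fig_5_11, mins))
```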

Symbol::NT::detect_loop:
  Yield::Size p
  foreach (track in tracks): p[track] = (0, 0)
  foreach (alt in alts):
    if alt->detect_loop(p, p, this):
      error(...)

Symbol::NT::detect_loop(left, right, nt):
  if active: return false
  active = true
  r = false
  foreach (alt in alts):
    r = r || alt->detect_loop(left, right, nt)
  active = false
  return r

Alt::Simple::detect_loop(left, right, nt):
  r = false
  foreach (arg in args):
    t = arg.next.ys + ... + args.last.ys
    if (for all i in tracks,
        left[i].ys.min == 0 && t[i].ys.min == 0):
      r = r || arg->detect_loop(left, t, nt)
    left += arg->ys
  return r

Alt::Block::detect_loop(left, right, nt):
  r = false
  foreach (alt in alts):
    r = r || alt->detect_loop(left, right, nt)
  return r

Alt::Link::detect_loop(left, right, nt):
  if this->nt == nt: return true
  return nt->detect_loop(left, right, nt)

Alt::Multi::detect_loop(left, right, nt):
  r = false
  foreach (track in tracks):
    r = r || track->detect_loop(left[track], right[track], nt)
  return r

Figure 5.12: Pseudo-code of the loop detection algorithm for the data structure elements of the non-terminal symbol. The parameters left and right contain yield sizes.

5.3.5 Maxsize filter propagation

The yield size analysis algorithm from Section 5.3.3 considers the minimal and maximal yield size filters in the grammar. The filter restrictions in the yield size analyses only propagate bottom-up. See the following grammar as an example for a situation, where it makes sense that the filter information should propagate top-down:

grammar Max uses Signature(axiom = A)
{
  A = f(REGION with minsize(12), B, REGION with minsize(7))
      with maxsize(42);

  B = g(STRING0);

}

Looking only at non-terminal B, we can derive its yield size from the known yield size of terminal symbol STRING0, which is (0, n). This is also computed by the yield size analysis algorithm if it is run on the complete grammar Max. However, in the grammar Max, the non-terminal B is only called from non-terminal A. This call is restricted by a maxsize filter of 42. Thus, one can derive that the effective maximal yield size of B is 42. Taking also the enclosed minsize filters into account we can derive that the effective maximal yield size of B is 23.
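The value 23 follows from subtracting the minimal yield sizes of the siblings from the enclosing maxsize filter, 42 − 12 − 7 = 23. A minimal sketch of this step, with a hypothetical helper name and saturating subtraction as in Section 5.3.3:

```python
def propagate_max(max_filter: int, sibling_minima: list[int]) -> int:
    """Maximal yield size left for one argument once its siblings are served."""
    remaining = max_filter
    for lo in sibling_minima:
        remaining = max(0, remaining - lo)   # saturating: never below 0
    return remaining


# A = f(REGION with minsize(12), B, REGION with minsize(7)) with maxsize(42)
print(propagate_max(42, [12, 7]))  # 42 - 12 - 7 = 23
```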
In the following we only look at the propagation of maximal yield sizes, because it is asymptotically more interesting: getting a constant maximal yield size instead of n yields a table with O(n) entries (a constant number of rows or columns) instead of O(n^2) in the general single-track case. Increasing only the minimal yield size yields a table with a constant number of rows or columns less than in the general case. However, the computation of minimal yield size propagation is analogous to the computation of maximal yield size propagation.

The algorithm does a depth-first traversal of the grammar data structure and picks up maximal yield size filter restrictions during this traversal. Figure 5.13 shows the pseudo-code of the algorithm. After the traversal is finished, each non-terminal and right hand side data structure element contains an initialized member max_ys. In multi-track contexts with l tracks, max_ys is an l-tuple.

5.3.6 Table Dimension Analysis

The tabulation of a non-terminal means that the generated non-terminal parser saves its results for each sub-word in a table. In the general single track case such a table has O(n^2) entries. Depending on the contexts from which a non-terminal is called, a reduction of the quadratic table size is possible. Consider for example a start symbol which is not called from elsewhere in the grammar. It is called just one time for the sub-word (0, n), i.e. the complete input string, where n is the length of the input. Thus, to tabulate this start symbol, only a constant-size table with one entry is needed.

Another table dimension reduction example is the case where one index of a non-terminal is constant and only the other changes. In this case only a linear table with one row or one column is needed. Consider grammar tabdim1:

grammar tabdim1 uses Signature(axiom = skipR) {
  skipR = skip_right(skipR, CHAR) |
          skipL # h;

  skipL = ...;
}


Symbol::NT::propagate_max_filter(max_ys):
  if active: return
  active = true
  m = min(ys.max, max_ys)
  if (m <= this->max_ys): active = false; return
  this->max_ys = max(this->max_ys, m)
  foreach (alt in alts):
    alt->propagate_max_filter(m)
  active = false

Alt::Base::propagate_max_filter(max_ys):
  this->max_ys = max(this->max_ys, min(ys.max, max_ys))

Alt::Simple::propagate_max_filter(max_ys):
  Base::propagate_max_filter(max_ys)
  Yield::Size left
  foreach (arg in args):
    right = arg.next.ys + ... + args.last.ys
    max_ys = min(max_ys, ys.max)
    max_ys -= left.low
    max_ys -= right.low
    arg->propagate_max_filter(max_ys)
    left += arg->ys

Alt::Block::propagate_max_filter(max_ys):
  Base::propagate_max_filter(max_ys)
  foreach (alt in alts)
    alt->propagate_max_filter(min(ys.max, max_ys))

Alt::Link::propagate_max_filter(max_ys):
  Base::propagate_max_filter(max_ys)
  nt->propagate_max_filter(min(ys.max, max_ys))

Alt::Multi::propagate_max_filter(max_ys):
  Base::propagate_max_filter(max_ys)
  foreach (track in tracks):
    m = min(ys.max, max_ys)
    track->propagate_max_filter(m[track])

Figure 5.13: Pseudo-code of the maximal yield size propagation algorithm.


The non-terminal skipR is the axiom and it is only called from its own right hand side. For each call for the sub-word (i, j) the non-terminal parser skipR calls itself recursively for the sub-word (i, j − 1) (because the terminal parser CHAR has a yield size of (1, 1)). Since the axiom is called for the complete input string (0, n), skipR is always called for i = 0 and only a linear table with one row is needed.
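This observation can be checked by enumerating the sub-words for which skipR is called. The sketch below only follows the skip_right alternative, since the body of skipL is elided in the grammar. The function name is an assumption made for this example.

```python
def skipR_calls(n: int) -> set[tuple[int, int]]:
    """Collect every sub-word (i, j) for which the parser skipR is called."""
    seen: set[tuple[int, int]] = set()
    stack = [(0, n)]                    # the axiom is called for the whole input
    while stack:
        i, j = stack.pop()
        if (i, j) in seen:
            continue
        seen.add((i, j))
        if j - i >= 1:                  # CHAR consumes exactly one character
            stack.append((i, j - 1))    # skip_right(skipR, CHAR)
    return seen


calls = skipR_calls(6)
print(sorted(calls))
```

All collected sub-words share the left index 0, so a table with a single row of n + 1 entries is sufficient.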
The benefits of reducing the table dimension of non-terminal parsers are three-fold. First, reducing table dimensions saves memory. In the single-track case a reduction from O(n^2) to O(n) is important, in the general two-track case a table needs O(n^4) and there it is mandatory to try to reduce the dimension of the table.

Second, in the case of generating bottom-up CYK-style parsers (see Section 5.4.1) it improves the runtime of the program. For the single-track case, the general CYK-loop is composed of two nested for-loops that explicitly fill the table entries for each sub-word, going from smaller to bigger sub-words. If it is derived that a non-terminal has only a constant or linear table, it can be moved outside of the innermost CYK-loop. This optimization eliminates unnecessary memory accesses, which are expensive. Consider the case, where all non-terminals of a grammar only need tables of linear size and there is no moving boundary on the right hand side. Then an unoptimized general CYK-loop would lead to a program with an asymptotically suboptimal runtime of O(n^2) instead of O(n). Again, in the multi-track case using the general CYK-loop unconditionally is prohibitive.

Third, exact table dimension data improve the runtime computation (Section 5.3.7.1). The runtime computation algorithm depends on the results of the table dimension analysis. For example, the compiler checks if the given table configuration yields a GAP-L program with asymptotically optimal runtime. For this, the runtime under the given table configuration is compared with the runtime under the full table configuration (every non-terminal is tabulated). If the runtime under the full table configuration is asymptotically better, then the compiler issues a warning. Thus, without the table dimension data, the runtime computation always has to assume the runtime of tabulating O(n^2) table entries in the single-track case. In the worst case for a program of asymptotically linear runtime and linear tables, the runtime computation algorithm would falsely derive an asymptotically suboptimal runtime of O(n^2) instead of O(n). In the two-track case, the general tabulation runtime is in O(n^4). In the runtime computation of the pairwise sequence alignment grammar, O(n^4) would overlay the asymptotically optimal runtime of O(n^2) without exact table dimension data.

The previous grammar examples of table dimension reduction possibilities were easy, because an automatic table dimension analysis only has to check the case when one or both indices do not change. However, consider grammar tabdim2:
grammar tabdim2 uses Signature(axiom = start) {
  start = x(CHAR, start) |
          b;

  b = x(CHAR, b) |
      x(CHAR, c) |
      y(c, CHAR);

  c = z(REGION, d);

  d = w(REGION, REGION);

}

Non-terminal c is called from two locations on the right hand side of non-terminal b. If non-terminal parser b is called for the sub-word (i, j) then c is called for (i + 1, j) and for (i, j − 1). Thus, a naive table dimension analysis would derive a quadratic table for non-terminal c because both indices are changing. However, from the general point of view, for (i, j) the non-terminal parsers start and b call themselves for (i + 1, j). The right index of non-terminal parser c is always n or n − 1, resulting from the call of non-terminal start for (0, n). Thus an asymptotically linear table for c is enough (a table with two columns).

For the case of multiple input tracks it is sufficient to call the single-track table dimension reduction algorithm for each track independently. For example, the analysis returns for a two-track program the need of a linear table for the first track and a quadratic table for the second track. This means that a linear table of quadratic tables is needed (or a table with three dimensions). Consider the example of the basic edit distance grammar:
grammar Ali uses Signature(axiom = alignment) {

  alignment = nil(<EMPTY, EMPTY>) |
              del(<CHAR, EMPTY>, alignment) |
              ins(<EMPTY, CHAR>, alignment) |
              match(<CHAR, CHAR>, alignment) # h;

}

Only looking at the first track, a vertical split of the two-track grammar results in the following pseudo-grammar for the first track:

alignment_t1 = nil(EMPTY) |
               del(CHAR, alignment_t1) |
               ins(EMPTY, alignment_t1) |
               match(CHAR, alignment_t1) # h;

For the second track it results in:

alignment_t2 = nil(EMPTY) |
               del(EMPTY, alignment_t2) |
               ins(CHAR, alignment_t2) |
               match(CHAR, alignment_t2) # h;

Note that these pseudo-grammars contain a loop, but this does not matter for table dimension reduction analysis. In both grammars the right index of non-terminals alignment_t1 and alignment_t2 is constant. Thus, the table space is reduced from O(n^4) for the generic two-track case to O(n^2).
In GAP-C, a table dimension reduction analysis algorithm is implemented that recognizes if a non-terminal only needs a constant table of one entry or a linear table of one row/column and it checks whether a non-terminal needs an asymptotically constant or linear table as in the example of grammar tabdim2. In the case of asymptotically constant or linear tables the exact index range is computed.

The algorithm does a depth-first traversal on the grammar data structure and during the traversal the sums of the minimal and maximal yield sizes of the left and right context of the current object are picked up. If e.g. at a non-terminal the maximal yield size sum of the left context is n and that of the right context is a constant, then only an asymptotically linear table is needed. The case distinction at the non-terminal data structure object has to take simple recursive (a non-terminal directly calls itself) and general recursive non-terminals into account. Thus, the algorithm keeps track of which computations of non-terminals are currently active and it does a re-computation if a later recursion changes the table dimension state of an object whose computation is not finished.

The table dimension reduction analysis of the first generation ADP compiler [47] is not able to derive a reduction to asymptotically linear tables in examples like tabdim2.
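The claim about grammar tabdim2 can be made tangible by brute-force enumeration of the sub-words each parser is called for. This is an illustrative check for one input length, not the analysis implemented in GAP-C, which derives the same fact symbolically from yield sizes. The function name is an assumption, and non-terminal d is omitted because it does not influence the indices of c.

```python
def tabdim2_calls(n: int) -> dict[str, set[tuple[int, int]]]:
    """Record the sub-words the parsers start, b and c of tabdim2 are called for."""
    calls: dict[str, set[tuple[int, int]]] = {"start": set(), "b": set(), "c": set()}
    stack = [("start", 0, n)]                         # axiom call for the whole input
    while stack:
        nt, i, j = stack.pop()
        if (i, j) in calls[nt]:
            continue
        calls[nt].add((i, j))
        if nt == "start":
            stack.append(("b", i, j))                 # start = ... | b
            if j - i >= 1:
                stack.append(("start", i + 1, j))     # x(CHAR, start)
        elif nt == "b" and j - i >= 1:
            stack.append(("b", i + 1, j))             # x(CHAR, b)
            stack.append(("c", i + 1, j))             # x(CHAR, c)
            stack.append(("c", i, j - 1))             # y(c, CHAR)
    return calls


n = 8
right_indices_of_c = {j for _, j in tabdim2_calls(n)["c"]}
print(sorted(right_indices_of_c))
```

Only two distinct right indices occur for c, so a table with two columns covers every call.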

5.3.7 Table Design

In the ADP framework, there is a clear separation between search space design, i.e. creating the grammar, and choosing the set of non-terminals whose parsing results are stored in tables. Deriving the set of non-terminals for tabulation, i.e. the table configuration, is called table design. Tabulating every non-terminal in the GAP-L grammar is a safe choice, because the runtime of the resulting GAP-L program is asymptotically optimal. Tabulating no non-terminal of a grammar that describes an exponentially sized search space leads to a program with exponential runtime, because sub-solutions are re-computed recursively. However, according to the runtime computation equations (Section 5.3.7.1) the runtime of a grammar may be asymptotically optimal even if only a subset of the non-terminals is tabulated. Figure 5.14 shows an example of such a table configuration.

Thus, finding the minimal table configuration under which the GAP-L program still runs in asymptotically optimal time is a feasible objective, since saved tables may reduce the memory footprint of the generated program significantly. This optimization problem is called the table design problem. Unfortunately, the table design problem is NP-complete [50]. Another motivation for table design is the possibility of reducing the constant runtime factors, if the overhead of storing the table with its entries does not pay off in comparison to just computing the needed parsing results in constant time for each sub-word.
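The difference between tabulating and not tabulating a recursive non-terminal can be observed on a small hand-written parser. The grammar S = a(CHAR, S) | b(S, CHAR) | nil(EMPTY), the counting algebra and the function name are assumptions made for this sketch. Python's functools.lru_cache plays the role of the table.

```python
from functools import lru_cache


def count_evaluations(n: int, tabulate: bool) -> int:
    """Count how often the parser body of S runs for an input of length n."""
    evaluations = 0

    def parse(i: int, j: int) -> int:
        nonlocal evaluations
        evaluations += 1
        if i == j:
            return 1                        # nil(EMPTY): one candidate
        # a(CHAR, S) continues with (i + 1, j), b(S, CHAR) with (i, j - 1)
        return cached(i + 1, j) + cached(i, j - 1)

    # With tabulation every sub-word (i, j) is evaluated at most once.
    cached = lru_cache(maxsize=None)(parse) if tabulate else parse
    cached(0, n)
    return evaluations


print(count_evaluations(10, tabulate=False), count_evaluations(10, tabulate=True))
```

Without the table the number of evaluations doubles with every additional input character, with the table it is bounded by the number of sub-words.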


5.3.7.1 Runtime computation

Given the GAP-L grammar and its table configuration ϑ, i.e. the set of tabulated non-terminals, as input, it is possible to compute the asymptotic runtime of its generated parser. The following runtime equations follow [50] and for simplicity it is assumed that the generated parser computes under a unitary algebra (Definition 7, page 28). For convenience, “the runtime of grammar A” or “the runtime of a non-terminal B” is just a shorthand for “the runtime of the parser generated from grammar A” or “the runtime of the non-terminal parser that parses the language of non-terminal B.”

In the following the right hand sides of the runtime equations contain big-O notation symbols and an asymptotic arithmetic is assumed, e.g. 23n^3 + 42n^2 = O(n^3) and 42 = O(1) = 23.

Equation 5.8 describes the runtime of a non-terminal parser A,

\[ rt(A) = \sum_{x \in rhs(A)} calls(A, x) \cdot \begin{cases} rt(x) & \text{if } x \notin \vartheta \\ O(1) & \text{if } x \in \vartheta \end{cases} \tag{5.8} \]

where the function rhs returns the set of terminal and non-terminal symbols on the right hand side of the non-terminal A, the function calls returns the number of calls from non-terminal A to symbol x and ϑ is the set of tabulated non-terminals. The runtime of a terminal parser is constant. Thus, it is necessary to compute the runtime of the axiom and the runtime of tabulating all tabulated non-terminals for computing the complete runtime of a GAP-L program. Equation 5.9 specifies the complete runtime of a GAP-L program P:

\[ rt(P) = rt(S) + \sum_{x \in \vartheta} n^{dim(x)} \cdot rt(x) \tag{5.9} \]

where S is the axiom, n denotes the length of the input string, and the function dim returns the table dimension needed to store the parse results for all sub-words. In the single-track case, the image of dim is {0, 1, 2}.
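For purely polynomial runtimes, Equation 5.9 can be evaluated on exponents alone: multiplying by n^dim(x) adds exponents and the sum keeps the largest one. The following sketch illustrates only this asymptotic arithmetic. The function name and the example non-terminal are hypothetical, and exponential runtimes are outside its scope.

```python
def program_degree(axiom_degree: int,
                   tabulated: dict[str, tuple[int, int]]) -> int:
    """Exponent k such that rt(P) is in O(n^k), following Equation 5.9.

    tabulated maps each tabulated non-terminal x to (dim(x), degree of rt(x)).
    """
    degree = axiom_degree
    for dim, rt_degree in tabulated.values():
        # n^dim(x) * O(n^d) = O(n^(dim(x) + d)); a sum keeps the largest exponent.
        degree = max(degree, dim + rt_degree)
    return degree


# Hypothetical single-track case: the tabulated axiom costs O(1) per call,
# and each of the O(n^2) entries of the table of struct costs O(n) to fill.
print(program_degree(0, {"struct": (2, 1)}))  # O(n^3)
```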
The computation of the runtime of a GAP-L program is an important semantic analysis, since other semantic analyses in the compiler depend on it. First, the compiler checks whether the runtime under the full table configuration is asymptotically better than under the user-supplied table configuration. If this is the case, then the user-supplied table configuration is asymptotically sub-optimal and a warning message is printed. Second, in table design the runtime of a grammar is computed several times for changing table configurations. Table design is the optimization problem to find the minimal table configuration, under which the runtime of the grammar is still asymptotically optimal (Section 5.3.7).

If the generated parser uses a non-unitary algebra, i.e. an algebra whose objective function returns more than one result, then the asymptotic runtime of the parser may deteriorate. For example, consider the case, where the asymptotic runtime of a GAP-L program under a given table configuration and a unitary algebra is in O(n^3). If the parser just uses a pretty printing algebra, whose objective function is the identity, the runtime is in O(2^n), since the search space of exponential size is enumerated.

Constant Factors. The Equations 5.8 and 5.9 describe how to compute the asymptotic runtime of a GAP-L program. In practice, constant factors are also of interest. It matters if the runtime under two different table configurations is asymptotically optimal, but the constant factors of the first runtime are prohibitively large (e.g. 999999n^3 vs. 20n^3). The runtime functions just need to use a data type that represents polynomials with factors to support constant factors. O(1) is replaced by 1 and the term n^dim(x) is replaced by cells(x). The function cells returns the asymptotic number of cells to tabulate all results of a non-terminal parser x. If the table dimension reduction analysis (Section 5.3.6) computes a lower bound (including constant factors), then this bound is returned. Note that the constant factors of the runtime computation are an approximation of the real constant factors of a generated parser. For example, not all terminal parsers have the exact runtime of one time unit and syntactic grammar filters, which depend on the input, introduce some conditional code blocks in the generated parser code.

Dependency Graph. A dependency graph of a GAP-L grammar is a directed graph, where each non-terminal and terminal symbol is a vertex and an edge (A, B) specifies that symbol A calls symbol B. The number of calls of B from A is an attribute of the edge (A, B). If a non-terminal is tabulated, then its vertex outline is dotted. Figure 5.14 shows an example of a dependency graph.

For computing the runtime of a GAP-L program, the compiler has to establish the runtime equations for each non-terminal and compute all runtime equations with a recurrent polynomial solver. Such a recurrent solver needs to be efficient, since the runtime computation is the major time consuming operation for the table design analysis. The author is not aware of an open source recurrent polynomial solver which is efficient, stable and maintained. Thus, the GAP-C implements the runtime computation as a recursive algorithm that directly traverses the grammar data structure. The grammar data structure is a dependency graph.

Algorithm. Figure 5.15 shows the pseudo-code for computing the runtime of the
grammar and the non-terminal symbols. Figure 5.16 shows the pseudo-code of
the runtime-computation of the objects on the right hand side of a non-terminal
symbol. The runtime computation algorithm does a depth-first traversal of the
dependency graph and during that traversal the runtimes of the data structure
elements are set. The algorithm updates during traversal a list of non-terminals
whose computation is not yet finished (active_list) as well as the accumulated
runtime (accum_rt, or accumulated number of calls) over the traversed path. In the
case of a cycle at non-terminal X being detected in the traversal, the accumulated
runtime specifies how many recursive calls follow from one call of non-terminal X.

Grammar::runtime:
  active_list = []
  rt = axiom->runtime(active_list, 1)
  foreach (nt in tabulated):
    rt += nt->runtime(active_list, 1) * nt->cells()
  return rt

Symbol::Terminal::runtime(active_list, accum_rt):
  return 1

Symbol::NT::set_rec(accum_rt):
  if (accum_rt == 1 && rec <= 1):
    rec = n
  else:
    rec = 2^n

Symbol::NT::set_recs(active_list, accum_rt):
  foreach (nt in this, ..., active_list->last):
    nt->set_rec(accum_rt)

Symbol::NT::runtime(active_list, accum_rt):
  if (rt_computed)
    return rt;
  if (active):
    set_recs(active_list, accum_rt)
    return 1
  active = true
  active_list->push(this)
  rec = 1
  foreach (alt in alts):
    rt += alt->runtime(active_list, accum_rt)
    if rt == 2^n:
      rt_computed = true
      active = false
      return rt
  rt *= rec
  active = false
  active_list.pop
  rt_computed = true
  return rt

Figure 5.15: Runtime computation pseudo-code of Bellman's GAP grammar,
non-terminal and terminal data structure objects.

Alt::Simple::runtime(active_list, accum_rt):
  rt = 1
  foreach (arg in args):
    rt += arg->runtime(active_list, accum_rt)
  return rt

Alt::Block::runtime(active_list, accum_rt):
  rt = 0
  foreach (alt in alts):
    rt += alt->runtime(active_list, accum_rt)
  return rt

Alt::Link::runtime(active_list, accum_rt):
  rt = calls
  if (!nt->is_tabulated):
    rt *= nt->runtime(active_list, accum_rt * calls)
  return rt

Alt::Multi::runtime(active_list, accum_rt):
  rt = 0
  foreach (track in tracks)
    rt += track->runtime(active_list, accum_rt)
  return rt

Figure 5.16: Runtime computation pseudo-code of data structure objects of the
right hand side of a non-terminal data structure.

results for all sub-words in a Symbol::NT object. The number of cells is computed
in the table dimension reduction analysis (Section 5.3.6). The number of calls of
a non-terminal from another one is computed in a depth-first traversal. While
traversing the data structure, moving boundaries from the left and right context
of a symbol are picked up and translated to a polynomial of the number of calls.
A moving boundary is introduced in a grammar data structure if in the derivation
there are two or more horizontally neighboring symbols whose minimal yield size is
unequal to their maximal yield size. Consider for example the following grammar
rule:

A = f(REGION, CHAR, REGION, CHAR, g(CHAR, B, CHAR), CHAR);
B = b(REGION);

The terminal parser REGION has a yield size of (1, n) and the terminal parser
CHAR has a yield size of (1, 1). On the right hand side of non-terminal A, the left
context of symbol B is REGION two times and CHAR three times and the right context
of B is CHAR two times. Since the minimal yield size of B is unequal to its maximal
yield size and the minimal yield sizes of two symbols from the left context of B are
unequal to their maximal yield sizes, two moving boundaries are introduced. Each
one is translated to a factor of n. Thus, the non-terminal B is called n^2 times from
the non-terminal A.
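The counting rule for moving boundaries can be illustrated with a small sketch. It assumes a flattened, single-track right hand side given as a list of (minimal, maximal) yield sizes; the function name moving_boundaries and the marker N for the symbolic bound n are hypothetical and not taken from GAP-C.

```python
N = "n"  # stand-in for the symbolic maximal yield size n

REGION = (1, N)
CHAR = (1, 1)
B = (1, N)  # B = b(REGION) inherits the yield size of REGION


def moving_boundaries(yield_sizes):
    """Count moving boundaries on a flattened single-track right hand side.

    Every symbol whose minimal yield size differs from its maximal yield
    size can slide; k such symbols between fixed outer indices leave
    k - 1 free boundaries.
    """
    variable = sum(1 for lo, hi in yield_sizes if lo != hi)
    return max(variable - 1, 0)


# A = f(REGION, CHAR, REGION, CHAR, g(CHAR, B, CHAR), CHAR), flattened
rhs_of_a = [REGION, CHAR, REGION, CHAR, CHAR, B, CHAR, CHAR]
exponent = moving_boundaries(rhs_of_a)
```

For the example rule the exponent is 2, which corresponds to the n^2 calls of B from A stated in the text.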

5.3.7.2 Approximative Algorithm

Looking at the dependency graph of small GAP-L programs, an experienced ADP
programmer has no difficulties to identify those non-terminals, whose tabulation
breaks the cycles in the grammar that introduce additional linear or exponential
factors in the runtime computation. However, manually deriving an optimal table
configuration for larger grammars is tedious and error prone. Thus, the compiler
should implement an algorithm that automatically solves the table design problem.
Since the table design problem is NP-complete, an exact computation of the optimal
table configuration is in O(2^n). An exact algorithm has to do a brute-force search
in the exponentially sized search space of all possible table configurations. The
exact computation of the optimal configuration being in O(2^n) does not necessarily
mean that in practice the runtime of an exact algorithm explodes for moderately
sized input grammars. A careful implementation of the exact algorithm is able to
exclude certain non-terminals whose tabulation always deteriorates the asymptotic
runtime for reducing the search space. Such an implementation computes the optimal
table configuration for grammars up to 14 non-terminals in reasonable time in
practice. However, hand-written grammars exist with up to 30–40 non-terminals
and there are Bellman's GAP grammar generating programs like RapidShapes [27]
that generate grammars with 200–300 non-terminals. Another example of a grammar
generating tool is Locomotif [38] that translates graphically constructed RNA
structure motifs into ADP RNA-motif-matcher programs. A moderately sized motif
easily translates into a grammar with 50 or more non-terminals.

Thus, an approximative optimization algorithm is needed to solve the table design
problem heuristically. GAP-C implements a novel table design algorithm that computes
a good table configuration. A table configuration is good if the corresponding
grammar runs in asymptotically optimal time, i.e. a good table configuration may
include more non-terminals than an optimal table configuration.
The algorithm is approximative in the sense that it may return a table configuration
which is larger than the optimal table configuration. It never returns a table
configuration under which the grammar has an asymptotically suboptimal runtime.
The main idea of the algorithm is to identify a non-terminal whose tabulation
breaks one or more runtime-increasing cycles. This step is repeated until the resulting
table configuration yields an asymptotically optimal runtime. In breaking
a cycle that contains more than one non-terminal, there are all the non-terminals
of the cycle to be chosen from and ideally a non-terminal is selected that breaks
the most cycles at the same time. The algorithm tries to identify such important
non-terminals with the method of scoring all non-terminals of the grammar and
then picking that with the highest score. Equation 5.10 shows the used scoring
function s:

s(x) = in(x) · out(x) · selfrec(x)    (5.10)

where the function in returns the number of incoming edges, function out returns
the number of outgoing edges and function selfrec returns the recursion factor of
the non-terminal x. Thus, the score of a non-terminal is higher than another, if it
is more connected and takes part in more recursions. Figure 5.20 shows the score
computation of a non-terminal in the palindrome example grammar. The recursion
factor of a non-terminal is a counter that is incremented during several depth-first
traversals, if a cycle is detected. Figure 5.21 shows the pseudo-code for computing
the recursion factors for each non-terminal.
The approximate table design algorithm works in three phases. First, all non-terminals
are scored according to the scoring function s. Second, the non-terminals
are stored in a vector. The vector is sorted with the score as key. And in the third
phase, iteratively the highest scored non-terminal is set tabulated until the resulting
table configuration yields an asymptotically optimal runtime of the grammar.
Figure 5.22 shows the pseudo-code of this algorithm.
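The three phases can be sketched in a few lines. The sketch makes two simplifying assumptions that are not part of GAP-C: the recursion factors are passed in as a dictionary instead of being computed by traversal, and the test "the runtime is asymptotically optimal" is replaced by the weaker stand-in "no cycle remains among the non-tabulated non-terminals". All function names are hypothetical.

```python
def score(nt, edges, self_rec):
    """s(x) = in(x) * out(x) * selfrec(x), as in Equation 5.10."""
    indeg = sum(1 for _, b in edges if b == nt)
    outdeg = sum(1 for a, _ in edges if a == nt)
    return indeg * outdeg * self_rec[nt]


def has_cycle(nodes, edges, tabulated):
    """Stand-in optimality test: is there a cycle that avoids all tables?"""
    graph = {n: [b for a, b in edges if a == n and b not in tabulated]
             for n in nodes if n not in tabulated}
    state = {}

    def visit(n):
        if state.get(n) == "active":
            return True           # reached a node on the current path
        if state.get(n) == "done":
            return False
        state[n] = "active"
        found = any(visit(m) for m in graph[n])
        state[n] = "done"
        return found

    return any(visit(n) for n in graph)


def approx_table_design(nodes, edges, self_rec):
    # Phases 1 and 2: score all non-terminals and sort by descending score.
    ranked = sorted(nodes, key=lambda nt: score(nt, edges, self_rec),
                    reverse=True)
    # Phase 3: tabulate the best candidates until the stand-in test passes.
    tabulated = set()
    for nt in ranked:
        if not has_cycle(nodes, edges, tabulated):
            break
        tabulated.add(nt)
    return tabulated


nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")]
self_rec = {"A": 1, "B": 2, "C": 1}   # assumed recursion factors
design = approx_table_design(nodes, edges, self_rec)
```

In this toy graph B takes part in both cycles and receives the highest score, so tabulating B alone is sufficient.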
The worst-case runtime of the first phase is in O(|V|(|V| + |E|)), where V is the set
of vertices and E is the set of edges of the dependency graph. Sorting the vertices
is in O(|V| log |V|). Since the worst-case complexity of one runtime computation
is in O(|V| + |E|) the complexity of the third phase is in O(|V|^2 + |V||E|). Thus,
the overall worst-case runtime of the algorithm is in O(|V|^2 + |V||E|). The space
used by the algorithm is in O(|V| + |E|), because the dependency graph and an
additional vector for sorting is stored. In the worst case the algorithm returns the
full table configuration and in every case the returned table configuration yields an
asymptotically optimal runtime.
Solving or approximating the table design problem means computing a table
configuration, under which the runtime of the generated parser is asymptotically
optimal. However, this asymptotic condition is not sufficient in practice. Consider
e.g. two table configurations of a grammar whose asymptotically optimal runtime is
in O(n^3). The first table configuration may yield a runtime of 26n^3 and the second
table configuration may yield a runtime of 666666n^3. A user would probably accept
the first table configuration in any case, even if it contains more non-terminals than
the second one, because such a large constant factor is prohibitive in practice. Thus,
the table design algorithm in GAP-C also takes constant factors into account during
the third optimization phase.

Grammar::init_self_rec:
  foreach (nt in nts):
    nt->init_self_rec
    foreach (nt in nts): nt->active = false

Symbol::init_self_rec:
  if (active):
    if (started):
      self_rec++
    return
  active = true; started = true
  foreach (alt in alts): alt->init_self_rec
  started = false

Alt::Simple::init_self_rec:
  foreach (arg in args): arg->init_self_rec

Alt::Link::init_self_rec: nt->init_self_rec

Alt::Block::init_self_rec:
  foreach (alt in alts): alt->init_self_rec

Alt::Multi::init_self_rec:
  foreach (track in tracks): track->init_self_rec

Figure 5.21: Pseudo-code of recursive factor attribute computation for each
non-terminal.

Grammar::approx_table_design:
  foreach (nt in nts): nt->tabulated = true
  opt = runtime
  v = []
  foreach (nt in nts):
    nt->tabulated = false;
    nt->init_score
    v.push(nt)
  sort(v, \nt -> nt->score)
  reverse(v)
  r = runtime
  foreach (x in v):
    if (opt == r):
      break
    x->tabulated = true
    r = runtime

Figure 5.22: Pseudo-code of the approximative table design algorithm.

Constant Factors. The table design objective is extended, to take constant factors
into account and to compute a table configuration that yields a runtime with good
constant factors. The extended table design objective is: Find the minimal table
configuration, under which the runtime of the generated parser is still asymptotically
optimal and the constant factor of the largest polynomial is at most c percent
higher than the constant factor of the largest polynomial of the runtime under the
full table configuration. In GAP-C c is set to 20, which leads to good results. The
extension of the approximative table design algorithm to check for the extended
table design objective is straightforward. The runtime calculations need to use
a polynomial data type that supports constant factors and another abort condition
in the main loop of the third optimization phase. Figure 5.23 contains the
pseudo-code.
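The additional abort condition amounts to a single comparison. The following sketch assumes that a runtime is summarized by the pair (degree, constant factor) of its dominating term; the function name accept and this representation are hypothetical.

```python
def accept(runtime, optimum, c=20):
    """Check the extended table design objective.

    runtime and optimum are (degree, factor) pairs of the dominating term;
    optimum belongs to the full table configuration. c is a percentage.
    """
    degree, factor = runtime
    opt_degree, opt_factor = optimum
    same_asymptotics = degree == opt_degree
    return same_asymptotics and factor <= opt_factor + opt_factor * c / 100


full_table = (3, 22)                         # assumed optimum: 22 n^3
ok = accept((3, 26), full_table)             # 26 <= 26.4
too_slow = accept((3, 666666), full_table)   # factor far above the limit
wrong_class = accept((4, 1), full_table)     # not asymptotically optimal
```

With c = 20 a configuration is accepted as long as its leading constant factor stays within 120 percent of the factor under the full table configuration.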

Grammar::approx_table_design(c):
  foreach (nt in nts): nt->tabulated = true
  opt = runtime
  const_factor = opt.last.factor
  v = []
  foreach (nt in nts):
    nt->tabulated = false;
    nt->init_score
    v.push(nt)
  sort(v, \nt -> nt->score)
  reverse(v)
  r = runtime
  foreach (x in v):
    if (opt == r):
      a = r.last.factor
      if (a <= const_factor + const_factor * c / 100):
        break
    x->tabulated = true
    r = runtime

Figure 5.23: Pseudo-code of the approximative table design algorithm variant that
takes constant factors into account.

5.3.7.3 Benchmarks

Table 5.3 shows runtime and memory usage benchmark results of various GAP-L
programs compiled with GAP-C under different table configurations.
The ADPfold (adpf) program is a GAP-L version of RNAfold [25], adpf_nonamb
uses the RNAshapes grammar [51], Loco3stem is a simple RNA motif matcher
generated with Locomotif [38], pknotsRG [40] is a GAP-L version of the pknotsRG
program and the shape programs are differently sized shape matchers generated by
RapidShapes. The mfe algebra does free energy minimization, the algebra count
counts the search space and the algebra pf computes the partition function. In
the benchmark each program was run for all table configurations for 10 randomly
generated sequences and the runtime and memory usage values in the table are
averages over these runs. All tests were run on the Athlon 64 Linux system described
in Section 8.
Every algorithm was run under three different table configurations: the full table
configuration, a table configuration derived by a human ADP expert or an expert
system and the table configuration computed by the table design algorithm of GAP-C
which is described in the previous section. The ADP expert is in most cases the
creator of the algorithm. For the automatically generated grammars the expert
table configuration is derived by the generators Locomotif and RapidShapes. Note
that RapidShapes was developed after the table design algorithm of GAP-C was
available and tested to yield good results in practice such that RapidShapes by
default uses the table design feature of GAP-C, and the expert table configuration
feature of RapidShapes was developed with a low priority of deriving good results.
The shown theoretical runtime expressions are computed by the runtime computation
algorithm of GAP-C and are only approximations because different kinds
of statements are assumed to yield the same constant runtime unit-cost and other
performance relevant factors, e.g. caching effects, are not considered. However, in
most cases the theoretical runtime expression rt_approx is consistent with the measured
practical runtime rt_prac data, i.e. it holds that rt_approx(a) ≤ rt_approx(b) ⇒
rt_prac(a) ≤ rt_prac(b), where a and b are two GAP-L grammars. However, for example
in the comparison of the ADPfold expert version with the full table configuration
version the implication does not hold because of the approximation.
The runtime of the pknotsRG algorithm is in O(n^4) which does not match the
theoretical runtime expressions because the table design is not able to consider the
index hacking constructs (Section 5.4.5) that eliminate two moving index boundaries.
The results show that designing a table configuration is in most cases a trade-off
between memory saving and runtime speedup. For example, the design table
configuration for ADPfold uses an additional table in comparison to the expert table
configuration such that the memory usage is only half of the full table configuration
version and not a third as the expert version, but it is 24 percent faster than the
expert version.
In most cases the table design algorithm computes a table configuration that

algorithm        #NT     n  strategy  |ϑ|  rt (s)  mem (MB)  ratio rt  ratio mem
adpf_mfe          11  4000  all        11     707       306      1.00       1.00
                            expert      4     672        92      1.05       3.30
                            design      5     535       123      1.32       2.48
adpf_nonamb_mfe   26  1500  all        26     337      1136      1.00       1.00
                            expert      9     799       363      0.42       3.13
                            design     15     451       620      0.75       1.83
loco3stem_count   23  4000  all        23     802      1344      1.00       1.00
                            expert      8     505       428      1.59       3.14
                            design      9     525       489      1.53       2.75
pknotsRG_mfe      25  1000  all        25     282        62      1.00       1.00
                            expert      6     425        20      0.66       3.06
                            design     19     292        56      0.97       1.10
shape1_pf         43   600  all        43     106       437      1.00       1.00
                            expert     12       -         -         -          -
                            design     17     113       200      0.94       2.18
shape2_pf         95   600  all        95     135      1084      1.00       1.00
                            expert     29    2852       337      0.05       3.21
                            design     68     136       810      0.99       1.34
shape3_pf        195   600  all       195     169      2329      1.00       1.00
                            expert     59       -         -         -          -
                            design    147     169      1794      1.00       1.30

[The rt theo. column is illegible in the source and is omitted.]

Table 5.3: Runtime and memory usage benchmarks of various GAP-L programs compiled
with GAP-C using various table configurations ϑ. The first column encodes the name of
the algorithm and the used algebra (see text), the second column shows the number of
non-terminals (NT) of the grammar and n denotes the length of the input sequence. The
strategy is one of all, expert or design, i.e. a table configuration where everything is
tabulated, a table configuration derived by a human ADP expert or an expert system,
respectively a table configuration computed by the table design algorithm of GAP-C.
The theoretical runtime (rt theo.) expressions are approximations as derived by the
runtime computation algorithm of GAP-C. The runtime (rt) is measured in seconds and
the memory usage (mem) in megabytes as an average over 10 sequences (see text). The
ratios are given in reference to the values under the full table configuration.


yields a better runtime than the expert table configuration. Only for the Loco3stem
program the expert version is 4 percent faster. In comparison to the runtime of
the programs under the full table configuration the design table configuration yields
speedups greater 1 or a similar runtime in most cases. Only the runtime of the table
design version of adpf_nonamb is 33 percent slower than the full table configuration
version. The table design algorithm removes several tables from tabulation in all cases,
i.e. it maximally tabulates 75 percent and minimally tabulates 35 percent of the
non-terminals. The practical memory usage is maximally reduced by a factor of
more than one half.
In conclusion, the results show that the table design algorithm of GAP-C works
well in practice. For several non-trivial GAP-L programs the computed table
configurations yield a better runtime and less memory usage than versions using a full
table configuration or even an expert table configuration.

5.3.8 Type Checking

Type checking means checking for type errors in GAP-L programs. A type error
occurs, when e.g. a function symbol in the signature declaration has two arguments,
but is used with three arguments in the grammar, or when the third argument of
a function symbol is declared as of type int in the signature and in the algebra
the third argument is of type char. Another example is a non-terminal with two
alternatives on the right hand side of different types. The compiler has to detect
such errors, because it does not know how to generate a correct parser from them.
When programming ADP in Haskell-ADP, the Haskell compiler or interpreter
does type-inference on the input program. Since the Haskell type inference system
does not know anything about ADP, the type inference error message may expose
implementation details of ADP Domain Specific Language constructs, in the case
of a type error in ADP code. Such an error message may obfuscate the real location
and the perhaps simple cause of the error. Consider this correct grammar
snippet:

formula = mult(formula, times, formula)

If we delete the second argument of the function symbol mult, a simple error is
introduced and yields this type inference message in the interactive Haskell
interpreter hugs:

ERROR "El2.lhs":116 - Type error in application
*** Expression     : add <<< formula ~~- plus ~~~ formula |||
                     mult <<< formula ~~- formula
*** Term           : add <<< formula ~~- plus ~~~ formula
*** Type           : (Int,Int) -> [Char]
*** Does not match : (Int,Int) -> [Char -> Char]

Error:
    mult(formula, formula) # h ;
    ^--^
e2.gap:186.13-16: Function mult has 2 arguments, but
Error:
    answer mult(answer, alphabet, answer);
           ^--^
e2.gap:9.10-13: it is defined with 3 arguments here.

Figure 5.24: Example of a type error message of GAP-C. The function symbol
application in the grammar misses the second argument.

When integrating a special purpose type checking algorithm into the compiler,
it is possible to derive more useful error messages in type error cases. The GAP-C
implementation succeeds in displaying the exact location and context of type errors
where the GAP-L programmer has to fix the error. See Figure 5.24 for an example
error message.
The type checking analysis of GAP-C is divided into three phases. First, the
grammar is checked against the signature. Second, each algebra is checked against
the signature. And last, types in the body of algebra functions are checked.
If type errors are detected in one phase, then the following type checking phases
are skipped. Otherwise, the compiler would display a lot of redundant type error
messages which reference the same error, but report it in different type contexts.
Such messages would decrease the signal to noise ratio of the error message output.
Consider e.g. a GAP-L program where x algebras correctly implement the signature.
If the grammar uses symbols from the signature in an erroneous way, then the type
checking of the grammar against each algebra necessarily finds the same errors.
The difference is that sort symbols are replaced by concrete types.
The type checking of GAP-C is related to the ADP Typechecker [44]. It started
as a standalone program and was later integrated into the old ADPC. As part
of the ADPC it can check ADPC-ADP language programs, but no Haskell-ADP
programs. In comparison to the GAP-C type checking design, it does not stop
checking algebras against the grammar if the grammar uses signature functions
incorrectly.
Checking the grammar against the signature means that the signature declarations
are inserted into the grammar data structure, i.e. the type checking algorithm
traverses the grammar data structure and at each object that uses a function symbol,
it is looked up in the signature. The return type of the signature function is
propagated upwards in the data structure and the types of the arguments are
propagated downwards. A type error is found, if one propagated type information does
not match another. Other errors are present, if a function symbol is not declared
in the signature or the number of arguments differs.
Checking an algebra against the signature means checking each algebra function
definition against the corresponding signature function symbol declaration. The
algebra definition contains a mapping between signature sorts and alphabet type
to concrete types. Thus, during the checking this information is looked up in a
symbol table.
The body of an algebra function is not directly checked by GAP-C. Instead, the
compiler generates the algebra function code and includes line and file position
pragmas in the output. These pragmas are understood by a C++ compiler: if the
C++ compiler detects type errors in the generated algebra code they are reported
in the context of the GAP-L source program. This strategy eliminates the need
to implement a classic type checker for imperative algebra code inside GAP-C. A
C++ compiler already does a good job at type checking that code.
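The first phase, checking a function symbol application in the grammar against the signature, can be sketched as a lookup followed by an arity comparison. The dictionary layout, the function name check_application and the message wording are assumptions for illustration and do not reproduce the GAP-C data structures.

```python
# Hypothetical signature: function symbol -> declared argument sorts.
signature = {"mult": ("answer", "alphabet", "answer")}


def check_application(name, args, signature):
    """Return a list of error messages for one function symbol application."""
    errors = []
    if name not in signature:
        errors.append(f"Function {name} is not declared in the signature.")
    elif len(args) != len(signature[name]):
        errors.append(
            f"Function {name} has {len(args)} arguments, but "
            f"it is defined with {len(signature[name])} arguments.")
    return errors


good = check_application("mult", ["formula", "times", "formula"], signature)
bad = check_application("mult", ["formula", "formula"], signature)
unknown = check_application("add", ["formula"], signature)
```

The second call corresponds to the situation of Figure 5.24, where the application in the grammar misses one argument.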

5.3.9 List analysis

The list analysis algorithm takes the results of the algebra characteristics analysis
(Section 2.2.3) and the grammar data structure as input and computes the worst-case
list sizes resulting from symbol parser calls at different elements of the grammar
data structure. Using fixed point iteration, the algorithm does several depth-first
traversals of the grammar data structure to propagate list size influencing factors
until the computed list sizes do not change anymore. In the beginning the worst-case
list sizes of the non-terminals are initialized with n, which denotes the length
of the input.
During a traversal the following rules are applied:

lsize(X = Y;) = lsize(Y)    (5.11)
lsize(X # h) = lsize(h)    (5.12)
lsize(X | Y) = lsize(X) + lsize(Y)    (5.13)
lsize(f(a_1, ..., a_k)) = lsize(a_1) · ... · lsize(a_k) · n^b    (5.14)
lsize(<X_1, ..., X_k>) = lsize(X_1) · ... · lsize(X_k)    (5.15)

The exponent b denotes the number of unrestricted moving index boundaries
in the right hand side symbol context inside and outside of f. Looking at the
context of a single track, an unrestricted moving index boundary is introduced if
two symbols with maximal yield size of n are placed side by side, possibly interleaved
with constant yield sized symbols and nested function symbol applications. Each
additional maximal yield sized symbol adds another index boundary. The number
of moving boundaries of multiple tracks multiply with each other.
The role of an objective function influences the worst-case list-size of a non-terminal,
as computed by the algebra characteristics computation (Equation 5.11).
In the case of a scoring algebra, the non-terminal with an objective function on the
right hand side then has a worst-case list size of 1.
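Rules 5.11 to 5.14 can be tried out on a small expression tree. The sketch below is a simplification under stated assumptions: it tracks only the exponent d of a worst-case list size n^d, so a sum becomes a maximum and a product becomes a sum of exponents; terminal parsers are assumed to return one value; and the tuple encoding of the tree as well as the flag scoring are hypothetical.

```python
def lsize_degree(node, nt_degree, scoring=True):
    """Return d such that the worst-case list size of node is in O(n^d)."""
    kind = node[0]
    if kind == "terminal":          # assumed: a terminal parser yields one value
        return 0
    if kind == "link":              # X = Y;            (rule 5.11)
        return nt_degree[node[1]]
    if kind == "choice":            # X # h             (rule 5.12)
        if scoring:                 # scoring objective function: one result
            return 0
        return lsize_degree(node[1], nt_degree, scoring)
    if kind == "alt":               # X | Y             (rule 5.13)
        return max(lsize_degree(c, nt_degree, scoring) for c in node[1])
    if kind == "apply":             # f(a_1, ..., a_k)  (rule 5.14)
        _, args, b = node
        return sum(lsize_degree(a, nt_degree, scoring) for a in args) + b
    raise ValueError(f"unknown node kind: {kind}")


# (f(CHAR) | g(A, A) with one moving boundary) # h, with lsize(A) = n
rhs = ("choice", ("alt", [
    ("apply", [("terminal",)], 0),
    ("apply", [("link", "A"), ("link", "A")], 1),
]))
with_scoring = lsize_degree(rhs, {"A": 1}, scoring=True)
with_identity = lsize_degree(rhs, {"A": 1}, scoring=False)
```

Under a scoring objective function the list collapses to a single element, while an identity objective function, as in a pretty printing algebra, keeps the full n^3 list of the second alternative.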

The results of list-size analysis are used in two ways. First, the use of lists is
unnecessary at locations where a worst-case list-size of 1 is detected, and is thus
eliminated. This leads to more efficient code, since the memory and cache is used
more efficiently and the list-accesses include more overhead. Second, the compiler
checks for missing objective function applications on the right hand side of
non-terminal rules. If an objective function is missing then it is checked, whether the
application of an objective function at that location would reduce the worst-case
list-size and improve the asymptotic runtime of the resulting program. If this is
the case, a warning message with the exact location of possible objective function
application is printed.

5.3.10 Dependency analysis

The compiler supports the generation of two different styles of parsers: top-down
Unger-style parsers and bottom-up CYK-style parsers. In top-down parsing, the
control flow of the parser functions implicitly fills a table entry of a tabulating
non-terminal parser before it is accessed by a computation of another table entry. In
bottom-up parsing, the tables are filled explicitly in a main loop. This main loop
has to satisfy two conditions. First, smaller entries for smaller sub-words have to be
computed before larger sub-words are addressed. Second, for each sub-word a table
entry of one table has to be computed before it is accessed by another non-terminal
parser computation for the same sub-word. For more details on the different parsing
schemes see Section 5.4.1. The first condition is satisfied by generating a correct
loop structure. The results of the dependency analysis are needed to generate an
ordering of non-terminal tabulate calls from the inner CYK-loop that satisfies the
second condition.
In the following it is assumed for simplicity that the input is a one-track grammar.
If a GAP-L program contains multiple tabulated non-terminals, then it is possible
that computation of a table entry (i, j) of one non-terminal depends on the
computed entry (i, j) of another table. In bottom-up parsing the compiler has to
derive these dependencies and generate the parser results table filling calls from the
main CYK-loop in a non-conflicting order.
A tabulating non-terminal parser A depends on another tabulating non-terminal
parser B, if there is a derivation from A to B and for a call of A for the sub-word
(i, j), the parser B is called for the same sub-word (i, j), i.e. during the derivation
the left and right context of the call location of B is empty. Figure 5.25 shows two
example grammar snippets.
The dependency derivation algorithm works on the grammar data structure and
depends on the results of the yield size analysis (Section 5.3.3), where the yield size
of each object is computed. The algorithm works in two phases. First, for each
non-terminal the algorithm does a depth-first traversal as long as the left and right
context of a link to another non-terminal is empty. If both contexts are empty,
then the dependency is recorded in a list. A context is empty, if all neighboring
leaves in the derivation tree have a minimal yield size of 0. In the second phase, the
resulting list is sorted topologically. Figure 5.26 contains the pseudo-code of the
algorithm. For the topological sort, the collected dependencies must not contain
cycles. This is the case, because the dependency analysis is only executed, if the
loop analysis (Section 5.3.4) does not find a loop. The worst-case runtime of both
phases is in O(|V| + |E|), where V is the set of non-terminals and E is the set of
links in the grammar data structure.

A = f(REGION0, B);
B = g(CHAR);
(a) Parser P_A depends on parser P_B.

A = f(REGION, B);
B = g(CHAR);
(b) Parser P_A does not depend on parser P_B.

Figure 5.25: Two grammar snippets, where a parser execution dependency for one
sub-word is analyzed. The minimal yield sizes of the terminal parsers
REGION0 and REGION are 0 and 1, respectively.

Grammar::parser_deps:
  l = []
  foreach (nt in nts):
    nt->collect_deps(l)
  tsort(l)
  return l

Symbol::NT::collect_deps(l):
  Yield::Size left, right;
  foreach (alt in alts):
    alt->collect_deps(l, this, left, right)

Alt::Simple::collect_deps(l, n, left, right):
  foreach (arg in args):
    t = arg.next.ys + ... + args.last.ys
    arg->collect_deps(l, n, left, right + t)
    left += arg->ys

Alt::Block::collect_deps(l, n, left, right):
  foreach (alt in alts):
    alt->collect_deps(l, n, left, right)

Alt::Link::collect_deps(l, n, left, right):
  if left == right == ((0,_), ..., (0,_)):
    l.push_back((n, nt))

Alt::Multi::collect_deps(l, n, left, right):
  foreach (track in tracks):
    track->collect_deps(l, n, left[track], right[track])

Figure 5.26: Pseudo-code of the parser dependency collection algorithm.
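The two phases can be illustrated on the snippets of Figure 5.25. The sketch assumes that every non-terminal is tabulated, that each alternative is a flat list of symbol names, and that only direct links are followed; the function collect_deps and the dictionaries are hypothetical. The topological sort uses graphlib from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def collect_deps(rules, min_yield):
    """Phase 1: record A -> B if B is called with an empty left/right context.

    rules maps a non-terminal to its alternatives, each a flat list of
    symbol names; min_yield maps every symbol to its minimal yield size.
    """
    deps = {nt: set() for nt in rules}
    for nt, alts in rules.items():
        for alt in alts:
            for pos, sym in enumerate(alt):
                if sym not in rules:
                    continue  # terminal parser, nothing to tabulate
                context = alt[:pos] + alt[pos + 1:]
                if all(min_yield[s] == 0 for s in context):
                    deps[nt].add(sym)
    return deps


min_yield = {"REGION0": 0, "REGION": 1, "CHAR": 1, "A": 1, "B": 1}

# Figure 5.25(a): A = f(REGION0, B); B = g(CHAR);
deps_a = collect_deps({"A": [["REGION0", "B"]], "B": [["CHAR"]]}, min_yield)
# Figure 5.25(b): A = f(REGION, B); B = g(CHAR);
deps_b = collect_deps({"A": [["REGION", "B"]], "B": [["CHAR"]]}, min_yield)

# Phase 2: a table must be filled before the tables that depend on it.
fill_order = list(TopologicalSorter(deps_a).static_order())
```

For snippet (a) the table of B has to be filled before the table of A for the same sub-word; for snippet (b) no ordering constraint is recorded.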

5.3.11 Non-terminal inlining

The compiler supports inlining of non-terminal symbols. Inlining a non-terminal A
means that it is removed from the grammar and the non-terminal call is replaced
by a copy of the right hand side of A at each calling location. This grammar
transformation is only possible, if the inlined non-terminal is not part of a cycle.
Inlining a non-terminal may increase or decrease the practical runtime of the generated
program. The code generation phase implements a non-terminal parser as
a code function for each non-terminal. If the overhead of a function call is more
expensive than the direct computing of the function body, then inlining improves
the constant factors of the runtime. In the inlining phase, the compiler inlines only
those non-terminals which are not part of a cycle and contain a simply structured
right hand side. A right hand side is considered simply structured, if it just contains
one alternative, no objective function application and a worst-case answer list size
of 1. Because of this simple structure, the algorithm approximates that inlining
improves the runtime of the resulting code, i.e. inlining depends on the computed
data from the list analysis. Depending on the selected product, an objective
function application could be present in the source grammar, but would be removed
by a previous objective function elimination phase.
The code generation subsystem of the compiler does not contain more low-level
inlining phases for inlining very small generated code functions or runtime library
functions. Since current C++ compilers already include good general purpose inline
optimization phases, GAP-C does not need to duplicate this effort for more low-level
code. It is sufficient that the runtime library code in question and the generated
low-level code is structured in such a way that a generic C++ inline optimizer is
not limited in its operation. When an inlining should be considered by the compiler
of the generated code, it needs access to the function definitions in all translation
units and the function must not be too large.

5.3.12 Index analysis

The index analysis uses the results from the yield size analysis (Section 5.3.3) and
the table dimension analysis (Section 5.3.6) as input and creates index expressions
for the elements of the grammar data structure. In multi-track programs, each
track is processed independently, because the index boundaries of one symbol access
of one track do not influence the boundaries of the other tracks. For each
non-terminal the index analysis does a depth-first traversal of the data structure
that represents the right hand side. When on the right hand side two symbols with
the minimal yield size unequal to the maximal yield size are horizontally side by
side, then a new moving index boundary is detected and a new index-variable is
created. The detection of moving boundaries keeps track of nested function symbol
applications and interleaved symbols with constant minimal and maximal yield
sizes. Each new index variable introduces a new loop construct that contains the
upper and lower bounds of this index depending on the outer indices of the outer
non-terminal, which are needed for the code generation phase. When a symbol
with minimal yield size equal to maximal yield size is traversed, then the value
is collected to compute the index boundaries of the following symbols as tight as
possible.
Figure 5.27 shows an example of a GAP-L grammar snippet and resulting indices
of parser calls.
During index analysis indices are eliminated if the table dimension analysis has
found one or more indices of a non-terminal parser being constant.
Another part of the index analysis is the generation of conditional expressions
that consider implicit and explicit yield size limits. In code generation they are
then used to generate if-statements that guard against unnecessary executions of
code blocks depending on the size of the parser sub-word argument.

iloop = il(BASE, BASE, REGION with maxsize(30), closed,
           REGION with maxsize(30), BASE, BASE) # h ;
(a) GAP-L non-terminal

for (k_0 = i+3; k_0 <= j-10 && k_0 <= i+32; ++k_0)
  for (k_1 = j-(k_0+7) >= 32 ?
             j-32 : k_0+7; k_1 <= j-3; ++k_1)
    BASE(i, i+1)
    BASE(i+1, i+2)
    REGION(i+2, k_0)
    closed(k_0, k_1);
    REGION(k_1, j-2)
    BASE(j-2, j-1)
    BASE(j-1, j)
(b) index pseudo-code

Figure 5.27: Grammar snippet and the computed indices for symbol accesses by the
index analysis algorithm.
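The loop bounds of Figure 5.27(b) can be cross-checked against the yield size limits they are derived from. The sketch transcribes the loop nest and compares it with a brute-force enumeration. It assumes that REGION has a minimal yield size of 1 and that closed has a minimal yield size of 7; the latter is only inferred from the bound k_0+7 in the figure. The function names are hypothetical.

```python
def iloop_splits(i, j):
    """Enumerate (k_0, k_1) exactly as the loop nest of Figure 5.27(b)."""
    k0 = i + 3
    while k0 <= j - 10 and k0 <= i + 32:
        k1 = j - 32 if j - (k0 + 7) >= 32 else k0 + 7
        while k1 <= j - 3:
            yield k0, k1
            k1 += 1
        k0 += 1


def brute_force(i, j):
    """Enumerate the same splits directly from the yield size limits."""
    return [(k0, k1)
            for k0 in range(i, j + 1)
            for k1 in range(k0, j + 1)
            if 1 <= k0 - (i + 2) <= 30     # first REGION with maxsize(30)
            and k1 - k0 >= 7               # closed, assumed minimal size 7
            and 1 <= (j - 2) - k1 <= 30]   # second REGION with maxsize(30)


splits = list(iloop_splits(0, 50))
```

Both enumerations visit the same index pairs in the same order, which shows that the tight bounds skip only those splits that cannot yield a valid parse.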

5.4 Code Generation

The code-generation phase of GAP-C takes the AST and the results from the semantic
analyses as input and generates optimized target code for the backend.
The tasks of the code-generation phase are the generation of an efficient implementation
of yield parsing (Section 5.4.1), the generation of code that exploits
parallelism on shared memory architectures (Section 5.4.2) and the application of
several backtracing schemes where possible in the generated code (Section 5.4.3).
In Section 5.4.4 the Window-Mode feature is described and Section 5.4.5 presents
the code-generation for index-hacking constructs, for application domain specific
optimizations, and discusses their motivation.

5.4.1 Parsing Schemes

The compiler implements two different schemes for generating the non-terminal
parser code: top-down Unger-style parsing [53] and bottom-up CYK-style parsing
[58].

5.4.1.1 Top-Down

In top-down parsing, each non-terminal parser is generated as a function. At the
beginning of a parse, the axiom parser is called for the complete input as argument.
The parser code then recursively calls the referenced non-terminal parsers on the
right hand side for all possible splits of the current sub-word. When a non-terminal
parser is tabulating, then it tests at every call, if it was already called with the
same sub-word argument. If yes, then it returns the already computed result from
the table, else it computes the parse and saves it into the table. Figure 5.28(b)
shows the pseudo-code of a top-down non-terminal parser for a non-terminal from
an example grammar.
Using the results from the yield size and index analysis (Sections 5.3.3 and 5.3.12)
the top-down parser code generation eliminates recursions into unnecessary splits
of the sub-word argument that cannot return a valid parse, if e.g. a split would
yield a parser call with a sub-word argument of size greater than the maximal yield
size of the non-terminal.
The runtime of a top-down parser is in O(n^(l+m)) where n is the maximal track
length, l is the maximal number of dimensions of a non-terminal's table and m is
the number of non-restricted moving index boundaries on the right hand side of a
non-terminal. A non-restricted moving boundary is introduced if one input track
contains two links to symbols with a maximal yield size of n. Each additional
link to a symbol with maximal yield size of n introduces another moving boundary.
Thus, the top-down parser runtime corresponds to the equations in the runtime
computation analysis (Section 5.3.7.1).
The top-down parsing scheme is comparable to the parsing of the Haskell-ADP.
The Haskell-ADP parser combinators also work top-down. The on-demand table
entry checking is implicit, since Haskell uses lazy-evaluation.
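The top-down scheme with on-demand table checks can be sketched for Nussinov-style base pair maximization. This is a hand-written illustration, not code generated by GAP-C: the function name, the set PAIRS and the half-open sub-word convention [i, j) are assumptions, and functools.lru_cache plays the role of the result table.

```python
from functools import lru_cache

# Assumed base pairing rule (Watson-Crick and wobble pairs).
PAIRS = {("a", "u"), ("u", "a"), ("c", "g"), ("g", "c"), ("g", "u"), ("u", "g")}


def nussinov_top_down(seq):
    """Maximal number of base pairs, computed by memoized recursion."""

    @lru_cache(maxsize=None)            # table lookup before every computation
    def struct(i, j):
        if i == j:                      # empty sub-word: nil(EMPTY)
            return 0
        best = struct(i, j - 1)         # last base unpaired: right(struct, CHAR)
        for k in range(i, j - 1):       # base k pairs with base j-1: split(...)
            if (seq[k], seq[j - 1]) in PAIRS:
                best = max(best, struct(i, k) + struct(k + 1, j - 1) + 1)
        return best

    return struct(0, len(seq))          # axiom call on the complete input


pairs_found = nussinov_top_down("ggaacc")
```

Only sub-words that are actually reached by the recursion are computed and stored, which is the property the text attributes to top-down parsing.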

grammar nussinov uses Fold (axiom = struct) {

  struct = nil(EMPTY) |
           right(start, CHAR) |
           split(start, pair(CHAR, start, CHAR)
                        with basepairing) # h ;

}
(a) Grammar

comp_N(0, n)
comp_N(i, j):
  if (computed(N, i, j))
    return N[i,j]
  ...
  foreach k, i < k < j:
    ...
    tmp = comp_N(i, k-1)
          + comp_N(k+1, j-1)
    N[i,j] = max(N[i,j], tmp)
    ...
  return N[i,j]
(b) Top-Down

for j = 0; j < n; ++j
  for i = j+1; i > 1; i--
    compute_N(i-1, j)
compute_N(i, j):
  ...
  foreach k, i < k < j:
    ...
    tmp = N[i,k-1]
          + N[k+1,j-1]
    N[i,j] = max(N[i,j], tmp)
    ...
(c) Bottom-Up

Figure 5.28: Nussinov algorithm grammar and two pseudo-code skeletons for top-down
and bottom-up evaluation of a base pair maximization algebra.

106

5.4.1.2 Bottom-Up

In bottom-up parsing the tabulating non-terminal parsers are called explicitly from a main loop to fill the tables with parser results. The tables must be filled in the order of increasing sub-word size, because a smaller sized entry could be referenced to compute the parse result of a table entry. If there is more than one tabulating non-terminal parser, then the order of computing table entries for same-sized sub-words must take the parser dependencies into account (Section 5.3.10). Non-tabulating non-terminal parsers are called during the computation of table entries top-down as recursive functions. If the axiom is tabulated, then the parsing result is stored in the axiom table in the entry that represents the whole input parse (e.g. (0,n) in the single-track case). Else the axiom parser is called top-down with the whole input as argument. Figure 5.28(c) shows the pseudo-code of a bottom-up parser for a small single-track grammar.

The worst-case runtime of a bottom-up parser is the same as the runtime of a top-down parser. The main loop iterates over O(n^l) sub-words and for each sub-word, while computing the entries, O(m) non-restricted index-boundaries have to be considered. The code generated by ADPC uses bottom-up parsing.
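For comparison, a bottom-up variant of a Nussinov-style base pair maximization can be sketched as follows. This is an illustration under assumptions, not generated code: sub-words are half-open ranges seq[i:j], the pairing rule (Watson-Crick plus G-U) and the function name nussinov_bottom_up are chosen for the example. The loop order guarantees that every entry read on the right hand side has been written before.

```python
# Assumption: Watson-Crick and G-U wobble pairs count as base pairings.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_bottom_up(seq):
    """Maximal number of nested base pairs, filled by an explicit main loop."""
    n = len(seq)
    # table[i][j] holds the result for the sub-word seq[i:j]
    table = [[0] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        for i in range(j - 1, -1, -1):   # entry (j, j) is the empty word: 0
            best = table[i][j - 1]       # last character stays unpaired
            for k in range(i, j - 1):    # moving boundary k
                if (seq[k], seq[j - 1]) in PAIRS:
                    best = max(best, table[i][k] + 1 + table[k + 1][j - 1])
            table[i][j] = best
    return table[0][n]                   # entry for the whole input
```

No table check and no recursion stack are needed here, at the price of computing every entry.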

5.4.1.3 Benchmarks

The practical runtime, i.e. the constant factors, of the generated parsers may differ significantly in the two parsing schemes. On the one hand, top-down parsing introduces some overhead, because a recursion stack has to be administered and for each sub-word each tabulating non-terminal parser has to execute table checking. In bottom-up parsing the conditional code is eliminated and a stack is only needed for non-tabulated parsers. On the other hand, if the grammar of the program introduces sparseness, then the entries of the tables are accessed sparsely as well. During bottom-up parsing all table entries are computed. But in top-down parsing, sparseness may prune some derivations, which yields sparsely filled tables. Sparseness is introduced due to grammar filters (e.g. stackpairing) or non-parsable symbols on the right hand side of a non-terminal. The effect of sparseness depends on the used algebra as well. A well structured CYK-style loop is able to exploit caching and pre-fetching effects of the CPU, if for example a scoring algebra works on an elementary data type. These effects may outweigh reduced computation due to sparseness. However, for an algebra with more expensive operations, the top-down overhead may pay off in reducing the number of algebra computations. Conversely, in a dense grammar the top-down overhead would yield no benefit.

[Plots omitted: ratio Top-Down / Bottom-Up of execution time and memory usage over the sequence length, for (a) mfe and (b) shape ∙ mfe.]

Figure 5.29: Runtime and memory usage ratios of top-down and bottom-up parsers for two different products.

    closed = { stack | hairpin | leftB | rightB |
               iloop | multiloop }
             with stackpairing # h ;

Figure 5.30: Non-terminal rule that is protected by the stackpairing syntactic filter, i.e. the right hand side is only parsed if the first and second character of the sub-word form basepairings with the last and second-last one.

Figure 5.29 shows two plots of the top-down vs. bottom-up runtime ratio of the compiled ADPfold algorithm for two different algebra products. The ADPfold grammar is basically a GAP-L version of RNAfold [25]. It introduces sparseness, because large parts of the grammar are protected by stackpairing filters (Figure 5.30). The mfe algebra computes the minimum free energy, i.e. the energy contributions of substructures are added and the objective function minimizes over all values. In this case, the bottom-up computation is two times faster than the top-down computation, even if the grammar introduces some sparseness. The shape*mfe product computes the MFE for each shape in the search space. The number of shapes grows exponentially with the input size, i.e. the product objective function returns a list of shapes and the runtime of a parser for each sub-word increases from O(n) in the mfe case to O(n m^2), where m is the maximal list size of an argument parser. In this case, the overhead of top-down parsing does pay off and the top-down runtime and space usage are just half the bottom-up ones. During bottom-up parsing a lot of sub-word entries are computed which represent successful sub-parses but which are not used by any computation of bigger sub-word parses, because at that level a stackpairing filter returns false for all those bigger sub-words.

5.4.1.4 Argument Reordering

In both top-down and bottom-up parser code generation, the compiler does a reordering of the argument call order of function symbols on the right hand side of a non-terminal. The heuristic used there is to score each argument and then sort the arguments with decreasing score. An argument is scored higher than another if it is approximately more likely to be computed less expensively and thus may return an unsuccessful parse with less computation, i.e. terminal parsers are scored higher than non-terminal parsers and local grammar filters increase the score.
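The reordering heuristic can be sketched in a few lines. The concrete weights used by GAP-C are not given in the text, so the scoring function below is purely hypothetical; only the ordering it induces (terminal parsers before non-terminal parsers, filters raise the score) follows the description above. The names Argument, score and reorder are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Argument:
    name: str
    is_terminal: bool      # terminal parsers are cheap to evaluate
    num_filters: int = 0   # local grammar filters can reject early


def score(arg):
    # Hypothetical weights: only the induced order matters here.
    return (2 if arg.is_terminal else 0) + arg.num_filters


def reorder(args):
    """Evaluation order of the arguments, highest score first."""
    # sorted() is stable, so equally scored arguments keep their order.
    return sorted(args, key=score, reverse=True)
```

Evaluating the cheap and likely-to-fail arguments first lets the parser abandon an alternative before the expensive non-terminal calls are made.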

5.4.1.5 Sparseness

Pure top-down parsing exploits sparseness in a grammar, but pre-processing of the input and on-the-fly bookkeeping to eliminate moving index boundaries can exploit the sparseness on a higher level, using algebra and grammar properties. See Section 10.1 for a discussion.

5.4.1.6 CYK Loops

In the single-track case, the main CYK-style loop is constructed with two nested for-loops as displayed in Figure 5.28(c). Using results from the table dimension analysis (Section 5.3.6) the basic CYK-style loop is optimized during code generation. For example, the table entry filling code for a parser P does not need to be called for every sub-word (i,j), if the table dimension analysis shows that P needs only a constant sized table. In that case the tabulating parser only needs to be called for the complete sub-word (0,n). Analogous to that, a parser that only needs a linear table, because it is always called with a constant left index, does not need to be called from the innermost for-loop, but for every j. The CYK-style loop is specialized and rolled out during code-generation and the parser-calls are moved out of the nested loops as much as possible to save unnecessary calls to tabulating code. Figure 5.31 shows the pseudo-code of this optimization. If the asymptotically optimal runtime of the GAP-L program is in O(n^x), where x > 1, then this optimization does not change the asymptotic runtime of the code, but reduces the constant runtime factors of the generated code. Otherwise, if the asymptotically optimal runtime is in O(n), just generating the basic CYK-style loop instead of doing this optimization would yield a program with asymptotically suboptimal runtime. In practice, reducing constant runtime factors may yield significant runtime improvements.

    for (unsigned j = 0; j < n; ++j) {
      for (unsigned i = j + 1; i > 1; i--) {
        nt_tabulate_A(i-1, j);
      }
      // left index 0: entries of the linear table of B
      unsigned i = 1;
      nt_tabulate_A(i-1, j);
      nt_tabulate_B(i-1, j);
    }

    // right index n: entries of the linear table of C
    unsigned j = n;
    for (unsigned i = j + 1; i > 1; i--) {
      nt_tabulate_A(i-1, j);
      nt_tabulate_C(i-1, j);
    }

    // complete input (0, n): the only entry of the table of D
    unsigned i = 1;
    nt_tabulate_A(i-1, j);
    nt_tabulate_B(i-1, j);
    nt_tabulate_C(i-1, j);
    nt_tabulate_D(i-1, j);

Figure 5.31: Specialized 2-track CYK-style loop where non-terminal A needs a quadratic table, B and C a linear table and D a constant sized table.

The CYK-loop specialization is generalized in the multi-track case. Figure 5.32 shows the pseudo-code of the loop code generation algorithm. The optimization is applied recursively on the tracks and, recursively, loops with empty bodies are eliminated. Consider the pairwise sequence alignment algorithm as a two-track GAP-L program example. The general two-track CYK-style loop would consider all sub-words, i.e. iterating over O(n^4) index combinations of the two tracks. Applying this optimization yields a specialized loop that eliminates two of four indices that are constant and thus iterates over O(n^2) index combinations, which is the asymptotically optimal runtime of the algorithm. The resulting two-fold nested loop resembles a handwritten loop if manually implementing the control flow of the pairwise sequence alignment algorithm textbook recurrences in an imperative programming language.
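To see what the specialization of Figure 5.31 saves, the following sketch replays the rolled-out loop in Python and records which sub-words each tabulating parser is called for. It assumes the nesting described in the text (the calls for left index 0 sit inside the j-loop); the function name specialized_cyk_calls and the returned dictionary are illustrative only.

```python
def specialized_cyk_calls(n):
    """Replay the specialized loop and collect the (i, j) calls per parser."""
    calls = {"A": [], "B": [], "C": [], "D": []}
    for j in range(n):
        for i in range(j + 1, 1, -1):   # i = j+1 down to 2
            calls["A"].append((i - 1, j))
        i = 1                           # left index 0
        calls["A"].append((i - 1, j))
        calls["B"].append((i - 1, j))
    j = n                               # right index n
    for i in range(j + 1, 1, -1):
        calls["A"].append((i - 1, j))
        calls["C"].append((i - 1, j))
    i = 1                               # complete input (0, n)
    for name in "ABCD":
        calls[name].append((i - 1, j))
    return calls
```

Only A is called for all (n+1)(n+2)/2 sub-words; B and C are called n+1 times each and D exactly once, whereas the basic loop would call all four parsers for every sub-word.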

    partition_nts(tord, all, inner, left, right, track):
      foreach (nt in tord):
        if (!nt.is_cyk_left(track) && !nt.is_cyk_right(track) &&
            !nt.is_cyk_const(track))
          inner.push_back(nt);
        if (!nt.is_cyk_right(track) && !nt.is_cyk_const(track))
          left.push_back(nt);
        if (!nt.is_cyk_left(track) && !nt.is_cyk_const(track))
          right.push_back(nt);
        all.push_back(nt);

    print_cyk(tord, track):
      partition_nts(tord, all, inner, left, right, track);
      print("...");
      if (!inner.empty())
        print("for(..){ for(..) ... {");
        print_cyk2(inner, track);
        print("}");
        if (!left.empty())
          print_cyk2(left, track);
        print("}");
      if (inner.empty() && !left.empty())
        print("for(..){");
        print_cyk2(left, track);
        print("}");
      if (!right.empty())
        print("for(..){");
        print_cyk2(right, track);
        print("}");
      if (!all.empty())
        print_cyk2(all, track);

    print_cyk2(tord, track):
      if track == tracks:
        // generate nt tabulating calls
      else:
        print_cyk(tord, track+1)

Figure 5.32: Pseudo-code of the generic multi-track optimized CYK-style loop generation algorithm. The tord argument contains the table configuration topologically sorted according to the parser dependencies.

5.4.2 Parallelization

Today, multi-core CPUs and multi-socket computer systems are widely available. One can interpret this trend as a result of Moore's Law that states: “The complexity for minimum component costs has increased at a rate of roughly a factor of two per year [...]. Certainly over the short term this rate can be expected to continue, if not to increase.” [32] This means that every year more transistors are available for producing a CPU at the same costs as last year. Thus, putting more cores in one CPU-package is one way to utilize the increasing transistor count. Figure 5.33 shows that this law still fits with current developments.

[Plots omitted: (a) transistor counts of CPUs over the year, (b) number of cores per CPU package over the year.]

Figure 5.33: The increase of transistor counts in CPUs and the trend to integrate more cores into one CPU package.

As a consequence, new algorithms should scale well on parallel machines.

There are several parallel versions of dynamic programming algorithms on sequences. For example, [30] describes a parallel version of the pairwise sequence alignment algorithm. Since the computation of a table entry (i,j) depends on the values of the neighboring entries (i−1,j), (i,j−1) and (i−1,j−1), it is not possible to compute the entries in parallel in arbitrary order. The described parallelization scheme respects the table entry dependencies and the edit distance table is filled in a diagonalized fashion (diagonal after diagonal), because the computations of the entries on one diagonal do not depend on each other and thus it is possible to compute them in parallel. Another example is the parallel version of McCaskill's partition function algorithm [17], which uses a similar parallelization scheme. McCaskill's algorithm is a single-track O(n^3) algorithm that takes an RNA sequence as input.

The GAP-L compiler supports the generation of code that is parallelized and optimized for shared memory architectures. It aims at shared memory architectures, because the communication overhead of O(n^3) single-track DP algorithms is likely to decrease significantly the parallel efficiency of the resulting program in a message passing environment. In such an algorithm a moving boundary at the right hand side of a rule means that for computing the value for the current sub-word, O(n) values have to be considered, which are not locally available in the worst case. Even with a specialized low-latency message-passing network, the communication overhead is then prohibitive in comparison with local memory accesses. Besides that, shared memory architectures are an attractive target, because they are widely available.

To generate portable parallel code, the compiler generates code according to the OpenMP standard [7]. OpenMP is a freely available open standard that specifies language extensions for writing parallel programs for shared memory environments in C, C++ and Fortran. OpenMP constructs are pragmas, which declare how annotated language statements, like e.g. loops, should be parallelized by the compiler. A compiler that does not support OpenMP ignores these pragmas. The resulting program is then just single-threaded, but still yields correct results. OpenMP support is widely available in open-source and closed-source C/C++ compilers.

GAP-C generates code that computes the table entries diagonal after diagonal, as in the mentioned examples, to satisfy the entry dependencies. Figure 5.34(a) shows the dependencies of an entry in a ≥ O(n^3) single-track DP algorithm and Figure 5.34(b) shows the diagonal control flow of the table computations. The generated code does not compute on single entries of diagonals in parallel, but computes on diagonals of blocks of entries in parallel. The advantage of using blocks as smallest distribution unit is that during the computation of one or more blocks, a core could profit from caching and prefetching effects. Another advantage of this is that the synchronization overhead is reduced. The block size is configurable, but a local
113

Disregardingcachingeffects,itfollowsthatthemaximalspeedupofaprogram
is:

(5.17)

(5.18)

T1maxsu(n)=T1/n=n(5.17)
Definition13(Parallelefficiency).
)n(sueff(n)=maxsu(n)(5.18)
Definition14(Amdahl’sLaw).
1es(n)=(1−p)+np(5.19)
1(5.20)≤p−11es∗(n)=(1−p)+p+φ(n)(5.21)
nAmdahl’slaw[2]definestheexpectedspeedup(es)whenparallelizingasingle-
threadedprogramwherepisthefractionoftheprogramwhichcanbeperfectly
parallelized.1−pistheinherentsingle-threadedfractionoftheoriginalprogram
thatcannotbeparallelized.es∗describesthepracticalexpectedspeedup,whereφ
describescostfactorsthatarearesultofparallelization,likee.g.communicationor
erhead.vohronizationsyncFigure5.35showsplotsoftheexpectedparallelspeedupandefficiencyfordiffer-
entvaluesofp.Forexample,evenwhendisregardingextraparallelizationcosts(φ),
anassumedprogramwith95percentperfectlyparallelizedcodeandonly5percent
inherentlynon-parallelizeablecode,theexpectedparallelefficiencyrunningiton
10CPUsislessthan70percent.Thus,effectivelythecomputationtimeof3CPUs
cannotbeutilizedthroughparallelizationinthatcase.
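The 95 percent example can be checked numerically with Equations 5.19 and 5.21. The following sketch uses hypothetical function names; phi is passed as a plain number rather than as a function of n, which is a simplification made for the example.

```python
def expected_speedup(p, n, phi=0.0):
    """Amdahl's law: p is the parallelizable fraction, n the number of CPUs.

    phi models extra parallelization costs; 0.0 gives the ideal case.
    """
    return 1.0 / ((1.0 - p) + p / n + phi)


def expected_efficiency(p, n, phi=0.0):
    """Expected speedup relative to the maximal speedup n."""
    return expected_speedup(p, n, phi) / n
```

For p = 0.95 and n = 10 the expected speedup is 1 / 0.145, roughly 6.9, which corresponds to an efficiency just below 0.7, in line with the statement above.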
Figure 5.36 shows the benchmark results of running the GAP-L version of the ADPfold algorithm under the mfe algebra. ADPfold is an O(n^3) single-track algorithm that is an ADP version of the RNAfold algorithm [25]. RNAfold predicts the secondary structure of RNA molecules. The inputs are random uniformly distributed RNA sequences of size 4000. For comparison, plots of the expected speedup (as defined in Equation 5.19) of an assumed program P with 98 percent perfectly parallelizable code are included in the display of the results. The results show that the generated code scales well on different machines. The plots of the parallel speedup and efficiency fit the plots of the expected speedup and efficiency of the assumed program P.
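The diagonal-after-diagonal fill order discussed in this section can be made concrete with the pairwise edit distance. The sketch below is sequential Python, not the blocked OpenMP code that GAP-C emits; it only demonstrates that all entries on one anti-diagonal depend solely on the two previous diagonals, so the inner loop is the part that could be distributed over cores. The function name and the unit edit costs are assumptions.

```python
def edit_distance_wavefront(a, b):
    """Edit distance with unit costs, filled anti-diagonal by anti-diagonal."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    # s = i + j numbers the anti-diagonals; entries with equal s are
    # independent of each other and could be computed in parallel.
    for s in range(2, m + n + 1):
        for i in range(max(1, s - n), min(m, s - 1) + 1):
            j = s - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # entry above
                          d[i][j - 1] + 1,         # entry to the left
                          d[i - 1][j - 1] + cost)  # diagonal neighbor
    return d[m][n]
```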

[Plots omitted: (a) speedup and (b) parallel efficiency over #CPUs (2 to 10), with one curve each for 80%, 90%, 95%, 98% and 99%.]

Figure 5.35: Plots of the parallel speedup and efficiency of different parallelizable ideal programs according to Amdahl's Law for various x percent. An x percent plot means that it represents a program of which x percent of the code is perfectly parallelizable. Communication overhead is not taken into account.

[Plots omitted: (a) speedup and (b) parallel efficiency over #CPUs for 4×4 Xeon, 4×8 Opteron and 8×8 Niagara T1, together with the 98% reference curve.]

Figure 5.36: Parallel speedup and efficiency of running the compiled ADPfold GAP-L program on different machines (see text). For comparison, plots of an ideal program with 98 percent perfectly parallelizable code are included. The inputs of ADPfold are uniformly distributed random sequences of size 4000. All machines were running Solaris 10 and Sun Workshop Pro 12 was used as C++ compiler.

5.4.3 Backtracing

In dynamic programming, backtracing denotes the process of tracing back the optimization decisions for a computed DP-table, and building a pretty-printed string representation of that path during the backtrace. A forward dynamic programming computation computes the table which is the input of the backtracing phase. Consider for example the pairwise sequence alignment algorithm. In the forward computation the edit distance is minimized for two sequences. The backtracing phase works on the distance-value table and during backtracing, the actual alignment is constructed that has the minimal computed edit distance.
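The two phases of the alignment example can be sketched as follows. This is a textbook-style illustration with unit edit costs, not code produced by GAP-C; the function name align and the gap character "-" are assumptions of the example.

```python
def align(a, b):
    """Return (edit distance, aligned a, aligned b) using unit costs."""
    m, n = len(a), len(b)
    # Forward computation: fill the distance table.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost, d[i - 1][j] + 1, d[i][j - 1] + 1)

    # Backtracing phase: retrace one optimal sequence of decisions.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return d[m][n], "".join(reversed(top)), "".join(reversed(bottom))
```

The backtracing loop only visits the entries on one optimal path, so the pretty-printed alignment is built for a single candidate instead of for every table entry.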
Backtracing is not a concept of the ADP framework. The effect of backtracing, i.e. computing a representation of the optimal path, is implemented via products of algebras in ADP. Conceptually, different algebras for scoring and pretty-printing are specified, under which the candidates of the search space are evaluated. To compute the string representation of the optimal scored candidates, the lexicographic product (Definition 6) is used. For example:

    score ∙ pretty    (5.22)

where score is a scoring algebra and pretty has an enumerative role.

Backtracing is then an optimization of the GAP-L compiler in the code generation. The compiler inspects the specified product and splits it into two parts that are computed in a forward and backtrace computation, if the product satisfies certain conditions. In the example the generated code for the forward computation uses just algebra score for computation and the generated backtracing code uses the pretty algebra for backtracing. The backtrace code generation phase tries to split the product into two parts

    A ∙ B    (5.23)

where A is an algebra or an algebra product of role scoring and B is an algebra or an algebra product of role enumerative. If this is not possible, then no backtracing code is generated. Disabling the backtracing optimization yields code that has the same asymptotic runtime, but the constant factors of the practical runtime are higher. Without backtracing the computation of B is done in the forward computation, i.e. B is computed for sub-candidates, which are not part of the optimal solution and hence are not computed during backtracing. Consider e.g. a single-track O(n^3) RNA folding algorithm, with the optimal backtracing phase in O(n^2). Starting from table entry (0,n), the backtracing traverses O(n) entries above and O(n) entries to the left in the worst case. At each entry a non-restricted index boundary implies a lookup of O(n) values.
GAP-C supports several backtracing schemes for code-generation: optimal, co-optimal, sub-optimal and stochastic backtracing.

Figure 5.37(b) shows the pseudo-code for an optimal-backtrace code for a small grammar example (Figure 5.37(a)). Optimal backtracing means that in the situations with more than one optimal sub-solution, only one is considered and the others are ignored. In co-optimal backtracing every optimal sub-solution is taken into account, such that the backtracing phase may return several co-optimal results. This is consistent with the definition of the lexicographic product operation in ADP (Definition 6). From the definition of the product's objective function it directly follows that all co-optimal candidates are processed. Figure 5.37(c) shows the pseudo-code of co-optimal backtrace code for the example. Instead of choosing the first optimal split and returning one optimal solution, all splits are considered and a list of optimal solutions is returned.

    formula = number |
              add(formula, plus, formula) |
              mult(formula, times, formula)
              # h ;

    (a) GAP-L grammar

    string bt_formula(i, j):
      score = formula[i, j]
      if number[i, j] == score:
        return bt_number(i, j)
      foreach k, i < k < j:
        if add(formula[i, k], plus(k, k+1),
               formula[k+1, j]) == score:
          return add_pp(bt_formula(i, k),
                        plus(k, k+1),
                        bt_formula(k+1, j))
      ...

    (b) Optimal backtrace

    [string] bt_formula(i, j):
      ret = []
      score = formula[i, j]
      if number[i, j] == score:
        ret.add(bt_number(i, j))
      foreach k, i < k < j:
        if add(formula[i, k], plus(k, k+1),
               formula[k+1, j]) == score:
          ls = bt_formula(i, k)
          rs = bt_formula(k+1, j)
          foreach l in ls, r in rs:
            ret.add(add_pp(l, plus(k, k+1), r))
      ...
      return ret

    (c) Co-Optimal backtrace

Figure 5.37: An example for a GAP-L grammar and the pseudo-code for optimal and co-optimal backtracing that follows the grammar structure.

The generated code for co-optimal backtracing is similar to the pseudo-code examples, because it also uses a set of recursive functions. Using recursive functions has the advantage of eliminating the explicit administration of a stack during backtracing, since the function call stack is used for that. In addition, data-structures are generated that represent a backtrace.

Figure 5.38: Diagram of the classes that are used for constructing the backtrace data-structure. Class names with a star represent a set of classes that are generated by the compiler. The other classes are part of the run-time library.

Figure 5.38 shows the diagram of the classes used to construct a backtrace in the generated code. The class names that contain a star are generated by the compiler and the other classes are part of the runtime library, because they are independent of the source program. For each function symbol of the algebra a Backtrace_Fn_* class is generated and objects of that class represent the choice of that function during the backtracing. For each non-terminal symbol a class Backtrace_NT_*_Front is generated that represents a call of a non-terminal parser in the backtrace. It may reference a list of Backtrace_NT_Back_Base objects that represent co-optimal backtraces for sub-words. Considering the previous grammar example, Figure 5.39 shows a simplified version of the generated code that creates objects of the backtracing data-structure. The function bt_proxy_nt_formula generates a list of score and Backtrace_NT_formula_Front object tuples and the function add_bt returns score and Backtrace_Fn_add objects. The function push_back_min_other is an optimized list-append version that only appends if the score is less than the top of the list, and because of this the special objective function h_bt is just the identity. At the end of the recursive backtrace function a call execute_backtrack_k triggers the backtrace member functions of the collected objects that continue the recursion. After the backtracing is finished and a list of backtrace path representing Backtrace objects is returned for the input, the evaluate member function is called to actually call the algebra functions of B recursively according to the constructed backtrace path.

    typedef (score, Backtrace) bt-tupel
    typedef [bt-tupel] bt-list
    ...
    bt-list ret_2 = bt_proxy_nt_formula(i, k_0);
    if (is_not_empty(ret_2))
      foreach (x_0, ret_2)
        foreach (x_2, ret_4)
          bt-tupel ans = add_bt(x_0, a_1, x_2);
          push_back_min_other(answers, ans);
    ...
    bt-list eval = h_bt(answers);
    bt bt_list = execute_backtrace_k(eval);
    return bt_list

Figure 5.39: Simplified code of the generated co-optimal backtracing code by GAP-C for the grammar example.

This backtracing scheme resembles top-down parsing (Section 5.4.1). The difference is that the part B of the product is replaced by a generated backtracing algebra and the parsing is guided by the results of the forward computation. The execution of the backtracing code is delayed after the objective function execution with the help of proxy-objects of the backtracing data structure.
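Co-optimal backtracing as described above can be sketched for a Nussinov-style base pair maximization. This is an illustration, not generated code and not the Backtrace_* class machinery: the forward phase fills a score table, and the recursive backtrace follows every alternative whose score equals the table entry, returning all co-optimal dot-bracket strings. The pairing rule and the names fill and backtrace are assumptions.

```python
# Assumption: Watson-Crick and G-U wobble pairs count as base pairings.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fill(seq):
    """Forward computation: table[i][j] = maximal pairs in seq[i:j]."""
    n = len(seq)
    table = [[0] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        for i in range(j - 1, -1, -1):
            best = table[i][j - 1]
            for k in range(i, j - 1):
                if (seq[k], seq[j - 1]) in PAIRS:
                    best = max(best, table[i][k] + 1 + table[k + 1][j - 1])
            table[i][j] = best
    return table


def backtrace(seq, table, i, j):
    """All co-optimal dot-bracket strings for the sub-word seq[i:j]."""
    if i == j:
        return [""]
    score = table[i][j]
    results = []
    if table[i][j - 1] == score:          # last character unpaired
        results += [s + "." for s in backtrace(seq, table, i, j - 1)]
    for k in range(i, j - 1):             # seq[k] pairs with seq[j-1]
        if (seq[k], seq[j - 1]) in PAIRS and \
                table[i][k] + 1 + table[k + 1][j - 1] == score:
            for left in backtrace(seq, table, i, k):
                for inner in backtrace(seq, table, k + 1, j - 1):
                    results.append(left + "(" + inner + ")")
    return results
```

Returning only the first matching alternative instead of collecting all of them turns this into the optimal backtracing scheme.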
In suboptimal backtracing not only the optimal scored candidates are considered during backtracing, but all candidates whose score x deviates by at most δ from the optimal score, i.e. x ≤ opt + δ for minimization or x ≥ opt − δ for maximization objective functions, where opt denotes the optimal score. Since usually the search space of a GAP-L program is of exponential size, depending on the value of δ the runtime of suboptimal backtracing is exponential. The generated code by GAP-C is similar to the co-optimal case. The difference is that the construction of score and backtrace-object tuple lists takes the δ value into account. This scheme is similar to that of RNAsubopt [57]. RNAsubopt uses an explicit stack during backtracing.

Table 5.4: Examples of Vienna Strings and their shapes (at shape level 5).

    Shape    Structure
    []       ....(((.....)))...
    []       ....(((...((....))....)))...
    [[][]]   ..(((...((...))...((((...))))...)))..

5.4.3.1 Stochastic Backtracing

In stochastic backtracing, the score components of the score and backtrace-object tuples are interpreted as a discrete probability distribution and the backtracing objective function chooses a candidate from the candidate list at random under this distribution. Stochastic backtracing means sampling candidates from the search space under an algebra B and according to a probability distribution, defined by algebra A (Equation 5.23).

A use-case for stochastic backtracing is the situation where computing the algebra product C ∙ D is expensive, because computing C is expensive, but it is possible to compute D in polynomial time and D is a synoptic algebra that defines a probability distribution for the candidates. An alternative to directly computing C ∙ D is to compute D and then use D during stochastic backtracing. Thus, an approximation of the result of C ∙ D is obtained via several samplings from the search space. An example of this is the computation of the product shape ∙ pfunc for a single-track O(n^3) RNA-folding algorithm like ADPfold (it is a GAP-L version of RNAfold [25]). The algebra shape has a classifying role and the algebra pfunc has a synoptic role. Algebra pfunc computes and sums over the partition function values of the candidates. The ADPfold grammar is semantically non-ambiguous under the Vienna String representation [25]. Each candidate from the search space has a unique Vienna String representation. A Vienna String prints a dot for an unpaired base in the input and a matched pair of brackets for a base pairing. A shape neglects small differences in Vienna Strings [22], such that e.g. all candidates from the search space with exactly one hairpin structure have different Vienna Strings, but the same shape. Table 5.4 shows examples of Vienna Strings and shape strings of various candidates. Using Boltzmann statistics [54, 55] the partition function value of the search space S is defined as

\[ Q = \sum_{s \in S} e^{-\beta E_s} \tag{5.24} \]

where E_s is the energy of the structure s. Accordingly the partition function value of a shape X is defined as the sum over all candidate values with the same shape.

\[ \mathrm{pf}(X) = \sum_{s \in X} e^{-\beta E_s} \tag{5.25} \]

Then the shape-probability is defined as:

\[ p(X) = \frac{\mathrm{pf}(X)}{Q} \tag{5.26} \]

\[ l_{\mathit{candidates}} = [\, hl(\ldots),\ ml(\ldots, il(\ldots), \ldots),\ \ldots \,] \tag{5.27} \]

\[ l_{\mathit{pf}} = [\, 1,\ 23,\ 42,\ \ldots \,] \tag{5.28} \]

Figure 5.40: Conceptual list of candidate structures and their corresponding partition function values. The values define the discrete probability distribution during sampling.

The computation of product shape ∙ pfunc provides the shape probabilities of an input string. The search-space of the ADPfold grammar is of exponential size and the shape space size depends on the search space size. The shape abstraction reduces the search space size in comparison to the Vienna Strings space, but the number of shapes still grows exponentially with the input size [29]. Thus, computing shape ∙ pfunc leads to an exponential runtime in the worst-case. Computing pfunc is in O(n^3). Thus, using stochastic backtracing, the complete runtime is then O(n^3 + i n^2), where O(n^2) is the worst-case runtime of one stochastic backtracing and i is the number of sampling iterations. Figure 5.40 shows an example of the correspondence of partition function values and search space candidates.
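The sampling step itself is small. The sketch below draws candidates from a list whose unnormalized weights play the role of the partition function values in Figure 5.40 and estimates their probabilities from the sample frequencies. It is a generic weighted sampling routine, not the filter shipped with GAP-M; the function names, the candidate labels and the fixed seed are assumptions of the example.

```python
import random
from collections import Counter


def sample_candidate(candidates, weights, rng):
    """Choose one candidate with probability weight / sum(weights)."""
    r = rng.random() * sum(weights)
    acc = 0.0
    for cand, w in zip(candidates, weights):
        acc += w
        if r < acc:
            return cand
    return candidates[-1]   # guard against floating point rounding


def estimate_probabilities(candidates, weights, iterations, seed=0):
    """Approximate the candidate probabilities from repeated sampling."""
    rng = random.Random(seed)
    counts = Counter(sample_candidate(candidates, weights, rng)
                     for _ in range(iterations))
    return {c: counts[c] / iterations for c in candidates}
```

With the weights 1, 23 and 42 the exact probabilities are 1/66, 23/66 and 42/66; the estimates approach these values as the number of iterations grows.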
To analyze the errors of shape probabilities obtained via stochastic backtracing, the following expression is used:

\[ \delta(u, x, y) = \sum_{S \in S_x \cup S_y} \left| p_x(S) - p_y(S) \right| \tag{5.29} \]

where u is an RNA sequence, x and y are two shape probability computation methods, S_x and S_y are the shape spaces of the two methods for sequence u,

\[ 0 \le \delta(u, x, y) \le 2 \tag{5.30} \]

and δ(u,x,x) = 0. A δ of 2 means that the shape spaces of the two methods do not share any shape.
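Equation 5.29 translates directly into code when each method's result is represented as a mapping from shape strings to probabilities. That representation, the function name and the convention that a shape missing from one result has probability 0 are assumptions of this sketch.

```python
def shape_probability_deviation(px, py):
    """Sum of absolute probability differences over the union of shapes."""
    shapes = set(px) | set(py)
    # A shape that one method did not report counts with probability 0.
    return sum(abs(px.get(s, 0.0) - py.get(s, 0.0)) for s in shapes)
```

Two identical results give 0, and two results without any common shape give 2, which matches the bounds in Equation 5.30.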

[Plots omitted: three stacked bar charts, (a) GAPC nonamb, (b) RNAshapes and (c) GAPC adpf. Each shows the percentage of sequences per δ interval over the sequence length n in the bins 0–28, 28–56, 56–84, 84–112 and 112–140. Legend intervals: [0;0.01[, [0.01;0.05[, [0.05;0.1[, [0.1;0.25[, [0.25;0.5[, [0.5;1[.]

Figure 5.41: Distribution of shape probability deviations δ (Equation 5.29) as a function of sequence length n when comparing the exact shape probabilities with shape probabilities approximated by various programs via stochastic backtracing (see text). Each program was run with the same set of 2000 random sequences. Each δ value is collected in one of the 6 intervals shown in the legend boxes. The versions in 5.41(a) and 5.41(b) use the same ADP grammar that unambiguously takes dangling bases into account. The version in 5.41(c) implements the RNAfold grammar.

Figures 5.41(a) and 5.41(c) compare the shape-probabilities for several random sequences obtained via stochastic backtracing and via an exact forward computation. The plot shows that 1000 sampling iterations suffice in that case to produce very good approximations. [11] describes the program sfold, which is an O(n^3) RNA folding algorithm that samples RNA secondary structures using Boltzmann statistics. The paper gives statistical reproducibility guarantees for sampling.
The stochastic backtracing is available in GAP-L via the use of the overlay product and an instance filter. For the previous example the right hand side of the instance declaration in GAP-L is:

    (pfunc | pfunc_id) * shape5 suchthat sample_filter_pf

The overlay product specifies that the left operand is used during the forward computation and the right operand is used during backtracing. In this case pfunc_id is an algebra derived from pfunc and the objective function is replaced by the identity function. An instance filter is called in the generated code after the objective function on the results of the objective function. In this case sample_filter_pf interprets the first components of the result tuples as points of a discrete probability distribution and chooses one tuple under this distribution at random. Doing the actual sampling in a filter and not in the objective function has the advantage of better code reuse: the filter can be plugged together with other partition function-like algebras.

In stochastic backtracing the objective function always chooses one sub-solution from the list of tuples. For the RNAshapes [22] grammar, this is not sufficient. The RNAshapes grammar is an RNA folding grammar. It is non-ambiguous under the canonical Vienna String representation of the candidates in the search space and it unambiguously takes energy contributions of dangling bases into account while computing the minimum free energy (MFE). As a consequence, the result type of the mfe algebra is a tuple of the score, possible dangling contributions and indices, because at some locations in the grammar the decision to use a dangling energy contribution is delayed to a later application of another algebra objective function. Such constructions, in combination with a unitary objective function, violate Bellman's Principle of Optimality (Definition 5), because sub-solutions with co-optimal scores may yield differently scored super-solutions, i.e. two sub-solutions with the same MFE, but with different dangling-energy contributions. In stochastic backtracing this results in approximation problems for the RNAshapes grammar and sampling shape ∙ pfunc. For example, the comparison of RNAshapes (Figure 5.41(b)) and the GAP-L version of the RNAfold grammar (Figure 5.41(c)) shows that the stochastic backtracing of RNAshapes produces some larger approximation errors. In the GAP-L version of the RNAshapes grammar the used sampling filter also unambiguously takes dangling-energy contributions into account. As a result, the distribution of δ values of this version (Figure 5.41(a)) is comparable to the GAP-L version of the RNAfold grammar (Figure 5.41(c)).
Using stochastic backtracing is not restricted to partition function-like algebras. It is possible to implement Stochastic Context Free Grammars (SCFGs, [4]) as GAP-L programs. The rule probabilities are then coded in an algebra.

5.4.4 Window Mode

In window mode, the computation is done in a sliding window over the input string, where the sub-solutions of the overlap region between two iterations are reused. For example, for a cubic runtime algorithm the space requirement is then in O(w^2) and the runtime in O(w^2 n), where w is the window-size and n the input length. The code-generation of GAP-C supports the optional generation of window mode code for arbitrary single-track GAP-L programs.
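The effect of window mode on the runtime can be sketched with a Nussinov-style base pair maximization in which only sub-words of length at most w are tabulated. This is a simplified illustration and not the generated window code: it reuses the entries of overlapping windows and needs O(w^2 n) time, but for brevity it keeps all O(w n) entries in a dictionary instead of recycling an O(w^2) table. The pairing rule and the name window_scores are assumptions.

```python
# Assumption: Watson-Crick and G-U wobble pairs count as base pairings.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def window_scores(seq, w):
    """Maximal base pair count for every window seq[s:s+w]."""
    n = len(seq)
    table = {}
    for j in range(n + 1):
        # Only left indices within distance w of j are tabulated.
        for i in range(j, max(0, j - w) - 1, -1):
            if i == j:
                table[i, j] = 0
                continue
            best = table[i, j - 1]
            for k in range(i, j - 1):
                if (seq[k], seq[j - 1]) in PAIRS:
                    best = max(best, table[i, k] + 1 + table[k + 1, j - 1])
            table[i, j] = best
    return [table[s, s + w] for s in range(n - w + 1)]
```

Each window result is taken from the entry (s, s+w); entries computed for one window are read again by the following windows instead of being recomputed.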

5.4.5 Index Hacking

While introducing ADP, usually at some point the slogan “No subscripts, no errors!” is cited. The ADP framework avoids the use of indices with the concept of tree grammars. A tree grammar specifies the search space of a DP problem instance in a declarative fashion and the compiler derives efficient matrix recurrences from the tree grammar. Standard RNA-folding algorithms like the Nussinov algorithm [36] and standard minimum free energy folding algorithms [25] operate under the (nested) base pairing condition. The base pairing condition says that two base pairings may not intersect: for two base pairings (u_i, u_j) and (u_k, u_l), where i < k, either i < j < k < l or i < k < l < j holds, where u is the input string and 0 ≤ i,j,k,l < |u|.
For the modeling of RNA secondary structures that may include pseudo-knots, the pseudo-knot motifs violate the base pairing condition [40]. Figure 5.42 describes the basic segments of canonical simple recursive pseudo-knots which are recognized by the RNA folding algorithm pknotsRG [40]. A naive implementation of the pknotsRG algorithm would run in O(n^8) because 6 moving index boundaries in the pseudo-knot need to be considered for each sub-word of the input string [40]. The algorithm reduces the number of boundaries to 2, due to canonicalization rules, while still calling 7 non-terminal parsers for each pseudo-knot segment. This results in an overall runtime of O(n^4). The algorithm is implemented in the ADP framework, but with one exception. In the grammar rule which describes the pseudo-knot structure, indices are explicitly manipulated outside of the grammar.

To make the efficient implementation of pknotsRG and similar pseudo-knot-aware folding algorithms possible in Bellman's GAP, GAP-L contains constructs for manual index hacking (see Section 4.5.8.4). These constructs allow for a mix of explicit index optimizations and clean declarative grammar code, to get the best of both worlds. The usual application is the reduction of moving boundaries in the right hand side of a grammar rule for which some expert knowledge is used. In the case of moving index boundary reduction because of a constant yield size of a non-terminal parser, no index-hacking is necessary, since the compiler automatically removes the moving boundaries in such cases (see Section 5.3.12).
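The base pairing condition stated at the beginning of this section, which pseudo-knots violate, is easy to check mechanically. The following sketch represents a base pairing as a tuple (i, j) with i < j; the function names are illustrative and not part of GAP-L or GAP-M.

```python
from itertools import combinations


def compatible(p, q):
    """True if two base pairings are disjoint or properly nested."""
    (i, j), (k, l) = sorted([p, q])     # afterwards i <= k holds
    return j < k or (i < k and l < j)   # i<j<k<l or i<k<l<j


def is_nested(pairings):
    """True if a set of base pairings satisfies the base pairing condition."""
    return all(compatible(p, q) for p, q in combinations(pairings, 2))
```

A simple pseudo-knot such as the pairings (0, 4) and (2, 6) fails the check, because the second pairing starts inside the first one and ends outside of it.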

125

 1  grammar pknotsRG uses Algebra(axiom = struct) {
 2    ...
 3    help_pknot_free_kl =
 4      .[
 5        int i = t_0_i; int j = t_0_j;
 6        if (i + 11 < j) {
 7          for (int l = i + 7; l <= j - 4; l = l + 1) {
 8            int alphamaxlen = second(stacklen(t_0_seq, i, l));
 9            if (alphamaxlen < 2) continue;
10            for (int k = i + 3; k <= l - 4; k = k + 1) {
11              int alphareallen = min(alphamaxlen, k - i - 1);
12              if (alphareallen < 2) continue;
13              int betamaxlen = second(stacklen(t_0_seq, k, j));
14              if (betamaxlen < 2) continue;
15              ...
16              INNER(CODE);
17            }
18          }
19        }
20      ].
21      {
22        pknot(REGION, REGION, REGION) .{
23          pknot(REGION[i, i+alphareallen],
24                front[i+alphareallen+1, k] .(j).,
25                REGION[k, k+betareallen],
26                middle[k+betareallen, l-alphareallen]
27                  .(j-betareallen, i+alphareallen).,
28                REGION[l-alphareallen, l],
29                back[l, j-betareallen-2] .(i).,
30                REGION[j-betareallen, j] ;
31                stackenergies)
32        }.
33      } # hKnot;
34
35    middle(int betaRightInner, int alphaLeftInner) =
36      ... |
37      middlr(BASE, mid, BASE; betaRightInner, alphaLeftInner)
38      ... #;
39    ...
40  }


Figure 5.43: Grammar rule examples from the pknotsRG GAP-L grammar that use index hacking and parametrized non-terminals (see text).


Figure 5.43 shows a grammar snippet from the pknotsRG GAP-L grammar that heavily uses index hacking and parametrized non-terminals (see Section 4.5.8.1). Line 5 accesses implementation details of the generated code. The nested for-loops in lines 7 and 10 explicitly state how the remaining two index boundaries in a simple recursive pseudo-knot are iterated. In line 16 a pragma is used that tells the compiler to insert the generated rule code (line 23 until line 31) at that location during code output. Line 22 is only used as replacement for semantic analyses. An example of a parser call with explicit indices is line 23 (REGION). In line 24 a parametrized non-terminal is called, where the non-terminal parameter is enclosed in special parentheses. The stackenergies parameter of the algebra function pknot in line 31 is separated by a semicolon because it does not result from a parser application. middle (line 35) is a non-terminal parametrized with two parameters that are used as additional arguments to the algebra function middlr.


6 Bellman’s GAP Modules

Bellman’s GAP Modules (GAP-M) is the runtime library for GAP-L programs that are translated with GAP-C. In this chapter design and implementation choices of important modules of GAP-M are presented. Since currently the default backend of GAP-C is generating C++ code, in the following the C++ version of GAP-M is described. The next section shows the design of memory pools, Section 6.2 describes the design of list data-structures, Section 6.3 shows the design of different string data-structures and Section 6.4 presents a module of reusable functions for RNA folding algorithms.

6.1 Memory Pools
In the generated code of GAP-L programs the use of several data-structures leads to a lot of allocations and de-allocations of fixed-size memory slices, many of which are short lived. Examples are temporary lists of backtrace objects, all of which except the optimal one are destructed at the end of the function, or the heavy use of small string concatenations in pretty printing algebras.

For these use-cases memory pools are advantageous. A memory pool maintains large blocks of memory and allows only fixed-size memory allocations. An allocation from a memory pool just returns a pointer into a memory block after a minimal amount of internal bookkeeping and a de-allocation induces only cheap internal bookkeeping that marks the location as reusable. The memory of the pool is de-allocated at once when the pool is not needed anymore, e.g. at program end. This concept amortizes the overhead of general purpose allocator allocations over multiple allocations. The memory pool implementation in GAP-M uses memory blocks of 100 MB. It obtains them from the kernel’s virtual memory system, i.e. via the mmap syscall. A page in that virtual memory is automatically allocated by the kernel from real memory at the first write into it, where a common page size is 4 KB.

A memory block is divided into multiple entries. An entry contains space for a next pointer and an element of fixed size. In the basic case an allocation from a block returns the next free entry and increments an index variable. A de-allocation prepends the entry to a linked list of freed entries which uses the next pointers in the freed entries. If freed entries are available, an allocation returns the last freed entry and removes it from the linked list. When a block is full, a new block is obtained and used as the primary block in the pool.
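The allocation scheme described above can be sketched in a few lines of C++. This is not the GAP-M implementation: the class name FixedPool, its interface and the default block size of 1 MiB (instead of 100 MB) are assumptions made for illustration, and the sketch assumes a Linux-like system where mmap supports MAP_ANONYMOUS.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>
#include <sys/mman.h>

// Hypothetical fixed-size pool: entries are carved from mmap'ed blocks and
// freed entries are chained through their own storage (the next pointer).
class FixedPool {
public:
    explicit FixedPool(std::size_t entry_size,
                       std::size_t block_bytes = 1 << 20)
        : entry_size_(round_up(entry_size)), block_bytes_(block_bytes) {
        assert(entry_size_ <= block_bytes_);
    }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    ~FixedPool() {  // all blocks are released at once
        for (void* b : blocks_) munmap(b, block_bytes_);
    }

    void* allocate() {
        if (free_list_) {  // reuse the most recently freed entry
            Entry* e = free_list_;
            free_list_ = e->next;
            return e;
        }
        if (blocks_.empty() || used_ + entry_size_ > block_bytes_)
            new_block();  // the new block becomes the primary block
        void* p = static_cast<char*>(blocks_.back()) + used_;
        used_ += entry_size_;  // bump the index into the primary block
        return p;
    }

    void deallocate(void* p) {  // prepend to the list of freed entries
        free_list_ = new (p) Entry{free_list_};
    }

private:
    struct Entry { Entry* next; };

    // Each entry must at least hold, and be aligned for, a next pointer.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = sizeof(Entry);
        return n < a ? a : (n + a - 1) / a * a;
    }

    void new_block() {
        void* b = mmap(nullptr, block_bytes_, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (b == MAP_FAILED) throw std::bad_alloc();
        blocks_.push_back(b);
        used_ = 0;
    }

    std::size_t entry_size_;
    std::size_t block_bytes_;
    std::size_t used_ = 0;
    Entry* free_list_ = nullptr;
    std::vector<void*> blocks_;
};
```

Both allocate and deallocate are constant-time pointer operations, which is where the amortization over a general purpose allocator comes from. Because pages of an anonymous mapping are only backed by real memory on first write, reserving a large block does not by itself increase the resident set size.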
GAP-M also provides memory pools for allocations of multiples of a fixed size. The data-structure internally uses a separate pool for each multiplier and multiplexes between them.
Since the current output language of GAP-C is C++ and GAP-M is implemented in C++ as well, the memory pool API of the common Boost C++ library [10] would be an alternative to implementing a new memory pool solution for GAP-M. Actually, early versions of GAP-M contained a small wrapper around the Boost memory pool API (using version 1.38), but profiling showed that the usage of real memory (RSS, Resident Set Size) was large, i.e. depending on the input and the GAP-L program, up to 40 percent more memory usage is observed compared to a version only using the system default general purpose allocator (malloc). The GAP-M memory pool allocator uses RSS more efficiently, i.e. it uses less memory than a malloc based version. The runtime is the same as with the Boost memory pool.

6.2 Lists

Lists are used in the generated code during backtracing of co-optimal or sub-optimal candidates or when in the forward computation a non-unitary product is used. Dominant list operations are appending objects to a list, the joining of two lists and the copying of list objects.

Thus, the implementation of the list data-type in GAP-M uses fixed-size memory slices which are allocated from a memory pool (Section 6.1). Each slice has space for a few list elements and a pointer to a following slice. A small number of elements per slice yields good results in practice. The list objects used in the generated code are smart-pointers that reference the real list implementation objects that manage the slices. Smart-pointers are small objects that implement reference counting, i.e. they automatically destruct the referenced object if it is not referenced anymore in the program. In addition to that, lazy allocation is used.

Using reference counting makes the copying around of list objects cheap, using fixed-size slices amortizes the allocation cost over multiple element append operations and the next pointer inside a slice makes the joining of two large lists cheap. The use of memory pools improves the performance of memory allocations and de-allocations in general.
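The combination of slices, reference counting and lazy allocation can be illustrated with a small C++ sketch. It is not the GAP-M list: the name SliceList, the slice capacity N = 4 and the use of std::shared_ptr and the general purpose heap (instead of a memory pool) are assumptions of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical list handle: copies share one implementation object.
template <typename T, std::size_t N = 4>
class SliceList {
public:
    void append(const T& x) {
        Impl& m = impl();
        if (!m.tail || m.tail->used == N) {  // start a new slice
            std::unique_ptr<Slice> s = std::make_unique<Slice>();
            Slice* raw = s.get();
            if (m.tail) m.tail->next = std::move(s);
            else m.head = std::move(s);
            m.tail = raw;
        }
        m.tail->items[m.tail->used++] = x;
        ++m.size;
    }

    // Links the slices of other behind this list in O(1); other is emptied.
    void join(SliceList& other) {
        assert(this != &other);
        if (!other.impl_ || other.impl_->size == 0 || other.impl_ == impl_)
            return;
        Impl& a = impl();
        Impl& b = *other.impl_;
        if (a.tail) a.tail->next = std::move(b.head);
        else a.head = std::move(b.head);
        a.tail = b.tail;
        a.size += b.size;
        b.tail = nullptr;
        b.size = 0;
    }

    std::size_t size() const { return impl_ ? impl_->size : 0; }

    template <typename F>
    void for_each(F f) const {
        if (!impl_) return;
        for (const Slice* s = impl_->head.get(); s; s = s->next.get())
            for (std::size_t i = 0; i < s->used; ++i) f(s->items[i]);
    }

private:
    struct Slice {
        T items[N];                   // a few elements per slice
        std::size_t used = 0;
        std::unique_ptr<Slice> next;  // pointer to the following slice
    };
    struct Impl {
        std::unique_ptr<Slice> head;
        Slice* tail = nullptr;
        std::size_t size = 0;
    };
    Impl& impl() {  // lazy allocation: nothing is allocated for empty lists
        if (!impl_) impl_ = std::make_shared<Impl>();
        return *impl_;
    }
    std::shared_ptr<Impl> impl_;      // reference counted implementation
};
```

Copying a SliceList only copies the smart-pointer, and join relinks a single pointer regardless of the list lengths. In this simplified version all copies of a handle observe later appends and joins, which may differ from the semantics of the real data-type.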

6.3 String Data Structures

GAP-M contains several optimized string modules for GAP-L programs. They are optimized for different use cases, but share the same API such that they are easily exchangeable via type-synonyms if performance profiling shows optimizing opportunities.

First, the implementation of the GAP-L string data-type is optimized for use in pretty-print algebras, e.g. a Vienna-String pretty printing algebra of an RNA folding algorithm. The most used operations in this use case are the concatenation of small sub-strings and the copying of string objects. It internally uses fixed-size slices of memory which are allocated from a memory pool (see Section 6.1). A slice contains a sequence of characters, references to sub-strings and repeat-codings. Using references means that if a string object is appended to another one then only a reference is appended to the destination slice and the content is not copied around. In the case of an append operation of a character that is repeated several times only the character and the number of repeats are saved for space efficiency reasons.

The implementation of the string data-type uses a class hierarchy where the string objects, which are used in the generated code, are only smart-pointers that reference the heavy-weight objects which reference string slice objects. Besides reference counting, the implementation of the string data-type uses lazy allocation and copy-on-write for efficiency.
Second, for classifying algebras, i.e. algebras that are used in classifying products, as e.g. algebra shape in shape ∙ mfe (see Section 2.1.1), there is a shape_t data-type in GAP-L and GAP-M contains an efficient implementation of it. During classification, the construction of strings from sub-strings and copying are common operations, too. In addition to that, dominating operations are the computation of hash values and string equality testing. GAP-C generates code that uses hash tables as an optimization of classifying products. In a lot of use cases the alphabet of class strings is very small, e.g. for shapes it is of size 3, and class strings are not very long in practice.

Thus, the implementation of the shape data-type uses slices of multiples of machine-word length and 2 bits for a character. As a result it packs up to 32 characters into a single slice on a 64 bit architecture. Slices are pooled in a memory pool which is optimized for allocations of multiples of a fixed small size. If the string length does not fit into a single slice then a new large enough multi-slice is allocated from the pool and the old content is copied. String append operations are optimized using the elementary find-first-set instruction (FFS, the index of the first bit set) of the machine, when available. The implementation also uses lazy allocation to avoid costs for empty strings and reference counting.
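The 2-bit encoding can be made concrete with a single-word sketch in C++. The class name PackedShape, the character codes and the explicit length counter are assumptions of this example; the alphabet '[', ']' and '_' is the usual shape alphabet of size 3, and multi-slice growth, pooling and reference counting are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical single-slice shape string: two bits per character in one
// 64-bit word. Code 0 is reserved for unused positions.
class PackedShape {
public:
    static constexpr unsigned kCapacity = 32;  // 64 bits / 2 bits

    // Appends c; returns false for foreign characters or a full slice.
    bool append(char c) {
        const std::uint64_t code = encode(c);
        if (code == 0 || length_ == kCapacity) return false;
        bits_ |= code << (2 * length_);
        ++length_;
        return true;
    }

    unsigned size() const { return length_; }

    // Hashing and equality operate on the machine word directly.
    std::uint64_t hash() const { return bits_; }
    bool operator==(const PackedShape& o) const { return bits_ == o.bits_; }

    std::string str() const {
        std::string out;
        for (unsigned i = 0; i < length_; ++i)
            out += decode((bits_ >> (2 * i)) & 3u);
        return out;
    }

private:
    static std::uint64_t encode(char c) {
        switch (c) {
            case '[': return 1;
            case ']': return 2;
            case '_': return 3;
            default:  return 0;
        }
    }
    static char decode(std::uint64_t code) {
        assert(code < 4);
        static const char table[] = {'?', '[', ']', '_'};
        return table[code];
    }

    std::uint64_t bits_ = 0;
    unsigned length_ = 0;
};
```

Since no character is encoded as 0, two packed words are equal exactly if the strings are equal, so the dominating operations named above (hashing and equality testing) reduce to integer operations. A bit-scan instruction could recover the length from the word itself; the sketch keeps a counter instead to stay portable.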
The data-structure is parametrized with the concrete alphabet such that it is re-usable for the definition of new optimized classification data-types that need different alphabets.

Third, GAP-M provides a general purpose string data-type, which is called rope. It should be used for classification strings that use a larger alphabet and string algebras where the slice size of the string data-type is not large enough. The implementation of the data-type uses reference counting, lazy allocation and fixed-size memory slices from a memory pool. When a rope string grows larger than the slice size then another slice is used as an extension. There is no copy-on-write and the contents of appended rope objects are directly copied. When comparing the rope data-type with the shape data-type using ADPfold and a shape algebra, the shape data-type implementation was 2 times faster.


6.4 rnalib

The GAP-M module rna provides several functions for programming RNA folding algorithms, especially for computing local energy contributions of different possible RNA secondary structure elements, like e.g. a hairpin loop or a bulge loop of different sizes and bases. The energy functions are used e.g. in MFE and partition function algebras.

In GAP-L programs the module can be used via import rna. It provides the following filter functions:

bool basepairing(Subsequence);
bool stackpairing(Subsequence);

The following energy functions are predefined:

int dl_energy(Subsequence, Subsequence);
int dr_energy(Subsequence, Subsequence);
int termaupenalty(Subsequence, Subsequence);
int sr_energy(Subsequence, Subsequence);
int hl_energy(Subsequence, Subsequence);
int bl_energy(Subsequence, Subsequence, Subsequence);
int br_energy(Subsequence, Subsequence, Subsequence);
int il_energy(Subsequence, Subsequence);
int dli_energy(Subsequence, Subsequence);
int dri_energy(Subsequence, Subsequence);
int ss_energy(Subsequence);
The functions work on arguments of the built-in type Subsequence. A sub-sequence object is returned e.g. by the terminal parsers LOC, BASE and REGION. A sub-sequence object represents a substring (i, j) of the input string (0, n), i.e. the string s[i] ... s[j−1] of s[0] ... s[n−1] ¹.

A stackpairing grammar filter returns true if the first two bases are complementary to the last two bases of the sub-sequence. This filter is used in some RNA folding algorithms because a lonely base pairing is considered as very unlikely in nature. The naming of the energy functions follows the naming of structure elements, e.g. dl for left dangle, bl for left bulge loop and ss for single stacking. The arguments of an energy function represent the part of the structure that influences the energy contribution, e.g. for hl_energy the first sub-sequence marks the beginning and the second sub-sequence marks the end of a hairpin loop. Figure 6.1 shows an example.

The rna module is a thin wrapper around the librna C library which is also part of GAP-M. The C API of librna uses C-strings and raw indices as parameters to the energy functions such that it is reusable for arbitrary RNA folding algorithms and not just for GAP-L ones. Figure 6.2 shows an excerpt from the librna API. In addition to the functions of the GAP-M module it provides a few specialized energy functions and the energy functions needed for partition function computations. The indexing scheme is the same as for the GAP-M module. Internally, the functions use the energy tables which are distributed with the Vienna RNA package [25].

¹ In the definition of Haskell-ADP the character indexing scheme starts from 1.

grammar fold uses FS(axiom = struct) {
  ...
  hairpin = hl(BASE, BASE, REGION with minsize(3), BASE, BASE);
  ...
}

algebra mfe implements FS(alphabet = char, comp = int)
{
  ...
  int hl(Subsequence lb, Subsequence f1, Subsequence x,
         Subsequence f2, Subsequence rb)
  {
    return hl_energy(f1, f2) + sr_energy(lb, rb);
  }
  ...
}

Figure 6.1: Grammar and algebra example, where an MFE algebra uses energy functions from the GAP-M module rna.

enum base_t { N_BASE, A_BASE, C_BASE, G_BASE, U_BASE };
typedef unsigned int rsize;
int hl_energy_stem(const char *s, rsize i, rsize j, rsize n);

int sr_energy(const char *s, rsize i, rsize j);

...

Figure 6.2: Excerpt of the librna C API.

7 Bellman’s GAP Pages

Bellman’s GAP Pages is an interactive web-site for presenting GAP-L by examples. A list of example GAP-L versions of textbook dynamic programming algorithms, like e.g. optimal matrix chain execution, pairwise sequence alignment variants or palindromic RNA secondary structure like folding variants, is available. For each program a page presents a web-form. It includes a description of the algorithm, a link to the source code and input fields. A user can enter input sequences and construct an algebra product from multiple drop-down boxes. The resulting program is executed on the server and the results are shown to the user. Figure 7.1 shows a screenshot of an example.

The purpose of GAP Pages is to provide a platform, where practical aspects of GAP-L and ADP can be studied without the need to install the full compiler on a local machine. The accessible algebra product selections should inspire users to experiment with different algebra products. In addition, the presented GAP-L versions of well known dynamic programming algorithms may show how to use certain GAP-L constructs in practice and act as a starting point for own developments.

Behind the scenes, a server process generates all possible algebra products and pre-compiles them with GAP-C. This speeds up the web interface and reduces the load on the web server. A part of the GAP Pages concept is to inspect the user selection of algebras and input sequence and to issue a warning if a product with exponential runtime and an input length above a threshold is entered. For certain products, this could be implemented in the server software by executing a modified product which includes the count algebra, before the entered algebra product is executed. The count algebra counts the candidate search space of the algebra and GAP-C supports the automatic generation of a count algebra for every GAP-L grammar (Section 4.5.5). If a run with the count algebra returns a number above a threshold, an appropriate error message is presented. For example, for a product that starts with an enumerative algebra, the list of answers is of the size of the search space.

Apart from this, the server software has to monitor the resource use of the compiled GAP-L programs. For example, an input sequence could induce a lot of co-optimal candidates, which have to be sent back to the user of the web-page. After a reasonable threshold of output lines and runtime the server software should truncate the output and display a helpful explanatory warning message.


Figure 7.1: Screenshot of GAP Pages. As an example GAP-L program the local sequence alignment algorithm is shown.

7.1 BiBiServ

The GAP Pages should be integrated into the BiBiServ [45]. The BiBiServ is a web-site that provides several bioinformatic tools as web-services and web-forms. An example set of applications on the BiBiServ is RNA Studio [46]. The BiBiServ exists since 1996 and is actively maintained and improved. Integrating a web-version of a tool into the BiBiServ guarantees a stable internet address and stable support of the underlying software and hardware infrastructure. Also, generic improvements of the BiBiServ framework, like e.g. machine readable tool descriptions, come for free. Basic first-level support questions are filtered out and answered by the BiBiServ team. Only higher level issues are forwarded to the tool authors.

Since the BiBiServ is written in Java and makes heavy use of Java application server technologies, GAP Pages is implemented in Java for better integration. The web interface is implemented using the JavaServer Faces (JSF 2.0) API. This makes it easy to integrate dynamic changes in the web-forms, like for example displaying the available output, without a complete reload of the page. A disadvantage of using Java is that it does not provide an API for the monitoring of called external programs. Also, executing external programs from Java has a noticeable runtime overhead ¹.

¹ 0.2 to 0.5 seconds under Solaris 10 using Oracle Java


8 Benchmarks

To test the practical overall efficiency of the code generated by GAP-C, several GAP-L programs of different sizes are benchmarked in the following. The implemented algorithms are well-known bioinformatics RNA folding algorithms. For comparison, the GAP-L version is benchmarked against the original implementation, a Haskell-ADP version and an ADPC-ADP version, where available.

Benchmark results of the generated parallelized code are presented in Section 5.4.2, benchmarks comparing top-down-style vs. bottom-up-style code generation are shown in Section 5.4.1.3 and efficiency results running the GAP-L program with table configuration computed by the heuristic table design algorithm are presented in Section 5.3.7.3. The errors during stochastic backtracing of shape strings under a partition function algebra in comparison with the exact computation are shown in Section 5.4.3.1.

All benchmarks in this chapter were run under a Debian Linux 5.0 system with default package versions, especially GNU GCC 4.3.2 for compiling C and C++ code and GHC 6.8.2 for compiling Haskell-ADP code. Haskell-ADP code was compiled with optimization flags -O2 and C/C++ code with -O3, unless the default was set to -O2. The GNU GCC and the GHC produced 64 bit code. The table configurations of the GAP-L versions were automatically derived by GAP-C. The hardware consists of an AMD Athlon 64 X2 Dual Core 5200+ (2.6 GHz, cache size of 2 times 1 MB) and 4 GB RAM main memory. The benchmark runs were all single-threaded.
In each benchmark, a sequence of randomly generated RNA sequences was used as input. The sequences were randomly generated under a uniform distribution of lengths and nucleotides. In the comparison of different implementations of an algorithm every implementation was called with the same random sequence of sequences.

Runtimes of less than one second and runtimes over a threshold of 10 minutes are excluded from the plots, unless otherwise stated. Likewise data are excluded from programs, where the memory usage hits the threshold of the GHC runtime garbage collection. The measured memory sizes are the high-water marks of the actually used memory (Resident Set Size, RSS).
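How the high-water marks were collected is not stated here. One common way to read such a value on POSIX systems is the getrusage call; the following C++ fragment is a sketch under that assumption, and the function name peak_rss_kb is hypothetical.

```cpp
#include <cassert>
#include <sys/resource.h>

// Returns the peak resident set size of the calling process.
// On Linux ru_maxrss is reported in kilobytes; other systems may use a
// different unit.
long peak_rss_kb() {
    struct rusage usage;
    const int rc = getrusage(RUSAGE_SELF, &usage);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings when asserts are disabled
    return usage.ru_maxrss;
}
```

For measuring an external program such as a compiled GAP-L binary, a benchmark driver would typically query RUSAGE_CHILDREN after waiting for the child process instead of RUSAGE_SELF.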
Note that the maximal sequence length used in the benchmarks is chosen to show the principal scaling behavior of the generated DP code and does not necessarily mean that a concrete program is usually used in practice with that large sequences. For example, in biology practice computing the secondary structure with the minimum free energy (MFE) for an RNA sequence becomes less useful for larger sequences greater than 400 bases because of accumulating errors. In general, depending on the sort of RNA sequences under examination, the sequence lengths vary, e.g. sequences of up to 2000 bases may be considered while searching for target sites, and other classes of RNA only contain very small sequences, e.g. up to a few 100 bases.

8.1 RNAfold

RNAfold [25] computes the optimal secondary structure of an RNA input sequence under a minimum free energy (MFE) model. It is an O(n^3) algorithm which is manually implemented in C. The underlying matrix recurrences describe the search space of candidate structures free of semantic ambiguity under the canonical Vienna String notation. Semantic non-ambiguity, as defined in [19], means that each candidate has a unique Vienna String. Jens Reeder translated the RNAfold recurrences (assuming dangling mode -d2 and forbidding lonely base pairings -noLP) into an ADP grammar and an mfe algebra in Haskell-ADP. This version was translated to ADPC-ADP and to GAP-L. It is available as example grammar both with the ADPC and GAP-C.

RNAfold does not print co-optimal candidates; this is possible with RNAsubopt [57]. Besides co-optimal backtracing it supports suboptimal backtracing.
As benchmark, the runtimes and memory usages of RNAfold, RNAsubopt, a Haskell-ADP version, an ADPC-ADP version of the RNAfold grammar and two GAP-L versions of the RNAfold grammar are compared. RNAfold was run with the options -d2 -noLP, RNAsubopt was run with -e 0 -d2 -noLP (only co-optimal candidates) and the ADPC-ADP version was compiled with ADPC 0.8 and run with -e 0 (only co-optimal candidates). One GAP-L version was compiled with co-optimal backtracing enabled and the other with single-optimal backtracing. The four ADP based versions used the algebra product mfe ∙ pretty, where the pretty algebra generates a Vienna String representation of a candidate. For benchmarking 50 randomly generated sequences were used in the length interval of [100;4000].
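Such an input set can be produced with the C++ standard random facilities. The sketch below is an illustration only; the generator actually used for the benchmarks is not described, and the function name random_rna and the seed parameter are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Generates count RNA sequences with lengths drawn uniformly from
// [min_len, max_len] and nucleotides drawn uniformly from {A, C, G, U}.
// A fixed seed yields the same set of sequences for every program under
// test that is fed from the same generator run.
std::vector<std::string> random_rna(std::size_t count, std::size_t min_len,
                                    std::size_t max_len, unsigned seed) {
    assert(min_len <= max_len);
    static const char bases[] = "ACGU";
    std::mt19937 gen(seed);
    std::uniform_int_distribution<std::size_t> len_dist(min_len, max_len);
    std::uniform_int_distribution<int> base_dist(0, 3);

    std::vector<std::string> out;
    out.reserve(count);
    for (std::size_t n = 0; n < count; ++n) {
        std::string s(len_dist(gen), 'A');
        for (char& c : s) c = bases[base_dist(gen)];
        out.push_back(std::move(s));
    }
    return out;
}
```

Calling random_rna(50, 100, 4000, seed) once and writing the result to a file gives every implementation the same inputs, as required for the comparison.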
Figure 8.1 shows the runtime and memory results for the different RNAfold versions. The runtimes of RNAfold and RNAsubopt are mostly close together and for larger input lengths mostly 16 times faster than the ADPC and GAPC versions. The runtime of RNAsubopt increases up to 7 times in comparison to RNAfold in cases where a lot of co-optimal candidates exist and are backtraced. The runtime of the Haskell-ADP version is 200 times the runtime of the GAP-L version and more for small sequences under 1000 nucleotides. In this range the runtimes of the non-Haskell versions are mostly under 1 second and not plotted. For larger sequences the Haskell-ADP version is out of memory.

The runtime of the ADPC-ADP version usually is close to the runtime of the GAP-C versions. For some sequences there are some dramatic deviations. The output of the ADPC-ADP version indicates some problems with the backtracing. For some inputs during backtracing there are a lot of candidates which are printed several times. For the complete benchmark run, every candidate is printed 30 times during backtracing on average. Since the grammar is semantically unambiguous, this looks like a bug in the backtracing code generated by the ADPC.

[Plots: (a) RNAfold runtime, run time (s) over n; (b) RNAfold memory, mem RSS (kb) over n. Series: GAPC, ADPC, Haskell, GAPC noco, RNAsubopt, RNAfold.]

Figure 8.1: Runtime and memory usage of various versions of the basic RNAfold algorithm as a function of input length n.

The memory usage of the GAP-L versions is usually smaller than the memory usage of the ADPC-ADP version. Only the GAP-L version with co-optimal backtracing has larger memory usages for some sequences that have a lot of co-optimal candidates, because the co-optimal backtracing code generated by GAP-C collects all co-optimal backtraces in a data-structure, before printing them. This is useful, if the generated code is interfaced by external code that inspects the backtraces. RNAsubopt directly prints finished backtraces and frees their memory.

The runtimes of the GAP-L co-backtracing and non-co-backtracing versions do not differ much. This is quite different from the runtime comparison of RNAfold and RNAsubopt.

The speedup of the RNAfold versions manually implemented in C compared to the compiled ADP versions shows that they are highly optimized. In that case the use of a compiler means paying an abstraction penalty, because the compiler does not recognize all chances of MFE-dependent low-level optimization, as a human programmer does. On the other hand, one can argue that the development time of a bug-free and sufficiently efficient version using GAP-L is greatly reduced in comparison to implementing the recurrences by hand.
8.2 Thermodynamic matchers

RapidShapes [27] is a bioinformatics tool that computes exact shape probabilities of input sequences using thermodynamic matchers (TDMs). In a preprocessing phase it approximates shape probabilities under an RNAshapes grammar. Heuristically the highest scored shapes are selected and for each shape a TDM is generated and run. In that case a TDM is an ADP grammar that describes the RNA structure folding space of exactly one given shape and a generic partition function algebra. The runtime of each TDM is in O(n^3). To compute the exact shape probability of a given shape, the partition function value using the corresponding TDM and the partition function value using the generic, all-shapes-allowing RNAshapes grammar are computed under the partition function algebra.

RapidShapes supports the generation of TDMs as Haskell-ADP or GAP-L programs. In practice RapidShapes generates GAP-L programs.

In the benchmark, the runtime and memory usage of the Haskell-ADP and the GAP-L versions were compared for three TDMs, which match various shapes. The first TDM grammar contains 43 non-terminals, the second 95 non-terminals, and the third 195 non-terminals. Each program was called for each input from a set of 50 randomly generated input sequences of length 50 to 1000.

Figure 8.2 shows the results. For all shapes the Haskell-ADP TDM versions are out of memory already for short sequences of size greater than 200 bases. The runtime of the Haskell-ADP versions is up to 200 times the runtime of the GAP-L versions. The results show that compiling the TDM grammars via GAP-C is what enables their use in practice. Without the compilation to efficient code the RapidShapes approach would not scale.

[Plots: (a) TDM runtime, run time (s) over n; (b) TDM memory, mem RSS (kb) over n. Series: GAPC Shape1, GAPC Shape2, GAPC Shape3, Haskell Shape1, Haskell Shape2, Haskell Shape3.]

Figure 8.2: Benchmark results of TDMs, i.e. runtime and memory RSS as a function of input length n for three different shapes (43 non-terminals, 95 non-terminals and 195 non-terminals). Each TDM was created as Haskell-ADP and GAP-L version.

The generation of TDMs has a manageable level of complexity at the level of tree grammars and algebras. One can imagine that the complexity level of TDM generation increases at the lower level of matrix recurrences. Writing and debugging a TDM generator outside of ADP is expected to significantly increase the development time.

8.3 RNAshapes

RNAshapes [55] provides several shape related optimization algorithms. For benchmarking the exact computation of shape probabilities is selected. In GAP-L this is the computation of the algebra product shape ∙ pfunc. For each shape of the shape search space the partition function is computed. This task is challenging, because the number of shapes grows exponentially with the length of the input (see Section 5.4.3.1 for a discussion). The runtime of the algorithm is in O(α^n n^3), where α > 1. The programs have to use efficient data structures to store and access a large number of shape classes. See Section 6.3 for the description of the shape data type. For classifying products, GAP-C generates optimized code that uses hash tables. An efficient hash table implementation tailored for GAP-L programs is part of GAP-M.
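The role of the hash table in a classifying product can be illustrated for the final step of the shape ∙ pfunc computation. The following C++ sketch uses std::unordered_map as a stand-in for the tailored GAP-M hash table; the names Candidate and class_probabilities are hypothetical, and shapes are stored as plain strings instead of the packed shape data type.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// A candidate contributes a Boltzmann weight to its shape class.
using Candidate = std::pair<std::string, double>;

// Sums the weights per shape class in a hash table and normalizes by the
// total partition function value, which yields shape probabilities.
std::unordered_map<std::string, double>
class_probabilities(const std::vector<Candidate>& candidates) {
    std::unordered_map<std::string, double> table;
    double total = 0.0;
    for (const Candidate& c : candidates) {
        assert(c.second >= 0.0);
        table[c.first] += c.second;  // one hash lookup per candidate
        total += c.second;
    }
    if (total > 0.0)
        for (auto& entry : table) entry.second /= total;
    return table;
}
```

Every accumulation requires hashing and comparing a class string, which is why the cost of these two operations on the shape data type dominates for a large number of shape classes.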
In this benchmark the RNAshapes version 2.1.6 was used. It was run with the options -p -F 0, which disables the heuristic filtering of shape classes with low probabilities during the computation. By default, RNAshapes uses the shape level 5, and this is what the GAP-L versions used. A GAP-L version of the RNAshapes grammar was run under the algebra product shape ∙ pfunc. To show the impact of the partition function computation a second GAP-L version that computes just the algebra shape was run. Each version was called for 100 randomly generated sequences in the length interval of ]20;140[.

Figure 8.3 shows the benchmark results. The GAP-L version is from 2 to 6 times faster than RNAshapes, depending on the input sequence. The GAP-L version is more memory efficient than RNAshapes. Some larger sequences could not be computed with RNAshapes on that computer, because not enough memory was available. With the GAP-L version the exact shape probabilities of these sequences could be computed within the 4 GB of RAM. Depending on the input sequence the memory usage of RNAshapes was 2 to 10 times that of the GAP-L version.

8.4 pknotsRG

pknotsRG [40] is an RNA secondary structure prediction program that also allows a restricted class of pseudo-knots in its structure search space. The runtime of the algorithm is in O(n^4). The algorithm was developed with ADP with some index access extensions to allow for pseudo-knot structures (see Section 5.4.5 for discussion and the GAP-L syntax). In the benchmark the product mfe ∙ pretty is computed.

[Plots: (a) RNAshapes runtime, run time (s) over n; (b) RNAshapes memory, mem RSS (kb) over n. Series: RNAshapes, GAPC no-pf, GAPC.]

Figure 8.3: Benchmark of the RNAshapes grammar in combination with the shape ∙ pfunc classification algebra product, i.e. runtime and memory RSS as a function of input length n for the RNAshapes program and a GAP-L version.

The benchmark compares the runtime and the memory usage of the original Haskell-ADP version of the pknotsRG algorithm, the pknotsRG program version 1.3 and a GAP-L version of the pknotsRG grammar and algebra. The pknotsRG program contains C code that was generated by the ADPC compiler. The GAP-L version and the pknotsRG program both do backtracing while ignoring co-optimal candidates. The pknotsRG program provides suboptimal backtracing, which was turned off during benchmarking. All versions were run with the same set of 100 randomly generated sequences from the length interval of ]10;1000[.

Figure 8.4 shows the results of the benchmark runs. The runtime of the Haskell-ADP version is up to 100 times the runtime of the other versions. For sequences larger than 250 bases the Haskell version is out of memory. The GAP-L version has a runtime speedup of 2 to 3 times compared to the pknotsRG program. The memory usage of the pknotsRG program is 2 times that of the GAP-L version or more for most sequences. Only for a few larger sequences the memory usage of the pknotsRG program is less than 2 times the memory usage of the GAP-L version.

142

310

2(s)10timerun110

ADPCGAPCHASKELL

01002004006008001,000
n(a)timerunGpknotsR

510(kb)RSS410mem

ADPCGAPCHASKELL

31002004006008001,000
nmemGpknotsR(b)

Figure8.4:BeRSSncashmarkafunctionresultsofoftheinputpknotsRlengthGnaprogram,forai.e.Haskruntimeell-ADPandversionmemory,a
ADPCversionandaGAP-LversionofthepknotsRGgrammar.Input
were100randomgeneratedsequences.

143

9 Conclusion

Bellman’s GAP simplifies the development of dynamic programming algorithms over sequences.

Especially for large scale DP algorithms Bellman’s GAP reduces the development time. An example is the tool RapidShapes that generates GAP-L grammars with up to a few hundred non-terminals. At the level of tree grammars the complexity of code generation in RapidShapes is manageable in comparison to low-level matrix recurrences (see Section 8.2).

Benchmarks show that the novel heuristic Table Design algorithm for deriving a good table configuration that takes constant factors into account yields good results in practice. In the tested cases the heuristically derived table configuration leads to a better practical runtime than the domain expert choices or expert systems. The recent RapidShapes grammar generator by default uses the automatic table design of GAP-C because of these results and this design choice simplifies the generating process (Section 5.3.7.3).

The novel domain specific language for ADP, GAP-L, supports general multi-track DP, lexicographic products from ground up and the new cartesian and interleaved product. It generalizes the concept of simple syntactic grammar filtering and introduces semantic filtering in grammar and instance constructs. Advanced DP features, like e.g. sampling or filtering of candidates, are available via orthogonal language constructs, such that the need of error-prone and tedious manual hacking of these features is eliminated. The use of established language concepts and syntax elements from widely-used languages like Java and C makes GAP-L more easily accessible to new ADP users (Chapter 4).
The novel GAP-C compiler, which translates GAP-L programs to optimized C++ code, provides state of the art error and warning support during parsing and a type-checker for better usability. Benchmark results show that the generated code scales well on parallel shared memory architectures (Section 5.4.2). The code generation for multi-track GAP-L programs profits from an improved table dimension analysis and CYK-loop optimization. Performance optimizations for the single-track case are generalized for the multi-track case such that multi-track optimizations, like e.g. in the pairwise sequence alignment example, are obtained as a side effect (Section 5.3.6 and Section 5.4.1.6). The generated code is dependable, as the test results of sampling and filtering code show, and the errors are less than in compared non-GAP-L program versions (Section 5.4.3.1 and Section 4.6.3). The overall runtime and memory usage performance of the code generated by GAP-C in cases of ADP programs that are also compilable with the ADPC is equal or several times better than the results from ADPC versions. Comparing the performance to Haskell-ADP versions, the GAP-C versions show huge speedups and in several cases the Haskell-ADP version runs out of space or shows a prohibitively long runtime for relatively small inputs. Thus the availability of GAP-C enables the practical usage or improves the practical value of ADP algorithms in several cases (Chapter 8).
During the study of alternative evaluation schemes in generated yield parsing code a top-down scheme shows advantages for a certain kind of products and grammars (Section 5.4.1). In cases where sparseness can be exploited and substantial products are used, twofold runtime and memory improvements are possible. Top-down parsing is available in GAP-C in addition to a bottom-up evaluation scheme.

In Section 2.2.3 two classification schemes for algebras and products are introduced. Their benefit is twofold. They allow to establish and discuss product properties on a general level, like e.g. the lexicographic product of two unary selective algebras satisfies Bellman’s Principle, and GAP-C uses the roles internally for optimization and warning-generation purposes.

The description of selected modules of GAP-M, the runtime library of GAP-C, shows that using high-level abstract data-types in the code generation of the compiler allows for greater flexibility in optimizing the concrete implementation for a certain output language and reduces the complexity of the compiler (Section 5.1 and Chapter 6).

10 Outlook

There are several open topics in the ADP framework and in the compilation of ADP regarding the exploitation of further generalization and optimization possibilities.

10.1 Sparse ADP

A sparse DP algorithm is a functionally equivalent variant of an existing DP
algorithm that systematically omits parts of the search space due to constraints
implied by the problem domain. The process to modify an existing DP algorithm
such that sparse properties of the problem space are exploited is called
sparsification. Depending on the problem a sparse algorithm version may achieve
an asymptotically improved runtime in the average-case in comparison to the
non-sparse one. There are different kinds of sparseness that can be exploited in
dynamic programming.
A form of sparseness is introduced by the data-flow of the program. For example
consider a GAP-L program, where in a top-down evaluation scheme filtering
constructs in the grammar or implicit yield size constraints may prune large
parts of the search-space depending on the input (see Section 5.4.1.3).
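
The data-flow form of sparseness can be illustrated outside of GAP-L. The following Python sketch is not generated GAP-C code: it uses a simple base pair maximization recurrence, and the names `can_pair` and `top_down` are made up for this illustration. Because the filter guards the recursive calls, a memoized top-down evaluation tabulates only the sub-words it actually reaches, whereas a bottom-up scheme would fill the complete triangular table.

```python
# Watson-Crick and wobble pairs; this filter stands in for a grammar filter.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    return (a, b) in PAIRS


def top_down(seq):
    """Return (maximal number of base pairs, number of tabulated sub-words)."""
    table = {}

    def best(i, j):  # optimal score for the sub-word seq[i:j]
        if (i, j) in table:
            return table[(i, j)]
        value = 0
        if j - i >= 2:
            value = best(i + 1, j)  # seq[i] stays unpaired
            for k in range(i + 1, j):
                # The filter guards the recursion: sub-words behind a failed
                # filter are never requested and hence never tabulated.
                if can_pair(seq[i], seq[k]):
                    value = max(value, 1 + best(i + 1, k) + best(k + 1, j))
        table[(i, j)] = value
        return value

    return best(0, len(seq)), len(table)


n = 20
score, used = top_down("A" * n)
full = (n + 1) * (n + 2) // 2  # entries of a complete bottom-up table
print(score, used, full)
```

For an input without any possible pairing the top-down evaluation touches only a linear number of sub-words, while the size of the full table grows quadratically.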
Another form of sparseness can be exploited doing a pre-processing of the input
as described in [14, 15, 16], which includes several examples of sparsified DP
algorithms. For example, in ADPfold, the GAP-L version of the standard O(n^3)
MFE RNA folding algorithm, there exist several right hand sides of non-terminals
that induce a moving index boundary in the generated code. These moving index
boundaries are responsible for one O(n) factor of the overall runtime. Depending
on the structure of the right hand side, the number of considered boundaries can
be reduced in practice. Figure 10.1 shows two example rules. Since non-terminal
closed is guarded by a filter, only those index boundaries need to be considered
where the filter result at the referenced non-terminal is true. In an O(n^2) time
and space pre-processing of the input sequence a data-structure can be
constructed that contains all successful boundaries for each (i, j). Thus, when
evaluating the moving index boundaries the data-structure is used to iterate
only over a restricted set of boundaries. Such a modification is not heuristic,
and the algorithm is still guaranteed to return the same optimal results as the
original algorithm.
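
A minimal sketch of this pre-processing idea in Python, using a base pair maximization recurrence rather than the ADPfold grammar. The helper names (`can_pair`, `max_pairs_dense`, `max_pairs_sparse`, `partners`) are assumptions made for this illustration, and the per-position partner lists are a simplified stand-in for a data-structure over all (i, j). The dense version tests the filter at every boundary, the sparse version iterates only over the pre-computed successful boundaries, and both return the same optimum.

```python
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    return (a, b) in PAIRS


def max_pairs_dense(seq):
    """Moving boundary k visits every position and tests the filter there."""

    @lru_cache(maxsize=None)
    def best(i, j):  # optimal score for the sub-word seq[i:j]
        if j - i < 2:
            return 0
        result = best(i + 1, j)  # seq[i] stays unpaired
        for k in range(i + 1, j):
            if can_pair(seq[i], seq[k]):
                result = max(result, 1 + best(i + 1, k) + best(k + 1, j))
        return result

    return best(0, len(seq))


def max_pairs_sparse(seq):
    """Moving boundary k visits only pre-computed successful boundaries."""
    n = len(seq)
    # O(n^2) time and space pre-processing: for every left position i the
    # ascending list of positions k > i at which the filter succeeds.
    partners = [
        [k for k in range(i + 1, n) if can_pair(seq[i], seq[k])]
        for i in range(n)
    ]

    @lru_cache(maxsize=None)
    def best(i, j):
        if j - i < 2:
            return 0
        result = best(i + 1, j)
        for k in partners[i]:
            if k >= j:  # lists are sorted, later entries lie outside seq[i:j]
                break
            result = max(result, 1 + best(i + 1, k) + best(k + 1, j))
        return result

    return best(0, n)


print(max_pairs_dense("GGGAAACCC"), max_pairs_sparse("GGGAAACCC"))
```

The gain depends on how selective the filter is: on an input where almost every pair of positions passes the filter, the partner lists are nearly as long as the dense loop range.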
    struct = sadd(BASE, struct) |
             cadd(dangle, struct) |
             nil(EMPTY) # h;

    dangle = dlr(LOC, closed, LOC);

    closed = { stack | hairpin | leftB | rightB | iloop |
               multiloop }
             with stackpairing # h;

Figure 10.1: Example of a source of sparseness in a GAP-L grammar. The rules are
part of the ADPfold grammar. The moving index boundary of the second alternative
of non-terminal struct only needs to consider those boundaries where the
stackpairing filter is true.

A more aggressive pre-processing can lead to more sparseness and an
approximative version of the original algorithm. For example, in [39] sparse
versions of pknotsRG are presented, where an inner loop is replaced by a loop
that iterates only over index boundaries that induce base pairings with a base
pair probability above a certain threshold. The runtime of the original pknotsRG
algorithm is in O(n^4), the computation of all base pair probabilities is in
O(n^3) and the approximative pknotsRG version is in O(n^3).
Besides pre-processing, additional sparseness can be exploited using additional
bookkeeping during runtime of the algorithm. In [56] a sparsified version of the
RNAfold algorithm is presented that has an average case runtime of O(n^2 ψ(n))
where ψ converges to a constant. The basic matrix recurrences are arranged in
such a way that the triangle inequality holds for the computed values. This
property is used in the sparse version to skip the iteration of index boundaries
that cannot improve the result, i.e. the inner loop iterates only over a
constant number of boundaries. pknotsRG was sparsified using a similar technique
[34] and [3] concentrates on the space usage improvements in RNA folding
exploiting sparseness.

Thus, reviewing the state of sparseness in DP, general support for automatic
sparsification of GAP-L programs in GAP-C is a promising research topic. A
sparsification analysis in GAP-C could analyze the used product and check
whether certain score algebras satisfy the conditions needed for different
kinds of sparseness exploitation. Alternatively, a user could manually mark a
scoring algebra as satisfying such conditions, if complicated scoring functions
are used. When analysing the grammar, moving index boundaries that reference
guarded non-terminals could be automatically identified and pre-processing
sparseness optimization could be generalized for different kinds of filters and
implicit yield size constraints. The code-generation could use the result of
sparseness related analyses to generate specialized sparseness exploiting target
code. Ideally, in the development of a new DP algorithm in GAP-L an automatic
sparsification feature in GAP-C would liberate the GAP-L programmer from
manually implementing low-level sparsification versions. Instead only a
recompilation would be needed, similar to the automatic table design feature,
which liberates the developer from tedious manual tabulation concerns.

10.2 Knapsack style DP algorithms

ADP is a formal framework to develop dynamic programming algorithms over
sequences. It started with single-track support and Bellman’s GAP implementation
of ADP generalizes it to include full multi-track support. Looking at textbook
style dynamic programming algorithms on sequences, the Knapsack algorithm [8]
is challenging to implement in the ADP framework.

The Knapsack algorithm runs in pseudo-polynomial time and solves the following
optimization problem: given a set of objects with weights w_i and values m_i,
compute the most valuable subset that fits into a weight-restricted backpack
(≤ w_max). Equation 10.1 specifies the matrix recurrences of this algorithm.
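
\[
K_{i,j} = \max_{\mathrm{snd}}
\left\{
\begin{array}{ll}
K_{i,j-1} & \text{if } j > 0 \\
(w_j, m_j) & \text{if } i \geq w_j \\
K_{i-w_j,\,j-1} + (w_j, m_j) & \text{if } i \geq w_j,\ j > 0 \\
(0, 0) &
\end{array}
\right\}
\tag{10.1}
\]

To transform the matrix recurrences into GAP-L we model it as a two-track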
GAP-L program. The first track is the string u of increasing weights, with
u = 0 ... w_max, and the second track is the string v of weight and value
tuples, with v = (w_1, m_1) ... (w_n, m_n). Grammar sack is a first draft of the
search space description:

    grammar sack uses Bill(axiom = knap) {

      knap = ins(sack, <WEIGHT, CHAR>) |
             start(<WEIGHT, CHAR>) |
             skip(<EMPTY, CHAR>) # h;
    }

A hypothetical terminal parser WEIGHT would consume as much of u as the item
consumed by the CHAR terminal parser on the second track weighs. Such a WEIGHT
terminal parser is currently not possible in GAP-L, since there is no language
concept of user-defined terminal parsers and terminal parsers cannot interact
between different tracks of a multi-track program. GAP-C has no concept of a
terminal parser whose yield size is variable in general but constant for a given
sub-word of the other track. As a result, the generated target code would
contain an additional, unneeded inner loop.
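
For comparison, the matrix recurrences of Equation 10.1 can be sketched directly in Python. This is an illustration, not GAP-L or generated code; the function name `knapsack` and the table layout are assumptions. Entries are (weight, value) tuples, the maximum is taken over the second component, and the case (w_j, m_j) of the recurrence is covered by the initial column K[i][0] = (0, 0). The index i - w_j on the first track is computed directly from the item on the second track, so no inner loop over boundaries is needed.

```python
def knapsack(items, w_max):
    """items: list of (weight, value) tuples with non-negative integer weights.

    Returns the (weight, value) tuple of a most valuable subset whose total
    weight does not exceed w_max.
    """
    n = len(items)
    # K[i][j]: best (weight, value) for capacity i using the first j items.
    K = [[(0, 0)] * (n + 1) for _ in range(w_max + 1)]
    for j in range(1, n + 1):
        w_j, m_j = items[j - 1]
        for i in range(w_max + 1):
            candidates = [(0, 0), K[i][j - 1]]  # empty sack, or skip item j
            if i >= w_j:
                w, m = K[i - w_j][j - 1]  # put item j into the sack
                candidates.append((w + w_j, m + m_j))
            K[i][j] = max(candidates, key=lambda t: t[1])  # maximize the value
    return K[w_max][n]


print(knapsack([(3, 4), (4, 5), (2, 3)], 5))
```

The table has (w_max + 1) * (n + 1) entries and each entry is computed in constant time, which matches the pseudo-polynomial runtime mentioned above.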

The problem with the implementation of the Knapsack and similar style algorithms
in ADP is that a key concept of ADP is the separation of the search space
description (tree grammar) and the evaluation of candidates (algebra). In the
Knapsack algorithm these aspects are inherently intermixed.
One solution is to extend GAP-L to allow for the definition of new terminal
parsers. The new user-defined terminal parser definition syntax has to provide
the possibility to define a multi-track terminal parser and to specify the yield
size of one track depending on the other track. In particular, the syntax has to
allow for the possibility to indicate that the yield size of the first track is
variable, but constant for each sub-word of the second track. This is the
precondition for a future semantic analysis extension of GAP-C which would
automatically eliminate the moving boundary on the first track of the ins
alternative of the knap grammar rule. Thus, the extension of GAP-L and GAP-C in
the drafted way is feasible to allow the implementation of the class of
Knapsack-like algorithms from fields like operational research.

10.3 ADP over Trees

The ADP framework was developed for dynamic programming over one input string
and in GAP-L it is generalized for multiple input strings (Multi-Track DP).
However, the method of dynamic programming is not restricted to string inputs.
Another class of dynamic programming algorithms are DP algorithms over trees.

An example is RNAforester [26], which computes the optimal tree alignment
between two trees or forests. In the description of the algorithm there are also
special purpose combinators defined in Haskell which are used for a Haskell
version. For efficiency reasons the RNAforester program is manually implemented
in C++. Generalizing ADP to tree and forest inputs would simplify the
development of new dynamic programming algorithms over trees, e.g. an
RNAforester variant that supports affine gap-costs or other modifications.
However, the design of ADP over trees provides several challenges.

What is the appropriate grammar device for search space description? How to
generalize the tree-grammar concept from single sequence ADP? The yield of a
candidate tree should be the input tree or forest. An ASCII representation of
such a grammar formalism has to be sufficiently practical to be useful for
programming. What are useful pattern matching mechanisms in a grammar that are
powerful, easy to use and manageable to optimize in a compiler?

Another challenge is the research of other tree DP algorithms, to design ADP
over trees in such a way that it is not only applicable to tree alignment
problems. Analogous to the pairwise sequence alignment problem, where the
standard O(n^2) space algorithm is already an optimisation of the general O(n^4)
two-track scheme (see Section 5.4.1.6), one could study the general structure of
two-track tree DP algorithms. Also, it would be instructive to study
“single-track” tree DP algorithms, i.e. algorithms that take only one tree or
forest as input, for designing ADP over trees. In general, ADP over trees
provides several open language design questions and several opportunities to
develop novel compiler optimizations.
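
As a reference point for the sequence case mentioned above, the following Python sketch computes the edit distance of two strings with a table indexed by only one position per track. The function name `edit_distance` and the unit costs are assumptions made for this illustration. A general two-track scheme would index pairs of sub-words by four boundaries, which gives O(n^4) entries; here the left boundary of both sub-words is always 0, so O(n^2) entries suffice.

```python
def edit_distance(x, y):
    """Unit-cost edit distance between the strings x and y."""
    n, m = len(x), len(y)
    # D[i][j]: distance between the prefixes x[:i] and y[:j]. The left
    # boundary of both sub-words is fixed at 0, so two indices are enough.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # delete all of x[:i]
    for j in range(m + 1):
        D[0][j] = j  # insert all of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            replace = D[i - 1][j - 1] + (x[i - 1] != y[j - 1])
            D[i][j] = min(replace, D[i - 1][j] + 1, D[i][j - 1] + 1)
    return D[n][m]


print(edit_distance("kitten", "sitting"))
```

A study of two-track tree DP algorithms could ask which boundaries of the tree indices are constant in the same sense.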

Bibliography

[1] Mohamed I. Abouelhoda, Robert Giegerich, Behshad Behzadi, and Jean-Marc Steyaert. Alignment of Minisatellite Maps: A Minimum Spanning Tree based Approach. In Asia Pacific Bioinformatics Conference Kyoto, Japan, 14-17 January 2008, 2008. 4.6.4

[2] Gene Amdahl. Validity of the single processor approach to achieving large-scale computing capabilities. In AFIPS Conference Proceedings, volume 30, pages 483–485, 1967. Available from: http://www-inst.eecs.berkeley.edu/~n252/paper/Amdahl.pdf. 14

[3] Rolf Backofen, Dekel Tsur, Shay Zakov, and Michal Ziv-Ukelson. Sparse RNA folding: Time and space efficient algorithms. In Gregory Kucherov and Esko Ukkonen, editors, Proceedings of the 20th Symposium on Combinatorial Pattern Matching, volume 5577 of Lecture Notes in Computer Science, pages 249–262. Springer, 2009. Available from: http://www.bioinf.uni-freiburg.de/Publications/backofen09:_spars_rna_foldin.pdf, doi:10.1007/978-3-642-02441-2_22. 10.1

[4] J. K. Baker. Trainable grammars for speech recognition. The Journal of the Acoustical Society of America, 65(S1):S132–S132, 1979. doi:10.1121/1.2017061. 5.4.3.1

[5] Richard E. Bellman. Dynamic Programming. Princeton University Press, 1957. 1, 2.1

[6] Ewan Birney and Richard Durbin. Dynamite: A flexible code generating language for dynamic programming methods used in sequence comparison. In Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology, pages 56–64, 1997. Available from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49.6539&rep=rep1&type=pdf. 1.3.1

[7] OpenMP Architecture Review Board. OpenMP application program interface version 3.0. Technical report, OpenMP Architecture Review Board, 2008. Available from: http://www.openmp.org/mp-documents/spec30.pdf. 5.4.2

[8] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, 2001. 1, 10.2

[9] Duncan Coutts, Don Stewart, and Roman Leshchinskiy. Rewriting haskell strings. In Practical Aspects of Declarative Languages 8th International Symposium, PADL 2007, pages 50–64. Springer-Verlag, January 2007. Available from: http://www.cse.unsw.edu.au/~dons/papers/fusion.pdf. 3.1

[10] Boost developers. Boost – free peer-reviewed portable c++ source libraries, 2010. Available from: http://www.boost.org/. 6.1

[11] Ye Ding and Charles E. Lawrence. A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Research, 31(24):7280–7301, December 2003. Available from: http://nar.oxfordjournals.org/cgi/reprint/31/24/7280.pdf, doi:10.1093/nar/gkg938. 5.4.3.1

[12] Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison. Biological sequence analysis. Cambridge, 1998. 1.2, 2.1.1.1

[13] Jason Eisner, Eric Goldlust, and Noah A. Smith. Compiling comp ling: Weighted dynamic programming and the Dyna language. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT-EMNLP), pages 281–290, Vancouver, October 2005. Available from: http://cs.jhu.edu/~jason/papers/eisner+goldlust+smith.emnlp05.pdf. 1.3.3

[14] David Eppstein, Zvi Galil, Raffaele Giancarlo, and Giuseppe F. Italiano. Sparse dynamic programming. In SODA ’90: Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms, pages 513–522, Philadelphia, PA, USA, 1990. Society for Industrial and Applied Mathematics. 10.1

[15] David Eppstein, Zvi Galil, Raffaele Giancarlo, and Giuseppe F. Italiano. Sparse dynamic programming i: linear cost functions. Journal of the ACM, 39(3):519–545, 1992. doi:10.1145/146637.146650. 10.1

[16] David Eppstein, Zvi Galil, Raffaele Giancarlo, and Giuseppe F. Italiano. Sparse dynamic programming ii: convex and concave cost functions. Journal of the ACM, 39(3):546–567, 1992. doi:10.1145/146637.146656. 10.1

[17] Martin Fekete, Ivo L. Hofacker, and Peter F. Stadler. Prediction of RNA base pairing probabilities using massively parallel computers. Journal of Computational Biology, 1999. Available from: http://www.santafe.edu/media/workingpapers/98-06-057.pdf. 5.4.2

[18] Free Software Foundation. Bison – GNU parser generator, 2010. Available from: http://www.gnu.org/software/bison/. 5.1

[19] Robert Giegerich. Explaining and controlling ambiguity in dynamic programming. In Proceedings of Combinatorial Pattern Matching, volume 1848 of Springer Lecture Notes in Computer Science, pages 46–59. Springer, 2000. Available from: http://bibiserv.techfak.uni-bielefeld.de/adp/ps/ambi.pdf. 8.1

[20] Robert Giegerich, Carsten Meyer, and Peter Steffen. A discipline of dynamic programming over sequence data. Science of Computer Programming, 51(3):215–263, June 2004. Available from: http://bibiserv.techfak.uni-bielefeld.de/adp/ps/GIE-MEY-STE-2004.pdf, doi:10.1016/j.scico.2003.12.005. (document), 2.1, 2.1.2

[21] Robert Giegerich and Peter Steffen. Implementing algebraic dynamic programming in the functional and the imperative programming paradigm. In E. A. Boiten and B. Möller, editors, Mathematics of Program Construction, pages 1–20. LNCS 2386, 2002. Available from: http://bibiserv.techfak.uni-bielefeld.de/adp/ps/adp_implementing.ps.gz. 3.1, 5.3.3

[22] Robert Giegerich, Björn Voß, and Marc Rehmsmeier. Abstract shapes of RNA. Nucleic Acids Research, 32(16):4843, September 2004. Available from: http://nar.oxfordjournals.org/cgi/reprint/32/16/4843.pdf, doi:10.1093/nar/gkh779. 5.4.3.1, 5.4.3.1

[23] Osamu Gotoh. An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162:705–708, 1982. 1.3.1

[24] Dan S. Hirschberg. A linear space algorithm for computing maximal common subsequences. Communications of the ACM, 18(6):341–343, 1975. Available from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.88.7183&rep=rep1&type=pdf. 1.3.1

[25] Ivo L. Hofacker, Walter Fontana, Peter F. Stadler, L. Sebastian Bonhoeffer, Manfred Tacker, and Peter Schuster. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie, 125(2):167–188, 1994. Available from: http://fontana.med.harvard.edu/www/Documents/WF/Papers/vienna.rna.pdf, doi:10.1007/BF00818163. 2.1.1.1, 4.3, 5.3.7.3, 5.4.1.3, 5.4.2, 5.4.3.1, 5.4.5, 6.4, 8.1

[26] Matthias Höchsmann. The Tree Alignment Model: Algorithms, Implementations and Applications for the Analysis of RNA Secondary Structures. PhD thesis, Universität Bielefeld, 2005. Available from: http://bieson.ub.uni-bielefeld.de/volltexte/2005/709/pdf/diss.pdf. 10.3

[27] Stefan Janssen and Robert Giegerich. Faster computation of exact RNA shape probabilities. Bioinformatics, 26(5):632–639, 2010. Available from: http://bioinformatics.oxfordjournals.org/cgi/reprint/26/5/632.pdf, doi:10.1093/bioinformatics/btq014. 5.3.7.2, 8.2

[28] Simon Peyton Jones, editor. Haskell 98 Language and Libraries – The Revised Report. Cambridge University Press, 2003. Available from: http://www.haskell.org/definition/haskell98-report.pdf. 1.3.4, 2.1.2, 3.1

[29] W. A. Lorenz, Yann Ponty, and Peter Clote. Asymptotics of RNA shapes. Journal of Computational Biology, 15(1):31–63, 2008. Available from: http://www.lri.fr/~ponty/docs/AsymptoticsRNAShapes-JCompBiol-LorenzPontyClote.pdf, doi:10.1089/cmb.2006.0153. 5.4.3.1

[30] Wellington S. Martins, Juan B. Del Cuvillo, Francisco Jose Useche, Kevin B. Theobald, and Guang R. Gao. A multithreaded parallel implementation of a dynamic programming algorithm for sequence comparison. In Proceedings of the Pacific Symposium on Biocomputing, pages 311–322, 2001. Available from: http://psb.stanford.edu/psb-online/proceedings/psb01/martins.pdf. 5.4.2

[31] Carsten Meyer and Robert Giegerich. Matching and Significance Evaluation of Sequence-Structure Patterns in RNA. Journal of Physical Chemistry, 216:1–24, 2002. Available from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.2.7910&rep=rep1&type=pdf. 4.5.8.1

[32] Gordon E. Moore. Cramming more components onto integrated circuits. Electronics Magazine, 38(8), 1965. Available from: ftp://download.intel.com/museum/Moores_Law/Articles-Press_Releases/Gordon_Moore_1965_Article.pdf. 5.4.2

[33] Akimasa Morihata. A short cut to optimal sequences. New Generation Computing, accepted, 2010. 1.3.4

[34] Mathias Möhl, Raheleh Salari, Sebastian Will, Rolf Backofen, and S. Sahinalp. Sparsification of RNA structure prediction including pseudoknots. In Vincent Moulton and Mona Singh, editors, Proceedings of the 10th Workshop on Algorithms in Bioinformatics (WABI), volume 6293 of Lecture Notes in Computer Science, pages 40–51. Springer Berlin / Heidelberg, 2010. Available from: http://www.bioinf.uni-freiburg.de/Publications/moehl_wabi10:Sparsification.pdf, doi:10.1007/978-3-642-15294-8_4. 10.1

[35] Saul B. Needleman and Christian D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48:443–453, 1970. 1.3.1, 4.5.2

[36] Ruth Nussinov, George Pieczenik, Jerrold R. Griggs, and Daniel J. Kleitman. Algorithms for loop matchings. SIAM Journal on Applied Mathematics, 35(1):68–82, 1978. Available from: http://link.aip.org/link/?SMM/35/68/1, doi:10.1137/0135006. 2.1.1.1, 5.4.5

[37] The Flex Project. flex: The Fast Lexical Analyzer, 2010. Available from: http://flex.sourceforge.net/. 5.1

[38] Janina Reeder and Robert Giegerich. A graphical programming system for molecular motif search. In Proceedings of the 5th international Conference on Generative Programming and Component Engineering, pages 131–140, Portland, Oregon, USA, October 22-26 2006. ACM Press, New York, NY. GPCE ’06. Available from: http://aop.cslab.openu.ac.il/~lorenz/www/ontheShelf/p131.pdf. 1.2, 5.3.7.2, 5.3.7.3

[39] Jens Reeder. Algorithms for RNA secondary structure analysis: prediction of pseudoknots and the consensus shapes approach. Thesis, Universität Bielefeld, 2007. Available from: http://bieson.ub.uni-bielefeld.de/volltexte/2008/1276/pdf/thesis.pdf. 2.1.3, 5.42, 10.1

[40] Jens Reeder and Robert Giegerich. Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics. BMC Bioinformatics, 5:104, August 2004. Available from: http://www.biomedcentral.com/content/pdf/1471-2105-5-104.pdf, doi:10.1186/1471-2105-5-104. 2.1.3, 4.5.8.1, 4.5.8.4, 5.3.7.3, 5.4.5, 8.4

[41] Marc Rehmsmeier, Peter Steffen, Matthias Höchsmann, and Robert Giegerich. Fast and effective prediction of microRNA/target duplexes. RNA, 10:1507–1517, 2004. Available from: http://rnajournal.cshlp.org/content/10/10/1507.full.pdf. 2.1.3, 2.1.3

[42] David Sankoff. Simultaneous solution of the rna folding, alignment and protosequence problems. SIAM Journal on Applied Mathematics, 45(5):68–82, October 1985. 1.3.1, 4.5.2

[43] Georg Sauthoff. Java-Backend für den ADP-Compiler. Diplomarbeit, Universität Bielefeld, 2007. 2.1.3

[44] Stefanie Schirmer. A Frontend for the ADP compiler. Diplomarbeit, Universität Bielefeld, 2006. 5.3.8

[45] Alexander Sczyrba and Jan Krüger. BiBiServ – Bielefeld University Bioinformatics Server, 2010. Available from: http://bibiserv.techfak.uni-bielefeld.de. 7.1

[46] Alexander Sczyrba, Jan Krüger, Henning Mersch, Stefan Kurtz, and Robert Giegerich. RNA-related tools on the Bielefeld Bioinformatics Server. Nucleic Acids Research, 31(13):3767–3770, 2003. Available from: http://nar.oupjournals.org/cgi/reprint/31/13/3767.pdf. 7.1

[47] Peter Steffen. Compiling a Domain Specific Language for Dynamic Programming. PhD thesis, Technische Fakultät Universität Bielefeld, 2006. Available from: http://bieson.ub.uni-bielefeld.de/volltexte/2007/1035/pdf/diss.pdf. 2.1.3, 5.3.6

[48] Peter Steffen and Robert Giegerich. Versatile and declarative dynamic programming using pair algebras. BMC Bioinformatics, 6(1):224, September 2005. Available from: http://www.biomedcentral.com/content/pdf/1471-2105-6-224.pdf, doi:10.1186/1471-2105-6-224. 2.1.1, 2.1.1, 4.6.3

[49] Peter Steffen and Robert Giegerich. Correction: versatile and declarative dynamic programming using pair algebras. BMC Bioinformatics, 7:214, April 2006. Available from: http://www.biomedcentral.com/content/pdf/1471-2105-7-214.pdf, doi:10.1186/1471-2105-7-214. 2.1.1, 4.6.3

[50] Peter Steffen and Robert Giegerich. Table Design in Dynamic Programming. Information and Computation, 204(9):1325–1345, 2006. Available from: http://www.techfak.uni-bielefeld.de/~psteffen/pub/tabulate.pdf. 5.3.7, 5.3.7.1

[51] Peter Steffen, Björn Voß, Marc Rehmsmeier, Jens Reeder, and Robert Giegerich. RNAshapes: an integrated RNA analysis package based on abstract shapes. Bioinformatics, 22(4):500–503, February 2006. Available from: http://bioinformatics.oxfordjournals.org/cgi/reprint/22/4/500.pdf. 2.1.3, 4.5.9, 5.3.7.3

[52] Kedar Swadi, Walid Taha, and Oleg Kiselyov. Staging dynamic programming algorithms. In Proceedings of the 10th ACM SIGPLAN International Conference on Functional Programming, 2005. Available from: www.cs.rice.edu/~taha/publications/preprints/2005-04-13.pdf. 1.3.2

[53] Stephen H. Unger. A global parser for context-free phrase structure grammars. Communications of the ACM, 11(4):240–247, April 1968. 5.4.1

[54] Björn Voß. Advanced Tools for RNA Secondary Structure Analysis. Thesis, Universität Bielefeld, 2004. Available from: http://bieson.ub.uni-bielefeld.de/volltexte/2005/664/pdf/Diss.pdf. 1.2, 5.4.3.1

[55] Björn Voß, Robert Giegerich, and Marc Rehmsmeier. Complete probabilistic analysis of RNA shapes. BMC Biology, 4(1):5, February 2006. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1479382/pdf/1741-7007-4-5.pdf, doi:10.1186/1741-7007-4-5. 5.4.3.1, 8.3

[56] Ydo Wexler, Chaya Zilberstein, and Michal Ziv-Ukelson. A study of accessible motifs and rna folding complexity. Journal of Computational Biology, 14(6), 2007. doi:10.1089/cmb.2007.R020. 10.1

[57] Stefan Wuchty, Walter Fontana, Ivo L. Hofacker, and Peter Schuster. Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers, 49:145–165, 1998. Available from: http://www.santafe.edu/~walter/Papers/subopt.pdf. 5.4.3, 8.1

[58] Daniel H. Younger. Recognition and parsing of context-free languages in time n^3. Information and Control, 10(2):189–208, February 1967. 1.2, 5.4.1

[59] Michael Zuker and Patrick Stiegler. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Research, 9(1):133–148, 1981. 4.5.2
