Published by | profil-zyak-2012 |
Number of reads | 20 |
Language | English |
Excerpt
Self-Modification and Mortality in Artificial Agents

Laurent Orseau (1) and Mark Ring (2)

(1) UMR AgroParisTech 518 / INRA, 16 rue Claude Bernard, 75005 Paris, France
laurent.orseau@agroparistech.fr
http://www.agroparistech.fr/mia/orseau
(2) IDSIA / University of Lugano / SUPSI, Galleria 2, 6928 Manno-Lugano, Switzerland
mark@idsia.ch
http://www.idsia.ch/~ring/

Abstract. This paper considers the consequences of endowing an intelligent agent with the ability to modify its own code. The intelligent agent is patterned closely after AIXI [1], but the environment has read-only access to the agent's description. On the basis of some simple modifications to the utility and horizon functions, we are able to discuss and compare some very different kinds of agents, specifically: reinforcement-learning, goal-seeking, predictive, and knowledge-seeking agents. In particular, we introduce what we call the "Simpleton Gambit", which allows us to discuss whether these agents would choose to modify themselves toward their own detriment.

Keywords: Self-Modifying Agents, AIXI, Universal Artificial Intelligence, Reinforcement Learning, Prediction, Real world assumptions

1 Introduction

The usual setting of agents interacting with an environment makes a strong, unrealistic assumption: agents exist "outside" of the environment. But this is not how our own, real world is. This paper discusses some of the consequences that arise from embedding agents of universal intelligence into the real world. In particular, we examine the consequences of allowing an agent to modify its own code, possibly leading to its own demise (cf. the Gödel Machine [6] for a different but related treatment of self-modification). To pursue these issues rigorously, we place AIXI [1] within an original, formal framework where the agent's code can be modified by itself and also seen by its environment. We consider the self-modifying, universal version of four common agents: reinforcement-learning, goal-seeking, prediction-seeking, and knowledge-seeking learning agents, and we compare these with their optimal, non-learning variants.

We then discuss these agents in the "Simpleton Gambit": a famous scientist, Nobel Prize winner, someone you trust completely, suggests an opportunity, an operation that will make you instantly, forever and ultimately happy, all-knowing, or immortal (you choose) but at the important cost of becoming as intelligent as a stone. Would you take it? Of course, there is still a positive chance, however small, that the operation might go wrong. We consider the responses of the various agents and the ramifications, generally framing our observations as "statements" and (strong) "arguments", as proofs would require much more formalism and space.

2 Universal agents $A_x^\rho$

We wish to discuss the behavior of four specific learning agents, but first we describe the environment or "universe" with which they will interact. Each agent outputs actions $a \in \mathcal{A}$ in response to the observations $o \in \mathcal{O}$ produced by the universe. There is a temporal order, so that at time $t$ the agent takes an action $a_t$ and the universe responds by producing an observation $o_t$.

The universe is assumed to be computable; i.e., it is described by a program $q \in \mathcal{Q}$, where $\mathcal{Q}$ is the set of all programs. The set of all universes that are consistent with history $h$ is denoted by $\mathcal{Q}_h$. To say that a program $q$ is consistent with $h = (o_0, a_0, \ldots, o_t, a_t)$ means that the program outputs the observations in the history if it is given the actions as input: $q(a_0, \ldots, a_t) = o_0, \ldots, o_t$.

In the rest of the paper, certain conventions will be followed for shorthand reference: $t_h$ refers to the time step right after history $h$, and is therefore equal to $|h| + 1$; $|q|$ refers to the length of program $q$; $h_k$ is the $k$-th pair of actions and observations, which are written as $a_k$ and $o_k$.

We will discuss four different intelligent agents that are each variations of a single agent $A_x^\rho$, based on AIXI [1] (which is not computable).^3

^3 Only incomputable agents can be guaranteed to find the optimal strategy, and this guarantee is quite useful for discussions of the agents' theoretical limits.

$A_x^\rho$ chooses actions by estimating how the universe will respond, but since it does not know which universe it is in, it first estimates the probability of each. The function $\rho : \mathcal{Q} \to (0, 1]$ assigns a positive weight (a prior probability) to each possible universe $q \in \mathcal{Q}$. As a convenient shorthand, $\rho(h)$ refers to the sum of $\rho(q)$ over all universes consistent with $h$:

$$\rho(h) := \sum_{q \in \mathcal{Q}_h} \rho(q),$$

which must be finite. Given a specific history, the agent can use $\rho$ to estimate a probability for each possible future based on the likelihood of all the universes that generate that future.

For the agent to choose one action over another, it must value one future over another, and this implies that it can assign values to the different possible futures. The assignment of values to futures is done with a utility function $u : \mathcal{H} \to [0, 1]$, which maps histories of any length to values between 0 and 1.

To balance short-term utility with long-term utility, the agent has a horizon function, $w : \mathbb{N}^2 \to \mathbb{R}$, which discounts future utility values based on how far into the future they occur. This function, $w(t, k)$, depends on $t$, the current time step, and $k$, the time step in the future at which the event occurs. In general, it must be summable:

$$\sum_{k=t}^{\infty} w(t, k) < \infty.$$
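The definitions of $\mathcal{Q}_h$ and $\rho(h)$ above can be illustrated with a minimal sketch. It assumes a finite toy class of three universes in place of the set of all programs, and the names `echo`, `zeros`, `flip`, `PRIOR`, `consistent` and `rho` are hypothetical, chosen only for this illustration. A history is represented as a sequence of (observation, action) pairs, following $h = (o_0, a_0, \ldots, o_t, a_t)$.

```python
from fractions import Fraction

# Hypothetical toy universes (not from the paper): each maps a tuple of
# binary actions to a tuple of observations of the same length.
def echo(actions):
    return tuple(actions)                 # observation repeats the action

def zeros(actions):
    return tuple(0 for _ in actions)      # observation is always 0

def flip(actions):
    return tuple(1 - a for a in actions)  # observation negates the action

# Assumed prior weights rho(q); every weight lies in (0, 1].
PRIOR = {echo: Fraction(1, 2), zeros: Fraction(1, 4), flip: Fraction(1, 4)}

def consistent(q, history):
    """True if q outputs the observations of the history given its actions."""
    observations = tuple(o for o, _ in history)
    actions = tuple(a for _, a in history)
    return q(actions) == observations

def rho(history):
    """rho(h): total prior weight of the universes consistent with h."""
    return sum(weight for q, weight in PRIOR.items() if consistent(q, history))
```

In this toy class, every universe is consistent with the empty history, so `rho(())` is the total prior mass. Each observed pair can only remove universes from the consistent set, so `rho` never increases as the history grows.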
These two functions, $u$ and $w$, allow the agent at time $t$ to put a value $v_t(h)$ on each possible history $h$ based on what futures are possible given a particular action set. The value $v_t(h)$ is shorthand for $v_t^{\rho,u,w,\mathcal{A},\mathcal{O}}(h)$, which completely specifies the value for a history, with given utility and horizon functions at time $t$. This value is calculated recursively:

$$v_t(h) := w(t, |h|)\, u(h) + \max_{a \in \mathcal{A}} v_t(ha) \tag{1}$$

$$v_t(ha) := \sum_{o \in \mathcal{O}} \rho(o \mid ha)\, v_t(hao). \tag{2}$$

The first line says that the value of a history is the discounted utility for that history plus the estimated value of the highest-valued action. The second line estimates the value of an action (given a history) as the value of all possible outcomes of the action, each weighted by its probability (as described above). Based on this, the agent chooses^4 the action that maximizes $v_{t_h}(h)$:

$$a_{t_h} := \operatorname*{argmax}_{a \in \mathcal{A}} v_{t_h}(ha). \tag{3}$$

Thus, the behavior of an agent is specified by the choice of $\rho$, $u$, and $w$.

^4 Ties are broken in lexicographical order.

2.1 Various universal agents

The four different agents considered here are described in detail below. They are (1) a (fairly traditional) reinforcement-learning agent, which attempts to maximize a reward signal given by the environment; (2) a goal-seeking agent, which attempts to achieve a specific goal encoded in its utility function; (3) a prediction-seeking agent, which attempts to predict its environment perfectly; and (4) a knowledge-seeking agent, which attempts to maximize its knowledge of the universe (which is not the same as being able to predict it well).

The reinforcement-learning agent, $A_{rl}^\rho$, interprets one part of its input as a reward signal and the remaining part as its observation; i.e., $o_t = \langle \tilde{o}_t, r_t \rangle$, where $\tilde{o}_t \in \tilde{\mathcal{O}}$, and rewards are assumed to have a maximum value and can, without loss of generality, be normalized such that $r_t \in [0, 1]$. The utility function is an unfiltered copy of the reward signal: $u(h) := r_{|h|}$. We use a simple binary horizon function with a constant horizon $m$: $w(t, k) = 1$ if $k - t \le m$ and $w(t, k) = 0$ otherwise; but the following discussion should remain true for general computable horizon functions. For the special case of the reinforcement-learning agent AIXI: $\rho(h) = \xi(h) := \sum_{q \in \mathcal{Q}_h} 2^{-|q|}$.

The goal-seeking agent, $A_g^\rho$, has a goal $g$, depending on the observation sequence, encoded in its utility function such that $u(h) = g(o_1, \ldots, o_{|h|}) = 1$ if the goal is achieved at $t = |h|$ and 0 otherwise. The goal can be reached at most once, so $\sum_{t=0}^{\infty} u(h_t) \le 1$. We use a discounting horizon function $w(t, k) = 2^{t-k}$ to favor shorter strings of actions while achieving the goal. One difference between $A_g^\rho$ and $A_{rl}^\rho$ is that the utility values of $A_{rl}^\rho$ are merely copied directly from the environment, whereas the utility function of the $A_g^\rho$ agent is built into the agent itself, can be arbitrarily complex, and does not rely on a special signal from the environment.
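The recursion in Equations (1)-(3) can be sketched for a reinforcement-learning agent under strong simplifying assumptions: a finite toy class of two universes stands in for the set of all programs, the binary horizon $w(t, k) = 1$ if $k - t \le m$ is used so that the recursion can stop once every later weight is zero, and $\rho(o \mid ha)$ is computed as a ratio of consistent prior mass. All names (`rewards_one`, `rewards_zero`, `PRIOR`, `mass`, `value`, `action_value`, `choose_action`) and the prior weights are hypothetical. Here a history is stored as two parallel tuples, the actions taken and the observations $\langle \tilde{o}, r \rangle$ received.

```python
ACTIONS = (0, 1)
OBSERVATIONS = ((0, 0.0), (0, 1.0))  # each observation is a pair (o~, reward)
M = 1                                # assumed constant horizon m

# Hypothetical toy universes: map a tuple of actions to a tuple of observations.
def rewards_one(actions):
    return tuple((0, 1.0 if a == 1 else 0.0) for a in actions)

def rewards_zero(actions):
    return tuple((0, 1.0 if a == 0 else 0.0) for a in actions)

PRIOR = {rewards_one: 0.75, rewards_zero: 0.25}  # assumed weights rho(q)

def mass(actions, observations):
    """Prior mass of the universes whose output matches the observations so far."""
    n = len(observations)
    return sum(p for q, p in PRIOR.items() if q(actions)[:n] == observations)

def utility(observations):
    """u(h) := r_|h|, the reward in the latest observation (0 for the empty history)."""
    return observations[-1][1] if observations else 0.0

def horizon(t, k):
    """Binary horizon: w(t, k) = 1 if k - t <= m, and 0 otherwise."""
    return 1.0 if k - t <= M else 0.0

def value(t, actions, observations):
    """Equation (1): discounted utility of h plus the value of the best action."""
    k = len(observations)
    here = horizon(t, k) * utility(observations)
    if k - t >= M:  # every later step has weight 0, so the recursion can stop
        return here
    return here + max(action_value(t, actions, observations, a) for a in ACTIONS)

def action_value(t, actions, observations, a):
    """Equation (2): expectation of v_t(hao) over o, weighted by rho(o | ha)."""
    acts = actions + (a,)
    denom = mass(acts, observations)
    total = 0.0
    for o in OBSERVATIONS:
        p = mass(acts, observations + (o,))
        if p > 0:
            total += (p / denom) * value(t, acts, observations + (o,))
    return total

def choose_action(actions, observations):
    """Equation (3) with t_h = |h| + 1; max() keeps the first maximizer on ties."""
    t = len(observations) + 1
    return max(ACTIONS, key=lambda a: action_value(t, actions, observations, a))
```

With the empty history, the sketch prefers action 1 because the universe that rewards it carries more prior weight. After a single rewarded action 0, only `rewards_zero` remains consistent and the choice switches to 0. Swapping `utility` and `horizon` for the goal-seeking definitions would change the agent without touching the recursion, which is the sense in which behavior is specified by $\rho$, $u$, and $w$.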
The prediction-seeking agent, $A_p^\rho$, maximizes its utility by predicting its observations, so that $u(h) = 1$ if the agent correctly predicts its next observation $o_t$, and is 0 otherwise. The prediction $\hat{o}_t$ is like Solomonoff induction [7,8] and is defined by $\hat{o}_{t_h} := \max_{o \in \mathcal{O}} \rho(o \mid h)$. The horizon function is the same as for $A_{rl}^\rho$.

The knowledge-seeking agent, $A_k^\rho$, maximizes its knowledge of its environment, which is identical to minimizing $\rho(h)$, which decreases whenever universes in $\mathcal{Q}_h$ fail to match the observation and are removed from $\mathcal{Q}_h$. (Since the true environment is never removed, its relative probability always increases.) Actions can be chosen intentionally to produce the highest number of inconsistent observations, removing programs from $\mathcal{Q}_h$, just as we, too, run experiments to discover whether our universe is one way or another. $A_k^\rho$ has the following utility and horizon functions: $u(h) = -\rho(h)$, and $w(t, k) = 1$ if $k - t = m$ and is 0 otherwise. To maximize utility, $A_k^\rho$ reduces $\rho$ as much as possible, which means discarding as many (non-consistent) programs as possible, discovering with the highest possible probability which universe is the true one. Discarding the most probable programs results in the greatest reduction in $\rho$.

The optimal agent $A^\mu$. The four agents above are learning agents because they continually update their estimate of their universe from experience, but $A^\mu$ does no learning: it knows the true (computable) program of the universe defined by $\mu$ and can always calculate the optimal action, thus setting an upper bound against which the other four agents can be compared. $A^\mu$ is defined by replacing $\rho$ in Equations (1-3) with the specific $\mu$. It is important to note that $\rho$ is not replaced by $\mu$ in the utility functions; e.g., $A_p^\mu$ must use $\rho$ for its predictions of future inputs (to allow meaningful comparison with $A_p^\rho$). Thus, if $A_p^\mu$ and $A_p^\rho$ take the same actions, they generate the same prediction errors.^5

^5 Note that there is only one environment in which the predictor makes no mistakes.

A learning agent $A_x^\rho$ is said to be asymptotically optimal [1] if its performance tends towards that of $A^\mu$, meaning that for each history $h$, the learning agent's choice of action is compared with that of $A^\mu$ given the same history, and its performance is measured in terms of the fraction of mistakes it makes. Thus, past mistakes have only finite consequences. In other words, the agent is asymptotically optimal if the number of mistakes it makes tends towards zero.

3 Self-Modifiable agents $A_{sm}^\rho$

The agents from the last section are incomputable and therefore fictional, but they are useful for setting theoretical upper bounds on any actual agent that might eventually appear. Therefore, we divide the agent into two parts to separate the fictional from the real. The fictional part of the agent, $E$, is in essence a kind of oracle, one that can perform any infinite computation instantly. The real-world part of the agent, $c$, is the program (or rather, its textual description, or code) that $E$ executes; since $c$ resides in the real world [...]