Self Modification and Mortality
10 pages
English


Description

Level: Higher education, Doctorate, Bac+8
Self-Modification and Mortality in Artificial Agents
Laurent Orseau (1) and Mark Ring (2)
(1) UMR AgroParisTech 518 / INRA, 16 rue Claude Bernard, 75005 Paris, France
(2) IDSIA / University of Lugano / SUPSI, Galleria 2, 6928 Manno-Lugano, Switzerland

Abstract. This paper considers the consequences of endowing an intelligent agent with the ability to modify its own code. The intelligent agent is patterned closely after AIXI [1], but the environment has read-only access to the agent's description. On the basis of some simple modifications to the utility and horizon functions, we are able to discuss and compare some very different kinds of agents, specifically: reinforcement-learning, goal-seeking, predictive, and knowledge-seeking agents. In particular, we introduce what we call the Simpleton Gambit which allows us to discuss whether these agents would choose to modify themselves toward their own detriment.

Keywords: Self-Modifying Agents, AIXI, Universal Artificial Intelligence, Reinforcement Learning, Prediction, Real world assumptions

1 Introduction

The usual setting of learning agents interacting with an environment makes a strong, unrealistic assumption: the agents exist outside of the environment.

  • agent can

  • agent

  • reinforcement learning

  • discount future

  • self-modifiable agents

  • real-world part

  • horizon function

  • future actions

  • code


Topics

Information

Published by
Number of reads 20
Language English

Excerpt

Laurent Orseau: laurent.orseau@agroparistech.fr, http://www.agroparistech.fr/mia/orseau
Mark Ring: mark@idsia.ch, http://www.idsia.ch/~ring/

1 Introduction

The usual setting of learning agents interacting with an environment makes a strong, unrealistic assumption: the agents exist outside of the environment. But this is not how our own, real world is. This paper discusses some of the consequences that arise from embedding agents of universal intelligence into the real world. In particular, we examine the consequences of allowing an agent to modify its own code, possibly leading to its own demise (cf. the Gödel Machine [6] for a different but related treatment of self-modification). To pursue these issues rigorously, we place AIXI [1] within an original, formal framework where the agent's code can be modified by itself and also seen by its environment. We consider the self-modifying, universal version of four common agents: reinforcement-learning, goal-seeking, prediction-seeking, and knowledge-seeking learning agents, and we compare these with their optimal, non-learning variants.

We then pose a profound dilemma, the Simpleton Gambit: a famous scientist, Nobel Prize winner, someone you trust completely, suggests an opportunity, [...]
2 Universal agents $A^\rho_x$

We wish to discuss the behavior of four specific learning agents, but first we describe the environment or universe with which they will interact. Each agent outputs actions $a \in \mathcal{A}$ in response to the observations $o \in \mathcal{O}$ produced by the universe. There is a temporal order, so that at time $t$ the agent takes an action $a_t$ and the universe responds by producing an observation $o_t$.

The universe is assumed to be computable; i.e., it is described by a program $q \in \mathcal{Q}$, where $\mathcal{Q}$ is the set of all programs. The set of all universes that are consistent with history $h$ is denoted by $\mathcal{Q}_h$. To say that a program $q$ is consistent with $h = (o_0, a_0, \ldots, o_t, a_t)$ means that the program outputs the observations in the history if it is given the actions as input: $q(a_0, \ldots, a_t) = o_0, \ldots, o_t$.

In the rest of the paper, certain conventions will be followed for shorthand reference: $t_h$ refers to the time step right after history $h$, and is therefore equal to $|h| + 1$; $|q|$ refers to the length of program $q$; $h_k$ is the $k$th pair of actions and observations, which are written as $a_k$ and $o_k$.

We will discuss four different intelligent agents that are each variations of a single agent $A^\rho_x$, based on AIXI [1] (which is not computable). (Only incomputable agents can be guaranteed to find the optimal strategy, and this guarantee is quite useful for discussions of the agents' theoretical limits.) $A^\rho_x$ chooses actions by estimating how the universe will respond, but since it does not know which universe it is in, it first estimates the probability of each. The function $\rho : \mathcal{Q} \to (0, 1]$ assigns a positive weight (a prior probability) to each possible universe $q \in \mathcal{Q}$. As a convenient shorthand, $\rho(h)$ refers to the sum of $\rho(q)$ over all universes consistent with $h$:

$$\rho(h) := \sum_{q \in \mathcal{Q}_h} \rho(q),$$

which must be finite. Given a specific history, the agent can use $\rho$ to estimate a probability for each possible future based on the likelihood of all the universes that generate that future.

For the agent to choose one action over another, it must value one future over another, and this implies that it can assign values to the different possible futures. The assignment of values to futures is done with a utility function $u : \mathcal{H} \to [0, 1]$, which maps histories of any length to values between 0 and 1.

To balance short-term utility with long-term utility, the agent has a horizon function, $w : \mathbb{N}^2 \to \mathbb{R}$, which discounts future utility values based on how far into the future they occur. This function, $w(t, k)$, depends on $t$, the current time step, and $k$, the time step in the future that the event occurs. In general, it must be summable:

$$\sum_{k=t}^{\infty} w(t, k) < \infty.$$

These two functions, $u$ and $w$, allow the agent at time $t$ to put a value $v_t(h)$ on each possible history $h$. The value $v_t(h)$ is shorthand for $v_t^{\rho, u, w, \mathcal{A}, \mathcal{O}}(h)$, which completely specifies the value for a history, with given utility and horizon functions at time $t$. This value is calculated recursively:

$$v_t(h) := w(t, |h|)\, u(h) + \max_{a \in \mathcal{A}} v_t(ha) \tag{1}$$

$$v_t(ha) := \sum_{o \in \mathcal{O}} \rho(o \mid ha)\, v_t(hao). \tag{2}$$

The first line says that the value of a history is the discounted utility for that history plus the estimated value of the highest-valued action. The second line estimates the value of an action (given a history) as the value of all possible outcomes of the action, each weighted by their probability (as described by $\rho$ above). Based on this, the agent chooses the action that maximizes $v_{t_h}(h)$:

$$a_{t_h} := \arg\max_{a \in \mathcal{A}} v_{t_h}(ha). \tag{3}$$

(Ties are broken in lexicographical order.) Thus, the behavior of an agent is specified by choice of $\rho$, $u$, and $w$.
2.1 Various universal agents

The four different agents considered here are described in detail below. They are (1) a (fairly traditional) reinforcement-learning agent, which attempts to maximize a reward signal given by the environment; (2) a goal-seeking agent, which attempts to achieve a specific goal encoded in its utility function; (3) a prediction-seeking agent, which attempts to predict its environment perfectly; and (4) a knowledge-seeking agent, which attempts to maximize its knowledge of the universe (which is not the same as being able to predict it well).

The reinforcement-learning agent, $A^\rho_{rl}$, interprets one part of its input as a reward signal and the remaining part as its observation; i.e., $o_t = \langle \tilde{o}_t, r_t \rangle$, where $\tilde{o}_t \in \tilde{\mathcal{O}}$, and rewards are assumed to have a maximum value, and can, without loss of generality, be normalized such that $r_t \in [0, 1]$. The utility function is an unfiltered copy of the reward signal: $u(h) := r_{|h|}$. We use a simplified horizon function with a constant horizon $m$: $w(t, k) = 1$ if $k - t \le m$ and $w(t, k) = 0$ otherwise. (A simple binary horizon function is used here, but the following discussion should remain valid for general computable horizon functions.) The special case of the reinforcement-learning agent AIXI is

$$\rho(h) = \xi(h) := \sum_{q \in \mathcal{Q}_h} 2^{-|q|}.$$

The goal-seeking agent, $A^\rho_g$, has a goal $g$, depending on the observation sequence, encoded in its utility function such that $u(h) = g(o_1, \ldots, o_{|h|}) = 1$ if the goal is achieved at $t = |h|$ and 0 otherwise. The goal can be achieved at most once, so $\sum_{t=0}^{\infty} u(h_t) \le 1$. We use a discounted horizon function $w(t, k) = 2^{t-k}$ to favor shorter strings of actions while achieving the goal. One difference between $A^\rho_g$ and $A^\rho_{rl}$ is that the utility values of $A^\rho_{rl}$ are merely copied directly from the environment, whereas the utility of $A^\rho_g$ is built into the agent itself, can be arbitrarily complex, and does not rely on having a special signal from the environment.
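The two horizon functions and the two utility functions above are simple enough to write down directly. The sketch below is illustrative only: the function names, the list-based encoding of rewards and observations, and the "first time the goal predicate holds" guard (used so that the goal utility fires at most once) are assumptions, not definitions taken from the paper.

```python
from typing import Callable, Sequence


def w_rl(t: int, k: int, m: int) -> float:
    """Constant horizon m: weight 1 inside the window, 0 outside (k >= t assumed)."""
    return 1.0 if 0 <= k - t <= m else 0.0


def w_goal(t: int, k: int) -> float:
    """Discounted horizon 2^(t-k): later steps count for less (k >= t assumed)."""
    return 2.0 ** (t - k) if k >= t else 0.0


def u_rl(rewards: Sequence[float]) -> float:
    """Reinforcement-learning utility: an unfiltered copy of the latest reward."""
    return rewards[-1]


def u_goal(observations: Sequence[int],
           goal: Callable[[Sequence[int]], bool]) -> float:
    """Goal utility: 1 only at the first step where the goal predicate holds.

    The first-time guard is an assumption made here so that the utilities
    along any single history sum to at most 1.
    """
    reached_now = goal(observations)
    reached_before = any(goal(observations[:i]) for i in range(len(observations)))
    return 1.0 if reached_now and not reached_before else 0.0


def horizon_mass(weight: Callable[[int], float], t: int, steps: int = 80) -> float:
    """Truncated version of sum_{k >= t} w(t, k), used to check summability."""
    return sum(weight(k) for k in range(t, t + steps))
```

Both horizon functions are summable: with `t = 5`, the constant horizon with `m = 3` has total mass 4, and the discounted horizon has total mass approaching 2 whatever the value of `t`.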
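The prediction-seeking agent, $A^\rho_p$, maximizes its utility by predicting its observations, so that $u(h) = 1$ if the agent correctly predicts its next observation $o_t$, and is 0 otherwise. The prediction $\hat{o}_t$ is like Solomonoff induction [7,8] and is defined by

$$\hat{o}_{t_h} := \max_{o \in \mathcal{O}} \rho(o \mid h).$$

The horizon function is the same as for $A^\rho_{rl}$.

The knowledge-seeking agent, $A^\rho_k$, maximizes its knowledge of its environment, which is identical to minimizing $\rho(h)$, which decreases whenever universes in $\mathcal{Q}_h$ fail to match the observation and are removed from $\mathcal{Q}_h$. (Since the true environment is never removed, its relative probability always increases.) Actions can be chosen intentionally to produce the highest number of inconsistent observations, removing programs from $\mathcal{Q}_h$, just as we, too, run experiments to discover whether our universe is one way or another. $A^\rho_k$ has the following utility and horizon functions: $u(h) = -\rho(h)$, and $w(t, k) = 1$ if $k - t = m$ and is 0 otherwise. To maximize utility, $A^\rho_k$ reduces $\rho$ as much as possible, which means discarding as many (non-consistent) programs as possible, discovering with the highest possible probability which universe is the true one. Discarding the most probable programs results in the greatest reduction in $\rho$.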
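The knowledge-seeking utility $u(h) = -\rho(h)$ rewards actions that eliminate as much prior weight as possible. The sketch below illustrates this with a one-step lookahead; the three universes, their weights, and their observation tables are hypothetical, and the single-step horizon is a simplification chosen for brevity.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical universes: name -> (prior weight, observation for each action).
# Action 0 yields the same observation everywhere; action 1 tells them apart.
UNIVERSES: Dict[str, Tuple[float, Dict[int, int]]] = {
    "q1": (0.5, {0: 0, 1: 0}),
    "q2": (0.3, {0: 0, 1: 1}),
    "q3": (0.2, {0: 0, 1: 2}),
}


def rho(alive: Sequence[str]) -> float:
    """Total prior weight of the universes still consistent with the history."""
    return sum(UNIVERSES[q][0] for q in alive)


def split(alive: Sequence[str], action: int) -> Dict[int, List[str]]:
    """Group the surviving universes by the observation they would produce."""
    groups: Dict[int, List[str]] = {}
    for q in alive:
        groups.setdefault(UNIVERSES[q][1][action], []).append(q)
    return groups


def expected_utility(alive: Sequence[str], action: int) -> float:
    """One-step expectation of u = -rho over the possible observations.

    Each observation o occurs with probability rho(hao) / rho(h) and leaves
    weight rho(hao) behind, so informative actions score higher.
    """
    total = rho(alive)
    return sum((rho(g) / total) * -rho(g) for g in split(alive, action).values())


def best_experiment(alive: Sequence[str], actions: Sequence[int] = (0, 1)) -> int:
    """Pick the action with the highest expected utility (first one on ties)."""
    return max(actions, key=lambda a: expected_utility(alive, a))
```

With all three universes alive, the uninformative action 0 leaves $\rho$ unchanged (expected utility -1), while action 1 gives -(0.25 + 0.09 + 0.04) = -0.38, so `best_experiment` returns 1. Under these assumptions the agent prefers the experiment that separates the hypotheses.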
The optimal agent $A^\mu$. The four agents above are learning agents because they continually update their estimate of their universe from experience, but $A^\mu$ does no learning: it knows the true (computable) program of the universe defined by $\mu$ and can always calculate the optimal action, thus setting an upper bound against which the other four agents can be compared. $A^\mu$ is defined by replacing $\rho$ in Equations (1-3) with the specific $\mu$. It is important to note that $\rho$ is not replaced by $\mu$ in the utility functions; e.g., $A^\mu_p$ must use $\rho$ for its predictions of future inputs (to allow meaningful comparison with $A^\rho_p$). Thus, if $A^\mu_p$ and $A^\rho_p$ take the same actions, they generate the same prediction errors. (Note that there is only one environment in which the predictor makes no mistakes.)

A learning agent $A^\rho_x$ is said to be asymptotically optimal [1] if its performance tends towards that of $A^\mu$, meaning that for each history $h$, the learning agent's choice of action is compared with that of $A^\mu$ given the same history, and its performance is measured in terms of the fraction of mistakes it makes. Thus, past mistakes have only finite consequence. In other words, the agent is asymptotically optimal if the number of mistakes it makes tends towards zero.

3 Self-Modifiable agents $A_{sm}$

The agents from the last section are incomputable and therefore fictional, but they are useful for setting theoretical upper bounds on any actual agent that might eventually appear. Therefore, we divide the agent into two parts to separate the fictional from the real. The fictional part of the agent, $E$, is in essence a kind of oracle, one that can perform any infinite computation instantly. The real-w...
