VYTAUTAS MAGNUS UNIVERSITY
INSTITUTE OF MATHEMATICS AND INFORMATICS
OF VILNIUS UNIVERSITY












Jurgita Kapočiūtė-Dzikienė


INDUCTION OF ENVIRONMENT AND GOAL MODELS
BY AN ADAPTIVE AGENT IN DETERMINISTIC
ENVIRONMENT





Summary of Doctoral Dissertation
Physical Sciences, Informatics (09P)

















Kaunas, 2010
The dissertation was prepared at Vytautas Magnus University in 2005–2010.

Scientific supervisor:
doc. dr. Gailius Raškinis (Vytautas Magnus University, Physical Sciences,
Informatics, 09P).

The dissertation is to be defended in the Joint Council on Informatics of
Vytautas Magnus University and the Institute of Mathematics and Informatics of
Vilnius University.

Chairman:
prof. habil. dr. Vytautas Kaminskas (Vytautas Magnus University, Physical
Sciences, Informatics, 09P).

Members:
prof. habil. dr. Gintautas Dzemyda (Institute of Mathematics and Informatics of
Vilnius University, Physical Sciences, Informatics, 09P),
prof. habil. dr. Rimantas Šeinauskas (Kaunas University of Technology,
Technological Sciences, Informatics Engineering, 07T),
prof. habil. dr. Edmundas Kazimieras Zavadskas (Vilnius Gediminas Technical
University, Technological Sciences, Informatics Engineering, 07T),
prof. habil. dr. Antanas Žilinskas (Institute of Mathematics and Informatics of
Vilnius University, Physical Sciences, Informatics, 09P).

Opponents:
prof. habil. dr. Henrikas Pranevičius (Kaunas University of Technology, Physical
Sciences, Informatics, 09P),
prof. habil. dr. Laimutis Telksnys (Institute of Mathematics and Informatics of
Vilnius University, Physical Sciences, Informatics, 09P).

The defense of the dissertation is scheduled for 10 a.m. on January 24, 2011, in the
Vincas Čepinskis reading room at the Faculty of Informatics of Vytautas Magnus University.
Address: Vileikos str. 8, LT-44404, Kaunas, Lithuania.
The summary of the dissertation was mailed on December , 2010.

The dissertation is available at the National M. Mažvydas Library, the Library of
Vytautas Magnus University, and the Library of the Institute of Mathematics and
Informatics of Vilnius University.

INTRODUCTION
Creators of artificial intelligence models have abandoned the long-dominant view
that artificial intelligence can be built by aggregating effective but narrowly
specialized modules (such as speech recognition, automatic translation, theorem
proving, machine learning, etc.) into one intelligent system. These specialized
modules are often designed around different fundamentals and operational theories.
Instead, the paradigm of the adaptive agent was established, holding that intelligence
(or adaptation) arises from the agent's permanent interaction with its environment.
Through this interaction the adaptive agent performs actions that change the
environment; changes of the environment in turn influence its action selection.
Usually the adaptive agent seeks the optimal action, one that brings it closer to a
beneficial situation in the environment. The relevance of an action depends on the
agent's learned knowledge. Therefore the agent can only be expected to achieve the
effectiveness that is attainable in practice by processing and integrating the
information available from the environment. Under this approach it is not important
whether the agent's effectiveness matches human effectiveness on narrowly specified
tasks. The main emphasis is on the range of solvable tasks and on integrated learning
and decision-making mechanisms.
The interaction between an agent and its environment can be simulated in many
different ways. Compared with other similar research, this dissertation differs in that
it presents three novel solutions to three learning problems.
The objective and tasks
The objective of this work is to create an adaptive agent able to interact with a
grid-world environment (an analogue of space) functioning as a deterministic
1st order Markov decision process. The agent must be capable of solving a broader set
of adaptation tasks than other known architectures of adaptive agents.
The tasks necessary for achieving the objective are:
1. To extend the capabilities of existing adaptive agents by solving the problem of
knowledge transferability from one environment to others (environments that differ
in the arrangement of objects, but not in their laws). The problem has to be solved
using a proposed algorithm (implemented in the adaptive agent) able to learn an
approximation of the laws of the environment (defined as the environment model).
2. To extend the capabilities of existing adaptive agents by solving the problem of
goal percept generalization. The problem has to be solved using a proposed algorithm
(implemented in the adaptive agent) able to determine the features that goal situations
share (defined as the goal model).
3. To extend the capabilities of existing adaptive agents by solving the perceptual
aliasing problem (when percepts change under a deterministic n-th order Markov
decision process), allowing the agent to operate in a partially observable environment.
The problem has to be solved by complementing the environment model induction
algorithm with the capability of transforming the n-th order Markov decision process
into a 1st order one.
Statement for the defense
If laws independent of state changes exist in an observable or partially
observable environment functioning as a deterministic Markov decision process, then it
is possible to create logical induction methods (and implement them in an adaptive
agent) capable of discovering those laws (by learning environment and goal models)
solely through interaction between the agent and the environment, with no initial
knowledge; the learned knowledge can be transferred to and applied in unknown
environments, which indicates better adaptation of the agent.
Object of investigation
The main objects of investigation in this dissertation are: the interaction between
an adaptive agent and its environment; the possibility of integrating the learning,
recognition, prediction and planning processes; the learning process of the adaptive
agent; and the representation of knowledge.
In this dissertation the following capabilities of the adaptive agent are investigated:
• The capability of operating in an environment functioning as a deterministic
1st order Markov decision process, and of learning an environment model that
corresponds to a deterministic finite state automaton and approximates the
deterministic Markov decision process.
• The capability of learning a goal model that enables the agent to search purposively
for goal situations in unknown environments (when goal situations are represented
via reinforcement).
• The capability of orienting itself in a partially observable, deterministic environment
by solving the perceptual aliasing problem, i.e. by transforming the deterministic
n-th order Markov decision process into a 1st order one in such a way that an
environment model for the partially observable environment can be created.
Methods of investigation
The following machine learning methods were used during the investigation:
decision tree induction, constructive induction, and search methods; the
implementation was programmed in C++.
Scientific novelty
The dissertation belongs to a field of research at the juncture of experimental
cognitive modeling and artificial intelligence. The work is scientifically novel
because, within the paradigm of an agent and its environment, the following had never
been done before:
• Implementation of an algorithm capable of generalizing goal situations based on
reinforcement values. Using this algorithm the agent is capable of recognizing unseen
goal situations and planning its actions into the future in both known and unknown
environments.
• Implementation of an algorithm that benefits from the assumption of a clear
separation between the laws of the environment and percepts. Such separation enables
the agent to learn an environment model even in cases when the perceptual frame
hypothesis does not hold. Furthermore, the learned environment model corresponds to
a deterministic 1st order finite state automaton whose states are represented not as
single variables but as collections of variables.
• Implementation of an algorithm that transforms a deterministic n-th order Markov
decision process into an analogous 1st order process by using an original
hidden-variable insertion procedure that extends the description of the process state.
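The author's hidden-variable insertion procedure itself is not reproduced in this
summary. As an illustration of the underlying idea only (restoring the Markov property
by extending the description of the process state), the following C++ sketch applies
the textbook history-augmentation construction to a hypothetical 2nd order
deterministic process: consecutive percepts are paired into an extended state, with
respect to which the process becomes 1st order.

    #include <cstdio>
    #include <utility>

    // A hypothetical 2nd order deterministic law: the next percept depends on
    // the current AND the previous percept, so single percepts are aliased.
    int law2ndOrder(int oPrev, int oCur, int a) {
        return (oPrev + oCur + a) % 4;      // an arbitrary deterministic rule
    }

    // Extended state: the pair (o^{t-1}, o^t). With respect to this extended
    // state the process is 1st order: the next extended state depends only on
    // the current extended state and the action.
    using ExtState = std::pair<int, int>;

    ExtState law1stOrder(const ExtState& s, int a) {
        int oNext = law2ndOrder(s.first, s.second, a);
        return { s.second, oNext };         // slide the history window by one
    }

    int main() {
        ExtState s{0, 1};
        for (int t = 0; t < 5; ++t) {
            s = law1stOrder(s, 1);
            std::printf("t=%d: extended state = (%d, %d)\n", t, s.first, s.second);
        }
        return 0;
    }

This brute-force construction only shows why extending the state description can
remove perceptual aliasing; it is not the dissertation's procedure.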
Practical significance
The presented learning algorithms can be used as a component of artificial
intelligence in the game industry and in logical puzzles. The possibility of
automatically learning a "black box" (an environment) matching some finite state
automaton, when the learning process is based only on descriptions of input/output
signals, could be beneficial in various automation processes.
Publications of the research
The major results of the dissertation are presented in three publications, all
referred in the ISI Web of Science; a fourth paper is under review:
1. Kapočiūtė-Dzikienė J., Raškinis G. (2010). Constructive Induction of Goal
Concepts from Agent Percepts and Reinforcement Feedback. Information
Technology and Control, KTU, Kaunas, vol. 39 (3), pp. 211–219.
2. Kapočiūtė-Dzikienė J., Raškinis G. (2008). Incremental Hierarchical Classifier. In
Proceedings of the 14th International Conference on Information and Software
Technologies, KTU, pp. 53–61.
3. Kapočiūtė-Dzikienė J., Raškinis A. J. (2008). Incremental Hierarchical
Classificator: Cognitive Approach to Decision Tree Building. Information
Technology and Control, KTU, Kaunas, vol. 37 (1), pp. 43–51.
4. Kapočiūtė-Dzikienė J., Raškinis G. Learning a Transferable World Model by
Reinforcement Agent in Deterministic Observable Grid-World Environments.
Informatica, MII, Vilnius (submitted in 2009, under review).
Structure and size of the work
The dissertation consists of an introduction, a review of related works, a review of
the problems of adaptive agents in deterministic environments, the solutions for the
presented problems, conclusions, and references.
The introduction covers the relevance of the problem, the objective and tasks, the
scientific novelty, and the practical impact. The first chapter discusses papers on
relevant topics. The second chapter analyzes the problems that adaptive agents face in
deterministic environments. The third chapter describes the solutions for the first
problem, knowledge transferability. The fourth chapter describes the solutions for the
second problem, goal percept generalization. The fifth chapter describes the solutions
for the third problem, perceptual aliasing. At the end of the dissertation the
conclusions are provided and the list of references is presented.
CONTENT OF THE DISSERTATION
1 Review of related works
An adaptive agent is an integrated system (Fig. 1.1) that has the potential to change
and is able to adapt itself to changes in the environment. Adaptive agents are
considered imitations of biological systems and are often called animats. The
adaptation (or intelligence) of such an agent depends on parameters (called the agent's
knowledge) that are modified during the learning process.
Figure 1.1. An illustration of the interaction between an adaptive agent and its
environment. [Figure omitted: the environment produces the next state
s^t = F(s^{t-1}, a^{t-1}); the agent receives percepts o^t = H(s^t) and reinforcement
r^t = R(s^t) as inputs, and its parameters (knowledge) are adjusted by learning.]

Adaptive agent can operate in the particular environment implementation (analogue
of place). Different arrangements of objects in the grid-world environment, changing
under the same laws are defined as environment implementations. Agent is not able to
change environment implementation; its actions can only change the state of
environment.
The agent's performance is iterative and divided into discrete time moments t.
During a single iteration the agent performs an action a^t, which transfers the
environment from state s^t to s^{t+1}. This fact is denoted as:

s^t × a^t → s^{t+1}.   (1.1)

In this dissertation we concentrate on agents interacting with environments that
function as a deterministic Markov decision process and change under the law of the
environment F:

s^{t+1} = F(s^t, a^t).   (1.2)

The agent cannot obtain s^t directly through its inputs: it obtains percepts o^t as
the result of a perceptual filter H (which preserves information about the spatial
relationships of the components of s^t) applied to s^t:

o^t = H(s^t).   (1.3)

Percepts are related to the environment states; therefore changes in the states also
influence the agent's percepts. A single iteration of percept change is defined as the
agent's elementary experience and is denoted as follows:

o^t × a^t → o^{t+1}.   (1.4)

Some adaptive agents can also obtain a reinforcement r^t that depends on a reward
function R, which is not directly accessible by the agent (usually different
reinforcement values are used to distinguish the agent's goal situations from other
situations):

r^t = R(s^t).   (1.5)
The agent's performance is divided into life cycles, i.e. discrete sequences of
iterations necessary to find a path from the initial percepts o^0 to the goal percepts
o^T: o^0 × a^0 → o^1 × a^1 → … → o^T. In this dissertation we concentrate on agents
that start their performance with no initial knowledge about the environment
implementation, their optimal actions, or the consequences of actions. Such agents are
usually evaluated according to two criteria: the length of the path found during one
life cycle (the shorter, the better) and the number of life cycles necessary for their
knowledge to become stable (the smaller, the better).
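To make the formalism of (1.1)–(1.5) concrete, here is a minimal C++ sketch of one
life cycle in a toy deterministic 1-D grid world. The functions F, H and R below are
hypothetical stand-ins chosen for illustration, not taken from the dissertation.

    #include <cstdio>

    // Toy deterministic 1-D grid world with ten cells: the state is the
    // agent's cell index. F, H and R are illustrative stand-ins.
    int F(int s, int a) {                   // law of the environment (1.2)
        int next = s + a;                   // action a is -1 (left) or +1 (right)
        if (next < 0) next = 0;             // walls at both ends
        if (next > 9) next = 9;
        return next;
    }
    int H(int s) { return s; }              // perceptual filter (1.3); identity, so observable
    int R(int s) { return s == 9 ? 1 : 0; } // reward function (1.5); cell 9 is the goal

    int main() {
        int s = 0;                          // initial state s^0
        for (int t = 0; ; ++t) {            // one life cycle
            int o = H(s);                   // percepts o^t
            int r = R(s);                   // reinforcement r^t
            if (r == 1) { std::printf("goal percepts reached at t = %d\n", t); break; }
            int a = +1;                     // a fixed policy, for illustration only
            std::printf("t = %d: o = %d, a = %+d\n", t, o, a);
            s = F(s, a);                    // transition s^t x a^t -> s^{t+1} (1.1)
        }
        return 0;
    }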
1.1 Characteristics of the environment
Agents operate in environments that have particular features:
• Observable vs. partially observable (Fig. 1.2). An environment is defined as
observable if H is the identity function (in the common case). An environment is
defined as partially observable if H reveals only part of the information about s^t
(a toy sketch contrasting the two filters follows this list).
Figure 1.2. An illustration of the differences between observable and partially
observable environments. [Figure omitted: (a) the environment is observable, with
o^t = H1(s^t); (b) the environment is partially observable, with filter H2. Percepts
(observations) in the environment are marked by a thicker frame.]

• Static vs. dynamic. An environment is defined as static if its changes are caused
only by the actions performed by the agent; otherwise the environment is dynamic.
• Discrete vs. continuous. An environment is defined as discrete if it consists of a
countable number of states and actions; otherwise it is continuous.
In this dissertation we concentrate on agents operating in observable or partially
observable, static, and discrete environments.
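The difference between the two filters of Fig. 1.2 can be stated in a few lines of
C++. The sketch below is a toy illustration under assumed types, not code from the
dissertation: H1 is the identity (observable case), while H2 reveals only the
components within an assumed viewing radius (partially observable case).

    #include <cstdio>
    #include <optional>
    #include <vector>

    using State = std::vector<char>;                   // components of s^t
    using Percepts = std::vector<std::optional<char>>; // nullopt marks a hidden component

    // Observable environment: H1 is the identity function.
    Percepts H1(const State& s) { return Percepts(s.begin(), s.end()); }

    // Partially observable environment: H2 reveals only the components within
    // a viewing radius around the agent's position (both assumed here).
    Percepts H2(const State& s, std::size_t pos, std::size_t radius) {
        Percepts o(s.size());                          // everything hidden by default
        for (std::size_t i = 0; i < s.size(); ++i)
            if (i + radius >= pos && i <= pos + radius) o[i] = s[i];
        return o;
    }

    int main() {
        State s = {'S', '.', 'D', '.', 'S'};
        for (const auto& c : H1(s)) std::printf("%c", *c);              // "S.D.S": all visible
        std::printf("\n");
        for (const auto& c : H2(s, 2, 1)) std::printf("%c", c ? *c : '?'); // "?.D.?": radius 1
        std::printf("\n");
        return 0;
    }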
1.2 Architectural assumptions of adaptive agents
Agents can be built on the following architectural assumptions:
• The assumption of environment determinism. Some agents follow the determinism
assumption, which states that if the same action is performed in the same state at
different time moments, then the consequences (the next state) must coincide at both
moments. If an agent follows the determinism assumption and operates in a partially
observable environment, it can face the perceptual aliasing problem, when two
identical-looking percept and action pairs lead to completely different states.
• Assumptions about the percept interpretation language. Percepts can be interpreted
as one indivisible unit (a monolithic description) or as a collection of m components:
o = ⟨o_i⟩ = ⟨o_1, o_2, ..., o_m⟩. In the second case, percepts can be interpreted
either as a collection of propositions or as a collection of predicates. Predicates
are more expressive, but their extraction is more difficult.
• Assumptions about percept arrangement. The perceptual frame hypothesis states that
an action performed by the agent influences only a small part of the components o_i
in o. If the agent follows the frame hypothesis assumption, then during the learning
process it must cope with only a small portion of changing percepts; otherwise it has
to cope with a larger portion (see the sketch at the end of this subsection).
• Assumptions about noise. Stochastic methods are usually selected if the assumption
that noise exists is adopted; otherwise various induction methods can be selected.
In this dissertation we concentrate on the possibilities of induction methods;
therefore the agent being created follows these assumptions: the determinism
assumption, the assumption that percepts are interpreted as a collection of
propositions, the assumption that the frame hypothesis is not required (the more
general case), and the assumption that noise is not possible.
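For illustration, the two percept interpretation languages and the frame hypothesis
can be contrasted with a short C++ sketch; the 3x3 field of view and the cell symbols
below are assumptions made here, not the dissertation's encoding.

    #include <array>
    #include <cstdio>
    #include <string>

    // Monolithic interpretation: the whole percept is one indivisible unit.
    using MonolithicPercept = std::string;            // e.g. "S.SD.S..S"

    // Propositional interpretation: the percept is a collection of m components,
    // o = <o_1, ..., o_m>, each one a proposition about a single grid cell.
    constexpr std::size_t m = 9;                      // an assumed 3x3 field of view
    using PropositionalPercept = std::array<char, m>; // 'S' stone, 'D' door, '.' empty

    // Under the frame hypothesis an action changes only a few components o_i,
    // so a learner must explain only the components that differ between steps.
    int changedComponents(const PropositionalPercept& before,
                          const PropositionalPercept& after) {
        int changed = 0;
        for (std::size_t i = 0; i < m; ++i)
            if (before[i] != after[i]) ++changed;
        return changed;
    }

    int main() {
        PropositionalPercept before = {'S','.','S','D','.','S','.','.','S'};
        PropositionalPercept after  = {'S','.','S','D','S','.','.','.','S'};
        std::printf("%d components changed\n", changedComponents(before, after)); // 2
        return 0;
    }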
1.3 The architectures of adaptive agents
For an adaptive agent to become capable of finding the best output a^t for an input
o^t, it must have an action selection policy π, which maps the set of all possible
percepts O to the set of all possible actions A.
Depending on π and the learned knowledge, agents can be grouped as follows
(Fig. 1.3):
• Reflexive agents. Such agents learn a function f: O → A directly mapping percepts
to optimal actions. The mapping function f is exactly π. Learning methods used in
reflexive agents include Gaussian Mixture Models [3, 19], Bayesian Networks [1, 6],
the Genetic Neuro-Fuzzy approach [14], k-Nearest Neighbors [2, 4, 16], etc.
• Reinforcement agents. The calculated values of an optimal value function Q (quality
function) or U (utility function) depend on reinforcement values. The action selection
policy π selects actions so that the total cumulative reinforcement is maximized
(a minimal tabular sketch follows this list). Methods used in reinforcement agents
include Q-learning [20, 21], Dyna-Q+ [18], ADP [11, 12], Relational Reinforcement
Learning [5], the Relational Learning Neural Network [8], etc.
• Anticipatory agents. An environment model F̂ is found during the learning process.
This model maps the percepts and action at time t to the percepts at time t+1. The
agent uses the learned F̂ to search for a virtual path from the initial percepts to the
goal (i.e. it performs planning); a model-and-planning sketch follows Fig. 1.3. The
action selection policy π successively selects the actions found in the virtual path.
Methods used in anticipatory agents include LIVE [17], CALM [13], SRS/E [22], etc.
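As a minimal sketch of the reinforcement-agent idea above, the following C++ fragment
implements the classic tabular Q-learning update (the method cited as [20, 21]);
percepts and actions are toy integers, and the learning rate and discount factor are
assumed values.

    #include <algorithm>
    #include <cstdio>
    #include <map>
    #include <utility>

    // Tabular value function Q(o, a); missing entries default to 0.0.
    std::map<std::pair<int, int>, double> Q;

    double maxQ(int o) {                    // best value achievable from o
        double best = 0.0;
        for (int a = -1; a <= 1; a += 2)    // two toy actions: left, right
            best = std::max(best, Q[{o, a}]);
        return best;
    }

    // One Q-learning update from the elementary experience (o, a, r, oNext).
    void update(int o, int a, int r, int oNext) {
        const double alpha = 0.1, gamma = 0.9;  // assumed learning rate and discount
        Q[{o, a}] += alpha * (r + gamma * maxQ(oNext) - Q[{o, a}]);
    }

    int main() {
        update(1, +1, 1, 2);                // a rewarded transition near the goal
        update(0, +1, 0, 1);                // value propagates one step back
        std::printf("Q(0,+1) = %.4f\n", Q[{0, +1}]);  // 0.1 * (0.9 * 0.1) = 0.009
        return 0;
    }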
Figure 1.3. Architectural schemas of adaptive agents. [Figure omitted: (a) a reflexive
agent learns π = f: O → A from pairs (o^t, a^t); (b) a reinforcement agent learns
U(o^t) or Q(o^t, a^t) from triples (o^t, a^t, r^t) and derives π: O → A via action
selection; (c) an anticipatory agent learns o^{t+1} = F̂(o^t, a^t) from triples
(o^t, a^t, o^{t+1}) and plans toward the goal percepts o^T.]
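The anticipatory architecture is the one closest to the agent developed in this
dissertation. The following C++ sketch is a generic illustration only, not the author's
induction algorithm: elementary experiences o^t × a^t → o^{t+1} are recorded in a
lookup-table model F̂, and planning is a breadth-first search for a virtual path over
the learned transitions.

    #include <cstdio>
    #include <map>
    #include <queue>
    #include <set>
    #include <utility>
    #include <vector>

    using Percept = int;                    // toy monolithic percepts
    using Action  = int;

    // Learned environment model F^: a table from (o^t, a^t) to o^{t+1},
    // filled in from the agent's elementary experiences (1.4).
    using Model = std::map<std::pair<Percept, Action>, Percept>;

    // Planning: breadth-first search over the learned model for a virtual path
    // from `start` to `goal`. Returns the action sequence, empty if no path is
    // known yet. The policy then replays the returned actions one by one.
    std::vector<Action> plan(const Model& fhat, Percept start, Percept goal) {
        std::map<Percept, std::pair<Percept, Action>> parent;  // back-pointers
        std::set<Percept> seen{start};
        std::queue<Percept> frontier;
        frontier.push(start);
        while (!frontier.empty()) {
            Percept o = frontier.front(); frontier.pop();
            if (o == goal) {                // reconstruct the action sequence
                std::vector<Action> path;
                for (Percept p = goal; p != start; p = parent[p].first)
                    path.push_back(parent[p].second);
                return {path.rbegin(), path.rend()};
            }
            for (const auto& [key, next] : fhat)   // expand every known transition
                if (key.first == o && !seen.count(next)) {
                    seen.insert(next);
                    parent[next] = {o, key.second};
                    frontier.push(next);
                }
        }
        return {};
    }

    int main() {
        Model fhat;                         // experiences gathered so far
        fhat[{0, +1}] = 1;
        fhat[{1, +1}] = 2;
        std::vector<Action> path = plan(fhat, 0, 2);
        std::printf("virtual path of %zu actions found\n", path.size());  // 2
        return 0;
    }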
