Active Semantic Mapping for a Domestic Service Robot
Miguel Oliveira da Silva
Thesis to obtain the Master of Science Degree in
Electrical and Computer Engineering
Supervisors: Prof. Rodrigo Martins de Matos Ventura, Prof. Pedro Manuel Urbano de Almeida Lima
Examination Committee
Chairperson: Prof. João Fernando Cardoso Silva Sequeira
Supervisor: Prof. Rodrigo Martins de Matos Ventura
Members of the Committee: Prof. Francisco António Chaves Saraiva de Melo
October 2018
I declare that this document is an original work of my own authorship and that it fulfills all the
requirements of the Code of Conduct and Good Practices of the Universidade de Lisboa.
Acknowledgments
I would like to thank my parents for their friendship, encouragement and caring over all these years,
for always being there for me and for teaching me that success is a result of hard work.
I would like to thank Cristiana for the support and for being always available to help me in this work
with her writing and communication skills.
I would also like to acknowledge my dissertation supervisors, Prof. Pedro Lima and Prof. Rodrigo Ventura, and Tiago Veiga, for their insight, support and sharing of knowledge that made this Thesis possible.
I would also like to thank the SocRob team, for sharing a lot of knowledge about several subjects
related to robotics.
Last but not least, to all the friends and colleagues I have met in the last 5 years at the University, who helped me arrive at this point.
Thank you all.
Abstract
Domestic service robots need to deal with complex and dynamic environments. In order to
interact with them, robots must keep an up-to-date representation of relevant information. In this work,
an architecture to solve that problem is presented, considering the uncertainty associated with that rep-
resentation, given the stochastic and not fully observable characteristics of a domestic environment.
The architecture needs to generate a semantic map of the domestic environment, maintain it up to date, and make use of it. A solution is presented to the agent's problem of driving its behavior so as to keep an updated probabilistic representation of the world state and using that information to carry out tasks. The architecture is composed of two parts: a Knowledge Representation Engine that
keeps a global belief about the world state and is responsible for generating and controlling the second
part, the Decision Maker that is responsible for the agent’s behavior. The Knowledge Representation
Engine uses ProbLog to have a probabilistic world representation and to take advantage of the inference
process to generate the Decision Maker model and the world state. The Decision Maker is composed of a set of POMDPs, where each one is responsible for keeping a partial representation of the global
knowledge of the world and for making decisions, if required, in order to reduce the uncertainty about
the world state and eventually reach a specific goal. The decision making problem is divided into several
problems to reduce the state space of each POMDP and to bypass the problem of finding the optimal
policy on a large POMDP, given the poor scalability of existing solution algorithms.
Figure 2.1: Example of a 2D map of an indoor environment. Figure adapted from [1]
A semantic map has a qualitative description of the robot’s environment, allowing the robot to get an
augmented representation of what surrounds it, complementing the geometrical knowledge with seman-
tic knowledge from different sources. The word semantic is defined in the Oxford English Dictionary (en.oxforddictionaries.com, accessed 14 Oct 2018) as "relating to meaning in language or logic", and for that reason a semantic map is expected to represent the meaning given by a qualitative description of what is mapped.
A semantic map contains assignments of mapped features to classes that represent their meaning and characteristics. Furthermore, it is also possible to create relations between these classes and use the knowledge about them to give reasoning skills to robots. In that way, the agent has a qualitative description of the environment, closer to the human conception of the world, and a knowledge base for reasoning. For example, a semantic map based on a metric map augmented with labels of objects and rooms that are of interest to the robot, and of which it should be aware, allows it to accomplish tasks in a domestic environment like "Robot, bring me a cup". A semantic map can give the robot a knowledge base for reasoning about the characteristics of a cup, where it can be, and how to get there, provided a reasoning engine is available to the robot. Basically, a semantic map can augment navigation and task-planning skills and helps human-robot interaction, since it provides a conception of the world close to the human one.
All of this semantic information that a robot can get from the environment grants it the ability to represent and reason about its surroundings in a semantic map, and it can also be organized and divided into
different modalities. The inference method used to reason about what is observed is crucial and there
is a lot of information that can be used from different sources, such as the geometry, general appearance and shape of places, recognized objects, topology of the environment and human input. In many
methods, only single sources are used to infer some semantic information about a place, while some
other methods exploit multiple sources. Another important feature of semantic mapping techniques is temporal coherence. It is useful because the information acquired at a single point in time is not enough to provide evidence for a reliable categorization of places or objects. The degree of confidence is related to the time the information was acquired, because most environments are stochastic and dynamic.
Most of the time, a semantic map also incorporates a topological map, which can retain both geometrical information about the arrangement of places and conceptual information about them.
2.2 Logic programming
Logic programming is a method of expressing knowledge in a formal language and trying to solve
some problems running inference processes on that knowledge. The basic objects in logic programs
are variables, constants, functors and predicates [7, p. 40-41]. Variables are denoted by strings that start with uppercase letters, while the others are also denoted by strings but start with lowercase letters.
A term is a variable, a constant, or a functor of arity n applied to n terms, i.e. f(t1, . . . , tn). An atom, or atomic sentence, is formed from a predicate of arity n applied to n terms, i.e. p(t1, . . . , tn).
A ground term is a term with no variables. A literal is an atom (positive literal) or a negated atom
(negative literal). A clause is a disjunction of literals and a unit clause is a clause with a single literal. A
definite clause is a disjunction of literals of which exactly one is positive and has the form h :− a1, . . . , an, where h and the ai are atoms. A rule, also called a normal clause, has the form h :− l1, . . . , ln and is a universally quantified expression meaning l1 ∧ . . . ∧ ln ⇒ h, where l1, . . . , ln (the body of the rule) are literals and h (the head of the rule) is an atom. A rule that does not have a body is a fact and represents an unconditional truth. An important concept in logic programming is also the Herbrand
base [8, p. 351] that is the set of ground atoms, which can be constructed using the predicates, functors
and constants in the theory. Herbrand interpretations are subsets of the Herbrand base.
A Herbrand interpretation can be considered a model of a clause (which corresponds to a world that satisfies that clause) if, for every substitution θ applied to the clause, whenever the resulting body is in the interpretation, the resulting head is in the interpretation as well. A substitution θ is a finite set of pairs V1/t1, V2/t2, . . . , Vn/tn, where the Vi are distinct variables and the ti are the terms that will replace the respective variables. A Herbrand interpretation is a model of a logic program if it is a model of all clauses in the theory.
For negation-free Logic Programs (LPs), or definite clause programs, the model-theoretic semantics is given by the smallest Herbrand model, also known as the Least Herbrand Model (LHM), which is guaranteed to exist and to be unique. The main goal of an LP system is to check whether a given atom is true in the LHM.
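For illustration (an example not in the original text), consider the program with the facts bird(tweety) and mammal(rex) and the rule canfly(X) :− bird(X). Its LHM is {bird(tweety), mammal(rex), canfly(tweety)}, a strict subset of the Herbrand base, which also contains atoms such as bird(rex) and canfly(rex); a query canfly(tweety) therefore succeeds, while canfly(rex) fails.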
2.3 Probabilistic Logic Programming and ProbLog
The introduction of probabilities in logic programming allows it to encode the inherent uncertainty that is present in real-life situations. Probabilistic logic programs are logic programs in which some of the facts are annotated with probabilities, supporting probabilistic inference and learning. In this Chapter, a probabilistic logic programming language called ProbLog is presented.
A ProbLog program has a set of ground probabilistic facts and a set of rules and non-probabilistic facts [9]. The latter are the same as in logic programming. A ground probabilistic fact is a fact f with no variables and probability p, and can be written as p::f. It is also possible to write an intensional probabilistic fact, which is syntactic sugar for compactly specifying an entire set of ground probabilistic facts. In Example 2.1, the statement 0.5::male(V) :- vertebrate(V) is an intensional probabilistic fact and is a compact way to write the ground probabilistic facts 0.5::male(v1) :- vertebrate(v1) and 0.5::male(v2) :- vertebrate(v2). ProbLog also allows the use of annotated disjunctions [10], like the sentence 0.15::bird(V); 0.09::mammal(V); 0.5::fish(V) :- vertebrate(V), with the structure p1::h1; ...; pn::hn :- body.
Example 2.1.
vertebrate(v1).
vertebrate(v2).
0.5 :: male(V) : −vertebrate(V).
The different atoms in a ProbLog program can be divided into probabilistic atoms and derived atoms. The first are the atoms that appear in a ground probabilistic fact, and the second are the atoms that appear in the head of some rule in the logic program. It is also important to note that all the variables in the head of a rule should also appear in a positive literal in the body of the rule.
ProbLog allows inference in probabilistic logic systems, and different inference tasks can be considered [7] [9]:
• SUCC(q), where q is a ground query. The task is to compute the success probability of the query q.
• MARG(Q|e), where Q is the set of ground atoms of interest (query atoms). The task is to compute the marginal probability distribution of each query atom q ∈ Q given the evidence e.
• MAP(Q|e), where the task is to find the most likely truth-assignment q to the atoms in Q given the evidence e.
• MPE(U|e), where U is the set of all atoms in the Herbrand base that do not occur in e (unobserved atoms). The task is to find the most likely world of all the unobserved atoms given the evidence.
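As a concrete illustration of the SUCC task, the following is a minimal sketch using the publicly available ProbLog 2 Python package (pip install problog); the model is the one from Example 2.1, and the surrounding script is an assumption of this sketch, not part of the thesis:

# Minimal sketch of the SUCC task with the ProbLog 2 Python package.
from problog.program import PrologString
from problog import get_evaluatable

model = PrologString("""
vertebrate(v1).
vertebrate(v2).
0.5::male(V) :- vertebrate(V).

query(male(v1)).
""")

# Compile the program and compute the success probability of each query.
result = get_evaluatable().create_from(model).evaluate()
for query, probability in result.items():
    print(query, probability)  # male(v1) 0.5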
2.4 Decision Making Under Uncertainty
An agent, like a robot or a human, acts based on observations taken from the environment, and there is a cycle between the agent and the world. Over time, the agent receives an observation of the world, chooses an action through some decision-making process, and applies that action to the world, which the action affects, closing the cycle. For an intelligent agent, the decision-making process of choosing an action has the goal of achieving some objectives over time, given the set of observations and knowledge about the environment.
Most agents, and clearly robots, need to deal with uncertainty during this cycle, because the environment is uncertain, or in other words, not fully observable, non-deterministic, or both [8, p. 42-45]. An environment is considered fully observable if the agent's sensors provide it with information about the full state at each point in time, or in other words, if the agent has access to all the aspects of the environment relevant to deciding which action to take. Hence, an environment can be partially observable if part of the state is missing from the observation data, for example due to occlusion of a small object by a bigger one, or if observations are noisy and inaccurate due to the sensors used. In a nondeterministic environment, the next state is not fully determined by the current state and by the agent's action. For that reason, the actions are characterized by their possible outcomes, and if these possible outcomes are characterized and quantified using probabilities, the environment is considered stochastic.
Thus, an agent dealing with an uncertain environment may never know for certain what state it is in, due to uncertainty in perception, or where it will end up after doing a given action, due to uncertainty in action effects. The first is related to not fully observable environments and the second to stochastic environments.
When an agent is dealing with uncertainty, it should be able to compare the plausibility of different statements, even if it is not sure about them. For example, even if a robot is not sure about an object's color, it should be able to represent that the belief in one color is stronger than, weaker than, or roughly equal to the belief in another color. For that reason, the agent may represent its degree of belief in a statement using some tool, the main one being probability theory.
For an agent like a domestic service robot, the problem of which action to take at each time is a sequential decision problem, in which the agent is not interested in a single decision. The agent is interested in taking a series of decisions to solve a problem, as in search and planning problems, for example. An algorithm for making sequential decisions in stochastic environments, under the assumption that the model is known and the environment is fully observable, is presented in Chapter 2.4.1, with the Markov Decision Process (MDP). Furthermore, Chapter 2.4.2 presents a process where both types of uncertainty, in action effects and in perception, are considered. This model is called the Partially Observable Markov Decision Process (POMDP).
2.4.1 Markov Decision Processes (MDPs)
Assuming that the agent has perfect perception of the environment, which means that the state of the world is fully observable at any point in time, a Markov Decision Process (MDP) models the remaining uncertainty about the effects of the agent's actions. In an MDP, at each time t, the agent
chooses the action at based on observing state st and receives a reward rt for taking that action in that
state.
An MDP can be described as a tuple 〈S,A, T,R〉 [2], where S is a set of states of the world, A is a
set of actions, T is the probabilistic state transition function and R is the reward function. These sets
can be considered finite or infinite, but in this Chapter only the finite case is discussed.
The state transition function T(s, a, s′) represents the probability of ending up in state s′, given that the agent starts in state s and executes action a. It can also be written as Pr(s′|a, s). R(s, a) represents the expected reward received for taking action a in state s. The reward function depends only on the current state and action. In this model, it is also assumed that the transition depends only on the previous state and on the action taken, not on any earlier states or actions in the history. An MDP can be represented as in Figure 2.2. The assumption associated with this property is the Markov assumption: the state at time t only depends on the state and action taken at time t − 1.
Figure 2.2: MDP diagram.
It is also important to define what the solution to this problem looks like, because it is already known that no fixed action sequence will solve it: the uncertainty about action effects can make the agent end up in a state different from the goal. For that reason, it is important to define a policy, denoted by π, where π(s) is the action specified by the policy π for state s. A policy is a description of the behavior of an agent, specifying what action the agent should take in any state that it might reach. Two kinds of policies are considered: stationary and nonstationary [2]. A stationary policy considers that the choice of an action depends only on the state, independently of the time step. A nonstationary policy takes the time into account, and it is represented with a subscript t.
What is desired from a sequential decision process is that the agent acts to get the best performance. For an MDP, this performance is represented by an additive utility function of the long-term rewards. The quality of a policy is therefore measured by the corresponding expected utility, which for MDPs is often referred to as the value function Vπ. An optimal policy is a policy that yields the highest value function, and it is denoted by π∗. In order to find the optimal policy, it is important to define whether there is a finite horizon or an infinite horizon for decision making.
When dealing with a finite horizon, the agent should act to maximize a finite horizon of K steps,
maximizing the value function given by the expected sum of rewards over the next K steps, presented in Equation (2.1).

$$E\left[\sum_{t=0}^{K-1} r_t\right] \qquad (2.1)$$
In an infinite horizon the number of steps is unbounded and the sum of the rewards can become infinite. One way to define the value function in the infinite-horizon case is to use a discounted model, with a discount factor γ between 0 and 1 and the value function of Equation (2.2).

$$E\left[\sum_{t=0}^{\infty} \gamma^t r_t\right] \qquad (2.2)$$
With the value function of Equation (2.2), rewards at the current time are worth more than rewards in the future because they have more value to the agent. If γ is close to 0, rewards in the future are considered insignificant, and the closer to 1 the discount factor is, the more effect future rewards have on current decision making. The discount factor ensures that the value function is finite if the rewards are also finite.
In the finite horizon model, the optimal policy is typically nonstationary, because with a finite horizon the optimal action for a given state depends on time. For example, if the agent has a goal and a short horizon, it must head directly for it; with a longer horizon, the agent may act so as to avoid more uncertainty in the actions' results. The way the agent chooses its actions when it has a long journey ahead is generally different from the way it decides which action to take in the last step. One can use dynamic programming to evaluate the utility of a policy π for t steps. Thus, in the finite horizon model, the value function Vπ,t(s) is the expected utility of starting in state s and executing the policy π for t steps, given by the recursive Equation (2.3).

$$V_{\pi,t}(s) = R(s, \pi_t(s)) + \gamma \sum_{s' \in S} T(s, \pi_t(s), s')\, V_{\pi,t-1}(s') \qquad (2.3)$$

The step t = 1 is the last step and the respective value Vπ,1(s) = R(s, π1(s)) is just the expected reward for taking the action of policy π1. For a generic step t, the discounted value of the remaining t − 1 steps should also be added, considering all the possible states s′ reached under the policy π and the respective likelihood T(s, πt(s), s′).
In the infinite horizon model, the agent always has the same (infinite) time remaining. For that reason, it makes no sense to change the action strategy depending on time, which is why the optimal policy is stationary and the value function Vπ(s) is given by the unique simultaneous solution of the set of Equations (2.4).

$$V_{\pi}(s) = R(s, \pi(s)) + \gamma \sum_{s' \in S} T(s, \pi(s), s')\, V_{\pi}(s') \quad \text{for all } s \in S \qquad (2.4)$$
This process of computing the value function from executing a policy is known as policy evaluation.
Several methods can be used to find optimal policies for MDPs, but in this Chapter the value iteration method is presented, because it will also serve as the basis for finding policies for POMDPs in Chapter 2.4.2.
To get the optimal policy π∗ for the finite horizon, only a complete sequence of optimal value functions is needed, and π∗ is defined by Equation (2.5).

$$\pi_t^*(s) = \arg\max_a \left[ R(s,a) + \gamma \sum_{s' \in S} T(s,a,s')\, V_{\pi_{t-1}^*,\, t-1}(s') \right] \qquad (2.5)$$

Considering that Vπ∗t−1,t−1 is the optimal value function for step t − 1, derived from policy π∗t−1 and value function Vπ∗t−2,t−2, this is a recursive function down to the last step t = 1, where the optimal policy π∗1 is given by Equation (2.6).

$$\pi_1^*(s) = \arg\max_a R(s,a) \qquad (2.6)$$
In infinite horizon discounted models, computing the optimal stationary policy is independent of the starting state. It can be proven [8, p. 654-656] that the value of an optimal policy satisfies the Bellman Equation (2.7), and that the value iteration Algorithm 2.1 eventually converges to the unique solution of the Bellman equations for all s ∈ S.

$$V_{\pi^*}(s) = \max_a \left[ R(s,a) + \gamma \sum_{s' \in S} T(s,a,s')\, V_{\pi^*}(s') \right] \qquad (2.7)$$

The initialization of V0(s) may differ from 0 if there is a guess of the optimal value function. In that case, the guessed value is used in an attempt to speed up convergence. But independently of the initialization, if |V0(s)| < ∞, value iteration can be proven to converge [8, p. 654-656]. The algorithm terminates when the maximum difference between two successive value functions is less than some ε, which can be chosen so as to bound the policy loss. The policy loss is the most the agent can lose by executing the near-optimal policy extracted from V′π∗ instead of the optimal policy.
Algorithm 2.1: Value iteration
t ← 0
V0(s) ← 0 for all s ∈ S
repeat
    t ← t + 1
    forall s ∈ S do
        Vt(s) ← max_a [ R(s, a) + γ Σ_{s′∈S} T(s, a, s′) Vt−1(s′) ]
until |Vt(s) − Vt−1(s)| < ε for all s ∈ S
V′π∗(s) ← Vt(s)
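For concreteness, a minimal Python sketch of Algorithm 2.1 follows (the array shapes and names are assumptions of this sketch, not taken from the thesis):

import numpy as np

def value_iteration(T, R, gamma=0.95, eps=1e-6):
    """Sketch of Algorithm 2.1 for a finite MDP.

    T[s, a, s2] = Pr(s2 | s, a) and R[s, a] is the expected reward;
    returns the near-optimal value function and its greedy policy.
    """
    V = np.zeros(T.shape[0])                  # V_0(s) = 0 for all s
    while True:
        # Q[s, a] = R(s, a) + gamma * sum_s' T(s, a, s') V_{t-1}(s')
        Q = R + gamma * (T @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < eps:   # |V_t - V_{t-1}| < eps
            return V_new, Q.argmax(axis=1)    # greedy policy, Equation (2.8)
        V = V_new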
Once V′π∗ is obtained, the near-optimal policy can easily be extracted using Equation (2.8).

$$\pi'(s) = \arg\max_a \left[ R(s,a) + \gamma \sum_{s' \in S} T(s,a,s')\, V'_{\pi^*}(s') \right] \qquad (2.8)$$

2.4.2 Partially Observable Markov Decision Processes (POMDPs)
In the previous Chapter 2.4.1 the environment was considered fully observable, and with that assumption the agent always knows which state it is in. But, most of the time, because of sensor limitations or noise, the state might not be perfectly observable, and for that reason Partially Observable Markov Decision Processes (POMDPs) take the state uncertainty into account. In POMDPs there is also a probabilistic model of the chance of making a particular observation given the current state.
A POMDP can be described as a tuple 〈S,A, T,R,Ω, O〉 [2], where S,A, T and R are the same as
described for MDPs in Chapter 2.4.1, Ω is a finite set of observations that the agent can experience and
O is the observation function that gives a probability distribution over possible observations, given an
action and resulting state. So, O(s′, a, o) can be defined as the probability of making observation o, given that the agent took action a and ended up in state s′, that is, Pr(o|s′, a). A POMDP can be represented in a diagram, as presented in Figure 2.3.
When considering optimal decision making in POMDP, a direct mapping of observations to actions
is not sufficient. The agent should have a memory about its past history, so it can choose actions
successfully in partially observable environments. For that reason, the agent can keep an internal belief
state b, that summarizes all information about its past. The belief b that will be used is a probability
distribution over all the states of the set S because it is a sufficient statistic of the history, which means
that extra data about its past actions or observations would not supply any further information about the
current state [11, p. 392]. The agent is responsible for updating this belief based on the previous belief
state, the last action and the current observation. Considering b(s) as the probability that the belief state b assigns to state s, it is possible to compute b_a^o(s′), the new degree of belief in state s′ after doing action a and getting observation o, by Equation (2.9).

Figure 2.3: POMDP diagram

$$b_a^o(s') = \Pr(s'|o,a,b) = \frac{O(s',a,o) \sum_{s \in S} T(s,a,s')\, b(s)}{\Pr(o|a,b)} \qquad (2.9)$$
The complete derivation of Equation (2.9) can be found in [2, p. 107]. After computing Equation (2.9) for all s′ ∈ S, it is possible to obtain the new belief state b_a^o. This process can be labeled as the belief update function UB(b, a, o), which has the new belief state b_a^o as its output.
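A minimal sketch of the belief update UB(b, a, o) of Equation (2.9), under the same array conventions as the value iteration sketch above (names are assumptions of this sketch):

import numpy as np

def update_belief(b, a, o, T, O):
    """Sketch of UB(b, a, o), Equation (2.9).

    b is the belief vector over S, T[s, a, s2] = Pr(s2 | s, a) and
    O[s2, a, o] = Pr(o | s2, a); returns the new belief b_a^o.
    """
    predicted = b @ T[:, a, :]                # sum_s T(s, a, s') b(s)
    unnormalized = O[:, a, o] * predicted     # weight by O(s', a, o)
    return unnormalized / unnormalized.sum()  # normalizer is Pr(o | a, b)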
A POMDP can be considered as an MDP in which the states are belief states, called a belief-state MDP. The set of belief states of this kind of MDP can be denoted B, and it comprises the state space. The set of actions A remains the same, and the state transition function τ(b, a, b′) is now defined by Equation (2.10).
$$\begin{aligned}
\tau(b,a,b') &= \Pr(b'|a,b) \\
&= \sum_{o \in \Omega} \Pr(b'|a,b,o)\,\Pr(o|a,b) \\
&= \sum_{o \in \Omega} \Pr(b'|a,b,o) \sum_{s' \in S} \Pr(o|s',a,b)\,\Pr(s'|a,b) \\
&= \sum_{o \in \Omega} \Pr(b'|a,b,o) \sum_{s' \in S} \Pr(o|s',a,b) \sum_{s \in S} \Pr(s'|a,b,s)\,\Pr(s|a,b) \\
&= \sum_{o \in \Omega} \Pr(b'|a,b,o) \sum_{s' \in S} O(s',a,o) \sum_{s \in S} T(s,a,s')\,b(s)
\end{aligned} \qquad (2.10)$$

where Pr(b′|a, b, o) is equal to 1 if b′ = b_a^o and 0 otherwise. The reward function for belief states can
be written as ρ(b, a) and is given by Equation (2.11).

$$\rho(b,a) = \sum_{s \in S} b(s)\, R(s,a) \qquad (2.11)$$
The belief-state MDP has a continuous belief space, since it is the space of all distributions over the finite state space, and for that reason solving a belief-state MDP is challenging. But if it is possible to get the optimal policy π∗(b) for it, it can be shown that this policy is also optimal for the original POMDP. The problem is that the method for solving MDPs presented in Chapter 2.4.1 is not directly applicable to this belief-state MDP, given the continuity of the belief space. A possible solution to the problem is presented in Chapter 2.4.2.B.
2.4.2.A Value Function
The quality of a policy π(b) in a belief-state MDP is measured by the value function Vπ(b), similarly to what is done for MDPs. The main goal is to maximize the expected rewards for each belief, following the optimal policy π∗, which is defined by the optimal value function V∗. The optimal value function satisfies the Bellman equation V∗ = HV∗:

$$\begin{aligned}
V^*(b) &= \max_a \left[ \rho(b,a) + \gamma \sum_{b' \in B} \tau(b,a,b')\, V^*(b') \right] \\
&= \max_a \left[ \rho(b,a) + \gamma \sum_{o \in \Omega} \Pr(o|a,b)\, V^*(b_a^o) \right].
\end{aligned} \qquad (2.12)$$
It has been proved that the value function V(b) presents a particular structure, given the geometric characteristics of its form. The value function of a finite-horizon POMDP is Piecewise Linear and Convex (PWLC), and it can be represented by a set of vectors over the belief space:

$$V_t = \{\alpha_t^i\}, \quad i = 1, \dots, |V_t|, \qquad (2.13)$$

where α_t^i is a vector with dimension equal to the number of states. It represents a hyperplane and defines the value function over a bounded region of the belief space. Each α-vector is associated with an action. Then, Vt(b) can be defined through the inner product presented in Equation (2.14).

$$V_t(b) = \max_{\alpha_t^i \in V_t} \alpha_t^i \cdot b \qquad (2.14)$$

Given these characteristics of the value function Vt, the belief space can be divided into regions. The regions are defined by the upper surface of the α-vectors, because the maximizing vector dominates the set of vectors for that particular region, given the goal of maximizing the value function.
Figure 2.4 is an example of a value function for a two-state problem, represented as a set of α-
vectors.
Figure 2.4: Example of a Value Function with two states. Figure adapted from [2]
2.4.2.B Point-Based Value Iteration (PBVI)
The limited scalability of value iteration algorithms for solving POMDPs is due to the dimension of the problem, and it has led to several approximations to POMDP solving. In a problem with n states, POMDP planners must reason about belief states in a continuous space of dimension n − 1. For that reason, discretizing the belief space and selecting a small set of representative belief points B is the approach proposed by the Point-Based Value Iteration (PBVI) algorithm, presented in [12].
Point-based methods, using the approximations presented, can derive Equation (2.15) to compute the value function at each particular belief b:
$$V_{t+1}(b) = \max_a \left[ b \cdot \alpha_a^0 + \gamma\, b \cdot \sum_{o \in \Omega} \operatorname*{arg\,max}_{g_{a,o}^i} b \cdot g_{a,o}^i \right] = \max_{g_a^b}\, b \cdot g_a^b, \qquad (2.15)$$

where

$$g_{a,o}^i(s) = \sum_{s'} \Pr(o|s',a)\, \Pr(s'|s,a)\, \alpha_t^i(s') \qquad (2.16)$$

and

$$g_a^b = \alpha_a^0 + \gamma \sum_{o \in \Omega} \operatorname*{arg\,max}_{g_{a,o}^i} b \cdot g_{a,o}^i. \qquad (2.17)$$

The backup operator, which selects the maximizing vector for the belief b, becomes:

$$\text{backup}(b) = \operatorname*{arg\,max}_{g_a^b,\, a \in A} b \cdot g_a^b \qquad (2.18)$$
The value function, at each step, is the union of all the vectors resulting from previous backup of all
the belief points in the set B.
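A minimal sketch of the point-based backup operator of Equations (2.15)-(2.18), reusing the T and O array conventions above (a sketch under those assumptions, not the implementation used in the thesis):

import numpy as np

def backup(b, alphas, T, O, R, gamma):
    """Point-based backup of Equations (2.15)-(2.18) for one belief b.

    alphas is the current set of alpha-vectors; returns the maximizing
    vector g_a^b of Equation (2.18).
    """
    n_actions, n_obs = T.shape[1], O.shape[2]
    best_value, best_vector = -np.inf, None
    for a in range(n_actions):
        g_ab = R[:, a].copy()                 # alpha_a^0(s) = R(s, a)
        for o in range(n_obs):
            # g_{a,o}^i(s) = sum_s' Pr(o|s',a) Pr(s'|s,a) alpha_i(s')
            g_ao = [T[:, a, :] @ (O[:, a, o] * alpha) for alpha in alphas]
            # Equation (2.17): keep the g_{a,o}^i maximizing b . g
            g_ab += gamma * max(g_ao, key=lambda g: b @ g)
        if b @ g_ab > best_value:             # Equation (2.18)
            best_value, best_vector = b @ g_ab, g_ab
    return best_vector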
Several PBVI algorithms exist in the literature. In [13] one, called Perseus, is presented. This randomized PBVI algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved, with the important characteristic that a single backup may improve the value of more than just the respective belief point. Perseus backs up only a (randomly selected) subset of points in the belief set that is sufficient for improving the value of every belief point in B.
2.4.3 POMDP with Information Rewards (POMDP-IR)
In an active perception task, the goal is typically to increase the available information by reducing the uncertainty about the state. This means that the agent, considering the effects of its actions, must decide what actions to take to efficiently reduce the uncertainty about the state variables. A typical POMDP is a possible decision-theoretic model for active perception. However, reducing the uncertainty about the state is usually not expressed as the goal, but is a consequence of achieving it. For example, if the goal is to pick up an object, the agent may take actions that reduce its uncertainty about the object's location. However, rewarding an agent for reaching a certain level of belief is not easy to do in these typical POMDP models. For that purpose, a Partially Observable Markov Decision Process with Information Rewards (POMDP-IR) is presented in [14]. In POMDP-IR, a reward for information gain is given while keeping the characteristic of a classical POMDP of having PWLC value functions.
POMDP-IR introduces the addition of a new set of “information-reward” actions (prediction actions) to the problem definition, considering that the state space can be factored as presented in Equation (2.19).

$$S = X_1 \times X_2 \times \dots \times X_k \times \dots \times X_K \qquad (2.19)$$
At each time step, the agent simultaneously chooses a normal action an and a prediction action ak for each state variable Xk for which the agent wants low uncertainty. Prediction actions have an action space Ak = {commit, null}, and they have no effect on states or observations, but may affect rewards. The reward function in the POMDP-IR is equal to the sum of the original reward function R of the POMDP and a reward Rk for each Xk, given by Equation (2.20).

$$R_k(b, a_k) = \begin{cases} P(X_k = x_k) \cdot r_i^{correct} - \left(1 - P(X_k = x_k)\right) \cdot r_i^{incorrect} & \text{if } a_k = commit \\ 0 & \text{if } a_k = null \end{cases} \qquad (2.20)$$
At every time step, the agent can choose to execute only a normal action, choosing ak = null, or in addition also receive a reward for its belief over Xk, choosing ak = commit. Thus, the expected reward of choosing commit is only higher than that of the null action when Rk(b, ak) > 0, which implies

$$P(X_k = x_k) > \frac{r_i^{incorrect}}{r_i^{correct} + r_i^{incorrect}}. \qquad (2.21)$$
If it is desired to reward the agent for having a degree of belief P(Xk = xk) of at least β, then it is important to set the relation between r_correct and r_incorrect such that the expected reward of choosing commit is higher than that of the null action when P(Xk = xk) > β. The precise values of r_correct and r_incorrect depend on the model and the original reward function R.
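A minimal sketch of the information reward Rk of Equation (2.20) and the commit threshold of Equation (2.21) (function and variable names are assumptions of this sketch):

def information_reward(belief_xk, x_k, a_k, r_correct, r_incorrect):
    """Information reward R_k(b, a_k) of Equation (2.20).

    belief_xk maps values of the state variable X_k to probabilities;
    x_k is the value that the commit action predicts.
    """
    if a_k == "null":
        return 0.0
    p = belief_xk[x_k]                              # P(X_k = x_k)
    return p * r_correct - (1.0 - p) * r_incorrect

def commit_threshold(r_correct, r_incorrect):
    """Equation (2.21): commit only pays off above this belief; e.g.
    r_correct = 10 and r_incorrect = 90 reward beliefs above 0.9."""
    return r_incorrect / (r_correct + r_incorrect)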
3 Related Work
Some work has been developed in this research area to find a way for a robot to get a better conception of the environment that surrounds it. This conception is related not only to the geometrical characteristics of the environment, but also to semantic information. Semantic information is related to cognitive interpretation capacities that humans have and that semantic mapping methods have brought to robots. The work in [1] presents an overview of what has been done in semantic mapping, for different types of environments and different types of applications, as explained in more detail in Chapter 2.1. This work focuses on semantic mapping
for domestic indoor environments. In [15], the authors present a layered model of the world at different
levels of abstraction: metric line map, navigation graph, topological map and conceptual map. The lower levels are derived from sensor input and are used for robot localization and navigation, and the higher levels provide a human-like categorization of the world. The metric map is obtained by SLAM. The navigation graph establishes a model of free space and its connectivity, adding some semantic information at this level by storing the objects detected and using label history, assigning the navigation nodes to one of the classes: room, corridor, or doorway. The topological map divides the nodes of the navigation
graph into groups that are separated by a doorway node. At the last level, there is a conceptual map, and conceptual knowledge is encoded in Web Ontology Language - Description Logic (OWL-DL). With description-logic based reasoning software operating on the knowledge representation, it is possible to infer new knowledge about the world that is neither perceived nor given verbally. However, this work does not provide the decision-making capabilities needed when performing tasks, given the knowledge acquired about the environment.
In [16] another approach to semantic map representation, and to using that information in the performance of navigation tasks, is introduced. This approach uses two parallel hierarchical representations of the space: a spatial representation and a semantic one. The first one is related to the
sensor-based representations of the environment and the second one has the symbolic representation
of the space. The link between both uses the concept of anchoring. In each of the representations, the hierarchy is related to the level of detail of the information, and the level of abstraction increases at higher levels. Making use of the anchoring connection between both representations, two kinds of inference
were developed. Based on recognized objects, the inference system is able to classify, semantically, the room where the object was recognized, and based on semantic information about rooms, the inference process deduces the probable location of an object not previously seen. The authors validate their ap-
proach, testing the learned model by executing navigation commands. Some of the authors of [16], in the article [6], using the semantic map representation explained above, present a task planning process using a Temporal-Logic Progressive Planner (PTLplan) that is able to deal with partial observability and uncertainty; however, the knowledge representation system is Loom, which only supports declarative knowledge, not allowing probabilistic annotations of facts or probabilistic inference.
In [17], the authors propose a formalization and a standardization of the representation of semantic maps, and they make a proposal for evaluating and benchmarking semantic mapping methods. A "formalization of a minimal general structure of the representation that should be implemented in a semantic map" is proposed, where the representation is defined by a global reference system, a set of geometrical elements obtained as raw sensor data, and a set of predicates that provide an abstraction of the geometrical elements. Based on the idea that a ground truth for semantic maps exists, building a dataset to be shared by the scientific community is proposed, which allows a fair comparison between different semantic mapping methods.
The approach in [18] presents a 3D semantic mapping technique that uses a point cloud, consisting of multiple 3D scans obtained by 6D SLAM, to do scene interpretation and labeling of the basic elements in the scene, such as walls, ground, doors and others. Afterwards, the data is transformed into 2D images that are used to detect and localize objects, and the object localization is then transformed back into the 3D data. For interpreting planes in the scene, a constraint solver in Prolog was used, but no further inference methods were used.
In [3] a spatial knowledge representation is presented, called by the authors Cognitive lAyered Representation of Spatial knowledgE (COARSE). It is based on a layered representation with different levels of abstraction, and it is designed for representing complex, cross-modal, spatial knowledge, considering the uncertainty and dynamics of the space, as presented in Figure 3.1. This representation is the main principle of the work presented in [19], where it is assumed that knowledge should be abstracted to keep the representations compact, allowing the robot to infer additional knowledge about the environment by combining background knowledge with observations. In order to characterize the space at a higher level of abstraction, the system assigns properties to places, such as objects, shape, size and appearance.
Figure 3.1: The layered structure of the spatial representation in [3], showing the different levels of abstraction ofthe spatial knowledge. Figure adapted from [3]
To represent the conceptual map, a probabilistic chain graph model is used, and the structure is adapted at runtime according to the state of the topological map. In order to perform inference, this model is first converted into a factor graph representation, and then an approximate inference engine, Loopy Belief Propagation, is applied to meet time constraints. However, this work only supports inference about unexplored concepts, such as objects or rooms, and it lacks inference about explored concepts. The characterization of explored concepts can change, given the fact that the environment is stochastic. The inference process also allows, for goal-oriented exploration, the use of a distribution of possible extensions to the known world.
In [20] the authors propose a representation of the semantic map, which they refer to as SOM+ (semantic object maps), using symbolic knowledge in description logic and keeping a spatiotemporal representation of object poses. Prolog predicates are also associated with it for the inference process. The SOM+ is an abstract representation of the environment that contains facts about objects and links objects to data structures such as appearance models or other features used by the perception system to recognize the objects. The work was developed with the objective of making the robot able to interact with a small environment, more specifically, a kitchen.
The authors in [21] present a system that allows acquiring new objects in the representation through continuous human-robot interaction. At the beginning, the robot is guided by a user in a recognition tour that allows an initial construction of the semantic map, but the robot is also able to acquire additional
knowledge about the environment after the initial set-up, through multi-modal human-robot interaction. The behavior of the robot when interacting with humans and collecting information to update the semantic map is implemented using Petri Net Plans. Prolog is also used to store information about the topological graph of the environment, and for each object, predicates are created with information about the object's type, localization, position and properties in order to perform inference on it.
In [22], probabilistic conceptual maps and probabilistic planning have also been combined in object search tasks, where the conceptual map is represented as the highest layer of the hierarchical knowledge representation in [3]. In order to do planning, a switching continual planner was presented, which switches between Decision-Theoretic Planning Domain Definition Language (DTPDDL) and classical modes of planning at different levels of abstraction.
The work most similar to what is proposed in this master thesis is [23], where the probabilistic representation of the semantic map is based on the probabilistic programming language ProbLog. There, probabilistic inference tasks were used to infer a query given an evidence, inferring the probability of an object being in a certain place given a statement that expresses the probability of observing an object in that place and an evidence (observation) confirming it. The work in [23] not only presents a probabilistic knowledge representation, but also a framework for planning under uncertainty, namely a POMDP, computing approximate solutions in order to manage the scalability problems of POMDPs. The decision maker also takes into account phenomena that may affect the perception algorithm, such as errors in vision algorithms and possible occlusions. In that work, a POMDP with Information Rewards (POMDP-IR) [14] is used. This framework intends to reward the agent for reaching a certain level of belief regarding a state feature: if more certain information about the state improves task performance, it is important to increase the available information by reducing the uncertainty regarding the state. In that paper, the work was developed towards active cooperative perception for the fusion of sensory information, with the goal of maximizing the amount and quality of perceptual information available to the system.
In [24] a solution for POMDPs is also presented for the case where an explicit measure of the agent's knowledge about the system, based on beliefs instead of states, is incorporated in the performance criterion. For that reason, rewards are defined based on the acquired knowledge represented by belief states. This framework is called ρPOMDP. If the reward function for beliefs ρ preserves convexity, the convexity of the corresponding belief-based value function is proved. If ρ is PWLC and the initial value function is equal to 0, then the belief-based value function is also PWLC, and it is easy to adapt POMDP algorithms to solve ρPOMDPs.
4 Proposed Method
4.1 Architecture Description
As presented in Chapter 1.3, the proposed approach to the problem is to create a probabilistic knowledge representation of the world that is able to provide enough information for the agent to make decisions, and to keep it updated. For that reason, the architecture that was developed can be divided into two main parts and has the structure presented in Figure 4.1. The first part, designated the Knowledge Representation Engine, receives the world model as input and is responsible for the operation of the whole architecture, as explained in detail in Chapter 4.1.1. This part is also responsible for the full generation of the second part (the Decision Maker). The Decision Maker is composed of a set of POMDPs, where each one is responsible for keeping a partial representation of the global knowledge of the world. When selected, a POMDP takes the role of deciding which actions the agent should take. For semantic mapping in a domestic environment, as proposed, a model that makes sense is a Decision Maker with one POMDP per room: if the world model is a house with N rooms, the Decision Maker will have N POMDPs. The Decision Maker is explained in more detail in Chapter 4.1.2.
Given that the Decision Maker has multiple POMDPs, it is also necessary to choose which one takes the role of driving the behavior of the agent at each moment. The Knowledge Representation Engine is also responsible for that decision, and for that purpose, before deciding, it analyses the value function of each POMDP given the current belief state. How this choice is made is explained in more detail in Chapter 4.1.2.A.
Summing up, the architecture needs to be initialized: the Knowledge Representation Engine generates a global belief about the world state in ProbLog and the different POMDPs, given the world model provided. Then, using that initial global belief, it analyses the value function of each POMDP created, to choose which one should drive the agent's behavior. The chosen POMDP will keep driving the agent's behavior, updating its internal belief given the sequence of action-observation pairs. This internal belief also keeps updating the global belief in the Knowledge Representation Engine, as explained in Chapter 4.1.2.B. When the POMDP starts taking the action to do nothing, it means that the agent has already accomplished the goal, and that POMDP stops being in charge of the agent's behavior. At this point, the final POMDP internal belief is returned to the Knowledge Representation Engine, updating the global world representation. Given the new updates in the global world representation, the Knowledge Representation Engine again decides which POMDP should be chosen, repeating the cycle.
Figure 4.1: Scheme of the architecture operation
4.1.1 Knowledge Representation Engine
The Knowledge Representation Engine, as explained before, is mainly responsible for the operation of the architecture. It holds the global world representation and chooses which POMDP should drive the agent's behavior, based on the current global belief. For that purpose, it starts by receiving an initial world model, which in a semantic mapping context can be considered a list of objects, furniture and rooms with their characteristics, such as position, volume and size. That information is used to create a set of facts in ProbLog. The Knowledge Representation Engine is also responsible for having a
representation of the interactions and relations between the world model components, considering the
uncertainties that are present in real-life models. In the semantic mapping context, it is necessary to
define the relationships between different objects, between objects and furniture, between furniture and rooms, etc. This can be done by defining a set of rules and probabilistic facts in ProbLog that specify the behavior guidelines, and then taking advantage of the inference process of ProbLog. That information will be useful for making inferences about the world state and for generating the POMDPs. The global world representation kept in the Knowledge Representation Engine can be called the global belief b,
representing the probability distribution over the set of possible world states S.
As referred before, the Knowledge Representation Engine is initially responsible for the full generation of the different POMDPs, dividing the global world representation into subworlds based on a division criterion defined a priori by the model. Each tuple 〈Sn, An, Tn, Rn, Ωn, On〉 that defines POMDP n, as explained in Chapter 2.4.2, is completely defined by the Knowledge Representation Engine. The goal of this division is to simplify the global world representation into a set of smaller worlds, in which it is easier to make decisions. For that reason, the dimension of Sn for each POMDP is smaller than that of S, which considers all possible world states.
The Knowledge Representation Engine considers that, at each time, a state S can be defined as the joint discrete probability distribution of a set X = {X1, X2, . . . , XK} of independent discrete random variables. Each state S in S can be defined as:

$$S = X_1 \times X_2 \times \dots \times X_k \times \dots \times X_K \qquad (4.1)$$
Each variable Xk is denominated a state variable, and it has a set of possible outcomes Dk, which corresponds to its domain. Then, the dimension of the world state space |S| is equal to the number of combinations of the domains Dk of the state variables Xk,

$$|\mathcal{S}| = \prod_{k=1}^{K} |D_k|. \qquad (4.2)$$
Each POMDP n of the Decision Maker has a set of states Sn. Each state S′ in Sn is defined as the joint discrete probability distribution of a set Xn of independent discrete random variables. It is important to notice that

$$|\mathcal{X}_n| \leq |\mathcal{X}| \qquad (4.3)$$

and that each variable X′k ∈ Xn has a match with a variable Xk ∈ X, because they represent the same feature in the world model. However, they are not the same, because they have different domains. The domain of X′k is D′k, and it is adapted to the subworld of the respective POMDP. It should be noted that there is an important relation between the domains of Xk and X′k, given by Equation (4.4), because the subworld of each POMDP is restricted compared with the global world:

$$|D'_k| \leq |D_k|. \qquad (4.4)$$
The conditions presented in Equations (4.3) and (4.4) are the reason why the dimension of Sn is smaller than that of S.
To construct the domain D′k of the variables in a POMDP, the domain values that are not available in that specific subworld still need to be represented, because those values remain valid in the global world representation. For that reason, all those values can be aggregated into a single value that, for example, can be called none. It keeps representing those values without discriminating each one individually, minimizing the number of POMDP states, as desired. Every time the Knowledge Representation Engine needs to calculate the belief bn of a POMDP n, it needs to generate the new probability distribution for each state variable X′k of that POMDP. For that purpose, a function fk,n is considered for each state variable X′k of each POMDP, associating each element of the domain Dk to a single element of the domain D′k,

$$f_{k,n}: D_k \rightarrow D'_k. \qquad (4.5)$$

If Dk = D′k, fk,n is an endofunction; however, the most common case is D′k ⊆ Dk ∪ {none}, where none represents the set of elements Dk \ D′k, and then

$$P(X'_k = x') = \begin{cases} \sum_{x \in D_k \setminus D'_k} P(X_k = x) & \text{if } x' = none \\ P(X_k = x') & \text{otherwise.} \end{cases} \qquad (4.6)$$
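A minimal sketch of the projection of Equation (4.6), with hypothetical placement names (not taken from the thesis):

def project_distribution(p_global, subdomain):
    """Project P(X_k) over D_k onto the POMDP domain D'_k, Equation (4.6).

    Values outside the subworld are aggregated into the single value 'none'.
    """
    p_local = {"none": 0.0}
    for value, prob in p_global.items():
        if value in subdomain:
            p_local[value] = prob
        else:
            p_local["none"] += prob    # sum over D_k \ D'_k
    return p_local

# Hypothetical example: a cup's global belief projected onto a kitchen
# POMDP that only contains the kitchen table and the kitchen cabinet.
p_cup = {"kitchen_table": 0.5, "kitchen_cabinet": 0.2, "sideboard": 0.3}
print(project_distribution(p_cup, {"kitchen_table", "kitchen_cabinet"}))
# -> {'none': 0.3, 'kitchen_table': 0.5, 'kitchen_cabinet': 0.2}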
4.1.2 Decision Maker
The Decision Maker needs to deal with uncertainty in different aspects, such as observations and action results. As explained in Chapter 2.4.2, a POMDP is able to consider that uncertainty and make decisions under those conditions; however, finding the optimal policy of a large POMDP is limited by the poor scalability of existing solution algorithms, and large state spaces are one important source of intractability. This problem can be reduced by dividing the decision-making task among several POMDPs, where each one is responsible for taking limited decisions, given the limited set of possible states, actions and observations of the subworld it represents. However, all of them together with the Knowledge Representation Engine can achieve agent behavior close to the one given by the optimal policy obtained from a single POMDP representing the global world model. For that purpose, at each moment, it is important to have an engine able to select the POMDP that makes more sense to guide the agent's behavior, given the current global belief b and the agent's goal, as explained in Chapter 4.1.2.A.
When initialized, each POMDP n of the Decision Maker also needs to be solved, using a POMDP solver to compute the optimal policy π∗ that maps each possible belief bn, in the belief space B, to an action a in the set An of possible actions that the robot can perform in that subworld. The POMDP solver computes an approximation of the optimal policy π∗, which is the one that maximizes the agent's expected total reward, given by the value function V(b, π).
4.1.2.A POMDP selection
The POMDP selection needs to be done by the Knowledge Representation Engine, taking into account the current global belief b, which provides information about the distribution over the possible world states. It starts by computing each POMDP belief bn, as explained in Chapter 4.1.1. Then, those initial beliefs can be used to calculate the expected total reward of each POMDP n, using the value function Vn(bn, π∗) computed previously. Therefore, comparing the expected total rewards of the POMDPs, it is possible to use different selection criteria to choose the POMDP that should conduct the agent's behavior. These criteria are related to the main goal of the agent.
The different POMDP value functions can be compared, because the model used by the Knowledge Representation Engine to generate them is the same. In other words, the reward values and the observation and transition probabilities are similar, and the differences are only related to the specific characteristics of the subworld that each POMDP represents, characteristics that are reflected in the value function. The fact that the POMDP states are not the same is also a differentiating factor and influences the values of the value functions, as desired, in order to characterize each POMDP.
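A minimal sketch of this selection step (the project method and alpha_vectors attribute are assumptions of the sketch, standing in for the belief projection of Chapter 4.1.1 and the offline solver output):

import numpy as np

def select_pomdp(global_belief, pomdps):
    """Choose the POMDP whose solved value function V_n(b_n) is highest
    under the current global belief (a sketch of Chapter 4.1.2.A).
    """
    def expected_total_reward(pomdp):
        b_n = pomdp.project(global_belief)    # belief b_n, Chapter 4.1.1
        # V_n(b_n) = max_i alpha_i . b_n over the solved alpha-vectors
        return max(np.dot(alpha, b_n) for alpha in pomdp.alpha_vectors)
    return max(pomdps, key=expected_total_reward)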
4.1.2.B Global knowledge representation update
Each time a POMDP is selected, it guides the agent's behavior, updating its internal belief bn with the collected information. At the same time, the updated internal belief of the POMDP also updates the global belief b of the Knowledge Representation Engine.
Both beliefs b and bn are defined as the joint probability distribution of the sets X and Xn of independent discrete random variables, respectively. Given the independence between the state variables in Xn and in X, updating the belief b with the new belief bn is the same as updating the probability distribution of each variable Xk given the probability distribution of the respective X′k, and then calculating the joint probability distribution of the updated variables in the set X. The probability distribution of a state variable Xk remains the same when there is no corresponding X′k in the set Xn. On the other hand, the probability distributions of the remaining variables Xk are updated individually, taking into account the probability distribution P(X′k|Z) that comes from the POMDP n. This probability represents the distribution of the variable X′k given the observations Z that the agent collected. Considering P(Xk) as the prior probability distribution of Xk, the posterior probability P(Xk|Z) is given by Equation (4.7), where fk,n is the function that associates each element in Dk with an element in D′k for the POMDP n. The complete derivation of Equation (4.7) is presented in Appendix A.

$$P(X_k|Z) = \frac{P(X_k)}{\sum_{x \in D_k} P(X'_k|X_k = x)\, P(X_k = x)}\, P(X'_k|Z), \quad \text{with } X'_k = f_{k,n}(X_k) \qquad (4.7)$$
Considering that P(X′k|Xk) is given by Equation (4.8), Equation (4.7) can also be written as in Equation (4.9).

$$P(X'_k|X_k) = \begin{cases} 1 & \text{if } X'_k = f_{k,n}(X_k) \\ 0 & \text{otherwise} \end{cases} \qquad (4.8)$$

$$P(X_k|Z) = \begin{cases} P(X'_k|Z) & \text{if } X_k \in D_k \cap D'_k \\ \dfrac{P(X_k)}{\sum_{x \in D_k \setminus D'_k} P(X_k = x)}\, P(X'_k|Z) & \text{otherwise} \end{cases} \qquad (4.9)$$
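A minimal sketch of the update of Equation (4.9), complementing the projection sketch of Chapter 4.1.1 (dictionary-based, with hypothetical names):

def update_global_variable(p_prior, p_posterior_local, subdomain):
    """Update P(X_k) from the POMDP posterior P(X'_k | Z), Equation (4.9).

    p_prior is P(X_k) over D_k, p_posterior_local is P(X'_k | Z) over
    D'_k (including 'none'), and subdomain holds the shared values.
    """
    # Total prior mass of the values that were aggregated into 'none'
    none_mass = sum(p for v, p in p_prior.items() if v not in subdomain)
    updated = {}
    for value, prior in p_prior.items():
        if value in subdomain:
            # X_k in the intersection of D_k and D'_k: take the posterior
            updated[value] = p_posterior_local[value]
        else:
            # Otherwise: split the 'none' posterior proportionally to the prior
            updated[value] = prior / none_mass * p_posterior_local["none"]
    return updated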
4.2 Semantic Mapping Application
The architecture designed can be applied in different contexts; semantic mapping in a domestic environment is the main motivation of the work presented. In semantic mapping, it is possible to consider a house configuration as the world model, with all its rooms, furniture, objects and their respective characteristics and relations. In a domestic environment, using this architecture, it is possible to have a semantic map representing the probability distribution of the objects being placed over the considered placements (furniture where an object can be placed) or being located over the possible rooms. Knowing that at each moment the position of each object does not depend on the position of the remaining objects and the robot, the set of state variables Xk can be the robot location and the locations of the objects, and the domains Dk can be the possible robot locations and the possible places where the objects can be located, respectively. Using this world model of the house, the Knowledge Representation Engine can generate a POMDP for each room, representing that subworld. In each POMDP, the possible states correspond to the different possible combinations of object locations and robot location inside that room. The state variables X′k are the robot and object locations, with the domain D′k being a smaller set than Dk, because of the restrictions on the placements and robot locations. D′k can also take the value none, to represent the possibility of the robot or object being located in another room, where applicable.
The Knowledge Representation Engine also needs to define the possible actions and observations of each POMDP. In an architecture whose goal is to generate and keep updated a semantic map of the environment, it makes sense to have actions for moving the robot, an action to search for objects, and an extra action for doing nothing. This last action is chosen by the optimal policy when there is no longer much value in exploring that room. Then, the robot's behavior should stop being guided by that POMDP, and the POMDP evaluation and selection process should be repeated using the newly collected information. The observation function of each POMDP can represent the possibility of observing an object or not, based on the object's characteristics. In the semantic mapping context, the main goal of the agent is to reduce the uncertainty about the objects' locations. In order to represent this goal in the POMDP model, the POMDP-IR framework presented in Chapter 2.4.3 is used, rewarding the agent for reaching a state with lower uncertainty about the location of the objects. For that reason, a reward Rk for the state variable Xk of each object is used.
Summarizing, each POMDP model of the Decision Maker for the semantic mapping application is defined by:
1. States and Transitions: The model considers one state variable for the robot and a state variable
for each object that can be located in the room. The robot and the object state variables repre-
sent the location of the robot and objects, respectively. The state transition model for the robot
represents the probabilities of it being located in a certain location, given the previous one and the
action taken.
2. Observations: The model has a binary observation variable for each object variable considered, indicating whether the object is observed by the perception module or not.
3. Domain Actions: There is one action for searching for objects, triggering the perception module, one action for moving the robot to each placement, and an action just to stop the robot, indicating the end of the searching process in that room.
4. Prediction Actions: A prediction action variable for each object is considered, indicating whether
an object is believed to be in some location in the room, not found in this room, or null if there is
not enough information.
5. Rewards: Each reward value for taking an action, given the robot and object locations, depends on the environment and the desired agent behavior. However, given the usage of information rewards, in general it makes sense to give higher rewards to the stop action than to the search-object action, and higher rewards to the search-object action than to the move actions, in order to represent the action effort.
6. Information Rewards: The information rewards considered depend on the desired degree of belief about the location of the objects (a minimal sketch of such a room model is given below).
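A hypothetical, simplified layout of one room model, following the six ingredients above (all names and numbers are illustrative assumptions, not taken from the thesis):

# Hypothetical specification of one room POMDP; names and numbers are
# illustrative only, following the six ingredients listed above.
kitchen_pomdp = {
    "state_variables": {
        "robot": ["kitchen_table", "kitchen_cabinet", "none"],
        "cup": ["kitchen_table", "kitchen_cabinet", "none"],
    },
    "observations": {"cup_observed": [True, False]},
    "domain_actions": ["move_to_kitchen_table", "move_to_kitchen_cabinet",
                       "search_objects", "stop"],
    "prediction_actions": {
        "cup": ["commit_kitchen_table", "commit_kitchen_cabinet",
                "commit_not_in_room", "null"],
    },
    # Action effort: stop > search > move, as suggested in item 5
    "rewards": {"stop": 0.0, "search_objects": -1.0, "move": -2.0},
    # r_incorrect/(r_correct + r_incorrect) = 0.9: per Equation (2.21),
    # commit only pays off above a belief of 0.9
    "information_rewards": {"cup": {"r_correct": 10.0, "r_incorrect": 90.0}},
}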
An example of a POMDP model with two objects is presented in Figure 4.2, where the arrows repre-
sent the dependencies.
Figure 4.2: POMDP model example

The fact that objects can change location over time, given the naturally dynamic character of a domestic environment, can be represented in the model. The Knowledge Representation Engine updates the distribution of the global belief considering that there is an exponential decay in the probability distribution of each object state variable Xk. For that purpose, the probability distribution of each variable Xk is given by Equation (4.10), where Pprevious(Xk = x) is the value of the probability distribution of Xk at the previous time step and Puniform(Xk = x) is the probability value of a uniform distribution.
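Equation (4.10) itself is not reproduced in this excerpt; one plausible form of such a decay, assuming each step mixes the previous distribution with a uniform one at a rate λ, is sketched below:

def decay_toward_uniform(p_previous, lam):
    """One plausible form of the decay described above (an assumption:
    Equation (4.10) is not reproduced in this excerpt). Each step moves
    the distribution a fraction lam toward the uniform distribution.
    """
    uniform = 1.0 / len(p_previous)
    return {v: (1.0 - lam) * p + lam * uniform
            for v, p in p_previous.items()}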
Table 5.13: Mean Hellinger distance for 100 steps for different object configurations in Scenario 1.
KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, DT - Dining Table

experiment   1         2         3         4         5
cocacola     S  0.233  DT 0.630  S  0.229  KC 0.433  BS 0.208
mug gray     DT 0.658  KT 0.478  DT 0.659  KC 0.435  B  0.436
pringles     NS 0.474  CT 0.243  DT 0.651  S  0.237  KT 0.388

experiment   6         7         8         9         10
cocacola     BS 0.215  B  0.446  NS 0.219  S  0.233  KT 0.342
mug gray     B  0.443  CT 0.449  B  0.234  DT 0.644  KC 0.353
pringles     KC 0.389  KC 0.208  BS 0.253  S  0.232  CT 0.284

Table 5.14: Mean Hellinger distance for 100 steps for different object configurations in Scenario 2.
KT - Kitchen Table, KC - Kitchen Cabinet, S - Sideboard, CT - Coffee Table, BS - Bookshelf, DT - Dining Table, B - Bed, NS - Night Stand
Considering that a uniform distribution corresponds to a Hellinger distance of approximately 0.74 for Scenario 1 and 0.8 for Scenario 2, it is possible to verify that, for different configurations in both scenarios, the architecture is able to keep a reduced uncertainty about the environment. However, in Scenario 2 the mean Hellinger distance is larger than in Scenario 1, because its complexity is also greater.
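For reference, a minimal sketch of the Hellinger distance used in these tables; with a point-mass ground truth and a uniform belief, it gives roughly 0.74 for 5 placements and 0.8 for 8 placements, consistent with the values quoted above and with the placements listed in the table legends:

import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (sketch)."""
    p, q = np.asarray(p), np.asarray(q)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# Point-mass ground truth vs. a uniform belief over n placements:
for n in (5, 8):
    truth = np.eye(n)[0]
    print(n, hellinger(truth, np.full(n, 1.0 / n)))  # ~0.74 and ~0.80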
In order to also analyze whether the architecture is able to keep a reduced uncertainty about the location of the objects when the object configuration changes, Table 5.15 and Table 5.16 present the mean Hellinger distance over 200 steps for Scenarios 1 and 2, respectively, under those conditions. For each experiment, an initial random configuration of the object locations and an initial uniform distribution are selected. Approximately halfway, the location of at least two of the objects is modified.
experiment   1              2              3              4
cocacola     CT → KC 0.178  KC → KT 0.215  DT → CT 0.333  S → S  0.199
mug gray     KT → KT 0.198  DT → KC 0.285  DT → KT 0.354  S → CT 0.220
pringles     DT → S  0.303  DT → S  0.344  DT → S  0.331  KT → DT 0.292

experiment   5              6              7
cocacola     KT → DT 0.325  DT → S  0.310  S → DT 0.321
mug gray     S → DT 0.341   DT → CT 0.349  S → KC 0.188
pringles     S → KT 0.186   S → KC 0.242   DT → S  0.284

(For each experiment, the two placements are the object's location before and after the change.)