
Presented at 2nd Conf. on Simulation of Adaptive Behavior, Honolulu, 7-11 Dec. 1992. MIT Press/Bradford Books.

Dynamic Selection of Action Sequences

Feliz Ribeiro1, Jean-Paul Barthès1 and Eugénio Oliveira2

1 Université de Technologie de Compiègne

U.R.A. C.N.R.S. 817 Heudiasyc

B.P. 649 F-60206 Compiègne cedex FRANCE

tel: (+33)44.23.44.23

fax: (+33)44.23.44.77

email: {fribeiro, jpbarthes}@hds.univ-compiegne.fr

2 Faculdade de Engenharia da Universidade do Porto

D.E.E.C.

P-4099 Porto codex PORTUGAL

tel: (+351-2)27.505

fax: (+351-2)23.192.80

email: [email protected]

Abstract

Planning has generally been considered a problem-solving activity in which one searches a state space for an admissible, and often optimal, solution. Action operators map states to states, modifying accordingly some of the facts known to be true (or false); the decision of which operator to apply at any given step is based on an analysis of what the corresponding plan would look like a few steps ahead. Lately, some authors have suggested that planning should, and could, be viewed as the result of the interactions of a group of computational units (agents)1. These units are simple, are strongly interconnected among themselves and with the external world, and each can be selected to decide on the action to perform at a given moment. What the units can be and what kinds of interactions can take place is what this paper is concerned with. We present here some results we obtained by "wiring" the state and the goals into a network whose nodes are the entities in the world being modeled. The nodes exchange limited information, have simple behaviors, and modify their available interaction links according to their current state. These approaches clearly suggest that decentralised, local forms of control can ensure coherent behavior in changing worlds with unpredictable outcomes, while at the same time keeping the units very simple.

1 "This image raises the possibility that we can dispense with the whole idea of planning and just focus on competent behavior" — McDermott (1988).

1. Introduction

This paper is concerned with planning viewed as the result of interactions between a group of simple units (i.e., units with restricted cognitive capabilities), each having only local information about the state of the world and some means of transmitting limited information to the others. Units can be the entities being acted upon (e.g., the blocks in a blocks world) or the actions which can be performed (e.g., instantiated operators). These units are highly reactive to the external environment and to each other, and as a result they increase or decrease some internal measure of how suitable they are to be chosen to perform an action, given the present state of the world. Even if the units are not cognitive in nature (there are neither internal inference mechanisms nor knowledge-representation structures), they are not purely reactive either. A unit reacting to a changing fact in the environment generally triggers a chain of reactions involving other units, so that the result of the initial reaction is "weighted" over the entire group. As such, these approaches avoid the problems which traditional planners generally have to cope with:

• the Sussman anomaly (conjunctive goals2).

• the detection and the resolution of conflicts in non-linear plans.

2 cf. p. 112, Waldinger (1977).


• the exponential growth of the size of the state space to search.

Having no explicit global representation of the world, and since no partial plan is being built, the group of units simply ignores these problems. In the remainder of this paper we show how a network of units linked by their local state and their goals "builds" a plan. First, in the next section we describe the algorithm for computing energy. In section 3 we discuss our approach, and in section 4 we analyse some related work. Finally, in section 5 we present our concluding remarks. Throughout the paper our examples are taken from the blocks world. The planning problems found in a world as simple as this one have been discussed for some time now, even if they were not always solved, or even completely understood. Some examples here and in the related bibliography show difficult cases which are solved in a quite natural way, given the effort classical planners spend on them — e.g., in Sacerdoti (1977) the latter is proportional to the number of conflicts.

2. How do interacting units plan

Our approach relies on the multi-agent planning paradigm, in which a group of agents collectively builds a plan. Each agent has a goal and a state, and the task of the group is to satisfy all individual goals; moreover, we want the global solution to be optimal, i.e., the one that minimizes the total number of steps necessary to reach it. If we were to situate our approach within the planning and action-sequencing literature, we would say that it is an "attempt to avoid planning" (following McDermott, 1992).

2.1. What units really are

We take as our example the blocks world, where cubes must be moved from one place to another. However, we are only interested here in the mechanisms allowing an optimal selection procedure to take place inside a group of agents (whatever they may be), so we do not consider external elements which could act as resources for these agents — for example, a robot manipulator that would only act as a "transporter", or sensory devices that would also act upon request. In a sense, the agents we consider are aware of these possibilities, although this view certainly confines our problem to a smaller area of the overall adaptive action-selection behavior. Each agent manipulates a crude representation of its current state, and that is all that is needed in order to interact with the others — we assume that the means to build that representation are available, as they do not rely on complex processing of the data the agents manipulate.

In our case, the agents are the blocks and the table. A block is satisfied if it is in its final position and the entity supporting it is also satisfied; the table is always satisfied (for the moment we deal with an infinite-capacity table: there is enough space for all blocks).

Blocks have two choices when moving: either their final position is satisfied, in which case they move to it, or else they move to the table. When a block is selected to move, it always chooses one of these two options, and so it must be free (i.e., not supporting another block). At each step, the block selected to move, besides being free, must also be the one that minimizes the total number of moves. In other words, with an infinite-capacity table, a block makes at most two moves: one to the table (if its final position is not satisfied), and another to its final position.
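To make these rules concrete, here is a minimal Python sketch. The class name Unit and the attributes on (current support), goal (the depends-on target), and above (the block currently resting on this one) are our own illustrative choices, not the paper's; with an infinite-capacity table, the table simply never tracks what rests on it.

class Unit:
    def __init__(self, name, is_table=False):
        self.name = name
        self.is_table = is_table
        self.goal = None    # depends-on: the unit this one wants to rest on
        self.on = None      # current support (encodes the blocked-by links)
        self.above = None   # the unit currently resting on this one, if any
        self.Eb = self.Ed = self.Ebd = self.Edb = 0   # energy terms

    def satisfied(self):
        # the table is always satisfied; a block is satisfied when it rests
        # on its goal and that goal is itself satisfied
        if self.is_table:
            return True
        return self.on is self.goal and self.goal.satisfied()

    def free(self):
        # a block may be selected to move only if nothing rests on it
        return self.above is None

    def destination(self, table):
        # move to the goal if the goal is satisfied, to the table otherwise
        return self.goal if self.goal.satisfied() else table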

All the agents are linked by two binary relations: depends-on (Don) and blocked-by (Bby). The first says that if the goal of A is to be on top of B, then A depends-on B. The second just reflects the current state of the world: if A is on top of C, then C is blocked-by A. The inverse relations also exist (i.e., for the above we could say that B is-dependence-of A and that A blocks C); for simplicity we will omit them in what follows, their presence being implicit. The agents and the relations form a network where the nodes (the agents) are going to be assigned numeric values (henceforth called energy3) which come from other nodes via the links representing the above relations between the entities — see figure 1: the current and final states are shown at the top, and the initial network at the bottom.

fig. 1. Initial configuration and network.

After a move, the blocked-by links are updated according to the new state of the world; unless we allow for changing goals, an agent's depends-on link will never be modified. When all agents are satisfied the algorithm stops: the plan executed is then the temporal ordering of all the moves made so far.
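As a small illustration of this bookkeeping (apply_move is our name, building on the Unit sketch above), a move only rewires the blocked-by side of the network:

def apply_move(block, dest):
    # detach from the current support: only Bby links change here,
    # the depends-on links are fixed goals and are never touched
    if block.on is not None and not block.on.is_table:
        block.on.above = None
    block.on = dest
    if not dest.is_table:   # the infinite-capacity table tracks no 'above'
        dest.above = block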

Before going further we will highlight two points:

• the energy of a block (which expresses, in a sense, the agent's will to get satisfaction4) is computed from local information about the node: that is, from its goal and from its current state, as reflected by the blocked-by and depends-on links, which transmit energy. All decisions are thus local, although we will see later that blocks also propagate energy: the energy of a block can thus have been modified by some other block.

3 Given the similarity of the approaches, we chose the term "energy", which plays the same role (albeit computed in a different manner) as the "activation energy" of Maes (1989).

• the plan is the collection of moves (decisions after each propagation phase). If a move does not produce the desired results (say, a block falls while being moved by a gripper), only the links between the entities involved — the block being moved and its initial and final positions — are updated. In the next propagation phase there will be no replanning whatsoever (replanning can be rather complex to deal with, e.g., Drummond and Currie, 1989); the network will just use the new links, and another agent will again be chosen. This is quite important when interleaving planning and execution, where there is often the need to modify previous planning sequences that were based on expected outcomes.

Some hypotheses are made concerning the agents' behavior. An agent is committed to retransmitting whatever information is to be propagated (see section 2.2). This means that, even if the agents have some degree of autonomy, they respect the rules governing the action selection mechanism. As they are all engaged in a sort of cooperative problem-solving process, it is not apparent how they could, for example, hide information so as to benefit from it (e.g., as in Zlotkin and Rosenschein, 1990).

These two points will become clearer later. Next, we describe how energy is computed for each agent.

2.2. Computing energy

Planning will be considered here as the collection of choices made by the group of agents at regular time intervals (the time needed to compute the energy terms described below). These choices are based on the energy level of each block after all exchanges of energy have taken place. It is this mutual exchange — or better, propagation — mechanism which is described below.

At the beginning of each propagation phase all entities have zero energy; the table, besides having no energy, is always satisfied — it will never be moved.

Energy is computed as the sum of four components:

• Eb is the energy exchanged via the blocked-by links.

• Ed is the energy coming from the depends-on links.

• Edb is the energy going through the depends-on links and propagated via the blocked-by links.

• Ebd is the energy going through the blocked-by links and propagated via the depends-on links.

4 As in the Eco-problem-solving approach (Ferber, 1990).

Energy values range from −∞ to +∞. E(X) denotes one of the above terms for block X. Let us first introduce some auxiliary predicates: sat, don and bby. We will say sat(A) if A is satisfied, don(A, B) if A depends-on B, and bby(A, B) if A is blocked-by B. So, if we are computing the energy for block X, the following local increments to its energy are made:

• If bby(X, Y):

∆Eb(X) = 0 if sat(X), −1 otherwise
∆Eb(Y) = 0 if sat(X), +1 otherwise

If ∃Z : bby(Z, X), the function propagate-b(Z, ∆Eb(X)) is invoked, unless X is satisfied. The function propagate-b does the following:

propagate-b(Unit, Value)
    Eb(Unit) = Eb(Unit) + Value
    If ∃X : bby(X, Unit) then
        propagate-b(X, Value)

Intuitively, the energy of a blocked block is decreased, while a blocking block sees its energy increased (unless the one below it is satisfied) and will eventually be forced (i.e., chosen) to move. This energy by itself would not be able to take the goal relations between the nodes into account; it is based only on the local, current state of the blocks. We then use the depends-on links so that the blocks' energy is also influenced by the goal intentions of the agents. It is the introduction of this energy and of the two subsequent propagation phases that optimizes the blocks' moves.
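Under the Unit representation sketched earlier, one possible Python reading of this rule is the following (compute_Eb and propagate_b are our names; the blocked-by pairs are walked through the on/above attributes):

def propagate_b(unit, value):
    # push a blocked block's decrement down the Bby chain, i.e. down the stack
    unit.Eb += value
    if unit.on is not None and not unit.on.is_table:
        propagate_b(unit.on, value)

def compute_Eb(blocks):
    for x in blocks:
        y = x.above                      # bby(x, y): y rests on x
        if y is None or x.satisfied():   # both increments are 0 if sat(x)
            continue
        x.Eb -= 1                        # delta Eb(x) = -1
        y.Eb += 1                        # delta Eb(y) = +1
        # if some z satisfies bby(z, x), i.e. x itself rests on a block z,
        # propagate x's decrement downward
        if x.on is not None and not x.on.is_table:
            propagate_b(x.on, -1)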

• If don(X, Y):

∆Ed(X) = +1 if sat(Y), Ed(Y) − 1 otherwise

If ∃Z : don(Z, X), the function propagate-d(Z, ∆Ed(X)) is invoked, unless X is satisfied. If X is satisfied this propagation is useless, because the values propagated from satisfied agents are always unitary. The function propagate-d does the following:


propagate-d(Unit, Value)
    If Ed(Unit) was already computed then
        Ed(Unit) = Ed(Unit) + Value
    If ∃X : don(X, Unit) then
        propagate-d(X, Value)

This function just takes into account the fact that when we compute the energy for one agent we may be using values that will be updated later. For the example in figure 1 this happens if we compute Ed(B) before Ed(C): we would find the value −1 for Ed(B), when it should be 0 (see the bottom of figure 2 below). Also, an agent only updates its energy if it has already computed it; otherwise, when it computed it later, Value would be counted twice. As agents are independent and can compute their energy in parallel, regardless of the computations going on in the others, these restrictions are compulsory.

[Figure omitted: the three blocks with their Bby links and resulting Eb values (top), and their Don links to the Table with the resulting Ed values (bottom).]

fig. 2. Values of Eb (top) and Ed (bottom) for fig. 1.

After all blocks have had their Eb and Ed values computed (as shown in figure 2), we start a phase of forward propagation of energy to compute Edb and Ebd. This additional propagation plays the role of action at a distance: it lets a block's energy be influenced by blocks that are only indirectly related to it — that is, not linked to it by an explicit depends-on or blocked-by relation. This phase proceeds as described in the next section.

2.3. Propagating mixed energy

We saw above how the current state and the goal intentions of the agents contribute to their internal energy. We now explore the fact that, in order to have a rational, and if possible optimal, selection criterion for the agents, there must be some way of weighting their global interaction. That is, given two agents with the same energy, why should we prefer one to the detriment of the other? We argue that this decision can be made solely by letting the agents exchange information about their local relations to other nodes, this time via both the blocked-by and depends-on links. This information, which is propagated from agent to agent following some fixed rules, can be thought of as a mixed energy. For each agent, it adds the contributions of both those who depend on it and those who are blocked by it.

Propagation and local computation of the remaining two terms go as follows:

• Computing Ebd

Take the graph at the top of figure 2. Consider the set S of the nodes having a predecessor (C and A); B has no predecessor (no incoming Bby link). Let P(s) be the length of the predecessor chain of s ∈ S (including s itself): thus P(A) = 3 and P(C) = 2.

∀s ∈ S do:
    If ∃x : don(s, x) then
        propagate-bd(x, P(s))

where propagate-bd is defined as follows:

propagate-bd(Unit, Value)
    If ∃x : don(Unit, x) and not(sat(x)) then
        propagate-bd(x, Value)
    else
        New-Unit = blocked?(Unit)
        Ebd(New-Unit) = Ebd(New-Unit) + Value

where blocked?(Unit) returns the last unit in a Bby chain starting at Unit, or Unit itself if there is none.

The test on sat(x) tells us that if x is satisfied it makes no sense to continue propagating energy: its depends-on link can only point to another satisfied block, so there is no interest in propagating further. This last remark becomes evident if we recall that an agent is satisfied if it is in its final position and the one supporting it is also satisfied. We stop the propagation just before the first satisfied agent found. Note also that these functions are simplified by the fact that we do not consider here multiple supporting and supported blocks. If we did, we would have to consider multiple incoming and outgoing blocked-by links for each agent.
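One possible Python transcription of this phase, with our helper names chain_length (for P), blocked_top (for blocked?) and compute_Ebd, assuming goal links form acyclic chains as in the example:

def chain_length(s):
    # P(s): length of the Bby predecessor chain of s, s included
    n, u = 1, s.on
    while u is not None and not u.is_table:
        n, u = n + 1, u.on
    return n

def blocked_top(unit):
    # blocked?: the last unit in a Bby chain starting at unit, else unit itself
    while unit.above is not None:
        unit = unit.above
    return unit

def propagate_bd(unit, value):
    if unit.goal is not None and not unit.goal.satisfied():
        propagate_bd(unit.goal, value)
    else:
        # store the value at the end of the Bby chain; anything reaching
        # the table is simply dropped, as the table's energy is never used
        blocked_top(unit).Ebd += value

def compute_Ebd(blocks):
    for s in blocks:
        has_pred = s.on is not None and not s.on.is_table
        if has_pred and s.goal is not None:   # some x satisfies don(s, x)
            propagate_bd(s.goal, chain_length(s))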

• Computing Edb

Take the graph at the bottom of figure 2. Consider the set S of the nodes having a predecessor (C and B). As before, we define P(s), so that now we have P(C) = 3 and P(B) = 2.

∀s ∈ S do:
    If ∃x : bby(s, x) then
        propagate-db(x, P(s))

where propagate-db is defined as follows:

propagate-db(Unit, Value)
    If ∃x : bby(Unit, x) then
        propagate-db(x, Value)
    else unless ∃y : don(Unit, y) and not(sat(y)) then
        Edb(Unit) = Edb(Unit) + Value

This last propagation step is not necessary if the initial agent s is satisfied. Indeed, in this case the agents standing above it should not be "forced" to move — it is as if they were standing on the table. From this function we can see that the propagated values are only stored in the end nodes (unless those depend on non-satisfied agents); the nodes along the propagation path do not modify their energy, although letting them do so could be useful if certain optimizations to the algorithm were made. Specifically, after each propagation cycle energy values are reset; as only one agent is selected to move, only a few links are modified, with a restricted impact on the others' energies, which would then not need to be recomputed from scratch.

Looking at the links in figure 2, it is easily seen that Ebd(A) = 2 and Edb(A) = 3, all the other mixed energies being 0. After all four values have been computed, the non-satisfied agent with the highest energy is chosen to try to achieve its goal; if its final destination is satisfied, the block is moved to it, otherwise it moves to the table. It should be noted that it is the agent that is chosen, not the action to be performed; it is then up to the agent to choose which action to take, based on the local information it has.

3. Analysis of the approach

We have presented a model of multi-agent interaction based on two primary relations: dependency and blocking. An agent depends on someone (or something) to achieve its goal, and it can be blocked by someone (or something) in its endeavour. These two relations define the communication channels between the agents. Via these channels, the agents communicate and propagate received information (essentially numeric values) following a fixed and common set of rules. At the current stage of the research it is not yet clear how agents could adopt different behaviors in order to "falsify" (to their own profit) the selection criteria: they just cooperate, and their collaborative effort allows an optimal solution to be found. The latter is indeed the most interesting result of this approach: the temporal sequencing of actions it produces is optimal. Other recent approaches in other domains also make use of simple (and numeric) information exchange (e.g., Clearwater and Huberman, 1991) to solve problems which have traditionally been viewed as hard state-space search problems.

Efficiency considerations aside, this result shows some sort of "social" agreement based on two primitive behaviors: know what you want to do and what prevents you from doing it. From that, increase the energy of those who can help you get satisfaction, and decrease that of those who prevent you from having it. One notable consequence of this behavior is that no distinction is made between planning and replanning phases. As we said before, only modifying links affects the propagation of energy, and this does not rely on the previous states of the group. This is, on the contrary, a question of great concern in traditional planners, one tackled in a more or less natural way in several experiments (e.g., see Hutchinson and Kak, 1990, and Ramos and Oliveira, 1991).

Several restrictions on a more general application of the approach remain. When an agent is selected to act, it can only move after deciding where to. In more complex situations, with a set of actions to choose from, it could be useful to use the individual components of the energy, or else to have a hard-wired combination of these values directly in each known action — this is what the behavioral approach advocates (e.g., Brooks 1991, 1991a). It seems, however, that identifying the blocked-by and depends-on links will be important when applying our approach. Even if they can be related to classical precondition lists, it is not yet clear how this mapping could be done.

We discuss next to what extent our approach is inspired by other work in the same or related fields.

4. Relation to other work

Essentially, our inspiration comes from the spreading-activation network of Maes (1989) and from the eco-problem-solving framework of Ferber (1990)5. The first uses a network where the nodes are all the possible actions in the world, and the links between these nodes relate all the facts in the precondition, add, and delete lists of these actions (in the blocks world, actions would be something like put-on-A-B — meaning that A is to be put on top of B — and other facts could be clear-A, on-A-B, etc.). Nodes exchange inhibition/activation energy mutually, weighted by tuning factors which allow the overall system to be more goal- or situation-oriented.

Besides this latter characteristic, which reveals a form of adaptivity and allows capabilities such as learning to be introduced, the system presents two drawbacks. First, all possible actions must be explicitly present in the network, even if they are not very important; if the number of possible actions grows, the size (number of nodes and links) of the network grows accordingly. Second, the mutual influences must be tuned before using the system, which means the designer must have a pretty good idea of the relative weights of the links if the system is to behave "as expected". Nevertheless, it is expected that some sort of learning can also be introduced in this case. On the other hand, as the units are all the possible actions, this approach is really more concerned with the action-selection problem inside a single agent than with viewing the network as a group of agents. Recent experiments (Maes and Brooks, 1990) demonstrated that the activation/inhibition network could be at the basis of an adaptive behavior-selection mechanism in a mobile robot.

5 The reader is referred to the original papers for detailed descriptions of these works.

The second work we consider important to our approach is also based on a simple set of interaction behaviors (Ferber and Jacopin, 1990). MASH (for Multi-Agent Satisfaction Handler) is a multi-agent system based on the eco-problem-solving paradigm (Ferber, 1990). Units in MASH are the blocks and the table; a unit can be seen as a finite deterministic automaton with a set of simple behaviors: the will to be satisfied, the obligation to move away (when attacked), and the will to be free (which can lead it to attack others).

However, there is no mutually agreed-upon decision about which unit should be given priority to act. This example could also make use (as in other eco-problem-solving examples — e.g., Drogoul et al., 1991) of some heuristic function which would allow the system to optimize by choosing the "best" unit to satisfy first; from the standpoint of our approach, this would be equivalent to assigning more than two "energy" levels to the units and choosing the highest amongst them. Nevertheless, as it is, this approach — simple and conceptually attractive — is well suited to problems where actions are cheap but the time to reach the solution is expensive.

Another source of inspiration has been the "multi-agent planning" work. Although agents in this case are generally considered to be more representation- and deduction-oriented (at the expense of increased decoupling from the external world — but there are exceptions, e.g., Hayes-Roth, 1992), they too are engaged in a global interaction process where global coherence and local consistency are difficult to obtain.

Recent work on new approaches to modeling cognitive behavior has also provided good and stimulating examples of what non-symbolic techniques can offer (e.g., Real, 1991, and Roitblat et al., 1991; see also Meyer and Guillot, 1990, for a comprehensive review of attempts to simulate adaptive behavior).

5. Final remarks

We have presented here a model of group interaction based on dependency and blocking relations, which has provided optimal results in the multi-agent planning paradigm. It is based on the assumption that agents are well-behaved, i.e., they respect a fixed set of interaction rules, such as retransmitting information when they are expected to do so, and taking actions when asked to. Moreover, they are committed to updating their direct relations to other agents, for example when they are moved to other places. It also seems that more behaviors could be added so as to take into account other characteristics of real-world domains, such as resource constraints and individuality.

We have seen that the group of agents can adapt itself easily to changing environments: simply by modifying the links between the agents, the system can go on "planning", its internal structure being reconfigured to match the state of the environment.

Application to other domains with different characteristics will be necessary to test the validity of this approach. For the moment, only well-defined goal relations between agents can be taken into account. We are currently extending our work to domains where these relations are more complex in nature and involve different types of preconditions.

6. References

Brooks R.A. (1991), "Intelligence without Reason", AI Memo 1293, AI Laboratory, Massachusetts Institute of Technology, Cambridge, MA, April 1991.

Brooks R.A. (1991a), "New approaches to Robotics", Science, 253, pp. 1227-1232, September 1991.

Clearwater S. and B. Huberman (1991), "Cooperative solution of constraint satisfaction problems", Science, 254, pp. 1181-1183, 1991.

Drogoul A., J. Ferber, and E. Jacopin (1991), "Viewing cognitive modeling as Eco-Problem-Solving: The Pengi experience", Report LAFORIA 2/91, LAFORIA, University Paris 6, Paris, January 1991.

Drummond M. and K. Currie (1989), "Goal ordering in partially ordered plans", in Proc. IJCAI 89, pp. 960-965, Detroit, MI, August 1989.

Elcock E. and D. Michie, eds. (1977), Machine Intelligence 8, Ellis Horwood, Sussex, 1977.

Ferber J. (1990), "Eco-Problem-Solving: How to solve a problem by Interactions", Report LAFORIA 5/90, LAFORIA, University Paris 6, Paris, February 1990.

Ferber J. and E. Jacopin (1990), "A Multi-agent satisfaction planner for building plans as side effects", Report LAFORIA 7/90, LAFORIA, University Paris 6, Paris, July 1990.

Hayes-Roth B. (1992), "Opportunistic control of action in intelligent agents", Technical Report KSL-92-32, Knowledge Systems Laboratory, Stanford University, Stanford, CA, April 1992.

Hutchinson S. and A. Kak (1990), "Spar: A Planner that satisfies operational and geometric goals in uncertain environments", AI Magazine, 11(1), pp. 31-61, 1990.

Maes P. (1989), "How to do the right thing", AI Memo 1180, AI Laboratory, Massachusetts Institute of Technology, Cambridge, MA, December 1989.

Maes P. and R.A. Brooks (1990), "Learning to coordinate behaviors", in Proc. AAAI 90, pp. 796-802, Boston, MA, 1990.

McDermott D. (1988), "Planning and Execution", in Swartout (1988).

McDermott D. (1992), "Robot planning", AI Magazine, 13(2), pp. 55-79, 1992.

Meyer J.A. and A. Guillot (1990), "From Animals to Animats: everything you wanted to know about the Simulation of Adaptive Behavior", Report BioInfo-90-1, Groupe de BioInformatique, Ecole Normale Supérieure, Paris, September 1990.

Ramos C. and E. Oliveira (1991), "The generation of efficient high-level plans and the world representation in a cooperative community of robotic agents", in Proc. ICAR 91, Pisa, 1991.

Real L.A. (1991), "Animal choice behavior and the evolution of cognitive architecture", Science, 253, pp. 980-986, August 1991.

Roitblat H.L., P.W.B. Moore, P.E. Nachtigall, and R.H. Penner (1991), "Biomimetic sonar processing: From Dolphin echolocation to artificial neural networks", in J.A. Meyer and S. Wilson, eds., Simulation of Adaptive Behavior, pp. 66-67, MIT Press, Cambridge, MA, 1991.

Sacerdoti E. (1977), A Structure for Plans and Behavior, American Elsevier, New York, NY, 1977.

Swartout W., ed. (1988), "1987 DARPA Santa Cruz Workshop on Planning", AI Magazine, Summer 1988.

Waldinger R. (1977), "Achieving several goals simultaneously", in Elcock and Michie (1977).

Zlotkin G. and J. Rosenschein (1990), "Blocks, lies, and postal freight: the nature of deception in negotiation", in Proc. 10th AAAI Int'l Workshop on Distributed Artificial Intelligence, Bandera, TX, October 1990.