Thesis presented to the Faculty of the Department of Graduate Studies
of the Aeronautics Institute of Technology, in partial fulfillment of the
requirements for the Degree of Doctor in Science in the Program of
Electronic Engineering and Computer Science, Field Computer Science.
Guilherme Sousa Bastos
METHODS FOR TRUCK DISPATCHING IN
OPEN-PIT MINING
Thesis approved in its final version by signatories below:
Prof. Dr. Carlos Henrique Costa Ribeiro
Advisor
Prof. Dr. Luiz Edival de Souza
Co-advisor
Prof. Celso Massaki Hirata
Head of the Faculty of the Department of Graduate Studies
Campo Montenegro
São José dos Campos, SP - Brazil
2010
Cataloging-in-Publication Data
Documentation and Information Division
Sousa Bastos, Guilherme
Methods for Truck Dispatching in Open-Pit Mining / Guilherme Sousa Bastos.
São José dos Campos, 2010.
140f.
Thesis of Doctor in Science – Course of Electronic Engineering and Computer Science. Area of
Computer Science – Aeronautics Institute of Technology, 2010. Advisor: Prof. Dr. Carlos
Henrique Costa Ribeiro. Co-advisor: Prof. Dr. Luiz Edival de Souza.
1. Programação matemática. 2. Distribuição de mercadorias. 3. Algoritmos Genéticos.
4. Matemática aplicada. 5. Rotas. 6. Caminhões. 7. Matemática. I. Aeronautics Institute of
Technology. II. Title.
BIBLIOGRAPHIC REFERENCE
SOUSA BASTOS, Guilherme. Methods for Truck Dispatching in Open-Pit
Mining. 2010. 140f. Thesis of Doctor in Science – Aeronautics Institute of Technology,
São José dos Campos.
CESSION OF RIGHTS
AUTHOR NAME: Guilherme Sousa Bastos
PUBLICATION TITLE: Methods for Truck Dispatching in Open-Pit Mining.
PUBLICATION KIND/YEAR: Thesis / 2010
Permission is granted to the Aeronautics Institute of Technology to reproduce copies of
this thesis and to loan or sell such copies only for academic and scientific purposes. The
author reserves all other publication rights, and no part of this thesis may be reproduced
without the authorization of the author.
Guilherme Sousa Bastos
Rua Oscar Renno, 309. Costa II
CEP 37500-433 – Itajubá, MG
METHODS FOR TRUCK DISPATCHING IN
OPEN-PIT MINING
Guilherme Sousa Bastos
Thesis Committee Composition:
Prof. Cairo Lucio Nascimento Junior Chairperson - ITA
Prof. Dr. Carlos Henrique Costa Ribeiro Advisor - ITA
Prof. Dr. Luiz Edival de Souza Co-advisor - UNIFEI
Prof. Rodrigo Arnaldo Scarpel Member - ITA
Dra. Leliane Nunes de Barros External Member - IME-USP
Dr. Marcone Jamilson Freitas Souza External Member - UFOP
ITA
To Karina, for her love and patience.
Acknowledgments
Thank you, God, for writing straight on crooked lines... My entire life has been guided
by this wise saying, and now, after a really hard journey, I am here finishing my most
important work so far.
I would like to express my gratitude to my advisor, Prof. Carlos Henrique Costa Ribeiro.
Your supervision style was essential in pointing out the research direction, never giving
the right answers but always steering me away from the wrong ones. Because of this, I
can now say that I am a researcher. Thank you very much!
Thanks to my co-advisor and colleague Prof. Luiz Edival de Souza. Your presence next
door to my office was fundamental to my progress, as you were always available to
answer and help me with my endless questions. This is the end point of your supervision
of my research, which began when I was still an engineering student; however, it is the
starting point of our future research projects. Thanks a lot!
Another special thanks goes to my supervisor at the Australian Centre for Field Robotics
(ACFR), Dr. Fabio Ramos. Thank you for having received me at the ACFR and for
supervising my work during my six-month stay in Sydney. This period made all the
difference to my work, and will certainly drive my future research to a higher standard.
Cheers mate!
Thanks to CAPES for granting a scholarship, which was essential for my studies at the
ACFR.
Continuing in the Oz Land, I must thank the people who helped me with my work, and
especially with life abroad. Thanks to the ACFR staff, and mainly to Vitor, Sildomar,
Guilherme, Tim, Adrian, Paco, Simon, Gabriel, Surya, and Pablo Chilean. A special
thanks goes to Pablo Peruvian: you were my first friend in Sydney! Thanks for guiding me
(a newbie) across the great pubs in the city! Another special thanks to my other friends
in Sydney, whom I consider my brothers: Alex Cowboy, Du, Elton, Leandro, and
Pablo Chilean. My stay in Sydney can be divided into before and after meeting you all!
Another thanks to my great friend Andy and his wife Joanna; thanks a lot for buying
"Possante", I am sure that it will bring you happiness!
Many thanks to Karina Valdivia for teaching me the "crazy" Factored MDPs. I am sure
that we can build a partnership in the near future to study and develop new trends in
the decision making area.
So many times along this long journey I counted on the understanding of two special
people at UNIFEI, who allowed my research work at ITA and adjusted my schedule
whenever I needed; thank you, Prof. Carlos Augusto Ayres and Prof. Carlos Alberto
Pinheiro.
Thanks to my mom and dad for the constant encouragement of my studies since I was a
kid. I really could not have achieved this position without your help. I love you both.
A really special thanks to my wife Karina. Only you know the difficulties that we have
been through together during these years of study... That is the past; from now on we
will harvest the fruits that we started planting five years ago! Thanks for everything,
my love! Eu te amo!!!
“Logic takes you from A to B. Imagination takes you everywhere.”
— Albert Einstein
Resumo
O transporte de material é um dos mais importantes aspectos das operações realizadas
em minas a céu aberto. Este problema envolve geralmente um sistema de despacho de
caminhões, o qual realiza a alocação dos caminhões em tempo real. Dada a importância
deste problema, diversos sistemas de decisão vêm sendo desenvolvidos durante os últimos
anos, aumentando a produtividade e diminuindo os custos operacionais. Como em muitas
outras aplicações reais, uma correta modelagem das incertezas presentes no problema
torna-se crucial para o bom funcionamento do sistema de despacho. Como incertezas
podem-se citar falhas em equipamentos, condições climáticas e erros humanos, as quais
podem resultar em filas de caminhões e carregadeiras inoperantes. Entretanto, incertezas
não são consideradas na maioria dos sistemas de despacho comerciais, fato que pode levar
a resultados longe dos esperados. Nesta tese, novos sistemas de despacho de caminhões são
introduzidos, aproximando deste modo os sistemas atuais a uma metodologia de decisão
estocástica. Primeiramente, é apresentado um método estocástico utilizando Processo
Decisório de Markov Dependente do Tempo (TiMDP) aplicado ao problema de despacho
de caminhões. Neste modelo, os tempos de deslocamento dos caminhões são representados
como funções de densidade de probabilidade, janelas de tempo podem ser inseridas
representando disponibilidade das rotas existentes, e utilidade baseada no tempo pode
ser utilizada como um parâmetro de prioridade. Com o objetivo de minimizar a questão
já bem conhecida da maldição da dimensionalidade, à qual problemas multi-agentes estão
sujeitos quando se considera modelagem em estados discretos, o sistema é modelado
utilizando-se o conceito introduzido de simples-agentes interdependentes. Baseando-se
ainda neste conceito, o método TiMDP Genético (G-TiMDP) é apresentado para aplicação
no problema de despacho de caminhões. Este método apresenta-se como uma hibridização
do modelo TiMDP e Algoritmos Genéticos (GA), o qual é também utilizado
para solucionar o problema de despacho. Finalmente, de modo a testar e comparar os
resultados dos métodos introduzidos, são executadas simulações pelo método de Monte
Carlo em uma mina heterogênea composta por 15 caminhões, 3 carregadeiras e 1 ponto de
processamento de minério. O aspecto de incerteza presente no problema é representado
pela escolha da rota entre o ponto de processamento do minério e as carregadeiras, a qual
é realizada pelo motorista do caminhão, sendo independente do sistema de despacho. Os
resultados são comparados a sistemas clássicos de despacho (Heurística Gulosa e Minimização
dos Tempos de Ciclo dos Caminhões – MTCT) utilizando o Teste T de Student,
comprovando a eficiência dos métodos de despacho de caminhões propostos.
Abstract
Material transportation is one of the most important aspects of open-pit mine operations.
The problem usually involves a truck dispatching system in which decisions on
truck assignments and destinations are taken in real-time. Due to its significance, several
decision systems for this problem have been developed in the last few years, improving
productivity and reducing operating costs. As in many other real-world applications, the
assessment and correct modeling of uncertainty is a crucial requirement, as the unpredictability
originated from equipment faults, weather conditions, and human mistakes
can often result in truck queues or idle shovels. However, uncertainty is not considered in
most commercial dispatching systems. In this thesis, we introduce novel truck dispatching
systems as a starting point to modify the current practices with a statistically principled
decision making methodology. First, we present a stochastic method using the Time-
Dependent Markov Decision Process (TiMDP) applied to the truck dispatching problem.
In the TiMDP model, travel times are represented as probability density functions (pdfs),
time-windows can be inserted for path availability, and time-dependent utility can be used
as a priority parameter. In order to minimize the well-known curse of dimensionality
issue, to which multi-agent problems are subject when considering discrete state modelings,
the system is modeled based on the introduced single-dependent-agents. Based also on the
single-dependent-agents concept, we introduce the Genetic TiMDP (G-TiMDP) method
applied to the truck dispatching problem. This method is a hybridization of the TiMDP
model and of a Genetic Algorithm (GA), which is also used to solve the truck dispatching
problem. Finally, in order to evaluate and compare the results of the introduced methods,
we execute Monte Carlo simulations in an example heterogeneous mine composed of 15
trucks, 3 shovels, and 1 crusher. The uncertain aspect of the problem is represented by
the path selection between the crusher and the shovels, which is executed by the truck
driver, being independent of the dispatching system. The results are compared to classical
dispatching approaches (Greedy Heuristic and Minimization of Truck Cycle Times – MTCT)
using Student's t-test, demonstrating the efficiency of the introduced truck dispatching methods.
List of Figures
FIGURE 2.1 – Example of an MDP with 3 states. (p. 33)
FIGURE 2.2 – Value Iteration for (a) γ = 0.9 and (b) γ = 0.3. (p. 36)
FIGURE 2.3 – TiMDP example solved step-by-step by value iteration. (p. 42)
FIGURE 2.4 – Sequential decision making problem using time-dependent utility. (p. 46)
1999). Uncertain parameters can be modeled based on a reliable historical database: equipment
faults, weather conditions, and route availability. Another set of uncertain parameters
based on time, such as truck travel and loading durations, cannot be represented
by an MDP. To solve this problem, the truck dispatching model can be based on a Time-
dependent MDP (TiMDP) (BOYAN; LITTMAN, 2000), which is used to model and
solve sequential decision problems with stochastic state transitions and stochastic time-
dependent action durations.
CHAPTER 1. INTRODUCTION 22
1.2 Objectives
The solution of a dispatching problem modeled by an MDP is represented by policies
producing the actions that must be selected by the agent (truck) when it is in a specific
state. Generally, for a single agent, the optimal solution can be found quickly by dynamic
programming techniques. However, dispatching problems with many agents (multi-agent)
generate an exponential growth of the state space, causing a correspondingly drastic
increase in the time necessary to find the optimal solution. This issue is known as the curse
of dimensionality (BELLMAN, 1966); it can be very serious in combinatorial problems
like these, and is critical for TiMDP models, in which policies also depend on the current time.
Therefore, the main objective of this thesis is to develop and study an approach that minimizes
this unwanted behavior through an approximation to a single-dependent-agent problem.
In this approximation, the problem is modeled for each truck type (the mine may have
trucks that differ in speed and capacity) with dependent states that represent queues
of different sizes at the shovels. Thus, the decision on which shovel the truck must travel
to depends on the states representing the current sizes of the queues. In this case, the
policies for a specific truck dispatch can change "on-the-fly" (during real-time operation)
because of the dependence on current queue sizes; if a truck goes to a shovel and has to
wait in a queue before its loading, it will increase the size of that queue, thereby
affecting the next truck dispatching decision.
Another point that must be considered is the real-time character of truck dispatching
in an open-pit mine. The values used to model the problem behavior are not
fixed and change all the time. For example, the truck travel time from one specific point to
another will certainly not be exactly the same over distinct passages. TiMDP deals
very well with these characteristics, since the action durations can be modeled by
probability density functions (pdfs) based on a reliable historical database.
Given the presented truck dispatching characteristics, we investigate the dispatching
system for an example mine based on a TiMDP model, verifying the validity of the
method by comparing its simulated results to those from other methods, namely: (1) a
Greedy heuristic, (2) the MTCT heuristic, and (3) a Genetic Algorithm (GA). We then present
a novel hybrid method named Genetic TiMDP (G-TiMDP), which uses the value functions given
by the TiMDP model as the GA fitness function. The G-TiMDP results are also compared
to the results of the previous methods. For the empirical analysis, we apply the objective
of maximizing tonnage production over the whole mining shift to all modeled
and simulated methods.
1.3 Work Contributions
The contributions of this thesis are presented in what follows, in the sequence in which
they appear in the text.
TiMDP solution by backwards convolution
The TiMDP solution method is presented by Boyan and Littman (2000) and subsequent
works such as Li and Littman (2005) and Rachelson, Fabiani and Garcia (2009a);
however, these works are strictly mathematical and do not present a basic step-by-step
solution example. We developed a method to solve a TiMDP by complete discretization
and backwards convolution. The term backwards convolution represents the step
needed to solve the TiMDP model, which is performed in the reverse direction of a standard
convolution. Finally, we solve a TiMDP model step-by-step using our proposed method.
Time-dependent utility decision making using the TiMDP model
Many common situations can be modeled with time-dependent utilities, in which
specific parameters represent the gain or cost that some decision problems return over
time to the decision maker (agent). We draw a correlation between time-dependent
utility and TiMDP using definitions and examples, which can be useful for solving, with
a better approximation, decision problems that occur in practical real-world domain
settings.
Single-dependent-agent truck dispatching modeling
We developed an approximation for the multi-agent problem (truck dispatching) occurring
in an example mine (JAOUA; GAMACHE; RIOPEL, 2009), in which the state
models are built for each truck and are self-dependent through a specific common state
(the queueing state). We named this approximation single-dependent-agent; it minimizes
the state space size, making possible an approximate and fast solution for truck dispatching
using a TiMDP model.
Real-time truck dispatching using TiMDP policies
The single-dependent-agent state representation that we developed is used by the
TiMDP model in the real-time truck dispatching simulation. The truck assignments
(to which shovel the truck must travel) are taken in real-time using the corrected
value functions given by the TiMDP solution (policies). The TiMDP is solved before the
simulation, making the assignment decisions extremely fast.
Real-time truck dispatching using a GA
We used SimEvents™ (a Matlab™ package) to simulate truck dispatching in the
example mine during a 10-hour shift, and developed a novel technique that uses a GA in
real-time for the shovel-truck assignments. This optimization algorithm is very fast and
seems to be suitable for real-time applications. Because of the uncertain parameters
present in the truck dispatching model, the sequence of trucks asking for dispatch
until the end of the shift is impossible to predict. Therefore, the algorithm is
executed many times during the whole shift, seeking result improvement (maximization
of the tonnage production).
Real-time truck dispatching using G-TiMDP
The truck decision results given by the TiMDP policies are in general adequate, but
are degraded by the approximation made by the single-dependent-agent model. Therefore,
we developed a novel technique, named G-TiMDP, that uses the corrected value
functions given by the TiMDP to feed the fitness function of the previously developed GA
technique. We performed a Monte Carlo simulation using this novel technique and evaluated
the superiority of this method by comparing the results by means of a Student's t-test.
1.4 Thesis Outline
Chapter 2 presents an introduction to the TiMDP, which is the main model used in
our truck dispatching algorithm development. We present the state-of-the-art and current
research on TiMDPs, and propose a solution method that can be used in various discrete
applications. We also solve an example using this method and propose the use of TiMDP
models for problems with time-dependent utilities.
The truck dispatching problem in open-pit mining is presented in Chapter 3. In order
to situate the complexity and details of the truck dispatching problem, we first review
the general vehicle dispatching problem, with some variants and applied solution methods.
Next, we present the specificities of the truck dispatching problem, such as the
equipment involved, specific goals, and the dispatching strategies used in real-world truck
dispatching problems. These strategies are the basis for the solution methods developed
in the next chapter.
We present in Chapter 4 real-time truck dispatching methods for open-pit mines
operating under a production policy (maximizing tonnage production over the shift). We
model this problem by using the concept of single-dependent-agents for TiMDPs and
G-TiMDPs. We also present additional techniques for truck dispatching that are used
in further analysis, namely: the greedy heuristic, the MTCT (Minimizing Truck Cycle Time)
heuristic, and GAs.
Chapter 5 presents simulation results and analysis of the developed dispatching meth-
ods. This includes details of the Monte Carlo simulations and comparison of results using
the Student's t-test. Some analyses on improving the methods are also presented here.
The conclusions of the work are presented in Chapter 6. We also present some recent
trends in MDP modeling and make propositions for future work.
For reference, we also present overviews of Genetic Algorithms in Appendix A and
relevant statistical distributions in Appendix B.
2 Time Dependence in Decision
Processes
Consider the following problem: an accident has occurred, three people are injured,
and there is only one doctor (agent), who can give medical care to a single person at a
time; their lives depend on medical care. What should the doctor do? Consider also
that the injury level can be different for each person and that there are uncertainties about
survival after medical care. Analyzing these parameters, it is almost obvious that the
right decision on the attendance sequence could maximize the probability of saving lives.
Decision theory is often claimed to be the right framework for producing the most rational
choice (PARSONS; WOOLDRIDGE, 2002), and it can be the basic theory to solve this
practical and common sequential decision problem.
In fact, sequential decision problems have been tackled very intensively in the last few
years, and it is well known that the theoretical framework based on MDPs is the best way
to model and solve them, giving optimal results in many cases (BOUTILIER; DEAN;
HANKS, 1999). However, real-world problems have an additional and specific parameter:
time dependency. MDP theory only considers fixed time steps between epochs,
which can be easily understood and modeled as iteration steps. To overcome this limitation,
Semi-MDPs (SMDPs) (SUTTON; PRECUP; SINGH, 2000) and, more recently, Time-
dependent MDPs (TiMDPs) have been proposed1. In those models, the transition between
states is not instantaneous, but instead takes a specific time t (durative action). In a
TiMDP, time is observable, so the agent can wait for the best moment to make the decision
(or execute the action in the current state). In an SMDP, the problem can be modeled
with an infinite time horizon and there is a probability distribution over the duration of the
durative action; that is, the agent cannot decide to wait for the best moment to execute the
action. A TiMDP also has time-dependent likelihood functions that activate the action
outcome for the current time, and always models finite time horizon problems (the decisions
are made between starting and ending clock marks).
In the TiMDP model, the rewards related to the action outcomes can also be represented
as time-dependent functions. In the accident scenario, the person's lifetime, defined
as a utility for decision problems (RUSSELL; NORVIG, 2009), decreases over time and
can be formally understood as a time-dependent utility (HORVITZ; RUTLEDGE, 1991).
This problem can be modeled as a TiMDP, in which time-dependent utilities can be
directly represented by time-dependent rewards in the model. This is only one application
that can be modeled as a TiMDP problem. Other instances, like vehicle routing
and scheduling problems with time window constraints (SOLOMON, 1987; ICHOUA;
GENDREAU; POTVIN, 2003; JI, 2005), can also be modeled as TiMDPs.
The following sections present an introduction to MDPs and TiMDPs as the technical
basis for modeling the problem considered in this thesis, namely truck dispatching in an
open-pit mine.
1We use the TiMDP representation introduced by Rachelson, Fabiani and Garcia (2009b) instead of the original one, TMDP, introduced by Boyan and Littman (2000), to avoid confusion with other representations such as tree-structured MDPs (LENGYEL; DAYAN, 2007).
2.1 Markov Decision Processes
2.1.1 MDP formulation
The Markov Decision Process (MDP) (PUTERMAN, 1994; BERTSEKAS, 1987;
PELLEGRINI; WAINER, 2008) is a stochastic system modeling technique in which the
transitions between states are probabilistic, the states are observable, and it is possible
to interfere with the system dynamics through actions that produce state changes and
rewards. A process is Markovian if it follows the Markov Property: the effect of an action
depends only on the action itself and on the current state of the system. The decision
aspect lies in the fact that the agent can periodically make decisions on the system
through actions.
Formally, an MDP is a tuple (S, A, T, R), as follows:
• S is the set of possible states of the system;
• A is the set of actions that can be executed at different decision epochs;
• T : S × A × S → [0, 1] is the probability function for the system changing to state
s′ ∈ S from state s ∈ S under agent action a ∈ A, denoted by T(s′|s, a); and
• R : S × A → ℝ is the reward for taking decision a ∈ A when the system is in
state s ∈ S.
Considering that the system is at some state s in a given decision epoch k, it is
necessary to select which action a must be executed. The action is selected following a
decision rule, and the mapping from states to actions following the decision rules is the policy
(π). Given a policy, we can calculate the expected utility (or expected total reward)
of the resulting action sequence. The expected total reward, considering immediate reward
r and a finite horizon z, is
r and for a finite horizon z is
E
z−1∑k=0
rk
. (2.1)
We can also define the discounted expected reward for finite horizon z,

E[ ∑_{k=0}^{z−1} γ^k r_k ] ,  (2.2)

which uses a discount factor γ ∈ ]0, 1[ to ensure a bounded value for the expected total
reward in the case of an infinite horizon:

E[ lim_{z→∞} ∑_{k=0}^{z−1} γ^k r_k ] .  (2.3)
The importance of decisions taken in future epochs is governed by the discount factor
γ: a value of zero gives no importance to future rewards (greedy behavior), whereas a value
of one applies no discount to the cumulative expected reward.
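The effect of γ on the finite-horizon discounted sum of Eq. 2.2 can be checked directly in a few lines of Python (the reward sequence here is invented for illustration):

```python
def discounted_return(rewards, gamma):
    # Finite-horizon discounted expected reward (Eq. 2.2): sum_k gamma^k * r_k
    return sum(gamma ** k * r for k, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0, 1.0]  # hypothetical reward sequence r_0..r_3
print(discounted_return(rewards, 0.9))  # ≈ 3.439: future rewards still matter
print(discounted_return(rewards, 0.0))  # 1.0: greedy, only r_0 counts
```

With γ = 0.9 most of the future rewards survive the discounting, while γ = 0 reduces the sum to the immediate reward alone, matching the greedy behavior described above.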
A policy is optimal (π*) when the expected total reward for any state is maximized.
The value function V*(s) gives the optimal expected total reward value under the optimal
policy π*:

V*(s) = max_{a∈A} [ R(s, a) + γ ∑_{s′∈S} T(s′|s, a) V*(s′) ] .  (2.4)
The action-value function Q^π(s, a), for a given policy π, gives the value of action a in
state s, considering the immediate reward from the execution of a in s and the expected
total reward thereafter:

Q^π(s, a) = R(s, a) + γ ∑_{s′∈S} T(s′|s, a) V^π(s′) .  (2.5)
For an optimal policy π*, we can define Q*(s, a):

Q*(s, a) = R(s, a) + γ ∑_{s′∈S} T(s′|s, a) V*(s′) .  (2.6)
The optimal policy π* produces the optimal actions that return the maximum Q
values for each state s:

π*(s) = arg max_{a∈A} Q*(s, a) .  (2.7)

Notice that V*(s) can also be represented based on the maximum Q value in state s:

V*(s) = max_{a∈A} Q*(s, a) .  (2.8)
2.1.2 MDP solution
The solution of an MDP is an optimal policy π* that produces the value function V*(s)
for all states. A successive approximation algorithm to solve an MDP, called Value Iteration
(ALG. 1), was presented by Bellman (1966).
Algorithm 1: Value iteration
Input: MDP(S, A, T, R)
Output: V*
foreach s ∈ S do
    V_0(s) ← max_{a∈A} R(s, a)
end
i ← 1
while stopping criterion not satisfied do
    foreach s ∈ S do
        V_i(s) ← max_{a∈A} [ R(s, a) + γ ∑_{s′∈S} T(s′|s, a) V_{i−1}(s′) ]
    end
    i ← i + 1
end
return V

The stopping criterion of ALG. 1 for an error ε is defined by the so-called Bellman
Error:

∀s ∈ S, |V(s) − V′(s)| ≤ ε(1 − γ) / (2γ) .  (2.9)
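As a hedged illustration, ALG. 1 can be written in a few lines of Python for a small discrete MDP given as plain dictionaries; the two-state toy problem below is invented for illustration and is not the thesis mine model:

```python
def value_iteration(S, A, T, R, gamma=0.9, eps=1e-6):
    # ALG. 1: initialize V_0(s) = max_a R(s, a), then apply Bellman backups
    # until the Bellman-error stopping criterion of Eq. 2.9 holds.
    V = {s: max(R[s, a] for a in A) for s in S}
    while True:
        V_new = {s: max(R[s, a] + gamma * sum(T.get((s, a, s2), 0.0) * V[s2]
                                              for s2 in S)
                        for a in A)
                 for s in S}
        if all(abs(V_new[s] - V[s]) <= eps * (1 - gamma) / (2 * gamma) for s in S):
            return V_new
        V = V_new

# Toy MDP (invented): staying in s1 pays 1 per step; from s0 the agent can move.
S, A = ["s0", "s1"], ["stay", "go"]
T = {("s0", "stay", "s0"): 1.0, ("s0", "go", "s1"): 1.0,
     ("s1", "stay", "s1"): 1.0, ("s1", "go", "s0"): 1.0}
R = {("s0", "stay"): 0.0, ("s0", "go"): 0.0,
     ("s1", "stay"): 1.0, ("s1", "go"): 0.0}
V = value_iteration(S, A, T, R)
print(V)  # V(s1) ≈ 1/(1 - 0.9) = 10, V(s0) ≈ 0.9 * V(s1) = 9
```

The fixed point is easy to verify by hand against Eq. 2.4: in s1 the agent stays forever, so V(s1) = 1/(1 − γ) = 10, and from s0 the best move reaches s1 in one step, so V(s0) = γ · 10 = 9.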
The policy iteration algorithm, which is more efficient than value iteration (it converges
in fewer iterations), was proposed by Howard (1960). This algorithm (ALG. 2) alternates
between a value determination step (evaluation of the current policy) and a policy improvement
step (improvement of the current policy).
Algorithm 2: Policy iteration
Input: MDP(S, A, T, R)
Output: π*
Initialize π′ randomly
repeat
    π ← π′
    ∀s ∈ S, V(s) = R(s, π(s)) + γ ∑_{s′∈S} T(s′|s, π(s)) V(s′)
    foreach s ∈ S do
        ∀a ∈ A, Q^π(s, a) ← R(s, a) + γ ∑_{s′∈S} T(s′|s, a) V(s′)
    end
    foreach s ∈ S do
        π′(s) ← arg max_{a∈A} Q^π(s, a)
    end
until π = π′
return π
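A Python sketch of ALG. 2, under the same kind of dictionary-based MDP encoding invented for illustration (the toy problem is restated so the snippet stands alone); the value determination step, a linear system, is solved exactly with NumPy:

```python
import numpy as np

def policy_iteration(S, A, T, R, gamma=0.9):
    # ALG. 2: exact value determination (solve V = R_pi + gamma * P_pi * V)
    # alternating with greedy policy improvement, until the policy is stable.
    idx = {s: i for i, s in enumerate(S)}
    pi_new = {s: A[0] for s in S}  # arbitrary initial policy
    while True:
        pi = pi_new
        P = np.array([[T.get((s, pi[s], s2), 0.0) for s2 in S] for s in S])
        r = np.array([R[s, pi[s]] for s in S])
        V = np.linalg.solve(np.eye(len(S)) - gamma * P, r)
        pi_new = {}
        for s in S:
            Q = {a: R[s, a] + gamma * sum(T.get((s, a, s2), 0.0) * V[idx[s2]]
                                          for s2 in S)
                 for a in A}
            pi_new[s] = max(Q, key=Q.get)  # greedy improvement over Q^pi
        if pi_new == pi:
            return pi, {s: float(V[idx[s]]) for s in S}

# Toy MDP (invented): staying in s1 pays 1 per step; from s0 the agent can move.
S, A = ["s0", "s1"], ["stay", "go"]
T = {("s0", "stay", "s0"): 1.0, ("s0", "go", "s1"): 1.0,
     ("s1", "stay", "s1"): 1.0, ("s1", "go", "s0"): 1.0}
R = {("s0", "stay"): 0.0, ("s0", "go"): 0.0,
     ("s1", "stay"): 1.0, ("s1", "go"): 0.0}
pi, V = policy_iteration(S, A, T, R)
print(pi)  # {'s0': 'go', 's1': 'stay'}
```

On this toy problem the policy stabilizes after two improvement steps, consistent with the observation above that policy iteration typically converges in fewer iterations than value iteration.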
action that returns the highest immediate reward. In the other example (γ = 0.9), the
agent considers the future rewards given the transition probabilities.
2.2 Time-dependent Markov Decision Processes
Time-dependent MDPs (TiMDPs) were first proposed by Boyan and Littman (2000)
to model and solve sequential decision problems with the following attributes:
• Stochastic state transitions; and
• Stochastic time-dependent action durations.
Formally, a TiMDP consists of the following components:

• S: discrete state space;
• A: discrete action space;
• M: discrete set of outcomes, each of the form µ = ⟨s′_µ, T_µ, P_µ⟩, where:
  – s′_µ ∈ S is the resulting state;
  – T_µ ∈ {ABS, REL} specifies the type of the resulting time distribution
    (absolute or relative);
  – P_µ(t′) (if T_µ = ABS) is a pdf over the absolute arrival times of µ;
  – P_µ(δ) (if T_µ = REL) is a pdf over the durations of µ;
• L: L(µ|s, t, a) is the likelihood of outcome µ given state s, time t, and action a; and
• R: R(µ, t, δ) is the reward for outcome µ at time t with duration δ.
The TiMDP model is represented by the following Bellman equations2:

V(s, t) = max_{a∈A} Q(s, t, a)
Q(s, t, a) = ∑_{µ∈M} L(µ|s, t, a) · U(µ, t)
U(µ, t) = ∫_{−∞}^{∞} P_µ(t′) [R(µ, t, t′ − t) + V(s′_µ, t′)] dt′   (if T_µ = ABS)
U(µ, t) = ∫_{−∞}^{∞} P_µ(t′ − t) [R(µ, t, t′ − t) + V(s′_µ, t′)] dt′   (if T_µ = REL)
(2.13)

where U(µ, t) is the utility of outcome µ at time t, V(s, t) is the time-value function for
the immediate action, and Q(s, t, a) is the expected Q time-value over outcomes.
We can note that the calculations of U(µ, t) are convolutions of the result-time pdf
P_µ with the lookahead value R + V. The likelihood function L represents the probability
of an outcome occurring for action a at time t, and can be used to model problems with
2Equation 2.13 differs from the original one defined in Boyan and Littman (2000) in not having dawdling, that is, the agent does not receive a reward for waiting in a state. Several works, like Li and Littman (2005) and Marecki, Topol and Tambe (2006), use the same formulation proposed herein.
time-windows (BRESINA et al., 2002).
This model is used to solve time-dependent problems with finite time horizon and
represents an undiscounted continuous-time MDP.
2.2.1 Discrete solution for relative time distributions by backwards convolution
In the general TiMDP model (BOYAN; LITTMAN, 2000), the time-value functions for
each state can be arbitrarily complex and therefore impossible to represent exactly. The
TiMDP problem is solved by representing R and V as piecewise linear (PWL) functions,
L as a piecewise constant (PWC) function, and P_µ in discretized form. This representation
ensures closure under the convolutions and avoids an increased number of iterations. This
solution is fast and exact (for the approximated functions), but it has the following drawbacks:
loss of information caused by the initial approximations, insertion of new breakpoints in
the piecewise functions over the iterations, and the need for an analytic solution of the
convolution integral.
Li and Littman (2005) explored the practical solution of value iteration considering
that P_µ is also a PWC function. In that case, the degree of the convolved functions would
grow during the iterations, making a solution in reasonable time impossible. To prevent
this behavior, Li and Littman (2005) introduced the Lazy Approximation Algorithm, in
which the resulting PWL function of each convolution is approximated by a PWC function
at each iteration. Hence, the imprecision and state space augmentation introduced by
discretizing P_µ are avoided in this solution method.
In a recent work performed by Rachelson, Fabiani and Garcia (2009a), the related
functions of the TiMDP model are represented by piecewise polynomial (PWP) functions.
In order to limit the degree growth of the iteration results, the introduced algorithm
executes, when needed, a decreasing step that reduces the degree of the results in the
current iteration by PWP interpolation.
In order to simplify the solution algorithm and focus on the proposed dispatching
problem, we propose discretizing all functions involved in the model and solving the
convolutions by a discrete numerical method. This approximation does not provide
a solution as fast as the original one, but it is a simpler and more direct way to solve
problems with few states. The only caveat is that the convolution present in the TiMDP
model cannot be solved as a conventional convolution integral. A conventional convolution
integral can be represented by:
h(t) = ∫_{−∞}^{∞} g(t′) k(t − t′) dt′ .   (2.14)
The discrete formulation of a convolution is,
h(j) = k(j) ∗ g(j) = Σ_i g(i) k(j − i) .   (2.15)
This convolution involves a delay, represented by the k function, applied over the g function.
However, in the TiMDP there is a negative delay, and the convolution integral becomes,
h(t) = ∫_{−∞}^{∞} g(t′) k(t′ − t) dt′ .   (2.16)
We characterize it as a backwards convolution, and its discrete solution is,
h(j) = k(j) • g(j) = Σ_i g(i) k(j + i) .   (2.17)
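The difference between the standard convolution (Eq. 2.15) and the backwards convolution (Eq. 2.17) can be sketched numerically. The implementation below is an illustrative loop version (not the thesis's actual code); note that the backwards form is just a discrete cross-correlation, where future values of k weight g:

```python
import numpy as np

def forward_convolve(k, g):
    """Standard discrete convolution (Eq. 2.15): h(j) = sum_i g(i) k(j - i)."""
    n = len(g)
    h = np.zeros(n)
    for j in range(n):
        for i in range(j + 1):
            h[j] += g[i] * k[j - i]
    return h

def backward_convolve(k, g):
    """Backwards convolution (Eq. 2.17): h(j) = sum_i g(i) k(j + i).
    Equivalent to a discrete cross-correlation of k with g."""
    n = len(g)
    h = np.zeros(n)
    for j in range(n):
        for i in range(n - j):
            h[j] += g[i] * k[j + i]
    return h
```

For example, with k = [1, 2, 3] and g = [1, 1, 1], the forward form yields [1, 3, 6] while the backwards form yields [6, 5, 3]: the delay acts in opposite time directions.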
So, using our solution method, the time-value function V for relative Pµ is,
V(s, t) = max_{a∈A} Σ_{µ∈M} L(µ|s, a, t) · Pµ(t) • [R(µ, t) + V(s′_µ, t)] .   (2.18)
For discretized problems with absolute time distributions, the integral of Eq. 2.13 can
be solved by numerical methods such as the Newton-Cotes Rule (THISTED, 1988).
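As an illustration of the Newton-Cotes family, a composite Simpson's rule (one of the closed Newton-Cotes formulas) can be sketched as follows; this is a generic textbook quadrature, not the thesis's specific implementation:

```python
def simpson(f, a, b, n=100):
    """Composite Simpson's rule, a closed Newton-Cotes formula.
    n is the number of subintervals (forced even); error is O(h^4) for smooth f."""
    if n % 2:
        n += 1  # Simpson's rule requires an even number of subintervals
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3
```

For instance, `simpson(lambda x: x**2, 0, 1)` returns 1/3 up to floating-point error, since Simpson's rule is exact for polynomials up to degree three.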
2.2.2 A TiMDP example
The example presented in FIG. 2.3 is a good starting point to understand value
iteration in TiMDPs. The problem is composed of two states, one action per state,
constant rewards (R) over time t, and a unitary likelihood function (L) over the whole time
horizon. In this case, at State 1 the agent will receive reward R1 = 1 after one time
period (the action is durative and takes exactly one time period), going to State 2. In
State 2, the agent will receive a reward R2 = 2 after two time periods. The rewards can
be accumulated until the end of the time horizon.
The system starts with time-value function V equal to zero for both states. Then, the
problem is solved by value iteration using the Bellman equations (Eq. 2.18) with our
approximation presented in Section 2.2.1.
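A minimal value-iteration script for this two-state example can be sketched as below. Two details are assumptions inferred from the figure rather than stated in the text: that the action in State 2 returns the agent to State 1, and that a reward counts only if the action completes strictly before the horizon at time 10 (this convention reproduces the accumulated reward of 6 reported for State 2 at times 2 and 3, and convergence at the sixth iteration):

```python
# States 1 and 2; deterministic durations, constant rewards (from the example).
H = 10                      # time horizon (times 0..10)
dur = {1: 1, 2: 2}          # action durations: 1 period at State 1, 2 at State 2
rew = {1: 1.0, 2: 2.0}      # rewards R1 = 1, R2 = 2
nxt = {1: 2, 2: 1}          # successors; state 2 -> state 1 is an assumption

V = {s: [0.0] * (H + 1) for s in (1, 2)}
sweeps = 0
while True:
    newV = {s: [0.0] * (H + 1) for s in (1, 2)}
    for s in (1, 2):
        for t in range(H + 1):
            done = t + dur[s]
            if done < H:  # reward counts only if the action completes before H
                newV[s][t] = rew[s] + V[nxt[s]][done]
    if newV == V:   # fixed point reached
        break
    V = newV
    sweeps += 1
```

Under these assumptions the loop stabilizes after six sweeps, with V(2, 2) = V(2, 3) = 6, matching the example; the thesis may use a slightly different horizon convention.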
The value iteration process converges at the sixth iteration, and the solution of V
gives important information for agent decision making. For example, when the agent is
at State 2 at time 2, it knows that it can receive an accumulated reward of 6 units following
[Figure: pdfs p1 and p2, rewards R1 and R2, and the value functions V1 and V2 plotted over t = 0..10 for iterations 1 to 6.]
FIGURE 2.3 – TiMDP example solved step-by-step by value iteration.
the policy. In this case, it can wait until time 3 and receive the same accumulated reward.
So, for TiMDPs, policies depend both on the state and on the current time.
2.3 Time-dependent utilities
An agent needs a measurement value to select the best option (that is, to make a decision)
among several alternatives. This measurement is the value of the utility function (LI; SOH, 2004).
In decision-theoretic planning, this value is also called the value function (accumulated rewards in
sequential decision making) (BOUTILIER; DEAN; HANKS, 1999). The expected utility
(EU) can be calculated for problems with nondeterministic actions (RUSSELL; NORVIG,
2009):
EU(A|E) = Σ_i P(Result_i | E, Do(A)) · U(Result_i(A)) ,   (2.19)
where Resulti(A) are the possible outcome states for a nondeterministic action A, E
summarizes the agent’s available evidence about the world, and Do(A) is a proposition
informing that action A is executed in the current state.
This common utility representation may not suffice for complex real problems, in
which the actions to be executed are durative and have priorities. Often, it is necessary to
solve more urgent tasks first and leave others waiting (BASTOS; RIBEIRO; SOUZA, 2008).
To address this issue, time-dependent utility theory (HORVITZ; RUTLEDGE, 1991)
can be used. In this theory, the utility is a function of time, greater than zero, and can
be increasing or decreasing.
2.3.1 Decreasing time-dependent utility function
Decreasing functions can be used to represent a task lifespan and give some notion of
priorities to the decision maker. For example, suppose two injured people must
receive medical care from the only doctor present in a scenario. They have different injury
levels and will die if they do not receive medical care as soon as possible. So, the doctor
needs to make the right decision in the attempt to save both lives, choosing which person to
attend first. This decision could be made easily, for this simple example, if the doctor had
a time-dependent utility function representing the importance of a person's life (that is,
the death risk) at the current time. This function must map important information such as
age, life-decreasing rate, injury level, and so forth, to a utility value (this mapping is not
the focus of this work, and it is assumed known by the decision maker). Therefore, the
right decision for the doctor is the one that executes the right attendance sequence, considering
durative actions, without any utility function reaching a zero value (death).
The decreasing time-dependent utility function can be represented by any decreasing
function, but for functionality and simplicity we use exponential or linear functions for
its representation:
U(A, t) = U(A, t0) · e^(−k1·t)
U(A, t) = U(A, t0) − k2 · t,   U(A, t) ≥ 0 ,   (2.20)
where U(A, t) is the utility for choosing action A at time t, t0 is the initial time, and k1
and k2 are parameters for adjusting the exponential and linear functions, respectively, to
the problem requirements.
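Both decreasing forms of Eq. 2.20, together with the triage intuition above, can be sketched as follows. The patient names and decay constants are purely illustrative:

```python
import math

def u_exp_dec(u0, k1, t):
    """Exponential decreasing utility: U(A, t) = U(A, t0) * e^(-k1 * t)."""
    return u0 * math.exp(-k1 * t)

def u_lin_dec(u0, k2, t):
    """Linear decreasing utility, clamped so that U(A, t) >= 0."""
    return max(u0 - k2 * t, 0.0)

# Toy triage (hypothetical numbers): patient B deteriorates faster (larger k1),
# so B's utility approaches zero (death) sooner and B should be attended first.
utility_a = u_exp_dec(10.0, 0.05, 10.0)   # patient A after 10 minutes
utility_b = u_exp_dec(10.0, 0.30, 10.0)   # patient B after 10 minutes
attend_first = "B" if utility_b < utility_a else "A"
```

The linear form reaches exactly zero in finite time, whereas the exponential form only approaches it; which shape is appropriate depends on how the death risk is modeled.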
2.3.2 Increasing time-dependent utility function
Increasing functions can be used, for instance, to represent profits over time. For
example, sometimes it is interesting to choose the task execution sequence based on greater
rewards, as is the case for the vehicle refueling problem, in which the utility of the refueling
state increases over time. Thus, as the fuel level decreases, the utility of refueling increases,
and after a certain time, depending on the current position of the vehicle (distance
from the fueling station), the refueling decision will be taken. Unlike the decreasing
utility function, which has a minimum value (zero in most cases), in this case
it is reasonable to assume a maximum value. For vehicle refueling in particular, it is
important to agree upon a maximum utility value that refers to an empty tank. The
utility model is
U(A, t) = Umax · (1 − e^(−k3·t))
U(A, t) = U(A, t0) + k4 · t,   U(A, t) ≤ Umax ,   (2.21)
where U(A, t) is the utility for choosing action A at time t, t0 is the initial time, Umax is
the maximum utility, and k3 and k4 are constant parameters for adjusting the exponential
and linear functions, respectively, to the problem requirements.
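The increasing forms of Eq. 2.21 and the refueling decision can be sketched as below; the threshold rule and all constants are illustrative assumptions, not part of the thesis model:

```python
import math

def u_exp_inc(u_max, k3, t):
    """Saturating increasing utility: U(A, t) = Umax * (1 - e^(-k3 * t))."""
    return u_max * (1.0 - math.exp(-k3 * t))

def u_lin_inc(u0, k4, u_max, t):
    """Linear increasing utility, capped at Umax (the empty-tank value)."""
    return min(u0 + k4 * t, u_max)

# Hypothetical decision rule: refuel once the utility of the refueling state
# exceeds the cost of reaching the station (a proxy for distance).
def should_refuel(t, distance_cost, u_max=100.0, k3=0.02):
    return u_exp_inc(u_max, k3, t) > distance_cost
```

The exponential form saturates at Umax asymptotically; the linear form must be capped explicitly, mirroring the constraint U(A, t) ≤ Umax.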
2.3.3 A time-dependent utility example
In this section, we present a more complex sequential decision-making example using
time-dependent utilities (or rewards varying over time), modeled and solved by a TiMDP.
The example is presented in FIG. 2.4. It has three states, two selectable actions per state,
a finite horizon with a limit of 100 time periods, a unitary likelihood function over the
whole time horizon, and deterministic action durations.
The problem was solved by value iteration using the Bellman equations with our
approximations (Eq. 2.18). The results for the time-value functions V and Q are presented
in FIGS. 2.5, 2.6 and 2.7.
In the graphs, we have the time-value function V(State, t), which is the maximum
over the Q(State, Action, t) time functions.
This solution follows the same idea presented in Section 2.2.2, with the difference that
the agent cannot wait in a state for the best decision-making time. The solution is
hard to analyze (even for just three states and six actions), which shows the need for and
importance of TiMDP models for solving large time-dependent problems.
FIG. 2.8 shows the policies as a function of time. Such policies define the actions
FIGURE 2.4 – Sequential decision making problem using time-dependent utility.
FIGURE 2.5 – Value function - V (1, t).
FIGURE 2.6 – Value function - V (2, t).
FIGURE 2.7 – Value function - V (3, t).
FIGURE 2.8 – Policies over time.
that the agent must choose based on the maximum Q value for a state, in time t. For
example, if the agent is at State 3 and the current time is 77, it must choose Action 6,
therefore moving to State 2.
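Extracting such a time-dependent policy amounts to an argmax over the Q values at the current time. A minimal sketch with a toy nested-dictionary Q-table (the numbers are hypothetical, chosen only to mirror the State 3 / time 77 example):

```python
# Hypothetical Q-table layout: Q[state][action][t] -> expected value.
Q = {3: {5: {77: 4.0}, 6: {77: 7.5}}}

def policy(Q, state, t):
    """Time-dependent greedy policy: the action with maximal Q(state, action, t)."""
    return max(Q[state], key=lambda a: Q[state][a][t])
```

Here `policy(Q, 3, 77)` selects Action 6, since its Q value dominates at that instant.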
In the TiMDP model, actions can be durative and uncertain (represented by pdfs). We
used a Normal distribution to represent P1 in our example, with mean 10 and variance
3. Normal distributions are very convenient for this kind of problem, in which the action
is durative and has different durations across executions. For real situations with a
reliable database of past action durations, a Normal distribution is a good approximation
for the action-duration pdf, because durations tend to cluster around a single mean value
with the proper variance. The solution for State 1 is shown in FIG. 2.9.
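To be used in the discrete TiMDP solution, such a duration pdf must be sampled on the time grid and renormalized. A sketch for the N(10, 3) distribution of the example (grid limits and step are illustrative; the density is truncated to the grid and renormalized):

```python
import math

def discretized_normal(mean, var, t_max, step=1.0):
    """Sample an (unnormalized) Normal(mean, var) density on a uniform grid
    and renormalize, yielding a discrete duration pmf usable as P_mu."""
    grid = [i * step for i in range(int(t_max / step) + 1)]
    dens = [math.exp(-(t - mean) ** 2 / (2 * var)) for t in grid]
    total = sum(dens)
    return grid, [d / total for d in dens]

grid, p = discretized_normal(mean=10.0, var=3.0, t_max=20.0)
```

After renormalization the discrete probabilities sum to one, with the mode at the mean (t = 10 on this grid).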
Comparing this result with the original problem (FIG. 2.4), it is clear that the function
Q(1, 1, t) becomes smoother. There is also a change in the shape of function Q(1, 2, t). In
fact, these changes in the functions may change the overall policies due to the uncertainty
FIGURE 2.9 – Value function V (1, t), P1 = N(10, 3).
in action durations, which is related to their inherent variances.
The policies are shown in FIG. 2.10.
Comparing with FIG. 2.8, we note a difference between the policies, which is caused by
the uncertainty added to the duration of Action 1. For example, now the policy in State 1
at time 25 is Action 2, against Action 1 in the original problem. The uncertainty added
to the duration of the action that belongs to State 1 has also caused a change in the policy
for State 3. Therefore, it is very important to model the pdfs correctly in order to avoid
wrong decisions.
FIGURE 2.10 – Policies over time - P1 = N(10, 3).
3 Truck Dispatching in Open-Pit Mines
Truck dispatching in open-pit mining consists of transporting material (mineral matter)
during a shift by haul trucks from pickup stations (shovels) to delivery stations or
dump points (crushers, waste dumps, or stockpiles). The mineral matter is composed of
(KOLONJA; KALASKY; MUTMANSKY, 1993) ore (the most valuable mineral product),
leach (of marginal, but positive, value), and waste (of no value). A mine is often
composed of different models of trucks and shovels (a heterogeneous fleet) that work at
specific and different truck speeds and capacities and shovel digging rates.
Upon a truck driver's solicitation, the dispatcher (or fleet manager) must decide in
real time which shovel the truck must travel to (truck assignment), based on the current
mine state and on a decision support system or on his own experience. These decisions have
crucial importance in the mining operation, given that material transportation is one of the
most important aspects of open-pit mine operations, representing up to 60% of operating
costs (ALARIE; GAMACHE, 2002). Due to its significance, several decision systems for this
problem have been developed in the last few years, improving productivity and reducing
operational costs.
CHAPTER 3. TRUCK DISPATCHING IN OPEN PIT MINES 52
In the following sections, similarities between truck dispatching and other vehicle
dispatching systems are presented; truck dispatching in open-pit mining is then fully
addressed and detailed.
3.1 Vehicle dispatching problems
The truck dispatching problem does not occur only in mining, and can be found in any
area that involves the management of a vehicle fleet. Some examples of vehicle dispatching
problems are:
• Dynamic vehicle assignment problem (POWELL, 1988)
This is a common problem in the shipping industry. Given a request, the fleet manager
must decide which truck will be sent to the ship for loading and subsequent delivery. After
the delivery, if there are no more loads, the truck must be repositioned according to future
loading demands.
• Dial-a-ride (GENDREAU; POTVIN, 1998)
This is a generalization of the dynamic vehicle assignment problem. During a day, a vehicle
must pick up and deliver material (or people) at different locations. This problem can have
capacity restrictions and soft time-window constraints. The objective is to perform all
transportation at minimum cost.
• Automated Guided Vehicles (AGVs) in the manufacturing industry (CO; TANCHOCO, 1990)
AGVs, or mobile robots, transport material on the shop floor (raw material or
finished product) in an automated plant. The transportation occurs between nearby locations,
and there are predefined robot waiting places to avoid queues at the processes.
Alarie and Gamache (2002) note that truck dispatching in open-pit mining seems to
be a simplification of other vehicle dispatching problems; however, it presents some
characteristics that are not commonly reported in the literature:
• Mines are closed systems, that is, the pickup and delivery points remain the same
and stay at the same position during a long period of time (generally, a shift of 8 to
12 hours);
• The traveling distances are short compared to the length of the shift (10 to 25
min);
• The frequency of demands at each pickup point is high (every 3 to 5 minutes); and
• If the size of the fleet is too large, truck queues may appear.
Additionally, we note the highly combinatorial aspect of the problem, due to the several
trucks typically working in a mine (the dispatching system must consider the position
of all trucks in its assignment to the shovels, which is exemplified by values in the next
chapter considering our example mine model). In the simulated mines presented by Jaoua,
Gamache and Riopel (2009), there are 15 trucks in a medium-scale mine (3 shovels and
2 dump points), and 60 trucks in a large-scale mine (10 shovels and 3 dump points). In
Computer Science terms, each truck is an agent and this problem is modeled as a multi-agent
system.
The number of trucks (fleet size) working in a mine is defined at a previous decision
epoch by a specific optimization technique, which is not the focus of this work. Situations
with more trucks than the optimal quantity (over-trucked) will increase the length of
queues at shovels, while fewer trucks (under-trucked) cause shovel underutilization. So, the
results of our algorithm are strongly influenced by the quantity of trucks operating in a
shift, which must be close enough to the optimal quantity.
3.2 Truck dispatching problem
Solving a truck dispatching problem in open-pit mining can mean maximizing tonnage
production (productivity policy), minimizing equipment inactivity (truck waiting
time and shovel idle time), or meeting the Run of Mine (ROM) target (quality policy). In a mine,
the ROM is the quality level of the ore, which can be a combination (balanced mean) of
many mining fronts. Pinto (2007) developed a fuzzy algorithm to simultaneously find a
balanced result using both production and quality policies. Therefore, to obtain the best
results, the problem is divided into two upper stages (KRAUSE; MUSINGWINI, 2007): (1)
truck resource allocation or fleet size estimation, and (2) real-time truck dispatching.
Fleet size estimation, which is not the focus of this thesis, is a very important
issue to be tackled in the truck dispatching problem: over-trucked situations will increase
the length of queues at shovels, whereas under-trucking causes shovel underutilization
(ALARIE; GAMACHE, 2002). The costs in an over-trucked mine are increased because
higher truck utilization causes more maintenance stops and higher fuel consumption,
whereas the production objectives will not be attained in an under-trucked mine. Due
to its importance, this issue is tackled by many recent works in the mining literature.
Brahma (2007) used Queueing Theory (GROSS, 2008) and Petri Nets (MURATA, 2002)
to find the optimal number of trucks in the context of a shovel-dumper (haul truck)
combination system; Krause and Musingwini (2007) used a modified Machine Repair Model
for estimating the truck fleet size; Ta et al. (2005) used a chance-constrained stochastic
optimization approach for heterogeneous truck fleet resource allocation, accommodating
uncertain parameters such as truck load and cycle time; Huang et al. (2010) used a
Genetic Algorithm to optimize the number of trucks in an open-pit mine, minimizing the
cost of truck transportation and maintenance; and Souza et al. (2010) developed a hybrid
metaheuristic algorithm (Greedy Randomized Adaptive Search Procedure and General
Variable Neighborhood Search) to minimize the number of mining trucks used to meet
production goals and quality requirements.
The real-time truck dispatching stage can be modeled by three strategies (ALARIE;
GAMACHE, 2002): (1) 1-truck-for-n-shovels, (2) m-trucks-for-1-shovel, and (3) m-trucks-
for-n-shovels.
3.2.1 The 1-truck-for-n-shovels strategy
This is the most widely used strategy in the mining industry. Trucks are assigned one by one
to shovels (FIG. 3.1).
The fleet manager assigns the truck to the shovel that best suits the current
dispatching criterion, following a heuristic method (ALARIE; GAMACHE, 2002), or rule
(TA et al., 2005). Heuristics are procedures that are not mathematically proven but
are based upon practical or logical operating procedures (RUSSELL; NORVIG,
2009). The heuristic methods most used in truck dispatching are (KOLONJA;
FIGURE 3.1 – 1-truck-for-n-shovels strategy.
KALASKY; MUTMANSKY, 1993; CETIN, 2004):
• Minimizing Shovel Waiting Time (MSWT): an empty truck at the dispatching point
is assigned to the shovel with the longest idle time, or to the shovel that is expected
to be idle first. The objective of this criterion is to maximize the utilization of both
trucks and shovels.
• Minimizing Truck Cycle Time (MTCT): the goal of this strategy is to assign an
empty truck to the shovel that allows the shortest truck cycle time, maximizing
the total tonnage productivity. The objective of this criterion is to maximize the
number of truck cycles during the shift.
• Minimizing Truck Waiting Time (MTWT): in this criterion, an empty truck at
the dispatching point is assigned to the shovel at which its loading operation starts
first. The objective of this criterion is to maximize the utilization of a shovel by
minimizing its waiting time.
• Minimizing Shovel Saturation or Coverage (MSC): empty trucks are assigned to the
shovels at equal time intervals. The objective of this rule is to keep the shovels
operating without waiting for trucks.
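The rules above can be sketched as simple one-step selections over per-shovel estimates. The shovel names and the expected-time values below are illustrative placeholders, and each rule reduces to a min/max over a different attribute:

```python
# Per-shovel estimates at dispatch time (all values illustrative, in minutes).
shovels = {
    "S1": {"expected_idle": 4.0, "cycle_time": 18.0, "load_start": 2.0},
    "S2": {"expected_idle": 1.0, "cycle_time": 12.0, "load_start": 5.0},
}

def dispatch(rule):
    """Assign the requesting empty truck to a shovel under one heuristic rule."""
    if rule == "MSWT":   # shovel idle longest (or expected to be idle first)
        return max(shovels, key=lambda s: shovels[s]["expected_idle"])
    if rule == "MTCT":   # shovel giving the shortest truck cycle time
        return min(shovels, key=lambda s: shovels[s]["cycle_time"])
    if rule == "MTWT":   # shovel where the truck's loading starts first
        return min(shovels, key=lambda s: shovels[s]["load_start"])
    raise ValueError(rule)
```

With these toy numbers, MSWT and MTWT send the truck to S1 while MTCT sends it to S2, illustrating how the rules can disagree on the same mine state.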
This strategy is myopic (or greedy) because the system is not completely observed
when a truck is being dispatched. For example, in a two-shovel, two-truck mine,
the first truck positioned at the dispatching point is assigned to shovel number one,
because of its higher production, and the second one must then be assigned to
shovel number two (this example system does not allow queues in the mine). In this
situation, the total production, following the production policy, will not be the maximum
one. Thus, the global result (the sum of individual truck productions) is affected by
the greedy behavior of this strategy. Nevertheless, Lizotte and Bonates (1987) and Tu
and Hucka (1985) used this strategy in their works.
3.2.2 The m-trucks-for-1-shovel strategy
In this strategy (FIG. 3.2), the shovels are first sorted following a priority scheme
(e.g., by how much they are behind schedule on their production), and then each one
"selects", from a list of m trucks, the one that best serves it (e.g., the truck with the highest
load capacity and the nearest one). Alarie and Gamache (2002) report that there is only one
implemented system that uses this strategy, namely the DISPATCH™ commercial package
for truck dispatching, developed by Modular Mining Systems. As DISPATCH™ is
a commercial package, no substantial information about its algorithms and heuristic
methods is found in the scientific literature.
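The sort-then-select idea can be sketched as follows. The priority measure (tons behind schedule), the selection criterion (largest capacity, nearest truck as tie-break), and all numbers are illustrative assumptions, since the actual DISPATCH™ criteria are not published:

```python
# (name, tons behind schedule): higher shortfall means higher priority.
shovels = [("S1", 120.0), ("S2", 300.0)]
# truck -> (load capacity in tons, distance to the shovels in km, both illustrative)
trucks = {"T1": (90, 4.0), "T2": (150, 6.0), "T3": (150, 3.0)}

def assign():
    """Each shovel, in priority order, picks the best remaining truck:
    largest capacity, ties broken by shortest distance."""
    free = dict(trucks)
    result = {}
    for name, _ in sorted(shovels, key=lambda x: -x[1]):  # highest priority first
        best = max(free, key=lambda t: (free[t][0], -free[t][1]))
        result[name] = best
        del free[best]
    return result
```

Here S2 (300 t behind schedule) picks first and takes T3 (150 t capacity, nearest), leaving T2 for S1.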
[Figure: shovels ordered by priority, labeled "Higher Priority".]
FIGURE 3.2 – m-trucks-for-1-shovel strategy.
3.2.3 The m-trucks-for-n-shovels strategy
This strategy (FIG. 3.3) simultaneously considers the m available trucks for dispatching
and the n shovels present in the mine. This is a combinatorial problem that can be
modeled as an assignment problem or as a transportation problem.
Elbrond and Soumis (1987) solve truck dispatching as an assignment problem.
Here, the system considers, for the assignment optimization, the truck that asks for
dispatching and the next 10 to 15 trucks that will ask for dispatching in the near future (e.g.,
trucks on the paths, finishing dumping, or finishing material loading). Only the assignment of
the currently asking truck is answered; the other assignments are discarded. The system
repeats the same steps at the next dispatching solicitations. The solution considers only the
near-future dispatching trucks because of the combinatorial explosion of this problem,
which is NP-hard (PAPADIMITRIOU; STEIGLITZ, 1998). In fact, a solution considering
the whole shift would be extremely time consuming, and impracticable for a real-time
system.
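The combinatorial explosion is easy to see with a brute-force assignment over a toy cost matrix (the costs below are hypothetical waiting times, not data from the cited work); exhaustive search visits n! permutations, which is exactly why only a small lookahead window of trucks is tractable in real time:

```python
from itertools import permutations

# cost[i][j]: cost (e.g., expected waiting time) of sending truck i to shovel j.
cost = [
    [4.0, 2.0, 8.0],
    [3.0, 7.0, 5.0],
    [6.0, 1.0, 9.0],
]

def best_assignment(cost):
    """Exhaustive search over all truck-to-shovel permutations: O(n!)."""
    n = len(cost)
    best, best_c = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_c:
            best, best_c = perm, c
    return best, best_c
```

For this matrix, the optimum sends truck 0 to shovel 0, truck 1 to shovel 2, and truck 2 to shovel 1, at total cost 10. Polynomial-time methods (e.g., the Hungarian algorithm) solve the same problem without enumeration.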
[Figure: assignment considering the "Next k Dispatched Trucks" simultaneously.]
FIGURE 3.3 – m-trucks-for-n-shovels strategy.
The system proposed by Temeng, Otuonye and Frendewey (1997) is modeled and
solved as a transportation problem. In this problem, each supply center is associated with a
truck that will be dispatched in the near future, and each receiver center is a shovel present
in the mine. The receiver center demand is expressed as the number of trucks needed to
reach the production goals. The cost of sending a truck to a shovel is given by the truck
waiting time (truck queues at the shovels).
Another current trend in solving this kind of problem is the Evolutionary Algorithm
(EA), which uses mechanisms inspired by biological evolution: reproduction, mutation,
recombination, and selection. This is a near-optimal algorithm, that is, the globally
optimal solution is not guaranteed to be found and the algorithm often converges to
locally optimal solutions (EAs have specific search mechanisms to avoid premature
convergence to the first local optima). A near-optimal solution is generally found
much faster by EAs than by exact search methods (e.g., breadth-first search), and can
be considered acceptable given the convergence criteria of the algorithm. Some related
The simulations presented in this section concern the off-line and on-line phases
(executed before the shift) of the TiMDP model, which use all mine data presented until
then. The assignment sub-phase, which returns the final results of this method (that is,
CHAPTER 4. TRUCK DISPATCHING MODELING 83
the total tonnage production), is presented in the next chapter due to the necessity of
a simulation considering all trucks and executed over the whole shift. All simulations
presented have their results displayed in graphical form, in which data are presented as
expected tonnage production (tons) versus time (minutes). In order to understand the main
characteristics of the method, we present a diversity of simulations combining different
shovels, queue sizes, and phases (off-line and on-line).
The differences between the off-line (Normal TiMDP) and on-line (Dislocated TiMDP)
phases are shown in FIG. 4.7 for the dispatching decision of T1, with the queue size at Shovel 1
equal to zero². We show in FIG. 4.7b the detail for the time-window (blocking of the path
between C and S1), in which we can observe more carefully the differences between
tonnage productions at the same instant. As commented in the previous section, the value
indicated in the off-line phase represents the expected tonnage for the current size of the
queue, represented by states Q; however, this value must be referred to state C, which accounts
for the presented differences between phases. The time-window is represented by the first
and last discontinuities in the function, in the interval 295-355 minutes for the Normal
TiMDP. The difference with respect to the original time-window that represents the path blockage
between 300-360 minutes can be explained based on TiMDP theory, whose decisions depend
on subsequent action durations. Therefore, if T1 moves from S1' to Q1 at instant 355, the
truck driver will have the choice (considering that there is only one truck in the mine) to
take the shortest path on the return travel, because the size of the queue is equal to zero
and its loading takes 5 minutes. However, as the decisions are taken in state C, we must
consider the Dislocated TiMDP function to analyze the time-window behavior. Now, we
can observe a difference in the time mark of the first function discontinuity comparing
² We define the term Dislocated TiMDP based on the function dislocation that the on-line sub-phase causes on the original TiMDP (calculated in the off-line phase), defined here as Normal TiMDP.
Normal and Dislocated TiMDPs. This difference is explained by the dislocation of the
function caused by the expected time that the truck might take to travel from the crusher
to the shovel. The last discontinuity changes exactly to instant 360 minutes, which is the
unblocking instant of the shortest path between the crusher and Shovel 1. The other
discontinuities present in the Dislocated TiMDP can be explained based on the outcome
likelihood functions (FIG. 4.4) applied to EQS. 4.3 and 4.4, which are used to refer the
decision to state C.
We can observe another effect of the on-line sub-phase in FIG. 4.7c, which is the
dislocation in time of the last value of tonnage production that differs from zero. The
zero value in the function indicates that the truck should go to a parking lot, due to the
time length of the shift; that is, the crusher ends its work exactly at time 600 minutes, and if
a truck travels to the shovel it may (in expected values) find the crusher out of work.
Thus, in the simulations presented in the next chapter, the trucks are always sent to a
parking lot if the expected values of tonnage production for all shovels are equal to zero. In
this example, we can clearly observe the dislocation of the TiMDP function caused by the
on-line sub-phase.
The next figures presented in this section refer to the on-line sub-phase. FIG. 4.8
presents the differences between values of expected tonnage production considering all
shovels and queues with size zero. We can observe in FIGs. 4.8b and 4.8c the difference
that the values present along the shift. For example, at around time 293 minutes, the
policy (defined in the assignment sub-phase and found based on the highest tonnage
production value) changes from Shovel 1 to Shovel 3. This change in the policy can be explained
by the time-window. Shovel 1 again becomes the best dispatching decision at time
360 minutes. At around time 572 minutes we observe a change in the policies, which are
FIGURE 4.7 – Expected tonnage production at crusher C (Truck 1 - Shovel 1 - Queue 0).
dependent on the approaching end of the shift and on the timespans in the system, such
as t_shovel and t_load. These decisions, based on expected tonnage production, current
time, and queue size, are all executed in the assignment sub-phase.
The differences between the expected tonnage production for the same shovel and truck,
and different queue sizes, are presented in FIG. 4.9 for T2 and Shovel 3. We can observe
that the differences remain almost the same during most of the shift (FIG. 4.9b), except
at the end (FIG. 4.9c), where the differences are all highly dependent on the current
time. Clearly, the truck should go to the parking lot earlier if the size of the queue is
larger.
FIG. 4.10 compares results under a more realistic behavior of the mine environment, in
which the sizes of the queues differ from one another during the shift. We observe in the
zoomed figures (FIGs. 4.10b and 4.10c) that the policies change depending on the current
time; before time 300 the action move_shovel_1 is better than action move_shovel_2, but
it is the worst action during the period 300-333. We can note that move_shovel_3, even
leading to the longest queue, is a good action to be selected during most of the time. This
occurs because of the average loading rate of Shovel 3, which is 2.5 times that of Shovel 1
and 5 times that of Shovel 2. We must also note that the sizes of the queues change during
the whole shift, indeed modifying the policies; however, we show in this section comparisons
among fixed-size queues just for a better understanding of the TiMDP model.
Up to this point, we have shown results of TiMDP models considering standard time
representations (exact action durations), whereas time in a real-world problem tends
to be inexact. Let us then consider, as presented in Table 4.3, the action durations
represented by Gaussian distributions. In order to show the differences between the standard
FIGURE 4.8 – Expected tonnage production at crusher C (Truck 1 - Queue 0).
FIGURE 4.9 – Expected tonnage production at crusher C (Truck 2 - Shovel 3).
FIGURE 4.10 – Expected tonnage production at crusher C (Truck 2).
and Gaussian time representations, we present in FIG. 4.11 two graphs for the same
condition, considering the expected tonnage production at C for T3 and queue sizes
equal to zero at Shovels 1, 2, and 3. We can observe a smoother function for
the Gaussian representation (FIG. 4.11b) compared to the standard representation (FIG.
4.11a), which can be explained based on the convolution operations of the Q functions
with the discretized Gaussian representations used in the TiMDP solution. In
order to guarantee a good solution (convergence of the TiMDP solution to its near-optimal
values) and to limit the use of memory in the simulations, we used a discretization step of
0.2 minutes³.
The effects of the Gaussian representations can be better observed in FIG. 4.12.
Comparing FIGs. 4.12b2 and 4.12b1, we observe the increase of the expected tonnage
production introduced by the Gaussian representations. Policies can also change due to the
behavior of the Gaussian distribution; originally the selected action between times 353
and 354 was move_shovel_2 (FIG. 4.12c1), changed to action move_shovel_1 in
the Gaussian representation (FIG. 4.12c2). The smoothness from the Gaussian
representation can also be observed comparing FIGs. 4.12d1 and 4.12d2, and 4.12e1 and 4.12e2.
We note that all those modifications are based on all the combined Gaussian distributions
present in the model, as shown in Table 4.3, and it can be a difficult task to predict the
behavior of this type of representation due to the high number of combinations of values
that are executed in a TiMDP solution. Certainly, these modifications are more significant
in a system with a complete Gaussian representation of all involved times.
³Discretization steps smaller than 0.2 minutes caused memory overflow because of the use of a 32-bit operating system. Steps bigger than 1 minute returned results very different from those presented by the standard TiMDP. We regularly reduced the discretization step down to 0.2 minutes, observing the convergence tendency of the results while avoiding memory overflow.
[Figure 4.11: two panels of Expected Tonnage Production at Crusher C (Truck 3 - Queue 0), tonnage production (t) vs. time (min), with curves for Shovels 1, 2, and 3; panel a) standard representation, panel b) Gaussian representation.]
FIGURE 4.11 – Comparative of expected tonnage production at crusher C (Truck 3 - Queue 0) for standard and Gauss representations.
[Figure 4.12: panels comparing Expected Tonnage Production at Crusher C (Truck 1 - Queue 0) for the standard and Gaussian representations, including zoomed views over selected time ranges; tonnage production (t) vs. time (min) for Shovels 1, 2, and 3.]
FIGURE 4.17 – Truck dispatching GA reproduction result.
another reproduction phase, and start a new generation. Convergence was obtained in
around 50 generations and took less than one minute. The solution, which is the first
truck assignment, is the best individual after convergence.
For the next decision instants we consider that only one truck asks for dispatching
at a time. Now, the GA dispatching method will consider for shovel assignment the asking
truck and the next m trucks estimated to arrive at state C in the next tGA time period.
The shovel assignments for the trucks expected to arrive at state C during the
considered tGA will be placed in a so-called dispatching list. Another dispatching list will
only be generated when the first truck arrives at state C after the considered tGA. As this
method performs truck dispatching considering more than one truck, it is considered an
m-trucks-for-n-shovels strategy.
Now, the chromosome (represented as the first array of the chromosome previously
presented in FIG. 4.13) is defined considering observations of trucks being loaded at
shovels, unloaded at the crusher, waiting in the queues, and traveling through the
paths. Some genes may indicate zero, representing that the truck was not observed (its
arrival time at state C cannot be estimated) and will not be considered in the GA
algorithm for that dispatching decision.
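As an illustrative sketch (the truck IDs, arrival times, and initial assignments below are invented), building the single-array chromosome from such observations could look like:

```python
# Hypothetical observations: estimated arrival time at state C per truck,
# with 0 meaning the truck could not be observed (e.g., traveling a path).
estimated_arrivals = {1: 12.4, 2: 0, 3: 8.1, 4: 15.0, 5: 0}

# Only observed trucks enter the chromosome for this dispatching decision.
observed = sorted(t for t, ta in estimated_arrivals.items() if ta > 0)

# One gene per observed truck: an initial shovel assignment (1..n_shovels),
# here filled round-robin just to seed the GA population.
n_shovels = 3
chromosome = {truck: (i % n_shovels) + 1 for i, truck in enumerate(observed)}
```

Trucks 2 and 5 are simply dropped from the chromosome, mirroring the zero genes described above.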
In order to estimate the truck cycle, we insert an auxiliary array to the chromosome
[Figure 4.18: auxiliary array with one position per truck (T#1 ... T#15), each holding the estimated arrival time taT#i.]
FIGURE 4.18 – Auxiliary chromosome array.
(FIG. 4.18), which only indicates the estimated arrival time of the truck (ta) at state C,
and is not used in crossovers or mutations.
Therefore, the GA is executed considering the current dispatching truck and the
next trucks that are expected to ask for dispatch in the next tGA minutes. The estimated
arrival times of trucks at state C depend on observations of their current states. However,
we face some specific characteristics of the dispatching simulator that hinder the estimation
of arrival times, such as the impossibility of observing a truck while it travels through
the paths, or of observing the position of a specific truck in any queue⁴. These issues and the
stochastic behavior of the problem (uncertainty on selecting the traveling paths) add
some imprecision to the trucks' arrival times, which may lead to results that differ from
those predicted by the GA. In this case, when a truck arrives at state C before
tGA and is not on the dispatching list, a dispatch heuristic method (such as MTCT) must
be executed in order to perform the shovel assignment. Certainly, these situations will
degrade the quality of the general GA method. In order to minimize this problem, we
limit the maximum considered time for chromosome construction (tGA) based on current
truck observations and estimates of arrival times at state C. This time limitation is
given by ALG. 3, in which tGA is found based on estimated truck arrival times
at state C. In the algorithm, TS1, TS2, and TS3 are the sets of estimated arrival
times of observed trucks being loaded and at first and second position in queues on
⁴We added the possibility of observing the first and the second trucks in the queue in order to improve our results.
shovel 1, shovel 2, and shovel 3, respectively; tcurrent is the current shift time obtained
at the GA dispatching decision. In the t_max_S calculation, a constant is added, namely
the loading time of the smallest truck at each shovel. tGA is basically the smallest of the
maximum arrival times at the crusher of the considered trucks. Therefore, the trucks
composing the chromosome must have their estimated arrival times at state C between
tcurrent and tGA. This approximation added more GA dispatching executions; however,
the quality of the results was considerably improved due to the drastic reduction of
heuristic dispatches.
Algorithm 3: Calculation of maximum truck arrival time at state C
Input: CALC_TGA(TS1, TS2, TS3, tcurrent)
Output: tGA
t_max_S1 ← max(TS1) + 5
t_max_S2 ← max(TS2) + 10
t_max_S3 ← max(TS3) + 2
t_max_a ← min(t_max_S1, t_max_S2, t_max_S3)
t_max ← tcurrent
foreach t_s1 ∈ TS1 do
    if t_s1 ≤ t_max_a AND t_s1 ≥ t_max then
        t_max ← t_s1
    end
end
foreach t_s2 ∈ TS2 do
    if t_s2 ≤ t_max_a AND t_s2 ≥ t_max then
        t_max ← t_s2
    end
end
foreach t_s3 ∈ TS3 do
    if t_s3 ≤ t_max_a AND t_s3 ≥ t_max then
        t_max ← t_s3
    end
end
tGA ← t_max
return tGA
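A direct transcription of Algorithm 3 into Python, assuming non-empty observation sets; the constants 5, 10, and 2 minutes are the smallest-truck loading times at shovels 1-3 from the example mine, and the sample arrival times in the usage line are invented:

```python
def calc_tga(ts1, ts2, ts3, t_current):
    """Maximum truck arrival time at state C (Algorithm 3).

    ts1..ts3: estimated arrival times of trucks observed at shovels 1-3
    (assumed non-empty); the added constants are each shovel's loading
    time for the smallest truck type."""
    t_max_s1 = max(ts1) + 5
    t_max_s2 = max(ts2) + 10
    t_max_s3 = max(ts3) + 2
    t_max_a = min(t_max_s1, t_max_s2, t_max_s3)
    t_max = t_current
    # Keep the largest estimated arrival that does not exceed t_max_a.
    for t in list(ts1) + list(ts2) + list(ts3):
        if t <= t_max_a and t >= t_max:
            t_max = t
    return t_max

t_ga = calc_tga([30, 42], [35], [28, 33], t_current=20)
```

With these sample observations, t_max_a = min(47, 45, 35) = 35, so the arrival at time 42 is discarded and tGA becomes 35.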
The GA is always started when the first truck arrives at state C after the previously
calculated tGA, following the previously shown steps (first dispatching decision) with small
modifications because of differences in the current chromosome construction. As the
chromosome is formed by only one array indicating the truck positions, the mutation
phase, which was executed on the decision queue position, is now executed on the truck
positions, following the previously defined procedures. The fitness function follows the
previous one, which is used for minimizing the total cycle time; however, it now considers
the estimated truck arrival times (ta) to find the cycle time for each truck represented
in the chromosome. As the number of shovels attended (indicated by the chromosome
construction) depends on the observed trucks, the size of the population used in the
GA will depend on the chromosome configuration. We have adjusted the population size
in order to converge to good results in a short time (less than one minute), following
real-time dispatching best practices.
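A minimal sketch of this cycle-time fitness, under assumed (hypothetical) travel and loading times and ignoring queue interaction between the chromosome's own trucks:

```python
# Hypothetical per-shovel times (minutes) for the fitness sketch.
TRAVEL = {1: 10.0, 2: 12.0, 3: 6.0}   # crusher -> shovel -> crusher
LOAD   = {1: 5.0,  2: 10.0, 3: 2.0}   # loading time at the shovel

def fitness(assignment, arrivals, t_current):
    """Total estimated cycle time for all trucks in the chromosome;
    the GA minimizes this value. Queue interaction between the
    chromosome's trucks is ignored in this simplified sketch."""
    total = 0.0
    for truck, shovel in assignment.items():
        # Time until the truck actually reaches state C and is dispatched.
        wait_to_dispatch = arrivals[truck] - t_current
        total += wait_to_dispatch + TRAVEL[shovel] + LOAD[shovel]
    return total

score = fitness({1: 3, 4: 1}, {1: 12.0, 4: 15.0}, t_current=10.0)
```

The estimated arrival times (ta) from the auxiliary array enter only through the wait term, which is what distinguishes this fitness from the first-decision version.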
4.2.5 G-TiMDP
The introduced TiMDP model for truck dispatching seems to be a good representation
for this problem because of its specific characteristics, such as: stochastic behavior (real-
world problems are often uncertain); sequential decision making (the accumulated
reward, or value function, considers the expected results of all sequential actions during
the whole shift, hence the reward can be given for just one action – in our problem,
the action unload_truck); and time-dependent decisions (time-windows and variations
on outcomes over time can easily be considered). However, due to the single-dependent-agent
approximation, the model follows the 1-truck-for-n-shovels strategy, that is, the dispatch-
ing decisions are egotist, leading to suboptimal results. In order to improve the results, we
introduce the Genetic TiMDP (G-TiMDP), a hybrid algorithm that combines
the sequential decision making in uncertain environments of the TiMDP with the combinatorial
characteristic of the GA, leading to a new m-trucks-for-n-shovels method.
The G-TiMDP differs from the GA model (presented in the last subsection) only in
the selection phase, in which the fitness function is evaluated based on maximization of the
Expected Tonnage Production given by the proposed TiMDP model. Following
the TiMDP phases, in this hybrid dispatching method the off-line and on-line phases
remain calculated as before, providing their results to a new assignment phase, which is
now performed by the GA dispatching method. As the TiMDP results in the cited phases
are found before the mine shift, and the shovel assignments result from the GA method
(as in the previous one), the dispatching time of G-TiMDP remains the same as that of the
pure GA method, that is, less than one minute.
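This design choice — keeping the GA machinery and swapping only the ranking criterion — can be sketched as follows; the chromosome tuples and fitness values are hypothetical stand-ins:

```python
def select(population, fitness, maximize):
    """Rank-based selection sketch: the GA and G-TiMDP methods differ
    only in the criterion they rank by (cycle time minimized vs.
    TiMDP expected tonnage maximized)."""
    return sorted(population, key=fitness, reverse=maximize)[: len(population) // 2]

# Hypothetical stand-ins for the two fitness functions, keyed by chromosome.
cycle_time  = {(1, 1): 40.0, (1, 2): 55.0, (2, 3): 30.0, (3, 1): 45.0}
exp_tonnage = {(1, 1): 900.0, (1, 2): 700.0, (2, 3): 1100.0, (3, 1): 1200.0}

pop = list(cycle_time)
best_ga = select(pop, lambda c: cycle_time[c], maximize=False)   # GA
best_gt = select(pop, lambda c: exp_tonnage[c], maximize=True)   # G-TiMDP
```

With these stand-in values the two criteria keep different individuals, illustrating how the TiMDP-based fitness can steer the same GA toward different assignments.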
5 Simulations and Analysis
We defined in the last chapter some truck dispatching methods that are applied to
our example mine: (1) Greedy heuristic, (2) MTCT heuristic, (3) TiMDP model, (4) GA
model, and (5) G-TiMDP model. In order to test the performance of these methods, we
developed a simulation framework based on example mine data, such as shovel character-
istics and positions, truck characteristics, present uncertainties, shift length, and queue
size limitations. The dispatching methods were evaluated by Monte Carlo simulation, and
their results were compared using Student's t-test, providing enough data for further quality
analysis.
5.1 Simulation Framework
The proposed dispatching methods were developed and simulated using the software
SimEvents™ (a Matlab™ package). All simulations follow the characteristics of the pro-
posed mine environment example, being executed over a 10-hour shift. The objective of
the simulations is to compute the total tonnage production at the end of the shift, consid-
ering a fleet composed of 15 heterogeneous trucks, as already proposed. A general mine
simulation environment is presented in FIG. 5.1. The dispatching methods use the same
simulation framework, except for specific functionalities shown in the next subsections.
CHAPTER 5. SIMULATIONS AND ANALYSIS 103
Referring to FIG. 5.1, the trucks are treated as entities, which are generated in the
Truck Generator block with their specific characteristics (based on truck type), which
define the traveling times along the paths and the quantity of transported material.
After that, at time zero (started by the Start Timer block), the trucks are positioned
following a priority scheme (faster trucks are positioned first; the GA and G-TiMDP methods
follow the priority based on the first dispatching decision defined in Sections 4.2.4 and
4.2.5) in the Priority Queue block. The TiMDP block is specific to the TiMDP and G-
TiMDP methods and is responsible for getting the results of the real-time phase from
the workspace at the current time t. Some entity attributes that indicate important
information for dispatching decisions, such as the size of the queues at the shovels, are set
in the Set Attribute block with current data from the environment. The dispatching
decision of all methods, considering their specific characteristics, is taken in the Shovel
Decision Function block. After the shovel assignment, the truck travels to the Shovel
and then (after material loading) goes to the Crusher. The time period during which the
truck stays unloading at the crusher is calculated in the Crusher Time Function block.
The quantity of material transported in a cycle is added to the total tonnage production
FIGURE 5.6 – Quantity of trucks in shovels for the Greedy Heuristic simulation.
shovel is not considered here (we consider that the paths to the shovels, despite being the
same in the real problem, are different and are treated independently, with proper out-
comes). The time-window (the µ1 blockage between times 300 and 360 minutes) is clearly
shown in FIG. 5.7a, in which no truck is allowed to travel through path 1. The likelihood
function is also represented in the graphics by the use of the outcomes µ1, µ2, and µ3 in
decreasing order of frequency.
In the MTCT heuristic, the trucks are dispatched according to the minimum truck
cycle, which is directly related to the shovels' loading rates and consequently to the size of
their queues. This behavior makes the mean sizes of the queues differ from each other,
which can be explained by the shovels' loading rates. Indeed, faster shovels can serve
more trucks in the same timespan, that is, more trucks can be sent to those shovels,
consequently leading to a larger queue. FIG. 5.8 shows this behavior, in which the largest
mean queue size is for Shovel 3 (FIG. 5.8c), followed by Shovel 1 (FIG. 5.8a), and
then Shovel 2 (FIG. 5.8b). This dispatching heuristic knows about the path blockage
(time-window), but does not know about the likelihood function; it considers that the
trucks always travel through the shortest available path, that is, outcome µ1 during the
whole shift, except during the time-window, in which the outcome is µ2. The total tonnage
production for this simulation was 88 900 tons.
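A hedged sketch of an MTCT-style decision rule (the travel and loading times below are invented; the thesis's own implementation runs inside the SimEvents framework):

```python
def mtct_dispatch(queue_sizes, t_queue, travel, load, ret):
    """Assign the asking truck to the shovel with the minimum estimated
    cycle time: travel + expected queue wait + loading + return.

    queue_sizes: current queue length per shovel
    t_queue:     per-truck waiting-time factor per shovel (minutes)
    travel/load/ret: hypothetical per-shovel times (minutes)."""
    def cycle(s):
        return travel[s] + queue_sizes[s] * t_queue[s] + load[s] + ret[s]
    return min(travel, key=cycle)

# Example: Shovel 3 loads fastest, but a long queue flips the decision.
choice = mtct_dispatch(
    queue_sizes={1: 1, 2: 0, 3: 8},
    t_queue={1: 6.2, 2: 13.0, 3: 2.5},
    travel={1: 10.0, 2: 12.0, 3: 6.0},
    load={1: 5.0, 2: 10.0, 3: 2.0},
    ret={1: 10.0, 2: 12.0, 3: 6.0},
)
```

The t_queue values reuse the 6.2, 13, and 2.5 minutes adopted later in Section 5.3; everything else is a placeholder. With the long queue at Shovel 3, the rule falls back to Shovel 1, illustrating how queue growth self-limits dispatching to the fastest shovel.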
In the MTCT heuristic simulation, trucks must go to the parking lot (FIG. 5.9) when
it is not possible to complete a cycle before the end of the shift. In fact, due to uncertainties
present in the problem, such as time in the queues and path outcomes, some trucks are
dispatched and do not return to the decision point (Crusher) until the end of the shift.
This problem could be bypassed by adding a constant to the calculated
cycles, but this could worsen the results.
[Figure 5.7: three panels (a-c) of Quantity of Trucks vs. Time (min) on the paths to Shovel 1 for outcomes µ1, µ2, and µ3 under the Greedy heuristic.]
FIGURE 5.7 – Quantity of trucks in paths going to Shovel 1 for the Greedy Heuristic simulation.
[Figure 5.8: three panels (a-c) of Quantity of Trucks vs. Time (min) at Shovels 1, 2, and 3 under the MTCT heuristic.]
FIGURE 5.8 – Quantity of trucks in shovels for the MTCT Heuristic simulation.
[Figure 5.9: truck number (1-15) vs. time (min, 575-600) on the parking lot under the MTCT heuristic.]
FIGURE 5.9 – Trucks on parking lot for the MTCT heuristic.
The quantity of trucks in shovels for the TiMDP model is shown in FIG. 5.10. It is
hard to notice a behavior different from the last dispatching method. However, due to
the likelihood knowledge and the consideration of sequential decisions, the tonnage production
is slightly better: 90 200 tons.
The parking lot occupation for the TiMDP model is shown in FIG. 5.11. Due to
uncertainties in the problem (e.g., time in the queues and outcomes of the move_shovel
action), not all trucks reach the parking lot by the end of the shift.
The GA model for truck dispatching is based on the MTCT heuristic, and likewise it
does not assume knowledge of the likelihood function for path selection. Moreover,
its assumptions regarding the time-window are the same. However, dispatching is made
considering the sequence of trucks going to the Crusher in the next time period tGA, which leads
to better results. The quantity of trucks in shovels for this model is shown in FIG. 5.12.
An interesting result is that the mean quantity of trucks at Shovel 1 (FIG. 5.12a) does
not decrease during the time-window. Independently of this behavior, the results of this
[Figure 5.10: three panels (a-c) of Quantity of Trucks vs. Time (min) at Shovels 1, 2, and 3 under the TiMDP model.]
FIGURE 5.10 – Quantity of trucks in shovels for TiMDP model simulation.
[Figure 5.11: truck number vs. time (min, 582-600) on the parking lot under the TiMDP model.]
FIGURE 5.11 – Trucks on parking lot for TiMDP model.
method are better than those for MTCT; its total tonnage production is 90 300 tons¹.
The parking lot occupation for the GA model is shown in FIG. 5.13.
The quantity of trucks in the shovels for the G-TiMDP model is presented in FIG. 5.14.
Again, it is hard to identify substantial changes when comparing to the other presented
methods. An interesting aspect is that Shovel 2 is used more often during the shift, with
a peak of 4 trucks on it. The results of this m-trucks-for-n-shovels method were the best
among all we tested: a production of 90 600 tons.
The parking lot occupation for the G-TiMDP model is shown in FIG. 5.15.
¹We note that this result is even better than the one presented for the TiMDP model; however, it is a specific result based on the considered path outcomes. Based only on this result, we cannot claim that this method is superior to TiMDP. A complete and statistically sound comparison is presented in the next section.
[Figure 5.12: three panels (a-c) of Quantity of Trucks vs. Time (min) at Shovels 1, 2, and 3 under the GA model.]
FIGURE 5.12 – Quantity of trucks in shovels for the GA model simulation.
[Figure 5.13: truck number (1-15) vs. time (min, 570-600) on the parking lot under the GA model.]
FIGURE 5.13 – Trucks on parking lot for the GA model.
5.3 Comparative Results and Analysis
Given the simulation framework and due to the stochastic behavior of the system
(path outcomes), we compare the presented truck dispatching methods using Monte Carlo
simulation (MOONEY, 1997).
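As a hedged illustration of the comparison methodology (not the thesis's actual analysis code, which runs in Matlab), Welch's t statistic over two small sets of Monte Carlo production samples can be computed as follows; all sample values below are invented, and a full analysis would also use the degrees of freedom to obtain a confidence level:

```python
import math

def welch_t(sample_a, sample_b):
    """Welch's t statistic for comparing two methods' mean tonnage
    production over independent Monte Carlo runs (unequal variances)."""
    na, nb = len(sample_a), len(sample_b)
    ma = sum(sample_a) / na
    mb = sum(sample_b) / nb
    va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

# Hypothetical production samples (tons) from two dispatching methods.
t_stat = welch_t([90100, 90300, 90250, 90500], [89200, 89350, 89300, 89100])
```

A large positive t statistic, as here, indicates the first method's mean production is significantly higher than the second's at the chosen confidence level.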
In the simulations, we used two different values of t queue (the multiplying factor due to
the queue size) for the TiMDP and G-TiMDP models. In the first simulation, we used as
t queue the time necessary to load (t load) the truck type with mean capacity (truck T2).
However, even though this time value is a good initial approximation, t queue will certainly
not be the mean value of t load (because of the unbalanced number of truck types and the
parallel queues with different servers, or shovels). As a better approach, we ran some
preliminary simulations and computed t queue as Average queue length / Average wait,
whose values are given by the statistical information from the Queue block. An example
is found in FIG. 5.16, in which only the mean times in the queues of Shovel 1 and Shovel 3
are given, by FIGs. 5.16a and 5.16b, respectively. The mean time of the Shovel 2 queue is not repre-
[Figure 5.14: three panels (a-c) of Quantity of Trucks vs. Time (min) at Shovels 1, 2, and 3 under the G-TiMDP model.]
FIGURE 5.14 – Quantity of trucks in shovels for the G-TiMDP simulation.
[Figure 5.15: truck number (1-15) vs. time (min, 570-600) on the parking lot under the G-TiMDP model.]
FIGURE 5.15 – Trucks on parking lot for the G-TiMDP model.
sented because that queue is not constantly used, making the simulator incapable of
indicating the variables Average queue length and Average wait (these values are always
shown as zero by the simulator). The considered t queue must be taken after its convergence.
Based on many observations, we adopted the values 6.2, 13 (estimated), and 2.5 minutes
for the t queue of shovels 1, 2, and 3, respectively.
The simulation results (Table 5.1) are presented for all methods, considering around
4500 simulations and a standard representation of the involved times, that is, the times are
always exact. The TiMDP (1) and G-TiMDP (1) methods used the original t queue,
whereas the TiMDP (2) and G-TiMDP (2) methods used the estimated t queue. Considering
only the averages, we can observe that the best methods are, in descending order:
G-TiMDP, TiMDP, GA, MTCT heuristic, and Greedy heuristic, as we had anticipated
in the previous sections. The TiMDP (2) results contradicted our predictions by being
worse than the TiMDP (1) results, which can be explained by the single-dependent-agent
approximation. However, the G-TiMDP (2) results, which are for an m-trucks-for-n-
[Figure 5.16: two panels of Mean Time in Queue (min) vs. Time (min) for the TiMDP model, a) Shovel 1 and b) Shovel 3.]
FIGURE 5.16 – Mean time in the queues for TiMDP model.
TABLE 5.1 – Monte Carlo simulations of truck dispatching methods using standard representation (standard deviation equals zero for all considered times).

Method   Sims   Min (tons)   Max (tons)   Mean (tons)   Std Dev (tons)
Greedy   4722   74 300       80 600       77 427        870.3
MTCT     4570   86 900       91 400       89 292        629.2
The comparison of the proposed truck dispatching methods is shown in Table
5.4 for the system with Gaussian representations of the involved times. The proposed
G-TiMDP Gauss method is superior to all other methods, although it is worse than G-
TiMDP (with no consideration of Gaussian distributions). However, we note that the
confidence level is smaller than 95%, so we cannot categorically affirm that G-
TiMDP is better than G-TiMDP with the Gauss model. In fact, in our view G-TiMDP
Gauss should be selected for use in truck dispatching environments because of its
more precise representation of uncertain times. Certainly, better results can be attained in
environments with a complete Gauss representation of the involved times, as commonly found
in real-world applications.
6 Final Remarks
We present in this chapter the final conclusions of this thesis, based on all contributions
made and results achieved along the work. We also suggest future work that can be
useful to improve the representation of the real-world truck dispatching problem and the
proposed dispatching methods, leading to the consideration of contingencies by the model
and probably to higher tonnage production.
6.1 Conclusions
We presented the development of diverse truck dispatching methods to optimize the
tonnage production in an example stochastic time-window mine. The developed methods
were: (1) Greedy heuristic, (2) MTCT heuristic, (3) TiMDP model, (4) GA model, and
(5) G-TiMDP model. Methods (1) and (2) are classical in the open-pit mining industry
and are classified as 1-truck-to-n-shovels strategies. They suffer from many problems, such
as egotist behavior and determinism, and their results were used as baselines for the other
developed methods. Method (3) is also classified as a 1-truck-to-n-shovels strategy,
whereas methods (4) and (5) are classified as m-trucks-to-n-shovels strategies, whose
combinatorial behavior may lead to better results. Our contributions are methods
(3), (4), and (5), whereas methods (1) and (2) are classical ones used in truck dispatching
CHAPTER 6. FINAL REMARKS 125
for open-pit mining, which were used in the thesis just as a basis for the problem and for
result comparisons over a simulated example mine environment.
The example mine environment was composed of a time-window and uncertain vari-
ables, such as the path choices by the truck drivers and the involved times modeled as
Gaussian distributions. The time-window was used to indicate a path blockage during a
period of the shift, and was assumed to be available information for all methods except (1).
We developed a novel application of TiMDP models to the real-time truck dispatch-
ing problem, a real-world problem with inherent uncertainties. The TiMDP
model was solved by introducing backwards convolution, which is a solution method for
discretized states. In order to minimize the curse of dimensionality (a result of agent com-
bination and state discretization), we modeled the problem using the introduced single-
dependent-agent representation, in which the agents are modeled in a concurrent single-agent
environment, with their action choices dependent on the current general state of the en-
vironment (which is changed by all agents' actions). In our development, the dependence
was modeled based on the size of the queues at the shovels. Hence, the dispatching decisions
were dependent on the characteristics of the truck itself and on the current state of the
mine environment.
Since all previously developed methods belonged to the 1-truck-to-n-shovels strategy
class (egotist behavior), we introduced GA truck dispatching, which used the MTCT
heuristic as fitness function. This m-trucks-to-n-shovels strategy considers the following
trucks in the dispatching decision; however, the inherent uncertainties of the environment are
not considered, leading to worse results than the TiMDP model. Finally, we developed
our main contribution, a novel hybrid method called G-TiMDP, which is basically the GA
model using the results of the TiMDP model as fitness function. In essence, this approach
adapted the TiMDP model to an m-trucks-to-n-shovels strategy.
All presented methods proved to be good choices for the considered real-time
problem, taking into account that dispatching decisions must be made quickly and that the
methods returned their decisions in less than one minute.
Monte Carlo simulations of the example mine were performed for all methods using
the SimEvents™ environment. The results were compared using Student's t-test, in
which the G-TiMDP model was ranked as the best one.
The presented methods can also be used for other mine configurations, simply by
adjusting the models to the new conditions. Considering a tonnage production goal, we
expect G-TiMDP to remain, for other mine configurations, the best method among the
presented ones.
6.2 Future work
We suggest some future work to improve the methods and to deal with
common contingencies present in mine environments.
Factored TiMDP representation
MDPs suffer from the curse of dimensionality, in which state space explosion can
lead to extremely time-consuming solutions. TiMDPs are more affected because of their
discretized time representation, which is indeed a segmentation of time into states (the
number of states grows as the discretization resolution increases). Factored
MDPs (BOUTILIER; DEARDEN; GOLDSZMIDT, 2000) deal very well with large state
spaces, by considering states represented in a Dynamic Bayesian Network. We propose
a factored TiMDP, which can be a good approach for time-dependent stochastic decision
problems with large state spaces. Thus, our presented TiMDP model for truck dispatching
could be represented as a multi-agent problem (as an m-trucks-to-n-shovels strategy), with
a likely improvement of results.
Time-dependent Reinforcement Learning
Reinforcement Learning (RL) (SUTTON; BARTO, 1998) is a method for learning
in uncertain environments that can be represented according to the MDP formalism. In
RL, the agent learns characteristics of the environment based on its actions, which can
return positive or negative reinforcements (rewards or punishments in the MDP jargon).
We propose the study of Time-Dependent Reinforcement Learning (TiRL), in which the
reinforcements would also be related to the current time and the action durations. The associated
theory could be applied to all time-dependent problems that can already be represented
by TiMDPs. Therefore, by following an RL representation, TiRL can be based on TiMDP
theory. In our truck dispatching problem, TiRL could be successfully applied to
the adjustment of all involved times (such as t queue), leading to better results along the shifts.
TiMDP sensitivity analysis
Real-world problems are subject to unforeseen alterations along the decision pe-
riod. In our problem, paths can be blocked and shovels may break down or become
unavailable during the shift. To consider such issues, the TiMDP can be remodeled and its
off-line and on-line phases executed again. However, all this rework might require a long
time, which is unacceptable for real-time dispatching problems. Another solution could
be a policy selection that considers some states as unavailable; e.g., in a state, an agent
can select among three different actions and the policy indicates the action that leads to
an unavailable state; in this case, the agent must select the second action in the policy
list. In some cases, depending on the weight of the unavailable state in the model, the
action selected by the agent may be the best one; however, the best action could also be the
third one, changing the quality of the final result. Therefore, we propose an analysis of
TiMDP sensitivity, through which we could know in advance the maximum error in the quality
of the solution introduced by a modification to the original TiMDP model.
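The fallback mechanism described above can be sketched as follows; the action names mirror the thesis's move_shovel actions, but the state labels and ranking are hypothetical (the sensitivity analysis itself remains the proposed future work):

```python
def select_action(ranked_actions, leads_to, unavailable):
    """Walk the policy's ranked action list and skip any action whose
    successor state is currently unavailable (e.g., a blocked path or
    a broken shovel). Returns None if no feasible action remains."""
    for action in ranked_actions:
        if leads_to[action] not in unavailable:
            return action
    return None

action = select_action(
    ranked_actions=["move_shovel_2", "move_shovel_1", "move_shovel_3"],
    leads_to={"move_shovel_1": "S1", "move_shovel_2": "S2", "move_shovel_3": "S3"},
    unavailable={"S2"},  # e.g., Shovel 2 broke down during the shift
)
```

Here the second-ranked action is taken; the open question raised above is how far such a substitution can degrade the value of the original policy.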
GA method improvement
The introduced GA method used to solve the truck dispatching problem can be im-
proved in order to provide better solutions in a shorter time. Revisions of the problem
representation and of the reproduction phase will certainly improve the final results. Based
on this improvement and on the previously cited future work, a revised version of G-TiMDP
will probably provide better results than those presented in this thesis.
Consideration of production and blending goals
The application of a TiMDP model to a real-time dispatching problem allowed us to solve
the truck dispatching problem considering the simplified tonnage production goal. Gen-
erally, in real-world mines, the goals are based on plans, which consider daily production
and blending necessities. Hence, we propose the introduction of these goals into future
models using all developed stochastic methods for truck dispatching, in order to have a
better representation of the problem, which might improve the quality of the results.
Truck dispatching based on time-dependent utilities
The truck dispatching problem involves many other parameters that were not considered in our models and that can also be treated as time-dependent. Parameters such as fuel consumption and tire wear depend on truck displacement, but they can be reasonably approximated as time-dependent. In this way, the time-dependent utilities introduced as rewards in the TiMDP models can be used to capture other important parameters of the truck dispatching problem. For example, when a truck asks for a dispatch, the system must decide whether it is better to send it to a shovel or to the fuel station. When these decisions are taken incorrectly, trucks may be sent to premature refueling, reducing the total tonnage production, or, in the worst case, a truck may halt in the mine environment because it runs out of fuel.
Bibliography
ALARIE, S.; GAMACHE, M. Overview of solution strategies used in truck dispatching systems for open pit mines. International Journal of Surface Mining, Reclamation and Environment, Taylor and Francis Ltd, v. 16, n. 1, p. 59–76, 2002.
BASTOS, G. S.; RIBEIRO, C. H. C.; SOUZA, L. E. de. Variable utility in multi-robot task allocation systems. Robotic Symposium, IEEE Latin American, IEEE Computer Society, p. 179–183, 2008.
BELLMAN, R. Dynamic programming. Science, American Association for the Advancement of Science, v. 153, n. 3731, p. 34–37, 1966.
BERTSEKAS, D. Dynamic programming: deterministic and stochastic models. [S.l.]: Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1987. ISBN 0132215810.
BOUTILIER, C.; DEAN, T.; HANKS, S. Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, v. 11, p. 1–94, 1999.
BOUTILIER, C.; DEARDEN, R.; GOLDSZMIDT, M. Stochastic dynamic programming with factored representations. Artificial Intelligence, Elsevier, v. 121, n. 1-2, p. 49–107, 2000.
BOYAN, J.; LITTMAN, M. Exact solutions to time-dependent MDPs. Advances in Neural Information Processing Systems, v. 13, p. 1–7, 2000.
BRAHMA, K. C. A Study on Application of Strategic Planning and Operations Research Techniques in Open Cast Mining. 2007. Thesis (Doctorate) — Department of Mining Engineering, National Institute of Technology, 2007.
BRESINA, J.; DEARDEN, R.; MEULEAU, N.; RAMAKRISHNAN, S.; SMITH, D.; WASHINGTON, R. Planning under continuous time and resource uncertainty: A challenge for AI. In: AIPS Workshop on Planning for Temporal Domains. [S.l.], 2002. p. 91–97.
CETIN, N. Open-pit truck/shovel haulage system simulation. Thesis (Doctorate) — The Graduate School of Natural and Applied Sciences, Middle East Technical University, 2004.
CO, C.; TANCHOCO, J. A Review of Research and AGVS Vehicle Management. [S.l.]: School of Industrial Engineering, Purdue University, 1990.
CRAMER, H. Mathematical methods of statistics. [S.l.]: Princeton University Press, 1999.
ELBROND, J.; SOUMIS, F. Towards integrated production planning and truck dispatching in open pit mines. International Journal of Mining, Reclamation and Environment, Taylor & Francis, v. 1, n. 1, p. 1–6, 1987.
GENDREAU, M.; POTVIN, J. Dynamic vehicle routing and dispatching. Fleet Management and Logistics, Kluwer Academic Publishers, p. 115–126, 1998.
GIBSON, M.; BRUCK, J. Efficient exact stochastic simulation of chemical systems with many species and many channels. J. Phys. Chem. A, ACS Publications, v. 104, n. 9, p. 1876–1889, 2000.
GROSS, D. Fundamentals of queueing theory. [S.l.]: Wiley-India, 2008. ISBN 8126517778.
HOEY, J.; ST-AUBIN, R.; HU, A.; BOUTILIER, C. SPUDD: Stochastic planning using decision diagrams. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. [S.l.], 1999. p. 279–288.
HORVITZ, E.; RUTLEDGE, G. Time-dependent utility and action under uncertainty. In: Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, Los Angeles, CA. [S.l.], 1991. p. 151–158.
HOWARD, R. Dynamic programming and Markov processes. [S.l.]: MIT Press, 1960.
HUANG, B.; WEI, J.; HE, M.; LU, X. The genetic algorithm for truck dispatching problems in surface mine. Information Technology Journal, v. 9, n. 4, p. 710–714, 2010.
ICHOUA, S.; GENDREAU, M.; POTVIN, J. Vehicle dispatching with time-dependent travel times. European Journal of Operational Research, Elsevier, v. 144, n. 2, p. 379–396, 2003.
JAOUA, A.; GAMACHE, M.; RIOPEL, D. Specification of an intelligent simulation-based real time control architecture: Application to truck control system. Simulation a Evenements Discrets pour la Commande Temps Reel de Systemes Dynamiques Complexes, p. 24, 2009.
JI, X. Models and algorithm for stochastic shortest path problem. Applied Mathematics and Computation, Elsevier, v. 170, n. 1, p. 503–514, 2005.
KIRKPATRICK, S. Optimization by simulated annealing: Quantitative studies. Journal of Statistical Physics, Springer, v. 34, n. 5, p. 975–986, 1984. ISSN 0022-4715.
KOLONJA, B.; KALASKY, D.; MUTMANSKY, J. Optimization of dispatching criteria for open-pit truck haulage system design using multiple comparisons with the best and common random numbers. In: Proceedings of the 25th Conference on Winter Simulation. ACM, New York, NY, USA, 1993. p. 393–401.
KRAUSE, A.; MUSINGWINI, C. Modelling open pit shovel-truck systems using the Machine Repair Model. Journal of the South African Institute of Mining and Metallurgy, Marshalltown, South Africa, v. 107, n. 8, p. 469–476, 2007.
LENGYEL, M.; DAYAN, P. Hippocampal contributions to control: The third way. Advances in Neural Information Processing Systems, v. 20, p. 889–896, 2007.
LI, L.; LITTMAN, M. Lazy approximation for solving continuous finite-horizon MDPs. In: Proceedings of the National Conference on Artificial Intelligence. AAAI Press / MIT Press, 2005. v. 20, n. 3, p. 1175.
LI, X.; SOH, L. Applications of decision and utility theory in multi-agent systems. CSE Technical Reports, p. 56, 2004.
LITTMAN, M.; DEAN, T.; KAELBLING, L. On the complexity of solving Markov decision problems. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. [S.l.], 1995. p. 394–402.
LIZOTTE, Y.; BONATES, E. Truck and shovel dispatching rules assessment using simulation. Mining Science and Technology, v. 5, p. 45–58, 1987.
LUDWIG, D. The distribution of population survival times. American Naturalist, JSTOR, v. 147, n. 4, p. 506–526, 1996.
MARECKI, J.; TOPOL, Z.; TAMBE, M. A fast analytical algorithm for MDPs with continuous state spaces. In: AAMAS-06 Proceedings of the 8th Workshop on Game Theoretic and Decision Theoretic Agents. [S.l.: s.n.], 2006.
MAUSAM, M.; WELD, D. Solving concurrent Markov decision processes. In: Proceedings of the 19th National Conference on Artificial Intelligence. AAAI Press, 2004. p. 716–722.
MITCHELL, M. An introduction to genetic algorithms. [S.l.]: The MIT Press, 1998.
MOONEY, C. Monte Carlo Simulation. [S.l.]: Sage Publications, Inc., 1997.
MURATA, T. Petri nets: Properties, analysis and applications. Proceedings of the IEEE, IEEE, v. 77, n. 4, p. 541–580, 1989. ISSN 0018-9219.
PARSONS, S.; WOOLDRIDGE, M. An introduction to game theory and decision theory. Game Theory and Decision Theory in Agent-Based Systems, Kluwer Academic Publishers, p. 1–28, 2002.
PELLEGRINI, J.; WAINER, J. Processos de Decisao de Markov: um tutorial. Revista de Informatica Teorica e Aplicada, v. 14, n. 2, p. 133–179, 2008.
PINTO, E. B. Despacho de caminhoes em mineracao usando logica nebulosa, visando ao atendimento simultaneo de politicas excludentes. 2007. 120 p. Dissertation (Masters in Production Engineering) — Engineering School, Federal University of Minas Gerais, 2007.
POWELL, W. A comparative review of alternative algorithms for the dynamic vehicle allocation problem. Vehicle Routing: Methods and Studies, p. 249–291, 1988.
PUTERMAN, M. Markov decision processes: discrete stochastic dynamic programming. [S.l.]: John Wiley & Sons, Inc., New York, NY, USA, 1994.
RACHELSON, E.; FABIANI, P.; GARCIA, F. TiMDPpoly: An improved method for solving time-dependent MDPs. In: Proceedings of the 21st IEEE International Conference on Tools with Artificial Intelligence (ICTAI). [S.l.: s.n.], 2009. p. 796–799.
RUSSELL, S.; NORVIG, P. Artificial intelligence: a modern approach. [S.l.]: Prentice Hall, 2009.
SHI, Y.; EBERHART, R. Empirical study of particle swarm optimization. In: Evolutionary Computation, 1999. CEC 99. Proceedings of the 1999 Congress on. IEEE, 1999. v. 3. ISBN 0780355369.
SOLOMON, M. Algorithms for the vehicle routing and scheduling problems with time window constraints. Operations Research, JSTOR, p. 254–265, 1987.
SOUZA, M.; COELHO, I.; RIBAS, S.; SANTOS, H.; MERSCHMANN, L. A hybrid heuristic algorithm for the open-pit-mining operational planning problem. European Journal of Operational Research, Elsevier, v. 207, p. 1041–1051, 2010.
ST-AUBIN, R.; HOEY, J.; BOUTILIER, C. APRICODD: Approximate policy construction using decision diagrams. Advances in Neural Information Processing Systems, p. 1089–1096, 2001.
SUTTON, R.; BARTO, A. Reinforcement learning: An introduction. [S.l.]: The MIT Press, 1998.
SUTTON, R.; PRECUP, D.; SINGH, S. Between MDPs and semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales. Artificial Intelligence, v. 112, p. 181–211, 1999.
TA, C.; KRESTA, J.; FORBES, J.; MARQUEZ, H. A stochastic optimization approach to mine truck allocation. International Journal of Mining, Reclamation and Environment, Taylor & Francis, v. 19, n. 3, p. 162–175, 2005.
TEMENG, V.; OTUONYE, F.; FRENDEWEY, J. Real-time truck dispatching using a transportation algorithm. International Journal of Mining, Reclamation and Environment, Taylor & Francis, v. 11, n. 4, p. 203–207, 1997.
THISTED, R. Elements of statistical computing: numerical computation. [S.l.]: Chapman & Hall/CRC, 1988.
TU, J.; HUCKA, V. Analysis of open-pit truck haulage system by use of a computer model. CIM Bulletin, v. 78, n. 879, p. 53–59, 1985.
Appendix A - Genetic Algorithm
Genetic Algorithm (GA) (MITCHELL, 1998) is a search procedure (or heuristic) based on the process of natural evolution. GAs originated in 1975 from studies of cellular automata conducted by John Holland and his students at the University of Michigan. Their applications include areas such as scheduling and dispatching problems, neural network training, image feature extraction and recognition, and other optimization and search problems.
GA belongs to a larger class called evolutionary algorithms (EA), which also includes algorithms such as Ant Colony Optimization (ACO), Cultural Algorithm (CA), Simulated Annealing (SA), and Tabu Search (TS). These algorithms generate solutions to optimization and search problems using techniques inspired by natural evolution, such as selection, crossover, and mutation.
A.1 Methodology
In order to find the solution of a search or optimization problem, a GA simulates the process of natural evolution (Alg. 4). First, the algorithm randomly generates the initial population of total size size_pop, composed of candidate solutions (individuals). Each individual is encoded as an array (chromosome), in which each value (gene) can be represented by a binary value. After generation, the population is reduced to its best individuals in the selection phase, evaluated by the fitness function (fitness_func). The next generations are produced by repeating the reproduction and selection phases until the end condition (end_condition) is reached.
Algorithm 4: Genetic Algorithm
Input: GA(size_pop, fitness_func, end_condition)
Output: solution
t ← 0;
Generate initial population, G(0), based on size_pop;
Select G(0) using fitness_func;
repeat
    t ← t + 1;
    Generate G(t) by reproduction using G(t − 1);
    Select G(t) using fitness_func;
until end_condition;
return best_individual(G(t))
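Algorithm 4 can be sketched in Python as follows. This is an illustrative sketch rather than the thesis implementation: the one-point crossover, the 1% mutation rate, the truncation selection of the fitter half, and the generation limit standing in for end_condition are all assumed details, and every name (genetic_algorithm, n_genes, max_generations) is hypothetical.

```python
import random

def genetic_algorithm(size_pop, n_genes, fitness_func, max_generations):
    # Generate the initial population G(0) of random binary chromosomes.
    population = [[random.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(size_pop)]
    for _ in range(max_generations):          # end_condition: generation limit
        # Selection: keep the fitter half of the population.
        population.sort(key=fitness_func, reverse=True)
        survivors = population[:size_pop // 2]
        # Reproduction: refill the population by one-point crossover.
        children = []
        while len(survivors) + len(children) < size_pop:
            p1, p2 = random.sample(survivors, 2)
            cut = random.randrange(1, n_genes)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each gene with a small probability.
            children.append([g ^ 1 if random.random() < 0.01 else g
                             for g in child])
        population = survivors + children
    return max(population, key=fitness_func)  # best individual of G(t)

# Usage on the classic "one-max" toy problem: maximize the number of 1-bits.
random.seed(42)  # fixed seed only for reproducibility of this example
best = genetic_algorithm(30, 16, sum, max_generations=50)
```

Because the best individual always survives the truncation step, the best fitness found is non-decreasing across generations, a simple form of elitism.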
A.1.1 Population generation
The population is generated randomly in order to cover the entire range of solutions (search space). Its size depends on the nature of the problem, being a fraction of the total number of possible solutions; it typically contains hundreds or thousands of individuals. The population size is directly related to the quality of the solution: small populations can lead to locally optimal solutions, while very large populations slow convergence.
A.1.2 Selection
The selection phase is responsible for selecting the best individuals of a generation according to a fitness function. The selected individuals are those whose solutions best fit the fitness function. Popular selection methods include roulette wheel selection and tournament selection. In roulette wheel selection, each individual's selection probability is based on its fitness value; that is, individuals with larger fitness values (for a maximization fitness function) have a higher selection probability. In tournament selection, the fitter individual of a randomly chosen pair is selected.
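The two selection schemes above can be sketched as follows, for a maximization fitness function. The helper names are hypothetical, and the tournament size of two matches the pairwise description above.

```python
import random

def roulette_wheel(population, fitness_func):
    # Selection probability proportional to fitness (fitness must be >= 0
    # and not all zero for the weights to be valid).
    fitnesses = [fitness_func(ind) for ind in population]
    return random.choices(population, weights=fitnesses, k=1)[0]

def tournament(population, fitness_func):
    # Pick a random pair and keep the fitter individual.
    a, b = random.sample(population, 2)
    return a if fitness_func(a) >= fitness_func(b) else b
```

Roulette wheel selection pressure depends on the spread of fitness values, whereas tournament selection depends only on their ranking, which makes it insensitive to fitness scaling.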
A.1.3 Reproduction
After the best individuals are selected, the population is reduced to a percentage of its original size. In order to restore the original size and, mainly, to improve the quality of the individuals, the reproduction phase is executed. This phase is divided into two steps: crossover and mutation.
A.1.3.1 Crossover
The crossover (or recombination) step generates new individuals (children) from a random combination of the genes of individuals of the previous generation (parents). This step is repeated until an appropriate population size is reached.
In order to preserve the best individuals of the previous generation (that is, the best solutions), crossover can be elitist. In this case, a parent proceeds to the next generation if its fitness is better than that of its children; newly generated individuals that are worse than their parents are therefore not kept in the current generation.
A.1.3.2 Mutation
After the crossover step, the generated individuals can have some of their genes randomly swapped or their values changed, based on a mutation ratio (generally less than 1%). The mutation step is used to avoid convergence to locally optimal solutions; a mutated individual can lead the next generations to better regions of the solution space, increasing the chance of finding the optimal solution.
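The crossover and mutation steps described above, including the elitist variant, can be sketched as follows for binary chromosomes. The one-point cut, the default 1% mutation ratio, and all function names are illustrative assumptions, not the thesis implementation.

```python
import random

def crossover(p1, p2):
    # One-point crossover: combine parent genes around a random cut point.
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def mutate(chrom, rate=0.01):
    # Flip each binary gene with a small probability (the mutation ratio).
    return [1 - g if random.random() < rate else g for g in chrom]

def elitist_reproduction(p1, p2, fitness_func):
    # Elitism: a parent proceeds to the next generation unless the
    # child strictly beats the fitter of the two parents.
    child = mutate(crossover(p1, p2))
    best_parent = max(p1, p2, key=fitness_func)
    return child if fitness_func(child) > fitness_func(best_parent) else best_parent
```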
A.1.4 Termination
The GA ends when a termination condition is reached. Common termination condi-
tions are:
• A fixed number of generations is reached;
• A maximum time of computation is reached;
• Solution convergence given an allowed maximum error;
• A solution is found that satisfies minimum criteria; or
• A combination of the above.
Appendix B - Statistical Distributions
B.1 Gamma Distribution
Parameters: Shape parameter (α) and scale parameter (β) specified as positive real
values.
Range: [0,+∞)
Mean: αβ
Variance: αβ2
Applications: The gamma distribution is often used to represent the time required
to complete some task (e.g., a machining time or machine repair time).
The gamma pdf is

f(x) = β^(−α) x^(α−1) e^(−x/β) / Γ(α)   for x > 0, and f(x) = 0 otherwise,   (B.1)
where Γ is the complete gamma function given by:
Γ(α) = ∫_0^∞ t^(α−1) e^(−t) dt .   (B.2)
The shape of the gamma distribution is illustrated in Figure B.1.
FIGURE B.1 – Gamma distribution.
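The moments quoted above (mean αβ, variance αβ²) can be checked empirically with the standard library's gamma sampler; the parameter values chosen here are arbitrary examples.

```python
import random
import statistics

# Draw a large sample from Gamma(alpha, beta) and compare the empirical
# moments against the theoretical mean (alpha*beta) and variance (alpha*beta^2).
alpha, beta = 2.0, 3.0
samples = [random.gammavariate(alpha, beta) for _ in range(100_000)]

mean = statistics.fmean(samples)     # theoretical value: 2 * 3 = 6
var = statistics.pvariance(samples)  # theoretical value: 2 * 3**2 = 18
```

With 100,000 samples the empirical mean and variance land within a few percent of the theoretical values, consistent with the formulas above.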
B.2 Gaussian Distribution
Parameters: The mean (µ) is specified as a real number and standard deviation (σ)
is specified as a positive real number.
Range: (−∞,+∞)
Mean: µ
Variance: σ2
Applications: The Gaussian (or normal) distribution is used empirically for many processes that appear to have a symmetric distribution. Because the theoretical range is from −∞ to
+∞, care must be taken when the distribution is used for strictly positive quantities such as processing times.
The normal pdf is

f(x) = (1 / (σ√(2π))) e^(−(x−µ)² / (2σ²)) ,   (B.3)

for all real x.
The shape of the normal distribution is illustrated in Figure B.2.
FIGURE B.2 – Gaussian distribution.
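Equation (B.3) and the quoted moments can be sanity-checked numerically; the parameter values and the helper name normal_pdf are illustrative.

```python
import math
import random
import statistics

def normal_pdf(x, mu, sigma):
    # Direct transcription of Eq. (B.3).
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 5.0, 2.0
# The pdf peaks at x = mu with value 1 / (sigma * sqrt(2*pi)).
peak = normal_pdf(mu, mu, sigma)
# Samples should reproduce the stated mean (mu) and standard deviation (sigma).
samples = [random.gauss(mu, sigma) for _ in range(100_000)]
```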
DOCUMENT REGISTRATION RECORD
1. CLASSIFICATION/TYPE: TD  2. DATE: 20 December 2010  3. DOCUMENT No: DCTA/ITA/TD-018/2010  4. No OF PAGES: 140
Keywords: Mathematical programming; Goods distribution; Genetic algorithms; Applied mathematics; Routes; Trucks; Mining; Mathematics
10. PRESENTATION: (X) National ( ) International. ITA, Sao Jose dos Campos. Doctoral course, Graduate Program in Electronic Engineering and Computer Science, Area of Informatics. Advisor: Carlos Henrique Costa Ribeiro; co-advisor: Luiz Edival de Souza. Defended on 09/12/2010. Published in 2010.
11. ABSTRACT:
Material transportation is one of the most important aspects of open-pit mine operations. The problem usually involves a truck dispatching system in which decisions on truck assignments and destinations are taken in real time. Due to its significance, several decision systems for this problem have been developed in the last few years, improving productivity and reducing operating costs. As in many other real-world applications, the assessment and correct modeling of uncertainty is a crucial requirement, as the unpredictability originated from equipment faults, weather conditions, and human mistakes can often result in truck queues or idle shovels. However, uncertainty is not considered in most commercial dispatching systems. In this thesis, we introduce novel truck dispatching systems as a starting point to modify current practices with a statistically principled decision-making methodology. First, we present a stochastic method using a Time-Dependent Markov Decision Process (TiMDP) applied to the truck dispatching problem. In the TiMDP model, travel times are represented as probability density functions (pdfs), time windows can be inserted for path availability, and time-dependent utility can be used as a priority parameter. In order to minimize the well-known curse-of-dimensionality issue, to which multi-agent problems are subject when considering discrete state modelings, the system is modeled based on the introduced single-dependent-agents concept. Also based on the single-dependent-agents concept, we introduce the Genetic TiMDP (G-TiMDP) method applied to the truck dispatching problem. This method is a hybridization of the TiMDP model and of a Genetic Algorithm (GA), which is also used to solve the truck dispatching problem. Finally, in order to evaluate and compare the results of the introduced methods, we execute Monte Carlo simulations in an example heterogeneous mine composed of 15 trucks, 3 shovels, and 1 crusher. The uncertain aspect of the problem is represented by the path selection through crusher and shovels, which is executed by the truck driver, being independent of the dispatching system. The results are compared to classical dispatching approaches (Greedy Heuristic and Minimization of Truck Cycle Times – MTCT) using Student's t-test, demonstrating the efficiency of the introduced truck dispatching methods.