Learning a Model of Ship Movements

University of Amsterdam, Faculty of Science
Science Park 904, Postbus 94216
1090 GE Amsterdam, The Netherlands

Thesis for Bachelor of Science - Artificial Intelligence

Author: Roderik Lagerweij ([email protected])
Supervisors: Gerben de Vries ([email protected]), Maarten van Someren ([email protected])

December 24, 2009

Keywords: Machine Learning, Data Mining, Classification, Time Series Data, Sliding Windows, Discretization, Attribute Selection, C4.5, Sequential Association-Rule Mining, Automatic Identification System

Abstract: In very large seaports, where many ships enter and leave the port, collision avoidance is of the utmost importance. The Automatic Identification System (AIS) is used to quickly identify ships and to provide additional information about them. Data mining methods may be employed to mine AIS trajectory data for patterns, in order to create a model capable of predicting future events that can serve as an extra aid for situational awareness. Two classification methods for creating such a model are proposed and described. The final port of a ship entering a large seaport is chosen as the future event to predict. Both presented classification methods significantly outperform the baseline method.
Discretization is a popular method in machine learning for handling numeric attributes. In the dataset, dimension and draught values may be assigned to bins to reduce the number of unique values. Suppose the dataset contains n instances, and the values of a numeric attribute Xi are sorted in ascending order. Equal Frequency Discretization (EFD, [4]) then divides the sorted values of Xi into k intervals such that each interval contains approximately the same number of instances. The parameter k is user-defined and is set to three in this project; three bins should be enough to distinguish small, medium and large vessels from each other. To prevent specific ships (identified by MMSI) that occur more frequently in the dataset from having a larger effect on the thresholds, a list of unique ships along with their numeric attributes is created and used for the discretization process.
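As a rough sketch of this binning step (function and variable names, as well as the sample values, are illustrative and not taken from the thesis implementation), the thresholds can be computed from the deduplicated per-ship values:

```python
# Equal Frequency Discretization (EFD) sketch with k = 3 bins.

def efd_thresholds(values, k=3):
    """Sort the values and return the k-1 cut points that split them
    into k bins of approximately equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]

def assign_bin(value, thresholds):
    """Return the index of the bin a value falls into."""
    for i, t in enumerate(thresholds):
        if value < t:
            return i
    return len(thresholds)

# One draught value per unique ship (keyed by MMSI), so that frequently
# occurring ships do not skew the thresholds.
ships = {123456789: 7.2, 987654321: 4.1, 111111111: 10.5,
         222222222: 5.0, 333333333: 8.8, 444444444: 3.3}
cuts = efd_thresholds(ships.values(), k=3)
bins = {mmsi: assign_bin(d, cuts) for mmsi, d in ships.items()}
```

With the six sample draughts, each of the three bins receives exactly two ships, as intended by the equal-frequency criterion.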
An alternative to this method is a binning scheme in which the draught bins are set manually. This makes it possible to incorporate the domain knowledge that ships are not allowed to enter certain regions when their draught exceeds a certain value. This technique is evaluated as well.
4.3 Missing Values
A significant portion of the dataset contains incomplete instances, in which dimensions and draught are set to zero, which is obviously impossible. To prevent these values from influencing the bin thresholds, they are replaced by a dedicated 'missing attribute' value and excluded from the threshold calculations.
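A minimal sketch of this replacement step (the sentinel value and names are illustrative assumptions, not the thesis code):

```python
MISSING = None  # sentinel standing in for the 'missing attribute' value

def clean_draught(raw_draughts):
    """Replace physically impossible zero draughts with the missing
    marker so they cannot influence the discretization thresholds."""
    return [MISSING if d == 0 else d for d in raw_draughts]

observed = clean_draught([7.2, 0, 4.1, 0, 10.5])
# Only the present values are passed on to the threshold calculation.
present = [d for d in observed if d is not MISSING]
```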
5. Classification Methods
5.1 Baseline method
As the baseline method, a dataset based only on the static attributes of the ships is used to predict the final port. To accomplish this, the dataset is first preprocessed using the sliding window approach described in section 4.1 and then stripped of the time series data. What remains is a dataset in which a prediction of where the ship will dock can be made only on the basis of ship characteristics. The reason windowed instances are created from the original dataset first, and only then stripped of the time series data, is that otherwise a fair comparison between this method and the two proposed classification methods would not be possible, since the datasets would differ.
Quinlan's C4.5 algorithm [5], a decision tree method, is then used to create a model to predict the final port. This algorithm, popular for its execution speed and robustness, uses the information-theoretic measure gain ratio to guide the selection of attributes. For the root node of the tree, C4.5 greedily searches for the attribute that maximizes its information gain divided by its split information (the entropy of the attribute's value distribution). The algorithm is then applied recursively to form the subtrees. As a final step, C4.5 prunes the tree to reduce overfitting, unlike its predecessor ID3, which skips this pruning step. Missing values are handled with a probabilistic approach, as described in [6].
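For nominal attributes, the gain ratio criterion can be sketched as follows (a minimal illustration of the measure, not C4.5's actual implementation, which also handles numeric splits and missing values):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(attribute_values, labels):
    """Information gain of splitting on the attribute, divided by the
    split information (entropy of the attribute's value distribution)."""
    n = len(labels)
    groups = {}
    for v, y in zip(attribute_values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)
    return gain / split_info if split_info > 0 else 0.0

# An attribute that perfectly separates the classes scores 1.0;
# an uninformative one scores 0.0 (class names are hypothetical).
perfect = gain_ratio(["a", "a", "b", "b"], ["port_X", "port_X", "port_Y", "port_Y"])
useless = gain_ratio(["a", "b", "a", "b"], ["port_X", "port_X", "port_Y", "port_Y"])
```

Dividing by the split information is what penalizes attributes with many distinct values, which plain information gain would otherwise favor.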
Optionally, the following alterations can be made to the dataset, resulting in variations of the
baseline method:
Discretization of Numeric Attributes
Even though the C4.5 algorithm can handle numeric values, the discretization method described earlier can be used to create bins beforehand. C4.5 uses the gain ratio measure, which should compensate for attributes having a high number of values. However, studies have shown that discretization may improve classification accuracy anyway (e.g. [7]). One explanation is that C4.5 discretizes numeric attributes locally, producing different discretizations of the same attribute in different places in the tree. As tree depth increases, less data is available at each node, compromising the reliability of these local discretizations. Using a global discretization method such as the one described in section 4.2 prevents this effect.
Attribute Selection
Attribute selection can be performed on datasets to reduce dimensionality and improve results. The selection mechanism proposed in [8] is used here. This method evaluates attribute subsets by considering their predictive ability together with the degree of redundancy between the attributes. Subsets whose attributes correlate highly with the class attribute while having low intercorrelation with each other are preferred. Using these criteria, the attribute search space is explored with a best-first search.
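The criterion from [8] scores a subset of k attributes by its average attribute-class correlation and average attribute-attribute correlation; a small sketch of that merit function (names and the sample numbers are illustrative):

```python
from math import sqrt

def cfs_merit(k, avg_class_corr, avg_inter_corr):
    """Merit of a k-attribute subset: reward correlation with the class,
    penalize redundancy (intercorrelation) among the attributes."""
    return (k * avg_class_corr) / sqrt(k + k * (k - 1) * avg_inter_corr)

# Four attributes, each moderately predictive: the subset scores higher
# when the attributes are non-redundant than when they are fully
# intercorrelated.
independent = cfs_merit(4, 0.5, 0.0)
redundant = cfs_merit(4, 0.5, 1.0)
```

A best-first search then explores subsets, expanding the most promising one found so far and backtracking when no improvement is seen.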
In this paper, two additional classification methods are described, and their performance is compared to these baseline methods. For brevity, in the following sections the dataset used for the baseline method will be referred to as the baseline dataset: a dataset consisting only of static attributes and no time series data.
5.2 Classification Method 1 - Augmenting the Baseline Dataset Using a
Sliding Window Approach
This first proposed classification method applies the sliding window approach described in section 4.1 to the baseline dataset. Again, the C4.5 classifier is used to create a model and classify new instances. Figure 5.1 summarizes this approach; the red line in the figure represents the baseline method described in the previous section.
Augmenting the dataset with a history of passed clusters enriches it considerably. In the upcoming experiments, different window sizes are chosen to test their effect on performance. Intuitively, it makes sense that adding a history of passed clusters should improve classification results.
Besides varying the window size, it is also evaluated whether the inclusion of ∆t values has a significant effect. These ∆t values are used both in their original numeric form and binned using manually set bin thresholds.
Finally, the combination of parameters resulting in the highest performance is searched for experimentally, and the obtained results are reported.
Figure 5.1: Summary of Classification method 1 and Baseline Method (red)
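The core of this augmentation, turning static attributes plus a cluster history into windowed instances, can be sketched as follows (the exact windowing scheme of section 4.1 is not reproduced here; names and cluster identifiers are hypothetical):

```python
def windowed_instances(static_attrs, cluster_history, window_size):
    """Augment a ship's static attributes with a sliding window over its
    sequence of passed clusters; one instance per window position."""
    instances = []
    for i in range(len(cluster_history) - window_size + 1):
        window = cluster_history[i:i + window_size]
        instances.append(static_attrs + list(window))
    return instances

# A cargo ship that passed four clusters, with a window of size 2.
rows = windowed_instances(["Cargo Ship", "large"],
                          ["c3", "c7", "c2", "c9"], window_size=2)
```

Each resulting row holds the static attributes followed by the most recent clusters in the window, which is the form the C4.5 classifier receives.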
5.3 Classification Method 2 - Building a Classifier from Sequential
Association-Rules
The second approach is based on the association mining task introduced in [9]. In this case specifically, sequential association rules are mined from the dataset. Figure 5.2 at the end of this section summarizes the method in a schematic overview.
Sequence mining is used in a variety of domains, such as shopping basket analysis and protein sequence prediction. The goal is to mine frequently occurring sequences (ordered collections of items) from the dataset. An association rule is an implication of the form X --> Y, where the implication is satisfied in the dataset with confidence c (0 ≤ c ≤ 1) if and only if at least a fraction c of the instances that satisfy X also satisfy Y. A sequential association rule is a combination of a sequence and an association rule. By using a rule ordering scheme and matching the antecedents of these rules against new instances, a classifier can be built.
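A minimal sketch of such a rule-based classifier (the rule representation, the ordering, and all identifiers such as "port_A" are illustrative assumptions; the thesis's actual rules also carry support and confidence values used for ordering):

```python
def is_subsequence(pattern, history):
    """True if pattern occurs in history in order, not necessarily
    contiguously (the usual sequential-pattern matching notion)."""
    it = iter(history)
    return all(step in it for step in pattern)

def classify(ordered_rules, instance, default):
    """Return the consequent of the first matching rule: the static
    items must be present and the cluster sequence must match."""
    for statics, sequence, consequent in ordered_rules:
        if statics <= instance["statics"] and is_subsequence(sequence, instance["clusters"]):
            return consequent
    return default

# Rules assumed pre-ordered, e.g. by confidence and then support.
rules = [({"Cargo Ship"}, ["seg_21", "poi_45"], "port_A"),
         (set(), ["poi_45"], "port_B")]
ship = {"statics": {"Cargo Ship", "draught_bin_2"},
        "clusters": ["seg_3", "seg_21", "poi_45"]}
prediction = classify(rules, ship, "unknown")
```

The default class covers instances matched by no rule, which is needed because mined rules rarely cover the whole instance space.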
The implementation of this method in this project differs from standard sequential association-rule mining tasks in that not only sequences but also static ship attributes may appear in rules. Resulting rules may therefore take the form of a combination of a sequence and a number of ship attribute restrictions, such as:

Cargo Ship, No Cargo Specified, 10, 12, 15, seg_21, poi_45 --> http://www.few.vu.nl/~wrvhage/2008/11//16/
Or, for a far less restricted example (where '_' indicates all values may be used):
Method 2   46.92 %   Window Size = 2   Min. Support = 0.0001   Min. Confidence = 0.2
           41.63 %   Window Size = 2   Min. Support = 0.0001   Min. Confidence = 0.2
           41.94 %   Window Size = 2   Min. Support = 0.0001   Min. Confidence = 0.2

Significance Level   * t = 8.3186, p = 0.0004
                     * t = 5.9430, p = 0.0019
                       t = 1.9722, p = 0.1056

Table 6.11: Best Obtained Results, Methods 1 & 2 (percentage of correct classifications)
In this second classification task, classification percentages are higher than in the first. This is to be expected, since the next cluster is a less distant future location. The C4.5 classifier predicts next clusters significantly better than the association-rule-based classifier for the Large Vessels and Special Crafts categories.
Unlike predicting the final port, predicting the next cluster to be entered seems to be easier for Large Vessels than for Special Crafts. Intuitively this makes sense, as Special Crafts probably make many more course corrections than large vessels, making it more difficult to predict where they will go.
7. Conclusion

Both presented classification methods significantly outperform the baseline method, which is based on static ship attributes only. This can largely be attributed to adding a history of clusters to the dataset. The C4.5 classifier, however, outperforms the classifier based on sequential association rules for the Large Vessels category in the first classification task, and for the Large Vessels and Special Crafts categories in the second classification task.
Although the C4.5 classifier outperforms the association-rule-based classification method in some experiments, the differences in results are small, suggesting that the inductive biases of the two classifiers differ little for this task.
As would be expected, locations less far into the future can be predicted more accurately, as the predictions of the next cluster show.
When comparing the Large Vessels and Special Crafts categories, results differ between the classification tasks. For final port prediction, results favor the Special Crafts category; for next cluster prediction, accuracy is higher for the Large Vessels category. This is probably due to the erratic behavior of Special Crafts compared to tankers and cargo ships.
For both classification methods, finding parameters resulting in good performance was generally not hard. Usually, the first step was choosing a window size by trial and error. A small window of clusters (such as 1, 2 or 3 clusters) most often gave the best results; choosing a window size that was too big often worsened performance. This can probably be attributed to the classifiers not handling the large number of attributes very well, a problem often referred to in classification tasks as the 'curse of dimensionality'. Besides choosing an appropriate window size, for the C4.5 classifier the exclusion of the ∆t attributes was the only parameter that significantly improved classification accuracy. Again, this is probably a result of the classifier not handling the extra attributes very well; discretization did not make matters any better here.
For both classification tasks, it is hard to say whether classification accuracy is high enough to use such a classification method in practice. One reason is that without a more in-depth analysis of the results, there is no information about the ship types, clusters, ports, etc. for which predictions can be considered good. As the presented results are averaged classification percentages, they may mask specific locations for which accuracy is much better; the same could hold for specific ship types, ships carrying a certain cargo, etc.
8. Future Work

Most importantly, future research should focus on analyzing the results obtained in this paper (e.g. by plotting the results for different clusters on a geographic map). Without knowing in which situations, and for what reasons, these methods fail to predict future locations, it is hard to implement improvements efficiently.
Some straightforward improvements include the use of different learning algorithms, as well as different discretization and attribute-selection methods. The effect of augmenting the dataset with other attributes could also be measured. These could be attributes contained in the AIS data (such as navigational status, rate of turn, destination, etc.) or external attributes like weather conditions, day of the week, etc. Furthermore, it would be interesting to see whether a larger dataset yields different results.
To reduce the currently vast number of association rules mined from the dataset, pruning methods
could be employed.
As association rules allow multiple items in the consequent, another possibility is to build a model in which a single rule predicts a sequence of future events instead of just one future event (as is the case for the final port or next cluster). Also, instead of predicting future events, static ship attributes could be predicted from the history of passed clusters in case this information is missing.
Finally, attention could be paid to creating models that detect anomalies, so that coast guard operators can be alerted in time when something unexpected happens, such as a ship on a collision course or an unauthorized entrance into a region. This would likely further improve situational awareness.
9. References

[1] G. de Vries and M. van Someren, "Unsupervised Ship Trajectory Modeling and Prediction Using Compression and Clustering", in BNAIC 2008, Proceedings of the 20th Belgian-Netherlands Conference on Artificial Intelligence.
[2] D. Douglas and T. Peucker, "Algorithms for the reduction of the number of points required to represent a digitized line or its caricature", The Canadian Cartographer, 10(2):112–122, 1973.
[3] D. Lindsay and S. Cox, "Effective Probability Forecasting for Time Series Data Using Standard Machine Learning Techniques", Lecture Notes in Computer Science, Volume 3686, 2005.
[4] Y. Yang and G.I. Webb, "A comparative study of discretization methods for naïve-Bayes classifiers", Proceedings of the Pacific Rim Knowledge Acquisition Workshop, 2002.
[5] J.R. Quinlan, "C4.5: Programs for Machine Learning", Morgan Kaufmann, 1993.
[6] G. Batista and M.C. Monard, "An Analysis of Four Missing Data Treatment Methods for Supervised Learning", Applied Artificial Intelligence, 17:519–533, 2003.
[7] J. Dougherty, R. Kohavi and M. Sahami, "Supervised and Unsupervised Discretization of Continuous Features", in Proceedings of the Twelfth International Conference on Machine Learning, pp. 194–202, Morgan Kaufmann, San Francisco, 1995.
[8] M. Hall, "Feature Subset Selection: A Correlation Based Filter Approach", in Proceedings of the Fourth International Conference on Neural Information Processing and Intelligent Information Systems, pp. 855–858, 1997.
[9] R. Agrawal, T. Imielinski and A. Swami, "Mining Association Rules Between Sets of Items in Large Databases", ACM SIGMOD Conference on Management of Data, May 1993.
[10] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules", Proceedings of the 20th VLDB Conference, pp. 487–499, 1994.
[11] A.A. Freitas, "Understanding the crucial differences between classification and discovery of association rules - a position paper", ACM SIGKDD Explorations, 2(1), 2000.
[12] B. Liu, W. Hsu and Y. Ma, "Integrating classification and association rule mining", KDD-98, New York, 1998.
[13] T. Mitchell, "Machine Learning", McGraw Hill, pp. 63–72, 1997.
[14] T.G. Dietterich, "Approximate statistical tests for comparing supervised classification learning algorithms", Neural Computation, 10:1885–1924, 1998.
[15] http://www.rulequest.com/see5-comparison.html