
Knowledge-Based Systems 73 (2015) 97–110

Contents lists available at ScienceDirect

Knowledge-Based Systems

journal homepage: www.elsevier.com/locate/knosys

A Location-Item-Time sequential pattern mining algorithm for route recommendation

http://dx.doi.org/10.1016/j.knosys.2014.09.012
0950-7051/© 2014 Elsevier B.V. All rights reserved.

⇑ Corresponding author at: Department of Industrial Engineering and Management, Yuan-Ze University, Taiwan.

E-mail address: [email protected] (C.-Y. Tsai).

Chieh-Yuan Tsai a,b,⇑, Bo-Han Lai a

a Department of Industrial Engineering and Management, Yuan-Ze University, Taiwan
b Innovation Center for Big Data and Digital Convergence, Yuan-Ze University, Taiwan

Article info

Article history:
Received 14 January 2014
Received in revised form 5 September 2014
Accepted 26 September 2014
Available online 5 October 2014

Keywords:
Recommendation systems
Sequential pattern
Sequence mining
Behavior computing
Theme park

Abstract

To survive in a rapidly changing environment, theme parks need to provide high-quality services that match visitor tastes and preferences. Understanding the spatial and temporal behavior of visitors could improve attraction management and the geographical distribution of visitors. To fulfill this need, this research defines a Location-Item-Time (LIT) sequence to describe visitors' spatial and temporal behavior. Then, the Location-Item-Time PrefixSpan (LIT-PrefixSpan) mining algorithm is developed to discover frequent LIT sequential patterns. Next, a route suggestion procedure is proposed to retrieve suitable LIT sequential patterns for visitors under the constraints of their intended visiting time, favorite regions, and favorite recreation facilities. A simplified theme park is used as an example to show the feasibility of the proposed system. The experimental results show that the system can help managers understand visitors' behavior and provide appropriate visiting experiences for visitors.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

A theme park is an aggregation of attractions including architecture, landscape, rides, shows, food services, costumed personnel, and retail shops. Well-known examples include Disney World, Disneyland, Universal Studios, and Six Flags. Although the theme park industry has enjoyed steady attendance growth over the past several decades, the theme park market has entered a mature stage and is no longer experiencing high growth [5,6]. To survive in a rapidly changing environment, theme parks need to provide high-quality services that match visitor tastes and preferences. Understanding the spatial and temporal behavior of visitors could enhance the management of attractions and help extend the geographical distribution of visitors within regions.

In the past decade, recommendation techniques have become popular for providing a variety of products, services, and items to customers in the tourism industry [4,7,13]. Personalized tourism services aim to help users find what they are looking for by comparing the user profile with reference characteristics. Wang et al. [19] presented semantic web technologies for providing personalized access to digital museum collections. Niaraki and Kim [12] proposed a generic ontology-based architecture that uses a multi-criteria decision making technique to design a personalized route planning system. Schiaffino and Amandi [14] developed an expert software agent in the tourism and travel domain, named Traveler. This agent combines collaborative filtering with content-based recommendations and demographic information about customers to make recommendations. García-Crespo et al. [3] presented the SPETA system, which uses knowledge of the user's current location, preferences, and history of past locations to provide the type of recommendation services that tourists expect from a real tour guide. Tsai and Lo [17] took previous popular visiting behaviors as the foundation and developed a sequential pattern based route suggestion system to generate personalized tours. Tsai and Chung [16] developed a route recommendation system that provides personalized visiting routes for tourists in theme parks by considering a set of visiting sequences. Based on the retrieved visiting behavior data and the facility queuing situation, their system can generate a proper route suggestion for visitors.

The above recommendation systems have proven to be efficient tools by designing user interfaces that interact smoothly with the environment, providing convenient information query tools, or suggesting a set of associated products (or services). However, they share three major problems. First, these systems simply return a set of suggested facilities (items) in sequential order but fail to illustrate the complete visiting path for visitors. For example, such a system might suggest that a visitor visit items k1, k4, and k8 in order (i.e., k1 → k4 → k8). However, the actual path to complete the route should contain "by-pass items", such as k1 → k4 → k7 → k8, k1 → k4 → k6 → k8, or even k1 → k4 → k7 → k6 → k8. Without complete path information, a visitor might get confused and spend much more time finishing the route. Second, previous systems seldom take geographic constraints into consideration, so their suggested routes are often trivial and impractical. For example, a previous system might suggest the long route k1 → k2 → k6 → k4 → k7 → k10 → k8 → k12 to a visitor. However, this route is hard to follow, since k1, k2, and k4 are in region A, k6, k7, and k8 are in region B, and k10 and k12 are in region C, so the visitor has to move back and forth among the three regions. In fact, a theme park consists of several regions, each containing dozens of facilities and shops. It is therefore worthwhile to offer a non-trivial suggestion such as A(k1, k4, k2) → B(k8, k6) → C(k10, k12). Third, previous studies seldom took time constraints into consideration when they provided route suggestions. For example, previous systems simply suggest a route such as k1 → k4 → k8. However, once the time interval information between items is revealed, this route becomes k1 → (1 h) → k4 → (1 h) → k8, which spans about 2 h. If a visitor's intended visiting time is 90 min, this suggestion is unacceptable, since the visitor cannot finish the route on time. On the other hand, if the intended visiting time is 300 min, the suggestion is also unsuitable, because it fills less than half of the available time. Without the time intervals between items, visitors cannot tell whether they can complete the suggested route on time.

To solve the above problems, this research defines a Location-Item-Time (LIT) sequence to describe visitors' spatial and temporal behavior. To the best of our knowledge, this study is the first work to include location (region), item, and time-interval information simultaneously in a sequence. Then, the Location-Item-Time PrefixSpan (LIT-PrefixSpan) mining procedure is developed to discover frequent LIT sequential patterns. Finally, a route suggestion procedure is proposed to retrieve suitable LIT sequential patterns under the constraints of the visitor's intended visiting time, favorite regions and the intended visiting time in those regions, and favorite recreation facilities.

This paper is organized as follows. Section 2 reviews previous work related to sequential pattern mining and recommendation. Section 3 introduces the framework of the proposed route recommendation system. Section 4 presents a case to show the feasibility of the proposed system. Finally, Section 5 summarizes the conclusions and points out possible future directions.

2. Literature review

Yavas et al. [20] proposed a three-phase mobility prediction algorithm for predicting user movement in a mobile computing system. Their algorithm enables the system to allocate resources to users efficiently and to produce more accurate answers to location-dependent queries that refer to the future positions of mobile users. Cho et al. [2] proposed a sequential rule-based recommendation method that considers the evolution of customers' purchase sequences. The purchase transaction records of a customer over a certain period are used to build a customer profile. Then, a collaboration-based system finds a set of similar customers by calculating the correlations among customer profiles. Tan et al. [15] proposed a new approach, named Frequent Accessed Sequence Tree (FAS-Tree), to build a personalized recommendation system based on access sequential patterns. All frequent access sequential patterns are compressed into the FAS-Tree, which greatly reduces storage. During the recommendation stage, it is only necessary to traverse the sub-paths of the FAS-Tree that correspond to the page views in the active window to find matching patterns, without generating association rules. Yun and Chen [21] developed an algorithm for mining mobile sequential patterns that better reflects customer usage patterns in the mobile commerce environment by taking both the moving patterns (locations) and the purchase patterns (items) of customers into consideration. Tseng and Lin [18] proposed a novel data mining method, namely SMAP-Mine, that can efficiently discover mobile users' sequential movement patterns associated with requested services. Through empirical evaluation under various simulation conditions, SMAP-Mine was shown to deliver excellent performance in terms of accuracy, execution efficiency, and scalability. Meanwhile, the proposed prediction strategies were also verified to be effective in terms of precision, hit ratio, and applicability.

Li et al. [8] proposed a Multi-Stage Collaborative Filtering (MSCF) process to provide a location-aware event recommendation service in mobile environments. The first stage of MSCF performs People-to-People Collaborative Filtering (P2P-CF), while Event-to-Event Collaborative Filtering (E2E-CF) discovers the sequential rules of event participation in the second stage. Liu and Chang [9] proposed a route recommendation system that guides the user through a series of locations. Their system uses sequential pattern mining to extract popular route patterns from a large set of historical user route records and then recommends routes by matching the user's current route with the set of popular route patterns. Liu et al. [10] proposed a novel hybrid recommendation approach that combines the segmentation-based sequential rule (SSR) method with the segmentation-based KNN-CF (SKCF) method. To enhance the quality of product recommendations, their method considers customers' purchase sequences over time and their purchase data for the current period. Hung and Peng [6] proposed a Regression-based approach for mining User Movement Patterns (RUMP). The Large Sequence (LS) algorithm extracts the call detail records, and the Time Clustering (TC) algorithm determines the number of regression functions. Then, the Movement Function (MF) algorithm generates the movement functions representing the movement patterns of mobile users. Lu et al. [11] proposed a hybrid semantic recommendation approach that integrates item-based CF similarity with item-based semantic similarity techniques. The approach has been implemented in an Intelligent Business Partner Locator recommendation system prototype named BizSeeker. Similarly, Zhang et al. [22] developed a hybrid recommendation approach that combines user-based and item-based collaborative filtering techniques with fuzzy set techniques and a knowledge base for mobile product and service recommendation. The approach is implemented in a personalized recommender system for telecom products/services called FTCP-RS. Although the above algorithms are efficient in their respective environments, none of them takes location, item, and time-interval information into consideration at the same time.

3. Research method

3.1. Environment assumption and system overview

Typically, a theme park is divided into several regions, and each region contains a set of recreation facilities. It is assumed that each region is fully covered by RFID readers. In addition, RFID readers are installed at the entrance of each recreation facility and at the entrance and exit of the park. When a visitor wearing an RFID-tagged wristband enters a region or the entrance of a facility, the RFID readers record the RFID tag code, region id, facility id, and time into a route database. The recording process continues until the visitor leaves the park. Consider the layout in Fig. 1 as an example. At timestamp t1, a visitor passes the entrance k11 of the park in region B. Then, she moves to region A at timestamp t2, region F at timestamp t3, and region G at timestamp t4. In region G, she takes facility k1. After that, she moves to region K at timestamp t5 and region O at timestamp t6. In region O, she takes facilities k2 and k3. The recording process continues until she leaves the park through the exit k12 in region B. Finally, the route sequence <(B, t1, {k11}), (A, t2, ∅), (F, t3, ∅), (G, t4, {k1}), (K, t5, ∅), (O, t6, {k2, k3}), (K, t7, ∅), (G, t8, ∅), (B, t9, {k12})> is collected and stored in the route database.

Fig. 1. An illustrative example for route sequence generation.

Whenever visitors want a route suggestion, they can go to a kiosk in the park and input their preference information into the route recommendation system. The preference information includes the intended total visiting time, favorite regions, intended visiting time in the favorite regions, and favorite recreation facilities. The route recommendation system consists of two major modules. The first module generates a set of frequent Location-Item-Time (LIT) sequential patterns from the route database using the proposed Location-Item-Time PrefixSpan (LIT-PrefixSpan) mining procedure. The second module evaluates the similarity between the visitor's preferences and candidate LIT routes and retrieves the top-ranking routes for the visitor. The framework of the proposed system is shown in Fig. 2.

3.2. Location-Item-Time (LIT) sequential patterns

Let $N = \{n_1, n_2, \ldots, n_g\}$ be the set of cells (regions) in the theme park and $K = \{k_1, k_2, \ldots, k_h\}$ be the set of items (facilities, entrance, and exit). In the route database RD, a record is represented by <sid, rs>, where sid is the identifier of the record and rs is a route sequence. Formally, rs is represented as $\langle (B_1, t_1, \mathit{itemset}_1), (B_2, t_2, \mathit{itemset}_2), \ldots, (B_n, t_n, \mathit{itemset}_n) \rangle$, where $(B_i, t_i, \mathit{itemset}_i)$ is an event; $B_i$ is the visited region and $B_i \in N$; $t_i$ is the timestamp at which region $B_i$ is first entered, with $t_{i-1} \le t_i$ for $2 \le i \le n$; and $\mathit{itemset}_i$ is the set of items visited in region $B_i$, with $\mathit{itemset}_i \subseteq K$. Without timestamp information, $\langle B_i, \mathit{itemset}_i \rangle$ is called a transaction if $\mathit{itemset}_i$ is a non-empty set.

Fig. 2. Two modules in the proposed route recommendation system. (First module: route database, Location-Item-Time (LIT) mining procedure, the set of LIT sequential patterns. Second module: visitors' input preferences, route recommendation procedure, return of a suitable route suggestion.)
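The record format above can be made concrete with a short Python sketch. It encodes an event $(B_i, t_i, \mathit{itemset}_i)$ as a small data class and checks the three stated constraints: $B_i \in N$, $\mathit{itemset}_i \subseteq K$, and non-decreasing timestamps. The names `Event`, `is_valid_route`, and `transactions` are hypothetical helpers for illustration, not part of the authors' system, and N and K are restricted to the cells and items that route sequence sid 600 of Table 1 actually uses.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Event:
    region: str                               # B_i, a cell in N
    t: int                                    # t_i, time at which B_i is first entered
    itemset: FrozenSet[str] = frozenset()     # itemset_i, a subset of K (may be empty)


def is_valid_route(rs: List[Event], N: Set[str], K: Set[str]) -> bool:
    """Check B_i in N, itemset_i subset of K, and t_{i-1} <= t_i for every event."""
    return (all(e.region in N for e in rs)
            and all(e.itemset <= K for e in rs)
            and all(a.t <= b.t for a, b in zip(rs, rs[1:])))


def transactions(rs: List[Event]) -> List[Tuple[str, FrozenSet[str]]]:
    """Drop timestamps and keep <B_i, itemset_i> only when itemset_i is non-empty."""
    return [(e.region, e.itemset) for e in rs if e.itemset]


# Route sequence sid 600 from Table 1.
sid600 = [Event("B", 7, frozenset({"k11"})), Event("A", 8), Event("F", 21),
          Event("G", 30, frozenset({"k1"})), Event("K", 41),
          Event("O", 44, frozenset({"k2", "k3"})), Event("K", 51), Event("G", 54),
          Event("B", 58, frozenset({"k12"}))]
N = {"A", "B", "F", "G", "K", "O"}            # only the cells this example needs
K = {"k1", "k2", "k3", "k11", "k12"}          # only the items this example needs

print(is_valid_route(sid600, N, K))            # True
print([r for r, _ in transactions(sid600)])    # ['B', 'G', 'O', 'B']
```

A frozen data class keeps events hashable, which is convenient if events are later collected into sets.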

Definition 1. A transaction pattern is defined as $\langle B_i; z \rangle$, where z is a non-empty subset of $\mathit{itemset}_i$. A transaction pattern $\langle B_i; z \rangle$ is called a k-transaction pattern if the length of z is k.

Example 1. There are two route sequences, sid 300 and sid 600, in the route database RD shown in Table 1. Six itemsets, {k11}, {k1}, {k3}, {k4}, {k5}, and {k12}, can be found in sid 300, while four itemsets, {k11}, {k1}, {k2, k3}, and {k12}, can be found in sid 600. Therefore, transaction patterns <B;{k11}>, <G;{k1}>, <O;{k3}>, <L;{k4}>, <Q;{k5}>, and <B;{k12}> can be extracted from sid 300. Similarly, transaction patterns <B;{k11}>, <G;{k1}>, <O;{k2,k3}>, <O;{k2}>, <O;{k3}>, and <B;{k12}> can be extracted from sid 600. Finally, seven distinct 1-transaction patterns, <B;{k11}>, <G;{k1}>, <O;{k3}>, <L;{k4}>, <Q;{k5}>, <B;{k12}>, and <O;{k2}>, and one 2-transaction pattern, <O;{k2,k3}>, are obtained.
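Definition 1 and Example 1 can be checked mechanically. The sketch below, assuming each event is a `(cell, timestamp, itemset)` tuple, enumerates every non-empty subset of every non-empty itemset with `itertools.combinations`; the function name `transaction_patterns` is hypothetical. Applied to the two sequences of Table 1, it yields the seven distinct 1-transaction patterns and the single 2-transaction pattern of Example 1.

```python
from itertools import combinations


def transaction_patterns(rs):
    """Collect every transaction pattern <B_i; z> of one route sequence.

    rs is a list of (cell, timestamp, itemset) events; z ranges over all
    non-empty subsets of each non-empty itemset (Definition 1).
    """
    patterns = set()
    for cell, _t, itemset in rs:
        items = sorted(itemset)
        for k in range(1, len(items) + 1):          # k-transaction patterns
            for z in combinations(items, k):
                patterns.add((cell, frozenset(z)))
    return patterns


# Table 1 (an empty set stands for an event without items).
sid300 = [("B", 8, {"k11"}), ("G", 9, {"k1"}), ("F", 11, set()), ("K", 24, set()),
          ("O", 25, {"k3"}), ("P", 35, set()), ("L", 37, {"k4"}), ("Q", 39, {"k5"}),
          ("M", 40, set()), ("H", 45, set()), ("D", 46, set()), ("C", 51, set()),
          ("B", 54, {"k12"})]
sid600 = [("B", 7, {"k11"}), ("A", 8, set()), ("F", 21, set()), ("G", 30, {"k1"}),
          ("K", 41, set()), ("O", 44, {"k2", "k3"}), ("K", 51, set()),
          ("G", 54, set()), ("B", 58, {"k12"})]

all_patterns = transaction_patterns(sid300) | transaction_patterns(sid600)
one = [p for p in all_patterns if len(p[1]) == 1]
two = [p for p in all_patterns if len(p[1]) == 2]
print(len(one), len(two))   # 7 1
```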

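Let $\Delta t = t_{i+1} - t_i$ be the time interval between two successive events, where $1 \le i \le n-1$, and let $T_c$ ($1 \le c \le r$) be given constants with $T_1 < T_2 < \cdots < T_r$. Then, the time interval $\Delta t$ can be mapped to one of the elements in the set of discrete time intervals $TI = \{I_1, I_2, \ldots, I_r\}$ by

$$
\mathrm{DiscTI}(\Delta t) =
\begin{cases}
I_1 & \text{if } 0 < \Delta t \le T_1,\\
I_j & \text{if } T_{j-1} < \Delta t \le T_j \text{ for } 1 < j \le r.
\end{cases}
\tag{1}
$$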

For example, assume $T_1 = 10$, $T_2 = 20$, $T_3 = 30$, $T_4 = 40$, $T_5 = 50$, and $T_6 = 60$. Then, the set of discrete time intervals is $TI = \{I_1, I_2, I_3, I_4, I_5, I_6\}$, where $I_1: 0 < \Delta t \le 10$, $I_2: 10 < \Delta t \le 20$, $I_3: 20 < \Delta t \le 30$, $I_4: 30 < \Delta t \le 40$, $I_5: 40 < \Delta t \le 50$, and $I_6: 50 < \Delta t \le 60$.
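Eq. (1) amounts to a search over the sorted thresholds. The following sketch, assuming $T_1 < T_2 < \cdots < T_r$, uses Python's `bisect` module; the function name `disc_ti` mirrors DiscTI but is otherwise hypothetical, and returning `None` for gaps outside $(0, T_r]$ is our choice, since Eq. (1) leaves those cases undefined.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def disc_ti(dt: float, thresholds: Sequence[float]) -> Optional[str]:
    """Map a time gap dt to its discrete interval label I_j, following Eq. (1).

    thresholds holds T_1 < T_2 < ... < T_r. Gaps with dt <= 0 or dt > T_r are
    not covered by Eq. (1), so this sketch returns None for them.
    """
    if dt <= 0 or dt > thresholds[-1]:
        return None
    # bisect_left returns the smallest 0-based index j with thresholds[j] >= dt,
    # so thresholds[j-1] < dt <= thresholds[j], which is interval I_{j+1}.
    j = bisect_left(thresholds, dt)
    return f"I{j + 1}"


T = [10, 20, 30, 40, 50, 60]   # T_1..T_6 from the example above
print([disc_ti(dt, T) for dt in (1, 10, 13, 23, 60)])   # ['I1', 'I1', 'I2', 'I3', 'I6']
```

Binary search keeps the lookup at O(log r) per gap, which matters little for six intervals but scales to finer discretizations.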

Definition 2. Let $C = \{c_1, c_2, \ldots, c_n\}$ be the set of transaction patterns and $TI = \{I_1, I_2, \ldots, I_r\}$ be the set of discrete time intervals. A sequence $\beta = (D_1, e_1, D_2, e_2, \ldots, D_{q-1}, e_{q-1}, D_q)$ is a Location-Item-Time (LIT) sequence if $D_s \in C$ for $1 \le s \le q$ and $e_s \in TI$ for $1 \le s \le q-1$.
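Structurally, a LIT sequence alternates transaction patterns and interval labels, so it always has an odd length $2q-1$. A minimal checker for Definition 2 could look like the sketch below; it assumes transaction patterns are represented as `(cell, frozenset(items))` tuples and intervals as the strings `"I1"` to `"I6"`, both illustrative choices, and `is_lit_sequence` is a hypothetical name.

```python
from typing import Hashable, Sequence, Set


def is_lit_sequence(beta: Sequence[Hashable], C: Set[Hashable], TI: Set[str]) -> bool:
    """Definition 2: beta = (D_1, e_1, D_2, ..., e_{q-1}, D_q), D_s in C, e_s in TI."""
    if len(beta) % 2 == 0:          # q patterns plus q-1 intervals is always odd
        return False
    patterns, intervals = beta[0::2], beta[1::2]
    return all(D in C for D in patterns) and all(e in TI for e in intervals)


C = {("B", frozenset({"k11"})), ("G", frozenset({"k1"}))}   # two transaction patterns
TI = {f"I{j}" for j in range(1, 7)}                          # I_1 .. I_6

beta = (("B", frozenset({"k11"})), "I1", ("G", frozenset({"k1"})))
print(is_lit_sequence(beta, C, TI))        # True
print(is_lit_sequence(beta[:2], C, TI))    # False: it ends with an interval
```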

3.3. Location-Item-Time mining procedure

Similar to the work of Yun and Chen [21], the proposed LIT sequential pattern mining method consists of three phases: the large-transaction generation phase, the large-transaction transformation phase, and the LIT sequential pattern generation phase.

3.3.1. Large-transaction generation phase

The large-transaction generation phase generates the large transactions from the route database RD. Fig. 3 shows the pseudo-code of the large-transaction generation algorithm, which consists of two main steps. The first step (lines 1 to 11) derives all k-transaction patterns from RD according to Definition 1 and calculates the support count of each k-transaction pattern. The second step (lines 12 to 16) finds the set of large k-transaction patterns: if the support count of a k-transaction pattern is greater than or equal to the user-specified minimum support count (called min_sup_count), the k-transaction pattern is called a large k-transaction pattern. Next, the itemsets in all large k-transaction patterns are replaced by unique symbols. The set of all large k-transaction patterns after symbol replacement is called the set of large 1-sequential patterns.
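The two steps of the large-transaction generation algorithm can be sketched as follows. Because the pseudo-code of Fig. 3 is not reproduced in the text, this is a reconstruction from the description, not the authors' code: we assume the support count of a pattern is the number of route sequences containing it at least once, and the function name `large_transaction_patterns` is hypothetical. With the two sequences of Table 1 and min_sup_count = 2, only the four patterns shared by both sequences survive.

```python
from collections import Counter
from itertools import combinations


def large_transaction_patterns(rd, min_sup_count):
    """Return {(cell, frozenset(z)): support_count} for all large k-transaction patterns.

    rd is a list of route sequences, each a list of (cell, timestamp, itemset).
    """
    counts = Counter()
    # Step 1: derive all k-transaction patterns and their support counts.
    for rs in rd:
        seen = set()                              # patterns present in this sequence
        for cell, _t, itemset in rs:
            items = sorted(itemset)
            for k in range(1, len(items) + 1):    # all non-empty subsets z
                for z in combinations(items, k):
                    seen.add((cell, frozenset(z)))
        counts.update(seen)                       # +1 per sequence, not per occurrence
    # Step 2: keep the large k-transaction patterns.
    return {p: c for p, c in counts.items() if c >= min_sup_count}


sid300 = [("B", 8, {"k11"}), ("G", 9, {"k1"}), ("F", 11, set()), ("K", 24, set()),
          ("O", 25, {"k3"}), ("P", 35, set()), ("L", 37, {"k4"}), ("Q", 39, {"k5"}),
          ("M", 40, set()), ("H", 45, set()), ("D", 46, set()), ("C", 51, set()),
          ("B", 54, {"k12"})]
sid600 = [("B", 7, {"k11"}), ("A", 8, set()), ("F", 21, set()), ("G", 30, {"k1"}),
          ("K", 41, set()), ("O", 44, {"k2", "k3"}), ("K", 51, set()),
          ("G", 54, set()), ("B", 58, {"k12"})]

large = large_transaction_patterns([sid300, sid600], min_sup_count=2)
print(sorted((cell, sorted(z)) for cell, z in large))
# [('B', ['k11']), ('B', ['k12']), ('G', ['k1']), ('O', ['k3'])]
```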


Table 1. A simple route database, RD.

| Sid | Route sequence |
|---|---|
| 300 | <(B, 8, {k11}), (G, 9, {k1}), (F, 11, ∅), (K, 24, ∅), (O, 25, {k3}), (P, 35, ∅), (L, 37, {k4}), (Q, 39, {k5}), (M, 40, ∅), (H, 45, ∅), (D, 46, ∅), (C, 51, ∅), (B, 54, {k12})> |
| 600 | <(B, 7, {k11}), (A, 8, ∅), (F, 21, ∅), (G, 30, {k1}), (K, 41, ∅), (O, 44, {k2, k3}), (K, 51, ∅), (G, 54, ∅), (B, 58, {k12})> |

Fig. 3. Pseudo-code of large-transaction generation algorithm.

Fig. 4. Six route sequences in the RD.


Example 2. Consider the six route sequences in Fig. 4 to illustrate the large-transaction generation phase. If min_sup_count is set to 2, 12 candidate 1-transaction patterns and 2 candidate 2-transaction patterns are found, as shown in Fig. 5(a). If the sup_count of a transaction pattern is less than min_sup_count, the transaction pattern is deleted. Therefore, transaction patterns <P;{k9}>, <Q;{k6}>, and <Q;{k5,k6}> are deleted. Next, the itemsets in all large k-transaction patterns are replaced by unique symbols, as shown in Fig. 5(b). The set of all large k-transaction patterns after symbol replacement, called the large 1-sequential patterns, is summarized in Fig. 5(c).


(a) Candidate transaction patterns (* marks patterns whose Sup_count is below min_sup_count and which are deleted)

Candidate 1-transaction patterns

| Cell | Itemset | Sup_count |
|---|---|---|
| G | {k1} | 6 |
| O | {k2} | 3 |
| O | {k3} | 6 |
| * P | {k9} | 1 |
| L | {k4} | 5 |
| Q | {k5} | 3 |
| * Q | {k6} | 1 |
| M | {k7} | 2 |
| U | {k8} | 2 |
| R | {k10} | 2 |
| B | {k11} | 6 |
| B | {k12} | 6 |

Candidate 2-transaction patterns

| Cell | Itemset | Sup_count |
|---|---|---|
| * Q | {k5,k6} | 1 |
| O | {k2,k3} | 2 |

(b) Symbol replacement

| Cell | Itemset | Large transaction |
|---|---|---|
| G | {k1} | {G;g1} |
| O | {k2} | {O;g2} |
| O | {k3} | {O;g3} |
| L | {k4} | {L;g4} |
| Q | {k5} | {Q;g5} |
| M | {k7} | {M;g6} |
| U | {k8} | {U;g7} |
| R | {k10} | {R;g8} |
| B | {k11} | {B;g9} |
| B | {k12} | {B;g10} |
| O | {k2,k3} | {O;g11} |

(c) Large 1-sequential patterns

| Large transaction | Sup_count |
|---|---|
| <G;g1> | 6 |
| <O;g2> | 3 |
| <O;g3> | 6 |
| <L;g4> | 5 |
| <Q;g5> | 3 |
| <M;g6> | 2 |
| <U;g7> | 2 |
| <R;g8> | 2 |
| <B;g9> | 6 |
| <B;g10> | 6 |
| <O;g11> | 2 |

Fig. 5. Large 1-sequential patterns.
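The symbol replacement of Fig. 5(b) and the summary of Fig. 5(c) can be reproduced from the counts in Fig. 5(a). The text does not say in which order symbols are assigned; the sketch below assumes 1-transaction patterns come before 2-transaction patterns and that ties are broken by item number, an ordering that happens to match Fig. 5(b). `assign_symbols`, `FIG5A`, and `fs` are hypothetical names.

```python
def fs(*items):
    """Shorthand for an itemset."""
    return frozenset(items)


FIG5A = {  # candidate transaction patterns and their Sup_count, from Fig. 5(a)
    ("G", fs("k1")): 6, ("O", fs("k2")): 3, ("O", fs("k3")): 6, ("P", fs("k9")): 1,
    ("L", fs("k4")): 5, ("Q", fs("k5")): 3, ("Q", fs("k6")): 1, ("M", fs("k7")): 2,
    ("U", fs("k8")): 2, ("R", fs("k10")): 2, ("B", fs("k11")): 6, ("B", fs("k12")): 6,
    ("Q", fs("k5", "k6")): 1, ("O", fs("k2", "k3")): 2,
}


def assign_symbols(candidates, min_sup_count):
    """Keep large patterns and give each a unique symbol g1, g2, ..."""
    large = [p for p, count in candidates.items() if count >= min_sup_count]
    # Assumed ordering: shorter itemsets first, then by item number (k1 < k2 < ...).
    large.sort(key=lambda p: (len(p[1]), sorted(int(k[1:]) for k in p[1])))
    return {p: f"g{i}" for i, p in enumerate(large, start=1)}


symbols = assign_symbols(FIG5A, min_sup_count=2)                              # Fig. 5(b)
fig5c = {f"<{cell};{g}>": FIG5A[(cell, z)] for (cell, z), g in symbols.items()}  # Fig. 5(c)

print(len(symbols))                       # 11
print(symbols[("O", fs("k2", "k3"))])     # g11
```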


3.3.2. Large-transaction transformation phase

The large-transaction transformation phase transforms route sequences into maximal large-transaction sequences. Fig. 6 shows the pseudo-code of the large-transaction transformation algorithm. Lines 2–4 initialize the variables String, ML, and Path as empty values. String is a temporary variable storing the on-going string in a buffer; ML represents the on-going maximal large-transaction sequence; Path represents the on-going path of the maximal large-transaction sequence. For each event $(B_i, t_i, \mathit{itemset}_i)$ in route sequence rs, $\mathit{itemset}_i$ might be non-empty or empty. If $\mathit{itemset}_i$ is non-empty (lines 7–14), the algorithm checks, for every non-empty subset z of $\mathit{itemset}_i$, whether $\langle B_i, z \rangle$ exists in the set of large 1-sequential patterns $C'$. If it does, the algorithm appends its unique symbol g to $G_i$, the set of symbols collected for event i. After all subsets z are checked, $\langle (B_i, G_i), t_i \rangle$ is appended to ML and String is appended to Path. Finally, the algorithm sets String to $\langle B_i \rangle$. If $\mathit{itemset}_i$ is empty (lines 16–20), the algorithm checks whether $B_i$ has already been visited, i.e., whether it already appears in String. If not, the algorithm appends $B_i$ to String; otherwise, everything after the first $B_i$ in String is deleted. Through this phase, each record of the form <sid, rs> in RD is transformed into the form <sid, maximal large-transaction sequence, path>, which is stored in the transformed route database TRD.
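The transformation loop can be sketched in Python as below, using the symbols of Fig. 5(b). Two details are inferred rather than stated: the String buffer is appended to Path once more after the last event (Table 2 shows the final B in the path at move 9), and an event whose non-empty itemset contains no large transaction pattern is treated like an empty event. Under these assumptions the sketch reproduces the sid 600 row of Table 2 and the sid 300 path of Table 3; `transform` and `SYMBOLS` are hypothetical names.

```python
from itertools import combinations

# Large 1-sequential patterns C' and their unique symbols (Fig. 5(b)).
SYMBOLS = {
    ("G", frozenset({"k1"})): "g1", ("O", frozenset({"k2"})): "g2",
    ("O", frozenset({"k3"})): "g3", ("L", frozenset({"k4"})): "g4",
    ("Q", frozenset({"k5"})): "g5", ("M", frozenset({"k7"})): "g6",
    ("U", frozenset({"k8"})): "g7", ("R", frozenset({"k10"})): "g8",
    ("B", frozenset({"k11"})): "g9", ("B", frozenset({"k12"})): "g10",
    ("O", frozenset({"k2", "k3"})): "g11",
}


def transform(rs, symbols=SYMBOLS):
    """Turn a route sequence into (maximal large-transaction sequence, path)."""
    buffer, ml, path = [], [], []          # String, ML and Path of the pseudo-code
    for cell, t, itemset in rs:
        items = sorted(itemset)
        g = [symbols[(cell, frozenset(z))]     # G_i: symbols of all large subsets z
             for k in range(1, len(items) + 1)
             for z in combinations(items, k)
             if (cell, frozenset(z)) in symbols]
        if g:                              # event holding a large transaction
            ml.append(((cell, tuple(g)), t))
            path.extend(buffer)            # append String to Path
            buffer = [cell]                # String <- <B_i>
        elif cell not in buffer:           # new region: extend String
            buffer.append(cell)
        else:                              # revisit: cut String after the first B_i
            buffer = buffer[: buffer.index(cell) + 1]
    path.extend(buffer)                    # flush the last String
    return ml, "".join(path)


sid600 = [("B", 7, {"k11"}), ("A", 8, set()), ("F", 21, set()), ("G", 30, {"k1"}),
          ("K", 41, set()), ("O", 44, {"k2", "k3"}), ("K", 51, set()),
          ("G", 54, set()), ("B", 58, {"k12"})]
ml, path = transform(sid600)
print(path)     # BAFGKOKGB
print(ml[2])    # (('O', ('g2', 'g3', 'g11')), 44)
```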

Example 3. Using the large 1-sequential patterns shown in Fig. 5, Table 2 illustrates the operations of the large-transaction transformation phase for route sequence sid 600. The first column is the sequence of movements, the second column is the visited region, the third column is the visiting time, and the fourth column lists the recreation facilities used by the visitor. The fifth column gives the on-going large transaction in the buffer, and the sixth column gives the on-going string in the buffer. The seventh column shows the maximal large-transaction sequence, and the eighth column shows the path of the maximal large-transaction sequence. After a series of transformations, the maximal large-transaction sequence for sid 600 becomes <<(B;g9),7>, <(G;g1),30>, <(O;g2,g3,g11),44>, <(B;g10),58>> and its path is BAFGKOKGB. Through the same process, all route sequences in the RD of Fig. 4 are transformed into maximal large-transaction sequences in the TRD, as shown in Table 3.

3.3.3. Location-Item-Time sequential pattern generation phase

Next, a LIT sequential pattern algorithm is developed to generate all large LIT sequential patterns from the TRD. Similar to Chen et al. [1], the proposed LIT sequential pattern algorithm, called the LIT-PrefixSpan algorithm, is based on the PrefixSpan mining concept. Before introducing the LIT-PrefixSpan algorithm, the following definitions are given.

Definition 3. For a maximal large-transaction sequence $\alpha = (\langle (B_1; z_1), t_1 \rangle, \langle (B_2; z_2), t_2 \rangle, \ldots, \langle (B_n; z_n), t_n \rangle)$ and a Location-Item-Time (LIT) sequence $\beta = (D_1, e_1, D_2, e_2, \ldots, D_{q-1}, e_{q-1}, D_q)$, $\beta$ is said to be contained in $\alpha$, or $\beta$ is a LIT subsequence of $\alpha$, if there exist integers $1 \le j_1 < j_2 < \cdots < j_q \le n$ such that

1. $D_1 = (B_{j_1}; z_{j_1}), D_2 = (B_{j_2}; z_{j_2}), \ldots, D_q = (B_{j_q}; z_{j_q})$;
2. $t_{j_i} - t_{j_{i-1}}$ satisfies the condition of time interval $e_{i-1}$ for $2 \le i \le q$.

Definition 4. $\mathrm{support\_count}_{TRD}(\alpha) = |\{(sid, \text{maximal large-transaction sequence}, path) \mid (sid, \text{maximal large-transaction sequence}, path) \in TRD \wedge \alpha \text{ is contained in the maximal large-transaction sequence}\}|$. A LIT sequence $\alpha$ is called a LIT sequential pattern if the percentage of records in TRD containing $\alpha$ is greater than or equal to a pre-defined minimum support, called min_sup. That is, $\alpha$ is a LIT sequential pattern in TRD if $\mathrm{support\_count}_{TRD}(\alpha) \ge |TRD| \times min\_sup$, or equivalently $\mathrm{support\_count}_{TRD}(\alpha) \ge min\_sup\_count$. A LIT sequence whose length (number of transaction patterns) is l is denoted as an l-LIT sequence.

Definition 5. Given a maximal large-transaction sequence $\alpha = (\langle (B_1; z_1), t_1 \rangle, \langle (B_2; z_2), t_2 \rangle, \ldots, \langle (B_n; z_n), t_n \rangle)$ and a LIT sequence $\beta = (D_1, e_1, D_2, e_2, \ldots, D_{q-1}, e_{q-1}, D_q)$ with $q \le n$, $\beta$ is a LIT prefix of $\alpha$ if and only if (1) $D_i = (B_i; z_i)$ for $1 \le i \le q$, and (2) $t_i - t_{i-1}$ satisfies the condition of $e_{i-1}$ for $1 < i \le q$.
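Definitions 3 and 4 translate into a backtracking containment test plus a count over TRD. The sketch assumes that a pattern $D = (B; g)$ matches an element $\langle (B_j; z_j), t_j \rangle$ when $B = B_j$ and the symbol g belongs to $z_j$, which is how Example 4 treats <O;g2>, <O;g3>, and <O;g11> as separate patterns; gaps are labelled with Eq. (1) and the thresholds of the running example. Backtracking matters because the earliest occurrence of a pattern may violate a later interval constraint. All names (`contains`, `support_count`, `ev`, `TRD`) are hypothetical.

```python
from bisect import bisect_left

T = [10, 20, 30, 40, 50, 60]             # T_1..T_6 from the running example


def disc_ti(dt, thresholds=T):
    """Eq. (1); None when dt falls outside (0, T_r]."""
    if dt <= 0 or dt > thresholds[-1]:
        return None
    return f"I{bisect_left(thresholds, dt) + 1}"


def ev(cell, symbols, t):
    """One element <(B; z), t> of a maximal large-transaction sequence."""
    return ((cell, tuple(symbols.split())), t)


TRD = [  # Table 3, sids 100 to 600
    [ev("B", "g9", 4), ev("G", "g1", 5), ev("O", "g2 g3 g11", 14), ev("L", "g4", 20),
     ev("Q", "g5", 22), ev("M", "g6", 38), ev("U", "g7", 52), ev("B", "g10", 60)],
    [ev("B", "g9", 1), ev("G", "g1", 20), ev("O", "g3", 39), ev("L", "g4", 46),
     ev("Q", "g5", 47), ev("M", "g6", 50), ev("B", "g10", 60)],
    [ev("B", "g9", 8), ev("G", "g1", 9), ev("O", "g3", 25), ev("L", "g4", 37),
     ev("Q", "g5", 39), ev("B", "g10", 54)],
    [ev("B", "g9", 2), ev("G", "g1", 7), ev("O", "g3", 17), ev("L", "g4", 27),
     ev("R", "g8", 46), ev("U", "g7", 53), ev("O", "g2", 56), ev("B", "g10", 60)],
    [ev("B", "g9", 1), ev("G", "g1", 2), ev("O", "g3", 14), ev("L", "g4", 19),
     ev("R", "g8", 40), ev("B", "g10", 60)],
    [ev("B", "g9", 7), ev("G", "g1", 30), ev("O", "g2 g3 g11", 44), ev("B", "g10", 58)],
]


def contains(alpha, beta, thresholds=T):
    """Definition 3: is the LIT sequence beta = (D_1, e_1, ..., D_q) contained in alpha?"""
    pats, gaps = beta[0::2], beta[1::2]

    def match(k, start, prev_t):
        if k == len(pats):
            return True
        cell, g = pats[k]
        for j in range(start, len(alpha)):
            (b, z), t = alpha[j]
            if b != cell or g not in z:
                continue
            if k > 0 and disc_ti(t - prev_t, thresholds) != gaps[k - 1]:
                continue
            if match(k + 1, j + 1, t):      # otherwise try a later occurrence
                return True
        return False

    return match(0, 0, None)


def support_count(trd, beta, thresholds=T):
    """Definition 4: number of TRD records whose sequence contains beta."""
    return sum(contains(alpha, beta, thresholds) for alpha in trd)


beta = (("B", "g9"), "I1", ("G", "g1"))
print(support_count(TRD, beta))          # 4: sids 100, 300, 400 and 500
print(support_count(TRD, beta) >= 2)     # True, a LIT sequential pattern for min_sup_count = 2
```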

Definition 6. Given a maximal large-transaction sequencea = (<(B1;z1), t1>, <(B2;z2), t2>, . . . , <(Bn;zn), tn>) and a LIT sequenceb = (D1,e1,D2,e2, . . . ,Dq�1,eq�1,Dq) (q 6 n) such that b is a subse-quence of a. Let i1 < i2 < � � � <iq be the indexes of the large-transactionpatterns in a that match the large-transaction patterns of b. A subse-quence a0 = (< ðB01; z01Þ; t01 >;< ðB

02; z02Þ; t02 >; . . . ; < ðB0p; z0pÞ; t0p >) of

sequence a, where p = q + n � iq is called a projection of a withrespect to b if and only if (1) b is a LIT prefix of a0 and (2) the last n � iq

Page 6: A Location-Item-Time sequential pattern mining algorithm ...

Fig. 6. Pseudo-code of large-transaction transformation algorithm.

Table 2Process of producing the maximal large-transaction sequence for sid 600.

Move Cell Time Items Large-transaction String Maximal large-transaction sequence Path

1 B 7 k11 <B;g9> B <(B;g9),7> –2 A 8 – – BA – –3 F 21 – – BAF – –4 G 30 k1 <G;g1> G <(B;g9),7>, <(G;g1),30> BAF5 K 41 – – GK – –6 O 44 k2, k3 <O;g2,g3,g11> O <(B;g9),7>, <(G;g1),30>, <(O;g2,g3,g11),44> BAFGK7 K 51 – – OK – –8 G 54 – – OKG – –9 B 58 k12 <B;g10> B <(B;g9),7>, <(G;g1),30>, <(O;g2,g3,g11),44>, <(B;g10),58> BAFGKOKGB

Table 3Transformed route database, TRD.

Sid Maximal large-transaction sequence Path

100 <(B;g9),4>, <(G;g1),5>, <(O;g2,g3,g11),14>, <(L;g4),20>, <(Q;g5),22>, <(M;g6),38>, <(U;g7),52>, <(B;g10),60> BGKOTPLQRMQVUPKGB200 <(B;g9),1>, <(G;g1),20>, <(O;g3),39>, <(L;g4),46>, <(Q;g5),47>, <(M;g6),50>, <(B;g10),60> BAFGKOTPLQRMIHCB300 <(B;g9),8>, <(G;g1),9>, <(O;g3),25>, <(L;g4),37>, <(Q;g5),39>, <(B;g10),54> BGFKOPLQMHDCB400 <(B;g9),2>, <(G;g1),7>, <(O;g3),17>, <(L;g4),27>, <(R;g8),46>, <(U;g7),53>, <(O;g2),56>, <(B;g10),60> BFGKOTPLQRQVUTOKGB500 <(B;g9),1>, <(G;g1),2>, <(O;g3),14>, <(L;g4),19>, <(R;g8),40>, <(B;g10),60> BGKOTPLQRNIDCB600 <(B;g9),7>, <(G;g1),30>, <(O;g2,g3,g11),44>, <(B;g10),58> BAFGKOKGB

102 C.-Y. Tsai, B.-H. Lai / Knowledge-Based Systems 73 (2015) 97–110

large-transaction patterns of α′ are the same as the last $n - i_q$ large-transaction patterns of α.

Definition 7. Let $\alpha' = (\langle (B'_1;z'_1), t'_1\rangle, \langle (B'_2;z'_2), t'_2\rangle, \ldots, \langle (B'_p;z'_p), t'_p\rangle)$ be the projection of α with respect to a LIT prefix $\beta = (D_1, e_1, D_2, e_2, \ldots, D_{q-1}, e_{q-1}, D_q)$. Then $\theta = (\langle (B'_{q+1};z'_{q+1}), t'_{q+1}\rangle, \langle (B'_{q+2};z'_{q+2}), t'_{q+2}\rangle, \ldots, \langle (B'_p;z'_p), t'_p\rangle)$ is the postfix of α with respect to prefix β.
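Definitions 6 and 7 can be sketched in Python as follows. The definitions do not say which occurrence to use when β matches α in several ways, so this sketch assumes the occurrence whose last match $i_q$ is earliest and returns the elements after that position as the postfix; the helper names and the Example 4 interval bounds are assumptions made for illustration.

```python
from bisect import bisect_left

BOUNDS = [10, 20, 30, 40, 50, 60]   # Example 4: I_j = (T_{j-1}, T_j]


def interval_of(dt):
    """1-based index of the interval containing dt, or None if out of range."""
    if dt <= 0 or dt > BOUNDS[-1]:
        return None
    return bisect_left(BOUNDS, dt) + 1


def matches(element, item):
    """A record element (region, large transactions, time) matches item (region, g)."""
    region, gs, _ = element
    return region == item[0] and item[1] in gs


def postfix(seq, items, gaps):
    """Postfix of seq w.r.t. the LIT prefix (items, gaps), or None if it does not occur."""
    # ends: positions where an occurrence of the first k prefix elements can end.
    ends = {p for p, el in enumerate(seq) if matches(el, items[0])}
    for item, gap in zip(items[1:], gaps):
        ends = {p for p, el in enumerate(seq)
                if matches(el, item)
                and any(q < p and interval_of(el[2] - seq[q][2]) == gap for q in ends)}
    if not ends:
        return None                 # beta is not a subsequence, so no projection exists
    return seq[min(ends) + 1:]      # assumption: use the earliest-ending occurrence


# Record 600 of Table 3 and the prefix <B;g9>, I3, <G;g1> (30 - 7 = 23 min lies in I3).
seq600 = [("B", {"g9"}, 7), ("G", {"g1"}, 30),
          ("O", {"g2", "g3", "g11"}, 44), ("B", {"g10"}, 58)]
print(postfix(seq600, [("B", "g9"), ("G", "g1")], [3]))
```

The postfix keeps the O and B elements at times 44 and 58; with I1 instead of I3 the prefix does not occur, and the function returns None.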

The pseudo-code of the proposed LIT-PrefixSpan algorithm is illustrated in Fig. 7. The α-projected database, defined as the collection of postfixes of maximal large-transaction sequences in TRD with respect to α, is denoted as TRD|α. The major difference between LIT-PrefixSpan and I-PrefixSpan is that LIT-PrefixSpan includes both cells and items in each transaction pattern. Therefore, a table LIT_Table is used to store this relation, where each column corresponds to a large-transaction pattern and each row corresponds to a time interval in TI = {I1, I2, ..., Ir}. Each cell LIT_Table(I_i, c′_i) records the number of postfixes in TRD|α that contain the large-transaction pattern c′_i such that the time difference between c′_i and the last large-transaction pattern of α lies within I_i. Processing every postfix in TRD|α sequentially enables LIT_Table to be formed and the frequent cells to be


identified. If the cell LIT_Table(I_i, c′_i) is a frequent cell, (I_i, c′_i) can be appended to α to yield a LIT sequential pattern α′ and to construct the α′-projected database TRD|α′. Recursively discovering the LIT sequential patterns in TRD|α′ finally yields all LIT sequential patterns in TRD.

Example 4. Suppose TI = {I1, I2, I3, I4, I5, I6}, where I1: 0 < Δt ≤ 10, I2: 10 < Δt ≤ 20, I3: 20 < Δt ≤ 30, I4: 30 < Δt ≤ 40, I5: 40 < Δt ≤ 50, and I6: 50 < Δt ≤ 60. Consider the TRD shown in Table 3 with min_sup_count set to 2. At the beginning, α is empty, and the frequent transaction patterns <B;g9>, <G;g1>, <O;g2>, <O;g3>, <O;g11>, <L;g4>, <Q;g5>, <M;g6>, <U;g7>, <R;g8>, and <B;g10> are discovered. Since α is empty, appending each of these frequent transaction patterns to α yields 11 different α′. Table 4 summarizes the LIT sequential pattern mining result. The total number of LIT sequential patterns is 68 (= 11 + 25 + 23 + 8 + 1), since there are 11 1-LIT sequential patterns, 25 2-LIT sequential patterns, and so on.
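The recursive procedure can be sketched in Python as follows, run on Table 3 with the intervals and min_sup_count of Example 4. Rather than copying postfixes, the sketch represents TRD|α by recording, for each record, the positions at which an occurrence of α ends; its LIT_Table maps each cell (I_j, c′) to the records supporting it, so a record is counted at most once per cell. This representation and all identifiers are assumptions of the sketch, not the pseudo-code of Fig. 7. It recovers the supports listed in Table 4, but its per-length totals are not guaranteed to match those reported, because boundary and projection details of Fig. 7 are not reproduced here.

```python
from bisect import bisect_left
from collections import defaultdict

BOUNDS = [10, 20, 30, 40, 50, 60]   # Example 4: I_j = (T_{j-1}, T_j]


def interval_of(dt):
    if dt <= 0 or dt > BOUNDS[-1]:
        return None
    return bisect_left(BOUNDS, dt) + 1


# Table 3: sid -> [(region, large transactions, time)].
TRD = {
    100: [("B", {"g9"}, 4), ("G", {"g1"}, 5), ("O", {"g2", "g3", "g11"}, 14),
          ("L", {"g4"}, 20), ("Q", {"g5"}, 22), ("M", {"g6"}, 38),
          ("U", {"g7"}, 52), ("B", {"g10"}, 60)],
    200: [("B", {"g9"}, 1), ("G", {"g1"}, 20), ("O", {"g3"}, 39), ("L", {"g4"}, 46),
          ("Q", {"g5"}, 47), ("M", {"g6"}, 50), ("B", {"g10"}, 60)],
    300: [("B", {"g9"}, 8), ("G", {"g1"}, 9), ("O", {"g3"}, 25), ("L", {"g4"}, 37),
          ("Q", {"g5"}, 39), ("B", {"g10"}, 54)],
    400: [("B", {"g9"}, 2), ("G", {"g1"}, 7), ("O", {"g3"}, 17), ("L", {"g4"}, 27),
          ("R", {"g8"}, 46), ("U", {"g7"}, 53), ("O", {"g2"}, 56), ("B", {"g10"}, 60)],
    500: [("B", {"g9"}, 1), ("G", {"g1"}, 2), ("O", {"g3"}, 14), ("L", {"g4"}, 19),
          ("R", {"g8"}, 40), ("B", {"g10"}, 60)],
    600: [("B", {"g9"}, 7), ("G", {"g1"}, 30), ("O", {"g2", "g3", "g11"}, 44),
          ("B", {"g10"}, 58)],
}


def lit_prefixspan(trd, min_count):
    """Return {pattern: support}; a pattern alternates (region, g) items and interval indexes."""
    patterns = {}

    def grow(prefix, ends):
        # ends: sid -> positions where an occurrence of `prefix` ends (stands in for TRD|prefix).
        # LIT_Table: (interval j, item c') -> {sid: positions where c' extends the prefix}.
        table = defaultdict(lambda: defaultdict(set))
        for sid, positions in ends.items():
            seq = trd[sid]
            for q in positions:
                for p in range(q + 1, len(seq)):
                    region, gs, t = seq[p]
                    j = interval_of(t - seq[q][2])
                    if j is not None:
                        for g in gs:
                            table[(j, (region, g))][sid].add(p)
        for (j, item), proj in table.items():
            if len(proj) >= min_count:          # frequent cell: append (I_j, c')
                new_prefix = prefix + (j, item)
                patterns[new_prefix] = len(proj)
                grow(new_prefix, proj)          # recurse on the projected records

    first = defaultdict(lambda: defaultdict(set))   # candidate 1-LIT items
    for sid, seq in trd.items():
        for p, (region, gs, _) in enumerate(seq):
            for g in gs:
                first[(region, g)][sid].add(p)
    for item, proj in first.items():
        if len(proj) >= min_count:
            patterns[(item,)] = len(proj)
            grow((item,), proj)
    return patterns


patterns = lit_prefixspan(TRD, 2)
print(patterns[(("B", "g9"), 6, ("B", "g10"))])   # 5
```

Tracking every end position, rather than only the first match, matters here because different occurrences of the same prefix can give different time differences and therefore fall into different LIT_Table rows.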

3.4. Route recommendation procedure

When a visitor requests a route suggestion, he/she is asked to enter personal preferences into the route recommendation system at the kiosk. The visitor's preference can be represented as a VP vector:

$$VP = \langle ITVT, \langle FR_1, FItems_1, IRVT_1\rangle, \langle FR_2, FItems_2, IRVT_2\rangle, \ldots \rangle \tag{2}$$

where ITVT is the intended total visiting time, $FR_i$ is favorite region i, $FItems_i$ is the set of favorite facilities in $FR_i$, and $IRVT_i$ is the intended visiting time in $FR_i$. For example, VP = <420, <G,{k1},90>, <O,{k2,k3},120>> indicates that a visitor intends to spend 420 min in the theme park. In addition, he/she would like to spend 90 min in region G and take recreation facility k1 there, and 120 min in region O and take recreation facilities k2 and k3 there. Note that the more information a visitor enters, the better the suggestion can match his/her preferences.

Fig. 7. Pseudo-code of the LIT-PrefixSpan algorithm.

3.4.1. Time constraint

The number of LIT sequential patterns generated by the LIT-PrefixSpan algorithm might be large. However, a LIT sequential pattern is a candidate LIT route only if it satisfies the following rules. First, the pattern must include the entrance and the exit. Second, the pattern must satisfy the time constraint provided by the visitor. As mentioned in Section 3.2, a time interval Δt can be mapped to one of the elements in the set of discrete time intervals TI = {I1, I2, ..., Ir} according to Eq. (1). Therefore, the lower and upper bounds of a time interval $I_j$ are derived using Eqs. (3) and (4), respectively.

$$f_{LB}(I_j) = \begin{cases} 0, & \text{if } j = 1 \\ T_{j-1}, & \text{if } 1 < j \le r \end{cases} \tag{3}$$

$$f_{UB}(I_j) = \begin{cases} T_1, & \text{if } j = 1 \\ T_j, & \text{if } 1 < j \le r \end{cases} \tag{4}$$

Let a LIT sequential pattern β be represented as $(D_1, e_1, D_2, e_2, \ldots, D_{q-1}, e_{q-1}, D_q)$. The total visiting time of β can be represented as $VT_\beta = (VT_\beta^{LB}, VT_\beta^{UB}]$, where the lower bound of $VT_\beta$ is derived as:


Table 4. LIT sequential pattern mining result.

| k | Number of patterns | k-LIT sequential patterns | Sup_Count |
|---|---|---|---|
| 1 | 11 | <B;g9> | 6 |
| | | <G;g1> | 6 |
| | | ⋯ | ⋯ |
| | | <B;g10> | 6 |
| 2 | 25 | <B;g9>, I6, <B;g10> | 5 |
| | | <B;g9>, I2, <O;g3> | 3 |
| | | ⋯ | ⋯ |
| | | <R;g8>, I2, <B;g10> | 2 |
| 3 | 23 | <B;g9>, I1, <G;g1>, I6, <B;g10> | 3 |
| | | <B;g9>, I1, <G;g1>, I2, <O;g3> | 2 |
| | | ⋯ | ⋯ |
| | | <L;g4>, I1, <Q;g5>, I2, <B;g10> | 2 |
| 4 | 8 | <B;g9>, I1, <G;g1>, I4, <R;g8>, I2, <B;g10> | 2 |
| | | <B;g9>, I1, <G;g1>, I1, <O;g3>, I4, <U;g7> | 2 |
| | | ⋯ | ⋯ |
| | | <G;g1>, I3, <L;g4>, I1, <Q;g5>, I2, <B;g10> | 2 |
| 5 | 1 | <B;g9>, I1, <G;g1>, I1, <O;g3>, I4, <U;g7>, I1, <B;g10> | 2 |


$$VT_\beta^{LB} = \sum_{s=1}^{q-1} f_{LB}(e_s) \tag{5}$$

and the upper bound of $VT_\beta$ is defined as:

$$VT_\beta^{UB} = \sum_{s=1}^{q-1} f_{UB}(e_s) \tag{6}$$

If $VT_\beta^{LB} \le ITVT$ and $VT_\beta^{UB} \ge ITVT$, we say that LIT sequential pattern β satisfies the visitor's time constraint, where ITVT is the visitor's intended total visiting time in Eq. (2).

Example 5. Suppose TI = {I1, I2, I3, I4}, where I1: 0 < Δt ≤ 30, I2: 30 < Δt ≤ 60, I3: 60 < Δt ≤ 90, and I4: 90 < Δt ≤ 120, and five LIT sequential patterns are shown in the first three columns of Table 5. According to Eqs. (3)–(6), the total visiting time of each pattern is derived in the last column of Table 5. If a visitor's intended total visiting time ITVT is 320 min, LIT sequential patterns 1, 2, and 3 are considered candidate routes, whereas patterns 4 and 5 do not satisfy the visitor's time constraint because their upper bounds (300 and 210 min) are below 320 min.
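Eqs. (3)–(6) and the time-constraint test reduce to a few lines of Python. The sketch below encodes each interval $I_j$ by its index j and stores only the upper bounds $T_1, \ldots, T_r$ of Example 5 (an encoding chosen here for illustration); it recomputes the last column of Table 5 and selects the candidates for ITVT = 320 min.

```python
# Example 5: upper bounds T_1..T_4 of I_1..I_4 (minutes).
T = [30, 60, 90, 120]


def f_lb(j):
    """Eq. (3): lower bound of interval I_j (j is 1-based)."""
    return 0 if j == 1 else T[j - 2]


def f_ub(j):
    """Eq. (4): upper bound of interval I_j (T_1 for j = 1, T_j otherwise)."""
    return T[j - 1]


def visiting_time(gaps):
    """Eqs. (5)-(6): (VT_LB, VT_UB] for a pattern with intervals e_s = I_gaps[s]."""
    return sum(f_lb(j) for j in gaps), sum(f_ub(j) for j in gaps)


def satisfies_time(gaps, itvt):
    lb, ub = visiting_time(gaps)
    return lb <= itvt <= ub


# Interval indexes e_1..e_{q-1} of the five patterns in Table 5.
TABLE5 = {1: [1, 2, 4, 1, 4], 2: [1, 3, 4, 4], 3: [1, 3, 4, 1, 2],
          4: [1, 1, 4, 4], 5: [1, 3, 1, 2]}

candidates = [no for no, gaps in TABLE5.items() if satisfies_time(gaps, 320)]
print(candidates)  # [1, 2, 3]
```

For example, pattern 1 uses I1, I2, I4, I1, I4, so its bounds are 0 + 30 + 90 + 0 + 90 = 210 and 30 + 60 + 120 + 30 + 120 = 360, matching (210, 360] in Table 5.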

3.4.2. Similarity measurement

The similarity between VP = <ITVT, <FR_1, FItems_1, IRVT_1>, <FR_2, FItems_2, IRVT_2>, ...> and a candidate LIT route β = ((B_1;z_1), e_1, (B_2;z_2), e_2, ..., (B_{q−1};z_{q−1}), e_{q−1}, (B_q;z_q)) is evaluated based on the following concepts. First, the intended visiting time for each region i, $IRVT_i$, in VP is mapped to one of the elements in TI = {I1, I2, ..., Ir} according to Eq. (1). Second, when conducting the similarity evaluation, <FR_i, FItems_i, IRVT_i> in VP and <(B_j;z_j), e_j> in β are treated as comparison units. Third, the similarity between <FItems_i, IRVT_i> and <z_j, e_j> is evaluated only if $FR_i$ and $B_j$ are the same region. Based on these concepts, the similarity between the ith unit in VP and the jth unit in β is defined as:

$$Sim_{i,j} = \begin{cases} w_1 \cdot 1 + w_2 \cdot ISim(i,j) + w_3 \cdot TSim(i,j), & \text{if } FR_i = B_j \\ 0, & \text{if } FR_i \ne B_j \end{cases} \tag{7}$$

where w1, w2, and w3 are the importance degrees for the region, facility, and time-interval considerations, respectively, and w1 + w2 + w3 = 1. ISim(i, j) is the itemset similarity between $FItems_i$ and $z_j$, defined as:

$$ISim(i,j) = |FItems_i \cap z_j| \,/\, |FItems_i| \tag{8}$$

where ∩ is the set intersection operator and |·| is the cardinality of a set. In addition, TSim(i, j) is the time-interval similarity between $IRVT_i$ and $e_j$, defined as:

$$TSim(i,j) = 1 - |f(IRVT_i) - f(e_j)| \,/\, f(I_r) \tag{9}$$

where |·| is the absolute value operator and $f(I_b)$ is the rank of time interval $I_b$ in TI, defined as $f(I_b) = b$ for b = 1, ..., r. With Eq. (7), the similarity between VP and β is defined as:

$$Sim(VP,\beta) = \left( \sum_{i=1}^{|VP|} \sum_{j=1}^{|\beta|} Sim_{i,j} \right) \Big/ |VP| \tag{10}$$

where |·| is the length of a sequence (for VP, the number of favorite-region units). After the similarities between VP and all candidate routes are derived, the routes are sorted in decreasing order of similarity and returned to the kiosk as suggested routes. If more than one candidate route has the same similarity value, the route with the larger total number of facilities is ranked higher.

Example 6. Assume LIT sequential patterns 1, 2, and 3 in Example 5 are candidate LIT routes and the preference of a visitor is VP = <300, <O,{k3},70>, <Q,{k5,k6},100>>. According to the discrete time-interval definition in Example 5, VP is transformed into <300, <O,{k3},I3>, <Q,{k5,k6},I4>>. With w1 = w2 = w3 = 1/3, for candidate LIT route #1 we have Sim_{1,1} = 1/3 + 1/3 × (1/1) + 1/3 × (1 − |f(I3) − f(I2)|/f(I4)) = 11/12; Sim_{1,2} = 0; Sim_{1,3} = 0; Sim_{1,4} = 0; Sim_{2,1} = 0; Sim_{2,2} = 0; Sim_{2,3} = 1/3 + 1/3 × (1/2) + 1/3 × (1 − |f(I4) − f(I1)|/f(I4)) = 7/12; and Sim_{2,4} = 0. Hence, the total similarity between VP and candidate LIT route #1 is ((11/12 + 0 + 0 + 0) + (0 + 0 + 7/12 + 0))/2 = 0.75. With a similar process, Sim(VP, #2) = 0.458 and Sim(VP, #3) = 0.75. Because candidate LIT routes #1 and #3 have the same total similarity score, their total numbers of facilities are compared: route #1 has 4 facilities and route #3 has 5. Therefore, candidate LIT route #3 is ranked first. Table 6 shows the final ranking result for the three candidate LIT routes. Based on candidate LIT route #3 and its path, the route recommendation system suggests that the visitor pass entrance k11 in region B. After time-interval I1, the visitor is suggested to move to region G and take k1. Then, after time-interval I3, he/she is suggested to take k4 in region L, and so on.
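The following Python sketch evaluates Eqs. (7)–(10) and the tie-breaking rule with exact fractions, using the VP and the three candidate routes of Example 6. As in the example's indexing, route units omit the entrance and exit (which contribute nothing here, since region B is not a favorite region), and each unit is stored as (region, item set, interval index); this encoding and the function names are assumptions.

```python
from fractions import Fraction as F

R = 4                      # Example 5 has four intervals, so f(I_r) = 4
EQUAL = (F(1, 3),) * 3     # w1 = w2 = w3 = 1/3


def unit_sim(pref, unit, w=EQUAL):
    """Eq. (7) for a VP unit (FR_i, FItems_i, f(IRVT_i)) and a route unit (B_j, z_j, f(e_j))."""
    fr, fitems, irvt = pref
    region, z, e = unit
    if fr != region:
        return F(0)
    isim = F(len(fitems & z), len(fitems))      # Eq. (8): |FItems_i ∩ z_j| / |FItems_i|
    tsim = 1 - F(abs(irvt - e), R)              # Eq. (9) with f(I_b) = b
    return w[0] + w[1] * isim + w[2] * tsim


def route_sim(vp, units, w=EQUAL):
    """Eq. (10): sum of all unit similarities divided by |VP|."""
    return sum(unit_sim(p, u, w) for p in vp for u in units) / len(vp)


def facilities(units):
    return sum(len(z) for _, z, _ in units)


# Example 6: 70 min -> I3 and 100 min -> I4.
VP = [("O", {"k3"}, 3), ("Q", {"k5", "k6"}, 4)]
ROUTES = {
    1: [("O", {"k3"}, 2), ("L", {"k4"}, 4), ("Q", {"k6"}, 1), ("M", {"k7"}, 4)],
    2: [("G", {"k1"}, 3), ("O", {"k2", "k3"}, 4), ("L", {"k4"}, 4)],
    3: [("G", {"k1"}, 3), ("L", {"k4"}, 4), ("Q", {"k6"}, 1), ("O", {"k2", "k3"}, 2)],
}

# Sort by similarity, breaking ties by the larger number of facilities.
ranking = sorted(ROUTES, key=lambda k: (route_sim(VP, ROUTES[k]), facilities(ROUTES[k])),
                 reverse=True)
print(ranking)  # [3, 1, 2]
```

Exact fractions keep the tie between routes #1 and #3 (both 3/4) exact, so the facility count, rather than floating-point noise, decides their order.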

4. Implementation and experiment results

The proposed route recommendation system is implemented in C# and tested on a PC with a Core i5 2.80 GHz CPU and 4 GB of memory.

4.1. Case description and route generator

In this study, a simplified theme park is used as an example to illustrate the feasibility of the proposed system. As shown in Fig. 8, there are seven thematic regions and thirty-four recreation facilities (k1 to k34). For example, thematic region B contains facilities k1, k2, and k3, while thematic region H contains facilities k31, k32, k33, and k34. To simulate visiting behavior, a route generator is developed. In the generator, visitors start their visit at the entrance (k35) and finish at the exit (k36). The regions that visitors pass through must be adjacent. The total visiting time of a route sequence is randomly drawn from a uniform distribution of up to 780 (min), since the park operates from 9:00 a.m. to 10:00 p.m. The time a visitor takes to move to the next region is randomly generated from a uniform distribution between 15 (min) and 30 (min). In addition, the time a visitor spends


Table 5. Five LIT sequential patterns.

| No. | LIT sequential pattern | Path | Total visiting time |
|---|---|---|---|
| 1 | <B;k11>, I1, <O;k3>, I2, <L;k4>, I4, <Q;k6>, I1, <M;k7>, I4, <B;k12> | BAKOKLQMHCB | (210, 360] |
| 2 | <B;k11>, I1, <G;k1>, I3, <O;k2,k3>, I4, <L;k4>, I4, <B;k12> | BGKOTPLHCB | (240, 360] |
| 3 | <B;k11>, I1, <G;k1>, I3, <L;k4>, I4, <Q;k6>, I1, <O;k2,k3>, I2, <B;k12> | BGLQPOKFB | (180, 330] |
| 4 | <B;k11>, I1, <G;k1>, I1, <O;k3>, I4, <U;k8>, I4, <B;k12> | BGKOTUQLHCB | (180, 300] |
| 5 | <B;k11>, I1, <G;k1>, I3, <L;k4>, I1, <Q;k5>, I2, <B;k12> | BGLQMHCB | (90, 210] |

Table 6. Three LIT candidate routes and their rankings.

| No. | Candidate LIT route | Path | Total visiting time | Total similarity score | Total facility number | Final rank |
|---|---|---|---|---|---|---|
| 3 | <B;k11>, I1, <G;k1>, I3, <L;k4>, I4, <Q;k6>, I1, <O;k2,k3>, I2, <B;k12> | BGLQPOKFB | (180, 330] | 0.75 | 5 | 1 |
| 1 | <B;k11>, I1, <O;k3>, I2, <L;k4>, I4, <Q;k6>, I1, <M;k7>, I4, <B;k12> | BAKOKLQMHCB | (210, 360] | 0.75 | 4 | 2 |
| 2 | <B;k11>, I1, <G;k1>, I3, <O;k2,k3>, I4, <L;k4>, I4, <B;k12> | BGKOTPLHCB | (240, 360] | 0.458 | 4 | 3 |


taking a recreation facility is randomly generated from a uniform distribution between 30 (min) and 90 (min).

According to tourism reports, the five must-visit recreation facilities are k4, k12, k13, k25, and k26, and the seven popular facilities are k2, k6, k17, k22, k23, k24, and k32. Therefore, if a generated route sequence does not contain any of the five must-visit recreation facilities, the route is discarded. Likewise, if a generated route sequence does not contain any of the seven popular recreation facilities, it is discarded with a probability of 80%. In addition, the average number of visitors to the theme park is 26,000 per day. Therefore, 26,000 route sequences are generated to simulate the visiting behaviors of visitors in one day.
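The route generator can be sketched in Python as below. The text does not give the adjacency of Fig. 8 or how a route returns to the exit, so the four-region layout (ADJ, FACILITIES), the one-facility-per-visit rule, and the rule of walking back to region A along a shortest path once the sampled time budget is used up are all hypothetical; the uniform ranges (0–780, 15–30, and 30–90 min) and the two discard rules follow the description above.

```python
import random
from collections import deque

# Hypothetical four-region layout; the real adjacency of Fig. 8 is not listed in the text.
ADJ = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B", "D"], "D": ["B", "C"]}
FACILITIES = {"A": [], "B": ["k1", "k2", "k3"], "C": ["k4"], "D": ["k12", "k13"]}
MUST_VISIT = {"k4", "k12", "k13", "k25", "k26"}
POPULAR = {"k2", "k6", "k17", "k22", "k23", "k24", "k32"}


def path_home(region):
    """Shortest chain of adjacent regions from `region` to A (breadth-first search)."""
    prev, queue = {region: None}, deque([region])
    while queue:
        cur = queue.popleft()
        if cur == "A":
            break
        for nxt in ADJ[cur]:
            if nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    path, cur = [], "A"
    while cur != region:          # follow predecessors back from A
        path.append(cur)
        cur = prev[cur]
    return path[::-1]             # regions after `region`, ending with A


def generate_route(rng, park_minutes=780):
    """Return [(region, facility or None, clock time)], or None if discarded."""
    budget = rng.uniform(0, park_minutes)        # total visiting time ~ U(0, 780)
    region, t = "A", 0.0
    route = [("A", "k35", t)]                    # start at the entrance k35
    while t < budget:
        region = rng.choice(ADJ[region])         # only adjacent regions are reachable
        t += rng.uniform(15, 30)                 # moving time ~ U(15, 30)
        facility = None
        if FACILITIES[region]:                   # assumption: one facility per visit
            facility = rng.choice(FACILITIES[region])
            t += rng.uniform(30, 90)             # facility time ~ U(30, 90)
        route.append((region, facility, t))
    for step in path_home(region):               # assumption: walk back to A when time is up
        t += rng.uniform(15, 30)
        route.append((step, None, t))
    route.append(("A", "k36", t))                # leave through the exit k36
    taken = {f for _, f, _ in route if f}
    if not taken & MUST_VISIT:
        return None                              # no must-visit facility: discard
    if not taken & POPULAR and rng.random() < 0.8:
        return None                              # no popular facility: discard w.p. 0.8
    return route


rng = random.Random(7)
routes = [r for r in (generate_route(rng) for _ in range(1000)) if r is not None]
```

Passing a seeded `random.Random` instance keeps a simulated day reproducible; generating the 26,000 sequences described above would simply repeat the call until that many routes survive the discard rules.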

4.2. Route recommendation

Before executing the proposed LIT-PrefixSpan mining procedure, the minimum support and the set of discrete time intervals must be determined. For simplicity, the time intervals in this study are set to an equal length of 40 min and the minimum support is set to 0.02%. That is, the set of discrete time intervals is TI = {I1, I2, I3, ..., I20}, where I1: 0 < Δt ≤ 40, I2: 40 < Δt ≤ 80, I3: 80 < Δt ≤ 120, ..., I20: 760 < Δt ≤ 800. Based on these settings, 380,735 LIT sequential patterns are discovered by the LIT-PrefixSpan mining procedure.

Assume a new visitor intends to spend 420 min (7 h) and wishes to take recreation facility {k12} in region D and recreation facility {k22} in region F. In addition, he/she wishes to spend 150 min in region D and 120 min in region F. Thus, the visitor preference VP is <420, <D,{k12},150>, <F,{k22},120>>. The importance degrees for region (w1), facility (w2), and time interval (w3) in Eq. (7) are all set to 1/3. Based on the set of discrete time intervals TI, the total visiting time ($VT_\beta$) of each LIT sequential pattern can be calculated. After deleting the sequential patterns that do not contain the entrance and exit, as well as the patterns that do not satisfy the time constraint, 5471 candidate LIT routes remain. Table 7 shows the ranking information of the candidate LIT routes derived by the route recommendation generation module.
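As a consistency check on the interval settings, the Python sketch below recomputes the total visiting time of several Table 7 routes from Eqs. (5)–(6). It assumes equal 40-min intervals, which is what I20: 760 < Δt ≤ 800 and the bounds (360, 520], (360, 480], and (400, 480] in Table 7 imply; under that assumption this visitor's 150 min in region D and 120 min in region F map to I4 and I3. The function names are hypothetical.

```python
from math import ceil

WIDTH, R = 40, 20   # assumed: I_j = (40(j-1), 40j] for j = 1..20, so I_20 = (760, 800]


def to_interval(minutes):
    """Index j of the interval I_j that contains a positive duration."""
    if not 0 < minutes <= WIDTH * R:
        raise ValueError("duration outside I_1..I_20")
    return ceil(minutes / WIDTH)


def visiting_time(gaps):
    """Eqs. (5)-(6) with f_LB(I_j) = 40(j-1) and f_UB(I_j) = 40j."""
    return sum(WIDTH * (j - 1) for j in gaps), sum(WIDTH * j for j in gaps)


print(to_interval(150), to_interval(120))      # 4 3
# Interval sequences of the routes ranked 1, 4, and 5471 in Table 7.
for rank, gaps in ((1, [1, 4, 4, 4]), (4, [3, 4, 5]), (5471, [1, 11])):
    lb, ub = visiting_time(gaps)
    print(rank, (lb, ub), lb <= 420 <= ub)     # time constraint with ITVT = 420 min
```

All three routes satisfy the 420-min constraint, which is why they survive as candidates even though the route ranked 5471 has a similarity score of 0.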

Fig. 9 shows the top-ranked visiting route. The recommendation system suggests that the visitor start the trip from the entrance in region A. Within 40 min (time-interval I1), the visitor is suggested to take recreation facility k2 in region B. After 120–160 min (time-interval I4), the system suggests that the visitor take k12 and k13 in region D. Again, after 120–160 min (time-interval I4), the visitor is suggested to take k22 in region F. Finally, after 120–160 min (time-interval I4), the visitor is suggested to leave the theme park from the exit in region A by passing through regions C, B, and A sequentially.

To validate the proposed route recommendation module, the different visitor preferences shown in Table 8 are tested. Case I is the case introduced above and serves as the benchmark. For Case II, a shorter intended total visiting time (300 min) is entered; as expected, fewer recreation facilities are suggested. Fig. 10(b) shows the suggested route <A;k35>, I1, <D;k12,k13>, I4, <F;k22>, I4, <A;k36> with the path ABDFCBA. For Case III, the visitor only specifies taking k12 in region D and spending 150 min in region D. Since fewer constraints are provided in Case III, the similarity between the visitor's preference and many candidate routes is 1. Fig. 10(c) shows one of these candidate routes, <A;k35>, I1, <B;k1,k2,k3>, I6, <D;k12,k13>, I4, <A;k36>, suggested by the system. For Case IV, the intended total visiting time is the same as in Case I, but the other preferences are different. Fig. 10(d) shows that the route recommendation system suggests three recreation facilities (k3, k12, k32) across three regions (B, D, H) for Case IV.

4.3. Experimental designs

In the proposed route recommendation system, different parameter settings might affect the final suggestion results. Therefore, a set of experiments is conducted to observe the effects of these parameters. Unless otherwise noted, the parameter settings and visitor preference are the same as in Case I in Section 4.2.

4.3.1. Discussion of data size

As discussed in Section 3.3, the LIT-PrefixSpan mining procedure module consists of three major phases: the large-transaction generation phase (Phase I), the large-transaction transformation phase (Phase II), and the Location-Item-Time sequential pattern generation phase (Phase III). To observe how the number of route sequences (data size) affects the LIT-PrefixSpan mining procedure module, the data size is varied from 10,000 to 26,000. Table 9 summarizes the execution time of each phase in the LIT-PrefixSpan mining procedure module. As the number of route sequences increases, the execution time of the module grows roughly linearly, although the 13,000-sequence run (7243.01 s) is faster than the 10,000-sequence run (8241.55 s). In addition, the execution time of Phase III is far longer than that of the other two phases. Although the LIT-PrefixSpan mining procedure module takes considerable time to execute, it typically needs to run only daily or weekly rather than on every request.

4.3.2. Discussion of minimum support

To examine how the minimum support in the LIT-PrefixSpan mining procedure module affects the results, a range of minimum supports


Fig. 8. Layout of the implementation example.

Table 7. Ranking information of each candidate LIT route.

| Ranking | Candidate LIT route | Total visiting time (min) | Total similarity score | Total facility number | Sup. | Path |
|---|---|---|---|---|---|---|
| 1 | <A;k35>, I1, <B;k2>, I4, <D;k12,k13>, I4, <F;k22>, I4, <A;k36> | (360, 520] | 0.991667 | 4 | 5 | ABDFCBA, ABDFCBA, ABDFCBA, ABDFCBA, ABDFCBA |
| 2 | <A;k35>, I1, <B;k2>, I4, <D;k12>, I4, <F;k22>, I4, <A;k36> | (360, 520] | 0.991667 | 3 | 5 | ABDFCBA, ABDFCBA, ABDFCBA, ABDFCBA, ABDFCBA |
| 3 | <A;k35>, I1, <B;k2>, I3, <D;k12,k13>, I4, <F;k22>, I5, <A;k36> | (360, 520] | 0.983333 | 4 | 5 | ABDFCBA, ABDFCBA, ABDFGDBA, ABDFCBA, ABDFCBA |
| 4 | <A;k35>, I3, <D;k12,k13>, I4, <F;k22>, I5, <A;k36> | (360, 480] | 0.983333 | 3 | 8 | ABDFCBA, ABDFGDBA, ⋯, ABDFCBA |
| ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ |
| 5471 | <A;k35>, I1, <B;k1>, I11, <A;k36> | (400, 480] | 0 | 1 | 522 | ABDEBA, ABDCBA, ⋯, ABDEHEBA |


from 0.02% to 0.2% is tested. Fig. 11(a) shows the number of LIT sequential patterns generated by the LIT-PrefixSpan mining procedure module, and Fig. 11(b) shows the number of candidate LIT routes in the route recommendation generation module under different minimum supports. When the minimum support increases, both the number of LIT sequential patterns


Fig. 9. Visiting sequence recommendation based on visitor’s preference.


and the number of candidate LIT routes decrease. If the minimum support is set to 0.02%, the first module generates 380,735 LIT sequential patterns and the second module generates 5471 candidate LIT routes. However, if the minimum support is 0.2%, only 14,831 LIT sequential patterns are generated by the first module, and only 108 candidate LIT routes remain in the second module. Therefore, based on the observations in Fig. 11, a minimum support of 0.02% is suggested for this case, since a higher threshold leaves far fewer candidate routes from which to match visitors' preferences.
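Because Definition 4 compares a support count with |TRD| × min_sup, it helps to see what the two ends of this experiment mean in absolute terms, assuming the full one-day set of 26,000 route sequences. The hypothetical helper below uses exact fractions so that the threshold is not affected by binary floating-point rounding.

```python
from fractions import Fraction
from math import ceil


def min_sup_count(num_records, min_sup):
    """Smallest integer count satisfying count >= |TRD| x min_sup (Definition 4).

    min_sup is converted to an exact Fraction via its decimal string, so
    26,000 x 0.0002 is exactly 26/5 rather than a nearby binary float.
    """
    return ceil(num_records * Fraction(str(min_sup)))


for s in (0.0002, 0.002):   # 0.02% and 0.2%
    print(f"{s:.2%} of 26,000 -> at least {min_sup_count(26_000, s)} route sequences")
```

Under these assumptions, a LIT sequential pattern must appear in at least 6 simulated routes at 0.02% but in at least 52 at 0.2%, which helps explain why the number of patterns and candidate routes drops so sharply.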

Fig. 12 shows the execution time of the LIT-PrefixSpan mining procedure module and the route recommendation generation module under different minimum supports. As the minimum support increases, the execution time of both modules decreases. Note that the execution time of the route recommendation generation module is 1.27 s when the minimum support is 0.2%, which should be acceptable for visitors making online recommendation requests.

4.3.3. Discussion of time-interval range

To observe how the time-interval range affects the proposed route recommendation system, time-interval ranges from 10 min to 120 min are tested. Fig. 13(a) summarizes the number of LIT sequential patterns generated by the LIT-PrefixSpan mining procedure module and Fig. 13(b) summarizes the

Table 8. Different visitors' preference settings.

| Case | ITVT (min) | <FR_i, FItems_i, IRVT_i> |
|---|---|---|
| I | 420 | <D,{k12},150>, <F,{k22},120> |
| II | 300 | <D,{k12},150>, <F,{k22},120> |
| III | 420 | <D,{k12},150> |
| IV | 420 | <H,{k32},120>, <B,{k1},90> |

number of candidate LIT routes generated by the route recommendation generation module. As shown in Fig. 13, when the time-interval range increases, both the number of LIT sequential patterns and the number of candidate LIT routes increase. That is, if the time-interval range is large, the times between events are more likely to fall into the same time interval. Thus, it is easier to satisfy the minimum support threshold, and many identical LIT sequential patterns are generated. For example, assume that there are ten route sequences of <(D,28,{k12}), (A,45,{k35})> and ten route sequences of <(D,40,{k12}), (A,85,{k35})>. If the time-interval range is set to 30 min, two different LIT sequential patterns, <(D;k12), I1, (A;k35)> and <(D;k12), I2, (A;k35)>, are found, where I1: 0–30 and I2: 30–60. However, if the time-interval range is set to 60 min, both groups yield the same LIT sequential pattern <(D;k12), I1, (A;k35)>, since I1: 0–60. Based on the observations in Fig. 13, a time-interval range between 40 min and 60 min is suggested to ensure the quality of the suggested routes.
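The merging effect in this example can be reproduced with a few lines of Python. The sketch counts the support of each 2-LIT pattern for equal-width intervals $I_j = (w(j-1), wj]$; the data are the twenty sequences of the example, and the function name is hypothetical.

```python
from collections import Counter
from math import ceil

# The example: ten copies of each two-element route sequence (region, time, item).
SEQS = ([[("D", 28, "k12"), ("A", 45, "k35")]] * 10
        + [[("D", 40, "k12"), ("A", 85, "k35")]] * 10)


def two_lit_patterns(seqs, width):
    """Support of each 2-LIT pattern when I_j = (width*(j-1), width*j]."""
    counts = Counter()
    for (r1, t1, x1), (r2, t2, x2) in seqs:
        j = ceil((t2 - t1) / width)          # gaps of 17 and 45 min
        counts[((r1, x1), j, (r2, x2))] += 1
    return counts


for width in (30, 60):
    print(width, dict(two_lit_patterns(SEQS, width)))
```

With 30-min intervals the two groups yield two patterns with support 10 each; with 60-min intervals they merge into one pattern with support 20. Under a hypothetical min_sup_count of 15, only the wider intervals would produce a frequent pattern.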

4.3.4. Discussion of w1, w2, and w3

In Eq. (7), w1, w2, and w3 are the importance degrees for the region, facility, and time-interval considerations, respectively. To observe how the importance degree values affect the route ranking, three more experiments are conducted. As shown in Table 10, regardless of how the importance degree values are changed, the top four candidate LIT routes remain the same. The reason is that the region comparison is conducted first, according to the third rule of the similarity measurement design in Section 3.4.2. That is, if the region in the VP vector is not the same as the region in the candidate LIT route, the similarity between the facilities and time intervals of the two regions is not counted. This design reduces the effect of the importance degree values on the proposed system. Based on the observations from Table 10, the importance degree for

Fig. 10. Route recommendation results: (a) Case I, (b) Case II, (c) Case III, (d) Case IV.

Table 9. Execution time (in seconds) of each phase in the LIT-PrefixSpan mining procedure module, by number of route sequences.

| Phase | 10,000 | 13,000 | 16,000 | 19,000 | 22,000 | 26,000 |
|---|---|---|---|---|---|---|
| Phase I | 0.58 | 0.74 | 0.89 | 1.16 | 1.22 | 1.44 |
| Phase II | 0.96 | 1.09 | 1.30 | 1.66 | 1.82 | 2.08 |
| Phase III | 8240.01 | 7241.19 | 10563.63 | 11088.18 | 15220.72 | 17285.32 |
| Total | 8241.55 | 7243.01 | 10565.82 | 11090.99 | 15223.77 | 17288.83 |


region, facility, and time interval is suggested to be 1/3, 1/3, and 1/3, respectively, in this study.
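The insensitivity to the importance degrees can be checked directly. The Python sketch below evaluates Eqs. (7)–(10) for the routes ranked 1 and 3 in Table 10 under all four weight settings, with exact fractions. It assumes 20 intervals (so f(I_r) = 20) and the Case I preferences mapped to I4 for region D and I3 for region F; route units omit the entrance and exit, and the names are hypothetical.

```python
from fractions import Fraction as F

R = 20                                             # assumed number of intervals: f(I_r) = 20
VP = [("D", {"k12"}, 4), ("F", {"k22"}, 3)]        # Case I: 150 min -> I4, 120 min -> I3
ROUTES = {                                         # (region, items, following interval)
    1: [("B", {"k2"}, 4), ("D", {"k12", "k13"}, 4), ("F", {"k22"}, 4)],
    3: [("B", {"k2"}, 3), ("D", {"k12", "k13"}, 4), ("F", {"k22"}, 5)],
}


def sim(vp, units, w):
    """Eqs. (7)-(10): facility and time terms count only when the regions match."""
    total = F(0)
    for fr, fitems, irvt in vp:
        for region, z, e in units:
            if fr == region:                               # Eq. (7), region gate
                isim = F(len(fitems & z), len(fitems))     # Eq. (8)
                tsim = 1 - F(abs(irvt - e), R)             # Eq. (9)
                total += w[0] + w[1] * isim + w[2] * tsim
    return total / len(vp)                                 # Eq. (10)


WEIGHTS = [(F(1, 3),) * 3, (F(8, 10), F(1, 10), F(1, 10)),
           (F(1, 10), F(8, 10), F(1, 10)), (F(1, 10), F(1, 10), F(8, 10))]
for w in WEIGHTS:
    print([round(float(sim(VP, ROUTES[k], w)), 4) for k in (1, 3)])
```

The printed scores agree with the rank-1 and rank-3 scores in Table 10 for every setting. Because both favorite regions match in both routes and the requested facilities are fully covered, the weights only rescale the small time-interval difference, so the order of the two routes never changes.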

5. Conclusions and further study

In the past decade, recommendation techniques have become a popular means of providing a variety of products, services, and items to potential visitors in the tourism industry. Many recommendation systems have proven to be efficient tools by offering user interfaces that interact smoothly with the environment, providing convenient information query tools, or suggesting sets of associated products (or services). However, three major problems remain. First, these systems simply return a set of suggested facilities (items) in sequential order but fail to present the complete visiting path to visitors. Second, previous systems seldom take geographic constraints into consideration, so their suggested routes might be trivial and hard to follow. Third, previous studies seldom consider the time interval between items. To solve these problems, this research defines a Location-Item-Time (LIT) sequence to describe visitors' spatial and temporal behavior. To the best of our knowledge, this study is the first work to incorporate location (region), item, and time-interval information into a single sequence. Then, the Location-Item-Time PrefixSpan (LIT-PrefixSpan) mining procedure is developed to discover frequent LIT sequential patterns. Next, a route suggestion procedure is developed to retrieve suitable LIT sequential patterns. The experimental results show that managers can understand their visitors clearly through the proposed Location-Item-Time sequential patterns.

Although a theme park case is illustrated in this paper, the proposed three-phase methodology can be applied to any field in which records of location, item, and time are available. For example, in


Fig. 11. (a) Number of LIT sequential patterns and (b) number of candidate LIT routes under different minimum supports.

Fig. 12. Execution time of the two modules under different minimum supports.


Fig. 13. (a) Number of LIT sequential patterns and (b) number of candidate LIT routes under different time-interval ranges.


a mobile commerce environment, a customer moves among cellular grids and makes transactions in the corresponding cell through a mobile device. Through the proposed recommendation system, a customer can obtain real-time store/shopping suggestions on the


Table 10. Route ranking using different importance degrees.

| w1 | w2 | w3 | Ranking | Candidate LIT route | Total similarity score |
|---|---|---|---|---|---|
| 1/3 | 1/3 | 1/3 | 1 | <A;k35>, I1, <B;k2>, I4, <D;k12,k13>, I4, <F;k22>, I4, <A;k36> | 0.9917 |
| | | | 2 | <A;k35>, I1, <B;k2>, I4, <D;k12>, I4, <F;k22>, I4, <A;k36> | 0.9917 |
| | | | 3 | <A;k35>, I1, <B;k2>, I3, <D;k12,k13>, I4, <F;k22>, I5, <A;k36> | 0.9833 |
| | | | 4 | <A;k35>, I3, <D;k12,k13>, I4, <F;k22>, I5, <A;k36> | 0.9833 |
| | | | 5 | <A;k35>, I1, <B;k3>, I4, <D;k12>, I3, <F;k22>, I4, <A;k36> | 0.9833 |
| | | | 6 | <A;k35>, I1, <B;k2>, I3, <D;k12>, I4, <F;k22>, I5, <A;k36> | 0.9833 |
| 0.8 | 0.1 | 0.1 | 1 | <A;k35>, I1, <B;k2>, I4, <D;k12,k13>, I4, <F;k22>, I4, <A;k36> | 0.9975 |
| | | | 2 | <A;k35>, I1, <B;k2>, I4, <D;k12>, I4, <F;k22>, I4, <A;k36> | 0.9975 |
| | | | 3 | <A;k35>, I1, <B;k2>, I3, <D;k12,k13>, I4, <F;k22>, I5, <A;k36> | 0.995 |
| | | | 4 | <A;k35>, I3, <D;k12,k13>, I4, <F;k22>, I5, <A;k36> | 0.995 |
| | | | 5 | <A;k35>, I1, <B;k3>, I4, <D;k12>, I3, <F;k22>, I4, <A;k36> | 0.995 |
| | | | 6 | <A;k35>, I1, <B;k2>, I3, <D;k12>, I4, <F;k22>, I5, <A;k36> | 0.995 |
| 0.1 | 0.8 | 0.1 | 1 | <A;k35>, I1, <B;k2>, I4, <D;k12,k13>, I4, <F;k22>, I4, <A;k36> | 0.9975 |
| | | | 2 | <A;k35>, I1, <B;k2>, I4, <D;k12>, I4, <F;k22>, I4, <A;k36> | 0.9975 |
| | | | 3 | <A;k35>, I1, <B;k2>, I3, <D;k12,k13>, I4, <F;k22>, I5, <A;k36> | 0.995 |
| | | | 4 | <A;k35>, I3, <D;k12,k13>, I4, <F;k22>, I5, <A;k36> | 0.995 |
| | | | 5 | <A;k35>, I1, <B;k3>, I4, <D;k12>, I3, <F;k22>, I4, <A;k36> | 0.995 |
| | | | 6 | <A;k35>, I1, <B;k2>, I3, <D;k12>, I4, <F;k22>, I5, <A;k36> | 0.995 |
| 0.1 | 0.1 | 0.8 | 1 | <A;k35>, I1, <B;k2>, I4, <D;k12,k13>, I4, <F;k22>, I4, <A;k36> | 0.98 |
| | | | 2 | <A;k35>, I1, <B;k2>, I4, <D;k12>, I4, <F;k22>, I4, <A;k36> | 0.98 |
| | | | 3 | <A;k35>, I1, <B;k2>, I3, <D;k12,k13>, I4, <F;k22>, I5, <A;k36> | 0.96 |
| | | | 4 | <A;k35>, I3, <D;k12,k13>, I4, <F;k22>, I5, <A;k36> | 0.96 |
| | | | 5 | <A;k35>, I1, <B;k2>, I3, <D;k12>, I4, <F;k22>, I5, <A;k36> | 0.96 |
| | | | 6 | <A;k35>, I3, <D;k12>, I4, <F;k22>, I5, <A;k36> | 0.96 |


mobile device before he/she moves to the next cellular grid. Similarly, in a grocery store, a customer moves around the store aisles and picks up his/her target products. The recommendation system can provide the customer with an efficient moving path and prompt him/her with other popular products to increase cross-selling opportunities.

Some potential extensions of this research are as follows. First, in some cases, the entrance and exit of a facility might belong to different regions; such irregular layouts are worth investigating in the future. Second, the minimum support, time-interval range, and importance degrees currently must be decided by users. Further studies can explore how to automate these parameter settings by adopting optimization techniques. Third, when visiting a theme park, visitors might prefer some facilities over others. Further study can therefore ask visitors to input facility priorities and rearrange the route according to these priorities. Finally, the proposed system assumes that a visitor makes a recommendation request when he/she enters the park. However, a visitor may want to make a recommendation request at any time and anywhere in the park. Future study might record the visitor's request location and time so that the system can provide more flexible suggestions.

Acknowledgement

This work was partially supported by the National Science Council of Taiwan, R.O.C., No. 102-2221-E-155-041-MY3.

References

[1] Y.L. Chen, M.C. Chiang, M.T. Ko, Discovering time-interval sequential patterns in sequence database, Expert Syst. Appl. 25 (3) (2003) 343–354.

[2] Y.B. Cho, Y.-H. Cho, S.H. Kim, Mining changes in customer buying behavior for collaborative recommendations, Expert Syst. Appl. 28 (2) (2005) 359–369.

[3] A. García-Crespo, J. Chamizo, I. Rivera, M. Mencke, R. Colomo-Palacios, J.M. Gómez-Berbís, SPETA: social pervasive e-tourism advisor, Telematics Inform. 26 (3) (2009) 306–315.

[4] A. Guerbas, O. Addam, O. Zaarour, M. Nagi, A. Elhajj, M. Ridley, R. Alhajj, Effective web log mining and online navigational pattern prediction, Knowl.-Based Syst. 49 (2013) 50–62.

[5] C.Y. Heo, S. Lee, Application of revenue management practices to the theme park industry, Int. J. Hospitality Manage. 28 (3) (2009) 446–453.

[6] C.C. Hung, W.C. Peng, A regression-based approach for mining user movement patterns from random sample data, Data Knowl. Eng. 70 (1) (2011) 1–20.

[7] K. Kabassi, Personalizing recommendation for tourists, Telematics Inform. 27 (1) (2010) 51–66.

[8] L.H. Li, F.M. Lee, Y.C. Chen, C.Y. Cheng, A multi-stage collaborative filtering approach for mobile recommendation, in: Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication, 2009, pp. 88–97.

[9] D. Liu, M. Chang, Recommend touring routes to travelers according to their sequential wandering behaviors, in: Proceedings of the 10th International Symposium on Pervasive Systems, Algorithms, and Networks, 2009, pp. 350–355.

[10] D.R. Liu, C.H. Lai, W.J. Lee, A hybrid of sequential rules and collaborative filtering for product recommendation, Inform. Sci. 179 (20) (2009) 3505–3519.

[11] J. Lu, Q. Shambour, Y. Xu, Q. Lin, G. Zhang, BizSeeker: a hybrid semantic recommendation system for personalized government-to-business e-services, Internet Res. 20 (3) (2010) 342–365.

[12] A.S. Niaraki, K. Kim, Ontology based personalized route planning system using a multi-criteria decision making approach, Expert Syst. Appl. 36 (2) (2009) 2250–2259.

[13] M. Salehi, I.N. Kamalabadi, Hybrid recommendation approach for learning material based on sequential pattern of the accessed material and the learner's preference tree, Knowl.-Based Syst. 48 (2013) 57–69.

[14] S. Schiaffino, A. Amandi, Building an expert travel agent as a software agent, Expert Syst. Appl. 36 (2) (2009) 1291–1299.

[15] X. Tan, M. Yao, M. Xu, An effective technique for personalization recommendation based on access sequential patterns, in: Proceedings of 2006 IEEE Asia-Pacific Conference on Services Computing, 2006, pp. 42–46.

[16] C.Y. Tsai, S.H. Chung, A personalized route recommendation service for theme parks using RFID information and tourist behavior, Decis. Support Syst. 52 (2) (2012) 514–527.

[17] C.Y. Tsai, P.H. Lo, A sequential pattern based route suggestion system, Int. J. Innovative Comput., Inform. Control 6 (10) (2010) 4389–4408.

[18] V.S. Tseng, K.W. Lin, Efficient mining and prediction of user behavior patterns in mobile web systems, Inf. Softw. Technol. 48 (6) (2006) 357–369.

[19] Y. Wang, N. Stash, L. Aroyo, P. Gorgels, L. Rutledge, G. Schreiber, Recommendations based on semantically enriched museum collections, Web Semantics: Sci., Serv. Agents World Wide Web 6 (4) (2008) 283–290.

[20] G. Yavas, D. Katsaros, O. Ulusoy, Y. Manolopoulos, A data mining approach for location prediction in mobile environments, Data Knowl. Eng. 54 (2) (2005) 121–146.

[21] C.H. Yun, M.S. Chen, Mining mobile sequential patterns in a mobile commerce environment, IEEE Trans. Syst., Man Cybernet. Part C: Appl. Rev. 37 (2) (2007) 278–295.

[22] Z. Zhang, H. Lin, K. Liu, D. Wu, G. Zhang, J. Lu, A hybrid fuzzy-based personalized recommender system for telecom products/services, Inform. Sci. 235 (2013) 117–129.
