
The effect of an individual’s age on the perceived importance and usage intensity of communications services – A Bayesian Network analysis

Pekka Kekolahti & Juuso Karikoski & Antti Riikonen

Published online: 21 May 2014 © Springer Science+Business Media New York 2014

Abstract Multiple novel interpersonal communications services have emerged recently, but how their usage and perceived importance are related to the personal characteristics of the users is still relatively unexplored. Therefore, the aim of this study is to explore the effect of an individual’s age on the perceived importance and usage intensity of communications services with Bayesian Networks, using a survey of 3008 Finns conducted in 2011. In the case of Short Message Service (SMS), Instant Messaging (IM), Internet forums and communities (e.g., Facebook & Twitter), and e-mail, the results indicate that the perceived importance of the communications services decreases as age increases. With phone calls and letters, however, no clear dependencies with age were identified. In the causal analysis, the importance of Internet forums and communities was the only variable that can be stated to be directly caused by an individual’s age. This variable also acts as a mediator on the path from age towards the perceived importance of other communications services and also towards their usage intensity. These results about the central role of Internet forums and communities can be exploited, for example, by device manufacturers when designing their products and by service providers when designing their consumer services. The study also provides new information for mobile operators about the dependencies between mobile communications services, and a documented example workflow for the research community to construct a causal Bayesian Network from a combination of observational data and domain expertise.

Keywords Communications services · Bayesian networks · Machine learning · Causality · Perceived importance · Usage intensity · Individual’s age

1 Introduction

The number of technical alternatives for interpersonal interactions has increased during the last two decades due to ICT-related disruptive innovations. In addition to traditional letters and phone calls, text-based communications services such as e-mail, Short Message Service (SMS), Instant Messaging (IM), and Twitter have emerged. Moreover, video communication and communication through Facebook and similar community services are becoming more common. Several studies exist about the usage and importance of these services (see, e.g., Gerpott et al. 2012; Karikoski 2013). For example, the value of information as a function of time, different social circles, asymmetries in equipment, and the adaptation of new technical innovations have been identified as factors affecting the choice of communications service (see, e.g., Grinter and Palen 2002).

Only a few studies have examined the importance of communications services with respect to the age of an individual. For instance, de Bailliencourt et al. (2011) have highlighted the decreasing adoption of SMS as age increases. The same tendency was identified for IM from the age group of 18–24-year-olds onwards, whereas the adoption of e-mail increased up to the age group of 50–64-year-olds. The studies of Howard et al. (2001) indicate a clear dependency between Internet activities and demographic parameters such as age. For example, e-mail activity increased up to the age group of 55–64-year-olds. Studies about teenage IM use show that social circles and asymmetry in equipment determine which IM tool is used (Grinter and Palen 2002). Studies of deaf people’s text-based communications service preferences indicated that SMS and IM usage was highest in the age group of 19–29-year-olds, decreasing towards older ages. With e-mail, the peak was in the age group of 30–49-year-olds, with a slight decrease after it. In the age group of 70+ year-olds, e-mail was most popular, followed by SMS, whereas IM was used much less (Pilling and Barrett 2008). Based on Fox (2001), people between the ages of 20 and 30 are the most frequent users of the Internet and spend time online chatting, e-mailing, and gaming, while seniors are less frequent users of those services. The studies of Thayer and Ray (2006) show a clear effect of age on online communication as well as on relationship-building preferences regarding friends and strangers. According to the Finnish Communications Regulatory Authority (Ficora 2011) survey from 2011 regarding the Internet usage behavior of Finns, the Internet and its different services are much more important for younger than for older age groups. The same applied to the usage of some new communications services. On the other hand, according to Rogers (2003), inconsistent evidence exists about the relationship between age and innovation adoption: half of the diffusion studies analyzed show no relationship.

P. Kekolahti (*) · J. Karikoski · A. Riikonen
Department of Communications and Networking, Aalto University, P.O. Box 13000, 00076 Aalto, Finland
e-mail: [email protected]

J. Karikoski
e-mail: [email protected]

A. Riikonen
e-mail: [email protected]

Inf Syst Front (2015) 17:1313–1333
DOI 10.1007/s10796-014-9502-9

A correlation between the age of an individual and the perceived importance of a communications service sounds logical. The causal relationship between them, on the other hand, is not so self-evident; the potential correlation might be spurious, i.e., caused by confounders (see, e.g., Geng and Li 2002), and no causal relationship might exist. This study attempts to examine causal relationships between an individual’s age and the perceived importance and usage intensity of different communications services, using the Ficora telephone survey dataset of 3008 Finns from 2011. The study does not analyze the meaning and factors of the term “perceived importance” itself; for these, we refer to studies of perceived value and user experience (see, e.g., Sweeney and Soutar 2001; Verkasalo et al. 2010; Yang and Peterson 2004). Perceived importance consists of many factors, which might offer potentially valuable additional variables for this study that are not present in the survey. Nevertheless, for simplicity and focus, we treat the term as a single survey question with five predefined qualitative answer alternatives (see Appendix 3).

Causal thinking has a long history and a strong theoretical basis (see chapter 3.2). However, practical guidelines on how to use Bayesian Networks to discover causal structures from a combination of observational data and expert opinions are not well defined. Some examples can be found using either expert data (e.g., Nadkarni and Shenoy 2004) or observational data (see constraint-based learning, e.g., Scutari 2010).

The purpose of this study is to explore the following questions:

• What kind of causal dependencies exist between an individual’s age and the perceived importance and usage intensity of communications services?

• What kind of causal dependencies exist between the perceived importance and the usage intensity of communications services?

• How can these dependencies (statistical and causal) be interpreted and explained?

• What kinds of differences exist in causality results between Pearl’s do-calculus and a proprietary method called Jouffe’s Likelihood Matching?

• What is an example workflow to create a causal Bayesian Network from observational data and domain expertise?

• How feasible are the Node Force metric and the Maximum Weight Spanning Tree learning method for assessing the relative importance of a certain variable within the whole Bayesian Network model?

• What additional (latent) variables, not present in the observational data, may contribute to the causal results?

The structure of the article is as follows. Chapter 2 briefly describes the structure of the survey dataset used. Chapter 3 explains the motivations for using Bayesian Networks (BN) as the analysis method, presents the workflow and methods used to create a probabilistic and a causal BN from the dataset, and explains the key metrics used in this study. The results are presented in Chapter 4, both as a BN and as a causal BN, including results from the causal strength analysis. Chapter 5 discusses the implications of the results, and finally Chapter 6 concludes the paper.

2 The survey dataset

The study uses Ficora’s telephone survey, conducted with 3008 Finns during 2011. The survey has three types of questions:

• Nominal questions, e.g., “How important is SMS to you? Select from four alternatives.”

• Questions consisting of more than one dichotomous question, where the respondent can answer “yes” to none or all of them

• Conditional questions consisting of both nominal and dichotomous questions (so-called skip logic), e.g., “If you have an Internet connection, what is the speed of your connection?”

The coding of the questions in this study was done in the following way: the first number refers to the theme number, and the second number is a serial number within the theme. The codes i and u refer to importance and usage, respectively; for some of the services (letter and phone call), only importance was surveyed. Characters a–h refer to potential dichotomous questions. In addition, some of the key questions contain additional clarifying words (such as CALL referring to phone call and FB to Facebook & Twitter), which make it easier for the reader to interpret the meaning of the question from the BN. The questions were grouped into the following four themes and two other groups for the study. See Appendix 1 for the actual questions and the themes into which they were grouped. The predefined values for perceived importance i and usage intensity u are listed in Appendix 3.

The themes of the survey were defined as follows:

Theme 1: Perceived importance and usage intensity of communications services: questions 1.1 LANDLINE, 1.2 MOBILEPHONE, 1.3i SMS, 1.3u SMS, 1.4i EMAIL, 1.4u EMAIL, 1.5i FB, 1.5u FB, 1.6u INTERNET, 1.7 MOREBANDWIDTH, 1.8u SKYPE, 1.9i IM, 1.9u IM, 1.10i CALL, and 1.11i LETTER. Questions regarding the availability of landline and mobile phone, as well as the need for higher access speed for some services, are included in this theme because they are treated as enablers of usage and importance. SMS refers to Short Message Service; EMAIL to e-mail; FB to Internet forums and communities, such as Google+, Facebook & Twitter; IM to Instant Messaging; CALL to phone call; and LETTER to traditional letter. This theme is the main focus of this study.

Theme 2: Importance of how to read daily news in general and from the Internet: questions 2.1, 2.2a–2.2d, 2.3u BROWSING, 2.4u VIDEO&MUSIC, 2.5, 2.6u TV, 2.7–2.9. Even though this theme is not the main focus of this study, some of its questions might mediate the usage or importance of communications services. Therefore, this theme is included in the analysis.

Theme 3: Fixed to mobile Internet convergence: questions 3.1a–3.1h, 3.2 INTERNETCONNECTION, 3.3a–3.3e, 3.4a–3.4f, 3.5–3.11, 3.12a–3.12e, 3.13a–3.13g, 3.14–3.17, 3.18a–3.18h. Similarly to the previous theme, this theme also includes variables that are expected to be mediator variables.

Theme 4: Contract types with service provider, Internet access speed versus monthly payments: questions 4.1–4.5.

Theme 5: Demographic questions: 5.1 GENDER, 5.2 COMMUNES, 5.3 AGE, 5.4 RESIDENCE, 5.5 LIFESITUATION, 5.6 HOUSEHOLD.

Theme 6: Other questions: 6.1u TRANSACTIONS, 6.2u MUSIC, 6.3u GAMES, 6.4u FILETRANSFER, 6.5u REMOTEWORK, 6.6, 6.8–6.10, which cannot be categorized into any of the above groups.

This study uses an age categorization in which the distribution within the dataset is as even as possible, in order to avoid potential bias from a skewed distribution. The term AGE is used to refer to the BN variable consisting of these five age categories with an even distribution, and the term age when we refer to an individual’s age in general.
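An even distribution over five categories can be sketched as equal-frequency (quantile) binning. This is a minimal sketch under our own assumptions: the function names are hypothetical, the ages are toy values rather than the Ficora data, and the study’s actual category boundaries are not reproduced here.

```python
import bisect
from collections import Counter

def equal_frequency_cuts(ages, n_bins=5):
    """Upper (inclusive) bounds of the first n_bins - 1 age categories.

    Cut k is the age at rank k * n / n_bins in the sorted data, so each
    category holds about n / n_bins respondents. Respondents tied at a cut
    stay in the lower category, which can make the bins slightly uneven.
    """
    if len(ages) < n_bins:
        raise ValueError("need at least one respondent per category")
    s = sorted(ages)
    n = len(s)
    return [s[(k * n) // n_bins - 1] for k in range(1, n_bins)]

def age_category(age, cuts):
    """Index of the category (0 = youngest) that an age falls into."""
    return bisect.bisect_left(cuts, age)

# Hypothetical toy ages, not survey data.
ages = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
cuts = equal_frequency_cuts(ages)
counts = Counter(age_category(a, cuts) for a in ages)
print(cuts)                     # [20, 30, 40, 50]
print(sorted(counts.items()))   # two respondents in each of the five categories
```

Quantile-based cuts follow the data rather than fixed decades, which is what keeps the categories balanced; the price is that the boundaries are not round ages.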

3 Methods

3.1 Why a Bayesian Network approach for causal analysis?

Causal analysis typically tries to find answers to questions such as “why things happen, what the reasons were, and how to control the effect”. Causality is defined, e.g., by Oxford Dictionaries (2013) as ‘the relationship between cause and effect’. A more detailed definition depends on the viewpoint, i.e., philosophical or statistical, which in turn is related to different causal theories. The causal theories can be classified as causal counterfactual, causal probability/potential outcomes, causal agent-manipulation, and causal process theories (Woodward 2013). This study relies on causal probability theories, meaning that causal influence can be analyzed as probabilistic relations. In a simple form, causal strength is interpreted there as probability increase: event A is a cause of event B if P(B | A) > P(B). This assumption does not always hold; a typical counterexample is a spurious relationship due to a confounding variable. Spurious relationships refer to associations (statistical relationships) between two variables without a causal relationship. For additional information on causal theories, readers are encouraged to study, e.g., Dawid (2010), Grotzer and Perkins (2000), Pearl (2009), Weirich (2012), and Woodward (2013).
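The probability-raising criterion, and how a confounder can make it misleading, can be checked on a small contingency table. The counts below are hypothetical: A and B both depend on a binary confounder C but are independent within each stratum of C, so P(B | A) > P(B) holds overall and disappears once C is held fixed.

```python
from fractions import Fraction

# Hypothetical counts n[(c, a, b)]: C is a binary confounder, A and B binary events.
COUNTS = {
    (1, 1, 1): 32, (1, 1, 0): 8, (1, 0, 1): 8, (1, 0, 0): 2,
    (0, 1, 1): 2, (0, 1, 0): 8, (0, 0, 1): 8, (0, 0, 0): 32,
}

def prob(event, given=lambda k: True):
    """Relative frequency of `event` among the cells that satisfy `given`."""
    den = sum(n for k, n in COUNTS.items() if given(k))
    num = sum(n for k, n in COUNTS.items() if given(k) and event(k))
    return Fraction(num, den)

def is_a(k):
    return k[1] == 1

def is_b(k):
    return k[2] == 1

# Marginally, A raises the probability of B.
print(prob(is_b), prob(is_b, is_a))

# Within each stratum of C the increase vanishes: the association is spurious.
for c in (0, 1):
    p_b = prob(is_b, lambda k: k[0] == c)
    p_b_given_a = prob(is_b, lambda k: k[0] == c and is_a(k))
    print(c, p_b, p_b_given_a)
```

Exact fractions are used so that the equality within strata is not blurred by floating-point rounding.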

In addition to Bayesian Networks, there are many other established techniques for causal analysis. For instance, regression modeling can be used for causal interpretations (e.g., Gelman and Hill 2006). Structural Equation Modeling (SEM) is a commonly used concept for assessing causality and has two approaches, namely Confirmatory Factor Analysis, operating with both observed and latent variables, and Path Analysis, operating only with observed variables (e.g., Anderson and Gerbing 1988; Tenenhaus et al. 2005). When analyzing the conditional probability of an event given a set of observed covariates, Propensity Score Matching provides a means for adjusting for potential selection bias (due to confounding variables) in observational studies. This method is based on Rubin’s potential outcomes (counterfactual) framework (Rosenbaum and Rubin 1983). Sufficient-component cause models originated in epidemiology and have special characteristics for that research domain (e.g., Greenland and Brumback 2002).

This study focuses on (causal) Bayesian Networks as the analysis method, which belong to the family of graphical models. Graphical models have been characterized by Greenland and Brumback (2002, p. 1030), in their study of four different methods for causal analysis, as follows: ‘Graphical models can illustrate qualitative population assumptions and sources of bias not easily seen with other approaches… they can be easily applied in any study to display assumptions of causal analyses, and to check whether covariates or sets of covariates are insufficient, excessive, or inappropriate to control given those assumptions’. The causal Bayesian Network procedure (sometimes called interventional graphs) is a suitable method if the objective is to measure the probability change of some variables when a certain variable is intervened on. In comparison, in SEM the objective is more focused on the factors determining the value of a certain other variable with the help of some mechanism (Pearl 2009, pp. 23–24, 33–36). Because we are more interested in the former in this study, the Bayesian Network-based method was chosen. In addition, even though causal Bayesian Networks have been extensively studied from a theoretical point of view (e.g., Pearl 2009), they are rarely utilized in empirical research articles. Therefore, we also wanted to document an example workflow to lower the bar for researchers to use Bayesian Networks, especially in cases where data sources consist of a combination of observational data and expert knowledge.

Pearl’s do-calculus (Pearl 2009, chapter 3.4) is a graphical identification criterion for causal effects through intervention. It is used in this study for causal strength analysis. The method is time-consuming, especially in complex networks, both computationally and from an analysis-process point of view; therefore, a faster proprietary method can be used instead (e.g., Jouffe’s Likelihood Matching, see Chapter 3.4). The objective is to verify how the causality results differ between the proprietary method and Pearl’s method. If the results are comparable, causal Bayesian Network analysis using this proprietary method could also be used by non-experts, even outside the research community.

Finally, when the model consists of tens of variables, as is the case in this study, a graphical model may give new insight not easily acquired with other methods, as expressed by Greenland and Brumback (2002). In particular, we wanted to test the capability of the Node Force metric and Maximum Weight Spanning Tree learning to discover new insights from a complex network.

3.2 The difference between probabilistic and causal Bayesian Networks

A Bayesian Network (BN), also called a Bayesian Belief Network (BBN), is a graphical model where the nodes represent random variables {Xi} and the arcs represent direct conditional dependencies between certain nodes. The structure of this model is a directed acyclic graph (DAG), where the conditional probabilities for each variable Xi are specified as {Pi(Xi | pa(Xi))} and where pa(Xi) is the parent function, returning the parents of Xi in the DAG. The joint probability of the DAG factorizes as

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P_i\big(X_i \mid pa(X_i)\big)$$

(Pearl 2009; Scutari 2010; Elwert 2013; Barber 2012). This study uses the term BN to denote a model consisting of the combination of a DAG and a set of {Pi(Xi | pa(Xi))}. A BN model is visualized as a DAG graph either without or with its conditional probability tables (CPTs). A CPT contains one probability distribution for every combination of the discrete states of the variable’s parents.
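The factorization can be sketched for a hypothetical three-node chain A → F → S (loosely: age, importance of FB, importance of SMS). The CPT values are invented for illustration and are not estimates from the survey. Multiplying one CPT entry per node gives the joint probability, and summing the joint over all assignments returns 1.

```python
from itertools import product

# Hypothetical chain A -> F -> S with binary variables (1 = "yes").
PARENTS = {"A": [], "F": ["A"], "S": ["F"]}
# CPT[var][parent values] = P(var = 1 | parents); invented numbers.
CPT = {
    "A": {(): 0.4},
    "F": {(1,): 0.7, (0,): 0.2},
    "S": {(1,): 0.8, (0,): 0.3},
}

def joint(assignment):
    """P(X1, ..., Xn) as the product of one CPT entry per node."""
    p = 1.0
    for var, pa in PARENTS.items():
        p_one = CPT[var][tuple(assignment[q] for q in pa)]
        p *= p_one if assignment[var] == 1 else 1.0 - p_one
    return p

def all_assignments():
    """Every combination of 0/1 values for the variables."""
    names = list(PARENTS)
    for values in product((0, 1), repeat=len(names)):
        yield dict(zip(names, values))

total = sum(joint(a) for a in all_assignments())
p_s = sum(joint(a) for a in all_assignments() if a["S"] == 1)
print(round(total, 10), round(p_s, 10))            # 1.0 0.5
print(round(joint({"A": 1, "F": 1, "S": 1}), 3))   # 0.4 * 0.7 * 0.8 = 0.224
```

Because each CPT only conditions on a node’s parents, the model needs 1 + 2 + 2 = 5 numbers here instead of the 7 free parameters of an unrestricted joint table over three binary variables.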

Some differences exist between a BN and a causal BN (Pearl 2009; Dawid 2010; Elwert 2013). Firstly, in a causal BN the arc direction indicates an assumed causal direction, whereas a BN does not necessarily hold that information. In addition, in a causal BN an arc is a sign of a possible causal relationship between variables, whereas a missing arc between them represents an assumption of no direct causal effect between them. In a BN, on the other hand, a missing arc is only a sign of conditional independence assumptions between variables. Secondly, in a causal BN the variables are temporally ordered, meaning that an event in the future cannot cause something in the past; in a BN, no such certainty exists. Thirdly, the Causal Markov Condition is assumed for each variable in a causal BN. This means that a variable is independent of its non-descendants given its parents. In comparison, a BN may satisfy a Markov condition without causality. Fourthly, a causal BN embodies the concept of Faithfulness (Stability), i.e., the effects are always probabilistically dependent on each cause. Finally, a causal BN can be thought to consist of a set of do-operators do(Xi = xi) describing the active intervention regarding each Xi (Pearl 2009), whereas a BN is based on observational data. According to the causal Markov assumption, this intervention cuts all the parental connections to the intervened variable Xi, thus changing the whole causal BN structure. The difference between interventional (causal BN) and observational (BN) reasoning about the relationship between two variables can be described as “given that we do” compared to “given that we observe”. The thinking behind this kind of intervention is that an intervened variable X has a certain effect on another variable if and only if the dependency between them would persist under the right sort of manipulation of X (Hagmayer et al. 2007; Pearl 2009).
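The difference between “given that we observe” and “given that we do” can be illustrated with the truncated factorization: under do(X = x), the factor P(X | pa(X)) is dropped and X is clamped to x, which corresponds to cutting the parental arcs of X. The network Z → X, Z → Y, X → Y and its CPT values are hypothetical, and the brute-force enumeration is only practical for very small networks.

```python
from itertools import product

# Hypothetical network: Z confounds X and Y, and X also affects Y directly.
PARENTS = {"Z": [], "X": ["Z"], "Y": ["X", "Z"]}
# CPT[var][parent values] = P(var = 1 | parents); invented numbers.
CPT = {
    "Z": {(): 0.5},
    "X": {(1,): 0.8, (0,): 0.2},
    "Y": {(1, 1): 0.9, (1, 0): 0.5, (0, 1): 0.7, (0, 0): 0.3},
}

def joint(a, do=None):
    """Joint probability of assignment `a`.

    Variables in `do` are intervened on: their CPT factor is dropped
    (truncated factorization) and they are clamped to the given value.
    """
    do = do or {}
    p = 1.0
    for var, pa in PARENTS.items():
        if var in do:
            if a[var] != do[var]:
                return 0.0
            continue
        p_one = CPT[var][tuple(a[q] for q in pa)]
        p *= p_one if a[var] == 1 else 1.0 - p_one
    return p

def prob(query, evidence=None, do=None):
    """P(query | evidence) in the (possibly intervened) model, by enumeration."""
    evidence = evidence or {}
    names = list(PARENTS)
    num = den = 0.0
    for values in product((0, 1), repeat=len(names)):
        a = dict(zip(names, values))
        if any(a[k] != v for k, v in evidence.items()):
            continue
        p = joint(a, do)
        den += p
        if all(a[k] == v for k, v in query.items()):
            num += p
    return num / den

print(round(prob({"Y": 1}, evidence={"X": 1}), 3))  # observe X = 1: 0.82
print(round(prob({"Y": 1}, do={"X": 1}), 3))        # do(X = 1):    0.7
```

The observed probability is higher because seeing X = 1 also makes Z = 1 more likely, whereas setting X = 1 by intervention leaves the distribution of Z untouched.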

3.3 Construction of Bayesian Networks from survey data

Machine learning is used to construct a BN from survey data. Machine learning methods and their use cases are discussed in more detail, e.g., in Barber (2012), and the methods applied in this study are discussed, e.g., in Kekolahti and Karikoski (2013). The survey questions are the variables of the BN, and the existence of an arc between two variables depends on how strong the conditional dependency between them is (i.e., in this case, on how the individuals have answered the questions). Two main types of learning algorithms are available to create a BN from data, namely constraint-based algorithms and score-based algorithms (e.g., Scutari 2010). Constraint-based algorithms directly yield a causal model. This study utilizes score-based algorithms for structure learning; therefore, no guarantee of causality exists without domain expert-based effort (top-down modeling). BN structure alternatives are scored by calculating how well the structure matches the data, and the one with the best score is selected. Many scoring functions exist for BNs, such as Akaike’s Information Criterion (AIC), Minimum Description Length (MDL), and the Bayesian Information Criterion (BIC), among others (Akaike 1973; Rissanen 1978; Friedman et al. 1997; Grünwald 2007; Lam and Bacchus 1994; Yun and Keong 2004; Schwarz 1978). MDL versions can be categorized as refined (one-stage method) and two-stage MDL (also called crude MDL), of which this study uses the latter. In two-stage MDL there is a trade-off between model complexity and fit to the data, and the optimum model is the one for which MDL(Model | Data) is at its minimum. In other words, simple model structures are preferred over more complex ones. Grünwald (2007) prefers refined MDL but states that the use of two-stage MDL is unavoidable in special modeling situations. The computational model of the MDL score (Eq. 1) is also equipped with an additional proprietary weight factor called the Structural Coefficient (SC). Its default value is 1, yielding the optimum MDL. With values SC < 1, the weight of the structure term decreases, yielding a more complex model, and with SC > 1, simpler models result. SC thus acts in a similar way to changing the sample size. With SC ≠ 1, three types of special cases can be exploited: 1) when the target is to derive a unified DAG without any non-connected variables, 2) when a computationally optimum MDL (with SC = 1) would produce too complex a network, or 3) when a simpler network is sufficient for the study targets. The high-level MDL score (Eq. 1, Bayesialab library 2013a) can be expressed as:

$$\mathrm{MDL}(BN \mid D) = \alpha \big[ DL(DAG) + DL(P \mid DAG) \big] + DL(D \mid BN) \tag{1}$$

where MDL is the Minimum Description Length score, BN is a Bayesian Network with a DAG and a set of CPTs, α is SC, DL(DAG) is the number of bits needed to describe the graph part of the BN, P is the set of CPTs in the BN, DL(P | DAG) is the number of bits needed to describe the CPTs given the DAG, and DL(D | BN) is the number of bits needed to describe the data given the BN.
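Equation (1) can be sketched for discrete data as follows. The encoding is our own simplified assumption, not BayesiaLab’s proprietary formula: DL(DAG) spends log2(n) bits per node for its number of parents plus log2(n) bits per parent, DL(P | DAG) spends (log2 N)/2 bits per free CPT parameter, and DL(D | BN) is the negative log2-likelihood under maximum-likelihood CPTs. The argument alpha plays the role of SC, and the data are hypothetical.

```python
import math
from collections import Counter

def mdl_score(data, parents, alpha=1.0):
    """Two-stage MDL score of a discrete BN (lower is better); alpha plays the role of SC.

    Simplified, hypothetical encoding:
      DL(DAG)     = sum over nodes of (1 + #parents) * log2(#nodes)
      DL(P | DAG) = 0.5 * log2(N) per free CPT parameter
      DL(D | BN)  = -sum of log2-likelihoods under maximum-likelihood CPTs
    """
    names = list(parents)
    n_rows = len(data)
    card = {v: len({row[v] for row in data}) for v in names}
    dl_dag = sum((1 + len(pa)) * math.log2(len(names)) for pa in parents.values())
    n_free = 0
    dl_data = 0.0
    for v, pa in parents.items():
        n_free += math.prod(card[p] for p in pa) * (card[v] - 1)
        n_cfg = Counter(tuple(r[p] for p in pa) for r in data)
        n_cfg_v = Counter((tuple(r[p] for p in pa), r[v]) for r in data)
        # ML estimate P(v | cfg) = n(cfg, v) / n(cfg); add up the code lengths of all rows.
        dl_data -= sum(n * math.log2(n / n_cfg[cfg]) for (cfg, _), n in n_cfg_v.items())
    dl_params = 0.5 * math.log2(n_rows) * n_free
    return alpha * (dl_dag + dl_params) + dl_data

# Hypothetical data: X and Y agree in 80 of 100 rows.
data = ([{"X": 0, "Y": 0}] * 40 + [{"X": 1, "Y": 1}] * 40
        + [{"X": 0, "Y": 1}] * 10 + [{"X": 1, "Y": 0}] * 10)
candidates = {"empty": {"X": [], "Y": []}, "X->Y": {"X": [], "Y": ["X"]}}

for sc in (1.0, 10.0):
    scores = {name: round(mdl_score(data, g, sc), 1) for name, g in candidates.items()}
    best = min(scores, key=scores.get)  # the structure with the lowest MDL is selected
    print(sc, scores, "best:", best)
```

With these toy data the arc X → Y gives the lower score at alpha = 1, whereas at alpha = 10 the heavier structure penalty lets the empty graph win, which mirrors how SC > 1 favors simpler models.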

The number of alternative structures for the MDL scoring process increases super-exponentially as a function of the number of variables. When the number of variables is high, as in this study (initially 100 variables, see Appendix 2), it is not computationally feasible to find the highest-scoring structure among all possible alternatives. Therefore, five unsupervised learning algorithms with different search criteria are utilized to search for candidate BNs for the MDL process. Maximum Weight Spanning Tree (MWST) is used in this study to create a BN for qualitative and visual analysis and to get a fast overview of the model and its associations. MWST was originally utilized by Chow and Liu (1968) for supervised learning, delivering more accurate prediction results than Naïve Bayes learning. Friedman et al. (1997) adapted the method for the TAN classifier. MWST restricts the number of parents per child to one. The benefit of this restriction is that learning is computationally fast and the BN is easy to “read”, even by non-experts, because it contains no V-structures. From an MDL scoring point of view, MWST yields more compact DAGs with a potentially poorer fit of the data given the BN. The MDL score might also be comparable with that of other methods, but due to the restriction of only one parent per child, the model is not used for quantitative analysis in this study. Instead, Taboo, Taboo Order, EQ, and SopLEQ learning are used for quantitative analysis, as these algorithms do not have the one-parent-per-child restriction. In addition, each of them is based on heuristic search algorithms that tolerate situations with many high-scoring models (Taboo search with a Taboo list size and its variant in Taboo Order, equivalence class search in EQ and its variant in SopLEQ) (Bouckaert 2008; Hyun et al. 2011; Jouffe and Munteanu 2001; Kekolahti and Karikoski 2013; Munteanu 2001; Scutari 2010). Each of these four algorithms is tested in parallel, and the selection of the BN used for further analysis is based on the MDL score: the model with the lowest MDL score is selected.
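The MWST idea can be sketched in the style of Chow and Liu (1968): weight every variable pair by its empirical mutual information and keep the maximum weight spanning tree, built here with Kruskal’s algorithm. This is a minimal sketch with hypothetical variables and toy data; it returns an undirected tree, which would still have to be oriented (e.g., away from a chosen root) to become a BN with at most one parent per node.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, x, y):
    """Empirical mutual information I(x; y) in bits."""
    n = len(data)
    pxy = Counter((r[x], r[y]) for r in data)
    px = Counter(r[x] for r in data)
    py = Counter(r[y] for r in data)
    return sum(c / n * math.log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def mwst(data, names):
    """Maximum weight spanning tree over mutual information (Kruskal + union-find)."""
    root = {v: v for v in names}

    def find(v):
        while root[v] != v:
            root[v] = root[root[v]]  # path halving
            v = root[v]
        return v

    edges = sorted(((mutual_information(data, a, b), a, b)
                    for a, b in combinations(names, 2)), reverse=True)
    tree = []
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # skip edges that would close a cycle
            root[ra] = rb
            tree.append((a, b, w))
    return tree

# Hypothetical toy data: F mostly follows A, S mostly follows F.
rows = [(0, 0, 0)] * 4 + [(1, 1, 1)] * 4 + [(0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 0)]
data = [dict(zip(("A", "F", "S"), r)) for r in rows]
tree = mwst(data, ["A", "F", "S"])
for a, b, w in tree:
    print(f"{a} - {b}: {w:.3f} bits")
```

The weaker A–S dependency is dropped because A and S are already connected through F, which is how a one-parent-per-node structure surfaces the most central variables.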

The assumption that an MWST-learned BN is usable for overview discovery and for defining the centrality of certain variables is tested by using multiple SC values with MWST learning. Similar tests are also done with Taboo and EQ learning. In the testing, the learned BNs are compared visually (e.g., by calculating the number of arcs per variable) and numerically by implementing a Node Force analysis for each Xi. The assumption is that the more central the role of Xi, the more associations it has with other variables, and thus the more arcs exist between Xi and other variables in a BN. If this phenomenon is visible in the same way across different learning methods and different SC values, then we can draw conclusions about centrality and about the usability of MWST learning for discovering it.
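The arc-count side of this robustness check can be sketched as counting arcs per variable in each learned BN and testing whether the same variable comes out on top in every run. The run labels and arc lists below are hypothetical placeholders, not results of this study.

```python
from collections import Counter

def arcs_per_variable(arcs):
    """Number of arcs (entering or leaving) incident to each variable."""
    degree = Counter()
    for a, b in arcs:
        degree[a] += 1
        degree[b] += 1
    return degree

# Hypothetical arc lists from learning runs with different methods and SC values.
runs = {
    "MWST, SC=1": [("AGE", "FB"), ("AGE", "IM"), ("FB", "SMS"), ("AGE", "EMAIL")],
    "MWST, SC=2": [("AGE", "FB"), ("AGE", "IM"), ("AGE", "EMAIL")],
    "Taboo, SC=1": [("AGE", "FB"), ("AGE", "IM"), ("FB", "SMS"),
                    ("IM", "SMS"), ("AGE", "EMAIL")],
}

top = {name: arcs_per_variable(arcs).most_common(1)[0][0] for name, arcs in runs.items()}
print(top)
print("consistent across runs:", len(set(top.values())) == 1)
```

Ties in the top position would make `most_common` fall back to insertion order, so a real comparison would rather compare full rankings (or NF values) than a single winner.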

Numerical values for the centrality analysis are obtained by executing a Node Force (NF) analysis for each Xi in the learned BN. Xi nodes with higher NF values have relatively more central roles in the BN. NF is calculated as a sum of Kullback–Leibler divergences (DKL), i.e., as a sum of the information gain from each of the entering and outgoing arcs of a variable Xi (see the NF and DKL formulas, e.g., in Kekolahti and Karikoski 2013). NF is a heuristic measure and is used, e.g., by Lee and Choi (2011) to define the relative importance of a variable in the BN. We interpret relative importance and centrality as the same.
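Node Force can be sketched as a sum of arc forces over the arcs entering and leaving a node. As a simplifying assumption, the sketch measures each arc’s force as the mutual information between parent and child, which equals the Kullback–Leibler divergence between the model with and without that arc when the child has only one parent (as in an MWST tree); BayesiaLab’s exact computation may differ. The star-shaped structure and the data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def mutual_information(data, x, y):
    """Empirical mutual information I(x; y) in bits."""
    n = len(data)
    pxy = Counter((r[x], r[y]) for r in data)
    px = Counter(r[x] for r in data)
    py = Counter(r[y] for r in data)
    return sum(c / n * math.log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def node_force(data, arcs):
    """Sum of arc forces over all arcs entering or leaving each node.

    Arc force is approximated by I(parent; child), which assumes that each
    child has a single parent.
    """
    nf = defaultdict(float)
    for parent, child in arcs:
        force = mutual_information(data, parent, child)
        nf[parent] += force
        nf[child] += force
    return dict(nf)

# Hypothetical toy data and structure: AGE -> FB, AGE -> SMS, AGE -> IM.
rows = ([(0, 0, 0, 0)] * 3 + [(1, 1, 1, 1)] * 3
        + [(0, 0, 1, 0), (1, 1, 0, 1), (0, 1, 0, 0), (1, 0, 1, 1)])
data = [dict(zip(("AGE", "FB", "SMS", "IM"), r)) for r in rows]
arcs = [("AGE", "FB"), ("AGE", "SMS"), ("AGE", "IM")]

nf = node_force(data, arcs)
for name, value in sorted(nf.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

A hub node accumulates the force of every arc it touches, so it ranks above each of its neighbours; this is the sense in which a higher NF signals a more central role.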


3.4 A workflow to create and analyze causal Bayesian networks

A causal BN was created using the six-phase workflow shown in Fig. 1.

In the first phase (denoted as workflow step 1 in Fig. 1), the survey questions were classified thematically (as described in Chapter 2).

In the second phase (denoted as workflow steps 2–5 in Fig. 1), the centrality of AGE among all survey questions was examined using MWST learning with an SC value of 1. Moreover, an initial view of mediating variables from themes other than the target theme of this study was acquired using the same MWST-learned model. In addition, from the MWST BN we acquired preliminary information on how the other variables, especially those describing the importance and usage intensity of communications services, are connected with AGE. This analysis was done because the potential mediating variables may not belong only to the focus theme listed in Chapter 2. NF was calculated to obtain numerical information about the centrality of AGE. In addition, Mutual Information was used to quantify the mutual dependence between two variables, i.e., to understand how much information one variable provides about another (see, e.g., Barber 2012; Friedman et al. 1997). At this point, we also wanted to ensure the usability of the MWST and NF analyses for centrality discovery, and therefore repeated the learning with different SC values and learning methods.

The third phase (denoted as workflow steps 6–8 in Fig. 1) consists of learning a BN with Taboo, Taboo Order, EQ, and SopLEQ and selecting the model that yields the lowest MDL score, i.e., an optimized BN for the subsequent workflow phases. The final set of mediating variables from the other themes was also selected in this phase.
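The MDL score trades model complexity against fit. BayesiaLab describes its score in the general form SC · DL(model) + DL(data | model); the sketch below keeps that form but approximates DL(model) with a BIC-style term, (log2 N / 2) × number of free parameters, which is an assumption and not BayesiaLab’s exact formula. The structure is passed as a hypothetical {child: [parents]} dictionary.

```python
import math
from collections import Counter

def mdl_score(parents, data, sc=1.0):
    """Simplified MDL = sc * DL(model) + DL(data | model), in bits.

    parents: {child: [parent, ...]}; data: {variable: list of states}.
    DL(model) is approximated as (log2 N / 2) * free parameters (assumption).
    """
    n = len(next(iter(data.values())))
    states = {v: sorted(set(col)) for v, col in data.items()}
    dl_data, n_params = 0.0, 0
    for child, pas in parents.items():
        rows = list(zip(*(data[p] for p in pas))) if pas else [()] * n
        joint = Counter(zip(rows, data[child]))
        row_counts = Counter(rows)
        # Negative log-likelihood using maximum-likelihood CPT entries.
        dl_data -= sum(c * math.log2(c / row_counts[r]) for (r, _), c in joint.items())
        n_parent_configs = math.prod(len(states[p]) for p in pas)
        n_params += n_parent_configs * (len(states[child]) - 1)
    dl_model = 0.5 * math.log2(n) * n_params
    return sc * dl_model + dl_data

# Hypothetical data in which Y copies X exactly.
data = {"X": [0] * 10 + [1] * 10, "Y": [0] * 10 + [1] * 10}
no_arc = {"X": [], "Y": []}
with_arc = {"X": [], "Y": ["X"]}
```

With SC = 1 the arc X→Y lowers the score, whereas a large SC (e.g., 10) penalizes the extra parameters enough that the empty graph wins. This is why lower SC values produce denser networks.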

In phase four (denoted as workflow steps 9 and 10 in Fig. 1), domain expertise was added to the learning process in the form of temporal data. Temporal information was associated with the variables and taken into account when re-learning the BN. It is reasonable to assume that a newer technology may reduce the importance of an older technology, and therefore the arc direction should be from the variable representing the newer technology to the variable representing the older one. The temporal order of the variables was based on qualitatively estimated technology groups arranged in chronological order. For example, because IM was introduced later than SMS, it belongs to a newer technology group. Thus, if the learning process produces an arc between IM and SMS, the temporal data makes the arc point from IM to SMS.

In the fifth phase (denoted as workflow step 11 in Fig. 1), the BN with temporally correct arc directions was verified by experts, because temporality does not automatically provide the causal direction (see, e.g., Pearl 2009, chapter 7.5; Kekolahti 2011). The expert-based verification of causality consists of reviewing all arc directions and their relevance in the model, and analyzing whether latent variables that this study might need are missing. For example, arcs might still exist due to spurious relationships, or a clear causal connection might be missing even though it should exist based on expert views. In such cases the BN is re-learned with additional constraints, such as “arc is not allowed” or “arc is mandatory” between certain variables. This phase also simplifies the BN by calculating the confidence of each arc as a p-value and eliminating from the final network those arcs for which the null hypothesis of independence cannot be rejected, i.e., the p-value is greater than a pre-defined value (see the use of p-values later in this chapter). The output of this phase is a causal BN, but it is still based only on the observed variables, i.e., those in the dataset. The latent variable analysis mentioned above was not implemented in this study, as it could be a research topic in its own right. However, Chapter 5 discusses potential latent variables missing from the model.
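The text does not specify which test produces the arc p-values. One common choice, consistent with the G-tests used for the reliability check in Section 4.3, is the likelihood-ratio (G) test of independence between the two end points of an arc. The sketch below implements it with only the standard library; the chi-square tail is computed from the regularized incomplete gamma function (series and continued-fraction forms). It is an illustration, not the tool’s implementation.

```python
import math
from collections import Counter

def _gamma_q(a, z):
    """Regularized upper incomplete gamma Q(a, z) = P(chi2_{2a} > 2z)."""
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series for the lower function P(a, z); then Q = 1 - P.
        term = total = 1.0 / a
        k = 1
        while abs(term) > abs(total) * 1e-15:
            term *= z / (a + k)
            total += term
            k += 1
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction for Q(a, z), modified Lentz method.
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h

def chi2_sf(x, dof):
    """Upper-tail probability P(chi2_dof > x)."""
    return 1.0 if x <= 0 else _gamma_q(dof / 2.0, x / 2.0)

def g_test(xs, ys):
    """G statistic and p-value for H0: the two discrete variables are independent."""
    n = len(xs)
    obs, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    g = 2.0 * sum(c * math.log(c * n / (px[x] * py[y])) for (x, y), c in obs.items())
    dof = (len(px) - 1) * (len(py) - 1)
    return g, chi2_sf(g, dof)
```

An arc would then be dropped when the returned p-value exceeds the chosen threshold (5 % in this study).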

The sixth phase (denoted as workflow step 12 in Fig. 1) contains the analysis of the causal influence (strength) that the learned causal BN’s probabilistic relations imply between an individual’s age and the other variables, using Pearl’s do(Xi=xi) calculus. Specifically, a conditional mean analysis is conducted by calculating the most probable mean value of the caused variable given each state xi of the causing variable (e.g., 5.3 AGE), including the effect through mediators. Applying Pearl’s do calculus means that in the conditional mean analysis the parental arcs of the causing variable in the causal BN are truncated. The delta between the maximum and minimum values from the above calculation is the final causal strength. Additionally, in this phase we compared the causal strength obtained with Pearl’s do calculus to that obtained with Jouffe’s Likelihood Matching algorithm (JLM). The causal strength calculation with JLM was done using the (non-causal) BN, because the method does not require a causal network. In Jouffe’s method (Kekolahti and Karikoski 2013), the Direct Effect (DE) is utilized to find causal relationships. DE measures the impact of a conditional mean of each state of variable X on the mean of variable Y, utilizing Kullback’s minimum cross-entropy method called MinxEnt (Shamilov et al. 2006). DE measures a causal relationship between X and Y by keeping the distributions of all other variables in the network fixed. This fixing d-separates (blocks) the paths from all other variables, leaving only the direct effect from X to Y (Conrady and Jouffe 2011).
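The difference between conditioning and intervening can be illustrated with a tiny hypothetical causal BN, AGE → FB, AGE → EMAIL, FB → EMAIL, in which all probabilities and conditional means are invented. Under do(FB = fb), the arc AGE → FB is cut, so AGE keeps its prior distribution; ordinary conditioning instead updates AGE through Bayes’ rule. Here “conditional mean” is read as the expected value of the caused variable, and the causal strength is the delta between the largest and smallest interventional means.

```python
# Hypothetical toy causal BN: AGE -> FB, AGE -> EMAIL, FB -> EMAIL.
# All numbers are invented for illustration only.
P_AGE = {"young": 0.4, "old": 0.6}
P_FB_GIVEN_AGE = {"young": {1: 0.2, 3: 0.8}, "old": {1: 0.7, 3: 0.3}}
# Expected perceived importance of e-mail (1..3 scale) given FB and AGE.
E_EMAIL = {(1, "young"): 1.8, (3, "young"): 2.6, (1, "old"): 1.4, (3, "old"): 2.0}

def mean_do(fb):
    """E[EMAIL | do(FB=fb)]: parental arc of FB truncated, AGE keeps its prior."""
    return sum(P_AGE[a] * E_EMAIL[(fb, a)] for a in P_AGE)

def mean_observed(fb):
    """E[EMAIL | FB=fb]: ordinary conditioning, AGE updated via Bayes' rule."""
    w = {a: P_AGE[a] * P_FB_GIVEN_AGE[a][fb] for a in P_AGE}
    z = sum(w.values())
    return sum(w[a] / z * E_EMAIL[(fb, a)] for a in P_AGE)

do_means = {fb: mean_do(fb) for fb in (1, 3)}
causal_strength = max(do_means.values()) - min(do_means.values())
observed_delta = mean_observed(3) - mean_observed(1)
```

In this toy model the observational delta (0.92) exceeds the interventional one (0.68) because age confounds the FB–EMAIL relationship; truncating the parental arcs of the causing variable removes that contribution.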

As part of workflow phase six, the differences between the BN from workflow phase three and the causal BN from phase five are analyzed by studying their structure and their Contingency Table Fit (CTF) differences. The temporality information in the learning process, together with potential arc direction changes and model simplifications, may lead to differences between the BN and the causal BN, measured as CTF, but these are expected to be minor and logical. If the difference is bigger than the acceptable limit defined below, then, for example, the simplification should be re-evaluated. CTF (Xi 2013) is a goodness-of-fit type of measure and is implemented in the following form in the BayesiaLab tool used (Bayesialab library 2013b):

$$C = 100\,\frac{U - N}{U - F} \qquad (2)$$

where C is the contingency, N is the mean log-likelihood of the data given the BN or causal BN, U is the mean log-likelihood of the data given the unconnected network, and F is the mean log-likelihood of the data given the fully connected network. If C is 100 %, then the BN or causal BN represents the data fully (without any approximation). If C is 0 %, then the model assumes that the variables are fully independent. An expert-based acceptable target for the difference C_BN − C_causalBN was defined as 5 % or less, expressing that the BN and the causal BN should be close to each other as measured by CTF.
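The CTF formula can be evaluated directly once the three mean log-likelihoods are known. The sketch below also shows how the two reference values can be computed from data: under the unconnected network each variable follows its own empirical marginal, and under the fully connected network the maximum-likelihood joint equals the empirical joint. Here U and N are as defined above and F denotes the mean log-likelihood under the fully connected network. The mean log-likelihood N of a learned BN must come from the BN itself and is taken as given; the function names and toy rows are hypothetical.

```python
import math
from collections import Counter

def mean_loglik_unconnected(rows):
    """Mean log-likelihood per row when all variables are independent (ML marginals)."""
    n = len(rows)
    total = 0.0
    for j in range(len(rows[0])):
        counts = Counter(r[j] for r in rows)
        total += sum(c * math.log(c / n) for c in counts.values())
    return total / n

def mean_loglik_full(rows):
    """Mean log-likelihood per row under the fully connected network,
    whose ML joint distribution equals the empirical joint distribution."""
    n = len(rows)
    counts = Counter(map(tuple, rows))
    return sum(c * math.log(c / n) for c in counts.values()) / n

def contingency_table_fit(u, n_model, f):
    """CTF in percent: 100 * (U - N) / (U - F)."""
    if u == f:
        raise ValueError("U equals F: the data contain no dependencies to fit")
    return 100.0 * (u - n_model) / (u - f)

# Hypothetical rows with two perfectly dependent binary variables.
rows = [(0, 0), (0, 0), (1, 1), (1, 1)]
u, f = mean_loglik_unconnected(rows), mean_loglik_full(rows)
```

Comparing two models then reduces to checking whether |C_BN − C_causalBN| ≤ 5 percentage points.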

The p-values were used in two ways. First, the causal BN was simplified as described in phase five. Second, in phase six the confidence of the causal assumptions was analyzed by calculating the p-value between a path’s end points X and Y, where X is AGE and Y is in turn 1.3i SMS, 1.4i EMAIL, 1.5i FB, 1.9i IM, 1.10i CALL, and 1.11i LETTER. The pre-defined threshold for p was set to 5 %.

4 Results

4.1 The importance of individual’s age with respect to the whole survey

Figure 2 shows an MWST-learned BN with 100 variables, constructed using workflow phases one and two of Chapter 3.4. In this model the bubbles represent variables, and each variable represents one question in the survey. The name of the variable is the same as the question number in the survey (see Appendix 1 for the actual questions). The states (values) of the variables depend on the type of question; they are either binary or, for nominal questions, discrete integers. The bubble size is proportional to the variable’s NF value, the arcs between bubbles represent conditional dependencies between two variables, and the width of the line reflects the Mutual Information value of the dependency. The arrowheads of the arcs (i.e., the direction information) have been hidden to clarify the presentation.

Figure 2 demonstrates, based on visual inspection, that AGE is clearly dominant in the BN: 13 variables are directly connected to it, and its bubble is relatively large. The actual NF values are listed in Appendix 2, which shows that NF decreases slowly towards zero so that, of the 100 questions, roughly 25 have an NF value greater than 0.4 and about 50 have an NF value of less than 0.1. From a dependency point of view, this can be interpreted qualitatively to mean that only the questions in the first group (NF > 0.4) are practically important and that half of the questions are not important at all. In addition to AGE, examples of other dominant questions that belong to the first theme of Chapter 2 are 1.5i FB, 1.3u SMS, 1.4i EMAIL, and 1.9i IM. Examples of dominant questions that are not part of this theme are 5.5 LIFESITUATION, 3.3a, 3.18g, 2.3u BROWSING, and 2.7. Appendix 2 also shows that the Finnish commune, the individual’s gender, and residence have a minor role in the BN, whereas household size and life situation seem to have a more important role. The high NF values of AGE and life situation can be partially explained by a strong mutual dependency between them, since younger individuals are typically students, middle-aged individuals are at work, and older ones are pensioners.

We also wanted to verify how other learning methods demonstrate the centrality of AGE when using NF calculus. In addition, we wanted to test whether similar conclusions can be drawn visually from BNs learned with the Taboo and EQ methods as from an MWST-learned BN. This test was done by using different SC values (0.2 ≤ SC ≤ 1) and by allowing and forbidding the nearly deterministic relationship between AGE and LIFESITUATION. The age categories of Ficora (2011) were also used in addition to the evenly distributed age categories.

Fig. 1 The workflow used in this study. The boxes describe the workflow steps, and the punch tape symbols denote documents, such as BN models or reports

The following conclusions can be drawn from the test. Table 1 shows that with all tested parameter combinations, the interviewee’s age is indeed the most central variable with respect to the whole survey. In other words, age is relatively the most important variable in the network. Based on Table 1, the centrality order after AGE depends on the SC value and the learning method. The difference is especially visible between MWST and Taboo, and between MWST and EQ, due to the restrictions on parent–child relationships in MWST. When using Taboo or EQ learning with SC lower than 0.8, the network became too complex from a computational point of view due to the high number of variables with many parents. Our conclusion based on the above results is that NF can be used as an indicative measure of the centrality (relative importance) of a variable in the model. As seen from Table 1, strong centrality (many arcs per variable and/or high KLD values between variables) can clearly be discovered by using NF, but also visually when MWST is used. The latter is demonstrated with a simple example in Appendix 5, which shows that a BN model with tens of variables becomes very complex, especially when the number of parents is not restricted, as is the case with EQ. On the other hand, even when the EQ model looks visually confusing, it is still possible to study individual associations in it, and the central role of AGE can be recognized from the MWST model, which shows that MWST can be used to quickly form an overview of the model.

4.2 Causal relationships between age and the importance/usage intensity of communications services

The causal BN used in this chapter was constructed and analyzed using workflow phases three to six described in Chapter 3.4. The initial discovery of mediating questions (from question themes two to six) was also conducted by visually inspecting the paths in Fig. 2 (as described in workflow phase two). As a result, one mediating variable, namely 2.3u BROWSING, was discovered. Of the tested learning algorithms Taboo, Taboo Order, EQ, and SopLEQ, Taboo Order gave the lowest MDL score. Based on the network learned with Taboo Order, the following variables were identified as mediators: 3.2 INTERNETCONNECTION, 2.3u BROWSING, 2.4u VIDEO&MUSIC, 2.6u TV, 6.2u MUSIC, and 6.4u FILETRANSFER. Interestingly, none of the demographic questions acted as a mediator. One variable, 1.2 MOBILEPHONE, was identified as agnostic and was removed from the networks, because 99.3 % of the interviewees owned a mobile phone. Thus, this variable is agnostic to age and to all other questions.

Fig. 2 Bayesian Network (only the DAG part) from MWST learning based on 100 survey questions. The red bubbles are the 10 variables with the highest NF values. The arc thickness indicates qualitatively the size of the Mutual Information between variables. The numerical code is the question number (see Appendix 1)

In the temporality analysis, we defined seven technology groups and arranged them in chronological order from oldest (temporal index 1) to newest (temporal index 7). One technology group contains communication technologies with similar launch times. The technology groups were (from oldest to newest) 1) letter; 2) landline, phone call; 3) Internet at home, e-mail; 4) mobile phone; 5) SMS; 6) IM, Internet calls, Facebook & Twitter, listening to music from the Internet, and watching videos and TV from the Internet; and 7) usage of IM, Facebook & Twitter, and Internet calls from a mobile phone, and watching video and Internet TV from a mobile phone. In the re-learning phase with Taboo Order and temporal indices, the arc directions were not determined by the temporal indices alone: where questions shared the same temporal index, the direction was left to the learning algorithm.
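The orientation rule can be sketched as a lookup from variable to temporal index, with arcs pointing from the newer to the older technology and ties left to the learning algorithm. Mapping the perceived-importance questions onto the seven groups listed above (e.g., 1.10i CALL to group 2, 1.5i FB to group 6) is our reading for illustration only, and the function name is hypothetical.

```python
# Temporal indices (1 = oldest, 7 = newest) for a few perceived-importance
# questions; this mapping onto the technology groups is illustrative only.
TEMPORAL_INDEX = {
    "1.11i LETTER": 1,
    "1.10i CALL": 2,
    "1.4i EMAIL": 3,
    "1.3i SMS": 5,
    "1.9i IM": 6,
    "1.5i FB": 6,
}

def orient(a, b):
    """Return the arc as (from, to), pointing from the newer to the older technology.

    Returns None when both variables share a temporal index; the learning
    algorithm then decides the direction.
    """
    ta, tb = TEMPORAL_INDEX[a], TEMPORAL_INDEX[b]
    if ta == tb:
        return None
    return (a, b) if ta > tb else (b, a)
```

For example, `orient("1.3i SMS", "1.9i IM")` yields the arc from IM to SMS, matching the example in Chapter 3.4.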

Before the p-value analysis, we wanted to understand the effect of the above temporality analysis on the structure, which now consisted of 19 variables and 40 arcs. The structures of the BN (reference) and the causal BN (with temporal indices) were compared, and the following similarities and differences were found: 26 arcs were common, eight arcs were common but their direction was changed, seven arcs were discarded, and seven were added. Regarding the V-structures, seven were common, five were added, and 10 were discarded. This means that, due to the addition of temporality data, 45.8 % of the BN’s connections were changed in the causal BN and the overall structure became simpler. The changed arc directions were logical and expected given the temporality data. Careful inspection of the connectivity between variables showed that some of the discarded arcs possibly resulted from spurious or weak dependencies (e.g., between 1.4u EMAIL and 2.4u VIDEO&MUSIC). An interesting difference between the structures was the arc from 1.4i EMAIL to 1.3i SMS in the BN, which was excluded from the causal BN; instead, the learning process added an arc from 1.5i FB to 1.3i SMS. In other words, a statistical dependency between the perceived importance of e-mail and SMS was found in the BN, whereas a dependency from the perceived importance of Facebook & Twitter to SMS was found in the causal BN. Both sound logical, but as the CTF calculation gave exactly the same result (C = 42.2 %), we decided to keep the causal BN structure regarding the parent of 1.3i SMS. In general, the expert-based review of the resulting causal structure did not change the network any further. Therefore, the causal BN was based entirely on the added temporality information.
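The arc-level comparison can be sketched by treating each DAG as a set of directed (parent, child) pairs. The 45.8 % figure is consistent with dividing the 22 changed connections (8 reversed + 7 discarded + 7 added) by all 48 distinct connections in the two networks; this definition is inferred from the reported counts rather than stated in the text, and the function names and toy arcs are hypothetical.

```python
def compare_structures(reference, other):
    """Compare two DAGs given as sets of directed arcs (parent, child)."""
    ref_skeleton = {frozenset(a) for a in reference}
    oth_skeleton = {frozenset(a) for a in other}
    common = reference & other
    reversed_ = {a for a in reference if a not in other and (a[1], a[0]) in other}
    discarded = {a for a in reference if frozenset(a) not in oth_skeleton}
    added = {a for a in other if frozenset(a) not in ref_skeleton}
    return {"common": len(common), "reversed": len(reversed_),
            "discarded": len(discarded), "added": len(added)}

def changed_share(common, reversed_, discarded, added):
    """Share (%) of all distinct connections that differ between the two DAGs."""
    changed = reversed_ + discarded + added
    return 100.0 * changed / (common + changed)

# Hypothetical toy structures.
ref = {("A", "B"), ("B", "C"), ("C", "D")}
oth = {("A", "B"), ("C", "B"), ("A", "D")}
diff = compare_structures(ref, oth)
```

With the counts reported above, `changed_share(26, 8, 7, 7)` gives about 45.8.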

The arcs’ p-value analysis, implemented as a final step, clearly simplified the model: two arcs and the variables 1.2 MOBILEPHONE, 6.2u MUSIC, and 6.4u FILETRANSFER were removed because the null hypothesis of independence could not be rejected, and 1.1 LANDLINE was removed because it had no other connections except with 5.3 AGE.

Table 1 Five highest NF values using different learning parameters. F means that the age categories 15–24, 25–34, 35–49, 50–64, and 65–79 used by Ficora were applied, and A≠LS means forbidding the arc between 5.3 AGE and 5.5 LIFESITUATION. Each row lists the five questions with the highest NF values (NF in parentheses) for one learning setting, in decreasing order

Learning setting | 1st | 2nd | 3rd | 4th | 5th
MWST; SC=0.3 | 5.3 AGE (1.85) | 3.18g (0.84) | 1.5i FB (0.84) | 5.5 LIFESIT. (0.82) | 1.3u SMS (0.73)
MWST; SC=0.3; F | 5.3 AGE (1.53) | 3.18g (1.08) | 3.3a (0.89) | 1.9u IM (0.82) | 1.9i IM (0.71)
MWST; SC=0.2 | 5.3 AGE (1.85) | 3.18g (0.84) | 1.5i FB (0.84) | 5.5 LIFESIT. (0.82) | 1.3u SMS (0.73)
MWST; SC=1.0 | 5.3 AGE (1.8) | 3.18g (0.84) | 5.5 LIFESIT. (0.74) | 1.3u SMS (0.73) | 3.3a (0.73)
MWST; SC=1; A≠LS | 5.3 AGE (1.06) | 3.18g (0.84) | 3.2 INTERNET (0.79) | 1.3u SMS (0.73) | 3.3a (0.73)
Taboo; SC=0.8 | 5.3 AGE (1.69) | 5.5 LIFESIT. (0.98) | 1.4i EMAIL (0.88) | 3.18g (0.85) | 3.3a (0.8)
Taboo; SC=1.0 | 5.3 AGE (1.7) | 5.5 LIFESIT. (0.98) | 1.4i EMAIL (0.88) | 3.18g (0.88) | 3.3a (0.8)
EQ; SC=0.8 | 5.3 AGE (1.7) | 1.4i EMAIL (1.02) | 5.5 LIFESIT. (0.97) | 3.18g (0.89) | 3.3a (0.84)
EQ; SC=1.0 | 5.3 AGE (1.71) | 5.5 LIFESIT. (0.98) | 1.4i EMAIL (0.96) | 3.18g (0.92) | 3.3a (0.83)

Figure 3 presents the final causal BN as a DAG with only a few CPTs included (due to the large size of many of them). The model consists of three types of questions: first, the question of the individual’s age; second, the questions classified according to the first theme (see Chapter 2); and third, the six questions from other themes acting as mediators. The following qualitative conclusions can be drawn from the model’s DAG:

• An individual’s age and the perceived importance of a communications service together affect the usage intensity of that specific service. This is visible in 1.3u SMS, 1.4u EMAIL, 1.5u FB, 1.6u INTERNET, and 1.9u IM. Internet calls (1.8u SKYPE) are an exception: they are affected by the usage of Facebook & Twitter during free time and by IM usage intensity.

• An individual’s age directly affects only 1.5i FB, i.e., how important Facebook & Twitter are perceived to be. The variable 1.5i FB acts as a central mediator between age and the perceived importance of other communications services. This means that the perceived importance of Facebook & Twitter directly affects the perceived importance of e-mail, SMS, and IM, while age affects them only indirectly. The perceived importance of letters and phone calls lies at the end of a long path from 5.3 AGE. Thus, they are expected to be rather independent of 5.3 AGE, with their perceived importance affected by how important SMS is perceived to be.

• Internet access at home seems to be another central mediator between 5.3 AGE and many variables.

• An individual’s age directly affects the variables 2.3u BROWSING, 2.4u VIDEO&MUSIC, 2.6u TV, and 1.5u FB.

• 5.3 AGE and 1.8u SKYPE are the start and end points of the whole causal BN. Therefore, the usage intensity of Internet calls is “a product of” many factors, starting from age and ending at the usage intensity of Instant Messaging and the usage intensity of the Internet for staying in touch with others during free time.

In order to quantify the above qualitative conclusions, a causal analysis with the model visualized in Fig. 3 was conducted between 5.3 AGE and each of the questions 1.3i SMS, 1.4i EMAIL, 1.5i FB, 1.9i IM, 1.10i CALL, and 1.11i LETTER, following workflow phase six described in Chapter 3.4. The results are shown in Fig. 4 and Appendix 6. This analysis answers the question of “what kind of causal relationships exist between an individual’s age and the perceived importance of communications services”. A similar analysis was conducted between 1.5i FB and the other questions related to perceived importance. This analysis sheds light on the role of 1.5i FB as a mediator for the perceived importance of other communications services. A causal analysis between 5.3 AGE and the usage intensity of communications services was also conducted, as well as an analysis of the mediator role of 1.5i FB for usage intensities. The results are shown in Fig. 5 and Appendix 6.

Figure 4a demonstrates that as age increases, perceived importance clearly decreases in the case of Facebook & Twitter, e-mail, Instant Messaging, and SMS, and slightly in the case of phone calls. Phone calls are perceived as important on average, whereas Instant Messaging and Facebook & Twitter are not very important. Figure 4b demonstrates that a dependency exists between the perceived importance of Facebook & Twitter and the perceived importance of all studied communications services except letters. Numerical causal values can be found in Appendix 6. From the DAG in Fig. 3, the dependency graphs in Fig. 4a and b, and the numerical causal values in Appendix 6, we can conclude the following: a causal relationship exists between an individual’s age and the perceived importance of communications services (only very small for letters and phone calls), and a direct relationship exists only with Facebook & Twitter, which acts as a central mediator towards the other perceived importance variables. In addition to Facebook & Twitter, the availability of Internet access at home is a clear mediator.

Figure 5a demonstrates that the usage intensity of communications services (with a mobile phone) tends to decrease as an individual’s age increases. Based on Fig. 5b, the perceived importance of Facebook & Twitter acts as a mediator to the usage intensities of all measured communications services. The conclusion is that an individual’s age affects not only the perceived importance but also the usage intensity. In this dependency relationship, the perceived importance of communications services in general, Internet connection at home, and especially the importance of Facebook & Twitter act as mediators for usage intensities.

The causal strength results were compared between Pearl’s do operator using the causal BN, the same operator using the BN (before the addition of temporality data), and Jouffe’s Likelihood Matching (JLM) using the BN (see Chapter 3.4, phase six). Appendices 6 and 7 indicate that Pearl’s do-operator method does not find any causality when using the BN (assuming that the causal BN is a real causal network). The results depend heavily on the structure of the BN and its arc directions. Some arcs in the constructed BN pointed towards 5.3 AGE, and therefore the do operator made 5.3 AGE independent of the other variables. Thus, the BN structure must be causal when the do operator is used. JLM, in contrast, does not require a fully causal model and is easy to use, both of which motivated us to verify how far its results are from those of Pearl’s method. JLM seems to behave logically compared with Pearl’s method: if the do operator finds some causality, JLM does too. On the other hand, JLM yields much lower causal values, as seen in Appendices 6 and 7.
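The MinxEnt step underlying Likelihood Matching looks for the distribution of a variable that has a prescribed mean while staying as close as possible, in Kullback–Leibler terms, to its current distribution. The minimizer of D_KL(q || p) under a mean constraint is an exponential tilt, q(x) ∝ p(x)·exp(λx), and λ can be found by bisection because the tilted mean increases monotonically with λ. This sketch covers only that step under these assumptions; how BayesiaLab combines it into Likelihood Matching and Direct Effect is not reproduced here, and the function name is hypothetical.

```python
import math

def minxent_match(states, prior, target_mean, tol=1e-12):
    """Distribution q minimizing D_KL(q || prior) subject to E_q[X] = target_mean.

    The minimizer is an exponential tilt q(x) ∝ prior(x) * exp(lam * x);
    lam is found by bisection because E_q[X] increases monotonically in lam.
    """
    if not min(states) < target_mean < max(states):
        raise ValueError("target mean must lie strictly inside the state range")

    def tilted(lam):
        m = max(lam * x for x in states)  # subtract the maximum for numerical stability
        w = [p * math.exp(lam * x - m) for x, p in zip(states, prior)]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(q):
        return sum(x * qi for x, qi in zip(states, q))

    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(tilted(mid)) < target_mean:
            lo = mid
        else:
            hi = mid
    return tilted((lo + hi) / 2)

# Hypothetical 3-level importance scale with a uniform current distribution.
q = minxent_match([1, 2, 3], [1 / 3] * 3, 2.5)
```

Setting a variable’s mean in this minimally disruptive way, instead of fixing a single state, is one reason the resulting effect estimates can be smaller than the do-operator deltas.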


4.3 The reliability of the results

The Ficora telephone survey conducted in Finland during 2011 consists of 3008 participants aged 15–79 years. Ficora used a 95 % confidence level to report the confidence intervals per answer percentage distribution and per group size. This paper used the questions answered by all 3008 participants, and thus the group size for this study was also 3008. For a group of that size, the confidence interval varies between ±1 % and ±1.9 % depending on the answer distribution (Ficora 2011). A weight factor to adjust the participants’ gender and age distribution to the commune level in Finland was not used, and thus the results reflect the situation in Finland as a whole, but not down to the commune level.
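The dependence of the interval width on the answer distribution can be checked with the normal approximation for a proportion from a simple random sample, where the half-width is z·sqrt(p(1 − p)/n) with z = 1.96. Under that assumption the half-width for n = 3008 grows from about ±1 % for p near 0.08 to about ±1.8 % at p = 0.5; Ficora’s reported upper bound of ±1.9 % may reflect a different method or rounding, which we cannot confirm.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of a normal-approximation 95 % CI for a proportion p from n answers."""
    return z * math.sqrt(p * (1.0 - p) / n)

for p in (0.08, 0.10, 0.30, 0.50):
    print(f"p = {p:.2f}: ±{100 * margin_of_error(p, 3008):.1f} %")
```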

Chapter 4.2 described how some variables and arcs with p-values larger than 5 % were discarded from the network in order to simplify it without losing relevant relationships and information. This, together with approximations in the learning process, resulted in differences between the data and the model. The CTF for the final causal BN was 44.52 %, which is rather low compared with 100 % for a fully connected network. However, this value says little about the usability of the network; CTF is better suited for comparing networks learned from the same source than as a global measure of network quality. A more practical method is to compare the p-values and G-tests calculated from the data source and from the model, as done in Appendix 4. Based on this, we can state that the observed changes in the slopes of the graphs in Fig. 4a and b would be unlikely under the null hypothesis, except for three relationships, namely the paths from 5.3 AGE to 1.11i LETTER, from 5.3 AGE to 1.10i CALL, and from 5.3 AGE to 1.8u SKYPE (the last is not visible in Fig. 4).

Fig. 3 Causal BN with example CPTs. Black arcs denote direct causal effects from 5.3 AGE, blue arcs direct causal effects from variables describing perceived importance of communications services, orange arcs direct causal effects from mediating variables not belonging to the focus theme, and green arcs direct causal effects from variables describing usage intensity


5 Discussion

The thinking behind the interventional approach is that the intervened variable, i.e., an individual’s age in this case, causes changes in the perceived importance and usage intensity of certain communications services only if the dependency between them would persist under the right sort of manipulation (of age). Because an individual’s age cannot be manipulated, the results could be interpreted to mean that, under conditions similar to those in 2011, individuals react to communications services as described in Figs. 4 and 5 as they get older. A realistic causal BN may have numerous additional (latent) variables, which can lead to very complex models (Greenland 2010). These latent variables might be missing from the model because a) the observational data did not contain them, b) the model’s granularity always reflects the interest and focus of the study, and c) computational restrictions exist. In our modeling we did not use hidden variable discovery processes, nor did we conduct a similar analysis for earlier or later years. Therefore, as described by Greenland (2010, p. 373), ‘some or all of the arrows in the graph may retain informal causal interpretations; but they may be causally wrong, and yet the graph can still be correct for predictive purposes’. Thus, due to the special nature of an individual’s age as a causal variable, the use of data from only 1 year, and potentially missing latent variables, we call the relationships described by the arcs in the model in Fig. 3 informal causal relationships and use the model for informal causal analysis.

The potentially missing mediator variables could include the value of information as a function of time, different social circles, asymmetry in equipment, and adaptation of new technical innovations (see, e.g., Grinter and Palen 2002). Other variables potentially affecting the targets include network externalities (Courcoubetis and Weber 2003) and communication technology affordances (Karikoski 2013). Because we are dealing with communications services, different social circles might be accessible with different services. In general, the value (importance) of a service to a user increases the more other people use it (network externalities), so people might simply not use a given service because their contacts are not using it. Although the data contain high-level information about Internet connectivity and equipment, equipment asymmetry might still exist and potentially affect the observed results. Karikoski (2013) has identified that different communications services have different affordances, which leads to them being used in different ways. For instance, SMSs seem to afford personal and discreet communication, whereas IM affords interactive group communication with multimedia content. Therefore, these services are not necessarily direct substitutes for each other. Thus, when interpreting the results, the reader should remember that potentially important mediating variables might not have been taken into account in the data collection phase.

Fig. 4 Left (Fig. 4a): Conditional mean of perceived importance of communications services given mean of different age ranges. Right (Fig. 4b): Conditional mean of perceived importance of communications services given different perceived importance levels of Facebook & Twitter

Fig. 5 Left (Fig. 5a): Conditional mean of active usage of communication services with mobile phone given mean of individual’s age in each age range. Right (Fig. 5b): Conditional mean of active usage of communication services with mobile phone given each perceived importance level of Facebook & Twitter

Several direct and indirect relationships between age, the perceived importance of a communications service, and the usage intensity of a communications service were identified in this paper. In general, older people seem to perceive all services (except the letter) as less important than younger people do. Based on the results, letters and phone calls are expected to be rather independent of an individual’s age. Moreover, Internet access at home seems to be a central mediator between age and many services, which is expected, as some of the services require Internet connectivity.

Among all communications services, age has a direct relationship only with the perceived importance of Facebook & Twitter, whereas Facebook & Twitter act as a mediator towards the importance of e-mail, SMS, and IM. The perceived importance of Facebook & Twitter increases the usage intensity of IM, e-mail, Facebook & Twitter, and even SMS. This is interesting because it can be speculated that OTT (Over-the-Top) services are cannibalizing the services traditionally provided by mobile operators, such as SMS. This result indicates that although an individual might perceive Facebook & Twitter as important, this does not directly materialize in lower SMS usage and the consequent cannibalization of operators’ revenues.

Causal dependency can be used for control. Because age itself cannot be intervened on, controlling with age means two things in practice: 1) planning communications services and devices for age categories, and 2) taking into account that the perceived importance is increasing as a function of time as current users get older. The special role of Facebook & Twitter types of services (or similar future services) as positive mediators that “boost” the usage of other communications services should be taken into account by device manufacturers when planning their devices. Mobile operators should notice the positive (informal) causal effect of the perceived importance of Facebook & Twitter on the importance of IM, although this study did not find clear negative causal or statistical relationships between the perceived importance of Facebook & Twitter and the usage intensity of SMS.

Based on the brief comparison between Pearl’s do operator and Jouffe’s Likelihood Matching method, the latter seems well suited to initial or non-scientific practical causal analysis that does not require a high degree of accuracy.

6 Conclusion

We studied 1) the dependency between an individual’s age and the perceived importance of communications services, 2) the dependency between age and the usage intensity of communications services, 3) the dependency between the perceived importance of Facebook & Twitter and the perceived importance of other communications services, 4) the dependency between the perceived importance of Facebook & Twitter and the usage intensity of communications services, and 5) differences in causal results between Pearl’s do calculus and Jouffe’s Likelihood Matching. The services analyzed were phone calls, SMS, IM, e-mail, Internet forums and community services such as Facebook & Twitter, and traditional letters. The study used a telephone survey dataset collected by Ficora during 2011, consisting of 3008 participants in Finland. The results are based on Bayesian Networks (BN) and causal BNs constructed with different unsupervised learning algorithms from the survey data.

A six phased workflow was documented and utilized tocreate a BN and causal BN from the survey data. MaximumWeight Spanning Tree learning algorithm and Node Forcemetrics were utilized to test qualitatively the centrality ofindividual’s age with regards to the answers to all the ques-tions in the survey. This test can be seen as a gate to further in-depth quantitative analysis. Four heuristic unsupervised learn-ing algorithms were tested, each with different structural co-efficient parameters. The BN used in causal analysis waslearned with Taboo Order because it produced the lowestMinimum Description Length score (MDL score) from Ta-boo, Taboo Order, EQ and SopLEQ. The causality analysiswas conducted by adapting Pearl’s do(Xi=xi) calculus andresults were presented as the most probable mean value ofcaused variable given each state of the causing variable as wellas a delta between most probable maximum and minimummean. The calculations show a dependency (non-linear) be-tween individual’s age and the perceived importance of com-munications services. The communications services are be-coming less important as the age increases, except for phonecall and letter, for which no age dependencies exist. The samephenomenon was visible in the usage intensities of SMS, e-mail, and Facebook & Twitter. Also clear dependencies werefound between the perceived importance of Facebook &Twitter and perceived importance of other services (exceptletter and phone call). These results sound natural and expect-ed. 
On the other hand, the perceived importance and usage intensity of Facebook & Twitter are the only variables that can be stated to be directly caused by an individual’s age. Facebook & Twitter in turn acts as a central mediator variable towards the perceived importance of other communications services; also noteworthy is the central mediator role of Internet access at home between age and many services. The dedicated role of Facebook & Twitter types of services should be taken into account by device manufacturers when planning their devices for target users, as well as by mobile operators and media companies when planning their service offerings and pricing strategies. For example, manufacturers should keep traditional devices with phone call and SMS capabilities in their portfolio to serve the portion of current seniors aged above 60 who are not motivated to start using IM or Facebook & Twitter types of services. Similarly, operators should tailor their service bundles and prices so that age is one criterion for the bundles; for example, a senior package might contain only very limited usage of just phone calls and SMS.

Inf Syst Front (2015) 17:1313–1333 1325

Page 14: The effect of an individual’s age on the perceived importance and usage intensity of communications services — A Bayesian Network analysis

This study did not focus on latent variable analysis in the construction of the causal BN. Some potential latent variables were discussed, which may act as mediators or confounding variables between an individual’s age and the perceived importance and usage intensity of communications services. Because the study lacked a hidden-variable discovery process, because of the special role of age as a causing variable, and because the study used data from a single year, the discovered causal relationships should, following Greenland (2010), in fact be called informal causal relationships.

Acknowledgments This work was supported by the MoMIE project of Aalto University and the Future Internet Graduate School FIGS.

Appendices

Appendix 1

Actual survey questions, translated from Finnish by the authors. The coding in all the themes is as follows: Code/Question/Answer option(s) related to the theme.

Theme 1: Importance and usage intensity of communications services

1.1 LANDLINE/Do you have a landline in your household?/1) Yes, 2) No

1.2 MOBILEPHONE/Do you have a personal mobile phone voice subscription?/1) Yes, 2) No

1.3u SMS, 1.4u EMAIL, 1.5u FB/Which of the following mobile communications services have you used or are using actively? Say yes to all relevant./1) SMS, 2) E-mail, 3) Internet communication, such as Facebook, Twitter or Messenger

1.6u INTERNET/To which of the following do you use Internet access during free time? Say yes to all relevant./1) For staying in touch or communication (e-mail, Facebook, Messenger)

1.7 MOREBANDWIDTH/Do you especially need a fast connection for some of the following? Say yes to all relevant./1) For staying in touch or communication (e-mail, Facebook, Messenger)

1.8u SKYPE/How often do you use Internet-based voice services, for example, Skype or Messenger calls?/1) Daily, 2) Weekly, 3) Occasionally, 4) Not at all, 5) Cannot say

1.9u IM/How often do you use Instant Messaging, such as Messenger?/1) Daily, 2) Weekly, 3) Occasionally, 4) Not at all, 5) Cannot say

1.10i CALL, 1.3i SMS, 1.11i LETTER, 1.4i EMAIL, 1.5i FB, 1.9i IM/Next I’ll list some communications services. Please describe how important each of them is to you, by using the following rating scale (Telephone, SMS, Traditional letter, E-mail, Different forums and communities in Internet, Instant Messaging)./1) So important to me that I cannot live without it, 2) Rather important to me, 3) Not very important to me, 4) Hardly important to me, I could live without, 5) Don’t know

Theme 2: Importance of how to follow daily news in general and from the Internet

2.1/From which of the following do you mostly follow current news during free time?/1) TV or teletext, 2) Printed newspaper, 3) Internet, 4) Radio, 5) Some other media, 6) I don’t follow current news at all, 7) I cannot say

2.2a-2.2d/If you follow current news from the Internet, which are the most important ways? Say yes to all relevant./a) Watching TV program type of news from TV channels’ web pages (Yle Areena, MTV3 Katsomo), b) Reading text-based news, for example, from newspapers’, TV channels’ or digital newspapers’ web pages (HS.fi, Yle.fi, Uusisuomi.fi), c) Using services enabled by social media (Twitter, Facebook, blogs), d) I cannot say

2.3u BROWSING, 2.4u VIDEO&MUSIC/Which of the following mobile communications services have you used or are using actively? Say yes to all relevant./1) Information browsing or reading news in the Internet, 2) Watching videos and listening to music in the Internet

2.5, 2.6u TV/To which of the following do you use Internet access during free time? Say yes to all relevant./1) Information browsing, 2) Watching videos or TV programs

2.7, 2.8/Do you especially need a fast connection for some of the following? Say yes to all relevant./1) Information browsing, 2) Watching videos or TV programs

2.9/Would you be willing to separately pay for Internet services, such as digital newspapers and/or games?/1) You already are paying for such services, 2) You could pay or have already thought about paying, 3) You would not pay, 4) You cannot say

Theme 3: Fixed to mobile Internet convergence

3.1a-3.1h/If you do not have an Internet connection, why haven’t you acquired it? Say yes to all relevant./a) You don’t need it at home, you can use it elsewhere, b) Installation and subscription is difficult, c) Usage is difficult, d) It is expensive, e) No need for it, f) You don’t feel that the usage is safe, g) Other reason, what?, h) Cannot say

3.2 INTERNETCONNECTION/Do you have an Internet connection in your household?/1) Yes, 2) No

3.3a-3.3e/If you have an Internet connection, what kind of connections do you have?/a) Fixed Internet, aimed for usage only at home (including WLAN, WiMAX, @450 etc.), b) Mobile broadband, which can be used also outside home with a USB dongle or similar, c) Fixed Internet but not broadband, e.g., ISDN, d) Other mobile Internet subscription (not broadband), e) Cannot say

3.4a-3.4f/If you have a fixed Internet connection, is it…?/a) Landline-based DSL (ADSL, VDSL, SDSL) or a property connection, b) Cable-TV based, c) Optical fiber connection, d) Fixed wireless connection (WiMAX or @450), e) Other, what?, f) Don’t know/Cannot say

3.5/If you have a fixed Internet connection at home, do you have a WLAN or Wi-Fi connected to it?/1) Yes, 2) No

3.6/If you have multiple Internet connections, which one is your primary or most used connection?/1) Landline-based DSL (ADSL, VDSL, SDSL) or a property connection, 2) Cable-TV based, 3) Optical fiber connection, 4) Fixed wireless connection (WiMAX or @450), 5) Other, what?, 6) Don’t know/Cannot say

3.7/How satisfied are you with your primary Internet connection?/1) Very satisfied, 2) Rather satisfied, 3) Neither satisfied nor unsatisfied, 4) Rather unsatisfied, 5) Very unsatisfied, 6) Cannot say

3.8/If you have an Internet connection, what is the nominal speed of the connection (i.e., the speed mentioned in marketing and your subscription)?/1) Below 1 Mbps, 2) 1 Mbps, 3) 2 Mbps, 4) 8 Mbps, 5) 24 Mbps, 6) 100 Mbps or more, 7) Cannot say

3.9/If you have an Internet connection, would you say that the speed is…?/1) Suitable, 2) You could get along with a slower one, 3) You would need a faster one, 4) Cannot say

3.10/If you need a faster connection, which of the following broadband speeds would fit your needs?/1) 512 kbps, 2) 1 Mbps, 3) 2 Mbps, 4) 8 Mbps, 5) 24 Mbps, 6) 100 Mbps or more, 7) Cannot say

3.11/If you have an Internet connection and you would like to change its speed, are there suitable options available in the market?/1) Yes, 2) No

3.12a-3.12e/If you have mobile broadband, what kind of terminals do you use it with?/a) Mobile phone (terminal which is used also for voice), b) Desktop computer (terminal not aimed to be mobile), c) Laptop computer, d) Tablet or similar easy-to-carry terminal, e) None of the alternatives/Cannot say

3.13a-3.13g/If you use a laptop computer or tablet with your mobile broadband, where is it used?/a) At home, b) At work or in school, c) While on the go or travelling, d) At your recreational house, e) Abroad, f) Somewhere else, g) None of the earlier, h) Cannot say

3.14/If you have mobile broadband, what was the primary reason for acquiring it?/1) Affordable price, 2) Mobility, i.e., a possibility to use the Internet independently of location, 3) Connection speed, 4) It was acquired as part of a service bundle, e.g., fixed plus mobile, 5) No other options were available, 6) No special reason, 7) Cannot say

3.15/If you have a fixed Internet connection or no Internet connection at all, would you get along only with a mobile broadband connection?/1) Yes, 2) No

3.16/If you don’t have a fixed Internet connection currently, have you had one at some point in time in your household?/1) Yes, 2) No

3.17/If you have had a fixed Internet connection at some point in time in your household, what was the primary reason for giving it up?/1) Expensive price, 2) Low usage, 3) Slow connection, 4) Fixed Internet connections were no longer available, e.g., due to moving out or termination of the agreement, 5) You were not satisfied with the quality of service, e.g., due to customer service or lost connections, 6) Other, what?, 7) None of them, 8) Cannot say

3.18a-3.18h/Do you especially need a fast connection for some of the following? Say yes to all relevant./a) Digital transactions, b) Listening to music, c) Computer gaming, d) Down- and uploading of programs and other large files, e) Remote work or study, f) Other, what?, g) Connection speed is not important to me, h) Cannot say

Theme 4: Contract types with service provider, Internet accessspeed versus monthly payments

4.1/If you have a mobile phone subscription, what is the primary contract validity period?/1) Continued until further notice, 2) Terminable, a year or below (original duration of contract), 3) Terminable, over a year (original duration of contract), 4) Terminable (length not remembered), 5) Paid by employer (e.g., phone benefit), 6) Cannot say

4.2/If you have an Internet connection subscription, what is the primary contract validity period?/1) Continued until further notice, 2) Terminable, a year or below (original duration of contract), 3) Terminable, over a year (original duration of contract), 4) Terminable (length not remembered), 5) Paid by employer (e.g., phone benefit), 6) Cannot say

4.3/If you have a mobile Internet subscription (mobile broadband or other), how do you pay for the data transmission?/1) Fixed monthly fee, unlimited usage, 2) Fixed monthly fee with decreasing speeds above a certain limit, 3) Fixed monthly fee with extra fees charged above a certain limit, 4) Charging based on usage amount, 5) Employer pays, 6) Other, 7) Cannot say

4.4/If a limit for data transmission exists, how many gigabytes of unlimited data are included in the fixed-price part of the monthly fee?/1) Below 1 GB, 2) 1 GB, 3) 5 GB, 4) 20 GB, 5) 50 GB or more, 6) Cannot say


4.5/If you have an Internet connection subscription, what is the monthly fee of your primary Internet connection?/1) 10 Euros or less, 2) 10.01-15 Euros, 3) 15.01-20 Euros, 4) 20.01-25 Euros, 5) 25.01-30 Euros, 6) 30.01-35 Euros, 7) 35.01-40 Euros, 8) 40.01-45 Euros, 9) 45.01-50 Euros, 10) 50.01-60 Euros, 11) 60.01-70 Euros, 12) 70.01-80 Euros, 13) 80.01-90 Euros, 14) Over 90 Euros, 15) Cannot say

Theme 5: Demographic questions

5.1 GENDER/What is your gender?/1) Female, 2) Male

5.2 COMMUNES/In which commune are you currently living?/Commune number

5.3 AGE/What is your age?/Number of years

5.4 RESIDENCE/Is your home located…?/1) In a rural area, 2) In an urban area, 3) In a city center, 4) Cannot say

5.5 LIFESITUATION/Are you at the moment…?/1) A student, 2) Working, 3) A pensioner, 4) Unemployed or laid off, 5) Other, 6) Cannot say

5.6 HOUSEHOLD/How many individuals belong to your household, including yourself?/Number of individuals

Theme 6: Other questions

6.1u TRANSACTIONS, 6.2u MUSIC, 6.3u GAMES, 6.4u FILETRANSFER, 6.5u REMOTEWORK, 6.6/To which of the following purposes do you use the Internet during free time? Say yes to all relevant./1) Digital transactions, 2) Listening to music, 3) Computer gaming, 4) Down- and uploading of programs and other large files, 5) Remote work or study, 6) Other, what?

6.8/Do you have a landline, an Internet connection (with any technique/way), or a personal mobile phone voice subscription in your household?/Cannot say

6.9, 6.1/Which of the following mobile services have you used or are using actively? Say yes to all relevant./1) None of the listed, 2) Cannot say

Appendix 2

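NF (Node Force) of survey questions, calculated from an MWST-learned BN. In the original table, red cells denote the 10 highest NF values (the first ten entries of the leftmost column, from 5.3 AGE to 2.7). The values correspond to the BN in Fig. 2.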
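As a rough, hypothetical illustration of how an NF ranking like the one below can arise from an MWST, the Python sketch builds a Chow-Liu style maximum weight spanning tree with pairwise mutual information as edge weights and scores each node by the sum of the weights of its incident edges. It rests on two assumptions: that Node Force is the sum of the forces of a node’s arcs, and that an arc’s force in a tree equals the mutual information of its endpoints (which holds for a Kullback-Leibler arc force when each node has at most one parent). BayesiaLab’s actual computation may differ in detail, and the data below are synthetic.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information_bits(xs, ys):
    """Empirical mutual information I(X;Y) in bits."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def mwst(columns):
    """Chow-Liu style maximum weight spanning tree (Kruskal + union-find),
    using pairwise mutual information as the edge weight."""
    edges = sorted(((mutual_information_bits(columns[a], columns[b]), a, b)
                    for a, b in combinations(columns, 2)), reverse=True)
    root = {v: v for v in columns}

    def find(v):
        while root[v] != v:
            root[v] = root[root[v]]  # path halving
            v = root[v]
        return v

    tree = []
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # keep the edge only if it does not close a cycle
            root[ra] = rb
            tree.append((a, b, w))
    return tree


def node_force(tree):
    """Assumed definition: sum of the forces (here: MI) of incident arcs."""
    force = Counter()
    for a, b, w in tree:
        force[a] += w
        force[b] += w
    return dict(force)


# Synthetic toy data (100 respondents): FB and IM each depend only on AGE.
age = [i % 4 for i in range(100)]
columns = {
    "AGE": age,
    "FB": [a // 2 for a in age],
    "IM": [a % 2 for a in age],
    "LETTER": [(i // 4) % 2 for i in range(100)],  # independent of AGE
}
forces = node_force(mwst(columns))
print(max(forces, key=forces.get), forces["AGE"])  # AGE 2.0
```

Because every variable that carries information is attached to AGE in the tree, AGE collects the largest summed weight, which is the kind of centrality the NF column is used to screen for.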

Question NF Question NF Question NF Question NF

5.3 AGE 1.839 6.4u FILETRANSFER 0.266 3.9 0.073 3.13d 0.021

3.18g 0.84 4.3 0.257 3.13a 0.071 3.17 0.021

1.5i FB 0.837 3.18a 0.256 3.4f 0.068 2.2c 0.019

5.5 LIFESITUATION 0.815 1.4u EMAIL 0.255 2.5 0.068 5.2 COMMUNES 0.018

1.3u SMS 0.73 2.1 0.225 1.1 LANDLINE 0.066 2.2b 0.017

1.9i IM 0.723 3.3b 0.218 6.1u TRANSACTION 0.06 3.3d 0.016

1.4i EMAIL 0.711 3.18c 0.215 4.4 0.059 3.13f 0.014

3.3a 0.673 3.6 0.213 2.9 0.055 3.11 0.014

2.3u BROWSING 0.657 2.4u VIDEO&MUSIC 0.187 5.1 GENDER 0.053 2.2a 0.014

2.7 0.65 3.1 0.186 1.11i LETTER 0.044 3.1g 0.012

1.9u IM 0.596 1.7 MOREBANDWIDTH 0.183 3.4c 0.042 3.1f 0.011

1.3i SMS 0.561 6.7 0.17 3.12a 0.037 3.12d 0.01

6.9 0.515 6.5u REMOTEWORK 0.168 3.4d 0.036 1.2 MOBILEPHONE 0.009

3.2 INTERNETCON 0.481 3.4b 0.163 3.1b 0.035 3.13g 0.008

3.4a 0.474 6.3u GAMES 0.147 5.4 RESIDENCE 0.035 3.12e 0.007

6.2u MUSIC 0.469 1.8u SKYPE 0.139 3.1e 0.034 3.7 0.006

2.8 0.458 3.18b 0.132 3.18f 0.033 2.2d 0.006

4.6 0.454 1.10i CALL 0.127 3.3e 0.033 6.1 0.005

3.18d 0.409 3.15 0.123 3.5 0.032 6.8 0.005

2.6u TV 0.401 3.13c 0.118 6.6 0.029 3.1a 0.004

3.8 0.372 1.6u INTERNET 0.109 3.18h 0.028 4.5 0.004

1.5u FB 0.352 3.12b 0.099 3.13e 0.027 3.4e 0.004

5.6 HOUSEHOLD 0.281 3.12c 0.086 3.1c 0.027 3.3c 0.003

4.2 0.274 3.16 0.078 3.1d 0.026 3.1h 0.002

3.18e 0.273 3.4 0.076 3.13b 0.021 2.2e 0.001


Appendix 3

Appendix 4

Table 2 Predefined states for perceived importance and usage intensity of interpersonal communications services

Perceived importance of communications services (questions 1.3i, 1.4i, 1.5i, 1.9i, 1.10i, 1.11i), answer and scale value: So important to me that I cannot live without it = 4; Rather important to me = 3; Not very important to me = 2; Hardly important to me, I could live without = 1; Don’t know = no scale value

Usage intensity of communications services (questions 1.8u SKYPE, 1.9u IM), answer and scale value: Daily = 4; Weekly = 3; Occasionally = 2; Not at all = 1; Cannot say = no scale value

Usage intensity of communications services (questions 1.3u SMS, 1.4u EMAIL, 1.5u FB), answer and scale value: Have been used or are being used actively with a mobile phone = 1; Have not been used or are not being used actively with a mobile phone = 0
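The mapping in Table 2 amounts to reverse-coding the answer options of Appendix 1, so that larger values mean higher importance or more frequent use. A minimal Python sketch, assuming that "Don’t know" and "Cannot say" are treated as missing (Table 2 gives them no scale value); the function names and example answers are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence


def reverse_code(option: int, levels: int = 4) -> Optional[int]:
    """Map answer option 1..levels to scale value levels..1.

    Option levels + 1 ("Don't know" / "Cannot say") has no scale value in
    Table 2; here it is treated as missing (None), which is an assumption.
    """
    return levels + 1 - option if 1 <= option <= levels else None


def encode_multiselect(selected: Sequence[bool]) -> List[int]:
    """Binary coding for 1.3u SMS, 1.4u EMAIL, 1.5u FB: used = 1, else 0."""
    return [1 if s else 0 for s in selected]


def mean_state(values: Sequence[Optional[int]]) -> Optional[float]:
    """Mean scale value, ignoring missing answers."""
    known = [v for v in values if v is not None]
    return mean(known) if known else None


# Hypothetical answers to 1.5i FB (option codes from Appendix 1).
answers = [1, 2, 5, 4]
coded = [reverse_code(a) for a in answers]
print(coded)                                    # [4, 3, None, 1]
print(round(mean_state(coded), 3))              # 2.667
print(encode_multiselect([True, False, True]))  # [1, 0, 1]
```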

Table 3 The p-values and other statistical metrics for studied relationships

Columns: X -> Y; p-value (model); p-value (data); mutual information; G-test (model); G-test (data); degrees of freedom (model); degrees of freedom (data)

5.3 AGE -> 1.11i LETTER 96.84 % 0.00 % 0.002 7.24 65.239 16 16

5.3 AGE -> 1.10i CALL 39.02 % 0.03 % 0.004 16.929 42.959 16 16

5.3 AGE -> 1.8u SKYPE 37.58 % 0.00 % 0.005 21.368 74.044 20 20

1.5i FB -> 1.11i LETTER 0.20 % 0.00 % 0.009 37.149 83.449 16 16

1.5i FB -> 1.10i CALL1 0.00 % 0.00 % 0.015 62.721 72.787 16 16

5.3 AGE -> 1.9u IM 0.00 % 0.00 % 0.021 87.271 543.593 20 20

5.3 AGE -> 1.3u SMS 0.00 % 0.00 % 0.024 100.45 287.484 8 8

5.3 AGE -> 1.3i SMS 0.00 % 0.00 % 0.036 148.054 305.51 16 16

1.5i FB -> 1.3u SMS 0.00 % 0.00 % 0.036 149.327 217.948 8 8

1.5i FB -> 1.4u EMAIL 0.00 % 0.00 % 0.053 222.443 245.465 8 8

5.3 AGE -> 1.4u EMAIL 0.00 % 0.00 % 0.056 231.774 263.564 8 8

5.3 AGE -> 1.9i IM 0.00 % 0.00 % 0.061 256.431 595.372 16 16

1.4i EMAIL -> 1.4u EMAIL 0.00 % 0.00 % 0.073 303.332 423.695 8 8

1.5i FB -> 1.9u IM 0.00 % 0.00 % 0.098 408.493 407.22 20 20

1.5i FB -> 1.5u FB 0.00 % 0.00 % 0.099 412.824 412.824 8 8

1.5i FB -> 1.3i SMS 0.00 % 0.00 % 0.1 417.26 417.26 16 16

5.3 AGE -> 1.4i EMAIL 0.00 % 0.00 % 0.107 447.928 669.34 16 16

5.3 AGE -> 1.5u FB 0.00 % 0.00 % 0.136 566.124 566.124 8 8

5.3 AGE -> 1.5i FB 0.00 % 0.00 % 0.175 730.169 730.169 16 16

1.3i SMS -> 1.3u SMS 0.00 % 0.00 % 0.211 879.188 879.188 8 8

1.5i FB -> 1.4i EMAIL 0.00 % 0.00 % 0.274 1143.469 1143.469 16 16

1.5i FB -> 1.9i IM 0.00 % 0.00 % 0.334 1390.689 1390.690 16 16
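For a contingency table with N observations, the G statistic and the mutual information I (in bits) are linked by G = 2 N ln(2) I, and with the usual (r − 1)(c − 1) degrees of freedom the p-value is the upper tail of the chi-square distribution. Assuming N = 3008 for every pair, the row 5.3 AGE -> 1.5i FB is consistent with this relation (2 × 3008 × ln 2 × 0.175 ≈ 730, against 730.169). The Python sketch below checks the identity on a hypothetical 2 × 2 table and uses the closed-form chi-square tail that exists for even degrees of freedom (all degrees of freedom in Table 3 are even); it gives about 0.968 for G = 7.24 with 16 degrees of freedom, in line with the 96.84 % reported for 5.3 AGE -> 1.11i LETTER. The helper names are our own, not those of the software used in the study.

```python
import math
from typing import List

Table = List[List[int]]


def _margins(table: Table):
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return sum(rows), rows, cols


def mutual_information_bits(table: Table) -> float:
    """Empirical mutual information of the row and column variables, in bits."""
    n, rows, cols = _margins(table)
    return sum((o / n) * math.log2(o * n / (rows[i] * cols[j]))
               for i, r in enumerate(table) for j, o in enumerate(r) if o)


def g_statistic(table: Table) -> float:
    """G = 2 * sum O * ln(O / E), with E = row total * column total / N."""
    n, rows, cols = _margins(table)
    return 2 * sum(o * math.log(o * n / (rows[i] * cols[j]))
                   for i, r in enumerate(table) for j, o in enumerate(r) if o)


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi2_df > x) for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive even df")
    half, term, total = x / 2, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


toy = [[30, 10], [5, 55]]  # hypothetical 2 x 2 table
n = sum(map(sum, toy))
print(g_statistic(toy), 2 * n * math.log(2) * mutual_information_bits(toy))
print(round(chi2_sf_even_df(7.24, 16), 3))  # 0.968
```

The closed form avoids a statistics library; for odd degrees of freedom a general chi-square survival function would be needed instead.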


Appendix 5


Fig. 6 Directed Acyclic Graphs (DAG) from two learning algorithms, MWST (upper part) and EQ (lower part), using 100 survey questions as variables with structural coefficient SC = 1
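Both the structural coefficient SC and the selection of Taboo Order by the lowest MDL score rest on an MDL score that weights the description length of the network against that of the data given the network. The Python sketch below is a generic, hypothetical scorer: the (log2 N)/2 bits per free parameter used for the network cost is our assumption, and BayesiaLab’s exact MDL formula differs in detail. Lower scores are better.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Row = Dict[str, int]


def conditional_entropy_bits(data: Sequence[Row], child: str,
                             parents: Sequence[str]) -> float:
    """Empirical H(child | parents) in bits."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    pa = Counter(tuple(r[p] for p in parents) for r in data)
    return -sum((c / n) * math.log2(c / pa[k]) for (k, _), c in joint.items())


def mdl_score(data: Sequence[Row], structure: Dict[str, List[str]],
              card: Dict[str, int], alpha: float = 1.0) -> float:
    """alpha * DL(model) + DL(data | model), in bits; lower is better.

    DL(model) is approximated as (log2 N)/2 bits per free parameter, an
    assumption of this sketch rather than BayesiaLab's exact formula."""
    n = len(data)
    dl_data = dl_model = 0.0
    for child, parents in structure.items():
        dl_data += n * conditional_entropy_bits(data, child, parents)
        configs = math.prod(card[p] for p in parents)  # 1 for a root node
        dl_model += 0.5 * math.log2(n) * configs * (card[child] - 1)
    return alpha * dl_model + dl_data


# Toy data: B is an exact copy of A (100 rows).
data = [{"A": i % 2, "B": i % 2} for i in range(100)]
card = {"A": 2, "B": 2}
with_arc = {"A": [], "B": ["A"]}
empty = {"A": [], "B": []}
for alpha in (1.0, 100.0):
    print(alpha, mdl_score(data, with_arc, card, alpha),
          mdl_score(data, empty, card, alpha))
```

With alpha = 1 the arc A -> B is kept because it halves the data cost; with alpha = 100 the model cost dominates and the empty graph scores lower, which is the trade-off that raising the structural coefficient controls.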


Appendix 6

Appendix 7

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Second international symposium on information theory (pp. 267–281). Budapest: Akadémiai Kiadó.

Anderson, J., & Gerbing, D. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103(3), 411–423.

Barber, D. (2012). Bayesian reasoning and machine learning. New York: Cambridge University Press.

Bayesialab (2013). Software version 5.2. http://www.bayesia.com/. Accessed 3 Jun 2013.

Bayesialab library (2013a). Score-based learning algorithms. http://library.bayesia.com/display/FAQ/Score-Based+Learning+Algorithms. Accessed 3 Jun 2013.

Bayesialab library (2013b). CTF and Deviance Formulas. http://library.bayesia.com/display/FAQ/CTF+and+Deviance+Formulas. Accessed 3 Jun 2013.

Table 4 Comparison of causal strengths: Pearl’s do-calculus using the causal BN, the same using the BN, and Jouffe’s Likelihood Matching using the BN. The numerical value is the difference between the maximum and minimum conditional mean of the target value given the mean of each range of AGE and 1.5iFB

Values per relationship, in the order Pearl do-oper.; causal BN / Jouffe’s LM; BN / Pearl do-oper.; BN:

AGE−1.5iFB: 1.14 / 0.62 / 0

AGE−1.4iEMAIL: 1.1 / 0.5 / 0

AGE−1.9iIM: 0.61 / 0.25 / 0

AGE−1.3iSMS: 0.6 / 0.1 / 0

AGE−1.10iCALL: 0.14 / 0.02 / 0

AGE−1.11iLETTER: 0.04 / 0 / 0

AGE−1.5uFB: 0.43 / 0.26 / 0.41

AGE−1.4uEMAIL: 0.35 / 0.12 / 0.2

AGE−1.3uSMS: 0.2 / 0.06 / 0

AGE−1.9uIM: 0.2 / 0.01 / 0.04

1.5iFB−1.9iIM: 1.46 / 0.79 / 1.44

1.5iFB−1.4iEMAIL: 1 / 0.94 / 0

1.5iFB−1.3iSMS: 0.73 / 0.24 / 0

1.5iFB−1.10iCALL: 0.2 / 0.06 / 0

1.5iFB−1.11iLETTER: 0.06 / 0.02 / 0

Fig. 7 Comparison of causal strengths: Pearl’s do-calculus using the causal BN (denoted as doCausal, solid line), the same using the BN (denoted as do, dotted line), and Jouffe’s Likelihood Matching using the BN (denoted as DE, dotted line), shown as the conditional mean of the target value given the mean of each range of AGE and 1.5iFB


Bouckaert, R. (2008). Bayesian Network Classifiers in Weka for Version 3–5–7. New Zealand: University of Waikato. http://www.cs.waikato.ac.nz/~remco/weka.bn.pdf. Accessed 3 Jun 2013.

Chow, C. K., & Liu, C. N. (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3), 462–467.

Conrady, S., & Jouffe, L. (2011). Causal inference and direct effects. Conrady Applied Science, LLC. http://www.conradyscience.com/images/white_papers/causal_inference_v16.pdf. Accessed 3 Jun 2013.

Courcoubetis, C., & Weber, R. (2003). Pricing communication networks: Economics, technology and modelling. New York: Wiley.

Dawid, P. (2010). Seeing and doing: The Pearlian synthesis. In R. Dechter, H. Geffner, & J. Y. Halpern (Eds.), Heuristics, probability and causality: A tribute to Judea Pearl (pp. 309–325). London: College Publications.

de Bailliencourt, T., Beauvisage, T., Granjon, F., & Smoreda, Z. (2011). Extended sociability and relational capital management: Interweaving ICTs and social relations. In R. Ling & S. Campbell (Eds.), Mobile communication: Bringing us together or tearing us apart? (pp. 151–179). New Brunswick: Transaction Publishers.

Elwert, F. (2013). Graphical causal models. In S. L. Morgan (Ed.), Handbook of causal analysis for social research, Handbooks of Sociology and Social Research. Dordrecht: Springer Science+Business Media. doi:10.1007/978-94-007-6094-3_13.

Ficora (2011). The consumer survey on communications services (in Finnish). Finnish Communications Regulatory Authority. https://www.viestintavirasto.fi/attachments/Viestintapalvelujen_kuluttajatutkimus_2011.pdf. Accessed 3 Jun 2013.

Fox, S. (2001). Wired seniors. Report, Pew Internet & American Life Project. http://www.pewinternet.org/~/media//Files/Reports/2001/PIP_Wired_Seniors_Report.pdf.pdf. Accessed 3 Jun 2013.

Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29(2–3), 131–163.

Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models (1st ed.). New York: Cambridge University Press.

Geng, Z., & Li, G. (2002). Conditions for non-confounding and collapsibility without knowledge of completely constructed causal diagrams. Scandinavian Journal of Statistics, 29(1), 169–181.

Gerpott, T. J., Thomas, S., & Weichert, M. (2012). Usage of established and novel mobile communication services: substitutional, independent or complementary? Information Systems Frontiers (Online First Collection).

Greenland, S. (2010). Overthrowing the tyranny of null hypotheses hidden in causal diagrams. In R. Dechter, H. Geffner, & J. Y. Halpern (Eds.), Heuristics, probability and causality: A tribute to Judea Pearl (pp. 365–382). London: College Publications.

Greenland, S., & Brumback, B. (2002). An overview of relations among causal modelling methods. International Journal of Epidemiology, 31(5), 1030–1037.

Grinter, R. E., & Palen, L. (2002). Instant messaging in teen life. In Proceedings of the 2002 ACM conference on Computer supported cooperative work (CSCW’02), pp. 21–30.

Grotzer, T. A., & Perkins, D. N. (2000). A taxonomy of causal models: The conceptual leaps between models and students’ reflections on them. Paper presented at the National Association of Research in Science Teaching Annual International Conference (NARST 2000).

Grünwald, P. D. (2007). The minimum description length principle. Cambridge: MIT Press.

Hagmayer, Y., Sloman, S., Lagnado, D., & Waldmann, M. R. (2007). Causal reasoning through intervention. In A. Gopnik & L. Schulz (Eds.), Causal learning (pp. 86–100). New York: Oxford University Press.

Howard, P. E. N., Rainie, L., & Jones, S. (2001). Days and nights on the internet: the impact of a diffusing technology. American Behavioral Scientist, 45(3), 383–404.

Hyun, U. K., Kim, T. Y., & Lee, S. Y. (2011). Framework for network modularization and Bayesian network analysis to investigate the perturbed metabolic network. BMC Systems Biology, 5(Suppl 2), S14.

Jouffe, L., & Munteanu, P. (2001). New search strategies for learning Bayesian networks. In Proceedings of the Tenth International Symposium on Applied Stochastic Models and Data Analysis (ASMDA 2001), pp. 591–596.

Karikoski, J. (2013). Empirical analysis of mobile interpersonal communication service usage. Dissertation, Aalto University.

Kekolahti, P. (2011). Using Bayesian Belief Networks for Modeling of Communication Service Provider Businesses. Paper presented at the 8th Bayesian Modelling Applications Workshop.

Kekolahti, P., & Karikoski, J. (2013). Analysis of mobile service usage behaviour with Bayesian belief networks. Journal of Universal Computer Science, 19(3), 325–352.

Lam, W., & Bacchus, F. (1994). Learning Bayesian belief networks: an approach based on the MDL principle. Computational Intelligence, 10(3), 269–293.

Lee, K. C., & Choi, D. Y. (2011). A Bayesian network-based management of individual creativity: Emphasis on sensitivity analysis with TAN. In N. T. Nguyen, C.-G. Kim, & A. Janiak (Eds.), Intelligent information and database systems. Third International Conference, ACIIDS 2011, Daegu, Korea, April 2011, Proceedings, Part II.

Munteanu, P. (2001). The EQ framework for learning equivalence classes of Bayesian networks. In Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM-01), pp. 417–424.

Nadkarni, S., & Shenoy, P. (2004). A causal mapping approach to constructing Bayesian networks. Decision Support Systems, 38, 259–281.

Oxford Dictionaries (2013). http://oxforddictionaries.com/. Accessed 3 Jun 2013.

Pearl, J. (2009). Causality (2nd ed.). New York: Cambridge University Press.

Pilling, D., & Barrett, P. (2008). Text communication preferences of deaf people in the United Kingdom. Journal of Deaf Studies and Deaf Education, 13(1), 92–103.

Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14, 465–471.

Rogers, E. M. (2003). Diffusion of innovations (5th ed.). New York: Free Press.

Rosenbaum, P., & Rubin, D. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.

Scutari, M. (2010). Learning Bayesian networks with the bnlearn R package. Journal of Statistical Software, 35(3), 1–22.

Shamilov, A., Asan, Z., & Giriftinoglu, C. (2006). Estimation by MinxEnt principle. In Proceedings of the 9th WSEAS International Conference on Applied Mathematics (WSEAS 2006), pp. 436–440.

Sweeney, J. C., & Soutar, G. N. (2001). Consumer perceived value: the development of a multiple item scale. Journal of Retailing, 77(2), 203–220.

Tenenhaus, M., Vinzi, V., Chatelin, Y., & Lauro, C. (2005). PLS path modeling. Computational Statistics & Data Analysis, 48(1), 159–205.

Thayer, S. E., & Ray, S. (2006). Online communication preferences across age, gender, and duration of internet use. CyberPsychology & Behavior, 9(4), 432–440.

Verkasalo, H., López-Nicolás, C., Molina-Castillo, F. J., & Bouwman, H. (2010). Analysis of users and non-users of smartphone applications. Telematics and Informatics, 27(3), 242–255.


Weirich, P. (2012). Causal decision theory. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Stanford University. http://plato.stanford.edu/archives/win2012/entries/decision-causal/. Accessed 3 Jun 2013.

Woodward, J. (2013). Causation and manipulability. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Winter 2013 Edition). http://plato.stanford.edu/archives/win2013/entries/causation-mani/.

Xi, J. (2013). Polytopes arising from binary multi-way contingency tables and characteristic imsets for Bayesian networks. Theses and Dissertations–Statistics. Paper 5. http://uknowledge.uky.edu/statistics_etds/5.

Yang, Z., & Peterson, R. T. (2004). Customer perceived value, satisfaction, and loyalty: the role of switching costs. Psychology & Marketing, 21(10), 799–822.

Yun, Z., & Keong, K. (2004). Improved MDL score for learning of Bayesian networks. In Proceedings of the International Conference on Artificial Intelligence in Science and Technology (AISAT 2004).

Pekka Kekolahti, Lic.Sc. (Tech.), is a postgraduate student at the Department of Communications and Networking, Aalto University, Finland. His research interest is in modeling a variety of complex telecommunications business related phenomena using Bayesian Networks. Pekka Kekolahti holds an M.Sc. and a Lic.Sc. (Tech.) from Helsinki University of Technology.

Juuso Karikoski holds a D.Sc. (Tech.) from Aalto University, Finland. His doctoral thesis focused on the empirical analysis of mobile interpersonal communication service usage. He currently works as an advisor at Aalto University.

Antti Riikonen, M.Sc. (Tech.), is a postgraduate student at the Department of Communications and Networking, Aalto University, Finland. His research interests are related to the diffusion and usage of mobile devices and services. Antti Riikonen holds an M.Sc. (Technology) from Helsinki University of Technology.
