Information Sciences 486 (2019) 240–253
Contents lists available at ScienceDirect
Information Sciences
journal homepage: www.elsevier.com/locate/ins
A circular-linear dependence measure under Johnson–Wehrly
distributions and its application in Bayesian networks
Ignacio Leguey a,b,∗, Pedro Larrañaga b, Concha Bielza b, Shogo Kato c
a Economía Financiera y Contabilidad e Idioma Moderno department, Facultad de Ciencias Jurídicas y Sociales, Universidad Rey Juan Carlos de Madrid, Spain
b Artificial Intelligence department, Universidad Politécnica de Madrid, Campus de Montegancedo, 28660 Boadilla del Monte, Madrid, Spain
c Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
Article info
Article history:
Received 17 August 2017
Revised 17 January 2019
Accepted 21 January 2019
Available online 21 February 2019
2010 MSC:
62H11
94A17
Keywords:
Circular-linear mutual information
Tree-structured Bayesian network
Dependence measures
Directional statistics
Abstract
Circular data jointly observed with linear data are common in various disciplines. Since circular data require different techniques than linear data, it is often misleading to use the usual dependence measures for joint data of circular and linear observations. Moreover, although a mutual information measure between circular variables exists, the measure has drawbacks in that it is defined only for a bivariate extension of the wrapped Cauchy distribution and has to be approximated using numerical methods. In this paper, we introduce two measures of dependence, namely, (i) circular-linear mutual information as a measure of dependence between circular and linear variables and (ii) circular-circular mutual information as a measure of dependence between two circular variables. It is shown that the expression for the proposed circular-linear mutual information can be greatly simplified for a subfamily of Johnson–Wehrly distributions. We apply these two dependence measures to learn a circular-linear tree-structured Bayesian network that combines circular and linear variables. To illustrate and evaluate our proposal, we perform experiments with simulated data. We also use a real meteorological data set from different European stations to create a circular-linear tree-structured Bayesian network model.
Fig. 1. Flow chart for the choice of dependence measures. MI for two linear variables, CMI for two circular variables, CLMI for a pair of linear and circular
variables. These measures allow the modelling of a tree-shaped Bayesian network with circular and/or linear variables.
Several models exist for data consisting of circular and linear observations, many of which focus on a circular-linear
regression [7,10,36] . In addition, Mardia and Sutton [27] proposed a bivariate distribution that combines the von Mises
distribution with Gaussian distributions on the cylinder. Abe and Ley [1] proposed the WeiSSVM, a cylindrical distribution
based on combinations of the sine-skewed von Mises distribution and the Weibull distribution. Furthermore, Johnson and
Wehrly [14] presented circular-linear distributions, and proposed a method to obtain a bivariate circular-linear distribution
with specified marginal distributions. In this work, we prove that the conditional distributions of a subfamily of the Johnson–Wehrly family are well known and mathematically tractable.
Many studies have determined measures for the mutual dependence between linear variables [see 23, 33, 34, among others]. Among them, one of the most well known is the mutual information measure [4,38]. The latter measure is based on
the similarity between the joint density function and the product of its marginal density functions. To extend this measure
to the circular domain, Leguey et al. [20] developed a circular-circular mutual-information (CMI) measure. However, this
measure is only applicable to circular variables with marginal distributions that follow wrapped Cauchy distributions and
has to be approximated using numerical methods. Therefore, for the case when the two variables are in the circular domain,
we propose a CMI measure with no constraints on the underlying circular distributions. The CMI measure is shown to have
the nice property that it can be expressed in a closed form for a general family of bivariate distributions.
To the best of our knowledge, there are no measures of mutual information for pairs of circular and linear variables.
Such a measure is potentially useful for learning Bayesian networks such as the tree-augmented naive Bayes model (TAN)
[9] , where the dependence between the variables is captured via probabilistic graphical models. Furthermore, the Chow–
Liu algorithm [3] , which is used to learn a TAN and based on the mutual information of all pairs of variables, guarantees
that the graph is a maximum likelihood tree. Therefore, a measure of mutual information for a pair of circular and linear
variables is necessary to allow the presence of linear and circular variables in such models. We address this problem and
propose a circular-linear mutual-information measure (CLMI) of the dependence between a circular variable and a linear
one.
Generally, Bayesian network models that combine linear and circular variables ignore the circular nature and consider all
of them as linear-continuous, such as Gaussian Bayesian networks [see 17, 37, among others]. Alternatively, it is also common to discretize [32] the variables (with the related loss of information) to take advantage of a plethora of learning and inference algorithms for discrete Bayesian network models. In both cases, the circular variables are not appropriately handled. In this work, we develop a circular-linear Bayesian network model with a tree structure that captures the relationship
between circular and linear variables. The model is based on our proposed CLMI and CMI measures ( Fig. 1 ), together with
the traditional mutual information of linear variables and the bivariate distribution proposed by Johnson and Wehrly [14] .
The paper is organized as follows. Section 2 reviews the wrapped Cauchy distribution and the Gaussian distribution
as representatives on the circular and linear domains, respectively. Section 3 reviews the angular-linear Johnson–Wehrly
bivariate distribution, and shows that a submodel of their distribution has tractable and well-known conditional distributions. Section 4 discusses the proposed CMI and CLMI measures. Section 5 applies the measures presented in Section 4, and
presents the proposed circular-linear tree-structured Bayesian network model, as well as its evaluation in synthetic domains.
Section 6 compares the proposed model to a Gaussian Bayesian network model and a discrete Bayesian network model over
a real-world meteorological data set recorded from meteorological stations located in Europe. Last, Section 7 concludes the
paper and discusses possible avenues for future work.
2. Representative distributions for circular data and for linear data
One of the best-known distributions defined on the circle is the wrapped Cauchy distribution [21] . We use this circular
distribution as the underlying marginal model for the circular data in the proposed Bayesian networks. Similarly, we use the
Gaussian distribution as the underlying distribution for linear data and apply it to our Bayesian networks.
These two distributions are both well known, and share some nice properties. For example, they are both easy to simulate, computationally very fast, and have tractable bivariate extensions. In addition, the marginal and conditional distributions
of these extensions belong to the same family and are mathematically tractable.
In the following subsections, we review the wrapped Cauchy distribution, the Gaussian distribution, and their bivariate extensions. As known properties related to the bivariate extensions, we discuss their marginal and conditional distributions,
as well as other characteristics relevant to our study. These include the complex-form expression for the wrapped Cauchy
distribution and the expression for the mutual information between two variables with a bivariate Gaussian density.
2.1. Wrapped Cauchy distributions
2.1.1. Definitions
A circular random variable Θ that follows a wrapped Cauchy distribution [21], denoted by wC(μ, ε), has the density

f_Θ(θ) = (1/(2π)) · (1 − ε²) / (1 + ε² − 2ε cos(θ − μ)), 0 ≤ θ < 2π, (1)

where μ ∈ [0, 2π) is the location parameter and ε ∈ [0, 1) is the concentration parameter.

We denote (Z₁, Z₂) distributed as in Eq. (4) as (Z₁, Z₂) ∼ bC∗(η₁, η₂, ε₁, ε₂, ρ).
2.1.3. Marginals and conditionals
Theorem 1. (Kato and Pewsey [16]) A random vector (Z₁, Z₂) with density as in Eq. (4) has marginals Z₁ ∼ C∗(ε₁η₁) and Z₂ ∼ C∗(ε₂η₂), and conditionals Z₁ | Z₂ = z₂ ∼ C∗(−η₁[A ∘ (z₂η₂)^q]) and Z₂ | Z₁ = z₁ ∼ C∗(−η₂[Aᵀ ∘ (z₁η₁)^q]), where A is defined as in Eq. (5), Aᵀ is the transpose of A, and

A ∘ z = (a₁₁z + a₁₂) / (a₂₁z + a₂₂).
2.1.4. Parameter estimation
For the bivariate wrapped Cauchy distribution, the method of moments [2] for the parameter estimation has several advantages over the method of maximum likelihood [16]: it is computationally very fast and easy to implement, the formulae for the parameter estimates can all be expressed in closed form, and all estimates always lie within their admissible ranges.
Let {(θ_{1j}, θ_{2j}) : j = 1, …, N} be a random sample from a bwC(μ₁, μ₂, ε₁, ε₂, ρ) (Eq. (2)). Then, the estimators obtained using the method of moments for μ₁, μ₂, ε₁, ε₂ and ρ are [16]

μ̂_k = arg(R̄_k),  ε̂_k = |R̄_k|,  k = 1, 2,

with R̄_k = (1/N) Σ_{j=1}^{N} exp(iθ_{kj}), and

ρ̂ = (1/N) ( | Σ_{j=1}^{N} exp(i(ϑ_{1j} − ϑ_{2j})) | − | Σ_{j=1}^{N} exp(i(ϑ_{1j} + ϑ_{2j})) | ),  (6)

where ϑ_{rj} = 2 arctan( ((1 + ε̂_r)/(1 − ε̂_r)) tan((θ_{rj} − μ̂_r)/2) ), r = 1, 2.
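As an illustration, these moment estimators can be computed directly from a paired sample of angles. The following is a minimal Python sketch (the function name and the synthetic data are ours, not from the paper); angles are assumed to be in radians.

```python
import numpy as np

def bwc_method_of_moments(theta1, theta2):
    """Method-of-moments estimates (Kato and Pewsey [16], Eq. (6)) for the
    bivariate wrapped Cauchy distribution; angles in radians."""
    n = len(theta1)
    # Marginal estimates: mu_k = arg(R_bar_k), eps_k = |R_bar_k|.
    r1 = np.mean(np.exp(1j * theta1))
    r2 = np.mean(np.exp(1j * theta2))
    mu1, eps1 = np.angle(r1), np.abs(r1)
    mu2, eps2 = np.angle(r2), np.abs(r2)
    # Transformed angles vartheta_{rj} from Eq. (6).
    v1 = 2 * np.arctan((1 + eps1) / (1 - eps1) * np.tan((theta1 - mu1) / 2))
    v2 = 2 * np.arctan((1 + eps2) / (1 - eps2) * np.tan((theta2 - mu2) / 2))
    # Dependence estimate rho_hat (Eq. (6)).
    rho = (np.abs(np.sum(np.exp(1j * (v1 - v2))))
           - np.abs(np.sum(np.exp(1j * (v1 + v2))))) / n
    return mu1, mu2, eps1, eps2, rho

# Synthetic check: two independent wC(1.0, 0.5) samples, drawn by wrapping
# a linear Cauchy with scale -log(0.5); rho should then be near zero.
rng = np.random.default_rng(42)
t1 = (1.0 - np.log(0.5) * rng.standard_cauchy(20_000)) % (2 * np.pi)
t2 = (1.0 - np.log(0.5) * rng.standard_cauchy(20_000)) % (2 * np.pi)
mu1, mu2, eps1, eps2, rho = bwc_method_of_moments(t1, t2)
```

For independent samples the two sums in Eq. (6) are both of order √N, so ρ̂ shrinks toward zero as N grows.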
2.1.5. Simulation
The simulation of data from a univariate wrapped Cauchy distribution can be performed easily. An efficient method is to use the Möbius transformation over a circular uniform distribution, U(0, 2π), as explained in [29]. Another method, implemented in the "circular" R package [40], consists of wrapping (i.e., reducing mod 2π) simulated values from a (linear) Cauchy distribution with location parameter μ and scale parameter −log(ε). If ε = 1, the distribution is a point mass at μ, whereas if ε = 0, the wrapped Cauchy distribution is the circular uniform, U(0, 2π).
The bivariate wrapped Cauchy distribution has (univariate) wrapped Cauchy marginals and conditionals. Therefore, random variates from the bivariate wrapped Cauchy can be generated from two circular uniform U(0, 2π) random variates by using the factorization f(z₁, z₂) = f(z₁ | z₂) f(z₂) (see Theorem 1): once z₂ is drawn from f(z₂), z₁ can be drawn from the conditional f(z₁ | z₂), yielding a draw from f(z₁, z₂).
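A minimal sketch of the wrapping approach for the univariate case (function name ours; angles in radians): draw from a linear Cauchy with location μ and scale −log(ε), then reduce mod 2π.

```python
import numpy as np

def rwrapped_cauchy(n, mu, eps, rng):
    """Draw n variates from wC(mu, eps) by wrapping a linear Cauchy with
    location mu and scale -log(eps) onto [0, 2*pi)."""
    if eps == 0.0:
        return rng.uniform(0.0, 2 * np.pi, n)   # circular uniform
    if eps == 1.0:
        return np.full(n, mu % (2 * np.pi))     # point mass at mu
    scale = -np.log(eps)
    return (mu + scale * rng.standard_cauchy(n)) % (2 * np.pi)

rng = np.random.default_rng(0)
theta = rwrapped_cauchy(100_000, mu=1.0, eps=0.5, rng=rng)
# First trigonometric moment; should approach 0.5 * exp(1j) for wC(1.0, 0.5).
r_bar = np.mean(np.exp(1j * theta))
```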
2.2. Gaussian distributions
2.2.1. Definitions
The Gaussian distribution is widely known. Please refer to [13,19,41] for good reviews on this topic.
In this study, we denote a linear random variable X that follows a Gaussian distribution by N(ι, σ²), where ι ∈ ℝ is the mean parameter and σ² ∈ (0, ∞) is the variance parameter. Similarly, we denote a random vector (X₁, X₂) that follows a bivariate Gaussian distribution by bN(ι₁, ι₂, σ₁, σ₂, γ). In this case, ι₁, ι₂ ∈ (−∞, ∞) are the mean parameters of the marginals, σ₁², σ₂² ∈ (0, ∞) are their variance parameters, and γ ∈ [−1, 1] is the correlation between X₁ and X₂.
2.2.2. Marginals and conditionals
A random vector (X₁, X₂) that follows bN(ι₁, ι₂, σ₁, σ₂, γ) has marginals X₁ ∼ N(ι₁, σ₁²) and X₂ ∼ N(ι₂, σ₂²), and conditionals

X₁ | X₂ = x₂ ∼ N(ι₁ + (σ₁/σ₂)γ(x₂ − ι₂), (1 − γ²)σ₁²) and X₂ | X₁ = x₁ ∼ N(ι₂ + (σ₂/σ₁)γ(x₁ − ι₁), (1 − γ²)σ₂²).
2.2.3. Parameter estimation
Let {(x_{1i}, x_{2i}) : i = 1, …, N} be a random sample from bN(ι₁, ι₂, σ₁, σ₂, γ).

For Gaussian distributions, the estimates from the maximum likelihood method and the method of moments of the parameters ι₁, ι₂, σ₁², σ₂² and γ coincide, and are given by

ι̂_k = (1/N) Σ_{i=1}^{N} x_{ki},  σ̂_k = √(σ̂_k²),  k = 1, 2,

with σ̂_k² = (1/N) Σ_{i=1}^{N} (x_{ki} − ι̂_k)² and

γ̂ = σ̂₁₂ / (σ̂₁ σ̂₂),  (7)

where σ̂₁₂ = (1/N) Σ_{i=1}^{N} (x_{1i} − ι̂₁)(x_{2i} − ι̂₂).
2.2.4. Mutual information
Let X₁ and X₂ be Gaussian variables. Then, the mutual information (MI) between X₁ and X₂ is as follows [4]:

MI(X₁, X₂) = −(1/2) log(1 − γ²),

where γ is the correlation coefficient between X₁ and X₂, as defined in Eq. (7).
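A quick illustration of this closed form (variable names ours): plug the sample correlation from Eq. (7) into the MI expression and compare against the value implied by the true correlation.

```python
import math
import numpy as np

def gaussian_mi(x1, x2):
    """Plug-in MI estimate: -0.5 * log(1 - gamma^2), gamma from Eq. (7)."""
    gamma = np.corrcoef(x1, x2)[0, 1]
    return -0.5 * math.log(1.0 - gamma ** 2)

rng = np.random.default_rng(1)
cov = [[1.0, 0.8], [0.8, 1.0]]                # true correlation 0.8
x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=50_000).T
mi_hat = gaussian_mi(x1, x2)
mi_true = -0.5 * math.log(1.0 - 0.8 ** 2)     # value at the true correlation
```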
3. Circular-linear distribution of Johnson and Wehrly
In this section, we review the method proposed by Johnson and Wehrly [14] to obtain angular-linear bivariate distributions with arbitrary marginal distributions.
After reviewing the general form of the Johnson–Wehrly angular-linear distribution, we propose a subfamily of their
model in which the conditionals and marginals are mathematically tractable.
3.1. Definition
Let f_Θ(θ) and f_X(x) be probability density functions on the circle and on the line, respectively. Suppose that F_Θ(θ) is the cumulative distribution function (CDF) of f_Θ(θ), defined with respect to a fixed, arbitrary origin, and that F_X(x) is the CDF of f_X(x). Then, the distribution of Johnson and Wehrly [14] is defined by the density
f(θ, x) = 2π g(2πF_Θ(θ) − 2πqF_X(x)) f_Θ(θ) f_X(x),  (8)

where 0 ≤ θ < 2π, x ∈ ℝ, q ∈ {−1, 1} determines the positive or negative association between the two variables, and g(·) is a density on the circle.
Let a random vector (Θ, X) have the density given by Eq. (8). Then, the marginal distribution of Θ has the density f_Θ(θ) and CDF F_Θ(θ), while the marginal distribution of X has the density f_X(x) and CDF F_X(x). However, the conditional distributions are not tractable in general.
3.2. Conditionals
Let a random vector (Θ, X) follow the distribution (8). Then, changing the variables U = 2πF_Θ(Θ) and V = 2πF_X(X), the density function of (U, V) can be expressed as

f(u, v) = (1/(2π)) g(u − qv),

where g(·) is a density on the circle and q ∈ {−1, 1}. We propose the following subfamily of the family given by Eq. (8), which has tractable conditional distributions.
Theorem 2. Let a random vector (Θ, X) follow the distribution given by Eq. (8), with g(·) being the wrapped Cauchy density given by Eq. (1) with location parameter μ_g and concentration parameter ε_g. Assume that U = 2πF_Θ(Θ) and V = 2πF_X(X). Then,

U | X = x ∼ wC(2πqF_X(x) + μ_g, ε_g) and V | Θ = θ ∼ wC(q(2πF_Θ(θ) − μ_g), ε_g).

In particular, if Θ ∼ wC(μ_θ, ε_θ), X ∼ N(ι_x, σ_x²) and F_Θ(θ) = ∫₀^θ f_Θ(t) dt, then it holds that

Θ | X = x ∼ wC(μ, ε),

where μ = arg(φ̂_{θ|x}), ε = |φ̂_{θ|x}|,

φ̂_{θ|x} = [ε_g exp(i(2πqF_X(x) + μ_g − ν)) + ε_θ exp(iμ_θ)] / [1 + ε_g ε_θ exp(i(2πqF_X(x) + μ_g − μ_θ − ν))],

and ν = arg{(1 − ε_θ exp(−iμ_θ)) / (1 − ε_θ exp(iμ_θ))}.

Proof. The conditional density function of Θ given X = x and that of X given Θ = θ can be expressed as
f(θ | x) = f(θ, x) / f_X(x) and f(x | θ) = f(θ, x) / f_Θ(θ),

respectively. Changing the variable U = 2πF_Θ(Θ) in f(θ | x), we obtain

f(u | x) = f(θ | x) |∂θ/∂u| = g(u − 2πqF_X(x)),

where q ∈ {−1, 1}. Similarly, changing the variable V = 2πF_X(X) in f(x | θ) leads to

f(v | θ) = f(x | θ) |∂x/∂v| = g(2πF_Θ(θ) − qv).

Let g(·) be the wrapped Cauchy density function given by Eq. (1), with location parameter μ_g and concentration parameter ε_g, as defined in Kato [15]. Then, the following hold:

U | X = x ∼ wC(2πqF_X(x) + μ_g, ε_g)
and

V | Θ = θ ∼ wC(q(2πF_Θ(θ) − μ_g), ε_g).  (9)
Consider the case where Θ ∼ wC(μ_θ, ε_θ) and X ∼ N(ι_x, σ_x²). Without loss of generality, the origin of the cumulative distribution function of Θ is assumed to be zero (i.e., F_Θ(θ) = ∫₀^θ f_Θ(t) dt).
The conditional of Θ given X = x has a wrapped Cauchy distribution. To see this, we first note that U and Θ have the following relationship:

exp(iU) = exp(iν) [exp(iΘ) − ε_θ exp(iμ_θ)] / [1 − ε_θ exp(i(Θ − μ_θ))]

or

exp(iΘ) = [exp(i(U − ν)) + ε_θ exp(iμ_θ)] / [1 + ε_θ exp(i(U − ν − μ_θ))],

where exp(iν) = (1 − ε_θ exp(−iμ_θ)) / (1 − ε_θ exp(iμ_θ)). McCullagh [29] showed that if exp(iU) ∼ C∗(α₁ exp(iβ₁)), then

[exp(iU) + α₂ exp(iβ₂)] / [1 + α₂ exp(i(U − β₂))] ∼ C∗( [α₁ exp(iβ₁) + α₂ exp(iβ₂)] / [1 + α₁α₂ exp(i(β₁ − β₂))] ),

where 0 ≤ α₁, α₂ < 1 and 0 < β₁, β₂ ≤ 2π. Using this result, we have

exp(iΘ) | X = x ∼ C∗(φ̂_{θ|x})

or, in polar-coordinate form,

Θ | X = x ∼ wC(arg(φ̂_{θ|x}), |φ̂_{θ|x}|),

where

φ̂_{θ|x} = [ε_g exp(i(2πqF_X(x) + μ_g − ν)) + ε_θ exp(iμ_θ)] / [1 + ε_g ε_θ exp(i(2πqF_X(x) + μ_g − μ_θ − ν))]. □
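The conditional parameters in Theorem 2 are straightforward to evaluate numerically. Below is a hypothetical helper (function and variable names ours) that returns (μ, ε) of Θ | X = x for wrapped Cauchy Θ and Gaussian X; when ε_g = 0 the conditional should collapse to the marginal wC(μ_θ, ε_θ).

```python
import cmath
import math

def conditional_wc_params(x, mu_t, eps_t, iota, sigma, mu_g, eps_g, q=1):
    """(mu, eps) of Theta | X = x ~ wC(mu, eps) per Theorem 2, with
    Theta ~ wC(mu_t, eps_t), X ~ N(iota, sigma^2), link g = wC(mu_g, eps_g)."""
    # F_X(x) = Phi((x - iota)/sigma), evaluated via the error function.
    fx = 0.5 * (1.0 + math.erf((x - iota) / (sigma * math.sqrt(2.0))))
    nu = cmath.phase((1 - eps_t * cmath.exp(-1j * mu_t))
                     / (1 - eps_t * cmath.exp(1j * mu_t)))
    w = 2 * math.pi * q * fx + mu_g
    phi_hat = ((eps_g * cmath.exp(1j * (w - nu)) + eps_t * cmath.exp(1j * mu_t))
               / (1 + eps_g * eps_t * cmath.exp(1j * (w - mu_t - nu))))
    return cmath.phase(phi_hat) % (2 * math.pi), abs(phi_hat)

# With eps_g = 0 the conditional reduces to the marginal wC(1.0, 0.5).
mu_c, eps_c = conditional_wc_params(0.3, 1.0, 0.5, 0.0, 1.0, 0.7, 0.0)
```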
For the rest of this subsection, we consider the subfamily presented in Theorem 2 in which the marginal of the circular variable is a wrapped Cauchy distribution and that of the linear variable is a Gaussian distribution. By Theorem 2, the conditional of the circular variable given the linear variable then follows a known and tractable distribution, namely a wrapped Cauchy distribution (Eq. (1)). It also follows from Theorem 2 that the conditional X | Θ = θ itself does not follow any well-known distribution. However, Theorem 2 implies that

2πΦ((X − ι_x)/σ_x) | Θ = θ ∼ wC(q(2πF_Θ(θ) − μ_g), ε_g),

where Φ denotes the CDF of the standard Gaussian distribution N(0, 1), namely, Φ(x) = ∫_{−∞}^{x} φ(t) dt, with φ the Gaussian density with ι_x = 0 and σ_x = 1. Since the CDF of the standard Gaussian distribution is easy to evaluate numerically, numerical calculations associated with the conditional distribution of X given Θ = θ can be conducted efficiently.
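To illustrate, the conditional density of X given Θ = θ can be evaluated through the standard Gaussian CDF Φ and the wrapped Cauchy density, via the change of variables V = 2πΦ((X − ι_x)/σ_x); by construction it should integrate to one. A sketch under these assumptions (function names ours; the value F_Θ(θ) is passed in as a number):

```python
import math
import numpy as np

def wc_pdf(t, mu, eps):
    """Wrapped Cauchy density wC(mu, eps) (Eq. (1))."""
    return (1 - eps ** 2) / (2 * math.pi * (1 + eps ** 2 - 2 * eps * np.cos(t - mu)))

def x_given_theta_pdf(x, F_theta, iota, sigma, mu_g, eps_g, q=1):
    """Conditional density f(x | theta), using V = 2*pi*Phi((X - iota)/sigma)
    and V | Theta = theta ~ wC(q*(2*pi*F_theta - mu_g), eps_g)."""
    z = (x - iota) / sigma
    phi_z = np.exp(-z ** 2 / 2) / math.sqrt(2 * math.pi)      # N(0,1) density
    Phi_z = 0.5 * (1 + np.vectorize(math.erf)(z / math.sqrt(2)))
    v = 2 * math.pi * Phi_z
    mu_v = q * (2 * math.pi * F_theta - mu_g)
    # Jacobian of v with respect to x is 2*pi*phi(z)/sigma.
    return wc_pdf(v, mu_v, eps_g) * 2 * math.pi * phi_z / sigma

# Numerical check: the conditional density integrates to one over the line.
xs = np.linspace(-8.0, 8.0, 4001)
dens = x_given_theta_pdf(xs, F_theta=0.3, iota=0.0, sigma=1.0, mu_g=1.0, eps_g=0.6)
total = float(np.sum((dens[1:] + dens[:-1]) / 2 * np.diff(xs)))
```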
4. General circular-circular and circular-linear mutual information
Mutual dependence measures between two linear variables have been studied at length, including the works of [23,34] ,
among many others. In the case of linear data, one of the best-known measures is mutual information (MI) [4,38] , which
determines the similarity between the joint density and the product of its marginal densities. For circular data, the first
mutual-information measure was developed recently by [20] and is called CMI. However, CMI is defined only for a bivariate
wrapped Cauchy variable.
In this section, we redefine the CMI such that the measure can be used for any bivariate circular variables. Then, we
present a closed-form expression for the CMI for a general family of bivariate circular distributions. This study is also the
first to propose a mutual-information measure for circular and linear variables, which we call CLMI.
4.1. Circular-circular mutual information
In [20] , the CMI is approximated using numerical methods. Here, we present a simple expression for CMI for a general
family of distributions.
Let Θ, Ψ be a pair of circular variables. Then, the CMI between Θ and Ψ is defined by

CMI(Θ, Ψ) = ∫₀^{2π} ∫₀^{2π} f(θ, ψ) log{ f(θ, ψ) / (f_Θ(θ) f_Ψ(ψ)) } dψ dθ,  (10)

where f_Θ(θ) is the marginal density of Θ, f_Ψ(ψ) is the marginal density of Ψ, and f(θ, ψ) is the joint density of (Θ, Ψ).
Following a similar method to that in Johnson and Wehrly [14], Wehrly and Johnson presented a general family of distributions for pairs of circular variables in [42]. Here, we consider the following subfamily, with joint density function

f(θ, ψ) = 2π δ(2πF_Θ(θ) − 2πqF_Ψ(ψ)) f_Θ(θ) f_Ψ(ψ),  (11)

where 0 ≤ θ, ψ < 2π, f_Θ(θ) and f_Ψ(ψ) are any probability density functions on the circle, F_Θ(θ) and F_Ψ(ψ) are the CDFs of f_Θ(θ) and f_Ψ(ψ), respectively, q ∈ {1, −1} determines the positive or negative association between the two variables, and δ(·) is the density of the wrapped Cauchy distribution wC(μ_δ, ε_δ).
Assume that a random vector (Θ, Ψ) has the distribution given by Eq. (11). Then, it holds that the marginal distribution of Θ has the density f_Θ(θ) and the CDF F_Θ(θ), and that the marginal distribution of Ψ has the density f_Ψ(ψ) and the CDF F_Ψ(ψ). As pointed out in [16], the bivariate wrapped Cauchy distribution from Eq. (2) is a special case of the Wehrly and Johnson family [42].
Theorem 3. Let (Θ, Ψ) have the distribution given by Eq. (11). Then, the CMI between Θ and Ψ defined in Eq. (10) is given by

CMI(Θ, Ψ) = ∫₀^{2π} δ(t) log{2π δ(t)} dt.

In particular, if δ is the density of the wrapped Cauchy distribution wC(μ_δ, ε_δ), then

CMI(Θ, Ψ) = −log(1 − ε_δ²).  (12)
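Eq. (12) can be checked numerically: for a wrapped Cauchy δ, the relative-entropy integral ∫₀^{2π} δ(t) log{2π δ(t)} dt should equal −log(1 − ε_δ²). A short verification (parameter values ours):

```python
import math
import numpy as np

def wc_pdf(t, mu, eps):
    """Wrapped Cauchy density wC(mu, eps) (Eq. (1))."""
    return (1 - eps ** 2) / (2 * math.pi * (1 + eps ** 2 - 2 * eps * np.cos(t - mu)))

eps_d = 0.7
t = np.linspace(0.0, 2 * math.pi, 200_001)
d = wc_pdf(t, 0.0, eps_d)
integrand = d * np.log(2 * math.pi * d)
# Trapezoidal rule over [0, 2*pi].
cmi_numeric = float(np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(t)))
cmi_closed = -math.log(1 - eps_d ** 2)   # closed form of Eq. (12)
```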
Proof. Let f(θ, ψ) be the joint density function given by Eq. (11), defined in Wehrly and Johnson [42]. Then, the CMI
Fig. 4. European locations of the meteorological stations from the WDCGG data set.
Table 2
Station name, variable described, variable used in the model, number of non-missing cases ( N i ), circu-
lar mean ˆ μ or linear mean ̂ ι (where applicable), unit of measure, concentration ˆ ε or standard deviation
ˆ σ (where applicable), and type of variable (C: Circular; L: Linear) for the 17 numeric variables of the
WDCGG data set. The circular variables range from 0 to 2 π .
Station Variable Name N i ˆ μ/ ̂ ι Units ˆ ε / ̂ σ Type
M Ocean Wind direction stmWD 1288 −1.86 radians 0.06 C
Wind speed stmWS 1288 8.93 m / s 3.91 L
Pallas- Wind direction palWD 150 −2.76 radians 0.18 C
Sammaltunturi Wind speed palWS 150 6.4 m / s 3.21 L
Lampedusa Wind direction lmpWD 446 −0.58 radians 0.20 C
Wind speed lmpWS 446 6.62 m / s 3.94 L
Relative humidity lmpRH 446 56.4 % 27.8 L
Atmospheric pressure lmpAP 446 1010 hPa 6.29 L
Atmospheric temp. lmpAT 446 19.59 Celsius 5.26 L
Hegyhatsal Wind direction hunWD 557 −0.67 radians 0.07 C
Wind speed hunWS 557 3.82 m / s 3.20 L
Dwejra Wind direction gozWD 157 −0.77 radians 0.39 C
Wind speed gozWS 157 2.79 m / s 2.17 L
Black Sea Wind direction bscWD 550 0.47 radians 0.22 C
Wind speed bscWS 550 4.73 m / s 2.56 L
Baltic Sea Wind direction balWD 1169 −1.77 radians 0.21 C
Wind speed balWS 1169 9.93 m / s 4.90 L
two Eastern Europe stations. Therefore, it seems that our circular-linear model is capturing the dependence relationships
between the variables of the data set properly.
The Schwarz Bayesian information criterion (SBIC) [35] is a model selection criterion, where the lowest value is preferred.
It is based on the likelihood function with an overfitting penalty, and is defined as
SBIC = −2 ln L̂ + ln(N) · w,
Fig. 5. Circular-linear tree-structured Bayesian network for the WDCGG meteorological data set. The names of the variables are shown in Table 2 . The
selected root node is the wind direction at the M Ocean station. Dashed border node lines indicate circular variables, while solid border node lines
indicate linear variables. Nodes with the same colour are recorded at the same station. Nodes with similar colour tones are located close to each other
geographically.
Table 3
SBIC comparison between the circular-linear Bayesian network model, the Gaussian Bayesian network model, and the discrete Bayesian network model for the WDCGG data set.

Model            SBIC
Circular-Linear  −5.9197 × 10^192
Gaussian         −2.0896 × 10^169
Discrete         −3.9851 × 10^4
where L̂ is the likelihood function value, N is the sample size, and w is the number of parameters to be estimated in the model.
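A minimal sketch of the criterion (the example log-likelihoods and parameter counts are hypothetical, chosen only to show how the penalty trades off fit against model size):

```python
import math

def sbic(log_likelihood, n, w):
    """Schwarz Bayesian information criterion:
    SBIC = -2 * ln(L_hat) + ln(N) * w; lower values are preferred."""
    return -2.0 * log_likelihood + math.log(n) * w

# Hypothetical comparison on N = 500 observations: a slightly worse fit
# with far fewer parameters yields the lower (preferred) SBIC.
simple = sbic(log_likelihood=-1200.0, n=500, w=10)
complex_ = sbic(log_likelihood=-1195.0, n=500, w=40)
```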
We use the SBIC to compare our model to a Gaussian Bayesian network model, where we assume that all variables follow
Gaussian continuous distributions. We also compare our model to a discrete Bayesian network model using the discretization
of variables. This discretization is carried out by considering each variable as linear, and then creating 10 partitions of equal
width (these vary for each variable, depending on their corresponding domain). Note that for those cases where the variables
are circular, the linear domain is considered to be between 0 and 2 π .
Comparing the SBIC values in Table 3 , we observe that our circular-linear model clearly outperforms the other two
models, which ignore the circular nature of the circular variables and treat them as (linear) continuous or discrete
variables.
7. Conclusions
Circular data are often observed together with linear data in the sciences. In this study, we showed that our subfamily
of Johnson & Wehrly bivariate distributions has tractable properties, such as well-known marginals and conditionals and
a closed-form expression for the estimators of the parameters. We presented a CLMI measure that quantifies the mutual dependence between a circular variable and a linear variable by determining the similarity between the joint density and the product of their marginal densities. We also extended the definition of the CMI measure. We showed that the CLMI and CMI can be expressed in simple closed forms for our distributions for circular-linear data and bivariate circular data, respectively.
In addition, we described experimental results that illustrate how to use these measures (i.e., the CLMI and CMI) with the
well-known mutual information between linear variables. To the best of our knowledge, this study is the first to develop a
circular-linear tree-structured Bayesian network model that can capture the dependence between any possible pair of linear
and circular variables.
Then, we applied our algorithm for a tree-structured Bayesian network model to a real data set in order to model the
relationships between circular and linear measurements recorded at seven meteorological stations located in Europe. Here,
we observed that the proposed model captures the strong dependence between variables recorded in geographically close
stations well and outperforms other models, which assume all variables to be Gaussian or discrete.
There are several potential applications of our tree-structured Bayesian network model, some of which relate to sports (e.g., baskets in basketball or goals in football), social behaviour (e.g., hand gesture recognition, arm movement) and meteorological events (e.g., twister progress, earthquake epicentre location and expansion), among many others.
Working with a combination of circular and linear statistics is a non-trivial task. Applications within Bayesian networks
and machine learning research for graphical models open a challenging field. As future work, we intend to adapt the proposed circular-linear graphical model to perform real-time classification. In addition, dropping the dimension constraint (of
one parent) in our model would be another interesting path to explore in order to extend this model to a more general
Bayesian network case allowing more than one parent per node.
References

[1] T. Abe, C. Ley, A tractable, parsimonious and flexible model for cylindrical data, with applications, Econom. Stat. (2016), in press, doi: 10.1016/j.ecosta.2016.04.001.
[2] K. Bowman, L. Shenton, Methods of moments, Encycl. Stat. Sci. 5 (1985) 467–473.
[3] C. Chow, C. Liu, Approximating discrete probability distributions with dependence trees, IEEE Trans. Inf. Theory 14 (3) (1968) 462–467, doi: 10.1109/TIT.1968.1054142.
[4] T.M. Cover, J.A. Thomas, Elements of Information Theory, John Wiley & Sons, 2012.
[5] J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006) 1–30.
[6] N.I. Fisher, Statistical Analysis of Circular Data, Cambridge University Press, 1995.
[7] N.I. Fisher, A.J. Lee, Regression models for an angular response, Biometrics 48 (3) (1992) 665–677, doi: 10.2307/2532334.
[8] M. Friedman, The use of ranks to avoid the assumption of normality implicit in the analysis of variance, J. Am. Stat. Assoc. 32 (200) (1937) 675–701, doi: 10.1080/01621459.1937.10503522.
[9] N. Friedman, D. Geiger, M. Goldszmidt, Bayesian network classifiers, Mach. Learn. 29 (1997) 131–163, doi: 10.1023/A:100746552.
[10] A.L. Gould, A regression technique for angular variates, Biometrics 25 (4) (1969) 683–700, doi: 10.2307/2528567.
[11] I.S. Gradshteyn, I.M. Ryzhik, Table of Integrals, Series, and Products, 7th ed., Academic Press, 2007.
[12] S.R. Jammalamadaka, A. Sengupta, Topics in Circular Statistics, World Scientific, 2001.
[13] N.L. Johnson, S. Kotz, N. Balakrishnan, Distributions in Statistics: Continuous Univariate Distributions, Houghton Mifflin, 1970.
[14] R.A. Johnson, T.E. Wehrly, Some angular-linear distributions and related regression models, J. Am. Stat. Assoc. 73 (363) (1978) 602–606, doi: 10.1080/01621459.1978.10480062.
[15] S. Kato, A distribution for a pair of unit vectors generated by Brownian motion, Bernoulli 15 (3) (2009) 898–921, doi: 10.3150/08-BEJ178.
[16] S. Kato, A. Pewsey, A Möbius transformation-induced distribution on the torus, Biometrika 102 (2) (2015) 359–370, doi: 10.1093/biomet/asv003.
[17] C. Kenley, Influence Diagram Models with Continuous Variables, Ph.D. thesis, Stanford University, 1986.
[18] J.T. Kent, D.E. Tyler, Maximum likelihood estimation for the wrapped Cauchy distribution, J. Appl. Stat. 15 (2) (1988) 247–254, doi: 10.1080/02664768800000029.
[19] S. Kotz, N. Balakrishnan, N.L. Johnson, Continuous Multivariate Distributions, Models and Applications, John Wiley & Sons, 2004.
[20] I. Leguey, C. Bielza, P. Larrañaga, Tree-structured Bayesian networks for wrapped Cauchy directional distributions, in: Advances in Artificial Intelligence, vol. 9868, Springer, 2016, pp. 207–216, doi: 10.1007/978-3-319-44636-3_19.
[21] P. Lévy, L'addition des variables aléatoires définies sur une circonférence, Bulletin de la Société Mathématique de France 67 (1939) 1–41.
[22] C. Ley, T. Verdebout, Modern Directional Statistics, CRC Press, 2017.
[23] S. Lloyd, On a measure of stochastic dependence, Theory Probab. Appl. 7 (3) (1962) 301–312, doi: 10.1137/1107028.
[24] K.V. Mardia, Statistics of directional data, J. R. Stat. Soc. Ser. B (Methodological) 37 (3) (1975) 349–393.
[25] K.V. Mardia, G. Hughes, C.C. Taylor, H. Singh, A multivariate von Mises distribution with applications to bioinformatics, Can. J. Stat. 36 (1) (2008).
[27] K.V. Mardia, T.W. Sutton, A model for cylindrical variables with applications, J. R. Stat. Soc. Ser. B (Methodological) 40 (2) (1978) 229–233.
[28] K.V. Mardia, J. Voss, Some fundamental properties of a multivariate von Mises distribution, Commun. Stat. 43 (2014) 1132–1144, doi: 10.1080/03610926.2012.670353.
[29] P. McCullagh, Möbius transformation and Cauchy parameter estimation, Ann. Stat. 24 (2) (1996) 787–808, doi: 10.1214/aos/1032894465.
[30] R. von Mises, Über die „Ganzzahligkeit" der Atomgewichte und verwandte Fragen, Zeitschrift für Physik 19 (1918) 490–500.
[31] P. Nemenyi, Distribution-free multiple comparisons, Biometrics 18 (2) (1962) 263.
[32] F. Nojavan, S. Qian, C. Stow, Comparative analysis of discretization methods in Bayesian networks, Environ. Modell. Softw. 87 (2017) 64–71, doi: 10.1016/j.envsoft.2016.10.007.
[33] A. Rényi, On measures of dependence, Acta Math. Acad. Sci. Hungarica 10 (3–4) (1959) 441–451, doi: 10.1007/BF02024507.
[34] A. Rényi, On the dimension and entropy of probability distributions, Acta Math. Acad. Sci. Hungarica 10 (1–2) (1959) 193–215, doi: 10.1007/BF02063299.
[35] G. Schwarz, Estimating the dimension of a model, Ann. Stat. 6 (2) (1978) 461–464, doi: 10.1214/aos/1176344136.
[36] A. Sengupta, On the construction of probability distributions for directional data, Bull. Indian Math. Soc. 96 (2) (2004) 139–154.
[37] R. Shachter, C. Kenley, Gaussian influence diagrams, Manage. Sci. 35 (5) (1989) 527–550, doi: 10.1287/mnsc.35.5.527.
[38] C.E. Shannon, A mathematical theory of communication, Mob. Comput. Commun. Rev. 5 (1) (2001) 3–55, doi: 10.1145/584091.584093.
[39] P. Spirtes, C.N. Glymour, R. Scheines, Causation, Prediction, and Search, MIT Press, 2000.
[40] R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, 2008.
[41] Y. Tong, The Multivariate Normal Distribution, Springer, 1990.
[42] T.E. Wehrly, R.A. Johnson, Bivariate models for dependence of angular observations and a related Markov process, Biometrika 67 (1) (1980) 255.