University of South Carolina
Scholar Commons
Theses and Dissertations

2018

Uncertainty Estimation of Deep Neural Networks

Chao Chen, University of South Carolina - Columbia

Follow this and additional works at: https://scholarcommons.sc.edu/etd
Part of the Computer Sciences Commons

Recommended Citation
Chen, C. (2018). Uncertainty Estimation of Deep Neural Networks. (Doctoral dissertation). Retrieved from https://scholarcommons.sc.edu/etd/5035

This Open Access Dissertation is brought to you by Scholar Commons. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of Scholar Commons. For more information, please contact [email protected].
Table 2.1  Average test RMSE of the proposed algorithm on different layers and ensemble sizes.
Table 2.2  Average test LL of the proposed algorithm on different layers and ensemble sizes.
Table 2.3  Significance tests of the average test RMSE among VI, PBP, and the proposed algorithm with different ensembles and layers.
Table 2.4  Significance tests of the average test RMSE among MC-dropout, Deep Ensembles, and the proposed algorithm with different ensembles and layers.
Table 2.5  Significance tests of the average test LL among VI, PBP, and the proposed algorithm with different ensembles and layers.
Table 2.6  Significance tests of the average test LL among MC-dropout, Deep Ensembles, and the proposed algorithm with different ensembles and layers.
Table 2.7  Number of parameters of selected networks.
Table 3.1  Basic information of the five events.
Table 4.1  Concepts used for the concept inventory.
Table 4.2  Summary of student answers for each question in each class (numbers of students who selected the correct choice are indicated by a circle).
Table 4.3  Conditional probabilities for a question related to two concepts.
Figure 3.1  An RNN with an input layer (blue), a hidden layer (red), and an output layer (green). Units within the dotted regions are optional.
Figure 3.2  Gated mechanism of an LSTM cell as described by [39].
Figure 3.3  Architecture of the network used in this study.
Figure 3.4  Predicted sub-events with the proposed algorithm for the 2013 Boston marathon event. The distance lines above the blue threshold line indicate identified outliers, and the red color indicates the Boston bombing moment (identified=40, true=16, σ²ε=2.13).
Figure 3.5  Predicted sub-events with the proposed algorithm for the 2013 Superbowl event. The distance lines above the blue threshold line indicate identified outliers, and the red color indicates the power outage (identified=33, true=18, σ²ε=2.19).
Figure 3.6  Predicted sub-events with the proposed algorithm for the 2013 OSCAR event. The distance lines above the blue threshold line indicate identified outliers, and the red color indicates the OSCAR starting moment (identified=39, true=25, σ²ε=1.98).
Figure 3.7  Predicted sub-events with the proposed algorithm for the 2013 AllStar event. The distance lines above the blue threshold line indicate identified outliers, and the red color indicates the AllStar starting moment (identified=33, true=22, σ²ε=1.56).
Figure 3.8  Predicted sub-events with the proposed algorithm for the 2013 Zimmerman trial news event. The distance lines above the blue threshold line indicate identified outliers, and the red color indicates the verdict moment (identified=23, true=17, σ²ε=1.21).
Figure 3.9  Performance of the algorithm on the marathon event for different ensemble sizes.
Figure 3.10  Performance of the algorithm on the Boston marathon event for different sigma values.
Table 2.4: Significance tests of the average test RMSE among MC-dropout, Deep Ensembles, and the proposed algorithm with different ensembles and layers.

RMSE     |            MC Dropout              |            Deep Ensembles
Dataset  | EnKF-200-1  EnKF-1000-1  EnKF-1000-5 | EnKF-200-1  EnKF-1000-1  EnKF-1000-5
Table 2.6: Significance tests of the average test LL among MC-dropout, Deep Ensembles, and the proposed algorithm with different ensembles and layers.

LL       |            MC Dropout              |            Deep Ensembles
Dataset  | EnKF-200-1  EnKF-1000-1  EnKF-1000-5 | EnKF-200-1  EnKF-1000-1  EnKF-1000-5
where the sample mean and covariance matrix are obtained from the ensemble members propagated to the observable.
When the squared Mahalanobis distance passes the following test, the observation is considered not an outlier but a plausible outcome of the model. Here the degrees of freedom used to obtain χ²₀.₀₅ is q.

m²_d ≤ χ²₀.₀₅ (3.3)
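As a concrete sketch of this test (our own illustration, not the dissertation's code), the squared Mahalanobis distance of an observation from the propagated ensemble can be checked against the chi-square critical value; the function name and the use of SciPy are our choices:

```python
import numpy as np
from scipy.stats import chi2

def passes_outlier_test(observation, ensemble_obs, alpha=0.05):
    """Return (is_plausible, m2): chi-square test on the squared
    Mahalanobis distance of an observation from the ensemble.

    ensemble_obs: (N, q) array of ensemble members propagated through
    the observation operator; observation: length-q vector.
    """
    mean = ensemble_obs.mean(axis=0)
    cov = np.cov(ensemble_obs, rowvar=False)
    diff = observation - mean
    # Squared Mahalanobis distance: diff^T * cov^{-1} * diff
    m2 = float(diff @ np.linalg.solve(cov, diff))
    # Chi-square critical value with q degrees of freedom at level alpha
    threshold = chi2.ppf(1.0 - alpha, df=observation.shape[0])
    return m2 <= threshold, m2
```

An observation near the ensemble mean passes the test, while one far outside the ensemble spread is flagged as an outlier.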
3.4.2 Subevent Detection
An event is confined by space and time. Specifically, it consists of a set of subevents,
depicting different facets of an event [111]. As an event evolves, users usually post
new statuses to capture new states as subevents of the main issue [91]. Within an
event, some unexpected situations or results may occur and surprise users, such as
the bombing during the Boston Marathon and the power outage during the 2013
Superbowl. Subevent detection provides a deeper understanding of the threats to
better manage the situation within a crisis [112].
By formalizing it as an outlier detection task, we built dynamic models to detect
subevents based upon the retrieved Twitter data and the proposed window embedding
representation described in the following sections.
3.4.3 Data
We collected the data from Jan. 2, 2013 to Oct. 7, 2014 with the Twitter streaming
API and selected five national events for the outlier detection task. The five events
include the 2013 Boston Marathon event, the 2013 Superbowl event, the 2013 OSCAR
event, the 2013 NBA AllStar event, and the Zimmerman trial event. Each of these
events consists of a variety of subevents, such as the bombing for the marathon
event, the power outage for the Superbowl event, the nomination moment of the
best picture award, the ceremony for the NBA AllStar MVP, and the verdict of the
jury for the Zimmerman trial event.
For these case studies, we filtered the relevant tweets with event-related keywords
and hashtags, and preprocessed the data to remove URLs and user mentions. The basic
information of each event is provided in Table 3.1.
3.4.4 Window Embedding
In computational linguistics, distributed representations of words have shown some
advantages over raw co-occurrence counts, since they can capture the contextual
information of words. As categorized by Baroni et al. [84], distributed semantic models
can be termed count models or prediction models. On one hand, count models,
including LSA, HAL, and Hellinger PCA, can efficiently use the statistics of the
co-occurrence information but are limited in capturing complex patterns beyond word
similarities. On the other hand, prediction models, such as NNLM and word2vec, can
capture complex patterns of the words but make limited use of the corpus statistics.
To cope with the limits of each approach, Pennington et al. [108] proposed
a weighted least squares objective J, shown as follows:
J = ∑_{i,j=1}^{V} f(X_ij) (w_i^T w_j + b_i + b_j − log X_ij)² (3.4)

where X_ij is the number of times word j appears in the context of word i, w_i and b_i are
the word vector and bias of word i, w_j and b_j are the context word vector and bias
of word j, and f is a pre-defined weighting scheme as follows.
f(x) = (x/x_max)^α,  if x < x_max
f(x) = 1,            otherwise
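The weighting scheme can be written directly in code; the default x_max = 100 and α = 0.75 below are the values suggested in the GloVe paper, not ones stated here:

```python
def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x): down-weights rare co-occurrences and
    caps the influence of very frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0
```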
Vector representations can be used as features and they have been successfully
applied in many natural language processing applications [108]. Through some
experiments, we decided to use the 100-dimensional GloVe vector representations that
were trained on 27 billion tokens of Twitter data. We further used Probabilistic PCA to reduce
the vector dimensionality to d latent components that capture at least 99% of the
variance of the original representation.
Here, we define a sentence embedding as the average of its word vectors. Given a
sentence of n words represented by vectors e_1^d, e_2^d, ..., e_n^d, the sentence
embedding s^d is defined as (1/n) ∑_{i=1}^{n} e_i^d. Furthermore, we define a window embedding
w_t^d as the average of its sentence vectors. A given time window is composed of
m sentence vectors s_1^d, s_2^d, ..., s_m^d, and the window embedding w_t^d is defined as (1/m) ∑_{i=1}^{m} s_i^d.
As we use a moving-window approach, we group every l consecutive windows w_1^d, w_2^d, ..., w_l^d
into a training input X, and use the following window w_{l+1}^d as the training label Y. Based upon the
grouped data, we can train our proposed multivariate EnKF-LSTM model. Through
some experiments, we chose 5 as the number of latent components d, 5 minutes as
the time window t, and 32 as the grouping size l.
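The averaging and grouping scheme just described can be sketched as follows (function names are hypothetical; the real pipeline operates on GloVe vectors reduced by PPCA):

```python
import numpy as np

def sentence_embedding(word_vectors):
    # s^d = (1/n) * sum of the word vectors e^d_1..e^d_n
    return np.mean(word_vectors, axis=0)

def window_embedding(sentence_vectors):
    # w^d_t = (1/m) * sum of the sentence vectors in the time window
    return np.mean(sentence_vectors, axis=0)

def make_training_pairs(windows, l=32):
    """Group every l consecutive window embeddings into an input X and
    use the following window as the label Y (moving-window scheme)."""
    X, Y = [], []
    for i in range(len(windows) - l):
        X.append(windows[i:i + l])
        Y.append(windows[i + l])
    return np.array(X), np.array(Y)
```

With d = 5 and l = 32, each training input X has shape (l, d) and each label Y has shape (d,).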
3.4.5 Implementation
The implemented two-layer network is shown in Figure 3.3. The input layer consists of
5 nodes, each hidden layer consists of 32 LSTM cells, and the output layer consists of
5 output nodes. In this implementation, we include the forget gate proposed by [37].
The implementation is based upon TensorFlow, and it can be easily extended to
deeper architectures or variants of LSTMs.
Figure 3.3: Architecture of the network used in this study.
3.5 Results
The outlier detection results are provided in Figures 3.4 to 3.8. In these results,
we observe 40, 33, 39, 33, and 23 identified sub-events, respectively. Of those
sub-events, 16, 18, 25, 22, and 17 are verified as true sub-events. We set the initial
sigma value of the noise covariance matrix in the EnKF update step to 1.0, and then
further optimized it to 2.13, 2.19, 1.98, 1.56, and 1.21 with Maximum Likelihood
Estimation.
To further evaluate our model, we compared it with Gaussian Process (GP) and
MC dropout [34]. The comparison result is provided in Table 3.2. The GP model
yielded the best recall value in three of the five events, indicating that it captured most
true sub-events. On the other hand, it also misidentified many normal time windows
as sub-events, thus yielding many false positives and low precision. Compared to
the GP model, our proposed enkf_lstm algorithm reliably captured many true
sub-events and yielded the best precision across the five events, though it
missed some true sub-events and had worse recall in four of the
five events. In terms of the F1 score, our proposed algorithm has the best performance
Figure 3.4: Predicted sub-events with the proposed algorithm for the 2013 Boston marathon event. The distance lines above the blue threshold line indicate identified outliers, and the red color indicates the Boston bombing moment (identified=40, true=16, σ²ε=2.13).
in two of the five cases. The MC dropout model, however, has the worst performance
for this specific outlier detection task. Since MC dropout is mathematically equivalent
to variational inference, which underestimates the uncertainty, the model mislabels
many normal time windows as outliers.
For the proposed algorithm, the ensemble size N and the initial sigma value of the
noise covariance matrix σ²ε are two important hyper-parameters. To further evaluate
their effects on performance, we provide a sensitivity analysis of the
hyper-parameters for the 2013 Boston marathon event. Based upon Figure 3.9, the
algorithm yielded the best result with an ensemble size of 200. In general, the evaluation
metrics increase up to 200 and then slightly decrease, implying the proposed
algorithm can capture the dynamics of the posterior weights with a medium sample size.
Figure 3.5: Predicted sub-events with the proposed algorithm for the 2013 Superbowl event. The distance lines above the blue threshold line indicate identified outliers, and the red color indicates the power outage (identified=33, true=18, σ²ε=2.19).
Table 3.2: Evaluation metrics on different algorithms.
Event   Model   Precision   Recall   F1 Score
        GP      37.5        64.3     47.4
Figure 3.6: Predicted sub-events with the proposed algorithm for the 2013 OSCAR event. The distance lines above the blue threshold line indicate identified outliers, and the red color indicates the OSCAR starting moment (identified=39, true=25, σ²ε=1.98).
According to Figure 3.10, the evaluation metrics peaked at 0.05 and then slightly
decreased with larger values.
3.5.1 Discussion
In this work, we proposed a novel algorithm to estimate the posterior weights of
LSTMs, and we further developed a framework for outlier detection. We applied
the proposed algorithm and framework to five real-world outlier
detection tasks using Twitter streams. As shown in the above section, the proposed
algorithm can capture the uncertainty of the non-linear multivariate distribution and
outperforms the Gaussian process and MC dropout in terms of precision. We also
evaluated the sensitivity of the algorithm to different ensemble sizes and variance values
Figure 3.7: Predicted sub-events with the proposed algorithm for the 2013 AllStar event. The distance lines above the blue threshold line indicate identified outliers, and the red color indicates the AllStar starting moment (identified=33, true=22, σ²ε=1.56).
of the prior distribution of the weights with the Boston marathon data. However,
the performance of the model is further affected by several other hyper-parameters,
including the batch size, the number of layers, the number of nodes in each layer,
and the choice of window size; the sensitivity to these hyper-parameters will be
evaluated in future research.
Figure 3.8: Predicted sub-events with the proposed algorithm for the 2013 Zimmerman trial news event. The distance lines above the blue threshold line indicate identified outliers, and the red color indicates the verdict moment (identified=23, true=17, σ²ε=1.21).
Figure 3.9: Performance of the algorithm on the marathon event for different ensemble sizes.
Figure 3.10: Performance of the algorithm on the Boston marathon event for different sigma values.
Chapter 4
Student Knowledge Estimation with Bayesian
Network
4.1 Student Knowledge Estimation
Intelligent Tutoring Systems (ITS) have been studied since the 1980s [113], and research
in this area is becoming more important because of advances in computation
and growing class sizes. As Butz et al. [15] explained, ITSs are computer-based
systems that can provide functionalities such as estimating students'
understanding and giving individualized instruction in a way similar to traditional one-to-one
tutoring. This is of particular importance when the enrollment of post-secondary
students keeps growing¹ while instructors have limited time for providing feedback.
Knowledge of the students is hidden from direct measurement; however, an ITS
can help us estimate the latent knowledge of students from quizzes. To date,
a number of ITSs have been used in different domains, such as BITS [15], Andes
[145], ViSMod [156], and KERMIT [48].
As Butz et al. [15] claimed, there exist four common components of traditional
ITSs: the knowledge domain, the student model, teaching strategies, and the user
interface. Specifically, student models are pre-defined user models that are used to
track the states and needs of each student. The teaching strategies are the
instruction styles of the system, such as the way of providing recommendations, while the
user interface of the system provides the capacity to interact with users.
1http://nces.ed.gov/fastfacts/display.asp?id=98
Of all components, student models are considered the key component of any adaptive
tutoring system [93] due to their capability of storing information (e.g., examples
and learning styles) about the students. Based upon student models, we can further
estimate the knowledge of each student and provide an individualized and optimal
learning path. As explained by Chrysafiadi and Virvou [24], student
models can be used to estimate the knowledge level and cognitive states of students,
identify learning styles and preferences, select proper learning methods (e.g.,
providing tutorials), and recognize weaknesses and strengths in order to recommend
individualized feedback.
Because model construction and initialization are difficult, many researchers
have recognized the complexity of the issue and put forward possible approaches [93].
Typical approaches to constructing student models and initializing their parameters
use expert knowledge, data-driven estimation, or a synthesis of the two.
In this study, we address three issues involved in student knowledge modeling:
estimating the knowledge level of students, identifying distractors or misconceptions,
and evaluating question design. We address the first issue by constructing
Bayesian student models to estimate the posterior of a student's knowledge of a
specific concept given assessment answers. We address the second issue
by proposing a novel optimization procedure, and the third by
designing a novel index to evaluate whether a question is well designed.
Concept inventories (CIs) are commonly used to make inferences about students’
knowledge. Concept inventories are used for different branches of science including,
but not limited to, Electromagnetics [102], Discrete Mathematics [3], Statistics [138],
Electric Circuits [104], Signals and Systems [147], Thermodynamics [92], Strength of
According to Jensen [61], a Bayesian Network (BN) provides a graphic and
mathematical depiction of the joint probability over a group of random variables. Before
we introduce the power of a BN, we need to illustrate the concept of the Joint
Probability Distribution (JPD). As discussed in [15], a JPD is defined as a function p over
a set of random discrete variables V = {v_1, v_2, ..., v_n} if it meets the following
properties:
• 0 ≤ p(v) ≤ 1, for each v ∈ dom(V)
• ∑_{v∈dom(V)} p(v) = 1
where dom(V) is the Cartesian product of the domains of the variables in V.
One of the main advantages of using BNs is to get a compact form of the joint
distribution on V. Obviously, obtaining the joint distribution directly
on V takes O(2^n) operations for n variables. However, BNs can speed up the
acquisition of the JPD based upon the notion of conditional independence [15]. Specifically,
random variables v_1 and v_3 are conditionally independent given v_2 if

p(v_1 | v_2, v_3) = p(v_1 | v_2) (4.5)
The compact form of the joint distribution is then acquired with the chain rule.
According to Jensen [61], the JPD p(U) is specified as follows based upon the chain
rule:

p(U) = ∏_{i=1}^{n} p(v_i | pa(v_i)) (4.6)

where pa(v_i) denotes the parents of v_i in the BN.
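To illustrate Eq. 4.6, a toy three-node chain shows how the joint distribution factors into small conditional tables; all probabilities here are made up for illustration, not taken from the dissertation's model:

```python
from itertools import product

# Chain-rule factorization p(U) = prod_i p(v_i | pa(v_i)) on a toy
# three-node chain v1 -> v2 -> v3, with hypothetical CPTs.
p_v1 = {True: 0.7, False: 0.3}
p_v2 = {True: {True: 0.95, False: 0.05},    # p(v2 | v1): outer key is v1
        False: {True: 0.05, False: 0.95}}
p_v3 = {True: {True: 0.9, False: 0.1},      # p(v3 | v2): outer key is v2
        False: {True: 0.2, False: 0.8}}

def joint(v1, v2, v3):
    # Three small table lookups instead of one table with 2^3 entries.
    return p_v1[v1] * p_v2[v1][v2] * p_v3[v2][v3]

# Sanity check: the 2^3 joint entries sum to 1, as a JPD must.
total = sum(joint(a, b, c) for a, b, c in product([True, False], repeat=3))
```

The compactness is in the parameter count: the chain needs 2 + 4 + 4 table entries instead of the 2³ entries of the full joint, and the gap grows exponentially with n.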
As Jensen [61] claimed, a BN consists of four basic elements: a set of variables
and the edges between them, a set of mutually exclusive states for each variable, a
Directed Acyclic Graph (DAG) constructed from the variables and edges, and a
conditional probability table attached to each variable. If node A has no parents,
the conditional probability table attached to it reduces to a prior. In this study,
all the basic elements of the network are pre-specified. In particular, we use domain
expertise to construct the network structure, and estimate the conditional tables as
well as the priors with an integration of expert knowledge and a stepwise procedure
introduced later in this section.
The Expectation Maximization (EM) algorithm is used to estimate the parameters
of a BN with missing data [61]. In general, the EM algorithm is an iterative
approach to maximum likelihood estimation of the parameters with an expectation
step and a maximization step. In the expectation step, we compute the expectation
of the data using the current parameters θ_old, and we then obtain an updated set of
parameters θ_new by maximizing the expectation with regard to each old parameter.
The algorithm then repeats these two steps until it converges
or reaches a pre-specified number of iterations.
A formal description of the EM algorithm can be found in Jensen [61] and is
also provided in Algorithm 3. In the algorithm, θ_ijk represents the conditional
probability of variable v_i being in its kth state given the jth configuration of its
parents pa(v_i), and sp(v_i) represents the state space of variable v_i.
Algorithm 3 EM algorithm for Bayesian Network [61]
  Choose initial parameters θ_old.
  Define stopping criterion ε > 0.
  Set t := 0.
  while | log₂ p(D|θ_t) − log₂ p(D|θ_{t−1}) | > ε do
    E step: Compute the expected counts of the likelihood function:
      E_{θ_t}[N(v_i, pa(v_i)) | D] = ∑_{d∈D} p(v_i, pa(v_i) | d, θ_t)
    M step: Estimate θ_ijk from the expected counts using maximum likelihood.
  end while
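A simplified EM sketch for a model of this kind, with a single hidden binary concept and observed multiple-choice answers (our own toy implementation, not the JSMILE-based procedure used in this study):

```python
import numpy as np

def em_single_concept(answers, n_choices=5, iters=50, seed=0):
    """EM for a toy model: hidden binary concept C -> observed answers.
    Estimates the prior p(C = known) and the conditionals p(Q_j = x | C).

    answers: (n_students, n_questions) array of answer indices.
    """
    rng = np.random.default_rng(seed)
    n, q = answers.shape
    prior = 0.5                                   # p(C = known)
    # cond[c, j, x] = p(Q_j = x | C = c), random initialization
    cond = rng.dirichlet(np.ones(n_choices), size=(2, q))
    for _ in range(iters):
        # E step: posterior responsibility p(C = c | answers of student i)
        log_lik = np.zeros((n, 2))
        for c in (0, 1):
            for j in range(q):
                log_lik[:, c] += np.log(cond[c, j, answers[:, j]])
        log_lik[:, 1] += np.log(prior)
        log_lik[:, 0] += np.log(1.0 - prior)
        log_lik -= log_lik.max(axis=1, keepdims=True)
        post = np.exp(log_lik)
        post /= post.sum(axis=1, keepdims=True)
        # M step: re-estimate parameters from the expected counts
        prior = float(np.clip(post[:, 1].mean(), 1e-6, 1 - 1e-6))
        for c in (0, 1):
            for j in range(q):
                counts = np.array([(post[:, c] * (answers[:, j] == x)).sum()
                                   for x in range(n_choices)]) + 1e-9
                cond[c, j] = counts / counts.sum()
    return prior, cond
```

The small additive constant in the M step plays the role of smoothing so that no conditional collapses to exactly zero.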
Inspired by the Force Concept Inventory [47, 46, 54], the Statics Concept Inventory
was developed to assess conceptual understanding and identify students'
misconceptions regarding basic concepts in statics. Four clusters of concepts were introduced by
Steif [133, 135] and are used in this paper; these clusters of concepts are summarized
in Table 4.1. The CI contains 27 questions, and to answer them correctly, a student
Table 4.1: Concepts used for the concept inventory

Concept  Definition
C1  Forces are always in equal and opposite pairs acting between bodies, which are usually in contact.
C2  Distinctions must be drawn between a force, a moment due to a force about a point, and a couple. Two combinations of forces and couples are statically equivalent to one another if they have the same net force and moment.
C3  The possibilities of forces between bodies that are connected to, or contact, one another can be reduced by virtue of the bodies themselves, the geometry of the connection, and/or assumptions on friction.
C4  Equilibrium conditions always pertain to the external forces acting directly on a chosen body, and a body is in equilibrium if the summation of forces on it is zero and the summation of moments on it is zero.
should know one or more of the mentioned concepts. The model used in this paper
is illustrated in Figure 4.1. The arrows show the logical connections between concepts
and questions. For example, we believe that to learn concept C2, students first need
to learn concept C3. Also, to answer some of the questions, students need to know more
than one concept.
Construction of the student model is based upon the integration of a BN and an
understanding of the curricular structure. In the knowledge tracking domain, the
knowledge of a student, i.e., the understanding of particular concepts (e.g., the
1st cluster of concepts and the 2nd cluster of concepts), is treated as a hidden variable
with a known state and an unknown state. The hidden variables are investigated
through observed variables, which are the answers to the concept inventory questions
(e.g., q_q1 in Figure 4.2).
An instance of our student model is provided in Figure 4.2. According to the
figure, each node represents either a concept or a question, and each edge of the graph
represents a connection between concepts and questions or a connection between
concepts. With the post-test data, we insert the answers to questions as evidence
Figure 4.1: Relationships of concepts and questions
into the model and estimate the posterior of each concept for each student. The blue
color of a concept node indicates the probability of knowing the concept given the
answers to the 27 questions.
Knowledge tracking based upon BNs is limited by three factors: the
selection of nodes, the structure of the network, and the initialization of the priors
and conditionals [87]. In this work, we use the concept inventory to select the
concepts for the model. The structure of the network is specified from
expert knowledge and the concept inventory. Because the BN factorizes the complex joint
distribution into local conditional distributions, we initialize the priors and conditionals over
all nodes and edges using our expertise. However, this way of initializing the model
neglects the data characteristics and fails to differentiate the students' performance
Figure 4.2: Bayesian Network Model of Student Knowledge for the Statics Concept Inventory
in each semester. Thus we adopt a novel optimization procedure, a data-driven
approach, to obtain optimized parameters for the prior of the first cluster
of concepts and for all the conditionals between the concepts and questions.
Specifically, the initial prior probability of knowing the first cluster of concepts,
p(c_c1 = known), is defined as 0.7, since it is a pre-requisite concept that is learned in the
Linear Algebra class before the Statics class. In this work, we also explore the rest of
the concepts, such as the second cluster of concepts and the third cluster of concepts,
which are developed based upon the first cluster of concepts. The transition
probability between pairs of concepts is determined by the capability of the instructors
to communicate the knowledge to the students. In this study, we assume the
instructor is capable of communicating those concepts effectively to the students, so
we assign a high probability (e.g., 0.95) to p(c_c2 = known | c_c1 = known). On the
other hand, if a student has no knowledge of c_c1, then the
probability of not knowing c_c2, p(c_c2 = unknown | c_c1 = unknown), is set to a high
value (e.g., 0.95) as well. In this case, we assume that the lack of understanding of a
basic concept impedes the understanding of a more advanced concept. Similarly,
we define all other conditional probabilities between the concepts and questions.
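The arithmetic behind the induced prior of a child concept is simple marginalization over its parent. A minimal sketch, assuming a single parent and the 0.95/0.05 conditionals described above (the exact values for concepts with other parents or conditionals differ):

```python
def child_prior(parent_prior, p_k_given_k=0.95, p_k_given_u=0.05):
    # p(child = known), marginalizing out the parent concept:
    #   p(known | parent known) * p(parent known)
    # + p(known | parent unknown) * p(parent unknown)
    return p_k_given_k * parent_prior + p_k_given_u * (1.0 - parent_prior)
```

For example, a parent prior of 0.7 yields a child prior of 0.95 * 0.7 + 0.05 * 0.3 = 0.68.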
According to our model, it should be kept in mind that the probability of knowing
a more advanced concept (e.g., the 3rd cluster of concepts) is lower in the absence of any
testing. Specifically, when the prior of knowing the 1st cluster of concepts is set to 0.7,
the priors of knowing the remaining clusters of concepts are 0.6485, 0.68, and 0.62015,
respectively. The objectives of this research can be discussed in three aspects. First, we
can estimate the level of understanding of each individual student for the instructed
concepts with a Bayesian approach by analyzing the evidence from the concept
inventory tests. Second, we can reveal the misconceptions behind the questions in the tests
when the concept is unknown to the students, using our proposed parameter learning
algorithm. The initial conditional probability between the concepts and questions is
equally distributed over the answers (e.g., p(q_q1 = X | unknown) = 0.2, where X is
one of A, B, C, D, or E) when the concept is unknown to the student. However,
this setting ignores the misconceptions of the students at the moment of assessment,
because some answers are more distracting than others. Based upon the proposed
parameter learning procedure, we can learn the optimized conditional probabilities
from the data and recognize the misconceptions of the students to develop remedial
interventions. Third, we can investigate the design of the questions from the
perspective of instructors, to explore whether a question is well designed or
badly designed (or too difficult). To address this goal, we use the likelihood
plots and identify the badly-designed questions as those with a p-value larger than a
pre-specified threshold such as 0.05. More detailed discussion of the second goal
and the third goal is provided in the following sections.
4.3.2 Misconception Identification
Conditional probabilities can be used to identify the distractors. When a student
does not know the concept, there is a higher chance of selecting a choice other than the
correct one.
To get more informative priors and conditionals, we learn the parameters from
experimental data with the EM algorithm provided in the JSMILE APIs. However,
the current JSMILE APIs only infer entire conditionals, including the probabilities
of answering the quizzes when the concept is known. This may result in logical
inconsistencies for extreme datasets; a typical case is when all students answer the
questions incorrectly. To cope with this issue and obtain proper parameters, we
establish a novel optimization procedure, shown in Algorithm 4, to estimate the prior
of the first cluster of concepts and the conditionals between concepts and questions.
Algorithm 4 Optimization procedure for parameter learning
  Set an initial model evidence log₂ p(D|θ⁰).
  Define stopping criterion ε > 0.
  Set t := 0.
  while | log₂ p(D|θ_C^t) − log₂ p(D|θ_C^{t−1}) | > ε do
    Estimate θ_Q with EM.
    Update p(Q = X | C = known) for question Q and concept C.
    Estimate θ_C with EM.
    Set θ_C^{t+1} := θ_C, t := t + 1.
  end while
  Update p(Q = X | C = known) for question Q and concept C.
4.3.3 Ill-designed Question Identification
To assess the validity of the optimized Bayesian network model M, we propose
a predictive validation metric evaluated on a hold-out dataset. Using the training
dataset D_{≠i,j}, consisting of a student's answers to 26 out of the 27 questions in the
concept inventory, we can infer the posterior probabilities of knowing the
four concepts for each student S_j. The student's answer x_ij to the validation question
Q_i is used to evaluate the following likelihood function, obtained from the posterior
predictive distribution corresponding to the ith question:

p(Q_i = x_ij | S_j, D_{≠i,j}, M) (4.7)
This likelihood function can be used to define a measure of how well the model
fits all N students’ answers for the ith question. This goodness-of-fit measure can be
expressed using the following expected log-likelihood under the assumption that all
students are equally likely.
E[log p(Q_i = x_i | S, D_{≠i}, M)] = Σ_{j=1}^{N} p(S_j) log p(Q_i = x_{ij} | S_j, D_{≠i,j}, M)
                                   = (1/N) Σ_{j=1}^{N} log p(Q_i = x_{ij} | S_j, D_{≠i,j}, M)    (4.8)
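Under the equal-weight assumption p(S_j) = 1/N, Eq. 4.8 reduces to the average log predictive likelihood over students. A minimal sketch (the four probability values are made up for illustration):

```python
import math

def goodness_of_fit(likelihoods):
    """Eq. 4.8: average log predictive likelihood of the held-out answers,
    assuming all N students are equally likely (p(S_j) = 1/N).
    likelihoods[j] stands for p(Q_i = x_ij | S_j, D_{!=i,j}, M)."""
    return sum(math.log(p) for p in likelihoods) / len(likelihoods)

# e.g., four students whose held-out answers the model predicts
# with these probabilities
score = goodness_of_fit([0.7, 0.4, 0.9, 0.6])
```

Higher (less negative) values mean the model assigns more probability mass to the answers the students actually gave.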
By itself, this goodness-of-fit measure is non-informative. A reference measure is
required to assess whether the model can explain the validation data. Note that by
generating a random answer r_{ij} from a discrete uniform distribution over the choices
of Q_i for each student S_j, we can compute a measure analogous to Eq. 4.8 for
how well the model fits random answers.
E[log p(Q_i = r_i | S, D_{≠i}, M)] = (1/N) Σ_{j=1}^{N} log p(Q_i = r_{ij} | S_j, D_{≠i,j}, M)    (4.9)
It is expected that a model with predictive capability will fit the actual answers
better than the synthetically generated random answers. Thus, the measure in
Eq. 4.8 is expected to be larger than the one in Eq. 4.9. However, the expected log-likelihood
in Eq. 4.9 is a random variable with a distribution induced by the discrete
uniform distribution over the answers to Q_i. As a result, the proposed predictive validation
metric takes the form of a standard hypothesis test, where the null hypothesis
is that the model has no predictive capability and the alternative is that the model
has predictive capability. The null hypothesis is rejected at the significance level α.
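The test can be carried out by Monte Carlo: repeatedly draw random answer sets, score each with Eq. 4.9, and take the p-value as the fraction of random scores at least as large as the actual Eq. 4.8 score. A sketch under stated assumptions; `model_prob_of` is a hypothetical hook for the model's posterior predictive, and the toy probabilities at the end are made up for illustration.

```python
import math
import random

def reference_score(model_prob_of, choices, n_students, rng):
    """One synthetic draw of Eq. 4.9: replace each student's answer with a
    uniform random choice and recompute the average log-likelihood.
    model_prob_of(j, answer) stands for p(Q_i = answer | S_j, D_{!=i,j}, M)."""
    total = sum(math.log(model_prob_of(j, rng.choice(choices)))
                for j in range(n_students))
    return total / n_students

def p_value(actual_score, model_prob_of, choices, n_students,
            n_sets=50_000, seed=0):
    """Monte Carlo p-value: fraction of random-answer scores at least as
    large as the actual Eq. 4.8 score. Small values reject the null
    hypothesis of no predictive capability."""
    rng = random.Random(seed)
    hits = sum(reference_score(model_prob_of, choices, n_students, rng)
               >= actual_score for _ in range(n_sets))
    return hits / n_sets

# Toy check: a model that puts probability 0.8 on choice 'a' for every
# student, in a class where every student actually answered 'a'.
probs = {"a": 0.8, "b": 0.05, "c": 0.05, "d": 0.05, "e": 0.05}
actual = math.log(0.8)
p = p_value(actual, lambda j, ans: probs[ans], list("abcde"),
            n_students=5, n_sets=2000)
```

Random answer sets only tie the actual score when every draw happens to be correct, so the resulting p-value is tiny, as expected for a model with predictive capability.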
friction coefficient, which is 0.5. Both questions have good distractors, because the
instructor can understand what the misconception is. In the first question, the misconception
concerns the behavior of the connection: a pinned connection cannot resist
Table 4.3: Conditional probabilities for a question related to two concepts
C1:      known                  unknown
C4:      known      unknown     known      unknown
a        0.05       0.6638      0.9982     0.0478
b        0.05       1.9e-6      0.0004     0.3965
c        0.05       1.9e-6      0.0004     0.1585
d        0.05       2.1e-6      0.0004     0.3176
e        0.8        0.3361      0.0004     0.0794
Question 19: The force F is known and the other loads on the plate are unknowns to be determined. Consider drawing a free body diagram of the plate, including the unknown reaction of the pin.
Figure 4.3: Question 19 related to C3
Table 4.4: Conditional probabilities for question 19
rotation. In the second question, the distractor represents one of the most common
misconceptions in friction questions [137]: the tangential force is taken to be equal to
the normal force times the friction coefficient, even though that magnitude is greater
than the force needed to maintain equilibrium.
4.4.3 Ill-designed Questions
The p-value for each question is calculated using 50,000 sets of random samples
corresponding to students' answers to the ith question. Table 4.6 summarizes the
Question 22: Three blocks are stacked on top of one another on a table. Then, the horizontal forces shown are applied. The friction coefficient is 0.5 between all contacting surfaces. (This is both the static and kinetic coefficient of friction.) Which of the following represents the horizontal component of the force acting on the lower face of the top (20 N) block?
Figure 4.4: Question 22 related to C3
Table 4.5: Conditional probabilities for question 22
p-values for all the questions in the concept inventory. Note that for 7 questions the
model does not exhibit predictive capability at the 0.05 significance level.
Here are some examples of the first type of design problem. For question 11,
shown in Figure 4.5, there is a 68% chance of selecting the correct choice when
Question 11: The platform is kept in the position shown by a roller, link and hydraulic cylinder. The coefficient of friction between the roller and the dump is 0.6. What is the direction of the roller on the platform at the point of interest?
Figure 4.5: Question 11 related to concept C3
Table 4.7: Conditional probabilities for question 11
a student does not know the related concept. Another example is question
20, shown in Figure 4.6. As shown in Table 4.8, the chance of selecting the correct
answer when a student does not know the concept is 61%. Another good example
of this kind of question is question 10.
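This first type of design problem can be screened for directly from the optimized conditional tables: a question is suspect when p(correct | concept unknown) is high. A minimal sketch using the probabilities quoted in the text for questions 11, 20, and 9; the value for question 7 is a made-up placeholder, and the 0.5 cutoff is an illustrative assumption rather than a threshold from this work.

```python
def flag_guessable(p_correct_given_unknown, threshold=0.5):
    """Return questions whose optimized conditional
    p(correct answer | concept unknown) exceeds the cutoff, i.e.,
    questions that can likely be answered correctly by guessing."""
    return sorted(q for q, p in p_correct_given_unknown.items()
                  if p > threshold)

# Questions 11 (0.68) and 20 (0.61) from the text exceed the cutoff;
# question 9 is 0.31, and 0.05 for question 7 is a placeholder.
flags = flag_guessable({11: 0.68, 20: 0.61, 9: 0.31, 7: 0.05})
```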
Here is an example of the second type of design problem. Question 9 is a good
example of a hard question: only 4 students among 34 answered it
correctly, and its p-value is around 0.84. This shows that the model is not able to make
Question 20: Part 1 and part 2 are welded to each other. Forces F and G are known, and the other loads are unknowns to be determined. Consider drawing a free body diagram of part 1, including the unknown reaction of part 2 on part 1. Unknowns A, B, and C could be positive, negative or zero. Which of the following is the correct free body diagram for the forces and/or moments exerted by part 2 on part 1 at the welded section?
Figure 4.6: Question 20 related to concept C3
predictions for students on this question. There are several possible reasons for this.
First, looking at the optimized conditional probabilities, it is one of the questions
where the chance of selecting the correct answer when a student does not know
the concept is higher than that of the other choices (p = 0.31). Furthermore, there are only three
questions related to concept C2, which may not be enough to predict students'
performance on this question.
A well-designed problem can be illustrated by question 7. Question 7, related
to concept C2, is one of the questions with a low p-value. Figure 4.10 shows the
corresponding histogram and p-value for the class of 2008, with the classes of 2007 and
2006 used for optimization. The p-value indicates strong evidence against
the null hypothesis that the model cannot predict the performance of students. In
other words, assuming the model's inference about student knowledge is acceptable,
the question and its related concept are designed in a way that allows inference
about student knowledge of concept C2.
Question 9: The two forces with magnitudes 7 N and 10 N act in the directions shown through points A and B, which are denoted with dots. These forces keep the member in equilibrium while it is subjected to other forces acting in the plane (shown at the right). Assuming the other forces stay the same, what load(s) could replace the 7 N and 10 N forces and maintain equilibrium?
Figure 4.7: Question 9 related to concept C2
4.5 Educational Component
Student knowledge estimation plays a core role in the process of student learning.
Proper and prompt estimation provides important information to both instructors
and students for remedial intervention. In this study, we explore using a
concept inventory for student knowledge estimation. Meanwhile, our system can help
each student identify the individualized misconception behind each question. By proposing
a question design metric, we can further measure the effectiveness of each question.
We implemented the backend system with Java and the JSMILE APIs. With the system,
we can construct student models with prior knowledge of the structure of a given
concept inventory. By feeding student exam data into the system, we obtain personalized
student models and provide individualized suggestions for intervention. The
implemented system is publicly available at https://bitbucket.org/uqlab/scilaf.
Figure 4.8: Histogram of blind-guessing scores in addition to the actual performance of students.
4.6 Conclusions
In this work, we first developed a data-driven approach to assess latent student
knowledge by constructing Bayesian student models. We then put forward a novel
algorithm to identify the misconceptions behind each quiz question, so that we could
provide individualized and remedial interventions for each student. Finally, we
proposed a novel index to evaluate the student models as well as to measure the design
of each question. Based on the results, we identified common distractors in
the concept inventory data. As the model is capable of discovering individualized
misconceptions, it can provide timely intervention after each test. Furthermore, the
measurement index showed that 20 of the 27 questions exhibit predictive capability
under the student model, while several improperly designed questions were discussed.
Question 7: A 200 N-mm couple acting counter-clockwise keeps the member in equilibrium while it is subjected to other forces acting in the plane (shown schematically at the left). The four dots denote equally spaced points along the member. Assuming the other forces stay the same, what load(s) could replace the 200 N-mm couple and maintain equilibrium?
Figure 4.9: Question 7 related to concept C2
Figure 4.10: Histogram of blind-guessing scores in addition to the actual performance of students.
Bibliography
[1] Hamed Abdelhaq, Christian Sengstock, and Michael Gertz. “EvenTweet: Online Localized Event Detection from Twitter”. In: Proc. VLDB Endow. 6.12 (Aug. 2013), pp. 1326–1329. issn: 2150-8097. doi: 10.14778/2536274.2536307. url: http://dx.doi.org/10.14778/2536274.2536307.
[2] James Allan. “Introduction to topic detection and tracking”. In: Topic detection and tracking. Kluwer Academic Publishers, 2002, pp. 1–16. isbn: 0-7923-7664-1.
[3] Vicki L Almstrum et al. “Concept inventories in computer science for the topic discrete mathematics”. In: ACM SIGCSE Bulletin. Vol. 38(4). ACM. 2006, pp. 132–145.
[4] Dianne L Anderson, Kathleen M Fisher, and Gregory J Norman. “Development and evaluation of the conceptual inventory of natural selection”. In: Journal of research in science teaching 39.10 (2002), pp. 952–978.
[5] Rebecca A Atadero et al. “Project-Based Learning in Statics: Curriculum, Student Outcomes, and On-going Questions”. In: age 24 (2014), p. 1.
[6] Farzindar Atefeh and Wael Khreich. “A Survey of Techniques for Event Detection in Twitter”. In: Comput. Intell. 31.1 (2015), pp. 132–164. issn: 0824-7935. doi: 10.1111/coin.12017.
[7] Janelle Margaret Bailey. Development of a Concept Inventory to Assess Students' Understanding and Reasoning Difficulties About the Properties and Formation of Stars. 2006. url: http://hdl.handle.net/10150/193643.
[8] Nilesh Bansal and Nick Koudas. “BlogScope: Spatio-temporal Analysis of the Blogosphere”. In: Proceedings of the 16th International Conference on World Wide Web. WWW '07. New York, NY, USA: ACM, 2007, pp. 1269–1270. isbn: 978-1-59593-654-7. doi: 10.1145/1242572.1242802. url: http://doi.acm.org/10.1145/1242572.1242802.
[9] H. Becker, M. Naaman, and L. Gravano. “Beyond trending topics: Real-world event identification on Twitter”. In: Fifth International AAAI Conference on Weblogs and Social Media. 2011.
[10] David M. Blei. “Probabilistic Topic Models”. In: Commun. ACM 55.4 (Apr. 2012), pp. 77–84. issn: 0001-0782. doi: 10.1145/2133806.2133826. url: http://doi.acm.org/10.1145/2133806.2133826.
[11] Charles Blundell et al. “Weight Uncertainty in Neural Networks”. In: Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37. ICML'15. Lille, France: JMLR.org, 2015, pp. 1613–1622. url: http://dl.acm.org/citation.cfm?id=3045118.3045290.
[12] Stacey Lowery Bretz and Kimberly J Linenberger. “Development of the enzyme–substrate interactions concept inventory”. In: Biochemistry and Molecular Biology Education 40.4 (2012), pp. 229–233.
[13] Thang D. Bui et al. “Deep Gaussian Processes for Regression using Approximate Expectation Propagation”. In: ICML. Vol. 48. JMLR Workshop and Conference Proceedings. JMLR.org, 2016, pp. 1472–1481.
[14] Gregoire Burel et al. “On Semantics and Deep Learning for Event Detection in Crisis Situations”. In: ESWC 2017. Portoroz, Slovenia, 2017.
[15] C. J. Butz, S. Hua, and R. B. Maguire. “A Web-based Bayesian Intelligent Tutoring System for Computer Programming”. In: Web Intelli. and Agent Sys. 4.1 (Jan. 2006), pp. 77–97. issn: 1570-1263. url: http://dl.acm.org/citation.cfm?id=1239784.1239789.
[16] SM Case and DB Swanson. Item writing manual: Constructing written test questions for the basic and clinical sciences. 2002.
[17] Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. “Information Credibility on Twitter”. In: Proceedings of the 20th International Conference on World Wide Web. WWW '11. Hyderabad, India: ACM, 2011, pp. 675–684. isbn: 978-1-4503-0632-4. doi: 10.1145/1963405.1963500. url: http://doi.acm.org/10.1145/1963405.1963500.
[18] Mario Cataldi, Luigi Di Caro, and Claudio Schifanella. “Emerging Topic Detection on Twitter Based on Temporal and Social Terms Evaluation”. In: Proceedings of the Tenth International Workshop on Multimedia Data Mining. MDMKDD '10. Washington, D.C.: ACM, 2010, 4:1–4:10. isbn: 978-1-4503-0220-3. doi: 10.1145/1814245.1814249. url: http://doi.acm.org/10.1145/1814245.1814249.
[19] Junghoon Chae et al. “Spatiotemporal social media analytics for abnormal event detection and examination using seasonal-trend decomposition”. In: 2012 IEEE Conference on Visual Analytics Science and Technology, VAST 2012, Seattle, WA, USA, October 14-19, 2012. 2012, pp. 143–152. doi: 10.1109/VAST.2012.6400557. url: https://doi.org/10.1109/VAST.2012.6400557.
[20] Deepayan Chakrabarti and Kunal Punera. “Event Summarization Using Tweets”. In: (2011).
[21] A. L. Chandrasegaran, David F. Treagust, and Mauro Mocerino. “The development of a two-tier multiple-choice diagnostic instrument for evaluating secondary school students' ability to describe and explain chemical reactions using multiple levels of representation”. In: Chem. Educ. Res. Pract. 8 (3 2007), pp. 293–307.
[22] Ling Chen and Abhishek Roy. “Event Detection from Flickr Data Through Wavelet-based Spatial Analysis”. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. CIKM '09. Hong Kong, China: ACM, 2009, pp. 523–532. isbn: 978-1-60558-512-3. doi: 10.1145/1645953.1646021. url: http://doi.acm.org/10.1145/1645953.1646021.
[23] Flavio Chierichetti et al. “Event Detection via Communication Pattern Analysis”. In: Proceedings of the Eighth International Conference on Weblogs and Social Media, ICWSM 2014, Ann Arbor, Michigan, USA, June 1-4, 2014. 2014. url: http://www.aaai.org/ocs/index.php/ICWSM/ICWSM14/paper/view/8088.
[24] Konstantina Chrysafiadi and Maria Virvou. “Review: Student Modeling Approaches: A Literature Review for the Last Decade”. In: Expert Syst. Appl. 40.11 (Sept. 2013), pp. 4715–4729. issn: 0957-4174. doi: 10.1016/j.eswa.2013.02.007. url: http://dx.doi.org/10.1016/j.eswa.2013.02.007.
[25] Clyde H Coombs, John Edgar Milholland, and Frank Burton Womer. “The assessment of partial knowledge”. In: Educational and Psychological Measurement 16.1 (1956), pp. 13–37.
[26] J. E. Corter et al. “Bugs and biases: Diagnosing misconceptions in the understanding of diagrams”. In: Proceedings of the 31st Annual Conference of the Cognitive Science Society. Ed. by N. A. Taatgen and H. van Rijn. Austin, TX: Cognitive Science Society, 2009, pp. 756–761.
[27] Thomas Deane et al. “Development of the biological experimental design concept inventory (BEDCI)”. In: CBE-Life Sciences Education 13.3 (2014), pp. 540–551.
[28] John S. Denker and Yann LeCun. “Transforming Neural-Net Output Levels to Probability Distributions”. In: NIPS. Morgan Kaufmann, 1990, pp. 853–859.
[29] Marilu Dick-Perez et al. “A quantum chemistry concept inventory for physical chemistry classes”. In: Journal of Chemical Education 93.4 (2016), pp. 605–612.
[30] Jerome Epstein. “Development and validation of the Calculus Concept Inventory”. In: Proceedings of the ninth international conference on mathematics education in a global community. Vol. 9. Charlotte, NC. 2007, pp. 165–170.
[31] Geir Evensen. “The Ensemble Kalman Filter: theoretical formulation and practical implementation”. In: Ocean Dynamics 53 (2003), pp. 343–367. doi: 10.1007/s10236-003-0036-9.
[32] Tristan Fletcher. “The Kalman Filter Explained”. 2010.
[33] Yarin Gal and Zoubin Ghahramani. “A Theoretically Grounded Application of Dropout in Recurrent Neural Networks”. In: Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain. 2016, pp. 1019–1027.
[34] Yarin Gal and Zoubin Ghahramani. “Dropout As a Bayesian Approximation: Representing Model Uncertainty in Deep Learning”. In: Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48. ICML'16. New York, NY, USA: JMLR.org, 2016, pp. 1050–1059. url: http://dl.acm.org/citation.cfm?id=3045390.3045502.
[35] Kathy Garvin-Doxas and Michael W Klymkowsky. “Understanding randomness and its impact on student learning: lessons learned from building the Biology Concept Inventory (BCI)”. In: CBE-Life Sciences Education 7.2 (2008), pp. 227–233.
[36] Arthur Gelb. Applied Optimal Estimation. The MIT Press, 1974. isbn: 0262570483, 9780262570480.
[37] Felix A. Gers, Jürgen Schmidhuber, and Fred Cummins. “Learning to Forget: Continual Prediction with LSTM”. In: Neural Computation 12 (1999), pp. 2451–2471.
[38] George Goguadze et al. “Evaluating a Bayesian Student Model of Decimal Misconceptions”. In: EDM. 2011.
[39] Alex Graves. “Generating Sequences With Recurrent Neural Networks”. In: CoRR (2014). url: https://arxiv.org/pdf/1308.0850.pdf.
[40] Alex Graves. “Practical Variational Inference for Neural Networks”. In: Proceedings of the 24th International Conference on Neural Information Processing Systems. NIPS'11. Granada, Spain: Curran Associates Inc., 2011, pp. 2348–2356. isbn: 978-1-61839-599-3.
[41] Gary L Gray et al. “The dynamics concept inventory assessment test: A progress report and some results”. In: American Society for Engineering Education Annual Conference & Exposition. 2005.
[42] James H Hanson and Julia M Williams. “Using writing assignments to improve self-assessment and communication skills in an engineering statics course”. In: Journal of engineering education 97.4 (2008), p. 515.
[43] Habibah Norehan Haron et al. “Self-regulated learning strategies between the performing and non-performing students in statics”. In: Interactive Collaborative Learning (ICL), 2014 International Conference on. IEEE. 2014, pp. 802–805.
[44] Eric L. Haseltine and James B. Rawlings. “Critical Evaluation of Extended Kalman Filtering and Moving-Horizon Estimation”. In: Industrial & Engineering Chemistry Research 44.8 (June 2004), pp. 2451–2460. doi: 10.1021/ie034308l. url: http://dx.doi.org/10.1021/ie034308l.
[45] José Miguel Hernández-Lobato and Ryan P. Adams. “Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks”. In: Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37. ICML'15. Lille, France: JMLR.org, 2015, pp. 1861–1869. url: http://dl.acm.org/citation.cfm?id=3045118.3045316.
[46] David Hestenes and Ibrahim Halloun. “Interpreting the force concept inventory”. In: The Physics Teacher 33.8 (1995), pp. 502–506.
[47] David Hestenes, Malcolm Wells, Gregg Swackhamer, et al. “Force concept inventory”. In: The physics teacher 30.3 (1992), pp. 141–158.
[48] Randall W. Hill, Jr., and W. Lewis Johnson. “Designing an Intelligent Tutoring System for Database Modelling”. In: Proceedings of the world conference of artificial intelligence in education. 1993, pp. 273–281.
[49] Geoffrey Hinton et al. “Improving neural networks by preventing co-adaptation of feature detectors”. In: CoRR abs/1207.0580 (2012). url: http://arxiv.org/abs/1207.0580.
[50] Geoffrey E. Hinton and Drew van Camp. “Keeping the Neural Networks Simple by Minimizing the Description Length of the Weights”. In: Proceedings of the Sixth Annual Conference on Computational Learning Theory. COLT '93. ACM, 1993, pp. 5–13.
[51] Sepp Hochreiter and Jürgen Schmidhuber. “Long Short-term Memory”. In: Neural Comput. 9.8 (Nov. 1997), pp. 1735–1780. issn: 0899-7667. doi: 10.1162/neco.1997.9.8.1735. url: http://dx.doi.org/10.1162/neco.1997.9.8.1735.
[52] Anneke Hommels, Akira Murakami, and Nishimura Shin-Ichi. “Comparison of the Ensemble Kalman filter with the Unscented Kalman filter: application to the construction of a road embankment”. In: Proceedings of the 19th European Young Geotechnical Engineer Conference. Gyor, Hungary, 2009.
[53] Yuan Huang et al. “Understanding US regional linguistic variation with Twitter data analysis”. In: Computers, Environment and Urban Systems (2015). issn: 0198-9715. url: http://www.sciencedirect.com/science/article/pii/S0198971515300399.
[54] Douglas Huffman and Patricia Heller. “What Does the Force Concept Inventory Actually Measure?” In: Physics Teacher 33.3 (1995), pp. 138–43.
[55] Jonathan Hurlock and Max L. Wilson. “Searching Twitter: Separating the Tweet from the Chaff”. In: ICWSM. Ed. by Lada A. Adamic, Ricardo A. Baeza-Yates, and Scott Counts. The AAAI Press, 2011.
[56] Tommi S. Jaakkola and Michael I. Jordan. “Bayesian parameter estimation via variational methods”. In: Statistics and Computing 10 (Jan. 2000), pp. 25–37.
[57] Anthony Jacobi et al. “A concept inventory for heat transfer”. In: Frontiers in Education, 2003. FIE 2003 33rd Annual. Vol. 1. IEEE. 2003, T3D–12.
[58] Bernard J. Jansen et al. “Twitter Power: Tweets As Electronic Word of Mouth”. In: J. Am. Soc. Inf. Sci. Technol. 60.11 (Nov. 2009), pp. 2169–2188. issn: 1532-2882. doi: 10.1002/asi.v60:11. url: http://dx.doi.org/10.1002/asi.v60:11.
[59] Akshay Java et al. “Why We Twitter: Understanding Microblogging Usage and Communities”. In: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis. WebKDD/SNA-KDD '07. San Jose, California: ACM, 2007, pp. 56–65. isbn: 978-1-59593-848-0. doi: 10.1145/1348549.1348556. url: http://doi.acm.org/10.1145/1348549.1348556.
[60] Andrew H. Jazwinski. Stochastic Processes and Filtering Theory. Mathematics in Science and Engineering 64. New York, NY: Academic Press, 1970. isbn: 0123815509.
[61] Finn V. Jensen and Thomas D. Nielsen. Bayesian Networks and Decision Graphs. 2nd. Springer Publishing Company, Incorporated, 2007.
[62] Simon J. Julier and Jeffrey K. Uhlmann. “Unscented Filtering and Nonlinear Estimation”. In: Proceedings of the IEEE. 2004, pp. 401–422.
[63] Pamela Kalas et al. “Development of a meiosis concept inventory”. In: CBE-Life Sciences Education 12.4 (2013), pp. 655–664.
[64] Andrej Karpathy and Li Fei-Fei. “Deep Visual-Semantic Alignments for Generating Image Descriptions”. In: IEEE Trans. Pattern Anal. Mach. Intell. 39.4 (Apr. 2017), pp. 664–676. issn: 0162-8828. doi: 10.1109/TPAMI.2016.2598339. url: https://doi.org/10.1109/TPAMI.2016.2598339.
[65] Matthias Katzfuss, Jonathan R. Stroud, and Christopher K. Wikle. “Understanding the Ensemble Kalman Filter”. In: The American Statistician 70.4 (2016), pp. 350–357. doi: 10.1080/00031305.2016.1141709.
[66] Yoon Kim et al. “Character-aware Neural Language Models”. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI'16. Phoenix, Arizona: AAAI Press, 2016, pp. 2741–2749. url: http://dl.acm.org/citation.cfm?id=3016100.3016285.
[67] Duane Knudson et al. “Development and evaluation of a biomechanics concept inventory”. In: Sports Biomechanics 2.2 (2003), pp. 267–277.
[68] Fantian Kong et al. “Mobile Robot Localization Based on Extended Kalman Filter”. In: 2006 6th World Congress on Intelligent Control and Automation. Vol. 2. 2006, pp. 9242–9246. doi: 10.1109/WCICA.2006.1713789.
[69] Stephen Krause et al. “Development, testing, and application of a chemistry concept inventory”. In: Frontiers in Education, 2004. FIE 2004. 34th Annual. IEEE. 2004, T1G–1.
[70] John Krumm and Eric Horvitz. “Eyewitness: Identifying Local Events via Space-time Signals in Twitter Feeds”. In: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems. GIS '15. Bellevue, Washington: ACM, 2015, 20:1–20:10. isbn: 978-1-4503-3967-4. doi: 10.1145/2820783.2820801. url: http://doi.acm.org/10.1145/2820783.2820801.
[71] Haewoon Kwak et al. “What is Twitter, a Social Network or a News Media?” In: Proceedings of the 19th International Conference on World Wide Web. WWW '10. Raleigh, North Carolina, USA: ACM, 2010, pp. 591–600. isbn: 978-1-60558-799-8. doi: 10.1145/1772690.1772751. url: http://doi.acm.org/10.1145/1772690.1772751.
[72] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. “Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles”. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA. 2017, pp. 6405–6416. url: http://papers.nips.cc/paper/7219-simple-and-scalable-predictive-uncertainty-estimation-using-deep-ensembles.
[73] Norm G. Lederman et al. “Views of nature of science questionnaire: Toward valid and meaningful assessment of learners' conceptions of nature of science”. In: Journal of Research in Science Teaching 39.6 (2002), pp. 497–521. issn: 1098-2736. doi: 10.1002/tea.10034. url: http://dx.doi.org/10.1002/tea.10034.
[74] Kathy Lee et al. “Adverse Drug Event Detection in Tweets with Semi-Supervised Convolutional Neural Networks”. In: Proceedings of the 26th International Conference on World Wide Web. WWW '17. Perth, Australia: International World Wide Web Conferences Steering Committee, 2017, pp. 705–714. isbn: 978-1-4503-4913-0.
[75] Kyumin Lee, Brian David Eoff, and James Caverlee. “Seven Months with the Devils: A Long-Term Study of Content Polluters on Twitter”. In: ICWSM. Ed. by Lada A. Adamic, Ricardo A. Baeza-Yates, and Scott Counts. The AAAI Press, 2011. url: http://dblp.uni-trier.de/db/conf/icwsm/icwsm2011.html#LeeEC11.
[76] Ryong Lee, Shoko Wakamiya, and Kazutoshi Sumiya. “Discovery of Unusual Regional Social Activities Using Geo-tagged Microblogs”. In: World Wide Web 14.4 (July 2011), pp. 321–349. issn: 1386-145X.
[77] Richard B Lewis. “Creative Teaching and Learning in a Statics Class”. In: Engineering Education 81.1 (1991), pp. 15–18.
[78] Rui Li et al. “TEDAS: A Twitter-based Event Detection and Analysis System”. In: Proceedings of the 2012 IEEE 28th International Conference on Data Engineering. ICDE '12. Washington, DC, USA: IEEE Computer Society, 2012, pp. 1273–1276. isbn: 978-0-7695-4747-3. doi: 10.1109/ICDE.2012.125. url: http://dx.doi.org/10.1109/ICDE.2012.125.
[79] Julie C Libarkin and Steven W Anderson. “Development of the geoscience concept inventory”. In: Proceedings of the National STEM Assessment Conference, Washington DC. 2006, pp. 148–158.
[80] Xiao Lin and Gabriel Terejanu. “Fast Approximate Data Assimilation for High-Dimensional Problems”. In: 2017. url: https://arxiv.org/abs/1708.02340.
[81] Thomas A Litzinger et al. “A cognitive study of problem solving in statics”. In: Journal of Engineering Education 99.4 (2010), pp. 337–353.
[82] Ran Liu, Rony Patel, and Kenneth R. Koedinger. “Modeling Common Misconceptions in Learning Process Data”. In: Proceedings of the Sixth International Conference on Learning Analytics & Knowledge. LAK '16. Edinburgh, United Kingdom: ACM, 2016, pp. 369–377. isbn: 978-1-4503-4190-5. doi: 10.1145/2883851.2883967. url: http://doi.acm.org/10.1145/2883851.2883967.
[83] David J. C. MacKay. “A Practical Bayesian Framework for Backpropagation Networks”. In: Neural Comput. 4.3 (May 1992), pp. 448–472. issn: 0899-7667. doi: 10.1162/neco.1992.4.3.448. url: http://dx.doi.org/10.1162/neco.1992.4.3.448.
[84] Marco Baroni, Georgiana Dinu, and Germán Kruszewski. “Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors”. In: 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference 1 (2014), pp. 238–247.
[85] A. Marcus et al. “TwitInfo: Aggregating and visualizing microblogs for event exploration”. In: Proceedings of the 2011 annual conference on Human factors in computing systems. ACM. 2011, pp. 227–236.
[86] Adam Marcus et al. “Processing and Visualizing the Data in Tweets”. In: SIGMOD Record 40.4 (Dec. 2011), pp. 21–27.
[87] Dimitris Margaritis. “Learning Bayesian Network Model Structure From Data”. PhD thesis. School of Computer Science, Carnegie-Mellon University, 2003.
[88] Jay Martin, John Mitchell, and Ty Newell. “Development of a concept inventory for fluid mechanics”. In: Frontiers in Education, 2003. FIE 2003 33rd Annual. Vol. 1. IEEE. 2003, T3D–23.
[89] Jay Mathews. “Just whose idea was all this testing”. In: The Washington Post 14 (2006).
[90] Michael Mathioudakis and Nick Koudas. “TwitterMonitor: Trend Detection over the Twitter Stream”. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. SIGMOD '10. Indianapolis, Indiana, USA: ACM, 2010, pp. 1155–1158. isbn: 978-1-4503-0032-2. doi: 10.1145/1807167.1807306. url: http://doi.acm.org/10.1145/1807167.1807306.
[91] Polykarpos Meladianos et al. “Degeneracy-Based Real-Time Sub-Event Detection in Twitter Stream”. In: ICWSM. Ed. by Meeyoung Cha, Cecilia Mascolo, and Christian Sandvig. AAAI Press, 2015, pp. 248–257. isbn: 978-1-57735-733-9.
[92] K Clark Midkiff, Thomas A Litzinger, and DL Evans. “Development of engineering thermodynamics concept inventory instruments”. In: Frontiers in Education Conference, 2001. 31st Annual. Vol. 2. IEEE. 2001, F2A–F23.
[93] Eva Millán and José-Luis Pérez de-la Cruz. “A Bayesian Diagnostic Algorithm for Student Modeling and its Evaluation”. In: User Model. User-Adapt. Interact. 12.2-3 (2002), pp. 281–330.
[94] Multiple-Choice Test Preparation Manual.
[95] Mor Naaman, Hila Becker, and Luis Gravano. “Hip and trendy: Characterizing emerging trends on Twitter”. In: JASIST 62.5 (2011), pp. 902–918.
[96] Radford M. Neal. Bayesian Learning for Neural Networks. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 1996. isbn: 0387947248.
[97] Radford M. Neal and Geoffrey E. Hinton. “Learning in Graphical Models”. In: ed. by Michael I. Jordan. Cambridge, MA, USA: MIT Press, 1999. Chap. A View of the EM Algorithm That Justifies Incremental, Sparse, and Other Variants, pp. 355–368. isbn: 0-262-60032-3. url: http://dl.acm.org/citation.cfm?id=308574.308679.
[98] Jeffrey L Newcomer. “Inconsistencies in students' approaches to solving problems in Engineering Statics”. In: 2010 IEEE Frontiers in Education Conference (FIE). IEEE. 2010, F3G–1.
[99] Jeffrey L Newcomer and Paul S Steif. “Student explanations of answers to concept questions as a window into prior misconceptions”. In: Proceedings. Frontiers in Education. 36th Annual Conference. IEEE. 2006, pp. 6–11.
[100] Jeffrey L Newcomer and Paul S Steif. “Student thinking about static equilibrium: Insights from written explanations to a concept question”. In: Journal of Engineering Education 97.4 (2008), pp. 481–490.
[101] Jeffrey Nichols, Jalal Mahmud, and Clemens Drews. “Summarizing Sporting Events Using Twitter”. In: Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces. IUI '12. Lisbon, Portugal: ACM, 2012, pp. 189–198. isbn: 978-1-4503-1048-2. doi: 10.1145/2166966.2166999. url: http://doi.acm.org/10.1145/2166966.2166999.
[102] Branislav M Notaros. “Concept inventory assessment instruments for electromagnetics education”. In: Antennas and Propagation Society International Symposium, 2002. IEEE. Vol. 1. IEEE. 2002, pp. 684–687.
[103] Brendan O'Connor, Michel Krieger, and David Ahn. “TweetMotif: Exploratory Search and Topic Summarization for Twitter”. In: ICWSM. Ed. by William W. Cohen and Samuel Gosling. The AAAI Press, 2010. url: http://dblp.uni-trier.de/db/conf/icwsm/icwsm2010.html#OConnorKA10.
[104] Tokunbo Ogunfunmi and Mahmudur Rahman. “A concept inventory for an electric circuits course: Rationale and fundamental topics”. In: Proceedings of 2010 IEEE International Symposium on Circuits and Systems. IEEE. 2010, pp. 2804–2807.
[105] Levent Ozbek and Umit Ozlale. “Employing the extended Kalman filter in measuring the output gap”. In: Journal of Economic Dynamics and Control 29.9 (Sept. 2005), pp. 1611–1622.
[106] Leysia Palen et al. “Twitter-based Information Distribution during the 2009 Red River Valley Flood Threat”. In: Bulletin of the American Society for Information Science and Technology (2010).
[107] Bo Pang and Lillian Lee. “Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales”. In: Proceedings of ACL. 2005, pp. 115–124.
[108] Jeffrey Pennington, Richard Socher, and Christopher D Manning. “GloVe: Global Vectors for Word Representation”. In: EMNLP. Vol. 14. 2014, pp. 1532–1543.
[109] Sasa Petrovic, Miles Osborne, and Victor Lavrenko. “Streaming First Story Detection with application to Twitter”. In: HLT-NAACL. The Association for Computational Linguistics, 2010, pp. 181–189. url: http://dblp.uni-trier.de/db/conf/naacl/naacl2010.html#PetrovicOL10.
[110] Timothy A Philpot et al. “Using games to teach statics calculation procedures: Application and assessment”. In: Computer Applications in Engineering Education 13.3 (2005), pp. 222–232.
[111] Daniela Pohl, Abdelhamid Bouchachia, and Hermann Hellwagner. “Automatic Sub-event Detection in Emergency Management Using Social Media”. In: Proceedings of the 21st International Conference on World Wide Web. WWW ’12 Companion. Lyon, France: ACM, 2012, pp. 683–686. isbn: 978-1-4503-1230-1.
[112] Daniela Pohl, Abdelhamid Bouchachia, and Hermann Hellwagner. “Social Media for Crisis Management: Clustering Approaches for Sub-Event Detection”. In: Multimedia Tools and Applications (2013).
[113] M. C. Polson and J. J. Richardson, eds. Foundations of Intelligent Tutoring Systems. Hillsdale, NJ, USA: L. Erlbaum Associates Inc., 1988. isbn: 0-805-80053-0.
[114] Ana-Maria Popescu and Marco Pennacchiotti. “Detecting Controversial Events from Twitter”. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management. CIKM ’10. Toronto, ON, Canada: ACM, 2010, pp. 1873–1876. isbn: 978-1-4503-0099-5. doi: 10.1145/1871437.1871751. url: http://doi.acm.org/10.1145/1871437.1871751.
[115] Kevin Rawson and Tom Stahovich. “Predicting course performance from homework habits”. In: Proceedings of the 2013 American Society for Engineering Education Annual Conference and Exposition. 2013.
[116] Jim Richardson et al. “Development of a concept inventory for strength of materials”. In: Frontiers in Education, 2003. FIE 2003 33rd Annual. Vol. 1. IEEE. 2003, T3D–29.
[117] Robert Rippey. “Probabilistic testing”. In: Journal of Educational Measurement 5.3 (1968), pp. 211–215.
[118] Isabelle Rivals and Léon Personnaz. “A recursive algorithm based on the extended Kalman filter for the training of feedforward neural models”. In: Neurocomputing 20.1-3 (1998), pp. 279–294.
[119] Hasim Sak, Andrew W. Senior, and Françoise Beaufays. “Long short-term memory recurrent neural network architectures for large scale acoustic modeling”. In: INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014. 2014, pp. 338–342.
[120] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. “Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors”. In: Proceedings of the 19th International Conference on World Wide Web. WWW ’10. Raleigh, North Carolina, USA: ACM, 2010, pp. 851–860. isbn: 978-1-60558-799-8.
[121] Hanan Samet et al. “Reading News with Maps by Exploiting Spatial Synonyms”. In: Commun. ACM 57.10 (Sept. 2014), pp. 64–77. issn: 0001-0782. doi: 10.1145/2629572. url: http://doi.acm.org/10.1145/2629572.
[122] Jagan Sankaranarayanan et al. “TwitterStand: News in Tweets”. In: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. GIS ’09. Seattle, Washington: ACM, 2009, pp. 42–51. isbn: 978-1-60558-649-6. doi: 10.1145/1653771.1653781. url: http://doi.acm.org/10.1145/1653771.1653781.
[123] Antti Savinainen and Philip Scott. “Using the Force Concept Inventory to monitor student learning and to plan teaching”. In: Physics Education 37.1 (2002), p. 53. url: http://stacks.iop.org/0031-9120/37/i=1/a=307.
[124] Erich Schubert, Michael Weiler, and Hans-Peter Kriegel. “SigniTrend: Scalable Detection of Emerging Topics in Textual Streams by Hashed Significance Thresholds”. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’14. New York, New York, USA: ACM, 2014, pp. 871–880. isbn: 978-1-4503-2956-9. doi: 10.1145/2623330.2623740. url: http://doi.acm.org/10.1145/2623330.2623740.
[125] David A. Shamma, Lyndon Kennedy, and Elizabeth F. Churchill. “Tweet the Debates: Understanding Community Annotation of Uncollected Sources”. In: Proceedings of the First SIGMM Workshop on Social Media. WSM ’09. Beijing, China: ACM, 2009, pp. 3–10. isbn: 978-1-60558-759-2. doi: 10.1145/1631144.1631148. url: http://doi.acm.org/10.1145/1631144.1631148.
[126] Chao Shen et al. “A Participant-based Approach for Event Summarization Using Twitter Streams”. In: Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, June 9-14, 2013, Westin Peachtree Plaza Hotel, Atlanta, Georgia, USA. 2013, pp. 1152–1162. url: http://aclweb.org/anthology/N/N13/N13-1135.pdf.
[127] Emir H Shuford Jr, Arthur Albert, and H Edward Massengill. “Admissible probability measurement procedures”. In: Psychometrika 31.2 (1966), pp. 125–145.
[128] Sharad Singhal and Lance Wu. “Advances in Neural Information Processing Systems 1”. In: ed. by David S. Touretzky. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1989. Chap. Training Multilayer Perceptrons with the Extended Kalman Algorithm, pp. 133–140. isbn: 1-558-60015-9.
[129] Edward Snelson and Zoubin Ghahramani. “Variable Noise and Dimensionality Reduction for Sparse Gaussian processes”. In: UAI ’06, Proceedings of the 22nd Conference in Uncertainty in Artificial Intelligence, Cambridge, MA, USA, July 13-16, 2006. 2006.
[130] Daniel Soudry, Itay Hubara, and Ron Meir. “Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights.” In: NIPS. Ed. by Zoubin Ghahramani et al. 2014, pp. 963–971. url: http://dblp.uni-trier.de/db/conf/nips/nips2014.html#SoudryHM14.
[131] Nitish Srivastava et al. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”. In: J. Mach. Learn. Res. 15.1 (Jan. 2014), pp. 1929–1958.
[132] Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber. “Training Very Deep Networks”. In: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2. NIPS’15. Montreal, Canada: MIT Press, 2015, pp. 2377–2385. url: http://dl.acm.org/citation.cfm?id=2969442.2969505.
[133] Paul S Steif. “An articulation of the concepts and skills which underlie engineering statics”. In: Frontiers in Education, 2004. FIE 2004. 34th Annual. IEEE. 2004, F1F–5.
[134] Paul S Steif. “Comparison between performance on a concept inventory and solving of multifaceted problems”. In: Frontiers in Education, 2003. FIE 2003 33rd Annual. Vol. 1. IEEE. 2003, T3D–17.
[135] Paul S Steif. “Initial data from a statics concept inventory”. In: Proceedings of the 2004 American Society of Engineering Education Conference and Exposition, Salt Lake City, UT. 2004.
[136] Paul S Steif and John A Dantzler. “A statics concept inventory: Development and psychometric analysis”. In: Journal of Engineering Education 94.4 (2005), p. 363.
[137] Paul S Steif and Mary A Hansen. “New practices for administering and analyzing the results of concept inventories”. In: Journal of Engineering Education 96.3 (2007), p. 205.
[138] Andrea Stone et al. “The statistics concept inventory: A pilot study”. In: Frontiers in Education, 2003. FIE 2003 33rd Annual. Vol. 1. IEEE. 2003, T3D–1.
[139] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. “Sequence to Sequence Learning with Neural Networks”. In: Proceedings of the 27th International Conference on Neural Information Processing Systems. NIPS’14. Montreal, Canada: MIT Press, 2014, pp. 3104–3112. url: http://dl.acm.org/citation.cfm?id=2969033.2969173.
[140] Pinchas Tamir. “Some issues related to the use of justifications to multiple-choice answers”. In: Journal of Biological Education 23.4 (1989), pp. 285–292. doi: 10.1080/00219266.1989.9655083.
[141] Gabriel A. Terejanu. “Unscented Kalman filter tutorial”. In: Workshop on Large-Scale Quantification of Uncertainty. Sandia National Laboratories. 2009, pp. 1–6.
[142] Michael E. Tipping and Chris M. Bishop. “Probabilistic Principal Component Analysis”. In: Journal of the Royal Statistical Society, Series B 61 (1999), pp. 611–622.
[143] Andranik Tumasjan et al. “Election Forecasts With Twitter”. In: Social Science Computer Review 29.4 (Nov. 2011), pp. 402–418. issn: 1552-8286. doi: 10.1177/0894439310386557.
[144] George Valkanas and Dimitrios Gunopulos. “How the Live Web Feels About Events”. In: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. CIKM ’13. San Francisco, California, USA: ACM, 2013, pp. 639–648. isbn: 978-1-4503-2263-8. doi: 10.1145/2505515.2505572.
[145] Kurt Vanlehn et al. “The Andes Physics Tutoring System: Lessons Learned”. In: Int. J. Artif. Intell. Ed. 15.3 (Aug. 2005), pp. 147–204. issn: 1560-4292. url: http://dl.acm.org/citation.cfm?id=1434930.1434932.
[146] Stella Vosniadou and William F. Brewer. “Mental models of the earth: A study of the conceptual change in childhood”. In: Cognitive Psychology (1992), pp. 535–585. doi: 10.1016/0010-0285(92)90018-W.
[147] Kathleen E Wage et al. “The signals and systems concept inventory”. In: IEEE Transactions on Education 48.3 (2005), pp. 448–461.
[148] Eric A. Wan and Rudolph Van Der Merwe. “The Unscented Kalman Filter for Nonlinear Estimation”. In: 2000, pp. 153–158.
[149] Hao Wang et al. “A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle”. In: Proceedings of the ACL 2012 System Demonstrations. ACL ’12. Jeju Island, Korea: Association for Computational Linguistics, 2012, pp. 115–120. url: http://dl.acm.org/citation.cfm?id=2390470.2390490.
[150] Xiaofeng Wang, Donald E. Brown, and Matthew S. Gerber. “Spatio-temporal modeling of criminal incidents using geographic, demographic, and twitter-derived information.” In: ISI. Ed. by Daniel Zeng et al. IEEE, 2012, pp. 36–41. isbn: 978-1-4673-2105-1.
[151] Rik Warren, Robert E. Smith, and Anne K. Cybenko. Use of Mahalanobis Distance for Detecting Outliers and Outlier Clusters in Markedly Non-normal Data: A Vehicular Traffic Example. Tech. rep. Air Force Materiel Command, 2011.
[152] Jianshu Weng and Bu-Sung Lee. “Event Detection in Twitter”. In: Proceedings of the Fifth International Conference on Weblogs and Social Media, Barcelona, Catalonia, Spain. 2011. url: http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/view/2767.
[153] Yiming Yang, Tom Pierce, and Jaime Carbonell. “A Study of Retrospective and On-line Event Detection”. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR ’98. Melbourne, Australia: ACM, 1998, pp. 28–36. isbn: 1-58113-015-5. doi: 10.1145/290941.290953. url: http://doi.acm.org/10.1145/290941.290953.
[154] J. S. Yedidia, W. T. Freeman, and Y. Weiss. “Constructing Free-energy Approximations and Generalized Belief Propagation Algorithms”. In: IEEE Trans. Inf. Theor. 51.7 (July 2005), pp. 2282–2312. issn: 0018-9448. url: http://dx.doi.org/10.1109/TIT.2005.850085.
[155] Zhijun Yin et al. “Geographical topic discovery and comparison”. In: Proceedings of the 20th international conference on World wide web. ACM. 2011, pp. 247–256.
[156] Juan Diego Zapata Rivera. “Learning Environments Based on Inspectable Student Models”. AAINQ83573. PhD thesis. Saskatoon, Canada: University of Saskatchewan, 2003. isbn: 0-612-83573-1.
[157] Chao Zhang et al. “GeoBurst: Real-Time Local Event Detection in Geo-Tagged Tweet Streams.” In: SIGIR. Ed. by Raffaele Perego et al. ACM, 2016, pp. 513–522. isbn: 978-1-4503-4069-4. url: http://dblp.uni-trier.de/db/conf/sigir/sigir2016.html#ZhangZYZZKWH16.
[158] Chao Zhang et al. “TrioVecEvent: Embedding-Based Online Local Event Detection in Geo-Tagged Tweet Streams”. In: Proceedings of the 2017 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD 2017. Halifax, Nova Scotia, Canada: ACM, 2017.
[159] Siqi Zhao et al. “Human as Real-Time Sensors of Social and Physical Events: A Case Study of Twitter and Sports Games”. In: CoRR abs/1106.4300 (2011).
[160] Xiangmin Zhou and Lei Chen. “Event Detection over Twitter Social Media Streams”. In: The VLDB Journal 23.3 (June 2014), pp. 381–400. issn: 1066-8888. doi: 10.1007/s00778-013-0320-3. url: http://dx.doi.org/10.1007/s00778-013-0320-3.
[161] Arkaitz Zubiaga et al. “Towards Real-time Summarization of Scheduled Events from Twitter Streams”. In: Proceedings of the 23rd ACM Conference on Hypertext and Social Media. HT ’12. Milwaukee, Wisconsin, USA: ACM, 2012, pp. 319–320. isbn: 978-1-4503-1335-3. doi: 10.1145/2309996.2310053. url: http://doi.acm.org/10.1145/2309996.2310053.