Answering Neuroscience Questions from Connectomics Data using Statistical Tools
Joshua T. Vogelstein
Dept of Statistical Science & Mathematics, Duke University
Institute for Data Intensive Engineering and Sciences, Johns Hopkins University
Endeavor Scientist Fellowship, Child Mind Institute
I’ve tried to avoid text being down here so everybody can see everything
Take Home Messages
• Graphs are mathematical objects too!
• Standard (“Euclidean”) statistical tools are inappropriate
• Nonetheless, we can write down statistical distributions over graphs
• We can formally state many neurobiological questions via statistical graph theory (SGT)
• We can map graphs to Euclidean space; we want those mappings to have desired statistical properties such as consistency, robustness, etc.
• Sometimes SGT may be useful
Outline
• Motivation
• Some theory stuff
• (an application)
• Celebrations!
A Concrete Motivating Example
• We estimate graphs from two populations of brains (e.g., different psychiatric conditions, sex, personalities, etc.)
• We want to know: are the two populations different?
• This is like a two-sample t-test for graph-valued observations
What I Do & Don’t Care About (for the purposes of this talk)
• Don’t: How to estimate graphs
• Don’t: Where the graphs came from, e.g., MRI, EM, Calcium, Ephys, etc.
• Do: I assume somebody gave me graphs estimated from neural data, somehow, using some experimental technique, with neurons and synapses wrong/missing, from some species, at some scale, and I don’t care how (for the purposes of this talk)
Formal Statement of Problem
• G1,...,Gn ~ F0, Gn+1,...,Gn+m ~ F1
• H0: F0 = F1
• HA: F0 != F1
• NB: all graphs have the same vertex set (for here, for now)
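One way to make the two-sample problem above concrete is a permutation test. The sketch below is only an illustration, not the method used in this talk: the test statistic (the Frobenius norm of the difference between the two groups' mean adjacency matrices) and the helper names `mean_difference_stat`, `permutation_test`, and `sample_er` are assumptions chosen for simplicity. It uses NumPy and relies on all graphs sharing the same vertex set.

```python
import numpy as np

def mean_difference_stat(graphs_a, graphs_b):
    """Frobenius norm of the difference between the groups' mean adjacency matrices."""
    return np.linalg.norm(graphs_a.mean(axis=0) - graphs_b.mean(axis=0))

def permutation_test(graphs_a, graphs_b, n_perm=500, seed=0):
    """Approximate p-value for H0: F0 = F1, obtained by shuffling group labels."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([graphs_a, graphs_b])
    n = len(graphs_a)
    observed = mean_difference_stat(graphs_a, graphs_b)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        stat = mean_difference_stat(pooled[perm[:n]], pooled[perm[n:]])
        count += stat >= observed
    # Add-one correction keeps the estimated p-value strictly positive.
    return (count + 1) / (n_perm + 1)

def sample_er(n_graphs, n_vertices, p, rng):
    """Sample simple (binary, undirected, loop-free) graphs with edge probability p."""
    upper = np.triu(rng.random((n_graphs, n_vertices, n_vertices)) < p, k=1)
    return (upper | upper.transpose(0, 2, 1)).astype(float)
```

For example, calling `permutation_test(sample_er(10, 10, 0.2, rng), sample_er(10, 10, 0.6, rng))` with `rng = np.random.default_rng(1)` should return a small p-value, since the two populations have very different edge densities. A statistic built on mean adjacency matrices ignores most graph structure; it is used here only because it is short to write down.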
Graphs are Mathematical Objects Too
• G=(V,E)
• V is a set of vertices (nodes) (perhaps a vertex is a neuron)
• E is a set of edges (arcs/links) (perhaps an edge is a synapse)
• Graphs are simple, meaning: edges are binary, undirected, no loops (for here, for now)
• I am not analyzing functions of graphs (e.g., degree distribution) in this talk; that is an interesting and complementary topic
Why Not Just Use Lasso?
• A is an adjacency matrix, where A(u,v)=1 iff u~v
• Let A & A’ be adjacency matrices of two graphs
• We could vectorize and then use standard techniques, but we might lose some structure from the data
• For example, row u and column u of A correspond to the same vertex; if we vectorize, standard analysis techniques do not use that information
Recall: All of Statistics
• The statistical properties of a hypothesis test (e.g., its power) depend on a statistical model
• For example, a t-test is optimal under certain assumptions about the data
• But when data are corrupted, robust methods, such as the rank-sum test, have higher power
Conjecture: SGT might be useful to cast and address connectomics questions
Distributions over graphs
• G ~ P, P is some distribution over graphs
• P is discrete, so P(G) is the likelihood of graph G
• Two extreme examples: (i) ER(n,p), (ii) Categorical(theta)
• Number of possible graphs with n vertices?
(draw it; booyah Pillow!)
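To put numbers on the two extremes above: a simple graph on n labeled vertices has C(n,2) possible edges, each present or absent, so there are 2^C(n,2) such graphs. A fully general Categorical(theta) model therefore needs one probability per graph, while ER(n,p) needs a single parameter. The short sketch below illustrates this; the function names `num_simple_graphs` and `er_probability` are hypothetical, chosen only for this example.

```python
from math import comb

def num_simple_graphs(n):
    """Number of simple (binary, undirected, loop-free) graphs on n labeled vertices."""
    return 2 ** comb(n, 2)

def er_probability(num_edges, n, p):
    """Probability that ER(n, p) produces one specific graph with num_edges edges."""
    pairs = comb(n, 2)
    return p ** num_edges * (1 - p) ** (pairs - num_edges)
```

Under ER(n,p), every graph with the same number of edges gets the same probability, which is what makes the model so restrictive compared with the categorical extreme.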
Latent Position Random Graphs
• P[A(u,v)=1] = f(u,v) in (0,1)
• Posit the existence of a latent vector for each vertex
• The probability of a connection between u & v is independent of everything else, conditioned on the two latent vectors
• Intuition from: (i) social network analysis, (ii) neuroscience
• We can also include observed attributes for each vertex
Random Dot Product Graphs
• Let Xu in R^d for each u
• f(u,v) = <Xu,Xv>
• X=(X1,...,Xn) can be estimated consistently up to a rotation via eig
• X can be estimated quickly via eig
• For sparse graphs, X can be estimated even with n=10^6 or more
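The "via eig" estimate above can be sketched as follows. This is a minimal illustration under stated assumptions, not the talk's actual code: it assumes the estimate scales the top-d eigenvectors of A by the square roots of their eigenvalues, and the names `sample_rdpg` and `adjacency_spectral_embedding` are hypothetical. It uses a dense eigendecomposition from NumPy, which is only practical for small n.

```python
import numpy as np

def sample_rdpg(X, rng):
    """Sample a simple graph with P[A(u,v)=1] = <Xu, Xv> for u != v."""
    P = X @ X.T
    n = len(X)
    upper = np.triu(rng.random((n, n)) < P, k=1)  # strict upper triangle, no loops
    return (upper | upper.T).astype(float)

def adjacency_spectral_embedding(A, d):
    """Estimate latent positions (up to rotation) from the top-d eigenpairs of A."""
    eigvals, eigvecs = np.linalg.eigh(A)          # eigenvalues in ascending order
    top_vals = np.maximum(eigvals[-d:], 0.0)      # d largest, clipped at zero
    return eigvecs[:, -d:] * np.sqrt(top_vals)
```

Because the latent positions are only identifiable up to a rotation, a fair check compares the rotation-invariant products, i.e. `X_hat @ X_hat.T` against `X @ X.T`, rather than `X_hat` against `X` directly. For large sparse graphs, a truncated sparse solver such as `scipy.sparse.linalg.eigsh` would be a more suitable replacement for the dense `eigh` call.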