Multivariate Convolutional Sparse Coding for Electromagnetic Brain Signals

Tom Dupré La Tour (1), Thomas Moreau (2), Mainak Jas (1), Alexandre Gramfort (2)
(1) LTCI, Télécom ParisTech, Université Paris-Saclay, Paris, France
(2) Parietal, INRIA, Université Paris-Saclay, Saclay, France
Code available at: https://alphacsc.github.io

1. Convolutional Sparse Coding (CSC)

Convolutional linear model:

    X = \sum_{k=1}^{K} z_k * D_k + \mathcal{E}, \quad \mathcal{E} \sim \mathcal{N}(0, \sigma I)    (1)

with signal X ∈ ℝ^{P×T} (P sensors and T samples), K patterns D_k ∈ ℝ^{P×L} (duration L), and activations z_k ∈ ℝ^{\tilde{T}} such that \tilde{T} = T − L + 1.

Multivariate CSC:

    \min_{D_k, z_k^n} \sum_{n=1}^{N} \frac{1}{2} \Big\| X^n - \sum_{k=1}^{K} z_k^n * D_k \Big\|_2^2 + \lambda \sum_{k=1}^{K} \| z_k^n \|_1,
    \quad \text{s.t. } \|D_k\|_2^2 \le 1 \text{ and } z_k^n \ge 0    (2)

Multivariate CSC with rank-1 constraint:

    \min_{u_k, v_k, z_k^n} \sum_{n=1}^{N} \frac{1}{2} \Big\| X^n - \sum_{k=1}^{K} z_k^n * (u_k v_k^\top) \Big\|_2^2 + \lambda \sum_{k=1}^{K} \| z_k^n \|_1,
    \quad \text{s.t. } \|u_k\|_2^2 \le 1, \ \|v_k\|_2^2 \le 1 \text{ and } z_k^n \ge 0    (3)

One source in the brain spreads linearly and instantaneously over all sensors, so the rank-1 hypothesis is particularly well suited to MEG signals.

2. Z-step: solving for the activations

The Z-step solves (2) or (3) for a fixed dictionary. We solve it using Greedy Coordinate Descent (GCD):
- The optimization problem for one coordinate has a closed-form solution:

    z_k'[t] = \max\left( \frac{\beta_k[t] - \lambda}{\|D_k\|_2^2}, \, 0 \right)    (4)

  with \beta_k[t] = \left[ D_k^\top * \left( X - \sum_{l=1}^{K} z_l * D_l + z_k[t] \, e_t * D_k \right) \right][t].
- Greedily update the coefficient (k_0, t_0) = \arg\max_{(k,t)} | z_k[t] - z_k'[t] |.

Locally Greedy Coordinate Descent (LGCD) [Moreau et al., 2018]:
- Select the best coordinate locally, on one of M contiguous segments C_m:
  (k_0, t_0) = \arg\max_{(k,t) \in C_m} | z_k[t] - z_k'[t] |.
- For M = ⌊\tilde{T} / (2L − 1)⌋, the computational complexity of choosing the coefficient matches the complexity of performing the update.
- It is efficient when the updates are weakly dependent and when the solution is sparse.

3. D-step: solving for the atoms

The D-step solves (2) or (3) for fixed activations.
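The Z-step's closed-form update (4) and the segment-wise greedy rule of LGCD can be sketched in a few lines of NumPy for the univariate case (P = 1). This is an illustrative sketch, not the alphacsc implementation: the function names (`lgcd`, `reconstruct`) are ours, and β is recomputed in full at every iteration for clarity, whereas an efficient implementation would update it locally around the modified coordinate.

```python
import numpy as np

def reconstruct(z, D, T):
    """Sum of full convolutions z_k * D_k (length T = T_tilde + L - 1)."""
    X_hat = np.zeros(T)
    for k in range(D.shape[0]):
        X_hat += np.convolve(z[k], D[k])
    return X_hat

def lgcd(X, D, lam, n_iter=200):
    """Locally greedy coordinate descent for univariate CSC (didactic sketch)."""
    T = len(X)
    K, L = D.shape
    T_tilde = T - L + 1
    norms = (D ** 2).sum(axis=1)            # ||D_k||_2^2
    z = np.zeros((K, T_tilde))
    M = max(1, T_tilde // (2 * L - 1))      # number of segments C_m
    bounds = np.linspace(0, T_tilde, M + 1).astype(int)
    for it in range(n_iter):
        R = X - reconstruct(z, D, T)        # current residual
        # beta_k[t] = <D_k, R[t:t+L]> + z_k[t] * ||D_k||^2
        beta = np.empty((K, T_tilde))
        for k in range(K):
            beta[k] = np.correlate(R, D[k], mode='valid') + z[k] * norms[k]
        # closed-form coordinate update (4), with non-negativity
        z_new = np.maximum(beta - lam, 0) / norms[:, None]
        # restrict the greedy choice to one contiguous segment C_m
        m = it % M
        lo, hi = bounds[m], bounds[m + 1]
        diff = np.abs(z_new[:, lo:hi] - z[:, lo:hi])
        k0, t0 = np.unravel_index(diff.argmax(), diff.shape)
        if diff[k0, t0] < 1e-12:
            continue                        # segment already converged
        z[k0, lo + t0] = z_new[k0, lo + t0]
    return z
```

On a toy signal generated by a single activation, the update drives the active coefficient to its soft-thresholded value (β − λ)/‖D_k‖² while keeping the solution sparse and non-negative.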
We solve it using Projected Gradient Descent (PGD):
- Separate minimization over {u_k}_k and {v_k}_k.
- The step size is set using an Armijo backtracking line search.

Function and gradient computations:
- The gradient with respect to a full atom D_k = u_k v_k^\top ∈ ℝ^{P×L}:

    \nabla_{D_k} E = \sum_{n=1}^{N} (z_k^n)^\top * \left( X^n - \sum_{l=1}^{K} z_l^n * D_l \right) = \Phi_k - \sum_{l=1}^{K} \Psi_{k,l} * D_l,    (5)

  where \Phi_k ∈ ℝ^{P×L} and \Psi_{k,l} ∈ ℝ^{2L−1} are constant during a D-step and can be precomputed.
- The gradients with respect to u_k and v_k are obtained using the chain rule:

    \nabla_{u_k} E = (\nabla_{D_k} E) \, v_k ∈ ℝ^P,    (6)
    \nabla_{v_k} E = u_k^\top (\nabla_{D_k} E) ∈ ℝ^L.    (7)
- E can be computed, up to a constant term C, as:

    E = \sum_{k=1}^{K} u_k^\top (\nabla_{D_k} E) \, v_k + C.    (8)

⇒ This precomputation helps scaling with the number of channels, as the computations are in O(P + L) instead of O(PL).

4. Multivariate speed benchmark

[Figure: speed benchmark for univariate CSC (top) and multivariate CSC (bottom).]

5. Simulated Signals

- Signals are generated following (1) over P channels.
- Recovered temporal patterns \hat{v}_k are evaluated using:

    \text{loss}(\hat{v}) = \min_{s \in S(K)} \sum_{k=1}^{K} \min\left( \| \hat{v}_{s(k)} - v_k \|_2^2, \ \| \hat{v}_{s(k)} + v_k \|_2^2 \right).

⇒ More channels improve pattern recovery, as they disentangle superimposed patterns.

6. Experimental Signals

[Figure: (a) temporal waveform, (b) spatial pattern, (c) PSD (dB), (d) dipole fit.]

Atom learned on the MNE somatosensory dataset. The learned temporal pattern illustrates the mu-waveforms described, for instance, in [Cole and Voytek, 2017].

NIPS 2018, Montreal, Canada
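The pattern-recovery metric used for the simulated signals is sign- and permutation-invariant: it matches each recovered pattern to a ground-truth pattern up to a sign flip, minimizing over all permutations s ∈ S(K). A minimal NumPy sketch (the function name `pattern_recovery_loss` is ours; the brute-force search over permutations is only practical for small K):

```python
import itertools
import numpy as np

def pattern_recovery_loss(v_hat, v):
    """loss(v_hat) = min over permutations s of
    sum_k min(||v_hat[s(k)] - v[k]||^2, ||v_hat[s(k)] + v[k]||^2)."""
    K = v.shape[0]
    best = np.inf
    for perm in itertools.permutations(range(K)):
        total = 0.0
        for k, s_k in enumerate(perm):
            d_plus = np.sum((v_hat[s_k] - v[k]) ** 2)   # same sign
            d_minus = np.sum((v_hat[s_k] + v[k]) ** 2)  # flipped sign
            total += min(d_plus, d_minus)
        best = min(best, total)
    return best
```

The loss is zero exactly when the recovered patterns equal the true ones up to reordering and sign, which is the inherent ambiguity of the CSC model.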