Discrete Denoising with Shifts

Taesup Moon
Yahoo! Labs

EE477 Guest Lecture, November 10, 2011
Outline

1. Prediction with Experts' Advice
2. Discrete Denoising with Shifts
   - Recap of DUDE
   - Motivation
   - New algorithm: S-DUDE
   - Results
Recap of DUDE

Discrete denoising

- X_t, Z_t, X̂_t take values in finite alphabets
- Choose X̂_1^n as close as possible to X_1^n, based on the entire Z_1^n
- Ex) text correction, image denoising, DNA sequence analysis, etc.
- Performance metric: per-symbol average loss
DUDE is the first universal discrete denoiser

DUDE [Weissman et al. '05]: for each location t to be denoised,

1. fix the window size k
2. find the left k-context (l_1, ..., l_k) and right k-context (r_1, ..., r_k) of z_t:

   l_1 l_2 ... l_k  z_t  r_1 r_2 ... r_k

3. count all occurrences of symbols in z^n appearing with the same context
4. decide on x̂_t according to

   x̂_t(z_{t-k}^{t+k}) = simple rule(Π, Λ, count vector[z^n, z_{t-k}^{t-1}, z_{t+1}^{t+k}], z_t)

Whenever DUDE sees z_{t-k}^{t-1} z_t z_{t+1}^{t+k}, it makes the same decision for z_t:
DUDE is a "sliding window" denoiser
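Steps 2 and 3 of the first pass (gathering a count vector per two-sided context) can be sketched as follows. This is a minimal illustration of the counting only; the "simple rule" of step 4 is left abstract here.

```python
from collections import Counter, defaultdict

def context_counts(z, k):
    """For each (left, right) k-context in z, count how often each middle
    symbol occurs (steps 2-3 of DUDE's first pass)."""
    counts = defaultdict(Counter)
    for t in range(k, len(z) - k):
        left = tuple(z[t - k:t])
        right = tuple(z[t + 1:t + k + 1])
        counts[(left, right)][z[t]] += 1
    return counts

z = [0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
m = context_counts(z, k=1)
print(m[((0,), (1,))])   # middle-symbol counts for the context 0 . 1
```

One pass over z^n fills every context's count vector simultaneously, which is why DUDE runs in time linear in n.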
Ex 1: a stationary bit stream gets corrupted

X^n : 00000011111110000000000111111111100000001111111110000
Z^n : 00100011101110010001000111110111100000011110111110001

- source: symmetric binary Markov chain with transition probability p = 0.1, sequence length n = 10^6
  [two-state diagram: stay in 0 or 1 w.p. 1 − p, flip w.p. p]
- noise: BSC(δ = 0.1)
  [channel diagram: each bit flipped independently w.p. δ]

⇒ the optimal BER is attained by the forward-backward recursion
DUDE achieves the optimal BER as the window size grows

[Plot: bit error rate vs. window size k: Bayes optimum = 0.558, DUDE = 0.561]

Window size k is a design parameter for a given sequence length n
DUDE attains the optimum performance for stationary sources

For a denoiser X̂^n = {X̂_t(z^n)}_{t=1}^n,

   L_{X̂^n}(x^n, z^n) = (1/n) Σ_{t=1}^n Λ(x_t, X̂_t(z^n))

is the performance measure
Main results of DUDE: when k = k_n < ⌈(1/2) log_{|Z|} n⌉,

1. For any stationary process X,

      lim_{n→∞} [ E( L_{X̂^n_DUDE}(X^n, Z^n) ) − min_{X̂^n ∈ D_n} E( L_{X̂^n}(X^n, Z^n) ) ] = 0

   - D_n is the set of all denoisers in the world
   - DUDE attains the Bayes optimal performance

2. For all x ∈ X^∞,

      lim_{n→∞} [ L_{X̂^n_DUDE}(x^n, Z^n) − D_k(x^n, Z^n) ] = 0   w.p. 1

   - D_k(x^n, z^n): the best performance among S_k, the k-th order sliding window denoisers
   - DUDE is as good as the best sliding window denoiser
Motivation

Ex 2: a piecewise stationary bit stream gets corrupted

X^n : 00000011111110000000000111111101100011011011011010110
Z^n : 00100011101110010001000111110101100011111011010010100

- source: binary Markov chain whose transition probability changes from p_1 = 0.01 to p_2 = 0.2 at t* = n/2
- noise: BSC(δ = 0.1)

⇒ the optimal BER is attained by the forward-backward recursion
Does DUDE achieve the optimal BER?

[Plot: bit error rate vs. window size k: Bayes optimum = 0.487, DUDE = 0.574 (+18%)]

- DUDE applies the same rule "regardless of the location"
- DUDE has a limitation for time- (space-) varying sources
In practice, many sources are time- (space-) varying

- text : English → Spanish → German → ...
- voice : [audio example]
- image : [image example]
New algorithm: S-DUDE

Can we do better than the DUDE when the source varies?

Questions
1. Can we perform as if we knew the source, including its change points?
2. If so, can we do it efficiently?

Answers
1. Yes. S-DUDE can do essentially as well as if it knew the source and its change points
2. Yes. S-DUDE is a linear-complexity algorithm

[Moon and Weissman, IEEE Trans. Info. Theory, Nov '09]
Take a closer look at the binary example

Binary alphabet, BSC(δ). Suppose DUDE with window size k = 3 decided as follows:

   z_{t-3}^{t+3} = 0100110 → x̂_t = 0,     z_{t-3}^{t+3} = 0101110 → x̂_t = 1

- the context 010 • 110 defined a "say-what-you-see" mapping in the middle
- DUDE employs the same mapping whenever it sees 010 • 110

Only 4 single-letter mappings exist in the binary case:
"say-what-you-see", "flip-what-you-see", "always-say-0", "always-say-1"

DUDE counts n_0 and n_1, the occurrences of 0 and 1 in the context 010 • 110, and picks:
- if n_0 ≈ n_1 → "say-what-you-see"
- if n_0 ≫ n_1 → "always-say-0"
- if n_0 ≪ n_1 → "always-say-1"
(the threshold depends on δ)
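The count-vs-threshold behavior above is an instance of DUDE's general single-site rule, x̂ = argmin_{x̂} mᵀ Π^{−T} (λ_{x̂} ⊙ π_{z_t}), where m is the context count vector and π_z, λ_{x̂} are columns of Π and Λ. A numeric sketch for the BSC(0.1)/Hamming-loss case; the counts n_0 = 98, n_1 = 2 are made up for illustration:

```python
import numpy as np

delta = 0.1
Pi = np.array([[1 - delta, delta],      # BSC channel matrix, Pi[x, z]
               [delta, 1 - delta]])
Lam = np.array([[0.0, 1.0],             # Hamming loss, Lam[x, xhat]
                [1.0, 0.0]])

def dude_rule(m, z):
    """xhat = argmin_xhat  (Pi^{-1} m) . (Lam[:, xhat] * Pi[:, z])."""
    q = np.linalg.solve(Pi, m)          # estimated clean-symbol counts
    scores = [q @ (Lam[:, xhat] * Pi[:, z]) for xhat in range(2)]
    return int(np.argmin(scores))

# heavily skewed counts: the "always-say-0" regime, even an observed 1 is flipped
print(dude_rule(np.array([98.0, 2.0]), z=1))   # -> 0
# balanced counts: the "say-what-you-see" regime
print(dude_rule(np.array([50.0, 50.0]), z=1))  # -> 1
```

Note that Π^{−1} m corrects the observed counts for the channel before comparing expected losses, which is what makes the threshold depend on δ.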
Employing shifting single-letter mappings will be helpful

Suppose the 0's and 1's observed at context 010 • 110 looked like

   00001000110000 11111111011101
      (all-0)        (all-1)

- shifting "always-say-0" → "always-say-1" may be better than the fixed "say-what-you-see"
- generally, if single-letter mappings have some freedom to shift, they can attain smaller loss
- how can we decide when to shift, and to what?
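A quick numeric check of the claim, assuming (for illustration only) that the underlying clean bits at this context really were a block of 0's followed by a block of 1's:

```python
z = [int(c) for c in "00001000110000" + "11111111011101"]  # observed middle bits
x = [0] * 14 + [1] * 14    # assumed clean bits: all-0 block, then all-1 block

# fixed "say-what-you-see" outputs z itself
swys_loss = sum(zt != xt for zt, xt in zip(z, x))
# one shift: "always-say-0" on the first block, "always-say-1" on the second
shift_loss = sum((0 if t < 14 else 1) != x[t] for t in range(28))

print(swys_loss, shift_loss)   # 5 0
```

Under this assumption, the fixed mapping pays for every channel flip (5 errors out of 28), while a single well-placed shift pays nothing.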
S^n_m is a class of shifting single-letter mappings

- Ideally, shifting every time to the correct mapping would be the best
  - equivalent to knowing the source sequence ⇒ impossible!
- We limit the number of shifts to m
- S^n_m : the class of single-letter mapping sequences {s_1, ..., s_n} that shift at most m times over a sequence of length n, e.g.,

     z^n :  [ swys | all-0 | all-1 | swys ]

- |S^n_m| ≤ (n choose m) · |S|^m, where |S| = |X|^{|Z|} is the number of single-letter mappings
- Deciding when to shift to what, m times ⇔ selecting the best combination in S^n_m
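The bound shows why limiting shifts keeps the class learnable: its log-size grows only like m log n, in contrast to the |S|^n unrestricted sequences. A quick check for the binary case, where |S| = 2^2 = 4:

```python
from math import comb, log2

n, m = 1000, 3
S = 2 ** 2                        # |S| = |X|^|Z| = 4 single-letter mappings
bound = comb(n, m) * S ** m       # |S^n_m| <= C(n, m) * |S|^m
# log-size ~ m*(log n + log |S|), tiny compared to n*log|S| for unrestricted sequences
print(bound, round(log2(bound), 1))
```

For n = 1000 and m = 3 the class is of order 10^10, i.e. about 33 bits of "model description", versus 2000 bits for fully unrestricted mapping sequences.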
The key tool is to devise an estimate of the loss Λ

Focus on the single-letter setting (s(·) : Z → X):   x → Z → X̂ = s(Z)

- Λ(x, s(Z)) : the loss between x and s(Z) (not observable)
- But, from the knowledge of Π, we can devise ℓ(Z, s) such that

      E_x( ℓ(Z, s) ) = E_x( Λ(x, s(Z)) )

- ℓ(Z, s) is an unbiased estimate of E_x( Λ(x, s(Z)) )
- ℓ(Z, s) : a "loss" between Z and s(·) (observable)

[Weissman et al., "Universal filtering via prediction", IEEE Trans. Info. Theory '07]
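One standard way to construct such an ℓ (a sketch of the inverse-channel construction; the exact form in the cited paper may differ in details): let h_s(x) = Σ_z Π(x, z) Λ(x, s(z)) be the conditional expected true loss, and set ℓ(·, s) = Π^{−1} h_s. Then Σ_z Π(x, z) ℓ(z, s) = h_s(x) for every x, which is exactly the unbiasedness requirement:

```python
import numpy as np

delta = 0.1
Pi = np.array([[1 - delta, delta],        # BSC: Pi[x, z] = P(Z = z | X = x)
               [delta, 1 - delta]])
Lam = np.array([[0.0, 1.0],               # Hamming loss, Lam[x, xhat]
                [1.0, 0.0]])

def estimated_loss(s):
    """ell(., s) = Pi^{-1} h_s, so that E[ell(Z, s) | x] = E[Lam(x, s(Z)) | x]."""
    h = np.array([sum(Pi[x, z] * Lam[x, s[z]] for z in range(2)) for x in range(2)])
    return np.linalg.solve(Pi, h)         # vector indexed by the observed z

for s in ([0, 0], [1, 1], [0, 1], [1, 0]):     # all 4 single-letter mappings
    ell = estimated_loss(s)
    for x in range(2):                         # unbiasedness check, for each x
        true = sum(Pi[x, z] * Lam[x, s[z]] for z in range(2))
        assert abs(Pi[x] @ ell - true) < 1e-12

# ell need not be a "real" loss, e.g. for "always-say-0" one entry is negative
print(estimated_loss([0, 0]))   # approx [-0.125, 1.125]
```

Unbiasedness is what lets S-DUDE compare mapping sequences using only the observed z^n: minimizing Σ ℓ(z_t, s_t) is, in expectation, minimizing the true cumulative loss.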
S-DUDE is defined by minimizing the sum of the estimated losses

For each context c (e.g., 010 • 110), S-DUDE finds

   Ŝ ≜ arg min_{S ∈ S^{n_c}_m} Σ_{i ∈ context c} ℓ(z_i, s_i)

(versus the unattainable arg min_{S ∈ S^{n_c}_m} Σ_{i ∈ context c} Λ(x_i, s_i(z_i)))
and applies the resulting mappings

Question: how can we obtain Ŝ = {ŝ_1, ..., ŝ_{n_c}} ∈ S^{n_c}_m efficiently?
S-DUDE can be implemented with a two-pass algorithm

Again the binary, BSC(δ) example.

Problem: find the best {s_1, ..., s_n} ∈ S^n_m that minimizes Σ_{t=1}^n ℓ(z_t, s_t),
where s_i ∈ {all-0, all-1, swys, fwys}

To solve:
1. allocate M_t ∈ R^{m×4} for each 1 ≤ t ≤ n
2. first pass: scan (z_1, ..., z_n) and update {M_t}_{t=1}^n by dynamic programming
3. second pass: from M_n, extract the best {ŝ_1, ..., ŝ_n} by a backward recursion
M_t stores the minimum sum of estimated losses up to t

Each element of M_t (rows: number of shifts i ≤ m; columns: all-0, all-1, swys, fwys) is
defined as the minimum sum of estimated losses up to time t, e.g.,

   M_t(i, swys) = min_{{s_1,...,s_t} ∈ S^t_i} { ℓ(z_t, s_t = swys) + Σ_{r=1}^{t−1} ℓ(z_r, s_r) }
First pass uses dynamic programming

Only two possible cases can attain M_t(i, swys):
1. the i-th shift occurred at t :      min_{1≤j≤|S|} M_{t−1}(i−1, j) + ℓ(z_t, swys)
2. the i-th shift occurred before t :  M_{t−1}(i, swys) + ℓ(z_t, swys)
Combining the two cases,

   M_t(i, swys) = ℓ(z_t, swys) + min{ M_{t−1}(i, swys), min_{1≤j≤|S|} M_{t−1}(i−1, j) },

and the same recursion holds for all the other elements
Second pass extracts Ŝ and denoises

When t = n:

   ŝ_n = arg min_{j ∈ {all-0, all-1, swys, fwys}} M_n(m, j),   x̂_n = ŝ_n(z_n)

(the row M_n(m, ·) holds min_{S ∈ S^n_m} Σ_{t=1}^n ℓ(z_t, s_t))

For t = n−1, ..., 1 : follow the optimal path backward and denoise!
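The two passes can be sketched end-to-end as follows. This is a minimal illustrative implementation, with ℓ supplied as a precomputed table ell[t][j]; the array layout and function name are my own choices, not the paper's:

```python
import numpy as np

INF = float("inf")

def s_dude_path(ell, m):
    """First pass: M[t, i, j] = min total estimated loss over z_1..z_t using
    exactly i shifts and mapping j at time t.  Second pass: backtrack the
    optimal mapping sequence with at most m shifts."""
    ell = np.asarray(ell, dtype=float)
    n, S = ell.shape
    M = np.full((n, m + 1, S), INF)
    M[0, 0, :] = ell[0]                          # the initial choice is not a shift
    for t in range(1, n):
        for i in range(m + 1):
            for j in range(S):
                stay = M[t - 1, i, j]            # i-th shift happened before t
                jump = M[t - 1, i - 1].min() if i > 0 else INF   # i-th shift at t
                M[t, i, j] = ell[t, j] + min(stay, jump)
    # second pass: follow the optimal path backward
    i, j = np.unravel_index(np.argmin(M[n - 1]), (m + 1, S))
    path = [int(j)]
    for t in range(n - 1, 0, -1):
        stay = M[t - 1, i, j]
        jump = M[t - 1, i - 1].min() if i > 0 else INF
        if jump < stay:                          # the shift to mapping j happened at t
            i -= 1
            j = int(np.argmin(M[t - 1, i]))
        path.append(int(j))
    path.reverse()
    return path, float(M[n - 1].min())

# mapping 0 is cheap on the first half, mapping 1 on the second: one shift suffices
path, cost = s_dude_path([[0, 1], [0, 1], [1, 0], [1, 0]], m=1)
print(path, cost)   # [0, 0, 1, 1] 0.0
```

Each time step touches O(m|S|) table entries, matching the linear-in-n and linear-in-m complexity claimed on the next slide.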
The complexity of S-DUDE is linear in n and m

Complexity
- space : O(mn|Z|^{2k})
- time : O(mn|Z|^{2k})
- practical
Discrete Denoising with Shifts New algorithm: S-DUDE
The complexity of S-DUDE is linear in nand m
Complexityspace : O(mn|Z|2k)time : O(mn|Z|2k)
practical
Taesup Moon (Yahoo! Labs) EE477 Guest Lecture Nov 10, 2011 19 / 24
Discrete Denoising with Shifts New algorithm: S-DUDE
The complexity of S-DUDE is linear in nand m
Complexityspace : O(mn|Z|2k)time : O(mn|Z|2k)practical
Taesup Moon (Yahoo! Labs) EE477 Guest Lecture Nov 10, 2011 19 / 24
Discrete Denoising with Shifts New algorithm: S-DUDE
Summary of S-DUDE
S-DUDE (Shifting DUDE)
For location t to be denoised, do :
1 fix the window size k, set the number of shifts m
2 find the left k-context (ℓ1, . . . , ℓk) and right k-context (r1, . . . , rk) of zt
ℓ1 ℓ2 · · · ℓk zt r1 r2 · · · rk
3 over all positions that share the same context c with zt,
find S = arg min_{S ∈ S_m^{n_c}} Σ_{t ∈ context c} ℓ(z_t, s_t)
4 decide on xt according to
xt = st(zt), where st(·) comes from S
We can also show that if we set m = 0, S-DUDE coincides with DUDE
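As in DUDE, step 3 above operates separately on each group of positions that share a two-sided k-context. A small sketch of that grouping (the function name and the tuple representation of a context are my own, not from the lecture):

```python
from collections import defaultdict

def context_groups(z, k):
    """Group positions t by their two-sided k-context (l_1..l_k, r_1..r_k).

    The per-context optimization of step 3 runs independently on each
    such group of positions.
    """
    groups = defaultdict(list)
    for t in range(k, len(z) - k):  # positions with a full two-sided context
        c = (tuple(z[t - k:t]), tuple(z[t + 1:t + 1 + k]))
        groups[c].append(t)
    return groups
```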
Discrete Denoising with Shifts Results
S-DUDE achieves the optimum loss for time- (space-) varying sources
When k = k_n < (1/2) log_{|Z|} n :

Theorem 1 (stochastic setting)
For all piecewise stationary processes X,
lim_{n→∞} [ E( L_{X̂^n_{S-DUDE}}(X^n, Z^n) ) − min_{X̂^n ∈ D_n} E( L_{X̂^n}(X^n, Z^n) ) ] = 0,
provided that the number of stationary segments is m = o(n) w.p. 1

Theorem 2 (individual sequence setting)
When m = o(n), for all x ∈ X^∞,
lim_{n→∞} [ L_{X̂^n_{S-DUDE}}(x^n, Z^n) − D_{k,m}(x^n, Z^n) ] = 0 w.p. 1,
where D_{k,m}(x^n, z^n) is the best performance attained by k-th order sliding-window denoisers that can shift at most m times
Discrete Denoising with Shifts Results
No denoiser is better than S-DUDE
Strong converse
If m = Θ(n), no denoiser can achieve the guarantees of the previous theorems.
⇒ m = o(n) is a necessary and sufficient condition for the previous theorems!
Discrete Denoising with Shifts Results
Ex 2 : piecewise stationary bit stream (revisited)
Xn : 00000011111110000000000111111111100000001111111110000
Zn : 00100011101110010001000111110111100000011110111110001
source : binary Markov chain whose transition probability switches from p1 = 0.01 to p2 = 0.2 at t* = n/2
[Figure: two-state Markov chain with transition probability p and self-loop probability 1 − p]
noise : BSC that flips bits with probability δ = 0.1
[Figure: binary symmetric channel with crossover probability δ]
⇒ optimal BER attained by the Forward-Backward Recursion
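A quick way to reproduce the setup of Ex 2 is to simulate the switching source and the channel. This sketch assumes a symmetric two-state chain (flip probability p, matching the figure) and uses a hypothetical function name of my own:

```python
import random

def piecewise_markov_bsc(n, p1, p2, delta, seed=0):
    """Simulate Ex 2: a symmetric binary Markov chain whose transition
    probability switches from p1 to p2 at t* = n/2, observed through a
    BSC(delta) that flips each bit independently with probability delta."""
    rng = random.Random(seed)
    x, state = [], 0
    for t in range(n):
        p = p1 if t < n // 2 else p2
        if rng.random() < p:   # leave the current state w.p. p
            state = 1 - state
        x.append(state)
    z = [b ^ (rng.random() < delta) for b in x]  # BSC noise
    return x, z
```

The first half of the clean sequence has long runs (few transitions at p1 = 0.01) while the second half switches roughly twenty times as often, which is what makes a single sliding-window rule a poor fit for the whole sequence.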
Discrete Denoising with Shifts Results
Can S-DUDE achieve the Bayes optimal performance?
[Figure: bit error rate plotted against window size k — Bayes Optimum = 0.487, DUDE = 0.574, S-DUDE (m = 1) = 0.498, i.e. within 2.3% of the Bayes optimum]
⇒ m can be regarded as another design parameter in devising a discrete denoiser