Unified Maximum Likelihood Estimation of Symmetric Distribution Properties
Jayadev Acharya, Hirakendu Das, Alon Orlitsky, Ananda Suresh
Cornell, Yahoo, UCSD, Google
Frontiers in Distribution Testing Workshop, FOCS 2017
Special thanks to…
I don't smile. I only smile.
Symmetric properties
$\mathcal{P} = \{(p_1, \ldots, p_k)\}$: collection of distributions over $\{1, \ldots, k\}$
Distribution property: $f: \mathcal{P} \to \mathbb{R}$
$f$ is symmetric if unchanged under input permutations
Entropy: $H(p) \triangleq \sum_i p_i \log \frac{1}{p_i}$
Support size: $S(p) \triangleq \sum_i \mathbb{1}_{p_i > 0}$
Rényi entropy, support coverage, distance to uniformity, …
Determined by the probability multiset $\{p_1, p_2, \ldots, p_k\}$
$p = (p_1, \ldots, p_k)$ ($k$ finite or infinite): discrete distribution
$f(p)$: a property of $p$
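A minimal Python sketch of the two example properties, assuming a distribution is given as a plain list of probabilities and using natural logarithms (the slides do not fix a base); `entropy` and `support_size` are illustrative names. Permuting the list leaves the probability multiset, and hence both values, unchanged.

```python
import math
import random

def entropy(p):
    """H(p) = sum_i p_i log(1/p_i) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0)

def support_size(p):
    """S(p) = number of symbols i with p_i > 0."""
    return sum(1 for pi in p if pi > 0)

p = [0.5, 0.25, 0.25, 0.0]
q = random.sample(p, len(p))  # a random permutation of the same multiset

print(entropy(p), entropy(q))            # equal up to floating-point rounding
print(support_size(p), support_size(q))  # 3 3
```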
Property estimation
$p$: unknown distribution in $\mathcal{P}$
Given independent samples $X_1, X_2, \ldots, X_n \sim p$
Estimate $f(p)$
Sample complexity $S(f, k, \varepsilon, \delta)$:
Minimum $n$ necessary to
estimate $f(p) \pm \varepsilon$
with error probability $< \delta$
Plug-in estimation
Use $X_1, X_2, \ldots, X_n \sim p$ to find an estimate $\hat{p}$ of $p$
Estimate $f(p)$ by $f(\hat{p})$
How to estimate $p$?
Sequence Maximum Likelihood (SML)
$p^{\text{sml}}_{x^n} = \arg\max_p p(x^n) = \arg\max_p \prod_{i=1}^{n} p(x_i)$
$x^3 = h, h, t$: $p^{\text{sml}}_{h,h,t} = \arg\max_p p(h)^2 \cdot p(t)$
$p^{\text{sml}}_{h,h,t}(h) = 2/3$, $p^{\text{sml}}_{h,h,t}(t) = 1/3$
Same as empirical-frequency distribution
Multiplicity $N_x$: number of times $x$ appears in $x^n$
$p^{\text{sml}}_{x^n}(x) = N_x / n$
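The SML estimate can be sketched in Python as follows, assuming sequences are given as iterables of hashable symbols; the helper names are hypothetical. The grid search is only a numerical sanity check that $p(h)^2 p(t)$ peaks at $p(h) = 2/3$, matching the empirical frequencies.

```python
import math
from collections import Counter

def sml_distribution(xs):
    """SML estimate p(x) = N_x / n, i.e. the empirical frequencies of the sequence."""
    n = len(xs)
    return {x: count / n for x, count in Counter(xs).items()}

def plugin_entropy(xs):
    """Plug-in estimate H(p_sml) = sum_x (N_x / n) log(n / N_x), in nats."""
    n = len(xs)
    return sum((c / n) * math.log(n / c) for c in Counter(xs).values())

# Grid search over p(h) = a, p(t) = 1 - a: the likelihood a^2 (1 - a) of h, h, t.
grid = [i / 300 for i in range(301)]
best_a = max(grid, key=lambda a: a * a * (1 - a))

print(sml_distribution("hht"))
print(best_a, plugin_entropy("hht"))
```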
Prior Work
Different estimator for each property
Use sophisticated approximation-theory results
A Unified Maximum Likelihood Approach for Estimating Symmetric Properties of Discrete Distributions
Property | Notation | SML | Optimal | References
Entropy | $H(p)$ | $\frac{k}{\varepsilon}$ | $\frac{k}{\log k} \cdot \frac{1}{\varepsilon}$ | (Valiant & Valiant, 2011a; Wu & Yang, 2016; Jiao et al., 2015)
Support size | $\frac{S(p)}{k}$ | $k \log \frac{1}{\varepsilon}$ | $\frac{k}{\log k} \log^2 \frac{1}{\varepsilon}$ | (Wu & Yang, 2015)
Support coverage | $\frac{S_m(p)}{m}$ | $m$ | $\frac{m}{\log m} \log \frac{1}{\varepsilon}$ | (Orlitsky et al., 2016)
Distance to uniformity | $\|p - u\|_1$ | $\frac{k}{\varepsilon^2}$ | $\frac{k}{\log k} \cdot \frac{1}{\varepsilon^2}$ | (Valiant & Valiant, 2011b; Jiao et al., 2016)
Table 3. Estimation complexity for various properties, up to a constant factor. For all properties shown, PML achieves the best known results. Citations are for specialized techniques; PML results are shown in this paper. Support and support coverage results have been normalized for consistency with existing literature.
For several important properties:
Empirical-frequency plug-in requires $\Theta(k)$ samples
New complex (non-plug-in) estimators need $\Theta(k / \log k)$ samples
Entropy estimation
SML estimate of entropy $= \sum_x \frac{N_x}{n} \log \frac{n}{N_x}$
Sample complexity: $\Theta(k / \varepsilon)$
Various corrections proposed: Miller-Madow, jackknifed estimator, coverage-adjusted, …
Sample complexity: $\Omega(k)$ for all the above estimators
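To see the linear-in-$k$ behaviour concretely, here is a small Python simulation, assuming a uniform distribution over $k = 1000$ symbols and $n = 1000$ samples (parameters chosen only for illustration). The Miller-Madow correction is shown in its textbook form, adding $(d - 1)/(2n)$ nats, where $d$ is the number of distinct observed symbols; compare both estimates with the true value $\log k$.

```python
import math
import random
from collections import Counter

def plugin_entropy(xs):
    """SML (empirical-frequency) plug-in entropy, in nats."""
    n = len(xs)
    return sum((c / n) * math.log(n / c) for c in Counter(xs).values())

def miller_madow(xs):
    """Plug-in entropy plus the Miller-Madow bias correction (d - 1) / (2n)."""
    d = len(set(xs))
    return plugin_entropy(xs) + (d - 1) / (2 * len(xs))

rng = random.Random(0)
k, n = 1000, 1000
xs = [rng.randrange(k) for _ in range(n)]

true_h = math.log(k)  # entropy of the uniform distribution over k symbols
print(f"true {true_h:.3f}  plug-in {plugin_entropy(xs):.3f}  Miller-Madow {miller_madow(xs):.3f}")
```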
Entropy estimation
[Paninski '03]: $o(k)$ sample complexity (existential)
[Valiant Valiant '11a]: constructive LP-based methods: $\Theta\left(\frac{k}{\log k}\right)$
[Valiant Valiant '11b, Wu Yang '14, Han Jiao Venkat Weissman '14]: simplified algorithms, and growth rate: $\Theta\left(\frac{k}{\varepsilon \log k}\right)$
New (as of August) Results
Unified, simple, sample-optimal approach for all above problems
Plug-in estimator: replace sequence maximum likelihood
with profile maximum likelihood
Profiles
$h, h, t$ or $h, t, h$ or $t, t, h$ → same estimate
One element appeared once, one appeared twice
Profile: multiset of multiplicities: $\Phi(x^n) = \{N_x : x \in x^n\}$
$\Phi(h, h, t) = \Phi(t, h, t) = \{1, 2\}$
$\Phi(\alpha, \gamma, \beta, \gamma) = \{1, 1, 2\}$
Sufficient statistic for symmetric properties
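A short Python sketch of the profile, assuming it is stored as a sorted tuple of multiplicities (one possible encoding of the multiset); `prevalences` gives the equivalent count-of-counts form. Both helper names are illustrative.

```python
from collections import Counter

def profile(xs):
    """Profile Phi(x^n): the multiset of multiplicities {N_x}, as a sorted tuple."""
    return tuple(sorted(Counter(xs).values()))

def prevalences(xs):
    """Equivalent encoding: maps each multiplicity i to the number of symbols seen i times."""
    return dict(Counter(Counter(xs).values()))

print(profile("hht"), profile("tht"))                # (1, 2) (1, 2)
print(profile(["alpha", "gamma", "beta", "gamma"]))  # (1, 1, 2)
print(prevalences(["alpha", "gamma", "beta", "gamma"]))  # {1: 2, 2: 1}
```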
Profile maximum likelihood (PML) [+SVZ'04]
Profile probability: $p(\Phi) = \sum_{x^n : \Phi(x^n) = \Phi} p(x^n)$
Distribution maximizing the profile probability: $p^{\text{PML}}_{\Phi} = \arg\max_p p(\Phi)$
See "On estimating the probability multiset", Orlitsky, Santhanam, Viswanathan, Zhang, for a detailed treatment and an argument for competitive distribution estimation.
PML competitive for distribution estimation
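PML example: $x^3 = h, h, t$
$p^{\text{sml}}_{h,h,t}(h) = 2/3$, $p^{\text{sml}}_{h,h,t}(t) = 1/3$
$\Phi(h, h, t) = \{1, 2\}$
$p(\Phi = \{1, 2\}) = p(s,s,d) + p(s,d,s) + p(d,s,s)$ (e.g. $p(s,s,d)$: the first two symbols are the same and the third is different)
$= 3p(s,s,d) = 3\sum_{x \ne y} p(x)^2 p(y)$
$p^{\text{sml}}(\{1, 2\}) = 3\left[\left(\tfrac{2}{3}\right)^2 \cdot \tfrac{1}{3} + \left(\tfrac{1}{3}\right)^2 \cdot \tfrac{2}{3}\right] = \tfrac{18}{27} = \tfrac{2}{3}$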
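PML of $\{1, 2\}$
$P(\{1, 2\}) = p(s,s,d) + p(s,d,s) + p(d,s,s) = 3p(s,s,d)$
$p(s,s,d) = \sum_{x \ne y} p(x)^2 p(y) = \sum_x p(x)^2 (1 - p(x)) \le \tfrac{1}{4}\sum_x p(x) = \tfrac{1}{4}$, since $p(x)(1 - p(x)) \le \tfrac{1}{4}$
$(1/2, 1/2)$: $p(s,s,d) = 1/8 + 1/8 = 1/4$, so $\max_p p(s,s,d) = 1/4$
$p^{\text{PML}}(\{1, 2\}) = 3/4$. Recall: $p^{\text{sml}}(\{1, 2\}) = 2/3$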
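The profile probabilities above can be checked by brute force: enumerate every length-$n$ sequence, keep those with the target profile, and add up their probabilities. This Python sketch (hypothetical helper names, exact `Fraction` arithmetic, feasible only for tiny $n$) reproduces $2/3$ and $3/4$; the grid scan covers two-symbol distributions only, whereas the bound above holds for all $p$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def profile(seq):
    """Sorted tuple of symbol multiplicities."""
    return tuple(sorted(Counter(seq).values()))

def profile_probability(p, phi):
    """p(Phi): total probability of all length-n sequences whose profile is Phi."""
    n = sum(phi)
    target = tuple(sorted(phi))
    total = Fraction(0)
    for seq in product(range(len(p)), repeat=n):
        if profile(seq) == target:
            prob = Fraction(1)
            for x in seq:
                prob *= p[x]
            total += prob
    return total

sml = [Fraction(2, 3), Fraction(1, 3)]
half = [Fraction(1, 2), Fraction(1, 2)]
print(profile_probability(sml, (1, 2)))   # 2/3
print(profile_probability(half, (1, 2)))  # 3/4

# Scan two-symbol distributions (a, 1 - a) on a grid: none beats 3/4.
grid_max = max(profile_probability([Fraction(i, 100), 1 - Fraction(i, 100)], (1, 2))
               for i in range(101))
```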
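PML of $\{1, 1, 2\}$
$\Phi(\alpha, \gamma, \beta, \gamma) = \{1, 1, 2\}$
$p^{\text{PML}}_{\{1,1,2\}} = U[5]$ (uniform over 5 symbols, although only 3 were observed)
PML can predict existence of new symbols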
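A brute-force Python check of the $\{1, 1, 2\}$ example, restricted (as a simplifying assumption) to uniform distributions $U[m]$: among these, $m = 5$ gives the largest profile probability, $6 \cdot m(m-1)(m-2)/m^4 = 72/125$. That $U[5]$ is the PML over all distributions is the slide's claim and is not verified here; the helper names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def profile(seq):
    """Sorted tuple of symbol multiplicities."""
    return tuple(sorted(Counter(seq).values()))

def uniform_profile_probability(m, phi):
    """Probability that sum(phi) i.i.d. draws from U[m] have profile phi (brute force)."""
    n = sum(phi)
    target = tuple(sorted(phi))
    hits = sum(1 for seq in product(range(m), repeat=n) if profile(seq) == target)
    return Fraction(hits, m ** n)

probs = {m: uniform_profile_probability(m, (1, 1, 2)) for m in range(3, 9)}
for m, pr in probs.items():
    print(m, pr, float(pr))
best_m = max(probs, key=probs.get)
print("best uniform support size:", best_m)  # 5
```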
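Profile maximum likelihood: PML of $\{1, 2\}$ is $\{1/2, 1/2\}$
$p^{\text{PML}}(\{1, 2\}) = 3\left[\left(\tfrac{1}{2}\right)^2 \cdot \tfrac{1}{2} + \left(\tfrac{1}{2}\right)^2 \cdot \tfrac{1}{2}\right] = \tfrac{3}{4} > \tfrac{18}{27}$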
PML plug-in
To estimate a symmetric property $f$:
Find $p^{\text{PML}}_{\Phi(X^n)}$
Output $f(p^{\text{PML}}_{\Phi(X^n)})$
Simple
Unified
No tuning parameters
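As a toy illustration of the plug-in recipe for support size, this Python sketch maximizes the profile probability over uniform distributions $U[m]$ only. This restriction is a hypothetical simplification (PML maximizes over all distributions), as are the function names and the cap `m_max`. For a uniform distribution the profile probability is proportional to $m(m-1)\cdots(m-d+1)/m^n$, where $d$ is the number of distinct observed symbols, and the output $f(p)$ is just $m$.

```python
import math
from collections import Counter

def uniform_log_profile_likelihood(m, phi):
    """log P_{U[m]}(phi), up to an additive constant that does not depend on m.

    Each length-n sequence has probability m^-n under U[m]; the number of sequences
    with profile phi is (number of symbol patterns) * m(m-1)...(m-d+1), and the
    pattern count does not depend on m, so it is dropped."""
    n, d = sum(phi), len(phi)
    if m < d:
        return -math.inf
    return math.lgamma(m + 1) - math.lgamma(m - d + 1) - n * math.log(m)

def uniform_pml_support_size(xs, m_max=10_000):
    """Toy PML plug-in for support size: best U[m] for the profile of xs, m <= m_max.

    Hypothetical simplification: true PML searches all distributions, not just uniform ones."""
    phi = sorted(Counter(xs).values())
    d = len(phi)
    return max(range(d, m_max + 1), key=lambda m: uniform_log_profile_likelihood(m, phi))

print(uniform_pml_support_size(["alpha", "gamma", "beta", "gamma"]))  # 5
print(uniform_pml_support_size("hht"))                                # 2
```

On the two profiles from the examples above it returns 5 for $\{1, 1, 2\}$ and 2 for $\{1, 2\}$, matching $U[5]$ and $(1/2, 1/2)$. When every symbol appears once, the likelihood keeps increasing in $m$, so the sketch simply returns `m_max`.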
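Some experimental results (c. 2009)
Uniform, 500 symbols, 350 samples
2 symbols appeared 6 times, 3 appeared 4 times, 13 appeared 3 times, 63 appeared twice, 161 appeared once: 242 appeared, 258 did not
[Figure: U[500], 350 samples, 12 experiments]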
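A quick Python check of the counts on this slide, reading "2x6" as "2 symbols seen 6 times" (our interpretation of the notation):

```python
# Observed multiplicities in the U[500], 350-sample run: {times seen: number of symbols}.
prevalence = {6: 2, 4: 3, 3: 13, 2: 63, 1: 161}
k = 500  # support size of the uniform distribution

n_samples = sum(times * count for times, count in prevalence.items())
n_seen = sum(prevalence.values())
n_missing = k - n_seen
print(n_samples, n_seen, n_missing)  # 350 242 258
```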
700 samples
[Figure: U[500], 700 samples, 12 experiments]
Staircase
15K elements, 5 steps, ~3×; 30K samples
Observe 8,882 elements, 6,118 missing
Zipf
Underlies many natural phenomena
$p_i = C/i$, $i = 100, \ldots, 15{,}000$; 30,000 samples
Observe 9,047 elements, 5,953 missing
1990 Census: last names
Name | Freq. (%) | Cum. freq. (%) | Rank
SMITH | 1.006 | 1.006 | 1
JOHNSON | 0.810 | 1.816 | 2
WILLIAMS | 0.699 | 2.515 | 3
JONES | 0.621 | 3.136 | 4
BROWN | 0.621 | 3.757 | 5
DAVIS | 0.480 | 4.237 | 6
MILLER | 0.424 | 4.660 | 7
WILSON | 0.339 | 5.000 | 8
MOORE | 0.312 | 5.312 | 9
TAYLOR | 0.311 | 5.623 | 10
… | … | … | …
AMEND | 0.001 | 77.478 | 18835
ALPHIN | 0.001 | 77.478 | 18836
ALLBRIGHT | 0.001 | 77.479 | 18837
AIKIN | 0.001 | 77.479 | 18838
ACRES | 0.001 | 77.480 | 18839
ZUPAN | 0.000 | 77.480 | 18840
ZUCHOWSKI | 0.000 | 77.481 | 18841
ZEOLLA | 0.000 | 77.481 | 18842
18,839 names; 77.48% of the population; ~230 million people
1990 Census: last names. 18,839 last names based on ~230 million people; 35,000 samples, observed 9,813 names
Coverage (number of new symbols)
Zipf distribution over 15K elements
Sample 30K times
Estimate: number of new symbols in a sample of size λ·30K
Good-Toulmin: λ < 1
Estimate PML & predict: extends to λ > 1; applies to other properties
[Figure: panels for λ < 1 and λ > 1]
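The slides cite Good-Toulmin without defining it. In its standard form it estimates the number of new symbols in $t \cdot n$ further samples as $U = -\sum_{i \ge 1} (-t)^i \Phi_i$, with $\Phi_i$ the number of symbols seen exactly $i$ times and $t$ playing the role of λ. A minimal Python sketch (helper names illustrative) shows why it is limited to λ < 1: the alternating terms can blow up once $t > 1$.

```python
from collections import Counter

def prevalences(xs):
    """Phi_i: number of distinct symbols that appear exactly i times in xs."""
    return Counter(Counter(xs).values())

def good_toulmin(xs, t):
    """Good-Toulmin estimate of the number of new symbols in t * len(xs) further samples:
    U(t) = -sum_i (-t)^i * Phi_i."""
    return -sum((-t) ** i * phi_i for i, phi_i in prevalences(xs).items())

xs = ["a", "a", "b", "c"]     # Phi_1 = 2, Phi_2 = 1
print(good_toulmin(xs, 0.5))  # 0.75
print(good_toulmin(xs, 3))    # -3: a negative count, the estimate breaks down for t > 1
```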
Finding the PML distribution
EM algorithm [+ Pan, Sajama, Santhanam, Viswanathan, Zhang '05-'08]
Approximate PML via Bethe permanents [Vontobel]
Extensions of Markov chains [Vatedka, Vontobel]
No provable algorithms known
Motivated Valiant & Valiant
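Maximum Likelihood Estimation Plug-in
General property estimation technique
$\mathcal{P}$: collection of distributions over domain $\mathcal{Z}$
$f: \mathcal{P} \to \mathbb{R}$: any property (say, entropy)
MLE estimator
Given $z \in \mathcal{Z}$
Determine $p^{\text{MLE}}_z = \arg\max_{p \in \mathcal{P}} p(z)$
Output $f(p^{\text{MLE}}_z)$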
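The recipe is generic. A minimal Python sketch for a finite family, assuming each distribution is a dict from outcomes to probabilities (`mle_plugin` and the two-member family are hypothetical):

```python
import math

def entropy(p):
    """Entropy in nats of a distribution given as {outcome: probability}."""
    return sum(q * math.log(1.0 / q) for q in p.values() if q > 0)

def mle_plugin(family, f, z):
    """Return f(p_z^MLE), where p_z^MLE = argmax_{p in family} p(z) (finite family only)."""
    p_mle = max(family, key=lambda p: p.get(z, 0.0))
    return f(p_mle)

# Hypothetical finite family over Z = {"a", "b"}.
family = [{"a": 0.9, "b": 0.1}, {"a": 0.5, "b": 0.5}]
print(mle_plugin(family, entropy, "a"))  # entropy of the first distribution
print(mle_plugin(family, entropy, "b"))  # entropy of the uniform distribution, log 2
```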
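How good is MLE?
Competitiveness of MLE plug-in
$\mathcal{P}$: collection of distributions over domain $\mathcal{Z}$
$\hat{f}: \mathcal{Z} \to \mathbb{R}$: any estimator such that $\forall p \in \mathcal{P}$, $Z \sim p$:
$\Pr(|f(p) - \hat{f}(Z)| > \varepsilon) < \delta$
MLE plug-in error bounded by
$\Pr(|f(p) - f(p^{\text{MLE}}_Z)| > 2\varepsilon) < \delta \cdot |\mathcal{Z}|$
Simple, universal, competitive with any $\hat{f}$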
Quiz: probability of unlikely outcomes
6-sided die, $p = (p_1, p_2, \ldots, p_6)$
$p_i \ge 0$ and $\sum_i p_i = 1$, otherwise arbitrary
$Z \sim p$
$P(p_Z \le 1/6)$?
$(1, 0, \ldots, 0) \Rightarrow \Pr(p_Z \le 1/6) = 0$
$(1/6, \ldots, 1/6) \Rightarrow \Pr(p_Z \le 1/6) = 1$
Can be anything
$P(p_Z \le 0.01)$?
$P(p_Z \le 0.01) = \sum_{i : p_i \le 0.01} p_i \le 6 \cdot 0.01 = 0.06$
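A Python check of the quiz, assuming a 6-sided die: the two extreme cases are computed exactly, and random skewed dice (an arbitrary sampling scheme chosen for illustration) never put more than $6 \cdot 0.01$ mass on outcomes with $p_i \le 0.01$. The function name is illustrative.

```python
import random
from fractions import Fraction

def prob_own_prob_at_most(p, threshold):
    """Pr(p_Z <= threshold) for Z ~ p: total mass of outcomes whose probability is <= threshold."""
    return sum(pi for pi in p if pi <= threshold)

# The two extreme cases from the slide, in exact arithmetic.
point_mass = [Fraction(1)] + [Fraction(0)] * 5
uniform = [Fraction(1, 6)] * 6
print(prob_own_prob_at_most(point_mass, Fraction(1, 6)))  # 0
print(prob_own_prob_at_most(uniform, Fraction(1, 6)))     # 1

# Random skewed dice: mass on outcomes with p_i <= 0.01 stays below 6 * 0.01.
rng = random.Random(0)
worst = 0.0
for _ in range(10_000):
    w = [rng.random() ** 8 for _ in range(6)]  # raising to a power skews the weights
    total = sum(w)
    p = [wi / total for wi in w]
    worst = max(worst, prob_own_prob_at_most(p, 0.01))
print(worst)
```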
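Competitiveness of MLE plug-in: proof
If $\hat{f}: \mathcal{Z} \to \mathbb{R}$ satisfies $\forall p \in \mathcal{P}$, $Z \sim p$: $\Pr(|f(p) - \hat{f}(Z)| > \varepsilon) < \delta$,
then $\Pr(|f(p) - f(p^{\text{MLE}}_Z)| > 2\varepsilon) < \delta \cdot |\mathcal{Z}|$
For all $z$ such that $p(z) \ge \delta$:
1) $|f(p) - \hat{f}(z)| \le \varepsilon$ (otherwise $\Pr(|f(p) - \hat{f}(Z)| > \varepsilon) \ge p(z) \ge \delta$)
2) $p^{\text{MLE}}_z(z) \ge p(z) \ge \delta$; applying the same guarantee to $p^{\text{MLE}}_z \in \mathcal{P}$ gives $|f(p^{\text{MLE}}_z) - \hat{f}(z)| \le \varepsilon$
Triangle inequality: $|f(p^{\text{MLE}}_z) - f(p)| \le 2\varepsilon$
Hence, if $|f(p^{\text{MLE}}_z) - f(p)| > 2\varepsilon$, then $p(z) < \delta$, and
$\Pr(|f(p^{\text{MLE}}_Z) - f(p)| > 2\varepsilon) \le \Pr(p(Z) < \delta) = \sum_{z : p(z) < \delta} p(z) < \delta \cdot |\mathcal{Z}|$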
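PML performance bound
If $n = S(f, k, \varepsilon, \delta)$, then $S^{\text{PML}}(f, k, 2\varepsilon, |\Phi^n| \cdot \delta) \le n$
$|\Phi^n|$: number of profiles of length $n$
Profile of length $n$: partition of $n$
$\{3\}, \{1,2\}, \{1,1,1\} \leftrightarrow 3,\ 2+1,\ 1+1+1$, so $|\Phi^n|$ = number of partitions of $n$
Hardy-Ramanujan: $|\Phi^n| < e^{3\sqrt{n}}$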
Easy: $e^{\sqrt{n}\log n}$
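If $n = S(f, k, \varepsilon, e^{-4\sqrt{n}})$, then $S^{\text{PML}}(f, k, 2\varepsilon, e^{-\sqrt{n}}) \le n$ (since $e^{3\sqrt{n}} \cdot e^{-4\sqrt{n}} = e^{-\sqrt{n}}$)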
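Profiles of length $n$ correspond to partitions of $n$, so $|\Phi^n|$ can be computed with the standard partition-counting recurrence. This Python sketch (helper names illustrative) cross-checks the count against brute-force enumeration for small $n$ and checks the Hardy-Ramanujan bound $e^{3\sqrt{n}}$ numerically for $n \le 200$.

```python
import math
from collections import Counter
from itertools import product

def partition_counts(n_max):
    """counts[n] = number of integer partitions of n (= number of profiles of length n)."""
    counts = [1] + [0] * n_max
    for part in range(1, n_max + 1):  # allow parts of size `part`
        for total in range(part, n_max + 1):
            counts[total] += counts[total - part]
    return counts

def num_profiles_brute_force(n):
    """Count distinct profiles among all sequences of length n over n symbols."""
    return len({tuple(sorted(Counter(seq).values()))
                for seq in product(range(n), repeat=n)})

counts = partition_counts(200)
print(counts[3], num_profiles_brute_force(3))  # 3 3
print(all(counts[n] < math.exp(3 * math.sqrt(n)) for n in range(1, 201)))  # True
```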
Summary
Symmetric property estimation: PML plug-in approach
Simple
Universal
Sample-optimal for known sublinear properties
Future directions
Provably efficient algorithms
Independent proof technique
Thank you!
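PML for symmetric $f$
For any symmetric property $f$, if $n = S(f, k, \varepsilon, 0.1)$, then $S^{\text{PML}}(f, k, 2\varepsilon, 0.1) = O(n^2)$.
Proof. By the median trick, $S(f, k, \varepsilon, e^{-m}) = O(n \cdot m)$. Therefore,
$S^{\text{PML}}(f, k, 2\varepsilon, e^{3\sqrt{n \cdot m} - m}) = O(n \cdot m)$. Plugging in $m = C \cdot n$ for a large enough constant $C$ makes the error probability $e^{(3\sqrt{C} - C)n} \le 0.1$ and the sample size $O(n^2)$, which gives the desired result.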
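The median trick can be made concrete: run the $0.1$-error estimator $m$ times on fresh samples and output the median. The median is off by more than $\varepsilon$ only if at least half the runs are, so the failure probability is a binomial tail, which decays like $e^{-\Theta(m)}$ (by a Chernoff bound it is at most $0.6^m$ for $q = 0.1$), using $O(n \cdot m)$ samples in total. A Python sketch computing the tail exactly for odd $m$ (function name illustrative):

```python
from math import comb

def median_failure_bound(m, q=0.1):
    """Upper bound on Pr(median of m runs is off by more than eps), for odd m.

    Each independent run is off with probability at most q; the median can only be
    off if at least (m + 1) / 2 runs are, so we sum the binomial tail."""
    return sum(comb(m, j) * q**j * (1 - q) ** (m - j) for j in range((m + 1) // 2, m + 1))

for m in (1, 3, 11, 31, 61):
    print(m, median_failure_bound(m))
```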
Better error probabilities: warm-up
Estimating a distribution $p$ over $[k]$ to $L_1$ distance $\varepsilon$ w.p. $> 0.9$ requires $\Theta(k/\varepsilon^2)$ samples. Proof: exercise.
Estimating a distribution $p$ over $[k]$ to $L_1$ distance $\varepsilon$ w.p. $1 - e^{-k}$ requires $\Theta(k/\varepsilon^2)$ samples. Proof:
• Empirical estimator $\hat{p}$
• $\|\hat{p} - p\|_1$ has bounded difference constant (b.d.c.) $2/n$
• Apply McDiarmid's inequality
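A Python sketch of the bounded-difference step, assuming a uniform $p$ over $k = 10$ symbols and $n = 500$ samples (illustrative values): replacing any one sample changes $\|\hat{p} - p\|_1$ by at most $2/n$. McDiarmid's inequality then gives $\Pr(\|\hat{p} - p\|_1 \ge \mathbb{E}\|\hat{p} - p\|_1 + t) \le e^{-2t^2/(n (2/n)^2)} = e^{-nt^2/2}$.

```python
import random

def l1_error(samples, p):
    """||p_hat - p||_1 for the empirical estimator p_hat built from samples over range(len(p))."""
    n = len(samples)
    counts = [0] * len(p)
    for x in samples:
        counts[x] += 1
    return sum(abs(c / n - pi) for c, pi in zip(counts, p))

rng = random.Random(1)
k, n = 10, 500
p = [1 / k] * k
xs = [rng.randrange(k) for _ in range(n)]
base = l1_error(xs, p)

# Replace each sample in turn and record the largest change in the L1 error.
max_change = 0.0
for i in range(n):
    ys = list(xs)
    ys[i] = rng.randrange(k)
    max_change = max(max_change, abs(l1_error(ys, p) - base))
print(base, max_change, 2 / n)
```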
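Better error probabilities
Recall $S(H, k, \varepsilon, 2/3) = \Theta\left(\frac{k}{\varepsilon \log k}\right)$
• Existing optimal estimators: high b.d.c.
• Modify them to have small b.d.c. and still be optimal
• In particular, can get b.d.c. $= n^{-c}$ with exponent $c$ close to 1
• With twice the samples, the error drops super-fast (by McDiarmid's inequality):
$S(H, k, \varepsilon, e^{-n^{2c-1}}) = \Theta\left(\frac{k}{\varepsilon \log k}\right)$
Similar results for other properties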
Even approximate PML works!
• Perhaps finding exact PML is hard.
• Even approximate PML works.
Find a distribution $p$ such that
$p(\Phi(x^n)) \ge e^{-n^{c}} \cdot p^{\text{PML}}(\Phi(x^n))$ for a suitable constant $c$
Even this is optimal (for large $k$)
In Fisher's words…
"Of course nobody has been able to prove that MLE is best under all circumstances. MLE computed with all the information available may turn out to be inconsistent. Throwing away a substantial part of the information may render them consistent."
R. A. Fisher
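Proof of PML performance
If $n = S(f, k, \varepsilon, \delta)$, then $S^{\text{PML}}(f, k, 2\varepsilon, |\Phi^n| \cdot \delta) \le n$
$S(f, k, \varepsilon, \delta)$ is achieved by an estimator $\hat{f}(\Phi(x^n))$ (the profile is a sufficient statistic for symmetric properties)
• Profiles $\Phi(x^n)$ such that $p(\Phi(x^n)) > \delta$:
$p^{\text{PML}}_{\Phi}(\Phi) \ge p(\Phi) > \delta$, so $|f(p^{\text{PML}}_{\Phi}) - f(p)| \le |f(p^{\text{PML}}_{\Phi}) - \hat{f}(\Phi)| + |\hat{f}(\Phi) - f(p)| \le 2\varepsilon$
• Profiles with $p(\Phi(x^n)) \le \delta$:
$\Pr(p(\Phi(X^n)) \le \delta) \le \delta \cdot |\Phi^n|$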