Online Optimization Methods for the Quantification Problem

Purushottam Kar¹, Shuai Li², Harikrishna Narasimhan³, Sanjay Chawla⁴, Fabrizio Sebastiani⁴
¹IIT Kanpur, India, ²University of Insubria, Italy, ³Harvard University, USA, ⁴QCRI-HBKU, Qatar
Full Paper: http://tinyurl.com/quantonline
22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Goal: Estimate the relative prevalence of classes of interest in large unlabeled populations in online, streaming settings.

Applications of Quantification

Quantification is a.k.a. Counting, Class probability re-estimation, Class prior estimation.
Several applications directly require estimates of class ratios:
Sentiment Analysis
Epidemiology
Psephology
Cause-specific Mortality analysis
Transfer Learning

Sentiment Analysis example (tweets):
KatyCipriano (2 hours ago): The best part of the meal is the dessert which they dont make themselves – just sayin. @bouzagloabc
JuliaChild (1 hours ago): Loved the food – worth the 45 minute wait! Can’t wait for my Sunday brunch at ABC. @bouzagloabc
GordonRamsay (3 days ago): It was RAAAAW. @bouzagloabc
PaulaDeen (2 days ago): @GordonRamsay Samy the owner threw me out just for pointing that out! Disastrous service

Challenges

1. Solving a classification problem first may be wasteful
2. Need to address class distribution drift in test sets

Classification accuracy: 50%. But … #False pos. = #False neg. ⇒ Perfect quantification (Perfect classification impossible)
Balanced Accuracy (BA)

Quantification Performance Measures

1. Capture quantification goals directly, OR
2. Balance quantification and classification goals (hybrid)
3. Challenging to optimize on voluminous, streaming data

† Quantification Performance Measure
‡ Hybrid Performance Measure

Nested Concave Measures: NegKLD†, QMeasure‡, BAKLD‡
Pseudo-concave Measures: CQReward‡, BKReward‡
Also listed: Normalized Square Score†

Observation: All quantification measures are naturally nested concave or pseudo-concave. Exploit this to optimize scalably?

Nested Concave Measures

1. Dual computation of nested functions difficult, costly updates
2. Solution: apply duality to nested functions in nested manner!

Key Idea: Fenchel Duality
[Figure: any ccv function, its Fenchel “dual”, and the dual variables]
Linear in TPR and TNR for fixed values of dual variables!

NEMSIS (streaming)

1. Receive a data point
2. Fix dual variables, take SGD step to update model
3. Fix model, take SGD steps to update dual variables
4. Updates extremely cheap: closed form for dual variables

Pseudo-concave Measures

Key Idea: Level Set Structure
Level sets are convex.
1. Use the level set function as a proxy objective function
2. Exploit the fact that the level set functions are concave
[Figure: level function; regions labelled ccv, cvx, ccv]

CAN (non-streaming)

Alternate between two steps: find new level, optimize proxy.
Progress in proxy provably linked to progress in perf.

SCAN (streaming)

1. Execute E and M steps approximately in “streaming epochs”
2. E epochs use streaming data to estimate
3. M epochs execute NEMSIS on streaming data - optimize proxy
4. Epochs made progressively longer: more accurate E, M steps
[Figure: epoch schedule E M E M E M E M …]

Theoretical Guarantees

Guarantee for NEMSIS, SCAN
Guarantee for CAN
ccv: concave, cvx: convex

Experimental Results

Superior accuracies and training times across quant and hybrid measures as well as datasets.
NS: dual updates made using actual TPR/TNR values, not surrogates.
[Figure panels: KDD08, PPI, Covertype, KDD08, Adult, Cod-RNA, Covertype, Adult]
Attractive trade-off b/w quant/class performance using BAKLD perf.
Robustness to drift in class proportions (smaller is better in PosKLD).
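The remark that a classifier with 50% accuracy can still quantify perfectly when #False pos. = #False neg. can be checked on a toy example. The sketch below is only an illustration: the function names (`summarize`, `kld`) and the four-point label vectors are assumptions made here, and the prevalence estimate is plain classify-and-count, not NEMSIS, CAN, or SCAN. It contrasts a classifier whose errors cancel with a more accurate one whose errors do not.

```python
import math

def kld(p_true, p_pred, eps=1e-12):
    """KL divergence between true and predicted binary class distributions,
    each given by its positive-class prevalence."""
    total = 0.0
    for t, q in ((p_true, p_pred), (1.0 - p_true, 1.0 - p_pred)):
        if t > 0:
            total += t * math.log(t / max(q, eps))
    return total

def summarize(y_true, y_pred):
    """Return accuracy, balanced accuracy, predicted prevalence and KLD."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = len(y_true)
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    p_true = (tp + fn) / n          # true positive prevalence
    p_hat = (tp + fp) / n           # classify-and-count estimate
    return {
        "accuracy": (tp + tn) / n,
        "balanced_accuracy": (tpr + tnr) / 2,
        "prevalence_true": p_true,
        "prevalence_pred": p_hat,
        "kld": kld(p_true, p_hat),
    }

# Toy labels (hypothetical): two positives, two negatives.
y_true = [1, 1, 0, 0]

# One false positive and one false negative: the errors cancel out.
cancelling = summarize(y_true, [1, 0, 1, 0])

# Only one error, but it shifts the predicted prevalence.
skewed = summarize(y_true, [1, 1, 1, 0])

print(cancelling)
print(skewed)
```

On these labels the first classifier has lower accuracy but matches the true prevalence exactly, while the second is more accurate yet over-counts positives, which is why quantification measures such as NegKLD are optimized directly rather than through accuracy.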