Jan 23, 2016
8/29/2006 1
Lecture1 – Introduction and Organization
Rice ELEC 697
Farinaz Koushanfar
Fall 2006
8/29/2006 2
Summary
• Syllabus
• Course outline
• Motivation
• Class census
8/29/2006 3
Syllabus – ELEC 697
• Title: “Applications of Modern Statistical Learning Theory in Embedded Networked Systems”
• Instructor – Farinaz Koushanfar, Rice University
• Meeting time – 02:30 PM - 03:50 PM TR
• Meeting place – 2014, Duncan Hall
• Prerequisites – Self-contained, but assuming undergraduate-level knowledge of probability and math
8/29/2006 4
Syllabus - Overview and Goals
• Overview
– Practical statistical learning methods and tools
– Modeling and optimizing emerging embedded systems
– Research areas: embedded networked systems, sensor networks, your research area, assuming you will need the methods there
– Emphasizing the methods rather than the theoretical aspects
• Goals
– Solid understanding of the state-of-the-art learning methods
– Hands-on experience with statistical modeling SW
– Applications of statistical modeling in SN, Internet, Networks, Intrusion detection, CAD, VLSI
– A universal tool for your own research
8/29/2006 5
Syllabus – Book and More…
• Textbook
– The Elements of Statistical Learning: Data Mining, Inference, and Prediction, T. Hastie; R. Tibshirani; J. Friedman; New York: Springer, 2001.
• Recommended further reading
– Pattern Classification (2nd ed.), R. Duda; P. Hart; D. Stork; Wiley Interscience, 2001.
– Modern Applied Statistics with S-PLUS, Third Edition, W. Venables; B. Ripley; Springer, 1999.
– Papers from the literature
• Course webpage
– http://www.ece.rice.edu/~fk1/classes/ELEC697.htm
8/29/2006 6
Syllabus – Grading and Project
• Grading
– Weekly assignments (20%)
– Mid-semester oral presentation (15%)
– Paper presentation and discussion (15%)
– Class project report (30%)
– Class project presentation (20%)
• Project
– Groups of 1 or 2 (collaborations encouraged)
– Dataset to analyze and model, can be more theoretical
– Either propose or select from my projects/datasets
8/29/2006 7
Syllabus - Software
• Hands-on experience with data analysis and modeling tools
• S programming language (S-PLUS/R)
• You can download R from CRAN at: http://cran.us.r-project.org/
• Documentation is also available at CRAN
• Many more resources available on the web
8/29/2006 8
Course Outline
• Week 1: Orientation and overview of supervised learning and its applications in embedded networks
• Week 2: Intro to R, linear regression, model selection, validation
• Week 3: Applications of regression in embedded networks (HW 0)
• Week 4: Linear classification: LDA, logistic, separating hyperplanes
• Week 5: Applications of classification in embedded networks (HW 1)
• Week 6: Available datasets, possible project proposals, and project selection
• Week 7: Model assessment and selection
• Week 8: Applications of model selection and validation in embedded networked systems (HW 2)
8/29/2006 9
Course Outline (Cont’d)
• Week 9: Kernel methods
• Week 10: Applications of kernel methods in embedded networked systems (HW 3)
• Week 11: Mid-term project proposal and presentations
• Week 12: Model inference and averaging: boosting, ML, EM
• Week 13: Applications of model inference in embedded networked systems (HW 4)
• Week 14: Progress report: presenting the related work to your project and your goals
• Week 15: Summary
• Week 16: Final project presentation and reports (Report)
+ Paper presentations!
8/29/2006 10
Class Census
• Tell me about yourself!
• Your name
• Your year of study
• Your field – or your interest
• Your advisor
8/29/2006 11
Statistical Learning - General
Key role in science, finance, and industry. Examples:
• Predict the probability of a second heart attack (demographic, diet, clinical measures)
• Predict stock prices in 6 months (company performance and economic data)
• Identify the digits in a handwritten zip code
• Estimate the glucose level in a diabetic patient's blood (infrared absorption spectrum)
• Identify the risk factors for prostate cancer (clinical and demographic variables)
8/29/2006 12
Sensor Networks (SN)
Figure panels: Contaminant Transport, Environmental Sensing, Seismic Response; Xbow MICA2DOT motes
Courtesy of Prof. Deborah Estrin (UCLA-CENS)
8/29/2006 13
Statistical Learning - SN
• Classification/target detection
• Modeling the biological systems
• Inter-sensor modeling
– Sleeping coordination, compression, intrusion detection/security
• Characterization of sensors – a rapidly growing market, e.g.
– Pressure sensors – revenue: $4,018.8M in 2004, projected $5,545.1M in 2011
– Image sensors – more than $4B in 2005, led by the camera phone application
– Fiber-optic sensors – $288.1M now, will be $304.3M in 2006
– Bio-sensors – ??
– Proximity, Photoelectric, Linear Displacement Sensors – $1B in 2004, will be $1.05B in 2007
– Nano-sensors – will grow by more than 30% by 2009
Sensors & Transducers Magazine (S&T e-Digest), Vol.62, Issue 12, December 2005, pp.456-461
8/29/2006 14
Statistical Learning – VLSI/CAD
• Nanometer-scale devices: increased process variation and decreased predictability of circuit performance
• Traditionally, corner-case models were used – pessimistic
• The magnitude of variations in the gate length is predicted to increase from 35% in a 130nm technology to ~60% in a 70nm technology
• The variations are specified as the fraction 3σ/μ
• The major trade-off is the computational efficiency
King, Wada, Woo, IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, VOL. 17, NO. 2, MAY 2004
Figure labels: Photoresist line pattern; PDF
8/29/2006 15
Sources of Variations
• Process variations
– The value of process parameters observed after fabrication
– Parametric yield: the fraction of manufactured samples that meet the performance constraints
• Environmental variations
• Modeling variations
– Power and delay models used to perform design, analysis and optimization are inaccurate
• Other sources
– Change in process parameters with time
– Hot electrons
– Process instability
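The definition of parametric yield can be made concrete with a small Monte Carlo sketch. Everything numeric here is an assumption for illustration only: the Gaussian delay model, the nominal delay, the spread, and the 105 ps constraint are hypothetical and do not come from any real process.

```python
import random

def parametric_yield(samples, max_delay):
    """Fraction of manufactured samples whose delay meets the constraint."""
    passing = sum(1 for delay in samples if delay <= max_delay)
    return passing / len(samples)

# Hypothetical process model: gate delay is Gaussian around a nominal value.
rng = random.Random(0)
nominal_delay_ps = 100.0   # assumed mean delay
sigma_ps = 5.0             # assumed standard deviation
delays = [rng.gauss(nominal_delay_ps, sigma_ps) for _ in range(10_000)]

# Yield under an assumed performance constraint of 105 ps.
print(f"yield at 105 ps: {parametric_yield(delays, 105.0):.3f}")
```

Tightening the constraint lowers the yield, which is why a pessimistic corner-case model can discard designs that would mostly have passed.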
8/29/2006 16
The Theme of the Course
• About practical learning methods – something you can learn and use in your research
• This is not an embedded system design course nor a sensor network design course!
• The research topics are meant to motivate real applications of statistical learning in other fields
• You do not need any prior knowledge of these subjects to learn in this course
• Dynamic reading list
8/29/2006 17
Learning from Data
• Supervised learning
– Outcome measurement: either categorical or quantitative
– Predict outcome from a set of features
– Training set of data
– A good learner can predict a testing set well
• Unsupervised learning
– Only features, no outcome
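The supervised setting can be sketched in a few lines: fit a learner on a training set, then check how well it predicts a separate testing set. The data below is synthetic, and the nearest-centroid learner is only one simple choice picked for illustration; the function names are hypothetical.

```python
import random

def make_data(n, rng):
    """Synthetic data: two features per sample, class 1 is shifted by +2."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        features = (rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0))
        data.append((features, label))
    return data

def fit_centroids(train):
    """Learn the mean feature vector of each class from the training set."""
    centroids = {}
    for label in (0, 1):
        rows = [f for f, y in train if y == label]
        centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return centroids

def predict(centroids, features):
    """Predict the class whose centroid is closest in squared distance."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
    return min(centroids, key=dist)

def error_rate(centroids, data):
    """Fraction of samples whose predicted class differs from the outcome."""
    wrong = sum(1 for f, y in data if predict(centroids, f) != y)
    return wrong / len(data)

rng = random.Random(1)
train, test = make_data(200, rng), make_data(200, rng)
model = fit_centroids(train)
print("training error:", error_rate(model, train))
print("test error:", error_rate(model, test))
```

The test error is the number that matters: it is measured on samples the learner never saw while fitting.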
8/29/2006 18
Example 1: Email Spam
• Categorical outcome: spam or email
• 4601 email messages
• Rule-based learning, e.g.
– if (%george < 0.6) & (%you > 1.5) then spam else email
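The rule on this slide translates directly into code. In this sketch, `%george` and `%you` are assumed to mean the percentage of words in a message equal to "george" and "you"; the helper `word_percent` and the sample message are hypothetical.

```python
def classify(pct_george, pct_you):
    """Rule from the slide: low 'george' share and high 'you' share is spam."""
    if pct_george < 0.6 and pct_you > 1.5:
        return "spam"
    return "email"

def word_percent(message, word):
    """Percentage of whitespace-separated tokens equal to `word`."""
    tokens = message.lower().split()
    if not tokens:
        return 0.0
    return 100.0 * tokens.count(word) / len(tokens)

msg = "you won a prize and you can claim it now"   # hypothetical message
print(classify(word_percent(msg, "george"), word_percent(msg, "you")))
```

A learning method would pick the words and thresholds from the 4601 labeled messages instead of having them written by hand.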
8/29/2006 19
Example 2: Prostate Cancer
• Correlation between the level of prostate specific antigen (PSA) and clinical predictors
• Regression problem!
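As a minimal sketch of what a regression problem looks like, the code below fits a straight line to one predictor and reports the correlation. The numbers are synthetic and are not the prostate data; the predictor, the true coefficients, and the noise level are all assumptions made for illustration.

```python
import math
import random

def least_squares(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Synthetic stand-in for one clinical predictor and a quantitative outcome.
rng = random.Random(2)
xs = [rng.uniform(0.0, 4.0) for _ in range(50)]
ys = [1.5 + 0.7 * x + rng.gauss(0.0, 0.5) for x in xs]

intercept, slope = least_squares(xs, ys)
print(f"intercept={intercept:.2f} slope={slope:.2f} r={correlation(xs, ys):.2f}")
```

With several clinical predictors the same idea extends to multiple linear regression, where each predictor gets its own coefficient.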
8/29/2006 20
Example 3: Handwritten Digit Recognition
• Automatic envelope sorting procedure
• 16x16 8-bit grayscale, intensity from 0-255
• Classification problem!
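Before a classifier can use a digit image, the 16x16 grid is usually turned into a feature vector. The sketch below flattens the grid into 256 values; scaling the intensities to [0, 1] is an assumption made here for convenience, and the stroke image is hypothetical.

```python
def image_to_features(image):
    """Flatten a 16x16 grid of 0-255 intensities into 256 values in [0, 1]."""
    if len(image) != 16 or any(len(row) != 16 for row in image):
        raise ValueError("expected a 16x16 image")
    features = []
    for row in image:
        for pixel in row:
            if not 0 <= pixel <= 255:
                raise ValueError("pixel intensity must be in 0-255")
            features.append(pixel / 255.0)
    return features

# Hypothetical image: a two-pixel-wide vertical stroke, roughly a "1".
image = [[255 if col in (7, 8) else 0 for col in range(16)] for _ in range(16)]
features = image_to_features(image)
print(len(features), sum(features))
```

Each image then becomes one row of features, with the digit 0-9 as the categorical outcome to predict.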