In Defense of Probabilistic Static Analysis BEN LIVSHITS SHUVENDU LAHIRI MICROSOFT RESEARCH
Jan 06, 2016
FROM THE PEOPLE WHO BROUGHT YOU SOUNDINESS.ORG…
STATIC ANALYSIS: UNEASY TRADEOFFS
too imprecise
useless results
may not scale
does not scale
overkill for some things
possibly still too imprecise for others
WHAT IS MISSING IS
ANALYSIS ELASTICITY
OUR APPROACH IS PROBABILISTIC TREATMENT
Points-to(p, v, h)
• MANY INTERPRETATIONS ARE POSSIBLE
• OUR CERTAINTY IN THE FACT BASED ON STATIC EVIDENCE SUCH AS PROGRAM STRUCTURE
• OUR CERTAINTY BASED ON RUNTIME OBSERVATIONS
• OUR CERTAINTY BASED ON PRIORS OBTAINED FROM THIS OR OTHER PROGRAMS
Object x = new Object();
try {
} catch(...){
    x = null;
}

if(...){ // branch direction info
    x = new Object();
}else{
    x = null;
}
$('mydiv').css('color', 'red');
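One possible reading of "certainty based on runtime observations" combined with "priors" is a Beta-Bernoulli estimate of branch direction. The sketch below is an assumption, not the authors' method: the function name `branch_probability` and the pseudo-count parameters are hypothetical, and the observation counts are made up for illustration. Applied to the if/else snippet above, the chance that `x` is null after the branch is the chance that the else branch runs.

```python
def branch_probability(prior_taken, prior_not_taken, seen_taken, seen_not_taken):
    """Posterior mean of a Beta-Bernoulli model for 'the branch is taken'.

    prior_* are pseudo-counts (the prior, e.g. learned from other programs);
    seen_* are counts from runtime observations. All names are hypothetical.
    """
    taken = prior_taken + seen_taken
    total = taken + prior_not_taken + seen_not_taken
    return taken / total


# Uniform prior Beta(1, 1); the then-branch was observed in 9 of 10 runs.
p_then = branch_probability(1, 1, 9, 1)
# In the if/else snippet, x = null happens only on the else branch.
p_x_null = 1.0 - p_then
print(round(p_then, 4), round(p_x_null, 4))
```

With no observations at all, the estimate falls back to the prior alone, which is one way the same fact can carry different certainty depending on how much evidence is available.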
BENEFITS
RESULT PRIORITIZATION
• STATIC ANALYSIS RESULTS CAN BE NATURALLY RANKED OR PRIORITIZED IN TERMS OF CERTAINTY, NEARLY A REQUIREMENT IN A SITUATION WHERE ANALYSIS USERS ARE FREQUENTLY FLOODED WITH RESULTS
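Once each result carries a probability, prioritization reduces to filtering and sorting. A minimal sketch, assuming a hypothetical `Finding` record and made-up locations and probabilities:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    location: str       # hypothetical "file:line" label
    message: str
    probability: float  # certainty assigned by the analysis


def prioritize(findings, threshold=0.0):
    """Drop findings below the threshold and list the most certain first."""
    kept = [f for f in findings if f.probability >= threshold]
    return sorted(kept, key=lambda f: f.probability, reverse=True)


# Illustrative values only; they are not results from the talk.
findings = [
    Finding("Foo.java:12", "possible null dereference", 0.55),
    Finding("Bar.java:40", "possible null dereference", 0.95),
    Finding("Baz.java:7", "possible null dereference", 0.10),
]
ranked = prioritize(findings, threshold=0.5)
for f in ranked:
    print(f.location, f.probability)
```

The threshold acts as the knob a user flooded with results would turn: raising it trades recall for a shorter, more trustworthy list.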
ANALYSIS DEBUGGING
• PROGRAM POINTS OR EVEN STATIC ANALYSIS INFERENCE RULES AND FACTS LEADING TO IMPRECISION CAN BE IDENTIFIED WITH THE HELP OF BACKWARD PROPAGATION
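One simple reading of backward propagation is provenance tracking: record which rule and premises produced each derived fact, then walk back from a suspicious conclusion to the input facts behind it. The sketch below assumes this reading and covers only reflexive and transitive FLOW rules; `derive_with_provenance` and `explain` are hypothetical names, and `ASSIGN(dst, src)` is assumed to mean `dst = src`.

```python
def derive_with_provenance(assign):
    """Derive FLOW facts from ASSIGN(dst, src) pairs, remembering how.

    Returns a dict mapping each derived fact to (rule name, premises).
    """
    why = {}
    terms = {t for pair in assign for t in pair}
    for t in terms:
        why[("FLOW", t, t)] = ("reflexive", [])
    changed = True
    while changed:
        changed = False
        for (_, x, y) in list(why):
            for (dst, src) in assign:
                fact = ("FLOW", x, src)
                if dst == y and fact not in why:
                    why[fact] = ("transitive",
                                 [("FLOW", x, y), ("ASSIGN", dst, src)])
                    changed = True
    return why


def explain(fact, why):
    """Walk backward from a derived fact to the input facts supporting it."""
    if fact not in why:
        return [fact]  # not derived, so it is an input fact
    _, premises = why[fact]
    inputs = []
    for premise in premises:
        inputs.extend(explain(premise, why))
    return inputs


why = derive_with_provenance({("w", "z"), ("z", "x")})
support = explain(("FLOW", "w", "x"), why)
print(support)
```

In a probabilistic setting the same walk could be weighted, so that the premises contributing most to an uncertain conclusion are reported first; that extension is not shown here.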
MORE BENEFITS
HARD AND SOFT RULES
• IN AN EFFORT TO MAKE THEIR ANALYSIS FULLY SOUND, ANALYSIS DESIGNERS OFTEN COMBINE INFERENCE RULES THAT HOLD WITH CERTAINTY WITH RULES THAT COVER GENERALLY UNLIKELY CASES
• BLENDING SUCH INFERENCE RULES TOGETHER, BY GIVING HIGH PROBABILITIES TO THE FORMER AND LOW PROBABILITIES TO THE LATTER, ALLOWS US TO BALANCE SOUNDNESS AND UTILITY CONSIDERATIONS
INFUSING WITH PRIORS
• END-QUALITY OF ANALYSIS RESULTS CAN OFTEN BE IMPROVED BY DOMAIN KNOWLEDGE SUCH AS INFORMATION ABOUT VARIABLE NAMING, CHECK-IN INFORMATION FROM SOURCE CONTROL REPOSITORIES, BUG FIX DATA FROM BUG REPOSITORIES, ETC.
SIMPLE ANALYSIS IN DATALOG
1. x=3;
2. y=null;
3. z=null;
4. z=x;
5. if(...){
6. z=null;
7. y=5;
8. }
9. w=*z
// transitive flow propagation
1. FLOW(x,z) :- FLOW(x,y), ASSIGN(y,z).
2. FLOW(a,c) :- FLOW(a,b), ASSIGNCOND(b,c).
3. FLOW(x,x).
// nullable variables
4. NULLABLE(x) :- FLOW(x,y), ISNULL(y).
// error detection
5. ERROR(a) :- ISNULL(a), DEREF(a).
6. ERROR(a) :- !ISNULL(a), NULLABLE(a), DEREF(a).
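The six rules can be evaluated with a naive fixpoint loop. The sketch below is a plain Python stand-in for a Datalog engine, and the fact encoding is an assumption rather than something the slides specify: each variable is tagged with the line that defines it (`x1`, `z4`, `z6`), `z9` stands for the value of `z` read at line 9, and `ASSIGN(dst, src)` / `ASSIGNCOND(dst, src)` mean that `dst` receives `src` unconditionally or under a branch.

```python
def analyze(assign, assign_cond, is_null, deref):
    """Naive fixpoint evaluation of the six rules on the slide."""
    edges = assign | assign_cond
    terms = set(is_null) | set(deref) | {t for edge in edges for t in edge}

    flow = {(t, t) for t in terms}          # rule 3: FLOW(x,x)
    changed = True
    while changed:                          # rules 1 and 2, run to a fixpoint
        changed = False
        for x, y in list(flow):
            for dst, src in edges:
                if dst == y and (x, src) not in flow:
                    flow.add((x, src))
                    changed = True

    nullable = {x for x, y in flow if y in is_null}                     # rule 4
    error = {a for a in deref if a in is_null}                          # rule 5
    error |= {a for a in deref if a not in is_null and a in nullable}   # rule 6
    return flow, nullable, error


# Hypothetical encoding of the nine-line program (see the lead-in).
ASSIGN = {("z4", "x1"), ("z9", "z4")}   # line 4: z=x; z at line 9 may be z4
ASSIGNCOND = {("z9", "z6")}             # z at line 9 may be the z=null of line 6
ISNULL = {"y2", "z3", "z6"}             # lines 2, 3 and 6 assign null
DEREF = {"z9"}                          # line 9: w=*z

flow, nullable, error = analyze(ASSIGN, ASSIGNCOND, ISNULL, DEREF)
print(sorted(error))  # ['z9']
```

In plain Datalog, rules 1 and 2 behave identically, so the conditional write at line 6 counts exactly as much as an unconditional one; the dereference is flagged with no indication of how likely the null value is.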
RELAXING THE RULES
WEIGHTED RULES
// transitive flow propagation
FLOW(x,y) ^ ASSIGN(y,z) => FLOW(x,z).
1 FLOW(a,b) ^ ASSIGNCOND(b,c) => FLOW(a,c)
FLOW(x,x).
// nullable variables
FLOW(x,y) ^ ISNULL(y) => NULLABLE(x).
// error detection
ISNULL(a) ^ DEREF(a) => ERROR(a).
0.5 !ISNULL(a) ^ NULLABLE(a) ^ DEREF(a) => ERROR(a).
// priors and shaping distributions
3 !FLOW(x,y).

ORIGINAL DATALOG RULES
// transitive flow propagation
FLOW(x,z) :- FLOW(x,y), ASSIGN(y,z).
FLOW(a,c) :- FLOW(a,b), ASSIGNCOND(b,c).
FLOW(x,x).
// nullable variables
NULLABLE(x) :- FLOW(x,y), ISNULL(y).
// error detection
ERROR(a) :- ISNULL(a), DEREF(a).
ERROR(a) :- !ISNULL(a), NULLABLE(a), DEREF(a).
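In a Markov logic network, a world that violates a hard rule has probability zero, and every other world has probability proportional to the exponential of the summed weights of the soft ground formulas it satisfies. The sketch below computes marginals by brute-force enumeration for a single ground variable `a`, using only the two error-detection rules. It is an illustration of the semantics under these assumptions, not Alchemy's inference algorithm; `marginals` and `implies` are hypothetical helper names, and the evidence values are chosen for illustration.

```python
import itertools
import math


def marginals(query_atoms, evidence, formulas):
    """Brute-force marginals. formulas holds (weight, predicate) pairs;
    weight None marks a hard rule that every possible world must satisfy."""
    total = 0.0
    mass = dict.fromkeys(query_atoms, 0.0)
    for values in itertools.product([False, True], repeat=len(query_atoms)):
        world = {**evidence, **dict(zip(query_atoms, values))}
        if any(weight is None and not holds(world) for weight, holds in formulas):
            continue  # a hard rule is violated: this world has probability 0
        score = math.exp(sum(weight for weight, holds in formulas
                             if weight is not None and holds(world)))
        total += score
        for atom, value in zip(query_atoms, values):
            if value:
                mass[atom] += score
    if total == 0.0:
        raise ValueError("evidence contradicts the hard rules")
    return {atom: m / total for atom, m in mass.items()}


def implies(p, q):
    return (not p) or q


FORMULAS = [
    # hard: ISNULL(a) ^ DEREF(a) => ERROR(a).
    (None, lambda w: implies(w["ISNULL"] and w["DEREF"], w["ERROR"])),
    # soft, weight 0.5: !ISNULL(a) ^ NULLABLE(a) ^ DEREF(a) => ERROR(a)
    (0.5, lambda w: implies(not w["ISNULL"] and w["NULLABLE"] and w["DEREF"],
                            w["ERROR"])),
]

maybe_null = marginals(
    ["ERROR"], {"ISNULL": False, "NULLABLE": True, "DEREF": True}, FORMULAS)
surely_null = marginals(
    ["ERROR"], {"ISNULL": True, "NULLABLE": True, "DEREF": True}, FORMULAS)
print(maybe_null["ERROR"], surely_null["ERROR"])
```

A definitely-null dereference is an error with probability 1, while a merely nullable one gets e^0.5 / (1 + e^0.5), roughly 0.62. Raising or lowering the 0.5 weight moves that number, which is the sense in which a soft rule can be tuned.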
PROBABILISTIC INFERENCE WITH ALCHEMY
• TUNING THE RULES
• TUNING THE WEIGHTS
• SEMANTICS ARE NOT AS OBVIOUS
• SHAPING PRIORS IS NON-TRIVIAL, BUT FRUITFUL
[FIGURE: NODES X1, U1, Y1, Z1, Z2, Z3, W1, W3, W4, W5, W6, W7, W8, W9, W10, W11; VALUES SHOWN: 0.616988, 0.614989, 0.567993, 0.560994, 0.544996]
CHALLENGES
• LEARNING THE WEIGHTS
  • EXPERT USERS
  • LEARNING (NEED LABELED DATASET)
• WHAT CLASS OF STATIC ANALYSIS CAN BE MADE ELASTIC?
  • DATALOG
  • ABSTRACT INTERPRETATION
  • DECISION PROCEDURE (SMT)-BASED