Characterizing and Reasoning about Security Vulnerabilities
Shuo Chen
Center for Reliable and High-Performance Computing
Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
Preliminary Examination, May 4th, 2004
Committee Chair: Prof. Ravishankar K. Iyer
Committee: Prof. Vikram Adve, Prof. Jose Meseguer, Prof. David Nicol
Significance of Software Implementation Errors
• Bugtraq: 70% of security vulnerabilities due to implementation errors.
[Pie chart: Bugtraq vulnerability categories]
– Input Validation Error: 23%
– Boundary Condition Error: 21%
– Design Error: 18%
– Failure to Handle Exceptional Conditions: 11%
– Access Validation Error: 10%
– Unknown: 6%
– Configuration Error: 5%
– Origin Validation Error: 3%
– Race Condition Error: 2%
– Environment Error: 1%
What I Have Done
• Analyzed CERT and Bugtraq reports and the corresponding application source code.
• Developed a new FSM representation to decompose each security vulnerability to a series of elementary activities (primitive FSMs), each indicating a simple predicate.
• The FSM analysis showed:
– Many vulnerabilities (66%) due to pointer taintedness: user input value used as a pointer value (which should be transparent to users).
– A significant portion of vulnerabilities (33.6%) due to errors in library functions or incorrect invocations of library functions.
• The FSM modeling led to a formal reasoning approach to examine pointer taintedness in applications.
[Pie chart: Buffer Overflow 44%, Heap Corruption 8%, Format String 7%, Integer Overflow 6%, Globbing 2%, Other 33%]
Formal Analysis of Pointer Taintedness
• Pointer Taintedness: a pointer value, including a return address, is derived directly or indirectly from user input. (formally defined using equational logic)
• Provides a unifying perspective for reasoning about a significant number of security vulnerabilities.
• The notion of pointer taintedness enables:
– Static analysis: reasoning about the possibility of pointer taintedness by source code analysis;
– Runtime checking: inserting assertions in object code to check pointer taintedness at runtime;
– Hardware architecture-based support to detect pointer taintedness.
• Current focus: extraction of security specifications of library functions based on pointer taintedness semantics.
Publications of My Research
Papers:
– J. Xu, S. Chen, Z. Kalbarczyk, R. K. Iyer. "An Experimental Study of Security Vulnerabilities Caused by Errors". DSN 2001.
– S. Chen, J. Xu, R. K. Iyer, K. Whisnant. "Modeling and Analyzing the Security Threat of Firewall Data Corruption Caused by Instruction Transient Errors". DSN 2002.
– S. Chen, Z. Kalbarczyk, J. Xu, R. K. Iyer. "A Data-Driven Finite State Machine Model for Analyzing Security Vulnerabilities". DSN 2003.
– S. Chen, K. Pattabiraman, Z. Kalbarczyk, R. K. Iyer, “Formal Reasoning of Various Categories of Widely Exploited Security Vulnerabilities Using Pointer Taintedness Semantics”, IFIP Information Security Conference, 2004.
Security Vulnerability Report
– S. Chen and J. Xu, “Bugtraq ID 6255: NULL HTTPD Heap Corruption Vulnerability”, the Bugtraq List.
A Finite State Machine Approach for Analyzing Security Vulnerabilities

Overview of the Study
• An analysis of security vulnerability databases (CERT and Bugtraq)
• Examination of security vulnerabilities at the application source-code level
• A security vulnerability usually consists of a series of vulnerabilities in multiple elementary activities. Each can be represented by a primitive FSM, indicating a simple predicate.
• Provides a formalism for reasoning about and describing security vulnerabilities.
• Usefulness of the formalism: discovery of the HTTP daemon heap overflow vulnerability.
Observation from Data Analysis
Columns: Vulnerability ID and Name | Assigned Category | Description in Bugtraq Report | Elementary Activity
#3163: Sendmail signed integer overflow | Input validation error | A negative input integer is accepted as an array index | Get an input integer
#5493: FreeBSD System Call Signed Integer Vulnerability | Boundary condition error | A negative value supplied for the argument allows exceeding the boundary of an array | Use the integer as the index to an array
#3958: RSYNC Signed Array Index Remote Code Execution Vulnerability | Access validation error | A remotely supplied signed value is used as an array index, allowing the corruption of a function pointer or a return address. | Execute a code referred by a function pointer or a return address
The same kind of vulnerability can be classified in different categories. Why? Because each vulnerability involves multiple elementary activities, and a report may describe any one of them.
Primitive FSM
We use Primitive FSM (pFSM) to depict an elementary activity, which specifies a predicate (SPEC) that should be guaranteed in order to ensure security.
[Figure: a primitive FSM with three states (SPEC Check State, Reject State, Accept State) and four transitions (SPEC_ACCEPT, SPEC_REJECT, IMPL_ACCEPT, IMPL_REJECT)]
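The pFSM idea can be illustrated with a small sketch. This is one reading of the transition labels in the figure, not the author's tool: the class name `PrimitiveFSM`, its fields, and the example predicate are hypothetical. An input that the specification rejects but the implementation accepts follows the hidden IMPL_ACCEPT path, which is where a vulnerability lies.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PrimitiveFSM:
    """One elementary activity: the predicate that should hold (spec)
    and the check the code actually performs (impl)."""
    name: str
    spec: Callable[[int], bool]  # condition for SPEC_ACCEPT
    impl: Callable[[int], bool]  # condition for IMPL_ACCEPT

    def run(self, value: int) -> str:
        if self.spec(value):
            return "SPEC_ACCEPT"
        # The specification rejects this input; what happens next
        # depends on the check in the implementation.
        if self.impl(value):
            return "IMPL_ACCEPT"  # hidden path: spec rejects, code accepts
        return "IMPL_REJECT"


# Hypothetical activity: an index that must lie in [0, 100],
# guarded by code that only checks the upper bound.
index_check = PrimitiveFSM(
    name="use x as an array index",
    spec=lambda x: 0 <= x <= 100,
    impl=lambda x: x <= 100,
)

results = {x: index_check.run(x) for x in (50, 200, -1)}
print(results)
```

In this sketch only the input -1 reaches the accept state through IMPL_ACCEPT, so it is the only one that exposes a gap between the specification and the implementation.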
Op 1: Read User Data from a Socket to a Heap Buffer
[Figure: chain of pFSMs for the heap corruption example. Recoverable labels:]
– Op 1: Read user input from a socket into a heap buffer: get (contentLen, input); Calloc PostData[1024+contentLen]; copy input from the socket.
– Predicate on contentLen: accept if contentLen >= 0; reject if contentLen < 0.
– Predicate on the copy: accept if length(input) <= Size(PostData); reject if Size(PostData) < length(input).
– Heap structure: initially B->fd = A and B->bk = C. If B->fd and B->bk are changed, the heap structure is corrupted. When buf is freed, B->fd->bk = B->bk is executed.
– Function pointer: pFree is loaded to the memory during program initialization. If pFree is changed, a function pointer is corrupted. pFree is executed when function free is called, so the attacker's malicious code is executed.
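The effect of the write B->fd->bk = B->bk on a corrupted chunk can be sketched on a toy memory. This is a simplified model under stated assumptions: memory is a flat word-addressed dictionary, the field offsets `FD` and `BK`, all addresses, and the helper names are hypothetical, and only the single write named on the slide is modeled.

```python
FD, BK = 0, 1  # hypothetical word offsets of the fd and bk fields in a chunk

A, B, C = 100, 200, 300        # hypothetical chunk addresses
PFREE = 500                    # hypothetical location of the function pointer pFree
FREE_IMPL, MCODE = 7000, 9000  # hypothetical addresses: real free code, attacker code


def fresh_memory():
    """Heap state before the overflow: B->fd = A, B->bk = C."""
    mem = {A + FD: 0, A + BK: 0, C + FD: 0, C + BK: 0}
    mem[B + FD] = A
    mem[B + BK] = C
    mem[PFREE] = FREE_IMPL
    return mem


def unlink_write(mem, b):
    """Perform the write B->fd->bk = B->bk."""
    fd = mem[b + FD]
    bk = mem[b + BK]
    mem[fd + BK] = bk


# Uncorrupted heap: the write only updates chunk A.
good = fresh_memory()
unlink_write(good, B)

# Overflowed buffer: the attacker controls B->fd and B->bk, so the same
# statement writes an attacker-chosen value to an attacker-chosen address.
bad = fresh_memory()
bad[B + FD] = PFREE - BK
bad[B + BK] = MCODE
unlink_write(bad, B)

print(good[PFREE], bad[PFREE])
```

In the corrupted case the pointer stored at `PFREE` now refers to the attacker's code, which matches the "pFree changed" branch of the figure.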
Sendmail Debugging Function Signed Integer Overflow (Bugtraq #3163)
[Figure: three chained pFSMs. Recoverable labels:]
– pFSM1: get text strings str_x and str_i; reject if (integer represented by str_x) > 2^31, accept if (integer represented by str_x) <= 2^31; convert str_i and str_x to integer i and x. A transition is marked "?".
– pFSM2 (Operation 1: Write integer i to tTvect[x]): specification rejects x < 0 or x > 100 and accepts 0 <= x <= 100; implementation rejects x > 100 and accepts x <= 100; then tTvect[x] = i.
– pFSM3 (Operation 2: Manipulate the function pointer): load the function pointer; addr_setuid unchanged or addr_setuid changed (function pointer is tainted); execute code referred by addr_setuid; execute malicious code. A transition is marked "?".
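The gap between the two predicates of pFSM2 can be sketched as follows. This is an illustration, not the Sendmail code: `to_int32`, `impl_check`, `spec_check` and `classify` are hypothetical names, and the conversion assumes two's-complement wraparound to 32 bits, which real string-to-integer routines do not guarantee.

```python
def to_int32(text: str) -> int:
    """Parse a decimal string and wrap it to a signed 32-bit value
    (assumed two's-complement wraparound)."""
    value = int(text) & 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def impl_check(x: int) -> bool:
    """Check assumed to be in the implementation: upper bound only."""
    return x <= 100


def spec_check(x: int) -> bool:
    """Predicate that should hold before writing tTvect[x]."""
    return 0 <= x <= 100


def classify(str_x: str):
    """Return (x, passes implementation check, satisfies specification)."""
    x = to_int32(str_x)
    return x, impl_check(x), spec_check(x)


for text in ("50", "200", "4294967295"):
    print(text, classify(text))
```

The last input wraps to a negative index that passes the upper-bound check while violating the specification, so the write tTvect[x] = i would land outside the array.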
Modeled Vulnerabilities
• Signed Integer Overflow
• Heap Corruption
• Stack Overflow
• Format String Vulnerabilities
• File Race Conditions
• Some Input Validation Vulnerabilities
Formal Reasoning of Security Vulnerabilities by Pointer Taintedness Semantics
Pointer Taintedness Caused Vulnerabilities
• Format string vulnerability
– Taint an argument pointer of functions such as printf, fprintf, sprintf and syslog.
• Stack smashing
– Taint a return address.
• Heap corruption
– Taint the free-chunk doubly-linked list of the heap.
• Glibc globbing vulnerabilities
– User input resides in a location that is used as a pointer by the parent function of glob().
Example of Format String Vulnerability
In vfprintf(), if (fmt points to “%n”) then **ap = (character count)
Vulnerable code: recv(buf); printf(buf); /* should be printf(“%s”,buf) */
• A store represents a snapshot of the memory state at a point in the program execution.
• For each memory location, we can evaluate two properties: content and taintedness (true/false).
• Operations on memory locations:
– The fetch operation Ftch(S,A) gives the content of the memory address A in store S
– The location-taintedness operation LocT(S,A) gives the taintedness of the location A in store S
• Operations on expressions:
– The evaluation operation Eval(S,E) evaluates expression E in store S
– The expression-taintedness operation ExpT(S,E) computes the taintedness of expression E in store S
Axioms of Eval and ExpT operations
Eval(S, I) = I // I is an integer constant
Eval(S, ^ E1) = Ftch(S, Eval(S,E1))
Eval(S, E1 + E2) = Eval(S, E1) + Eval(S, E2)
Eval(S, E1 - E2) = Eval(S, E1) - Eval(S, E2)
…
ExpT(S, I) = false
ExpT(S, ^ E1) = LocT(S, Eval(S,E1))
ExpT(S, E1 + E2) = ExpT(S,E1) or ExpT(S,E2)
ExpT(S, E1 - E2) = ExpT(S,E1) or ExpT(S,E2)
…
E.g., is the expression (^100)–2 tainted?
ExpT(S, (^100)–2) = ExpT(S, (^100)) or ExpT(S, 2) = LocT(S,100) or false = LocT(S,100)
Note: ^ is the dereference operator, ^100 gives the content in the location 100
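The Eval and ExpT axioms can be tried out with a short sketch. It is an illustrative model, not the equational-logic definition itself: the `Store` class, the tuple encoding of expressions, and the default of 0 for unset locations are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    content: dict = field(default_factory=dict)  # address -> integer content
    tainted: dict = field(default_factory=dict)  # address -> True/False


def ftch(s, a):
    """Ftch(S, A): content of address A (unset locations assumed to hold 0)."""
    return s.content.get(a, 0)


def loc_t(s, a):
    """LocT(S, A): taintedness of location A (untainted unless marked)."""
    return s.tainted.get(a, False)


# Expressions: an int constant, ("^", e), ("+", e1, e2) or ("-", e1, e2).
def eval_expr(s, e):
    if isinstance(e, int):
        return e
    op = e[0]
    if op == "^":
        return ftch(s, eval_expr(s, e[1]))
    if op == "+":
        return eval_expr(s, e[1]) + eval_expr(s, e[2])
    if op == "-":
        return eval_expr(s, e[1]) - eval_expr(s, e[2])
    raise ValueError(f"unsupported expression: {e!r}")


def exp_t(s, e):
    if isinstance(e, int):
        return False  # constants are never tainted
    op = e[0]
    if op == "^":
        return loc_t(s, eval_expr(s, e[1]))
    if op in ("+", "-"):
        return exp_t(s, e[1]) or exp_t(s, e[2])
    raise ValueError(f"unsupported expression: {e!r}")


# The slide's example, (^100) - 2, with location 100 holding tainted data.
store = Store(content={100: 42}, tainted={100: True})
expr = ("-", ("^", 100), 2)
value = eval_expr(store, expr)
is_tainted = exp_t(store, expr)
clean = Store(content={100: 42})
print(value, is_tainted, exp_t(clean, expr))
```

As in the worked example, the taintedness of (^100) - 2 reduces to the taintedness of location 100.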
Semantics of Language L
• Extend the semantics proposed by Goguen and Malcolm
• The following operations (arithmetic/logic) are defined:
– +, -, *, /, %, !, &&, ||, !=, ==, …
• The following instructions are defined: [instruction list not preserved]
• Axioms defining mov instruction semantics
– Specify the effects of applying mov instruction on a store
– Allow taintedness to propagate from Exp2 to [Exp1].
• Axioms defining the semantics of recv (similarly, scanf, recvfrom)
– Specify the memory locations tainted by the recv call.
Extracting Function Specifications by Theorem Prover
[Figure: theorem generation feeds a theorem prover. Theorems shown for code lines L0, L1 and L2:]
a) Suppose S1 is the store before Line L1, then LocT(S1,dst) = false
b) If S0 is the store before Line L0, and S2 is the store after Line L1, then I < Eval(S0, ^dst) or Eval(S0, ^dst+dstsize) <= I => LocT(S2,I) = LocT(S0,I)
c) Suppose S3 is the store before Line L2, then LocT(S3,dst) = false
Specifications Suggested by Theorem Prover
Suppose when function strcpy() is called, the size of destination buffer (dst) is dstsize, the length of user input string (src) is srclen.
• Specifications that are extracted by the theorem proving approach:
– srclen <= dstsize
– The buffers src and dst do not overlap in such a way that the buffer dst covers the NULL-terminator of the src string.
– The buffers dst and src do not cover the function frame of strcpy.
– Initially, dst is not tainted.
[Slide annotations on the specifications: "Documented in Linux man page" and "Not documented"]
Example Scenario
Destination buffer should not cover the function frame of strcpy.
Can the extracted specifications be violated in application code?
Other Examples
• A simplified version of printf()
– 55 lines of C code
– Four security specifications are extracted, including one indicating format string vulnerability
• Function free() of a heap management system
– 36 lines of C code
– Seven security specifications are extracted, including several specifications indicating heap corruption vulnerabilities.
• Socket read functions of Apache HTTPD and NULL HTTPD
– The Apache function is proved to be free of pointer taintedness.
– Two (known) vulnerabilities are exposed in the theorem proving process.
Summary
• FSM representation: decompose each vulnerability to multiple simple predicates (with real vulnerability examples)
• A common characteristic of many predicates: their violations result in pointer taintedness
• Defined a memory model to reason about pointer taintedness
• Developed a theorem proving approach to extract security specifications from library functions
Future Directions
• Develop a VCGen (verification condition generator) to facilitate theorem proving. (in progress)
• Apply the pointer taintedness analysis to a substantial number of commonly used library functions to extract their security specifications.
• Compiler techniques for inserting “guarding code” to check unproved properties at runtime.
• Explore the possibility of building the taintedness notion into virtual machines.
• Architectural support for pointer taintedness detection: a module working with RSE (Reliability and Security Engine).
Backup Slides

Format String Vulnerability
Elementary Activity 1 of Sendmail Vulnerability
[Figure: pFSM1. Recoverable labels:]
Elementary Activity 1: get user input. Get strings str_x and str_i, convert them to integers x and i.
– Get str_x and str_i
– Reject: (integer represented by str_x) > 2^31
– Accept: (integer represented by str_x) <= 2^31
– Convert str_x and str_i to integers x and i
– A transition is marked "?"
Elementary Activity 2 of Sendmail Vulnerability
[Figure: pFSM2. Recoverable labels:]
Elementary Activity 2: assign debug level
– Convert str_x and str_i to integers x and i
– Specification: reject if x < 0 or x > 100; accept if 0 <= x <= 100
– Implementation: reject if x > 100; accept if x <= 100
– tTvect[x] = i
– A function pointer (psetuid) is corrupted
Elementary Activity 3 of Sendmail Vulnerability
[Figure: pFSM3. Recoverable labels:]
Elementary Activity 3: manipulation of function pointer psetuid
– A function pointer (psetuid) is corrupted
– Starting sendmail program: load psetuid to the memory
– psetuid is changed / psetuid is unchanged
– Execute the code referred by psetuid
– Execute malicious code
– A transition is marked "?"
Appropriateness of Dereference
• A data value x is appropriate to be dereferenced if and only if one of the following conditions is true, assuming Y, Z are integer constants:
– x is &foo (foo is a program variable)
– x is malloc(Y)
– (recursive definition) there exist values a, b and c that are appropriate to dereference, and x = a + b – c + Z
• Theorems to prove for indirect write mov [^E1] <- E2
– E1 should be appropriate to dereference
– If E2 is not appropriate to dereference, then [^E1] should not be appropriate to dereference.
About Equational Logic
A logic defined by equations. Equations are used to rewrite symbolic terms (by replacing the term on the left of the equation with the term on the right of the equation). The emphasis is on executability.
Define the natural number (NAT):
Operators:
  0 : a constant of NAT
  s_ : NAT -> NAT (successor operator)
  _+_ : NAT NAT -> NAT (addition operator)
Equations:
  0 + N = N
  (s M) + N = M + (s N)
Example:
  (s s s 0) + (s s 0)
  = (s s 0) + (s s s 0)
  = (s 0) + (s s s s 0)
  = 0 + (s s s s s 0)
  = s s s s s 0
Intuitively, this represents “3 + 2 = 5”
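The executability of these two equations can be seen by applying them mechanically as left-to-right rewrite rules. The sketch below is an assumption-laden illustration, not an equational-logic system: the nested-tuple encoding of terms and the helper names `s`, `from_int`, `to_int` and `add` are hypothetical.

```python
ZERO = "0"


def s(n):
    """Successor operator: builds the term (s n)."""
    return ("s", n)


def from_int(k):
    """Build the term s s ... s 0 with k successors."""
    term = ZERO
    for _ in range(k):
        term = s(term)
    return term


def to_int(term):
    """Count the successors in a term."""
    count = 0
    while term != ZERO:
        term = term[1]
        count += 1
    return count


def add(m, n):
    """Rewrite m + n to a normal form; return (result, rewrite steps used)."""
    steps = 0
    while m != ZERO:
        # Apply (s M) + N = M + (s N)
        m, n = m[1], s(n)
        steps += 1
    # Apply 0 + N = N
    steps += 1
    return n, steps


result, steps = add(from_int(3), from_int(2))
print(to_int(result), steps)
```

For 3 + 2 the second equation fires three times and the first once, which corresponds to the four rewriting steps in the example.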
Semantics of mov and recv
Axioms of mov instruction
Ftch((S ; mov [E1] <- E2),X) = Eval(S,E2) if (Eval(S,E1) is X) .
Ftch((S ; mov [E1] <- E2),X) = Ftch(S,X) if not (Eval(S,E1) is X) .
LocT((S ; mov [E1] <- E2),X) = ExpT(S,E2) if (Eval(S,E1) is X) .
LocT((S ; mov [E1] <- E2),X) = LocT(S,X) if not (Eval(S,E1) is X) .
Semantics of recv (similarly, scanf, recvfrom)
– LocT(S ; call recv (sock , buf , len, flag), A) = true if Eval(S,buf) <= A and A < Eval(S, buf + len) .
– LocT(S ; call recv (sock , buf , len, flag), A) = LocT(S, A) otherwise .
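How these axioms propagate taintedness through a store can be sketched as follows. This is a simplified model under stated assumptions: `Store`, `mov` and `recv` are hypothetical Python names, the arguments of `mov` are taken to be already evaluated (address, value, taintedness of the source expression), and writing the received bytes into memory is an assumption, since the axioms above only specify the taintedness effect of recv.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    content: dict = field(default_factory=dict)  # address -> integer content
    tainted: dict = field(default_factory=dict)  # address -> True/False


def ftch(s, a):
    return s.content.get(a, 0)        # unset locations assumed to hold 0


def loc_t(s, a):
    return s.tainted.get(a, False)    # untainted unless marked


def mov(s, x, value, value_tainted):
    """Store after `mov [E1] <- E2`, where x = Eval(S,E1),
    value = Eval(S,E2) and value_tainted = ExpT(S,E2)."""
    new = Store(dict(s.content), dict(s.tainted))
    new.content[x] = value            # Ftch axiom for location X
    new.tainted[x] = value_tainted    # LocT axiom for location X
    return new                        # every other location is unchanged


def recv(s, buf, length, data):
    """Store after `call recv(sock, buf, len, flag)`:
    every location A with buf <= A < buf + len becomes tainted."""
    new = Store(dict(s.content), dict(s.tainted))
    for i in range(length):
        new.tainted[buf + i] = True
        if i < len(data):             # assumption: received bytes are stored
            new.content[buf + i] = data[i]
    return new


s0 = Store()
s1 = recv(s0, 10, 4, [65, 66, 67, 68])          # taints locations 10..13
s2 = mov(s1, 20, ftch(s1, 11), loc_t(s1, 11))   # copy a tainted value to 20
s3 = mov(s2, 10, 0, False)                      # overwrite 10 with a constant
print(loc_t(s1, 13), loc_t(s1, 14), loc_t(s2, 20), loc_t(s3, 10))
```

Copying from a tainted location taints the destination, while overwriting a tainted location with a constant clears its taintedness, as the mov axioms prescribe.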
Related Work
• Security Modeling
– Sheyner and Wing: Attack graphs
– Ortalo and Deswarte: Markov models
• Taintedness analysis
– Perl runtime
– CQUAL and SPLINT: taintedness of program variables.
» A symbol gets tainted only if an explicit C statement passes a tainted value to it by assignment, argument passing or function return. No underlying memory model.
» Not sufficient to detect real pointer taintedness vulnerabilities.
Position of My Work
[Figure: three layers labeled Security Specs, Library Functions and Application Code, with regions marked "Existing static analysis tools" and "My work"]
Example security specs:
– src_len < dst_size (strcpy)
– src and dst do not overlap (strcpy)
– Do not free a stack buffer
– Do not double free a buffer
– First argument of printf cannot come from user
– …
Presentation Outline
• A Brief Description of FSM Approach of Modeling and Analyzing Security Vulnerabilities
• Real Examples of Pointer Taintedness
• Definition of Pointer Taintedness in Equational Logic
• Extraction of Function Specifications by Theorem Proving
• Summary and Future Directions
Extraction of Security Specs of Library Functions using Pointer Taintedness
• A formal approach to reason about potential vulnerabilities in library source code.
• Reasoning based on a hypothetical memory model: a boolean property taintedness associated with each memory location.
• The semantics of pointer taintedness defined in equational logic.
• A theorem prover employed to extract security specifications of library functions.
• Security specifications extracted by the analysis:
– expose different classes of known security vulnerabilities, such as format string, heap corruption and buffer overflow vulnerabilities;
– indicate function invocation scenarios that may expose new vulnerabilities.
Observations from Data Analysis (cont.)
• Exploiting a vulnerability involves multiple vulnerable operations on several objects.
• Exploits must pass through multiple elementary activities, each providing an opportunity for performing a security check.
• For each elementary activity, the vulnerability data and corresponding code inspections allow us to define a predicate, which if violated, will result in a security vulnerability.