Chapter 1 Windows NT: An Inside Look
Chapter 1read.pudn.com/downloads113/ebook/474416/UndocumentedNT.pdf · Microsoft has come up with two 32-bit operating systems: Windows 95/98 and Windows NT. Windows NT is a high-end

Nov 01, 2020

Chapter 1

Windows NT: An Inside Look

Abstract

This chapter begins with an evaluation of Windows NT and then examines the overall

architecture of the operating system.

THIS BOOK IS AN EXPLORATION of the internals of the Windows NT operating system. Before

entering the jungle of Windows NT internals, an overview of the topic is necessary. In this

chapter, we explain the overall structure of the Windows NT operating system.

EVALUATING WINDOWS NT

The qualities of an operating system are the result of the way in which the operating system is

designed and implemented. For an operating system to be portable, extensible, and

compatible with previous releases, the basic architecture has to be well designed. In the

following sections, we evaluate Windows NT in light of these issues.

Portability

As you know, Windows NT is available on several platforms, namely, Intel, MIPS, PowerPC,

and DEC Alpha. Many factors contribute to Windows NT's portability. Probably the most

important factor of all is the language used for implementation. Windows NT is mostly coded in

C, with some parts coded in C++. Assembly language, which is platform specific, is used only

where necessary. The Windows NT team also isolated the hardware-dependent sections of

the operating system in HAL.DLL. As a result, the hardware-independent portions of Windows

NT can be coded in a high-level language, such as C, and easily ported across platforms.

Extensibility

Windows NT is highly extensible, but because of a lack of documentation, its extensibility

features are rarely explored. The list of undocumented features starts with the subsystems.

The subsystems provide multiple operating system interfaces in one operating system. You

can extend Windows NT to have a new operating system interface simply by adding a new

subsystem program. Windows NT provides Win32, OS/2, POSIX, Win16, and DOS interfaces

using the subsystems concept, but Microsoft keeps mum when it comes to documenting the

procedure to add a new subsystem.

The Windows NT kernel is highly extensible because of dynamically loadable kernel modules

that are loaded as device drivers. Microsoft provides enough documentation

for you to write hardware device drivers, that is, hard disk device drivers, network card device

drivers, tape drive device drivers, and so on. You can also write device drivers that

do not control any hardware device. Even file systems are loaded as device drivers under

Windows NT.

Another example of Windows NT's extensibility is its implementation of the system call

interface. Developers commonly modify operating system behavior by hooking or adding

system calls. The Windows NT development team designed the system call interface to

facilitate easy hooking and adding of system calls, but again Microsoft has not documented

these mechanisms.
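
The idea of hooking and adding system calls can be pictured with a toy model. This is only a sketch in Python, not kernel code: the `ServiceTable` class, the `open_file` service, and the service number 0x17 are all invented for illustration and say nothing about the real, undocumented NT structures.

```python
from typing import Callable, Dict


class ServiceTable:
    """Toy system service table: maps a service number to a handler."""

    def __init__(self) -> None:
        self._handlers: Dict[int, Callable[..., object]] = {}

    def add(self, number: int, handler: Callable[..., object]) -> None:
        """Add a new system call under an unused service number."""
        if number in self._handlers:
            raise ValueError(f"service {number:#x} already exists")
        self._handlers[number] = handler

    def hook(self, number: int, wrapper: Callable) -> Callable[..., object]:
        """Replace a handler with wrapper(original); return the original."""
        original = self._handlers[number]
        self._handlers[number] = wrapper(original)
        return original

    def dispatch(self, number: int, *args: object) -> object:
        """What a system call trap would do: look up the handler and call it."""
        return self._handlers[number](*args)


def open_file(path: str) -> str:            # hypothetical service
    return f"opened {path}"


calls_seen = []


def logging_hook(original):
    def hooked(*args):
        calls_seen.append(args)             # observe the call ...
        return original(*args)              # ... then pass it through unchanged
    return hooked


table = ServiceTable()
table.add(0x17, open_file)                  # 0x17 is an arbitrary number
table.hook(0x17, logging_hook)
result = table.dispatch(0x17, "C:\\boot.ini")
```

The caller still reaches the original service through the same number; the hook only sits in between, which is why this style of modification is attractive to developers.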

Compatibility

Backward compatibility has been a long-standing characteristic of Intel microprocessors and

Microsoft operating systems, and a key to the success of these two giants. Windows NT had to

allow programs for DOS, Win16, and OS/2 to run unaltered. Compatibility is another reason

the NT development team went for the subsystem concept. Apart from binary compatibility,

where the executable has to be allowed to run unaltered, Windows NT also provides source

compatibility for POSIX-compliant applications. In another attempt to increase compatibility,

Windows NT supports other file systems, such as the file allocation table (FAT) file system

from DOS and the High Performance File System (HPFS) from OS/2, in addition to the native

NT file system (NTFS).

Maintainability

Windows NT is a big piece of code, and maintaining it is a big job. The NT development team

has achieved maintainability through an object-oriented design. Also, the breakup of the

operating system functionality into various layers improves maintainability. The topmost layer,

which is the one that is seen by the users of the operating system, is the subsystems layer.

The subsystems use the system call interface to provide the application programming interface

(API) to the outside world. Below the system call interface layer lies the NT executive, which in

turn rests on the kernel, which ultimately relies on the hardware abstraction layer (HAL) that

talks directly with the hardware.

The NT development team's choice of programming language also contributes to Windows

NT's maintainability. As we stated previously, the entire operating system has been coded in

C and C++, except for a few portions where the use of assembly language was inevitable.

Plus Points over Windows 95/98

Microsoft has come up with two 32-bit operating systems: Windows 95/98 and Windows NT.

Windows NT is a high-end operating system that offers features beyond those

provided by conventional PC or desktop operating systems, such as process management,

memory management, and storage management.

Security

Windows NT is a secure operating system based on the following characteristics: A user needs

to log in to the system before he or she can access it. The resources in the system are treated

as objects, and every object has a security descriptor associated with it. A security descriptor

has access control lists attached to it that dictate which users can access the object.
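
As a rough illustration of how an access control list decides who may touch an object, here is a minimal sketch. The class name, the access bits, and the user names are assumptions made for the example; real NT access control lists are richer than this simple per-user mask.

```python
from dataclasses import dataclass, field
from typing import Dict

READ, WRITE, DELETE = 0x1, 0x2, 0x4         # illustrative access bits


@dataclass
class SecurityDescriptor:
    owner: str
    # Simplified ACL: user name -> mask of allowed access bits.
    acl: Dict[str, int] = field(default_factory=dict)

    def allows(self, user: str, desired: int) -> bool:
        """Grant only if every requested bit is allowed for this user."""
        granted = self.acl.get(user, 0)     # users not listed get nothing
        return (granted & desired) == desired


sd = SecurityDescriptor(
    owner="alice",
    acl={"alice": READ | WRITE | DELETE, "bob": READ},
)
```

With this descriptor, `sd.allows("bob", READ)` succeeds while `sd.allows("bob", WRITE)` and any request from an unlisted user are refused.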

All this being said, a secure operating system cannot be complete without a secure file system,

and the FAT file system from the days of DOS does not have any provision for security. DOS,

being a single-user operating system, did not care about security.

In response to this shortcoming, the Windows NT team came up with a new file system based

on the HPFS, which is the native file system for OS/2. This new native file system for Windows

NT, known as NTFS, has support for access control. A user can specify the access rights for a

file or directory being created under NTFS, and NTFS allows only the processes with proper

access rights to access that file or directory.

Caution: Keep in mind that no system is 100 percent secure. Windows NT, although remarkably

secure, is not DoD compliant. (For the latest news on DoD compliance, check out

http://www.fcw.com/pubs/fcw/1998/0727/fcw-newsdodsec-7-27-98.htm.)

Multiprocessing

Windows NT supports symmetric multiprocessing. The workstation version of Windows NT can

support two processors, and the server version of Windows NT can support up to four

processors. The operating system needs special synchronization constructs for supporting

multiprocessing. On a single-processor system, critical portions of code can be executed

without interruption by disabling all the hardware interrupts. This is required to maintain the

integrity of the kernel data structures. In a multiprocessor environment, it is not possible to

disable the interrupts on all processors. Windows NT uses spin locks to protect kernel data

structures in a multiprocessor environment.
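
A spin lock can be sketched in a few lines. This is a user-level Python model, not NT's implementation: a non-blocking `acquire` on `threading.Lock` stands in for the atomic test-and-set instruction a kernel would use, and the `SpinLock` class and `worker` function are hypothetical names.

```python
import threading


class SpinLock:
    """Busy-wait lock built on an atomic try-acquire."""

    def __init__(self) -> None:
        self._flag = threading.Lock()

    def acquire(self) -> None:
        # Keep retrying instead of sleeping: the waiter "spins".
        while not self._flag.acquire(blocking=False):
            pass

    def release(self) -> None:
        self._flag.release()


counter = 0
lock = SpinLock()


def worker(iterations: int) -> None:
    global counter
    for _ in range(iterations):
        lock.acquire()
        counter += 1                        # critical section
        lock.release()


threads = [threading.Thread(target=worker, args=(2000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because a waiter burns CPU time while it spins, this kind of lock only makes sense when the protected section is very short, which is the case for brief updates to shared kernel data.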

Note: Multiprocessing can be classified as asymmetric and symmetric. In asymmetric

multiprocessing, a single processor acts as the master processor and the other processors act as

slaves. Only the master processor runs the kernel code, while the slaves can run only the user

threads. Whenever a thread running on a slave processor invokes a system service, the master

processor takes over the thread and executes the requested kernel service. The scheduler, being a

kernel code, runs only on the master processor. Thus, the master processor acts as the scheduler,

dispatching user mode threads to the slave processors. Naturally, the master processor is heavily

loaded and the system is not scalable. Compare this with symmetric multiprocessing, where any

processor can run the kernel code as well as the user code.

International Language Support

A significant portion of PC users today use languages other than English. The key to reaching

these users is to have the operating system support their languages. Windows NT achieves

this by adopting the Unicode standard for character sets. Windows NT uses a 16-bit Unicode

encoding, while ASCII is a 7-bit character set. The first 128 characters in Unicode

match the ASCII character set. This leaves enough space for representing characters from

non-Latin scripts and languages. The Win32 API allows Unicode as well as ASCII character

sets, but the Windows NT kernel uses and understands only Unicode. Although the application

programmer can get away without knowing Unicode, device driver developers need to be

familiar with Unicode because the kernel interface functions accept only Unicode strings and

the driver entry points are supplied with Unicode strings.
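
The relationship between ASCII and a 16-bit Unicode encoding can be checked directly. The sketch below uses Python's `utf-16-le` codec as a stand-in for a 16-bit string representation; that choice, and the sample strings, are assumptions made for the example.

```python
text = "NT"

# ASCII is a 7-bit code: every ASCII character has a code point below 128,
# and Unicode assigns it the same number.
ascii_bytes = text.encode("ascii")
code_points = [ord(ch) for ch in text]

# A 16-bit encoding stores each of these characters in two bytes;
# for ASCII characters the high byte is zero.
wide_bytes = text.encode("utf-16-le")

# A non-Latin character has no ASCII encoding but still fits in 16 bits.
devanagari_a = "\u0905"                     # DEVANAGARI LETTER A
wide_a = devanagari_a.encode("utf-16-le")

try:
    devanagari_a.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False
```

A string that an application handles as one byte per character therefore doubles in size when it is converted for a Unicode-only interface.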

Multiprogramming

Windows NT 3.51 and Windows NT 4.0 lack an important feature expected of a server

operating system, namely, support for remote login (Telnet). Both these versions of Windows NT can

operate as file servers because they support the common Internet file system (CIFS) protocol.

But they cannot act as CPU servers because logging into a Windows NT machine over the

network is not possible. Consequently, only one user can access a Windows NT machine at a

time. Microsoft plans to overcome this deficiency in Windows 2000 by providing a Telnet server along with

the operating system. This will enable multiple programmers to log in on the machine at the

same time, making Windows 2000 a true server operating system.

Note: Third-party Telnet servers are available for Windows NT 3.51 and Windows NT 4.0. However,

Microsoft's own Telnet server comes only with Windows 2000.

DELVING INTO THE WINDOWS NT ARCHITECTURE

Windows NT borrows its core architecture from the MACH operating system, which was

developed at Carnegie Mellon University. The basic approach of the MACH operating system

is to reduce the kernel size to the minimum by pushing complex operating system functionality

outside the kernel onto user-level server processes. This client-server architecture of the

operating system serves yet another purpose: It allows multiple APIs for the same operating

system. This is achieved by implementing the APIs through the server processes.

The MACH operating system kernel provides a very simple set of interface functions. A server

process implementing a particular API uses these interface functions to provide a more

complex set of interface functions. Windows NT borrows this idea from the MACH operating

system. The server processes in Windows NT are called subsystems. NT's choice of

the client-server architecture shows its commitment to good software management principles

such as modularity and structured programming. Windows NT had the option to implement the

required APIs in the kernel. Also, the NT team could have added different layers on top of the

Windows NT kernel to implement different APIs. The NT team voted in favor of the subsystem

approach for purposes of maintainability and extensibility.

The Subsystems

There are two types of subsystems in Windows NT: integral subsystems and environment

subsystems. The integral subsystems, such as the security manager subsystem, perform

some essential operating system task. The environment subsystems enable different types of

APIs to be used on a Windows NT machine. Windows NT comes with subsystems to support

the following APIs:

§ Win32 Subsystem. The Win32 subsystem provides the Win32 API. The applications

conforming to the Win32 API are supposed to run unaltered on all the 32-bit platforms provided

by Microsoft–that is, Windows NT, Windows 95, and Win32s. Unfortunately, as you will see later

in this book, this is not always the case.

§ WOW Subsystem. The Windows on Windows (WOW) subsystem provides backward

compatibility to 16-bit Windows applications, enabling Win16 applications to run on Windows NT.

These applications can run on Windows NT unless they use some of the undocumented API

functions from Windows 3.1 that are not defined in Windows NT.

§ NTVDM Subsystem. The NT Virtual DOS Machine (NTVDM) provides a text-based

environment where DOS applications can run.

§ OS/2 Subsystem. The OS/2 subsystem enables OS/2 applications to run. WOW, NTVDM, and

OS/2 are available only on Intel platforms because they provide binary compatibility to

applications. One cannot run the executable files or binary files created for one type of

processor on another type of processor because of the differences in machine code format.

§ POSIX Subsystem. The POSIX subsystem provides API compliance to the POSIX 1003.1

standard.

The applications are unaware of the fact that the API calls invoked are processed by the

corresponding subsystem. This is hidden from the applications by the respective client-side

DLLs for each subsystem. This DLL translates the API call into a local procedure call (LPC).

LPC is similar to the remote procedure call (RPC) facility available on networked Unix

machines. Using RPC, a client application can invoke a function residing in a server process

running on another machine over the network. LPC is optimized for the client and the server

running on the same machine.

THE WIN32 SUBSYSTEM

The Win32 subsystem is the most important subsystem. Other subsystems such as WOW and

OS/2 are provided mainly for backward compatibility, while the POSIX subsystem is very

restrictive in functionality. (For example, POSIX applications do not have access to the

network.) The Win32 subsystem is important because it controls access to the

graphics device. In addition, the other subsystems are actually Win32 applications that use the

Win32 API to provide their own different APIs. In essence, all the subsystems are based on the

core Win32 subsystem.

The Win32 subsystem in Windows NT 3.51 contains the following components:

§ CSRSS.EXE. This is the user mode server process that serves the USER and GDI calls.

Note: Traditionally, Windows API calls are classified as user/gdi calls and kernel calls. The majority

of user/gdi functions are related to the graphical user interface (GUI) and reside in USER.EXE and GDI.EXE under

Windows 3.x. The kernel functions are related to non-GUI O/S services–such as file system

management and process management–and reside in KERNEL.EXE under Windows 3.x.

§ KERNEL32.DLL. The KERNEL.EXE in Windows 3.1 has changed to KERNEL32.DLL in

Windows NT. This is more than a change in name. The KERNEL.EXE contained all the kernel

code for Windows 3.1, while KERNEL32.DLL contains just the stub functions. These stub

functions call the corresponding NTDLL.DLL functions, which in turn invoke system call code in

the kernel.

§ USER32.DLL. This is another client-side DLL for the Win32 subsystem. The majority of the

functions in USER32.DLL are stub functions that convert the function call to an LPC for the

server process.

§ GDI32.DLL. The function calls related to the graphical device interface are handled by another

client-side DLL for the Win32 subsystem. The functions in GDI32.DLL are similar to those in

USER32.DLL in that they are just stubs invoking LPCs for the server process.

Under Windows NT 4.0 and Windows 2000, the functionality of CSRSS is moved into a kernel

mode driver (WIN32K.SYS) and USER32 and GDI32 use the system calls interface to call the

services in WIN32K.SYS.

The Core

We have to resort to new terminology for explaining the kernel component of the Windows NT

operating system. Generally, the part of an operating system that runs in privileged mode is

called the kernel. The Windows NT design team strove to achieve a structured design for

the operating system. The privileged-mode component of Windows NT is also designed in a

layered fashion. A layer uses only the functions provided by the layer below itself. The main

layers in the Windows NT core are the HAL, the kernel, and the NT executive. Because one of

the layers running in privileged mode is itself called the kernel, we had to come up with a

new term that refers to all these layers together. We'll refer to it as the core of Windows NT.

Note: Most modern microprocessors run in at least two modes: normal and privileged. Some

machine instructions can be executed only when the processor is in privileged mode. Also, some

memory area can be marked as “to be accessed in privileged mode only.” The operating systems

use this feature of the processors to implement a secure operating environment for multitasking.

The user processes run in normal (nonprivileged) mode, and the operating system kernel runs in

privileged mode. Thus, the operating system ensures that user processes cannot harm the

operating system.

This division of the Windows NT core into layers is logical. Physically, only the HAL comes as a

separate module. The kernel, NT executive, and the system call layer are all packed in a single

NTOSKRNL.EXE (or NTKRNLMP.EXE, for multiprocessor systems). Though they are

considered part of the NT executive in this chapter, the device drivers (including the file system

drivers) are separate driver modules and are loaded dynamically.

THE HAL

The lowest of the aforementioned layers is the hardware abstraction layer, which deals directly

with the hardware of the machine. The HAL, as its name suggests, hides hardware

idiosyncrasies from the layers above it. As we mentioned previously, Windows NT is a highly

portable operating system that runs on DEC Alpha, MIPS, and PowerPC, in addition to Intel

machines. Along with the processor, the other aspects of a machine, such as the bus

architecture, interrupt handling, and DMA management also change. The HAL.DLL file

contains the code that hides the processor- and machine-specific details from other parts of

the core. The kernel component of the core and the device drivers use the HAL interface

functions. Thus, only the HAL code changes from platform to platform; the rest of the core

code that uses the HAL interface is highly portable.
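
The effect of hiding hardware behind one interface can be shown with a small model. Everything here is hypothetical: the `Hal` interface, the two implementations, the port number 0x3F8, and the values returned are invented for the example and are not the real HAL.DLL exports.

```python
from abc import ABC, abstractmethod


class Hal(ABC):
    """Interface the rest of the core is written against (hypothetical)."""

    @abstractmethod
    def read_port(self, port: int) -> int:
        ...


class PortIoHal(Hal):
    """Pretend platform with a dedicated I/O port space."""

    def read_port(self, port: int) -> int:
        return port & 0xFF                  # dummy value derived from the port


class MemoryMappedHal(Hal):
    """Pretend platform where device registers live in memory."""

    def __init__(self) -> None:
        self._registers = {0x3F8: 0x42}     # dummy register contents

    def read_port(self, port: int) -> int:
        return self._registers.get(port, 0)


def probe_serial(hal: Hal) -> int:
    """Portable 'kernel' code: identical on every platform."""
    return hal.read_port(0x3F8)
```

`probe_serial` never needs to change when a new platform is added; only a new `Hal` implementation is written, which mirrors the claim that only the HAL code changes from platform to platform.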

THE KERNEL

The kernel of Windows NT offers very primitive but essential services such as multiprocessor

synchronization, thread scheduling, interrupt dispatching, and so on. The kernel is the only

core component that cannot be preempted or paged out. All the other components of the

Windows NT core are preemptive. Hence, under Windows NT, one can find more than one

thread running in privileged mode. Windows NT is one of the few operating systems in which

the core is also multithreaded.

A very natural question to ask is “Why is the kernel nonpreemptive and nonpageable?” Actually,

you can page out the kernel, but a problem arises when you page in. The kernel is responsible

for handling page faults and bringing the required pages into memory from secondary storage.

Hence, the kernel itself cannot be paged out, or rather, it cannot be paged in if it is paged out.

The same problem prevents the disk drivers supporting the swap space from being pageable.

As the kernel and the device drivers use the HAL services, naturally, the HAL is also

nonpreemptive.

THE NT EXECUTIVE

The NT executive constitutes the majority of the Windows NT core. It sits on top of the kernel

and provides a complex interface to the outside world. The executive is designed in an

object-oriented manner. The NT executive forms the part of the Windows NT core that is fully

preemptive. Generally, the core components added by developers form a part of the NT

executive or rather the I/O Manager. Hence, driver developers should always keep in mind that

their code has to be fully preemptive.

The NT executive can further be subdivided into separate components that implement different

operating system functionality. The various components of the executive are described in the

following sections.

THE OBJECT MANAGER

Windows NT is designed in an object-oriented fashion. Windows, devices, drivers, files,

mutexes, processes, and threads have one thing in common: All of them are treated as objects.

In simpler terms, an object is the data bundled with the set of methods that operate on this

data. The Object Manager makes the task of handling objects much easier by implementing

the common functionality required to manage any type of object. The main tasks of the Object

Manager are as follows:

§ Memory allocation/deallocation for objects.

§ Object name space maintenance. The Windows NT object name space is structured as a tree,

just like a file system directory structure. An object name is composed of the entire directory

path, starting from the root directory. The Object Manager is responsible for maintaining this

object name space. Unrelated processes can access an object by getting a handle to it using

the object's name.

§ Handle maintenance. To use an object, a process opens the object and gets back a handle.

The process can use this handle to perform further operations on the object. Each process has

a handle table that is maintained by the Object Manager. A handle table is nothing more than an

array of pointers to objects; a handle is just an index in this array. When a process refers to a

handle, the Object Manager gets hold of the actual object by indexing the handle in the handle

table.

§ Reference count maintenance. The Object Manager maintains a reference count for objects,

and automatically deletes an object when the corresponding reference count drops to zero. The

user mode code accesses objects via handles, while the kernel mode code uses pointers to

directly access objects. The Object Manager increments the object reference count for every

handle pointing to the particular object. The reference count is decremented whenever a handle

to the object is closed. Whenever the kernel mode code references an object, the reference

count for that object is incremented. The reference count is decremented as soon as the kernel

mode code is finished accessing the object.

§ Object security. The Object Manager also checks whether a process is allowed to perform a

certain operation on an object. When a process creates an object, it specifies the security

descriptor for that object. When another process tries to open the object, the Object Manager

verifies whether the process is allowed to open the object in the specified mode. The Object

Manager returns a handle to the object if the open request succeeds. As described earlier, a

handle is simply an index in a per-process table that has pointers to actual objects. The mode in

which the open request on an object is granted is stored in the handle table along with the object

pointers. Later, when the process tries to access the object using the handle, the Object

Manager ensures that proper access rights are associated with the handle.
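
The handle table, reference counting, and per-handle access checks described in the list above can be put together in one small model. This is a sketch under simplifying assumptions: the class names, the access bits, and the object name are invented, and only handle references are counted (kernel mode pointer references are left out).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

READ, WRITE = 0x1, 0x2                      # illustrative access bits


@dataclass
class KObject:
    name: str
    ref_count: int = 0
    deleted: bool = False

    def dereference(self) -> None:
        self.ref_count -= 1
        if self.ref_count == 0:
            self.deleted = True             # stand-in for freeing the object


class HandleTable:
    """Per-process table: a handle is an index into `entries`."""

    def __init__(self) -> None:
        self.entries: List[Optional[Tuple[KObject, int]]] = []

    def open(self, obj: KObject, granted: int) -> int:
        obj.ref_count += 1                  # one reference per open handle
        self.entries.append((obj, granted)) # object pointer + granted access
        return len(self.entries) - 1

    def reference(self, handle: int, desired: int) -> KObject:
        """Translate handle -> object, enforcing the access granted at open."""
        entry = self.entries[handle]
        if entry is None:
            raise KeyError("invalid handle")
        obj, granted = entry
        if (granted & desired) != desired:
            raise PermissionError("access not granted on this handle")
        return obj

    def close(self, handle: int) -> None:
        entry = self.entries[handle]
        if entry is None:
            raise KeyError("invalid handle")
        self.entries[handle] = None
        entry[0].dereference()


event = KObject("\\BaseNamedObjects\\MyEvent")   # hypothetical object name
table = HandleTable()
h = table.open(event, READ)
```

Storing the granted access next to the object pointer means the expensive security check happens once, at open time; later operations only compare bit masks.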

THE I/O MANAGER

The I/O Manager controls everything related to input and output. It

provides a framework that all the I/O-related modules (device drivers, file systems, Cache

Manager, and network drivers) must adhere to.

§ Device Drivers. Windows NT supports a layered device driver model. The I/O Manager defines

a common interface that all the device drivers need to provide. This ensures that the I/O

Manager can treat all the devices in the same manner. Also, device drivers can be layered, and

a device driver can expect the same interface from the driver sitting below it. A typical example

of layering is the device driver stack to access a hard disk. The lowest-level driver can talk in

terms of sectors, tracks, and sides. There may be a second layer that can deal with hard disk

partitions and provide an interface for dealing with logical block numbers. The third layer can be

a volume manager driver that can club several partitions into volumes. Finally, a file system

driver that provides an interface to the outside world can sit on top of the volume manager.

§ File Systems. File systems are also coded as loadable device drivers under Windows NT.

Consequently, a file system can be stacked on top of a disk device driver. Also, multiple file

systems can be layered in such a manner that each layer adds to the functionality. For example,

a replication file system can be layered on top of a normal disk file system. The replication file

system need not implement the code for on-disk structure modifications.

§ Cache Manager. In her book Inside Windows NT, Helen Custer considers the Cache Manager

part of the I/O Manager, though the Cache Manager does not adhere to the device driver

interface. The Cache Manager is responsible for ensuring faster file read/write response.

Though hard disk speeds are increasing, reading/writing to a hard disk is much slower than

reading/writing to RAM. Hence, most operating systems cache the file data in RAM to satisfy the

read requests without needing to read the actual disk block.

Also, a write request can be satisfied without actually writing to the disk. The actual block write

happens when system activity is low. This technique is called delayed write.

Another technique, called read ahead, improves response time. In this technique, the

operating system guesses the disk blocks that will be read in the future, depending on the

access patterns. These blocks are read even before they are requested. The Cache Manager

uses the memory mapping features of the Virtual Memory Manager to implement caching.

§ Network Drivers. The network drivers have an interface standard different from regular device

drivers. The network card drivers stick to the network driver interface specification (NDIS)

standard. The drivers providing transport level interface are layered above the network card

drivers and provide transport driver interface (TDI).
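
The hard disk example of driver layering can be modelled as a chain of objects that all expose the same interface. This is a sketch only: the class names, the single `read` method, the 63-sector geometry, and the file table are assumptions for the example and do not reflect the real driver interface.

```python
from typing import Optional


class Driver:
    """Every layer exposes the same interface: read(block) -> str."""

    def __init__(self, lower: Optional["Driver"] = None) -> None:
        self.lower = lower

    def read(self, block: int) -> str:
        raise NotImplementedError


class DiskDriver(Driver):
    SECTORS_PER_TRACK = 63                  # illustrative geometry

    def read(self, block: int) -> str:
        # Lowest layer: talks in terms of tracks and sectors.
        track, sector = divmod(block, self.SECTORS_PER_TRACK)
        return f"track {track}, sector {sector}"


class PartitionDriver(Driver):
    def __init__(self, lower: Driver, first_block: int) -> None:
        super().__init__(lower)
        self.first_block = first_block

    def read(self, block: int) -> str:
        # Translate a partition-relative block to an absolute disk block.
        return self.lower.read(self.first_block + block)


class FileSystemDriver(Driver):
    def __init__(self, lower: Driver) -> None:
        super().__init__(lower)
        self.files = {"boot.ini": 5}        # file name -> logical block

    def read(self, block: int) -> str:
        return self.lower.read(block)

    def read_file(self, name: str) -> str:
        # Top layer: the outside world asks for files, not blocks.
        return self.read(self.files[name])


stack = FileSystemDriver(PartitionDriver(DiskDriver(), first_block=126))
location = stack.read_file("boot.ini")
```

Because each layer only calls `read` on the layer below, another layer (for example a volume manager) could be slotted in between without changing the file system or the disk driver.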

THE SECURITY REFERENCE MONITOR

The Security Reference Monitor is responsible for

validating a process's access permissions against the security descriptor of an object. The

Object Manager uses the services of the Security Reference Monitor while validating a

process's request to access any object.

THE VIRTUAL MEMORY MANAGER

An operating system performs two essential tasks:

1. It provides a virtual machine, which is easy to program, on top of raw hardware, which is

cumbersome to program. For example, an operating system provides services to access and

manipulate files. Maintaining data in files is much easier than maintaining data on a raw hard

disk.

2. It allows the applications to share the hardware in a transparent way. For example, an operating

system provides applications with a virtual view of the CPU, where the CPU is exclusively

allotted to the application. In reality, the CPU is shared by various applications, and the

operating system acts as an arbitrator.

These two tasks are performed by the Virtual Memory Manager component of the operating

system when it comes to the hardware memory. Modern microprocessors need an intricate

data structure setup (for example, the segment table setup or the page table setup) for

accessing the memory. The Virtual Memory Manager performs this task for you, which makes

life easier. Furthermore, the Virtual Memory Manager enables the applications to share the

physical memory transparently. It presents each application with a virtual address space where

the entire address space is owned by the application.

The virtual memory concept is one of the key concepts in modern operating systems. The idea

behind it is as follows. If the operating system loads the entire program into memory while

executing it, the size of the program is severely constrained by the size of physical memory. A

very straightforward solution to the problem is not to load the entire program in memory at one

time, but to load portions of it as and when required. A fact that supports this solution is the

locality of reference phenomenon.

Note: A process accesses only a small number of adjacent memory locations, if one considers a

small time frame. This is even more pronounced because of the presence of looping constructs. In

other words, the access is localized to a small number of memory pages, which is the reason it is

called locality of reference.

The operating system needs to keep only the working set of a process in memory. The rest of

the address space of the process is supported by the swap space on the secondary storage.

The Virtual Memory Manager is responsible for bringing in the pages from the secondary

storage to the main memory in case the process accesses a paged-out memory location. The

Virtual Memory Manager is also responsible for providing a separate address space for every

process so that no process can hamper the behavior of any other process. The Virtual Memory

Manager is also responsible for providing shared memory support and memory-mapped files.

The Cache Manager uses the memory-mapping interface of the Virtual Memory Manager.

Note: A working set is the set of memory pages that needs to be in memory for a process to

execute without incurring too many page faults. A page fault is the hardware exception received by

the operating system when an attempt is made to access a paged-out memory location.
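
Locality of reference and page faults can be made concrete with a small simulation. The sketch below assumes 4096-byte pages and a least-recently-used replacement policy; both are assumptions chosen for the example, since the text does not say which policy Windows NT uses.

```python
from collections import OrderedDict
from typing import Iterable


def count_page_faults(accesses: Iterable[int], frames: int,
                      page_size: int = 4096) -> int:
    """Simulate demand paging with LRU replacement; return the fault count."""
    resident: "OrderedDict[int, None]" = OrderedDict()   # oldest page first
    faults = 0
    for address in accesses:
        page = address // page_size
        if page in resident:
            resident.move_to_end(page)      # recently used: evict it last
            continue
        faults += 1                         # page fault: bring the page in
        if len(resident) >= frames:
            resident.popitem(last=False)    # evict the least recently used
        resident[page] = None
    return faults


# A loop that touches the same two pages over and over (good locality).
loop = [0, 100, 4096, 4200] * 50
# A scan that touches a new page on every access (poor locality).
scan = [n * 4096 for n in range(200)]

loop_faults = count_page_faults(loop, frames=2)
scan_faults = count_page_faults(scan, frames=2)
```

With only two page frames, the looping program faults once per page and then runs entirely from memory, while the scanning program faults on every access. The two pages of the loop are its working set.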

THE PROCESS MANAGER

The Process Manager is responsible for creating processes and

threads. Windows NT makes a very clear distinction between processes and threads. A

process is composed of the memory space along with various objects (such as files, mutexes,

and others) opened by the process and the threads running in the process. A thread is simply

an execution context–that is, the CPU state (especially the register contents). A process has

one or more threads running in it.

THE LOCAL PROCEDURE CALL FACILITY The local procedure call (LPC) facility is

designed specifically for communication with the subsystems. LPC is based on remote procedure call

(RPC), which is the de facto Unix standard for communication between processes running on

two different machines. LPC has been optimized for communication between processes

running on the same machine. As discussed earlier, the LPC facility is used as the

communication mechanism between the subsystems and their client processes. A client

thread invokes LPC when it needs some service from the subsystem. The LPC mechanism

passes on the parameters for the service invocation to the server thread. The server thread

executes the service and passes the results back to the client thread using the LPC facility.

WIN32K.SYS: A Core Architecture Modification

In Windows NT 3.51, the KERNEL32.DLL calls are translated to system calls via NTDLL.DLL, while

the GDI and user calls are passed on to the Win32 subsystem process. Windows NT 4.0 has

maintained more or less the same architecture as Version 3.51. However, there is a major

modification in the core architecture (apart from the completely revamped GUI).

In Windows NT 4.0, Microsoft moved the entire Win32 subsystem to the kernel space in an attempt

to improve performance. A new device driver, WIN32K.SYS, implements the Win32 API, and API

calls are translated as system calls instead of LPCs. These system calls invoke the functions in the

new WIN32K.SYS driver. Moving the services out of the subsystem process avoids the context

switches required to process a service request. In Windows NT 3.51, each call to the Win32

subsystem involves two context switches: one from the client thread to the subsystem thread, and

the second from the subsystem thread back to the client thread. Windows 2000 also continues with

the kernel implementation of the Win32 subsystem.

As you will see in Chapter 8, in Windows NT 3.51 the Win32 subsystem uses quick LPC, which is

supposed to be much faster than regular LPC. Still, two context switches per GDI/user call is quite

a bit of overhead. In Windows NT 4.0 and Windows 2000, the GDI/user calls are processed by the

kernel mode driver in the context of the calling thread, thus avoiding the context switching

overheads.

THE SYSTEM CALL INTERFACE The system call interface is a very thin layer whose only job

is to direct the system call requests from the user mode processes to appropriate functions in

the Windows NT core. Though the layer is quite thin, it is very important because it is the

face of the core (kernel mode) component of Windows NT that the outside user-mode world

sees. The system call interface defines the services offered by the core.

The key job of the system call interface is to change the processor mode from user mode

to privileged mode. On Intel platforms, this can be achieved through software interrupts.

Windows NT uses the software interrupt 2Eh to implement the system call interface. The

handling routine for interrupt 2Eh passes on the control to the appropriate routine in the core

component, depending on the requested system service ID. NTDLL.DLL is the user mode

component of the system call interface. The user mode programs call NTDLL.DLL functions

(through KERNEL32.DLL functions). The NTDLL.DLL functions are stub routines that set up

appropriate parameters and trigger interrupt 2Eh. The stub functions in NTDLL.DLL also pass

the system service ID to the interrupt 2Eh handler. The interrupt handler uses the service ID

as an index into the system call table to get to the core function that fulfills the requested system service. The

interrupt handler calls this core function after copying the required parameters from the user

mode stack to the kernel mode stack.
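
The dispatch path just described can be sketched as a small user-mode simulation in portable C. Nothing here is the actual Windows NT implementation: the table layout, the two services, and every name (ServiceEntry, sim_trap_handler, stub_add, stub_negate) are hypothetical, and an ordinary function call stands in for interrupt 2Eh. The sketch shows only the three steps from the text: a stub passes a service ID and a parameter block, the handler uses the ID as an index into a table, and the parameters are copied to a separate stack before the core function runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 4u   /* hypothetical upper bound on parameters per service */

typedef long (*ServiceRoutine)(const long *args);

/* One entry per system service: the core function and its parameter count. */
typedef struct {
    ServiceRoutine routine;
    size_t         arg_count;
} ServiceEntry;

/* Two made-up "core" services. */
long svc_add(const long *args)    { return args[0] + args[1]; }
long svc_negate(const long *args) { return -args[0]; }

/* The simulated system call table, indexed by service ID. */
const ServiceEntry service_table[] = {
    { svc_add,    2 },   /* service ID 0 */
    { svc_negate, 1 }    /* service ID 1 */
};
#define SERVICE_COUNT (sizeof service_table / sizeof service_table[0])

/* Stand-in for the interrupt 2Eh handler.
   Returns 0 on success, -1 if the service ID is out of range. */
int sim_trap_handler(unsigned long service_id, const long *user_stack,
                     long *result)
{
    long kernel_stack[MAX_ARGS];
    const ServiceEntry *entry;

    if (service_id >= SERVICE_COUNT) {
        return -1;
    }
    entry = &service_table[service_id];
    assert(entry->arg_count <= MAX_ARGS);

    /* Copy the parameters from the caller's stack to the handler's stack. */
    memcpy(kernel_stack, user_stack, entry->arg_count * sizeof(long));
    *result = entry->routine(kernel_stack);
    return 0;
}

/* Stand-ins for NTDLL.DLL stubs: set up parameters, pass the service ID. */
long stub_add(long a, long b)
{
    long args[2];
    long result = 0;

    args[0] = a;
    args[1] = b;
    sim_trap_handler(0, args, &result);
    return result;
}

long stub_negate(long value)
{
    long result = 0;

    sim_trap_handler(1, &value, &result);
    return result;
}
```

A caller simply invokes stub_add(2, 3) and never sees the table. Keeping the parameter count in the table is one plausible way for a handler to know how many parameters to copy for each service.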

SUMMARY

In this chapter, we discussed the overall architecture of Windows NT. Windows NT architecture

is robust in the areas of portability, extensibility, compatibility, and maintainability. Features

such as security, symmetric multiprocessor support, and international language support

position the Windows NT operating system on the high end of the scale compared to Windows

95.

The subsystems that run in user mode and the Windows NT core that runs in kernel mode

make up the operating system environment. The Win32 subsystem is the most important of

the environment subsystems. The Win32 subsystem comprises the client-side DLLs and the

CSRSS process. The Win32 subsystem implements the Win32 API atop the native services

provided by the Windows NT core.

The Windows NT core comprises the hardware abstraction layer (HAL), the kernel, the

Windows NT executive, and the system call interface. The NT executive, which forms a major

portion of the NT core, consists of the Object Manager, the I/O Manager, the Security

Reference Monitor, the Virtual Memory Manager, the Process Manager, and the local

procedure call (LPC) facility.

The chapters that follow cover the main components of the Windows NT operating system in

detail.

Chapter 2

Writing Windows NT Device Drivers

Abstract

This chapter covers the software requirements for building Windows NT device drivers, the

procedure for building device drivers, and the structure of a typical device driver.

MOST OF THE SAMPLES IN this book are Windows NT kernel mode device drivers. This chapter

contains the information you need to build device drivers and understand the samples in this

book. This chapter is not a complete guide to writing device drivers. The best sources of

information for detailed coverage of the topic are Art Baker’s The Windows NT Device Driver

Book: A Guide for Programmers and the documentation that ships with the Windows NT

Device Driver Kit (DDK).

PREREQUISITES TO WRITING NT DEVICE DRIVERS

You must install the following tools to create a working development environment for Windows

NT kernel mode device drivers:

Windows NT Device Driver Kit (DDK) from Microsoft For the development of device drivers,

you need to install the Device Driver Kit on your machine. The Device Driver Kit is available

with the MSDN Level 2 subscription. The kit consists of sets of header files, libraries, and tools

that enable easy development of device drivers.

32-bit compiler You need a 32-bit compiler to compile the device drivers. We strongly

recommend using the Microsoft compiler to build the samples in this book.

Win32 Software Development Kit (SDK) Although it is not necessary for compiling the samples

from this book, we recommend installing the latest version of the Win32 SDK on your machine.

Also, when you build device drivers using the DDK tools, you should set the environment

variable MSTOOLS to point to the location where the Win32 SDK is installed. You can fake the

installation of the Win32 SDK by adding the environment variable MSTOOLS with the System

applet in the Control Panel.

DRIVER BUILD PROCEDURE

The Windows NT 4.0 Device Driver Kit installation adds four shortcuts to the Start menu: Free

Build Environment, Checked Build Environment, DDK Help, and Getting Started. The Free

Build Environment and Checked Build Environment shortcuts both refer to a batch file called

SETENV.BAT, but have different command line arguments. Assuming that the DDK is installed

in directory E:\DDK40, the Free Build Environment shortcut refers to this command line:

%SystemRoot%\System32\cmd.exe /k E:\DDK40\bin\setenv.bat E:\DDK40 free

The Checked Build Environment shortcut, on the other hand, refers to this command line:

%SystemRoot%\System32\cmd.exe /k E:\DDK40\bin\setenv.bat E:\DDK40 checked

Both shortcuts spawn CMD.EXE and ask it to execute the SETENV.BAT file with appropriate

parameters. After executing the command, CMD.EXE still keeps running because of the

presence of the /k switch. The SETENV.BAT file sets the environment variables, which are

added to the CMD.EXE process’s environment variable list. The DDK tools, which are

spawned from CMD.EXE, refer to these environment variables. SETENV.BAT sets the

environment variables, including BUILD_DEFAULT, BUILD_DEFAULT_TARGETS,

BUILD_MAKE_PROGRAM, and DDKBUILDENV.

The drivers are compiled using the utility called BUILD.EXE, which is shipped with the DDK.

This utility takes as input a file named SOURCES. This file contains the list of source files to be

compiled to build the driver. This file also contains the name of the target executable, the type

of the target executable (for example, DRIVER or PROGRAM), and the path of the directory

where the target executable is to be created.

Each sample device driver included with the DDK contains a makefile. However, this is not the

actual makefile for the device driver sample. Instead, the makefile for each sample device

driver includes a common makefile, named MAKEFILE.DEF, which is present in the INC

directory of the DDK installation directory.

Here is the makefile from a DDK sample:

#

# DO NOT EDIT THIS FILE!!! Edit .\sources. if you want to add a new source

# file to this component. This file merely indirects to the real make file

# that is shared by all the driver components of the Windows NT DDK

#

!INCLUDE $(NTMAKEENV)\makefile.def

Some of the driver samples in this book have Assembly language files (.ASM files). You

cannot refer to an .ASM file directly in the SOURCES file. Instead, you have to create a

directory called I386 in the directory where the source files for the driver are kept. All

the .ASM files for the driver must be kept in the I386 directory. The BUILD.EXE utility

automatically uses ML.EXE to assemble these .ASM files.

BUILD.EXE generates the appropriate driver or application based on the settings specified in

the SOURCES file and using the platform-dependent environment variables. If there are any

errors during the BUILD process, the errors are logged to a file called BUILD.ERR. If there

are any warnings, they are logged to the BUILD.WRN file. Also, the BUILD utility generates a

file called BUILD.LOG, which contains lists of commands invoked by the BUILD utility and the

messages given by these tools.

STRUCTURE OF A DEVICE DRIVER

Just as every Win32 application has an entry point (main/WinMain), every kernel mode device

driver has an entry point called DriverEntry. A special process called SYSTEM loads the device

drivers. Hence, the DriverEntry of each device driver is called in the context of the SYSTEM

process. Each device driver is represented by a device name in the system, so each driver has

to create a device name for its device. This is done with the IoCreateDevice function. If Win32

applications need to open the handle to a device driver, the driver needs to create a symbolic

link for its device in the DosDevices object directory. This is done using a call to

IoCreateSymbolicLink. Typically, in the DriverEntry routine of a device driver, the device object

and the symbolic link object are created for a device and some driver or device-specific

initialization is performed.

Most of the device driver samples in this book involve pseudo device drivers. These drivers do

not control any physical device. Instead, they complete tasks that can be performed only from

the device driver. (The device driver runs at the most privileged mode of the processor–Ring 0

in Intel processors.) In addition, the DriverEntry is supposed to provide sets of entry points for

other functions, such as OPEN, CLOSE, DEVICEIOCONTROL, and so on. These entry points

are provided by filling in some fields in the driver object, which is passed as a parameter to

the DriverEntry function.

Because most of the drivers in this book are pseudo device drivers, the DriverEntry routine is

the same for all of them. Only the device driver-specific initialization is different. Instead of

repeating the same piece of code in each of the driver samples, a macro is written. The macro

is called MYDRIVERENTRY:

/* DriverName must be a wide string literal such as L"MyDriver"; the compiler
   concatenates adjacent string literals. DriverSpecificInit is expanded as
   written, and its value is assigned to ntStatus. */
#define MYDRIVERENTRY(DriverName, DeviceId, DriverSpecificInit)            \
    PDEVICE_OBJECT deviceObject = NULL;                                    \
    NTSTATUS ntStatus;                                                     \
    WCHAR deviceNameBuffer[] = L"\\Device\\" DriverName;                   \
    UNICODE_STRING deviceNameUnicodeString;                                \
    WCHAR deviceLinkBuffer[] = L"\\DosDevices\\" DriverName;               \
    UNICODE_STRING deviceLinkUnicodeString;                                \
    RtlInitUnicodeString(&deviceNameUnicodeString, deviceNameBuffer);      \
    ntStatus = IoCreateDevice(DriverObject,                                \
                              0,                                           \
                              &deviceNameUnicodeString,                    \
                              DeviceId,                                    \
                              0,                                           \
                              TRUE,                                        \
                              &deviceObject);                              \
    if (NT_SUCCESS(ntStatus)) {                                            \
        RtlInitUnicodeString(&deviceLinkUnicodeString, deviceLinkBuffer);  \
        ntStatus = IoCreateSymbolicLink(&deviceLinkUnicodeString,          \
                                        &deviceNameUnicodeString);         \
        if (!NT_SUCCESS(ntStatus)) {                                       \
            IoDeleteDevice(deviceObject);                                  \
            return ntStatus;                                               \
        }                                                                  \
        ntStatus = DriverSpecificInit;                                     \
        if (!NT_SUCCESS(ntStatus)) {                                       \
            IoDeleteDevice(deviceObject);                                  \
            IoDeleteSymbolicLink(&deviceLinkUnicodeString);                \
            return ntStatus;                                               \
        }                                                                  \
        DriverObject->MajorFunction[IRP_MJ_CREATE] =                       \
        DriverObject->MajorFunction[IRP_MJ_CLOSE] =                        \
        DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] =               \
            DriverDispatch;                                                \
        DriverObject->DriverUnload = DriverUnload;                         \
        return STATUS_SUCCESS;                                             \
    } else {                                                               \
        return ntStatus;                                                   \
    }

The macro takes the following three parameters:

§ The first parameter is the name of the driver, which will be used for creating the device name

and symbolic link.

§ The second parameter is the device ID, which uniquely identifies the device.

§ The third parameter is the driver-specific initialization, written as a call to the function that performs it; the macro assigns its result to ntStatus.

The macro expands into calling the necessary functions such as IoCreateDevice and

IoCreateSymbolicLink. If these functions succeed, the driver calls the driver-specific

initialization function specified by the third parameter. If the function fails, the macro

returns the error code of the specific initialization function. If the function succeeds, the macro

fills in various function pointers for other functions supported by the driver in the DriverObject.

Once this macro is used in the DriverEntry function, you need to write the DriverDispatch and

DriverUnload functions, as the macro refers to these functions.

The macro definition can be found in UNDOCNT.H on the included CD-ROM.

All requests to a device driver are sent in the form of an I/O request packet (IRP). The driver

expects the system to call the specific driver function for all device driver requests based on

the function pointers filled in during DriverEntry. We assume that all the driver functions are

filled in with the address of the DriverDispatch function in the following discussion.

The DriverDispatch function is called with an IRP containing the command code of

IRP_MJ_CREATE whenever an application opens a handle to a device driver using the

CreateFile API call. The DriverDispatch function is called with an IRP containing the command

code of IRP_MJ_CLOSE whenever an application closes its handle to a device driver using

the CloseHandle API function. The DriverDispatch function is called with an IRP containing the

command code of IRP_MJ_DEVICE_CONTROL whenever the application uses the

DeviceIoControl API function to send or receive data from a device driver. If the driver

functionality is being used by multiple processes, the driver can use the CREATE and CLOSE

entry points to perform per-process initialization.

Because all these requests end up calling DriverDispatch, you need to have a way to identify

the actual function requested. You can accomplish this by looking at the MajorFunction field of

the current I/O stack location of the I/O request packet (IRP). The request packet contains the function code and any other

additional parameters required to complete the request. The DriverUnload routine is called

when the device driver is unloaded from the system. Just like DriverEntry, the DriverUnload

function is called in the context of the SYSTEM process. Typically, in a DriverUnload routine,

the device driver deletes the symbolic link and the device name created during DriverEntry and

performs some device-specific uninitialization.
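
The shape of such a dispatch routine can be sketched in portable C with stand-in types, so that it compiles without the DDK. Everything prefixed with Sim or SIM_ is hypothetical, including the per-driver state and the status values. Only the three major function codes mirror the values of the DDK constants IRP_MJ_CREATE, IRP_MJ_CLOSE, and IRP_MJ_DEVICE_CONTROL. A real driver would instead receive a PDEVICE_OBJECT and a PIRP, read the function code from the stack location returned by IoGetCurrentIrpStackLocation, and complete the request with IoCompleteRequest.

```c
#include <assert.h>

/* Same numeric values as the corresponding DDK IRP_MJ_* constants. */
enum {
    SIM_IRP_MJ_CREATE         = 0x00,
    SIM_IRP_MJ_CLOSE          = 0x02,
    SIM_IRP_MJ_DEVICE_CONTROL = 0x0e
};

/* Placeholder status values for this sketch, not real NTSTATUS codes. */
#define SIM_STATUS_SUCCESS          0L
#define SIM_STATUS_INVALID_REQUEST  (-1L)

/* Stand-in for the part of an IRP that the dispatch routine looks at. */
typedef struct {
    unsigned char major_function;
    unsigned long ioctl_code;      /* meaningful only for DEVICE_CONTROL */
} SimRequest;

/* Hypothetical per-driver bookkeeping. */
typedef struct {
    int           open_handles;
    unsigned long last_ioctl;
} SimDriverState;

/* One routine serves every request type and branches on the function code. */
long sim_driver_dispatch(SimDriverState *state, const SimRequest *request)
{
    switch (request->major_function) {
    case SIM_IRP_MJ_CREATE:            /* application called CreateFile */
        state->open_handles++;
        return SIM_STATUS_SUCCESS;

    case SIM_IRP_MJ_CLOSE:             /* application called CloseHandle */
        assert(state->open_handles > 0);
        state->open_handles--;
        return SIM_STATUS_SUCCESS;

    case SIM_IRP_MJ_DEVICE_CONTROL:    /* application called DeviceIoControl */
        state->last_ioctl = request->ioctl_code;
        return SIM_STATUS_SUCCESS;

    default:                           /* function code the driver did not register */
        return SIM_STATUS_INVALID_REQUEST;
    }
}
```

The CREATE and CLOSE branches are where a driver used by several processes would place its per-process initialization and cleanup.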

SUMMARY

In this chapter, we covered the software requirements for building Windows NT device drivers,

the procedure for building device drivers, and the structure of a typical device driver. Along the

way, we explained a simple macro that you can use to generate the driver entry code for a

typical device driver.

Chapter 3

Win32 Implementations: A Comparative Look

Abstract

This chapter covers the Win32 implementation on Windows 95/98 and Windows NT. The

authors discuss the differences between these two implementations with respect to address

space, process startup, toolhelp functions, multitasking, thunking, device drivers, security, and

API calls.

EACH OPERATING SYSTEM provides sets of services–referred to as an application programming

interface (API)–to developers in some form or another. The developers write software

applications using this API. For example, DOS provides this interface in the form of the famous

INT 21h interface. Microsoft’s newer 32-bit operating systems, such as Windows 95 and

Windows NT, provide the interface in the form of the Win32 API.

Presently, there are four Win32 API implementations available from Microsoft:

§ Windows 95/98

§ Windows NT

§ Win32S

§ Windows CE

Of these, Win32S is very limited due to bugs and the restrictions of the underlying operating

system. Presently, Win32 API implementations on Windows 95/98 and Windows NT are very

popular among developers. Windows CE is meant for palmtop computers. The Win32 API was

first implemented on the Windows NT operating system. Later, the same API was made

available in Windows 95. Ideally, an application written using the standard Win32 API should

work on any operating system that supports the Win32 API implementation. (However, this is

not necessarily true due to the differences between the implementations.) The Win32 API

should hide all the details of the underlying implementations and provide a consistent view to

the outside world.

In this chapter, we focus on the differences between the implementations of the Win32 API

under Windows NT and Windows 95. As developers, you should be aware of these differences

while you develop applications that can run on both of these operating systems.

WIN32 API IMPLEMENTATION ON WINDOWS 95

The Win32 API is provided in the form of the famous trio of the KERNEL32, USER32, and

GDI32 dynamic link libraries (DLLs). However, in most cases, these DLLs are just wrappers

that use generic thunking to call the 16-bit functions.

Note: Generic thunking is a way of calling 16-bit functions from a 32-bit application. (More on

thunking later in this chapter.)

The major design goal for Windows 95 was backward compatibility. Hence, instead of porting

all the 16-bit functions to 32-bit, Microsoft decided to reuse the existing 16-bit code (from the

Windows 3.x operating system) by wrapping it in 32-bit code. This 32-bit code would in turn

call the 16-bit functions. This was a good approach because the tried-and-true 16-bit code was

already running on millions of machines all over the world. In this Win32 API implementation,

most of the functions from KERNEL32 thunk down to KRNL386, USER32 thunks down to

USER.EXE, and GDI32 thunks down to GDI.EXE.

WIN32 API IMPLEMENTATION ON WINDOWS NT

On Windows NT also, the Win32 API is provided in the form of the famous trio of the

KERNEL32, USER32, and GDI32 DLLs. However, this implementation is done completely

from scratch without using any existing 16-bit code, so it is purely a 32-bit implementation of

Win32 API. Even 16-bit applications end up calling this 32-bit API. Windows NT’s 16-bit

subsystem uses universal thunking to achieve this.

Note: Universal thunking is a way of calling 32-bit functions from 16-bit applications. (More on

thunking later in this chapter.)

KRNL386.EXE, USER.EXE, and GDI.EXE, which are used to support 16-bit applications,

thunk up to KERNEL32, USER32, and GDI32 through the WOW (Windows on Windows) layer.

Most of the functions provided by KERNEL32.DLL call one or more native system services to

do the actual work. The native system services are available through a DLL called

NTDLL.DLL.

XREF: All these system services are discussed in Chapter 6.

As far as USER32 and GDI32 are concerned, the implementation differs in NT versions 3.51

and later versions. Under Windows NT 3.51, a separate subsystem process implements the

USER32 and GDI32 calls. The DLLs USER32 and GDI32 contain stubs, which pass the

function parameters to the Win32 subsystem (CSRSS.EXE) and get the results back. The

communication between the client application and the Win32 subsystem is achieved by using

the local procedure call facility provided by the NT executive.

XREF: Chapter 8 covers the details of the local procedure call (LPC) mechanism.

Under Windows NT 4.0 and Windows 2000, USER32 and GDI32 call the system services

provided by a kernel-mode device driver called WIN32K.SYS. USER32 and GDI32 contain

stubs that call these system services using the 2Eh interrupt. Hence, most of the functionality

of the Win32 Subsystem process (CSRSS.EXE) is taken over by the kernel-mode driver

(WIN32K.SYS). The CSRSS process still exists in NT 4.0 and Windows 2000–however, its role

is limited to mainly supporting Console I/O.

It is interesting to note that the Win32 API completely hides NTDLL.DLL from the developer.

Actually, most of the functions provided by the Win32 API ultimately call one or more system

services. This system service layer is very powerful and often contains functions that do

not have equivalent Win32 API functions. Most of the Windows NT Resource Kit utilities link to

this DLL implicitly.

WIN32 IMPLEMENTATION DIFFERENCES

Now we will consider a few aspects of the Win32 API implementation on Windows NT and

Windows 95 that might affect the way developers program using this so-called standard Win32

API.

Address Space

Both Windows 95 and Windows NT deal with flat, 32-bit linear addresses that give 4GB of

virtual address space. Of this, the upper 2GB (hereafter referred to as the shared address

space) is reserved for operating system use, and the lower 2GB (hereafter referred to as the

private address space) is used by the running process. Each process has its own private

address space. Although the virtual addresses in the private address

space of all processes are the same, they may point to different physical pages. The addresses

in the shared address space of all the processes point to the same physical page.
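
The 2GB split and its consequence can be illustrated with a toy model in portable C. This is only a sketch under stated assumptions: real translation goes through per-process page tables, whereas here a made-up formula produces frame numbers, and the names (Region, SimProcess, classify_address, sim_frame_for) are hypothetical. The boundary value 0x80000000 follows from the 2GB/2GB division described above.

```c
#include <stdint.h>

#define SHARED_SPACE_BASE 0x80000000u   /* start of the upper 2GB */
#define PAGE_SHIFT        12            /* 4KB pages */
#define PRIVATE_PAGES     (SHARED_SPACE_BASE >> PAGE_SHIFT)

typedef enum { REGION_PRIVATE, REGION_SHARED } Region;

/* Hypothetical process descriptor; id must be 1 or greater. */
typedef struct {
    uint32_t id;
} SimProcess;

/* Lower 2GB is private to the process, upper 2GB is shared. */
Region classify_address(uint32_t virtual_address)
{
    return virtual_address >= SHARED_SPACE_BASE ? REGION_SHARED
                                                : REGION_PRIVATE;
}

/* Toy translation that returns a made-up physical frame number.
   Shared pages get the same frame in every process; private pages get a
   frame from a range that belongs to one process only. */
uint32_t sim_frame_for(const SimProcess *process, uint32_t virtual_address)
{
    uint32_t page = virtual_address >> PAGE_SHIFT;

    if (classify_address(virtual_address) == REGION_SHARED) {
        return page - PRIVATE_PAGES;            /* frames 0 .. 0x7FFFF */
    }
    return process->id * PRIVATE_PAGES + page;  /* per-process frame range */
}
```

In this model, address 0x00400000 resolves to a different frame in each process, while address 0x80001000 resolves to the same frame everywhere.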

Under Windows 95/98, the operating system DLLs, such as KERNEL32, USER32, and GDI32,

reside in the shared address space, whereas in Windows NT these DLLs are loaded in the

process’s private address space. Hence, under Windows 95/98, it is possible for one

application to interfere with the working of another application. For example, one application

can accidentally overwrite memory areas occupied by these DLLs and affect the working of all

the other processes.

Note: Although the shared address space is protected at the page table level, a kernel-mode

component (for example, a VXD) is able to write at any location in 4GB address space.

In addition, under Windows 95/98, it is possible to load a dynamic link library in the shared

address space. These DLLs will have the same problem described previously if the DLL is

used by multiple applications in the system.

Windows NT loads all the system DLLs, such as KERNEL32, USER32, and GDI32, in the

private address space. As a result, it is never possible for one application to interfere with the

other applications in the system without intending to do so. If one application accidentally

overwrites these DLLs, it will affect only that application. Other applications will continue to run

without any problems.

Memory-mapped files are loaded in the shared address space under Windows 95/98, whereas

they are loaded in the private address space in Windows NT. In Windows 95/98, it is possible

for one application to create and map a memory-mapped file, pass its address to another

application, and have the other application use this address to share memory. This is not

possible under Windows NT. You have to explicitly create and map a named memory-mapped

file in one application and open and map the memory-mapped file in another application in

order to share it.

The address space differences have strong impacts on global API hooking. The topic of global

API hooking has been covered many times in different articles and books. There is still no

common API hooking solution for both Windows NT and Windows 95/98. The basic problem

with global API hooking is that under Windows 95/98, it is possible to load a DLL in shared

memory. Also, all the system DLLs reside in shared memory. Hooking an API call amounts to

patching the few instructions at the start of the function and routing them to a function in a shared

DLL using a simple JMP instruction. This does not work under Windows NT because if you

patch the bytes at the start of the function, they will be patched only in your address space as

the function resides in the private address space.
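
The patch bytes themselves are easy to compute. The following sketch, in portable C, builds the 5-byte near JMP used on 32-bit Intel processors: opcode E9 followed by a 32-bit little-endian displacement measured from the end of the JMP instruction. It only fills an ordinary buffer and does not touch any process; the function names and the addresses used with them are hypothetical.

```c
#include <stdint.h>

#define JMP_REL32_OPCODE 0xE9u
#define JMP_REL32_LENGTH 5u

/* Build "JMP hook_address" as it would be encoded at patch_address. */
void encode_jmp_rel32(uint8_t patch[5], uint32_t patch_address,
                      uint32_t hook_address)
{
    /* The displacement is relative to the instruction after the JMP.
       Unsigned wraparound yields the correct value for backward jumps. */
    uint32_t displacement = hook_address - (patch_address + JMP_REL32_LENGTH);

    patch[0] = JMP_REL32_OPCODE;
    patch[1] = (uint8_t)(displacement & 0xFFu);
    patch[2] = (uint8_t)((displacement >> 8) & 0xFFu);
    patch[3] = (uint8_t)((displacement >> 16) & 0xFFu);
    patch[4] = (uint8_t)((displacement >> 24) & 0xFFu);
}

/* Recover the jump target from the encoded bytes, for checking. */
uint32_t decode_jmp_target(const uint8_t patch[5], uint32_t patch_address)
{
    uint32_t displacement = (uint32_t)patch[1]
                          | ((uint32_t)patch[2] << 8)
                          | ((uint32_t)patch[3] << 16)
                          | ((uint32_t)patch[4] << 24);

    return patch_address + JMP_REL32_LENGTH + displacement;
}
```

For example, routing a function at 0x1000 to a hook at 0x2000 yields the bytes E9 FB 0F 00 00. Because the displacement is relative, the bytes are valid only at that one address, and the hook function must be mapped at the target address in every process that executes the patched code.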

To do any kind of global API hooking under Windows NT, you have to make sure that the

hooking is performed in each of the running processes. For this, you need to play with the

address space of other processes. In addition, the same hooking also needs to be done in

newly started processes. Windows NT provides a way to automatically load a particular DLL in

each process through the AppInit_DLLs registry value.

Process Startup

There are several differences in the way the process is started under Windows 95/98 and

Windows NT. Although the same CreateProcess API call is used in Windows 95/98 and

Windows NT, the implementation is quite different. In this chapter, we are looking only at an

example of a CreateProcess API call. Ideally, both of the CreateProcess implementations

should give the same view to the outside world. When somebody says that a particular API call

is standard, this means that given a specific set of parameters to a function, the function

should behave exactly the same on all the implementations of this API call. In addition, the

function should return the same error codes based on the type of error.

Consider a simple problem such as detecting the successful start of an application. If you try to

spawn a program that has some startup problem (for example, implicitly linked DLLs are

missing), it should return an appropriate error code. The Windows 95/98 implementation

returns an appropriate error code such as STATUS_DLL_NOT_FOUND, whereas Windows

NT does not return any error. Windows NT’s implementation will return an error only if the file

spawned is not present at the expected location. This happens mainly because of the way the

CreateProcess call is implemented under Windows NT and Windows 95/98. When you spawn

a process in Windows 95/98, the complete loading and startup of the process is performed as

part of the CreateProcess call itself. That is, when the CreateProcess call returns, the

spawned process is already running.

It is interesting to see Windows NT’s implementation of the CreateProcess call. Windows NT’s

CreateProcess calls the native system service (NtCreateProcess) to create a process object.

As part of this call, NTDLL.DLL is mapped in the process’s address space. Then, the

CreateProcess API calls the native system service to create the primary thread in the process

(NtCreateThread). The implicitly linked DLL loading does not happen as part of the

CreateProcess API call. Instead, the primary thread of the process starts at a function in

NTDLL.DLL. This function in turn loads the implicitly linked DLLs. As a result, there is no way

for the caller to know whether the process has started properly or not. Of course, for GUI

applications, you can use WaitForInputIdle to synchronize with the startup of a process.

However, for non-GUI applications, there is no standard way to achieve this.

Toolhelp Functions

Win32 implementation on Windows 95/98 provides some functions that enable you to

enumerate the processes running in the system, module list, and so on. These functions are

provided by KERNEL32.DLL. The functions are CreateToolhelp32Snapshot, Process32First,

Process32Next, and others. These functions are not implemented under Windows NT’s

implementation of KERNEL32. The programs that use these functions implicitly will not start at

all under Windows NT. The Windows NT 4.0 SDK comes with a new DLL called PSAPI.DLL,

which provides the equivalent functionality. The header file for this DLL, PSAPI.H, is also included

with the Windows NT 4.0 SDK. Windows 2000 has this toolhelp functionality built into

KERNEL32.DLL.

Note: A function is implicitly linked if the program calls the function directly by name and includes

the appropriate .LIB file in the project. That is, it does not use GetProcAddress to get the address

of the function.

Multitasking

Both Windows 95 and Windows NT use time slice-based preemptive multitasking. However,

because the Windows 95 implementation of the WIN32 API depends largely on 16-bit code, it

has a few inherent drawbacks. The major one is the Win16Mutex. Because the existing 16-bit

code is not well suited for multitasking, the easiest choice for Microsoft was to ensure that the

16-bit code is not entered from multiple tasks. To achieve this, Microsoft came up with the

Win16Mutex solution.

Before entering the 16-bit code, the operating system acquires the Win16Mutex, and it leaves

the Win16Mutex while returning from 16-bit code. The Win16Mutex is always acquired when a

16-bit application is running, which results in reduced multitasking. Windows NT does not have

this problem because the entire code is 32-bit and is well suited for time-slice-based

preemptive multitasking. Also, the 16-bit code thunks up to 32-bit code in the case of Windows

NT.

Thunking

Thunking enables 16-bit applications to run in a 32-bit environment and vice versa. It is a way

of calling a function written in one bitness from the code running at a different bitness. Bitness

is a property of the processor, and you can program the processor to adjust the bitness.

Bitness decides the way instructions are decoded by the processor. There are two different

types of thunking available:

§ Universal thunking

§ Generic thunking

Universal thunking enables you to call a 32-bit function from 16-bit code, whereas generic

thunking enables you to call a 16-bit function from 32-bit code. Windows 95/98 supports both

generic and universal thunking, but Windows NT supports only universal thunking. As you saw

earlier in this chapter, generic thunking is used extensively in WIN32 API implementation of

Windows 95/98. For example, a 32-bit USER32.DLL calls functions from a 16-bit USER.EXE,

and a 32-bit GDI32.DLL calls functions from a 16-bit GDI.EXE. Various issues are involved in

thunking, such as converting 16:16 far pointers in 16-bit code to flat 32-bit addresses and

manipulating a stack for making a proper call from code running at one bitness to code running

at a different bitness. Microsoft provides tools such as thunk compilers to automate most of

these tasks.

Many vendors who write code for Windows 95/98 use generic thunking to avoid a major

redesign of their applications. For example, say a particular vendor has a product for Windows

3.1 and would like to port it to Windows 95. Instead of rewriting the code for Windows 95, an

easier solution is to use the majority of the existing 16-bit code and use generic thunking as a

way of calling this code from 32-bit applications. However, these applications need to be

rewritten for Windows NT as Windows NT does not support generic thunking.

Device Drivers

Device drivers are trusted components of the operating system that have full access to the

entire hardware. There are no restrictions on what device drivers can do. Each operating

system provides some way of adding new device drivers to the system. The device drivers

need to be written according to the semantics imposed by the operating system. The device

drivers are called virtual device drivers (VXDs) in Windows 95/98, and they are called

kernel-mode device drivers in Windows NT. Windows 95 uses the LE file format for virtual device

drivers, whereas Windows NT uses the PE format. As a result, the applications that use VXDs

cannot be run on Windows NT. They need to be ported to a Windows NT (kernel-mode) device

driver.

XREF: Chapter 2 explains how to write device drivers.

Microsoft has come up with a common driver model, the Windows Driver Model (WDM), in Windows 98 and Windows 2000. At

this point, however, you need to port all the applications that use VXDs to Windows NT by

writing an equivalent kernel-mode driver.

Security

The major WIN32 API implementation difference between Windows 95/98 and Windows NT is

security. Windows 95/98’s implementation does not have any support for security. In all the

Win32 API functions that have SECURITY_ATTRIBUTES as one of the parameters, Windows

95/98’s implementation just ignores these parameters. This has some impact on the way a

developer programs. Registry APIs such as RegSaveKey and RegRestoreKey work fine under

Windows 95/98. However, under Windows NT, you need to do a few things before you can use

these functions. In Windows NT, there is a concept of privileges. There are different kinds of

privileges, such as Shutdown, Backup, and Restore. Before using a function such as

RegSaveKey, you need to acquire the Backup privilege. To use RegRestoreKey, you need to

acquire the Restore privilege, and to use the InitiateSystemShutdown function, you need to

acquire the Shutdown privilege.

Under Windows 95/98, anybody can install a VXD. To install a kernel-mode device driver

under Windows NT, you need administrator privilege for security purposes. As mentioned

previously, device drivers are trusted components of the operating system and have access to

the entire hardware. By requiring privileges to install a device driver, Windows NT restricts the

possibility that a guest account holder will install a device driver, which could potentially bring

the whole system to its knees.

Newly Added API Calls

With each version of Windows NT, new APIs are being added to the WIN32 API set. Most of

these APIs do not have an equivalent API under Windows 95/98. Also, there are a few APIs,

such as CreateRemoteThread, that do not have a real implementation under Windows 95/98.

Under Windows 95/98, this function returns ERROR_CALL_NOT_IMPLEMENTED. As a result,

there will always be a few API calls that are not available on Windows 95/98 or are not

implemented on Windows 95/98. At this point, one can only hope that Microsoft will implement

the API in Windows 95/98 when they add a new API to Windows NT unless the API is

architecture dependent.

SUMMARY

This chapter covered the WIN32 API implementation on Windows 95/98 and Windows NT. We

discussed the differences between these two implementations with respect to address space,

process startup, toolhelp functions, multitasking, thunking, device drivers, security, and newly

added API calls.

Chapter 4

Memory Management

Abstract

This chapter examines memory models in Microsoft operating systems, examines how

Windows NT uses features of the 80386 processor's architecture, and explores the function of

virtual memory.

MEMORY MANAGEMENT HAS ALWAYS been one of the most important and interesting aspects of any

operating system for serious developers. It is an aspect that kernel developers cannot ignore. Memory

management, in essence, provides a thumbnail impression of any operating system.

Microsoft has introduced major changes in the memory management of each new operating

system they have produced. Microsoft had to make these changes because they developed all

of their operating systems for Intel microprocessors, and Intel introduced major changes in

memory management support with each new microprocessor they introduced. This chapter is

a journey through the various Intel microprocessors and the memory management changes

each one brought along with it in the operating system that used it.

MEMORY MODELS IN MICROSOFT OPERATING SYSTEMS

Early PCs based on Intel 8086/8088 microprocessors could access only 640K of RAM and

used the segmented memory model. Consequently, good old DOS allows only 640K of RAM

and restricts the programmer to the segmented memory model.

In the segmented model, the address space is divided into segments. Proponents of the

segmented model claim that it matches the programmer’s view of memory. They claim that a

programmer views memory as different segments containing code, data, stack, and heap. Intel

8086 supports very primitive segmentation. A segment, in the 8086 memory model, has a

predefined base address. The length of each segment is also fixed and is equal to 64K. Some

programs find a single segment insufficient. Hence, there are a number of memory models

under DOS. For example, the tiny model that supports a single segment for code, data, and

stack together, or the small model that allows two segments–one for code and the other for

data plus stack, and so on. This example shows how the memory management provided by an

operating system directly affects the programming environment.
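To make the fixed segment layout concrete, here is a minimal sketch of how an 8086 forms a physical address. It relies on a fact about the processor that the text above does not spell out: the base of a segment is its 16-bit segment value multiplied by 16, and the result is limited to the 20 address lines of the chip. The function name real_mode_address is ours, chosen only for illustration.

```c
#include <stdint.h>

/* Physical address formed by an 8086 from a segment:offset pair.
 * The segment base is fixed at segment * 16; the sum is truncated
 * to 20 bits because the 8086 has only 20 address lines.
 * (Illustrative helper, not part of any DOS interface.) */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    uint32_t base = (uint32_t)segment << 4;
    return (base + offset) & 0xFFFFFu;
}
```

For example, the pairs 1234:0010 and 1235:0000 both yield 0x12350, which shows that 8086 segments start every 16 bytes and overlap one another.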

The Intel 80286 (which followed the Intel 8086) could support more than 640K of RAM. Hence,

programmers got new interface standards for accessing extended and expanded memory from

DOS. Microsoft’s second-generation operating system, Windows 3.1, could run on 80286 in

standard mode and used the segmented model of 80286. The 80286 provided better

segmentation than the 8086. In 80286’s model, segments can have a programmable base

address and size limit. Windows 3.1 had another mode of operation, the enhanced mode,

which required the Intel 80386 processor. In the enhanced mode, Windows 3.1 used the

paging mechanisms of 80386 to provide additional performance. The virtual 8086 mode was

also used to implement multiple DOS boxes on which DOS programs could run.

Windows 3.1 does not make full use of the 80386’s capabilities. Windows 3.1 is a 16-bit

operating system, meaning that 16-bit addresses are used to access the memory and the

default data size is also 16 bits. To make full use of 80386’s capabilities, a 32-bit operating

system is necessary. Microsoft came up with a 32-bit operating system, Windows NT. The rest

of this chapter examines the details of Windows NT memory management. Microsoft also

developed Windows 95 after Windows NT. Since both these operating systems run on 80386

and compatibles, their memory management schemes have a lot in common. However, you

can best appreciate the differences between Windows NT and Windows 95/98 after we review

Windows NT memory management. Therefore, we defer this discussion until a later section of

this chapter.

WINDOWS NT MEMORY MANAGEMENT OVERVIEW

We’ll first cover the view Windows NT memory management presents to the outside world. In

the next section, we explain the special features provided by Intel microprocessors to

implement memory management. Finally, we discuss how Windows NT uses these features to

implement the interface provided to the outside world.

Memory Management Interface: Programmer’s View

Windows NT offers programmers a 32-bit flat address space. The memory is not segmented;

rather, it is 4GB of contiguous address space. (Windows NT marked the end of segmented

architecture–programmers clearly preferred flat models to segmented ones.) Possibly, with

languages such as COBOL where you need to declare data and code separately,

programmers view memory as segments. However, with new languages such as C and C++,

data variables and code can be freely mixed and the segmented memory model is no longer

attractive. Whatever the reason, Microsoft decided to do away with the segmented memory

model with Windows NT. The programmer need not worry whether the code/data fits in 64K

segments. With the segmented memory model becoming extinct, the programmer can breathe

freely. At last, there is a single memory model, the 32-bit flat address space.

Windows NT is a protected operating system; that is, the behavior (or misbehavior) of one

process should not affect another process. This requires that no two processes are able to see

each other’s address space. Thus, Windows NT should provide each process with a separate

address space. Out of this 4GB address space available to each process, Windows NT

reserves the upper 2GB as kernel address space and the lower 2GB as user address space,

which holds the user-mode code and data. The entire address space is not separate for each

process. The kernel code and kernel data space (the upper 2GB) is common for all processes;

that is, the kernel-mode address space is shared by all processes. The kernel-mode address

space is protected from being accessed by user-mode code. The system DLLs (for example,

KERNEL32.DLL, USER32.DLL, and so on) and other DLLs are mapped in user-mode space.

It is inefficient to have a separate copy of a DLL for each process. Hence, all processes using

the DLL or executable module share the DLL code and incidentally the executable module

code. Such a shared code region is protected from being modified because a process

modifying shared code can adversely affect other processes using the code.

Sharing of the kernel address space and the DLL code can be called implicit sharing.

Sometimes two processes need to share data explicitly. Windows NT enables explicit sharing

of address space through memory-mapped files. A developer can map a named file onto some

address space, and further accesses to this memory area are transparently directed to the

underlying file. If two or more processes want to share some data, they can map the same file

in their respective address spaces. To simply share memory between processes, no file needs

to be created on the hard disk.

BELOW THE OPERATING SYSTEM

In her book Inside Windows NT, Helen Custer discusses memory management in the context

of the MIPS processor. Considering that a large number of the readers would be interested in a

similar discussion that focuses on Intel processors, we discuss the topic in the context of the

Intel 80386 processor (whose memory management architecture is mimicked by the later

80486 and Pentium series). If you are already conversant with the memory management

features of the 80386 processor, you may skip this section entirely.

We now examine the 80386’s addressing capabilities and the fit that Windows NT memory

management provides for it. Intel 80386 is a 32-bit processor; this implies that the address bus

is 32 bits wide, and the default data size is as well. Hence, 4GB (2^32 bytes) of physical RAM can

be addressed by the microprocessor. The microprocessor supports segmentation as well as

paging. To access a memory location, you need to specify a 16-bit segment selector and a

32-bit offset within the segment. The segmentation scheme is more advanced than that in

8086. The 8086 segments start at a fixed location and are always 64K in size. With 80386, you

can specify the starting location and the segment size separately for each segment.

Segments may overlap–that is, two segments can share address space. The necessary

information (the starting offset, size, and so forth) is conveyed to the processor via segment

tables. A segment selector is an index into the segment table. At any time, only two segment

tables can be active: a Global Descriptor Table (GDT) and a Local Descriptor Table (LDT). A

bit in the selector indicates whether the processor should refer to the LDT or the GDT. Two

special registers, GDTR and LDTR, point to the GDT and the LDT, respectively. The

instructions to load these registers are privileged, which means that only the operating system

code can execute them.
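The selector layout can be sketched in a few lines of C. The bit positions used here come from Intel's 80386 documentation rather than from the text above, and the type and function names are hypothetical. The low 2 bits of a selector carry a requested privilege level, which the discussion of privilege levels that follows puts to use.

```c
#include <stdint.h>

/* Fields packed into a 16-bit 80386 segment selector.
 * (Names are illustrative; the bit layout is Intel's.) */
typedef struct {
    unsigned index;    /* bits 3..15: index into the descriptor table */
    unsigned use_ldt;  /* bit 2: 0 selects the GDT, 1 selects the LDT  */
    unsigned rpl;      /* bits 0..1: requested privilege level         */
} SelectorFields;

SelectorFields decode_selector(uint16_t selector)
{
    SelectorFields f;
    f.index   = (unsigned)selector >> 3;
    f.use_ldt = ((unsigned)selector >> 2) & 1u;
    f.rpl     = (unsigned)selector & 3u;
    return f;
}
```

With this helper, the selector 0x001B decodes to entry 3 of the GDT with a requested privilege level of 3, and 0x000F decodes to entry 1 of the LDT.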

A segment table is an array of segment descriptors. A segment descriptor specifies the starting

address and the size of the segment. You can also specify some access permission bits with a

segment descriptor. These bits specify whether a particular segment is read-only, read-write,

executable, and so on. Each segment descriptor has 2 bits specifying its privilege level, called

the descriptor privilege level (DPL).
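As an illustration of what a descriptor holds, the following sketch unpacks the base address, the size limit, and the DPL from an 8-byte descriptor. The field positions are taken from Intel's 80386 documentation, not from the text above, and the names SegmentInfo and decode_descriptor are assumptions made for this example. Only ordinary code and data descriptors are considered.

```c
#include <stdint.h>

/* Decoded view of an 80386 code/data segment descriptor.
 * (Illustrative type; the bit layout follows Intel's manuals.) */
typedef struct {
    uint32_t base;         /* starting linear address of the segment  */
    uint32_t last_offset;  /* highest valid offset inside the segment */
    unsigned dpl;          /* descriptor privilege level, 0..3        */
    unsigned present;      /* 1 if the segment is marked present      */
} SegmentInfo;

/* The descriptor is passed as one 64-bit value: the first dword in
 * memory occupies bits 0..31, the second dword bits 32..63. */
SegmentInfo decode_descriptor(uint64_t d)
{
    SegmentInfo s;
    uint32_t limit = (uint32_t)(d & 0xFFFFu)                  /* limit 15..0  */
                   | ((uint32_t)((d >> 48) & 0xFu) << 16);    /* limit 19..16 */
    unsigned granular = (unsigned)((d >> 55) & 1u);           /* G bit        */

    s.base = (uint32_t)((d >> 16) & 0xFFFFFFu)                /* base 23..0   */
           | ((uint32_t)((d >> 56) & 0xFFu) << 24);           /* base 31..24  */
    /* With G set, the 20-bit limit counts 4K units instead of bytes. */
    s.last_offset = granular ? ((limit << 12) | 0xFFFu) : limit;
    s.dpl     = (unsigned)((d >> 45) & 3u);
    s.present = (unsigned)((d >> 47) & 1u);
    return s;
}
```

A descriptor value such as 0x00CF9A000000FFFF decodes to a base of 0, a last offset of 0xFFFFFFFF, and a DPL of 0, that is, a ring 0 segment spanning the full 4GB.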

The processor compares the DPL with the Requested Privilege Level (RPL) before granting

access to a segment. The RPL is dictated by 2 bits in the segment selector while specifying the

address. The Current Privilege Level (CPL) also plays an important role here. The CPL is the

DPL of the code segment being executed. The processor grants access to a particular segment

only if the DPL of the segment is numerically greater than or equal to both the RPL and the CPL (lower numbers denote higher privilege). This serves

as a protection mechanism for the operating system. The CPL of the processor can vary

between 0 and 3 (because 2 bits are assigned for CPL). The operating system code generally

runs at CPL=0, also called ring 0, while the user processes run at ring 3. In addition, all the

segments belonging to the operating system are allotted DPL=0. This arrangement ensures

that the user mode cannot access the operating system memory segments.
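The privilege comparison can be sketched as a small predicate. We assume Intel's numbering, in which 0 is the most privileged level and 3 the least, so a data segment is accessible only when neither the CPL nor the RPL is numerically greater than the segment's DPL. The function name is hypothetical, and the sketch covers data-segment access only, not control transfers.

```c
/* Data-segment privilege check on the 80386 (illustrative helper).
 * Lower numbers mean more privilege. Access is granted only when the
 * less privileged of CPL and RPL is still at least as privileged as
 * the DPL recorded in the segment descriptor. */
int data_access_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    unsigned effective = (cpl > rpl) ? cpl : rpl;  /* weaker of the two */
    return effective <= dpl;
}
```

Under this rule, ring 3 code (CPL=3) is refused access to a DPL=0 operating system segment, while ring 0 code can reach both DPL=0 and DPL=3 segments as long as the selector it uses does not carry a weaker RPL.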

It is very damaging to performance to consult the segment tables, which are stored in main

memory, for every memory access. Caching the segment descriptor in special CPU registers,

namely, CS (Code Selector), DS (Data Selector), SS (Stack Selector), and three

general-purpose selectors called ES, FS, and GS, solves this problem. The first three selector

registers in this list–that is, CS, DS, and SS–act as default registers for code access, data

access, and stack access, respectively.

To access a memory location, you specify the segment and offset within that segment. The first

step in address translation is to add the base address of the segment to the offset. This 32-bit

address is the physical memory address if paging is not enabled. Otherwise this address is

called the logical or linear address and is converted to a physical RAM address using the

page address translation mechanism (refer to Figure 4-1).

Figure 4-1: Linear to physical address translation

The memory management scheme is popularly known as paging because the memory is

divided into fixed-size regions called pages. On Intel processors (80386 and higher), the size

of one page is 4 kilobytes. The 32-bit address bus can access up to 4GB of RAM. Hence, there

are one million (4GB/4K) pages.

Page address translation is a logical to physical address mapping. Some bits in the

logical/linear address are used as an index in the page table, which provides a logical to

physical mapping for pages. The page translation mechanism on Intel platforms has two levels,

with a structure called the page table directory at the top level. As the name suggests, a page

table directory is an array of pointers to page tables. Some bits in the linear address are used

as an index in the page table directory to get the appropriate page table to be used for address

translation.

The page address translation mechanism in the 80386 requires two important data structures

to be maintained by the operating system, namely, the page table directory and the page

tables. A special register, CR3, points to the current page table directory. This register is also

called the Page Directory Base Register (PDBR). A page table directory is a 4096-byte page with

1024 entries of 4 bytes each. Each entry in the page table directory points to a page table. A

page table is a 4096-byte page with 1024 entries of 4 bytes (32 bits) each. Each Page Table

Entry (PTE) points to a physical page. Since there are 1 million pages to be addressed, out of

the 32 bits in a PTE, 20 bits act as upper 20 bits of physical address. The remaining 12 bits are

used to maintain attributes of the page.

Some of these attributes are access permissions. For example, you can denote a page as

read-write or read-only. A page also has an associated security bit called the supervisor bit,

which specifies whether a page can be accessed from the user-mode code or only from the

kernel-mode code. When a page is marked as a supervisor page (the bit is clear in the 80386 encoding), it can be accessed only by kernel-mode code. Two other bits,

namely, the accessed bit and the dirty bit, indicate the status of the page. The processor sets

the accessed bit whenever the page is accessed. The processor sets the dirty bit whenever

the page is written to. Some bits are available for operating system use. For example,

Windows NT uses one such bit for implementing the copy-on-write protection. You can also

mark a page as invalid and need not specify the physical page address. Accessing such a

page generates a page fault exception. An exception is similar to a software interrupt. The

operating system can install an exception handler and service the page faults. You’ll read more

about this in the following sections.
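The attribute bits described above can be summarized in code. The bit positions below are those given in Intel's 80386 documentation; the macro and function names are ours. In the hardware encoding, bit 2 is set for pages that user-mode code may touch and clear for supervisor-only pages. The check is a simplified sketch: it models the 80386 behavior in which the read-only attribute restricts user-mode writes only.

```c
#include <stdint.h>

/* Attribute bits of an 80386 page table entry (names are illustrative). */
#define PTE_PRESENT   0x001u       /* bit 0: page is valid                   */
#define PTE_WRITABLE  0x002u       /* bit 1: read-write if set               */
#define PTE_USER      0x004u       /* bit 2: set = user, clear = supervisor  */
#define PTE_ACCESSED  0x020u       /* bit 5: set by the CPU on any access    */
#define PTE_DIRTY     0x040u       /* bit 6: set by the CPU on a write       */
#define PTE_AVAILABLE 0xE00u       /* bits 9..11: free for the OS to use     */
#define PTE_FRAME     0xFFFFF000u  /* upper 20 bits: physical page address   */

/* Physical base address of the page described by the entry. */
uint32_t pte_frame_address(uint32_t pte)
{
    return pte & PTE_FRAME;
}

/* Returns 1 if the access would succeed and 0 if it would raise a
 * page fault. */
int pte_allows(uint32_t pte, int user_mode, int is_write)
{
    if (!(pte & PTE_PRESENT))
        return 0;                      /* invalid page */
    if (user_mode && !(pte & PTE_USER))
        return 0;                      /* supervisor-only page */
    if (user_mode && is_write && !(pte & PTE_WRITABLE))
        return 0;                      /* read-only page */
    return 1;
}
```

For an entry such as 0x00123025 (present, user, accessed, read-only), user-mode reads succeed and user-mode writes fault, and the page itself lives at physical address 0x00123000.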

32-bit memory addresses break down as follows. The upper 10 bits of the linear address are

used as the page directory index, and a pointer to the corresponding page table is obtained.

The next 10 bits from the linear address are used as an index in this page table to get the base

address of the required physical page. The remaining 12 bits are used as the offset within the

page and are added to the page base address to get the physical address.
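The 10/10/12 split just described translates directly into shifts and masks. The following is a minimal sketch; the type and function names are hypothetical.

```c
#include <stdint.h>

/* The three fields of a 32-bit linear address (illustrative type). */
typedef struct {
    uint32_t dir_index;    /* upper 10 bits: index into the page directory */
    uint32_t table_index;  /* next 10 bits: index into the page table      */
    uint32_t offset;       /* lower 12 bits: offset within the 4K page     */
} LinearParts;

LinearParts split_linear(uint32_t linear)
{
    LinearParts p;
    p.dir_index   = linear >> 22;
    p.table_index = (linear >> 12) & 0x3FFu;
    p.offset      = linear & 0xFFFu;
    return p;
}

/* Final step of the translation: combine the page base address held in
 * the upper 20 bits of the page table entry with the offset. */
uint32_t physical_address(uint32_t pte, uint32_t offset)
{
    return (pte & 0xFFFFF000u) + offset;
}
```

For instance, the linear address 0x00401234 splits into directory index 1, table index 1, and offset 0x234. If the selected page table entry holds page base 0x00ABC000, the physical address is 0x00ABC234.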

THE INSIDE LOOK

In this section, we examine how Windows NT has selectively utilized existing features of the

80386 processor’s architecture to achieve its goals.

Flat Address Space

First, let’s see how Windows NT provides 32-bit flat address space to the processes. As we

know from the previous section, Intel 80386 offers segmentation as well as paging. So how

does Windows NT provide a flat memory instead of a segmented one? Turn off segmentation?

You cannot turn off segmentation on 80386. However, the 80386 processor enables the

operating system to load the segment register once and then specify only 32-bit offsets for

subsequent instructions. This is exactly what Windows NT does. Windows NT initializes all the

segment registers to point to memory locations from 0 to 4GB, that is, the base is set as 0 and

the limit is set as 4GB. The CS, SS, DS, and ES are initialized with separate segment

descriptors all pointing to locations from 0 to 4GB. So now the applications can use only 32-bit

offsets, and hence see a 32-bit flat address space. A 32-bit application running under Windows

NT is not supposed to change any of its segment registers.

Process Isolation

The next question that comes to mind is, “How does Windows NT keep processes from seeing

each other’s address space?” Again, the mechanism for achieving this design goal is simple.

Windows NT maintains a separate page table directory for each process and based on the

process in execution, it switches to the corresponding page table directory. As the page table

directories for different processes point to different page tables and these page tables point to

different physical pages and only one directory is active at a time, no process can see any

other process’s memory. When Windows NT switches the execution context, it also sets the

CR3 register to point to the appropriate page table directory. The kernel-mode address space

is mapped for all processes, and all page table directories have entries for kernel address

space. However, another feature of 80386 is used to disallow user-mode code from accessing

kernel address space. All the kernel pages are marked as supervisor pages; therefore,

user-mode code cannot access them.
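The isolation scheme can be modeled with a small user-mode simulation. This is only a sketch under simplifying assumptions: the directory holds C pointers to page tables instead of physical addresses, the cr3 parameter stands in for the CR3 register, and all names are hypothetical. It is not how Windows NT itself is coded.

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRIES 1024
#define PRESENT 0x1u   /* page is valid                                 */
#define USER    0x4u   /* set = user page, clear = supervisor-only page */

/* Simulated paging structures (pointers replace physical addresses). */
typedef struct { uint32_t entry[ENTRIES]; } PageTable;
typedef struct { const PageTable *table[ENTRIES]; } PageDirectory;

/* Translate a linear address through the directory that "CR3" points
 * to. Returns 1 and stores the physical address on success, or 0 when
 * the access would raise a page fault. */
int translate(const PageDirectory *cr3, uint32_t linear,
              int user_mode, uint32_t *physical)
{
    const PageTable *pt = cr3->table[linear >> 22];
    uint32_t pte;

    if (pt == NULL)
        return 0;                          /* no page table mapped here */
    pte = pt->entry[(linear >> 12) & 0x3FFu];
    if (!(pte & PRESENT))
        return 0;                          /* invalid page */
    if (user_mode && !(pte & USER))
        return 0;                          /* supervisor page, user access */
    *physical = (pte & 0xFFFFF000u) | (linear & 0xFFFu);
    return 1;
}
```

If two directories map the same user-mode linear address through different page tables, translate() yields different physical addresses depending on which directory is passed as cr3, which is the effect of switching CR3 on a context switch. If both directories point at the same page table in their upper halves and its entries lack the USER bit, kernel-mode translations agree across the two while user-mode translations fail.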

Code Page Sharing in DLLs

For sharing code pages of a DLL, Windows NT maps corresponding page table entries for all

processes sharing the DLL onto the same set of physical pages. For example, if process A

loads X.DLL at address xxxx and process B loads the same X.DLL at address yyyy, then the

PTE for xxxx in process A’s page table and the PTE for yyyy in process B’s page table point to

the same physical page. Figure 4-2 shows two processes sharing a page via the same page table

entries. The DLL pages are marked as read-only so that a process inadvertently attempting to

write to this area will not cause other processes to crash.

Figure 4-2: Sharing pages via same page table entries

Note: This is guaranteed to be the case when xxxx==yyyy. However, if xxxx!=yyyy, the physical

page might not be the same. We will discuss the reason behind this later in the chapter.

Kernel address space is shared using a similar technique. Because the entire kernel space is

common for all processes, Windows NT can share page tables directly. Figure 4-3 shows how

processes share physical pages by using the same page tables. Consequently, the upper half of

the page table directory entries are the same for all processes.

Figure 4-3: Sharing pages via same page directory entries

Listing 4-1 shows the sample program that demonstrates this.

Listing 4-1: SHOWDIR.C

/* Should be compiled in release mode to run properly */
#include <windows.h>
#include <string.h>
#include <stdio.h>
#include "gate.h"

/* Global array to hold the page directory */
DWORD PageDirectory[1024];

This initial portion of the SHOWDIR.C file contains, apart from the header inclusion, the global

definition for the array to hold the page directory. The inclusion of the header file GATE.H is of

interest. This header file prototypes the functions for using the callgate mechanism. Using the

callgate mechanism, you can execute your code in the kernel mode without writing a new

device driver.

XREF: We discuss the callgate mechanism in Chapter 10.

For this sample program, we need this mechanism because the page directory is not

accessible to the user-mode code. For now, it’s sufficient to know that the mechanism allows a

function inside a normal executable to be executed in kernel mode. Turning to the definition

of the page directory, we have already described that the size of each directory entry is 4 bytes

and a page directory contains 1024 entries. Hence, the PageDirectory is an array of 1024

DWORDs. Each DWORD in the array represents the corresponding directory entry.

/* C function called from the assembly stub */
void _stdcall CFuncGetPageDirectory()
{
    DWORD *PageDir=(DWORD *)0xC0300000;
    int i=0;

    for (i=0; i<1024; i++) {
        PageDirectory[i] = PageDir[i];
    }
}

CFuncGetPageDirectory() is the function that is executed in the kernel mode using the callgate

mechanism. This function simply makes a copy of the page directory in the user-mode

memory area so that the other user-mode code parts in the program can access it. The page

directory is mapped at virtual address 0xC0300000 in every process’s address space. This

address is not accessible from the user mode. The CFuncGetPageDirectory() function copies

1024 DWORDs from the 0xC0300000 address to the global PageDirectory variable that is

accessible to the user-mode code in the program.

/* Displays the contents of page directory. Starting
 * virtual address represented by the page directory
 * entry is shown followed by the physical page
 * address of the page table
 */
void DisplayPageDirectory()
{
    int i;
    int ctr=0;

    printf("Page directory for the process, pid=%lx\n",
           GetCurrentProcessId());
    for (i=0; i<1024; i++) {
        if (PageDirectory[i]&0x01) {
            if ((ctr%3)==0) {
                printf("\n");
            }
            /* Cast before shifting so that indexes of 512 and
             * above do not overflow a signed int */
            printf("%08lx:%08lx ", (DWORD)i << 22,
                   PageDirectory[i] & 0xFFFFF000);
            ctr++;
        }
    }
    printf("\n");
}

The DisplayPageDirectory() function operates in user mode and prints the PageDirectory array

that is initialized by the CFuncGetPageDirectory() function. The function checks the Least

Significant Bit (LSB) of each of the entries. A page directory entry is valid only if the last bit or

the LSB is set. The function skips printing invalid entries. The function prints three entries on

every line or, in other words, prints a newline character for every third entry. Each directory

entry is printed as the logical address and the address of the corresponding page table as

obtained from the page directory. As described earlier, the first 10 bits (or the 10 Most

Significant Bits [MSB]) of the logical address are used as an index in the page directory. In

other words, a directory entry at index i represents the logical addresses that have i as the

first 10 bits. The function prints the base of the logical address range for each directory entry.

The base address (that is, the least address in the range) has the last 22 bits (or 22 LSBs) as

zeros. The function obtains this base address by shifting i to the first 10 bits. The address of

the page table corresponding to the logical address is stored in the first 20 bits (or 20 MSBs) of

the page directory entry. The 12 LSBs are the flags for the entry. The function calculates the

page table address by masking off the flag bits.

int main(void)
{
    WORD CallGateSelector;
    int rc;
    static short farcall[3];

    /* Assembly stub that is called through callgate */
    extern void GetPageDirectory(void);

    /* Creates a callgate to read the page directory from Ring 3 */
    rc = CreateCallGate(GetPageDirectory, 0, &CallGateSelector);
    if (rc == SUCCESS) {
        farcall[2] = CallGateSelector;
        _asm {
            call fword ptr [farcall]
        }

        DisplayPageDirectory();
        getchar();

        /* Releases the callgate */
        rc=FreeCallGate(CallGateSelector);
        if (rc!=SUCCESS) {
            printf("FreeCallGate failed, CallGateSelector=%x, rc=%x\n",
                   CallGateSelector, rc);
        }
    } else {
        printf("CreateCallGate failed, rc=%x\n", rc);
    }

    return 0;
}

The main() function starts by creating a callgate that sets up the GetPageDirectory() function

to be executed in the kernel mode. The GetPageDirectory() function is written in assembly

language and is a part of the RING0.ASM file. The CreateCallGate() function, used by the

program to create the callgate, is provided by CALLGATE.DLL. The function returns with a

callgate selector.

XREF: The mechanism of calling the desired function through callgate is explained in Chapter 10.

We’ll quickly mention a few important points here. The callgate selector returned by

CreateCallGate() is a segment selector for the given function: in this case, GetPageDirectory().

To invoke the function pointed by the callgate selector, you need to issue a far call instruction.

The far call instruction expects a 16-bit segment selector and a 32-bit offset within the segment.

When you are calling through a callgate, the offset does not matter; the processor always

jumps to the start of the function pointed to by the callgate. Hence, the program only initializes

the third member of the farcall array that corresponds to the segment selector. Issuing a call

through the callgate transfers the execution control to the GetPageDirectory() function. This

function calls the CFuncGetPageDirectory() function, which copies the page directory into the

PageDirectory array. After the callgate call returns, the program prints the page directory

copied into the PageDirectory array by calling the DisplayPageDirectory() function. The program frees

the callgate before exiting.

Listing 4-2: RING0.ASM

.386

.model small

.code

include ..\include\undocnt.inc

public _GetPageDirectory

extrn _CFuncGetPageDirectory@0:near

;Assembly stub called from callgate

_GetPageDirectory proc

Ring0Prolog

call _CFuncGetPageDirectory@0

Ring0Epilog

retf

_GetPageDirectory endp

END

The function to be called from the callgate needs to be written in assembly language for a

couple of reasons. First, the function needs to execute a prolog and an epilog, both of which

are assembly macros, to allow paging in kernel mode. Second, the function needs to issue a

far return at the end. The function leaves the rest of the job to the CFuncGetPageDirectory()

function written in C.

If you compare the output of the showdir program for two different processes, you find that the

upper half of the page table directories for the two processes is exactly the same except for

two entries. In other words, the corresponding kernel address space for these two entries is

not shared by the two processes.

Listing 4-3: First instance of SHOWDIR

Page directory for the process, pid=6f

00000000:01026000 00400000:00f65000 10000000:0152f000

5f800000:00e46000 77c00000:0076b000 7f400000:012cb000

7fc00000:0007e000 80000000:00000000 80400000:00400000

80800000:00800000 80c00000:00c00000 81000000:01000000

81400000:01400000 81800000:01800000 81c00000:01c00000

82000000:02000000 82400000:02400000 82800000:02800000

82c00000:02c00000 83000000:03000000 83400000:03400000

83800000:03800000 83c00000:03c00000 84000000:04000000

84400000:04400000 84800000:04800000 84c00000:04c00000

85000000:05000000 85400000:05400000 85800000:05800000

85c00000:05c00000 86000000:06000000 86400000:06400000

86800000:06800000 86c00000:06c00000 87000000:07000000

87400000:07400000 87800000:07800000 87c00000:07c00000

a0000000:0153d000 c0000000:00e5d000 c0400000:00c9e000

c0c00000:00041000 c1000000:00042000 c1400000:00043000

c1800000:00044000 c1c00000:00045000 c2000000:00046000

c2400000:00047000 c2800000:00048000 c2c00000:00049000

c3000000:0004a000 c3400000:0004b000 c3800000:0004c000

c3c00000:0004d000 c4000000:0004e000 c4400000:0000f000

c4800000:00050000 c4c00000:00051000 c5000000:00052000

c5400000:00053000 c5800000:00054000 c5c00000:00055000

c6000000:00056000 c6400000:00057000 c6800000:00058000

c6c00000:00059000 c7000000:0005a000 c7400000:0005b000

c7800000:0005c000 c7c00000:0005d000 c8000000:0005e000

c8400000:0005f000 c8800000:00020000 c8c00000:00021000

c9000000:00022000 c9400000:00023000 c9800000:00024000

c9c00000:00025000 ca000000:00026000 ca400000:00027000

ca800000:00028000 cac00000:00029000 cb000000:0002a000

cb400000:0002b000 cb800000:0002c000 cbc00000:0002d000

cc000000:0002e000 cc400000:0002f000 cc800000:002f0000

ccc00000:002f1000 cd000000:002f2000 cd400000:002f3000

cd800000:002f4000 cdc00000:002f5000 ce000000:002f6000

ce400000:00037000 ce800000:00038000 cec00000:00039000

cf000000:0003a000 cf400000:0003b000 cf800000:0003c000

cfc00000:0003d000 d0000000:0003e000 d0400000:0003f000

d0800000:00380000 d0c00000:00301000 d1000000:00302000

d1400000:00303000 d1800000:00304000 d1c00000:00305000

d2000000:00306000 d2400000:00307000 d2800000:00308000

d2c00000:00309000 d3000000:0030a000 d3400000:0030b000

d3800000:0030c000 d3c00000:0030d000 d4000000:0030e000

d4400000:0004f000 d4800000:00310000 d4c00000:00311000

e1000000:00315000 e1400000:010fe000 fc400000:0038d000

fc800000:0038e000 fcc00000:0038f000 fd000000:00390000

fd400000:00391000 fd800000:00392000 fdc00000:00393000

fe000000:00394000 fe400000:00395000 fe800000:00396000

fec00000:00397000 ff000000:00398000 ff400000:00399000

ff800000:0039a000 ffc00000:00031000

Listing 4-4: Second instance of SHOWDIR

Page directory for the process, pid=7d

00000000:00fa1000 00400000:00fa0000 10000000:0110a000

5f800000:015ac000 77c00000:01a73000 7f400000:013ac000

7fc00000:0145e000 80000000:00000000 80400000:00400000

80800000:00800000 80c00000:00c00000 81000000:01000000

81400000:01400000 81800000:01800000 81c00000:01c00000

82000000:02000000 82400000:02400000 82800000:02800000

82c00000:02c00000 83000000:03000000 83400000:03400000

83800000:03800000 83c00000:03c00000 84000000:04000000

84400000:04400000 84800000:04800000 84c00000:04c00000

85000000:05000000 85400000:05400000 85800000:05800000

85c00000:05c00000 86000000:06000000 86400000:06400000

86800000:06800000 86c00000:06c00000 87000000:07000000

87400000:07400000 87800000:07800000 87c00000:07c00000

a0000000:0153d000 c0000000:00d94000 c0400000:01615000

c0c00000:00041000 c1000000:00042000 c1400000:00043000

c1800000:00044000 c1c00000:00045000 c2000000:00046000

c2400000:00047000 c2800000:00048000 c2c00000:00049000

c3000000:0004a000 c3400000:0004b000 c3800000:0004c000

c3c00000:0004d000 c4000000:0004e000 c4400000:0000f000

c4800000:00050000 c4c00000:00051000 c5000000:00052000

c5400000:00053000 c5800000:00054000 c5c00000:00055000

c6000000:00056000 c6400000:00057000 c6800000:00058000

c6c00000:00059000 c7000000:0005a000 c7400000:0005b000

c7800000:0005c000 c7c00000:0005d000 c8000000:0005e000

c8400000:0005f000 c8800000:00020000 c8c00000:00021000

c9000000:00022000 c9400000:00023000 c9800000:00024000

c9c00000:00025000 ca000000:00026000 ca400000:00027000

ca800000:00028000 cac00000:00029000 cb000000:0002a000

cb400000:0002b000 cb800000:0002c000 cbc00000:0002d000

cc000000:0002e000 cc400000:0002f000 cc800000:002f0000

ccc00000:002f1000 cd000000:002f2000 cd400000:002f3000

cd800000:002f4000 cdc00000:002f5000 ce000000:002f6000

ce400000:00037000 ce800000:00038000 cec00000:00039000

cf000000:0003a000 cf400000:0003b000 cf800000:0003c000

cfc00000:0003d000 d0000000:0003e000 d0400000:0003f000

d0800000:00380000 d0c00000:00301000 d1000000:00302000

d1400000:00303000 d1800000:00304000 d1c00000:00305000

d2000000:00306000 d2400000:00307000 d2800000:00308000

d2c00000:00309000 d3000000:0030a000 d3400000:0030b000

d3800000:0030c000 d3c00000:0030d000 d4000000:0030e000

d4400000:0004f000 d4800000:00310000 d4c00000:00311000

e1000000:00315000 e1400000:010fe000 fc400000:0038d000

fc800000:0038e000 fcc00000:0038f000 fd000000:00390000

fd400000:00391000 fd800000:00392000 fdc00000:00393000

fe000000:00394000 fe400000:00395000 fe800000:00396000

fec00000:00397000 ff000000:00398000 ff400000:00399000

ff800000:0039a000 ffc00000:00031000

Let’s analyze, one step at a time, why the two entries are different. The page tables

themselves need to be mapped onto some linear address. When Windows NT needs to

access the page tables, it uses this linear address range. To represent 4GB of memory divided

into 1M pages of 4K each, we need 1K page tables, each having 1K entries. To map these 1K

page tables, Windows NT reserves 4MB of linear address space in each process. As we saw

earlier, each process has a different set of page tables. Whatever the process, Windows NT

maps the page tables on the linear address range from 0xC0000000 to 0xC03FFFFF. Let’s call

this linear address range the page table address range. In other words, the page table

address range maps to different page tables–that is, to different physical pages–for different

processes. As you may have noticed, the page table address range falls in the kernel

address space. Windows NT cannot map this crucial system data structure in the user address

space and allow user-mode processes to play with the memory. Ultimately, the result is that

two processes cannot share pages in the page table address range although the addresses lie

in the kernel-mode address range.

Exactly one page table is required to map 4MB address space because each page table has

1K entries and each entry corresponds to a 4K page. Consequently, Windows NT cannot

share the page table corresponding to the page table address range. This accounts for one of

the two mysterious entries in the page table directory. However, the entry’s mystery does not

end here–there is one more subtle twist to this story. The physical address specified in this

entry matches the physical address of the page table directory. The obvious conclusion is that

the page table directory acts also as the page table for the page table address range. This is

possible because the formats of the page table directory entry and the PTE are the same on

the 80386.

The processor carries out an interesting sequence of actions when the linear address within

the page table address range is translated to a physical address. Let’s say that the CR3

register points to page X. As the first step in the address translation process, the processor

treats the page X as the page table directory and finds out the page table for the given linear

address. The page table happens to be page X again. The processor now treats page X as the

required page table and finds out the physical address from it. A more interesting case occurs

when the operating system is accessing the page table directory itself. In this case, the

physical address also falls in page X!

Let’s now turn to the second mysterious entry. The 4MB area covered by this page directory

entry is internally referred to as hyperspace. This area is used for mapping the physical pages

belonging to other processes into virtual address space. For example, a function such as

MmMapPageInHyperSpace() uses the virtual addresses in this range. This area is also used

during the early stages of process creation. For example, when a parent process such as

PROGMAN.EXE spawns a child process such as NOTEPAD.EXE, PROGMAN.EXE has to

create the address space for NOTEPAD.EXE. This is done as a part of the

MmCreateProcessAddressSpace() function. To start any process, an address space must

be created for it, and an address space is defined by its page directory. The upper-half

entries of the page directory are common to all processes except for the two entries that we have

already discussed. These entries need to be created for the process being spawned. The

MmCreateProcessAddressSpace() function allocates three pages of memory: the first page for

the page directory, the second page for holding the hyperspace page table entries, and the

third page for holding the working set information for the process being spawned.

Once these pages are allocated, the function maps the first physical page in the address

space using the MmMapPageInHyperSpace() function. Note that the

MmMapPageInHyperSpace() function runs in the context of PROGMAN.EXE. Now the

function copies the page directory entries in the upper half of the page directory to the mapped

hyperspace virtual address. In short, PROGMAN.EXE creates the page directory for

NOTEPAD.EXE.

Windows NT supports memory-mapped files. When two processes map the same file, they

share the same set of physical pages. Hence, memory-mapped files can be used for sharing

memory. In fact, Windows NT itself uses memory-mapped files to load DLLs and executables.

If two processes map the same DLL, they automatically share the DLL pages.

Memory-mapped files are implemented using section objects under Windows NT. A data

structure called PROTOPTE is associated with each section object. This data structure is a

variable-length structure based on the size of the section. This data structure contains a 4-byte

entry for each page in the virtual address space mapped by the section object. Each 4-byte

entry has the same structure as that of the PTE. When the page is not being used by any of

the processes, the protopte entry is invalid and contains enough information to get the page

back. In this case, the CPU PTE contains the fixed value 0xFFFFF480, which indicates

that an access to this page is to be treated as a protopte fault.

Now comes the toughest of all questions: "How can Windows NT give away 4GB of memory to

each process when there is far less physical RAM available on the board?" Windows NT, as

well as all other operating systems that allow more address space than actual physical

memory, uses a technique called virtual memory to achieve this. In the next section, we

discuss virtual memory management in Windows NT.

VIRTUAL MEMORY MANAGEMENT

The basic idea behind virtual memory is very simple. For each process, the operating system

maps only a few addresses to real physical memory because RAM is expensive and relatively scarce.

The remaining memory for each process is actually maintained on secondary storage (usually a

hard disk). That’s why it is called virtual memory. The addresses that are not mapped on

physical RAM are marked as such. Whenever a process accesses such an address, the

operating system brings the data into memory from secondary storage. If the operating system

runs out of physical RAM, some data is thrown out to make space. We can always get back

this data because a copy is maintained on secondary storage. The data to be thrown out is

decided by the replacement policy. Windows NT uses a First-In-First-Out (FIFO) replacement

policy. According to this policy, the oldest data (that is, the data that was brought into RAM

first) is thrown out whenever there is a space crunch.

To implement virtual memory management, Windows NT needs to maintain a lot of data. First,

it needs to record whether each address is mapped to physical RAM or whether the data is to be

brought in from secondary storage when a request for the address arrives. Maintaining this

information for each byte itself takes a lot of space (actually, more space than the address

space for which the information is to be maintained). So Windows NT breaks the address

space into 4KB pages and maintains this information in page tables. As we saw earlier, a page

table entry (PTE) consists of the address of the physical page (if the page is mapped to

physical RAM) and attributes of the page. Since the processor heavily depends on PTEs for

address translation, the structure of PTE is processor dependent.

If a page is not mapped onto physical RAM, Windows NT marks the page as invalid. Any

access to this page causes a page fault, and the page fault handler can bring in the page from

the secondary storage. To be more specific, when the page contains DLL code or executable

module code, the page is brought in from the DLL or executable file. When the page contains

data, it is brought in from the swap file. When the page represents a memory-mapped file area,

it is brought in from the corresponding file. Windows NT needs to keep track of free physical

RAM so that it can allocate space for a page brought in from secondary storage in case of a

page fault. This information is maintained in a kernel data structure called the Page Frame

Database (PFD). The PFD also maintains a FIFO list of in-memory pages so that it can decide

on pages to throw out in case of a space crunch.

Before throwing out a page, Windows NT must check whether the page is dirty. If it is, Windows NT

needs to write that page to secondary storage before throwing it out. If the page is not shared,

the PFD contains the pointer to PTE so that if the operating system decides to throw out a

particular page, it can then go back and mark the PTE as invalid. If the page is shared, the

PFD contains a pointer to the corresponding PROTOPTE entry. In this case, the PFD also

contains a reference count for the page. A page can be thrown out only if its reference count is

0. In general, the PFD maintains the status of every physical page.

The PFD is an array of 24-byte entries, one for each physical page. Hence, the number of entries in this

array is equal to the number of physical pages, which is stored in a kernel variable, namely,

MmNumberOfPhysicalPages. The pointer to this array is stored in a kernel variable, namely,

MmPfnDatabase. A physical page can be in several states: for example, it can be in use, free,

free but dirty, and so on. A PFD entry is linked in a doubly linked list, depending on the state of

the physical page represented by it. For example, the PFD entry representing a free page is

linked in the free pages list. Figure 4-4 shows these lists linked through the PFD. The forward

links are shown on the left side of the PFD, and the backward links are shown on the right side.

There are six kinds of lists in all. The heads of these lists are stored in the following kernel

variables:

MmStandbyPageListHead

MmModifiedNoWritePageListHead

MmModifiedPageListHead

MmFreePageListHead

MmBadPageListHead

MmZeroedPageListHead

Figure 4-4: Various lists linked through PFD

All these list heads are actually structures of 16 bytes each. Here is the structure definition:

typedef struct PageListHead {
    DWORD NumberOfPagesInList;
    DWORD TypeOfList;
    DWORD FirstPage;
    DWORD LastPage;
} PageListHead_t;

The FirstPage field can be used as an index into the PFD. The PFD entry contains a pointer to

the next page. Using this, you can traverse any of the lists. Here is the structure definition for

the PFD entry:

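typedef struct PfdEntry {
    DWORD NextPage;          /* link to the next page in the list */
    void *PteEntry;          /* PTE pointer, or PROTOPTE entry pointer (PpteEntry) for a shared page */
    DWORD PrevPage;          /* link to the previous page in the list */
    DWORD PteReferenceCount; /* reference count for the page */
    void *OriginalPte;
    DWORD Flags;
} PfdEntry_t;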

Using this, you can easily write a program to dump the PFD. However, there is one problem:

kernel variables, such as list heads, MmPfnDatabase, and MmNumberOfPhysicalPages, are

not exported. Therefore, you have to deal with absolute addresses, which makes the program

dependent on the Windows NT version and build type.

VIRTUAL ADDRESS DESCRIPTORS

Along with the free physical pages, Windows NT also needs to keep track of the virtual

address space allocation for each process. Whenever a process allocates a memory block–for

example, to load a DLL–Windows NT checks for a free block in the virtual address space,

allocates virtual address space, and updates the virtual address map accordingly. The most

obvious place to maintain this information is page tables. For each process, Windows NT

maintains separate page tables. There are 1 million pages, and each page table entry is 4

bytes. Hence, full page tables for a single process would take 4MB of RAM! There is a solution

to this: Page tables themselves can be swapped out. However, it is inefficient to swap in entire page

tables when a process wants to allocate memory. Hence, Windows NT maintains a separate

binary search tree containing the information about current virtual space allocation for each

process. A node in this binary search tree is called a Virtual Address Descriptor (VAD). For

each block of memory allocated to a process, Windows NT adds a VAD entry to the binary

search tree. Each VAD entry contains the allocated address range–that is, the start address

and the end address of the allocated block, pointers to the left and right child VADs, and a

pointer to the parent VAD. The process environment block (PEB) contains a pointer, namely,

VadRoot, to the root of this tree.

Listing 4-5: VADDUMP.C

/* Should be compiled in release mode */

#define _X86_

#include <ntddk.h>

#include <string.h>

#include <stdio.h>

#include "undocnt.h"

#include "gate.h"

/*Define the WIN32 calls we are using, since we can not include both

NTDDK.H and WINDOWS.H in the same ’C’ file.*/

typedef struct _OSVERSIONINFO{

ULONG dwOSVersionInfoSize;

ULONG dwMajorVersion;

ULONG dwMinorVersion;

ULONG dwBuildNumber;

ULONG dwPlatformId;

CCHAR szCSDVersion[ 128 ];

} OSVERSIONINFO, *LPOSVERSIONINFO;

BOOLEAN _stdcall GetVersionExA(LPOSVERSIONINFO);

PVOID _stdcall VirtualAlloc(PVOID, ULONG, ULONG, ULONG);

/* Max vad entries */

#define MAX_VAD_ENTRIES 0x200

/* Following variables are accessed in RING0.ASM */

ULONG NtVersion;

ULONG PebOffset;

ULONG VadRootOffset;

#pragma pack(1)

typedef struct VadInfo {

void *VadLocation;

VAD Vad;

} VADINFO, *PVADINFO;

#pragma pack()

VADINFO VadInfoArray[MAX_VAD_ENTRIES];

int VadInfoArrayIndex;

PVAD VadTreeRoot;

The initial portion of the VADDUMP.C file has a few definitions apart from the header inclusion.

In this program, we use the callgate mechanism as we did in the showdir program–hence the

inclusion of the GATE.H header file. After the header inclusion, the file defines the maximum

number of VAD entries that we’ll process. There is no limit on the nodes in a VAD tree. We use

the callgate mechanism for kernel-mode execution of a function that dumps the VAD tree in an

array accessible from the user mode. This array can hold up to MAX_VAD_ENTRIES entries.

Each entry in the array is of type VADINFO. The VADINFO structure has two members: the

address of the VAD tree node and the actual VAD tree node. The VAD tree node structure is

defined in the UNDOCNT.H file as follows:

typedef struct vad {

void *StartingAddress;

void *EndingAddress;

struct vad *ParentLink;

struct vad *LeftLink;

struct vad *RightLink;

DWORD Flags;

}VAD, *PVAD;

The first two members dictate the address range represented by the VAD node. Each VAD tree

node maintains a pointer to the parent node and a pointer to the left child and the right child.

The VAD tree is a binary tree. For every node in the tree, the left subtree consists of nodes

representing lower address ranges, and the right subtree consists of nodes representing the

higher address ranges. The last member in the VAD node is the flags for the address range.

The VADDUMP.C file has a few other global variables apart from the VadInfoArray. A couple of

global variables are used while locating the root of the VAD tree. The PEB of a process points

to the VAD tree root for that process. The offset of this pointer inside the PEB varies with the

Windows NT version. We set the VadRootOffset to the appropriate offset value of the VAD root

pointer depending on the Windows NT version. There is a similar problem of Windows NT

version dependency while accessing the PEB for the process. We use the Thread

Environment Block (TEB) to get to the PEB. One field in TEB points to the PEB, but the offset

of this field inside the TEB structure varies with the Windows NT version. We set the PebOffset

variable to the appropriate offset value of the PEB pointer inside the TEB structure depending

on the Windows NT version. Another global variable, NtVersion, stores the version of Windows

NT running on the machine.

That leaves us with two more global variables, namely, VadInfoArrayIndex and VadTreeRoot.

The VadInfoArrayIndex is the number of initialized entries in the VadInfoArray. The

VadInfoArray entries after VadInfoArrayIndex are free. The VadTreeRoot variable stores the

root of the VAD tree.

The sample has been tested on Windows NT 3.51, 4.0 and Windows 2000 beta2. The sample

will run on other versions of Windows 2000, provided the offsets of VadRoot and PEB remain

the same.

/* Recursive function which walks the vad tree and

* fills up the global VadInfoArray with the Vad

* entries. Function is limited by the

* MAX_VAD_ENTRIES. Other VADs after this are not

* stored

*/

void _stdcall VadTreeWalk(PVAD VadNode)

{

if (VadNode == NULL) {

return;

}

if (VadInfoArrayIndex >= MAX_VAD_ENTRIES) {

return;

}

VadTreeWalk(VadNode->LeftLink);

VadInfoArray[VadInfoArrayIndex].VadLocation = VadNode;

VadInfoArray[VadInfoArrayIndex].Vad.StartingAddress =

VadNode->StartingAddress;

VadInfoArray[VadInfoArrayIndex].Vad.EndingAddress =

VadNode->EndingAddress;

if (NtVersion == 5) {

/* Windows 2000 stores page numbers; convert them to byte addresses */

VadInfoArray[VadInfoArrayIndex].Vad.StartingAddress =

(void *)((DWORD)VadNode->StartingAddress << 12);

VadInfoArray[VadInfoArrayIndex].Vad.EndingAddress =

(void *)((((DWORD)VadNode->EndingAddress + 1) << 12) - 1);

}

VadInfoArray[VadInfoArrayIndex].Vad.ParentLink =

VadNode->ParentLink;

VadInfoArray[VadInfoArrayIndex].Vad.LeftLink =

VadNode->LeftLink;

VadInfoArray[VadInfoArrayIndex].Vad.RightLink =

VadNode->RightLink;

VadInfoArray[VadInfoArrayIndex].Vad.Flags =

VadNode->Flags;

VadInfoArrayIndex++;

VadTreeWalk(VadNode->RightLink);

}

The VadTreeWalk() function is executed in the kernel mode using the callgate mechanism.

The function traverses the VAD tree in the in-order fashion and fills up the VadInfoArray. The

function simply returns if the node pointer parameter is NULL or the VadInfoArray is full.

Otherwise, the function recursively calls itself for the left subtree. The recursion is terminated

when the left child pointer is NULL. The function then fills up the next free entry in the

VadInfoArray and increments the VadInfoArrayIndex to point to the next free entry. Windows

2000 stores the page numbers instead of the actual addresses in VAD. Hence, for Windows

2000, we need to calculate the starting address and the ending address from the page

numbers stored in these fields. As the last step in the in-order traversal, the function calls

itself recursively to process the right subtree.

/* C function called through assembly stub */

void _stdcall CFuncDumpVad(PVAD VadRoot)

{

VadTreeRoot = VadRoot;

VadInfoArrayIndex = 0;

VadTreeWalk(VadRoot);

}

The CFuncDumpVad() function is the caller of the VadTreeWalk() function. It just initializes the global

variables used by the VadTreeWalk() function and calls the VadTreeWalk() function for the root

of the VAD tree.

/* Displays the Vad tree */

void VadTreeDisplay()

{

int i;

printf("VadRoot is located @%08x\n\n",VadTreeRoot);

printf("Vad@\t Starting\t Ending\t Parent\t LeftLink\t RightLink\n");

for (i=0; i < VadInfoArrayIndex; i++) {

printf("%08x %08x %08x %08x %08x %08x\n",

VadInfoArray[i].VadLocation,

VadInfoArray[i].Vad.StartingAddress,

VadInfoArray[i].Vad.EndingAddress,

VadInfoArray[i].Vad.ParentLink,

VadInfoArray[i].Vad.LeftLink,

VadInfoArray[i].Vad.RightLink);

}

printf("\n\n");

}

The VadTreeDisplay() function is a very simple function that is executed in user mode. The

function iterates through all the entries initialized by the VadTreeWalk() function and prints the

entries. Essentially, the function prints the VAD tree in in-order sequence because the

VadTreeWalk() function dumps the VAD tree in that order.

void SetDataStructureOffsets()

{

switch (NtVersion) {

case 3:

PebOffset = 0x40;

VadRootOffset = 0x170;

break;

case 4:

PebOffset = 0x44;

VadRootOffset = 0x170;

break;

case 5:

PebOffset = 0x44;

VadRootOffset = 0x194;

break;

}

}

As we described earlier, the offset of the PEB pointer within TEB and the offset of the VAD root

pointer within the PEB are dependent on the Windows NT version. The

SetDataStructureOffsets() function sets the global variables indicating these offsets depending

on the Windows NT version.

main()

{

WORD CallGateSelector;

int rc;

short farcall[3];

void DumpVad(void);

void *ptr;

OSVERSIONINFO VersionInfo;

VersionInfo.dwOSVersionInfoSize = sizeof(VersionInfo);

if (GetVersionExA(&VersionInfo) == TRUE) {

NtVersion = VersionInfo.dwMajorVersion;

}

if ((NtVersion < 3)||(NtVersion > 5)) {

printf("Unsupported NT version, exiting...");

return 0;

}

SetDataStructureOffsets();

/* Creates call gate to read vad tree from Ring 3 */

rc = CreateCallGate(DumpVad, 0, &CallGateSelector);

if (rc != SUCCESS) {

printf("CreateCallGate failed, rc=%x\n", rc);

return 1;

}

farcall[2] = CallGateSelector;

_asm {

call fword ptr [farcall]

}

printf("Dumping the Vad tree ...\n\n");

VadTreeDisplay();

printf("Allocating memory using VirtualAlloc");

ptr = VirtualAlloc(NULL, 4096, MEM_COMMIT, PAGE_READONLY);

if (ptr == NULL) {

printf("Unable to allocate memory\n");

goto Quit;

}

printf("\nMemory allocated @%x\n", ptr);

_asm {

call fword ptr [farcall]

}

printf("\n\nDumping the Vad tree again...\n\n");

VadTreeDisplay();

Quit:

rc = FreeCallGate(CallGateSelector);

if (rc != SUCCESS) {

printf("FreeCallGate failed, Selector=%x, rc=%x\n",

CallGateSelector, rc);

}

return 0;

}

The main() function starts by getting the Windows NT version and calling

SetDataStructureOffsets() to set the global variables storing the offsets for the PEB and the

VAD tree root. It then creates a callgate in the same manner as in the SHOWDIR sample

program. Issuing a call through this callgate ultimately results in the execution of the

VadTreeWalk() function that fills up the VadInfoArray. The main() function then calls the

VadTreeDisplay() function to print the VadInfoArray entries.

We also show you the change in the VAD tree due to memory allocation in this sample

program. After printing the VAD tree once, the program allocates a chunk of memory. Then, the

program issues the callgate call again and prints the VAD tree after returning from the call. You

can observe the updates that happened to the VAD tree because of the memory allocation.

The program frees up the callgate before exiting.

Listing 4-6: RING0.ASM

.386

.model small

.code

public _DumpVad

extrn _CFuncDumpVad@4:near

extrn _PebOffset:near

extrn _VadRootOffset:near


include ..\include\undocnt.inc

_DumpVad proc

Ring0Prolog

;Gets the current thread

MOV EAX,FS:[00000124h]

;Gets the current process

ADD EAX, DWORD PTR [_PebOffset]

MOV EAX,[EAX]

;Push Vad Tree root

ADD EAX, DWORD PTR [_VadRootOffset]

MOV EAX, [EAX]

PUSH EAX

CALL _CFuncDumpVad@4

Ring0Epilog

RETF

_DumpVad endp

END

The function to be called from the callgate needs to be written in assembly language for

reasons already described. The DumpVad() function gets hold of the VAD root pointer and

calls the CFuncDumpVad() function that dumps the VAD tree in the VadInfoArray. The function

gets hold of the VAD root from the PEB after getting hold of the PEB from the TEB. The TEB of

the currently executing thread is always pointed to by FS:[124h]. As described earlier, the offset

of the VAD root pointer inside PEB and the offset of the PEB pointer inside the TEB vary with

the Windows NT version. The DumpVad() function uses the offset values stored in the global

variable by the SetDataStructureOffsets() function.

Listing 4-7 presents the output from an invocation of the VADDUMP program. Note that the

VAD tree printed after allocating memory at address 0x300000 shows an additional entry for

that address range.

Listing 4-7: Program output

Dumping the Vad tree...

VadRoot is located @fe21a9c8

Vad@ Starting Ending Parent LeftLink RightLink

fe216b08 00010000 00010fff fe21a9c8 00000000 fe25a0e8

fe25a0e8 00020000 00020fff fe216b08 00000000 fe275da8

fe275da8 00030000 0012ffff fe25a0e8 00000000 fe22a428

fe22a428 00130000 00130fff fe275da8 00000000 fe26b328

fe26b328 00140000 0023ffff fe22a428 00000000 fe210fc8

fe210fc8 00240000 0024ffff fe26b328 00000000 fe21a8c8


fe21a8c8 00250000 00258fff fe210fc8 00000000 fe21be68

fe21be68 00260000 0026dfff fe21a8c8 00000000 fe215dc8

fe215dc8 00270000 002b0fff fe21be68 00000000 fe231e88

fe231e88 002c0000 002c0fff fe215dc8 00000000 fe2449e8

fe2449e8 002d0000 002dffff fe231e88 00000000 fe21cb48

fe21cb48 002e0000 002e0fff fe2449e8 00000000 fe23b7a8

fe23b7a8 002f0000 002fffff fe21cb48 00000000 00000000

fe21a9c8 00400000 0040cfff 0 fe216b08 fe23c488

fe21b3e8 10000000 1000dfff fe2333e8 00000000 fe226348

fe2176c8 77e20000 77e4bfff fe226348 00000000 fe2326e8

fe2152c8 77e50000 77e54fff fe2326e8 00000000 00000000

fe2326e8 77e60000 77e9bfff fe2176c8 fe2152c8 00000000

fe226348 77ea0000 77ed7fff fe21b3e8 fe2176c8 fe2197c8

fe2197c8 77ee0000 77f12fff fe226348 00000000 00000000

fe2333e8 77f20000 77f73fff fe23c488 fe21b3e8 00000000

fe23c488 77f80000 77fcdfff fe21a9c8 fe2333e8 fe25aa88

fe22b408 7f2d0000 7f5cffff fe25aa88 00000000 fe22c4a8

fe22c4a8 7f5f0000 7f7effff fe22b408 00000000 fe23f5e8

fe23f5e8 7ff70000 7ffaffff fe22c4a8 00000000 00000000

fe25aa88 7ffb0000 7ffd3fff fe23c488 fe22b408 fe218288

fe21da88 7ffde000 7ffdefff fe218288 00000000 00000000

fe218288 7ffdf000 7ffdffff fe25aa88 fe21da88 00000000

Allocating memory using VirtualAlloc

Memory allocated @300000

Dumping the Vad tree again...

VadRoot is located @fe21a9c8

Vad@ Starting Ending Parent LeftLink RightLink

fe216b08 00010000 00010fff fe21a9c8 00000000 fe25a0e8

fe25a0e8 00020000 00020fff fe216b08 00000000 fe275da8

fe275da8 00030000 0012ffff fe25a0e8 00000000 fe22a428

fe22a428 00130000 00130fff fe275da8 00000000 fe26b328

fe26b328 00140000 0023ffff fe22a428 00000000 fe210fc8

fe210fc8 00240000 0024ffff fe26b328 00000000 fe21a8c8

fe21a8c8 00250000 00258fff fe210fc8 00000000 fe21be68

fe21be68 00260000 0026dfff fe21a8c8 00000000 fe215dc8

fe215dc8 00270000 002b0fff fe21be68 00000000 fe231e88

fe231e88 002c0000 002c0fff fe215dc8 00000000 fe2449e8

fe2449e8 002d0000 002dffff fe231e88 00000000 fe21cb48

fe21cb48 002e0000 002e0fff fe2449e8 00000000 fe23b7a8

fe23b7a8 002f0000 002fffff fe21cb48 00000000 fe27b628

fe27b628 00300000 00300fff fe23b7a8 00000000 00000000


fe21a9c8 00400000 0040cfff 0 fe216b08 fe23c488

fe21b3e8 10000000 1000dfff fe2333e8 00000000 fe226348

fe2176c8 77e20000 77e4bfff fe226348 00000000 fe2326e8

fe2152c8 77e50000 77e54fff fe2326e8 00000000 00000000

fe2326e8 77e60000 77e9bfff fe2176c8 fe2152c8 00000000

fe226348 77ea0000 77ed7fff fe21b3e8 fe2176c8 fe2197c8

fe2197c8 77ee0000 77f12fff fe226348 00000000 00000000

fe2333e8 77f20000 77f73fff fe23c488 fe21b3e8 00000000

fe23c488 77f80000 77fcdfff fe21a9c8 fe2333e8 fe25aa88

fe22b408 7f2d0000 7f5cffff fe25aa88 00000000 fe22c4a8

fe22c4a8 7f5f0000 7f7effff fe22b408 00000000 fe23f5e8

fe23f5e8 7ff70000 7ffaffff fe22c4a8 00000000 00000000

fe25aa88 7ffb0000 7ffd3fff fe23c488 fe22b408 fe218288

fe21da88 7ffde000 7ffdefff fe218288 00000000 00000000

fe218288 7ffdf000 7ffdffff fe25aa88 fe21da88 00000000

The output of the VADDUMP program does not really look like a tree. You have to trace

through the output to get the tree structure. The entry with a null parent link is the root of the

tree. Once you find the root, you can follow the child pointers. To follow a child pointer, search for

the pointer in the first column, named Vad@, in the output. The Vad entry with the same Vad@

is the entry for the child that you are looking for. An all-zero entry for a left/right child pointer

indicates that there is no left/right subtree for the node. Figure 4-5 shows a partial tree

constructed from the output shown previously.

Figure 4-5: VAD tree

IMPACT ON HOOKING


Now we’ll look at the impact of the memory management scheme explained in the last section

in the area of hooking DLL API calls. To hook a function from a DLL, you need to change the

first few bytes from the function code. As you saw earlier, the DLL code is shared by all

processes and is write protected so that a misbehaving process cannot affect other processes.

Does this mean that you cannot hook a function in Windows NT? The answer is, “Hooking is

possible under Windows NT, but you need to do a bit more work to comply with stability

requirements.” Windows NT provides a system call, VirtualProtect, that you can use to change

page attributes. Hence, hooking is now a two-step process: Change the attributes of the page

containing DLL code to read-write, and then change the code bytes.

Copy-on-Write

“Eureka!” you might say, “I violated Windows NT security. I wrote to a shared page used by

other processes also.” No! You did not do that. You changed only your copy of the DLL code.

The DLL code page was being shared while you did not write to the page. The moment you

wrote on that page, a separate copy of it was made, and the writes went to this copy. All other

processes are safely using the original copy of the page. This is how Windows NT protects

processes from each other while consuming as few resources as possible.

The VirtualProtect() function does not mark the page as read-write; it keeps the page as

read-only. Nevertheless, to distinguish this page from normal read-only pages, it is marked for

copy-on-write. Windows NT uses one of the available PTE bits for doing this. When this page

is written to, because it is a read-only page, the processor raises a page fault exception. The

page fault handler makes a copy of the page and modifies the page table of the faulting

process accordingly. The new copy is marked as read-write so that the process can write to it.

Windows NT itself uses the copy-on-write mechanism for various purposes. The DLL data

pages are shared with the copy-on-write mark. Hence, whenever a process writes to a data

page, it gets a personal copy of it. Other processes keep sharing the original copy, thus

maximizing the sharing and improving memory usage.

A DLL may be loaded in memory at different linear addresses for different processes. The

memory references in the DLL (for example, the address in a call instruction or in a

memory-to-register move instruction) need to be adjusted (patched) depending on the linear

address where the DLL gets loaded. This process is called relocating the DLL. Obviously,

relocation has to be done separately for each process. While relocating, Windows NT marks

the DLL code pages as copy-on-write temporarily. Thus, only the pages requiring

relocation are copied per process. Other pages that do not have memory references in them

are shared by all processes.

This is the reason Microsoft recommends that a DLL be given a preferred base address and be

loaded at that address. The binding of the DLL to a specific base address ensures that the DLL

need not be relocated if it is loaded at the specified base address. Hence, if all processes load


the DLL at the preferred base address, all can share the same copy of DLL code.

The POSIX subsystem of Windows NT uses the copy-on-write mechanism to implement the

fork system call. The fork system call creates a new process as a child of a calling process.

The child process is a replica of the parent process, and it has the same state of code and data

pages as the parent. Since these are two different processes, the data pages should not be

shared by them. However, generally it is wasteful to make a copy of the parent’s data pages

because in most cases the child immediately invokes the exec system call. The exec system

call discards the current memory image of the process, loads a new executable module, and

starts executing the new executable module. To avoid copying the data pages, the fork system

call marks the data pages as copy-on-write. Hence, a data page is copied only if the parent or

the child writes to it.

Copy-on-write is an extremely important concept contributing to the efficiency of NT memory

management.

The following sample program demonstrates how copy-on-write works. By running two

instances of the program, you can see how the concepts described in this section work. The

application loads a DLL, which contains two functions and two data variables. One function

does not refer to the outside world, so no relocations are required for it. The other function

accesses one global variable, so it contains relocatable instructions or instructions that need

relocation. One data variable is put in a shared data section so it will be shared across multiple

instances of DLL. One variable is put in a default data section. The two functions are put in

separate code sections just to make them page aligned.

When you run the first instance of the application, the application loads and prints the physical

addresses of two functions and two data variables. After this, you run the second instance of

the same application. In the second instance, the application arranges to load the DLL at a

different base address than that of the first instance. Then it prints the physical addresses of

two functions and two data variables. Next, the application arranges to load the DLL at the

same base address as that of the first instance. In this case, all physical pages are seen to be

shared. Next, the application modifies the shared and nonshared variable and modifies the

first few bytes of one function, and it prints the physical addresses for two functions and two

variables again. We first discuss the code for this sample program and then describe how the

output from the sample program demonstrates memory sharing and the effects of the

copy-on-write mechanism.

Listing 4-8: SHOWPHYS.C

#include <windows.h>

#include <stdio.h>

#include "gate.h"

#include "getphys.h"

HANDLE hFileMapping;


/* Imported function/variable addresses */

static void *NonRelocatableFunction = NULL;

static void *RelocatableFunction = NULL;

static void *SharedVariable = NULL;

static void *NonSharedVariable = NULL;

HINSTANCE hDllInstance;

The initial portion of the file contains the header inclusion and global variable definitions. The

program demonstrates the use of various page attributes, especially to implement the

copy-on-write mechanism. As described earlier, the program uses four different types of

memory sections. The pointers to the four different types of memory sections are defined as

global variables. The hDllInstance variable stores the instance handle of the DLL that

contains the different kinds of memory sections used in this demonstration.

/* Loads MYDLL.DLL and initializes addresses of imported functions/variables from

MYDLL.DLL and locks the imported areas */

int LoadDllAndInitializeVirtualAddresses()

{

hDllInstance = LoadLibrary("MYDLL.DLL");

if (hDllInstance == NULL) {

printf("Unable to load MYDLL.DLL\n");

return -1;

}

printf("MYDLL.DLL loaded at base address = %x\n", hDllInstance);

NonRelocatableFunction

=GetProcAddress(GetModuleHandle("MYDLL"),"_NonRelocatableFunction@0");

RelocatableFunction

=GetProcAddress(GetModuleHandle("MYDLL"),"_RelocatableFunction@0");

SharedVariable = GetProcAddress(GetModuleHandle("MYDLL"),"SharedVariable");

NonSharedVariable

=GetProcAddress(GetModuleHandle("MYDLL"),"NonSharedVariable");

if((!NonRelocatableFunction) ||(!RelocatableFunction) ||(!SharedVariable) ||

(!NonSharedVariable)) {

printf("Unable to get the virtual addresses for imports from MYDLL.DLL\n");

FreeLibrary(hDllInstance);

hDllInstance = NULL;

return -1;


}

VirtualLock(NonRelocatableFunction, 1);

VirtualLock(RelocatableFunction, 1);

VirtualLock(SharedVariable, 1);

VirtualLock(NonSharedVariable, 1);

return 0;

}

The four different types of memory sections that we use for the demonstration reside in

MYDLL.DLL. The LoadDllAndInitializeVirtualAddresses() function loads MYDLL.DLL in the

calling process’s address space and initializes the global variables to point to different types of

memory sections in the DLL. The function uses the GetProcAddress() function to get hold of

pointers to the exported functions and variables in MYDLL.DLL. The function stores the

instance handle for MYDLL.DLL in a global variable so that the FreeDll() function can later use

it to unload the DLL. The function also locks the different memory sections so that the pages

are loaded in memory and the page table entries are valid. Generally, Windows NT does not

load the page table entries unless the virtual address is actually accessed. In other words, the

memory won’t be paged in unless accessed. Also, the system can page out the memory that is

not used for some time, again marking the page table entries as invalid. We use the

VirtualLock() function to ensure that the pages of interest are always loaded and the

corresponding page table entries remain valid.

/* Unlocks the imported areas and frees the MYDLL.DLL */

void FreeDll()

{

VirtualUnlock(NonRelocatableFunction, 1);

VirtualUnlock(RelocatableFunction, 1);

VirtualUnlock(SharedVariable, 1);

VirtualUnlock(NonSharedVariable, 1);

FreeLibrary(hDllInstance);

hDllInstance = NULL;

NonRelocatableFunction = NULL;

RelocatableFunction = NULL;

SharedVariable = NULL;

NonSharedVariable = NULL;

}

The FreeDll() function uses the VirtualUnlock() function to unlock the memory locations locked

by the LoadDllAndInitializeVirtualAddresses() function. The function unloads MYDLL.DLL after

unlocking the memory locations from the DLL. As the DLL is unloaded, the global pointers to

the memory sections in the DLL become invalid. The function sets all these pointers to NULL

according to good programming practice.

/* Converts the page attributes in readable form */

char *GetPageAttributesString(unsigned int PageAttr)


{

static char buffer[100];

strcpy(buffer, "");

strcat(buffer, (PageAttr&0x01)? "P ": "NP ");

strcat(buffer, (PageAttr&0x02)? "RW ": "R ");

strcat(buffer, (PageAttr&0x04)? "U ": "S ");

strcat(buffer, (PageAttr&0x40)? "D ": " ");

return buffer;

}

The GetPageAttributesString() function returns a string with characters showing the page

attributes given the page attribute flags. The LSB in the page attributes indicates whether the

page is present in memory or the page table entry is invalid. This information is printed as P or

NP, which stands for present or not present. Similarly, R or RW means a read-only or

read-write page; S or U means a supervisor-mode or a user-mode page; and D means a dirty

page. The various page attributes are represented by different bits in the PageAttr parameter

to this function. The function checks the bits and determines whether the page possesses the

particular attributes.

/* Displays virtual to physical address mapping */

int DisplayVirtualAndPhysicalAddresses()

{

DWORD pNonRelocatableFunction = 0;

DWORD pRelocatableFunction = 0;

DWORD pSharedVariable = 0;

DWORD pNonSharedVariable = 0;

DWORD aNonRelocatableFunction = 0;

DWORD aRelocatableFunction = 0;

DWORD aSharedVariable = 0;

DWORD aNonSharedVariable = 0;

printf("\nVirtual to Physical address mapping\n");

printf("\n------------------------------------\n");

printf("Variable/function Virtual Physical Page\n");

printf(" Address Address Attributes\n");

printf("--------------------------------------\n");

GetPhysicalAddressAndPageAttributes(NonRelocatableFunction,

&pNonRelocatableFunction, &aNonRelocatableFunction);

GetPhysicalAddressAndPageAttributes(RelocatableFunction,

&pRelocatableFunction, &aRelocatableFunction);

GetPhysicalAddressAndPageAttributes(SharedVariable,&pSharedVariable,

&aSharedVariable);

GetPhysicalAddressAndPageAttributes(NonSharedVariable,&pNonSharedVariable,


&aNonSharedVariable);

printf("NonRelocatableFunction\t %8x\t %8x\t %s\n",NonRelocatableFunction,

pNonRelocatableFunction,GetPageAttributesString(aNonRelocatableFunction));

printf("RelocatableFunction\t %8x\t %8x\t %s\n",RelocatableFunction,

pRelocatableFunction,GetPageAttributesString(aRelocatableFunction));

printf("SharedVariable\t %8x\t %8x\t %s\n", SharedVariable, pSharedVariable,

GetPageAttributesString(aSharedVariable));

printf("NonSharedVariable\t %8x\t %8x\t %s\n",NonSharedVariable,

pNonSharedVariable,GetPageAttributesString(aNonSharedVariable));

printf("------------------------------------\n\n");

return 0;

}

The DisplayVirtualAndPhysicalAddresses() function is a utility function that displays the virtual

address, the physical address, and the page attributes for different memory sections. It uses

the global pointers to the different sections in MYDLL.DLL initialized by the

LoadDllAndInitializeVirtualAddresses() function. It uses the

GetPhysicalAddressAndPageAttributes() function to get hold of the physical page address and

the page attributes for the given virtual address. The first parameter to the

GetPhysicalAddressAndPageAttributes() function is the input virtual address. The function fills

in the physical address for the input virtual address in the memory location pointed to by the

second parameter and the page attributes in the location pointed to by the third parameter.

int FirstInstance()

{

printf("***This is the first instance of the showphys program***\n\n");

printf("Loading DLL MYDLL.DLL\n");

if (LoadDllAndInitializeVirtualAddresses()!=0) {

return -1;

}

DisplayVirtualAndPhysicalAddresses();

printf("Now Run another copy of showphys ...\n");

getchar();

FreeDll();

return 0;

}

We want to demonstrate the sharing of memory sections by the DLL loaded by two different

processes. You need to run two instances of the demonstration program. The FirstInstance()

function is executed when you run the first instance of the program. The first instance loads the

DLL and prints the physical addresses and page attributes for the various memory sections in

the DLL. Then, the function asks you to run another instance of the program. Now there are

two processes that loaded MYDLL.DLL. You can compare the outputs from these two


instances to check how the memory sections are shared. More on this when we explain the

output from this sample program.

int NonFirstInstance()

{

DWORD OldAttr;

HINSTANCE hJunk;

printf("***This is another instance of the showphys program***\n\n");

printf("Loading DLL MYDLL.DLL at different base address than "

"that of the first instance\n");

CopyFile("MYDLL.DLL", "JUNK.DLL", FALSE);

hJunk=LoadLibrary("JUNK.DLL");

if (hJunk==NULL) {

printf("Could not find JUNK.DLL\n");

return -1;

}

if (LoadDllAndInitializeVirtualAddresses()!=0) {

FreeLibrary(hJunk);

return -1;

}

FreeLibrary(hJunk);

DisplayVirtualAndPhysicalAddresses();

FreeDll();

printf("Loading DLL MYDLL.DLL at same base address as that of "

"the first instance\n");

if (LoadDllAndInitializeVirtualAddresses()!=0) {

return -1;

}

DisplayVirtualAndPhysicalAddresses();

printf("....Modifying the code bytes at the start of "

"NonRelocatableFunction\n");

VirtualProtect(NonRelocatableFunction, 1, PAGE_READWRITE,

&OldAttr);

*(unsigned char *)NonRelocatableFunction=0xE9;

printf("....Modifying the value of SharedVariable\n");

*(char *)SharedVariable=0x10;

printf("....Modifying the NonSharedVariable’s value\n\n");

*(char *)NonSharedVariable=0x10;

DisplayVirtualAndPhysicalAddresses();


FreeDll();

return 0;

}

The second instance of the program does a lot more work than the first instance. The sharing

of the DLL memory sections depends on the way the instance loads the DLL and accesses the

memory locations in the DLL. In more concrete terms, the sharing depends on whether the

second instance loads the DLL at the same base address as the first instance. It also depends

on whether the instances only read the memory sections or any of the instances write to the

memory sections. To demonstrate this, the NonFirstInstance() function first loads the DLL at a

different base address than the first instance. The function ensures that the DLL is loaded at a

different base address by loading JUNK.DLL before loading MYDLL.DLL. JUNK.DLL has the

same preferred base address as that of MYDLL.DLL. The first instance loads MYDLL.DLL at

its preferred base address by default. In the second instance, MYDLL.DLL cannot be loaded at

its preferred base address because the address range is already occupied by JUNK.DLL. After

MYDLL.DLL is loaded at a different base address, there is no reason for the program to keep

JUNK.DLL loaded, and so it frees the JUNK.DLL instance. Next, the function prints the

physical addresses and page attributes of the memory sections in MYDLL.DLL using the

DisplayVirtualAndPhysicalAddresses() function. The information printed here can be

compared with the output of the first instance of the program to get an idea of how the DLLs

loaded at different base addresses share the memory sections.

The NonFirstInstance() function also demonstrates the sharing of memory sections by MYDLL.DLL

loaded at the same base address by two processes. It unloads MYDLL.DLL and loads it again.

This time MYDLL.DLL is loaded at its preferred base address because JUNK.DLL is

no longer loaded and nothing occupies that virtual address range. Thus, MYDLL.DLL is

loaded at the same base address in both the first and the second instance of the program. The

physical addresses and the page attributes printed here demonstrate the memory sharing by

MYDLL.DLL when loaded at the same base address in two processes. Next, the

NonFirstInstance() function writes to some of the memory locations in MYDLL.DLL. As we

explain soon, this action affects the memory sharing between the instances. As described

earlier, the code sections are marked read-only by Windows NT. The function uses the

VirtualProtect() API function to change the attributes of the NonRelocatableFunction() so that it

can modify a few bytes at the start of this function. You can modify the data variables from

MYDLL.DLL without any such hassle because the data variables have the read-write attribute.

int DecideTheInstanceAndAct()

{

hFileMapping = CreateFileMapping(

(HANDLE)0xFFFFFFFF,

NULL,

PAGE_READWRITE,

0,

0x1000,

"MyFileMapping");


if (hFileMapping == NULL) {

printf("Unable to create file mapping\n");

FreeDll();

return -1;

}

if (GetLastError() == ERROR_ALREADY_EXISTS) {

NonFirstInstance();

} else {

FirstInstance();

}

return 0;

}

The sample program does not accept any parameter to indicate whether it’s the first instance.

It uses a simple trick to decide it: It creates a named file mapping. The call to the

CreateFileMapping() API function sets the last error to ERROR_ALREADY_EXISTS if a

mapping with the same name already exists. This indicates that an instance that created the

file mapping is already running. In other words, if the program can successfully create the

named file mapping, it’s the first instance of the program. Otherwise, another instance (that is,

the first instance) of the program is already running and the current instance is the second

instance. Depending on whether it’s the first instance, the DecideTheInstanceAndAct()

function calls the NonFirstInstance() function or the FirstInstance() function. A file mapping is

automatically destroyed by the operating system when the reference count drops to zero. The

sample program does not explicitly close the handle to the mapping. The handle is closed and

the reference count for the memory mapping is decremented when the program exits. The

mapping is freed up when the last instance of the program exits.

main()

{

int rc;

/* Creates callgate to get PTE entries from ring 3 application */

if ((rc = CreateRing0CallGate()) != SUCCESS) {

printf("Unable to create callgate, rc=%x\n",rc);

return -1;

}

DecideTheInstanceAndAct();

/* Releases the callgate */

FreeRing0CallGate();

return 0;

}

The main() function starts by a call to the CreateRing0CallGate() function that is located in the

GETPHYS.C file. The sample program uses the callgate mechanism to access the page tables.

As described earlier, the page tables reside in the kernel memory and are not accessible to the

user-mode code. The CreateRing0CallGate() function sets up a function that reads in the page

tables to be executed in kernel mode. The DisplayVirtualAndPhysicalAddresses() function


later uses this function to get hold of the physical address and the page attributes for a given

virtual address. After creating the callgate, the main function passes control to the

DecideTheInstanceAndAct() function. The callgate is freed up by the program

before exiting.

Listing 4-9: GETPHYS.C

#include <windows.h>

#include <stdio.h>

#include "..\cgate\dll\gate.h"

static short CallGateSelector;

The GETPHYS.C file implements the function to access the page table using the callgate

mechanism. The GATE.H file is included because it contains the prototypes for functions that

deal with the callgate manipulation. The segment selector of the callgate used by the program

is stored in the global variable, CallGateSelector.

/* C function called from assembly language stub */

BOOL _stdcall

CFuncGetPhysicalAddressAndPageAttributes(

unsigned int VirtualAddress,

unsigned int *PhysicalAddress,

unsigned int *PageAttributes)

{

unsigned int *PageTableEntry;

*PhysicalAddress = 0;

*PageAttributes = 0;

PageTableEntry = (unsigned int *)0xC0000000U +(VirtualAddress >> 0x0CU);

if ((*PageTableEntry)&0x01) {

*PhysicalAddress =((*PageTableEntry)&0xFFFFF000U)

+(VirtualAddress&0x00000FFFU);

*PageAttributes = (*PageTableEntry)&0x00000FFFU;

return TRUE;

} else {

return FALSE;

}

}

The CFuncGetPhysicalAddressAndPageAttributes() function executes in kernel mode using

the callgate mechanism. The function depends on the fact that page tables for a process are

always mapped at the virtual address 0xC0000000. It’s an array of 1024 page tables where

each page table is an array of 1024 page table entries. You can access the memory area as if

it were a single contiguous array of page table entries. The first entry in this big array


corresponds to a virtual address in the range 0 to 4095, the second entry corresponds to virtual

address range 4096 to 8191, and so on. The function calculates the index in the big PTE array

by dividing the given virtual address by 4096, that is, by shifting the virtual address right by 12 bits.

Adding the index to the base address of the PTE array gives us the required PTE. Each PTE is

4 bytes (32 bits) long. Out of these 32 bits, the upper 20 bits in the PTE denote the address of

the physical page, and the lower 12 bits denote the page attributes. The physical address and

the page attributes are valid only if the LSB is set. The function checks the LSB and if the bit is

set, it separates out the physical page address and the page attributes by masking off

appropriate bits from the PTE. The function adds the offset within the page to the physical

page address to get the physical address for the given virtual address.

BOOL GetPhysicalAddressAndPageAttributes(

void *VirtualAddress,

unsigned int *PhysicalAddress,

unsigned int *PageAttributes)

{

BOOL rc;

static short farcall[3];

if (!CallGateSelector) {

return FALSE;

}

farcall[2] = CallGateSelector;

_asm {

mov eax, PageAttributes

mov ecx, PhysicalAddress

mov edx, VirtualAddress

call fword ptr [farcall]

mov rc, eax

}

return rc;

}

The GetPhysicalAddressAndPageAttributes() function runs in user mode and invokes the

CFuncGetPhysicalAddressAndPageAttributes() function in kernel mode using the callgate

mechanism. It uses the callgate initialized by the call to the CreateRing0CallGate() function.

The parameters to the kernel-mode function are passed through the processor registers. An

intermediate assembly language function, namely, _GetPhysicalAddressAndPageAttributes(),

converts the register parameters to stack parameters.

int CreateRing0CallGate()
{
    DWORD rc;

    rc = CreateCallGate(
        _GetPhysicalAddressAndPageAttributes,
        0,
        &CallGateSelector);
    return rc;
}

The CreateRing0CallGate() function is a utility function that uses the CreateCallGate() function

provided by GATE.DLL to create a callgate to execute the

_GetPhysicalAddressAndPageAttributes() function in kernel mode. It stores the segment

selector of the created callgate in the CallGateSelector global variable, which is used later by

the GetPhysicalAddressAndPageAttributes() function while invoking the kernel-mode function.

int FreeRing0CallGate()
{
    DWORD rc;

    rc = FreeCallGate(CallGateSelector);
    if (rc == SUCCESS) {
        CallGateSelector = 0;
    }
    return rc;
}

The FreeRing0CallGate() function is another utility function that destroys the callgate created

by the CreateCallGate() function. It uses the FreeCallGate() interface function provided by

GATE.DLL.

Listing 4-10: RING0.ASM

.386
.model small
.code

public __GetPhysicalAddressAndPageAttributes
extrn _CFuncGetPhysicalAddressAndPageAttributes@12:near

include ..\include\undocnt.inc

__GetPhysicalAddressAndPageAttributes proc
    Ring0Prolog
    push eax        ; PageAttributes (third parameter)
    push ecx        ; PhysicalAddress (second parameter)
    push edx        ; VirtualAddress (first parameter)
    call _CFuncGetPhysicalAddressAndPageAttributes@12
    Ring0Epilog
    retf
__GetPhysicalAddressAndPageAttributes endp

END

The _GetPhysicalAddressAndPageAttributes() function gets control through the callgate. The

function executes the Ring0Prolog macro just after entering the function to enable paging in

kernel mode. It converts the register parameters to stack parameters because

CFuncGetPhysicalAddressAndPageAttributes() is a C function that expects the parameters on

the stack.

Listing 4-11 presents the output from the previous sample program. Note the differences

between the physical addresses and page attributes printed by the first instance and the

second instance. See if you can explain the output and match your findings with our

description that comes after this output.

Here are two instances of the showphys program.

Listing 4-11: showphys program

Loading DLL MYDLL.DLL
MYDLL.DLL loaded at base address = 20000000
Virtual address to Physical address mapping
--------------------------------------------------------------
Variable/function        Virtual    Physical   Page
                         Address    Address    Attributes
--------------------------------------------------------------
NonRelocatableFunction   20001000   d8b000     P R U
RelocatableFunction      20002000   d8a000     P R U
SharedVariable           2000c000   e44000     P RW U
NonSharedVariable        2000b000   6b7000     P R U
--------------------------------------------------------------

Now Run another copy of showphys ...

This is another instance of the showphys program:

Loading DLL MYDLL.DLL at diffrent base address than that of the first instance
MYDLL.DLL loaded at base address = 7e0000
Virtual address to Physical address mapping
--------------------------------------------------------------
Variable/function        Virtual    Physical   Page
                         Address    Address    Attributes
--------------------------------------------------------------
NonRelocatableFunction   7e1000     d8b000     P R U
RelocatableFunction      7e2000     1d6c000    P R U
SharedVariable           7ec000     e44000     P RW U
NonSharedVariable        7eb000     6b7000     P R U
--------------------------------------------------------------

Loading DLL MYDLL.DLL at same base address as that of the first instance
MYDLL.DLL loaded at base address = 20000000
Virtual address to Physical address mapping
----------------------------------------------------------------
Variable/function        Virtual    Physical   Page
                         Address    Address    Attributes
----------------------------------------------------------------
NonRelocatableFunction   20001000   d8b000     P R U
RelocatableFunction      20002000   d8a000     P R U
SharedVariable           2000c000   e44000     P RW U
NonSharedVariable        2000b000   6b7000     P R U
----------------------------------------------------------------

....Modifying the code bytes at the start of NonRelocatableFunction
....Modifying the value of SharedVariable
....Modifying the NonSharedVariable’s value
Virtual address to Physical address mapping
------------------------------------------------------------------
Variable/function        Virtual    Physical   Page
                         Address    Address    Attributes
------------------------------------------------------------------
NonRelocatableFunction   20001000   87e000     P RW U D
RelocatableFunction      20002000   d8a000     P R U
SharedVariable           2000c000   e44000     P RW U D
NonSharedVariable        2000b000   1ceb000    P RW U D
------------------------------------------------------------------

Note the page attributes from the output of the first instance. The functions are marked

read-only, as expected. The unshared variable is also marked read-only. This is because

Windows NT tries to share the data space also. As described earlier, such pages are marked

for copy-on-write, and as soon as the process modifies any location in the page, the process

gets a private copy of the page to write to. The other page attributes show that the PTE is valid,

the page is a user-mode page, and nobody has modified the page so far.

Now, compare the output from the first instance with the output from the second instance when

it loaded the MYDLL.DLL at a base address different from that in the first instance. As

expected, the virtual addresses of all the memory sections are different from those in the first

instance. The physical addresses are the same except for the physical address of the

relocatable function. This demonstrates that the code pages are marked as copy-on-write, and

when the loader modifies the code pages while performing relocation, the process gets a

private writable copy. Our nonrelocatable function does not need any relocation; hence, the

corresponding pages are not modified. The second instance can share these pages with the

first instance and hence has the same physical page address.

To cancel out the effects of relocation, the second instance loads MYDLL.DLL at the same

base address as that in the first instance. Yup! Now, the virtual addresses match the ones from

the first instance. Note that the physical address for the relocatable function also matches that

in the output from the first instance. The loader need not relocate the function because the DLL

is loaded at the preferred base address. This allows more memory sharing and provides

optimal performance. It’s reason enough to allocate proper, nonclashing preferred base

addresses for your DLLs.

This ideal share-all situation ceases to exist as soon as a process modifies some memory

location. Other processes cannot be allowed to view these modifications. Hence, the modifying

process gets its own copy of the page. The second instance of the sample program

demonstrates this by modifying the data variables and a byte at the start of the nonrelocatable

function. The output shows that the physical address of the nonrelocatable function no longer matches

that in the first instance. The loader did not modify the nonrelocatable function, but our own

modification had the same effect on sharing. The shared variable remains a shared

variable. Its physical address matches that in the first instance because all the processes

accessing a shared variable are allowed to see the modifications made by other processes.

But the nonshared variable has a different physical address now. The second instance cannot

share the variable with the first instance and gets its own copy. The copy was created by the

system page fault handler when we tried to write to a read-only page and the page was also

marked for copy-on-write. Note that the page is now marked read-write. Hence, further writes

go through without the operating system getting any page faults. Also, note that the modified

pages are marked as dirty by the processor.

SWITCHING CONTEXT

As we saw earlier, Windows NT can switch the memory context to another process by setting

the appropriate page table directory. The 80386 processor requires that the pointer to the

current page table directory be maintained in the CR3 register. Therefore, when the Windows

NT scheduler wants to perform a context switch to another process, it simply sets the CR3

register to the page table directory of the concerned process.

Windows NT needs to change only the memory context for some API calls such as

VirtualAllocEx(). The VirtualAllocEx() API call allocates memory in the memory space of a

process other than the calling process. Other system calls that require memory context switch

are ReadProcessMemory() and WriteProcessMemory(). The ReadProcessMemory() and

WriteProcessMemory() system calls read and write, respectively, memory blocks from and to a

process other than the calling process. These functions are used by debuggers to access the

memory of the process being debugged. The subsystem server processes also use these

functions to access the client process’s memory. The undocumented KeAttachProcess()

function from the NTOSKRNL module switches the memory context to the specified process. The

undocumented KeDetachProcess() function switches it back. In addition to switching the memory

context, attaching also changes the notion of the current process. For example, if you attach to a particular

process and create a mutex, it will be created in the context of that process. The prototypes for

KeAttachProcess() and KeDetachProcess() are as follows:

NTSTATUS KeAttachProcess(PEB *);

NTSTATUS KeDetachProcess ();

Another place where a call to the KeAttachProcess() function appears is the NtCreateProcess()

system call. This system call is executed in the context of the parent process. As a part of this

system call, Windows NT needs to map the system DLL (NTDLL.DLL) in the child process’s

address space. Windows NT achieves this by calling KeAttachProcess() to switch the memory

context to the child process. After mapping the DLL, Windows NT switches back to the parent

process’s memory context by calling the KeDetachProcess() function.

The following sample demonstrates how you can use the KeAttachProcess() and

KeDetachProcess() functions. The sample prints the page directories for all the processes

running in the system. The complete source code is not included. Only the relevant portion of

the code is given. Because these functions can be called only from a device driver, we have

written a device driver and provided an IOCTL that demonstrates the use of these functions. We

show only the function that is called in response to DeviceIoControl from the application. Also,

the output of the program is shown in the window of a kernel-mode debugger (such as SoftICE).

Getting the information back to the application is left as an exercise for the reader.

void DisplayPageDirectory(void *Peb)
{
    unsigned int *PageDirectory =
        (unsigned int *)0xC0300000;
    int i;
    int ctr=0;

    KeAttachProcess(Peb);
    for (i = 0; i < 1024; i++) {
        if (PageDirectory[i]&0x01) {
            if ((ctr%8) == 0)
                DbgPrint(" \n");
            DbgPrint("%08x ", PageDirectory[i]&0xFFFFF000);
            ctr++;
        }
    }
    DbgPrint("\n\n");
    KeDetachProcess();
}

The DisplayPageDirectory() function accepts the PEB for the process whose page directory is

to be printed. The function first calls the KeAttachProcess() function with the given PEB as the

parameter. This switches the page directory to the desired one. Still, the function can access

the local variables because the kernel address space is shared by all the processes. Now the

address space is switched, and the 0xC0300000 address points to the page directory to be

printed. The function prints the 1024 entries from the page directory and then switches back to

the original address space using the KeDetachProcess() function.

void DisplayPageDirectoryForAllProcesses()
{
    PLIST_ENTRY ProcessListHead, ProcessListPtr;
    ULONG BuildNumber;
    ULONG ListEntryOffset;
    ULONG NameOffset;

    BuildNumber=NtBuildNumber & 0x0000FFFF;
    if ((BuildNumber==0x421) || (BuildNumber==0x565)) { // NT 3.51 or NT 4.0
        ListEntryOffset=0x98;
        NameOffset=0x1DC;
    } else if (BuildNumber==0x755) { // Windows 2000 beta2
        ListEntryOffset=0xA0;
        NameOffset=0x1FC;
    } else {
        DbgPrint("Unsupported NT Version\n");
        return;
    }

    ProcessListHead=ProcessListPtr=
        (PLIST_ENTRY)(((char *)PsInitialSystemProcess)+ListEntryOffset);
    while (ProcessListPtr->Flink!=ProcessListHead) {
        void *Peb;
        char ProcessName[16];

        Peb=(void *)(((char *)ProcessListPtr)-ListEntryOffset);
        memset(ProcessName, 0, sizeof(ProcessName));
        memcpy(ProcessName, ((char *)Peb)+NameOffset, 16);
        DbgPrint("**%s Peb @%x** ", ProcessName, Peb);
        DisplayPageDirectory(Peb);
        ProcessListPtr=ProcessListPtr->Flink;
    }
}

The DisplayPageDirectoryForAllProcesses() function calls the DisplayPageDirectory() function

for each process in the system. All the processes running in a system are linked in a list. The

function gets hold of the list of the processes from the PEB of the initial system process. The

PsInitialSystemProcess variable in NTOSKRNL holds the PEB for the initial system process.

The process list node is located at an offset of 0x98 (0xA0 for Windows NT 5.0) inside the PEB.

The process list is a circular linked list. Once you get hold of any node in the list, you can

traverse the entire list. The DisplayPageDirectoryForAllProcesses() function completes a

traversal through the process list by following the Flink member, printing the page directory

for the next PEB in the list every time until it reaches back to the PEB it started with. For every

process, the function first prints the process name that is stored at a version-dependent offset

within the PEB and then calls the DisplayPageDirectory() function to print the page directory.

Here, we list partial output from the sample program. Please note a couple of things in the

following output. First, every page directory has 50-odd valid entries while the page directory

size is 1024. The remaining entries are invalid, meaning that the corresponding page tables

are either not used or are swapped out. In other words, the main memory overhead of storing

page tables is negligible because the page tables themselves can be swapped out. Also, note

that the page directories have the same entries in the later portion of the page directory. This is

because this part represents the kernel portion shared across all processes by using the same

set of page tables for the kernel address range.

Listing 4-12: Displaying page directories: output

**System Peb @fdf06b60**

00500000 008cf000 008ce000 00032000 00034000 00035000 ... ... ...

00040000 00041000 00042000 00043000 00044000 00045000 ... ... ...

00048000 00049000 0004a000 0004b000 0004c000 0004d000 ... ... ...

00050000 00051000 00052000 00053000 00054000 00055000 ... ... ...

00058000 00059000 0005a000 0005b000 0005c000 0005d000 ... ... ...

00020000 00021000 00023000 0040b000 0040c000 0040d000 ... ... ...

00410000 00411000 00412000 00413000 00414000 00415000 ... ... ...

**smss.exe Peb @fe2862e0**

00032000 00034000 00035000 00033000 00e90000 00691000 ... ... ...

00043000 00044000 00045000 00046000 00047000 00048000 ... ... ...

0004b000 0004c000 0004d000 0004e000 0004f000 00050000 ... ... ...

00053000 00054000 00055000 00056000 00057000 00058000 ... ... ...

0005b000 0005c000 0005d000 0005e000 0005f000 00020000 ... ... ...

0040b000 0040c000 0040d000 0040e000 0040f000 00410000 ... ... ...

00413000 00414000 00415000 00416000 00031000

... ... ...

**winlogon.exe Peb @fe27dde0**

00032000 00034000 00035000 00033000 00be1000 00953000 ... ... ...

00043000 00044000 00045000 00046000 00047000 00048000 ... ... ...

0004b000 0004c000 0004d000 0004e000 0004f000 00050000 ... ... ...

00053000 00054000 00055000 00056000 00057000 00058000 ... ... ...

0005b000 0005c000 0005d000 0005e000 0005f000 00020000 ... ... ...

0040b000 0040c000 0040d000 0040e000 0040f000 00410000 ... ... ...

00413000 00414000 00415000 00416000 00031000

... ... ...

DIFFERENCES BETWEEN WINDOWS NT AND WINDOWS 95/98

Generally, the memory management features offered by Windows 95/98 are the same as

those in Windows NT. Windows 95/98 also offers a separate 32-bit flat address space for each

process. Features such as shared memory are still available. However, there are some

differences. These differences are due to the fact that Windows 95/98 is not as secure as

Windows NT. Many times, Windows 95/98 trades off security for performance.

Windows 95/98 still has the concept of user-mode and kernel-mode code. The bottom 3GB is

user-mode space, and the top 1GB is kernel-mode space. But the 3GB user-mode space can

be further divided into shared space and private space for Windows 95/98. The 2GB to 3GB

region is the shared address space for Windows 95/98 processes. For all processes, the page

tables for this shared region point to the same set of physical pages.

All the shared DLLs are loaded in the shared region. All the system DLLs (for example,

KERNEL32.DLL and USER32.DLL) are shared DLLs. Also, a DLL’s code/data segment can be

declared shared while compiling the DLL, and the DLL will get loaded in the shared region.

The shared memory blocks are also allocated space in the shared region. In Windows 95/98,

once a process maps a shared section, the section is visible to all processes. Because this

section is mapped in shared region, other processes need not map it separately.

There are advantages as well as disadvantages to having such a shared region. Windows

95/98 need not map the system DLLs separately for each process; the corresponding entries

of the page table directory can simply be copied for each process. Also, the system DLLs loaded in

the shared region can maintain global data about all the processes, so separate subsystem

processes are not required. Also, most system calls turn out to be simple function calls to the

system DLLs, and as a result are very fast. In Windows NT, most system calls either cause a

context switch to kernel mode or a context switch to the subsystem process, both of which are

costly operations. For developers, loading system DLLs in a shared region means that they

can now put global hooks for functions in system DLLs.

For all these advantages, Windows 95/98 pays with security features. In Windows 95/98, any

process can access all the shared data even if it has not mapped it. It can also corrupt the

system DLLs and affect all processes.

SUMMARY

In this chapter, we discussed the memory management of Windows NT from three different

perspectives. Memory management offers programmers a 32-bit flat address space for every

process. A process cannot access another process’s memory or tamper with it, but two

processes can share memory if they need to. Windows NT builds its memory management on

top of the memory management facilities provided by the microprocessor. The 386 (and above)

family of Intel microprocessors provides support for segmentation plus paging. The address

translation mechanism first calculates the virtual address from the segment descriptor and the

specified offset within the segment. The virtual address is then converted to a physical address

using the page tables. The operating system can restrict access to certain memory regions by

using the security mechanisms that are provided both at the segment level and the page level.

Windows NT memory management provides the programmer with flat address space, data

sharing, and so forth by selectively using the memory management features of the

microprocessor. The virtual memory manager takes care of the paging and allows 4GB of

virtual address space for each process, even when the entire system has much less physical

memory at its disposal. The virtual memory manager keeps track of all the physical pages in

the system through the page frame database (PFD). The system also keeps track of the virtual

address space for each process using the virtual address descriptor (VAD) tree. Windows NT

uses the copy-on-write mechanism for various purposes, especially for sharing the DLL data

pages. The memory manager has an important part in switching the processor context when a

process is scheduled for execution. Windows 95/98 memory management is similar to

Windows NT memory management with the differences being due to the fact that Windows

95/98 is not as security conscious as Windows NT.

Chapter 5

Reverse Engineering Techniques

Abstract

This chapter teaches you how to reverse engineer Windows NT given the raw Assembly code

and the useful symbolic information provided by Microsoft in the form of .DBG files. With this

knowledge, you can explore on your own the undocumented Windows NT world.

THIS CHAPTER DIFFERS greatly from other chapters in the book. It does not contain any

undocumented Windows NT information. Instead, it provides some general tips regarding how

to reverse engineer on your own to explore the undocumented Windows NT world.

This chapter teaches you how to reverse engineer Windows NT given the raw Assembly code

and the useful symbolic information provided by Microsoft in the form of .DBG files. You can

access these .DBG files on the Windows NT distribution CD-ROM. This chapter does not

provide a complete guide to reverse engineering for the simple reason that you cannot clearly

define a way of approaching this problem. Reverse engineering is like panning for gold; you

have to sift through tons of Assembly code to find a little information. But this chapter contains

some useful tricks we have used to come up with undocumented Windows NT. Reverse

engineering is an art, and it requires a lot of intuition, patience, and logical deduction.

We divided this chapter into different sections with each section describing a step in reverse

engineering. We conclude the chapter by illustrating reverse engineering of a sample

undocumented function. The best tool for reverse engineering is NuMega’s

excellent SoftICE. This book would not have been possible without SoftICE. This chapter

assumes that the reader has used debuggers. We recommend trying out SoftICE to get the

most out of this chapter. Although the concepts explained here specifically apply to reverse

engineering NTOSKRNL (NT Executive image) using SoftICE, these concepts can apply to

reverse engineering any piece of operating system code.

HOW TO PREPARE FOR REVERSE ENGINEERING

First, install SoftICE on your machine with “Boot time” as the option. Now locate the .DBG files

in the SUPPORT directory on the Windows NT CD-ROM. There are many .DBG files in this

directory, categorized according to the type of the file (for example, .DLL, .SYS, or .EXE).

The .DBG files you will require depend upon the Windows NT component you want to explore.

XREF: See the NuMega Web site at http://www.numega.com/ for up-to-date version information

on SoftICE.

You need the following .DBG files to explore the KERNEL component:

KERNEL32.DBG

NTDLL.DBG

NTOSKRNL.DBG

You need the following .DBG files to explore the USER and GDI components:

USER32.DBG

GDI32.DBG

CSRSS.DBG

CSRSRV.DBG

WIN32K.DBG

Copy these .DBG files onto your hard drive, and then, using the symbol loader, convert .DBG

files into .NMS (the native symbol format of SoftICE). Then, add these files to SoftICE’s

initialization settings using the SoftICE Initialize Settings/Symbols option in the symbol loader.

This ensures that the symbols get loaded when SoftICE loads. Now, reboot the machine.

SoftICE now contains the symbolic information rather than the hex addresses, making the

Assembly code look more readable. The Windows 2000 symbolic information comes in .DBG

and .PDB files instead of just .DBG files. You need the MSPDB60.DLL file from Visual

C++ to convert these files into the native symbol format of SoftICE (.NMS).

HOW TO REVERSE ENGINEER

Because most of the Windows NT components are written in C, you must understand how the

C compiler generates the Assembly code that corresponds to a C function. You must also

understand how a compiler generates the code to call a particular function, how the

parameters are passed, how the compiler implements local variables, and so on. Compilers follow

different function calling conventions. We will not get into the details of each and every

compiler calling convention. Instead, we will cover only the stdcall and fastcall calling

conventions because most of the functions in Windows NT follow either of these calling

conventions. The NTOSKRNL.EXE contains a lot of functions with the fastcall calling

convention, the fastest of all the calling conventions.

In the stdcall calling convention, the parameters are pushed by the caller from right to left, and

the parameters are popped off the stack by the called function. The advantage of using the stdcall

calling convention is that it generates compact code because the code for popping the

parameters off the stack resides in only one place (in the function itself). The disadvantage is

that since a fixed number of parameters are always popped off in the function, this calling convention

cannot support a variable number of arguments. To have a variable number of arguments, you

must follow the cdecl calling convention.

The fastcall calling convention resembles stdcall, except that its first two parameters are passed in

registers instead of on the stack. This results in faster code because register access is

much faster than memory access.

Let us take one sample C function following the stdcall calling convention and see the

corresponding Assembly code generated by the compiler. In this example, we will also see

how compiler-generated Assembly code accesses parameters passed to the function, and

how local variables are implemented. The concepts explained here form the basis for reverse

engineering discussed later in this chapter.

Listing 5-1: C function

int _stdcall sum(int x, int y, int z)
{
    int sum;

    sum=x+y+z;
    return sum;
}

main()
{
    sum(10, 20, 30);
}

Listing 5-2: Compiler-generated Assembly code for the C function in Listing 5-1

;sum
    PUSH EBP
    MOV  EBP,ESP
    SUB  ESP,04
    PUSH EBX
    PUSH ESI
    PUSH EDI
    MOV  EAX,[EBP+10]
    ADD  EAX,[EBP+0C]
    ADD  EAX,[EBP+08]
    MOV  [EBP-04],EAX
    MOV  EAX,[EBP-04]
    POP  EDI
    POP  ESI
    POP  EBX
    LEAVE
    RET  000C

;main
    PUSH 30
    PUSH 20
    PUSH 10
    CALL _sum@12

If you take a look at the Assembly code, you can see that the compiler generates code to set the EBP

register to the start of the stack frame. (The parameters begin at EBP+8 because two DWORDs lie

between EBP and the parameters: the caller’s EBP, which the function pushes to preserve the stack

frame set up by the caller, and the return address pushed by the CALL instruction.) Therefore, the first

parameter x is accessed as [EBP+8] by the generated Assembly code. The parameters y and

z are accessed as [EBP+C] and [EBP+10]. For implementing local variables, compilers

typically generate code that decrements the ESP register by the total number of bytes

required to hold all the local variables defined in the function. In the previous code, there is

only one local variable sum; therefore, the compiler allocates space for 4 bytes (1 DWORD) on

the stack by generating the instruction SUB ESP, 4. All such local variables are accessed at

negative offsets from the EBP register. The variable sum is accessed as [EBP-4] in the code. The

LEAVE instruction used at the end restores the contents of the EBP register and cleans up the

local variables.

Let us demonstrate the preceding mechanism in tables.

When the function sum is called, the stack frame looks like:

30                                     <- Last parameter
20                                     <- Second parameter
10                                     <- First parameter
Return address (address of the         <- ESP
instruction following the call
_sum@12 instruction)

After setting up the standard stack frame of PUSH EBP, MOV EBP, ESP and creating space for

local variables, the stack looks like:

30                                     <- Last parameter (EBP+10)
20                                     <- Second parameter (EBP+C)
10                                     <- First parameter (EBP+8)
Return address (address of the
instruction following the call
_sum@12 instruction)
Original contents of EBP register      <- EBP
holding the stack frame base for
function main
Local variable (sum)                   <- ESP (EBP-4)

Most of the functions in the NTOSKRNL access the parameters and local variables in the

same way (by setting up the frame using the EBP register and accessing the local variables

using the negative offsets from the EBP register). But a few functions do not set up this

standard stack frame; instead, the parameters are accessed directly using the ESP register

(such as ESP+8). In this case, reverse engineering becomes very difficult because the same

parameter is accessed using different offsets from the ESP register at different places. The

advantage is that it results in faster and more compact code.

UNDERSTANDING CODE GENERATION PATTERNS

Because compilers are themselves software programs, they follow a certain pattern when

generating the Assembly code.

LEA EDI, [EBP-24]

MOV ECX, 6

REPZ STOSD

This piece of code fills 6 DWORDs (0x18 bytes) of memory, starting at location EBP-24, with the value held in the EAX register. (With STOSD, the REPZ prefix shown by the disassembler behaves as a plain REP.) It also suggests that a structure of size 0x18 bytes is probably defined locally and initialized in the function.
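At the source level, the fragment probably corresponds to something like the following. This is only a sketch: the LocalRecord layout and the function names are hypothetical, and we assume EAX held zero, which the listing does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 0x18-byte local structure; the field layout is an assumption. */
typedef struct {
    uint32_t field[6];
} LocalRecord;

/* Models REP STOSD: store EAX into ECX consecutive DWORDs starting at EDI. */
void rep_stosd(uint32_t *edi, uint32_t eax, size_t ecx)
{
    assert(edi != NULL || ecx == 0);
    while (ecx-- > 0) {
        *edi++ = eax;
    }
}

/* What the fragment amounts to if EAX is zero: a zero-initialized local. */
LocalRecord make_zeroed_record(void)
{
    LocalRecord r;
    rep_stosd(r.field, 0, sizeof r / sizeof(uint32_t));
    return r;
}
```

When you see LEA EDI / MOV ECX / REP STOSD near the top of a function, the count in ECX times 4 gives the size of the local object being initialized.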

MOV EAX, [EBP+18]

TEST EAX, 00008000

JZ BitNotSet

..

..

BitNotSet:

This piece of code tests bit 15 (mask 0x8000) of the fifth parameter passed to the function, assuming the standard stack frame is generated for the function, and branches based on the result of the bit test.
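In C, the same test would look roughly like this. The function names and the returned strings are hypothetical; only the mask 0x8000 and the position of the parameter come from the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero when the given bit (0 = least significant) is set. */
int bit_is_set(uint32_t value, unsigned bit)
{
    assert(bit < 32);
    return (value & (UINT32_C(1) << bit)) != 0;
}

/* Hypothetical five-parameter routine; only the fifth parameter is examined. */
const char *classify_flags(uint32_t p1, uint32_t p2, uint32_t p3,
                           uint32_t p4, uint32_t flags)
{
    (void)p1; (void)p2; (void)p3; (void)p4;
    if (!bit_is_set(flags, 15)) {   /* TEST EAX, 00008000 / JZ BitNotSet */
        return "BitNotSet";
    }
    return "BitSet";
}
```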

MOV EAX, [FS:124]

This statement fills in the EAX register with a pointer to the current thread object. Note that the

FS register points to a Processor Control Region (PCR) in kernel mode.

MOV EAX, [FS:124]

MOV EAX, [EAX+40]

This piece of code fills in the EAX register with a pointer to the current process object under Windows NT 3.51. Under Windows NT 4.0 and Windows 2000, the second instruction becomes MOV EAX, [EAX+44], because the offset of the pointer to the process object changed between 3.51 and 4.0.

HOW WINDOWS NT PROVIDES DEBUGGING INFORMATION

Various kernel data variables control the output of debug messages. By turning on a few bits in these variables, you can get more debugging messages from the operating system than it gives by default. Some of these bits are already turned on in checked builds of the operating system; others are not. We suspect that Microsoft turns on bits of these variables when investigating a reported bug and turns them off again before shipping a release. This hides a wealth of information from anyone reverse engineering the operating system. We expose a part of this wealth here. There could be many other such flags.

Hidden debug-message code inside NTOSKRNL looks like this:

TEST [DebugVariable], 0x80

JZ HideFromReverseEngineering

PUSH ..

PUSH ..

PUSH ..

CALL DbgPrint

HideFromReverseEngineering:

Whenever you come across such a piece of code, just set the required bit from SoftICE, and

you will see all those messages that are hidden.
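The gating pattern corresponds roughly to the following C. This is a sketch only: DebugVariable is the placeholder name from the listing, debug_print is a hypothetical user-mode stand-in for DbgPrint, and the message text is invented for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define TRACE_BIT 0x80ul

/* Placeholder for a kernel debug variable; zero means "messages hidden". */
unsigned long DebugVariable = 0;

/* Hypothetical stand-in for DbgPrint: formats the message to stderr. */
int debug_print(const char *fmt, ...)
{
    va_list args;
    int written;

    assert(fmt != NULL);
    va_start(args, fmt);
    written = vfprintf(stderr, fmt, args);
    va_end(args);
    return written;
}

/* Emits a trace line only when the gating bit is set; returns 1 if it did. */
int trace_pool_event(const char *what, unsigned long address)
{
    if ((DebugVariable & TRACE_BIT) == 0) {   /* TEST ..., 0x80 / JZ */
        return 0;
    }
    debug_print("%s at 0x%08lx\n", what, address);
    return 1;
}
```

Setting the bit from the debugger has the same effect as the line DebugVariable |= TRACE_BIT here: the branch around the call is no longer taken.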

Here are some of the known variables in NTOSKRNL and the debug messages shown by the

operating system when these variables or bits of these variables are turned on. Most of the

variables appear only in the checked builds of the operating system.

ExpEchoPoolCalls

By setting this variable to 1, you can get the information about each memory

allocation/deallocation performed using functions such as ExAllocatePoolWithTag and

ExFreePool. The information shown includes the address where the memory was allocated,

size of the region allocated, type of the pool used (paged/nonpaged), and type of memory

(cache, aligned, and so on). The information displays as follows:

"0xe1354668 EXALLOC: from Paged size 284 Callers:0, 0

0xe1354668 EXDEALLOC: from Paged Callers:0, 0"

ObpShowAllocAndFree

By setting this variable to 1, you can get information about each executive object when it is

created/destroyed. The information includes the memory address where the object was

created and the type of the object (Key, Semaphore, and so on). The information appears like

this:

"OB: Alloc e1304908 (e1304908) 0012 - Key

OB: Free e1304908 (e1304908) - Type: Key"

LpcpTraceMessages

This variable proves very useful in reverse engineering the local procedure call mechanism

(LPC) used by Windows NT for implementing various subsystems. By setting this variable to 1,

you can get tons of information about how LPC functions. The information displays as follows:

"LPC[ 55.54 ]: Allocate Msg e1118b08

LPC[ 55.54 ]: Explorer.exe Send Request (LPC_REQUEST) Msg e1118b08 (853) [000000

00 00010001 77c21e63 77b9da6b] to Port e11a6dc0 (csrss.exe)

LPC[ 1a.52 ]: csrss.exe Receive Msg e1118b08 (853) from Port e11a6dc0 (csrss.exe

)

LPC[ 1a.52 ]: Free Msg e1118b08

LPC[ 1a.52 ]: Allocate Msg e1118b08

LPC[ 1a.52 ]: csrss.exe Sending Reply Msg e1118b08 (853.0, 0) [00000000 00010001

77c21e63 77b9da6b] to Thread ff939020 (Explorer.exe)

LPC[ 1a.52 ]: csrss.exe Waiting for message to Port e11a6dc0 (csrss.exe)

LPC[ 55.54 ]: Explorer.exe Got Reply Msg e1118b08 (853) [00000000 00010001 00000

000 77b9da6b] for Thread ff939020 (Explorer.exe)

LPC[ 55.54 ]: Free Msg e1118b08"

MmDebug

By setting different bits of this variable, you can see different messages generated by the memory management module. Below, we list the bits of this variable that you can set and the messages the operating system then generates.

Bit 2

MM:actual fault c01dfc38 va 77f0e9db

***DumpPTE at c01dfc38 contains 3d0450 protoaddr e10f40a0 subsect ffba2fc0

inserting element 51 77f0e001

MM:actual fault c0307b00 va c1ec0000

***DumpPTE at c0307b00 contains 75c434 protoaddr e11d7068 subsect ffb6a3b0

***DumpPTE at e11d7068 contains 1c1d44c2 protoaddr e8075184 subsect fdfc2bf8

MM:actual fault c030d500 va c3540000

***DumpPTE at c030d500 contains 7f4434 protoaddr e11fd068 subsect ffb60bb0

removing wsle 313 c3540661

Bit 3

***WSLE cursize 79 frstfree 11a Min 1e Max 91

quota 88 firstdyn 3 last ent 25a next slot 3

index 0 c0300403

index 1 c0301403

index 2 c0502403

index 3 c01ff401

....

....

index 259 77f43401

Bit 4

csrss.exe file: \MMFAULT: va: 8018cd7e size: 1000 process: SystemVa file: \

MMFAULT: va: 77d9bd10 size: 1000 process: progman.exe file: \MMFAULT: va: c1ec00

00 size: 1000 process: SystemVa file: (null)

MMFAULT: va: c1786000 size: 1000 process: SystemVa file: (null)

MMFAULT: va: c1787000 size: 1000 process: SystemVa file: (null)

....

....

Bit 10

allocated 0x1 Ptes at c03f308c

releasing 0x1 system PTEs at location c03f308c

System Pte at c03f308c for 1 entries (c03f308c)

System Pte at c03f3108 for 2 entries (c03f310c)

System Pte at c03f31b0 for 1 entries (c03f31b0)

System Pte at c03f31d0 for 1 entries (c03f31d0)

Bit 28

crea sect access mask f001f maxsize 0 page prot 10

. allocation attributes 1000000 file handle a0

return crea sect handle a4 status 0

crea sect access mask f001f maxsize 10000 page prot 4

allocation attributes 4000000 file handle 0

return crea sect handle 1f0 status 0

mapview process handle ffffffff section 1f0 base address 0

zero bits 0

view size 0 offset 0 commitsize 10000 protect 4

Inheritdisp 2 Allocation type 0

Bit 30

MM:**access fault - va 77ea1d17 process fdf787a0 thread fdf77020

MM:**access fault - va 77ea31ba process fdf787a0 thread fdf77020

ObDebugFlags

Two bits of this variable (the fifth and sixth bits) control the operating system debug messages related to security descriptors.

Bit 6

Reference Index = 20, New RefCount = 5

Referencing index #20, Refcount = 5

Dereferencing SecurityDescriptor e11cc778, index #20, refcount = 6

Reference Index = 252, New RefCount = 8

Referencing index #252, Refcount = 8

Bit 7

Deassigning security descriptor e11cea98, Index = 252

Deassigning security descriptor e11cc778, Index = 20

Deassigning security descriptor e11d0ed8, Index = 214

Deassigning security descriptor e11d89d8, Index = 250

NtGlobalFlag

One bit of this variable enables the debug messages. Other bits control the validations

performed by the operating system and general operation of the operating system. Take a look

at the GFLAGS utility in the resource kit for the description of individual bits of NtGlobalFlag.

The value of this variable is inherited by a variable in NTDLL.DLL during the process startup.

NTDLL.DLL uses the second bit of this variable to show the loading of a process. During

process startup, NTDLL gets the value of this flag and sets its internal variable ShowSnap to 1

if the second bit is set. Once this bit is set, you can watch the behavior of the PE

executable/DLL loader. Windows NT will show names of all the imported DLLs, plus it will

show a real set of DLLs required to start an application. It will also show you the address of

initialization functions of each of these DLLs as well as a lot of other information. Look at the

following messages displayed by the operating system by just turning on one bit of the

NtGlobalFlag variable. Here, we started pstat.exe and terminated it immediately:

LDR: PID: 0x47 started - 'pstat'

LDR: NEW PROCESS

Image Path: C:\MSTOOLS\bin\PSTAT.EXE (PSTAT.EXE)

Current Directory: C:\MSTOOLS\bin

Search Path:

C:\MSTOOLS\bin;.;C:\WINNT40\System32;C:\WINNT40\system;C:\WINNT40;C:\WINNT40

\system32;C:\WINNT40;c:\winnt35;c:\winnt35\system32;c:\msdev\bin;C:\DOS

LDR: PSTAT.EXE bound to USER32.dll

NTICE: Load32 START=77E10000 SIZE=62000 KPEB=FF925DE0 MOD=user32

LDR: ntdll.dll used by USER32.dll

LDR: Snapping imports for USER32.dll from ntdll.dll

LDR: KERNEL32.dll used by USER32.dll

NTICE: Load32 START=77ED0000 SIZE=5E000 KPEB=FF925DE0 MOD=kernel32

LDR: ntdll.dll used by KERNEL32.dll

LDR: Snapping imports for KERNEL32.dll from ntdll.dll

LDR: Snapping imports for USER32.dll from KERNEL32.dll

LDR: LdrLoadDll, loading NTDLL.dll from

LDR: LdrGetProcedureAddress by NAME - RtlReAllocateHeap

LDR: LdrLoadDll, loading NTDLL.dll from

LDR: LdrGetProcedureAddress by NAME - RtlSizeHeap

...

...

LDR: LdrLoadDll, loading NTDLL.dll from

LDR: LdrGetProcedureAddress by NAME - RtlUnwind

LDR: LdrLoadDll, loading NTDLL.dll from

LDR: LdrGetProcedureAddress by NAME - RtlAllocateHeap

LDR: LdrLoadDll, loading NTDLL.dll from

LDR: LdrGetProcedureAddress by NAME - RtlFreeHeap

LDR: Refcount USER32.dll (1)

LDR: Refcount KERNEL32.dll (1)

LDR: Refcount GDI32.dll (1)

LDR: Refcount KERNEL32.dll (2)

...

...

...

LDR: Real INIT LIST

C:\WINNT40\system32\KERNEL32.dll init routine 77ed47a0

C:\WINNT40\system32\RPCRT4.dll init routine 77dc060d

C:\WINNT40\system32\ADVAPI32.dll init routine 77d38650

C:\WINNT40\system32\USER32.dll init routine 77e23890

LDR: KERNEL32.dll loaded. - Calling init routine at 77ed47a0

LDR: RPCRT4.dll loaded. - Calling init routine at 77dc060d

LDR: ADVAPI32.dll loaded. - Calling init routine at 77d38650

LDR: USER32.dll loaded. - Calling init routine at 77e23890

LDR: PID: 0x47 finished - 'pstat'

NTICE: Exit32 PID=47 MOD=PSTAT

SepDumpSD

When this variable is set to 1, the operating system dumps the security descriptor in the security handling-related code.

SECURITY DESCRIPTOR

Revision = 1

Dacl present

Self relative

Owner S-1-5-32-544

Group SYSTEM S-1-5-18

Sacl@ 0

Dacl@ e11f71fc

Revision: 02 Size: 0044 AceCount: 0002

AceHeader: 00140000 Access Allowed

Access Mask: 001f03ff

AceSize = 20

Ace Flags =

Sid = SYSTEM S-1-5-18

AceHeader: 00180000 Access Allowed

Access Mask: 00120048

AceSize = 24

Ace Flags =

Sid = S-1-5-32-544

TokenGlobalFlag

When this variable is set to 1, the operating system prints the security token-related messages.

SE (Token): Acquiring Token READ Lock for access to token

0xe11826f0

SE (Token): Releasing Token Lock for access to token 0xe11826f0

SE (Token): Acquiring Token READ Lock for access to token

0xe11826f0

SE (Token): Releasing Token Lock for access to token 0xe11826f0

CmLogLevel and CmLogSelect

These variables control the debugging messages given by the registry handling code. Different log levels serve as debug levels. By setting the individual bits in CmLogSelect, you can control the volume of messages generated by the operating system. The maximum value of CmLogLevel is 7. By default, the individual bits in CmLogSelect are set to produce the most verbose output.

NtOpenKey

DesiredAccess=80000000 RootHandle=00000000

Name='\Registry\Machine\Software\Microsoft\Windows

NT\CurrentVersion\Image File Execution Options\notepad.exe'

CmpParseKey:

CompleteName = '\Registry\Machine\Software\Microsoft\Windows

NT\CurrentVersion\Image File Execution Options\notepad.exe'

RemainingName = '\Machine\Software\Microsoft\Windows NT\CurrentVersion\Image

File Execution Options\notepad.exe'

CmpFindSubKeyByName:

Hive=e10025c8 Parent=00000020 SearchName=fdd0bd08

CmpFindSubKeyInLeaf:

Hive=e10025c8 Index=e10091f4 SearchName=fdd0bd08

CmpFindSubKeyByName:

Hive=e10025c8 Parent=00000108 SearchName=fdd0bd08

CmpFindSubKeyInLeaf:

Hive=e10025c8 Index=e100943c SearchName=fdd0bd08

CmpFindSubKeyByName:

Hive=e10c8988 Parent=00000020 SearchName=fdd0bd08

HOW TO DECIPHER THE PARAMETERS PASSED TO AN UNDOCUMENTED FUNCTION

This section describes how you can find out the parameters to be passed to an undocumented

function. The first step in deciphering parameters is to set a breakpoint on the function using

SoftICE. If you know the application that uses this undocumented function (from the import

dump), start the application. For example, Dr. Watson (DRWTSN32.EXE) uses an

undocumented NTDLL function NtOpenThread().

XREF: You can find a complete list of functions (documented as well as undocumented) imported

by an application using the DUMPBIN utility. For example, DUMPBIN PROGMAN.EXE /IMPORTS will

display all the functions imported by the program manager.

To start DRWTSN32, run an application that faults (GPF) or write one that faults deliberately. If you do not know an application that uses the undocumented function, try to find an equivalent Win32 API call. If you find such a call, write an application that calls it. For example, to decipher the parameters passed to the NtAllocateVirtualMemory system service, you may write an application that calls VirtualAlloc(). Once the breakpoint on the function that you want to decipher is triggered, you can look at the details of the function implementation. You can use some general tricks to decipher the parameters. We discuss a few of them in the sections that follow.

Examining the Error Handling Code

Often a function checks the value of a particular parameter and, if it is not appropriate, returns an error code. You can look up that error code in the NTSTATUS.H file from the DDK; the meaning of the error usually reveals what kind of parameter is being checked.

Consider the following piece of code in the undocumented NtQueryMutant system service:

CMP DWORD PTR [EBP+C], 0

JZ 8019D397

MOV DWORD PTR [EBP-34], C0000003 (STATUS_INVALID_INFO_CLASS)

..

..

8019D397:

CMP DWORD PTR [EBP+14], 8

JZ 8019D3B3

MOV DWORD PTR [EBP-34], C0000004 (STATUS_INFO_LENGTH_MISMATCH)

From this Assembly code, you can easily see that [EBP+C], the second parameter, contains

the InfoClass, and [EBP+14], the fourth parameter, contains the size of the buffer that holds

the mutant information.
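The checks implied by the fragment can be written out in C as follows. This is a sketch of our reading, not the actual source: the function and parameter names are guesses, the status values are the ones shown in the listing, and we assume that a failed check makes the service return the stored status, which the elided instructions do not show.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Status;   /* models NTSTATUS as an unsigned 32-bit value */

#define STATUS_SUCCESS              0x00000000u
#define STATUS_INVALID_INFO_CLASS   0xC0000003u
#define STATUS_INFO_LENGTH_MISMATCH 0xC0000004u

/*
 * Models the two validation steps:
 *   info_class  is the second parameter, [EBP+C]
 *   info_length is the fourth parameter, [EBP+14]
 */
Status check_query_mutant_args(uint32_t info_class, uint32_t info_length)
{
    if (info_class != 0) {          /* CMP DWORD PTR [EBP+C], 0 */
        return STATUS_INVALID_INFO_CLASS;
    }
    if (info_length != 8) {         /* CMP DWORD PTR [EBP+14], 8 */
        return STATUS_INFO_LENGTH_MISMATCH;
    }
    return STATUS_SUCCESS;
}

/* Usage: the three outcomes the fragment distinguishes. */
void demo_query_mutant_checks(void)
{
    assert(check_query_mutant_args(1, 8) == STATUS_INVALID_INFO_CLASS);
    assert(check_query_mutant_args(0, 4) == STATUS_INFO_LENGTH_MISMATCH);
    assert(check_query_mutant_args(0, 8) == STATUS_SUCCESS);
}
```

Read this way, the fragment also tells you that only information class 0 is accepted and that the corresponding information structure is 8 bytes long.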

Use in the Function

Sometimes, a particular parameter of an undocumented function is passed as a parameter to

some documented function. In this case, by looking at the documented function, you can easily

find out the parameter passed to the undocumented function.

Consider the following piece of code in the NtQueryMutant() function:

PUSH 00

LEA EAX,[EBP-20]

PUSH EAX

PUSH DWORD PTR [EBP-19]

MOV EAX,[_ExMutantObjectType]

PUSH EAX

PUSH 01

PUSH DWORD PTR [EBP+08]

CALL _ObReferenceObjectByHandle

MOV [EBP-24],EAX

TEST EAX,EAX

JL 8019D435

PUSH DWORD PTR [EBP-20]

CALL _KeReadStateMutant

Looking at this code, you can see that the first parameter to the NtQueryMutant() function ([EBP+08]) is passed unchanged as the first parameter to the documented ObReferenceObjectByHandle() function, whose first parameter is an object handle. Combining this with the name of the function, NtQueryMutant, and the _ExMutantObjectType argument, we can conclude that the first parameter is most likely a handle to a mutex (mutant) object.

Checking the Validation Code

Sometimes, a piece of code checks for the value of a parameter and displays a message if it

has a particular value. By looking at the message provided by the operating system, you can

find out the parameter. Especially in checked builds, asserts are used extensively. By looking

at the messages in these asserts, you can find out the parameters. For example, a function

that expects PEB as a parameter contains a piece of code that checks if the type field of the

object is a Process object.

TYPICAL ASSEMBLY LANGUAGE PATTERNS AND THEIR MEANINGS

This piece of code gets the current process object pointer (PEB) in the EAX register:

MOV EAX, FS:[124]

MOV EAX, [EAX+40]

While executing in kernel mode, FS:[124] always points to the currently executing thread (TEB). Under Windows NT 3.51, [TEB+40] points to the current process; under Windows NT 4.0 and Windows 2000, [TEB+44] points to the current process.

MOV EAX, ESI
AND EAX, FFFFF3FF
SHR EAX, 0A
SUB EAX, 40000000

MOV EAX, ESI
AND EAX, FFCFFFFF
SHR EAX, 14
SUB EAX, 3FD00000

The preceding two pieces of code compute the addresses of the page table entry and the page directory entry, respectively, for the virtual address present in the ESI register. The registers used might change; however, the pattern remains the same. You may have seen this code in many memory management-related functions. At first it looks odd; however, it is highly optimized using 2’s complement arithmetic: subtracting a constant is the same as adding its 2’s complement. As an exercise, try to determine how this works. Hint: Page tables are mapped starting at the virtual address 0xC0000000, and the page directory is mapped starting at the virtual address 0xC0300000.
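One way to work through the exercise is to compare the instruction sequence with the direct formula. The sketch below assumes 4-byte entries, 4 KB pages, and 4 MB per page directory entry (the classic 32-bit x86 layout); the function names are ours. The sample values in check_against_trace are taken from the MmDebug bit 2 output shown earlier in this chapter, where each "actual fault" line prints the PTE address next to the faulting virtual address.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_BASE 0xC0000000u  /* page tables are mapped here */
#define PDE_BASE 0xC0300000u  /* the page directory is mapped here */

/* Direct form: one 4-byte entry per 4 KB page or per 4 MB region. */
uint32_t pte_address_direct(uint32_t va) { return PTE_BASE + (va >> 12) * 4u; }
uint32_t pde_address_direct(uint32_t va) { return PDE_BASE + (va >> 22) * 4u; }

/* Mirrors AND EAX, FFFFF3FF / SHR EAX, 0A / SUB EAX, 40000000. */
uint32_t pte_address_optimized(uint32_t va)
{
    uint32_t eax = va;
    eax &= 0xFFFFF3FFu;  /* clear bits 10-11 so the result is a multiple of 4 */
    eax >>= 0x0A;        /* (va >> 12) * 4 in a single shift */
    eax -= 0x40000000u;  /* same as adding 0xC0000000 modulo 2^32 */
    return eax;
}

/* Mirrors AND EAX, FFCFFFFF / SHR EAX, 14 / SUB EAX, 3FD00000. */
uint32_t pde_address_optimized(uint32_t va)
{
    uint32_t eax = va;
    eax &= 0xFFCFFFFFu;  /* clear bits 20-21 */
    eax >>= 0x14;        /* (va >> 22) * 4 in a single shift */
    eax -= 0x3FD00000u;  /* same as adding 0xC0300000 modulo 2^32 */
    return eax;
}

/* Checks the optimized form against addresses printed by the MmDebug trace. */
void check_against_trace(void)
{
    assert(pte_address_optimized(0x77F0E9DBu) == 0xC01DFC38u);
    assert(pte_address_optimized(0xC1EC0000u) == 0xC0307B00u);
    assert(pte_address_optimized(0xC3540000u) == 0xC030D500u);
}
```

The mask applied before the shift replaces a separate multiply by 4: shifting right by 10 instead of 12 leaves the index already scaled, and the two cleared bits guarantee that the low two bits of the result are zero.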

PUSH 00

LEA EAX,[EBP-20]

PUSH EAX

PUSH ECX

PUSH DWORD PTR [_PsProcessType]

PUSH 08

PUSH DWORD PTR [EBP+08]

CALL _ObReferenceObjectByHandle

MOV [EBP-24],EAX

TEST EAX,EAX

JL .....

MOV EAX,FS:[00000124]

MOV ECX,[EBP-20]

CMP [EAX+40],ECX

JZ ...

PUSH ECX

CALL _KeAttachProcess

Here, the code performs some work on behalf of another process. It receives a handle to a Process object as a parameter. Using this handle, the code reaches the actual object through ObReferenceObjectByHandle and then compares the address of that Process object with the address of the current Process object, stored at [TEB+40] in Windows NT 3.51 and [TEB+44] in Windows NT 4.0 and Windows 2000. If the Process object dealt with is not the current Process object, the code attaches to the desired process using KeAttachProcess(). The code that follows then executes in the context of the attached process. You can see similar code in the system services that can operate on other processes. For example, the system service NtAllocateVirtualMemory can allocate memory for a process other than the current one, and you will find this kind of code in the NtAllocateVirtualMemory() function. Other places where you can find this code are NtFreeVirtualMemory() and NtLockVirtualMemory().

THE PRACTICAL APPLICATION OF REVERSE ENGINEERING

Now, let’s observe the practical application of the reverse engineering techniques discussed in

this chapter. We will show clearly how you can arrive at pseudocode given the raw assembler

listing.

XREF: You can study the example we chose in Chapter 10, “Adding New Software Interrupts.”

In Chapter 10, we discuss the callgate implementation on Windows NT (for running ring 0 code from a ring 3 application). When we decided to design the callgate mechanism, we were in search of some mechanism to allocate selectors, the basic requirement for creating callgates. We knew that a Win32 application does not have a Local Descriptor Table (LDT). Therefore, we wanted to allocate selectors from the Global Descriptor Table (GDT). First, we looked at the symbols of NTOSKRNL by using SoftICE’s command SYM *Selector*. We received some entries matching the wildcard pattern *Selector*.

One symbol we found interesting was KeI386AllocateGdtSelectors. We deduced from the name that this function must allocate GDT selectors. Next, we took the export dump of NTOSKRNL to see whether the function is exported. You can easily make use of an undocumented function only if it is exported. If the function is not exported, you have to deal with hard-coded addresses, which binds the program to a specific version of Windows NT (for example, NT 3.51/4.0/2000, free builds/checked builds/service packs). Luckily, we found that the function was exported. Our next step was to put a breakpoint on this function. Unfortunately, we found that this breakpoint was never triggered on our configuration, so we decided to reverse engineer the function ourselves. We extracted the Assembly output of the function using the SoftICE history buffer. Here is the raw Assembly code for the function:

_KeI386AllocateGdtSelectors

0008:80125D00 PUSH EBP

0008:80125D01 MOV EBP,ESP

0008:80125D03 PUSH ESI

0008:80125D04 MOV SI,[EBP+0C]

0008:80125D08 PUSH EDI

0008:80125D09 CMP [_KiNumberFreeSelectors$S10229],SI

0008:80125D10 JB 80125D5E

0008:80125D12 MOV ECX,_KiAbiosGdtLock

0008:80125D17 CALL [__imp_@KfAcquireSpinLock]

0008:80125D1D SUB [_KiNumberFreeSelectors$S10229],SI

0008:80125D24 MOV EDX,[_KiFreeGdtListHead$S10230]

0008:80125D2A TEST SI,SI

0008:80125D2D JZ 80125D47

0008:80125D2F MOV ECX,[EBP+08]

0008:80125D32 MOV EDI,EDX

0008:80125D34 SUB DI,[_KiAbiosGdt]

0008:80125D3B MOV [ECX],DI

0008:80125D3E ADD ECX,02

0008:80125D41 DEC SI

0008:80125D43 MOV EDX,[EDX]

0008:80125D45 JNZ 80125D32

0008:80125D47 MOV ECX,_KiAbiosGdtLock

0008:80125D4C MOV [_KiFreeGdtListHead$S10230],EDX

0008:80125D52 MOV EDX,EAX

0008:80125D54 CALL [__imp_@KfReleaseSpinLock]

0008:80125D5A XOR EAX,EAX

0008:80125D5C JMP 80125D63

0008:80125D5E MOV EAX,C0000115 ; STATUS_ABIOS_SELECTOR_NOT_AVAILABLE

0008:80125D63 POP EDI

0008:80125D64 POP ESI

0008:80125D65 POP EBP

0008:80125D66 RET 0008

Looking at the last instruction, RET 8, the function clearly followed the __stdcall calling convention with two parameters to the function: the callee removes 8 bytes, that is, two DWORDs, from the stack on return. We next had to decipher what those parameters were. Because the compiler generated the standard stack frame (PUSH EBP, MOV EBP, ESP), clearly EBP+8 referred to the first parameter, and EBP+C referred to the second parameter.
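The arithmetic behind this reasoning is simple enough to capture in a few helpers. This is a sketch with names of our own choosing, and it assumes that every parameter occupies exactly one DWORD on the stack, which holds for the functions discussed here but not for every function.

```c
#include <assert.h>

/* __stdcall: the callee pops its arguments, so RET n implies n / 4 DWORDs. */
unsigned stdcall_param_count(unsigned ret_bytes)
{
    assert(ret_bytes % 4 == 0);
    return ret_bytes / 4;
}

/*
 * Standard frame: [EBP+0] holds the saved EBP and [EBP+4] the return
 * address, so parameter k (counting from 1) lives at EBP + 8 + 4 * (k - 1).
 */
unsigned param_ebp_offset(unsigned index)
{
    assert(index >= 1);
    return 8 + 4 * (index - 1);
}

/* Inverse mapping: which parameter does [EBP+offset] refer to? */
unsigned param_index_from_offset(unsigned offset)
{
    assert(offset >= 8 && offset % 4 == 0);
    return (offset - 8) / 4 + 1;
}
```

For example, stdcall_param_count(8) gives 2 for the RET 0008 above, and param_index_from_offset(0x18) gives 5, matching the fifth parameter at [EBP+18] in the bit-test pattern earlier in the chapter.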

The following instruction sequence suggests that the second parameter represents the

number of selectors to be allocated:

0008:80125D03 PUSH ESI

0008:80125D04 MOV SI,[EBP+0C]

0008:80125D08 PUSH EDI

0008:80125D09 CMP [_KiNumberFreeSelectors$S10229],SI

0008:80125D10 JB 80125D5E

...

...

0008:80125D5E MOV EAX,C0000115 ; STATUS_ABIOS_SELECTOR_NOT_AVAILABLE

This code moves the second parameter into the SI register and compares the kernel variable KiNumberFreeSelectors$S10229 with it. If KiNumberFreeSelectors$S10229 is less than the value in the SI register (that is, fewer selectors are free than were requested), the code jumps to a label where the EAX register is loaded with the error code STATUS_ABIOS_SELECTOR_NOT_AVAILABLE. Clearly, the second parameter to the function was “Number of Selectors to allocate.”

Next, we followed the code assuming that enough selectors were available, that is, that the JB condition evaluated to false.

The next two instructions acquired the GDT lock. Locks are used extensively to prevent multiple threads from accessing a shared kernel data structure at the same time. Most of the time, you can ignore these pieces of code, because they have nothing to do with the actual logic of the function.

0008:80125D12 MOV ECX,_KiAbiosGdtLock

0008:80125D17 CALL [__imp_@KfAcquireSpinLock]

The next instruction decrements the value of the kernel variable

_KiNumberFreeSelectors$S10229 according to the number of selectors to be allocated.

0008:80125D1D SUB [_KiNumberFreeSelectors$S10229],SI

Then, the function loads the EDX register with the value of the kernel variable _KiFreeGdtListHead$S10230. From the name of the variable, you can infer that the free selectors are kept in a free list.

0008:80125D24 MOV EDX,[_KiFreeGdtListHead$S10230]

Next, the function checks to see if the number of selectors to be allocated is zero. In that case, the function jumps to a label where the free list head is written back, the spin lock is released, and the EAX register is zeroed out to indicate success before the function returns.

0008:80125D2A TEST SI,SI

0008:80125D2D JZ 80125D47

....

....

0008:80125D47 MOV ECX,_KiAbiosGdtLock

0008:80125D4C MOV [_KiFreeGdtListHead$S10230],EDX

0008:80125D52 MOV EDX,EAX

0008:80125D54 CALL [__imp_@KfReleaseSpinLock]

0008:80125D5A XOR EAX,EAX

0008:80125D5C JMP 80125D63

0008:80125D5E MOV EAX,C0000115 ; STATUS_ABIOS_SELECTOR_NOT_AVAILABLE

0008:80125D63 POP EDI

0008:80125D64 POP ESI

0008:80125D65 POP EBP

0008:80125D66 RET 0008

Now, let’s see what happens when the number of selectors to allocate is nonzero:

0008:80125D2F MOV ECX,[EBP+08]

0008:80125D32 MOV EDI,EDX

0008:80125D34 SUB DI,[_KiAbiosGdt]

0008:80125D3B MOV [ECX],DI

0008:80125D3E ADD ECX,02

0008:80125D41 DEC SI

0008:80125D43 MOV EDX,[EDX]

0008:80125D45 JNZ 80125D32

The code fills the ECX register with the first parameter. Then, it loads the EDI register with the value of the EDX register (_KiFreeGdtListHead$S10230). Next, it subtracts the value of the kernel variable KiAbiosGdt from the DI register. We found that the value of KiAbiosGdt matched the base address of the Global Descriptor Table, so the subtraction yields the byte offset of the descriptor from the GDT base, which is the selector value. Next, the code copies the selector value from the DI register to the location pointed to by the ECX register and then adds 2 to the ECX register. From this, we deduced that the first parameter points to a buffer that receives the allocated selector values, with each entry consisting of 2 bytes. Therefore, the first parameter must be an array of short integers.

The code reaches to the next free selector using the instruction:

MOV EDX,[EDX]

From this, we can see that the free selectors are maintained in a linked list, and the free descriptors themselves are used for keeping track of the next free selector in the list. The SI register is decremented each time through the loop. Initially, the SI register contains the number of selectors to be allocated. In the end, the SI register reaches 0. At this point, the buffer pointed to by the first parameter contains the list of selectors allocated.

Now, we’ll write the pseudocode for the function:

NTSTATUS __stdcall
KeI386AllocateGdtSelectors(
    unsigned short *SelectorArray,
    unsigned short nSelectors)
{
    register int i = 0;
    register char *DescriptorEntry;   /* EDX: current entry on the free list */
    KIRQL OldIrql;                    /* returned in EAX by KfAcquireSpinLock */

    if (KiNumberFreeSelectors$S10229 < nSelectors) {
        return STATUS_ABIOS_SELECTOR_NOT_AVAILABLE;
    }

    OldIrql = KfAcquireSpinLock(&KiAbiosGdtLock);
    KiNumberFreeSelectors$S10229 -= nSelectors;
    DescriptorEntry = KiFreeGdtListHead$S10230;

    while (nSelectors != 0) {
        /* Selector = byte offset of the descriptor from the GDT base */
        SelectorArray[i] = (unsigned short)(DescriptorEntry - (char *)KiAbiosGdt);
        i++;
        nSelectors--;
        /* Each free descriptor stores a pointer to the next free one */
        DescriptorEntry = *(char **)DescriptorEntry;
    }

    /* Write the new list head back (MOV [_KiFreeGdtListHead$S10230],EDX) */
    KiFreeGdtListHead$S10230 = DescriptorEntry;
    KfReleaseSpinLock(&KiAbiosGdtLock, OldIrql);
    return STATUS_SUCCESS;
}
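To experiment with this logic outside the kernel, the free list can be modeled in user mode. The sketch below is an illustration under stated assumptions, not the kernel's data layout: GdtModel and all function names are hypothetical, the table has only 8 entries, the spin lock is omitted, and each free descriptor stores the byte offset of the next free descriptor rather than a pointer so that the model behaves the same on 32-bit and 64-bit hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GDT_ENTRIES     8u
#define DESCRIPTOR_SIZE 8u
#define END_OF_LIST     0xFFFFFFFFu
#define STATUS_SUCCESS                      0x00000000u
#define STATUS_ABIOS_SELECTOR_NOT_AVAILABLE 0xC0000115u

typedef struct {
    uint8_t  table[GDT_ENTRIES * DESCRIPTOR_SIZE]; /* stands in for the GDT */
    uint32_t free_head;   /* byte offset of the first free descriptor */
    uint16_t free_count;  /* number of descriptors on the free list */
} GdtModel;

static uint32_t read_next(const GdtModel *g, uint32_t offset)
{
    uint32_t next;
    memcpy(&next, g->table + offset, sizeof next);
    return next;
}

static void write_next(GdtModel *g, uint32_t offset, uint32_t next)
{
    memcpy(g->table + offset, &next, sizeof next);
}

/* Chains every descriptor from first_free_index to the end of the table. */
void gdt_model_init(GdtModel *g, uint32_t first_free_index)
{
    uint32_t i;

    assert(first_free_index < GDT_ENTRIES);
    memset(g->table, 0, sizeof g->table);
    for (i = first_free_index; i < GDT_ENTRIES; i++) {
        uint32_t next = (i + 1 < GDT_ENTRIES) ? (i + 1) * DESCRIPTOR_SIZE
                                              : END_OF_LIST;
        write_next(g, i * DESCRIPTOR_SIZE, next);
    }
    g->free_head = first_free_index * DESCRIPTOR_SIZE;
    g->free_count = (uint16_t)(GDT_ENTRIES - first_free_index);
}

/* Pops count entries off the free list and records their selector values. */
uint32_t allocate_selectors(GdtModel *g, uint16_t *selectors, uint16_t count)
{
    uint32_t entry;
    uint16_t i;

    if (g->free_count < count) {
        return STATUS_ABIOS_SELECTOR_NOT_AVAILABLE;
    }
    g->free_count = (uint16_t)(g->free_count - count);
    entry = g->free_head;
    for (i = 0; i < count; i++) {
        assert(entry != END_OF_LIST);
        selectors[i] = (uint16_t)entry;  /* selector = offset from table base */
        entry = read_next(g, entry);     /* follow the link, like MOV EDX,[EDX] */
    }
    g->free_head = entry;
    return STATUS_SUCCESS;
}
```

With the first five entries in use, two successive allocations return the selectors 0x28 and 0x30, and a request for more selectors than remain fails with STATUS_ABIOS_SELECTOR_NOT_AVAILABLE before the list is touched.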

SUMMARY

In this chapter, we described how to use symbolic information supplied with Windows NT using

SoftICE. We also discussed some general techniques used for reverse engineering, such as

how to understand the compiler code generation patterns. Next, we showed how Windows NT

can assist in reverse engineering by enabling some debugging flags in the kernel. We also

discussed various ways of deciphering the parameters for undocumented functions. Next, we

reviewed some typical Assembly language patterns found throughout the Windows NT kernel

code. The chapter concluded with an example showing the deciphering of an undocumented

function called KeI386AllocateGdtSelectors from NTOSKRNL.EXE.

Chapter 6

Hooking Windows NT System Services

Abstract

This chapter explores system services under DOS, Windows 3.x, Windows 95/98, and

Windows NT. The authors discuss the need for hooking these system services.

THIS CHAPTER DISCUSSES hooking Windows NT system services. Before we begin, let’s first

review what we mean by a system service. A system service refers to a set of functions

(primitive or elaborate) provided by the operating system. Application programming interfaces

(APIs) enable developers to call several system services, directly or indirectly. The operating

system provides APIs in the form of a dynamic link library (DLL) or a static compiler library.

These APIs are often based on system services provided by the operating system. Some of

the API calls are directly based on a corresponding system service, and some depend on

making multiple system service calls. Also, some of the API calls may not make any calls to

system services. In short, there need not be a one-to-one mapping between API functions and

system services. Figure 6-1 demonstrates this in the context of Windows NT.

Figure 6-1: Mappings between API functions and system services

SYSTEM SERVICES: THE LONG VIEW

System services and the APIs calling these system services have come a long way from DOS

to Windows NT.

System Services under DOS

Under DOS, system services comprise part of the MS-DOS kernel (including MSDOS.SYS

and IO.SYS). These system services are available to users in the form of Interrupt Service

Routines (ISRs). ISRs can be invoked by calling the appropriate interrupt handlers using the

INT instruction. API functions, provided by compiler libraries, call the interrupt handler for

system services (the INT 21h interrupt). For example, to open a file, MS-DOS provides a

system service for which you specify the function number 0x3D in the AH register, the

access mode in the AL register, and a pointer to the filename in DS:DX, and then issue the INT 21h

instruction. Compilers typically wrap this sequence in a convenient API function.

System Services under Windows 3.x and Windows 95/98

Under Windows 3.x or Windows 95/98, the core system services take the form of VXDs and

DLLs and some real-mode DOS code. The APIs are provided in the form of dynamic link

libraries. These dynamic link libraries call the system services to implement the APIs. For

example, to open a file, applications call an API function from KERNEL32.DLL such as

OpenFile() or CreateFile(). These APIs, in turn, call a system service.

System Services under Windows NT

Under Windows NT, the NT executive (part of NTOSKRNL.EXE) provides core system

services. These services are rather generic and primitive. Various APIs such as Win32, OS/2,

and POSIX are provided in the form of DLLs. These APIs, in turn, call services provided by the

NT executive. The name of the API function to call differs for users calling from different

subsystems even though the same system service is invoked. For example, to open a file from

the Win32 API, applications call CreateFile() and to open a file from the POSIX API,

applications call the open() function. Both of these applications ultimately call the NtCreateFile()

system service from the NT executive.
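
The idea of two differently named APIs funneling into one system service can be sketched in portable C. This is only a user-mode simulation: native_open(), sim_CreateFile(), sim_open(), and the fake handle values are hypothetical stand-ins, not the real NtCreateFile(), CreateFile(), or open().

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the shared "native" service runs. */
int native_calls;

/* Hypothetical stand-in for a native service such as NtCreateFile():
 * returns a fake handle (3 for read-only, 4 for writable) or -1. */
int native_open(const char *path, int writable)
{
    assert(path != NULL);      /* a NULL path is a programming error */
    native_calls++;
    if (path[0] == '\0')
        return -1;
    return writable ? 4 : 3;
}

/* Simulated Win32-style API: translates its own flags, then calls
 * the shared service. */
enum { SIM_GENERIC_READ = 1, SIM_GENERIC_WRITE = 2 };
int sim_CreateFile(const char *name, int access)
{
    return native_open(name, (access & SIM_GENERIC_WRITE) != 0);
}

/* Simulated POSIX-style API: different name and flags, same service. */
enum { SIM_O_RDONLY = 0, SIM_O_RDWR = 2 };
int sim_open(const char *path, int flags)
{
    return native_open(path, flags == SIM_O_RDWR);
}
```

Each wrapper only translates its own conventions; the work is done in one place, which is why a hook placed on the shared service sees callers from every subsystem.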

Note: Under Windows NT 3.51, the system services are provided by a kernel-mode component

called NTOSKRNL.EXE. Most of the KERNEL32.DLL calls—such as those related to memory

management and kernel objects management—are handled by these system services. The

USER32 and GDI32 calls are handled by a separate subsystem process called CSRSS. Starting with

Windows NT 4.0, Microsoft moved most of the functionality of CSRSS into a kernel-mode driver

called WIN32K.SYS. The functionality moved into WIN32K.SYS is made available to the

applications in the form of system services. These system services are not truly part of native

system services since they are specific to the user interface and not used by all subsystems. This

chapter and the next chapter focus only on the system services provided by NTOSKRNL.EXE.

NEED FOR HOOKING SYSTEM SERVICES

Hooking is a common mechanism for intercepting a particular section of executing code, and it

provides a useful way of modifying the behavior of the operating system. Developers are often

more concerned with how to hook a system service or an API call than with why to hook one.

Nevertheless, the following sections examine the situations in which the need to hook a system

service arises and explain how hooking can help the developer.

Trapping Events at Occurrence

Developers trap events such as the creation of a file (CreateFile()), creation of a mutex

(CreateMutex()), or Registry accesses (RegCreateKey()) for specific purposes. Hooking a

particular event-related API or system service call, synchronously, can help trap those events.

Applications doing system monitoring will find these kinds of hooking invaluable. These hooks

could act as interrupts triggered by the occurrence of these events. A developer could write a

routine to handle the occurrence of these events and take appropriate action.

Modifying System Behavior to Suit User Needs

Diverting the normal flow of control by introducing the hooks can modify operating system

behavior. This enables the developer to change data structures and context at the time of

hooking, enough to induce new behavior. For example, you can protect the opening of a

sensitive file by hooking the NtCreateFile() system service. Although NTFS provides user-level

security for files, this security is not available on FAT partitions. You should ensure that hooking

does not have any undesirable side effects on the operating system. Protecting modifications

to Registry keys is something easily doable when you hook the Registry system services. This

has several applications, since little protection is provided for Registry settings created by

applications.

Studying the Behavior of the System

Studying the behavior of the system is the best way to get a better idea of the internal workings

of the operating system, as most debugger users and system hackers will attest.

Understanding undocumented operating system functionality requires a lot of hacking,

which goes hand in hand with hooking.

Debugging

Complex programs could make use of system-service hooking to debug the stickiest problems.

For example, a few days back, we had a problem with the installation of a piece of software.

We had difficulty creating folders and shortcuts for this application. Using a systemwide hook,

we quickly figured that the installation program was looking for a Registry value that indicated

where to install the folders (which happened to be the Start menu). We hooked the

NtQueryValueKey() call, then obtained the value the installation program was looking for. We

created that value and solved our problem.

Getting Performance Data for Specific Tasks and Generating Statistics

Hooking can prove very useful to those writing benchmarks and applications that critically

measure system performance under specific conditions. Even measuring the frequency of

certain system services becomes very easy with this type of hooking. Measuring file system

performance by hooking the file system-related system services exemplifies this procedure.
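
As a rough illustration of measuring service frequency, the following user-mode sketch routes every call through a counting layer. The three services, their IDs, and the names counted_call() and most_called() are made up for the example; a real hook would sit in the system service table instead.

```c
#include <assert.h>

#define NUM_SERVICES 3

typedef int (*ServiceRoutine)(int);

/* Three made-up services standing in for real system services. */
int svc_open(int x)  { return x + 1; }
int svc_read(int x)  { return x * 2; }
int svc_close(int x) { return x - 1; }

ServiceRoutine service_table[NUM_SERVICES] = { svc_open, svc_read, svc_close };

/* One counter per service ID, bumped on every call. */
unsigned long call_count[NUM_SERVICES];

/* Counting layer: record the call, then forward it unchanged. */
int counted_call(unsigned id, int arg)
{
    assert(id < NUM_SERVICES);
    call_count[id]++;
    return service_table[id](arg);
}

/* Returns the ID of the most frequently called service
 * (the lowest ID wins a tie). */
unsigned most_called(void)
{
    unsigned best = 0, id;
    for (id = 1; id < NUM_SERVICES; id++)
        if (call_count[id] > call_count[best])
            best = id;
    return best;
}
```

Because the layer forwards the arguments and the return value untouched, callers cannot tell that statistics are being gathered.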

Life without hooking is unthinkable for most Windows developers in today’s

Microsoft-dominated world of operating systems. Windows NT system services lie at the

center of the NT universe, and having the ability to hook these can prove extremely handy.

TYPES OF HOOKS

The following sections explore two types of hooking.

Kernel-Level Hooking

You can achieve kernel-level hooking by writing a VXD or device driver. In this method,

essential functions provided by the kernel are hooked. The advantage of this type of hooking is

that you get one central place from which you can monitor the events occurring as a result of a

user-mode call or a kernel-mode call. The disadvantage of this method is that you need to

decipher the parameters of the call passed from kernel mode, since many times these services

are undocumented. Also, the data passed to the kernel-mode call might differ from the data

passed in a user-mode call. Also, a user-level API call might be implemented using multiple

calls to the kernel. In this case, hooking becomes far more difficult. In general, this type of

hooking is more difficult to achieve, but it can produce more rewarding results.

User-Level Hooking

You can perform this type of hooking with some help from a VXD or device driver. In this

method, the functions provided by the user-mode DLLs are hooked. The advantage of this

method is that these functions are usually well documented. Therefore, you know the

parameters to expect. This makes it easy to write the hook function. This type of hooking limits

your field of vision to user mode only and does not extend to kernel mode.

IMPLEMENTATIONS OF HOOKS

The following sections detail the implementation of hooks under various Microsoft platforms.

DOS

In the DOS world, system services are implemented as an interrupt handler routine (INT 21h).

The compiler library routines typically call this interrupt handler to provide an API function to

the programmer. It is trivial to hook this handler using the GetVect (INT 21h, AH=35h) and

SetVect (INT 21h, AH=25h) services. Hence, hooking system services is fairly straightforward.

DOS does not contain separate user and kernel modes.
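
The get-vector/set-vector pattern can be sketched in portable C. The code below is a simulation only: vector_table, get_vect(), set_vect(), raise_int(), and the Regs structure are hypothetical stand-ins for the real-mode interrupt vector table and the INT 21h services, not DOS code.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated register set handed to an interrupt handler. */
typedef struct {
    unsigned char ah;   /* function number on entry */
    unsigned char al;   /* result code on return    */
} Regs;

typedef void (*IsrHandler)(Regs *);

IsrHandler vector_table[256];   /* stand-in for the interrupt vector table */

IsrHandler get_vect(unsigned char n)           { return vector_table[n]; }
void set_vect(unsigned char n, IsrHandler h)   { vector_table[n] = h; }

/* Stand-in for executing "INT n". */
void raise_int(unsigned char n, Regs *r)
{
    if (vector_table[n] != NULL)
        vector_table[n](r);
}

/* Pretend DOS kernel handler: every function "succeeds". */
void dos_int21(Regs *r) { r->al = 0; }

IsrHandler old_int21;   /* previous handler, saved for chaining */
int open_calls;         /* number of "open file" requests seen  */

/* The hook: observe the request, then chain to the old handler. */
void hook_int21(Regs *r)
{
    if (r->ah == 0x3D)
        open_calls++;
    old_int21(r);
}

void install_hook(void)
{
    old_int21 = get_vect(0x21);
    assert(old_int21 != NULL);   /* nothing to chain to otherwise */
    set_vect(0x21, hook_int21);
}

void remove_hook(void) { set_vect(0x21, old_int21); }
```

Saving the old vector before installing the new one is what makes the hook transparent: the original service still runs, and removing the hook is a single store.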

Windows 3.x

In the Windows 3.x world, system services are implemented in DLLs. The compiler library

routines represent stubs that jump to the DLL code (this is called dynamic linking of DLLs).

Also, because the address space is common to all applications, hooking amounts to getting

the address of that particular system service and changing a few bytes at that address.

Changing of these bytes sometimes requires the simple aliasing of selectors.

XREF: Refer to the MSDN article in Microsoft Systems Journal (Vol. 9, No. 1) entitled, “Hook and

Monitor Any 16-bit Windows(tm) Function With Our ProcHook DLL,” by James Finnegan.

Windows 95 and 98

In the Windows 95/98 world, system services are implemented in a DLL as in Windows 3.x.

However, under Windows 95/98, all 32-bit applications run in separate address spaces.

Because of this, you cannot easily hook any unshared DLL. It is fairly easy to hook a shared

DLL such as KERNEL32.DLL. You simply modify a few code bytes at the start of the system

service you want to hook and write your hook function in a DLL that is loaded in shared

memory. Modifying the code bytes may involve writing a VXD, because KERNEL32.DLL is

loaded in the upper 2GB of the address space and protected by the operating system.

Windows NT

In the Windows NT world, system services are implemented in the kernel component of NT

(NTOSKRNL.EXE). The APIs supported by various subsystems (Win32, OS/2, and POSIX)

are implemented by using these system services. There is no documented way of hooking

these system services from kernel mode. There are several documented ways for hooking

user-level API calls.

XREF: Refer to the MSDN articles in Microsoft Systems Journal entitled, “Learn System-Level

Win32(r) Coding Techniques by Writing an API Spy Program,” by Matt Pietrek (Vol. 9, No. 12), and

“Load Your 32-bit DLL into Another Process’s Address Space Using INJLIB,” by Jeffrey Richter

(Vol. 9, No. 5).

Refer to CyberSensor on http://www.cybermedia.co.in

We will present one way of achieving hooking of NT system services in kernel mode in this

chapter. We also provide the code for this on the CD-ROM accompanying this book.

WINDOWS NT SYSTEM SERVICES

Windows NT has been designed with several design goals in mind. Support for multiple

(popular) APIs, extensibility, isolation of various APIs from each other, and security are some

of the most important ones. The present design incorporates several protected subsystems

(for example, the Win32 subsystem, the POSIX subsystem, and others) that reside in the user

space isolated from each other. The NT executive runs in the kernel mode and provides native

support to all the subsystems. All subsystems use the NT system services provided by the NT

executive to implement most of their core functionality.

Windows programmers, when they link with the KERNEL32, USER32, and GDI32 DLLs, are

completely unaware of the existence of the NT system services supporting the various Win32

calls they make. Similarly, POSIX clients using the POSIX API end up using more or less the

same set of NT system services to get what they want from the kernel. Thus, NT system

services represent the fundamental interface for any user-mode application or subsystem to

the kernel.

For example, when a Win32 application calls CreateProcess() or when a POSIX application

calls the fork() call, both ultimately call the NtCreateProcess() system service from the NT

executive.

NT system services are routines that run entirely in kernel mode. For those

familiar with the Unix world, NT system services can be considered the equivalent of system

calls in Unix.

Figure 6-2: A caller program invoking an NT system service

Currently, Windows NT system services are not completely documented. The only place

where you can find some documentation regarding the NT system services is on Windows NT

DDK CD-ROMs from Microsoft. The DDK discusses about 25 different system services and

covers the parameters passed to them in some detail. You’ll see from Appendix A that this is

only the tip of the iceberg. Windows NT 3.51 has 0xC4 (196) system services, Windows NT 4.0

has 0xD3 (211), and Windows 2000 Beta-2 has 0xF4 (244).

We deciphered the parameters of 90% of the system services. Prototypes for all these system

services can be found in UNDOCNT.H on the CD-ROM included with this book. We also

provide detailed documentation of some of the system services in Appendix A.

In the following section, you will learn how to hook these system services.

HOOKING NT SYSTEM SERVICES

Let’s first look at how NT System Services are implemented in the Windows NT operating

system. We also will discuss the exact mechanics of hooking an NT system service. In addition,

we’ll explore the kernel data structures involved and provide sample code to aid hooking of

system services.

On the CD: Check out hookdrv.c on the accompanying CD-ROM.

Implementation of a System Service in Windows NT

The user mode interface to the system services of NTOSKRNL is provided in the form of

wrapper functions. These wrapper functions are present in a DLL called NTDLL.DLL. These

wrappers use the INT 2E instruction to switch to the kernel mode and execute the requested

system service. The Win32 API functions (mainly in KERNEL32.DLL and ADVAPI32.DLL) use

these wrappers for calling a system service. The Win32 API functions perform validations on

the parameters passed to them and translate everything to Unicode. After this,

the Win32 API function calls an appropriate wrapper function in NTDLL corresponding to the

required service. Each system service in NTOSKRNL is identified by a service ID. The

wrapper function in NTDLL places the service ID of the requested system service in the EAX

register, places a pointer to the stack frame of the parameters in the EDX register, and issues the INT

2E instruction. This instruction switches the processor to kernel mode, and the processor

starts executing the handler specified for INT 2E in the Interrupt Descriptor Table (IDT).

The Windows NT executive sets up this handler. The INT 2E handler copies the parameters

from the user-mode stack to the kernel-mode stack. The base of the stack frame is identified by the

contents of the EDX register. The INT 2E handler provided by the NT executive is internally called

KiSystemService().
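
The calling convention described above (service ID in EAX, pointer to the parameter frame in EDX, then a trap) can be mimicked in plain C. The sketch below is a user-mode simulation under stated assumptions: TrapRegs, sim_int2e(), sim_stub_add(), and the service ID 0 are hypothetical, and an ordinary function call takes the place of the INT 2E instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define STATUS_INVALID_SYSTEM_SERVICE 0xC000001Cu

/* Simulated registers at the moment of the trap. */
typedef struct {
    uint32_t eax;          /* in: service ID; out: result or status  */
    const uint32_t *edx;   /* in: base of the caller's parameter frame */
} TrapRegs;

enum { SIM_SERVICE_ADD = 0 };   /* made-up service ID */

/* Stand-in for the INT 2E handler: picks the service from EAX and
 * reads its parameters through the pointer in EDX. */
void sim_int2e(TrapRegs *regs)
{
    assert(regs != NULL);
    switch (regs->eax) {
    case SIM_SERVICE_ADD:
        regs->eax = regs->edx[0] + regs->edx[1];
        break;
    default:
        regs->eax = STATUS_INVALID_SYSTEM_SERVICE;
        break;
    }
}

/* NTDLL-style wrapper: lay the parameters out contiguously, as they
 * would sit on the stack, load the "registers", and trap. */
uint32_t sim_stub_add(uint32_t a, uint32_t b)
{
    uint32_t frame[2];
    TrapRegs regs;

    frame[0] = a;
    frame[1] = b;
    regs.eax = SIM_SERVICE_ADD;
    regs.edx = frame;
    sim_int2e(&regs);
    return regs.eax;
}
```

The wrapper never names the kernel routine; the only things that cross the boundary are a number and a pointer, which is what makes table-based dispatch (and hooking) possible.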

During its initialization, NTOSKRNL creates a function table, hereafter referred to as the

System Service Dispatch Table (SSDT), for the different services provided by NTOSKRNL (see

Figure 6-3). Each entry in the table contains the address of the function to be executed for a

given service ID. The INT 2Eh handler looks up this table based on the service ID passed in

the EAX register and calls the corresponding system service. The code for each function resides

in the kernel. Similarly, another table called the ParamTable (hereafter referred to as the System

Service Parameter Table [SSPT]) provides the handler with the number of parameter bytes to

expect for a particular service.
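
The cooperation of the two tables can be sketched as follows. This is a simplified user-mode model, not the kernel's code: sim_ssdt, sim_sspt, sim_system_service(), the two sample services, and the 16-byte frame limit are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STATUS_INVALID_SYSTEM_SERVICE 0xC000001Cu
#define MAX_PARAM_BYTES 16u

typedef uint32_t (*ServiceRoutine)(const uint32_t *params);

/* Two made-up services taking two and three 32-bit parameters. */
uint32_t sim_sum2(const uint32_t *p) { return p[0] + p[1]; }
uint32_t sim_sum3(const uint32_t *p) { return p[0] + p[1] + p[2]; }

/* Dispatch table: one function pointer per service ID. */
const ServiceRoutine sim_ssdt[] = { sim_sum2, sim_sum3 };
/* Parameter table: one byte count per service ID. */
const unsigned char sim_sspt[] = { 8, 12 };

#define SIM_NUM_SERVICES (sizeof sim_ssdt / sizeof sim_ssdt[0])

/* Model of the handler: validate the ID, copy exactly the number of
 * parameter bytes the parameter table prescribes from the caller's
 * frame into a private frame, then call through the dispatch table. */
uint32_t sim_system_service(uint32_t id, const void *user_frame)
{
    uint32_t kernel_frame[MAX_PARAM_BYTES / sizeof(uint32_t)];

    assert(user_frame != NULL);
    if (id >= SIM_NUM_SERVICES || sim_sspt[id] > MAX_PARAM_BYTES)
        return STATUS_INVALID_SYSTEM_SERVICE;

    memcpy(kernel_frame, user_frame, sim_sspt[id]);
    return sim_ssdt[id](kernel_frame);
}
```

The service routine only ever sees the private copy, so the parameter table is what tells the handler how much of the caller's stack belongs to the call.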

Figure 6-3: System Service Dispatch Table and Parameter Table

Hooking NT System Services

The easiest way to put a hook into the system services is to locate the System Service

Dispatch Table used by the operating system and change the function pointers to point to

some other function inserted by the developer. You can do this only from a kernel-mode device

driver because this table is protected by the operating system at the page table level. The page

attribute for these pages is set so that only kernel-mode components can read from and write

to this table. User-level applications cannot read or write these memory locations.
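
Replacing a function pointer in a dispatch table can be tried out safely in user mode. The sketch below assumes a one-entry table and hypothetical names (service_table, real_create_file(), hooked_create_file(), hook(), unhook()); it shows only the pointer swap, not the kernel-mode details.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*ServiceRoutine)(const char *name);

/* Made-up "real" service: succeeds (0) for a non-empty name. */
int real_create_file(const char *name)
{
    return name[0] != '\0' ? 0 : -1;
}

enum { ID_CREATE_FILE = 0 };
ServiceRoutine service_table[] = { real_create_file };

ServiceRoutine saved_create_file;   /* original entry, kept for forwarding */
int calls_seen;                     /* how many calls the hook observed    */
const char *last_name;              /* name passed to the most recent call */

/* Hook: record the call, then forward it to the original routine. */
int hooked_create_file(const char *name)
{
    assert(saved_create_file != NULL);
    calls_seen++;
    last_name = name;
    return saved_create_file(name);
}

void hook(void)
{
    saved_create_file = service_table[ID_CREATE_FILE];
    service_table[ID_CREATE_FILE] = hooked_create_file;
}

void unhook(void)
{
    service_table[ID_CREATE_FILE] = saved_create_file;
}

/* All callers go through the table, so they pick up the hook
 * automatically. */
int call_service(int id, const char *name)
{
    return service_table[id](name);
}
```

The hook must have exactly the same signature as the routine it replaces; otherwise the parameters it forwards will not match what the original expects.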

LOCATING THE SYSTEM SERVICE DISPATCH TABLE IN THE NTOSKRNL

There is one undocumented entry in the export list of NTOSKRNL called

KeServiceDescriptorTable. This entry is the key to accessing the System Service Dispatch

Table. The structure of this entry looks like this:

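typedef struct ServiceDescriptorTable {
    PVOID ServiceTableBase;         /* base of the System Service Dispatch Table */
    PVOID ServiceCounterTable;      /* per-service call counters; 0 in free builds */
    unsigned int NumberOfServices;  /* number of entries in each table */
    PVOID ParamTableBase;           /* base of the System Service Parameter Table */
} ServiceDescriptorTable;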

where

ServiceTableBase: Base address of the System Service Dispatch Table.

NumberOfServices: Number of services described by ServiceTableBase.

ServiceCounterTable: This field is used only in checked builds of the operating system and contains the counter of how many times each service in the SSDT is called. This counter is updated by the INT 2Eh handler (KiSystemService).

ParamTableBase: Base address of the table containing the number of parameter bytes for each of the system services.

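The tables at ServiceTableBase and ParamTableBase each contain NumberOfServices entries. Each

entry in the first table is a pointer to the function implementing the corresponding system service;

each entry in the second table gives the number of parameter bytes for that service.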

The following program provides an example of hooking system services under Windows NT.

It hooks the NtCreateFile() system service and prints the name of the file being created each time the

hook is invoked. We encourage you to insert code for hooking any other system service of

your choice. Note the proper places for inserting new hooks in the following code.

Here are the steps to try out the sample (assuming that the sample binaries are copied to the

C:\SAMPLES directory):

1. Run “instdrv hooksys c:\samples\hooksys.sys”. This installs the hooksys.sys driver. The

driver hooks the NtCreateFile system service.

2. Try to access the files on your hard disk. For each accessed file, hooksys.sys traps the

call and displays the name of the file accessed in the debugger window. These messages can be

seen in SoftICE or with a debug message-capturing tool.

#include "ntddk.h"

#include "stdarg.h"

#include "stdio.h"

#include "hooksys.h"

#define DRIVER_SOURCE

#include "..\..\include\wintype.h"

#include "..\..\include\undocnt.h"

typedef NTSTATUS (*NTCREATEFILE)(

PHANDLE FileHandle,

ACCESS_MASK DesiredAccess,

POBJECT_ATTRIBUTES ObjectAttributes,

PIO_STATUS_BLOCK IoStatusBlock,

PLARGE_INTEGER AllocationSize OPTIONAL,

ULONG FileAttributes,

ULONG ShareAccess,

ULONG CreateDisposition,

ULONG CreateOptions,

PVOID EaBuffer OPTIONAL,

ULONG EaLength

);

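/* Each Zw* stub exported by NTOSKRNL is expected to begin with
 * "mov eax, <service ID>" (a one-byte opcode followed by a 32-bit
 * immediate), so the ULONG at offset 1 of the stub is the index of
 * that service in the System Service Dispatch Table. */
#define SYSTEMSERVICE(_function) \
    KeServiceDescriptorTable.ServiceTableBase[ \
        *(PULONG)((PUCHAR)_function+1)]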

NTCREATEFILE OldNtCreateFile;

NTSTATUS NewNtCreateFile(

PHANDLE FileHandle,

ACCESS_MASK DesiredAccess,

POBJECT_ATTRIBUTES ObjectAttributes,

PIO_STATUS_BLOCK IoStatusBlock,

PLARGE_INTEGER AllocationSize OPTIONAL,

ULONG FileAttributes,

ULONG ShareAccess,

ULONG CreateDisposition,

ULONG CreateOptions,

PVOID EaBuffer OPTIONAL,

ULONG EaLength)

{

int rc;

char ParentDirectory[1024];

PUNICODE_STRING Parent=NULL;

ParentDirectory[0]='\0';

if (ObjectAttributes->RootDirectory!=0) {

PVOID Object;

Parent=(PUNICODE_STRING)ParentDirectory;

rc=ObReferenceObjectByHandle(ObjectAttributes->RootDirectory,

0,

0,

KernelMode,

&Object,

NULL);

if (rc==STATUS_SUCCESS) {

extern NTSTATUS

ObQueryNameString(void *, void *, int size,

int *);

int BytesReturned;

rc=ObQueryNameString(Object,

ParentDirectory,

sizeof(ParentDirectory),

&BytesReturned);

ObDereferenceObject(Object);

if (rc!=STATUS_SUCCESS)

RtlInitUnicodeString(Parent, L"Unknown\\");

} else {

RtlInitUnicodeString(Parent, L"Unknown\\");

}

}

DbgPrint("NtCreateFile : Filename = %S%S%S\n",

Parent?Parent->Buffer:L"",Parent?L"\\":L"",

ObjectAttributes->ObjectName->Buffer);

rc=((NTCREATEFILE)(OldNtCreateFile)) (

FileHandle,

DesiredAccess,

ObjectAttributes,

IoStatusBlock,

AllocationSize,

FileAttributes,

ShareAccess,

CreateDisposition,

CreateOptions,

EaBuffer,

EaLength);

DbgPrint("NtCreateFile : rc = %x\n", rc);

return rc;

}

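NTSTATUS HookServices()
{
    /* Remember the original SSDT entry so that the hook can forward
     * calls to it and UnHookServices() can restore it. */
    OldNtCreateFile=(NTCREATEFILE)(SYSTEMSERVICE(ZwCreateFile));

    /* Swap the SSDT entry with interrupts disabled on this processor. */
    _asm cli
    (NTCREATEFILE)(SYSTEMSERVICE(ZwCreateFile))=NewNtCreateFile;
    _asm sti

    return STATUS_SUCCESS;
}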

void UnHookServices()

{

_asm cli

(NTCREATEFILE)(SYSTEMSERVICE(ZwCreateFile))=OldNtCreateFile;

_asm sti

return;

}

NTSTATUS

DriverEntry(

IN PDRIVER_OBJECT DriverObject,

IN PUNICODE_STRING RegistryPath

)

{

MYDRIVERENTRY(DRIVER_DEVICE_NAME, FILE_DEVICE_HOOKSYS, HookServices());

return ntStatus;

}

NTSTATUS

DriverDispatch(

IN PDEVICE_OBJECT DeviceObject,

IN PIRP Irp

)

{

Irp->IoStatus.Status = STATUS_SUCCESS;

IoCompleteRequest (Irp, IO_NO_INCREMENT);

return Irp->IoStatus.Status;

}

VOID

DriverUnload(

IN PDRIVER_OBJECT DriverObject

)

{

WCHAR deviceLinkBuffer[] =L"\\DosDevices\\"DRIVER_DEVICE_NAME;

UNICODE_STRING deviceLinkUnicodeString;

UnHookServices();

RtlInitUnicodeString (&deviceLinkUnicodeString, deviceLinkBuffer);

IoDeleteSymbolicLink (&deviceLinkUnicodeString);

IoDeleteDevice (DriverObject->DeviceObject);

}
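
The SYSTEMSERVICE macro in the listing above reads a ULONG at offset 1 of a Zw* function, which implies that the stub starts with a one-byte opcode followed by the 32-bit service ID. The following user-mode sketch decodes such a stub from a byte array. It assumes the x86 encoding 0xB8 (mov eax, imm32) for that first instruction; the function name service_id_from_stub() is hypothetical, and the sample bytes, including the ID 0x17, are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Extracts the service ID from the first instruction of a stub.
 * Returns -1 if the stub does not start with "mov eax, imm32"
 * (opcode 0xB8). The immediate is little-endian, as on x86; it is
 * assembled byte by byte so the helper works on any host. */
long service_id_from_stub(const unsigned char *stub)
{
    uint32_t id;

    assert(stub != NULL);
    if (stub[0] != 0xB8)
        return -1;

    id = (uint32_t)stub[1]
       | ((uint32_t)stub[2] << 8)
       | ((uint32_t)stub[3] << 16)
       | ((uint32_t)stub[4] << 24);
    return (long)id;
}
```

Checking the opcode before trusting the immediate is a cheap safeguard: if the stub layout ever differs, the caller gets -1 instead of a bogus table index.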

SUMMARY

In this chapter, we explored system services under DOS, Windows 3.x, Windows 95/98, and

Windows NT. We discussed the need for hooking these system services. We discussed

kernel- and user-level hooks. We discussed the data structures used during the system call

and the mechanism used for hooking Windows NT system services. The chapter concluded

with an example that hooked the NtCreateFile() system service.

Chapter 7

Adding New System Services to the Windows NT Kernel

Abstract

This chapter explores in detail the system service implementation of Windows NT. The authors

explain the mechanism for adding new system services to the Windows NT kernel and provide

an example that adds three new system services.

CUSTOMIZING THE KERNEL for specific purposes has been very popular among developers long

before Windows NT. Ancient Unix gurus and developers alike practiced the art. In Unix, for

example, kernel developers can modify the kernel in several ways, such as adding new device

drivers, kernel extensions, system calls, and kernel processes. In Windows NT, the DDK provides

the means to add new device drivers. However, one of the most effective ways of modifying the

kernel, adding new system services to it, is not documented. This method proves more

efficient than adding device drivers for several reasons discussed later in this chapter. Here,

we focus on the detailed implementation of a system service inside the Windows NT kernel

and explain, with examples, how new system services can be added to Windows NT.

In Inside Windows NT, Helen Custer mentions the design of system services and the

possibility of adding new system services to the kernel:

Using a system service dispatch table provides an opportunity to make native NT system

services extensible. The kernel can support new system services simply by expanding the

table without requiring changes to the system or to applications. After a code is written for a

new system service, a system administrator could simply run a utility program that dynamically

creates a new dispatch table. The new table will contain another entry that points to a new

system service.

The capability to add new system services exists in Windows NT but it is not documented.

Very little changed in this area between NT 3.51 and later versions of Windows NT. The only

change is that some of the data structures involved in the implementation of a system

service are located at different offsets in the later versions of the operating system. We feel

that our method of adding new system services may hold, possibly with very minor

modifications, in future releases of Windows NT.

At the end of this chapter, we try to shed some light on the possible thought that went into the

design of this portion of the operating system.

DETAILED IMPLEMENTATION OF A SYSTEM SERVICE IN WINDOWS NT

In Chapter 6, we discussed how a system service is invoked by the NTDLL.DLL at the request

of the application. The SSDT (System Service Dispatch Table) and SSPT (System Service

Parameter Table) help the kernel locate the right system service for a given service ID. The implementation

of the SSDT and SSPT is similar in all versions of Windows NT to date. We present the

two implementations separately for clarity, one for Windows NT 3.51 and one for the later

versions of the operating system such as Windows NT 4.0 and Windows 2000.

Below is the table containing the service ID mappings for all versions of Windows NT to date.

TABLE 7-1 SERVICE ID MAPPINGS

Version | KERNEL32 and ADVAPI32 Calls | USER32 and GDI32 Calls

Windows NT 3.51 | Mapped to service IDs 0x0 through 0xC3 inside NTOSKRNL | Processed by the Win32 subsystem, a user-mode process. No system services are provided in the kernel for handling these directly. These calls reach the Win32 subsystem through the kernel's LPC system services.

Windows NT 4.0 (up to Service Pack 5) | Mapped to service IDs 0x0 through 0xD2 inside NTOSKRNL | Mapped to service IDs 0x1000 through 0x120A inside WIN32K.SYS. The kernel-mode driver WIN32K.SYS takes over the functionality of the Win32 subsystem and supports these services.

Windows 2000 (Beta-2) | Mapped to service IDs 0x0 through 0xF3 inside NTOSKRNL | Mapped to service IDs 0x1000 through 0x1285 inside WIN32K.SYS. The kernel-mode driver WIN32K.SYS takes over the functionality of the Win32 subsystem and supports these services.
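
The ranges in Table 7-1 can be turned into a small lookup that tells which component owns a given service ID. The sketch below only encodes the numbers from the table; the type and function names (NtVersion, ServiceOwner, owner_of()) are hypothetical.

```c
#include <assert.h>

typedef enum { NT351, NT40, WIN2000_BETA2 } NtVersion;
typedef enum { OWNER_NONE, OWNER_NTOSKRNL, OWNER_WIN32K } ServiceOwner;

typedef struct {
    unsigned kernel_last;   /* last service ID handled by NTOSKRNL     */
    unsigned win32k_last;   /* last service ID in WIN32K.SYS, 0 = none */
} IdRanges;

/* Upper bounds taken from Table 7-1, indexed by NtVersion. */
static const IdRanges ranges[] = {
    { 0xC3, 0      },   /* Windows NT 3.51       */
    { 0xD2, 0x120A },   /* Windows NT 4.0        */
    { 0xF3, 0x1285 },   /* Windows 2000 (Beta-2) */
};

/* Classifies a service ID for the given version. */
ServiceOwner owner_of(NtVersion v, unsigned id)
{
    const IdRanges *r;

    assert((unsigned)v < sizeof ranges / sizeof ranges[0]);
    r = &ranges[v];

    if (id <= r->kernel_last)
        return OWNER_NTOSKRNL;
    if (r->win32k_last != 0 && id >= 0x1000 && id <= r->win32k_last)
        return OWNER_WIN32K;
    return OWNER_NONE;
}
```

Note the gap between the last NTOSKRNL ID and 0x1000: IDs in that gap belong to neither component in any of the three versions.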

In Windows NT 3.51, only the KERNEL32 and ADVAPI32 functions of the operating system

route through NTDLL.DLL to NTOSKRNL. The USER32 and GDI32 functions of the operating

system are implemented as part of the Win32 subsystem process (CSRSS). USER32.DLL and

GDI32.DLL provide wrappers, which call the CSRSS process using the local procedure call

(LPC) facility.

The functionality of USER32.DLL and GDI32.DLL is implemented differently in Windows NT

4.0 and Windows 2000. The functionality of the USER32 and GDI32 components is moved

into the kernel-mode driver WIN32K.SYS. The workhorse routines of NT 3.51’s Win32

subsystem have transferred their load to the system services introduced by the

WIN32K.SYS component. This explains why we see more system services in versions later than

Windows NT 3.51. This new set of system services corresponds to the USER32 and GDI32

components of the operating system.

Figure 7-1: System service tables

Windows NT System Service Implementation

Here, we discuss the implementation of a system service under Windows NT. System services are

invoked through the INT 2Eh instruction. The INT 2Eh handler is internally named

KiSystemService; hereafter we refer to it as the handler. Before entering the handler, the

EAX register is loaded with the service ID and the EDX register with a pointer to the stack

frame required for implementation of a particular service. The handler gets to the current TEB

(Thread Environment Block) by looking at the Processor Control Region (PCR). The current

TEB is stored at an offset of 0x124 in the Processor Control Region. The handler gets the

address of the System Service Descriptor Table from the TEB. You can locate the address of

the Service Descriptor Table at 0x124 offset in the TEB. Chapter 6 explains the format of the

Service Descriptor Table.

The handler refers to the first entry in the Service Descriptor Table for service IDs less than

0x1000 and refers to the second entry of the table for service IDs greater than or equal to

0x1000. The handler checks the validity of service IDs. If a service ID is valid, the handler

extracts the addresses of the SSDT and SSPT. The handler copies the number of bytes described

by the SSPT for the service (equal to the total size of the parameter list) from the

user-mode stack to the kernel-mode stack and then calls the function pointed to by the SSDT for

that service.
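
The table selection described above can be modeled in user mode. In the sketch below, the structure mirrors the Service Descriptor Table fields from Chapter 6, but everything else is an assumption made for illustration: the names sim_dispatch() and descriptor_table, the single sample service in each table, and in particular the rule that the index into the second table is the service ID minus 0x1000.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define STATUS_INVALID_SYSTEM_SERVICE 0xC000001Cu

typedef uint32_t (*ServiceRoutine)(void);

/* One entry of a simulated Service Descriptor Table. */
typedef struct {
    const ServiceRoutine *ServiceTableBase;
    uint32_t *ServiceCounterTable;          /* unused in this sketch */
    unsigned int NumberOfServices;
    const unsigned char *ParamTableBase;
} ServiceDescriptorEntry;

uint32_t kernel_service0(void) { return 100; }   /* made-up services */
uint32_t gui_service0(void)    { return 200; }

const ServiceRoutine kernel_ssdt[] = { kernel_service0 };
const unsigned char  kernel_sspt[] = { 0 };
const ServiceRoutine gui_ssdt[]    = { gui_service0 };
const unsigned char  gui_sspt[]    = { 0 };

/* First entry: IDs below 0x1000. Second entry: IDs from 0x1000 up. */
ServiceDescriptorEntry descriptor_table[2] = {
    { kernel_ssdt, NULL, 1, kernel_sspt },
    { gui_ssdt,    NULL, 1, gui_sspt    },
};

uint32_t sim_dispatch(uint32_t id)
{
    const ServiceDescriptorEntry *entry;
    uint32_t index;

    if (id < 0x1000) {
        entry = &descriptor_table[0];
        index = id;
    } else {
        entry = &descriptor_table[1];
        index = id - 0x1000;   /* assumption: IDs are offset by 0x1000 */
    }

    /* Validity check: the entry must be populated and the index must
     * fall inside its table. */
    if (entry->ServiceTableBase == NULL || index >= entry->NumberOfServices)
        return STATUS_INVALID_SYSTEM_SERVICE;

    assert(entry->ParamTableBase != NULL);
    return entry->ServiceTableBase[index]();
}
```

Setting the second entry's ServiceTableBase to NULL reproduces the NT 3.51 situation in this model: every ID from 0x1000 upward is then rejected as invalid.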

Initially, when any thread is started, the TEB contains a pointer to the Service Descriptor

Table, identified internally as KeServiceDescriptorTable. KeServiceDescriptorTable contains

four entries. Only the first entry in this table is used, which describes the service IDs for some

of the KERNEL32 and ADVAPI32 calls. Another Service Descriptor Table, internally named

KeServiceDescriptorTableShadow, identically matches KeServiceDescriptorTable under NT

3.51. However, under later versions of the operating system, the second entry in the table is

not NULL. The second entry points to another SSDT and SSPT. This SSDT and SSPT

comprise part of the WIN32K.SYS driver. The WIN32K.SYS driver creates this entry during its

initialization (in its DriverEntry routine) by calling the function called

KeAddSystemServiceTable. (We provide more information on this later in this chapter.) This

second entry describes the services exported by WIN32K.SYS for USER32 and GDI32

modules.

You should note that in all versions of Windows NT, only the first entry of KeServiceDescriptorTable is in use and that all newly started threads point their TEBs to KeServiceDescriptorTable. This continues as long as the threads call services belonging to the first entry in KeServiceDescriptorTable. When a thread calls a service above these limits (unlikely in NT 3.51, but very likely in later versions of Windows NT, because USER and GDI service IDs start at 0x1000), KiSystemService jumps to the label _KiEndUnexpectedRange under NT 3.51, _KiErrorMode under NT 4.0, and KiBBTEndUnexpectedRange under Windows 2000. Let’s see what role the code at each label plays.

_KiEndUnexpectedRange (NT 3.51)

The following example shows the role of the code at the _KiEndUnexpectedRange label:

if (serviceID < 0x1000) {

/* It means if service id > 0xC3 and

* service id < 0x1000

*/

return STATUS_INVALID_SYSTEM_SERVICE;

}

if (PsConvertToGuiThread() != STATUS_SUCCESS) {

return STATUS_INVALID_SYSTEM_SERVICE;

}

PsConvertToGuiThread()

{

if (PspW32ProcessCallout) {

/* In case of NT 3.51 this code is never

* invoked, since PspW32ProcessCallout is

* always = 0

*/

/* This is only invoked for the later versions of the operating system

* Please refer to the next section for details

*/

} else {

return STATUS_ACCESS_DENIED;

}

}

_KiErrorMode (Windows NT 4.0) and KiBBTEndUnexpectedRange (Windows 2000)

The code resembles _KiEndUnexpectedRange, except that the PspW32ProcessCallout

variable is always nonzero. Hence, the code in PsConvertToGuiThread proceeds further. It

performs several tasks; we now describe the one of immediate interest.

PsConvertToGuiThread allocates a block of memory and copies

KeServiceDescriptorTableShadow to the allocated block. Note that under NT 4.0 and Windows

2000, KeServiceDescriptorTableShadow contains two entries–one for KERNEL32 calls and

one for USER32 and GDI32 calls. After copying this, the code updates the TEB of the current

thread to point to this copy of KeServiceDescriptorTableShadow and then returns. This

happens only the first time a USER32 or GDI32 service is invoked. After this, all system

services, including those used by the KERNEL32 module, are routed through this new table, since the first entry in this

table already points to the SSDT and SSPT for the KERNEL32 functions.

KeServiceDescriptorTableShadow is not exported by NTOSKRNL and therefore cannot be

accessed directly by name.
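
The switch from the shared table to a private copy of the Shadow Table can be modeled in user-mode C. The sketch below is a simulation under assumptions, not the kernel's implementation: sim_thread stands in for the per-thread table pointer kept in the TEB, malloc stands in for the kernel allocation, and all sim_ names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SIM_ENTRIES 2

typedef struct {
    const void  *service_table;       /* SSDT stand-in (opaque in this model) */
    unsigned int number_of_services;
} sim_entry;

typedef struct {
    sim_entry entry[SIM_ENTRIES];
} sim_descriptor_table;

typedef struct {
    const sim_descriptor_table *service_table; /* per-thread pointer, as kept in the TEB */
    sim_descriptor_table       *private_copy;  /* set once the thread has been converted */
} sim_thread;

/* Copy the shadow table and repoint the thread at the copy (done once). */
int sim_convert_to_gui_thread(sim_thread *t, const sim_descriptor_table *shadow)
{
    assert(t != NULL && shadow != NULL);
    if (t->private_copy != NULL)
        return 0;                     /* already converted, nothing to do */
    t->private_copy = malloc(sizeof *t->private_copy);
    if (t->private_copy == NULL)
        return -1;
    memcpy(t->private_copy, shadow, sizeof *shadow);
    t->service_table = t->private_copy;
    return 0;
}

/* Return the entry that serves service_id, converting the thread if needed. */
const sim_entry *sim_lookup(sim_thread *t, const sim_descriptor_table *shadow,
                            unsigned int service_id)
{
    unsigned int slot = (service_id < 0x1000) ? 0 : 1;
    unsigned int index = service_id & 0xFFF;
    const sim_entry *e = &t->service_table->entry[slot];

    if (index >= e->number_of_services) {
        /* Out of range: only IDs of 0x1000 and above trigger the conversion. */
        if (slot == 0 || sim_convert_to_gui_thread(t, shadow) != 0)
            return NULL;
        e = &t->service_table->entry[slot];
        if (index >= e->number_of_services)
            return NULL;
    }
    return e;
}
```

In this model, a thread keeps using the shared table until its first request with an ID of 0x1000 or above; from then on, even requests for the first entry are resolved through the private copy.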

Under Windows NT 3.51, both KeServiceDescriptorTable and the Shadow Table point to the

same SSDT and SSPT and contain only one entry. Now, ask yourself this logical question:

“Why do we have the Shadow Table at all when apparently it does not provide much help in NT

3.51?” We attempt to answer this question later in the chapter.

Note: Once a thread makes a USER32/GDI32 call, it permanently stops using the original KeServiceDescriptorTable and switches entirely to a copy of KeServiceDescriptorTableShadow.

ADDING NEW SYSTEM SERVICES

Adding new system services involves the following steps:

1. Allocate a block of memory large enough to hold the existing SSDT and SSPT and the extensions

to each table.

2. Copy the existing SSDT and SSPT into this block of memory.

3. Append the new entries to the new copies of the two tables as shown in Figure 7-2.

4. Update KeServiceDescriptorTable and KeServiceDescriptorTableShadow to point to the newly

allocated SSDT and SSPT.
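
The four steps above can be tried out in user mode before touching a kernel. The following is a minimal simulation, assuming a simplified descriptor layout: sim_descriptor and sim_append_services are hypothetical names, malloc stands in for the kernel pool allocator, and service addresses are stored as plain integers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uintptr_t     *service_table;      /* SSDT stand-in: one address per service */
    unsigned char *param_table;        /* SSPT stand-in: parameter bytes per service */
    unsigned int   number_of_services;
} sim_descriptor;

/* Returns the ID of the first appended service, or -1 if allocation fails. */
long sim_append_services(sim_descriptor *primary, sim_descriptor *shadow,
                         const uintptr_t *new_services,
                         const unsigned char *new_params, unsigned int count)
{
    unsigned int old_count, new_count;
    uintptr_t *services;
    unsigned char *params;

    assert(primary != NULL && shadow != NULL);
    old_count = primary->number_of_services;
    new_count = old_count + count;

    /* Step 1: allocate room for the existing tables plus the extensions. */
    services = malloc(new_count * sizeof *services);
    params = malloc(new_count);
    if (services == NULL || params == NULL) {
        free(services);
        free(params);
        return -1;
    }

    /* Step 2: copy the existing SSDT and SSPT into the new block. */
    memcpy(services, primary->service_table, old_count * sizeof *services);
    memcpy(params, primary->param_table, old_count);

    /* Step 3: append the new entries to both tables. */
    memcpy(services + old_count, new_services, count * sizeof *services);
    memcpy(params + old_count, new_params, count);

    /* Step 4: point both descriptors at the enlarged tables.
     * The old tables are deliberately not freed: the model does not own them. */
    primary->service_table = shadow->service_table = services;
    primary->param_table = shadow->param_table = params;
    primary->number_of_services = shadow->number_of_services = new_count;

    return (long)old_count;
}
```

The return value plays the same role as the starting service ID that a user-mode wrapper needs in order to compute the IDs of the added services.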

In NT 3.51, because the Shadow Table is never used, you could get away without having to

update it. In NT 4.0 and Windows 2000, however, the Shadow Table takes a leading role once

a GDI32 or a USER32 call has been made. Therefore, it is important that you update both

KeServiceDescriptorTable and KeServiceDescriptorTableShadow. If you fail to update

KeServiceDescriptorTableShadow in NT 4.0 or Windows 2000, the newly added services will

fail to work once a GDI32 or USER32 call is made. We recommend that you update both the

tables in all versions of Windows NT so that you can use the same piece of code with all the

versions of the operating systems.

Figure 7-2 Adding new system services

One implementation issue in updating the KeServiceDescriptorTableShadow is that

NTOSKRNL does not export this table. However, NTOSKRNL does export

KeServiceDescriptorTable. So, how can you get the address of

KeServiceDescriptorTableShadow?

The method we used is as follows. NTOSKRNL contains a function called KeAddSystemServiceTable, which the WIN32K.SYS driver uses to add the USER32- and GDI32-related services. This function refers to KeServiceDescriptorTableShadow. The first entry in both KeServiceDescriptorTable and KeServiceDescriptorTableShadow is the same. We therefore read a DWORD at each byte offset in the KeAddSystemServiceTable code, and for every value that is a valid address, we compare the 16 bytes (the size of one entry in the descriptor table) at that address with the first entry in KeServiceDescriptorTable. If they match and the address is not that of KeServiceDescriptorTable itself, we take it to be the address of KeServiceDescriptorTableShadow. This method seems to work in all Windows NT versions.
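
The search technique can be exercised outside the kernel on an ordinary byte buffer. The sketch below is a user-mode model under assumptions: the validity check is supplied by the caller (in the driver that role is played by MmIsAddressValid), and sim_find_shadow, sim_set_valid_region, and sim_in_region are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Caller-supplied check: may len bytes at addr be read safely? */
typedef int (*sim_is_valid_fn)(const void *addr, size_t len);

/* Scan code for a pointer to a second table whose first entry_size bytes
 * equal those of known_table. Returns NULL when no such pointer is found. */
const void *sim_find_shadow(const unsigned char *code, size_t code_len,
                            const void *known_table, size_t entry_size,
                            sim_is_valid_fn is_valid)
{
    size_t i;

    assert(code != NULL && known_table != NULL && is_valid != NULL);
    for (i = 0; i + sizeof(uintptr_t) <= code_len; i++) {
        uintptr_t candidate;

        /* Read a pointer-sized value at every byte offset (unaligned-safe). */
        memcpy(&candidate, code + i, sizeof candidate);

        if ((const void *)candidate == known_table)
            continue;                 /* this is the exported table itself */
        if (!is_valid((const void *)candidate, entry_size))
            continue;                 /* not something we may dereference */
        if (memcmp((const void *)candidate, known_table, entry_size) == 0)
            return (const void *)candidate;
    }
    return NULL;
}

/* Demo validity check: accepts only addresses inside one registered region. */
static uintptr_t sim_region_base;
static size_t sim_region_len;

void sim_set_valid_region(const void *base, size_t len)
{
    sim_region_base = (uintptr_t)base;
    sim_region_len = len;
}

int sim_in_region(const void *addr, size_t len)
{
    uintptr_t a = (uintptr_t)addr;

    return a >= sim_region_base && len <= sim_region_len &&
           a - sim_region_base <= sim_region_len - len;
}
```

Unlike a check of the first byte only, the demo predicate validates the whole range that is about to be compared, which avoids reading past the end of a valid region.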

EXAMPLE OF ADDING A NEW SYSTEM SERVICE

This example consists of three modules. A device driver contains the code for the new system services and the mechanism for adding them to the Windows NT kernel. A DLL provides an interface to the new system services (just as NTDLL.DLL provides an interface for the services called by KERNEL32.DLL). An application links to this wrapper DLL and calls the newly added services. Each newly added service prints a debug message saying, “Kernel service with ... parameters called” and prints the parameters passed to it. The three services return the values 0, 1, and 2, respectively. The function AddServices() isolates the code for the mechanism of adding new system services.

Assuming that the sample binaries are copied to the C:\SAMPLES directory, here are the

steps to try out the sample:

1. Run “instdrv extndsys c:\samples\extndsys.sys”. This installs the extndsys.sys driver. The

driver adds three new system services to the Windows NT kernel.

2. Run MYAPP.EXE. This will call wrapper functions in MYNTDLL.DLL to call newly added system

services in EXTNDSYS.SYS.

#include "ntddk.h"

#include "stdarg.h"

#include "stdio.h"

#include "extnddrv.h"

#define DRIVER_SOURCE

#include "..\..\include\wintype.h"

#include "..\..\include\undocnt.h"

/* Prototypes for the services to be added */

NTSTATUS SampleService0(void);

NTSTATUS SampleService1(int param1);

NTSTATUS SampleService2(int param1, int param2);

/* TODO TODO TODO TODO

..............

..............

Add more to this list to add more services

*/

/* Table describing the new services */

unsigned int ServiceTableBase[]={(unsigned int)SampleService0,

(unsigned int)SampleService1,

(unsigned int)SampleService2,

/* TODO TODO TODO TODO

..............

..............

Add more to this list to add more services

*/

};

/* Table describing the parameter bytes required for the new services */

unsigned char ParamTableBase[]={0, 4, 8,

/* TODO TODO TODO TODO

..............

..............

Add more parameter bytes to this list to add more services

*/ };

unsigned int *NewServiceTableBase; /* Pointer to new SSDT */

unsigned char *NewParamTableBase; /* Pointer to new SSPT */

unsigned int NewNumberOfServices; /* New number of services */

unsigned int StartingServiceId;

NTSTATUS SampleService0(void)

{

trace(("Kernel service with 0 parameters called\n"));

return STATUS_SUCCESS;

}

NTSTATUS SampleService1(int param1)

{

trace(("Kernel service with 1 parameters called\n"));

trace(("param1=%x\n", param1));

return STATUS_SUCCESS+1;

}

NTSTATUS SampleService2(int param1, int param2)

{

trace(("Kernel service with 2 parameters called\n"));

trace(("param1=%x param2=%x\n", param1, param2));

return STATUS_SUCCESS+2;

}

/* TODO TODO TODO TODO

..............

..............

Add implementations of other services here

*/

unsigned int GetAddrssofShadowTable()

{

int i;

unsigned char *p;

unsigned int dwordatbyte;

p=(unsigned char *)KeAddSystemServiceTable;

for (i=0; i<4096; i++, p++) {

__try {

dwordatbyte=*(unsigned int *)p;

}

__except (EXCEPTION_EXECUTE_HANDLER) {

return 0;

}

if (MmIsAddressValid((PVOID)dwordatbyte)) {

if (memcmp((PVOID)dwordatbyte,

&KeServiceDescriptorTable, 16)==0) {

if

((PVOID)dwordatbyte==&KeServiceDescriptorTable) {

continue;

}

DbgPrint("Shadow @%x\n", dwordatbyte);

return dwordatbyte;

}

}

}

return 0;

}

NTSTATUS AddServices()

{

PServiceDescriptorTableEntry_t KeServiceDescriptorTableShadow;

unsigned int NumberOfServices;

NumberOfServices=sizeof(ServiceTableBase)/sizeof(ServiceTableBase[0]);

trace(("KeServiceDescriptorTable=%x\n", &KeServiceDescriptorTable));

KeServiceDescriptorTableShadow=(PServiceDescriptorTableEntry_t)

GetAddrssofShadowTable();

if (KeServiceDescriptorTableShadow==NULL) {

return STATUS_UNSUCCESSFUL;

}

trace(("KeServiceDescriptorTableShadow=%x\n",

KeServiceDescriptorTableShadow));

NewNumberOfServices=KeServiceDescriptorTable.NumberOfServices

+NumberOfServices;

StartingServiceId=KeServiceDescriptorTable.NumberOfServices;

/* Allocate sufficient memory to hold the existing services as well as

the services you want to add */

NewServiceTableBase=(unsigned int *) ExAllocatePool (PagedPool,

NewNumberOfServices*sizeof(unsigned int));

if (NewServiceTableBase==NULL) {

return STATUS_INSUFFICIENT_RESOURCES;

}

NewParamTableBase=(unsigned char *) ExAllocatePool(PagedPool,

NewNumberOfServices);

if (NewParamTableBase==NULL) {

ExFreePool(NewServiceTableBase);

return STATUS_INSUFFICIENT_RESOURCES;

}

/* Copy the existing SSDT and SSPT into the new tables */

memcpy(NewServiceTableBase, KeServiceDescriptorTable.ServiceTableBase,

KeServiceDescriptorTable.NumberOfServices*sizeof(unsigned int));

memcpy(NewParamTableBase, KeServiceDescriptorTable.ParamTableBase,

KeServiceDescriptorTable.NumberOfServices);

/* Append to it new SSDT and SSPT */

memcpy(NewServiceTableBase+KeServiceDescriptorTable.NumberOfServices,

ServiceTableBase, sizeof(ServiceTableBase));

memcpy(NewParamTableBase+KeServiceDescriptorTable.NumberOfServices,

ParamTableBase, sizeof(ParamTableBase));

/* Modify KeServiceDescriptorTable to point to the new SSDT and SSPT */

KeServiceDescriptorTable.ServiceTableBase=NewServiceTableBase;

KeServiceDescriptorTable.ParamTableBase=NewParamTableBase;

KeServiceDescriptorTable.NumberOfServices=NewNumberOfServices;

/* Also update the KeServiceDescriptorTableShadow to point to new SSDT and

SSPT */

KeServiceDescriptorTableShadow->ServiceTableBase=NewServiceTableBase;

KeServiceDescriptorTableShadow->ParamTableBase=NewParamTableBase;

KeServiceDescriptorTableShadow->NumberOfServices=NewNumberOfServices;

/* Return Success */

DbgPrint("Returning success\n");

return STATUS_SUCCESS;

}

NTSTATUS

DriverDispatch(

IN PDEVICE_OBJECT DeviceObject,

IN PIRP Irp

);

VOID

DriverUnload(

IN PDRIVER_OBJECT DriverObject

);

NTSTATUS

DriverEntry(

IN PDRIVER_OBJECT DriverObject,

IN PUNICODE_STRING RegistryPath

)

{

MYDRIVERENTRY(L"extnddrv", FILE_DEVICE_EXTNDDRV, AddServices());

return ntStatus;

}

NTSTATUS

DriverDispatch(

IN PDEVICE_OBJECT DeviceObject,

IN PIRP Irp

)

{

PIO_STACK_LOCATION irpStack;

PVOID ioBuffer;

ULONG inputBufferLength;

ULONG outputBufferLength;

NTSTATUS ntStatus;

Irp->IoStatus.Status = STATUS_SUCCESS;

Irp->IoStatus.Information = 0;

irpStack = IoGetCurrentIrpStackLocation (Irp);

switch (irpStack->MajorFunction)

{

case IRP_MJ_DEVICE_CONTROL:

trace(("EXTNDDRV.SYS: IRP_MJ_DEVICE_CONTROL\n"));

switch (irpStack->Parameters.DeviceIoControl.IoControlCode)

{

case IOCTL_EXTNDDRV_GET_STARTING_SERVICEID:

trace(("EXTNDDRV.SYS:IOCTL_EXTNDDRV_GET_STARTING_SERVICEID\n"));

outputBufferLength = irpStack->Parameters.DeviceIoControl.OutputBufferLength;

if (outputBufferLength<sizeof(StartingServiceId)) {

Irp->IoStatus.Status = STATUS_BUFFER_TOO_SMALL;

} else

{

ioBuffer = (PULONG)Irp->AssociatedIrp.SystemBuffer;

memcpy(ioBuffer, &StartingServiceId, sizeof(StartingServiceId));

Irp->IoStatus.Information = sizeof(StartingServiceId);

}

break;

}

break;

}

ntStatus = Irp->IoStatus.Status;

IoCompleteRequest (Irp, IO_NO_INCREMENT);

return ntStatus;

}

VOID

DriverUnload(

IN PDRIVER_OBJECT DriverObject

)

{

WCHAR deviceLinkBuffer[] = L"\\DosDevices\\EXTNDDRV";

UNICODE_STRING deviceLinkUnicodeString;

RtlInitUnicodeString (&deviceLinkUnicodeString,

deviceLinkBuffer );

IoDeleteSymbolicLink (&deviceLinkUnicodeString);

IoDeleteDevice (DriverObject->DeviceObject);

trace(("EXTNDDRV.SYS: unloading\n"));

}

/* MYNTDLL.C

* This DLL is a wrapper around the new services

* added by the device driver. This DLL is like

* NTDLL.DLL, which wraps the system services used by KERNEL32.DLL

*/

#include <windows.h>

#include <stdio.h>

#include <winioctl.h>

#include "..\sys\extnddrv.h"

typedef int NTSTATUS;

int ServiceStart;

__declspec(dllexport) NTSTATUS SampleService0(void)

{

_asm {

mov eax, ServiceStart

int 2eh

}

}

__declspec(dllexport) NTSTATUS

SampleService1(int param)

{

void **stackframe=&param;

_asm {

mov eax, ServiceStart

add eax, 1

mov edx, stackframe

int 2eh

}

}

__declspec(dllexport) NTSTATUS

SampleService2(int param1, int param2)

{

char **stackframe=&param1;

_asm {

mov eax, ServiceStart

add eax, 2

mov edx, stackframe

int 2eh

}

}

__declspec(dllexport) NTSTATUS

SampleService3(int param1, int param2, int param3)

{

char **stackframe=&param1;

_asm {

mov eax, ServiceStart

add eax, 3

mov edx, stackframe

int 2eh

}

}

__declspec(dllexport) NTSTATUS

SampleService4(int param1, int param2,int param3, int param4)

{

char **stackframe=&param1;

_asm {

mov eax, ServiceStart

add eax, 4

mov edx, stackframe

int 2eh

}

}

__declspec(dllexport) NTSTATUS

SampleService5(int param1, int param2,int param3, int param4,int param5)

{

char **stackframe=&param1;

_asm {

mov eax, ServiceStart

add eax, 5

mov edx, stackframe

int 2eh

}

}

__declspec(dllexport) NTSTATUS

SampleService6(int param1, int param2,int param3, int param4,int param5, int

param6)

{

char **stackframe=&param1;

_asm {

mov eax, ServiceStart

add eax, 6

mov edx, stackframe

int 2eh

}

}

BOOL SetStartingServiceId()

{

HANDLE hDevice;

BOOL ret;

hDevice = CreateFile (

"\\\\.\\extnddrv",

GENERIC_READ | GENERIC_WRITE,

0,

NULL,

OPEN_EXISTING,

FILE_ATTRIBUTE_NORMAL,

NULL

);

if (hDevice == ((HANDLE)-1))

{

MessageBox(0, "Unable to open handle to driver", "Error", MB_OK);

ret = FALSE;

}

else

{

DWORD BytesReturned;

ret=DeviceIoControl(

hDevice,

IOCTL_EXTNDDRV_GET_STARTING_SERVICEID,

NULL,

0,

&ServiceStart,

sizeof(ServiceStart),

&BytesReturned,

NULL);

if (ret) {

if (BytesReturned!=sizeof(ServiceStart)) {

MessageBox(0, "DeviceIoControl failed", "Error", MB_OK);

ret=FALSE;

} else {

ret = TRUE;

}

} else {

MessageBox(0, "DeviceIoControl failed","Error", MB_OK);

}

CloseHandle (hDevice);

}

return ret;

}

BOOL WINAPI

DllMain(HANDLE hModule,

DWORD Reason,

LPVOID lpReserved)

{

switch (Reason) {

case DLL_PROCESS_ATTACH:

//

// We’re being loaded - ask the driver for the starting service ID

//

return SetStartingServiceId();

default:

return TRUE;

}

}

/* This is a sample console application that calls the newly added services. The

services are called through a wrapper DLL. The application simply prints the return

values from the newly added system services. */

#include <windows.h>

#include <stdio.h>

#include "..\dll\myntdll.h"

int main(void)

{

printf("SampleService0 returned = %x\n",

SampleService0());

printf("SampleService1 returned = %x\n",

SampleService1(0x10));

printf("SampleService2 returned = %x\n",

SampleService2(0x10, 0x20));

return 0;

}

Device Drivers as a Means of Extending the Kernel versus Adding New System Services

Writing pseudo device drivers and providing DeviceIoControl methods to applications is another way to extend the kernel. However, in this case, each application that wants to use the DeviceIoControl has to open a handle to the device, issue the DeviceIoControl, and close the device. Extending the kernel by means of system services has distinct advantages; first and foremost, applications need not be aware of the device driver. Applications just link to a DLL that provides an interface for the system services (just as NTDLL.DLL provides an interface for the services called by KERNEL32.DLL). Further, DeviceIoControl proves much slower, especially if the DeviceIoControl requires a large amount of data transfer between the application and the device driver. By using this technique of adding system services, you might write a set of system services and provide a user-level interface DLL that everybody can use. This implementation looks cleaner and more standardized than the DeviceIoControl method.

KeAddSystemServiceTable

The WIN32K.SYS driver calls this function during its DriverEntry under Windows NT 4.0 and

Windows 2000. This function looks somewhat odd. The function expects five parameters: an

index in the Service Descriptor Table where the new entry is to be added, the SSDT, the SSPT, the

number of services, and one parameter for use only in checked build versions. This last

parameter points to a DWORD table that holds the number of times each service

has been called.
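
To make the five parameters concrete, here is a user-mode model of such a registration routine. It is a sketch under assumptions and not the prototype of the undocumented kernel function: the parameter order simply follows the order in which the text lists them, the field layout of sim_descriptor_entry is hypothetical, and the rule that an occupied slot is rejected is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_TABLE_SLOTS 4  /* the text states that the descriptor table has four entries */

typedef struct {
    const void          *service_table;      /* SSDT */
    unsigned int        *counter_table;      /* per-service call counts (checked builds) */
    unsigned int         number_of_services;
    const unsigned char *param_table;        /* SSPT */
} sim_descriptor_entry;

/* Simulated shadow table; every slot starts out empty. */
sim_descriptor_entry sim_shadow_table[SIM_TABLE_SLOTS];

/* Register a new table in slot index. Returns 1 on success, 0 on failure. */
int sim_add_system_service_table(unsigned int index, const void *ssdt,
                                 const unsigned char *sspt,
                                 unsigned int number_of_services,
                                 unsigned int *counters)
{
    sim_descriptor_entry *slot;

    if (index >= SIM_TABLE_SLOTS || ssdt == NULL || sspt == NULL)
        return 0;

    slot = &sim_shadow_table[index];
    if (slot->service_table != NULL)
        return 0;                      /* assumption: an occupied slot is never replaced */

    slot->service_table = ssdt;
    slot->param_table = sspt;
    slot->number_of_services = number_of_services;
    slot->counter_table = counters;    /* may be NULL outside checked builds */
    assert(sim_shadow_table[index].service_table == ssdt);
    return 1;
}
```

In this model, a driver in the position of WIN32K.SYS would call the routine once with index 1 during initialization; a second call for the same index fails.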

NT 3.51 Design versus NT 4.0 and Windows 2000 Design: Microsoft’s Options

You might find it interesting to discover that the code manipulating the

KeServiceDescriptorTableShadow resides in all versions of Windows NT–the only difference is

that, under NT 3.51, the code for allocating and copying the Shadow Table is never triggered,

because the PspW32ProcessCallout variable is always zero. This information might convince you

that the relocation of USER32 and GDI32 component into the NT 4.0 and Windows 2000

kernel (as contrasted with the NT 3.51 kernel) is not only performance based–as Microsoft

claims now–but something well thought out as an option when NT 3.51 was designed. This

leads us to believe that Microsoft had two solutions implemented for USER32 and GDI32

modules–the LPC-based solution of using the Win32 subsystem and the INT 2Eh-based

system service solution. Microsoft attempted the first solution under NT 3.51 and now settles

for the second solution in later versions of Windows NT. The partial code for both solutions

exists in NT 3.51, but there is no trace of the LPC solution for the Win32 subsystem under

versions later than NT 3.51. So, we can also conclude that the future releases of NT, unless

drastically different, will continue to use the INT 2Eh-based solution for WIN32K.SYS system

services.

SUMMARY

In this chapter, we discussed in detail the system service implementation of Windows NT. We

explored some code fragments from a system service interrupt handler, using

KiSystemService() as an example. Next, we detailed the mechanism for adding new system

services to the Windows NT kernel. We also used an example that adds three new system

services to the Windows NT kernel. We compared extending the kernel with device drivers

with extending the kernel by adding system services.

Chapter 8

Local Procedure Call

Abstract

A local procedure call (LPC) is the communication mechanism used by Windows NT

subsystems. This chapter introduces subsystems and then provides a detailed discussion on

the undocumented LPC mechanism.

MICROSOFT DESIGNED THE local procedure call (LPC) facility to enable efficient communication

with what Windows NT calls the subsystems. Although you do not need to know about

subsystems before understanding the LPC mechanism, it is certainly interesting and advisable.

In this chapter, we discuss the subsystems and then shed some light on the undocumented

LPC mechanism.

THE ORIGIN OF THE SUBSYSTEMS

Although Microsoft never stated what “NT” stood for, one popular theory suggests that it refers

to “New Technology.” That’s not to say everything that goes inside Windows NT is new.

Windows NT has borrowed several concepts from earlier operating systems. For example, the

NTFS (New Technology File System) borrows a lot from the HPFS (High Performance File

System) of IBM’s OS/2. The Win32 API itself is an extension of the Windows 16-bit API. The

Windows NT 3.51 user interface comes from Windows 3.1 and Windows NT 4.0 inherits its

interface from Windows 95. Windows 2000 (Beta 3) maintains more or less the same user

interface as Windows NT 4.0. In this section, we discuss the overall architecture of Windows

NT, which Microsoft borrowed from the MACH operating system, originally developed at

Carnegie Mellon University.

DOS and Unix variants dominated the operating systems world in the 1980s. DOS has a

monolithic architecture, composed of a single lump of code. Unix follows the layered

architecture, where the operating system divides into layers such that each layer uses only the

interface provided by the lower layers. The MACH operating system follows a new

client-server approach. The initial versions of MACH were based on BSD Unix 4.3.

The MACH team focused on two major goals. First, they wanted more structured code than BSD 4.3. Second, they wanted to support different variants of the Unix API. They achieved both goals by pushing the execution of kernel code out to user-mode processes, which acted as servers. The MACH kernel is very small, providing only the basic system services common to all Unix APIs; hence it is called a microkernel. The server processes run in user mode and provide a sophisticated API. The normal application processes are clients of these server processes. When a client process invokes an API function, the emulation library, which is linked with the client code, transparently passes the call on to the server process. This is accomplished using a facility similar to RPC (remote procedure call). The server process, after carrying out any necessary processing, returns the results to the client. To support a new API in the MACH environment, you need to write a server process and an emulation library that support the new API. Not all server processes provide a different API. Some provide generic functionality such as memory management or TTY management.

The Windows NT design team sought goals similar to those of MACH’s developers. They

wanted to support Win32, OS/2, and POSIX APIs, while keeping room for future APIs.

Client-server architecture proved a natural choice.

The servers are called protected subsystems in Windows NT. Subsystems are

user-mode processes running in a local system security context. We call them protected

subsystems because they are separate processes operating in separate address spaces and

hence are protected from client access/modification. There are two types of subsystems:

• Integral subsystems

• Environment subsystems

Integral Subsystems

An integral subsystem performs some essential operating system task. For Windows NT, this

group includes the Local Security Authority (lsass.exe), the Security Accounts Manager, the

Session Manager (smss.exe), and the network server. The Local Security Authority (LSA)

subsystem manages security access tokens for users. The Security Accounts Manager (SAM)

subsystem maintains a database of information on user accounts, including passwords, any

account groups a given user belongs to, the access rights each user is allowed, and any

special privileges a given user has. The Session Manager subsystem starts and keeps track of

NT logon sessions and serves as an intermediary among protected subsystems.

Environment Subsystems

An environment subsystem is a server that appears to perform operating system functions for

its native applications by calling system services. An environment subsystem runs in user

mode and its interface to end-users emulates another operating system, such as OS/2 or

POSIX, on top of Windows NT. Even the Win32 API is implemented through a subsystem process

under Windows NT 3.51.

Note: Not all the API functions in the client-side DLLs need to pass the call to the subsystem

process. For example, most of the KERNEL32.DLL calls can directly map onto the system services

provided by the kernel. Such API functions invoke the system services via NTDLL.DLL. Most of the

USER32.DLL functions and GDI32.DLL functions pass on the call to the subsystem process. (In

Windows NT 4.0, Microsoft moved the Win32 subsystem inside the kernel for performance

reasons.)

The system call interface provided by the Windows NT kernel is called the native API. The

Win32 subsystem uses the native API for implementing the Win32 API. Generally, user

programs make calls to an API provided by some subsystem, avoiding the use of a

cumbersome, native API. We refer to the user programs as the clients of the subsystem that

provides the API used by these programs.

The communication between the client processes and the subsystem happens through a

mechanism called local procedure call (LPC), specially designed by Microsoft for that purpose.

For unknown reasons, Microsoft prefers to keep the LPC interface undocumented. There is no

reason why LPC cannot function as an Inter-Process Communication (IPC) mechanism.

Microsoft provides an RPC kit for client-server communication across machines. Windows NT

optimizes the RPCs by converting them to LPCs, in case the client and the server reside on

the same machine. However, RPC has its own overheads. LPC proves most efficient in the

raw form, and the subsystems also use it in that form only. Apart from that, RPC does not

provide access to the fastest form of LPC–the Quick LPC. For these reasons, we provide you

with useful information on the LPC interface.

LOCAL PROCEDURE CALL

In Windows NT, client-subsystem communication happens in a fashion similar to that in the

MACH operating system. Each subsystem contains a client-side DLL that links with the client

executable. The DLL contains stub functions for the subsystem’s API. Whenever a client

process–an application using the subsystem interface–makes an API call, the corresponding

stub function in the DLL passes on the call to the subsystem process. The subsystem process,

after the necessary processing, returns the results to the client DLL. The stub function in the

DLL waits for the subsystem to return the results and, in turn, passes the results to the caller.

To the client process, this simply resembles calling a normal procedure in its own code. In the case of

RPC, the client actually calls a procedure residing on some remote server over the

network; hence the name remote procedure call. In Windows NT, the server runs on the same

machine; hence the mechanism is called a local procedure call.

There are three types of LPC. The first type sends small messages of up to 304 bytes. The

second type sends larger messages. The third type of LPC is called Quick LPC and is used

by the Win32 subsystem in Windows NT 3.51.

The first two types of LPC use port objects for communication. Ports resemble the sockets or

named pipes in Unix. A port is a bidirectional communication channel between two processes.

However, unlike sockets, the data passed through ports is not streamed. The ports preserve

the message boundaries. Simply put, you can send and receive messages using ports. The

subsystems create ports with well-known names. The client processes that need to invoke

services from the subsystems open the corresponding port using the well-known name. After

opening the port, the client can communicate with the server over the port.

Short Message Communication

The client-subsystem communication via a port happens as follows. The server/subsystem

creates a port using the NtCreatePort() function. The name of the port is well published and

known to the clients (or, rather, to the client-side DLL). The NtCreatePort() function returns a

port handle used by the subsystem to wait and accept requests using the NtListenPort()

function. Any process can send connection requests on this port and get a port handle for

communication. The subsystem receives the request messages, processes them, and sends

back the replies over the port to the client.

The client sends a connection request to a waiting subsystem using the NtConnectPort()

function. When the subsystem receives the connect request, it comes out of the NtListenPort()

function and accepts the connection using the NtAcceptConnectPort() function. The

NtAcceptConnectPort() function returns a new port handle specific to the client requesting the

connection. The server can break the communication link with the particular client by closing

this handle. The subsystem completes the connection protocol using the

NtCompleteConnectPort() function. Now, the client also returns from the NtConnectPort()

function and gets a handle to the communication port. This handle is private to the client

process. Child processes do not inherit port handles, so they need to open the

subsystem port again.

After completing this connection protocol, the client and the subsystem can start

communicating over this port. The client sends a request to the subsystem using the

NtRequestPort() function. The NtRequestPort() function sends datagram messages to
the subsystem; the client does not receive any acknowledgment for the sent messages. If
the client expects a reply to its request, it can use the NtRequestWaitReplyPort()
function, which sends the request to the subsystem and waits for a reply.
The subsystem receives request messages using the NtReplyWaitReceivePort() function and
sends reply messages using the NtReplyPort() function. The subsystem can optimize by

replying to the previous request and waiting for the next request using a single call to the

NtReplyWaitReceivePort() function. Figure 8-1 displays this entire process of communication.

Figure 8-1 Steps in communication using the Port object

A subsystem may receive/reply to messages from more than one client using the same port.

The message contains fields, which identify the client process and thread. The kernel fills in

the process ID and the thread ID in the messages. Therefore, the subsystems can rely on this

information, and the LPC forms a secure and reliable communication mechanism because the

sender of the messages can be reliably identified.
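The guarantee described above, that the identity fields come from the kernel rather than from the sender, can be illustrated with a small simulation. This is only a sketch: it calls no NT function, and `SIMMESSAGE` and `sim_kernel_deliver()` are hypothetical names that model the behavior the text attributes to the kernel.

```c
#include <stdint.h>
#include <string.h>

/* Simplified message for illustration only. The field names follow the
   LpcMessage structure used in this chapter; the types are stand-ins. */
typedef struct SimMessage {
    uint32_t ClientProcessId;
    uint32_t ClientThreadId;
    uint8_t  MessageData[16];
} SIMMESSAGE;

/* Hypothetical model of the delivery step: whatever IDs the sender wrote
   are overwritten with the sender's real process and thread IDs. */
void sim_kernel_deliver(const SIMMESSAGE *sent, uint32_t real_pid,
                        uint32_t real_tid, SIMMESSAGE *delivered)
{
    memcpy(delivered, sent, sizeof(*delivered));
    delivered->ClientProcessId = real_pid;
    delivered->ClientThreadId  = real_tid;
}
```

In this model, a client that writes forged IDs into the message gains nothing: the server only ever sees the values stamped at delivery time, while the message data passes through unchanged.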

Shared Section Communication

You can send only short messages–up to 304 bytes–via ports. You need to use a shared

region of memory for passing larger messages. If clients want to pass messages via shared

memory, they have to do some extra processing before calling NtConnectPort(). A client

creates a section object of required size, using CreateFileMapping()–a documented function.

The size of the message is restricted only by the size of the section. The client need not map

the section onto the address space; the port connection procedure takes care of that. But the

client has to pass the section handle to the NtConnectPort() call. The function returns the

addresses where the section is mapped in the client’s as well as the server’s address spaces.

Now, whenever the client wants to invoke the server, it simply copies the parameters to the

shared section and sends a message over the port. This message simply acts as an indication

of the client request because the actual parameters pass via the shared section.

Generally, as a part of the port message, the client specifies the server space address of the

shared section and the offset of the copied parameters within the shared section. If the server
uses this information, it should first validate it, because the client process may be unreliable. After

processing the request, the server also sends back the results via the shared section. Apart

from the additional processing, the shared section LPC essentially uses the same set of port

APIs as the short message communication. The sequence of operations also resembles that

of the short message communication with one exception–in addition to handling the message

port, the client must create the shared section and perform the parameter copying. The

sequence of operations shown in Figure 8-1 applies to the shared section LPC as well.
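The validation step mentioned above can be sketched as a bounds check on the offset and length that the client places in the port message. The function name `sim_shared_ref_is_valid()` and its parameters are hypothetical; the chapter does not prescribe a particular check, so treat this as one reasonable approach rather than the method used by any real subsystem.

```c
#include <stdint.h>

/* Returns 1 if the byte range [offset, offset + length) lies entirely
   inside a shared section of section_size bytes, and 0 otherwise.
   The subtraction form avoids overflow in offset + length. */
int sim_shared_ref_is_valid(uint32_t section_size, uint32_t offset,
                            uint32_t length)
{
    if (offset > section_size)
        return 0;
    return length <= section_size - offset;
}
```

A server following this approach would reject the request before touching the shared section whenever the check returns 0.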

PORT-RELATED FUNCTIONS

In this section, we discuss the port-related functions and parameters passed to them in detail.

We prepared sample programs demonstrating short message passing and shared section

memory message passing. We discuss these programs next.

NtCreatePort

int _stdcall

NtCreatePort(

PHANDLE PortHandle,

POBJECT_ATTRIBUTES ObjectAttributes,

DWORD MaxConnectInfoLength,

DWORD MaxDataLength,

DWORD Unknown);

This function creates a new port for communication. The name of the port and its parent
directory in the object hierarchy are passed through the ObjectAttributes parameter. The
MaxConnectInfoLength parameter specifies the maximum size of the information that can accompany
a connection request. (Later in this section, we discuss the connection information.) The
MaxDataLength parameter is the maximum size of a message that can pass through the port.
Both of these parameters are effectively ignored: the operating system always sets the connection
information length to 260 bytes and the data length to 328 bytes, which are the maximum
allowed values for these parameters. Even so, make sure that you pass values less than the
maximum allowed values, because the function returns an error otherwise. The unknown fifth
parameter can be passed as zero. A handle to the newly created port is returned in PortHandle. The
server process uses this port handle to accept connection requests from clients.

NtConnectPort

int _stdcall

NtConnectPort(

PHANDLE PortHandle,

PUNICODE_STRING PortName,

PVOID Unknown1,

PLPCSECTIONINFO sectionInfo,

PLPCSECTIONMAPINFO mapInfo,

PVOID Unknown2,

PVOID ConnectInfo,

PDWORD pConnectInfoLength);

The client uses this function to establish LPC communication with the server. The name of the

port to connect to is specified as a Unicode string in the PortName parameter. The third
parameter (Unknown1), whose purpose is unknown at this time, cannot be passed as NULL because the
function fails the validation checks otherwise. The fourth parameter (sectionInfo) is used only
with the shared section LPC. It is a pointer to a structure, described as follows:

typedef struct LpcSectionInfo {

DWORD Length;

HANDLE SectionHandle;

DWORD Param1;

DWORD SectionSize;

DWORD ClientBaseAddress;

DWORD ServerBaseAddress;

} LPCSECTIONINFO, *PLPCSECTIONINFO;

The Length field in this structure specifies the size of the structure; it is always set to 24. The

caller of this function–the client–fills the SectionHandle and SectionSize fields, apart from the

Length. The CreateFileMapping() function can create a shared section of required size. Upon

return from the NtConnectPort() function, the ClientBaseAddress and ServerBaseAddress

fields in the LPCSECTIONINFO structure contain the addresses where the section is

mapped in the client address space and the server address space, respectively.

The next parameter to the NtConnectPort() function, mapInfo, is also used only for the
shared section LPC. This parameter is a pointer to a structure described as follows:

typedef struct LpcSectionMapInfo

{

DWORD Length;

DWORD SectionSize;

DWORD ServerBaseAddress;

} LPCSECTIONMAPINFO, *PLPCSECTIONMAPINFO;

This structure duplicates the information in the LPCSECTIONINFO structure. The client needs

to fill only the Length field, which is always set to 12, the size of the structure. We have not
been able to decipher the significance of passing this structure to the NtConnectPort() function.
Still, you have to pass a valid structure; if you pass a NULL pointer, the function fails. We have
observed that the two members of the structure, namely, SectionSize and ServerBaseAddress,
are zeroed out on return from the function.
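The two Length values quoted above (24 and 12) follow directly from the structure layouts on 32-bit Windows NT, where DWORD and HANDLE are both 4 bytes wide. The sketch below mirrors the two structures with fixed-width stand-in types so that the arithmetic can be checked on any platform; the `Sim` names and the helper `sim_prepare_section_args()` are hypothetical and are not part of UNDOCNT.H.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for LPCSECTIONINFO: six 4-byte fields, 24 bytes in total. */
typedef struct SimLpcSectionInfo {
    uint32_t Length;
    uint32_t SectionHandle;     /* a HANDLE is 4 bytes on 32-bit NT */
    uint32_t Param1;
    uint32_t SectionSize;
    uint32_t ClientBaseAddress;
    uint32_t ServerBaseAddress;
} SIMLPCSECTIONINFO;

/* Stand-in for LPCSECTIONMAPINFO: three 4-byte fields, 12 bytes in total. */
typedef struct SimLpcSectionMapInfo {
    uint32_t Length;
    uint32_t SectionSize;
    uint32_t ServerBaseAddress;
} SIMLPCSECTIONMAPINFO;

/* Fills in only the fields that the text says the client must supply;
   everything else is left zeroed for the system to fill in. */
void sim_prepare_section_args(SIMLPCSECTIONINFO *info,
                              SIMLPCSECTIONMAPINFO *map,
                              uint32_t section_handle,
                              uint32_t section_size)
{
    memset(info, 0, sizeof(*info));
    memset(map, 0, sizeof(*map));
    info->Length        = (uint32_t)sizeof(*info);
    info->SectionHandle = section_handle;
    info->SectionSize   = section_size;
    map->Length         = (uint32_t)sizeof(*map);
}
```

Using sizeof for the Length fields, instead of the literals 24 and 12, keeps the values correct if the structure definitions are ever adjusted.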

The purpose of the next parameter to the NtConnectPort() function (Unknown2) is not known, so pass it as NULL.

The client can send some information to the server with the connection request. The server

receives this information via the LPC message, which it gets from the

NtReplyWaitReceivePort() function in case of a connection request. The ConnectInfo

parameter points to this connection information. The size of the connection information is passed
through the pConnectInfoLength parameter, which is a pointer to a double word. The server
can also send back some information to the client at connection time. This information is returned in
the same ConnectInfo buffer, and the double word pointed to by pConnectInfoLength is set to the
length of the returned connection information.

NtReplyWaitReceivePort

int _stdcall

NtReplyWaitReceivePort(

HANDLE PortHandle,

PDWORD Unknown,

PLPCMESSAGE pLpcMessageOut,

PLPCMESSAGE pLpcMessageIn);

This function is used by the server side of LPC to receive requests from clients and reply to

them. The first parameter is the port handle obtained from the NtCreatePort() function. The

second parameter, currently unknown, can be passed as NULL. The third parameter is the

message that serves as a reply to the previous client request. This parameter can be NULL, in

which case the function simply accepts a request from the client. The fourth parameter, a
pointer to an LpcMessage structure, is filled with the request information on return from
the function. Both the third and the fourth parameters are pointers to the LpcMessage structure,
which we display here.

typedef struct LpcMessage {

/* LPC Message Header */

WORD ActualMessageLength;

WORD TotalMessageLength;

DWORD MessageType;

DWORD ClientProcessId;

DWORD ClientThreadId;

DWORD MessageId;

DWORD SharedSectionSize;

BYTE MessageData[MAX_MESSAGE_DATA];

} LPCMESSAGE, *PLPCMESSAGE;

The ActualMessageLength field is set to the size of the actual message stored in the

MessageData field, whereas the TotalMessageLength is set to the size of the entire

LpcMessage structure along with the MessageData. The system, not the client or the server, sets the
MessageType field. There are several message types. We detail the important ones:

LPC_REQUEST
    The server receives this type of message when a client sends a request using the
    NtRequestWaitReplyPort() function. The server should reply to this message using the
    NtReplyPort() function or the NtReplyWaitReceivePort() function. The server should not
    reply to any messages other than the LPC_REQUEST messages. The NtRequestWaitReplyPort()
    function waits until it gets the reply from the server and then returns the reply message
    to the client. Effectively, the client thread that calls the NtRequestWaitReplyPort()
    function hangs if the server does not send a reply message.

LPC_REPLY
    The client receives this type of message from the NtRequestWaitReplyPort() function,
    when the server replies to the request.

LPC_DATAGRAM
    The server receives this type of message when a client sends a request using the
    NtRequestPort() function. As the name of the message type implies, the client does not get
    a reply from the server for this kind of message. If the server tries to reply to this
    message using the NtReplyPort() function or the NtReplyWaitReceivePort() function, the
    function fails and returns an error.

LPC_PORT_CLOSED
    The server receives this type of message when a client closes the port handle. If a client
    dies without closing the port handle, the operating system closes the handle on behalf of
    the client. Thus, the server gets the LPC_PORT_CLOSED message in any case and can use it
    to free the per-client resources it allocates.

LPC_CLIENT_DIED
    The server receives this type of message when a client dies. Refer to the description of
    the NtRegisterThreadTerminatePort() function for more information.

LPC_CONNECTION_REQUEST
    The corresponding server receives this type of message when a client tries to connect to
    a port using the NtConnectPort() function.
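The rules in the list above can be condensed into a small decision function that maps a message type to what a server loop should do with it. This is a sketch only: the `SIM_` names are hypothetical, and the numeric values are placeholders chosen for illustration, since the real constants presumably come from UNDOCNT.H and are not given in the text.

```c
/* Placeholder numbering for illustration; not the real LPC constants. */
enum SimLpcMessageType {
    SIM_LPC_REQUEST = 1,
    SIM_LPC_REPLY,
    SIM_LPC_DATAGRAM,
    SIM_LPC_PORT_CLOSED,
    SIM_LPC_CLIENT_DIED,
    SIM_LPC_CONNECTION_REQUEST
};

/* What a server does with an incoming message of a given type. */
enum SimServerAction {
    SIM_ACTION_REPLY,        /* process the request and send a reply */
    SIM_ACTION_PROCESS_ONLY, /* process, but never reply */
    SIM_ACTION_FREE_CLIENT,  /* release per-client resources */
    SIM_ACTION_CONNECT,      /* run the accept/complete sequence */
    SIM_ACTION_IGNORE        /* nothing required in this sketch */
};

enum SimServerAction sim_action_for(int message_type)
{
    switch (message_type) {
    case SIM_LPC_REQUEST:            return SIM_ACTION_REPLY;
    case SIM_LPC_DATAGRAM:           return SIM_ACTION_PROCESS_ONLY;
    case SIM_LPC_PORT_CLOSED:        return SIM_ACTION_FREE_CLIENT;
    case SIM_LPC_CONNECTION_REQUEST: return SIM_ACTION_CONNECT;
    default:                         return SIM_ACTION_IGNORE;
    }
}
```

Only SIM_LPC_REQUEST maps to a reply, which matches the rule that replying to any other message type is an error.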

The next two fields in the LpcMessage structure, ClientProcessId and ClientThreadId, are set by
the system to the client’s process ID and thread ID, respectively. The MessageId field is set to
the unique message ID generated by the

system. The server can rely on these fields because the operating system, not the client, sets

them. These fields do not make sense in the messages received by the client and therefore

are set to zero in the messages returned by the NtRequestWaitReplyPort() function.

Only the shared section LPC uses the SharedSectionSize field. The system sets this field to

the size of the shared section when it passes an LPC_CONNECTION_REQUEST type of

message to the server.

The last field is the actual message and is a variable-length field. The client or the server can
choose to allocate only enough memory to hold the header fields and the actual
message. When passing a pointer to this structure for receiving a message, however, you must allocate
enough memory to fit any message that the process at the other end of the port can send. If
you fail to do so, you will receive an “Invalid Access” or similar kind of fault. To be on the safe
side, always allocate space for the maximum-sized message when passing a pointer for
receiving a message.
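The relationship between the two length fields can be made concrete with a short sketch. It mirrors the LpcMessage layout with fixed-width stand-in types and assumes that MAX_MESSAGE_DATA equals 304, the short-message limit quoted earlier; under that assumption the header occupies 24 bytes and the whole structure 328 bytes, which agrees with the data length mentioned for NtCreatePort(). The `Sim` names and `sim_fill_message()` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_MAX_MESSAGE_DATA 304 /* assumed value of MAX_MESSAGE_DATA */

typedef struct SimLpcMessage {
    uint16_t ActualMessageLength; /* bytes used in MessageData */
    uint16_t TotalMessageLength;  /* header size + ActualMessageLength */
    uint32_t MessageType;
    uint32_t ClientProcessId;
    uint32_t ClientThreadId;
    uint32_t MessageId;
    uint32_t SharedSectionSize;
    uint8_t  MessageData[SIM_MAX_MESSAGE_DATA];
} SIMLPCMESSAGE;

/* Copies the payload into a zeroed message and sets both length fields.
   Returns 0 on success, or -1 if the payload does not fit. */
int sim_fill_message(SIMLPCMESSAGE *msg, const void *data, size_t len)
{
    if (len > SIM_MAX_MESSAGE_DATA)
        return -1;
    memset(msg, 0, sizeof(*msg));
    memcpy(msg->MessageData, data, len);
    msg->ActualMessageLength = (uint16_t)len;
    msg->TotalMessageLength =
        (uint16_t)(offsetof(SIMLPCMESSAGE, MessageData) + len);
    return 0;
}
```

For an 8-byte payload this yields an ActualMessageLength of 0x08 and a TotalMessageLength of 0x20, the same values that the client code in PORT.C assigns by hand. Declaring the receive buffer as a full structure, as this sketch does, is one way to follow the advice about always allocating for the maximum-sized message.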

NtAcceptConnectPort

int _stdcall

NtAcceptConnectPort(

PHANDLE PortHandle,

DWORD Unknown1,

PLPCMESSAGE pLpcMessage,

DWORD acceptIt,

DWORD Unknown3,

PLPCSECTIONMAPINFO mapInfo);

Whenever the server receives a connection request, it follows a connection establishment

procedure by first calling the NtAcceptConnectPort() function and then the

NtCompleteConnectPort() function. This sequence of operations establishes a communication

channel between the client and the server. The client end of the channel is represented by the handle
that the client gets from the NtConnectPort() function. The first parameter to NtAcceptConnectPort() is
a pointer to a port handle, which is set on return to another handle to the message port. This handle is the

server-side end of the communication channel, although the server can use the handle

returned from the NtCreatePort() function to accept requests from all clients. The server can

close the handle, returned by the NtAcceptConnectPort() function, when it no longer wants to

accept requests using the particular communication channel. Any further requests by the client

on a closed communication channel will fail.

We have not been able to decipher the second parameter, which is generally set to zero. The third
parameter is the LPC message returned to the client as the connection information from the
server. The fourth parameter, named acceptIt, is passed as 0 if the server cannot accept the
connection request and as a nonzero value if it can. The fifth parameter, not deciphered yet,
can be set to zero. The last
parameter is a pointer to the LpcSectionMapInfo structure, which is filled with appropriate data
upon return. We already explained the members of this structure. This structure supplies
shared-section information for future use by the server for communicating with the client.

NtCompleteConnectPort

int _stdcall

NtCompleteConnectPort(HANDLE PortHandle);

The server finishes the connection procedure with the NtCompleteConnectPort() function. The

only parameter to this function is the port handle returned by the previous call to the

NtAcceptConnectPort() function. The client waits in the NtConnectPort() function until the

server completes the connection procedure by calling the NtCompleteConnectPort() function.

NtRequestWaitReplyPort

int _stdcall

NtRequestWaitReplyPort(HANDLE PortHandle,PLPCMESSAGE pLpcMessageIn,

PLPCMESSAGE pLpcMessageOut);

The client uses this function to send a request to the server and wait for a reply. The first
parameter is the port handle obtained via a previous call to the NtConnectPort() function. The
pLpcMessageIn parameter is a pointer to an LPC request message sent to the server. The last
parameter is a pointer to another LPC message structure that is filled with the reply message from
the server on return from the function.

NtListenPort

int _stdcall

NtListenPort(HANDLE PortHandle,PLPCMESSAGE pLpcMessage);

This very small function internally uses the NtReplyWaitReceivePort() function. Here we

present the pseudocode of this function:

int NtListenPort(HANDLE PortHandle, PLPCMESSAGE pLpcMessage)
{
    int rc;

    while (1) {
        /* Receive only: no reply message is passed (third argument is NULL) */
        rc = NtReplyWaitReceivePort(
                PortHandle,
                NULL,
                NULL,
                pLpcMessage);
        /* Leave the loop only for a connection request; any other
           message, or a failed receive, sends the loop around again */
        if (rc == 0)
            if (pLpcMessage->MessageType == LPC_CONNECTION_REQUEST)
                break;
    }
    return rc;
}

As you can see from this pseudocode, the NtListenPort() function ignores all messages except

connection requests. You cannot use this function when servicing multiple clients. While servicing

multiple clients, a server gets a mix of connection requests and other client requests. The

server needs to sort out the connection requests from the other requests and perform

appropriate processing. If only a single client can connect at a time, the server can get the

connection request using the NtListenPort() function and then start a loop to accept and

process other client requests.
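The consequence of this behavior can be seen in a small simulation that replaces the port with an array of pending message types. The names and the numeric values below are hypothetical placeholders; the sketch only models the filtering logic of the pseudocode and calls no NT function.

```c
#include <stddef.h>

/* Placeholder values for illustration; not the real LPC constants. */
enum {
    SIM_LPC_REQUEST            = 1,
    SIM_LPC_DATAGRAM           = 3,
    SIM_LPC_CONNECTION_REQUEST = 6
};

/* Models the listen loop over a queue of pending message types.
   Returns the index of the first connection request, or -1 if the queue
   holds none. *discarded receives the number of messages that were
   consumed and thrown away along the way. */
int sim_listen(const int *types, size_t count, size_t *discarded)
{
    size_t i;

    *discarded = 0;
    for (i = 0; i < count; i++) {
        if (types[i] == SIM_LPC_CONNECTION_REQUEST)
            return (int)i;
        (*discarded)++;
    }
    return -1;
}
```

With a queue of { request, datagram, connection request, request }, the function returns index 2 and reports two discarded messages. In a multi-client server, those two messages would be requests from already connected clients that are never serviced, which is why such a server calls NtReplyWaitReceivePort() directly and sorts the messages itself.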

NtRequestPort

int _stdcall NtRequestPort(HANDLE PortHandle,PLPCMESSAGE pLpcMessage);

This function just sends a message on the port and returns. The server thread waiting on this

port gets the message and does the required processing. The server thread need not return

the results to the caller. In this case, the message type in the header is LPC_DATAGRAM. A

message sent using this function resembles a datagram in the sense that the sender does not

receive an acknowledgment.

NtReplyPort

int _stdcall

NtReplyPort(HANDLE PortHandle, PLPCMESSAGE pLpcMessage);

The server uses this function if it wants to send a reply to the client and does not want to be

blocked for the next request from the client. The first parameter to this function is the port

handle, and the second parameter is the reply message sent to the client.

NtRegisterThreadTerminatePort

int _stdcall NtRegisterThreadTerminatePort(HANDLE PortHandle);

If a client calls this function after connecting to a port, then the operating system sends the

LPC_CLIENT_DIED message to the server when the client dies. Even if the client closes the

port handle and keeps running, the system maintains a reference to the port. Therefore, the

operating system sends the LPC_PORT_CLOSED message after the LPC_CLIENT_DIED

message and not after the client closes the port handle.

NtSetDefaultHardErrorPort

int _stdcall NtSetDefaultHardErrorPort(HANDLE PortHandle);

The CSRSS subsystem calls this function during its initialization. The NtRaiseHardError()

function, called in case of serious system errors, sends a message to the registered hard error

port. Hence, the CSRSS subsystem can pop up the message when application startup

problems appear. The kernel keeps only one set of global variables for
the pointer to the hard error port, so only one process can capture system errors. On Windows
NT, this happens to be the Win32 subsystem. Calling this function requires special privilege.

Here, we present the pseudocode for this function:

NtSetDefaultHardErrorPort(HANDLE PortHandle)
{
    if (PrivilegeNotHeld) return STATUS_PRIVILEGE_NOT_HELD;
    if (ExReadyForErrors == 0) {
        Get a pointer to the kernel port object from PortHandle;
        ExpDefaultErrorPort = pointer to kernel port object;
        ExpDefaultErrorPortProcess = CurrentProcess;
        ExReadyForErrors = 1;
    } else {
        return STATUS_UNSUCCESSFUL;
    }
    return STATUS_SUCCESS;
}
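The pseudocode describes a register-once policy: the first privileged caller wins, and every later call fails. The following sketch models that policy in ordinary C so that it can be exercised outside the kernel. The `sim_` names and the state structure are hypothetical; the three status values are the standard NTSTATUS codes for the names used in the pseudocode.

```c
#include <stddef.h>
#include <stdint.h>

#define SIM_STATUS_SUCCESS            0x00000000u
#define SIM_STATUS_UNSUCCESSFUL       0xC0000001u
#define SIM_STATUS_PRIVILEGE_NOT_HELD 0xC0000061u

/* Models the single set of kernel globals named in the pseudocode. */
typedef struct SimErrorPortState {
    int         ready_for_errors;   /* models ExReadyForErrors */
    const void *default_error_port; /* models ExpDefaultErrorPort */
    uint32_t    owner_process_id;   /* models ExpDefaultErrorPortProcess */
} SIMERRORPORTSTATE;

uint32_t sim_set_default_hard_error_port(SIMERRORPORTSTATE *st,
                                         const void *port,
                                         uint32_t caller_pid,
                                         int caller_has_privilege)
{
    if (!caller_has_privilege)
        return SIM_STATUS_PRIVILEGE_NOT_HELD;
    if (st->ready_for_errors)
        return SIM_STATUS_UNSUCCESSFUL; /* a port is already registered */
    st->default_error_port = port;
    st->owner_process_id   = caller_pid;
    st->ready_for_errors   = 1;
    return SIM_STATUS_SUCCESS;
}
```

Because the state is never reset in this model, a second registration attempt fails even when the caller holds the privilege, which mirrors why only one process can capture system errors.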

NtImpersonateClientOfPort

int _stdcall

NtImpersonateClientOfPort(

HANDLE PortHandle,

PLPCMESSAGE pLpcMessage);

A subsystem may need to perform some processing in the security context of the calling

thread. The NtImpersonateClientOfPort() function enables the server thread to assume the

security context of the client thread. The function uses the pLpcMessage parameter to identify

the process ID and thread ID of the client thread.

LPC SAMPLE PROGRAMS

In this section, we present two sample programs. The first program demonstrates the short

message communication using LPC, and the second program demonstrates the

communication using shared memory.

On the CD: The sample program can be found in the PORT.C file on the accompanying CD-ROM.

The data prototypes and structure definitions for port-related functions can be found in

UNDOCNT.H, which is also on the CD-ROM.

Short Message LPC Sample

The PORT.C file contains the program that acts as both the client and the server for
demonstrating short message communication. When the program is invoked without any
parameters, it acts as the server. If invoked with some parameter, it acts as a client (the
parameter is a dummy parameter and gets ignored). You should start the program in server
mode first. The server-mode program first creates a port and then loops into a “receive

request–process request–reply request” sequence. It uses the NtReplyWaitReceivePort()

function to accept requests. Connection requests are treated differently from other
requests. In case of a connection request, the server thread has to accept the connection and
complete the connection sequence. For requests other than connection requests, the

server prints the message, inverts all the bytes in the message, and sends this inverted

message back as the reply.

Once the server is ready to accept connections, you can run another instance of the

program–this time in client mode. The client-mode program connects to the port created by the

server-mode instance. It first demonstrates the use of the NtRequestPort() function to send a

datagram. Then, the client sends a request and waits for a reply in a loop. You can start

multiple client sessions; the server portion of the program can handle multiple client requests.

We list and explain the PORT.C file in this section.

Listing 8-1: PORT.C

/***************************************************/

/* Demonstrates the short message LPC provided by the port object */

#include <windows.h>

#include <stdio.h>
#include <string.h> /* memset */
#include <ctype.h> /* toupper */

#include "undocnt.h"

#include "print.h"

#define PORTNAME L"\\Windows\\MyPort"

Apart from regular header inclusions, the initial portion of the PORT.C file has the definition of

the name of the message port used by the sample program. It is a complete path name

starting from the root of the object directory. Note that the wide character set is used instead of

the normal ASCII character set because we are directly invoking the system services and the

system services understand only the Unicode character set.

/* A real server function would do some meaningful processing here. As we are

writing just a sample server, we have a dummy server function that just inverts

all the bytes in the message */

void ProcessMessageData(PLPCMESSAGE pLpcMessage)

{

DWORD *ptr;

DWORD i;

ptr = (DWORD *)(pLpcMessage->MessageData);

for(i=0;i<pLpcMessage->ActualMessageLength/sizeof(DWORD);i++) {

ptr[i] = ~ptr[i];

}

return;

}

This is a dummy processing function on the server side. This function is passed the LPC

request message, received by the server. The function should return the reply message in the

same memory space. As the comment says, the function simply inverts all the bytes in the

message. Because we only want to demonstrate the working of the LPC, we do not provide

any intricate server functionality. You can modify this function to implement the functionality

provided by your server.

BOOL

ProcessConnectionRequest(

PLPCMESSAGE LpcMessage,

PHANDLE pAcceptPortHandle)

{

HANDLE AcceptPortHandle;

int rc;

*pAcceptPortHandle=NULL;

printf("Got the connection request\n");

PrintMessage(LpcMessage);

ProcessMessageData(LpcMessage);

rc = NtAcceptConnectPort(

&AcceptPortHandle,

0,

LpcMessage,

1,

0,

NULL);

if (rc != 0) {

printf("NtAcceptConnectPort failed, rc=%x\n", rc);

return FALSE;

}

printf("AcceptPortHandle=%x\n", AcceptPortHandle);

rc = NtCompleteConnectPort(AcceptPortHandle);

if (rc != 0) {

CloseHandle(AcceptPortHandle);

printf("NtCompleteConnectPort failed, rc=%x\n",rc);

return FALSE;

}

*pAcceptPortHandle = AcceptPortHandle;

return TRUE;

}

The server part of the program calls this function when it receives a connection request from

the client. This function receives the message containing the connection request and returns

the port handle specific to the client. The function first prints the message and then calls the

ProcessMessageData() function. As described earlier, the message data in a connection

request consists of nothing but the ConnectInfo passed to the NtConnectPort() function by the

client.

The ProcessConnectionRequest() function starts the real work by calling the

NtAcceptConnectPort() function. The only parameter of significance to this function is the

message containing the connection request. The function returns a handle to the port in the

AcceptPortHandle parameter. This function returns a nonzero value if it fails. If the function

succeeds, the ProcessConnectionRequest() function calls the NtCompleteConnectPort()

function, which accepts the port handle returned by the NtAcceptConnectPort() function as the

parameter. The NtCompleteConnectPort() function also returns a zero on success and a value

other than zero on failure.

In this function, we accept all the connection requests. You may want to modify this function to

selectively accept connection requests. For example, you might permit the connection only for

certain users or only if the client provides certain connection information. If your server can

accept only a single client at a time, you need to reject all further connection requests. As

described earlier, you can reject connection requests by passing the acceptIt parameter as

zero.
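A selective acceptance policy of the kind suggested above could be factored into a helper that computes the value to pass as acceptIt. The sketch below is not part of PORT.C: the function name, the idea of an agreed tag in the first double word of the connection information, and the client limit are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the value to pass as acceptIt: 1 to accept, 0 to reject.
   The connection is accepted only if the server still has room for
   another client and the first double word of the connection
   information matches a tag that clients are expected to send. */
uint32_t sim_decide_accept(const uint32_t *connect_info, size_t dword_count,
                           unsigned active_clients, unsigned max_clients,
                           uint32_t expected_tag)
{
    if (active_clients >= max_clients)
        return 0;
    if (connect_info == NULL || dword_count == 0)
        return 0;
    return (connect_info[0] == expected_tag) ? 1u : 0u;
}
```

A single-client server would call this with max_clients set to 1, so that every connection request arriving while a client is connected is rejected.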

BOOL ProcessLpcRequest(

HANDLE PortHandle,

PLPCMESSAGE LpcMessage)

{

int rc;

printf("Got the LPC request\n");

PrintMessage(LpcMessage);

ProcessMessageData(LpcMessage);

rc = NtReplyPort(PortHandle, LpcMessage);

if (rc != 0) {

printf("NtReplyPort failed, rc=%x\n", rc);

return FALSE;

}

return TRUE;

}

In this program, we chose to use two function calls to reply to a message and receive the next

message, instead of using a single call to the NtReplyWaitReceivePort() function. The
ProcessLpcRequest() function, a small utility function, prints the received message, processes
it (inverts the bytes by calling the ProcessMessageData() function), and sends back the
processed data as the reply using the NtReplyPort() function.

int server(OBJECT_ATTRIBUTES *ObjectAttr)

{

BOOL RetVal;

HANDLE PortHandle;

int rc;

LPCMESSAGE LpcMessage;

/* Create the named port */

rc = NtCreatePort(&PortHandle, ObjectAttr,0x0, 0x0, 0x00000);

if (rc != 0) {

printf("Error creating port, rc=%x\n", rc);

return -1;

}

printf("Port created, PortHandle=%d\n", PortHandle);

memset(&LpcMessage, 0, sizeof(LpcMessage));

while (1) {

HANDLE AcceptPortHandle;

/* Wait for the message on the port*/

rc = NtReplyWaitReceivePort(PortHandle,

NULL,

NULL,

&LpcMessage);

if (rc != 0) {

printf("NtReplyWaitReceivePort failed, rc=%x\n", rc);

CloseHandle(PortHandle);

return -1;

}

RetVal = TRUE;

switch (LpcMessage.MessageType) {

case LPC_CONNECTION_REQUEST:

RetVal = ProcessConnectionRequest(&LpcMessage,&AcceptPortHandle);

break;

case LPC_REQUEST:

RetVal = ProcessLpcRequest(

PortHandle,

&LpcMessage);

break;

default:

PrintMessage(&LpcMessage);

break;

}

if (RetVal == FALSE) {break;}

}

return 0;

}

As described earlier, the same LPC demonstration program acts as the server and the client.

The main() function calls the server() function when the program is invoked without any

parameters. The server() function is passed a pointer to the OBJECT_ATTRIBUTES structure

that contains the object name of the communication port. The function creates a port with this

name, upon which it gets back a handle to the port. As described earlier, the

MaxConnectInfoLength and MaxDataLength parameters to the NtCreatePort() function are

ignored so we simply pass them as zero. The NtCreatePort() function returns a zero on

success and a nonzero value on failure.

After successful creation of the port, the server() function goes into a receive-process-reply
loop. The function uses the NtReplyWaitReceivePort() function to receive requests from clients.
Since we use this function only to receive requests, the pLpcMessageOut parameter is passed
as NULL. The NtReplyWaitReceivePort() function returns zero on success, and the
pLpcMessageIn buffer then contains the client request. This request can take the form of an
LPC_CONNECTION_REQUEST, an LPC_DATAGRAM, an LPC_REQUEST, and so on. The
server processes each type of request differently. It processes the
LPC_CONNECTION_REQUEST by performing the connection protocol. It accomplishes this
by calling the ProcessConnectionRequest() function. With an LPC_REQUEST message, the

server needs to do the requested processing and reply to the request. Since we are not

implementing any significant functionality in the server, we just print the message, invert the

message bytes, and return a reply. We do this in the ProcessLpcRequest() function. For

LPC_DATAGRAM messages, a reply is not expected. These messages and all other

messages, including LPC_PORT_CLOSED and LPC_CLIENT_DIED, are handled in the

default case of the switch statement. A real server may need to perform different processing

for these messages. For example, a real server might free up per-client resources on receiving

an LPC_PORT_CLOSED message.

The server side of the program loops continuously, receiving, processing, and replying to client

requests. We did not program an exit for the server part. This is generally the case with servers,

and that’s the reason why they are called daemons in Unix terminology. Generally, servers

start up with the system boot and continue processing client requests until the system shuts

down. With our server, you can kill it by pressing Ctrl+C in the command window or by using

the Task Manager.

int client(UNICODE_STRING *uString)

{

static int Param3;

HANDLE PortHandle;

DWORD ConnectDataBuffer[] = {0, 1, 2, 3, 4, 5};

int Size = sizeof(ConnectDataBuffer);

DWORD i;

DWORD Value=0xFFFFFFFF;

int rc;

LPCMESSAGE LpcMessage;

DWORD *ptr;

printf("ClientProcessId=%x, ClientThreadId=%x\n",GetCurrentProcessId(),

GetCurrentThreadId());

rc = NtConnectPort(&PortHandle,

uString,

&Param3,

0,

0,

0,

ConnectDataBuffer,

&Size);

if (rc != 0) {

printf("Connect failed, rc=%x\n", rc);

return -1;

}

printf("Connect success, PortHandle=%d\n", PortHandle);

for (i = 0; i < Size/sizeof(DWORD); i++) {

printf("%x ", ConnectDataBuffer[i]);

}

printf("\n\n");

rc = NtRegisterThreadTerminatePort(PortHandle);

if (rc != 0) {

printf("Unable to register thread termination port\n");

CloseHandle(PortHandle);

return -1;

}

/* Demonstrates how to send a datagram using NtRequestPort */

memset(&LpcMessage, 0, sizeof(LpcMessage));

LpcMessage.ActualMessageLength=0x08;

LpcMessage.TotalMessageLength=0x20;

ptr=(DWORD *)LpcMessage.MessageData;

ptr[0]=0xBABABABA;

ptr[1]=0xCACACACA;

rc=NtRequestPort(PortHandle, &LpcMessage);

while (1) {

/* Fill in the message */

memset(&LpcMessage, 0, sizeof(LpcMessage));

LpcMessage.ActualMessageLength=0x08;

LpcMessage.TotalMessageLength=0x20;

ptr = (DWORD *)LpcMessage.MessageData;

ptr[0] = Value;

ptr[1] = Value-1;

printf("Stop sending message (Y/N)? ");

fflush(stdin);

if (toupper(getchar()) == 'Y') {

CloseHandle(PortHandle);

break;

}

/* Send the message and wait for the reply */

printf("Sending request/waiting for reply\n");

rc = NtRequestWaitReplyPort(PortHandle,

&LpcMessage,

&LpcMessage);

if (rc != 0) {

printf("NtRequestWaitReplyport failed, rc=%x\n",rc);

return -1;

}

/* Print the reply received */

printf("Got the reply\n");

PrintMessage(&LpcMessage);

Value -= 2;

}

return 0;

}

The client() function implements the client-side portion of the LPC sample. The function prints

the process ID and the thread ID; you can match it with the process ID and thread ID printed

from the messages received by the server.

The client() function starts its job by connecting to the port created by the server process. It

passes six double words as the connectInfo. You can verify that the server receives these

words as the message data with the LPC_CONNECTION_REQUEST. Upon return from the

NtConnectPort() function, the client gets a handle to the port. Also, the connectInfo buffer is filled

with the message data that the server passed to the NtAcceptConnectPort() function.

Further, the client calls the NtRegisterThreadTerminatePort() function, with the newly acquired

port handle as the parameter, so that the operating system sends an LPC_CLIENT_DIED

message over the port when the client terminates. A client needs to call this function only if the

server must know about the client's death. We call this function here to demonstrate the

mechanism.

The client also demonstrates the datagram communication via the LPC. As described earlier,

the NtRequestPort() function passes LPC_DATAGRAM type requests. Note that the client fills

in only the message length fields and the actual message data; the operating system fills in the

remaining fields in the LPCMESSAGE structure before the message passes to the server. The

client() function sends two double words as the message data, which the server prints upon

reception of the message.
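
As a rough illustration of what "filling in only the length fields and the data" means, the sketch below builds a message in a simplified, hypothetical DemoMessage structure. It is not the LPCMESSAGE layout from UNDOCNT.H. The 0x18-byte header size is an assumption inferred from the listing, where an 8-byte payload (0x08) is paired with a total length of 0x20.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_HEADER_SIZE 0x18u   /* assumed: 0x20 total minus 0x08 payload */
#define DEMO_MAX_DATA    0x100u  /* hypothetical payload capacity for this sketch */

/* Simplified stand-in for a port message; NOT the real LPCMESSAGE layout. */
typedef struct {
    uint16_t actual_length;            /* payload bytes only */
    uint16_t total_length;             /* assumed header size + payload */
    unsigned char data[DEMO_MAX_DATA];
} DemoMessage;

/* Zeroes the message, then sets the two length fields and copies the payload.
   Returns 0 on success, -1 if the payload does not fit. */
int demo_fill_message(DemoMessage *m, const void *payload, size_t n)
{
    if (n > DEMO_MAX_DATA)
        return -1;
    memset(m, 0, sizeof *m);
    m->actual_length = (uint16_t)n;
    m->total_length  = (uint16_t)(n + DEMO_HEADER_SIZE);
    memcpy(m->data, payload, n);
    return 0;
}
```

With the two double words sent by the client as the payload, this yields the same 0x08 and 0x20 values that the listing assigns by hand.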

After demonstrating the datagram communication, the client enters a “send request, wait for

reply” loop. Each time, before sending the request, it asks the user whether to continue or quit.

If the user wants to continue with the demonstration, the client sends a sample request over

the port using the NtRequestWaitReplyPort() function. The message data consists of two double

words, which the server inverts and sends back as the reply. The NtRequestWaitReplyPort() function

returns to the client after it gets the reply message from the server. In this program, we use

the same buffer to pass the request message and to receive the reply message. You can also use

different buffers for this purpose.

int main(int argc, char **argv)

{

OBJECT_ATTRIBUTES ObjectAttr;

UNICODE_STRING uString;

int rc;

/* Initializes the object attribute structure */

memset(&ObjectAttr, 0, sizeof(ObjectAttr));

ObjectAttr.Length = sizeof(ObjectAttr);

RtlInitUnicodeString(&uString, PORTNAME);

ObjectAttr.ObjectName = &uString;

if (argc == 1) {

/* If no parameters are specified for the program, act as the server */

rc = server(&ObjectAttr);

} else {

/* If any command line parameter is specified it acts as the client */

rc = client(&uString);

}

return rc;

}

The main() function is simply a control function that calls either the server part or

the client part depending on whether the user specifies a parameter. Before passing

control to one of these functions, the main() function initializes a UNICODE_STRING and an

OBJECT_ATTRIBUTES structure with the port name. These are passed as parameters to the server()

and client() functions.

Apart from the PORT.C file, the sample program contains a PRINT.C file and a PRINT.H file.

The PRINT.C file contains utility routines to print the LPCMESSAGE structure, and the

PRINT.H file contains the prototypes for these functions.

Shared Section LPC Sample

The following program demonstrates LPC using shared memory. The program resembles

the one demonstrating short message LPC, except that it uses shared memory to pass

parameters to the server and get results back. This sample program uses the same PORT.H

file as the short message LPC sample. The SSLPC.C file in this sample program

replaces the PORT.C file from the earlier sample program. The SSLPC.C file contains the

server code as well as the client code.

As in the short message LPC sample, the same program works as either the server or

the client, depending on whether a parameter is passed when invoking the program. You

should start the program in the server mode first and when the server is ready, start the same

program in client mode from another command window. The client creates a shared section for

passing parameters and receiving results. The client then establishes communication with the

server and asks the user for a string to send to the server as the parameter. The client copies the string to

the shared section and sends a message to the server. Upon receiving the message, the

server reverses the string in the shared section and sends a reply. The client prints the

reversed string after receiving the reply. The server permits you to start multiple client sessions

simultaneously.

Listing 8-2: SSLPC.C

#include <windows.h>

#include <stdio.h>

#include "undocnt.h"

#include "..\port\print.h"

#define SHARED_SECTION_SIZE 0x10000

typedef struct SharedLpcMessage {

DWORD ServerBaseAddress;

DWORD MessageOffset;

} SHAREDLPCMESSAGE, *PSHAREDLPCMESSAGE;

This initial portion of the file contains, apart from the required include directives, a couple of

important definitions. The client creates the section and therefore determines the size of the

shared section. The server is informed of the size of the section at the time of connection.

The operating system sets the SharedSectionSize field in the LPC message to the size of the

shared section when it passes an LPC_CONNECTION_REQUEST message to the server. The

server might choose to reject the connection request if it disagrees with the section size

chosen by the client. For example, the section size might prove too small for the replies from

the server.

The section size definition is followed by the definition for the message that the client sends

over the port when the client wants to invoke some service from the server. As described

earlier, the actual parameters pass via the shared section; the message simply indicates to the

server that the client wants to invoke some service. In this sample program, we choose to pass

the port message containing the server-side base address of the shared section and the offset

of the copied parameters within the shared section. The server, in this sample program, does

not keep track of the shared section information for the connected clients. (Remember that the

server is informed of the details of the shared section when it accepts the connection request

via NtAcceptConnectPort().) The server depends solely on the shared-section information

passed by the client with every LPC request. In a production environment, with

unreliable clients, the server should either keep track of the shared-section information

itself or verify the information sent by the client.
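
The verification mentioned above might look like the following sketch. It assumes the server saved the mapping details it received at connection time in a hypothetical DemoSectionRecord. The structure and the demo_request_is_valid() helper are illustrative names, not part of the sample program.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record kept by the server when it accepts a connection. */
typedef struct {
    uintptr_t server_base;   /* where the section is mapped in the server */
    size_t    section_size;  /* size of the shared section in bytes */
} DemoSectionRecord;

/* Accepts a request only if the claimed base matches the recorded mapping,
   the offset lies inside the section, and the string at that offset is
   NUL-terminated before the end of the section. */
bool demo_request_is_valid(const DemoSectionRecord *rec,
                           uintptr_t claimed_base, size_t claimed_offset)
{
    if (claimed_base != rec->server_base)
        return false;
    if (claimed_offset >= rec->section_size)
        return false;

    const char *start = (const char *)rec->server_base + claimed_offset;
    return memchr(start, '\0', rec->section_size - claimed_offset) != NULL;
}
```

The terminator check matters because the server reverses the string in place. Without it, a missing NUL would let the reversal run past the end of the mapped view.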

/* Extract the message string from the shared section and reverse it */

void ProcessMessageData(PLPCMESSAGE pLpcMessage)

{

PSHAREDLPCMESSAGE SharedLpcMessage;

char *ServerView;

SharedLpcMessage =(PSHAREDLPCMESSAGE)(pLpcMessage->MessageData);

ServerView = ((char *)SharedLpcMessage->ServerBaseAddress) + SharedLpcMessage->MessageOffset;

strrev(ServerView);

}

The ProcessMessageData() function resembles that in the short message communication

sample, except that it operates on the shared section instead of the data passed in the LPC

message. As described earlier, the client sends LPC requests containing the server-side base

address of the shared section and the offset of the copied parameters within the shared

section. The ProcessMessageData() function retrieves this information from the LPC message

and calculates the memory address where the client copied the parameter string. The function

reverses this string, and the client sees the reversed string when the client receives the reply

from the server.
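
The same base-plus-offset computation can be tried outside Windows with the sketch below. Because strrev() is not part of standard C, the sketch supplies its own reversal routine. An ordinary buffer stands in for the mapped view, and the demo_ names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Reverses a NUL-terminated string in place (portable stand-in for strrev). */
void demo_reverse(char *s)
{
    size_t i = 0;
    size_t j = strlen(s);

    while (i + 1 < j) {      /* stop when the two indexes meet or cross */
        char tmp;
        j--;
        tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
        i++;
    }
}

/* Mirrors the listing: locate the string at base + offset and reverse it. */
void demo_process_message_data(char *view_base, size_t message_offset)
{
    demo_reverse(view_base + message_offset);
}
```

Bytes before the offset are left untouched, which is why several strings can share one section at different offsets.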

BOOL

ProcessConnectionRequest(

PLPCMESSAGE LpcMessage,

PHANDLE pAcceptPortHandle)

{

LPCSECTIONMAPINFO mapInfo;

HANDLE AcceptPortHandle;

int rc;

PrintMessage(LpcMessage);

/* If you get the connection request, accept and complete the request */

memset(&mapInfo, 0, sizeof(mapInfo));

mapInfo.Length=0x0C;

rc = NtAcceptConnectPort(

&AcceptPortHandle,

0,

LpcMessage,

1,

0,

&mapInfo);

if (rc != 0) {

printf("NtAcceptConnectPort failed rc=%x\n", rc);

return FALSE;

}

printf("AcceptPortHandle=%x\n", AcceptPortHandle);

printf("mapInfo.SectionSize=%x\n",mapInfo.SectionSize);

printf("mapInfo.ServerBaseAddress=%x\n", mapInfo.ServerBaseAddress);

rc = NtCompleteConnectPort(AcceptPortHandle);

if (rc != 0) {

printf("NtCompleteConnectPort failed, rc=%x\n",rc);

return FALSE;

}

*pAcceptPortHandle = AcceptPortHandle;

return TRUE;

}

The ProcessConnectionRequest() function here resembles the one in the short message LPC

sample. The only difference between the two functions is in the value they pass for the

mapInfo parameter to NtAcceptConnectPort(). If the server passes a non-NULL value for the

mapInfo parameter and the client has not sent the shared section information with the

connection request, the call fails. Therefore, the ProcessConnectionRequest() function in the

short message LPC sample passes NULL as the mapInfo parameter. Here, the

ProcessConnectionRequest() function passes a pointer to the LPCSECTIONMAPINFO

structure, where it receives the information about the shared section for use in parameter

passing. The sample program does not use this information. A real server might keep track of

the shared-section information per client; for example, it can maintain a hash table indexed by

the client thread ID. The server can later retrieve the shared-section information from the hash

table whenever it receives an LPC request. In this sample program, the client sends the shared

section information, with every LPC request, as a part of the message sent over the port.
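
One possible shape for such a per-client table is sketched below. It is a minimal open-addressing table keyed by client thread ID. The capacity, the field names, and the demo_ functions are all assumptions made for illustration, and the sketch omits removal of entries, which a real server would need when a client disconnects.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SLOTS 64u   /* hypothetical capacity; must be a power of two */

typedef struct {
    uint32_t  thread_id;     /* key; 0 marks an empty slot */
    uintptr_t server_base;   /* server-side base address of the section */
    uint32_t  section_size;  /* size of the section in bytes */
} DemoClientEntry;

typedef struct {
    DemoClientEntry slots[DEMO_SLOTS];
} DemoClientTable;

void demo_table_init(DemoClientTable *t)
{
    memset(t, 0, sizeof *t);
}

/* Inserts or updates an entry. Returns 0 on success, -1 if the table is
   full or the thread ID is 0 (reserved for empty slots). */
int demo_table_put(DemoClientTable *t, uint32_t tid,
                   uintptr_t base, uint32_t size)
{
    if (tid == 0)
        return -1;
    for (uint32_t probe = 0; probe < DEMO_SLOTS; probe++) {
        DemoClientEntry *e = &t->slots[(tid + probe) & (DEMO_SLOTS - 1)];
        if (e->thread_id == 0 || e->thread_id == tid) {
            e->thread_id = tid;
            e->server_base = base;
            e->section_size = size;
            return 0;
        }
    }
    return -1;
}

/* Returns the entry for the given thread ID, or NULL if none is recorded. */
const DemoClientEntry *demo_table_get(const DemoClientTable *t, uint32_t tid)
{
    if (tid == 0)
        return NULL;
    for (uint32_t probe = 0; probe < DEMO_SLOTS; probe++) {
        const DemoClientEntry *e = &t->slots[(tid + probe) & (DEMO_SLOTS - 1)];
        if (e->thread_id == tid)
            return e;
        if (e->thread_id == 0)
            return NULL;   /* an empty slot ends the probe sequence */
    }
    return NULL;
}
```

The server would call demo_table_put() after accepting a connection and demo_table_get() on each request, using the thread ID carried in the message.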

int server(OBJECT_ATTRIBUTES *ObjectAttr)

{

HANDLE PortHandle;

int rc;

LPCMESSAGE LpcMessage;

HANDLE AcceptPortHandle;

BOOL FirstTime=TRUE;

/* Create the named port */

rc = NtCreatePort(&PortHandle, ObjectAttr, 0x0,0x0, 0x00000);

if (rc != 0) {

printf("NtCreatePort failed, rc=%x\n", rc);

return -1;

}

printf("Port created, PortHandle=%d\n", PortHandle);

memset(&LpcMessage, 0, sizeof(LpcMessage));

while (1) {

if ((FirstTime) ||(LpcMessage.MessageType != LPC_REQUEST)) {

/* If this is the first message or if the previous message was not a LPC request,

then do not send any reply but just wait on the message.*/

rc = NtReplyWaitReceivePort(

PortHandle,

NULL,

NULL,

&LpcMessage);

FirstTime=FALSE;

} else {

/* Send a reply to the previous message and wait for the new message. */

printf("Sending reply and Waiting for the request....\n");

rc = NtReplyWaitReceivePort(

PortHandle,

0,

&LpcMessage,

&LpcMessage);

}

if (rc != 0) {

printf("NtReplyWaitReceivePort failed, rc=%x\n", rc);

return -1;

}

if (LpcMessage.MessageType ==LPC_CONNECTION_REQUEST) {

printf("Got the connection request\n");

ProcessConnectionRequest(&LpcMessage,

&AcceptPortHandle);

} else if (LpcMessage.MessageType == LPC_REQUEST) {

/* Process the message received and send the reply in the next iteration of the

while loop. */

printf("Got the request\n");

PrintMessage(&LpcMessage);

ProcessMessageData(&LpcMessage);

}

}

return 0;

}

The server() function implements the server-side functionality of the sample program. It starts

by creating a port object. After successful creation of the port, the function enters a “receive

request, process request, send reply” loop. The server continues in the loop until you

terminate it by pressing Ctrl+C or with the help of the Task Manager.

The server() function receives a new request and replies to the previous request using a single

call to the NtReplyWaitReceivePort() function. A reply needs to be sent only if the previous request

is of type LPC_REQUEST. Hence, the function calls the NtReplyWaitReceivePort() function with a

NULL pLpcMessageOut parameter when it waits for the first request or when the previous request is

not of type LPC_REQUEST. Otherwise, the message received from the client is sent back as the

pLpcMessageOut parameter. In both cases, upon return from the NtReplyWaitReceivePort()

function, the LpcMessage structure contains the next request sent by the client.

The server handles only the LPC_REQUEST and LPC_CONNECTION_REQUEST type

messages; other messages are ignored. For LPC_CONNECTION_REQUEST messages, the

server establishes a communication channel with the client by calling the

ProcessConnectionRequest() function. For LPC_REQUEST messages, the server prints the

message and calls the ProcessMessageData() function, which reverses the string passed as

a parameter in the shared section. The reply to the LPC_REQUEST message is sent by a call

to the NtReplyWaitReceivePort() function in the next iteration.
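
The reply-on-the-next-call rule can be checked with a small simulation. The sketch below replays a sequence of received message types and counts how many replies the loop would send. The DEMO_TYPE_ values and both helper functions are hypothetical, and no port calls are involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message type tags; the values are arbitrary, not NT constants. */
enum {
    DEMO_TYPE_CONNECTION_REQUEST = 1,
    DEMO_TYPE_REQUEST            = 2,
    DEMO_TYPE_DATAGRAM           = 3
};

/* A reply is owed only if a previous message exists and it was a request. */
bool demo_reply_owed(bool first_time, int previous_type)
{
    return !first_time && previous_type == DEMO_TYPE_REQUEST;
}

/* Replays the server loop over n received messages and counts the replies
   piggybacked on the receive calls, including the final call that is left
   waiting for the next message. */
size_t demo_count_replies(const int *types, size_t n)
{
    bool first_time = true;
    int previous = 0;
    size_t replies = 0;

    for (size_t i = 0; i <= n; i++) {
        if (demo_reply_owed(first_time, previous))
            replies++;            /* reply travels with this receive call */
        if (i == n)
            break;                /* this call is still waiting for a message */
        previous = types[i];      /* message i arrives */
        first_time = false;
    }
    return replies;
}
```

For a connection request followed by a request, a datagram, and another request, the count is two: one reply per request, each delivered by the call that waits for the following message.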

int client(UNICODE_STRING *uString)

{

static int Param3;

HANDLE hFileMapping;

LPCSECTIONINFO sectionInfo;

LPCSECTIONMAPINFO mapInfo;

DWORD ServerBaseAddress;

DWORD ClientBaseAddress;

char *ClientView;

HANDLE PortHandle;

int rc;

LPCMESSAGE LpcMessage;

hFileMapping = CreateFileMapping(

(HANDLE)0xFFFFFFFF,

NULL,

PAGE_READWRITE,

0,

SHARED_SECTION_SIZE,

NULL);

if (hFileMapping == NULL) {

printf("Unable to create file mapping\n");

return -1;

}

memset(&sectionInfo, 0, sizeof(sectionInfo));

memset(&mapInfo, 0, sizeof(mapInfo));

sectionInfo.Length = 0x18;

sectionInfo.SectionHandle = hFileMapping;

sectionInfo.SectionSize = SHARED_SECTION_SIZE;

mapInfo.Length = 0x0C;

printf("ClientProcessId=%x, ClientThreadId=%x\n",GetCurrentProcessId(),

GetCurrentThreadId());

rc = NtConnectPort(

&PortHandle,

uString,

&Param3,

&sectionInfo,

&mapInfo,

NULL,

NULL,

NULL);

if (rc != 0) {

printf("Connect failed, rc=%x\n", rc);

return -1;

}

printf("PortHandle=%x\n", PortHandle);

printf("Client Base address=%x\n",sectionInfo.ClientBaseAddress);

printf("Server Base address=%x\n",sectionInfo.ServerBaseAddress);

ServerBaseAddress =sectionInfo.ServerBaseAddress;

ClientBaseAddress =sectionInfo.ClientBaseAddress;

while (1) {

static char MessageString[SHARED_SECTION_SIZE];

int MessageOffset = 0;

PSHAREDLPCMESSAGE SharedLpcMessage;

printf("Enter the message string, enter 'quit' to exit : ");

gets(MessageString);

if (stricmp(MessageString, "quit") == 0) {

CloseHandle(PortHandle);

return 0;

}

fflush(stdin);

printf("Enter the offset in shared memory where the message is to be kept : ");

scanf("%d", &MessageOffset);

if ((MessageOffset < 0) || ((MessageOffset+strlen(MessageString)) >= SHARED_SECTION_SIZE)) {

printf("Message cannot fit in shared memory window\n");

return -1;

}

/* Fill in the message */

memset(&LpcMessage, 0, sizeof(LpcMessage));

LpcMessage.ActualMessageLength=0x08;

LpcMessage.TotalMessageLength=0x20;

SharedLpcMessage =(PSHAREDLPCMESSAGE)(LpcMessage.MessageData);

printf("Server base address=%x\n",ServerBaseAddress);

SharedLpcMessage->ServerBaseAddress =ServerBaseAddress;

SharedLpcMessage->MessageOffset =MessageOffset;

ClientView = ((char *)ClientBaseAddress) +MessageOffset;

strcpy(ClientView, MessageString);

/* Send the message and wait for the reply */

printf("Sending request and waiting for reply....\n");

rc=NtRequestWaitReplyPort(

PortHandle,

&LpcMessage,

&LpcMessage);

if (rc != 0) {

printf("NtRequestWaitReplyport failed, rc=%x\n", rc);

return -1;

}

/* Print the reply received */

printf("Got the reply\n");

PrintMessage(&LpcMessage);

//

printf("Reply = %s\n", ClientView);

}

return 0;

}

The client() function, which encompasses the client-side functionality of the sample program,

substantially differs from the client() function in the short message LPC sample. This is

because a majority of the shared-section handling is performed in the client.

The client() function starts by creating a shared section with the CreateFileMapping() API

function. Note that the section is created with read/write permissions. Also note that the file

handle is passed as -1 (0xFFFFFFFF in the listing), which means that an unnamed section not associated with any file is created. You

can create a section by mapping a disk file, but it is not necessary. The function passes the

section handle, returned by the CreateFileMapping() function, to the NtConnectPort() function

via the sectionInfo parameter. The NtConnectPort() function maps the shared section in the

client as well as the server address space before sending a connection request to the server.

The NtConnectPort() function returns after successfully establishing a communication channel

with the server. Upon return, the sectionInfo structure contains the information about

shared-section mapping. The function also returns the handle to the LPC port, used by the

client for issuing requests.

After successfully establishing the connection, the client enters a “send request, wait for reply”

loop. The client asks the user for a string that it sends to the server as the parameter. (If you

enter “quit,” the client exits.) The client also asks for the offset within the shared section at which to place the string. After

receiving these inputs, the client copies the given string to the specified offset in the shared

section. It fills in an LPC message indicating the base address of the shared section in the

server address space and the offset of the string within the shared section. The client sends

the LPC message to the server over the port by calling the NtRequestWaitReplyPort() function.

Upon receiving the message, the server reverses the string and sends a reply message. The

client prints the reversed string upon return from the NtRequestWaitReplyPort() function.
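
The copy into the shared section is the step where a bad offset can do damage, so a bounds-checked version is worth sketching. The demo_put_string() helper below is hypothetical and uses an ordinary buffer in place of the client's mapped view. It refuses any copy whose string, including the terminating NUL, would not fit.

```c
#include <stddef.h>
#include <string.h>

/* Copies msg, including its terminating NUL, to view + offset.
   Returns 0 on success, or -1 if the string would not fit in the view. */
int demo_put_string(char *view, size_t view_size, size_t offset,
                    const char *msg)
{
    size_t need = strlen(msg) + 1;   /* bytes required, counting the NUL */

    if (offset >= view_size || need > view_size - offset)
        return -1;
    memcpy(view + offset, msg, need);
    return 0;
}
```

Taking the offset as an unsigned size_t and comparing against the remaining space avoids the wraparound that can occur when a negative offset is added to a string length.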

int main(int argc, char **argv)

{

OBJECT_ATTRIBUTES ObjectAttr;

UNICODE_STRING uString;

int rc;

/* Initializes the object attribute structure */

memset(&ObjectAttr, 0, sizeof(ObjectAttr));

ObjectAttr.Length=sizeof(ObjectAttr);

ObjectAttr.ObjectName=&uString;

RtlInitUnicodeString(&uString, PORTNAME);

if (argc == 1) {

/* If no parameters are specified for the program act as the server */

rc = server(&ObjectAttr);

} else {

/* If any command line parameter is specified it acts as the client */

rc = client(&uString);

}

return rc;

}

Similar to the short message LPC sample, the main() function in this sample program does not

have any substantial code. It simply acts as a control function that calls either the server()

function or the client() function depending on whether the program is invoked with command

line parameters. The program also uses the PRINT.H and PRINT.C files for printing the LPC

messages.

QUICK LPC

Quick LPC is the fastest form of LPC. Apart from that, Quick LPC has some peculiarities. For

one, Quick LPC does not use port objects. Second, Quick LPC serves as the exclusive

medium of communication for the Win32 subsystem. The Windows NT kernel supports only a

single server (per client) using Quick LPC; the Win32 subsystem occupies this slot. Therefore,

if you want to use Quick LPC, you need to modify the kernel a bit. (Note that until now, we

presented only user-level code in this chapter.) However, talking about the peculiarities without

giving details can make this concept puzzling. So, here we present details about Quick LPC.

Quick LPC is used only in Windows NT 3.51.

Advantages of Quick LPC

Let us first see why Quick LPC is faster than the regular LPC. The LPC communication using

the port objects proves slow for a couple of reasons. One, there is a single server thread

waiting on the port object and servicing the requests. This single server thread naturally becomes

overloaded when multiple clients make frequent requests. You can overcome this

disadvantage by using a pool of slave threads. The main server thread gets requests from the

port and simply passes them on to one of the slave threads for servicing. The slave threads

run in parallel with the main thread and process requests while the main thread receives new

requests.

Another problem with the regular LPC is that the context switching between the client thread

and the server thread happens in an “uncontrolled” manner. Typically, a client sends a request

on the port and waits for a response from the server (except while sending datagrams using

the NtRequestPort() function). While the client thread waits on the port for a reply, the thread

scheduler searches for the most eligible thread for execution. More often than not, this new

thread selected for execution differs from the server thread. Essentially, the server thread is

not immediately scheduled when the request comes over the port. Similarly, the client thread

may not be scheduled immediately after the subsystem sends a reply.

Quick LPC overcomes both of the aforementioned disadvantages. The first disadvantage is

overcome by creating a dedicated server thread per client thread. The second disadvantage is

overcome by using a kernel object named an event pair, which serves as the backbone of the

Quick LPC. As implied by its name, an event pair consists of a pair of event objects, named

high event and low event, respectively. The NT kernel provides functions that allow a thread

to signal one event in the pair and wait on the other in a single atomic operation. The

event pair object also guarantees that the thread waiting on the signaled event is the next

thread to be scheduled.

Note: Two sets of functions operate on the event pair. One set, the

NtSetHighWaitLowEventPair() and NtSetLowWaitHighEventPair() functions, gives the regular

sleep-wakeup protocol and does not guarantee immediate thread scheduling. In this

chapter, we discuss the other set of functions, which guarantees the immediate scheduling of the

signaled thread.

Quick LPC and Win32 Subsystem

To clarify, let’s see how the Win32 subsystem uses the Quick LPC. When a client thread

makes the first GUI call, the Win32 subsystem creates a thread dedicated to the calling client

thread. The new server thread creates an event pair object and calls the

KiSetLowWaitHighThread() function, with the event pair object as a parameter. The server

thread waits for the high event from the pair to get signaled. Now, whenever the client thread

makes a GUI call, the KiSetHighWaitLowThread() function is called. This call signals the high

event in the pair and enqueues the client thread in the list of threads waiting for the low event

to get signaled. In other words, the client thread sleeps while the corresponding server thread,

waiting on the high event, is woken up. After processing the request, the server thread calls

the KiSetLowWaitHighThread() function, which puts the server thread to sleep waiting for the high event

while the client thread, which was waiting for the low event, takes over the CPU. This sequence

repeats for every GUI call made by the client.

The event pair object takes care of the “controlled” thread switching. It provides no mechanism

for passing parameters and return values. The Quick LPC achieves this with a dedicated

shared section for each client thread. The Win32 subsystem also creates a dedicated section

object and maps it in the address space of both the client and the subsystem processes. The

client thread fills in the parameters in the shared area before passing the control to the server

thread and similarly the server thread copies the results in the shared area before returning the

control to the client thread.

Naturally, you may think, “Why is the Quick LPC restricted to the Win32 subsystem? Why can’t

it operate as a general-purpose Inter-Process Communication mechanism?” The reason is

that you cannot call the functions KiSetLowWaitHighThread() and KiSetHighWaitLowThread()

from a user-mode process directly. Windows NT reserves two software interrupts for this

purpose. Interrupt 0x2C calls the function KiSetLowWaitHighThread(), and interrupt 0x2B calls

the function KiSetHighWaitLowThread(). These two interrupt routines operate on a default

event pair object. The Thread Environment Block (TEB) maintains a pointer to this default

event pair. You can use the NtSetInformationThread() function to set this pointer. Since only

one event pair object can be associated with each thread, only one server thread can make use of

the Quick LPC; for most applications, that server thread belongs to the Win32 subsystem.

However, non-Win32 applications (or, for that matter, non-GUI applications) can

still use the Quick LPC for general-purpose communication.

Steps in Quick LPC Communication

The server application for a non-GUI program can mimic the Win32 subsystem and use the

Quick LPC for communication. Let’s see what the Win32 subsystem does while establishing

the Quick LPC.

1. It creates one dedicated thread in the CSRSS process.

2. It creates a section object, 64K in size.

3. It maps a view of the section into the client process and the subsystem.

4. It creates an event pair object.

5. It duplicates the event pair object handle in the client process.

6. It duplicates the section object handle in the client process.

7. It calls the NtSetInformationThread() function, with SetEventPairThread as the information class, for

the subsystem thread and for the client thread.

8. It returns information such as duplicated event pair handle, section handle, address in the client

process where the shared section is mapped, and so on.

9. After this, the thread data in the client thread reflects that the Quick LPC is established.

10. The dedicated CSRSS thread calls INT 2CH (KiSetLowWaitHighThread). Because of this, the

CSRSS thread is blocked until the client sends a request.

11. When the client makes a GUI call, the client fills in the parameters in the shared section and

issues INT 2BH (KiSetHighWaitLowThread). Because of this, the server thread wakes up,

performs the specified task, and fills in the results in the shared section. Then, the server thread

issues interrupt 0x2C (KiSetLowWaitHighThread), which wakes up the client thread. This 2B/2C sequence repeats until

the client thread terminates.

Quick LPC Sample

Here, we present a program that mimics the Win32 subsystem and shows how you can use

the Quick LPC for general-purpose communication. The program, only a demonstration, does

not implement any service. As described earlier, the event pair object does not provide any

parameter passing mechanism. The user of the event pair object has to implement parameter

passing using shared sections. In this sample program, we do not demonstrate the use of

shared section because it is straightforward and we already demonstrated it at length in the

shared section LPC sample program. In this sample program, we demonstrate only how to

implement “controlled” switching between the client thread and the server thread using the

event pair object.

Following the usual practice in this chapter, the same sample program acts as the server or the

client depending on whether you pass a command line parameter to the program. You should

first start the program in the client mode. The client prints its own process ID and thread ID.

The server needs this information to establish the event pair object. After you start the program

in the server mode, it asks you for the process ID and the thread ID of the client. After

initializing the thread object, the server issues INT 2CH and then waits for a client request.

Meanwhile, the client waits for a user keystroke. After getting a keystroke from the user, the

client issues an INT 2BH, which switches the execution thread from the client thread to the

server thread. The server prints a message indicating that it is scheduled and then waits for a

keystroke. Upon receiving the keystroke, it switches the control back to the client by triggering

INT 2C again. This continues until you kill the server and the client by pressing Ctrl+C or using

the Task Manager.

The implementation, for both client and server, resides in a single file, QLPC.C, which we

describe in detail in the next section.

Listing 8-3: QLPC.C

#include <windows.h>

#include <stdio.h>

#include "..\include\undocnt.h"

#define EVENTPAIRNAME L"\\MyEventPair"

Apart from the usual header inclusions, the initial portion of the QLPC.C file defines the name

of the event pair used by the sample program to demonstrate “controlled” thread switching. We

create the event pair at the root of the object directory. If you want to create several objects in

the object tree, we suggest you create these objects under an application-specific directory.

int server()

{

static HANDLE EventPairHandle;

HANDLE ClientEventPairHandle;

OBJECT_ATTRIBUTES ObjectAttr;

UNICODE_STRING uString;

DWORD ClientPid, ClientTid;

HANDLE ClientProcessHandle, ClientThreadHandle;

DWORD OpenThreadParam[2];

int rc;

memset(&ObjectAttr, 0, sizeof(ObjectAttr));

ObjectAttr.Length = sizeof(ObjectAttr);

RtlInitUnicodeString(&uString, EVENTPAIRNAME);

ObjectAttr.ObjectName = &uString;

rc = NtCreateEventPair(

&EventPairHandle,

STANDARD_RIGHTS_ALL,

&ObjectAttr);

if (rc == 0) {

printf("EventPairHandle=%x\n", EventPairHandle);

} else {

printf("Unable to create event pair, rc=%x\n", rc);

return -1;

}

rc = ZwSetInformationThread(

GetCurrentThread(),

8,

&EventPairHandle,

4);

if (rc != 0) {

printf("NtSetInformationThread failed for the server, rc=%x\n", rc);

return -1;

}

printf("Enter pid and tid of the client : ");

scanf("%d%d", &ClientPid, &ClientTid);

ClientProcessHandle = OpenProcess(

PROCESS_ALL_ACCESS,

FALSE,

ClientPid);

if (ClientProcessHandle == NULL) {

rc = GetLastError();

printf("Unable to open handle to process, rc=%x\n",rc);

return -1;

}

memset(&ObjectAttr, 0, sizeof(ObjectAttr));

ObjectAttr.Length = sizeof(ObjectAttr);

OpenThreadParam[0] = ClientPid;

OpenThreadParam[1] = ClientTid;

rc = NtOpenThread(

&ClientThreadHandle,

THREAD_ALL_ACCESS,

&ObjectAttr,

OpenThreadParam);

if (rc != 0) {

printf("NtOpenThread failed, rc=%x\n", rc);

return -1;

}

printf("ClientProcessHandle = %x\n",ClientProcessHandle);

printf("ClientThreadHandle = %x\n",ClientThreadHandle);

rc = DuplicateHandle(

GetCurrentProcess(),

EventPairHandle,

ClientProcessHandle,

&ClientEventPairHandle,

0,

FALSE,

DUPLICATE_SAME_ACCESS);

if (rc == FALSE) {

rc = GetLastError();

printf("DuplicateHandle failed, rc=%x\n", rc);

return -1;

}

printf("Client EventPair handle = %x\n",ClientEventPairHandle);

rc = ZwSetInformationThread(ClientThreadHandle,

8,

&EventPairHandle,

4);

if (rc != 0) {

printf("NtSetInformationThread failed for the client, rc=%x\n", rc);

return -1;

}

while (1) {

DWORD ret_val;

_asm int 2Ch

_asm mov ret_val, eax

if (ret_val != 0) {

printf("int 2C returned error, rc=%x\n", ret_val);

} else {

printf("int 2C returned\n");

}

getchar();

}

return 0;

}

The server() function creates a named event pair object. It receives a handle to the newly

created event pair upon successful creation of the object. Next, it establishes an association

between the event pair and the server thread–the current thread. The server uses the

ZwSetInformationThread() function to associate the event pair with the thread. This function is

documented in the Windows NT DDK, but you can also call it from a user-mode, nondriver

application. The prototype for this function looks like:

NTSTATUS

ZwSetInformationThread(

HANDLE ThreadHandle,

THREADINFOCLASS ThreadInformationClass,

PVOID ThreadInformation,

ULONG ThreadInformationLength);

As described earlier, each thread points to the associated event pair object, and the INT

2BH/INT 2CH issued by a thread operates on the associated event pair object. The operating

system stores the pointer of the associated event pair in the Thread Environment Block for the

thread, and you can set it using the ZwSetInformationThread() function. The

ThreadInformationClass for the event pair pointer is 8. The actual information to set is the

handle of the event pair object. We pass 4 as the ThreadInformationLength parameter

because it represents the size of a handle in Windows NT.

The server needs to associate the event pair with the client thread. But this is not as simple as

setting up the association for the current thread. First, the server gets a hold of handles to the

client process and the client thread. For this, it needs the client’s process ID and thread ID,

which it takes as input from the user. The function uses the OpenProcess() API function to get a handle to

the client process.

Note: The server process should have security rights to open the client process.

The function uses an undocumented system call–namely, NtOpenThread()–to get a handle to

the client thread. The NtOpenThread() system call returns a thread handle given the process

ID and the thread ID. Next, the server duplicates the event pair handle in the client process’s

context. It uses the DuplicateHandle() API function to achieve this. The server needs the

process handle to get the duplicate event pair handle and the thread handle to associate the

event pair and the thread. The ZwSetInformationThread() function is called again, this time

with the client thread handle, to associate the event pair with the client thread. The function

requires the event pair handle to exist in the context of the process that owns the thread. That

is the reason we duplicated the handle in the context of the client process.

After setting up the Quick LPC channel, the server can now accept requests from the client. It

goes into a loop, blocking in INT 2CH and notifying the user whenever it gets a

request from the client. The server waits for a keystroke and then issues INT 2CH. This suspends

the server thread and releases the client thread for execution. We use inline

assembly to issue the software interrupt. Note that the interrupt routine, for interrupt 0x2C,

stores the return value in the EAX register.

int client()

{

printf("Client Process id = %d\n",

GetCurrentProcessId());

printf("Client Thread id = %d\n",

GetCurrentThreadId());

getchar();

while (1) {

DWORD ret_val;

_asm int 2Bh

_asm mov ret_val, eax

if (ret_val != 0) {

printf("int 2B returned error, rc=%x\n", ret_val);

} else {

printf("int 2B returned\n");

}

getchar();

}

return 0;

}

The client() function is much simpler than the server() function because the

entire Quick LPC initialization is done by the server. The client just provides the process ID and

the thread ID for input to the server. After the initialization is complete, the server waits for a

client request in INT 2CH. You should indicate the end of initialization to the client by a

keystroke. After receiving the keystroke, the client issues an INT 2BH, releasing the server

thread for execution. Now, the client blocks and is rescheduled only when the server issues

INT 2CH. The client waits for a keystroke from the user before issuing another INT 2BH.

We use inline assembly to issue the software interrupt. Note that the interrupt routine, for

interrupt 0x2B, stores the return value in the EAX register.

main(int argc, char **argv)

{

int rc;

if (argc == 1) {

rc = server();

} else {

rc = client();

}

return rc;

}

The main function in this sample program represents the control center. It calls the server()

function if you invoke the program without any parameters; otherwise, it calls the client()

function.

Enhancements to the Sample Program

The sample program presented in the previous section can handle only a single client, and the

user must supply the client’s process ID and thread ID to the server by hand. You can overcome both

deficiencies by using the port LPC to establish the Quick LPC. The server can

create a port and wait for requests on the port.

Whenever a client starts, it connects to the server over the port and sends an LPC request

containing its process ID and thread ID. The server, upon receiving the request, initializes an

event pair object and creates a new thread to handle the new client. A shared section also

needs to be created and mapped in the server address space, as well as the client address

space. The server can do it explicitly, or it can use the shared-section LPC so that the client

creates the section and the system itself takes care of the mapping.

After setting up the communication channel like this, the main server thread sends a reply

message to the client indicating that everything is set up. Now, the main server thread can

freely accept more connection requests from clients. The newly created thread waits for the

client requests by issuing INT 2CH. After the Quick LPC channel is established, the client can

copy the parameters to the shared area and issue INT 2BH whenever it needs to invoke some

service from the server.

As a result of the software interrupt, the server thread is scheduled for execution. The server

thread reads the parameters from the shared area, processes the request, copies the results

to the shared area, and invokes INT 2CH. The software interrupt causes the server thread to

sleep, and the client thread is scheduled for execution. This continues until the client thread

closes the port handle or dies. The main server thread then gets an LPC_HANDLE_CLOSED

message over the port. Upon receiving the message, the main thread releases all resources

allocated for the client; in other words, it destroys the shared-section mapping, kills the thread

handling the particular client, destroys the event pair handle, and so on.
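The request and reply traffic through the shared area can be sketched in portable C. The slot layout below (QlpcSlot, the QLPC_OP_ADD opcode, the status codes, and both helper functions) is purely hypothetical and is not taken from the Win32 subsystem. On a real system the slot would live in the shared section, and the hand-off between the two helpers would be the INT 2BH/INT 2CH pair rather than an ordinary function call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QLPC_MAX_ARGS 4

enum { QLPC_OP_ADD = 1 };                   /* hypothetical opcode       */
enum { QLPC_OK = 0, QLPC_BAD_OPCODE = 1 };  /* hypothetical status codes */

/* Hypothetical layout of one request/reply slot in the shared section. */
typedef struct {
    uint32_t opcode;               /* filled in by the client */
    uint32_t args[QLPC_MAX_ARGS];  /* filled in by the client */
    uint32_t status;               /* filled in by the server */
    uint32_t result;               /* filled in by the server */
} QlpcSlot;

/* Client side: copy the parameters into the slot before waking the server. */
void client_prepare(QlpcSlot *slot, uint32_t opcode,
                    const uint32_t *args, size_t nargs)
{
    if (nargs > QLPC_MAX_ARGS)
        nargs = QLPC_MAX_ARGS;
    memset(slot, 0, sizeof(*slot));
    slot->opcode = opcode;
    if (nargs > 0)
        memcpy(slot->args, args, nargs * sizeof(args[0]));
}

/* Server side: read the parameters, do the work, and store the results. */
void server_handle(QlpcSlot *slot)
{
    switch (slot->opcode) {
    case QLPC_OP_ADD:
        slot->result = slot->args[0] + slot->args[1];
        slot->status = QLPC_OK;
        break;
    default:
        slot->result = 0;
        slot->status = QLPC_BAD_OPCODE;
        break;
    }
}
```

Because the client is suspended while the server works on the slot and the server is suspended while the client fills it in, a layout like this needs no additional locking as long as each client thread has its own slot.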

The sample program presented in the previous section works for console applications under

Windows NT 3.51. The program does not work for GUI applications because the Win32

subsystem also sets the event pair handle in the Thread Environment Block (TEB), overwriting

the event pair handle set by our program. The Win32 subsystem sets the event pair handle in

the TEB when the thread makes the first GUI call. One fact in our favor is that the event pair

handle is maintained per thread. Therefore, you can work around this problem very easily by

having a separate client thread to communicate with the server. The other threads in the

application can be GUI threads, accessing the GUI functions offered by the Win32

subsystem and using the Quick LPC to talk to the Win32 subsystem. You should take care only

that the thread, using the Quick LPC to talk to your own server, does not make any GUI calls.

Note: Our sample program does not work in Windows NT 4.0 because the interrupt 0x2B serves a

different purpose.

As you know, in Windows NT 4.0 the Win32 subsystem functionality moved entirely into the

kernel-mode driver WIN32K.SYS, and the Win32 GUI calls are processed as system calls.

Therefore, the Win32 subsystem no longer needs the Quick LPC interface, which also

removes the need for interrupts 0x2C and 0x2B.

We already saw that the functions KiSetLowWaitHighThread() and KiSetHighWaitLowThread()

are not directly callable from the user land. Being unable to use interrupt 0x2B means that a

way to access these functions from the user land is blocked. There is another way though. A

pair of kernel functions, namely, NtSetLowWaitHighThread() and NtSetHighWaitLowThread(),

can perform the same job. You can get to these functions using a pair of system calls that

invoke these functions. These system calls don’t accept any parameters since the two

functions operate on the event pair pointed to by the TEB of the calling thread. Surprisingly, the

corresponding functions in the NTDLL.DLL don’t invoke these system services. Instead, these

functions invoke the interrupts 0x2B and 0x2C.

Note: Surprisingly, the Win32 subsystem, under Windows NT 3.51, does not call the NTDLL.DLL

functions. It invokes the interrupts 0x2B and 0x2C directly. Performance seems the most likely

reason behind this “bypassing” act. First, the system call interface is bypassed. The overhead of

system call setup (that is, indexing the system call ID to find out the number of parameters and

the kernel function to be invoked) might prove unacceptable. Hence, the designers went out of their

way and invoked the two functions in question through special interrupts instead of the normal

system call interface, interrupt 0x2E. Of course, this required modifying the kernel to handle the

two new software interrupts. We still don’t understand why the Win32 subsystem bypasses the

NTDLL.DLL functions.

You cannot use these NTDLL.DLL functions to access the Quick LPC on Windows NT 4.0. Instead, you

need to implement the system call invocation yourself, which is fairly easy. On Windows NT

4.0, you need to change the INT 2Bh instruction to the following sequence of instructions that

invoke the NtSetHighWaitLowThread() system call:

MOV EAX, A0h

LEA EDX, [ESP + 4]

INT 2Eh

You cannot use INT 2CH, under Windows NT 4.0, even though the interrupt handler for it

remains there in place. (You would expect both the interrupt handlers to be extinct if the Win32

subsystem no longer requires them, wouldn’t you?) This is because the interrupt handler

returns a STATUS_NO_EVENT_PAIR error even if the TEB of the calling thread points to a

proper event pair. Therefore, you need to use a corresponding system call to achieve the

same effect as the KiSetLowWaitHighThread() function. You can replace the INT 2CH

instruction with the following instructions that invoke the NtSetLowWaitHighThread() system

call:

MOV EAX, ABh

LEA EDX, [ESP + 4]

INT 2Eh

The system call interface exists and can function even under Windows NT 3.51. You might

choose to use the same interface for the two versions of Windows NT so that the same code

works on both versions. It is not quite that straightforward, however, because the service IDs changed from

Windows NT 3.51 to Windows NT 4.0. In Windows NT 3.51, the service ID for the

NtSetLowWaitHighThread() system call is 0xA3, and the service ID for the NtSetHighWaitLowThread()

system call is 0x98.
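The four service IDs quoted in this section can be gathered into one lookup so that the same code can pick the right values at run time. This is only a sketch: the enum, the structure, and the function name are ours, and how the caller determines which version is running is left out.

```c
#include <stdint.h>

/* The two Windows NT releases for which the text quotes service IDs. */
typedef enum { NT_351, NT_40 } NtVersion;

/* Service IDs to load into EAX before issuing INT 2Eh. */
typedef struct {
    uint32_t set_low_wait_high;  /* NtSetLowWaitHighThread (replaces INT 2CH) */
    uint32_t set_high_wait_low;  /* NtSetHighWaitLowThread (replaces INT 2BH) */
} EventPairServiceIds;

/* Fills *ids and returns 0 for a known version; returns -1 otherwise. */
int event_pair_service_ids(NtVersion version, EventPairServiceIds *ids)
{
    switch (version) {
    case NT_351:
        ids->set_low_wait_high = 0xA3;
        ids->set_high_wait_low = 0x98;
        return 0;
    case NT_40:
        ids->set_low_wait_high = 0xAB;
        ids->set_high_wait_low = 0xA0;
        return 0;
    default:
        return -1;  /* IDs for other releases are not given in the text */
    }
}
```

Keeping the IDs in a table like this confines the version dependency to one place; the INT 2Eh sequence itself stays the same on both versions.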

SUMMARY

A local procedure call (LPC) is the communication mechanism used by Windows NT

subsystems. In this chapter, we gave you a brief introduction to subsystems followed by a

detailed discussion on the undocumented LPC mechanism.

There are three types of LPC. The short message LPC passes small messages up to 304

bytes in length. The shared section LPC uses shared memory and passes larger messages.

Both the short message LPC and the shared section LPC are based on a kernel object called

port. The functions to manipulate ports are not documented. In this chapter, we documented

the parameters and use of these functions with demonstration programs.

The Quick LPC, the fastest form of LPC, is used exclusively by the Win32 subsystem. The

Quick LPC proves faster because it ensures controlled scheduling of the client and server

thread. In contrast with the other two forms of LPC, the Quick LPC requires a dedicated server

thread per client thread. The Quick LPC mechanism uses another kernel object–the event pair.

The context switches between the client thread and the corresponding dedicated server thread

are optimized using the event pair object.

Chapter 9

Hooking Software Interrupts

Abstract

This chapter covers how operating systems use software interrupts, why software interrupts

need hooking, and how to hook software interrupts. An example hooks INT 2E (the system

service interrupt) in Windows NT.

WHAT ARE INTERRUPTS?

An interrupt refers to a mechanism that breaks into the normal execution of an application

program and transfers control to operating system code. There are three kinds of interrupts:

hardware interrupts, software interrupts, and exceptions.

Hardware interrupts come from the physical devices in the machine. For example, whenever

there is a character waiting on the COM port, a hardware interrupt will be triggered. When an

I/O operation completes, a hardware interrupt also will be triggered.

Software interrupts occur as a result of an explicit INT nn request from the application.

Applications typically use this mechanism to get different services from the operating system.

Exceptions occur as a result of an application's attempt to perform illegal operations, such as

dividing by zero.

The next sections detail how processors handle software interrupts in real, protected, and V86

modes.

Interrupt Processing in Real Mode

In real mode, the lower 1K of memory holds a data structure known as the Interrupt Vector

Table (IVT). There are nominally 256 entries in this table. (Since the 80286, the IVT is not

required to have 256 entries or start at physical address 0. The base address and length

of the IVT are determined by looking at the Interrupt Descriptor Table Register.) Each entry

contains a far pointer to an Interrupt Service Routine. Any type of interrupt routes to the

appropriate Interrupt Service Routine through this table. The processor indexes the interrupt

number in this table; pushes current CS, IP, and flags on the stack; and calls the far pointer

specified in the IVT. The handler processes the interrupt and then executes an IRET

instruction to return control to the place where the processor executed at the time of the

interrupt.
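The address arithmetic behind the real-mode lookup is simple enough to write down. The sketch below assumes the nominal case described above, in which the IVT starts at physical address 0 and each entry is a 4-byte far pointer stored as offset followed by segment; the type and function names are ours.

```c
#include <stdint.h>

/* One IVT entry: a real-mode far pointer, offset first, then segment. */
typedef struct {
    uint16_t offset;
    uint16_t segment;
} IvtEntry;

/* Physical address of the IVT entry for a vector (table based at 0). */
uint32_t ivt_entry_address(uint8_t vector)
{
    return (uint32_t)vector * 4u;
}

/* Linear address of the handler that a far pointer refers to. */
uint32_t far_pointer_to_linear(IvtEntry entry)
{
    return ((uint32_t)entry.segment << 4) + entry.offset;
}
```

For example, the entry for INT 21h sits at physical address 0x84, and a far pointer of 1234:0010 refers to linear address 0x12350.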

Interrupt Processing in Protected Mode

In protected mode, interrupts are handled much as they are in real mode. The Interrupt

Descriptor Table (IDT) does what the IVT does in real mode. The IDT consists of an array of 8-byte

segment descriptors called gates. The Interrupt Descriptor Table Register (IDTR) holds the

base address and the limit of IDT. The IDT must exist in physical memory and should never

swap out to virtual memory. This is because if an interrupt were to occur while the IDT were

swapped out, the processor would generate an exception, requiring the IDT to get the handler

for handling this exception, and so on until the system crashed. The gates in the IDT can

be of three types: interrupt gates, trap gates, and task gates. We won't dwell on the

details of the trap and task gates. For further information, refer to Intel processor

documentation.

Interrupt gates interest us. The important fields of interrupt gates include the code segment

selector and the offset of the code for execution for this interrupt, as well as the privilege level

of the interrupt descriptor. The interrupt processing closely resembles that in real mode. When

the interrupt occurs, the processor indexes the interrupt number in IDT, pushes EFLAGS, CS,

and EIP onto the stack, and calls the handler specified in the IDT. When the handler finishes

executing, it should execute the IRET instruction to return control. Depending upon the type of

interrupt, an error code may be pushed on the stack. The handler must clear this error code

from the stack. The DPL field in the interrupt gate controls the software interrupts. The current

privilege level must be at least as privileged as DPL to call these software interrupts. If not,

then a General Protection Fault is triggered. This protection feature permits the operating

system to reserve certain software interrupts for its own use. Hardware interrupts and

exceptions process without regard to the current privilege level.
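The fields just described can be pictured as a C structure. The following sketch uses the 8-byte gate layout of 32-bit Intel processors; the structure and function names are ours and are not the definitions from the intel.h header used by the sample later in this chapter.

```c
#include <stdbool.h>
#include <stdint.h>

/* 8-byte interrupt gate as stored in the IDT on 32-bit Intel processors. */
typedef struct {
    uint16_t offset_low;   /* bits 0..15 of the handler offset            */
    uint16_t selector;     /* code segment selector of the handler        */
    uint8_t  reserved;     /* not used by interrupt gates                 */
    uint8_t  type_attr;    /* present bit (7), DPL (6..5), gate type      */
    uint16_t offset_high;  /* bits 16..31 of the handler offset           */
} InterruptGate;

/* Reassemble the 32-bit handler offset from its two halves. */
uint32_t gate_offset(const InterruptGate *gate)
{
    return ((uint32_t)gate->offset_high << 16) | gate->offset_low;
}

/* Extract the descriptor privilege level (0 = most privileged). */
unsigned gate_dpl(const InterruptGate *gate)
{
    return (gate->type_attr >> 5) & 3u;
}

/* An INT nn is permitted only if CPL is numerically <= the gate's DPL. */
bool software_int_allowed(unsigned cpl, const InterruptGate *gate)
{
    return cpl <= gate_dpl(gate);
}
```

With type_attr set to 0xEE the gate has DPL 3, so user-mode code may issue the interrupt; with 0x8E (DPL 0) the same INT nn issued from ring 3 raises a General Protection Fault.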

Interrupt Processing in V86 Mode

In V86 mode, any INT nn instruction causes a General Protection Fault. Windows NT uses this

to map INT 21h calls made from an MS-DOS application to Win32 API calls. This mapping

occurs as part of a GPF handler for Windows NT. Other types of interrupts are handled

similarly to those in protected mode.

HOW OPERATING SYSTEMS USE SOFTWARE INTERRUPTS

MS-DOS uses INT 21 to provide core system services to the applications. Other software

interrupts are also provided, such as multiplex interrupt 2F. Applications fill in the parameters in

various registers and execute the INT nn instruction to access these services from the

operating system. Various compiler libraries provide wrappers around these interrupt

interfaces and provide useful C functions, such as _open, _read, _write, and others.

Not much changes in the way software interrupts are used in Windows 95/98 and Windows NT.

Windows NT provides user-callable software interrupts. The following table lists the important

software interrupts provided.

TABLE 9-1 WINDOWS SOFTWARE INTERRUPTS

Interrupt Number    Functionality

2Ah                 Used to get the current timer tick count.

2Bh, 2Ch            Used by the CSRSS subsystem to force an immediate thread switch. This occurs as part of an LPC mechanism. We discussed LPC in more detail in Chapter 8. These interrupts are used only in Windows NT 3.51, since in later versions of Windows NT most of the functionality in CSRSS is moved to the kernel-mode driver WIN32K.SYS.

2Dh                 Debugging service. This service, used by driver writers, outputs debugging messages to the Debugger Window. The DbgPrint() function provided in the DDK calls this interrupt to output debug messages.

2Eh                 Used extensively for calling system services provided by Windows NT. The system services are provided by two components, namely, NTOSKRNL and WIN32K.SYS. The services provided by WIN32K.SYS are present only in Windows NT versions later than 3.51. We discuss system services in detail in Chapters 6 and 7.

WHY SOFTWARE INTERRUPTS NEED TO BE HOOKED

Software interrupts need to be hooked for several reasons. One reason is to change the

behavior of the system services exported by the operating system. By hooking the software

interrupts, you can write monitoring applications. Hooking can prove useful in studying

operating system internals. This can also serve as a way to hook system services, although

the mechanism discussed in Chapter 6 provides a better way of doing that.

MS-DOS provides system services to hook software interrupts by means of INT 21h,

functions 25h and 35h. Compiler libraries provide wrapper functions such as _dos_getvect and

_dos_setvect to hook software interrupts. Windows 95 provides a mechanism to hook software

interrupts by means of Set_PM_Int_Vector and Hook_V86_Int_Chain VxD services. However,

Windows NT does not officially support any way to hook software interrupts. The DDK does

provide functions such as HalGetInterruptVector() and IoConnectInterrupt() to hook hardware

interrupts. Once we understand Intel data structures such as IDT and interrupt gates, we can

easily hook software interrupts in Windows NT. Hooking software interrupts basically amounts

to changing the code selector and offset fields in the Interrupt Gate Descriptor. However, this

certainly becomes a platform-dependent situation. It will work only on an Intel implementation

of Windows NT.
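The save, replace, and chain pattern behind every such hook can be illustrated without touching a real IDT. The sketch below is only a user-mode analogy: vector_table is an ordinary array of C function pointers standing in for the gate descriptors, and all names in it (original_service, new_handler, hook_vector, unhook_vector) are ours.

```c
#include <stddef.h>

#define VECTOR_COUNT 256

typedef int (*Handler)(int arg);

static Handler vector_table[VECTOR_COUNT];  /* stands in for the IDT         */
static Handler old_handler;                 /* saved entry, used to chain    */
static unsigned long hook_hits;             /* what the new handler counts   */

/* Stand-in for a handler that the operating system installed earlier. */
static int original_service(int arg)
{
    return arg * 2;
}

/* New handler: do the extra work, then chain to the saved handler. */
static int new_handler(int arg)
{
    hook_hits++;
    return old_handler(arg);
}

/* Save the current entry and point the vector at new_handler. */
int hook_vector(unsigned vector)
{
    if (vector >= VECTOR_COUNT || vector_table[vector] == NULL ||
        old_handler != NULL)
        return -1;
    old_handler = vector_table[vector];
    vector_table[vector] = new_handler;
    return 0;
}

/* Restore the saved entry; refuse if the vector is not ours. */
int unhook_vector(unsigned vector)
{
    if (vector >= VECTOR_COUNT || vector_table[vector] != new_handler)
        return -1;
    vector_table[vector] = old_handler;
    old_handler = NULL;
    return 0;
}
```

The caller sees the same result with or without the hook in place, which is the property a real interrupt hook must preserve when it chains to the old handler.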

You can apply the same technique for hooking software interrupts to hook hardware interrupts

or exceptions although you should use the documented IoConnectInterrupt() function to hook

hardware interrupts. You have to write an interrupt handler keeping in mind the type of interrupt

it is hooking into because the stack frame might differ in various situations. The new interrupt

handler must be written in Assembly language because of the restrictions imposed by 32-bit

compilers.

HOW TO HOOK SOFTWARE INTERRUPTS

As we already discussed, two Intel data structures, the IDTR and the Interrupt Gate

Descriptor, play crucial roles in interrupt processing. You can discover the contents of IDTR

with the sidt Assembly instruction. This instruction places the base and limit of IDT in a 6-byte

location specified by the operand. Once you get the base address of IDT, you can index the

interrupt number you want to hook in this table and change the code selector and offset

specified. Before doing this, you must save the old code selector and offset. Also, your new

handler should ensure that the interrupt is chained properly to the old handler, meaning the

new handler should maintain the state of registers and stack in such a way that the old handler

is called as if the processor had invoked it directly through the IDT.
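The 6-byte value that sidt stores can be decoded with plain C. The following sketch assumes the 32-bit layout (a 16-bit limit followed by a 32-bit base, both little-endian) and 8-byte gates; the type and function names are ours, and the byte values in the example afterwards are made up.

```c
#include <stdint.h>

/* Decoded contents of the Interrupt Descriptor Table Register. */
typedef struct {
    uint16_t limit;  /* offset of the last valid byte in the IDT */
    uint32_t base;   /* linear address of the first gate         */
} IdtrValue;

/* Decode the 6-byte image written by the sidt instruction. */
IdtrValue decode_idtr(const uint8_t raw[6])
{
    IdtrValue value;
    value.limit = (uint16_t)(raw[0] | ((unsigned)raw[1] << 8));
    value.base  = (uint32_t)raw[2]
                | ((uint32_t)raw[3] << 8)
                | ((uint32_t)raw[4] << 16)
                | ((uint32_t)raw[5] << 24);
    return value;
}

/* Number of 8-byte gates that fit within the limit. */
uint32_t idt_gate_count(IdtrValue idtr)
{
    return ((uint32_t)idtr.limit + 1u) / 8u;
}

/* Linear address of the gate for a given interrupt number. */
uint32_t idt_gate_address(IdtrValue idtr, uint8_t vector)
{
    return idtr.base + (uint32_t)vector * 8u;
}
```

For instance, an image of FF 07 00 F4 03 80 decodes to a limit of 0x7FF and a base of 0x8003F400, which gives 256 gates and puts the gate for INT 2Eh at 0x8003F570.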

The sample application that we write in this chapter hooks INT 2Eh (System Service Interrupt)

and maintains the counters of how many times a particular system service was called. The

sample maintains only the counter of system services provided by NTOSKRNL.EXE. The

user-level application issues DeviceIoControl to this driver to obtain the statistics about the

service usage. As we already saw in Chapter 7, there are a total of 0xC4 system services in

NT 3.51, 0xD3 services in NT 4.0, and 0xF4 services in Windows 2000 provided by

NTOSKRNL.EXE. This sample works on all versions of Windows NT to date.

HOOKINT.C

#include "ntddk.h"

#include "stdarg.h"

#include "stdio.h"

#include "Hookint.h"

#define TEST_PAGING

#define DRIVER_SOURCE

#include "..\..\include\intel.h"

#include "..\..\include\wintype.h"

#include "..\..\include\undocnt.h"

/* Interrupt to be hooked */

#define HOOKINT 0x2E

int OldHandler;

ULONG *ServiceCounterTable;

ULONG ServiceCounterTableSize;

int NumberOfServices;

#ifdef TEST_PAGING

void *PagedData;

#endif

extern void _cdecl NewHandler();

/* Buffer to store result of sidt instruction */

char buffer[6];

/* Pointer to structure to identify the limit and base of IDTR */

PIdtr_t Idtr=(PIdtr_t)buffer;

#pragma pack()

void NewHandlerCFunc(int ServiceId)

{

if (ServiceId<0 || ServiceId>=NumberOfServices) return; /* valid IDs are 0..NumberOfServices-1 */

#ifdef TEST_PAGING

memset(PagedData, 0, 100000);

#endif

ServiceCounterTable[ServiceId+1]++;

return;

}

NTSTATUS DriverSpecificInitialization()

{

PIdtEntry_t IdtEntry;

extern PServiceDescriptorTableEntry_t

KeServiceDescriptorTable;

NumberOfServices =KeServiceDescriptorTable->NumberOfServices;

ServiceCounterTableSize =(NumberOfServices+1)*sizeof(int);

ServiceCounterTable = ExAllocatePool(PagedPool,

ServiceCounterTableSize);

if (!ServiceCounterTable) {

return STATUS_INSUFFICIENT_RESOURCES;

}

#ifdef TEST_PAGING

PagedData=ExAllocatePool(PagedPool, 100000);

if (!PagedData) {

ExFreePool(ServiceCounterTable);

return STATUS_INSUFFICIENT_RESOURCES;

}

#endif

memset(ServiceCounterTable,0,ServiceCounterTableSize);

*ServiceCounterTable=NumberOfServices;

trace(("NumberOfServices=%x, ""ServiceCounterTableSize=%x, @%x\n",

NumberOfServices, ServiceCounterTableSize,

ServiceCounterTable));

/* Get the Base and Limit of IDTR Register */

_asm sidt buffer

IdtEntry=(PIdtEntry_t)Idtr->Base;

/* Index the interrupt number to be hooked, specified by the HOOKINT define, in

* the appropriate IDT entry; extract and save away the old handler’s address */

OldHandler =((unsigned int)IdtEntry[HOOKINT].OffsetHigh<<16U)|(IdtEntry[HOOKINT].OffsetLow);

/* Plug into the interrupt by changing the offset

* field to point to NewHandler function

*/

_asm cli

IdtEntry[HOOKINT].OffsetLow =(unsigned short)NewHandler;

IdtEntry[HOOKINT].OffsetHigh =(unsigned short)((unsigned int)NewHandler>>16);

_asm sti

return STATUS_SUCCESS;

}

NTSTATUS

DriverEntry(

IN PDRIVER_OBJECT DriverObject,

IN PUNICODE_STRING RegistryPath

)

{

MYDRIVERENTRY(L"hookint",FILE_DEVICE_HOOKINT,DriverSpecificInitialization()

);

return ntStatus;

}

NTSTATUS

DriverDispatch(

IN PDEVICE_OBJECT DeviceObject,

IN PIRP Irp

)

{

PIO_STACK_LOCATION irpStack;

PVOID ioBuffer;

ULONG inputBufferLength;

ULONG outputBufferLength;

ULONG ioControlCode;

NTSTATUS ntStatus;

Irp->IoStatus.Status = STATUS_SUCCESS;

Irp->IoStatus.Information = 0;

irpStack = IoGetCurrentIrpStackLocation (Irp);

ioBuffer = Irp->AssociatedIrp.SystemBuffer;

inputBufferLength = irpStack->Parameters.DeviceIoControl.InputBufferLength;

outputBufferLength = irpStack->Parameters.DeviceIoControl.OutputBufferLength;

switch (irpStack->MajorFunction)

{

case IRP_MJ_DEVICE_CONTROL:

trace(("HOOKINT.SYS: IRP_MJ_DEVICE_CONTROL\n"));

ioControlCode = irpStack->Parameters.DeviceIoControl.IoControlCode;

switch (ioControlCode)

{

case IOCTL_HOOKINT_SYSTEM_SERVICE_USAGE:

{

int i;

/* Check if sufficient sized buffer is provided to hold the counters for system

* service usage */

if (outputBufferLength >=ServiceCounterTableSize) {

/* Output the counters describing the system service usage*/

trace((for (i=1;i<=NumberOfServices;i++)

DbgPrint("%x ",ServiceCounterTable[i])));

trace((DbgPrint("\n")));

/* Copy the counter information in the user supplied buffer */

memcpy(ioBuffer, ServiceCounterTable, ServiceCounterTableSize);

/* Fill in the number of bytes to be returned to the caller */

Irp->IoStatus.Information =ServiceCounterTableSize;

} else {

Irp->IoStatus.Status = STATUS_INSUFFICIENT_RESOURCES;

}

break;

}

default:

Irp->IoStatus.Status =STATUS_INVALID_PARAMETER;

trace(("HOOKINT.SYS: unknown " "IRP_MJ_DEVICE_CONTROL\n"));

break;

}

break;

}

ntStatus = Irp->IoStatus.Status;

IoCompleteRequest (Irp,IO_NO_INCREMENT);

return ntStatus;

}

VOID

DriverUnload(

IN PDRIVER_OBJECT DriverObject

)

{

WCHAR deviceLinkBuffer[]=L"\\DosDevices\\hookint";

UNICODE_STRING deviceLinkUnicodeString;

PIdtEntry_t IdtEntry;

ExFreePool(ServiceCounterTable);

#ifdef TEST_PAGING

ExFreePool(PagedData);

#endif

/* Reach to IDT */

IdtEntry=(PIdtEntry_t)Idtr->Base;

/* Unplug the interrupt by replacing the offset field in the Interrupt Gate

Descriptor by the old handler address. */

_asm cli

IdtEntry[HOOKINT].OffsetLow =(unsigned short)OldHandler;

IdtEntry[HOOKINT].OffsetHigh =(unsigned short)((unsigned int)OldHandler>>16);

_asm sti

RtlInitUnicodeString (&deviceLinkUnicodeString, deviceLinkBuffer);

IoDeleteSymbolicLink (&deviceLinkUnicodeString);

IoDeleteDevice (DriverObject->DeviceObject);

trace(("HOOKINT.SYS: unloading\n"));

}

HANDLER.ASM

.386

.model small

.code

include ..\..\include\undocnt.inc

public _NewHandler

extrn _OldHandler:near

extrn _NewHandlerCFunc@4:near

_NewHandler proc near

Ring0Prolog

STI

push eax

call _NewHandlerCFunc@4

CLI

Ring0Epilog

jmp dword ptr cs:[_OldHandler]

_NewHandler endp

END

SUMMARY

In this chapter, we discussed interrupt processing in various modes of Intel processors. Then,

we saw how the operating system makes use of interrupts. Next, we discussed the need for

hooking software interrupts. We also explored a mechanism for hooking software interrupts.

We concluded the chapter with an example that hooks Int 2E (the system service interrupt) in

Windows NT.

Chapter 10

Adding New Software Interrupts

Abstract

This chapter explains how interrupts are executed in Windows NT. The authors discuss

processor data structures used while processing the interrupt and present an example that

adds a software interrupt to Windows NT. Another example shows an application that calls the

newly added interrupt.

AS WE SAW IN THE previous chapter, software interrupts are one of the mechanisms used for

calling system services. We have also seen that INT 2E is used for getting the system services

from the Windows NT kernel. By adding new software interrupts, it is possible to add new

system services to the Windows NT kernel. We have already seen one way to add new system

services to the Windows NT kernel, and this is just one more method. In this chapter, we will

not be playing with the operating system data structures as we did in Chapter 7. Instead, we

will use Intel data structures to add new system services.

WHAT HAPPENS WHEN A 32-BIT APPLICATION EXECUTES AN INT NN INSTRUCTION?

Before we proceed with the technique of adding new software interrupts to the Windows NT

kernel, let’s first see what happens when a 32-bit application executes an INT nn type of

instruction. Application programs run at privilege level 3, and the kernel code executes at

privilege level 0. When a 32-bit application program executes an INT nn type of instruction, the

processor first looks at the descriptor entry for the interrupt and verifies that the current

privilege level (CPL) is numerically no greater than the descriptor privilege level (DPL). If not, the processor raises a

General Protection Fault. If the privilege level of the descriptor allows the interrupt to continue,

the processor switches to the kernel stack. The kernel stack is selected by looking at the field

in the Task State Segment (TSS). After this, the processor pushes the old ring 3 stack pointer

(SS:ESP) and a standard interrupt frame (EFLAGS and CS:EIP) and jumps to the handler

routine specified in the interrupt descriptor table entry. The handler performs its job and finally

executes the IRETD instruction to return to the calling application. When IRETD is executed,

the processor pops off EFLAGS and CS:EIP, notices the switch from ring 0 to ring 3 and pops

off the ring 3 SS:ESP, and then the execution continues from the instruction following the INT

nn instruction.
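
The privilege check and the frame described above can be modeled in a few lines of user-mode C. This is only a sketch: the names Ring3InterruptFrame and SoftwareInterruptAllowed are ours and do not come from any Windows NT header. The code simply restates the rules the processor applies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Frame found on the ring 0 stack after INT nn is executed at ring 3.
   The processor pushes SS, ESP, EFLAGS, CS, and EIP in that order, so
   EIP ends up at the lowest address. Illustrative layout only. */
typedef struct Ring3InterruptFrame {
    uint32_t Eip;     /* address of the instruction after INT nn */
    uint32_t Cs;      /* ring 3 code selector in the low 16 bits */
    uint32_t Eflags;
    uint32_t Esp;     /* ring 3 stack pointer */
    uint32_t Ss;      /* ring 3 stack selector in the low 16 bits */
} Ring3InterruptFrame;

/* INT nn is allowed only when CPL <= DPL of the gate (numerically).
   Returns 1 if the interrupt proceeds, 0 if a General Protection
   Fault would be raised instead. */
int SoftwareInterruptAllowed(unsigned cpl, unsigned gateDpl)
{
    assert(cpl <= 3 && gateDpl <= 3);
    return cpl <= gateDpl;
}
```

With these definitions, SoftwareInterruptAllowed(3, 3) is true (the INT 2Eh case), while SoftwareInterruptAllowed(3, 0) is false, which is why ring 3 code cannot issue an interrupt whose gate has a DPL of 0.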

If you examine the descriptor entry for INT 2Eh through a debugger such as SoftICE, you will notice

that its descriptor privilege level is 3. That is why NTDLL.DLL can call INT 2Eh on behalf of the

applications.

ADDING NEW SOFTWARE INTERRUPTS TO THE WINDOWS NT KERNEL

As you saw in the last chapter, an interrupt gate is installed in the IDT for the software

interrupts. Here is the structure of the interrupt gate:

typedef struct InterruptGate {

unsigned short OffsetLow ;

unsigned short Selector;

unsigned char Reserved;

unsigned char SegmentType:4;

unsigned char SystemSegmentFlag:1;

unsigned char Dpl:2;

unsigned char Present:1;

unsigned short OffsetHigh;

} InterruptGate_t;

There are a few unused interrupts in Windows NT, including INT 20h and INT 22h through 29h. You can

use these interrupts to add new software interrupts. Following are the steps for adding new

software interrupts:

1. Get the base address of the interrupt descriptor table using the assembly instruction “sidt.” This

instruction stores the base address and limit of IDT at the specified memory location.

2. Treat this base address as a pointer to an array of “InterruptGate_t” structures.

3. Index the interrupt number to be added into this table.

4. Fill in the “InterruptGate_t” entry at the index according to the requirements of the interrupt gate.

That is, set the “SegmentType” field to 0Eh meaning interrupt gate; set the

“SystemSegmentFlag” to 0 meaning system segment; set the “Selector,” “OffsetLow,” and “OffsetHigh”

fields with the address of the interrupt handler. Set the “Present” field to 1.

5. Establish some mechanism for passing parameters to the interrupt service routine. For example,

INT 2Eh uses the EDX register to point to the user stack frame and the EAX register for the

service ID.

XREF: We have already seen mechanisms used by INT 2Eh handler in Chapter 6.

6. Use the INT nn instructions in your application programs according to the conventions

established in the previous step.
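
Steps 1 through 4 can be tried out in user mode on plain byte buffers before touching a real IDT. The following is a minimal sketch under our own assumptions: the Gate type, the function names, and the sample values are hypothetical, and the code only models the bit layout of the IDTR image and of an interrupt gate. It does not execute sidt or modify any processor table.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-exact model of a 32-bit interrupt gate (8 bytes). */
typedef struct Gate {
    uint16_t OffsetLow;   /* handler address bits 0..15 */
    uint16_t Selector;    /* code segment selector of the handler */
    uint8_t  Reserved;
    uint8_t  Access;      /* Present:1 | Dpl:2 | System flag:1 | Type:4 */
    uint16_t OffsetHigh;  /* handler address bits 16..31 */
} Gate;

/* Step 1: split the 6 bytes stored by sidt into limit and base. */
void DecodeIdtr(const uint8_t raw[6], uint16_t *limit, uint32_t *base)
{
    *limit = (uint16_t)(raw[0] | (raw[1] << 8));
    *base = (uint32_t)raw[2] | ((uint32_t)raw[3] << 8) |
            ((uint32_t)raw[4] << 16) | ((uint32_t)raw[5] << 24);
}

/* Step 4: fill one gate. Type 0Eh = interrupt gate, system flag = 0. */
void FillInterruptGate(Gate *gate, uint32_t handler,
                       uint16_t selector, unsigned dpl)
{
    assert(dpl <= 3);
    gate->OffsetLow = (uint16_t)(handler & 0xFFFF);
    gate->Selector = selector;
    gate->Reserved = 0;
    gate->Access = (uint8_t)(0x80 | (dpl << 5) | 0x0E);
    gate->OffsetHigh = (uint16_t)(handler >> 16);
}

/* Reassemble the handler address, as done when saving an old handler. */
uint32_t GateHandler(const Gate *gate)
{
    return ((uint32_t)gate->OffsetHigh << 16) | gate->OffsetLow;
}
```

For a gate with a DPL of 3 the access byte works out to EEh. Note that the high half of the offset is obtained with a right shift by 16 bits; the number of gates in a table is (limit + 1) / 8.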

The sample application that illustrates this method adds INT 22h to the Windows NT kernel.

The interrupt handler expects the EDX register to point to a buffer, which the handler fills with

the “Newly added interrupt called.” string. The buffer should be at least 30 bytes long: the

29-character message in Listing 10-2 plus its terminating null.

Following is the device driver that adds a new software interrupt to the Windows NT kernel.

The driver adds the interrupt in its DriverEntry routine and removes the interrupt in its

DriverUnload routine. The full source code for the application that issues this newly added

interrupt is not given. Only the relevant part that issues the interrupt is given here.

Listing 10-1: ADDINT.C

#include "ntddk.h"

#include "stdarg.h"

#include "stdio.h"

#include "addint.h"

#include "..\include\intel.h"

#include "..\include\undocnt.h"

/* Old Idt Entry */

IdtEntry_t OldIdtEntry;

/* Interrupt Handler */

extern void _cdecl InterruptHandler();

/* Buffer to store result of sidt instruction */

char buffer[6];

/* Pointer to structure to identify the limit and base of IDTR*/

PIdtr_t Idtr=(PIdtr_t)buffer;

void _cdecl CFunc()

{

}

NTSTATUS AddInterrupt()

{

PIdtEntry_t IdtEntry;

/* Get the Base and Limit of IDTR Register */

_asm sidt buffer

IdtEntry=(PIdtEntry_t)Idtr->Base;

if((IdtEntry[ADDINT].OffsetLow!=0)||(IdtEntry[ADDINT].OffsetHigh!=0))

return STATUS_UNSUCCESSFUL;

/* Save away the old IDT entry */

memcpy(&OldIdtEntry, &IdtEntry[ADDINT], sizeof(OldIdtEntry));

_asm cli

/* Initialize the IDT entry according to the interrupt gate requirement */

IdtEntry[ADDINT].OffsetLow=(unsigned short)InterruptHandler;

IdtEntry[ADDINT].Selector=8;

IdtEntry[ADDINT].Reserved=0;

IdtEntry[ADDINT].Type=0xE;

IdtEntry[ADDINT].Always0=0;

IdtEntry[ADDINT].Dpl=3;

IdtEntry[ADDINT].Present=1;

IdtEntry[ADDINT].OffsetHigh=(unsigned short)((unsigned int)

InterruptHandler>>16);

_asm sti

return STATUS_SUCCESS;

}

NTSTATUS

DriverEntry(

IN PDRIVER_OBJECT DriverObject,

IN PUNICODE_STRING RegistryPath

)

{

MYDRIVERENTRY(DRIVER_DEVICE_NAME,

FILE_DEVICE_ADDINT,AddInterrupt());

return ntStatus;

}

void RemoveInterrupt()

{

PIdtEntry_t IdtEntry;

/* Reach to IDT */

IdtEntry=(PIdtEntry_t)Idtr->Base;

_asm cli

/* Restore the old IdtEntry */

memcpy(&IdtEntry[ADDINT], &OldIdtEntry, sizeof(OldIdtEntry));

_asm sti

}

NTSTATUS

DriverDispatch(

IN PDEVICE_OBJECT DeviceObject,

IN PIRP Irp

)

{

Irp->IoStatus.Status = STATUS_SUCCESS;

IoCompleteRequest (Irp,

IO_NO_INCREMENT

);

return Irp->IoStatus.Status;

}

VOID

DriverUnload(

IN PDRIVER_OBJECT DriverObject

)

{

WCHAR deviceLinkBuffer[] =

L"\\DosDevices\\"DRIVER_DEVICE_NAME;

UNICODE_STRING deviceLinkUnicodeString;

RemoveInterrupt();

RtlInitUnicodeString (&deviceLinkUnicodeString,

deviceLinkBuffer

);

IoDeleteSymbolicLink (&deviceLinkUnicodeString);

IoDeleteDevice (DriverObject->DeviceObject);

trace(("ADDINT.SYS: unloading\n"));

}

Listing 10-2: HANDLER.ASM

.386

.model small

.code

public _InterruptHandler

extrn _CFunc:near

include ..\include\undocnt.inc

_InterruptHandler proc

Ring0Prolog

mov edi, edx

test edi, edi

jz NullPointer

lea esi, message

mov ecx, messagelen

repz movsb

NullPointer:

call _CFunc

Ring0Epilog

iretd

message db "Newly added interrupt called.", 0

messagelen dd $-message

_InterruptHandler endp

End

Listing 10-3: ADDINTAPP.C

#include <windows.h>

#include <stdio.h>

#include "addint.h"

main()

{

char buffer[100];

__try {

_asm {

lea edx, buffer

int 22h

}

}

__except (EXCEPTION_EXECUTE_HANDLER) {

printf("Exception occurred, make sure that addint.sys is "

"installed and started\n");

return 0;

}

printf("Buffer filled by the interrupt handler = %s\n",buffer);

return 0;

}

USING CALLGATES TO EXECUTE PRIVILEGED CODE

Next, we will discuss one generic method of executing ring 0 instructions from a user-level

application running at ring 3 with the help of a device driver. This is an equivalent of RING0 by

Matt Pietrek, which appeared in the May 1993 edition of Microsoft Systems Journal in an article

called "Run Privileged Code from Your Windows-based Program Using CallGates." This may

be used for performing direct port I/O under Windows NT (refer to "Direct Port I/O and

Windows NT" by Dale Roberts, Dr. Dobb’s Journal of Software Tools, May 1996). The whole

trick of running ring 0 instructions at ring 3 is based on the concept of callgates.

Callgates are mechanisms that facilitate controlled and secure communication from a lower

privilege level to a higher privilege level. Right now we will consider the control transfer from ring

3 to ring 0 since Windows NT uses only these two privilege levels. It is as if you have ring 3

and ring 0 code on two sides of a callgate, with the callgate acting as an intermediary between

the two. The callgate enables messages to pass from one ring to the other.

When creating a callgate, you have to specify the address of each side of the fence and the

number of parameters to be passed from one side of the fence to the other. The privilege level

of the callgate dictates which processes have access to it. When the control is transferred

through the callgate, the processor switches to the ring 0 stack. This stack is selected by

looking at the TSS, which holds a stack pointer (SS:ESP) for each of the privilege levels 0 through 2. After this, the

processor pushes the ring 3 SS:ESP on this new stack. Then the processor copies the number

of parameters specified by the callgate from the ring 3 stack to the ring 0 stack. Parameters

are in terms of the number of DWORDS for 32-bit callgates and the number of WORDS for a

16-bit callgate. Further, the processor pushes the ring 3 CS:EIP onto the stack and jumps to

the address specified in the callgate. The function at ring 0 is responsible for cleaning the

parameters from the stack once it has finished executing. In the end, the ring 0 code should

execute a retf nn instruction to clean up the stack and return control to the ring 3 code.
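
The descriptor fields and the stack arithmetic described above can be sketched in user-mode C. This is an illustration under our own assumptions: the CallGate type and the helper names are hypothetical, and the code only models the 8-byte layout of a 32-bit call gate. It does not install anything in the GDT.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-exact model of a 32-bit call gate descriptor (8 bytes). */
typedef struct CallGate {
    uint16_t OffsetLow;   /* target address bits 0..15 */
    uint16_t Selector;    /* ring 0 code segment selector */
    uint8_t  ParamCount;  /* low 5 bits: DWORDs copied between stacks */
    uint8_t  Access;      /* Present:1 | Dpl:2 | 0 | Type:4 */
    uint16_t OffsetHigh;  /* target address bits 16..31 */
} CallGate;

/* Build a present 386 call gate (type 0Ch) with the given DPL. */
CallGate MakeCallGate(uint32_t target, uint16_t codeSelector,
                      unsigned paramCount, unsigned dpl)
{
    CallGate gate;
    assert(paramCount <= 31 && dpl <= 3);
    gate.OffsetLow = (uint16_t)(target & 0xFFFF);
    gate.Selector = codeSelector;
    gate.ParamCount = (uint8_t)paramCount;
    gate.Access = (uint8_t)(0x80 | (dpl << 5) | 0x0C);
    gate.OffsetHigh = (uint16_t)(target >> 16);
    return gate;
}

/* Operand for the final "retf nn": the parameter bytes to discard. */
unsigned RetfOperand(const CallGate *gate)
{
    return (gate->ParamCount & 0x1Fu) * 4u;
}

/* Bytes the processor places on the ring 0 stack during the call:
   SS and ESP, the copied parameters, then CS and EIP. */
unsigned Ring0StackBytes(const CallGate *gate)
{
    return 8u + RetfOperand(gate) + 8u;
}
```

A gate that ring 3 code may call has an access byte of ECh. With two DWORD parameters, the ring 0 function would finish with retf 8, and 24 bytes are on the ring 0 stack when it is entered.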

The sample accompanying this technique is based on the sample program PHYS.EXE

demonstrated in Matt Pietrek’s Windows 95 Programming Secrets (IDG Books Worldwide).

The sample shows you how you can use the same trick under Windows NT. The sample uses

three undocumented functions in NTOSKRNL.EXE. These functions enable you to allocate

and release selectors from the Global Descriptor Table (GDT) and modify the descriptor

entries corresponding to the selectors. Use of the following undocumented functions prevents

the need to directly manipulate Intel data structures such as the GDT.

NTSTATUS

KeI386AllocateGdtSelectors(

unsigned short *SelectorArray,

int NumberOfSelectors);

The function allocates the specified number of selectors from the GDT and fills in the

SelectorArray with the allocated selector values. NTOSKRNL keeps a linked list of free

selectors in the descriptor itself. Also, NTOSKRNL keeps track of the number of free selectors.

The function checks whether the specified number of selectors is present. If enough selectors

are available, the function removes those selectors from the free list and gives the list to the

caller. Interestingly, these functions are exported from the NTOSKRNL.EXE file, so any driver

can use them. Other functions also enable descriptor queries and other tasks, but they are not

exported.
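
The bookkeeping described above, a free list threaded through the unused entries plus a counter, can be pictured with a small user-mode model. This is purely a sketch of the idea: the ModelGdt type, the table size, and the function names are hypothetical, and the code is not the NTOSKRNL implementation.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_GDT_ENTRIES 16   /* hypothetical table size */

typedef struct ModelGdt {
    /* Link kept "inside" each free entry: selector of the next free
       entry, or 0 at the end of the list. */
    uint16_t Next[MODEL_GDT_ENTRIES];
    uint16_t FreeHead;   /* selector of the first free entry, 0 if none */
    int      FreeCount;  /* number of entries on the free list */
} ModelGdt;

/* Put entries firstFree..MODEL_GDT_ENTRIES-1 on the free list. */
void ModelGdtInit(ModelGdt *gdt, int firstFree)
{
    int i;
    assert(firstFree >= 1 && firstFree < MODEL_GDT_ENTRIES);
    gdt->FreeHead = 0;
    gdt->FreeCount = 0;
    for (i = MODEL_GDT_ENTRIES - 1; i >= firstFree; i--) {
        gdt->Next[i] = gdt->FreeHead;
        gdt->FreeHead = (uint16_t)(i * 8);   /* selector = index * 8 */
        gdt->FreeCount++;
    }
}

/* Hand out n selectors, or fail without side effects. */
int ModelAllocate(ModelGdt *gdt, uint16_t *selectors, int n)
{
    int i;
    if (n <= 0 || n > gdt->FreeCount)
        return -1;
    for (i = 0; i < n; i++) {
        selectors[i] = gdt->FreeHead;
        gdt->FreeHead = gdt->Next[selectors[i] / 8];
    }
    gdt->FreeCount -= n;
    return 0;
}

/* Push n selectors back on the free list. */
void ModelRelease(ModelGdt *gdt, const uint16_t *selectors, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        assert(selectors[i] / 8 < MODEL_GDT_ENTRIES);
        gdt->Next[selectors[i] / 8] = gdt->FreeHead;
        gdt->FreeHead = selectors[i];
        gdt->FreeCount++;
    }
}
```

In this model the counter lets an allocation request be rejected up front when too few selectors remain, which matches the check described for KeI386AllocateGdtSelectors.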

NTSTATUS

KeI386ReleaseGdtSelectors(

unsigned short *SelectorArray,

int NumberOfSelectors);

The function releases the specified number of selectors. The selectors are specified in the

array SelectorArray. The function updates the variable that keeps track of the number of

selectors and inserts these selectors in the free list of selectors.

NTSTATUS

KeI386SetGdtSelector(unsigned int Selector, void *);

This function fills in the descriptor corresponding to a particular selector. The second

parameter should be a pointer to a descriptor entry.

HOW TO USE THE CALLGATE TECHNIQUE

The following sample shows how you can perform direct-to-port I/O and run privileged

instructions from a user-level application with the callgate technique. A device driver is

provided that enables the user application to allocate and release the callgates. The user-level

application contains a function that does direct port I/O to get the base memory size and

extended memory size from CMOS data. The application also prints the contents of CPU

control registers such as CR0, CR2. The instructions for accessing these registers are

privileged.
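
The CMOS access pattern used by the application (write a register index to port 70h, read the value from port 71h, and combine two bytes into a 16-bit size) can be rehearsed in user mode against a simulated device. The sketch below rests on our own assumptions: FakeCmos, FakeOutp, FakeInp, and ReadCmosWord are hypothetical stand-ins, and no real port I/O takes place.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated CMOS: port 70h selects a register, port 71h reads it. */
typedef struct FakeCmos {
    uint8_t Registers[128];
    uint8_t Selected;
} FakeCmos;

void FakeOutp(FakeCmos *cmos, uint16_t port, uint8_t value)
{
    if (port == 0x70)
        cmos->Selected = (uint8_t)(value & 0x7F);
}

uint8_t FakeInp(const FakeCmos *cmos, uint16_t port)
{
    return (port == 0x71) ? cmos->Registers[cmos->Selected] : 0xFF;
}

/* Read a 16-bit value stored as low byte at lowRegister and high
   byte at lowRegister + 1. */
uint16_t ReadCmosWord(FakeCmos *cmos, uint8_t lowRegister)
{
    uint16_t value;
    assert(lowRegister < 127);
    FakeOutp(cmos, 0x70, lowRegister);
    value = FakeInp(cmos, 0x71);
    FakeOutp(cmos, 0x70, (uint8_t)(lowRegister + 1));
    value = (uint16_t)(value | (FakeInp(cmos, 0x71) << 8));
    return value;
}
```

Registers 15h and 16h hold the base memory size and registers 17h and 18h the extended memory size, both in kilobytes, as in Listing 10-6. If the simulated registers 15h and 16h contain 80h and 02h, ReadCmosWord returns 280h, that is, 640K.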

The sample comprises three modules:

§ CALLGATE.SYS, which provides the functions for allocating and releasing the GDT selectors.

§ The user mode DLL called CGATEDLL.DLL, which provides wrappers for calling the functions in

CALLGATE.SYS. This DLL uses DeviceIoControl to talk to CALLGATE.SYS.

§ The user mode application CGATEAPP.EXE, which uses wrappers in CGATEDLL.DLL to

demonstrate the sample. CGATEAPP.EXE contains the function that does direct port I/O and

tries to access the processor control registers.

The function in CGATEAPP.EXE that runs ring 0 code is written in Assembly language due to

the restrictions imposed by the 32-bit compiler. These restrictions are discussed in Matt

Pietrek’s Windows 95 Programming Secrets, but we will summarize those points again. The

function that is called through callgate has to make a far return, whereas a standard 32-bit

compiler generates a near return. Also, the function gets called as a far call, so the stack frame

is not compatible with the one generated by a standard 32-bit compiler. The 32-bit compiler

generates code in such a way that it expects the first parameter to be at [EBP+8] once it sets

up the stack frame with PUSH EBP and MOV EBP, ESP. However, because the function gets

called as a far call, the first parameter is present at [EBP+0Ch].
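
The two offsets follow from counting what lies between EBP and the parameters once the stack frame is set up. The helper below is a hypothetical illustration of that arithmetic and is not part of the sample.

```c
#include <assert.h>

enum { STACK_SLOT = 4 };   /* every item in a 32-bit frame is one DWORD */

/* Offset of parameter number paramIndex (0 = first) from EBP after
   "PUSH EBP / MOV EBP, ESP". A near call leaves only the return EIP
   above the saved EBP; a far call also leaves the return CS. */
unsigned ParamOffsetFromEbp(int calledThroughFarCall, unsigned paramIndex)
{
    unsigned offset = 0;
    assert(paramIndex < 32);
    offset += STACK_SLOT;          /* [EBP+0]: saved EBP */
    offset += STACK_SLOT;          /* [EBP+4]: return EIP */
    if (calledThroughFarCall)
        offset += STACK_SLOT;      /* [EBP+8]: return CS */
    return offset + paramIndex * STACK_SLOT;
}
```

ParamOffsetFromEbp(0, 0) gives 8 and ParamOffsetFromEbp(1, 0) gives 0Ch, the two values quoted above.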

Listing 10-4: CALLGATE.C

#include "ntddk.h"

#include "stdarg.h"

#include "stdio.h"

#include "callgate.h"

#include "..\include\intel.h"

#include "..\include\undocnt.h"

/* This function creates a callgate on request from the application

and returns the callgate to the application, which the application can

use to run privileged instructions from user level application */

NTSTATUS CreateCallGate(PCallGateInfo_t CallGateInfo)

{

static CALLGATE_DESCRIPTOR callgate_desc;

static CODE_SEG_DESCRIPTOR ring0_desc;

unsigned short SelectorArray[2];

NTSTATUS rc;

#define LOWORD(l) ((unsigned short) (unsigned int)(l))

#define HIWORD(l) ((unsigned short) ((((unsigned int)(l)) >> 16) &

0xFFFF))

rc=KeI386AllocateGdtSelectors(SelectorArray, 0x02);

if (rc!=STATUS_SUCCESS) {

trace(("Unable to allocate selectors from GDT\n"));

return rc;

}

trace(("Selectors allocated = %x %x\n", SelectorArray[0],SelectorArray[1]));

/* Fill the descriptor according to the requirements of the code descriptor

*/

ring0_desc.limit_0_15 = 0xFFFF;

ring0_desc.base_0_15 = 0;

ring0_desc.base_16_23 = 0;

ring0_desc.readable = 1;

ring0_desc.conforming = 0;

ring0_desc.code_data = 1;

ring0_desc.app_system = 1;

ring0_desc.dpl = 0;

ring0_desc.present = 1;

ring0_desc.limit_16_19 = 0xF;

ring0_desc.always_0 = 0;

ring0_desc.seg_16_32 = 1;

ring0_desc.granularity = 1;

ring0_desc.base_24_31 = 0;

/* Fill the descriptor according to the requirements of the call gate descriptor

*/

callgate_desc.offset_0_15 = LOWORD( CallGateInfo->FunctionLinearAddress );

callgate_desc.selector = SelectorArray[0];

callgate_desc.param_count = CallGateInfo->NumberOfParameters;

callgate_desc.some_bits = 0;

callgate_desc.type = 0xC; // 386 call gate

callgate_desc.app_system = 0; // A system descriptor

callgate_desc.dpl = 3; // Ring 3 code can call

callgate_desc.present = 1;

callgate_desc.offset_16_31 = HIWORD(CallGateInfo->FunctionLinearAddress);

/* Return to the caller application the selectors allocated,caller is only

interested in CallGateSelector */

CallGateInfo->CodeSelector=SelectorArray[0];

CallGateInfo->CallGateSelector=SelectorArray[1];

/* Set the descriptor entry for code selector */

rc=KeI386SetGdtSelector(SelectorArray[0], &ring0_desc);

if (rc!=STATUS_SUCCESS) {

trace(("SetGdtSelector=%x\n", rc));

KeI386ReleaseGdtSelectors(SelectorArray, 0x02);

return rc;

}

/* Set the descriptor entry for call gate selector */

rc=KeI386SetGdtSelector(SelectorArray[1], &callgate_desc);

if (rc!=STATUS_SUCCESS) {

trace(("SetGdtSelector=%x\n", rc));

KeI386ReleaseGdtSelectors(SelectorArray, 0x02);

return rc;

}

/* Return success */

return STATUS_SUCCESS;

}

/* This function releases the previously allocated callgate */

NTSTATUS ReleaseCallGate(PCallGateInfo_t CallGateInfo)

{

unsigned short SelectorArray[2];

int rc;

SelectorArray[0]=CallGateInfo->CodeSelector;

SelectorArray[1]=CallGateInfo->CallGateSelector;

rc=KeI386ReleaseGdtSelectors(SelectorArray, 0x02);

if (rc!=STATUS_SUCCESS) {

trace(("ReleaseGDTSelectors failed, rc=%x\n", rc));

}

return rc;

}

NTSTATUS

DriverEntry(

IN PDRIVER_OBJECT DriverObject,

IN PUNICODE_STRING RegistryPath

)

{

MYDRIVERENTRY(DRIVER_DEVICE_NAME, FILE_DEVICE_CALLGATE, ntStatus);

return ntStatus;

}

NTSTATUS

DriverDispatch(

IN PDEVICE_OBJECT DeviceObject,

IN PIRP Irp

)

{

PIO_STACK_LOCATION irpStack;

PVOID ioBuffer;

ULONG inputBufferLength;

ULONG outputBufferLength;

ULONG ioControlCode;

NTSTATUS ntStatus;

Irp->IoStatus.Status = STATUS_SUCCESS;

Irp->IoStatus.Information = 0;

irpStack = IoGetCurrentIrpStackLocation (Irp);

ioBuffer = Irp->AssociatedIrp.SystemBuffer;

inputBufferLength =

irpStack->Parameters.DeviceIoControl.InputBufferLength;

outputBufferLength =

irpStack->Parameters.DeviceIoControl.OutputBufferLength;

switch (irpStack->MajorFunction)

{

case IRP_MJ_DEVICE_CONTROL:

trace(("CALLGATE.SYS: IRP_MJ_DEVICE_CONTROL\n"));

ioControlCode = irpStack->Parameters.DeviceIoControl.IoControlCode;

switch (ioControlCode)

{

case IOCTL_CALLGATE_CREATE:

{

PCallGateInfo_t CallGateInfo;

CallGateInfo=(PCallGateInfo_t)ioBuffer;

Irp->IoStatus.Status=CreateCallGate(CallGateInfo);

trace(("CreateCallGate rc=%x\n", Irp->IoStatus.Status));

if (Irp->IoStatus.Status==STATUS_SUCCESS) {

Irp->IoStatus.Information = sizeof(CallGateInfo_t);

}

break;

}

case IOCTL_CALLGATE_RELEASE:

{

PCallGateInfo_t CallGateInfo;

CallGateInfo=(PCallGateInfo_t)ioBuffer;

ntStatus=ReleaseCallGate(CallGateInfo);

trace(("ReleaseCallGate rc=%x\n", ntStatus));

break;

}

default:

Irp->IoStatus.Status = STATUS_INVALID_PARAMETER;

trace(("CALLGATE.SYS: unknown IRP_MJ_DEVICE_CONTROL\n"));

break;

}

break;

}

ntStatus = Irp->IoStatus.Status;

IoCompleteRequest (Irp, IO_NO_INCREMENT);

return ntStatus;

}

VOID

DriverUnload(

IN PDRIVER_OBJECT DriverObject)

{

WCHAR deviceLinkBuffer[] =

L"\\DosDevices\\"DRIVER_DEVICE_NAME;

UNICODE_STRING deviceLinkUnicodeString;

RtlInitUnicodeString (&deviceLinkUnicodeString,

deviceLinkBuffer

);

IoDeleteSymbolicLink (&deviceLinkUnicodeString);

IoDeleteDevice (DriverObject->DeviceObject);

trace(("CALLGATE.SYS: unloading\n"));

}

Listing 10-5: CGATEDLL.C

#include <windows.h>

#include <winioctl.h>

#include "callgate.h"

#include "gate.h"

HANDLE hCallgateDriver=INVALID_HANDLE_VALUE;

WORD CodeSelectorArray[8192];

void OpenCallgateDriver()

{

char completeDeviceName[64] = "";

strcpy (completeDeviceName, "\\\\.\\callgate");

hCallgateDriver = CreateFile (completeDeviceName,

GENERIC_READ | GENERIC_WRITE,

0,

NULL,

OPEN_EXISTING,

FILE_ATTRIBUTE_NORMAL,

NULL

);

}

void CloseCallgateDriver()

{

if (hCallgateDriver!=INVALID_HANDLE_VALUE) {

CloseHandle(hCallgateDriver);

}

}

int WINAPI CreateCallGate(void *FunctionAddress,

int NumberOfParameters,

PWORD pSelector)

{

CallGateInfo_t CallGateInfo;

DWORD BytesReturned;

if (hCallgateDriver==INVALID_HANDLE_VALUE) {

return ERROR_DRIVER_NOT_FOUND;

}

if (!pSelector) return ERROR_BAD_PARAMETER;

memset(&CallGateInfo, 0, sizeof(CallGateInfo));

CallGateInfo.FunctionLinearAddress=FunctionAddress;

CallGateInfo.NumberOfParameters=NumberOfParameters;

if (!DeviceIoControl(hCallgateDriver,

(DWORD)IOCTL_CALLGATE_CREATE,

&CallGateInfo,

sizeof(CallGateInfo),

&CallGateInfo,

sizeof(CallGateInfo),

&BytesReturned,

NULL)) {

return ERROR_IOCONTROL_FAILED;

}

*pSelector=CallGateInfo.CallGateSelector;

CodeSelectorArray[CallGateInfo.CallGateSelector]=

CallGateInfo.CodeSelector;

return SUCCESS;

}

int WINAPI FreeCallGate(WORD CallGateSelector)

{

CallGateInfo_t CallGateInfo;

DWORD BytesReturned;

if (hCallgateDriver == INVALID_HANDLE_VALUE) {

return ERROR_DRIVER_NOT_FOUND;

}

if(CallGateSelector >=

sizeof(CodeSelectorArray)/sizeof(CodeSelectorArray[0])) {

return ERROR_BAD_PARAMETER;

}

memset(&CallGateInfo, 0, sizeof(CallGateInfo));

CallGateInfo.CallGateSelector = CallGateSelector;

CallGateInfo.CodeSelector = CodeSelectorArray[CallGateSelector];

if (!DeviceIoControl(hCallgateDriver,

(DWORD)IOCTL_CALLGATE_RELEASE,

&CallGateInfo,

sizeof(CallGateInfo),

&CallGateInfo,

sizeof(CallGateInfo),

&BytesReturned,

NULL)) {

return ERROR_IOCONTROL_FAILED;

}

return SUCCESS;

}

BOOL WINAPI DllMain(HANDLE hModule, DWORD Reason, LPVOID lpReserved)

{

switch (Reason) {

case DLL_PROCESS_ATTACH:

OpenCallgateDriver();

return TRUE;

case DLL_PROCESS_DETACH:

CloseCallgateDriver();

return TRUE;

default:

return TRUE;

}

}

Listing 10-6: CGATEAPP.C

/*

CGATEAPP.C

Copyright (C) 1997 Prasad Dabak and Sandeep Phadke and Milind Borate

Sample application that uses CGATEDLL.DLL API for creating callgates

*/

#include <windows.h>

#include <stdio.h>

#include "gate.h"

void DumpBaseMemory()

{

unsigned short BaseMemory;

outp( 0x70, 0x15 );

BaseMemory = inp( 0x71 );

outp( 0x70, 0x16 );

BaseMemory += inp(0x71) << 8;

printf("Base memory = %dK\n", BaseMemory);

}

void DumpExtendedMemory()

{

unsigned short ExtendedMemory;

outp( 0x70, 0x17 );

ExtendedMemory = inp( 0x71 );

outp( 0x70, 0x18 );

ExtendedMemory += inp(0x71) << 8;

printf("Extended memory = %dK\n", ExtendedMemory);

}

void DumpControlRegisters()

{

DWORD mcr0, mcr2, mcr3;

_asm {

mov eax, cr0

mov mcr0, eax;

}

_asm {

mov eax, cr2

mov mcr2, eax;

}

_asm {

mov eax, cr3

mov mcr3, eax;

}

printf("CR0 = %x\n", mcr0);

printf("CR2 = %x\n", mcr2);

printf("CR3 = %x\n", mcr3);

}

void cfunc()

{

DumpBaseMemory();

DumpExtendedMemory();

DumpControlRegisters();

}

/* Declare the function present in RING0.ASM */

void func(void);

main()

{

WORD CallGateSelector;

int rc;

short farcall[3];

__try {

cfunc();

}

__except (EXCEPTION_EXECUTE_HANDLER) {

printf("Direct port I/O and CPU control registers access without callgate "

"raised exception!!\n");

}

printf("Now, performing direct port I/O and Control register access using "

"callgates..\n\n");

/* Create a callgate for function 'func', which takes no

parameters, and get the callgate selector value in 'CallGateSelector'*/

rc=CreateCallGate(func, 0, &CallGateSelector);

/* Check if callgate creation succeeds */

if (rc==SUCCESS) {

/*Prepare for making the far call. Forget about the offset

portion of far call, so no need to think about first two

elements of farcall array */

farcall[2]=CallGateSelector;

_asm {

/*Make a far call*/

call fword ptr [farcall]

}

/* Release the callgate created using CreateCallGate*/

rc=FreeCallGate(CallGateSelector);

if (rc!=SUCCESS) {

printf("FreeCallGate failed, CallGateSelector=%x,rc=%x\n",

CallGateSelector, rc);

}

} else {

printf("CreateCallGate failed, rc=%x\n", rc);

}

return 0;

}

Listing 10-7: RING0.ASM

.386

.model small

.code

public _func

extrn _cfunc:near

include ..\include\undocnt.inc

_func proc

Ring0Prolog

call _cfunc

Ring0Epilog

retf

_func endp

END
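
The far call in Listing 10-6 goes through a 6-byte memory operand: a 32-bit offset followed by a 16-bit selector. That is why the listing stores the selector in farcall[2] and ignores the first two elements; for a callgate the processor takes the target address from the gate and disregards the offset. The layout can be checked with the following sketch, in which BuildFarPointer and FarPointerSelector are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Lay out the 6-byte operand of "call fword ptr": a little-endian
   32-bit offset in bytes 0..3 and the selector in bytes 4..5. */
void BuildFarPointer(uint8_t out[6], uint32_t offset, uint16_t selector)
{
    assert(out != 0);
    out[0] = (uint8_t)(offset & 0xFF);
    out[1] = (uint8_t)((offset >> 8) & 0xFF);
    out[2] = (uint8_t)((offset >> 16) & 0xFF);
    out[3] = (uint8_t)((offset >> 24) & 0xFF);
    out[4] = (uint8_t)(selector & 0xFF);
    out[5] = (uint8_t)(selector >> 8);
}

/* Read back the selector part, i.e. what farcall[2] holds. */
uint16_t FarPointerSelector(const uint8_t pointer[6])
{
    return (uint16_t)(pointer[4] | (pointer[5] << 8));
}
```

Bytes 4 and 5 of the operand correspond to the third 16-bit element of the farcall array, so only that element needs a meaningful value.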

PAGING ISSUES

While writing the callgate sample, we observed that there are certain issues regarding

accessing the paged/swapped out data in the interrupt routine and also in the function called

through callgate. All the existing interrupt handlers such as INT 2Eh were seen to follow certain

entry and exit code before performing any real work. Some of the tasks performed by the entry

code were:

1. Creates some space on stack.

2. Prepares a trap frame that will record the state of some of the CPU registers.

3. Saves away some of the fields in the Thread Environment Block (TEB), such as the processor mode and one

field in the TEB that SoftICE calls "KSS EBP." We don’t know the exact meaning of this, but it

seems that each interrupt handler should set this field to the trap frame created in the previous step.

4. Saves away the contents of FS register and sets FS register to 0x30.

Out of all these steps, the first step is absolutely necessary and is related to the logic used by

the page fault handler of the operating system. The page fault handler does some arithmetic on

the current stack pointer and the stack pointer at the time of the transition from ring 3 to ring 0

and makes some decisions. If at least a specific amount of stack space is not found between

these two stack pointer values, then the system crashes with a Blue Screen.

It is essential that you follow this while writing interrupt handlers or functions executed through

callgate to successfully access paged out data. The fourth step of setting FS register to 0x30 is

also necessary since the system expects FS register to point to Processor Control Region

when the thread is executing in ring 0 and the selector 0x30 points to the descriptor with the

base address equal to address of processor control region.
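
A selector value such as 0x30 is not an address; it encodes a descriptor table index, a table indicator, and a requested privilege level. The following sketch decodes and builds selectors. The SelectorParts type and the function names are hypothetical and are used here only for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct SelectorParts {
    unsigned Index;           /* descriptor index within the table */
    unsigned TableIndicator;  /* 0 = GDT, 1 = LDT */
    unsigned Rpl;             /* requested privilege level, 0..3 */
} SelectorParts;

/* Split a 16-bit selector into its three fields. */
SelectorParts DecodeSelector(uint16_t selector)
{
    SelectorParts parts;
    parts.Index = selector >> 3;
    parts.TableIndicator = (selector >> 2) & 1u;
    parts.Rpl = selector & 3u;
    return parts;
}

/* Combine the three fields back into a selector value. */
uint16_t MakeSelector(unsigned index, unsigned tableIndicator, unsigned rpl)
{
    assert(index < 8192 && tableIndicator <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (tableIndicator << 2) | rpl);
}
```

Decoding 0x30 gives GDT index 6 with a requested privilege level of 0. The selector 8 that Listing 10-1 stores in the new interrupt gate decodes to GDT index 1 in the same way.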

Note: You have to follow the same steps while hooking software interrupts.

The second and third steps seem to be only for bookkeeping.

All the samples in this book that use callgates or interrupt handlers use two macros defined in

the UNDOCNT.INC file, Ring0Prolog and Ring0Epilog. These macros implement the code

that takes care of these paging issues.

SUMMARY

In this chapter, we detailed how interrupts are executed under Windows NT. Then we

discussed a mechanism for adding new software interrupts. Along the way, we discussed

some processor data structures used while processing the interrupt and presented an example

that adds a software interrupt (0x22) to Windows NT. We also showed an example of an

application that calls the newly added interrupt. After that, we discussed callgates, used for

running ring 0 code from ring 3. This was followed by an example that demonstrated how to

use callgates to read processor control registers such as CR0 and CR3 and to do direct port I/O from

ring 3. The chapter concluded with a discussion of the paging issues that arise while executing

functions through callgates and interrupt handlers.

Chapter 11

Portable Executable File Format

Abstract

This chapter gives you a comprehensive picture of the Portable Executable file format for

Windows NT. The PE format is portable across all Microsoft 32-bit operating systems.

MICROSOFT INTRODUCED A NEW executable file format with Windows NT. This format is called the

Portable Executable (PE) format because it is supposed to be portable across all 32-bit

operating systems by Microsoft. The same PE format executable can be executed on any

version of Windows NT, Windows 95, and Win32s. Also, the same format is used for

executables for Windows NT running on processors other than Intel x86, such as MIPS, Alpha,

and Power PC. The 32-bit DLLs and Windows NT device drivers also follow the same PE

format.

It is helpful to understand the PE file format because PE files are almost identical on disk and

in RAM. Learning about the PE format is also helpful for understanding many operating system

concepts, for example, how the operating system loader supports the dynamic linking of DLL

functions and which data structures, such as the import table and the export table, are involved

in dynamic linking.

The PE format is not really undocumented. The WINNT.H file has several structure definitions

representing the PE format. The Microsoft Developer Network (MSDN) CD-ROMs contain

several descriptions of the PE format. However, these descriptions are in bits and pieces, and

are by no means complete. In this chapter, we try to give you a comprehensive picture of the

PE format.

Microsoft also provides a DLL with the SDK that has utility functions for interpreting PE files.

We also discuss these functions and correlate them with other information about the PE format.

OVERVIEW OF A PE FILE

In this section, we discuss the overall structure of a PE file. In the sections that follow, we go

into detail about the PE format. A PE file comprises various sections. Because Microsoft’s

32-bit operating systems follow the flat memory model, an executable no longer contains

segments. Still, different parts of an executable, such as code and data, have different

characteristics. These different parts of an executable are stored as different sections. Thus, a

PE file is a concatenation of data stored in sections.

A few sections are always present in a PE file generated by the Microsoft linker. Other linkers

may generate similar sections with different names. A PE file generated with the Microsoft

linker has a .text section that contains the code bytes concatenated from all the object files. As

for the data, it can be classified into different categories. The .data section contains all the

initialized global and static data, while the .bss section contains the uninitialized data. The

read-only data, such as string literals and constants, is stored in the .rdata section. This

section also contains some other read-only structures, such as the debug directory, the Thread

Local Storage (TLS) directory, and so on, which we explain later in this chapter. The .edata

section contains information about the functions exported from a DLL, while the .idata section

stores information about the functions imported by an executable or a DLL. The .rsrc section

contains various resources, such as menus and dialog boxes. The .reloc section stores the

information required for relocating the image while loading.

The names of the sections do not have any significance. As mentioned earlier, different linkers

may use different names for the sections. Programmers can also create new sections of their

own. The #pragma code_seg and #pragma data_seg directives can be used to create new

sections while working with the Microsoft compiler. The operating system loader locates the

required piece of information from the data directories present in the file headers. Shortly, we

will present an overview of file headers and then look at them in more detail.

STRUCTURE OF A PE FILE

Apart from the sections consisting of the actual data, a PE file contains various headers that

describe the sections and the important information present in the sections.

If you look at the hex dump of a PE file, the first 2 bytes might look familiar. Aren’t they M and Z?

Yes, a PE file starts with the DOS executable header. It is followed by a small program that

prints an error message saying that the program cannot be run in DOS mode. It’s the same

idea that was used in 16-bit Windows executables. This program code is executed if the PE

image is run under DOS.

After the DOS header and the DOS executable stub comes the PE header. A field in the DOS

header points to this new header. The PE header starts with the 4-byte signature “PE” followed

by two nulls. The PE format is based on the Common Object File Format (COFF) used by Unix.

The PE signature is followed by the object file header borrowed from COFF. This header is

also present in the object files produced by Microsoft’s 32-bit compilers. This header contains

some general information about the file, such as the target machine ID, the number of sections

in the file, and so forth. The COFF style header is followed by the optional header. This header

is optional in the sense that it is not required for the object files. As far as executables and

DLLs are concerned, this header is mandatory. The optional header has two parts. The first

part is inherited from COFF and can be found in all COFF files. The second part is an

NT-specific extension of COFF. Apart from other NT-specific information, such as the

subsystem type, this part also contains the data directory. The data directory is an array in

which each entry points to some important piece of information. One of the entries in the data

directory points to the import table of the executable or DLL, another entry points to the export

table of the DLL, and so on.

XREF: We will look at the detailed formats of the different pieces of information later in this

chapter.

The data directory is followed by the section table. The section table is an array of section

headers. A section header summarizes the important information about the respective section.

Finally, the section table is followed by the sections themselves.

We hope that this gives you an overview of the organization of a PE file. Before diving into the

details of the PE format, let’s discuss a concept that is vital in interpreting a PE file.

RELATIVE VIRTUAL ADDRESS

Most of the offsets within a PE file are denoted as Relative Virtual Addresses (RVAs). An RVA is an

offset from the base address at which an executable is loaded in memory. This is not the same

as the offset within the file because of the section alignment requirements. The PE header

specifies the section alignment requirements for an executable image. A section has to be

loaded at a memory address that is a multiple of the section alignment. The section alignment

has to be a multiple of the page size. This is because different sections have different page

attribute requirements; for example, the .data section needs read-write permissions, while

the .text section needs read-execute permissions. Hence, a page cannot span section

boundaries.
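
To make the alignment rule concrete, the following minimal sketch rounds a value up to a multiple of the section alignment. The names align_up and next_section_rva are ours and do not come from any Windows header, and the sketch assumes that the alignment is a nonzero power of two.

```c
#include <stdint.h>

/* Round `value` up to the next multiple of `alignment`.
 * Assumption: `alignment` is a nonzero power of two (for example 0x1000). */
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: RVA at which the next section would start, given
 * the RVA and the in-memory size of the section before it. */
uint32_t next_section_rva(uint32_t prev_rva, uint32_t prev_size,
                          uint32_t section_alignment)
{
    return align_up(prev_rva + prev_size, section_alignment);
}
```

For instance, with a section alignment of 0x1000, a section that starts at RVA 0x1000 and occupies 0x2A00 bytes would be followed by a section at RVA 0x4000.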

Because the PE format always talks in terms of RVAs, it’s difficult to find the location of the

required information within a file. A common practice while accessing a PE file is to map the file

in memory using the Win32 memory mapping API. It’s a bit complicated to calculate the

address for the given RVA in this memory-mapped file. You first need to find out the section in

which the given RVA lies. You can accomplish this by iterating through the section table. Each

section header stores the starting RVA for the section and the size of the section. A section is

guaranteed to be contiguously loaded in memory. Hence, the offset from the start of the

section for a particular piece of data is bound to be the same whether the file is memory

mapped or loaded by the operating system loader for execution. Hence, to find out the address

in a memory-mapped file, you simply need to add this offset to the base address of the section

in the memory-mapped file. Now, this base address can be calculated from the file offset

of the section, which is also stored in the respective section header. Quite an easy procedure,

isn’t it?
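
The manual procedure described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: SectionInfo is a simplified stand-in for the section header that keeps just the three fields the translation needs, and the names SectionInfo and rva_to_file_offset are ours.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a section header (illustrative names). */
typedef struct {
    uint32_t virtual_address;     /* starting RVA of the section       */
    uint32_t virtual_size;        /* size of the section in memory     */
    uint32_t pointer_to_raw_data; /* file offset of the section's data */
} SectionInfo;

/* Returns the file offset that corresponds to `rva`, or UINT32_MAX if
 * the RVA does not fall within any section in the table. */
uint32_t rva_to_file_offset(const SectionInfo *sections, size_t count,
                            uint32_t rva)
{
    for (size_t i = 0; i < count; i++) {
        const SectionInfo *s = &sections[i];
        if (rva >= s->virtual_address &&
            rva - s->virtual_address < s->virtual_size) {
            /* The offset from the section start is the same on disk. */
            return s->pointer_to_raw_data + (rva - s->virtual_address);
        }
    }
    return UINT32_MAX;
}
```

Adding the returned file offset to the base address of the memory-mapped file gives the address of the data in the mapping.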

ImageRvaToVa()

Don’t worry, there is an easier way out. Microsoft comes to our rescue here with

IMAGEHLP.DLL. This DLL exports a function that computes the address in the

memory-mapped file, given an RVA.

LPVOID ImageRvaToVa(
    PIMAGE_NT_HEADERS NtHeaders,
    LPVOID Base,
    DWORD Rva,
    PIMAGE_SECTION_HEADER *LastRvaSection
);

PARAMETERS

NtHeaders Pointer to an IMAGE_NT_HEADERS structure. This structure represents

the PE header and is defined in the WINNT.H file. A pointer to the PE

header within a PE file can be obtained using the ImageNtHeader()

function exported by IMAGEHLP.DLL.

Base Base address where the PE file is mapped into memory using the Win32

API for the memory mapping of files.

Rva Given relative virtual address.

LastRvaSection Last RVA section. This is an optional parameter, and you can pass NULL.

When specified, it points to a variable that contains the last section value

used for the specified image to translate an RVA to a VA. This is used for

optimizing the section search, in case the given RVA also falls within the

same section as the one for the previous call to the function. The

LastRvaSection is checked first, and the regular sequential search for the

section is carried out only if the given RVA does not fall within the

LastRvaSection.

RETURN VALUES

If the function succeeds, the return value is the virtual address in the mapped file; otherwise, it

is NULL. The error number can be retrieved using the GetLastError() function.

ImageNtHeader()

The ImageRvaToVa() function needs a pointer to the PE header. The ImageNtHeader() function

exported from IMAGEHLP.DLL can provide this pointer.

PIMAGE_NT_HEADERS ImageNtHeader(
    LPVOID ImageBase
);

PARAMETERS

ImageBase Base address where the PE file is mapped into memory using the Win32

API for the memory mapping of files.

RETURN VALUES

If the function succeeds, the return value is a pointer to the IMAGE_NT_HEADERS structure

within the mapped file; otherwise, it returns NULL.

MapAndLoad()

The IMAGEHLP.DLL can also take care of memory mapping a PE file for you. The

MapAndLoad() function maps the requested PE file in memory and fills in the

LOADED_IMAGE structure with some useful information about the mapped file.

BOOL MapAndLoad(
    LPSTR ImageName,
    LPSTR DllPath,
    PLOADED_IMAGE LoadedImage,
    BOOL DotDll,
    BOOL ReadOnly
);

PARAMETERS

ImageName Name of the PE file to be loaded.

DllPath Path used to locate the file if the name provided cannot be found. If NULL

is passed, then normal rules for searching using the PATH environment

variable are applied.

LoadedImage The structure LOADED_IMAGE is defined in the IMAGEHLP.H file. The

structure has the following members:

ModuleName Name of the loaded file.

hFile Handle obtained through the call to CreateFile.

MappedAddress Memory address where the file is mapped.

FileHeader Pointer to the PE header within the mapped file.

LastRvaSection The function sets it to the first section (see ImageRvaToVa).

NumberOfSections Number of sections in the loaded PE file.

Sections Pointer to the first section header within the mapped file.

Characteristics Characteristics of the PE file (this is explained in more detail later in this

chapter).

fSystemImage Flag indicating whether it is a kernel-mode driver/DLL.

fDOSImage Flag indicating whether it is a DOS executable.

Links List of loaded images.

SizeOfImage Size of the image.

The function sets the members in the structure appropriately after loading the PE file.

DotDll If the file needs to be searched and does not have an extension, then either

the .exe or the .dll extension is used. If the DotDll flag is set to TRUE,

the .dll extension is used; otherwise, the .exe extension is used.

ReadOnly If the flag is set to TRUE, the file is mapped as read-only.

RETURN VALUES

If the function succeeds, the return value is TRUE; otherwise, it is FALSE.

UnMapAndLoad()

After you are done with the mapped file, you should call the UnMapAndLoad() function. This

function unmaps the PE file and deallocates the resources allocated by the MapAndLoad()

function.

BOOL UnMapAndLoad(
    PLOADED_IMAGE LoadedImage
);

PARAMETERS

LoadedImage Pointer to a LOADED_IMAGE structure that is returned from a call to

the MapAndLoad() function.

RETURN VALUES

If the function succeeds, the return value is TRUE; otherwise, it is FALSE.

We will discuss the other useful functions from this DLL as we continue in this chapter.

DETAILS OF THE PE FORMAT

The WINNT.H file has the structure definitions representing the PE format. We refer to these

structure definitions while describing the PE format. Let’s begin at the beginning. The DOS

header that comes at the beginning of a PE file does not contain much important information

from the PE viewpoint. The fields in this header have values pertaining to the DOS executable

stub that follows this header. The only important field as far as the PE format is concerned is

e_lfanew, which holds the offset to the PE header. You can add this offset to the base of the

memory-mapped file to get the address of the PE header. You can also use the

ImageNtHeader() function explained earlier, or simply use the FileHeader field from the

LOADED_IMAGE after a call to the MapAndLoad() function.
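
If you want to locate the PE header by hand, the steps are to check for the MZ signature, read e_lfanew, and check for the PE signature. The following sketch does this on a raw byte buffer. The names read_le32 and find_pe_header are ours; the sketch relies on e_lfanew being the 32-bit field at offset 0x3C of the DOS header and reads it byte by byte so that it does not depend on the byte order of the host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value from an unaligned byte buffer. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the offset of the PE signature inside `buf`, or -1 if the
 * buffer does not look like a PE file. */
long find_pe_header(const unsigned char *buf, size_t len)
{
    /* The DOS header is 0x40 bytes long and starts with "MZ". */
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;

    /* e_lfanew, the last field of the DOS header, is at offset 0x3C. */
    uint32_t e_lfanew = read_le32(buf + 0x3C);
    if (e_lfanew > len || len - e_lfanew < 4)
        return -1;

    /* The PE header starts with "PE" followed by two nulls. */
    if (memcmp(buf + e_lfanew, "PE\0\0", 4) != 0)
        return -1;
    return (long)e_lfanew;
}
```

Adding the returned offset to the base of the memory-mapped file gives the address of the PE header, which is the same pointer that ImageNtHeader() is described as returning.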

The IMAGE_NT_HEADERS structure that represents the PE header is defined as follows in

the WINNT.H file:

typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER OptionalHeader;
} IMAGE_NT_HEADERS, *PIMAGE_NT_HEADERS;

The signature is PE followed by two nulls, as mentioned earlier. The COFF style header is

represented by the IMAGE_FILE_HEADER structure and is followed by the optional header

represented by the IMAGE_OPTIONAL_HEADER structure. The fields in the COFF style

header are as follows:

Machine Target machine ID. Various values are defined in the WINNT.H file; for example, 0x14C is used for Intel 80386 (and compatibles) and 0x184 is used for Alpha AXP.

NumberOfSections Number of sections in the file.

TimeDateStamp Time and date when the file was created.

PointerToSymbolTable Offset to the COFF symbol table. This field is used only for COFF style object files and PE files with COFF style debug information.

NumberOfSymbols Number of symbols present in the symbol table. This data can be used in locating the string table that immediately follows the symbol table.

SizeOfOptionalHeader Size, in bytes, of the optional header that follows this header. This field is set to 0 for the object files because the optional header is absent in them.

Characteristics Attributes of the file. The flag values are defined in the WINNT.H file. This field contains an OR of these flags. The important flags are as follows:

IMAGE_FILE_EXECUTABLE_IMAGE Set for an executable file.

IMAGE_FILE_SYSTEM Indicates that it is a kernel-mode driver/DLL.

IMAGE_FILE_DLL The file is a dynamic link library (DLL).

IMAGE_FILE_UP_SYSTEM_ONLY This file should be run only on a uniprocessor (UP) machine.

IMAGE_FILE_LINE_NUMS_STRIPPED Indicates that the COFF line numbers have been removed from the file.

IMAGE_FILE_LOCAL_SYMS_STRIPPED Indicates that the COFF symbol table has been removed from the file.

IMAGE_FILE_DEBUG_STRIPPED Indicates that the debugging information has been removed from the file.

IMAGE_FILE_RELOCS_STRIPPED Indicates that the base relocation information is stripped from this file, and the file can be loaded only at the preferred base address. If the loader cannot load such an image at the preferred base address, it fails because it cannot relocate the image.

IMAGE_FILE_AGGRESIVE_WS_TRIM Aggressively trim working set.

IMAGE_FILE_BYTES_REVERSED_LO Little endian: the least significant bit (LSB) precedes the most significant bit (MSB) in memory, but they are stored in reverse order.

IMAGE_FILE_BYTES_REVERSED_HI Big endian: the MSB precedes the LSB in memory, but they are stored in reverse order.

IMAGE_FILE_32BIT_MACHINE The target machine is based on 32-bit-word architecture.

IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP If this flag is set and the file is run from removable media, such as a floppy, the loader copies the file to the swap area and runs it from there.

IMAGE_FILE_NET_RUN_FROM_SWAP Similar to the previous flag. The file is run from swap if it is run from a network drive.
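
Because the Characteristics field is an OR of flags, interpreting it means testing each bit in turn. The following sketch shows one way to do that. The FlagName type, the k_flags table, and the describe_characteristics function are our own illustrative names; the bit values are the ones defined for the IMAGE_FILE_ flags in the WINNT.H file, with the IMAGE_FILE_ prefix dropped from the printed names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint16_t bit;
    const char *name;
} FlagName;

/* Bit values of the IMAGE_FILE_* flags (prefix omitted in the names). */
static const FlagName k_flags[] = {
    { 0x0001, "RELOCS_STRIPPED" },
    { 0x0002, "EXECUTABLE_IMAGE" },
    { 0x0004, "LINE_NUMS_STRIPPED" },
    { 0x0008, "LOCAL_SYMS_STRIPPED" },
    { 0x0010, "AGGRESIVE_WS_TRIM" },
    { 0x0080, "BYTES_REVERSED_LO" },
    { 0x0100, "32BIT_MACHINE" },
    { 0x0200, "DEBUG_STRIPPED" },
    { 0x0400, "REMOVABLE_RUN_FROM_SWAP" },
    { 0x0800, "NET_RUN_FROM_SWAP" },
    { 0x1000, "SYSTEM" },
    { 0x2000, "DLL" },
    { 0x4000, "UP_SYSTEM_ONLY" },
    { 0x8000, "BYTES_REVERSED_HI" },
};

/* Writes a '|'-separated list of the flags set in `value` into `out`
 * (capacity `cap`) and returns how many flag names were written. */
int describe_characteristics(uint16_t value, char *out, size_t cap)
{
    int count = 0;
    size_t used = 0;

    if (cap == 0)
        return 0;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof k_flags / sizeof k_flags[0]; i++) {
        if ((value & k_flags[i].bit) == 0)
            continue;
        int written = snprintf(out + used, cap - used, "%s%s",
                               count ? "|" : "", k_flags[i].name);
        if (written < 0 || (size_t)written >= cap - used) {
            out[used] = '\0';   /* drop the partially written name */
            break;
        }
        used += (size_t)written;
        count++;
    }
    return count;
}
```

A value of 0x2102, for example, has the EXECUTABLE_IMAGE, 32BIT_MACHINE, and DLL bits set, which is the combination you would expect to see for an ordinary 32-bit DLL.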

Note: The COFF style header is followed by the optional header. The optional header is absent in

the object files. The format of the optional header is defined as the IMAGE_OPTIONAL_HEADER

structure in the WINNT.H file. The first few fields in this structure are inherited from COFF.

Magic This field is set to 0x10b for a normal executable/DLL.

MajorLinkerVersion, MinorLinkerVersion Version of the linker that produced the file.

SizeOfCode Size of the code section. If there are multiple code sections, this field

contains the sum of sizes of all these sections.

SizeOfInitializedData Size of the initialized data section. If there are multiple initialized

data sections, this field contains the sum of sizes of all these

sections.

SizeOfUninitializedData Same as SizeOfInitializedData, but for the uninitialized data (BSS)

section.

AddressOfEntryPoint RVA of the entry point.

BaseOfCode RVA of the start of the code section.

BaseOfData RVA of the start of the data section.

Microsoft added some NT-specific fields to the optional header. These fields are as follows:

ImageBase If the file is loaded at this address in memory, the loader need not do any

base relocations. This is because the linker resolves all the base

relocations at the time of linking, assuming that the file will be loaded at this

address. We discuss this in more detail in the section on the relocation

table. For now, it is enough to know that the loading time is reduced if a file

gets loaded at the preferred base address. A file may not get loaded at the

preferred base address because of the nonavailability of the address. This

happens when more than one DLL used by an executable use the same

preferred base address. The default preferred base address is 0x400000.

You may want to have a different preferred base address for your DLL so

that it does not clash with that of any other DLL used by your application.

You can change the preferred base address using a linker switch. You can

also change the base address of a file using the rebase utility that comes

with the Win32 SDK.

ReBaseImage()

The ReBaseImage() function from the IMAGEHLP.DLL also enables you to change the

preferred base address.

BOOL ReBaseImage(
    LPSTR CurrentImageName,
    LPSTR SymbolPath,
    BOOL fReBase,
    BOOL fRebaseSysfileOk,
    BOOL fGoingDown,
    DWORD CheckImageSize,
    LPDWORD OldImageSize,
    LPDWORD OldImageBase,
    LPDWORD NewImageSize,
    LPDWORD NewImageBase,
    DWORD TimeStamp
);

PARAMETERS

CurrentImageName Filename that is rebased.

SymbolPath In case the symbolic debug information is stored as a separate file, the

path to find the corresponding symbol file. This is required to update the

header information and timestamp of the symbol file.

fReBase The file is really rebased only if this value is TRUE.

fRebaseSysfileOk If the file is a system file with the preferred base address above

0x80000000, it is rebased only if this flag is TRUE.

fGoingDown If you want the loaded image of the file to lie entirely below the given

address, set this flag to TRUE. For example, if the loaded size of a DLL

is 0x2000 and you call the function with the fGoingDown flag as TRUE

and give the address as 0x600000, the DLL will be rebased at

0x508000.

CheckImageSize Rebasing might change the loaded image size of the file because of the

section alignment requirements. If this parameter is nonzero, the file is

rebased only if the changed size is less than this parameter.

OldImageSize Original image size before the rebase operation is returned here.

OldImageBase Original image base before the rebase operation is returned here.

NewImageSize New loaded image size after the rebase operation is returned here.

NewImageBase New base address. Upon return, it contains the actual address where

the file is rebased.

TimeStamp New timestamp for the file.

RETURN VALUES

If the function succeeds, the return value is TRUE; otherwise, it is FALSE.

The other fields in the optional header are as follows:

SectionAlignment A section needs to be loaded at an address that is a multiple of the section alignment. Refer to the discussion on RVA for more information.

FileAlignment In the file, a section always starts at an offset that is a multiple of the file alignment. This value is some multiple of the sector size.

MajorOperatingSystemVersion, MinorOperatingSystemVersion Minimum operating system version required to execute this file.

MajorImageVersion, MinorImageVersion A developer can use these fields to version his or her files. They can be specified with a linker flag.

MajorSubsystemVersion, MinorSubsystemVersion Minimum subsystem version required to execute this file.

Win32VersionValue Reserved for future use.

SizeOfImage Size of the image after considering the section alignment. This amount of virtual memory needs to be reserved for loading the file.

SizeOfHeaders Total size of the headers, including the DOS header, the PE header, and the section table. The sections containing the actual data start at this offset in the file.

CheckSum This is used only for the kernel-mode drivers/DLLs. It can be set to 0 for user-mode executables/DLLs.

Subsystem Subsystem used by the file. The following values are defined in the WINNT.H file:

IMAGE_SUBSYSTEM_NATIVE Image doesn’t require a subsystem. The kernel-mode drivers and native applications such as CSRSS.EXE have this value for the field.

IMAGE_SUBSYSTEM_WINDOWS_GUI File uses the Win32 GUI interface.

IMAGE_SUBSYSTEM_WINDOWS_CUI File uses the character-based user interface.

IMAGE_SUBSYSTEM_OS2_CUI File requires the OS/2 subsystem.

IMAGE_SUBSYSTEM_POSIX_CUI File uses the POSIX API.

DllCharacteristics Obsolete.

SizeOfStackReserve Address space to be reserved for the stack. Only the virtual address space is marked; the swap space is not allocated.

SizeOfStackCommit Actual memory committed for the stack. This much swap space is initially allocated. The committed stack size is increased on demand until it reaches the SizeOfStackReserve.

SizeOfHeapReserve Address space to be reserved for the heap. Similar to the SizeOfStackReserve field.

SizeOfHeapCommit Actual committed heap space. Similar to the SizeOfStackCommit field.

LoaderFlags Obsolete.

NumberOfRvaAndSizes Number of entries in the data directory that follows this field. It is always set to 16.

DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES] As mentioned earlier, each entry in the data directory points to some important piece of information. Each of these entries is of the type IMAGE_DATA_DIRECTORY, which is defined as follows:

typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress;
    DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

ImageDirectoryEntryToData()

The VirtualAddress field contains the RVA of the respective piece of information, and the Size

field contains the size of the data. To get to the actual data, you need to convert the RVA to the

actual address in the memory-mapped PE file. This can be accomplished with the

ImageDirectoryEntryToData() function exported by IMAGEHLP.DLL.

PVOID ImageDirectoryEntryToData(
    LPVOID Base,
    BOOLEAN MappedAsImage,
    USHORT DirectoryEntry,
    PULONG Size
);

PARAMETERS

Base Base address where the file is mapped in memory.

MappedAsImage Set this flag to TRUE if the system loader maps the file. Otherwise, set the

flag to FALSE.

DirectoryEntry Index into the data directory array.

Size Upon return, the size from the data directory is filled here.

RETURN VALUES

If the function succeeds, the return value is the address in the memory-mapped file where the

required data resides. Otherwise, the function returns NULL.
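
If you walk the data directory yourself instead of calling this function, the first step is to fetch the entry and make sure it is usable. The sketch below shows that step only. DataDirectoryEntry mirrors the layout of IMAGE_DATA_DIRECTORY with fixed-width types, and get_directory_entry is a hypothetical helper of ours; treating an entry whose RVA is 0 as absent is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as IMAGE_DATA_DIRECTORY, with fixed-width types. */
typedef struct {
    uint32_t virtual_address;  /* RVA of the data  */
    uint32_t size;             /* size of the data */
} DataDirectoryEntry;

/* Indices of the first two entries, as defined in WINNT.H. */
enum { DIR_ENTRY_EXPORT = 0, DIR_ENTRY_IMPORT = 1 };

/* Returns the entry at `index`, or NULL when the index is beyond
 * NumberOfRvaAndSizes or the entry is empty (assumed to mean absent). */
const DataDirectoryEntry *get_directory_entry(const DataDirectoryEntry *dir,
                                              uint32_t number_of_rva_and_sizes,
                                              uint32_t index)
{
    if (index >= number_of_rva_and_sizes)
        return NULL;
    if (dir[index].virtual_address == 0)
        return NULL;
    return &dir[index];
}
```

The RVA in the returned entry still has to be converted to an address in the memory-mapped file, for example with ImageRvaToVa().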

INDICES IN THE DATA DIRECTORY

Each index in the data directory (except a few at the end that are still unused) represents some

important piece of information. In the following sections, we discuss some of the important

entries in this directory and the format in which the respective information is stored.

Export Directory

The data directory entry at the IMAGE_DIRECTORY_ENTRY_EXPORT index points to the

export directory for the file. The RVA in this directory entry points to the .edata section. The

information about the functions exported by the file (generally a DLL) is stored here. The data

directory entry points to the export directory that is defined as the

IMAGE_EXPORT_DIRECTORY structure in the WINNT.H file. The fields in this structure are

as follows:

Characteristics Reserved field. Always set to 0.

TimeDateStamp Date and time of creation.

MajorVersion, MinorVersion Developer can set the version of the export table.

Name RVA of the zero-terminated name of the DLL.

Base Starting ordinal for the exported functions–that is, the least of the

ordinals. Generally, this field is 1.

NumberOfFunctions Total number of functions exported from the DLL.

NumberOfNames Number of functions that are exported by name. Some functions may

be exported only by ordinal, so this number may be less than

NumberOfFunctions.

AddressOfFunctions RVA of an array (let’s call it the export-functions array) that has an

entry for each function exported from the DLL. Hence, the size of this

array is equal to the NumberOfFunctions field. The entry at index i

corresponds to the function exported with ordinal i + Base. Each entry in

this array is also an RVA. If the RVA for a particular array entry points

within the export section, then it is a forwarder. Forwarder means that

the function is not present in this DLL, but it is a forwarder reference to

some function in another DLL. In such a case, the RVA points to an

ASCIIZ string that stores the name of the other DLL and the function

name separated by a period. In case the target DLL exports the function

by ordinal, the function name is formed as # followed by the ordinal

printed in decimal. For example, the KERNEL32.DLL for Windows NT

forwards the HeapAlloc() function to the RtlAllocateHeap() function in

the NTDLL.DLL. Hence, the corresponding RVA in this case points to a

location within the export section that holds the string

NTDLL.RtlAllocateHeap. The Win32 applications can import the

HeapAlloc() function from the KERNEL32.DLL without worrying about

all these details. When the application runs on Windows 95, the loader

resolves the import reference to the function in the KERNEL32.DLL.

When the same application runs on Windows NT, the loader finds that

the function is forwarded to the NTDLL.DLL. Hence, the loader

automatically loads the NTDLL.DLL and resolves the imported function

to the RtlAllocateHeap() function.

When an export-functions array entry is not a forwarder–that is, the RVA does not lie within the

export section–the RVA points to the entry point of the function or to the location of the

exported variable.

The export-functions array may have gaps. This is because some ordinals might be left

unused while exporting functions, and some ordinals might not have any corresponding export.

In such a case, the corresponding array entry is set to 0.

AddressOfNames RVA of an array, called the export-names array, that has an entry

for every function that is exported by name. Hence, the size of this

array is equal to the NumberOfNames field. Each entry in this array

is an RVA pointing to an ASCIIZ string containing the export name.

The array is sorted in lexical order to allow binary

search.

AddressOfNameOrdinals RVA of an array of ordinals henceforth called as the export-ordinals

array. This array has the size same as that of the AddressOfNames

Page 212: Chapter 1read.pudn.com/downloads113/ebook/474416/UndocumentedNT.pdf · Microsoft has come up with two 32-bit operating systems: Windows 95/98 and Windows NT. Windows NT is a high-end

array. All three arrays, namely, export-names, export-ordinals, and

export -functions, are instrumental in resolving imports by name.

For resolving an import by name, the loader first searches the

name in the export-names array. If the name matches an entry with

index i, the ith entry in the export-ordinals array is the ordinal of the

function. Finally, the address of the function can be found from the

export-functions array.
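The lookup just described can be sketched in portable C. This is only an illustration under stated assumptions, not the loader's actual code: ExportView and the three function names are hypothetical, the arrays are assumed to have been located in memory already (a real image stores RVAs that must first be converted), and each export-ordinals entry is assumed to be a zero-based index into the export-functions array, that is, the ordinal minus Base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of an export directory whose three arrays
   have already been located in memory. */
typedef struct {
    uint32_t           base;                /* ordinal of functions[0]          */
    uint32_t           number_of_functions; /* size of the export-functions array */
    uint32_t           number_of_names;     /* size of the other two arrays     */
    const uint32_t    *functions;           /* export-functions array (RVAs)    */
    const char *const *names;               /* export-names array, sorted       */
    const uint16_t    *name_ordinals;       /* export-ordinals array            */
    uint32_t           export_rva;          /* start RVA of the export section  */
    uint32_t           export_size;         /* size of the export section       */
} ExportView;

/* Resolve an import by ordinal. Returns the RVA, or 0 for a gap or a bad ordinal. */
uint32_t export_rva_by_ordinal(const ExportView *ev, uint32_t ordinal)
{
    if (ordinal < ev->base || ordinal - ev->base >= ev->number_of_functions)
        return 0;
    return ev->functions[ordinal - ev->base];
}

/* Resolve an import by name: binary search the export-names array, then go
   through the export-ordinals array to the export-functions array. */
uint32_t export_rva_by_name(const ExportView *ev, const char *name)
{
    uint32_t lo = 0, hi = ev->number_of_names;

    assert(name != NULL);
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, ev->names[mid]);
        if (cmp == 0) {
            uint16_t index = ev->name_ordinals[mid];
            return index < ev->number_of_functions ? ev->functions[index] : 0;
        }
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return 0; /* name is not exported */
}

/* An entry whose RVA lies inside the export section is a forwarder string. */
int export_is_forwarder(const ExportView *ev, uint32_t rva)
{
    return rva >= ev->export_rva && rva - ev->export_rva < ev->export_size;
}
```

A return value of 0 stands for a gap or a missing name. A caller would pass any other result to export_is_forwarder() before treating it as the address of code or data.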

Import Directory

The next index in the data directory, IMAGE_DIRECTORY_ENTRY_IMPORT, is reserved for

the import directory of an executable/DLL. The RVA in this data directory entry points to the

import directory, which is nothing but a variable-sized array of

IMAGE_IMPORT_DESCRIPTORs, one for each imported DLL. The first field in this structure

is a union. If the Characteristics field in this union is 0, it indicates the end of the variable-sized

import descriptors array. Otherwise, the union is interpreted using the other member,

OriginalFirstThunk.

OriginalFirstThunk RVA of what Microsoft calls the Import Lookup Table (ILT).

Each entry in the ILT is a 32-bit number. If the MSB of this number is set, the

entry is treated as an import by ordinal, and bits 0 through 30 are treated as the

ordinal of the imported function. If the MSB is not set, the number is

treated as an RVA to the IMAGE_IMPORT_BY_NAME structure. The first

member of this structure is a hint for searching for the imported name in

the export directory of the imported DLL. The loader uses this hint as the

starting index in the export-names array when it does a binary search

while resolving the import reference. The hint is followed by the ASCIIZ

name of the import reference.

The WINNT.H file provides the IMAGE_SNAP_BY_ORDINAL macro to determine whether an ILT

entry is an import by ordinal. It also provides the IMAGE_ORDINAL macro to get the ordinal from the

32-bit number in the ILT. The ILT is a variable-sized array whose end is marked with a 0 entry.
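To make the encoding concrete, here is a minimal sketch that decodes ILT entries as the text describes them. The names ILT_ORDINAL_FLAG, ilt_is_ordinal, ilt_payload, and ilt_count are hypothetical local helpers written for this illustration; they are not the WINNT.H macros, although they play the same role.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MSB of a 32-bit ILT entry: set for an import by ordinal (hypothetical name). */
#define ILT_ORDINAL_FLAG 0x80000000u

/* Nonzero when the entry imports by ordinal rather than by name. */
int ilt_is_ordinal(uint32_t entry)
{
    return (entry & ILT_ORDINAL_FLAG) != 0;
}

/* Bits 0 through 30: the ordinal for a by-ordinal entry, or the RVA of the
   hint/name structure for a by-name entry. */
uint32_t ilt_payload(uint32_t entry)
{
    return entry & ~ILT_ORDINAL_FLAG;
}

/* Walk a zero-terminated ILT. Returns the number of entries and stores the
   number of by-ordinal entries in *by_ordinal. */
size_t ilt_count(const uint32_t *ilt, size_t *by_ordinal)
{
    size_t n = 0;

    assert(ilt != NULL && by_ordinal != NULL);
    *by_ordinal = 0;
    for (; ilt[n] != 0; ++n) {
        if (ilt_is_ordinal(ilt[n]))
            ++*by_ordinal;
    }
    return n;
}
```

The same loop shape applies to the IAT of an unbound image, because the two arrays are parallel.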

The other members in the IMAGE_IMPORT_DESCRIPTOR structure are as follows:

TimeDateStamp This field is set to 0 unless the imports are bound. Binding the

imports of a PE file is discussed shortly.

ForwarderChain This field is used only if the imports are bound.

Name RVA of the ASCIIZ string that stores the name of the imported DLL.

FirstThunk RVA of the Import Address Table (IAT). The IAT is another array parallel to

the ILT, unless the image is bound. Like the ILT, the IAT initially holds ordinals or

pointers to the IMAGE_IMPORT_BY_NAME structures. When the loader resolves the

import references, it replaces the entries in the IAT with the actual

addresses of the corresponding functions. Astonishingly, that is all it needs

to do to achieve dynamic linking; everything else is already set in place by

the linker and import librarian. Let’s see how all these components work

together to achieve dynamic linking.

DYNAMIC LINKING WITH PE FILES

Every DLL has an import library, which is either created using an import librarian or

generated by the linker itself while creating the DLL. The import library has stub functions with

the same names as the functions exported from the DLL. The import library also has

a .idata section containing an import table that has entries for all the functions from the DLL.

Each stub function is an indirect jump that refers to the appropriate entry in the IAT in the .idata

section. When an executable is linked with the import library, the linker resolves the imported

function calls to the stub functions in the import library. The linker also concatenates the .text

section from the import library that contains the stub functions with the .text section of the

generated executable. The .idata sections and, incidentally, the import directories are also

concatenated. The stage is now set for loading. While loading, the entries in the IAT are

replaced by the actual function addresses, and that’s it. Now when the function is called, the

control is transferred to the stub function that performs an indirect jump. As the IAT entry

contains the address of the actual function from the DLL, the control is transferred to the

required function.

The situation is a bit different if you use the new __declspec(dllimport) directive while

prototyping an imported function. In that case, the compiler itself generates an import table. In

addition, it generates an indirect call referring to the appropriate location in the generated IAT.

This method does away with the overhead of an extra jump.
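The two calling paths can be imitated in plain C with a table of function pointers. This is a simulation only, written to show the idea: iat, dll_square, loader_fix_iat, square_stub, and call_site_dllimport_style are hypothetical names, and a real stub is a single indirect jump instruction rather than a C function.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*ImportedFn)(int);

/* One-slot stand-in for the IAT; empty until the "loader" runs. */
ImportedFn iat[1] = { NULL };

/* Stand-in for the function that actually lives in the DLL. */
int dll_square(int x)
{
    return x * x;
}

/* What the loader does at load time: store the real address in the IAT slot. */
void loader_fix_iat(void)
{
    iat[0] = dll_square;
}

/* Import-library stub: the caller calls the stub, and the stub transfers
   control through the IAT slot (a call followed by an indirect jump). */
int square_stub(int x)
{
    assert(iat[0] != NULL); /* imports must be resolved first */
    return iat[0](x);
}

/* dllimport style: the call site itself goes through the IAT slot, so the
   intermediate stub, and its extra jump, disappears. */
int call_site_dllimport_style(int x)
{
    assert(iat[0] != NULL);
    return iat[0](x);
}
```

In both cases the only thing patched at load time is the table slot; the code that refers to the slot never changes.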

BINDING IMPORTS FOR A PE FILE

A major portion of loading time is spent on resolving the imports. The loader has to search for

each imported symbol in the export directory of the imported DLL to find the virtual address

of the symbol. The loading time can be drastically reduced if the IAT contains the actual

address of the symbol instead of the name or ordinal. Such a PE file is called a bound

image. The imported symbol addresses are calculated assuming that the imported DLL will be

loaded at its preferred base address at the time of loading. The

IMAGE_IMPORT_DESCRIPTORs in a bound PE file are also modified: the TimeDateStamp

field stores the timestamp of the imported DLL. At the time of loading, if this timestamp does

not match that of the DLL, the imports need to be resolved again. Because the IAT has been

modified and no longer contains the symbol names or ordinals, the ILT is used, in this case, to

resolve the imports.

The forwarded functions pose another problem with binding. The addresses of the forwarded

functions cannot be calculated at bind time, and so these functions have to be resolved at load

time. A list of all the forwarded functions for an imported DLL is maintained through the

ForwarderChain member in the corresponding IMAGE_IMPORT_DESCRIPTOR. This

member stores the index of a forwarded function in the IAT. The IAT entry at this index stores

the index of the next forwarded function, and so on, forming a list of forwarded functions. The

list is terminated by a ? entry.

BindImage()

The bind utility that is shipped with the Win32 SDK enables binding of PE files. The

BindImage() and BindImageEx() functions in IMAGEHLP.DLL also provide this functionality.

    BOOL BindImage(
        LPSTR ImageName,
        LPSTR DllPath,
        LPSTR SymbolPath
    );

PARAMETERS

ImageName The filename of the file to be bound. This can contain only a filename, a

partial path, or a full path.

DllPath A root path to search for ImageName if the filename contained in

ImageName cannot be opened.

SymbolPath A root path to search for the corresponding symbol file. If the symbol file is

stored separately, the header of the symbol file is changed to reflect the

changes in the PE file.

RETURN VALUES

If the function succeeds, the return value is TRUE; otherwise, it is FALSE.

BindImageEx()

This function is similar to BindImage() but allows more customization, such as

receiving a periodic callback while binding is in progress.

    BOOL BindImageEx(
        IN DWORD Flags,
        IN LPSTR ImageName,
        IN LPSTR DllPath,
        IN LPSTR SymbolPath,
        IN PIMAGEHLP_STATUS_ROUTINE StatusRoutine
    );

PARAMETERS

This function has the following additional parameters:

Flags Controls the behavior of the function. It is set to

an OR of the flag values defined in the IMAGEHLP.H file. The

following flag values are defined there:

BIND_NO_BOUND_IMPORTS Do not generate a new import address table.

BIND_NO_UPDATE Do not make any changes to the file.

BIND_ALL_IMAGES Bind all images that are in the call tree for this file.

StatusRoutine Pointer to a status routine. The status routine is called during

the progress of the image binding process.

RETURN VALUES

If the function succeeds, the return value is TRUE; otherwise, it is FALSE.

Calling BindImage is equivalent to calling BindImageEx with Flags as 0 and StatusRoutine as

NULL. That is, calling BindImage(ImageName, DllPath, SymbolPath) is equivalent to calling

BindImageEx(0, ImageName, DllPath, SymbolPath, NULL).

Resource Directory

The next index in the data directory, IMAGE_DIRECTORY_ENTRY_RESOURCE, refers to the

resource directory for a PE file. The resource directory and the resources themselves are

generally stored in a section named .rsrc. The resources are maintained in a tree

structure similar to that of a file system. The root directory contains subdirectories. A

subdirectory can contain subdirectories or resource data. The subdirectories can be nested to

any level, but Windows NT uses only a three-level structure. At each level, the resource

directory branches according to certain characteristics of the resources. At the first level, the

type of the resource–bitmap, menu, and so on–is considered. All the bitmaps are stored under

one subtree, all the menus are stored under another subtree, and so on. At the next level, the

name of the resource is considered, and the third level classifies the resource according to the

language ID. The third-level resource directory points to a leaf node that stores the actual

resource data.

A resource directory consists of summary information about the directory followed by the

directory entries. Each directory entry has a name or ID that is interpreted as a type ID, a name

ID, or a language ID, depending on the level of the directory. A directory entry can point either

to the resource data or to a subdirectory that has a similar format.

The format of the resource directory is defined as the IMAGE_RESOURCE_DIRECTORY

structure in WINNT.H.

Characteristics Currently unused. Set to 0.

TimeDateStamp Date and time when the resource was generated by the resource

compiler.

MajorVersion, MinorVersion Can be set by the user.

NumberOfNamedEntries Number of directory entries having string names. These entries

immediately follow the directory summary information and are

sorted.

NumberOfIdEntries Number of directory entries that use integer IDs as the names.

These entries follow the ones having string names.

This summary information is followed by the directory entries. Each directory entry has the format

defined by the IMAGE_RESOURCE_DIRECTORY_ENTRY structure in WINNT.H. This

structure is composed of two unions. The first union stores the name or ID of the entry. If the MSB is set,

then the lower 31 bits of this field are the offset, relative to the start of the resource directory,

of the Unicode string that stores the name of the entry. The Unicode string consists of the length

of the string followed by the 16-bit Unicode

characters. If the MSB is not set, then the union stores the integer ID of the resource. This first

union stores the type ID, the name ID, or the language ID, depending on the level of the

directory. The second union in the IMAGE_RESOURCE_DIRECTORY_ENTRY structure

points either to another resource directory or to the resource data, depending on its MSB. If

the bit is set, the lower 31 bits are the offset of a subdirectory; if the MSB is not set, they are

the offset of the resource data entry that forms a leaf node of the resource directory tree

structure. Both offsets are relative to the start of the resource directory. The format of the

resource data entry is defined as the

IMAGE_RESOURCE_DATA_ENTRY structure in the WINNT.H file and has the following

members:

OffsetToData RVA of the actual resource data.

Size Size of the resource data.

CodePage Code page used to decode code point values within the resource data.

Typically, the code page would be the Unicode code page.
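The MSB tests on the two unions of a directory entry can be sketched as follows. ResDirEntry and the four helper functions are hypothetical stand-ins for this illustration; the real IMAGE_RESOURCE_DIRECTORY_ENTRY structure in WINNT.H expresses the same layout with unions and bit fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local name for the most significant bit of either field. */
#define RES_HIGH_BIT 0x80000000u

/* Simplified stand-in for the two 32-bit unions of a directory entry. */
typedef struct {
    uint32_t name;           /* first union: string reference or integer ID  */
    uint32_t offset_to_data; /* second union: subdirectory or data entry     */
} ResDirEntry;

/* Nonzero when the entry is identified by a Unicode string, not an integer ID. */
int res_name_is_string(const ResDirEntry *e)
{
    return (e->name & RES_HIGH_BIT) != 0;
}

/* Lower 31 bits of the first union: locates the name string, or is the ID. */
uint32_t res_name_value(const ResDirEntry *e)
{
    return e->name & ~RES_HIGH_BIT;
}

/* Nonzero when the entry leads to another directory instead of a leaf. */
int res_is_directory(const ResDirEntry *e)
{
    return (e->offset_to_data & RES_HIGH_BIT) != 0;
}

/* Lower 31 bits of the second union: locates the subdirectory or data entry. */
uint32_t res_target(const ResDirEntry *e)
{
    return e->offset_to_data & ~RES_HIGH_BIT;
}
```

A tree walker would apply res_is_directory() at each of the three levels (type, name, language) and stop at the first entry for which it returns 0, which is the leaf that describes the resource data.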

Relocation Table

A PE file needs only based relocations. The linker resolves all the relative relocations,

assuming that the file will get loaded at the preferred base address. For example, if a function

foo has the RVA 0x100 and the preferred base address is 0x400000, the linker resolves the

call to foo as a call to address 0x400100. At run time, if the file is loaded at the preferred base

address of 0x400000, then no relocation needs to be performed. If, for some reason, the file

cannot be loaded at the base address of 0x400000, the loader needs to patch the call. If the

loader manages to load the file at a base address of 0x600000, it needs to change the call

address to 0x600100. In general, it needs to add the difference of 0x200000 to all the

to-be-patched locations. This process is called based relocation. The list of the

to-be-patched locations, also called fixups, is maintained in the relocation table, which is

generally present in the .reloc section and is pointed to by the data directory entry at the

IMAGE_DIRECTORY_ENTRY_BASERELOC index. The relocation table is nothing but a

series of relocation blocks, each representing the fixups for a 4K page. Each relocation block

has a header followed by the relocation entries for the corresponding page. The relocation

block format is defined as the IMAGE_BASE_RELOCATION structure in the WINNT.H file, and

it has the following fields:

VirtualAddress RVA of the page to be patched.

SizeOfBlock Total size of the relocation block, including the header and the relocation

entries.

Each relocation entry is a 16-bit word. The higher 4 bits indicate the type of relocation, and the

lower 12 bits are the offset of the fixup location within the 4K page. The address to be patched is

calculated by adding the base address for loading, the RVA of the page to be patched, and the

12-bit offset within the page. The relocation types are defined in the WINNT.H file; only two of

them are used on Intel machines:

IMAGE_REL_BASED_ABSOLUTE The relocation is skipped. This type can be used to pad a

relocation block so that the next block starts at a 4-byte

boundary.

IMAGE_REL_BASED_HIGHLOW The relocation adds the base-address difference to the

32-bit double word at the location denoted by the 12-bit

offset.
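Processing one relocation block might look like the following sketch. The function name apply_reloc_block and the two REL_BASED_* constants are local to this example (the constants carry the values that WINNT.H assigns to the corresponding IMAGE_REL_BASED_* types), and the code assumes that the host reads the 16-bit and 32-bit values in the same byte order in which the image stores them, as is the case on Intel machines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REL_BASED_ABSOLUTE 0 /* padding entry, skipped               */
#define REL_BASED_HIGHLOW  3 /* add the delta to a 32-bit double word */

/* Apply one relocation block to an image mapped at `image`.
   `block` points at the 8-byte header (VirtualAddress, SizeOfBlock), which is
   followed by the 16-bit relocation entries. `delta` is the actual base
   address minus the preferred base address. Returns the number of fixups. */
unsigned apply_reloc_block(uint8_t *image, const uint8_t *block, uint32_t delta)
{
    uint32_t page_rva, size_of_block;
    unsigned applied = 0;

    assert(image != NULL && block != NULL);
    memcpy(&page_rva, block, 4);
    memcpy(&size_of_block, block + 4, 4);

    for (uint32_t off = 8; off + 2 <= size_of_block; off += 2) {
        uint16_t entry;
        memcpy(&entry, block + off, 2);

        unsigned type = entry >> 12;       /* higher 4 bits */
        uint32_t offset = entry & 0x0FFFu; /* lower 12 bits */

        if (type == REL_BASED_HIGHLOW) {
            uint32_t value;
            memcpy(&value, image + page_rva + offset, 4);
            value += delta;
            memcpy(image + page_rva + offset, &value, 4);
            ++applied;
        }
        /* REL_BASED_ABSOLUTE and any other type are skipped in this sketch. */
    }
    return applied;
}
```

With the numbers used earlier, a double word holding 0x400100 becomes 0x600100 when delta is 0x200000. A complete loader would repeat this for every block, advancing by SizeOfBlock each time until the relocation table is exhausted.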

Debug Directory

The operating system is not concerned with the debug information present in a PE file. The

debugging tools access the debug information in a PE file. There are various debugging tools,

which expect the debug information in different formats. The corresponding compilers/linkers

also store the debug information in different formats. The PE format allows the debug

information to be stored in different formats, such as COFF, Frame Pointer Omission (FPO),

CodeView (CV4), and so on. A single file may contain debug information in more than one

format. The debug directory pointed to by the IMAGE_DIRECTORY_ENTRY_DEBUG entry in

the data directory is an array of debug directory entries, one for each debug information format.

The IMAGE_DEBUG_DIRECTORY structure in the WINNT.H file represents the format of a

debug directory entry.

Characteristics Currently unused. Set to 0.

TimeDateStamp Date and time when the debug data was created.

MajorVersion, MinorVersion Version of the debug data format.

Type Type of the debug data format.

SizeOfData Size of the debug data.

AddressOfRawData RVA of the debug data.

PointerToRawData File offset of the debug data.

Of the different debug information formats, three are frequently encountered in PE files. The

first one is the format used by the popular CodeView debugger. This format is defined in the

CV4 specification. The FPO format is used to describe nonstandard stack frames. Not all the

functions in a PE file need to have an FPO format debug entry. The functions without one are

assumed to have a normal stack frame. The third important format is COFF, which is the native

debug information format for PE files. The PE header itself points to the COFF symbol table.

The COFF debug information consists of symbols and line numbers.

Thread Local Storage

The threads executing in a process share the same global data space. Sometimes, it may be

required that each thread has some storage local to itself. For example, say a variable i needs

to be local for each thread.

In such a case, each thread gets a private copy of i. Whenever a particular thread is running,

its own private copy of i should be automatically activated. This is achieved in Windows NT

using the Thread Local Storage (TLS) mechanism. Let’s see how it works.

Do not confuse the local data of a thread with the local variables that are created on stack.

Each thread has a separate stack and local variables that are created and destroyed

separately for each thread as the stack grows and shrinks. In this section, the phrase local

data means global variables that have a separate copy for each thread.

The operating system maintains a structure called the Thread Environment Block (TEB) for

every thread running in the system. The FS segment register is always set such that the

address FS:0 points to the TEB of the thread being executed. The TEB contains a pointer to

the TLS array. The TLS array is an array of 4-byte DWORDs. Similar to the TEB, a separate

TLS array is present for each thread. A thread can store its local data in the TLS array.

Generally, programs store pointers to local data in some slot in the TLS array. The slot

allocation for the TLS array is controlled by the API functions TlsAlloc() and TlsFree(). The

Win32 API also provides functions to set and get the value at a particular index in the TLS

array.

It is cumbersome to access the thread-specific data using the API functions. An easier way is

to use the __declspec(thread) specification while declaring global variables that need to have

a private copy for each thread. All such variables are gathered by the compiler/linker, and a

single TLS array index is automatically allotted to this bunch of data. The TLS array entry at

this index contains the pointer to a local data buffer that stores all these variables. These

variables are accessed as any other normal variable in the program. Whenever such a

variable is accessed, the compiler takes care to generate the code to access the TLS array

entry and the data at a proper offset within the local data buffer.

This discussion is a bit off track. However, it is necessary before discussing the

IMAGE_DIRECTORY_ENTRY_TLS data directory entry. The TLS directory structure is

defined as IMAGE_TLS_DIRECTORY in the WINNT.H file. Let’s have a look at this structure and

see how it fits in the TLS mechanism.

StartAddressOfRawData Each time a new thread is created, the operating system allocates a

new local data buffer for the thread and initializes the buffer with the

data that is pointed to by this field. Note that this address is not an

RVA, but it is a proper virtual address that has a relocation entry in

the .reloc section.

EndAddressOfRawData Virtual address of the end of the initialization data. The rest of the

local data buffer is filled with zeros.

AddressOfIndex Address in the data section where the loader should store the

automatically allotted TLS index. The code accessing TLS variables

accesses the index from this location.

AddressOfCallBacks Pointer to a null-terminated array of TLS callback functions. Each

function in this array is called whenever a new thread is created.

These functions can perform additional initialization (for example,

calling constructors) for the TLS data. The TLS callback has the

same parameters as the DLL entry-point function.

SizeOfZeroFill Size of the local data that is to be initialized to zero. The total size of

the local data is (EndAddressOfRawData - StartAddressOfRawData)

+ SizeOfZeroFill.

Characteristics Reserved.
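The way these fields combine when a thread is created can be sketched in a few lines of C. TlsTemplate, tls_total_size, and tls_new_thread_buffer are hypothetical names; ordinary pointers stand in for the virtual addresses, and calloc() stands in for whatever allocator the operating system really uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the three TLS directory fields used here. */
typedef struct {
    const uint8_t *start_of_raw_data; /* StartAddressOfRawData */
    const uint8_t *end_of_raw_data;   /* EndAddressOfRawData   */
    size_t         size_of_zero_fill; /* SizeOfZeroFill        */
} TlsTemplate;

/* (EndAddressOfRawData - StartAddressOfRawData) + SizeOfZeroFill */
size_t tls_total_size(const TlsTemplate *t)
{
    assert(t->end_of_raw_data >= t->start_of_raw_data);
    return (size_t)(t->end_of_raw_data - t->start_of_raw_data)
           + t->size_of_zero_fill;
}

/* Build the local data buffer for one new thread: copy the initialization
   data, leave the rest zeroed. The caller frees the buffer. Returns NULL if
   the allocation fails. */
uint8_t *tls_new_thread_buffer(const TlsTemplate *t)
{
    size_t init = (size_t)(t->end_of_raw_data - t->start_of_raw_data);
    size_t total = tls_total_size(t);
    uint8_t *buf = calloc(total ? total : 1, 1); /* calloc zero-fills */

    if (buf != NULL)
        memcpy(buf, t->start_of_raw_data, init);
    return buf;
}
```

Each thread gets its own buffer from the same template, so a write made by one thread to its copy is invisible to the others. In the real mechanism, the pointer to this buffer is what gets stored in the thread's TLS array at the automatically allotted index.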

Section Table

We’ve roamed through the PE format without bothering about the section formats. This is

possible because of the data directory that directly locates the important pieces of information

within a PE file. You need not know about the sections at all to interpret a PE file. Nevertheless,

in case you need to modify a PE file, you may be required to know about the sections and

section headers. For example, you may want to add, remove, or extend a particular section,

and this requires changes to the section table, among other things.

As mentioned earlier, the PE header is followed by the section table. The section table is an

array of section headers. The format of the section header is defined by the

IMAGE_SECTION_HEADER structure in the WINNT.H file. The members of a section header

are as follows:

Name Character array of size

IMAGE_SIZEOF_SHORT_NAME. Contains the

name of the section.

VirtualSize Size of the section when loaded in memory.

VirtualAddress RVA of the section data when loaded in memory.

SizeOfRawData Size of the section as stored in the file. This is

equal to the VirtualSize rounded to the next file

alignment multiple.

PointerToRawData File offset of the section data. If you

memory map a PE file, this field needs to be

used to get to the section data.

PointerToRelocations Used only in the object files.

PointerToLinenumbers File offset of the COFF-style line number

information.

NumberOfRelocations Used only in the object files.

NumberOfLinenumbers Number of records in the line number

information.

Characteristics The attributes of the section. It is an OR of the

section characteristics flags defined in the

WINNT.H file. Some of the important flags are as

follows:

IMAGE_SCN_CNT_CODE Section contains executable code.

IMAGE_SCN_CNT_INITIALIZED_DATA Section contains initialized data.

IMAGE_SCN_CNT_UNINITIALIZED_DATA Section contains uninitialized data.

IMAGE_SCN_LNK_REMOVE Section will not become part of the loaded

image. The .debug section may have this flag

set.

IMAGE_SCN_MEM_DISCARDABLE Section can be discarded. The relocation table

and debug information can be discarded after

the loading process is over. Hence, the .debug

and .reloc sections have this flag set.

IMAGE_SCN_MEM_NOT_CACHED Section cannot be cached.

IMAGE_SCN_MEM_NOT_PAGED Section is not pageable.

IMAGE_SCN_MEM_SHARED Section can be shared in memory. If a DLL has

the data section with this flag set, all the

instances of the DLL in different processes

share the same data.

IMAGE_SCN_MEM_EXECUTE Section can be executed. For the code sections,

both the IMAGE_SCN_CNT_CODE and

IMAGE_SCN_MEM_EXECUTE flags are set.

IMAGE_SCN_MEM_READ Section can be read.

IMAGE_SCN_MEM_WRITE Section can be written to.
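One common use of the section table is translating an RVA into a file offset when a PE file is memory mapped as a plain file. The sketch below shows the arithmetic under simplifying assumptions: SectionInfo is a hypothetical subset of the section header fields, and rva_to_file_offset is a name invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the section header fields needed for the translation. */
typedef struct {
    uint32_t virtual_size;        /* VirtualSize                       */
    uint32_t virtual_address;     /* VirtualAddress: RVA when loaded   */
    uint32_t size_of_raw_data;    /* SizeOfRawData: size in the file   */
    uint32_t pointer_to_raw_data; /* PointerToRawData: file offset     */
} SectionInfo;

/* Translate an RVA to a file offset. Returns 0 on success, or -1 if no
   section's raw data covers the RVA. */
int rva_to_file_offset(const SectionInfo *sections, size_t count,
                       uint32_t rva, uint32_t *file_offset)
{
    assert(sections != NULL && file_offset != NULL);
    for (size_t i = 0; i < count; ++i) {
        const SectionInfo *s = &sections[i];
        if (rva >= s->virtual_address &&
            rva - s->virtual_address < s->size_of_raw_data) {
            /* Same distance from the section start, in the file and in memory. */
            *file_offset = s->pointer_to_raw_data + (rva - s->virtual_address);
            return 0;
        }
    }
    return -1;
}
```

The failure case matters: an RVA that falls in the part of a section beyond its raw data has no bytes in the file at all.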

LOADING PROCEDURE

Let’s see how the loader interprets a PE file and prepares a memory image for execution. The

loader needs to find the free virtual address space to map the file in memory. The loader tries

to load the image at the preferred base address. After this is done, the loader maps the

sections in memory. The loader goes through the section table and maps each section at the

address calculated by adding the RVA of the section to the base address. The page attributes

are set according to the section’s characteristic requirements. After mapping the section in

memory, the loader performs based relocation if the base address is not equal to the preferred

base address. Then, the import table is checked and the required DLLs are loaded. The same

procedure for loading an executable–mapping sections, based relocation, resolving imports,

and so on–is applied while loading a DLL. After loading each DLL, the IAT is fixed to point to

the actual imported function address.

That’s it! The image is ready for execution.
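The section-mapping step of this procedure can be imitated with ordinary buffers. This is a rough sketch under stated assumptions, not how the loader is implemented: LoadSection, map_sections, and relocation_delta are hypothetical names, a byte array stands in for the reserved virtual address range, and copying stands in for real page mapping and page attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the section header fields used while mapping. */
typedef struct {
    uint32_t virtual_size;
    uint32_t virtual_address;     /* RVA of the section when loaded  */
    uint32_t size_of_raw_data;
    uint32_t pointer_to_raw_data; /* file offset of the section data */
} LoadSection;

/* Place each section's raw data at base + RVA inside `image`. Bytes that
   have no raw data behind them stay zero. Returns 0 on success, or -1 if a
   section does not fit in the image or in the file. */
int map_sections(uint8_t *image, size_t image_size,
                 const uint8_t *file, size_t file_size,
                 const LoadSection *sections, size_t count)
{
    assert(image != NULL && file != NULL && sections != NULL);
    memset(image, 0, image_size);
    for (size_t i = 0; i < count; ++i) {
        const LoadSection *s = &sections[i];
        uint32_t n = s->size_of_raw_data < s->virtual_size
                         ? s->size_of_raw_data : s->virtual_size;
        if (s->virtual_address > image_size ||
            n > image_size - s->virtual_address)
            return -1;
        if (s->pointer_to_raw_data > file_size ||
            n > file_size - s->pointer_to_raw_data)
            return -1;
        memcpy(image + s->virtual_address, file + s->pointer_to_raw_data, n);
    }
    return 0;
}

/* Value added to every fixup when the preferred base address is not available. */
uint32_t relocation_delta(uint32_t actual_base, uint32_t preferred_base)
{
    return actual_base - preferred_base;
}
```

After this step, a loader following the procedure above would apply based relocations if relocation_delta() is nonzero, and then load the imported DLLs and fix the IAT.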

SUMMARY

Microsoft introduced the Portable Executable (PE) file format with Windows NT. The PE format

serves as the executable file format for all the 32-bit Microsoft operating systems (that is, the

various versions of Windows NT and Windows 95/98) though these operating systems still

support the older executable file formats, including the DOS executable file format.

Various components in a PE file are addressed using the relative virtual address (RVA). The

IMAGEHLP.DLL provides us with utility functions to memory map a PE file to find the address

in the memory corresponding to the RVA specified in the PE file. A PE file is composed of the

file headers, the data directory, the section table, and the various sections. The data directory

points to the important parts of the PE file: the export directory, the import directory, the

relocation table, the debug directory, and the Thread Local Storage. The export directory lists

the symbols exported from the PE file, which is most likely a DLL. The import directory lists all

the symbols imported by the PE file. When a PE file is loaded in memory for execution, the

loader resolves the imported symbols to actual virtual addresses in the DLL that exports the

symbols. This process is termed dynamic linking.

The PE headers are followed by the section table that points to all the sections, including the

ones pointed to by the various data directory entries. The loader reads the section table and

maps various sections of a PE file in memory. Then it prepares the image for execution by

relocating the image for the mapped address and resolving various imported symbols after

loading the required DLLs.