Programming and porting in a Microsoft 64 bit world Shanthal Vasanth Hewlett Packard

Dec 13, 2015


Antony Green
Page 1

Programming and porting in a Microsoft 64 bit world

Shanthal Vasanth

Hewlett Packard

Page 2

Agenda

• Why 64 bit CPU?

• Introducing Itanium (EPIC)

• About Opteron and EM64T

• HP Itanium server platforms

• 64 bit porting

Page 3

Why 64 bit CPU?

Page 4

Why 64 bit CPU?

• Ability to use massive amounts of memory
  – Data in memory is faster to reach than data on disk
  – A 32-bit CPU can address at most 4 GB, while a 64-bit CPU can address up to 2^64 bytes (16 EiB, roughly 18 exabytes)

• Ability to handle larger numbers natively
  – 32-bit CPUs rely on multi-instruction software sequences for integer values of 2^32 and above
  – 64-bit CPUs handle integer values up to 2^64 - 1 in a single register

• Process data/instructions in chunks of 64 bits in a single clock cycle

Page 5

Introducing Itanium (EPIC)

Page 6

Architectural Evolution

[Figure: performance versus time, rising through CISC, RISC, Superscalar, and EPIC (Intel® Itanium® Processor). Age labels on the chart: 20+, 10-15+, 9+, and 2+ (Intel® Itanium® Processor).]

* Source: Computer Organization and Architecture, 1999, W. Stallings

EPIC (Explicitly Parallel Instruction Computing)

• Largest, most demanding workloads require a new approach
• Benefits from the experience of past architectures
• Goal to move beyond RISC performance bounds with explicit parallel instruction streams
• Developed by the best CPU and server architecture minds in the industry

Performance Enhancement

• Maximize instructions executed in parallel
• Improve floating point
• Speculation & predication to overcome memory latency & branch misses (common to databases)

Massive On-chip Resourcing

• Large and fast on-die cache
• Expanded number of registers (general, FP, branch)
• Efficient management engine
  – Register stack engine
  – Register windowing
  – 4 GB page size

Scalable & Reliable

• Systems scalable to 512 CPUs and beyond
• 64b for large memory addressability
• High internal & external bus & memory bandwidth
• MCA (Machine Check Architecture) for error detection, correction, & logging

RISC (Reduced Instruction Set Computing)

• Goal to optimize performance with simpler instructions (this effort coined the term CISC)

Page 7

Traditional Architecture Limiters

Today's processors are often 60% idle.

[Figure: original source code → compiler → sequential machine code → hardware (parallelized code) → multiple functional units]

Execution units available but used inefficiently.

Page 8

Technology case for a new Architecture

• Superscalar complexity growth
  – Functional unit area grows linearly with the number of units
  – Scheduler area grows as the square of the number of units

Cost-performance reaches a point of diminishing returns.

Page 9

Itanium Hardware/Software Synergy

Traditional Architecture

[Figure: original source code → compiler → sequential machine code → hardware (implicitly parallel) → multiple execution units]

Itanium-based

[Figure: original source code → compiler → parallel machine code → hardware with massive resources → multiple execution units, resources used more efficiently]

Page 10

Itanium architecture

• Developed by HP and Intel
• Next pervasive computer architecture
• Explicitly Parallel Instruction Computing (EPIC)
• Supports multiple operating systems: Windows, HP-UX, and Linux

Page 11

Itanium architecture features

• Explicit parallelism
• Predication
• Enhanced speculation
• Floating-point architecture
• Multi-processor scalability
• Large number of registers

Page 12

Itanium's unique advantages

• Built-in instruction-level parallelism
• Massive on-chip resources
• Up to 2x instructions/clock cycle
• CPU clock and compiler maturity curve
• Fewer memory loads/stores on complex workloads
• Huge memory address spaces
• 60% shorter memory pipeline
• Latency avoidance
• Instruction predication
• Data and control speculation

Customer benefits

• Higher performance in FP-intensive and complex technical workloads
• 2X performance of x86, at any clock speed, for faster:
  – Image manipulation
  – Voice encoding/recognition
  – Encryption
• Industry-leading performance and scalability for demanding and unpredictable commercial applications:
  – OLTP
  – Database query (TPC-H)
  – Sorting

Intel Itanium architecture: ultimate technology for technical computing

Page 13

About Opteron & EM64T

Page 14

Opteron

• Integrated DDR memory controllers (5.3 GB/s)
  – Enable direct access to memory, reducing latency

• HyperTransport links
  – A chip-to-chip interconnect technology that operates at memory speeds (6.4 GB/s)

• AMD 64-bit extensions on top of IA-32 instructions
  – 32-bit binaries run natively

• Does not support Streaming SIMD Extensions 3 (SSE3)

Page 15

Xeon/EM64T - Extended Memory 64 Technology

• 64-bit instruction set extensions to IA-32
  – Compatible with AMD's 64-bit instructions

• Traditional north bridge architecture
  – Memory is accessed through the north bridge
  – Advantage: no CPU re-spin is needed to support next-generation memory

• Does not support 3DNow!

Page 16

How next-generation x86 extensions address needs for improved performance

AMD Opteron™ utilizes 3 key innovations

• Memory controller on board with the processor, allowing the controller to run at the full speed of the processor core.
• The HyperTransport link between the processors, and between the processors and I/O, is also extremely low-latency (6.4 GB/s of data throughput).
• Capability for maximum 32-bit performance, and excellent 64-bit performance.

[Figure: AMD64 core with L1 instruction cache, L1 data cache, L2 cache, DDR memory controller, and HyperTransport™]

Multitude of features enhance Xeon performance

• Increased frequency headroom; 3.6 GHz/1 MB cache
• 800 MHz FSB: 1.5x system bus speed vs. the 533 MHz platform
• DDR2-400: faster memory technology
• PCI Express 4x - 8x: faster I/O

Xeon extensions add

• Additional registers: 8 SSE & 8 general purpose
• Double precision (64-bit) integer support
• Extended memory addressability: 64-bit pointers, registers

Page 17

Opteron Compared to Itanium 2

[Chart: Opteron* Processor versus Itanium® 2 Processor, one bar per attribute]

• Architecture: Opteron, x86 with extra memory bits; Itanium 2, Itanium architecture
• Core frequency: Opteron ~2.0 GHz; Itanium 2 1.5 GHz
• System bus bandwidth: Opteron 6.4 GB/s (16x16 HTT); Itanium 2 6.4 GB/s
• Memory addressing: Opteron 1 TB; Itanium 2 1024 TB
• On-die cache: Opteron 1 MB; Itanium 2 6 MB
• On-die registers: Opteron 40 registers; Itanium 2 264 application registers + 64 predicate registers*
• Pipeline stages: Opteron 12; Itanium 2 8
• Issue ports: Itanium 2 11
• Instructions / clk: Opteron 3 instructions / cycle; Itanium 2 6 instructions / cycle
• Execution units: Opteron 3 integer, Fmul, Fadd, 1 for SIMD, 2 load or 2 store; Itanium 2 6 integer, 3 branch, 2 FP (FMAC), 1 SIMD, 2 load and 2 store

* Intel's EPIC technology includes 64 single-bit predicate registers to accelerate loop unrolling and branch intensive code execution.

Page 18

Opteron and Itanium: A comparison

                              Opteron        Madison
Clock (for this comparison)   1.8 GHz        1.5 GHz
Physical address space        40 bit         50 bit
Virtual address space         48 bit         56 bit
Int (=GRs) registers          16             128
Float registers               8              128
Supported page sizes          4 KB, 2 MB     4 KB … 4 GB
Instructions/clock            3              6
On-die cache                  1 MB           9 MB
IA32 applications             Native         Emulated
Native 64-bit OS              Win/Linux      Win/Linux/UX
Memory access                 On-chip MCU    NorthBridge

Page 19

HP Itanium OS & Server Choices

Page 20

Customer choice: customer-specific needs driven

HP Integrity Servers (1 to 128-way*, Itanium processor architecture)

• Large scale applications and databases
• Complex workloads, technical and commercial
• Primarily back end DB & application tier
• Enterprise scale up and scale out
• Server consolidation

HP ProLiant Servers (1 to 8-way, x86 processor architecture)

• Small to medium scale applications and databases
• Well-defined, less-complex workloads
• Primarily front-end/network edge & application tier
• Scale out and small to mid-size scale up

[Figure label: ProLiant and Integrity servers]

* Future

Page 21

HP delivering choice

ProLiant Servers with 64-bit extensions

• Price/performance leadership with 32/64-bit co-existence
• Highest clock speed, peak performance
• Extensive 32-bit, and emerging 64-bit ecosystems
• Scale-out for simple, highly parallel workloads (2p nodes)
• Linux & Windows

• Price/performance leadership with 32/64-bit co-existence
• 32-bit throughput performance leadership
• Highest bandwidth for sustained performance
• Extensive 32-bit, and emerging 64-bit ecosystems
• Scale-out for moderate workloads (2p/4p nodes)
• Linux & Windows

Integrity Servers

• Highest performance 64-bit processor core for sustained performance
• Highest SMP scalability (to 128p)
• HP-UX for mission-critical technical computing
• Extensive 64-bit ecosystem (and 32/64-bit on HP-UX)
• Scale-up and scale-out for complex workloads
• HP-UX, Linux & Windows

Page 22

Complete choice to meet diverse needs across the data center

[Figure: workloads mapped to server families across three tiers; overall a mix of ProLiant, Integrity & NonStop]

Front-end (1 - 4 processors): ProLiant

• Workgroup: file, print
• Mail: messaging
• Infrastructure: directory, DNS, firewall, security
• Web: services, caching, proxy
• HPC: parallel computing, clustering

Application & data-tier (4 - 8 processors): ProLiant & Integrity systems

• OLTP: mid size
• App tier: ERP, biz logic, app server
• BI: biz intelligence / SCM planning
• ERP medium: back-end for CRM, SCM, ERP

Large scale data tier (8 - 64+ processors): Integrity & NonStop servers

• OLTP large: large size DB, high transaction volumes
• ERP large: back-end for CRM, SCM, ERP, large data sets
• BI: biz intelligence, very large data sets
• HPC: large SMP, large memory

Page 23

Unique options on Itanium

[Figure: three operating systems booting natively on Itanium, with the application types each one runs]

• W2003-64 (native boot)
  – MS Win64 applications compiled for IPF (64-bit)
  – MS W2K apps for IA32

• Linux64 (native boot)
  – Linux 64 applications compiled for IPF (64-bit)
  – Linux 64 applications (also Linux 32)

• HP-UX 11i (native boot)
  – HP-UX applications compiled for IPF
  – PA-Apps (through ARIES)
  – HP-UX ABI, Linux ABI

Page 24

Itanium®-based HP servers: the most scalable roadmap from any vendor

[Figure: roadmap timeline, 2001 to 2004, leading to a future Itanium processor. Roadmap subject to change.]

Scalable (cell-based) servers: Itanium 16-way (rx9610), rp7410 Madison, rp8400 Madison, Superdome Madison

Entry servers: Itanium 4-ways (DL 590, rx4610), Itanium 2 2-way 2U (rx2600), Itanium 2 4-way 7U (rx5670), Itanium 2 Madison 4-way 4U, rp5400

Page 25

HP Itanium Processor Family

– HP's entire family of Itanium-based servers, including the midrange 8-way and 16-way and the high-end Superdome 64-way, will support the 64-bit version of Microsoft Windows Server 2003

– HP is successfully running Windows on HP's Superdome server configured with 64 Itanium 2 processors and 512 GB memory

– HP is successfully running Windows, HP-UX, and Linux in separate partitions on a 64-processor Superdome

[Figure: server range: 2-way, 4-way, 8+ way (8-socket), 16+ way (16-socket), 64+ way (64-socket)]

Page 26

64 bit porting

Page 27

Porting to 64 bit : When?

• Your application manipulates very large data sets and needs to exceed the 4 GB (32-bit address space) limit.
• Your application is I/O bound and can use memory in place of disk I/O.
• Your platform does not have a 32-bit option.
• You already have a 64-bit application.

Page 28

Categories of applications good for 64 bit

• Database:
  – Benefit greatly from larger physical address space
  – Run entire database out of memory rather than from disk

• Email:
  – Larger address space: larger number of users per server

• Terminal Server:
  – Avoiding kernel address space limitations when hosting multiple applications
  – Example: Microsoft Office hosting on Terminal Server in a 64-bit environment supports 50% more users than in a 32-bit environment

Page 29

• Business Applications:
  – Apps that have high memory requirements
  – Apps that have high computational requirements

• Technical / Scientific computing:
  – Need for a large virtual and physical address space
  – Complex computations

Page 30

Choose an appropriate porting model

• 64-bit full port
• Small address space with 64-bit pointers
• Small address space with 32-bit pointers
• Win32 application

Page 31

General porting process

• Find all your source code – check under the couch.

• Acquire 64-bit versions of all 3rd party libraries.

• Port all of your own libraries to 64-bit.

• Rewrite assembly code to be 64-bit.

• Fix migration warnings

Page 32

Porting steps:

• Compile the code in Win32 and eliminate warnings
• Remove the /FD compiler flag (when exporting makefiles from Visual Studio)
• Change the linker option for machine type from IX86 to IA64
• Remove the /Gm compiler flag (minimal rebuild, ignored by IA64)
• Add compiler options -Wp64 and -W4
• For 32-bit pointer variables, add -Ap32
• For a 4 GB (small) address space, use -As32
• Clean all 32-bit objects, rebuild for 64 bit
• Fix all warnings (use conditional compilation)
• Test

Page 33

32 / 64 bit issues – Common pitfalls

– Storing pointers in ints
– Truncating function return values
– Casting pointers to ints or ints to pointers
– Using unnamed or unqualified bit fields
– Using literals and masks that assume data sizes
– Hard coding size of data types
– Hard coding bit shift values
– Inline assembly

Page 34

Example 1:

What is wrong with the following program ?

#include <stdlib.h>

void main() {

int *p = malloc(100);

int i = (int)p;

}

Page 35

Example 2:

void func(int p);

char *ptr;

func(ptr);

Page 36

Example 3:

Why would the following C program dump core in 64 bit mode?

void main() {

char *p = malloc(2000);

*p = 'A';

}

Page 37

For more information

• HP DSPP Site
  – www.hp.com/dspp

• For information on HP products in The Netherlands
  – www.hp.nl
  – Contact addresses
    • Wouter Smit: [email protected]
    • Laurent Van Veen: [email protected]

• Route64 training
  – www.route64.net
