Accelerated Processing Unit

Apr 06, 2018

Rahul Sampath
  • 8/3/2019 Accelerated Processing Unit

    1/15

    Accelerated Processing Unit

    CHAPTER 1

    INTRODUCTION

    Imagine a PC that:

    Recognizes your gestures without a remote

    Responds to your touch or voice to do your bidding

    Supports bi-directional high-definition video chat over links with limited bandwidth

    Finds and tags the photos and videos in your library that contain particular faces, places or objects

    Helps you sort through your photo libraries to eliminate duplicates saved with different file names

    Enhances the videos you've created with regard to color, focus and image stability

    Up-scales even low-quality content to seamlessly match the capabilities of your HD display

    Adds stereoscopic 3D realism to 2D content

    Supports immersive, multi-monitor 3D gaming experiences

    Department of Electronics and Communication,College of Engineering , Adoor 1

    Sells at price points well within reach of the mainstream consumer.

    Many of these capabilities exist today piecemeal in labs, running on expensive,
    workstation-class computers that cost as much as tens of thousands of dollars. Why
    haven't we progressed further, faster in delivering these capabilities to the mainstream?

    The semiconductor industry prides itself on rapid improvements in system performance, but
    hardware that runs fast enough to enable these advanced capabilities still costs far too
    much to enable high-volume deployment. Software developers, always tuned to market
    realities as well as technology, have focused their efforts on applications that run well on
    the dual- and quad-core x86 processors that make up the bulk of today's mainstream
    system offerings. But change is in the air; in 2011, affordable mainstream systems that
    can support these advanced capabilities are set to enter the market.

    You've probably heard this story before. Every two years, advances in semiconductor
    technology allow chip architects to double the number of transistors they can fit in a
    given area of silicon. Over the past decade, these extra transistors have been used to
    increase the size of on-chip caches and add more x86 processor cores to designs, making
    today's CPUs the fastest processors ever. Even the slowest contemporary CPUs have more
    than enough performance to handle traditional office productivity, Internet browsing and
    e-mail applications, which long ago ceased to be limited by CPU speed. But as fast as they
    are, today's CPUs lack the performance to deliver a vivid, modern computing experience on
    their own. The latest applications must deal with vast amounts of data and require
    hundreds, if not thousands, of individual threads to manipulate the massive databases
    needed to recognize an object in a scene, the meaning in a sentence, or an anomaly in an
    x-ray image. Not surprisingly, traditional CPU architectures and application programming
    tools optimized for scalar data structures and serial algorithms fit poorly with these new
    vector-oriented, multi-threaded data-parallel models.

    Fortunately, innovative architectures and tools better suited for these new workloads have
    emerged. Graphics processing units (GPUs), originally intended to enhance 3D
    visualization, have evolved into powerful, programmable vector processors that can
    accelerate a wide variety of software applications. Software tools like DirectCompute and
    OpenCL permit developers to create standards-based applications that combine the power
    of CPU cores and programmable GPU cores, and run on a wide variety of hardware
    platforms. A few ambitious independent software vendors (ISVs) have already added
    support for these new vector capabilities to their most advanced products, even if they
    had to structure their code around proprietary hardware and software interfaces to get the
    job done.

    Advanced Micro Devices' (AMD's) forthcoming Accelerated Processing Units
    (APUs) build upon this momentum and take PC computing to the next level. These new
    processors are being designed to accelerate multimedia and vector processing
    applications, enhance the end user's PC experience, reduce power consumption, and
    offer a superior visual graphics experience at mainstream system price points. More
    importantly, these APUs will enable ISVs to create new generations of applications and
    user interfaces limited perhaps only by the inventiveness of their developers, rather than
    by the constraints of the traditional CPU architectures that have dominated the computer
    industry for decades.

    CHAPTER 2

    ACCELERATED PROCESSING UNIT

    At the most basic level, Accelerated Processing Units combine general-purpose x86
    CPU cores with programmable vector processing engines on a single silicon die. APUs also
    include a variety of critical system elements, including memory controllers, I/O
    controllers, specialized video decoders, display outputs, and bus interfaces, but the real
    appeal of these chips stems from the inclusion of both scalar and vector hardware as
    full-fledged processing elements. A CPU and a basic graphics unit have been lashed
    together in a single package before (VIA's CoreFusion, for example), but not with truly
    programmable GPUs like those in AMD Fusion, let alone GPUs that can be programmed
    using high-level industry-standard tools like DirectCompute and OpenCL. AMD is best
    situated to address this engineering challenge, as it is currently the only company that
    has access to extensive IP resources (e.g. patents and engineering expertise) in both x86
    processor technology and industry-leading GPU technology. In fact, AMD's recognition
    that it needed proven GPU technology for future converged products drove its 2006
    acquisition of ATI Technologies.

    APUs are set to arrive in a variety of shapes and sizes adapted to the requirements of
    their target markets. AMD has disclosed that its first APUs, code-named Llano and
    Ontario, are designed for mainstream desktop and notebook platforms and for
    thin-and-light notebooks, netbooks and slates, respectively. Both of these APUs will
    combine multiple superscalar x86 processor cores with an array of programmable SIMD
    engines leveraged from AMD's discrete graphics portfolio. The key aspect to note is that
    all the major system elements, namely the x86 cores, the vector (SIMD) engines, and a
    Unified Video Decoder (UVD) for HD decoding tasks, attach directly to the same
    high-speed bus, and thus to the main system memory. This design concept eliminates one
    of the fundamental constraints that limit the performance of traditional integrated
    graphics controllers (IGPs).

    Until now, transistor budget constraints typically mandated a two-chip solution for such
    systems, forcing system architects to use a chip-to-chip crossing between the memory
    controller and either the CPU or GPU. These transfers affect memory latency, consume
    system power and thus impact battery life. The APU's scalar x86 cores and SIMD engines
    share a common path to system memory to help avoid these constraints.

    Total system performance can be further enhanced through the addition of a discrete GPU.
    The common architectures of the APU and GPU allow for a multi-GPU configuration where
    the system can scale to harness all available resources for exceptional graphics and
    enable truly breathtaking overall performance.

    Although the APU's scalar x86 cores and SIMD engines share a common path to system
    memory, first-generation APU implementations divide that memory into regions managed
    by the operating system running on the x86 cores and other regions managed by software
    running on the SIMD engines. The APU provides high-speed block transfer engines that
    move data between the x86 and SIMD memory partitions. Unlike transfers between an
    external frame buffer and system memory, these transfers never hit the system's external
    bus. Clever software developers can overlap the loading and unloading of blocks in the
    SIMD memory with execution involving data in other blocks. Insight 64 anticipates that
    future APU architectures will evolve towards a more seamless memory management model
    that allows even higher levels of balanced performance scaling.

    Just as AMD's architects have woven x86 cores and GPU cores into a single hardware
    fabric, astute software developers can now begin to weave high-performance vector
    algorithms into programs previously constrained by the limited computational
    capabilities of conventional scalar processors, even when arranged in multi-core
    configurations. In just a few years, machines equipped with programmable GPUs are
    expected to make up a meaningful portion of the installed base of PCs. Software from
    ISVs who take advantage of these enhanced capabilities will be able to perform well
    beyond the capability of packages that lack support for these features.
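    The benefit of overlapping block transfers with execution, as described earlier in
    this chapter, can be estimated with a small timing model. The sketch below is only an
    illustration: the function names, the per-block times, and the assumptions (separate
    load and unload engines, and enough SIMD-side buffers that a load never waits for a
    free block) are hypothetical and do not describe any actual APU.

```python
def serial_time(n_blocks, load, compute, unload):
    """Total time when every block is loaded, processed and unloaded in turn."""
    return n_blocks * (load + compute + unload)


def overlapped_time(n_blocks, load, compute, unload):
    """Total time when transfers for one block overlap execution on another.

    Assumption (not from the source): the load engine, the SIMD engines and
    the unload engine each handle one block at a time and run concurrently.
    """
    load_done = compute_done = unload_done = 0.0
    for _ in range(n_blocks):
        # The load engine starts the next block as soon as it is free.
        load_done += load
        # Execution needs the block loaded and the SIMD engines idle.
        compute_done = max(load_done, compute_done) + compute
        # Unloading needs the result ready and the unload engine idle.
        unload_done = max(compute_done, unload_done) + unload
    return unload_done


if __name__ == "__main__":
    # Hypothetical per-block costs in arbitrary time units.
    print("serial:    ", serial_time(4, load=2, compute=5, unload=1))
    print("overlapped:", overlapped_time(4, load=2, compute=5, unload=1))
```

    In this model the overlapped schedule is bounded by the slowest stage, so the gain is
    largest when transfer and execution times are comparable.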

    CHAPTER 3

    REASONS FOR MERGING

    The CPU and the GPU have been on this collision course for quite some time;
    although we often refer to the CPU as a general purpose processor and the GPU as a
    graphics processor, the reality is that they are both general purpose. The GPU is merely a
    highly parallel general purpose processor, which is particularly well suited to certain
    applications such as 3D gaming. As the GPU became more programmable and thus
    general purpose, its highly parallel nature became interesting to new classes of
    applications: things like scientific computing are now within the realm of possibility for
    execution on a GPU.

    Today's GPUs are vastly superior to what we currently call desktop CPUs when it
    comes to things like 3D gaming, video decoding and a lot of HPC applications. The
    problem is that a GPU is fairly worthless at sequential tasks, meaning that it relies on
    having a fast host CPU to handle everything other than what it's good at.

    Figure 3: Amdahl's Law

    ATI discovered that long term, as the GPU grows in its power, it will eventually
    be bottlenecked by the ability to do high speed sequential processing. In the same vein,
    the CPU will eventually be bottlenecked by the ability to do highly parallel processing. In
    other words, GPUs need CPUs and CPUs need GPUs for all workloads going forward.
    Neither approach will solve every problem and run every program out there optimally,
    but the combination of the two is what is necessary.
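    Amdahl's Law makes this bottleneck argument concrete: if only a fraction of a workload
    can be spread across parallel units, the remaining sequential fraction caps the overall
    speedup. The short sketch below uses the standard formula; the function names and the
    example fractions are illustrative choices, not figures from AMD or ATI.

```python
def amdahl_speedup(parallel_fraction, n_units):
    """Overall speedup when `parallel_fraction` of the work runs on `n_units`.

    Standard Amdahl's Law: 1 / ((1 - p) + p / n).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


def speedup_limit(parallel_fraction):
    """Upper bound on speedup as the number of parallel units grows without limit."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    # Hypothetical workload: 90% parallel, 10% sequential.
    for units in (4, 64, 1024):
        print(units, "units ->", round(amdahl_speedup(0.9, units), 2), "x")
    print("limit ->", round(speedup_limit(0.9), 2), "x")
```

    However many parallel units are added, the sequential part still has to run somewhere,
    which is the case for pairing a highly parallel GPU with a fast sequential CPU.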

    To understand the point of combining a highly sequential processor like modern-day
    desktop CPUs and a highly parallel GPU you have to look above and beyond the
    gaming market, into what AMD is calling stream computing. AMD perceives a number
    of potential applications that will require a very GPU-like architecture to solve, things
    that we already see today. Simply watching an HD-DVD can eat up almost 100% of
    some of the fastest dual core processors today, while a GPU can perform the same
    decoding task with much better power efficiency. H.264 encoding and decoding are
    perfect examples of tasks that are better suited for highly parallel processor architectures
    than what desktop CPUs are currently built on. But just as video processing is important,
    so are general productivity tasks, which is where we need the strengths of present-day
    out-of-order superscalar CPUs. A combined architecture that can excel at both types of
    applications is clearly a direction that desktop CPUs need to target in order to remain
    relevant in future applications for consumers as well as in research.

    Future applications will easily combine stream computing with more sequential
    tasks, and we already see some of that now with web browsers. Imagine browsing a site
    like YouTube except where all of the content is much higher quality and requires far
    more CPU (or GPU) power to play. You need the strengths of a high powered sequential
    processor to deal with everything other than the video playback, but then you need the
    strengths of a GPU to actually handle the video. Examples like this one are overly simple,
    as it is very difficult to predict the direction software will take when given even more
    processing power; the point is that CPUs will inevitably have to merge with GPUs in
    order to handle these types of applications.

    CHAPTER 4

    MERGING CPUS AND GPUS

    AMD views the APU progression as three discrete steps:

    Today we have a CPU and a GPU separated by an external bus, with the two being quite
    independent. The CPU does what it does best, and the GPU helps out wherever it can.

    Step 1 is what AMD is calling integration, and it is what we can expect in the first
    Fusion product. The CPU and GPU are simply placed next to one another and there's
    minor leverage of that relationship, mostly from a cost and power efficiency standpoint.

    Step 2, which AMD calls optimization, gets a bit more interesting. Parts of the CPU can
    be shared by the GPU and vice versa. There's not a deep level of integration, but it begins
    the transition to the most important step: exploitation.

    The final step in the evolution of the APU is where the CPU and GPU are truly integrated,
    and the GPU is accessed by user mode instructions just like the CPU. You can expect to
    talk to the GPU via extensions to the x86 ISA, and the GPU will have its own register file
    (much like FP and integer units each have their own register files). Elements of the
    architecture will be shared, especially things like the cache hierarchy, which will prove
    useful when running applications that require both CPU and GPU power.

    The GPU could easily be integrated onto a single die as a separate core behind a shared
    L3 cache. For example, if you look at the current Barcelona architecture you have four
    homogeneous cores behind a shared L3 cache and memory controller; simply swap one of
    those cores with a GPU core and you've got an idea of what one of these chips could look
    like. Instructions that can only be processed by the specialized core will be dispatched
    directly to it, while instructions better suited for other cores will be sent to them. There
    would have to be a bit of front end logic to manage all of this, but it's easily done.
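    The dispatch idea in the last paragraph can be pictured with a toy software model.
    Everything here is a hypothetical illustration: real front end logic would be
    hardware, and the `WorkItem` tags, queue names and round-robin policy are assumptions
    chosen only to show work being routed to three general-purpose cores and one
    specialized core.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    name: str
    kind: str  # hypothetical tag: "scalar" or "vector"


def dispatch(items, n_cpu_cores=3):
    """Route vector work to the single GPU core and scalar work to CPU cores.

    Scalar items are spread round-robin over `n_cpu_cores` queues.
    Returns a dict mapping a queue name to the list of item names it received.
    """
    queues = defaultdict(list)
    next_cpu = 0
    for item in items:
        if item.kind == "vector":
            # Only the specialized core can process this kind of work.
            queues["gpu"].append(item.name)
        else:
            queues[f"cpu{next_cpu}"].append(item.name)
            next_cpu = (next_cpu + 1) % n_cpu_cores
    return dict(queues)


if __name__ == "__main__":
    work = [
        WorkItem("parse page", "scalar"),
        WorkItem("decode video", "vector"),
        WorkItem("lay out page", "scalar"),
        WorkItem("scale video", "vector"),
    ]
    for queue, names in sorted(dispatch(work).items()):
        print(queue, names)
```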

    CHAPTER 5

    APU IN CONSUMER ELECTRONICS

    The potential of Fusion extends far beyond the PC space and into the embedded
    space. If you can imagine a very low power, low profile Fusion CPU, you can easily see
    it being used in not only PCs but consumer electronics devices as well. The benefit is that
    your CE devices could run the same applications as your PC devices, truly encouraging
    and enabling convergence and cohabitation between CE and PC devices.

    Despite both sides attempting to point out how they are different, AMD and Intel
    actually have very similar views on where the microprocessor industry is headed. Both
    companies have stated to us that they have no desire to engage in the "core wars", as in
    we won't see a race to keep adding cores. The explanation for why not is the same one
    that applied to the GHz race: if you scale exclusively in one direction (clock speed or
    number of cores), you will eventually run into the same power wall. The true path to
    performance is a combination of increasing instruction level parallelism, clock speed, and
    number of cores in line with the demands of the software you're trying to run.

    AMD has been a bit more forthcoming than Intel in this respect by indicating that
    it doesn't believe that there's a clear sweet spot, at least for desktop CPUs. AMD doesn't
    believe there's enough data to conclude whether 3, 4, 6 or 8 cores are the ideal number for
    desktop processors. From our testing with Intel's V8 platform, an 8-core platform
    targeted at the high end desktop, it is extremely difficult to find high end desktop
    applications that can even benefit from 8 cores over 4. Our instincts tell us that for
    mainstream desktops, 3 - 4 general purpose x86 cores appear to be the near-term target
    that makes sense. You could potentially lower the number of cores needed if you
    combine other specialized hardware (e.g. an H.264 encode/decode core).

    What's particularly interesting is that many of the same goals Intel has for the
    future of its x86 processors are in line with what AMD has planned. For the past couple
    of IDFs Intel has been talking about bringing to market a < 0.5W x86 core that can be
    used for devices that are somewhere in size and complexity between a cell phone and a
    UMPC (e.g. iPhone). Intel has committed to delivering such a core in 2008 called
    Silverthorne, based around a new micro-architecture designed for these ultra low power
    environments.

    AMD confirmed that it too envisions ultra low power x86 cores for use in
    consumer electronics devices, areas where ARM or other specialized cores are commonly
    used. AMD also recognizes that it can't address this market by simply reducing the clock
    speed of its current processors, and thus AMD mentioned that it is working on a separate
    micro-architecture to address these ultra low power markets. AMD didn't attribute any
    timeframe or roadmap to its plans, but knowing what we know about Fusion's debut we'd
    expect a lower power version targeted at the UMPC and CE markets to follow as early as
    possible.

    Why even think about bringing x86 cores to CE devices like digital TVs or
    smartphones? AMD offered one clear motivation: the software stack that will run on
    these devices is going to get more complex. Applications on TVs, cell phones and other
    CE devices will get more complex to the point where they will require faster processors.
    Combine that with the fact that software developers don't want to target multiple
    processor architectures when they deliver software for these CE devices, and by using
    x86 as the common platform between CE and PC software you end up creating an entire
    environment where the same applications and content can be available across any device.
    The goal of PC/CE convergence is to allow users to have access to any content, on any
    device, anywhere. If all the devices on which you're trying to gain access to content or
    programs happen to be x86, it makes the process much easier.

    Why is a new core necessary? Although x86 can be applied to virtually any
    market segment, the range of usefulness of a particular core extends across roughly an
    order of magnitude of power. For example, AMD's current desktop cores can easily be
    scaled up or down to hit TDPs in the 10W - 100W range, but they would not be good for
    hitting something in the sub-1W range. AMD can easily address the sub-1W market, but
    it will require a different core from the one it addresses the rest of the market with. This
    philosophy is akin to what Intel discovered with Centrino; in order to succeed in the
    mobile market, you need a mobile specific design. To succeed in the ultra mobile and
    handtop markets, you need an ultra mobile/handtop specific processor design as well.
    Both AMD and Intel realize this, and now both companies have publicly stated that they
    are planning to do something about these recent consumer requirements.

    CHAPTER 6

    NEW ERA OF SOFTWARE DEVELOPMENT

    The GPU is ushering in a new age for software developers. That's because the
    GPU is no longer just about visualization or high-end graphics. Sure, those are important
    functions, but new software and applications will more fully leverage the latent
    capabilities of the GPU as it takes its place alongside the CPU as a powerful
    computational engine. This merging of CPU and GPU processing power, combined with
    the changing face of the Internet, promises to drive software to the next level of
    innovation.

    As Wired Magazine boldly declared recently, the Internet isn't just about web
    browsing anymore; it's about instant communication and the applications and data to
    deliver video, photos and audio. The changing dynamics of the Internet are putting
    mobility at a premium and driving consumers increasingly into the market for the
    broadening range of mobile devices: smartphones, tablets, netbooks, and notebooks.
    What better time for the emergence of the APU, a processor that will combine the power
    of the CPU and GPU onto a single chip in a small, power-saving format?

    Software developers have already started to ask, "How do I embrace the new age
    of GPU and APU computing?" Luckily, AMD is in the trenches working with
    industry leaders on the tools and standards needed to help smooth the transition. As
    we've touched on in previous blog posts, AMD supports:

    OpenCL: OpenCL is an open standard framework for writing parallel programs
    to execute across heterogeneous platforms consisting of CPUs, GPUs, and other
    processors. Notably, the standard enables applications to access the GPU for
    non-graphical computing and to balance computation between the CPU and GPU,
    making it a natural development environment for the APU. We're seeing a lot of
    exciting innovation happening around OpenCL, such as MainConcept's new OpenCL
    H.264/AVC Encoder. MainConcept offers a flexible and powerful software development
    kit so other software developers can easily add OpenCL accelerated encoding to their
    own solutions. OpenCL is also helping to drive developments around more natural user
    interfaces like touch and gesture and object and facial recognition, as well as
    allowing developers to harness the power of the GPU for productivity in HD video
    conferencing and virus scanning.

    Microsoft's DirectX: DirectX, Microsoft's Windows graphics technology,
    provides a collection of APIs that developers can use for handling tasks related to
    multimedia. It's been widely used by Windows developers for games and video
    applications and is catching the attention of a larger group of developers by
    enabling code to be offloaded to the GPU. DirectX APIs include:

    o D2D: a hardware-accelerated 2-D graphics API that provides high
    performance and high-quality rendering for 2-D geometry, bitmaps, and
    text. D2D drives your day-to-day software experience to a new level,
    particularly when it comes to online gaming and productivity applications.
    The next generation of web browsers is making use of D2D technology,
    including Microsoft's IE9 beta and Mozilla's Firefox 4 beta.

    o DirectCompute: Another API set of DirectX, DirectCompute provides
    programmers with a more flexible way to access the computational
    capability of GPUs that support DirectX 10 and DirectX 11.
    CyberLink's MediaShow 5 FaceMe technology, which is designed to
    quickly identify faces in photos, is optimized for Microsoft DirectX 11
    DirectCompute.

    OpenGL: OpenGL is another standard specification defining a cross-language,
    cross-platform API for writing applications that produce 2D and 3D computer
    graphics such as in content design software and high-end games. While OpenGL
    isn't new, it does have noteworthy new functionality that simplifies porting
    between mobile and desktop platforms and increases interoperability with
    OpenCL. The recently released OpenGL 4.0 specification also includes updates to
    the OpenGL Shading Language, which let developers make better use of GPU
    acceleration.
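    The idea of balancing computation between the CPU and GPU, mentioned in the OpenCL
    item above, can be illustrated with a toy static partition. This is plain Python
    rather than OpenCL, it calls no OpenCL API, and the function name and throughput
    figures are made up for illustration: the sketch simply splits a batch of independent
    work items so that two devices with different throughputs finish at about the same
    time.

```python
def split_work(n_items, cpu_rate, gpu_rate):
    """Split `n_items` between CPU and GPU in proportion to their throughput.

    `cpu_rate` and `gpu_rate` are items per unit time (hypothetical values that
    would normally come from measurement). Returns (cpu_items, gpu_items).
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    if cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("rates must be positive")
    gpu_items = round(n_items * gpu_rate / (cpu_rate + gpu_rate))
    return n_items - gpu_items, gpu_items


def finish_time(items, rate):
    """Time for one device to process its share of the work."""
    return items / rate


if __name__ == "__main__":
    # Hypothetical case: the GPU handles this workload 9 times faster than the CPU.
    cpu_items, gpu_items = split_work(1000, cpu_rate=1.0, gpu_rate=9.0)
    print("CPU items:", cpu_items, "time:", finish_time(cpu_items, 1.0))
    print("GPU items:", gpu_items, "time:", finish_time(gpu_items, 9.0))
```

    A real application would more likely measure throughput at run time and rebalance,
    since the best split depends on the workload and on what else the system is doing.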

    CHAPTER 7

    PRACTICALITY

    Although it's exciting to look at the new applications that will finally become
    practical in the Fusion era, the fact remains that most users will want their new
    APU-based systems to handle a mix of traditional applications for office productivity and
    Internet access, along with those new exciting apps. Fortunately, the changes AMD made
    to enable new APU-accelerated applications can also help existing applications run better
    as well.

    Many of these improvements stem from AMD's ability to fit the CPU cores, GPU
    cores and North Bridge (the part of the chip where the memory controller and PCI
    Express interfaces reside) onto a single piece of silicon. As noted earlier, this eliminates a
    chip-to-chip linkage that adds latency to memory operations and consumes power. It
    takes less energy to move electrons across a chip than to move those same electrons
    between two chips, and the power saved by this small change alone can help significantly
    increase system battery life. The co-location of all key elements on one chip also allows
    AMD to take a holistic approach to power management on these APUs. They can power
    various parts of the chip up and down depending on workloads, squeezing out a few
    milliwatts here and another few milliwatts there, which in the aggregate can amount to
    significant power savings.

    Finally, some of the improvements can be attributed to the advanced GPU
    technology AMD embeds in its APU offerings. Although the company has yet to reveal
    the technical specs of these GPUs, it has disclosed they will be DirectX 11-compliant.
    These will be the first APU-based systems that can support DirectX 11's enhanced visual
    experience without a discrete GPU, and thus will represent a cost-effective solution for
    systems developers.

    CHAPTER 8

    CONCLUSION

    Since the days of the earliest personal computers, each major advance in system
    capability has enabled innovative software developers to create new products that opened
    new markets. The Apple II gave us VisiCalc, the first spreadsheet. The original IBM PC
    led to Lotus 1-2-3, the first spreadsheet with graphics. The Macintosh ushered in an era
    of desktop publishing that has forever changed the way the world creates and distributes
    information.

    The dramatic increase in performance enabled by AMD Fusion technology can
    create new opportunities for entrepreneurial developers to innovate and make the world a
    better and richer place. Along the way, they may enrich themselves as well. That's the
    way the system is supposed to work.

    More importantly, compared to today's mainstream offerings, APU-based
    platforms will possess prodigious amounts of computational horsepower. This processing
    power will allow developers to tackle problems that lie beyond the capabilities of today's
    mainstream systems, and will enable innovative developers to step up and update existing
    applications or invent new ones that take advantage of GPU acceleration. These features
    will be a standard part of every APU. Over time, even the most affordable PCs can be
    expected to have the computational performance of yesterday's million-dollar
    mainframes, with all-day battery life.

    Of course, few users will want to run the same applications on tomorrow's
    notebooks that they ran on yesterday's mainframes and supercomputers. They will likely
    want to run applications that help them in their everyday lives, doing tasks they cannot
    accomplish on the systems they own today. They may want to use facial recognition
    software to sort their photos and videos, or even to help them identify people they meet
    on the street or actors they see in movies. They may want the on-screen appearance of the
    videos they stream to approach that of the HD content on their TVs, even when
    bandwidth constrains that content to a low resolution format.

    For the hardware developer, ODM or PC manufacturer, it's time to start thinking
    about how to incorporate these new APUs into product lines in order to enhance the
    consumer experience. Software developers should look to this new power to help their
    software run even better. All developers are encouraged to upgrade their skills and learn
    about OpenCL and DirectCompute, and to examine current software projects to see how
    they can be improved in a world where systems have dramatically more power. Because
    pretty soon, they will.

    REFERENCES

    Nathan Brookwood, "The Industry-Changing Impact of Accelerated Computing", fusion.amd.com

    http://www.anandtech.com/show/2229

    http://sites.amd.com/us/fusion/APU/Pages/fusion.aspx

    http://www.dailytech.com/article.aspx?newsid=4696

    http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter34.html