All about graphics processing units and how they work

What is a GPU?

A graphics processing unit (GPU), also occasionally called a visual processing unit (VPU), is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display.

Where did the GPU come from?

The term GPU was popularized by Nvidia in 1999, who marketed the GeForce 256 as "the world's first 'GPU', or Graphics Processing Unit, a single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that are capable of processing a minimum of 10 million polygons per second". Rival ATI Technologies coined the term visual processing unit or VPU with the release of the Radeon 9700 in 2002.

What is the difference between a CPU and a GPU?

The CPU (central processing unit) has often been called the brains of the PC. But increasingly, that brain is being enhanced by another part of the PC – the GPU (graphics processing unit), which is its soul. All PCs have chips that render the display images to monitors. But not all these chips are created equal. Intel’s integrated graphics controller provides basic graphics that can display only productivity applications like Microsoft PowerPoint, low-resolution video and basic games.

GPU companies

Many companies have produced GPUs under a number of brand names. In 2009, Intel, Nvidia and AMD/ATI were the market share leaders, with 49.4%, 27.8% and 20.6% market share respectively. However, those numbers include Intel's integrated graphics solutions as GPUs. Not counting those numbers, Nvidia and ATI control nearly 100% of the market as of 2008. In addition, S3 Graphics (owned by VIA Technologies) and Matrox produce GPUs.

How Graphics Cards Work

A GPU uses special programming to help it analyze and use data. ATI and Nvidia produce the vast majority of GPUs on the market, and both companies have developed their own enhancements for GPU performance. To improve image quality, the processors use:

Full-scene anti-aliasing (FSAA), which smooths the edges of 3-D objects

Anisotropic filtering (AF), which makes images look crisper
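As a rough illustration of the idea behind full-scene anti-aliasing, the sketch below renders a hard diagonal edge at twice the target resolution and averages each 2x2 block of samples down to one pixel. This is ordinary supersampling written in plain Python, not any vendor's implementation; the function names and the tiny 4x4 image are made up for the example.

```python
def render_edge(size):
    """Render a hard diagonal edge: 1.0 (white) on or below the diagonal, 0.0 above."""
    return [[1.0 if x <= y else 0.0 for x in range(size)] for y in range(size)]


def downsample(image, factor):
    """Average each factor x factor block of samples into one output pixel."""
    out_size = len(image) // factor
    out = []
    for oy in range(out_size):
        row = []
        for ox in range(out_size):
            total = 0.0
            for dy in range(factor):
                for dx in range(factor):
                    total += image[oy * factor + dy][ox * factor + dx]
            row.append(total / (factor * factor))
        out.append(row)
    return out


aliased = render_edge(4)                  # one sample per pixel: only 0.0 or 1.0
smoothed = downsample(render_edge(8), 2)  # four samples per pixel, then averaged

for row in smoothed:
    print(row)
```

In the one-sample image every pixel is fully black or fully white, which is what produces the staircase. In the supersampled image the pixels along the edge come out at an intermediate value (0.75 here), and that partial coverage is what makes the edge look smoother.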

Parts of a graphics card

Heat sink

In electronic systems, a heat sink is a passive heat exchanger that cools a device by dissipating heat into the surrounding medium. In computers, heat sinks are used to cool central processing units or graphics processors. Heat sinks are used with high-power semiconductor devices such as power transistors and optoelectronics such as lasers and light emitting diodes (LEDs), where the heat dissipation ability of the basic device is insufficient to moderate its temperature.

A heat sink is designed to maximize its surface area in contact with the cooling medium surrounding it, such as the air. Air velocity, choice of material, protrusion design and surface treatment are factors that affect the performance of a heat sink. Heat sink attachment methods and thermal interface materials also affect the die temperature of the integrated circuit. Thermal adhesive or thermal grease improves the heat sink's performance by filling air gaps between the heat sink and the device.
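One common first-order way to reason about these factors is a series thermal-resistance model, in which the die temperature is the ambient temperature plus the dissipated power times the sum of the resistances along the heat path. The sketch below assumes that simplified model; every number in it is an illustrative placeholder, not a measurement of any real card.

```python
def die_temperature(ambient_c, power_w, resistances_c_per_w):
    """Steady-state die temperature (deg C) for thermal resistances in series."""
    return ambient_c + power_w * sum(resistances_c_per_w)


# Hypothetical resistances in deg C per watt.
R_DIE_TO_CASE = 0.10
R_SINK_TO_AIR = 0.15
R_INTERFACE_AIR_GAPS = 0.20   # dry contact with air gaps (assumed value)
R_INTERFACE_GREASE = 0.05     # gaps filled with thermal grease (assumed value)

AMBIENT_C = 25.0
POWER_W = 200.0

dry = die_temperature(AMBIENT_C, POWER_W,
                      [R_DIE_TO_CASE, R_INTERFACE_AIR_GAPS, R_SINK_TO_AIR])
greased = die_temperature(AMBIENT_C, POWER_W,
                          [R_DIE_TO_CASE, R_INTERFACE_GREASE, R_SINK_TO_AIR])

print(f"dry contact: {dry:.1f} C, with grease: {greased:.1f} C")
```

With these made-up values, lowering only the interface resistance takes the estimated die temperature from roughly 115 C to roughly 85 C, which shows why filling the air gaps matters even when the heat sink itself is unchanged.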

Nvidia launches the GTX 780

http://www.gpureview.com/nvidia-launches-the-gtx-780-article-969.html

GTX 780 GPU Engine Specs:

CUDA Cores: 2304

Base Clock (MHz): 863

Boost Clock (MHz): 900

Texture Fill Rate (billion/sec): 160.5

GTX 780 Memory Specs:

Memory Speed: 6.0 Gbps

Standard Memory Config: 3072 MB

Memory Interface: GDDR5

Memory Interface Width: 384-bit

Memory Bandwidth (GB/sec): 288.4
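The memory bandwidth figure follows from the memory speed and the interface width. The short sketch below shows the usual peak-bandwidth arithmetic (per-pin data rate times bus width, divided by 8 bits per byte); the function names are just for illustration.

```python
def memory_bandwidth_gb_per_s(data_rate_gbps, bus_width_bits):
    """Peak bandwidth: per-pin data rate (Gbit/s) x bus width / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


def implied_data_rate_gbps(bandwidth_gb_per_s, bus_width_bits):
    """Invert the formula to recover the per-pin data rate from a quoted bandwidth."""
    return bandwidth_gb_per_s * 8 / bus_width_bits


print(memory_bandwidth_gb_per_s(6.0, 384))
print(round(implied_data_rate_gbps(288.4, 384), 3))
```

Using the rounded 6.0 Gbps gives 288.0 GB/sec. The quoted 288.4 GB/sec corresponds to a data rate of about 6.008 Gbps, so the small difference is most likely just rounding in the listed memory speed.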

Review by Forbes

“As a tech writer, I also have to applaud NVIDIA for a recommendation they issued to the press in their review guide for the GTX 780: ‘In many cases, the GeForce GTX 780 is so powerful the CPU can actually become a bottleneck. To avoid this from occurring, we highly recommend you conduct your testing on a 2560×1440/2560×1600 display. We also suggest you use the highest graphics settings and high levels of AA in most games.’”

AMD Radeon HD 7990 'Malta' Engineering Sample

Price = US $96,100.00

Best graphics cards reviewed and rated

1. Sapphire HD 7770 GHz Edition

£90 (around USD $147, AUD $163) Mid-range card

2. Sapphire HD 7790 Dual-X OC

£125 (around USD $205, AUD $226) Mid-range card

3. AMD Radeon HD 7850

£71 (around USD $116, AUD $128) Performance card

4. Asus GTX 650 Ti Boost DirectCU II OC


5. Nvidia GeForce GTX 760


6. Asus HD 7870 DirectCU II TOP


7. Asus HD 7950 DirectCU II

£265 (around USD $434, AUD $479) Enthusiast card

8. Nvidia GeForce GTX 770


9. Sapphire HD 7970 GHz Edition 3GB


10. Nvidia GeForce GTX 780


11. AMD Radeon HD 7990


12. Nvidia GeForce GTX Titan

£800 (around USD $1,310, AUD $1,447) Enthusiast card

4 Things You Should Know About the New Maxwell GPU Architecture

1. The Heart of Maxwell: More Efficient Multiprocessors

Maxwell introduces an all-new design for the Streaming Multiprocessor (SM) that dramatically improves power efficiency. Although the Kepler SMX design was extremely efficient for its generation, through its development NVIDIA’s GPU architects saw an opportunity for another big leap forward in architectural efficiency; the Maxwell SM is the realization of that vision. Improvements to control logic partitioning, workload balancing, clock-gating granularity, instruction scheduling, number of instructions issued per clock cycle, and many other enhancements allow the Maxwell SM (also called “SMM”) to far exceed Kepler SMX efficiency. The new Maxwell SM architecture enabled NVIDIA to increase the number of SMs to five in GM107, compared to two in GK107, with only a 25% increase in die area.

Improved Instruction Scheduling

The number of CUDA Cores per SM has been reduced to a power of two; however, with Maxwell’s improved execution efficiency, performance per SM is usually within 10% of Kepler performance, and the improved area efficiency of the SM means CUDA cores per GPU will be substantially higher versus comparable Fermi or Kepler chips. The Maxwell SM retains the same number of instruction issue slots per clock and reduces arithmetic latencies compared to the Kepler design.

As with SMX, each SMM has four warp schedulers, but unlike SMX, all core SMM functional units are assigned to a particular scheduler, with no shared units. The power-of-two number of CUDA Cores per partition simplifies scheduling, as each of SMM’s warp schedulers issues to a dedicated set of CUDA Cores equal to the warp width. Each warp scheduler still has the flexibility to dual-issue (such as issuing a math operation to a CUDA Core in the same cycle as a memory operation to a load/store unit), but single-issue is now sufficient to fully utilize all CUDA Cores.
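The scheduling point can be checked with simple arithmetic. The sketch below assumes NVIDIA's published core counts of 192 CUDA Cores per Kepler SMX and 128 per Maxwell SMM, which this text does not state, together with the four warp schedulers and the SM counts for GK107 and GM107 mentioned above. For Kepler the per-scheduler figure is only an average, since SMX units are shared rather than partitioned.

```python
WARP_WIDTH = 32          # threads per warp on NVIDIA GPUs
SCHEDULERS_PER_SM = 4    # four warp schedulers on both SMX and SMM

# Assumption: NVIDIA's published figures, not given in this text.
CORES_PER_SM = {"Kepler SMX": 192, "Maxwell SMM": 128}


def cores_per_scheduler(cores_per_sm):
    """Average number of CUDA Cores available to each warp scheduler."""
    return cores_per_sm / SCHEDULERS_PER_SM


for name, cores in CORES_PER_SM.items():
    share = cores_per_scheduler(cores)
    print(f"{name}: {share:g} cores per scheduler, "
          f"{share / WARP_WIDTH:g} warps' worth per cycle")

# Whole-chip totals from the SM counts quoted earlier (two SMs vs five SMs).
gk107_cores = 2 * CORES_PER_SM["Kepler SMX"]
gm107_cores = 5 * CORES_PER_SM["Maxwell SMM"]
print(f"GK107: {gk107_cores} cores, GM107: {gm107_cores} cores")
```

Under these assumptions a Maxwell scheduler has exactly one warp's worth of cores, so a single instruction per cycle keeps them busy, while a Kepler scheduler would need to issue more than one warp instruction per cycle on average to fill its share.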

Increased Occupancy for Existing Code

In terms of CUDA compute capability, Maxwell’s SM is CC 5.0. SMM is similar in many respects to the Kepler architecture’s SMX, with key enhancements geared toward improving efficiency without requiring significant increases in available parallelism per SM from the application. The register file size and the maximum number of concurrent warps in SMM are the same as in SMX (64K 32-bit registers and 64 warps, respectively), as is the maximum number of registers per thread (255). However, the maximum number of active thread blocks per multiprocessor has been doubled over SMX to 32, which should result in an automatic occupancy improvement for kernels that use small thread blocks of 64 or fewer threads (assuming available registers and shared memory are not the occupancy limiter). Table 1 provides a comparison between key characteristics of Maxwell GM107 and its predecessor Kepler GK107.
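The occupancy claim for small thread blocks can be worked through directly from the limits quoted above. This is a simplified sketch that looks only at the block and warp limits and, as the text assumes, ignores registers and shared memory. The Kepler limit of 16 blocks is inferred from "doubled over SMX to 32", and the function name is made up for the example.

```python
MAX_WARPS_PER_SM = 64      # same on SMX and SMM
WARP_WIDTH = 32            # threads per warp
MAX_BLOCKS_PER_SM = {"Kepler SMX": 16, "Maxwell SMM": 32}


def occupancy(threads_per_block, max_blocks_per_sm):
    """Fraction of an SM's warp slots in use, limited only by block and warp counts."""
    warps_per_block = -(-threads_per_block // WARP_WIDTH)  # ceiling division
    blocks_by_warps = MAX_WARPS_PER_SM // warps_per_block
    resident_blocks = min(max_blocks_per_sm, blocks_by_warps)
    return resident_blocks * warps_per_block / MAX_WARPS_PER_SM


for threads in (32, 64, 128):
    for name, limit in MAX_BLOCKS_PER_SM.items():
        print(f"{threads:>3} threads/block on {name}: {occupancy(threads, limit):.0%}")
```

With 64-thread blocks, the 16-block limit caps Kepler at 32 of its 64 warps (50%), while Maxwell's 32-block limit lets all 64 warps be resident (100%). At 128 threads per block the block limit no longer matters and both reach 100%.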

Reduced Arithmetic Instruction Latency

Another major improvement of SMM is that dependent arithmetic instruction latencies have been significantly reduced. Because occupancy (which translates to available warp-level parallelism) is the same or better on SMM than on SMX, these reduced latencies improve utilization and throughput.

2. Larger, Dedicated Shared Memory

A significant improvement in SMM is that it provides 64KB of dedicated shared memory per SM—unlike Fermi and Kepler, which partitioned the 64KB of memory between L1 cache and shared memory. The per-thread-block limit remains 48KB on Maxwell, but the increase in total available shared memory can lead to occupancy improvements. Dedicated shared memory is made possible in Maxwell by combining the functionality of the L1 and texture caches into a single unit.
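To see how the larger pool can raise occupancy, the sketch below counts how many thread blocks fit on one SM when shared memory is the only limiter. It assumes Kepler is configured with its largest shared-memory split of 48KB (with the remaining 16KB used as L1), which is not spelled out above; the function name is hypothetical.

```python
KB = 1024
PER_BLOCK_LIMIT = 48 * KB            # per-thread-block limit, unchanged on Maxwell
SHARED_PER_SM = {
    "Kepler SMX (48KB split, assumed)": 48 * KB,
    "Maxwell SMM": 64 * KB,
}


def blocks_limited_by_shared_memory(shared_per_block, shared_per_sm):
    """Thread blocks that fit on one SM if shared memory is the only limiter."""
    if not 0 < shared_per_block <= PER_BLOCK_LIMIT:
        raise ValueError("shared memory per block must be between 1 byte and 48KB")
    return shared_per_sm // shared_per_block


for request_kb in (16, 24, 32):
    for name, total in SHARED_PER_SM.items():
        blocks = blocks_limited_by_shared_memory(request_kb * KB, total)
        print(f"{request_kb}KB per block on {name}: {blocks} block(s)")
```

A kernel using 32KB per block fits one block per SM on Kepler but two on Maxwell, and a 16KB kernel goes from three blocks to four, even though the per-block limit is the same 48KB on both.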

3. Fast Shared Memory Atomics

Maxwell provides native shared memory atomic operations for 32-bit integers and native shared memory 32-bit and 64-bit compare-and-swap (CAS), which can be used to implement other atomic functions. In contrast, the Fermi and Kepler architectures implemented shared memory atomics using a lock/update/unlock pattern that could be expensive in the presence of high contention for updates to particular locations in shared memory.
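The remark that compare-and-swap can be used to implement other atomic functions refers to the standard CAS retry loop. The sketch below imitates that pattern on the CPU in Python: the SharedWord class is a made-up stand-in for one word of shared memory, and its internal lock only models the atomicity that the hardware instruction would provide. It is not GPU code and not NVIDIA's implementation.

```python
import threading


class SharedWord:
    """Stand-in for one 32-bit word of shared memory with a native compare-and-swap."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # models the atomicity of the hardware CAS only

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Store new if the word still holds expected; always return the value seen."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old


def atomic_max(word, candidate):
    """Atomic max built from CAS: retry until the update lands or is not needed."""
    old = word.load()
    while old < candidate:
        seen = word.compare_and_swap(old, candidate)
        if seen == old:
            break          # our swap succeeded
        old = seen         # another thread got in first; re-check against its value
    return old


word = SharedWord(0)
threads = [threading.Thread(target=atomic_max, args=(word, v)) for v in range(64)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(word.load())  # 63, the largest candidate
```

Each caller retries only when another thread changed the word between its read and its swap, so under low contention the loop usually completes in a single pass.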

4. Support for Dynamic Parallelism

Kepler GK110 introduced a new architectural feature called Dynamic Parallelism, which allows the GPU to create additional work for itself. A programming model enhancement leveraging this feature was introduced in CUDA 5.0 to enable threads running on GK110 to launch additional kernels onto the same GPU.

SMM brings Dynamic Parallelism into the mainstream by supporting it across the product line, even in lower-power chips such as GM107. This will benefit developers, because it means that applications will no longer need special-case algorithm implementations for high-end GPUs that differ from those usable in more power-constrained environments.
