
operating system Last modified: Friday, January 04, 2002 

The most important program that runs on a computer. Every general-purpose computer must have an operating system to run other programs. Operating systems perform basic tasks, such as recognizing input from the keyboard, sending output to the display screen, keeping track of files and directories on the disk, and controlling peripheral devices such as disk drives and printers.

For large systems, the operating system has even greater responsibilities and powers. It is like a traffic cop -- it makes sure that different programs and users running at the same time do not interfere with each other. The operating system is also responsible for security, ensuring that unauthorized users do not access the system.

Operating systems can be classified as follows:

multi-user: Allows two or more users to run programs at the same time. Some operating systems permit hundreds or even thousands of concurrent users.
multiprocessing: Supports running a program on more than one CPU.
multitasking: Allows more than one program to run concurrently.
multithreading: Allows different parts of a single program to run concurrently.
real-time: Responds to input instantly. General-purpose operating systems, such as DOS and UNIX, are not real-time.

Operating systems provide a software platform on top of which other programs, called application programs, can run. The application programs must be written to run on top of a particular operating system. Your choice of operating system, therefore, determines to a great extent the applications you can run. For PCs, the most popular operating systems are DOS, OS/2, and Windows, but others are available, such as Linux.

As a user, you normally interact with the operating system through a set of commands. For example, the DOS operating system contains commands such as COPY and RENAME for copying files and changing the names of files, respectively. The commands are accepted and executed by a part of the operating system called the command processor or command line interpreter. Graphical user interfaces allow you to enter commands by pointing and clicking at objects that appear on the screen.
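
To make the idea of a command processor concrete, here is a minimal, illustrative Python sketch of a dispatcher that accepts commands named after the DOS examples above (COPY and RENAME). The command table and handlers are assumptions for demonstration, not the actual DOS implementation.

import os
import shutil

# Toy command processor: maps command names (as in the DOS COPY and
# RENAME examples) to handler functions. Illustrative sketch only.
COMMANDS = {
    "COPY": lambda src, dst: shutil.copyfile(src, dst),
    "RENAME": lambda old, new: os.rename(old, new),
}

def execute(line: str):
    """Parse one command line and dispatch it to its handler."""
    name, *args = line.split()
    handler = COMMANDS.get(name.upper())
    if handler is None:
        raise ValueError(f"unknown command: {name}")
    return handler(*args)

# Example: execute("COPY notes.txt backup.txt")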

Random-access memory
From Wikipedia, the free encyclopedia


Example of writable volatile random access memory: Synchronous Dynamic RAM modules, primarily used as main memory in personal computers, workstations, and servers.

Computer memory types

Volatile: DRAM (e.g. DDR SDRAM), SRAM; upcoming: Z-RAM, TTRAM; historical: Williams tube, delay line memory.

Non-volatile: ROM (PROM, EAROM, EPROM, EEPROM), flash memory; upcoming: FeRAM, MRAM, CBRAM, PRAM, SONOS, RRAM, racetrack memory, NRAM; historical: drum memory, magnetic core memory, plated wire memory, bubble memory, twistor memory.

Random-access memory (usually known by its acronym, RAM) is a type of computer data storage. Today it takes the form of integrated circuits that allow the stored data to be accessed in any order, i.e. at random. The word random thus refers to the fact that any piece of data can be returned in a constant time, regardless of its physical location and whether or not it is related to the previous piece of data.[1]

This contrasts with storage mechanisms such as tapes, magnetic discs and optical discs, which rely on the physical movement of the recording medium or a reading head. In these devices, the movement takes longer than the data transfer, and the retrieval time varies depending on the physical location of the next item.

The word RAM is mostly associated with volatile types of memory (such as DRAM memory modules), where the information is lost after the power is switched off. However, many other types of memory are RAM as well (i.e. Random Access Memory), including most types of ROM and a kind of flash memory called NOR-Flash.


History

An early type of widespread writable random access memory was the magnetic core memory, developed in 1949-1951, and subsequently used in most computers up until the development of the static and dynamic integrated RAM circuits in the late 1960s and early 1970s. Before this, computers used relays, delay lines or various kinds of vacuum tube arrangements to implement "main" memory functions (i.e. hundreds or thousands of bits), some of which were random access, some not. Latches built out of vacuum tube triodes, and later, out of discrete transistors, were used for smaller and faster memories such as registers and (random access) register banks. Prior to the development of integrated ROM circuits, permanent (or read-only) random access memory was often constructed using semiconductor diode matrixes driven by address decoders.

Overview

Types of RAM

Modern types of writable RAM generally store a bit of data in either the state of a flip-flop, as in SRAM (static RAM), or as a charge in a capacitor (or transistor gate), as in DRAM (dynamic RAM), EPROM, EEPROM and Flash. Some types have circuitry to detect and/or correct random faults, called memory errors, in the stored data, using parity bits or error correction codes. RAM of the read-only type, ROM, instead uses a metal mask to permanently enable or disable selected transistors, rather than storing a charge in them.

As both SRAM and DRAM are volatile, other forms of computer storage, such as disks and magnetic tapes, have been used as "permanent" storage in traditional computers. Many newer products instead rely on flash memory to maintain data between sessions of use: examples include PDAs, small music players, mobile phones, synthesizers, advanced calculators, industrial instrumentation and robotics, and many other types of products; even certain categories of personal computers, such as the OLPC XO-1, Asus Eee PC, and others, have begun replacing magnetic disks with so-called flash drives (similar to fast memory cards equipped with an IDE or SATA interface).

There are two basic types of flash memory: the NOR type, which is capable of true random access, and the NAND type, which is not; the former is therefore often used in place of ROM, while the latter is used in most memory cards and solid-state drives, due to a lower price.

Memory hierarchy

Many computer systems have a memory hierarchy consisting of CPU registers, on-die SRAM caches, external caches, DRAM, paging systems, and virtual memory or swap space on a hard drive. This entire pool of memory may be referred to as "RAM" by many developers, even though the various subsystems can have very different access times, violating the original concept behind the random access term in RAM. Even within a hierarchy level such as DRAM, the specific row, column, bank, rank, channel, or interleave organization of the components makes the access time variable, although not to the extent that access times vary for rotating storage media or tape. (Generally, the memory hierarchy follows the access time, with the fast CPU registers at the top and the slow hard drive at the bottom.)

In many modern personal computers, the RAM comes in an easily upgraded form of modules called memory modules or DRAM modules, about the size of a few sticks of chewing gum. These can quickly be replaced should they become damaged or too small for current purposes. As suggested above, smaller amounts of RAM (mostly SRAM) are also integrated in the CPU and other ICs on the motherboard, as well as in hard drives, CD-ROMs, and several other parts of the computer system. The overall goal of using a memory hierarchy is to obtain the highest possible average access speed while minimizing the total cost of the entire memory system.
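
A rough numerical sketch makes this trade-off concrete. The latencies and hit rates below are invented ballpark figures for illustration only; the point they demonstrate is that even rare accesses to the slowest level dominate the average.

# Illustrative memory-hierarchy model: each level has an assumed access
# time (ns) and an assumed fraction of accesses it satisfies.
# All numbers are rough, made-up figures for demonstration only.
levels = [
    ("CPU registers",      0.3,       0.400),
    ("on-die SRAM cache",  1.0,       0.500),
    ("DRAM main memory",  60.0,       0.099),
    ("disk swap space",    8_000_000, 0.001),
]

def average_access_time(levels):
    """Expected access time if each access is served by exactly one
    level with the given probability."""
    return sum(latency * fraction for _, latency, fraction in levels)

print(f"average access time ~ {average_access_time(levels):.0f} ns")
# The rare disk accesses (0.1%) contribute ~8000 ns and dominate.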

Swapping

If a computer becomes low on RAM during intensive application cycles, the computer can perform an operation known as "swapping". When this occurs, the computer temporarily uses hard drive space as additional memory. Constantly relying on this type of backup memory is called thrashing, which is generally undesirable because it lowers overall system performance. In order to reduce the dependency on swapping, more RAM can be installed.
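
As a quick way to see whether a running system is leaning on swap, the sketch below reads memory and swap usage. It assumes the third-party psutil package is installed (pip install psutil), and the 50% warning threshold is an arbitrary illustrative choice.

import psutil  # third-party package, assumed installed

vm = psutil.virtual_memory()
sm = psutil.swap_memory()

print(f"RAM:  {vm.percent:.0f}% used of {vm.total / 2**30:.1f} GiB")
print(f"Swap: {sm.percent:.0f}% used of {sm.total / 2**30:.1f} GiB")

# Sustained heavy swap usage suggests thrashing; as noted above, the
# usual remedy is to install more RAM.
if sm.percent > 50:
    print("Warning: heavy swap usage -- the system may be thrashing.")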

Other uses of the "RAM" term

Other physical devices with read/write capability can have "RAM" in their names: for example, DVD-RAM. "Random access" is also the name of an indexing method: hence, disk storage is often called "random access" because the reading head can move relatively quickly from one piece of data to another, and does not have to read all the data in between. However, the final "M" is crucial: "RAM" (provided there is no additional term, as in "DVD-RAM") always refers to a solid-state device.

RAM disks

Software can "partition" a portion of a computer's RAM, allowing it to act as a much faster hard drive that is called a RAM disk. Unless the memory used is non-volatile, a RAM disk loses the stored data when the computer is shut down. However, volatile memory can retain its data when the computer is shut down if it has a separate power source, usually a battery.

Shadow RAM

Sometimes, the contents of a ROM chip are copied to SRAM or DRAM to allow for shorter access times (as ROM may be slower). The ROM chip is then disabled while the initialized memory locations are switched in on the same block of addresses (often write-protected). This process, sometimes called shadowing, is fairly common in both computers and embedded systems.

As a common example, the BIOS in typical personal computers often has an option called "use shadow BIOS" or similar. When enabled, functions relying on data from the BIOS's ROM will instead use DRAM locations (most can also toggle shadowing of video card ROM or other ROM sections). Depending on the system, this may or may not give a performance boost. On some systems the benefit may be hypothetical because the BIOS is not used after booting in favour of direct hardware access. Of course, somewhat less free memory is available when shadowing is enabled.[2]
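
A minimal sketch of the shadowing idea, in Python rather than firmware: a (simulated) ROM image is copied once into a RAM buffer, and all later reads are served from that copy. The ROM contents and class name are invented for illustration.

# Simulated 64 KiB ROM image; with shadowing, the real ROM would only be
# read once, at initialization time.
ROM_IMAGE = bytes(range(256)) * 256

class ShadowedROM:
    def __init__(self, rom: bytes):
        self._shadow = bytearray(rom)      # one-time copy into "RAM"

    def read(self, address: int) -> int:
        return self._shadow[address]       # served from the RAM copy

shadow = ShadowedROM(ROM_IMAGE)
print(hex(shadow.read(0x10)))              # -> 0x10, read from the shadow copy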

Recent developments

Several new types of non-volatile RAM, which will preserve data while powered down, are under development. The technologies used include carbon nanotubes and the magnetic tunnel effect. In summer 2003, a 128 KB magnetic RAM chip manufactured with 0.18 µm technology was introduced. The core technology of MRAM is based on the magnetic tunnel effect. In June 2004, Infineon Technologies unveiled a 16 MB[3] prototype again based on 0.18 µm technology. In 2004, Nantero built a functioning 10 GB[3] carbon nanotube memory prototype array. Whether some of these technologies will eventually be able to take a significant market share from DRAM, SRAM, or flash-memory technology, however, remains to be seen.

Since 2006, "Solid-state drives" (based on flash memory) with capacities exceeding 150 gigabytes and speeds far exceeding traditional disks have become available. This development has started to blur the definition between traditional random access memory and "disks", dramatically reducing the difference in performance.

Memory wall

The "memory wall" is the growing disparity of speed between CPU and memory outside the CPU chip. An important reason for this disparity is the limited communication bandwidth beyond chip boundaries. From 1986 to 2000, CPU speed improved at an annual rate of 55% while memory speed only improved at 10%. Given these trends, it was expected that memory latency would become an overwhelming bottleneck in computer performance. [4]

Currently, CPU speed improvements have slowed significantly, partly due to major physical barriers and partly because current CPU designs have already hit the memory wall in some sense. Intel summarized these causes in its Platform 2015 documentation (PDF):

“First of all, as chip geometries shrink and clock frequencies rise, the transistor leakage current increases, leading to excess power consumption and heat (more on power consumption below). Secondly, the advantages of higher clock speeds are in part negated by memory latency, since memory access times have not been able to keep pace with increasing clock frequencies. Third, for certain applications, traditional serial architectures are becoming less efficient as processors get faster (due to the so-called Von Neumann bottleneck), further undercutting any gains that frequency increases might otherwise buy. In addition, partly due to limitations in the means of producing inductance within solid state devices, resistance-capacitance (RC) delays in signal transmission are growing as feature sizes shrink, imposing an additional bottleneck that frequency increases don't address.”

The RC delays in signal transmission were also noted in "Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures", which projected a maximum of 12.5% average annual CPU performance improvement between 2000 and 2014. The data on Intel processors clearly shows a slowdown in performance improvements in recent processors. However, Intel's newer Core 2 Duo processors (codenamed Conroe) show a significant improvement over previous Pentium 4 processors; due to a more efficient architecture, performance increased while the clock rate actually decreased.

Security concerns

Contrary to simple models (and perhaps common belief), the contents of modern SDRAM modules are not lost immediately when the computer is shut down; instead, the contents fade away, a process that takes only seconds at room temperature but which can be extended to minutes at low temperatures. It is therefore possible to get hold of an encryption key if it was stored in ordinary working memory (i.e. the SDRAM modules).[5]

This is sometimes referred to as a cold boot attack.

See also

SRAM (Static RAM)
DRAM (Dynamic RAM)
 o FPM (Fast Page Mode DRAM)
 o EDO RAM (Extended Data Out DRAM)
 o BEDO RAM (Burst Extended Data Out DRAM)
 o SDRAM (Synchronous DRAM)
   DDR SDRAM (Double Data Rate SDRAM)
   DDR2 SDRAM
   DDR3 SDRAM
 o Rambus DRAM
 o XDR DRAM
RIMM, SIMM, DIMM (RAM packages)
SO-DIMM and MicroDIMM (laptop RAM packages)
"CMOS RAM"
CAS latency (CL)
Dual-channel architecture
ECC (Error-correcting code)
Registered/Buffered memory
Non-Volatile RAM (NVRAM)
STT RAM (Spin Torque Transfer RAM)
CompactFlash, SD Card, xD Card, etc.
DVD-RAM

Notes and references

1. Strictly speaking, modern types of DRAM are therefore not truly (or technically) random access, as data are read in bursts; the name DRAM has stuck, however.
2. "Shadow Ram". Retrieved on 2007-07-24.
3. a b Transistorized memory, such as RAM and cache sizes (other than solid-state disk devices such as USB drives, CompactFlash cards, and so on), as well as CD-based storage sizes, are specified using binary meanings for K (1024^1), M (1024^2), G (1024^3), and so on.
4. The term was coined in Hitting the Memory Wall: Implications of the Obvious (PDF).
5. Cold Boot Attacks on Encryption Keys


Read-only memory
From Wikipedia, the free encyclopedia


The notion of read-only data can also refer to file system permissions.


Read-only memory (usually known by its acronym, ROM) is a class of storage media used in computers and other electronic devices. Because data stored in ROM cannot be modified (at least not very quickly or easily), it is mainly used to distribute firmware (software that is very closely tied to specific hardware, and unlikely to require frequent updates).

In its strictest sense, ROM refers only to mask ROM (the oldest type of solid state ROM), which is fabricated with the desired data permanently stored in it, and thus can never be modified. However, more modern types such as EPROM and flash EEPROM can be erased and re-programmed multiple times; they are still described as "read-only memory" because the reprogramming process is generally infrequent, comparatively slow, and often does not permit random access writes to individual memory locations. Despite the simplicity of mask ROM, economies of scale and field-programmability often make reprogrammable technologies more flexible and inexpensive, so that mask ROM is rarely used in new products as of 2007.


History

The simplest type of solid state ROM is as old as semiconductor technology itself. Combinatorial logic gates can be joined manually to map n-bit address input onto arbitrary values of m-bit data output (a look-up table). With the invention of the integrated circuit came mask ROM. Mask ROM consists of a grid of word lines (the address input) and bit lines (the data output), selectively joined together with transistor switches, and can represent an arbitrary look-up table with a regular physical layout and predictable propagation delay.
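
The look-up-table view of mask ROM can be sketched in a few lines of Python: an n-bit address indexes a fixed table of m-bit words whose contents were chosen once, at "fabrication" time. The sizes and contents here are arbitrary illustrative values.

# A 4-bit address selects one of 16 fixed 8-bit words; the contents are
# frozen when the tuple is built, mimicking data fixed at fabrication.
ADDRESS_BITS = 4
DATA_BITS = 8
ROM_CONTENTS = tuple((addr * 37 + 11) % (1 << DATA_BITS)
                     for addr in range(1 << ADDRESS_BITS))

def rom_read(address: int) -> int:
    """Purely combinational behaviour: the same address always returns
    the same data."""
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address out of range")
    return ROM_CONTENTS[address]

print(rom_read(0b0011))    # word permanently stored at address 3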

In mask ROM, the data is physically encoded in the circuit, so it can only be programmed during fabrication. This leads to a number of serious disadvantages:

1. It is only economical to buy mask ROM in large quantities, since users must contract with a foundry to produce a custom design.

2. The turnaround time between completing the design for a mask ROM and receiving the finished product is long, for the same reason.

3. Mask ROM is impractical for R&D work, since designers frequently need to modify the contents of memory as they refine a design.

Later reprogrammable technologies such as PROM, EPROM, EEPROM, and flash addressed these shortcomings; devices of this kind typically include a means to receive the program contents from an external source (e.g. a personal computer via a serial cable). Flash memory, invented at Toshiba in the mid-1980s, and commercialized in the early 1990s, is a form of EEPROM that makes very efficient use of chip area and can be erased and reprogrammed thousands of times without damage.

All of these technologies improved the flexibility of ROM, but at a significant cost-per-chip, so that in large quantities mask ROM would remain an economical choice for many years. (Decreasing cost of reprogrammable devices had almost eliminated the market for mask ROM by the year 2000.) Furthermore, despite the fact that newer technologies were increasingly less "read-only," most were envisioned only as replacements for the traditional use of mask ROM.

The most recent development is NAND flash, also invented by Toshiba. Its designers explicitly broke from past practice, stating plainly that "the aim of NAND Flash is to replace hard disks,"[1] rather than the traditional use of ROM as a form of non-volatile primary storage. As of 2007, NAND has partially achieved this goal by offering throughput comparable to hard disks, higher tolerance of physical shock, extreme miniaturization (in the form of USB flash drives and tiny microSD memory cards, for example), and much lower power consumption.

Use of ROM for program storage

Every stored-program computer requires some form of non-volatile storage to store the initial program that runs when the computer is powered on or otherwise begins execution (a process known as bootstrapping, often abbreviated to "booting" or "booting up"). Likewise, every non-trivial computer requires some form of mutable memory to record changes in its state as it executes.

Forms of read-only memory were employed as non-volatile storage for programs in most early stored-program computers, such as ENIAC after 1948 (until then it was not a stored-program computer, as every program had to be manually wired into the machine, which could take days to weeks). Read-only memory was simpler to implement since it required only a mechanism to read stored values, and not to change them in place, and thus could be implemented with very crude electromechanical devices (see historical examples below). With the advent of integrated circuits in the 1960s, both ROM and its mutable counterpart static RAM were implemented as arrays of transistors in silicon chips; however, a ROM memory cell could be implemented using fewer transistors than an SRAM memory cell, since the latter requires a latch (comprising 5-20 transistors) to retain its contents, while a ROM cell might consist of the absence (logical 0) or presence (logical 1) of a single transistor connecting a bit line to a word line.[2] Consequently, ROM could be implemented at a lower cost-per-bit than RAM for many years.

Many home computers of the 1980s stored a BASIC interpreter or operating system in ROM. ROM was more economical than RAM, and other forms of non-volatile storage such as magnetic disk drives were too expensive to be included with every home computer. For example, the celebrated Commodore 64 included 64 KiB of RAM and 20 KiB of ROM that contained a BASIC interpreter and the "KERNAL" (sic) of its operating system. Later home or office computers such as the IBM PC XT often included magnetic disk drives and larger amounts of RAM, allowing them to load their operating systems from disk into RAM, with only a minimal hardware initialization core and bootloader remaining in ROM (known as the BIOS in IBM-compatible computers). This arrangement allowed for a more complex and easily upgradeable operating system.

In modern general-purpose computers, there is little reason to store any program code or data in read-only memory: secondary storage devices such as hard disks are fast, ubiquitous, and rapidly decreasing in cost per bit, and large capacity dynamic RAM modules are cheaper than ROM thanks to economies of scale and more efficient designs. In modern PCs, ROM is used only to store basic bootstrapping firmware, such as the legacy BIOS which persists in most x86-based systems; even this limited "read-only" memory is likely to be implemented as Flash ROM (see below) to permit in-place reprogramming should the need for a firmware upgrade arise.

ROM and its successor technologies remain prevalent in embedded systems, such as MP3 players, set-top boxes, and broadband routers, all of which are designed to achieve more restricted functions than general-purpose computers, but which are nonetheless based on general-purpose microprocessors in most cases. These devices often store all of their program code in ROM since they usually lack mass storage peripherals (e.g. hard disks) for reasons of cost, portability, and power consumption. Furthermore, since the software is usually tightly coupled to the hardware, changes to the software are rarely needed. Nonetheless, as of 2007 nearly all of these devices use Flash rather than mask ROM, and many provide some means to connect the device to a personal computer for firmware updates (for example, a digital audio player's firmware might be updated to support a new music file format). Hobbyists have taken advantage of this flexibility to reprogram such devices to new purposes; for example, the iPodLinux and OpenWRT projects have enabled users to run full-featured Linux distributions on their MP3 players and wireless routers, respectively.

ROM is also useful for binary storage of cryptographic data, as it makes them difficult to replace, which may be desirable in order to enhance information security.

ROM for data storage

Since ROM (at least in hard-wired mask form) cannot be modified, it is really only suitable for storing data which is not expected to need modification for the life of the device. To that end, ROM has been used in many computers to store look-up tables for the evaluation of mathematical and logical functions (for example, a floating-point unit might tabulate the sine function in order to facilitate faster computation). This was especially effective when CPUs were slow and ROM was cheap compared to RAM.
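
As an illustration of the sine-table technique described above, the sketch below tabulates sin(x) once and answers later queries by table look-up. The table size and the nearest-entry strategy are illustrative choices, not how any particular FPU implemented it.

import math

TABLE_SIZE = 1024
# The "ROM" contents: sin(x) sampled at 1024 points over one period.
SINE_TABLE = tuple(math.sin(2 * math.pi * i / TABLE_SIZE)
                   for i in range(TABLE_SIZE))

def fast_sin(x: float) -> float:
    """Nearest-entry look-up; accuracy is limited by the table size."""
    index = round(x / (2 * math.pi) * TABLE_SIZE) % TABLE_SIZE
    return SINE_TABLE[index]

print(fast_sin(math.pi / 6), math.sin(math.pi / 6))   # ~0.498 vs 0.5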

Notably, the display adapters of early personal computers stored tables of bitmapped font characters in ROM. This usually meant that the text display font could not be changed interactively. This was the case for both the CGA and MDA adapters available with the IBM PC XT.

The use of ROM to store such small amounts of data has disappeared almost completely in modern general-purpose computers. However, Flash ROM has taken over a new role as a medium for mass storage or secondary storage of files.

Types of ROMs

The first EPROM, an Intel 1702, with the die and wire bonds clearly visible through the erase window.

Semiconductor based

Classic mask-programmed ROM chips are integrated circuits that physically encode the data to be stored, and thus it is impossible to change their contents after fabrication. Other types of non-volatile solid-state memory permit some degree of modification:

Programmable read-only memory (PROM), or one-time programmable ROM (OTP), can be written to or programmed via a special device called a PROM programmer. Typically, this device uses high voltages to permanently destroy or create internal links (fuses or antifuses) within the chip. Consequently, a PROM can only be programmed once.

Erasable programmable read-only memory (EPROM) can be erased by exposure to strong ultraviolet light (typically for 10 minutes or longer), then rewritten with a process that again requires application of higher than usual voltage. Repeated exposure to UV light will eventually wear out an EPROM, but the endurance of most EPROM chips exceeds 1000 cycles of erasing and reprogramming. EPROM chip packages can often be identified by the prominent quartz "window" which allows UV light to enter. After programming, the window is typically covered with a label to prevent accidental erasure. Some EPROM chips are factory-erased before they are packaged, and include no window; these are effectively PROM.

Electrically erasable programmable read-only memory (EEPROM) is based on a similar semiconductor structure to EPROM, but allows its entire contents (or selected banks) to be electrically erased, then rewritten electrically, so that they need not be removed from the computer (or camera, MP3 player, etc.). Writing or flashing an EEPROM is much slower (milliseconds per bit) than reading from a ROM or writing to a RAM (nanoseconds in both cases), since available densities are not as great and the cost per bit is higher.

o Electrically alterable read-only memory (EAROM) is a type of EEPROM that can be modified one bit at a time. Writing is a very slow process and again requires higher voltage (usually around 12 V) than is used for read access. EAROMs are intended for applications that require infrequent and only partial rewriting. EAROM may be used as non-volatile storage for critical system setup information; in many applications, EAROM has been supplanted by CMOS RAM supplied by mains power and backed-up with a lithium battery.

o Flash memory (or simply flash) is a modern type of EEPROM invented in 1984. Flash memory can be erased and rewritten faster than ordinary EEPROM, and newer designs feature very high endurance (exceeding 1,000,000 cycles). Modern NAND flash makes efficient use of silicon chip area, resulting in individual ICs with a capacity as high as 16 GB as of 2007; this feature, along with its endurance and physical durability, has allowed NAND flash to replace magnetic disks in some applications (such as USB flash drives). Flash memory is sometimes called flash ROM or flash EEPROM when used as a replacement for older ROM types, but not in applications that take advantage of its ability to be modified quickly and frequently.

By applying write protection, some types of reprogrammable ROMs may temporarily become read-only memory.

Other technologies

There are other types of non-volatile memory which are not based on solid-state IC technology, including:

Optical storage media, such as CD-ROM, which is read-only (analogous to masked ROM). CD-R is Write Once Read Many (analogous to PROM), while CD-RW supports erase-rewrite cycles (analogous to EEPROM); both are designed for backwards compatibility with CD-ROM.

Historical examples

Transformer matrix ROM (TROS), from the IBM System 360/20.

Diode matrix ROM, used in small amounts in many computers in the 1960s, as well as in electronic desk calculators and keyboard encoders for terminals. This ROM was programmed by installing discrete semiconductor diodes at selected locations between a matrix of word line traces and bit line traces on a printed circuit board.

Resistor, capacitor, or transformer matrix ROM, used in many computers until the 1970s. Like diode matrix ROM, it was programmed by placing components at selected locations between a matrix of word lines and bit lines. ENIAC's Function Tables were resistor matrix ROM, programmed by manually setting rotary switches. Various models of the IBM System/360 and complex peripheral devices stored their microcode in either capacitor (called BCROS for Balanced Capacitor Read Only Storage on the 360/50 and 360/65, or CCROS for Card Capacitor Read Only Storage on the 360/30) or transformer (called TROS for Transformer Read Only Storage on the 360/20, 360/40 and others) matrix ROM.

Core rope, a form of transformer matrix ROM technology used where size and/or weight were critical. This was used in NASA/MIT's Apollo Spacecraft Computers, DEC's PDP-8 computers, and other places. This type of ROM was programmed by hand by weaving "word line wires" inside or outside of ferrite transformer cores.

The perforated metal character mask ("stencil") in Charactron cathode ray tubes, which was used as ROM to shape a wide electron beam to form a selected character shape on the screen, either for display or as a scanned electron beam to form a selected character shape as an overlay on a video signal.

Various mechanical devices used in early computing equipment. A machined metal plate served as ROM in the dot matrix printers on the IBM 026 and IBM 029 key punches.

Speed of ROMs

Reading speed

Although the relative speed of RAM vs. ROM has varied over time, as of 2007 large RAM chips can be read faster than most ROMs. For this reason (and to make for uniform access), ROM content is sometimes copied to RAM or shadowed before its first use, and subsequently read from RAM.

Writing speed

For those types of ROM that can be electrically modified, writing speed is always much slower than reading speed, and it may require unusually high voltage, the movement of jumper plugs to apply write-enable signals, and special lock/unlock command codes. Modern NAND Flash achieves the highest write speeds of any rewritable ROM technology, with speeds as high as 15 MiB/s (or 70 ns/bit), by allowing (indeed requiring) large blocks of memory cells to be written simultaneously.

Endurance and data retention

Because they are written by forcing electrons through a layer of electrical insulation onto a floating transistor gate, rewriteable ROMs can withstand only a limited number of write and erase cycles before the insulation is permanently damaged. In the earliest EAROMs, this might occur after as few as 1,000 write cycles, while in modern Flash EEPROM the endurance may exceed 1,000,000, but it is by no means infinite. This limited endurance, as well as the higher cost per bit, means that Flash-based storage is unlikely to completely supplant magnetic disk drives in the near future.

The timespan over which a ROM remains accurately readable is not limited by write cycling. The data retention of EPROM, EAROM, EEPROM, and Flash may be limited by charge leaking from the floating gates of the memory cell transistors. Leakage is exacerbated at high temperatures or in high-radiation environments. Masked ROM and fuse/antifuse PROM do not suffer from this effect, as their data retention depends on physical rather than electrical permanence of the integrated circuit (although fuse re-growth was once a problem in some systems).

ROM images

Main article: ROM image

The contents of ROM chips in video game console cartridges can be extracted with special software or hardware devices. The resultant memory dump files are known as ROM images, and can be used to produce duplicate cartridges, or in console emulators. The term originated when most console games were distributed on cartridges containing ROM chips, but achieved such widespread usage that it is still applied to images of newer games distributed on CD-ROMs or other optical media.

ROM images of commercial games usually contain copyrighted software. The unauthorized copying and distribution of copyrighted software is usually a violation of copyright laws (in some jurisdictions duplication of ROM cartridges for backup purposes may be considered fair use). Nevertheless, there is a thriving community engaged in the illegal distribution and trading of such software. In such circles, the term "ROM images" is sometimes shortened simply to "ROMs" or sometimes changed to "romz" to highlight the connection with "warez".

See also

Random access memory
PROM
EPROM
EEPROM
Flash memory

Terminology

EEPROM: Electrically Erasable Programmable Read-Only Memory
EPROM: Erasable Programmable Read-Only Memory
PROM: Programmable Read-Only Memory

References

1. See page 6 of Toshiba's 1993 NAND Flash Applications Design Guide.
2. See chapters on "Combinatorial Digital Circuits" and "Sequential Digital Circuits" in Millman & Grabel, Microelectronics, 2nd ed.

Flash memory
From Wikipedia, the free encyclopedia



A USB flash drive. The chip on the left is the flash memory. The microcontroller is on the right.

Flash memory is non-volatile computer memory that can be electrically erased and reprogrammed. It is a technology that is primarily used in memory cards and USB flash drives for general storage and transfer of data between computers and other digital products. It is a specific type of EEPROM (Electrically Erasable Programmable Read-Only Memory) that is erased and programmed in large blocks; in early flash the entire chip had to be erased at once. Flash memory costs far less than byte-programmable EEPROM and therefore has become the dominant technology wherever a significant amount of non-volatile, solid-state storage is needed. Example applications include PDAs (personal digital assistants), laptop computers, digital audio players, digital cameras and mobile phones. It has also gained popularity in the game console market, where it is often used instead of EEPROMs or battery-powered SRAM for game save data.

Flash memory is non-volatile, which means that no power is needed to maintain the information stored in the chip. In addition, flash memory offers fast read access times (although not as fast as volatile DRAM memory used for main memory in PCs) and better kinetic shock resistance than hard disks. These characteristics explain the popularity of flash memory in portable devices. Another feature of flash memory is that when packaged in a "memory card," it is enormously durable, being able to withstand intense pressure, extremes of temperature, and even immersion in water.

Although technically a type of EEPROM, the term "EEPROM" is generally used to refer specifically to non-flash EEPROM which is erasable in small blocks, typically bytes. Because erase cycles are slow, the large block sizes used in flash memory erasing give it a significant speed advantage over old-style EEPROM when writing large amounts of data.


History

Flash memory (both NOR and NAND types) was invented by Dr. Fujio Masuoka while working for Toshiba circa 1980.[1][2] According to Toshiba, the name "flash" was suggested by Dr. Masuoka's colleague, Mr. Shoji Ariizumi, because the erasure process of the memory contents reminded him of a flash of a camera. Dr. Masuoka presented the invention at the IEEE 1984 International Electron Devices Meeting (IEDM) held in San Francisco, California.

Intel saw the massive potential of the invention and introduced the first commercial NOR type flash chip in 1988.[3] NOR-based flash has long erase and write times, but provides full address and data buses, allowing random access to any memory location. This makes it a suitable replacement for older ROM chips, which are used to store program code that rarely needs to be updated, such as a computer's BIOS or the firmware of set-top boxes. Its endurance is 10,000 to 1,000,000 erase cycles.[4] NOR-based flash was the basis of early flash-based removable media; CompactFlash was originally based on it, though later cards moved to less expensive NAND flash.


Toshiba announced NAND flash at ISSCC in 1989. It has faster erase and write times, and requires a smaller chip area per cell, thus allowing greater storage densities and lower costs per bit than NOR flash; it also has up to ten times the endurance of NOR flash. However, the I/O interface of NAND flash does not provide a random-access external address bus. Rather, data must be read on a block-wise basis, with typical block sizes of hundreds to thousands of bits. This made NAND flash unsuitable as a drop-in replacement for program ROM since most microprocessors and microcontrollers required byte-level random access. In this regard NAND flash is similar to other secondary storage devices such as hard disks and optical media, and is thus very suitable for use in mass-storage devices such as memory cards. The first NAND-based removable media format was SmartMedia, and many others have followed, including MultiMediaCard, Secure Digital, Memory Stick and xD-Picture Card. A new generation of memory card formats, including RS-MMC, miniSD and microSD, and Intelligent Stick, feature extremely small form factors. For example, the microSD card has an area of just over 1.5 cm², with a thickness of less than 1 mm; microSD capacities range from 64MB to 16GB, as of March 2008.[citation needed]

Principles of operation

Flash memory stores information in an array of memory cells made from floating-gate transistors. In traditional single-level cell (SLC) devices, each cell stores only one bit of information. Some newer flash memory, known as multi-level cell (MLC) devices, can store more than one bit per cell by choosing between multiple levels of electrical charge to apply to the floating gates of its cells.

A flash memory cell.
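
The SLC/MLC distinction can be illustrated with a small sketch: a cell that can hold one of four distinguishable charge levels encodes two bits, and in general log2(levels) bits per cell. The level-to-bits mapping below is an invented example, not any vendor's actual encoding.

import math

# Illustrative mapping from a sensed charge level (0-3) to two stored bits.
MLC_LEVELS = {0: (1, 1), 1: (1, 0), 2: (0, 1), 3: (0, 0)}

def mlc_read(charge_level: int) -> tuple:
    """Translate a sensed charge level into the bits it encodes."""
    return MLC_LEVELS[charge_level]

def bits_per_cell(levels: int) -> float:
    return math.log2(levels)

print(mlc_read(2))                            # -> (0, 1)
print(bits_per_cell(2), bits_per_cell(4))     # SLC: 1.0 bit, MLC: 2.0 bits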

NOR flash

Programming a NOR memory cell (setting it to logical 0), via hot-electron injection.

Erasing a NOR memory cell (setting it to logical 1), via quantum tunneling.

In NOR gate flash, each cell resembles a standard MOSFET, except the transistor has two gates instead of one. On top is the control gate (CG), as in other MOS transistors, but below this there is a floating gate (FG) insulated all around by an oxide layer. The FG is interposed between the CG and the MOSFET channel. Because the FG is electrically isolated by its insulating layer, any electrons placed on it are trapped there and, under normal conditions, will not discharge for many years. When the FG holds a charge, it screens (partially cancels) the electric field from the CG, which modifies the threshold voltage (VT) of the cell. During read-out, a voltage is applied to the CG, and the MOSFET channel will become conducting or remain insulating, depending on the VT of the cell, which is in turn controlled by the charge on the FG. The current flow through the MOSFET channel is sensed and forms a binary code, reproducing the stored data. In a multi-level cell device, which stores more than one bit per cell, the amount of current flow is sensed (rather than simply its presence or absence), in order to determine more precisely the level of charge on the FG.

A single-level NOR flash cell in its default state is logically equivalent to a binary "1" value, because current will flow through the channel under application of an appropriate voltage to the control gate. A NOR flash cell can be programmed, or set to a binary "0" value, by the following procedure:

1. An elevated on-voltage (typically >5 V) is applied to the CG.
2. The channel is now turned on, so electrons can flow from the source to the drain (assuming an NMOS transistor).
3. The source-drain current is sufficiently high to cause some high-energy electrons to jump through the insulating layer onto the FG, via a process called hot-electron injection.

To erase a NOR flash cell (resetting it to the "1" state), a large voltage of the opposite polarity is applied between the CG and source, pulling the electrons off the FG through quantum tunneling. Modern NOR flash memory chips are divided into erase segments (often called blocks or sectors). The erase operation can only be performed on a block-wise basis; all the cells in an erase segment must be erased together. Programming of NOR cells, however, can generally be performed one byte or word at a time.

Despite the need for high programming and erasing voltages, virtually all flash chips today require only a single supply voltage, and produce the high voltages via on-chip charge pumps.

NOR flash memory wiring and structure on silicon

NAND flash

NAND gate flash uses tunnel injection for writing and tunnel release for erasing. NAND flash memory forms the core of the removable USB storage devices known as USB flash drives, most memory card formats available today and many Nintendo DS storage devices such as N-Card.

NAND flash memory wiring and structure on silicon

Limitations

Block erasure

One limitation of flash memory is that although it can be read or programmed a byte or a word at a time in a random access fashion, it must be erased a "block" at a time. This generally sets all bits in the block to 1. Starting with a freshly erased block, any location within that block can be programmed. However, once a bit has been set to 0, only by erasing the entire block can it be changed back to 1. In other words, flash memory (specifically NOR flash) offers random-access read and programming operations, but cannot offer arbitrary random-access rewrite or erase operations. A location can, however, be rewritten as long as the new value's 0 bits are a superset of the over-written value's. For example, a nibble value may be erased to 1111, then written as 1110. Successive writes to that nibble can change it to 1010, then 0010, and finally 0000. In practice few algorithms can take advantage of this successive write capability and in general the entire block is erased and rewritten at once.

Although data structures in flash memory cannot be updated in completely general ways, this allows members to be "removed" by marking them as invalid. This technique must be modified somewhat for multi-level devices, where one memory cell holds more than one bit.
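
The nibble example above can be played out directly in code. This is a toy model of the write rules only (erase sets every bit in a block to 1; programming can only clear bits), not of any real flash chip's interface.

BLOCK_SIZE = 16                               # illustrative tiny block

class FlashBlock:
    """Toy model of NOR-style write rules: erase a whole block to 1s,
    program individual locations by clearing bits only."""

    def __init__(self):
        self.cells = [0xFF] * BLOCK_SIZE      # erased state: all bits 1

    def erase(self):
        self.cells = [0xFF] * BLOCK_SIZE      # only whole-block erase

    def program(self, offset: int, value: int):
        current = self.cells[offset]
        if value & (~current & 0xFF):         # would need a 0 -> 1 change
            raise ValueError("cannot set a 0 bit back to 1 without erasing")
        self.cells[offset] &= value           # programming clears bits

blk = FlashBlock()
blk.program(0, 0b1110)    # 1111 -> 1110
blk.program(0, 0b1010)    # 1110 -> 1010 (only clears bits, so allowed)
blk.program(0, 0b0000)    # -> 0000
# blk.program(0, 0b0100) would raise: the block must be erased first.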

Memory wear

Another limitation is that flash memory has a finite number of erase-write cycles. Most commercially available flash products are guaranteed to withstand around 100,000 write-erase-cycles.[citation needed] The guaranteed cycle count may apply only to block zero (as is the case with TSOP NAND parts), or to all blocks (as in NOR). This effect is partially offset in some chip firmware or file system drivers by counting the writes and dynamically remapping blocks in order to spread write operations between sectors; this technique is called wear levelling. Another approach is to perform write verification and remapping to spare sectors in case of write failure, a technique called bad block management (BBM). For portable consumer devices, these wearout management techniques typically extend the life of the flash memory beyond the life of the device itself, and some data loss may be acceptable in these applications. For high reliability data storage, however, it is not advisable to use flash memory that has been through a large number of programming cycles. This limitation does not apply to 'read-only' applications such as thin clients and routers, which are only programmed once or at most a few times during their lifetime.
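
A toy illustration of the wear-levelling idea mentioned above: keep a per-block erase counter and always place the next write on the least-worn block. The block count and write counts are arbitrary; real controllers also handle static data, bad blocks, and persistent mapping tables.

import os

NUM_BLOCKS = 8
erase_counts = [0] * NUM_BLOCKS               # one erase counter per block

def write_with_wear_levelling(data: bytes) -> int:
    """Pick the least-worn block for the next write, charge it one erase
    cycle, and return the chosen block number."""
    block = min(range(NUM_BLOCKS), key=lambda b: erase_counts[b])
    erase_counts[block] += 1
    return block

for _ in range(1000):
    write_with_wear_levelling(os.urandom(16))

print(erase_counts)       # wear is spread evenly, e.g. [125, 125, ..., 125]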

Low-level access

The low-level interface to flash memory chips differs from those of other memory types such as DRAM, ROM, and EEPROM, which support bit-alterability (both zero to one and one to zero) and random-access via externally accessible address buses.

While NOR memory provides an external address bus for read and program operations (and thus supports random access), unlocking and erasing NOR memory must proceed on a block-by-block basis. With NAND flash memory, read and programming operations must be performed a page at a time, while unlocking and erasing must happen in block-wise fashion.

NOR memories

Reading from NOR flash is similar to reading from random-access memory, provided the address and data bus are mapped correctly. Because of this, most microprocessors can use NOR flash memory as execute in place (XIP) memory, meaning that programs stored in NOR flash can be executed directly without the need to first copy the program into RAM. NOR flash may be programmed in a random-access manner similar to reading. Programming changes bits from a logical one to a zero. Bits that are already zero are left unchanged. Erasure must happen a block at a time, and resets all the bits in the erased block back to one. Typical block sizes are 64, 128, or 256 KiB.

Bad block management is a relatively new feature in NOR chips. In older NOR devices not supporting bad block management, the software or device driver controlling the memory chip must correct for blocks that wear out, or the device will cease to work reliably.

The specific commands used to lock, unlock, program, or erase NOR memories differ for each manufacturer. To avoid needing unique driver software for every device made, a special set of CFI commands allow the device to identify itself and its critical operating parameters.

Apart from being used as random-access ROM, NOR memories can also be used as storage devices by taking advantage of random-access programming. Some devices offer read-while-write functionality so that code continues to execute even while a program or erase operation is occurring in the background. For sequential data writes, NOR flash chips typically have slow write speeds compared with NAND flash.

NAND memories

NAND flash architecture was introduced by Toshiba in 1989. These memories are accessed much like block devices such as hard disks or memory cards. Each block consists of a number of pages. The pages are typically 512[5] or 2,048 or 4,096 bytes in size. Associated with each page are a few bytes (typically 12–16 bytes) that should be used for storage of an error detection and correction checksum.

Typical block sizes include:

32 pages of 512 bytes each, for a block size of 16 KiB
64 pages of 2,048 bytes each, for a block size of 128 KiB
64 pages of 4,096 bytes each, for a block size of 256 KiB
128 pages of 4,096 bytes each, for a block size of 512 KiB
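
As a quick arithmetic check, each figure above is simply pages-per-block multiplied by page size:

# Verify the block sizes listed above: pages per block x page size.
layouts = [(32, 512), (64, 2048), (64, 4096), (128, 4096)]
for pages, page_bytes in layouts:
    block_bytes = pages * page_bytes
    print(f"{pages:>3} pages x {page_bytes} B = {block_bytes // 1024} KiB per block")
# -> 16 KiB, 128 KiB, 256 KiB, 512 KiB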


While reading and programming is performed on a page basis, erasure can only be performed on a block basis. Another limitation of NAND flash is data in a block can only be written sequentially. Number of Operations (NOPs) is the number of times the sectors can be programmed. So far this number for MLC flash is always one whereas for SLC flash it is four.[citation needed]

NAND devices also require bad block management by the device driver software, or by a separate controller chip. SD cards, for example, include controller circuitry to perform bad block management and wear leveling. When a logical block is accessed by high-level software, it is mapped to a physical block by the device driver or controller. A number of blocks on the flash chip may be set aside for storing mapping tables to deal with bad blocks, or the system may simply check each block at power-up to create a bad block map in RAM. The overall memory capacity gradually shrinks as more blocks are marked as bad.
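
A minimal sketch of the logical-to-physical mapping described above: the table skips blocks marked bad and remaps a logical block to a spare when a program or erase failure is detected. The block counts, spare-area layout, and function names are invented for illustration.

TOTAL_BLOCKS = 64
SPARE_START = 60                      # last few physical blocks kept as spares

bad_blocks = {5, 17}                  # e.g. marked bad at the factory
spares = list(range(SPARE_START, TOTAL_BLOCKS))
mapping = {}                          # logical block -> physical block

good = (b for b in range(SPARE_START) if b not in bad_blocks)
for logical, physical in zip(range(SPARE_START - len(bad_blocks)), good):
    mapping[logical] = physical       # build the initial map, skipping bad blocks

def remap_on_failure(logical: int):
    """Called when programming or erasing the mapped block fails."""
    bad_blocks.add(mapping[logical])
    mapping[logical] = spares.pop()   # move the logical block to a spare

print(mapping[4], mapping[5])         # logical 5 skips bad physical block 5
remap_on_failure(4)
print(mapping[4])                     # now points at a spare block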

NAND relies on ECC to compensate for bits that may spontaneously fail during normal device operation. This ECC may correct as few as one bit error in each 2,048 bits, or as many as 22 bits in each 2,048 bits.[6] If the ECC cannot correct the error during a read, it may still detect the error. When doing erase or program operations, the device can detect blocks that fail to program or erase and mark them bad. The data is then written to a different, good block, and the bad block map is updated.

Most NAND devices are shipped from the factory with some bad blocks which are typically identified and marked according to a specified bad block marking strategy. By allowing some bad blocks, the manufacturers achieve far higher yields than would be possible if all blocks had to be verified good. This significantly reduces NAND flash costs and only slightly decreases the storage capacity of the parts.

When executing software from NAND memories, virtual memory strategies are often used: memory contents must first be paged or copied into memory-mapped RAM and executed there (leading to the common combination of NAND + RAM). A memory management unit (MMU) in the system is helpful, but this can also be accomplished with overlays. For this reason, some systems will use a combination of NOR and NAND memories, where a smaller NOR memory is used as software ROM and a larger NAND memory is partitioned with a file system for use as a nonvolatile data storage area.

NAND is best suited to systems requiring high capacity data storage. This type of flash architecture offers higher densities and larger capacities at lower cost with faster erase, sequential write, and sequential read speeds, sacrificing the random-access and execute in place advantage of the NOR architecture.

Standardization

A group called the Open NAND Flash Interface Working Group (ONFI) has developed a standardized low-level interface for NAND flash chips. This allows interoperability between conforming NAND devices from different vendors. The ONFI specification version 1.0[7] was released on December 28, 2006. It specifies:

a standard physical interface (pinout) for NAND flash in TSOP-48, WSOP-48, LGA-52, and BGA-63 packages
a standard command set for reading, writing, and erasing NAND flash chips
a mechanism for self-identification (comparable to the Serial Presence Detection feature of SDRAM chips)

The ONFI group is supported by major NAND Flash manufacturers, including Hynix, Intel, Micron Technology, and Numonyx, as well as by major manufacturers of devices incorporating NAND flash chips.[8]

A group of vendors, including Intel, Dell, and Microsoft, formed the Non-Volatile Memory Host Controller Interface (NVMHCI) Working Group.[9] The goal of the group is to provide standard software and hardware programming interfaces for nonvolatile memory subsystems, including the "flash cache" device connected to the PCI Express bus.

Distinction between NOR and NAND flash

NOR and NAND flash differ in two important ways:

the connections of the individual memory cells are different
the interface provided for reading and writing the memory is different (NOR allows random access for reading; NAND allows only page access)

It is important to understand that these two differences are linked by the design choices made in the development of NAND flash. An important goal of NAND flash development was to reduce the chip area required to implement a given capacity of flash memory, and thereby to reduce cost per bit and increase maximum chip capacity so that flash memory could compete with magnetic storage devices like hard disks.

NOR and NAND flash get their names from the structure of the interconnections between memory cells.[10] In NOR flash, cells are connected in parallel to the bit lines, allowing cells to be read and programmed individually. The parallel connection of cells resembles the parallel connection of transistors in a CMOS NOR gate. In NAND flash, cells are connected in series, resembling a NAND gate, and preventing cells from being read and programmed individually: the cells connected in series must be read in series.

When NOR flash was developed, it was envisioned as a more economical and conveniently rewritable ROM than contemporary EPROM, EAROM, and EEPROM memories. Thus random-access reading circuitry was necessary. However, it was expected that NOR flash ROM would be read much more often than written, so the write circuitry included was fairly slow and could only erase in a block-wise fashion; random-access write circuitry would add to the complexity and cost unnecessarily.

Because of the series connection and removal of wordline contacts, a large grid of NAND flash memory cells will occupy perhaps only 60% of the area of equivalent NOR cells[11] (assuming the same CMOS process resolution, e.g. 130 nm, 90 nm, 65 nm). NAND flash's designers realized that the area of a NAND chip, and thus the cost, could be further reduced by removing the external address and data bus circuitry. Instead, external devices could communicate with NAND flash via sequential-accessed command and data registers, which would internally retrieve and output the necessary data. This design choice made random-access of NAND flash memory impossible, but the goal of NAND flash was to replace hard disks, not to replace ROMs.

Write Endurance

The write endurance of SLC floating-gate NOR flash is typically equal to or greater than that of NAND flash, while MLC NOR and MLC NAND flash have similar endurance capabilities. Example endurance cycle ratings listed in datasheets for NAND and NOR flash are provided below:

NAND flash is typically rated at about 100K cycles (Samsung OneNAND KFW4G16Q2M)
SLC floating-gate NOR flash has a typical endurance rating of 100K to 1,000K cycles (Numonyx M58BW, 100K; Spansion S29CD016J, 1,000K)
MLC floating-gate NOR flash has a typical endurance rating of 100K cycles (Numonyx J3 Flash)

Flash file systems

Because of the particular characteristics of flash memory, it is best used either with a controller that performs wear levelling and error correction, or with specifically designed file systems which spread writes over the media and deal with the long erase times of NOR flash blocks. The basic concept behind flash file systems is this: when the flash store is to be updated, the file system writes a new copy of the changed data to a fresh block, remaps the file pointers, then erases the old block later when it has time.
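
The following Python sketch, with entirely hypothetical structures, illustrates that out-of-place update pattern: new data goes to a fresh block, the pointer is remapped, and the superseded block is queued for later erasure.

# A sketch of the out-of-place update that flash file systems perform, instead
# of overwriting data in place. Names and sizes are illustrative only.

flash = {}                 # physical block number -> data
free_blocks = [0, 1, 2, 3] # erased blocks available for writing
file_map = {}              # file name -> physical block currently holding its data
garbage = []               # superseded blocks waiting to be erased in the background

def update_file(name, data):
    new_block = free_blocks.pop(0)      # pick a fresh (already-erased) block
    flash[new_block] = data             # write the new copy there
    old_block = file_map.get(name)
    file_map[name] = new_block          # remap the file pointer
    if old_block is not None:
        garbage.append(old_block)       # erase the old block later, when idle

def garbage_collect():
    while garbage:
        block = garbage.pop()
        flash.pop(block, None)          # "erase" the stale block
        free_blocks.append(block)       # and return it to the free pool

update_file("config", b"v1")
update_file("config", b"v2")            # old copy is not overwritten in place
garbage_collect()
print(file_map, free_blocks)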

One of the earliest flash file systems was Microsoft's FFS2 (presumably preceded by FFS1), for use with MS-DOS in the early 1990s.[12]

Around 1994, the PCMCIA, an industry group, approved the Flash Translation Layer (FTL) specification, which allowed a Linear Flash device to look like a FAT disk, but still have effective wear levelling. Other commercial systems such as FlashFX and FlashFX Pro by Datalight were created to avoid patent concerns with FTL.

ZFS by Sun Microsystems has been optimized to manage Flash SSD systems, both as cache and as main storage, and is available for the OpenSolaris, FreeBSD, and Mac OS X operating systems. Sun has announced a complete line of Flash-enabled systems and storage devices.

JFFS was the first flash-specific file system for Linux, but it was quickly superseded by JFFS2, originally developed for NOR flash. Then YAFFS was released in 2002, dealing specifically with NAND flash, and JFFS2 was updated to support NAND flash too.

In practice, flash file systems are only used for "Memory Technology Devices" ("MTD"), which are embedded flash memories that do not have a controller. Removable flash memory cards and USB flash drives have built-in controllers to perform wear-levelling and error correction so use of a specific flash file system does not add any benefit. These removable flash memory devices use the FAT file system to allow universal compatibility with computers, cameras, PDAs and other portable devices with memory card slots or ports.

Capacity

Multiple chips are often arrayed to achieve higher capacities for use in consumer electronic devices such as multimedia players or GPS units. The capacity of flash chips generally follows Moore's Law because they are manufactured with many of the same integrated-circuit techniques and equipment.

Consumer flash drives typically have sizes measured in powers of two (e.g. 512 MB, 8 GB). This includes SSDs used as hard drive replacements, even though traditional hard drives tend to use decimal units. Thus, a 64 GB SSD is actually 64 gibibytes (using binary units), even though many stores advertise these units as 64 gigabytes. In practice, slightly less than this amount is available to the user because of overhead such as disk formatting.
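
The discrepancy is easy to quantify; the short calculation below (illustrative arithmetic only) shows how much larger 64 GiB is than a decimal 64 GB:

# Quick check of the decimal-versus-binary discrepancy described above.

capacity_gib = 64
bytes_binary = capacity_gib * 2**30          # 64 GiB in bytes
bytes_decimal = 64 * 10**9                   # what "64 GB" means in decimal units

print(bytes_binary)                          # 68,719,476,736 bytes
print(bytes_binary / bytes_decimal)          # ~1.074: about 7.4% more than 64 x 10^9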

In 2005, Toshiba and SanDisk developed a NAND flash chip capable of storing 1 GB of data using multi-level cell (MLC) technology, which stores 2 bits of data per cell. In September 2005, Samsung Electronics announced that it had developed the world's first 2 GB chip.[13]

In March 2006, Samsung announced flash hard drives with a capacity of 4 GB, essentially the same order of magnitude as smaller laptop hard drives, and in September 2006, Samsung announced an 8 GB chip produced using a 40 nanometer manufacturing process.[14]

In January 2008, SanDisk announced the availability of its 12 GB MicroSDHC and 32 GB SDHC Plus cards.[15][16]

Transfer rates

The figure most commonly advertised is the maximum read speed; NAND flash memory cards are generally faster at reading than at writing.

Transferring many small files, each smaller than the chip-specific block size, can lead to a much lower rate.

Access latency also influences performance, but it is far less of an issue than with hard drives.

Transfer rates are sometimes quoted in MB/s (megabytes per second), or as a multiple of "x" such as 60x, 100x, or 150x. The "x" speed rating refers to the rate at which a legacy audio CD drive delivers data: 1x is equal to 150 kibibytes per second.

For example, a 100x memory card transfers 150 KiB x 100 = 15,000 KiB per second, or about 14.65 MiB per second.

Note that the exact speed depends on whether the marketer means 10^6 bytes or 2^20 bytes by "megabyte".
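
The conversion is simple enough to express as a small helper; the sketch below just restates the arithmetic from the example above:

# Illustrative conversion of an "x" speed rating into MiB/s.

CD_SINGLE_SPEED_KIB_S = 150            # 1x = 150 KiB per second

def x_rating_to_mib_per_s(rating):
    kib_per_s = rating * CD_SINGLE_SPEED_KIB_S
    return kib_per_s / 1024            # KiB/s -> MiB/s

print(x_rating_to_mib_per_s(100))      # ~14.65 MiB/s for a 100x card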

Applications

Serial flash

Serial flash is a small, low-power flash memory that uses a serial interface, typically SPI, for sequential data access. When incorporated into an embedded system, serial flash requires fewer wires on the PCB than parallel flash memories, since it transmits and receives data one bit at a time. This may permit a reduction in board space, power consumption, and total system cost.
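
As a rough illustration of how a host frames such a transfer, the sketch below builds a read command the way many SPI NOR flash parts expect it (an opcode byte followed by a 24-bit address); the exact opcode and address width are device-specific, so treat the 0x03 opcode here as an assumption to be checked against the part's datasheet.

# A sketch of framing a serial flash read over SPI. The 0x03 READ opcode and
# 24-bit address are common but not universal; consult the device datasheet.

def build_read_command(address, length):
    frame = bytes([
        0x03,                        # READ opcode (assumed; device-specific)
        (address >> 16) & 0xFF,      # address, most significant byte first
        (address >> 8) & 0xFF,
        address & 0xFF,
    ])
    # After the frame is shifted out, the device clocks back `length` data bytes,
    # one bit at a time on a single data line.
    return frame, length

frame, n = build_read_command(0x000100, 16)
print(frame.hex(), n)    # '03000100' 16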

There are several reasons why a serial device, with fewer external pins than a parallel device, can significantly reduce overall cost:

Many ASICs are pad-limited, meaning that the size of the die is constrained by the number of wire bond pads, rather than the complexity and number of gates used for the device logic. Eliminating bond pads thus permits a more compact integrated circuit, on a smaller die; this increases the number of dies that may be fabricated on a wafer, and thus reduces the cost per die.

Reducing the number of external pins also reduces assembly and packaging costs. A serial device may be packaged in a smaller and simpler package than a parallel device.

Smaller and lower pin-count packages occupy reduced PCB area. Lower pin-count devices simplify PCB routing.

Firmware storage

With the increasing speed of modern CPUs, parallel flash devices are often much slower than the memory bus of the computer they are connected to. In contrast, modern SRAM offers access times below 10 ns, while DDR2 SDRAM offers access times below 20 ns. Because of this, it is often desirable to shadow code stored in flash into RAM; that is, the code is copied from flash into RAM before execution, so that the CPU may access it at full speed. Device firmware may be stored in a serial flash device and then copied into SDRAM or SRAM when the device is powered up.[17] Using an external serial flash device rather than on-chip flash removes the need for significant process compromise (a process that is good for high-speed logic is generally not good for flash, and vice versa). Once it is decided to read the firmware in as one big block, it is common to add compression to allow a smaller flash chip to be used. Typical applications for serial flash include storing firmware for hard drives, Ethernet controllers, DSL modems, and wireless network devices.
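
A simplified sketch of that shadowing step is shown below; the function names are hypothetical, and zlib merely stands in for whatever compression scheme a given firmware image actually uses.

# A sketch of firmware shadowing: read the image out of slow serial flash once
# at boot, optionally decompress it, and run it from RAM.

import zlib

def read_serial_flash(offset, length, flash_image):
    # Stand-in for a driver bulk read; a real system would clock this over SPI.
    return flash_image[offset:offset + length]

def shadow_firmware(flash_image, compressed=True):
    raw = read_serial_flash(0, len(flash_image), flash_image)
    code = zlib.decompress(raw) if compressed else raw
    ram = bytearray(code)          # the copy the CPU actually executes from
    return ram

firmware = zlib.compress(b"\x90" * 4096)   # pretend firmware image, compressed
ram_copy = shadow_firmware(firmware)
print(len(ram_copy))                        # 4096 bytes ready to execute from RAM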

Flash memory as a replacement for hard drives

Main article: Solid-state drive

An obvious extension of flash memory would be as a replacement for hard disks. Flash memory does not have the mechanical limitations and latencies of hard drives, so the idea of a solid-state drive, or SSD, is attractive when considering speed, noise, power consumption, and reliability.

There remain some aspects of flash-based SSDs that make the idea unattractive. Most important, the cost per gigabyte of flash memory remains significantly higher than that of platter-based hard drives. Although this ratio is decreasing rapidly for flash memory, it is not yet clear that flash memory will catch up to the capacities and affordability offered by platter-based storage. Still, research and development is sufficiently vigorous that it is not clear that it will not happen, either.[citation needed]

There is also some concern that the finite number of erase/write cycles of flash memory would render flash memory unable to support an operating system. This seems to be a decreasing issue as warranties on flash-based SSDs are approaching those of current hard drives.[18][19]

As of May 24, 2006, South Korean consumer-electronics manufacturer Samsung Electronics had released the first flash-memory based PCs, the Q1-SSD and Q30-SSD, both of which have 32 GB SSDs.[20] Dell Computer introduced the Latitude D430 laptop with 32 GB flash-memory storage in July 2007 -- at a price significantly above a hard-drive equipped version.[citation needed]

At the Las Vegas CES 2007 Summit, Taiwanese memory company A-DATA showcased SSDs based on Flash technology in capacities of 32 GB, 64 GB, and 128 GB.[21] SanDisk announced an OEM 32 GB 1.8-inch SSD at CES 2007.[22] The XO-1, developed by the One Laptop Per Child (OLPC) association, uses flash memory rather than a hard drive. As of June 2007, a South Korean company called Mtron claims the fastest SSD, with sequential read/write speeds of 100 MB/80 MB per second.[23]

Rather than entirely replacing the hard drive, hybrid techniques such as hybrid drives and ReadyBoost attempt to combine the advantages of both technologies, using flash as a high-speed cache for files on the disk that are often referenced but rarely modified, such as application and operating system executable files. Also, Addonics has a PCI adapter for 4 CF cards,[24] creating a RAID-able array of solid-state storage that is much cheaper than PCI cards with hard-wired flash chips.

The ASUS Eee PC uses a flash-based SSD of 2 GB to 20 GB, depending on the model. The Apple Inc. MacBook Air offers the option to upgrade the standard hard drive to a 64 GB solid-state drive. The Lenovo ThinkPad X300 also features a built-in 64 GB solid-state drive.

Industry

One source states that, in 2008, the flash memory industry included about US$9.1 billion in production and sales. Apple Inc. is the third largest purchaser of flash memory, consuming about 13% of production by itself.[25] Other sources put the flash memory market at more than US$20 billion in 2006, accounting for more than eight percent of the overall semiconductor market and more than 34 percent of the total semiconductor memory market.[26]

Flash scalability

The aggressive trend of process design rule shrinks in NAND Flash memory technology effectively accelerates Moore's Law.

Due to its relatively simple structure and high demand for higher capacity, NAND Flash memory is the most aggressively scaled technology among electronic devices. The heavy competition among the top few manufacturers only adds to the aggression. Current projections show the technology to reach ~20 nm by ~2010. While the expected shrink timeline is a factor of two every three years per original version of Moore's law, this has recently been accelerated in the case of NAND flash to a factor of two every two years.

As the feature size of Flash memory cells reaches the minimum limit (currently estimated at ~20 nm), further Flash density increases will be driven by greater levels of MLC, possibly 3-D stacking of transistors, and process improvements. Even with these advances, it may be impossible to economically scale Flash to smaller and smaller dimensions. Many promising new technologies (such as FeRAM, MRAM, PMC, PCM, and others) are under investigation and development as possible more scalable replacements for Flash.[27]

See also

List of emerging technologies
CompactFlash
Wear levelling
DataFlash
Open NAND Flash Interface Working Group
1T-FLASH
List of flash file systems

References

1. Fulford, Benjamin (24 June 2002). "Unsung hero". Forbes. Retrieved on 2008-03-18.
2. US patent 4531203, Fujio Masuoka.
3. "NAND vs. NOR flash technology: The designer should weigh the options when using flash memory" (February 2002). Retrieved on 2008-07-11.
4. Bez, R.; Camerlenghi, E.; Modelli, A. & Visconti, A. (April 2003). "Introduction to flash memory". Proceedings of the IEEE 91(4): 489-502. <http://ieeexplore.ieee.org/iel5/5/26994/01199079.pdf?tp=&arnumber=1199079&isnumber=26994>. Retrieved on 15 August 2008.
5. Kim, Jesung; Kim, John Min; Noh, Sam H.; Min, Sang Lyul & Cho, Yookun (May 2002). "A Space-Efficient Flash Translation Layer for CompactFlash Systems". Proceedings of the IEEE 48(2): 366-375. <http://ieeexplore.ieee.org/iel5/30/21778/01010143.pdf?tp=&isnumber=&arnumber=1010143>. Retrieved on 15 August 2008.
6. "Samsung ECC algorithm". Samsung (June 2008). Retrieved on 2008-08-15.
7. "www.onfi.org/docs/ONFI_1_0_Gold.pdf" (PDF).
8. A list of ONFI members is available at http://www.onfi.org/onfimembers.html.
9. "www.intel.com/pressroom/archive/releases/20070530corp.htm".
10. See pages 5-7 of Toshiba's "NAND Applications Design Guide" under External links.
11. Pavan, Paolo; Bez, Roberto; Olivo, Piero & Zononi, Enrico (August 1997). "Flash Memory Cells — An Overview". Proceedings of the IEEE 85(8): 1248-1271. <http://ieeexplore.ieee.org/iel3/5/13533/00622505.pdf?tp=&isnumber=&arnumber=622505>. Retrieved on 15 August 2008.
12. Microsoft FFS2 patent.
13. "www.xbitlabs.com/news/memory/display/20050912212649.html".
14. "www.tgdaily.com/content/view/28504/135/".
15. 12GB MicroSDHC.
16. 32GB SDHC Plus: http://www.sandisk.com/Corporate/PressRoom/PressReleases/PressRelease.aspx?ID=4091.
17. Many serial flash devices implement a bulk read mode and incorporate an internal address counter, so that it is trivial to configure them to transfer their entire contents to RAM on power-up. When clocked at 50 MHz, for example, a serial flash could transfer a 64 Mbit firmware image in less than two seconds.
18. "www.storagesearch.com/semico-art1.html".
19. "www.storagesearch.com/bitmicro-art1.html".
20. "www.samsung.com/he/presscenter/pressrelease/pressrelease_20060524_0000257996.asp".
21. "Future of Flash revealed".
22. "SanDisk SSD Solid State Drives".
23. MTRON | Home.
24. "Addonics PCI adapter for 4 CF cards".
25. Deffree, Suzanne (April 2008). "Apple sneezes, flash industry gets sick". EDN 2008 (7): 74. Retrieved on 2008-04-19.
26. Yinug, Christopher Falan (July 2007). "The Rise of the Flash Memory Market: Its Impact on Firm Behavior and Global Semiconductor Trade Patterns". Journal of International Commerce and Economics. Retrieved on 2008-04-19.
27. Kim, Kinam & Koh, Gwan-Hyeob (2004-05-16). "Future Memory Technology including Emerging New Memories". Proceedings of the 24th International Conference on Microelectronics, Serbia and Montenegro, pp. 377-384. <http://ieeexplore.ieee.org/iel5/9193/29143/01314646.pdf?tp=&isnumber=&arnumber=1314646>. Retrieved on 15 August 2008.

Digital Memories Survive Extremes
Flash memory database

Flash file systems (general references)

Presentation on various Flash File Systems - 9/24/2007
Article regarding various Flash File Systems - 2005 USENIX Annual Conference
Survey of various Flash File Systems - 8/10/2005
Anatomy of Linux Flash File Systems - 5/20/2008

External links

Open NAND Flash Interface Working Group
NAND vs. NOR flash technology: The designer should weigh the options when using flash memory
A Nonvolatile Memory Overview
How Flash Memory Works
SanDisk Flash Memory Plant
What is NAND Flash
NAND Flash Applications
NAND Flash Applications Design Guide from Toshiba (explains the low-level details of interfacing with common NAND flash chips)
NAND vs. NOR Flash Memory from Toshiba
Samsung Develops New Flash Memory Chip - AP, October 23, 2007
Numonyx - Intel & ST Micro Flash merger

Retrieved from "http://en.wikipedia.org/wiki/Flash_memory"

Central processing unit

"CPU" redirects here. For other uses, see CPU (disambiguation).

Die of an Intel 80486DX2 microprocessor (actual size: 12×6.75 mm) in its packaging.

A central processing unit (CPU) is a description of a class of logic machines that can execute computer programs. This broad definition can easily be applied to many early computers that existed long before the term "CPU" ever came into widespread usage. The term itself and its initialism have been in use in the computer industry at least since the early 1960s (Weik 1961). The form, design, and implementation of CPUs have changed dramatically since the earliest examples, but their fundamental operation has remained much the same.

Early CPUs were custom-designed as a part of a larger, sometimes one-of-a-kind, computer. However, this costly method of designing custom CPUs for a particular application has largely given way to the development of mass-produced processors that are suited for one or many purposes. This standardization trend generally began in the era of discrete transistor mainframes and minicomputers and has rapidly accelerated with the popularization of the integrated circuit (IC). The IC has allowed increasingly complex CPUs to be designed and manufactured to tolerances on the order of nanometers. Both the miniaturization and standardization of CPUs have increased the presence of these digital devices in modern life far beyond the limited application of dedicated computing machines. Modern microprocessors appear in everything from automobiles to cell phones to children's toys.

Contents

1 History of CPUs
  1.1 Discrete transistor and IC CPUs
  1.2 Microprocessors
2 CPU operation
3 Design and implementation
  3.1 Integer range
  3.2 Clock rate
  3.3 Parallelism
    3.3.1 Instruction level parallelism
    3.3.2 Thread level parallelism
    3.3.3 Data parallelism
4 See also
5 Notes
6 References
7 External links

History of CPUs

Main article: History of general purpose CPUs

EDVAC, one of the first electronic stored program computers.

Prior to the advent of machines that resemble today's CPUs, computers such as the ENIAC had to be physically rewired in order to perform different tasks. These machines are often referred to as "fixed-program computers," since they had to be physically reconfigured in order to run a different program. Since the term "CPU" is generally defined as a software (computer program) execution device, the earliest devices that could rightly be called CPUs came with the advent of the stored-program computer.

The idea of a stored-program computer was already present during ENIAC's design, but was initially omitted so the machine could be finished sooner. On June 30, 1945, before ENIAC was even completed, mathematician John von Neumann distributed the paper entitled "First Draft of a Report on the EDVAC." It outlined the design of a stored-program computer that would eventually be completed in August 1949 (von Neumann 1945). EDVAC was designed to perform a certain number of instructions (or operations) of various types. These instructions could be combined to create useful programs for the EDVAC to run. Significantly, the programs written for EDVAC were stored in high-speed computer memory rather than specified by the physical wiring of the computer. This overcame a severe limitation of ENIAC, which was the large amount of time and effort it took to reconfigure the computer to perform a new task. With von Neumann's design, the program, or software, that EDVAC ran could be changed simply by changing the contents of the computer's memory. [1]

While von Neumann is most often credited with the design of the stored-program computer because of his design of EDVAC, others before him such as Konrad Zuse had suggested similar ideas. Additionally, the so-called Harvard architecture of the Harvard Mark I, which was completed before EDVAC, also utilized a stored-program design using punched paper tape rather than electronic memory. The key difference between the von Neumann and Harvard architectures is that the latter separates the storage and treatment of CPU instructions and data, while the former uses the same memory space for both. Most modern CPUs are primarily von Neumann in design, but elements of the Harvard architecture are commonly seen as well.

Being digital devices, all CPUs deal with discrete states and therefore require some kind of switching elements to differentiate between and change these states. Prior to commercial acceptance of the transistor, electrical relays and vacuum tubes (thermionic valves) were commonly used as switching elements. Although these had distinct speed advantages over earlier, purely mechanical designs, they were unreliable for various reasons. For example, building direct current sequential logic circuits out of relays requires additional hardware to cope with the problem of contact bounce. While vacuum tubes do not suffer from contact bounce, they must heat up before becoming fully operational and eventually stop functioning altogether.[2] Usually, when a tube failed, the CPU would have to be diagnosed to locate the failing component so it could be replaced. Therefore, early electronic (vacuum tube based) computers were generally faster but less reliable than electromechanical (relay based) computers.

Tube computers like EDVAC tended to average eight hours between failures, whereas relay computers like the (slower, but earlier) Harvard Mark I failed very rarely (Weik 1961:238). In the end, tube based CPUs became dominant because the significant speed advantages afforded generally outweighed the reliability problems. Most of these early synchronous CPUs ran at low clock rates compared to modern microelectronic designs (see below for a discussion of clock rate). Clock signal frequencies ranging from 100 kHz to 4 MHz were very common at this time, limited largely by the speed of the switching devices they were built with.

Discrete transistor and IC CPUs

CPU, core memory, and external bus interface of a DEC PDP-8/I, made of medium-scale integrated circuits.

The design complexity of CPUs increased as various technologies facilitated building smaller and more reliable electronic devices. The first such improvement came with the advent of the transistor. Transistorized CPUs during the 1950s and 1960s no longer had to be built out of bulky, unreliable, and fragile switching elements like vacuum tubes and electrical relays. With this improvement more complex and reliable CPUs were built onto one or several printed circuit boards containing discrete (individual) components.

During this period, a method of manufacturing many transistors in a compact space gained popularity. The integrated circuit (IC) allowed a large number of transistors to be manufactured on a single semiconductor-based die, or "chip." At first only very basic non-specialized digital circuits such as NOR gates were miniaturized into ICs. CPUs based upon these "building block" ICs are generally referred to as "small-scale integration" (SSI) devices. SSI ICs, such as the ones used in the Apollo Guidance Computer, usually contained transistor counts numbering in multiples of ten. To build an entire CPU out of SSI ICs required thousands of individual chips, but still consumed much less space and power than earlier discrete transistor designs. As microelectronic technology advanced, an increasing number of transistors were placed on ICs, thus decreasing the quantity of individual ICs needed for a complete CPU. MSI and LSI (medium- and large-scale integration) ICs increased transistor counts to hundreds, and then thousands.

In 1964, IBM introduced its System/360 computer architecture, which was used in a series of computers that could run the same programs with different speed and performance. This was significant at a time when most electronic computers were incompatible with one another, even those made by the same manufacturer. To facilitate this improvement, IBM utilized the concept of a microprogram (often called "microcode"), which still sees widespread usage in modern CPUs (Amdahl et al. 1964). The System/360 architecture was so popular that it dominated the mainframe computer market for decades and left a legacy that is still continued by similar modern computers like the IBM zSeries. In the same year (1964), Digital Equipment Corporation (DEC) introduced another influential computer aimed at the scientific and research markets, the PDP-8. DEC would later introduce the extremely popular PDP-11 line that originally was built with SSI ICs but was eventually implemented with LSI components once these became practical. In stark contrast with its SSI and MSI predecessors, the first LSI implementation of the PDP-11 contained a CPU composed of only four LSI integrated circuits (Digital Equipment Corporation 1975).

Transistor-based computers had several distinct advantages over their predecessors. Aside from facilitating increased reliability and lower power consumption, transistors also allowed CPUs to operate at much higher speeds because of the short switching time of a transistor in comparison to a tube or relay. Thanks to both the increased reliability as well as the dramatically increased speed of the switching elements (which were almost exclusively transistors by this time), CPU clock rates in the tens of megahertz were obtained during this period. Additionally, while discrete transistor and IC CPUs were in heavy usage, new high-performance designs like SIMD (Single Instruction Multiple Data) vector processors began to appear. These early experimental designs later gave rise to the era of specialized supercomputers like those made by Cray Inc.

Microprocessors

Main article: Microprocessor

The integrated circuit from an Intel 8742, an 8-bit microcontroller that includes a CPU running at 12 MHz, 128 bytes of RAM, 2048 bytes of EPROM, and I/O in the same chip.

Intel 80486DX2 microprocessor in a ceramic PGA package.

The introduction of the microprocessor in the 1970s significantly affected the design and implementation of CPUs. Since the introduction of the first commercial microprocessor (the Intel 4004) in 1971 and the first widely used microprocessor (the Intel 8080) in 1974, this class of CPUs has almost completely overtaken all other central processing unit implementation methods. Mainframe and minicomputer manufacturers of the time launched proprietary IC development programs to upgrade their older computer architectures, and eventually produced instruction set compatible microprocessors that were backward-compatible with their older hardware and software. Combined with the advent and eventual vast success of the now ubiquitous personal computer, the term "CPU" is now applied almost exclusively to microprocessors.

Previous generations of CPUs were implemented as discrete components and numerous small integrated circuits (ICs) on one or more circuit boards. Microprocessors, on the other hand, are CPUs manufactured on a very small number of ICs; usually just one. The overall smaller CPU size as a result of being implemented on a single die means faster switching time because of physical factors like decreased gate parasitic capacitance. This has allowed synchronous microprocessors to have clock rates ranging from tens of megahertz to several gigahertz. Additionally, as the ability to construct exceedingly small transistors on an IC has increased, the complexity and number of transistors in a single CPU has increased dramatically. This widely observed trend is described by Moore's law, which has proven to be a fairly accurate predictor of the growth of CPU (and other IC) complexity to date.

While the complexity, size, construction, and general form of CPUs have changed drastically over the past sixty years, it is notable that the basic design and function has not changed much at all. Almost all common CPUs today can be very accurately described as von Neumann stored-program machines. As the aforementioned Moore's law continues to hold true, concerns have arisen about the limits of integrated circuit transistor technology. Extreme miniaturization of electronic gates is causing the effects of phenomena like electromigration and subthreshold leakage to become much more significant. These newer concerns are among the many factors causing researchers to investigate new methods of computing such as the quantum computer, as well as to expand the usage of parallelism and other methods that extend the usefulness of the classical von Neumann model.

CPU operation

The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence of stored instructions called a program. The program is represented by a series of numbers that are kept in some kind of computer memory. There are four steps that nearly all von Neumann CPUs use in their operation: fetch, decode, execute, and writeback.

The first step, fetch, involves retrieving an instruction (which is represented by a number or sequence of numbers) from program memory. The location in program memory is determined by a program counter (PC), which stores a number that identifies the current position in the program. In other words, the program counter keeps track of the CPU's place in the current program. After an instruction is fetched, the PC is incremented by the length of the instruction word in terms of memory units.[3] Often the instruction to be fetched must be retrieved from relatively slow memory, causing the CPU to stall while waiting for the instruction to be returned. This issue is largely addressed in modern processors by caches and pipeline architectures (see below).

The instruction that the CPU fetches from memory is used to determine what the CPU is to do. In the decode step, the instruction is broken up into parts that have significance to other portions of the CPU. The way in which the numerical instruction value is interpreted is defined by the CPU's instruction set architecture (ISA).[4] Often, one group of numbers in the instruction, called the opcode, indicates which operation to perform. The remaining parts of the number usually provide information required for that instruction, such as operands for an addition operation. Such operands may be given as a constant value (called an immediate value), or as a place to locate a value: a register or a memory address, as determined by some addressing mode. In older designs the portions of the CPU responsible for instruction decoding were unchangeable hardware devices. However, in more abstract and complicated CPUs and ISAs, a microprogram is often used to assist in translating instructions into various configuration signals for the CPU. This microprogram is sometimes rewritable so that it can be modified to change the way the CPU decodes instructions even after it has been manufactured.

After the fetch and decode steps, the execute step is performed. During this step, various portions of the CPU are connected so they can perform the desired operation. If, for instance, an addition operation was requested, an arithmetic logic unit (ALU) will be connected to a set of inputs and a set of outputs. The inputs provide the numbers to be added, and the outputs will contain the final sum. The ALU contains the circuitry to perform simple arithmetic and logical operations on the inputs (like addition and bitwise operations). If the addition operation produces a result too large for the CPU to handle, an arithmetic overflow flag in a flags register may also be set.

The final step, writeback, simply "writes back" the results of the execute step to some form of memory. Very often the results are written to some internal CPU register for quick access by subsequent instructions. In other cases results may be written to slower, but cheaper and larger, main memory. Some types of instructions manipulate the program counter rather than directly produce result data. These are generally called "jumps" and facilitate behavior like loops, conditional program execution (through the use of a conditional jump), and functions in programs.[5] Many instructions will also change the state of digits in a "flags" register. These flags can be used to influence how a program behaves, since they often indicate the outcome of various operations. For example, one type of "compare" instruction considers two values and sets a number in the flags register according to which one is greater. This flag could then be used by a later jump instruction to determine program flow.

After the execution of the instruction and writeback of the resulting data, the entire process repeats, with the next instruction cycle normally fetching the next-in-sequence instruction because of the incremented value in the program counter. If the completed instruction was a jump, the program counter will be modified to contain the address of the instruction that was jumped to, and program execution continues normally. In more complex CPUs than the one described here, multiple instructions can be fetched, decoded, and executed simultaneously. This section describes what is generally referred to as the "classic RISC pipeline," which in fact is quite common among the simple CPUs used in many electronic devices (often called microcontrollers). It largely ignores the important role of CPU cache, and therefore the access stage of the pipeline.
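
The toy Python loop below walks through those four steps for an invented three-instruction ISA; it ignores pipelining, caches, and interrupts, and exists only to make the fetch-decode-execute-writeback cycle concrete.

# A toy fetch-decode-execute-writeback loop for a made-up three-instruction ISA.
# Real CPUs are pipelined and vastly more complex.

memory = [
    (1, 0, 5),     # LOADI r0, 5      (opcode 1: load immediate)
    (1, 1, 7),     # LOADI r1, 7
    (2, 0, 1),     # ADD   r0, r1     (opcode 2: r0 = r0 + r1)
    (0, 0, 0),     # HALT             (opcode 0)
]
registers = [0, 0]
pc = 0             # program counter

while True:
    opcode, a, b = memory[pc]          # fetch: read the instruction at PC
    pc += 1                            # increment PC by one instruction word
    if opcode == 0:                    # decode + execute: HALT
        break
    elif opcode == 1:                  # LOADI: write an immediate back to register a
        registers[a] = b
    elif opcode == 2:                  # ADD: execute in the "ALU", write back to a
        registers[a] = registers[a] + registers[b]

print(registers[0])    # 12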

Design and implementation

Main article: CPU design

Prerequisites: Computer architecture, Digital circuits

Integer range

The way a CPU represents numbers is a design choice that affects the most basic ways in which the device functions. Some early digital computers used an electrical model of the common decimal (base ten) numeral system to represent numbers internally. A few other computers have used more exotic numeral systems like ternary (base three). Nearly all modern CPUs represent numbers in binary form, with each digit being represented by some two-valued physical quantity such as a "high" or "low" voltage.[6]

MOS 6502 microprocessor in a dual in-line package, an extremely popular 8-bit design.

Related to number representation is the size and precision of numbers that a CPU can represent. In the case of a binary CPU, a bit refers to one significant place in the numbers a CPU deals with. The number of bits (or numeral places) a CPU uses to represent numbers is often called "word size", "bit width", "data path width", or "integer precision" when dealing with strictly integer numbers (as opposed to floating point). This number differs between architectures, and often within different parts of the very same CPU. For example, an 8-bit CPU deals with a range of numbers that can be represented by eight binary digits (each digit having two possible values), that is, 2^8 or 256 discrete numbers. In effect, integer size sets a hardware limit on the range of integers the software run by the CPU can utilize.[7]

Integer range can also affect the number of locations in memory the CPU can address (locate). For example, if a binary CPU uses 32 bits to represent a memory address, and each memory address represents one octet (8 bits), the maximum quantity of memory that CPU can address is 2^32 octets, or 4 GiB. This is a very simple view of CPU address space, and many designs use more complex addressing methods like paging in order to locate more memory than their integer range would allow with a flat address space.
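
The arithmetic behind that figure is spelled out below (illustrative only):

# Maximum addressable memory for a flat 32-bit, byte-addressed address space.

address_bits = 32
addressable_bytes = 2 ** address_bits        # one octet per address
print(addressable_bytes)                     # 4,294,967,296
print(addressable_bytes / 2**30)             # 4.0 GiB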

Higher levels of integer range require more structures to deal with the additional digits, and therefore more complexity, size, power usage, and general expense. It is not at all uncommon, therefore, to see 4- or 8-bit microcontrollers used in modern applications, even though CPUs with much higher range (such as 16, 32, 64, even 128-bit) are available. The simpler microcontrollers are usually cheaper, use less power, and therefore dissipate less heat, all of which can be major design considerations for electronic devices. However, in higher-end applications, the benefits afforded by the extra range (most often the additional address space) are more significant and often affect design choices. To gain some of the advantages afforded by both lower and higher bit lengths, many CPUs are designed with different bit widths for different portions of the device. For example, the IBM System/370 used a CPU that was primarily 32 bit, but it used 128-bit precision inside its floating point units to facilitate greater accuracy and range in floating point numbers (Amdahl et al. 1964) . Many later CPU designs use similar mixed bit width, especially when the processor is meant for general-purpose usage where a reasonable balance of integer and floating point capability is required.

Clock rate

Main article: Clock rate

Most CPUs, and indeed most sequential logic devices, are synchronous in nature.[8] That is, they are designed and operate on assumptions about a synchronization signal. This signal, known as a clock signal, usually takes the form of a periodic square wave. By calculating the maximum time that electrical signals can move in various branches of a CPU's many circuits, the designers can select an appropriate period for the clock signal.

This period must be longer than the amount of time it takes for a signal to move, or propagate, in the worst-case scenario. In setting the clock period to a value well above the worst-case propagation delay, it is possible to design the entire CPU and the way it moves data around the "edges" of the rising and falling clock signal. This has the advantage of simplifying the CPU significantly, both from a design perspective and a component-count perspective. However, it also carries the disadvantage that the entire CPU must wait on its slowest elements, even though some portions of it are much faster. This limitation has largely been compensated for by various methods of increasing CPU parallelism (see below).
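
The relationship between worst-case propagation delay and the achievable clock rate can be sketched with a short calculation; the delay and margin figures below are invented for the example.

# Illustrative link between worst-case propagation delay and clock rate.

worst_case_delay_ns = 2.5                    # longest signal path, in nanoseconds
margin = 1.2                                 # assume a 20% timing margin

clock_period_ns = worst_case_delay_ns * margin
max_clock_hz = 1.0 / (clock_period_ns * 1e-9)
print(round(max_clock_hz / 1e6), "MHz")      # ~333 MHz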

However, architectural improvements alone do not solve all of the drawbacks of globally synchronous CPUs. For example, a clock signal is subject to the delays of any other electrical signal. Higher clock rates in increasingly complex CPUs make it more difficult to keep the clock signal in phase (synchronized) throughout the entire unit. This has led many modern CPUs to require multiple identical clock signals to be provided in order to avoid delaying a single signal significantly enough to cause the CPU to malfunction. Another major issue as clock rates increase dramatically is the amount of heat that is dissipated by the CPU. The constantly changing clock causes many components to switch regardless of whether they are being used at that time. In general, a component that is switching uses more energy than an element in a static state. Therefore, as clock rate increases, so does heat dissipation, causing the CPU to require more effective cooling solutions.

One method of dealing with the switching of unneeded components is called clock gating, which involves turning off the clock signal to unneeded components (effectively disabling them). However, this is often regarded as difficult to implement and therefore does not see common usage outside of very low-power designs.[9] Another method of addressing some of the problems with a global clock signal is the removal of the clock signal altogether. While removing the global clock signal makes the design process considerably more complex in many ways, asynchronous (or clockless) designs carry marked advantages in power consumption and heat dissipation in comparison with similar synchronous designs. While somewhat uncommon, entire CPUs have been built without utilizing a global clock signal. Two notable examples of this are the ARM-compliant AMULET and the MIPS R3000-compatible MiniMIPS. Rather than totally removing the clock signal, some CPU designs allow certain portions of the device to be asynchronous, such as using asynchronous ALUs in conjunction with superscalar pipelining to achieve some arithmetic performance gains. While it is not altogether clear whether totally asynchronous designs can perform at a comparable or better level than their synchronous counterparts, it is evident that they do at least excel in simpler math operations. This, combined with their excellent power consumption and heat dissipation properties, makes them very suitable for embedded computers (Garside et al. 1999).

Parallelism

Main article: Parallel computing

Model of a subscalar CPU. Notice that it takes fifteen cycles to complete three instructions.

The description of the basic operation of a CPU offered in the previous section describes the simplest form that a CPU can take. This type of CPU, usually referred to as subscalar, operates on and executes one instruction on one or two pieces of data at a time.

This process gives rise to an inherent inefficiency in subscalar CPUs. Since only one instruction is executed at a time, the entire CPU must wait for that instruction to complete before proceeding to the next instruction. As a result the subscalar CPU gets "hung up" on instructions which take more than one clock cycle to complete execution. Even adding a second execution unit (see below) does not improve performance much; rather than one pathway being hung up, now two pathways are hung up and the number of unused transistors is increased. This design, wherein the CPU's execution resources can operate on only one instruction at a time, can only possibly reach scalar performance (one instruction per clock). However, the performance is nearly always subscalar (less than one instruction per cycle).

Attempts to achieve scalar and better performance have resulted in a variety of design methodologies that cause the CPU to behave less linearly and more in parallel. When referring to parallelism in CPUs, two terms are generally used to classify these design techniques. Instruction level parallelism (ILP) seeks to increase the rate at which instructions are executed within a CPU (that is, to increase the utilization of on-die execution resources), and thread level parallelism (TLP) aims to increase the number of threads (effectively individual programs) that a CPU can execute simultaneously. Each methodology differs both in the ways in which they are implemented, as well as the relative effectiveness they afford in increasing the CPU's performance for an application.[10]

Instruction level parallelism

Main articles: Instruction pipelining and Superscalar

Basic five-stage pipeline. In the best case scenario, this pipeline can sustain a completion rate of one instruction per cycle.

One of the simplest methods used to accomplish increased parallelism is to begin the first steps of instruction fetching and decoding before the prior instruction finishes executing. This is the simplest form of a technique known as instruction pipelining, and is utilized in almost all modern general-purpose CPUs. Pipelining allows more than one instruction to be executed at any given time by breaking down the execution pathway into discrete stages. This separation can be compared to an assembly line, in which an instruction is made more complete at each stage until it exits the execution pipeline and is retired.
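
An idealized cycle count makes the benefit concrete; the sketch below assumes a five-stage pipeline with no stalls or hazards, which real code never quite achieves.

# Back-of-the-envelope comparison of unpipelined versus pipelined execution.

stages = 5                 # e.g. fetch, decode, execute, memory, writeback
instructions = 100

unpipelined_cycles = instructions * stages            # each instruction runs alone
pipelined_cycles = stages + (instructions - 1)        # one instruction finishes per
                                                      # cycle once the pipeline is full
print(unpipelined_cycles, pipelined_cycles)           # 500 vs 104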

Pipelining does, however, introduce the possibility for a situation where the result of the previous operation is needed to complete the next operation; a condition often termed data dependency conflict. To cope with this, additional care must be taken to check for these sorts of conditions and delay a portion of the instruction pipeline if this occurs. Naturally, accomplishing this requires additional circuitry, so pipelined processors are more complex than subscalar ones (though not very significantly so). A pipelined processor can become very nearly scalar, inhibited only by pipeline stalls (an instruction spending more than one clock cycle in a stage).

Simple superscalar pipeline. By fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed.

Further improvement upon the idea of instruction pipelining led to the development of a method that decreases the idle time of CPU components even further. Designs that are said to be superscalar include a long instruction pipeline and multiple identical execution units. [Huynh 2003] In a superscalar pipeline, multiple instructions are read and passed to a dispatcher, which decides whether or not the instructions can be executed in parallel (simultaneously). If so they are dispatched to available execution units, resulting in the ability for several instructions to be executed simultaneously. In general, the more instructions a superscalar CPU is able to dispatch simultaneously to waiting execution units, the more instructions will be completed in a given cycle.

Most of the difficulty in the design of a superscalar CPU architecture lies in creating an effective dispatcher. The dispatcher needs to be able to quickly and correctly determine whether instructions can be executed in parallel, as well as dispatch them in such a way as to keep as many execution units busy as possible. This requires that the instruction pipeline is filled as often as possible and gives rise to the need in superscalar architectures for significant amounts of CPU cache. It also makes hazard-avoiding techniques like branch prediction, speculative execution, and out-of-order execution crucial to maintaining high levels of performance. By attempting to predict which branch (or path) a conditional instruction will take, the CPU can minimize the number of times that the entire pipeline must wait until a conditional instruction is completed. Speculative execution often provides modest performance increases by executing portions of code that may or may not be needed after a conditional operation completes. Out-of-order execution somewhat rearranges the order in which instructions are executed to reduce delays due to data dependencies.

In the case where a portion of the CPU is superscalar and part is not, the part which is not suffers a performance penalty due to scheduling stalls. The original Intel Pentium (P5) had two superscalar ALUs which could accept one instruction per clock each, but its FPU could not accept one instruction per clock. Thus the P5 was integer superscalar but not floating point superscalar. Intel's successor to the Pentium architecture, P6, added superscalar capabilities to its floating point features, and therefore afforded a significant increase in floating point instruction performance.

Both simple pipelining and superscalar design increase a CPU's ILP by allowing a single processor to complete execution of instructions at rates surpassing one instruction per cycle (IPC).[11] Most modern CPU designs are at least somewhat superscalar, and nearly all general purpose CPUs designed in the last decade are superscalar. In later years some of the emphasis in designing high-ILP computers has been moved out of the CPU's hardware and into its software interface, or ISA. The strategy of the very long instruction word (VLIW) causes some ILP to become implied directly by the software, reducing the amount of work the CPU must perform to boost ILP and thereby reducing the design's complexity.

Thread level parallelism

Another strategy of achieving performance is to execute multiple programs or threads in parallel. This area of research is known as parallel computing. In Flynn's taxonomy, this strategy is known as Multiple Instructions-Multiple Data or MIMD.

One technology used for this purpose was multiprocessing (MP). The initial flavor of this technology is known as symmetric multiprocessing (SMP), where a small number of CPUs share a coherent view of their memory system. In this scheme, each CPU has additional hardware to maintain a constantly up-to-date view of memory. By avoiding stale views of memory, the CPUs can cooperate on the same program and programs can migrate from one CPU to another. To increase the number of cooperating CPUs beyond a handful, schemes such as non-uniform memory access (NUMA) and directory-based coherence protocols were introduced in the 1990s. SMP systems are limited to a small number of CPUs while NUMA systems have been built with thousands of processors. Initially, multiprocessing was built using multiple discrete CPUs and boards to implement the interconnect between the processors. When the processors and their interconnect are all implemented on a single silicon chip, the technology is known as a multi-core microprocessor.

It was later recognized that finer-grain parallelism existed within a single program. A single program might have several threads (or functions) that could be executed separately or in parallel. Some of the earliest examples of this technology implemented input/output processing such as direct memory access as a separate thread from the computation thread. A more general approach to this technology was introduced in the 1970s when systems were designed to run multiple computation threads in parallel. This technology is known as multi-threading (MT). This approach is considered more cost-effective than multiprocessing, as only a small number of components within a CPU are replicated in order to support MT, as opposed to the entire CPU in the case of MP. In MT, the execution units and the memory system including the caches are shared among multiple threads. The downside of MT is that the hardware support for multithreading is more visible to software than that of MP, and thus supervisor software like operating systems has to undergo larger changes to support MT. One type of MT that was implemented is known as block multithreading, where one thread is executed until it is stalled waiting for data to return from external memory. In this scheme, the CPU quickly switches to another thread which is ready to run, the switch often done in one CPU clock cycle. Another type of MT is known as simultaneous multithreading, where instructions of multiple threads are executed in parallel within one CPU clock cycle.

For several decades from the 1970s to the early 2000s, the focus in designing high performance general purpose CPUs was largely on achieving high ILP through technologies such as pipelining, caches, superscalar execution, out-of-order execution, etc. This trend culminated in large, power-hungry CPUs such as the Intel Pentium 4. By the early 2000s, CPU designers were thwarted from achieving higher performance from ILP techniques due to the growing disparity between CPU operating frequencies and main memory operating frequencies, as well as escalating CPU power dissipation owing to more esoteric ILP techniques.

CPU designers then borrowed ideas from commercial computing markets such as transaction processing, where the aggregate performance of multiple programs, also known as throughput computing, was more important than the performance of a single thread or program.

This reversal of emphasis is evidenced by the proliferation of dual and multiple core CMP (chip-level multiprocessing) designs and notably, Intel's newer designs resembling its less superscalar P6 architecture. Late designs in several processor families exhibit CMP, including the x86-64 Opteron and Athlon 64 X2, the SPARC UltraSPARC T1, IBM POWER4 and POWER5, as well as several video game console CPUs like the Xbox 360's triple-core PowerPC design, and the PS3's 8-core Cell microprocessor.

Data parallelism

Main articles: Vector processor and SIMD

A less common but increasingly important paradigm of CPUs (and indeed, computing in general) deals with data parallelism. The processors discussed earlier are all referred to as some type of scalar device.[12] As the name implies, vector processors deal with multiple pieces of data in the context of one instruction. This contrasts with scalar processors, which deal with one piece of data for every instruction. Using Flynn's taxonomy, these two schemes of dealing with data are generally referred to as SISD (single instruction, single data) and SIMD (single instruction, multiple data), respectively. The great utility in creating CPUs that deal with vectors of data lies in optimizing tasks that tend to require the same operation (for example, a sum or a dot product) to be performed on a large set of data. Some classic examples of these types of tasks are multimedia applications (images, video, and sound), as well as many types of scientific and engineering tasks. Whereas a scalar CPU must complete the entire process of fetching, decoding, and executing each instruction and value in a set of data, a vector CPU can perform a single operation on a comparatively large set of data with one instruction. Of course, this is only possible when the application tends to require many steps which apply one operation to a large set of data.

Most early vector CPUs, such as the Cray-1, were associated almost exclusively with scientific research and cryptography applications. However, as multimedia has largely shifted to digital media, the need for some form of SIMD in general-purpose CPUs has become significant. Shortly after inclusion of floating point execution units became commonplace in general-purpose processors, specifications for and implementations of SIMD execution units also began to appear for general-purpose CPUs. Some of these early SIMD specifications, like Intel's MMX, were integer-only. This proved to be a significant impediment for some software developers, since many of the applications that benefit from SIMD primarily deal with floating point numbers. Progressively, these early designs were refined and remade into some of the common, modern SIMD specifications, which are usually associated with one ISA. Some notable modern examples are Intel's SSE and the PowerPC-related AltiVec (also known as VMX).[13]
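
As a rough illustration of the scalar-versus-SIMD contrast described above (not taken from the article), the sketch below adds two arrays of floats first one element at a time and then four at a time using SSE intrinsics. The array length and function names are arbitrary, and a real implementation would also need to handle lengths that are not a multiple of four.

    /* Minimal sketch: scalar vs. SIMD addition of two float arrays with SSE.
     * Compile on an x86 machine with: cc -msse add.c */
    #include <xmmintrin.h>   /* SSE intrinsics operating on 4 packed floats */
    #include <stdio.h>

    #define N 8   /* arbitrary length, chosen as a multiple of 4 */

    /* Scalar version: one addition per loop iteration. */
    static void add_scalar(const float *a, const float *b, float *out, int n)
    {
        for (int i = 0; i < n; i++)
            out[i] = a[i] + b[i];
    }

    /* SIMD version: one SSE instruction adds four floats at a time. */
    static void add_sse(const float *a, const float *b, float *out, int n)
    {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(&a[i]);   /* load 4 floats */
            __m128 vb = _mm_loadu_ps(&b[i]);
            __m128 vr = _mm_add_ps(va, vb);    /* 4 additions in one instruction */
            _mm_storeu_ps(&out[i], vr);        /* store 4 results */
        }
    }

    int main(void)
    {
        float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
        float b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
        float r1[N], r2[N];

        add_scalar(a, b, r1, N);
        add_sse(a, b, r2, N);

        for (int i = 0; i < N; i++)
            printf("%g %g\n", r1[i], r2[i]);
        return 0;
    }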

See also

At Wikiversity, you can learn about: Introduction to Computers/Processor

Addressing mode · CISC · Computer bus · Computer engineering · CPU cooling · CPU core voltage · CPU design · CPU power dissipation · CPU socket · Floating point unit · Instruction pipeline · Instruction set · Notable CPU architectures · RISC · Wait state · Ring (computer security) · Stream processing

Notes

1. While EDVAC was designed a few years before ENIAC was built, ENIAC was actually retrofitted to execute stored programs in 1948, somewhat before EDVAC was completed. Therefore, ENIAC became a stored program computer before EDVAC was completed, even though stored program capabilities were originally omitted from ENIAC's design due to cost and schedule concerns.

2. Vacuum tubes eventually stop functioning in the course of normal operation due to the slow contamination of their cathodes that occurs when the tubes are in use. Additionally, sometimes the tube's vacuum seal can leak, which accelerates the cathode contamination. See vacuum tube.

3. Since the program counter counts memory addresses and not instructions, it is incremented by the number of memory units that the instruction word contains. In the case of simple fixed-length instruction word ISAs, this is always the same number. For example, a fixed-length 32-bit instruction word ISA that uses 8-bit memory words would always increment the PC by 4 (except in the case of jumps). ISAs that use variable-length instruction words increment the PC by the number of memory words corresponding to the last instruction's length.
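
The toy fetch loop below, written purely for illustration and not taken from the note, advances a program counter by the length of each fetched instruction; the instruction_length decoder and its encoding rule are entirely made up, and a fixed-length 32-bit ISA would simply always return 4.

    /* Minimal sketch of program-counter advancement, assuming 8-bit memory
     * words. This is an illustrative model, not a real CPU. */
    #include <stdint.h>
    #include <stdio.h>

    #define MEM_SIZE 64

    static uint8_t memory[MEM_SIZE];   /* 8-bit memory words, zero-initialized */

    /* Hypothetical decoder: returns the length in memory words of the
     * instruction starting at 'pc', using a made-up encoding rule. */
    static unsigned instruction_length(uint32_t pc)
    {
        return (memory[pc] & 0x80) ? 4 : 2;
    }

    int main(void)
    {
        uint32_t pc = 0;

        while (pc < 16) {
            unsigned len = instruction_length(pc);
            printf("fetch at %u, length %u\n", pc, len);
            pc += len;   /* advance by memory words consumed, not by 1 */
        }
        return 0;
    }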

4. Because the instruction set architecture of a CPU is fundamental to its interface and usage, it is often used as a classification of the "type" of CPU. For example, a "PowerPC CPU" uses some variant of the PowerPC ISA. A system can execute a different ISA by running an emulator.

5. Some early computers like the Harvard Mark I did not support any kind of "jump" instruction, effectively limiting the complexity of the programs they could run. It is largely for this reason that these computers are often not considered to contain a CPU proper, despite their close similarity as stored program computers.

6. The physical concept of voltage is an analog one by its nature, practically having an infinite range of possible values. For the purpose of physical representation of binary numbers, set ranges of voltages are defined as one or zero. These ranges are usually influenced by the circuit designs and operational parameters of the switching elements used to create the CPU, such as a transistor's threshold level.

7. While a CPU's integer size sets a limit on integer ranges, this can be (and often is) overcome using a combination of software and hardware techniques. By using additional memory, software can represent integers many orders of magnitude larger than the CPU natively can. Sometimes the CPU's ISA will even facilitate operations on integers larger than it can natively represent by providing instructions that make large integer arithmetic relatively quick. While this method of dealing with large integers is somewhat slower than using a CPU with a larger integer size, it is a reasonable trade-off in cases where natively supporting the full integer range needed would be cost-prohibitive. See Arbitrary-precision arithmetic for more details on purely software-supported arbitrary-sized integers.
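
A minimal sketch of the software technique described in this note (not from the article): adding two 64-bit integers with only 32-bit operations by propagating a carry from the low half to the high half. The u64_words type and add64 function are hypothetical names used for illustration.

    /* Minimal sketch: 64-bit addition implemented with 32-bit operations,
     * as software might do on a CPU whose native integer size is 32 bits. */
    #include <stdint.h>
    #include <stdio.h>

    /* A 64-bit value represented as two 32-bit halves. */
    typedef struct { uint32_t lo, hi; } u64_words;

    static u64_words add64(u64_words a, u64_words b)
    {
        u64_words r;
        r.lo = a.lo + b.lo;
        /* If the low-word sum wrapped around, carry one into the high word. */
        uint32_t carry = (r.lo < a.lo) ? 1 : 0;
        r.hi = a.hi + b.hi + carry;
        return r;
    }

    int main(void)
    {
        u64_words a = { 0xFFFFFFFFu, 0x00000001u };  /* 0x1FFFFFFFF */
        u64_words b = { 0x00000002u, 0x00000000u };  /* 2 */
        u64_words s = add64(a, b);                   /* expect 0x200000001 */
        printf("hi=0x%08X lo=0x%08X\n", s.hi, s.lo);
        return 0;
    }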

8. In fact, all synchronous CPUs use a combination of sequential logic and combinatorial logic. (See Boolean logic.)

9. One notable recent CPU design that uses clock gating is that of the IBM PowerPC-based Xbox 360; it utilizes extensive clock gating to reduce the power requirements of that video game console. (Brown 2005)

10. Neither ILP nor TLP is inherently superior to the other; they are simply different means of increasing CPU parallelism. As such, they both have advantages and disadvantages, which are often determined by the type of software that the processor is intended to run. High-TLP CPUs are often used in applications that lend themselves well to being split up into numerous smaller applications, so-called "embarrassingly parallel problems." Frequently, a computational problem that can be solved quickly with high-TLP design strategies like SMP takes significantly more time on high-ILP devices like superscalar CPUs, and vice versa.

11. Best-case (or peak) IPC rates in very superscalar architectures are difficult to maintain since it is impossible to keep the instruction pipeline filled all the time. Therefore, in highly superscalar CPUs, average sustained IPC is often discussed rather than peak IPC.

12. Earlier, the term scalar was used to compare the IPC (instructions per cycle) count afforded by various ILP methods. Here the term is used in the strictly mathematical sense to contrast with vectors. See scalar (mathematics) and vector (geometric).

13. Although SSE/SSE2/SSE3 have superseded MMX in Intel's general-purpose CPUs, later IA-32 designs still support MMX. This is usually accomplished by providing most of the MMX functionality with the same hardware that supports the much more expansive SSE instruction sets.

References

Amdahl, G. M.; Blaauw, G. A.; Brooks, F. P. Jr. (1964). "Architecture of the IBM System/360". IBM Research.

Brown, Jeffery (2005). "Application-customized CPU design". IBM developerWorks. Retrieved on 2005-12-17.

Huynh, Jack (2003). "The AMD Athlon XP Processor with 512KB L2 Cache", pp. 6-11. University of Illinois at Urbana-Champaign. Retrieved on 2007-10-06.

Digital Equipment Corporation (November 1975). "LSI-11 Module Descriptions", LSI-11, PDP-11/03 user's manual, 2nd edition. Maynard, Massachusetts: Digital Equipment Corporation, p. 4-3.

Garside, J. D.; Furber, S. B.; Chung, S-H (1999). "AMULET3 Revealed". University of Manchester Computer Science Department.

Hennessy, John A.; Goldberg, David (1996). Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers. ISBN 1-55860-329-8.

Knott, Gary D. (1974). "A proposal for certain process management and intercommunication primitives". ACM SIGOPS Operating Systems Review, Volume 8, Issue 4 (October 1974), pp. 7-44.

MIPS Technologies, Inc. (2005). "MIPS32® Architecture For Programmers Volume II: The MIPS32® Instruction Set". MIPS Technologies, Inc.

Smotherman, Mark (2005). "History of Multithreading". Retrieved on 2005-12-19.

von Neumann, John (1945). "First Draft of a Report on the EDVAC". Moore School of Electrical Engineering, University of Pennsylvania.

Weik, Martin H. (1961). "A Third Survey of Domestic Electronic Digital Computing Systems". Ballistic Research Laboratories.

External links

Microprocessor designers

Advanced Micro Devices - a designer of primarily x86-compatible personal computer CPUs.

ARM Ltd - one of the few CPU designers that profits solely by licensing their designs rather than manufacturing them. ARM architecture microprocessors are among the most popular in the world for embedded applications.

Freescale Semiconductor (formerly part of Motorola) - designer of several embedded and SoC PowerPC based processors.

IBM Microelectronics - the microelectronics division of IBM, responsible for many POWER and PowerPC based designs, including many of the CPUs used in recent video game consoles.

Intel Corp - a maker of several notable CPU lines, including IA-32, IA-64, and XScale, as well as a producer of various peripheral chips for use with its CPUs.

MIPS Technologies - developers of the MIPS architecture, a pioneer in RISC designs.

NEC Electronics - developers of the 78K0 8-bit architecture, 78K0R 16-bit architecture, and V850 32-bit architecture.

Sun Microsystems - developers of the SPARC architecture, a RISC design.

Texas Instruments - its semiconductor division designs and manufactures several types of low-power microcontrollers among many other semiconductor products.

Transmeta - creators of low-power x86 compatibles like Crusoe and Efficeon.

VIA Technologies - a Taiwanese maker of low-power x86-compatible CPUs.

Further reading: How Microprocessors Work

CPU technologies

Architecture / ISA: CISC · EPIC · OISC · RISC · VLIW · ZISC · CISC-RISC (x86) · Harvard architecture · Von Neumann architecture

Parallelism (types): Distributed computing · Grid computing · Cloud computing

Pipeline: Instruction pipelining (in-order & out-of-order execution) · Register renaming · Speculative execution

Level: Bit · Instruction (Superscalar) · Data · Task

Threads: Multithreading · Simultaneous multithreading · Hyperthreading · Superthreading

Logic: Bitwise operation

Types: Vector processor · DSP

Components: ALU · FPU · Registers · Cache · Microcontroller · FPGA · ASIC · ASIP · SoC · Logic device · Multiprocessing · MCM · DCM

Programming: Flynn's taxonomy (SISD · SIMD · MISD · MIMD) · 32-bit / 64-bit · 128-bit

Power management: APM · ACPI (states) · DPMS (VESA) · Dynamic frequency scaling · Dynamic voltage scaling · Clock gating