ICC Module 3 Lesson 1 – Computer Architecture 1 / 9 © 2015 Ph. Janson Information, Computing & Communication Computer Architecture Clip 7 – Architectural Parallelism School of Computer Science & Communications P. Ienne (charts), Ph. Janson (commentary)

Jan 19, 2016


Jerome Lamb
Transcript


Computer Architecture
Clip 7 – Architectural Parallelism
School of Computer Science & Communications
P. Ienne (charts), Ph. Janson (commentary)

Information, Computing & Communication

This video clip is part of the E.P.F.L. introductory course on Information, Computing, and Communication. It is the seventh and last one in a set of video clips on computer architecture.

Outline
- Clip 0 Introduction
- Clip 1 Software technology
- Assembler language
- Algorithms
- Registers
- Data instructions
- Instruction numbering
- Control instructions
- Clip 2 Hardware architecture
- Von Neumann's stored program computer architecture
- Data storage and processing
- Control storage and processing
- Clip 3 Hardware design
- Instruction encoding
- Hardware implementation
- Transistor technology
- Clip 4 Computing circuits
- Clip 5 Memory circuits
- Hardware performance
- Clip 6 Logic parallelism
- Clip 7 Architecture parallelism

Like the previous video clips, it gives another example of how computer performance can be gained, in this case by architectural rather than logic tricks.

Two simple examples of performance increase:
- At the circuit level: reducing the delay of an adder
- At the processor structure level: increasing the throughput of instructions => this clip

How can one increase performance beyond transistor speed?
- Reduce delay (the time spent waiting to get a result)
- Increase throughput (the number of results per time unit)

The performance of anything can always be increased in one of two ways:
- either by reducing the time or delay it takes to do something
- or by increasing the throughput, that is, the number of things done per unit of time

The previous clip gave an example of computer performance gain through reduced delay of logic operations.
This clip gives an example of computer performance gain through increased throughput at the architectural level.
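The two routes can be compared with a little arithmetic. The sketch below uses made-up numbers (1000 instructions, 1.0 time unit per instruction) and a hypothetical helper `run_time`; it also assumes that all instructions are independent, which is an idealisation.

```python
import math

def run_time(n_instructions, delay_per_instruction, units=1):
    """Total time when `units` identical units each complete one
    instruction every `delay_per_instruction` time units.
    Idealised: assumes every instruction is independent of the others."""
    cycles = math.ceil(n_instructions / units)
    return cycles * delay_per_instruction

# Baseline: one unit, 1.0 time unit per instruction.
baseline = run_time(1000, 1.0)
# Route 1: reduce the delay (for example with a faster adder circuit).
faster_circuit = run_time(1000, 0.5)
# Route 2: increase the throughput (two units working side by side).
two_units = run_time(1000, 1.0, units=2)

print(baseline, faster_circuit, two_units)  # 1000.0 500.0 500.0
```

Both routes halve the total time here, but only the first one makes any single result arrive sooner; with two units each instruction still takes just as long.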

Our processor

    103: load r1, 0
    104: load r2, -21
    105: add r3, r7, r4
    106: mult r2, r5, r9
    107: sub r8, r7, r9
    108: load r9, r4
    109: add r3, r2, r1
    110: sub r5, r3, r4
    111: load r2, r3
    112: add r1, r2, -1
    113: add r8, r1, -1
    114: div r4, r1, r7
    115: load r2, r4

    Arithm. unit
    Load r1, 0
    Load r2, -21
    Add r3, r7, r4
    Mult r2, r5, r9
    Sub r8, r7, r9
    Load r9, r4

Our processor normally executes one instruction at a time. Can we do better?

A simple Von Neumann processor, such as the one discussed in the preceding video clips, normally executes instructions one at a time, as shown here.

Doubling the throughput of our processor

(Same program listing as on the previous slide, instructions 103 to 115.)

    Arithm. unit        Arithm. unit
    Load r1, 0          Load r2, -21
    Add r3, r7, r4      Mult r2, r5, r9
    Sub r8, r7, r9      Load r9, r4

We could imagine executing two instructions at a time! Do you see the problem?!

One can however imagine an enhanced processor in which more transistors are invested to build a second arithmetic unit and execute two operations at a time in parallel, as shown here, thus doubling the throughput of the system. But do you then foresee a problem?

Doubling the throughput of our processor

(Same program listing as on the previous slide, instructions 103 to 115.)

    Arithm. unit        Arithm. unit
    Add r3, r2, r1      Sub r5, r3, r4

Do you see the problem?! The problem is that the 2nd instruction needs a value computed by the 1st instruction! Unless one is careful, the result will be wrong!
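A small simulation shows how the result goes wrong. This is only a sketch: the register values are arbitrary, the function names are hypothetical, and it assumes that the first operand is the destination (so `add r3, r2, r1` means r3 = r2 + r1) and that both units read their operands at the same instant.

```python
# Arbitrary starting values, chosen only for illustration.
initial = {"r1": 1, "r2": 2, "r3": 100, "r4": 10, "r5": 0}

def run_sequentially(regs):
    """Execute instructions 109 and 110 one after the other."""
    regs = dict(regs)
    regs["r3"] = regs["r2"] + regs["r1"]   # 109: add r3, r2, r1
    regs["r5"] = regs["r3"] - regs["r4"]   # 110: sub r5, r3, r4 (sees the new r3)
    return regs

def run_naively_in_parallel(regs):
    """Execute 109 and 110 at the same time, with no dependency check."""
    before = dict(regs)   # both units read this same snapshot
    after = dict(regs)
    after["r3"] = before["r2"] + before["r1"]   # unit 1: add r3, r2, r1
    after["r5"] = before["r3"] - before["r4"]   # unit 2: sub r5, r3, r4 (stale r3)
    return after

print(run_sequentially(initial)["r5"])         # -7
print(run_naively_in_parallel(initial)["r5"])  # 90
```

The second unit computes with the old content of r3, so r5 ends up different from what the sequential program specifies.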

Do you now see the problem?
The problem is that there is a sequencing dependency between the next two instructions. The second one follows the first one in the originally sequential program for the very good reason that it needs a result from the first one. Unless one pays attention to that, the computation will produce wrong results and possibly crash the computer or, worse, cause something to break beyond the computer.

Doubling the throughput of our processor

(Same program listing as on the previous slide, instructions 103 to 115.)

    Arithm. unit        Arithm. unit
    Add r3, r2, r1      NOTHING
    Sub r5, r3, r4      Load r2, r3
    Add r1, r2, -1      NOTHING
    Add r8, r1, -1      Div r4, r1, r7

In practice one executes between one and two instructions at a time, and then the result is correct.

The solution requires that an intelligent arbiter between the two arithmetic units detect when the second instruction depends on a result of the first one and, in those cases, execute them one after the other, as in a basic computer with a single arithmetic unit.
Thus in practice one cannot quite double the throughput of the computer. One can only approach a doubling, while conceding a small waste every time the computer hits a sequential dependency.
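The arbiter's rule can be sketched in a few lines. The code below is an illustration under simplifying assumptions, not a description of real hardware: it looks only at instructions 109 to 114 from the slides, issues them strictly in program order, and checks only the one dependency described above (the second instruction reading the register the first one writes). The names `depends_on` and `dual_issue` are hypothetical.

```python
# Instructions 109-114 as (address, destination register, source registers).
# Constants such as -1 are not registers, so they are left out of the sources.
PROGRAM = [
    (109, "r3", ("r2", "r1")),   # add  r3, r2, r1
    (110, "r5", ("r3", "r4")),   # sub  r5, r3, r4
    (111, "r2", ("r3",)),        # load r2, r3
    (112, "r1", ("r2",)),        # add  r1, r2, -1
    (113, "r8", ("r1",)),        # add  r8, r1, -1
    (114, "r4", ("r1", "r7")),   # div  r4, r1, r7
]

def depends_on(second, first):
    """True if `second` reads the register that `first` writes."""
    return first[1] in second[2]

def dual_issue(program):
    """Pair each instruction with its successor unless the successor depends on it."""
    cycles, i = [], 0
    while i < len(program):
        has_next = i + 1 < len(program)
        if has_next and not depends_on(program[i + 1], program[i]):
            cycles.append((program[i][0], program[i + 1][0]))
            i += 2
        else:
            cycles.append((program[i][0], None))   # the second unit does NOTHING
            i += 1
    return cycles

schedule = dual_issue(PROGRAM)
for first, second in schedule:
    print(first, second if second is not None else "NOTHING")
print(len(PROGRAM) / len(schedule), "instructions per cycle")
# 109 NOTHING
# 110 111
# 112 NOTHING
# 113 114
# 1.5 instructions per cycle
```

Six instructions take four cycles instead of three, so the throughput lands between one and two instructions per cycle.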

A superscalar processor

[Diagram: four arithmetic units (Arithm. unit), a register bank, and dependency detection]

All modern processors, for portable computers as well as servers, include this. In addition, they reorder instructions and execute them before knowing whether they need to be executed (for instance after an instruction such as jump_lte).

So-called super-scalar processors can be built with two, four or even more parallel units to increase their throughput and thus speed up their computation. All modern computers do this.

In addition, since they cannot know in advance what the outcome of a conditional jump might be, when they hit one they perform in parallel the testing of the condition as well as BOTH of the instructions that could follow the test, whether it comes out true or false. After the test and both instructions have been executed, the result of the instruction not corresponding to the actual outcome of the test is thrown away, as if it had never been computed.
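The idea of computing both candidates and keeping only one can be mimicked in software. The sketch below is only an analogy: Python runs the steps one after another, whereas the hardware performs them in parallel. The function `execute_both_paths`, the two candidate instructions, and the reading of `jump_lte` as "jump if less than or equal" are all assumptions made for illustration.

```python
def execute_both_paths(regs, condition, if_true, if_false):
    """Run both possible successors on private copies of the registers,
    then keep only the copy that matches the outcome of the test."""
    result_if_true = if_true(dict(regs))     # speculative, not yet visible
    result_if_false = if_false(dict(regs))   # speculative, not yet visible
    outcome = condition(regs)                # the test itself
    # The copy for the path not taken is simply dropped.
    return result_if_true if outcome else result_if_false

def successor_if_true(regs):
    regs["r3"] = regs["r1"] + regs["r2"]     # add r3, r1, r2
    return regs

def successor_if_false(regs):
    regs["r3"] = regs["r1"] - regs["r2"]     # sub r3, r1, r2
    return regs

registers = {"r1": 5, "r2": 7, "r3": 0}
final = execute_both_paths(
    registers,
    lambda regs: regs["r1"] <= regs["r2"],   # assumed meaning of jump_lte r1, r2
    successor_if_true,
    successor_if_false,
)
print(final["r3"])      # 12
print(registers["r3"])  # 0, the original registers were never touched
```

Working on private copies is what makes it possible to throw a result away as if it had never been computed.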

As in the case of investing in more transistors and power consumption to achieve faster operations, achieving higher throughput with more arithmetic units also costs more transistors and of course consumes more power, some of which is even wasted.

Performance engineering (2)
- One can modify the structure of a system to execute programs faster
- One can add resources to processors to make them faster
- Or one can use simpler processors to spare energy

This is an example of computer architecture, which is another branch of Computer Engineering.

In conclusion of this last clip, one can modify the architecture of a system to increase its throughput. This costs additional transistors and power consumption. On the other hand, one can also build simpler processor structures to save power.

These techniques belong to computer architecture, which is another branch of computer engineering.