Pipelining & RISC


Pipelining
  • Pipelining is a key implementation technique for building fast processors and is a hallmark of RISC architectures. It allows the execution of multiple instructions to overlap in time.
  • The entire processing flow is broken up into multiple stages. As soon as a stage finishes with the current data/instruction, it passes that item on to the next stage for further processing and can start on a new one.
  • In non-pipelined processing, by contrast, the next data/instruction is processed only after the entire processing of the previous one is complete.



Instruction Pipelining
  • Typical instruction execution sequence: fetch, decode, read, execute, write, etc.
  • In a non-pipelined CPU, instructions are performed “one at a time”, i.e. the preceding instruction is completed before the next instruction begins.


  • In a pipelined CPU, the execution of instructions is performed in “stages”. Separate hardware is provided to handle each of these stages, and instructions proceed through the CPU one stage at a time.

  • To implement instruction pipelining, desirable features of the instruction set (IS) are:
  1. all instructions have the same length
  2. registers are specified in the same place in every instruction
  3. memory operands appear only in loads or stores, i.e. RISC
  • In practice, real instruction sets do not always satisfy all of these.
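
To make the overlap of instructions concrete, here is a minimal sketch that prints a text timing diagram for a pipelined CPU. It assumes the five stages listed above, each taking one clock cycle; the function name and the one-letter stage labels are illustrative choices, not part of the original notes.

```python
# Stage names follow the typical sequence listed above (assumed one cycle each).
STAGES = ["fetch", "decode", "read", "execute", "write"]


def pipeline_diagram(num_instructions, stages=STAGES):
    """Return one text row per instruction, one column per clock cycle.

    Instruction i (0-based) occupies stage s (0-based) in cycle i + s.
    """
    k = len(stages)
    total_cycles = k + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = ["."] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name[0].upper()  # F, D, R, E, W
        rows.append("".join(row))
    return rows


if __name__ == "__main__":
    for number, row in enumerate(pipeline_diagram(4), start=1):
        print(f"I{number}: {row}")
```

Each row is shifted one cycle to the right of the previous one, so in any column every stage letter appears at most once: each hardware unit works on a different instruction.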

Pipelining Observations 

  • Assume that the fetch and execute stages require the same amount of time, and
  • that the computer has two separate hardware units, one for fetching instructions and the other for executing them. What is the implication?
        - The fetch and execute operations can each be completed in one clock cycle, so the next instruction can be fetched while the current one executes.

Two Stage Instruction Pipeline

Pipelining of Unequal Stages

  • Rules for pipelining where stages are unequal:
  1. Always take the largest stage delay as the cycle time.
  2. No two instructions may occupy the same stage at the same time, and the latency of every instruction must be constant.
  3. Ensure that successive instructions start exactly one cycle time apart; otherwise the timing diagram is wrong.
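
The three rules can be checked with a short sketch. The stage delays used here (2, 4 and 3 time units) are hypothetical values chosen for illustration, and the function name is an assumption.

```python
def schedule(stage_delays, num_instructions):
    """Return the cycle time and, for each instruction, the (start, end)
    time slot of every stage."""
    cycle = max(stage_delays)  # rule 1: the slowest stage sets the cycle time
    table = []
    for i in range(num_instructions):
        row = []
        for s in range(len(stage_delays)):
            start = (i + s) * cycle  # rule 3: instructions start one cycle apart
            row.append((start, start + cycle))
        table.append(row)
    return cycle, table


if __name__ == "__main__":
    cycle, table = schedule([2, 4, 3], 3)  # hypothetical delays
    print("cycle time:", cycle)
    for number, row in enumerate(table, start=1):
        latency = row[-1][1] - row[0][0]
        print(f"I{number}: {row}  latency = {latency}")
```

Because every stage is given a full cycle, no two instructions share a stage slot and each instruction has the same latency (number of stages times the cycle time), which is what rule 2 asks for.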

Timing diagram

Wrong timing diagram: Overlaps!


Wrong Timing Diagram: Latencies not constant


Correct timing diagram


Pipeline Performance


Total time for equal stages


Total time for unequal stages
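
A minimal sketch of both total-time cases, assuming the usual textbook forms: for k stages and n instructions, Tk(n) = (k + n - 1) x cycle time, where the cycle time is the largest stage delay, and To(n) = n x (sum of all stage delays) without pipelining. These forms and the sample delays are assumptions.

```python
def pipelined_time(stage_delays, n):
    """Tk(n) = (k + n - 1) * cycle, with cycle = largest stage delay (assumed form)."""
    k = len(stage_delays)
    return (k + n - 1) * max(stage_delays)


def non_pipelined_time(stage_delays, n):
    """To(n) = n * (sum of all stage delays) (assumed form)."""
    return n * sum(stage_delays)


if __name__ == "__main__":
    equal = [1] * 6        # six equal stages of one time unit (hypothetical)
    unequal = [2, 4, 3]    # three unequal stages (hypothetical)
    print(pipelined_time(equal, 9), non_pipelined_time(equal, 9))
    print(pipelined_time(unequal, 3), non_pipelined_time(unequal, 3))
```

With equal stages the cycle time is simply the common stage delay, so the same function covers both cases.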


Speedup

Speedup of a k-stage pipeline for n instructions :
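
A common textbook form of this speedup, assuming k equal stages, is S = To(n) / Tk(n) = n x k / (k + n - 1). The sketch below evaluates that form; the values of k and n are hypothetical.

```python
def speedup(k, n):
    """Speedup of a k-stage pipeline over a non-pipelined CPU for n
    instructions, assuming k equal stages: S = n*k / (k + n - 1)."""
    return (n * k) / (k + n - 1)


if __name__ == "__main__":
    for k in (2, 6):
        print(f"k = {k}, n = 9: speedup = {speedup(k, 9):.3f}")
    # For very large n the speedup approaches, but never reaches, k.
    print(f"k = 6, n = 1000000: speedup = {speedup(6, 1_000_000):.5f}")
```

Under this form, the speedup grows with the number of stages for a fixed n, and is bounded above by k.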


Calculating Speedup

Example 1: Based on the 2 stage timing diagram


Example 2: Based on the 6 stage timing diagram


This shows that, with the other values unchanged, increasing the number of stages improves the speedup.

Example 3: Based on the 3 unequal stages example


Throughput 

Pipelined throughput = n / Tk(n)
Non-pipelined throughput = n / To(n)

where n is the total number of instructions,
To(n) is the total time without pipelining, and Tk(n) is the total time with pipelining.
Both throughputs are measured in instructions per unit time (for example, instructions per second).
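
As a quick numeric check of the two throughput formulas, here is a small sketch. The total times are computed with the usual textbook forms for k equal stages, Tk(n) = (k + n - 1) x cycle and To(n) = n x k x cycle; those forms and all the numbers are assumptions.

```python
def throughput(n, total_time):
    """Instructions completed per unit time."""
    return n / total_time


# Hypothetical numbers: k = 6 equal stages of 1 time unit each, n = 9 instructions.
k, n, cycle = 6, 9, 1
t_pipelined = (k + n - 1) * cycle   # Tk(n), assumed standard form
t_non_pipelined = n * k * cycle     # To(n), assumed standard form

print("pipelined throughput:    ", throughput(n, t_pipelined))
print("non-pipelined throughput:", throughput(n, t_non_pipelined))
```

The ratio of the two throughputs is the same as the speedup, since n cancels out.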

Limits to Pipelining

  • Factors that limit performance enhancement:
  • Unequal duration/delay of stages
  • Conditional branch instructions or interrupts. Example:
  1. Instruction 3 is a conditional branch to instruction 15.
  2. No instructions complete during time units 9-12. This is the performance penalty incurred because the branch could not be anticipated.
  3. The pipeline has to be flushed of the instructions fetched after the branch.
  4. Pipelined operation cannot be maintained in the presence of branch or jump instructions.
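
The branch example can be reproduced with a small sketch. It assumes a six-stage pipeline with one time unit per stage, and that the branch outcome becomes known in the fifth stage; both values are assumptions chosen to match the time units quoted above, and the function names are illustrative.

```python
K = 6              # number of pipeline stages (assumed)
RESOLVE_STAGE = 5  # stage in which the branch outcome is known (assumed)


def completion_times(branch_instr, target_instr, count_after):
    """Map instruction number -> time unit in which it completes.

    Instructions 1..branch_instr are fetched in time units 1, 2, ...
    The branch target is fetched one time unit after the branch resolves;
    everything fetched in between is flushed and never completes.
    """
    done = {}
    for i in range(1, branch_instr + 1):
        done[i] = i + K - 1  # fetched in time unit i, finishes K - 1 later
    resolved_at = branch_instr + RESOLVE_STAGE - 1
    first_fetch = resolved_at + 1
    for j in range(count_after):
        done[target_instr + j] = first_fetch + j + K - 1
    return done


def idle_time_units(done):
    """Time units between the first and last completion in which nothing completes."""
    finished = set(done.values())
    return [t for t in range(min(finished), max(finished) + 1) if t not in finished]


if __name__ == "__main__":
    done = completion_times(branch_instr=3, target_instr=15, count_after=2)
    print(done)
    print("no completions in time units:", idle_time_units(done))
```

Under these assumptions instruction 3 completes in time unit 8 and instruction 15 in time unit 13, leaving time units 9-12 without a completed instruction.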

Hazards as limitations to pipelining

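  • 3 types of hazards:
  1. Resource hazards: the hardware cannot support this combination of instructions at the same time (laundry analogy: a single person to fold and put clothes away, or a combined washer-drier).
  2. Data hazards: an instruction depends on the result of a prior instruction that is still in the pipeline.
     Data dependency example:
         A = B + C
         D = E + A
         C = G x H
     The second statement reads A, which the first statement is still computing. The third statement overwrites C, which the first statement reads.
  3. Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).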
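
The data dependencies in the three-statement example can be found mechanically. The sketch below uses the standard names read-after-write (RAW), write-after-read (WAR) and write-after-write (WAW), which do not appear in the notes above; the function name and the tuple encoding of statements are illustrative assumptions. It is a naive pairwise check that ignores any intervening writes.

```python
def find_dependencies(statements):
    """statements: list of (destination, sources) tuples in program order.

    Returns (kind, earlier, later, variable) tuples with 1-based statement numbers.
    """
    found = []
    for j in range(len(statements)):
        dest_j, srcs_j = statements[j]
        for i in range(j):
            dest_i, srcs_i = statements[i]
            if dest_i in srcs_j:   # later statement reads an earlier result
                found.append(("RAW", i + 1, j + 1, dest_i))
            if dest_j in srcs_i:   # later statement overwrites an earlier operand
                found.append(("WAR", i + 1, j + 1, dest_j))
            if dest_i == dest_j:   # both statements write the same variable
                found.append(("WAW", i + 1, j + 1, dest_i))
    return found


# The example from the text: A = B + C, D = E + A, C = G x H
PROGRAM = [("A", ("B", "C")), ("D", ("E", "A")), ("C", ("G", "H"))]

if __name__ == "__main__":
    for kind, earlier, later, var in find_dependencies(PROGRAM):
        print(f"{kind}: statement {later} depends on statement {earlier} through {var}")
```

The RAW dependency on A is the case described under data hazards: the second statement cannot read A until the first has produced it.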

RISC: Reduced Instruction Set Computers

  • Major advances in computers:
  1. The family concept: separates architecture from implementation
  2. Microprogrammed control unit
  3. Cache memory
  4. Solid-state RAM
  5. Microprocessors
  6. Pipelining: introduces parallelism into the fetch-execute cycle
  7. Multiple processors

CISC and RISC

  • The next step in processor architecture: the Reduced Instruction Set Computer (RISC)

  • Key features of CISC:
  1. A large number of predefined instructions, intended to make high-level programming languages easy to design and implement.
  2. Supports microprogramming to simplify computer architecture.

  • Key features of RISC:
  1. A limited and simple instruction set.
  2. A large number of general-purpose registers, or the use of compiler technology to optimize register use.
  3. Emphasis on optimizing the instruction pipeline.

Arguments for CISC
  • A rich instruction set should simplify the compiler by providing instructions that match high-level language statements.
  • This works fine if the number of high-level languages is very small.
  • Because the programs are smaller, they are argued to perform better:
  1. They take up less memory space and need fewer instruction fetch cycles.
  2. Fewer instructions are executed, which may lead to shorter execution time.

Drawbacks of CISC
  • CPU complexity
         The control unit design (mainly instruction decoding) becomes complex because the instruction set is large and its instructions are heavily encoded.
  • System size and cost
         The complexity of the CPU requires a lot of hardware circuitry. This increases the hardware cost of the system and also its power requirement.
  • Complex machine instructions may not match high-level language statements exactly, in which case they may be of little use.
          This becomes a major problem as the number of languages grows.

CISC characteristics
  • Varying number of cycles per instruction
  • Small number of general-purpose registers
  • More addressing modes
  • More instruction formats: fewer instructions can be used to implement a given task
  • Uses microcode
  • Variable-length instructions
  • Simplified compiler: microprogram instructions could be written to match constructs of high-level languages

RISC Characteristics
  • One instruction per cycle
  • Register to register operations
  • Few, simple addressing modes
  • Few, simple instruction formats
  • Hardwired design (no microcode)
  • Fixed instruction format
  • More compile time/effort


Copyright © 2012 Gagak Hitam and Hishamuddin.