Pipelining & RISC
- Pipelining is a key implementation technique used to build fast processors that can be seen in RISC architecture. It allows the execution of multiple instructions to overlap in time.
- The entire processing flow is broken up into multiple stages. As soon as a stage finishes with the current data item or instruction, it passes that item on to the next stage for further processing and can begin work on a new one.
- In non-pipelined processing, by contrast, the next data item or instruction is processed only after the entire processing of the previous one is complete.
Instruction Pipelining
- Typical instruction execution sequence: fetch, decode, read, execute, write, etc.
- In a non-pipelined CPU, instructions are performed “one at a time”, i.e. the preceding instruction is completed before the next one begins.
- In a pipelined CPU, the execution of instructions is performed in “stages”. Separate hardware is provided to handle each of these stages, and instructions proceed through the CPU stage by stage.
- To implement instruction pipelining, desirable features of the instruction set (IS):
- all instructions same length
- registers specified in same place in instruction
- memory operands only in loads or stores, i.e. RISC
- In reality, however, this is not always the case.
Pipelining Observations
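- If we assume that the fetch and execute stages require the same amount of time, and
- if the computer has two hardware units, one for fetching instructions and the other for executing them, what is the implication? The two units can work in parallel: while one instruction is being executed, the next one is already being fetched.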
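A minimal sketch of this two-unit arrangement, assuming each stage takes exactly one time unit and that fetch and execute never block each other. The function names are hypothetical and chosen only for illustration.

```python
def two_stage_schedule(n):
    """Return, per instruction, the time unit of its fetch (F) and execute (E).

    Assumes both stages take one time unit and that separate fetch and
    execute hardware exists, so the fetch of instruction i+1 overlaps
    the execute of instruction i.
    """
    return [{"F": i + 1, "E": i + 2} for i in range(n)]


def total_time_pipelined(n):
    # The last instruction executes one time unit after it is fetched.
    return two_stage_schedule(n)[-1]["E"] if n else 0


def total_time_sequential(n):
    # Without overlap, every instruction needs 2 time units on its own.
    return 2 * n


for i, slot in enumerate(two_stage_schedule(4), start=1):
    print(f"I{i}: fetch at t={slot['F']}, execute at t={slot['E']}")
print("pipelined:", total_time_pipelined(4), "sequential:", total_time_sequential(4))
```

Under these assumptions, n instructions need n + 1 time units instead of 2n.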
Pipelining of Unequal Stages
- Important rules for pipelining where stages are unequal:
- Always take the largest of the stage delays as the cycle time.
- No stage may overlap with itself (a stage handles one instruction at a time), and the latency of every instruction must be constant.
- Ensure that the offset between successive instructions equals the cycle time; otherwise the timing diagram is wrong.
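These rules can be sketched in code. The example below assumes a clocked pipeline in which every stage is allotted one full cycle; the stage delays `[2, 3, 1]` and the function name `schedule_unequal` are hypothetical.

```python
def schedule_unequal(stage_delays, n):
    """Start and finish times for n instructions through unequal stages.

    Assumption: every stage is allotted one full cycle, and the cycle
    time is the largest stage delay, so a stage is never asked to accept
    a new instruction before it has finished the current one.
    """
    cycle = max(stage_delays)        # cycle time = largest stage delay
    k = len(stage_delays)
    rows = []
    for i in range(n):
        start = i * cycle            # successive instructions are offset by one cycle
        finish = start + k * cycle   # identical latency for every instruction
        rows.append((start, finish))
    return cycle, rows


cycle, rows = schedule_unequal([2, 3, 1], n=4)
print("cycle time:", cycle)
for i, (start, finish) in enumerate(rows, start=1):
    print(f"I{i}: starts {start}, finishes {finish}, latency {finish - start}")
```

Because the offset between instructions equals the cycle time, the same stage never serves two instructions at once, and every instruction shows the same latency.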
Timing diagram
Wrong timing diagram: Overlaps!
Wrong Timing Diagram: Latencies not constant
Correct timing diagram
Pipeline Performance
Total time for equal stages
Total time for unequal stages
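As a sketch of the usual textbook form, assume a clocked k-stage pipeline with no stalls, in which each stage takes one cycle. The symbols k (number of stages), n (number of instructions), τ (cycle time) and τ_i (delay of stage i) are assumed notation.

```latex
% Equal stages: the first instruction needs k cycles,
% and each later instruction completes one cycle after its predecessor.
T_k(n) = \bigl[k + (n - 1)\bigr]\,\tau

% Unequal stages: the cycle time is the largest stage delay.
\tau = \max_{1 \le i \le k} \tau_i,
\qquad
T_k(n) = \bigl[k + (n - 1)\bigr]\,\max_{1 \le i \le k} \tau_i

% Non-pipelined execution: each instruction passes through all stages alone.
T_o(n) = n \sum_{i=1}^{k} \tau_i
\quad (= n k \tau \text{ when all stages are equal})
```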
Speedup
Speedup of a k-stage pipeline for n instructions :
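A sketch of the usual textbook derivation, assuming k equal stages of duration τ and no stalls. The symbols τ and S_k are assumed notation; T_o(n) and T_k(n) denote the non-pipelined and pipelined total times.

```latex
S_k = \frac{T_o(n)}{T_k(n)}
    = \frac{n k \tau}{\bigl[k + (n - 1)\bigr]\tau}
    = \frac{n k}{k + n - 1}
```

For large n the ratio approaches k, so under these assumptions the speedup is bounded by the number of stages.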
Calculating Speedup
Example 1: Based on the 2 stage timing diagram
Example 2: Based on the 6 stage timing diagram
This shows that, with the other values unchanged, increasing the number of stages improves the speedup.
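The comparison between a 2-stage and a 6-stage pipeline can be checked numerically. This is a minimal sketch under the ideal model (equal one-unit stages, no stalls). The instruction count `n = 9` is a hypothetical value, not taken from the timing diagrams.

```python
def speedup(k, n):
    """Ideal speedup of a k-stage pipeline over non-pipelined execution.

    Assumes k equal stages of one time unit each and no stalls:
    non-pipelined time is n * k, pipelined time is k + (n - 1).
    """
    return (n * k) / (k + n - 1)


n = 9  # hypothetical instruction count
for k in (2, 6):
    print(f"k={k}: speedup = {speedup(k, n):.2f}")
```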
Example 3: Based on the 3 unequal stages example
Throughput
Pipelined throughput = n / Tk(n)
Non-pipelined throughput = n / To(n)
where n is the total number of instructions,
To(n) is the total time without pipelining, and Tk(n) is the total time with pipelining.
Both throughputs are measured in instructions per unit time (e.g. instructions per second).
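A short sketch of the two throughput ratios, using hypothetical numbers (9 instructions, 6 equal stages of one time unit each, no stalls). The total times follow the ideal model and are assumptions rather than values from the examples.

```python
def throughput(n, total_time):
    """Instructions completed per unit time: n / T(n)."""
    return n / total_time


# Hypothetical workload: n = 9 instructions, k = 6 equal one-unit stages.
n, k = 9, 6
t_pipelined = k + (n - 1)     # Tk(n), assuming no stalls
t_non_pipelined = n * k       # To(n)

print("pipelined throughput:    ", throughput(n, t_pipelined))
print("non-pipelined throughput:", throughput(n, t_non_pipelined))
```

The ratio of the two throughputs equals the speedup, since n cancels out.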
Limits to Pipelining
- Factors that limit performance enhancement:
- Unequal duration/delay of stages
- Conditional branch instructions or interrupts. Example:
- Instruction 3 is a conditional branch to instruction 15.
- No instructions are completed during time units 9-12. This is the performance penalty incurred because the branch could not be anticipated.
- Flushing of pipeline
- Pipelined operation cannot be maintained in the presence of branch or jump instructions.
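The branch penalty in the example above can be reproduced with a small model. It assumes a 6-stage pipeline with one-time-unit stages in which the branch outcome is known at the end of stage 5. These parameters are assumptions, chosen because they yield the idle time units 9-12 mentioned above. The function name is hypothetical.

```python
def completion_times_with_branch(k, branch_index, resolve_stage, n_target):
    """Completion times when one taken branch flushes the pipeline.

    k              number of one-time-unit stages
    branch_index   1-based position of the taken branch in program order
    resolve_stage  1-based stage at whose end the branch outcome is known
    n_target       number of instructions executed from the branch target on
    """
    # Instruction i enters the pipeline at time i and completes at i + k - 1.
    before = [i + k - 1 for i in range(1, branch_index + 1)]
    # The branch is in `resolve_stage` at time branch_index + resolve_stage - 1,
    # so the target can only be fetched in the following time unit.
    target_fetch = branch_index + resolve_stage
    after = [target_fetch + j + k - 1 for j in range(n_target)]
    return before, after


before, after = completion_times_with_branch(
    k=6, branch_index=3, resolve_stage=5, n_target=2
)
idle = list(range(before[-1] + 1, after[0]))
print("completions up to the branch:", before)
print("completions from the target: ", after)
print("time units with no completion:", idle)
```

Instructions fetched after the branch but before it resolves are flushed, which is why nothing completes during the idle time units.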
Hazards as limitations to pipelining
- 3 types of hazards:
- Resource hazards: the hardware cannot support this combination of instructions at the same time (laundry analogy: a single person who must both fold and put clothes away, or a combined washer-dryer)
- Data hazards: an instruction depends on the result of a prior instruction that is still in the pipeline
- Data dependencies example:
A = B + C
D = E + A
C = G x H
- Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).
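The dependencies in the A/D/C example above can be found mechanically. The sketch below encodes each statement as a destination plus its sources and reports the pairs that conflict. The labels RAW, WAR and WAW are standard terminology that these notes do not use, and the check is deliberately naive: it ignores intervening redefinitions.

```python
# Each instruction is (destination, [sources]), taken from the example above.
program = [
    ("A", ["B", "C"]),   # A = B + C
    ("D", ["E", "A"]),   # D = E + A
    ("C", ["G", "H"]),   # C = G x H
]


def find_dependencies(program):
    """Return (kind, earlier, later, variable) tuples, 1-based indices.

    RAW (read after write):  a later instruction reads what an earlier one writes.
    WAR (write after read):  a later instruction overwrites what an earlier one reads.
    WAW (write after write): both instructions write the same destination.
    """
    found = []
    for i, (dst_i, src_i) in enumerate(program):
        for j in range(i + 1, len(program)):
            dst_j, src_j = program[j]
            if dst_i in src_j:
                found.append(("RAW", i + 1, j + 1, dst_i))
            if dst_j in src_i:
                found.append(("WAR", i + 1, j + 1, dst_j))
            if dst_i == dst_j:
                found.append(("WAW", i + 1, j + 1, dst_i))
    return found


for dep in find_dependencies(program):
    print(dep)
```

The second statement must wait for A from the first, and the third must not overwrite C before the first has read it.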
RISC: Reduced Instruction Set Computers
- Major advances in computers:
- The family concept
- Separates architecture from implementation
- Microprogrammed control unit
- Cache memory
- Solid State RAM
- Microprocessors
- Pipelining
- Introduces parallelism into the fetch-execute cycle
- Multiple processors
CISC and RISC
- The next step in processor architecture: the Reduced Instruction Set Computer (RISC)
- Key features of CISC:
- Large number of predefined instructions making high level programming languages easy to design and implement.
- Supports microprogramming to simplify computer architecture
- Key features of RISC
- Limited and simple instruction set
- Large number of general purpose registers or use of compiler technology to optimize register use.
- Emphasis on optimizing the instruction pipeline
Arguments for CISC
- A rich instruction set should simplify the compiler by having instructions which match the high-level language instructions.
- This works fine if the number of high-level languages is very small.
- Since the programs are smaller in size, they have better performance:
- They take up less memory space and need fewer instruction fetch cycles.
- Fewer instructions are executed, which may lead to a shorter execution time.
Drawbacks of CISC
- CPU complexity
- System size and cost
The complexity of the CPU requires a lot of hardware circuitry. This increases the hardware cost of the system and also the power requirement.
- Complex machine instructions may not match high-level language statements exactly, in which case they may be of little use.
This becomes a major problem as the number of languages grows.
CISC characteristics
- Varying number of cycles per instruction
- Small number of general purpose registers
- More addressing modes
- More instruction formats: fewer instructions can be used to implement a given task
- Use microcode
- Variable-length instructions
- Simplified compiler: microprogram instructions could be written to match constructs of high level languages
RISC Characteristics
- One instruction per cycle
- Register to register operations
- Few, simple addressing modes
- Few, simple instruction formats
- Hardwired design (no microcode)
- Fixed instruction format
- More compile time/effort