An optimizing pipeline stall reduction algorithm for power and performance on multi-core CPUs

Table 1 Comparison of algorithms

In-order execution	Tomasulo’s algorithm	Proposed algorithm (LR)
Static-scheduling	Hardware dynamic-scheduling	Static-scheduling
The compiler tries to reorder the instructions at compilation time to reduce pipeline stalls	The hardware's dynamic scheduling tries to rearrange the instructions at run time to reduce pipeline stalls	Compilation-time instruction execution
Uses less hardware	More hardware units added	Uses more powerful algorithmic techniques (sorting)
Sequential order	Register renaming is used to reduce stalls	Sorting takes place first, then the instructions are executed
Bottom-up approach	Re-ordering of CPU instructions	Hybrid of in-order and out-of-order (OOO) execution
For example: char x; // read x, starts on cycle 1 and completes on cycle 2; int a = 10 + 20; // assignment to a, starts on cycle 3 and completes on cycle 4; print char x; // starts on cycle 5 and completes on cycle 6;	char x; // read x, starts on cycle 1 and completes on cycle 2; int a = 10 + 20; // assignment to a, starts on cycle 2 and completes on cycle 3; print char x; // starts on cycle 3 and completes on cycle 4;	char x; // read x, starts on cycle 1 and completes on cycle 2; int a = 10 + 20; // assignment to a, starts on cycle 2 and completes on cycle 3; print char x; // starts on cycle 3 and completes on cycle 4; Due to hardware unit, more power dissipation
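
The cycle counts in the example row of Table 1 can be reproduced with a small sketch. It is not the paper's algorithm: it only models the timing pattern implied by the example, assuming every instruction takes two cycles and, in the overlapped case, one instruction issues per cycle. The function names and the `latency` parameter are hypothetical, and data dependencies are ignored (in the example, `print char x` starts on cycle 3, after the read of `x` completes on cycle 2, so no extra stall is needed).

```python
def sequential_schedule(names, latency=2):
    """In-order case: each instruction starts only after the previous one completes."""
    schedule = []
    cycle = 1
    for name in names:
        start = cycle
        end = start + latency - 1  # inclusive completion cycle
        schedule.append((name, start, end))
        cycle = end + 1  # next instruction waits for completion
    return schedule


def overlapped_schedule(names, latency=2):
    """Overlapped case: one instruction issues per cycle (assumed, no dependency checks)."""
    return [(name, i + 1, i + latency) for i, name in enumerate(names)]


def total_cycles(schedule):
    """Cycle on which the last instruction completes."""
    return max(end for _, _, end in schedule)


instructions = ["read x", "a = 10 + 20", "print x"]

for label, plan in (("in-order", sequential_schedule(instructions)),
                    ("overlapped", overlapped_schedule(instructions))):
    print(label, plan, "total cycles:", total_cycles(plan))
```

Under these assumptions the in-order schedule finishes on cycle 6 and the overlapped schedule on cycle 4, matching the columns of the example row.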