4. Refer to the following code to answer the remaining questions.

    loop: load  r1, r5, 40    ; r1 ← data_memory[r5 + 40]
          add   r1, r1, r2    ; r1 ← r1 + r2
          store r1, r5, 40    ; data_memory[r5 + 40] ← r1
          addi  r5, r5, -4    ; r5 ← r5 - 4
          bne   r5, 0, loop   ; branch to loop if r5 != 0 (r5 not equal to zero)

Assume that the original for loop executes an even number of iterations, N. The loop is now unrolled 2 times. Show the assembly language listing of the unrolled loop with no output dependences but with no additional instruction scheduling. Select from registers r6, r7, and r8, in this order, as necessary to prevent introducing dependences in the unrolled code. Do not waste registers (that is, do not use a register unnecessarily).
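As an aid to reading the listing, the following is a minimal Python sketch of what the original (not unrolled) loop computes. It is an illustration under stated assumptions, not part of the question: the function name `run_loop`, the dictionary used as data memory, and the starting value r5 = 4 * N are all hypothetical choices. The starting value is chosen so that the loop exits after exactly N iterations.

```python
def run_loop(memory, r5, r2):
    """Simulate the original loop.

    memory maps a byte address to a word (an assumption of this sketch).
    r5 is assumed to start at a positive multiple of 4, so it reaches 0.
    Returns the number of iterations executed.
    """
    iterations = 0
    while True:
        r1 = memory[r5 + 40]   # load  r1, r5, 40
        r1 = r1 + r2           # add   r1, r1, r2
        memory[r5 + 40] = r1   # store r1, r5, 40
        r5 = r5 - 4            # addi  r5, r5, -4
        iterations += 1
        if r5 == 0:            # bne   r5, 0, loop (fall through when r5 == 0)
            break
    return iterations


# N = 4 iterations: r5 starts at 16, so addresses 56, 52, 48, 44 are updated.
memory = {addr: 0 for addr in range(44, 60, 4)}
count = run_loop(memory, r5=16, r2=5)
print(count, memory)
```

Each iteration adds r2 to one word and touches a different address, so consecutive iterations are independent apart from their shared use of r1 and r5. That observation is what the register-renaming part of the question relies on.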