Brief Solution to Assignment 4

1(a)
                    # bytes         # memory operand accesses
2-address version   4*5 + 2 = 22    5
3-address version   6*2 + 4 = 16    5

(b)
2-address version:
    mov r1, [X]
    mov r3, r1
    sub r1, [Y]
    sub r3, [Z]
    mul r3, r1
    mov [Z], r3

3-address version:
    mov r1, [X]
    sub r2, r1, [Y]
    sub r1, r1, [Z]
    mul [Z], r1, r2
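As a sanity check, the 3-address code in (b) and the stack code in (d) can be run on a toy interpreter that also counts memory operand accesses. This is a sketch that rests on assumptions made only for it: operands written as [X] are memory and names like r1 are registers; the first operand of a 3-address instruction is the destination; a stack sub computes (next-to-top) − (top). The names run_3addr and run_stack are illustrative. With X = 10, Y = 3, Z = 4, both programs should leave (X − Y)·(X − Z) = 42 in Z. The 3-address code makes 4 memory operand accesses and the stack code makes 5.

```python
from typing import Dict, List, Tuple

# One 3-address instruction: (opcode, destination, [sources]).
Instr = Tuple[str, str, List[str]]

# Arithmetic shared by both interpreters.
ALU = {"mov": lambda a: a, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}


def run_3addr(program: List[Instr], memory: Dict[str, int]) -> int:
    """Execute 3-address code, updating `memory` in place; return # memory operand accesses."""
    regs: Dict[str, int] = {}
    accesses = 0
    for opcode, dst, srcs in program:
        values = []
        for src in srcs:                       # "[X]" is a memory operand, "r1" a register
            if src.startswith("["):
                accesses += 1
                values.append(memory[src[1:-1]])
            else:
                values.append(regs[src])
        result = ALU[opcode](*values)
        if dst.startswith("["):                # a memory destination is also an access
            accesses += 1
            memory[dst[1:-1]] = result
        else:
            regs[dst] = result
    return accesses


def run_stack(program: List[Tuple[str, ...]], memory: Dict[str, int]) -> int:
    """Execute zero-address (stack) code; return # memory operand accesses."""
    stack: List[int] = []
    accesses = 0
    for instr in program:
        op = instr[0]
        if op == "push":                       # read a memory operand onto the stack
            accesses += 1
            stack.append(memory[instr[1]])
        elif op == "pop":                      # write the top of stack to memory
            accesses += 1
            memory[instr[1]] = stack.pop()
        else:                                  # binary op: (next-to-top) OP (top)
            top = stack.pop()
            below = stack.pop()
            stack.append(ALU[op](below, top))
    return accesses


THREE_ADDR = [
    ("mov", "r1", ["[X]"]),
    ("sub", "r2", ["r1", "[Y]"]),
    ("sub", "r1", ["r1", "[Z]"]),
    ("mul", "[Z]", ["r1", "r2"]),
]
STACK = [("push", "X"), ("push", "Y"), ("sub",), ("push", "X"),
         ("push", "Z"), ("sub",), ("mul",), ("pop", "Z")]

mem = {"X": 10, "Y": 3, "Z": 4}
print(run_3addr(THREE_ADDR, mem), mem["Z"])   # 4 42
mem = {"X": 10, "Y": 3, "Z": 4}
print(run_stack(STACK, mem), mem["Z"])        # 5 42
```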
(c)
The two given versions differ in code size (the 3-address version is shorter, 16 vs. 22 bytes) and in the number of instructions (the 3-address version needs fewer), but not in the number of memory operand accesses (5 each). So the 3-address version is preferred.

(d)
    push X
    push Y
    sub
    push X
    push Z
    sub
    mul
    pop Z

2(i) 4 (because edx is 4 bytes)
(ii) 1 (the byte size specifier tells the assembler that the operand is a single byte at the memory location pointed to by ecx)
(iii) 24 (the ‘again’ loop is executed once for each character in the message list)
(iv) list + 24 (ecx initially contains list – 4; it is incremented once by 4 and then 24 times by 1)
(v) 8 (the total number of space characters scanned)
(vi) It counts the number of space characters in the message and returns the count in register eax.
(vii) list – 4, 4 and 32 are the immediate operands that appear in the code.
(viii) 4 (push edx; mov edx, [ecx]; cmp byte [ecx], 32; pop edx). You may also include ret, which pops the return address from the stack.
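The behavior described in answers (iii) to (vi) can be mimicked in Python. This is a sketch, not the assignment's code. The 24-byte message below is hypothetical and was chosen to contain 8 spaces so that it matches the counts above. ecx is modeled as an index into that byte string, with list at offset 0. The loop is assumed to run once per character. The save and restore of edx is left out.

```python
from typing import Tuple

SPACE = 32  # ASCII code of ' ', the immediate compared in: cmp byte [ecx], 32

# Hypothetical 24-character message containing 8 spaces (not the assignment's data).
MESSAGE = b"a b c d e f g h ijklmnop"


def count_spaces(memory: bytes, list_addr: int, length: int) -> Tuple[int, int]:
    """Return (eax, final ecx) after scanning `length` bytes that start at list_addr."""
    ecx = list_addr - 4           # ecx starts at list - 4
    ecx += 4                      # the single +4 step brings it to list
    eax = 0                       # running count of spaces
    for _ in range(length):       # the 'again' loop: one pass per character
        if memory[ecx] == SPACE:  # compare one byte with 32
            eax += 1
        ecx += 1                  # advance to the next byte
    return eax, ecx


eax, ecx = count_spaces(MESSAGE, 0, len(MESSAGE))
print(eax, ecx)  # 8 24, i.e. eax = 8 and ecx = list + 24 with list at offset 0
```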
3(a)
              1  2  3  4  5  6  7  8  (time)
Load X        F  D  M  X
AddI Y           F  D  M  M  X
Store Z             F  D        M  X
The above trace shows that it takes at least 8 cycles to complete the three instructions, not the 4 + 3 – 1 = 6 cycles of the ideal case. Note that the store cannot occur until the addition completes at cycle 6.

(b)
Ideal speedup = # stages = 4
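To see why the ideal figure equals the number of stages, compare two ways of running n instructions. An unpipelined machine takes n·k cycles, assuming every one of the k stages takes one cycle. A stall-free k-stage pipeline takes k + n – 1 cycles. The ratio approaches k as n grows. A minimal sketch (ideal_speedup is an illustrative name):

```python
def ideal_speedup(k: int, n: int) -> float:
    """Speedup of a stall-free k-stage pipeline over an unpipelined machine for n instructions."""
    unpipelined = n * k   # each instruction takes k cycles, one after another
    pipelined = k + n - 1 # fill the pipeline once, then one instruction completes per cycle
    return unpipelined / pipelined


for n in (3, 100, 10_000):
    print(n, round(ideal_speedup(4, n), 3))
```

For the three instructions of 3(a), this gives 12 / 6 = 2 even without stalls. The figure of 4 is only a limit reached for long instruction streams.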
(c)
Ideal speedup is usually not achievable because of (i) data dependencies between adjacent instructions, (ii) control dependencies that change the control flow, sometimes conditionally, and (iii) unavailability of a needed functional unit or memory (structural hazards).
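The 8-cycle trace in 3(a) can be reproduced with a small greedy scheduler that enforces two of the limits listed above. First, a stage holds only one instruction per cycle (item iii). Second, a consumer waits for its producer (item i). This is a simplified sketch, not a model of a specific machine, and it rests on several assumptions. Each instruction is given as the list of stage names from the trace, and instructions advance in order. The only data dependence modeled is the one stated in 3(a): the store's M stage waits for the add's X stage. A stalled instruction is not treated as blocking the stage it sits in, which makes no difference for three instructions. The names schedule, PROGRAM and WAITS are illustrative.

```python
from typing import Dict, List, Tuple

Stage = Tuple[int, int]  # (instruction index, stage index)


def schedule(program: List[List[str]], waits: Dict[Stage, Stage]) -> List[List[int]]:
    """Greedy in-order schedule: the cycle in which each stage of each instruction runs."""
    last_busy: Dict[str, int] = {}  # stage name -> last cycle in which it was occupied
    times: List[List[int]] = []
    for i, stages in enumerate(program):
        row: List[int] = []
        for s, unit in enumerate(stages):
            t = row[-1] + 1 if row else 1           # after this instruction's previous stage
            t = max(t, last_busy.get(unit, 0) + 1)  # one instruction per stage per cycle
            if (i, s) in waits:                     # data dependence on an earlier instruction
                pi, ps = waits[(i, s)]
                t = max(t, times[pi][ps] + 1)
            row.append(t)
            last_busy[unit] = t
        times.append(row)
    return times


# Load X, AddI Y, Store Z with the stage sequences shown in the 3(a) trace.
PROGRAM = [["F", "D", "M", "X"], ["F", "D", "M", "M", "X"], ["F", "D", "M", "X"]]
WAITS = {(2, 2): (1, 4)}  # Store's M waits until AddI's X (the addition) has completed

trace = schedule(PROGRAM, WAITS)
print(trace, max(row[-1] for row in trace))
# [[1, 2, 3, 4], [2, 3, 4, 5, 6], [3, 4, 7, 8]] 8

ideal = schedule([["F", "D", "M", "X"]] * 3, {})
print(ideal, max(row[-1] for row in ideal))
# [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]] 6
```

Dropping the dependence and the extra M stage recovers the ideal 4 + 3 – 1 = 6 cycles. The two extra cycles in 3(a) therefore come entirely from the hazards listed in (c).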