isa by Didier A.

Instruction Set Architectures Based on slides from Jones & Bartlett

Chapter 5 Objectives • Understand the factors involved in instruction set architecture design. • Gain familiarity with the diversity of memory addressing modes. • Understand the concepts of instruction-level pipelining (and parallelism) and its affect upon execution performance.

COMP 228, Fall 2009

Instruction Set Architectures

5.1 Introduction • This chapter builds upon the ideas in Chapter 4. • We present a detailed look at different instruction formats, operand types, and memory access methods. • We will see the interrelation between machine organization and instruction formats. • This leads to a deeper understanding of computer architecture in general.

COMP 228, Fall 2009

Instruction Set Architectures

5.2 Instruction Formats Instruction sets are differentiated by the following: • Number of bits per instruction. • Stack-based or register-based. • Number of explicit operands per instruction. • Operand location. • Types of operations. • Type and size of operands.

COMP 228, Fall 2009

Instruction Set Architectures

5.2 Instruction Formats Instruction set architectures are measured according to: • Main memory space occupied by a program. • Instruction complexity. • Instruction length (in bits). • Total number of instructions in the instruction set.

COMP 228, Fall 2009

Instruction Set Architectures

5.2 Instruction Formats In designing/using an instruction set, consideration is given to these variations of features: • Instruction length (#bits per instruction). – Fixed vs variable length.

– Variable length allows using shorter encoding of more frequent instructions.

• Number of operands: typically 0,1,2,3. • Number of addressable registers: RISC allows a larger register file. • Memory organization. – Byte- or Word-addressed.

• Addressing modes.

– Direct, indirect, indexed, or combinations of these.

COMP 228, Fall 2009

Instruction Set Architectures

5.2 Instruction Formats â&#x20AC;˘ The next consideration for architecture design concerns how the CPU will store data. â&#x20AC;˘ Three choices: 1. A stack architecture: instructions operate on data stored in a stack; 0-address machine. 2. A register architecture: instructions operate on data in registers/memory: 1-address, 2-address, 3-address machines. In choosing one over the other, the tradeoffs include: simplicity (and cost) of hardware design, execution speed (performance) and ease of use (programmability). COMP 228, Fall 2009

Instruction Set Architectures

5.2 Instruction Formats • In a stack architecture, instructions and operands are implicitly taken from the stack.

– A stack is a storage where data is pushed onto and popped from the top (of the stack).

• In an accumulator architecture (1-address), one of the operands of a binary operation is implicitly in the accumulator. – The other operand is in memory, creating lots of bus traffic. – It also limits the number of data temporally stored in the processor.

• In a general purpose register (GPR) architecture (2- or 3- address machine), the operands are explicitly specified. – It is faster than accumulator architecture (data needed can be completely local).

COMP 228, Fall 2009

Instruction Set Architectures

Stack Arithmetic Memory

(Top of Stack Pointer)

SP Æ

Memory

SP Æ

::::

:::: (after push x)

Memory

SP Æ

COMP 228, Fall 2009

y x

Memory

SP Æ

x+y

::::

(after push y)

(after add)

Instruction Set Architectures

5.2 Instruction Formats • Most systems today are GPR systems. • There are three types: – Memory-memory where two or three (2 source and 1 destination) operands may be in memory. – Register-memory where at least one operand must be in a register. – Load-store where no operands may be in memory. This is typically found in RISC architecture.

• The number of operands and the number of available registers has a direct affect on instruction length (space cost) and execution time (time cost). COMP 228, Fall 2009

Instruction Set Architectures

5.2 Instruction Formats • Stack machines use one - and zero-operand instructions. • LOAD (Push) and STORE (POP) instructions require a single memory address operand. • Other instructions use operands from the stack implicitly. • PUSH and POP operations involve only the stack’s top element. • Binary instructions (e.g., ADD, MULT) use the top two items on the stack, each involving 2 pop and 1 push operations in implementation. COMP 228, Fall 2009

Instruction Set Architectures

5.2 Instruction Formats • Stack architectures require us to think about arithmetic expressions a little differently. • We are accustomed to writing expressions using infix notation, such as: Z = X + Y. • Stack arithmetic corresponds to using the postfix notation: Z = XY+. – This is also called reverse Polish notation, (somewhat) in honor of its Polish inventor, Jan Lukasiewicz (1878 1956).

COMP 228, Fall 2009

Instruction Set Architectures

5.2 Instruction Formats • The principal advantage of postfix notation is that parentheses are not used, and has an encoding structure that mimics the corresponding stack operations. • For example, the infix expression,

Z = (X × Y) + (W × U), becomes:

Z = X Y × W U × + in postfix notation.

COMP 228, Fall 2009

Instruction Set Architectures

5.2 Instruction Formats • In a stack ISA, the postfix expression,

Z = X Y × W U × + immediately translates into the following code: PUSH X PUSH Y MULT PUSH W PUSH U MULT ADD POP Z COMP 228, Fall 2009

Instruction Set Architectures

Note: MULT involves popping the stack twice and pushing the result back onto the stack 13

5.2 Instruction Formats • In comparing with a one-address ISA, like MARIE, the infix expression,

Z = X × Y + W × U the translated code becomes: LOAD X MULT Y STORE TEMP LOAD W MULT U ADD TEMP STORE Z

COMP 228, Fall 2009

Instruction Set Architectures

5.2 Instruction Formats • In a two-address ISA, (e.g.,Intel, Motorola), the infix expression,

Z = X × Y + W × U is translated into: LOAD R1,X MULT R1,Y LOAD R2,W MULT R2,U ADD R1,R2 STORE Z,R1

COMP 228, Fall 2009

Instruction Set Architectures

Note: One-address ISAs often require one operand to be a register. 15

5.2 Instruction Formats • With a three-address ISA, (e.g.,mainframes), the infix expression,

Z = X × Y + W × U is translated into: MULT R1,X,Y MULT R2,W,U ADD Z,R1,R2 Would this program execute faster than the corresponding (longer) program that we saw in the stack-based ISA?

COMP 228, Fall 2009

Instruction Set Architectures

5.2 Instruction Formats • Example: A system has 16 registers and 4K of memory. • We need 4 bits to access one of the registers. We also need 12 bits for a memory address. • If the system is to have 16-bit instructions, we have two choices for our instructions:

COMP 228, Fall 2009

Instruction Set Architectures

Clicker 5.1 • In comparing stack, 1-address, 2-address and 3address machines, which of these will likely lead to programs with fewer instructions? • • • •

A: B: C: D:

Stack 1-address 2-address 3-address

COMP 228, Fall 2009

Instruction Set Architectures

Clicker 5.2 • Continuing with the previous comparison, which of them will likely lead to programs involving the most memory operand accesses (read or write)? • • • •

A: B: C: D:

Stack 1-address 2-address 3-address

COMP 228, Fall 2009

Instruction Set Architectures

Clicker 5.3 • Which of these architectures is easier to use (program)? • • • •

A: B: C: D:

stack 1-address 2-address 3-address

COMP 228, Fall 2009

Instruction Set Architectures

5.4 Addressing • Immediate addressing is where the data is part of the instruction: the operand value is already specified in the instruction. • Direct addressing is where the address of the data is specified in the instruction. • Register addressing is where the data is located in a register specified in the instruction. • Indirect addressing is where the address of the address of the data is specified in the instruction. • Register indirect addressing uses a register to store the address of the address of the data. COMP 228, Fall 2009

Instruction Set Architectures

5.4 Addressing â&#x20AC;˘ Indexed addressing uses a register (implicitly or explicitly) as an offset, which is added to the address in the operand to determine the effective address of the data. â&#x20AC;˘ Based addressing is similar except that a base register is used instead of an index register. â&#x20AC;˘ The difference between these two is that an index register holds an offset relative to the address given in the instruction, and a base register holds a base address where the address field specified in the instruction represents a displacement from this base. COMP 228, Fall 2009

Instruction Set Architectures

5.4 Addressing â&#x20AC;˘ For the instruction shown, what is the value loaded into the accumulator for each addressing mode?

COMP 228, Fall 2009

Instruction Set Architectures

5.4 Addressing â&#x20AC;˘ These are the values loaded into the accumulator for each addressing mode.

COMP 228, Fall 2009

Instruction Set Architectures

5.5 Instruction-Level Pipelining • Some CPUs divide the fetch-decode-execute cycle into smaller steps. • These smaller steps for a set of independent instructions can be executed in parallel to increase throughput; this is called ILP (instruction level parallelism). • Parallel execution can be implemented using the assembly line principle: called pipelining. Pipelining has been the key driving force in improving the performing of processors for over 2 decades. The next slide shows an example of instruction-level pipelining. COMP 228, Fall 2009

Instruction Set Architectures

Clicker 5.4 Suppose M[100] = 200, M[200] = 300, and X (index register) = 100. What is the content of AC from Loadx 100? (indexed) A: B: C: D:

100 200 300 None of the above

COMP 228, Fall 2009

Instruction Set Architectures

Clicker 5.5 Suppose M[100] = 200, M[200] = 300, and X (index register) = 100. What is the content of AC from Loadi 100? (immediate) A: B: C: D:

100 200 300 None of the above

COMP 228, Fall 2009

Instruction Set Architectures

Clicker 5.6 Suppose M[100] = 200, M[200] = 300, and X (index register) = 100. What is the content of AC from LoadI 100? (indirect) A: B: C: D:

100 200 300 None of the above

COMP 228, Fall 2009

Instruction Set Architectures

5.5 Instruction-Level Pipelining â&#x20AC;˘ Suppose a fetch-decode-execute cycle were broken into the following smaller steps: 1. Fetch instruction. 2. Decode opcode. 3. Calculate effective address of operands.

4. Fetch operands. 5. Execute instruction. 6. Store result.

â&#x20AC;˘ Suppose we have a six-stage pipeline. S1 fetches the instruction, S2 decodes it, S3 determines the address of the operands, S4 fetches them, S5 executes the instruction, and S6 stores the result. COMP 228, Fall 2009

Instruction Set Architectures

5.5 Instruction-Level Pipelining â&#x20AC;˘ For every clock cycle, one small step is carried out, and the stages are overlapped.

S1. Fetch instruction. S2. Decode opcode. S3. Calculate effective address of operands. COMP 228, Fall 2009

S4. Fetch operands. S5. Execute. S6. Store result.

Instruction Set Architectures

5.5 Instruction-Level Pipelining • The theoretical speedup offered by a pipeline can be determined as follows: Let tp be the time per stage. Each instruction represents a task, T, in the pipeline. The first task (instruction) requires k × tp time to complete in a k-stage pipeline. The remaining (n - 1) tasks emerge from the pipeline one per cycle. So the total time to complete the remaining tasks is (n - 1)tp. Thus, to complete n tasks using a k-stage pipeline requires: (k × tp) + (n - 1)tp = (k + n - 1)tp. COMP 228, Fall 2009

Instruction Set Architectures

5.5 Instruction-Level Pipelining â&#x20AC;˘ If we take the time required to complete n tasks without a pipeline and divide it by the time it takes to complete n tasks using a pipeline, we find:

â&#x20AC;˘ If we take the limit as n approaches infinity, (k + n - 1) approaches n, which results in a theoretical speedup of:

COMP 228, Fall 2009

Instruction Set Architectures

5.5 Instruction-Level Pipelining • Our neat equations take a number of things for granted. • First, we have to assume that the architecture supports fetching instructions and data in parallel. • Second, we assume that the pipeline can be kept filled at all times. This is not always the case. Pipeline hazards arise that cause pipeline conflicts and stalls.

COMP 228, Fall 2009

Instruction Set Architectures

5.5 Instruction-Level Pipelining • An instruction pipeline may stall, or be flushed for any of the following reasons: – Resource conflicts: instructions may need the same

component (such as bus/ALU).

– Data dependencies: the next (or a future) instruction depends on the result of the current instruction. – Conditional branching (control dependency): a jump instruction depends on the result of the current instruction.

• Measures can be taken at the software level as well as at the hardware level to reduce the effects of these hazards, but they cannot be totally eliminated. COMP 228, Fall 2009

Instruction Set Architectures

Clicker 5.7 Suppose a pipelined processor involves 10 segments. What is the maximum speedup that can be attained due to pipelining? A: B: C: D:

1 5 10 > 10

COMP 228, Fall 2009

Instruction Set Architectures

5.6 Real-World Examples of ISAs • We return briefly to the Intel and MIPS architectures from the last chapter, using some of the ideas introduced in this chapter. • Intel introduced pipelining to their processor line with its Pentium chip. • The first Pentium had two five-stage pipelines. Each subsequent Pentium processor had a longer pipeline than its predecessor with the Pentium IV having a 24-stage pipeline. • The Itanium (IA-64) has only a 10-stage pipeline. [What implications do you deduce from this?] COMP 228, Fall 2009

Instruction Set Architectures

5.6 Real-World Examples of ISAs • Intel processors support a wide array of addressing modes. • The original 8086 provided 17 ways to address memory, most of them variants on the methods presented in this chapter. • Owing to their need for backward compatibility, the Pentium chips also support these 17 addressing modes. • The Itanium, having a RISC core, supports only one: register indirect addressing with optional post increment. COMP 228, Fall 2009

Instruction Set Architectures

Addendum: Pentium ISP â&#x20AC;˘ Programmable Registers ax = ah:al and similarly for the rest EIP = PC (in MARIE) 31

EAX

EDX

ECX

EBX

EBP

ESI

EDI

ESP

Eight 32- bit General Registers EIP EFLAGS Two Special Purpose Registers

COMP 228, Fall 2009

Instruction Set Architectures

Addendum Pentium ISP • The eight general registers can be used to buffer program variables so that more often used data are locally stored in the processor, under the control of the programmer (as coded). • We can interpret these as 8 accumulators, instead of only 1 in MARIE. This improves programmability and performance (due to locality of program data). • Technology can support hundred times this amount of registers. The challenge if the programmer/compiler can manage them well in programming. • The esp register plays a special role: it points to the top of the stack used by the program. Notice a stack exists even though the Pentium is not a stack-computer. COMP 228, Fall 2009

Instruction Set Architectures

Addendum Pentium ISP • Notable details – Two-operand instructions (CISC) – Memory-mapped I/O; data movement and arithmetic instructions can affect I/O page in memory – Stack-based procedure call and return (instead of JNS used in MARIE)

COMP 228, Fall 2009

Instruction Set Architectures

Addendum Pentium ISP • Operand is specified by the following addressing modes, used individually or in combination: • Base: – {eax, ecx, edx, ebx, ebp, esi, edi, esp}

• Index * scale: – {eax, ecx, edx, ebx, ebp, esi, edi} * {1, 2, 4, 8} – A scale factor of 1/2/4/8 allows traversing data stored as 1/2/4/8 bytes in consecutive memory locations.

• Displacement: – 0 or 8 or 32 bit displacement directly specified. COMP 228, Fall 2009

Instruction Set Architectures

Addendem Pentium ISP • Examples: – [ebx+esi+4]: Data is at memory address given by ebx + esi + 4; this is an example of base-indexed addressing with displacement. For example, ebx can be the starting address of a list of characters, esi is the index to the ith character of the list, and 4 is the 4th character after that. – ebx: Without the bracket (enclosing the register ebx), the operand is in the register ebx itself. Hence whenever [ ... ] appears, it refers to a memory operand, otherwise it refers to a register operand. – ebx + esi + 4 is not a valid operand: it is neither in memory nor in a register. The assembler will flag this as syntax error. COMP 228, Fall 2009

Instruction Set Architectures

Addendum Pentium ISP • Data Movement Instruction – mov ax, bx • This instruction moves the 2-byte data from register bx to register ax

– mov ax, [ebx] • The source operand is from memory whose address is specified in register ebx. • This instruction moves the 2-byte data from that address specified by ebx to register ax. • This is similar to the Load instruction in MARIE, except that it is more flexible as the destination register can be specified explicitly. COMP 228, Fall 2009

Instruction Set Architectures

Addendum Pentium ISP – mov [ebx], ax • This does the reverse move (i.e., Store) from register ax to memory location specified by ebx

– mov [ebx], eax • This differs from the previous only in the size of the operand; this time it moves 4 bytes as the source register eax is 4 bytes.

– mov ah, ‘A’ • This moves a 1-byte (immediate) data into register ah; the 1-byte data is the ASCII code of letter A (i.e., 4116). COMP 228, Fall 2009

Instruction Set Architectures

Clicker 5.8 Suppose M[100] = 200, ebx = 100, eax = 200. What is the result of executing mov eax, [ebx]? A: B: C: D:

eax = 200 ebx = 200 eax = 100 ebx = 100

COMP 228, Fall 2009

Instruction Set Architectures

Clicker 5.9 Suppose ebx = 1000 (hex), eax = 20000(hex). What is the result of executing mov ax, bx? A: B: C: D:

eax = 1000 (hex) eax = 21000 (hex) eax = 30000 (hex) None of the above

COMP 228, Fall 2009

Instruction Set Architectures

Clicker 5.10 Suppose eax = 12345678(hex). What is the result in executing mov ax, ’30’? A: B: C: D:

eax = 30 eax = 3330 eax = 12345630 eax = 12343330

COMP 228, Fall 2009

Instruction Set Architectures

Addendeum Pentium ISP • • • • • • • • • • • • • • • • • • • • • •

segment msg len segment id segment _start:

exit:

COMP 228, Fall 2009

.data ;data segment db ‘Hello World! PleaseType Your ID’,0xA equ $ - msg ; length of message .bss ;uninitialized data resb 10 ; reserve 10 bytes for ID .text ; code segment global _start ; global program name ; program entry mov eax, 4 ; select kernel call #4 mov ebx, 1 ; default output device mov ecx, msg ; pointer to message mov edx, len ; third argument: length int 0x80 ; invoke kernel to write mov eax, 3 ; select kernel call #3 mov ebx, 0 ; default input device mov ecx, id ; pointer to id int 0x80 ; invoke kernel to read mov eax, 4 ; select kernel call #4 mov ebx, 1 int 0x80 ; echo id read mov eax, 1 ; select system call #1 int 0x80 ; invoke kernel call Instruction Set Architectures

Clicker 5.11 Count the number of distinct immediate operands in the previous program. A: B: C: D: E:

4 5 6 7 >7

COMP 228, Fall 2009

Instruction Set Architectures

Addendum Pentium ISP • Arithmetic/Logic Instructions – add eax, ebx • add the content of register ebx to the content of register eax. – Before (in hex): – After (in hex):

eax = 12345678 ebx = 11111111 eax = 23456789 ebx = 11111111

– sub eax, [ebx] • Subtract the content of memory location specified by ebx from the content of register eax – Before (in hex): – After (in hex): COMP 228, Fall 2009

eax = 12345678 [ebx] = 11111111 eax = 01234567

Instruction Set Architectures

Addendum Pentium ISP • mul bx – dx:ax = ax * bx – This instruction assumes ax as a default operand. – Multiplying 2-byte with 2-byte creates a 4-byte result to be stored in the register pair dx:ax • Before (in hex): • After (in hex): COMP 228, Fall 2009

ax = FFFF dx = 000F

Instruction Set Architectures

bx = 0010 ax = FFF0 51

Clicker 5.12 Suppose eax = 1234, ebx = 1010, ecx = 2000. What is the result of executing sub bx, ax? A: B: C: D:

ebx = FEDC ebx = FFFFFEDC ebx = 2044 None of the above

COMP 228, Fall 2009

Instruction Set Architectures

Clicker 5.13 Suppose eax = 1010, ebx = 101, edx = 0 (all in hex). What is the result of executing mul bl? A: B: C: D:

eax = 1010 edx = 102020 eax = 102020 None of the above

COMP 228, Fall 2009

Instruction Set Architectures

Addendum Pentium ISP • Control Flow Instructions – jmp X: jump to instruction at address X. – jz X: jump to X if the Z flag = 1 (previous arithmetic/logic result is zero). – jg X: jump to X if the previous arithmetic/logic result is greater than zero. – jle X: jump to X if the previous arithmetic/logic result is less than or equal to zero. – jne X: jump to X if the previous arithmetic/logic result is not equal to zero. – Among others. COMP 228, Fall 2009

Instruction Set Architectures

Clicker 5.14 Suppose <S,C,O,Z> = <1,0,1,0>. What is the result of executing jz x? A: B:

the jump will succeed (i.e. eip becomes x) the jump will not succeed

COMP 228, Fall 2009

Instruction Set Architectures

Addendum Pentium ISP • • • • • • • • • • • • • • • • • • • • •

segment v1 v2 segment dotprod segment

.data db db .bss resw .text global

1,2,3,4,5,6 5,6,7,8,9,0 1 _asm_main

_asm_main:

cont:

exit:

COMP 228, Fall 2009

mov sub sub mov mov imul add add sub jnz mov mov int

ecx, 6 ; initialize count to 6 esi, esi ; equivalent to mov esi, 0 dx, dx ; initialize dx (result) to 0 al, [esi+v1] ;based index addressing to vector bl, [esi+v2] bl ; ax = al * bl dx, ax ; update the dot product in dx esi, 1 ; increment esi by 1 ecx, 1 ; decrement ecx by 1 cont ; if ecx ≠ 0, jump to cont [dotprod], dx ; update dot product in memory eax, 1 ; return control to kernel 0x80 Instruction Set Architectures

Addendum Pentium ISP • Write a program fragment that computes the sum of 0 + 1 + 2 + … + x where x is an integer already stored in register eax [if x = 0 then a result of 0 should be returned in register ebx.] mov again: cmp jle add dec jmp done :::::: COMP 228, Fall 2009

ebx, 0 eax, 0 done ebx, eax eax again

// initialize ebx to 0 // integer > 0? // integer ≤ 0, get out

Instruction Set Architectures

Addendum Pentium ISP • Procedure Call/Return via Stack – call subr will lead to the following: • The return address (address of the instruction that follows this instruction) will be pushed onto the stack • eip will be changed to subr (this resembles Jump subr in MARIE)

– ret will lead to the following: • The return address is popped from the stack and moved into eip; hence control returns to the caller that previously has executed call subr • This can be contrasted with JumpI in MARIE

COMP 228, Fall 2009

Instruction Set Architectures

Addendum Pentium ISP â&#x20AC;˘ Diagram demonstrating procedure call/return Memory

Memory executes ret

executes call subr

esp

Memory

return address

esp

push eip onto the stack

pop eip from the stack

eip

COMP 228, Fall 2009

eip

Instruction Set Architectures

Addendum Pentium ISP • The stack allows multiple level procedure call and return, including recursive calls (when a procedure calls itself directly or indirectly). • The stack can also be used as the medium for passing data from the caller to the procedure and vice versa. For example, a caller can push data onto the stack before executing call subr. When the procedure (subr) is entered, the latter can access these data via the stack, assuming there is a clear protocol between the caller and the procedure where and what these data are. • An example is given next. COMP 228, Fall 2009

Instruction Set Architectures

Addendum Pentium ISP â&#x20AC;˘ Example: The caller pushes two 4-byte data onto the stack before calling the procedure (sum). The procedure should replace one of these two data with their sum on the stack before returning the caller. Caller: Procedure sum: push dword[X] sum: push eax push dword[Y] mov eax, [esp+12] call sum add [esp+8], eax pop ebx pop eax ret *Note: dword is NASM notation for a 4-byte memory operand. The extra push/pop eax saves/restores the content of register eax that belongs to the caller (to avoid contaminating important data that belongs to the caller). COMP 228, Fall 2009

Instruction Set Architectures

Addendum Pentium ISP • Snapshots of the Stack in Call/Return:

esp Æ esp Æ

value of Y

value of X

Upon executing push dword[Y]

COMP 228, Fall 2009

return address

Upon executing call sum

Instruction Set Architectures

Addendum Pentium ISP • More Snapshots of the Stack

esp Æ

eax return address value of Y value of X

Upon executing push eax

COMP 228, Fall 2009

esp Æ

value of X+Y value of X

Upon executing ret

Instruction Set Architectures