Volume 2, Spl. Issue 2 (2015)
e-ISSN: 1694-2310 | p-ISSN: 1694-2426
A Formal Approach to Design and Verification of Different Modules of a Processor having Software Based Pipelining Architecture
Sanjeev Kumar¹, Narender Kumar², Tejinder Singh³
Assistant Professor, Department of Electronics and Communication Engineering, Baddi University of Emerging Sciences and Technology, Baddi, Solan (H.P.)
¹sanjeevbhatti@baddiuniv.ac.in, ²narenrder.k.s@baddiuniv.ac.in, ³tejindersingh@baddiuniv.ac.in
Abstract: This paper proposes a formal approach to the design and verification of the different modules of a processor based on a software-based pipelining architecture. Microprocessors have grown from 8 bits to 16 bits, 32 bits, and currently 64 bits. Microprocessor architecture has also evolved from complex instruction set computing (CISC) to reduced instruction set computing (RISC), then to combined RISC-CISC designs, and currently to very long instruction word (VLIW) designs. In this paper we present the hardware design and verification, on ASIC and FPGA technology, of a 32-bit VLIW microprocessor capable of executing four operations per VLIW instruction word. The design of the VLIW microprocessor begins with its technical specifications, which cover the voltage requirements, performance requirements, area utilization, VLIW instruction set, register file definition, and the details of operation for each instruction. From these specifications, the architecture and microarchitecture are described in detail: four pipes run in parallel, allowing four operations to be executed in parallel, and each pipe is split into four pipeline stages.
Keywords: Reduced Instruction Set Computing (RISC), Complex Instruction Set Computing (CISC), Very Long Instruction Word (VLIW), Instruction Level Parallelism (ILP).
I. INTRODUCTION
Microprocessors and microcontrollers are widely used in the world today, in everyday electronic systems both industrial and consumer. Complex electronic systems such as computers, ATMs, POS systems, financial systems, transaction systems, control systems, and database systems all use some form of microcontroller or microprocessor as the core of the system. Consumer electronic systems such as home security systems, chip-based credit cards, microwave ovens, cars, cell phones, PDAs, refrigerators, and other daily appliances likewise have a microcontroller or a microprocessor at their core. Microprocessors and microcontrollers are very similar in nature. In fact, from a top-level perspective, a microprocessor is the core of a microcontroller: a microcontroller basically consists of a microprocessor acting as its central processing unit (CPU), surrounded by peripheral logic. A microprocessor can therefore be viewed as the building block of a microcontroller. Microcontrollers have many uses. They commonly provide system-level solutions for controlling a car's electronic systems, home security systems, ATM systems, communication systems, daily consumer appliances (such as microwave ovens and washing machines), and many others.
BUEST, Baddi
RIEECE-2015
II. TYPES OF MICROPROCESSORS
The first microprocessor, called the 4004, was developed by Intel Corp. in 1971. The 4004 was a simple design compared to today's microprocessors; however, in 1971 it was a state-of-the-art microprocessor [1]. Microprocessors have grown manifold since then. Present-day microprocessors typically run at clock speeds ranging from hundreds of megahertz to several gigahertz, and they have grown from 8 bits to 16, 32, and 64 bits. Microprocessor architecture has also evolved from CISC to RISC and VLIW. Complex instruction set computing (CISC) is based on the concept of using as few instructions as possible to program a microprocessor. CISC instruction sets are large, with instructions ranging from basic to complex. CISC microprocessors were widely used in the early days of microprocessor history [1]. Reduced instruction set computing (RISC) microprocessors are very different from CISC microprocessors. RISC keeps the instruction set as simple as possible, so that the microprocessor's programs are written using only simple instructions. This idea was presented by John Cocke of IBM Research when he noticed that most complex instructions in the CISC instruction set were seldom used, while the basic instructions were heavily utilized. Apart from CISC and RISC microprocessors, there is a different generation of microprocessors based on a concept called very long instruction word (VLIW). VLIW microprocessors make use of instruction level parallelism (ILP), that is, executing multiple instructions in parallel. Many applications in the multimedia domain contain a great deal of ILP, because they typically consist of many independent, repetitive calculations.
Very Long Instruction Word (VLIW) processors exploit ILP by means of a compiler that is completely aware of the target processor architecture [2]. VLIW microprocessors are not the only type of microprocessor that can execute multiple instructions in parallel: superscalar and super-pipelined CISC/RISC microprocessors are also able to achieve parallel execution of instructions.
A. Types of Microprocessor Architectures
To achieve high performance, the concept of pipelining is introduced into microprocessor architecture. In pipelining, a microprocessor is divided into multiple pipe stages, each of which works on a different instruction at the same time. When a stage has completed its work on an instruction, it passes the results to the next stage for further processing and takes the next instruction from its preceding stage. Figure 1 shows the instruction execution for a pipeline microprocessor that has the following four basic pipe stages:
1. Fetch: This stage of the pipeline fetches instructions/data from the instruction cache/memory.
2. Decode: This stage of the pipeline decodes the instruction fetched by the fetch stage. The decode stage also fetches register data from the register file.
3. Execute: This stage of the pipeline executes the instruction. This is the stage where the ALU (arithmetic logic unit) is located.
4. Writeback: This stage of the pipeline writes data into the register file.
Fig. 1. Instruction execution for pipeline microprocessor [1].
A pipeline microprocessor, as shown in Figure 1, consists of these four basic stages. These stages can be further subdivided into more stages to form a super-pipelined microprocessor.
To achieve multiple instruction execution, multiple pipes can be combined to form a superscalar microprocessor. A superscalar microprocessor is more complex but allows multiple instructions to be executed in parallel. Figure 2 shows the instruction execution for a superscalar pipeline microprocessor. VLIW microprocessors use a long instruction word that combines several operations into one single long instruction word. This allows a VLIW microprocessor to execute multiple operations in parallel. Figure 3 shows the instruction execution for a VLIW microprocessor. Although both superscalar pipeline and VLIW microprocessors can execute multiple instructions in parallel, the two are very different, and each has its own set of advantages and disadvantages.
Fig. 2. Instruction execution for superscalar pipeline microprocessor [1].
Fig. 3. Instruction execution for VLIW microprocessor [1].
B. Instruction Layout for VLIW Instruction Set
The operation code consists of 8 bits, with the most significant bit reserved for future expansion. The remaining bits 6 to 0 are used to encode the 36 different operations. Similarly, each internal register is assigned an 8-bit address, with the two most significant bits reserved for future expansion; for simplicity, these reserved bits are taken as zero.
TABLE I. INSTRUCTION SET LAYOUT
Bit [31:24] | Bit [23:16] | Bit [15:8] | Bit [7:0]
Operation Code | Source1 address | Source2 address | Destination address
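The field layout in Table I can be illustrated with a short behavioral sketch. It packs and unpacks one 32-bit instruction and checks the reserved bits described above (opcode bit 7 and register-address bits 7:6, assumed to be zero). The limits of 36 opcodes and 40 registers come from the text; the function names `encode` and `decode` and the opcode value used in the example are hypothetical, and this is not the authors' RTL.

```python
NUM_OPCODES = 36    # operations defined by the design
NUM_REGISTERS = 40  # internal registers


def encode(opcode: int, src1: int, src2: int, dst: int) -> int:
    """Pack opcode [31:24], source1 [23:16], source2 [15:8], destination [7:0]."""
    if not 0 <= opcode < NUM_OPCODES:
        raise ValueError(f"opcode {opcode} out of range")
    for reg in (src1, src2, dst):
        if not 0 <= reg < NUM_REGISTERS:
            raise ValueError(f"register address {reg} out of range")
    # Values below 36 and 40 leave the reserved high bits of each field at zero.
    return (opcode << 24) | (src1 << 16) | (src2 << 8) | dst


def decode(word: int) -> tuple:
    """Split a 32-bit word into (opcode, src1, src2, dst), rejecting set reserved bits."""
    opcode = (word >> 24) & 0xFF
    src1 = (word >> 16) & 0xFF
    src2 = (word >> 8) & 0xFF
    dst = word & 0xFF
    if opcode & 0x80:
        raise ValueError("reserved opcode bit 7 is set")
    for reg in (src1, src2, dst):
        if reg & 0xC0:
            raise ValueError("reserved register-address bits 7:6 are set")
    return opcode, src1, src2, dst


if __name__ == "__main__":
    word = encode(3, 1, 2, 39)
    print(f"{word:#010x}", decode(word))
```

For example, `encode(3, 1, 2, 39)` yields `0x03010227`, and `decode` recovers `(3, 1, 2, 39)`. Six usable address bits could name 64 registers, so the range check against 40 is kept separate from the reserved-bit check.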
The source1, source2, and destination address columns hold internal register addresses. The VLIW microprocessor has 40 internal registers, each defined with its own register address.
C. Architectural Specifications of Processor
The microprocessor fetches instructions from an external instruction cache into its internal instruction buffers and decoders. Each instruction is then passed on to multiple execution units, which allows multiple operations to be executed in parallel. The VLIW microprocessor is built around a four-stage pipeline and can be summarized as follows:
1. The VLIW microprocessor is architected to take advantage of pipelining.
2. Each 32-bit VLIW instruction word consists of four operations. To maximize performance, the architecture executes the four operations in parallel. The operations are numbered and assigned to pipe1, pipe2, pipe3, and pipe4, with pipe1 executing operation 1, pipe2 operation 2, pipe3 operation 3, and pipe4 operation 4.
3. Each operation is split into four stages: fetch, decode, execute, and writeback. Four stages are chosen to keep the architecture simple yet efficient. The fetch stage fetches the VLIW instruction and data from external devices such as memory. The decode stage decodes the VLIW instruction to determine which operation each pipe needs to execute. The execute stage executes the operation decoded by the decode stage. The writeback stage (the last stage of the pipe) writes the results of the executed instruction into the internal registers.
4. All four operations share a set of forty 32-bit internal registers, which form the register file. During the decode stage, data are read from the register file, and during the writeback stage, data are written into it.
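The timing of the four-stage, four-pipe organization described above can be sketched as follows, assuming an ideal pipeline with no stalls, hazards, or branches (hazard handling is not described here). Each instruction word enters fetch one cycle after the previous one, so n words finish in 4 + n - 1 cycles and carry 4n operations. The function names are illustrative only.

```python
STAGES = ("fetch", "decode", "execute", "writeback")
NUM_PIPES = 4  # one operation per pipe in each instruction word


def schedule(num_words: int) -> dict:
    """Return {cycle: [(word_index, stage), ...]} for an ideal, stall-free pipeline.

    Word i enters the fetch stage in cycle i and advances one stage per cycle.
    """
    table = {}
    for i in range(num_words):
        for s, stage in enumerate(STAGES):
            table.setdefault(i + s, []).append((i, stage))
    return table


def total_cycles(num_words: int) -> int:
    """Fill latency (one cycle per stage) plus one cycle per additional word."""
    return 0 if num_words == 0 else len(STAGES) + num_words - 1


def operations_completed(num_words: int) -> int:
    """Every instruction word carries NUM_PIPES operations."""
    return NUM_PIPES * num_words


if __name__ == "__main__":
    for cycle, active in sorted(schedule(3).items()):
        print(cycle, ", ".join(f"W{i}:{stage}" for i, stage in active))
    n = 10
    print(total_cycles(n), "cycles for", operations_completed(n), "operations")
```

For 10 instruction words this gives 13 cycles and 40 operations, so the rate approaches four operations per cycle as the number of words grows.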
Upon completion of an operation, the final (writeback) stage writes the results of the operation into the register file or, for a read operation, drives the read data to the output of the VLIW microprocessor. Figure 4 shows the top-level architecture of the VLIW microprocessor.
Fig. 4. VLIW Top level architecture [2]
III. SIMULATION AND SYNTHESIS RESULTS
The VLIW microprocessor consists of four stages (fetch, decode, execute, and writeback). For ease of understanding, the operations are numbered and assigned to pipe1, pipe2, pipe3, and pipe4, with pipe1 executing operation 1, pipe2 operation 2, pipe3 operation 3, and pipe4 operation 4. All four operations within the VLIW instruction word have access to the register file of forty 32-bit registers. The RTL code for the VLIW microprocessor is split into five separate modules: fetch, decode, execute, writeback, and register file.
A. Fetch Unit
The fetch module fetches VLIW instructions and data from an external instruction/data cache. The fetched information is passed to the decode module so that the instruction can be decoded. It is also passed to the register file module so that the execute module can retrieve data from the register file for operations that access internal registers.
Fig. 5. Interface signal diagram of fetch unit
Fig. 6. Simulation result of Fetch unit
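A minimal behavioral sketch of the fetch unit, under stated assumptions: the instruction memory is modeled as a Python list with one instruction word per address, the program counter advances by one per cycle, and branches are not modeled. The helper `source_addresses` (hypothetical) extracts the Table I source fields, which is the information the fetch unit forwards to the register file.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def source_addresses(word: int) -> Tuple[int, int]:
    """Return the (source1, source2) register addresses, bits [23:16] and [15:8]."""
    return (word >> 16) & 0xFF, (word >> 8) & 0xFF


@dataclass
class FetchUnit:
    """Behavioral fetch stage over a word-addressed instruction memory (assumed)."""
    imem: List[int]
    pc: int = 0

    def step(self) -> Optional[int]:
        """Fetch the word at pc and advance pc by one; return None past the end."""
        if not 0 <= self.pc < len(self.imem):
            return None
        word = self.imem[self.pc] & 0xFFFFFFFF  # keep the word 32 bits wide
        self.pc += 1
        return word


if __name__ == "__main__":
    fetch = FetchUnit([0x03010227, 0x01050627])
    while (word := fetch.step()) is not None:
        # The same word goes to decode and, via its source fields, to the register file.
        print(f"pc={fetch.pc - 1} word={word:#010x} sources={source_addresses(word)}")
```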
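B. Decode Unit
Fig. 7. Interface signal diagram of Decode unit
The decode module decodes the fetched VLIW instruction to determine which operation each of the four pipes needs to execute.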
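The decode step can be sketched as a table lookup per pipe. The paper defines 36 operations but does not list their encodings in this section, so the mnemonics and opcode numbers in `OPCODES` are hypothetical placeholders. For illustration, the sketch also assumes that each pipe receives its own operation as a 32-bit word in the Table I format.

```python
NUM_OPCODES = 36  # operations defined by the design

# Hypothetical opcode assignments, for illustration only.
OPCODES = {0: "NOP", 1: "ADD", 2: "SUB", 3: "AND", 4: "OR", 5: "XOR"}


def decode_op(word: int) -> dict:
    """Decode one 32-bit operation laid out as in Table I."""
    if (word >> 31) & 1:
        raise ValueError("reserved opcode bit is set")
    opcode = (word >> 24) & 0x7F  # bits 6..0 carry the operation
    if opcode >= NUM_OPCODES:
        raise ValueError(f"undefined opcode {opcode}")
    return {
        "op": OPCODES.get(opcode, f"OP{opcode}"),
        "src1": (word >> 16) & 0xFF,
        "src2": (word >> 8) & 0xFF,
        "dst": word & 0xFF,
    }


def decode_bundle(words: list) -> dict:
    """Assign one decoded operation to each of pipe1..pipe4."""
    if len(words) != 4:
        raise ValueError("expected four operations, one per pipe")
    return {f"pipe{i}": decode_op(w) for i, w in enumerate(words, start=1)}
```

Rejecting opcodes 36 to 127 in the decoder, rather than passing them on, keeps undefined operations from reaching the execute stage.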
Fig. 8. Simulation result of Decode unit
C. Register File Unit
The register file module acts as local storage in the VLIW microprocessor. The contents of the register file module are read and passed to the execute module, while the results of operations are written into the register file module by the writeback module. The table below shows the interface signals for the register file module and their functionality.
Fig. 10. Simulation result of Register File unit
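A behavioral sketch of the register file: forty 32-bit registers, two reads per operation (source1 and source2), and writes from the writeback module. The rule that the last write in a cycle wins when two pipes target the same register, and the rejection of addresses 40 to 63 (which a 6-bit address could express), are assumptions for illustration; they are not specified in the text.

```python
class RegisterFile:
    """Forty 32-bit registers shared by the four pipes (behavioral sketch).

    Assumption: when several pipes write the same register in one call,
    the write listed last (highest-numbered pipe) wins.
    """

    NUM_REGS = 40
    MASK = 0xFFFFFFFF

    def __init__(self) -> None:
        self.regs = [0] * self.NUM_REGS

    def _check(self, addr: int) -> None:
        # A 6-bit address can name 64 registers, but only 40 exist.
        if not 0 <= addr < self.NUM_REGS:
            raise IndexError(f"register address {addr} out of range")

    def read(self, src1: int, src2: int) -> tuple:
        """Decode-stage read of both source operands."""
        self._check(src1)
        self._check(src2)
        return self.regs[src1], self.regs[src2]

    def write(self, writes: list) -> None:
        """Writeback-stage update: writes is a list of (dst, value) in pipe order."""
        for dst, value in writes:
            self._check(dst)
            self.regs[dst] = value & self.MASK  # truncate to 32 bits
```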
Fig. 11. Chip layout of Register File unit
Fig. 9. Interface signal diagram of Register File unit
D. Execute Unit
The execute module is the most complex module in the VLIW microprocessor. Its function is to execute the operations of the VLIW instruction.
Fig. 12. Data flow representation of Execute module using ModelSim
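The execute stage can be sketched as a 32-bit integer ALU applied independently to each pipe's operands. The text states that the processor supports integer operations only but does not list its 36 operations here, so the mnemonics below (`ADD`, `SUB`, and so on) are a hypothetical subset; results wrap modulo 2^32, as a 32-bit datapath would.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical subset of integer operations; the full set of 36 is not listed here.
ALU_OPS = {
    "ADD": lambda a, b: (a + b) & MASK32,
    "SUB": lambda a, b: (a - b) & MASK32,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "SLL": lambda a, b: (a << (b & 31)) & MASK32,  # shift amount uses low 5 bits
    "SRL": lambda a, b: a >> (b & 31),
}


def execute(op: str, a: int, b: int) -> int:
    """Apply one ALU operation to two operands truncated to 32 bits."""
    if op not in ALU_OPS:
        raise ValueError(f"unsupported operation {op}")
    return ALU_OPS[op](a & MASK32, b & MASK32)


def execute_bundle(ops: list) -> list:
    """Execute four independent (op, a, b) operations, one per pipe."""
    if len(ops) != 4:
        raise ValueError("expected four operations, one per pipe")
    return [execute(op, a, b) for op, a, b in ops]
```

Because the four operations in a word are independent, `execute_bundle` can evaluate them in any order; in hardware the four pipes compute them in the same cycle.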
Fig. 13. Simulation result of Execution Unit
Fig. 14. Chip layout of Execution unit
E. Writeback Unit
The writeback module is the last stage within the VLIW microprocessor. Its function is to write the results of executed operations into the register file. The following table shows the interface signals for the writeback module and their functionality.
Fig. 15. Dataflow representation of Writeback module using ModelSim
Fig. 16. Simulation result of Writeback Unit
IV. CONCLUSION
In this paper, the architecture is based on ILP (instruction level parallelism), in which four instructions are executed in parallel. The design of a 32-bit VLIW processor is presented, in which five modules (fetch, decode, execute, register file, and writeback) are designed. For verification of the different modules, Xilinx ISE 9.2i, ModelSim SE 5.7g, and HDL Designer from Mentor Graphics are used.
TABLE II. SPECIFICATIONS AND COMPARISON OF DESIGN PARAMETERS WITH PREVIOUS DESIGN
Design Unit | Previous Design | Present Design
No. of operations | 16 | 36
No. of registers | 16 | 40
No. of parallel instructions | 3 | 4
Register file size in bits | 32 | 32
TABLE III. SYNTHESIS RESULTS OF DIFFERENT MODULES OF VLIW MICROPROCESSOR
Performance | Fetch | Decode | Register file | Execute | Write back
Speed | 1285.843 MHz | 200 MHz | 466.810 MHz | 74.995 MHz | 200 MHz
Throughput
467 MIPS 2.142ns
200 MIPS 1.513ns
Net Delay
0.426ns
Power Consumption Estimated junction temperature Gate Count
946mW
946 mW 34 C
334mW
946mW
34 C
200 MIPS 2.207 ns 0.851 ns 946 mW 34 C
75 MIPS
Gate Delay
1286 MIPS 0.778ns
25 C
34 C
3341
770
52576
345793
2051
Set Up Time
3.438ns
4.394ns
2.524ns
2.520ns
10.703 ns 3.670 ns
1.513ns
Hold Time Bonded IO
515
2.207 ns 2.520 ns 323
575
683
464
0.586ns
10.703 ns 4.909 ns
0.691ns
2.520ns
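The Speed and Throughput figures in Table III are related by simple arithmetic. Assuming each module completes one operation per clock cycle, throughput in MIPS equals the maximum clock frequency in MHz (466.810 MHz gives about 467 MIPS), and the minimum clock period is 1000/f ns (2.142 ns for the register file, 0.778 ns for fetch). The sketch also shows a consequence that is an inference, not a reported result: if all modules share one clock, the slowest module (execute, 74.995 MHz) bounds that clock. The four-pipe peak figure likewise assumes every pipe completes an operation each cycle.

```python
# Speed (maximum clock frequency, MHz) per module, from Table III.
SPEED_MHZ = {
    "Fetch": 1285.843,
    "Decode": 200.0,
    "Register file": 466.810,
    "Execute": 74.995,
    "Write back": 200.0,
}


def min_period_ns(f_mhz: float) -> float:
    """Minimum clock period in ns for a maximum frequency in MHz."""
    return 1000.0 / f_mhz


def throughput_mips(f_mhz: float, ops_per_cycle: float = 1.0) -> float:
    """Throughput in MIPS, assuming ops_per_cycle operations complete per cycle."""
    return f_mhz * ops_per_cycle


def pipeline_clock_mhz(speeds: dict) -> float:
    """A pipeline driven by one clock runs no faster than its slowest stage."""
    return min(speeds.values())


if __name__ == "__main__":
    for name, f in SPEED_MHZ.items():
        print(f"{name:14s} {f:9.3f} MHz {min_period_ns(f):7.3f} ns {throughput_mips(f):6.0f} MIPS")
    f_sys = pipeline_clock_mhz(SPEED_MHZ)
    print(f"Shared clock limited to {f_sys} MHz")
    print(f"Peak with four pipes: {throughput_mips(f_sys, 4):.2f} MIPS")
```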
V. FUTURE WORK
In the present work, a 32-bit VLIW processor is designed. In the future, this design can be extended to 64 bits, and the number of instructions can also be increased. The present design uses forty 32-bit general-purpose registers, so a future design could use sixty-four 32-bit registers. The processor supports only integer operations, so a floating-point ALU and a 32-bit floating-point multiplier could be designed in the future, since floating-point numbers cover a wider range than integers and provide more flexibility in scientific calculations. To increase the processor speed, the architecture should be pipelined, with registers connected between every two design units.
A VLIW processor can also be designed for low power, and VLIW microprocessors can be designed for DSP applications and MPEG (Moving Picture Experts Group) audio/video applications; the TriMedia TM-300, for example, uses a VLIW processor.
REFERENCES
[1] Kai Hwang, Faye A. Briggs, "Computer Architecture and Parallel Processing", Sung Kung Computer Book Company, 1986.
[2] Stephen Wong, Thijs van As, "ρ-VEX: A Reconfigurable and Extensible VLIW Processor", IEEE International Conference on Field-Programmable Technology (ICFPT'08), 2008.
[3] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Sun Microsystems Inc., California, USA, 2003.
[4] M. Morris Mano, "Computer System Architecture", Prentice-Hall of India Private Limited, 1986.
[5] Harry F. Jordan, "Computer System Design & Architecture", Prentice Hall, 2nd edition, December 6, 2003.
[6] Israel Koren, "Computer Arithmetic Algorithms", A K Peters/CRC Press, 2nd edition, November 30, 2001.
[7] Katherine Compton, "An Introduction to Reconfigurable Computing", Department of Electrical & Computer Engineering, Northwestern University, April 2008.
[8] John L. Hennessy, David A. Patterson, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann, 5th edition, September 30, 2011.
[9] Joseph A. Fisher, Paolo Faraboschi, Cliff Young, "Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools", Morgan Kaufmann, New York, 2004.
[10] Joseph A. Fisher, "Embedded Computing: A VLIW Approach to Architecture", Morgan Kaufmann, 1st edition, December 31, 2004.
[11] R. Seshasayanan, S. K. Srivatsa, "Implementation of Novel Pipeline VLIW Architecture on FPGA", International Journal of Computer Science and Security, vol. 7, no. 7, pp. 264-268, July 2007.