Performance Analysis of D Flip-Flop Implemented in GDI and ACPL Low Power Design Techniques

Prathyusha Konduri et al. / (IJAEST) INTERNATIONAL JOURNAL OF ADVANCED ENGINEERING SCIENCES AND TECHNOLOGIES Vol No. 5, Issue No. 2, 177 - 183

Prathyusha Konduri

Student, M.Tech (VLSI Design), SENSE, VIT University, Vellore, Tamil Nadu, India prathyushakonduri@gmail.com

Pobbireddy Sameera

Student, M.Tech (VLSI Design), SENSE, VIT University, Vellore, Tamil Nadu, India dheerasameera@gmail.com

Abstract- Low power design has become one of the main concerns in VLSI design. Of the various building blocks in digital designs, one of the most complex and power-consuming is the flip-flop. This paper reviews different flip-flop architectures implemented with various CMOS logic families, GDI, and adiabatic low power design techniques.

Keywords- ACPL Technique, CMOS VLSI, Flip-flops, GDI Technique, Low power digital circuits, Sequential circuits.

I. INTRODUCTION

Power consumption has become a critical concern in both high-performance and portable applications. There are three major sources of power dissipation in a CMOS circuit:

$$P_{total} = P_{switching} + P_{SC} + P_{leakage} \qquad (1)$$

where $P_{total}$ is the total power dissipation of a CMOS circuit, $P_{switching}$ is the switching power, $P_{SC}$ is the short-circuit power, and $P_{leakage}$ is the leakage power. Paper [1] discusses power consumption in digital circuits, which is proportional to the square of the supply voltage. Technology scaling, in which the threshold voltage is scaled in proportion to the supply voltage, can be used to reduce power consumption. Because of this scaling, sub-threshold leakage power increases substantially, and leakage currents have become one of the main power consumers. In paper [2], S. Kang discusses the elements of low power design for integrated systems. In many VLSI chips, the clocking system, including the clock distribution network and the flip-flops, accounts for a large share of the total chip power. Flip-flops and latches are fundamental building blocks of sequential digital circuits. The design trend is to use more pipeline stages for high throughput, which increases the number of flip-flops in a chip. Paper [3] discusses the basic flip-flop timing parameters: clock-to-output (Clk-Q) delay, setup time, and hold time. They are reflected in system-level performance as flip-flop delay (sometimes called latency) and internal race immunity.
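The quadratic dependence of power on supply voltage noted above can be illustrated with the standard first-order switching power model, P_switching = alpha * C_L * V_DD^2 * f. The following is a minimal sketch under that assumption; the activity factor, load capacitance, and clock frequency are illustrative values and are not taken from the cited papers.

```python
def switching_power(alpha, c_load, vdd, freq):
    """First-order CMOS switching power in watts: alpha * C * Vdd^2 * f."""
    return alpha * c_load * vdd ** 2 * freq

# Illustrative (assumed) values: 10% activity, 50 fF load, 100 MHz clock.
p_nominal = switching_power(alpha=0.1, c_load=50e-15, vdd=1.8, freq=100e6)
p_scaled = switching_power(alpha=0.1, c_load=50e-15, vdd=0.9, freq=100e6)

# Halving the supply voltage cuts the switching power to one quarter.
ratio = p_scaled / p_nominal
print(f"nominal: {p_nominal * 1e6:.3f} uW, scaled: {p_scaled * 1e6:.3f} uW, ratio: {ratio:.2f}")
```

The model covers only the switching term of equation (1); short-circuit and leakage power are not represented.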

ISSN: 2230-7818

Pavan Kumar.V

Student, M.Tech (VLSI Design), SENSE, VIT University, Vellore, Tamil Nadu, India pavan.vlsi43@gmail.com

Figure 1. Definition of setup and hold times

The Clk-Q delay is the delay measured from the active clock edge to the output. Setup and hold times are the amounts of time for which the synchronous input (D) must be stable before and after the active clock edge, respectively (Figure 1). For correct operation, the flip-flop environment in a digital system (Figure 2) requires the clock period $T$ to be greater than or equal to the sum of the worst-case Clk-Q delay ($t_{\text{Clk-Q}}$), the flip-flop setup time ($t_{setup}$), the maximum combinational logic delay ($t_{logic}$), and the relative clock skew ($t_{skew}$). The flip-flop delay $D$ therefore has to satisfy the maximum delay restriction given by equation (2):

$$D = 1.05\, t_{\text{Clk-Q}} + t_{setup} < T - t_{logic} - t_{skew} \qquad (2)$$

Figure 2. Flip-flop environment in digital system
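The delay restriction of equation (2) can be checked numerically. The sketch below is only an illustration: the function names and the picosecond values are assumptions chosen for the example, not figures from the paper.

```python
def flip_flop_delay(t_clk_q, t_setup):
    """Flip-flop delay D from equation (2): 1.05 * tClk-Q + tsetup."""
    return 1.05 * t_clk_q + t_setup

def meets_timing(t_clk_q, t_setup, period, t_logic, t_skew):
    """True when D < T - tlogic - tskew, i.e. equation (2) holds."""
    return flip_flop_delay(t_clk_q, t_setup) < period - t_logic - t_skew

# Assumed example numbers, all in picoseconds.
relaxed = meets_timing(t_clk_q=200, t_setup=50, period=1000, t_logic=600, t_skew=50)
tight = meets_timing(t_clk_q=200, t_setup=50, period=800, t_logic=600, t_skew=50)
print(relaxed, tight)
```

With these assumed numbers the flip-flop delay is 260 ps, which fits the 350 ps budget of the 1000 ps clock but not the 150 ps budget of the 800 ps clock.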

Section II presents the different flip-flop topologies designed to reduce power consumption. Section III presents different CMOS design techniques, and Section IV presents the adiabatic low power techniques implemented in sequential circuits. Section V discusses the results, and Section VI concludes the paper.


II. FLIP-FLOP TOPOLOGIES

Paper [3] also discusses flip-flop topologies. The most commonly used flip-flop design techniques are conventional master-slave latch pairs and pulse-triggered latches. Other low-energy designs, often derived from the conventional techniques, use double-edge triggering, a reduced-swing clock, or internal gating.

Figure 5. Semi dynamic FF and Hybrid FF

An example of a fully differential pulsed latch is the modified sense-amplifier-based flip-flop (MSAFF) shown in Figure 6.

A. Master-slave latch pair

A flip-flop can be designed as a latch pair, where one latch is transparent high and the other is transparent low. The transmission-gate flip-flop with input gate isolation (TGFF) is shown in Figure 3; the input gate isolation is added for better noise immunity. An additional inverter at the output of the TGFF provides non-inverting operation. The pseudo-static C2MOS flip-flop of Figure 4 is obtained by adding weak C2MOS feedback at the outputs of the master and slave latches of the dynamic C2MOS flip-flop [3].

Figure 3. Transmission gate flip-flop

Figure 6. Modified Sense Amplifier FF

Internal clock gating disables the internal clock when the input and output data are equal. The clock-on-demand flip-flop (COD-FF) is shown in Figure 7.
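The effect of internal clock gating can be sketched behaviorally: the internal clock reaches the storage element only in cycles where the input differs from the stored output. This is a simplified cycle-level model, not the COD-FF circuit itself; the function name and the data stream are assumptions for illustration.

```python
def count_internal_clock_events(data_stream, gated, q_init=0):
    """Count the clock cycles in which the internal clock is activated.

    data_stream holds one D value per clock cycle. Without gating the
    internal clock fires every cycle; with gating it fires only when D != Q.
    Returns (number of internal clock events, final Q).
    """
    q = q_init
    events = 0
    for d in data_stream:
        if not gated or d != q:
            events += 1  # internal clock toggles in this cycle
            q = d        # the flip-flop captures D
    return events, q

# Assumed low-activity input: D changes only twice in eight cycles.
data = [0, 0, 1, 1, 1, 0, 0, 0]
ungated_events, q_ungated = count_internal_clock_events(data, gated=False)
gated_events, q_gated = count_internal_clock_events(data, gated=True)
print(ungated_events, gated_events)
```

Both versions end in the same output state, but the gated version activates its internal clock only in the two cycles where the data changes.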

Figure 4. C2MOS flip-flop

B. Pulse triggered latches

A pulse-triggered latch is also a two-stage flip-flop, in which the first stage is a pulse generator (PG) and the second stage is a latch. The term pulse-triggered means that the data is entered on the rising edge of the clock pulse, but the output doesn't reflect the change until the falling edge of the clock pulse. The semi-dynamic flip-flop (SDFF) is shown in Figure 5. A dynamic front end provides a clock pulse that triggers a back-end static latch. The hybrid latch flip-flop (HLFF), also shown in Figure 5, is very similar to the SDFF but uses a static PG. It samples the data on one edge and thus eliminates the retardation of data flow on the opposite edge. It is similar to a latch because it can provide a soft clock edge, which allows slack passing and minimizes the effects of clock skew on cycle time.


Figure 7. COD-FF

Clock gating is integrated into the pulse generator. A transmission-gate flip-flop with internal clock gating (GTGFF) is shown in Figure 8.

TABLE I. PERFORMANCE ANALYSIS OF FLIP-FLOPS

Flip-Flop Type    Power (µW)    No. of transistors
SDFF              262
HLFF              250
CCFF              185
LSDFF             132

Paper [2] discusses several small-swing clocking schemes and their drawbacks. The half-swing scheme requires four clock signals; it suffers from skew problems among the four clock signals and requires additional chip area. A reduced clock-swing flip-flop (RCSFF) requires an additional high power-supply voltage to reduce the leakage current. A single-clock flip-flop for half-swing clocking doesn't need a high power-supply voltage but has a long latency. HLFF and SDFF consume large amounts of power due to redundant transitions at internal nodes. To reduce the redundant power consumption at the internal nodes of high-performance flip-flops, the conditional capture flip-flop (CCFF) has been introduced. However, HLFF, SDFF and CCFF use full-swing clock signals, which cause significant power consumption in the clock tree. A low-swing clock double-edge triggered flip-flop (LSDFF) is discussed and is shown in Figure 9.

III. CMOS LOGIC DESIGN TECHNIQUES

Paper [4] discusses different CMOS logic design techniques for reducing power consumption, such as complementary CMOS logic, pseudo-nMOS, dynamic CMOS, clocked CMOS logic (C2MOS), CMOS domino logic, cascade voltage switch logic (CVSL), modified domino logic, and pass transistor logic (PTL). PTL is one of the logic styles popular in low-power design. The advantages of PTL over CMOS design are discussed in paper [5]:
1) High speed, due to small node capacitances.
2) Low power dissipation, as a result of the reduced number of transistors.
3) Lower interconnection effect, due to smaller area.

Figure 8. GTGFF

Figure 9. LSDFF

For the LSDFF, double-edge triggering can be implemented with a simple clocking scheme to sample and transfer data at both the rising edge and the falling edge of the clock. The clock frequency can therefore be halved, and power consumption can be reduced by 50%. To prevent performance degradation of the LSDFF due to the low-swing clock, low-Vt transistors are used for the clocked transistors without a significant leakage current problem, although the LSDFF requires more transistors. Table I compares the LSDFF with other flip-flops.
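The idea that double-edge triggering allows the clock frequency to be halved can be illustrated with a small behavioral model. This is a waveform-level sketch only, not a model of the LSDFF circuit; the sampled waveforms and function names are assumptions.

```python
def capture(clock, data, double_edge):
    """Return the data values captured by a flip-flop.

    clock and data are equal-length lists of 0/1 samples. Capture happens
    on rising clock edges, and also on falling edges when double_edge is True.
    """
    captured = []
    for i in range(1, len(clock)):
        rising = clock[i - 1] == 0 and clock[i] == 1
        falling = clock[i - 1] == 1 and clock[i] == 0
        if rising or (double_edge and falling):
            captured.append(data[i])
    return captured

def count_transitions(clock):
    """Number of clock transitions, a rough proxy for clock switching activity."""
    return sum(1 for i in range(1, len(clock)) if clock[i] != clock[i - 1])

# Assumed waveforms: the slow clock runs at half the frequency of the fast one.
fast_clock = [0, 1, 0, 1, 0, 1, 0, 1, 0]
slow_clock = [0, 0, 1, 1, 0, 0, 1, 1, 0]
data = [0, 1, 1, 0, 0, 1, 1, 1, 1]

single_edge = capture(fast_clock, data, double_edge=False)
double_edge = capture(slow_clock, data, double_edge=True)
print(single_edge, double_edge, count_transitions(fast_clock), count_transitions(slow_clock))
```

In this example both flip-flops capture the same four data values, while the half-frequency clock makes half as many transitions.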


Despite these advantages, PTL has two main drawbacks. First, the threshold voltage drop across single-channel pass transistors results in reduced drive and hence slower operation at reduced supply voltages. Second, since the high input voltage level is not VDD, the PMOS device in the inverter is not fully turned off. Several PTL techniques that overcome these problems have been discussed [5].

Transmission gate CMOS (TG) uses transmission gate logic to realize complex logic functions with fewer transistors. It solves the problem of low logic level swing by using PMOS as well as NMOS transistors.

Complementary pass transistor logic (CPL) uses NMOS pass transistor logic with CMOS output inverters. Small stack height and low internal node swing are its important features, but it suffers from static power consumption due to the low swing at the gates of the output inverters.

Double pass transistor logic (DPL) uses complementary transistors to keep full-swing operation and reduce the DC power consumption. This eliminates the need for restoration circuitry. One disadvantage of DPL is its large area, due to the presence of PMOS transistors.

A new low power design technique that solves most of these problems, known as the Gate-Diffusion-Input (GDI) technique, is discussed in paper [6]. This technique reduces the power consumption, propagation delay, and area of digital circuits while maintaining low complexity of logic design. The GDI method is based on the simple cell shown in Figure 10. A basic GDI cell contains four terminals: G (the common gate input of the NMOS and PMOS transistors), P (the outer diffusion node of the PMOS transistor), N (the outer diffusion node of the NMOS transistor), and D (the common diffusion node of both transistors).


Figure 11. D – Flip flop implementation in GDI technique

Table II shows how various logic functions are implemented with the GDI cell. The GDI cell enables simpler gates, a lower transistor count, and lower power dissipation.

Paper [7] discusses the operation of flip-flops implemented in the GDI technique in the sub-threshold region. Decreasing the supply voltage to reduce power consumption leads to an increase in sub-threshold leakage power. Sub-threshold circuits are therefore used to reduce energy.

The sub-threshold current of a MOSFET flows when the gate-source voltage (VGS) of the transistor is lower than the threshold voltage (VTH). When VGS is greater than VTH, majority carriers are repelled from the gate area of the transistor and a minority-carrier channel is created; this is known as strong inversion. When VGS is lower than VTH, there are fewer minority carriers, but their presence still produces a current; this state is known as weak inversion. Operating digital circuits in the sub-threshold region uses this current, minimizing power consumption in low-frequency systems. Sub-threshold circuits are driven by currents much weaker than those of standard strong-inversion circuits, so they are characterized by longer propagation delays and are limited to lower frequencies.

A block diagram of the basic GDI flip-flop for the sub-threshold region is shown in Figure 13. The design is composed of a pair of latches, each comprising a GDI multiplexer (see Figure 12) and a cross-coupled pair of inverters. Each GDI multiplexer is composed of a single pair of transistors, which reduces area, power consumption, and clock load.
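The weak-inversion current mentioned above is commonly approximated by an exponential in VGS. The sketch below assumes the simplified textbook model I = I0 * exp((VGS - VTH) / (n * VT)); this formula and the parameter values (I0, slope factor n) are assumptions for illustration and do not come from paper [7].

```python
import math

THERMAL_VOLTAGE = 0.02585  # kT/q in volts near 300 K

def subthreshold_current(vgs, vth, i0=1e-7, n=1.5):
    """Simplified weak-inversion drain current in amperes.

    Intended for VGS < VTH and a drain-source voltage of several thermal
    voltages. i0 and n are assumed, technology-dependent parameters.
    """
    return i0 * math.exp((vgs - vth) / (n * THERMAL_VOLTAGE))

def subthreshold_swing(n=1.5):
    """Gate-voltage change in volts that changes the current by a factor of ten."""
    return n * THERMAL_VOLTAGE * math.log(10)

swing = subthreshold_swing()
i_high = subthreshold_current(vgs=0.30, vth=0.40)
i_low = subthreshold_current(vgs=0.30 - swing, vth=0.40)
print(f"swing: {swing * 1e3:.1f} mV/decade, current ratio: {i_high / i_low:.2f}")
```

The exponential dependence is what makes sub-threshold drive currents so much weaker than strong-inversion currents, which is consistent with the longer propagation delays described in the text.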

TABLE II. LOGIC FUNCTIONS IMPLEMENTED WITH GDI CELL

N      P      G    D            Function
'0'    B      A    A'B          F1
B      '1'    A    A'+B         F2
'1'    B      A    A+B          OR
B      '0'    A    AB           AND
C      B      A    A'B+AC       MUX
'0'    '1'    A    A'           NOT
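The functions of Table II can be checked with an idealized model of the GDI cell, in which the PMOS passes P to the output when G is 0 and the NMOS passes N when G is 1. This is a logic-level sketch that ignores threshold-voltage drops; the terminal assignments follow the usual GDI configuration with G tied to input A and are stated here as assumptions.

```python
from itertools import product

def gdi_cell(g, p, n):
    """Idealized GDI cell output D: follows P when G = 0 and N when G = 1."""
    return n if g else p

# Terminal assignments (G = A in every case) assumed from the usual GDI configuration.
GDI_FUNCTIONS = {
    "A'B":    lambda a, b, c: gdi_cell(a, p=b, n=0),
    "A'+B":   lambda a, b, c: gdi_cell(a, p=1, n=b),
    "A+B":    lambda a, b, c: gdi_cell(a, p=b, n=1),
    "AB":     lambda a, b, c: gdi_cell(a, p=0, n=b),
    "A'B+AC": lambda a, b, c: gdi_cell(a, p=b, n=c),
    "A'":     lambda a, b, c: gdi_cell(a, p=1, n=0),
}

# Reference Boolean expressions written out directly.
REFERENCE = {
    "A'B":    lambda a, b, c: (1 - a) & b,
    "A'+B":   lambda a, b, c: (1 - a) | b,
    "A+B":    lambda a, b, c: a | b,
    "AB":     lambda a, b, c: a & b,
    "A'B+AC": lambda a, b, c: ((1 - a) & b) | (a & c),
    "A'":     lambda a, b, c: 1 - a,
}

def verify():
    """Compare every GDI configuration with its reference over all inputs."""
    return all(
        GDI_FUNCTIONS[name](a, b, c) == REFERENCE[name](a, b, c)
        for name in GDI_FUNCTIONS
        for a, b, c in product((0, 1), repeat=3)
    )

print(verify())
```

Each function uses a single call to `gdi_cell`, mirroring the two-transistor implementation per function.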

Figure 10. Basic GDI cell

Most of these functions are complex (6 to 12 transistors) in CMOS as well as in PTL implementations, but very simple (only 2 transistors per function) in the GDI design method. Table III shows the implementations of AND, OR and XOR gates in GDI, CMOS and PTL.

TABLE III. AND, OR, XOR GATES IN GDI, CMOS AND PTL

Figure 11 shows the implementation of the D flip-flop in the GDI technique. This architecture isn't always suitable for strong-inversion operation because of the threshold voltage drop, but this drop is substantially reduced in sub-threshold operation [6].

Figure 12. GDI Multiplexer symbol and schematic

The cross-coupled inverters ensure that strong signals are passed from the multiplexers and block any reverse currents through the multiplexers. This flip-flop comprises 12 transistors, a relatively small number, substantially reducing area and capacitance.

Figure 13. Flip-flop design with GDI multiplexers


Figure 14. Improved Flip-flop design with GDI multiplexers

This design comprises 10 transistors plus a delay element that can be implemented with a resistor or a transistor. The architecture takes advantage of the feedback in the second stage, which makes the first-stage feedback redundant. In this way the transistor count is reduced, giving a smaller area, and the setup time can be reduced by the addition of an inverter. The new setup time is given as

$$t_{setup} = t_{P1} + t_{Inv1} + t_{N3} + t_{Inv2} - t_{Delay} \qquad (4)$$

In other words, $t_{setup}$ can be reduced by adding a larger delay. To eliminate glitching, a minimum setup time of $t_{P1} + t_{Inv1}$ should be kept. If the delay is applied to the clock that is connected to Mux1, a hold time equal to $t_{N3} + t_{Inv2} - t_{Delay}$ is introduced. However, the hold time remains zero if the delay is applied only to the gate of N1.

IV. ADIABATIC LOGIC FAMILIES


In conventional CMOS circuits, the energy stored on the load capacitor is dissipated as heat, and the delay and energy loss are comparatively high. An alternative to CMOS logic is adiabatic logic, in which the energy stored on the capacitor is recycled. Considerable energy savings can be achieved with adiabatic logic families such as Split-level Charge Recovery Logic (SCRL), Two-Level Adiabatic Logic (2LAL) [8], Efficient Charge Recovery Logic (ECRL) [9], nMOS Reversible Energy Recovery Logic (nRERL), and Complementary Pass-transistor Adiabatic Logic (CPAL), which are discussed in paper [10]. SCRL, however, requires a larger silicon area and a large number of power-clock phases. ECRL eliminates the precharge diode, but the charge on the output loads can't be completely recovered and the energy dissipation is highly dependent on the output load capacitance [10]. nRERL recovers the output load charge by employing a bootstrap mechanism, but the energy loss of the internal bootstrapping nodes is not small. CPAL uses complementary pass-transistor logic for evaluation and transmission gates for energy recovery, to realize efficient energy transfer and low energy loss [10].


There are two types of energy loss in quasi-adiabatic circuits: adiabatic loss and non-adiabatic loss. Adiabatic loss occurs when current flows through a non-ideal switch and is proportional to the frequency of the power clock. Non-adiabatic loss occurs if any voltage difference exists between the two terminals of a switch when it is turned on; it is independent of the frequency of the power clock and is proportional to the node capacitance and the square of the voltage difference. CPAL circuits, however, have internal dissipation for charging and discharging internal nodes, in addition to the adiabatic dissipation for charging the output load capacitance. Paper [11] presents adiabatic CPL circuits that consist purely of NMOS transistors and use CPL blocks for evaluation and bootstrapped NMOS switches for driving the output loads, so that the non-adiabatic loss of the output nodes is eliminated. A CPL inverter using a DC power supply is shown in Figure 15.
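The two loss mechanisms described above can be compared with first-order formulas. The sketch below assumes the common textbook approximations: an adiabatic loss of (RC/T) * C * V^2 for a linear ramp of duration T through a switch of resistance R, and a non-adiabatic loss of (1/2) * C * dV^2 when a switch closes across a voltage difference dV. The component values are illustrative assumptions.

```python
import math

def adiabatic_loss(r_on, c_node, v_swing, ramp_time):
    """Energy in joules dissipated in a switch of resistance r_on while c_node
    is charged by a linear ramp of duration ramp_time.

    Valid for ramp_time much larger than r_on * c_node. A shorter ramp, i.e.
    a higher power-clock frequency, gives a proportionally larger loss.
    """
    return (r_on * c_node / ramp_time) * c_node * v_swing ** 2

def non_adiabatic_loss(c_node, delta_v):
    """Energy in joules lost when a switch turns on with delta_v across it.

    Independent of the power-clock frequency.
    """
    return 0.5 * c_node * delta_v ** 2

# Assumed values: 1 kOhm switch, 10 fF node, 1.8 V swing.
slow_ramp = adiabatic_loss(r_on=1e3, c_node=10e-15, v_swing=1.8, ramp_time=2e-9)
fast_ramp = adiabatic_loss(r_on=1e3, c_node=10e-15, v_swing=1.8, ramp_time=1e-9)
step_loss = non_adiabatic_loss(c_node=10e-15, delta_v=0.5)
print(f"{slow_ramp:.3e} J, {fast_ramp:.3e} J, {step_loss:.3e} J")
```

Doubling the ramp time halves the adiabatic loss, whereas the non-adiabatic loss depends only on the node capacitance and the square of the voltage difference.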

The required setup time ($t_{setup}$) is the path delay of the input signal (D) to the feedback port of Mux1. Accordingly, the setup time is given as

$$t_{setup} = t_{P1} + t_{Inv1} + t_{Inv2} \qquad (3)$$

This basic architecture can be modified to improve the setup time and reduce area, as shown in Figure 14. This is achieved by removing the first-stage feedback inverter and passing the feedback to the second stage.
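The setup-time expressions (3) and (4) can be evaluated side by side. The sketch below is illustrative only: the delay values in picoseconds are assumptions, and clamping the result of equation (4) at tP1 + tInv1 is one reading of the stated minimum needed to avoid glitching.

```python
def setup_time_basic(t_p1, t_inv1, t_inv2):
    """Equation (3): setup time of the basic GDI flip-flop."""
    return t_p1 + t_inv1 + t_inv2

def setup_time_improved(t_p1, t_inv1, t_n3, t_inv2, t_delay):
    """Equation (4), clamped at the glitch-free minimum tP1 + tInv1 (assumed)."""
    raw = t_p1 + t_inv1 + t_n3 + t_inv2 - t_delay
    return max(raw, t_p1 + t_inv1)

def hold_time_with_clock_delay(t_n3, t_inv2, t_delay):
    """Hold time introduced when the delay is applied to the Mux1 clock."""
    return t_n3 + t_inv2 - t_delay

# Assumed stage delays in picoseconds.
stage = dict(t_p1=40, t_inv1=30, t_inv2=30)
basic = setup_time_basic(**stage)
improved = setup_time_improved(t_n3=40, t_delay=60, **stage)
clamped = setup_time_improved(t_n3=40, t_delay=100, **stage)
print(basic, improved, clamped)
```

With these assumed numbers, a 60 ps delay element reduces the setup time from 100 ps to 80 ps, and a 100 ps delay reaches the 70 ps glitch-free minimum.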

Figure 15. CPL inverter using DC power supply

The evaluation logic block consists of four NMOS transistors (N1-N4), and the load-driving circuit consists of two CMOS output inverters that drive the output loads. Since the output high level of the CPL evaluation logic block is VDD-VTN (VTN is the threshold voltage of the NMOS transistors), two pull-up PMOS transistors are introduced to restore the level, thus reducing the short-circuit power consumption of the output inverters. To recover and reuse the supplied energy, the DC power supply of the CPL circuit can be replaced by a power clock. A CPL inverter using a power clock (AC power supply) is shown in Figure 16.

Figure 16. CPL inverter using AC power supply

Because the NMOS transistor N5 (or N6) can be bootstrapped to a level higher than VDD-VTN as the power clock φ rises gradually, the PMOS transistors can be eliminated. A pair of cross-coupled NMOS transistors, N7 and N8, is added to obtain non-floating outputs, i.e. the undriven output node is held at ground.


Figure 18. D flip-flop using ACPL technique

V. RESULTS AND DISCUSSIONS

Different flip-flop architectures designed to reduce power dissipation have been presented, along with various CMOS logic design techniques and adiabatic low power techniques. Sequential circuits using the GDI technique and ACPL are implemented. Table IV compares the two techniques, as implemented in the D flip-flop, in terms of transistor count and power dissipation.

TABLE IV. COMPARISON RESULTS

Sequential Circuits (D Flip-Flop)    No. of transistors    Power dissipation
GDI (Sub threshold)                                        62.6 pW
GDI                                                        545.80 µW
ACPL                                                       636.2 fW

The adiabatic CPL inverter operates as follows. During the time interval T1, the input IN goes high while the input INb is low, so N1 and N3 are turned on. The node X is charged to about VDD-VTN while the node Y is clamped to ground. During T2, as the clock φ goes up, the node X is bootstrapped to a level higher than VDD-VTN by the gate-to-channel capacitance of the transistor N5. Therefore, as the clock φ rises, the node OUT is charged through the bootstrapped NMOS switch (N5) without non-adiabatic loss, and full swing is obtained. At the same time, when the node OUT rises above VTN, N8 is turned on and the node OUTb is clamped to ground. During the time interval T3, the node OUT follows the clock φ, while the node OUTb remains at ground. The node X keeps its state because it is isolated. During T4, as the voltage of the clock φ falls from VDD to ground, the charge on the node OUT is recovered through the transistor N5. Adiabatic CPL has no non-adiabatic loss on the output nodes because the output loads are charged and discharged fully adiabatically; adiabatic CPL circuits therefore consume less power than other techniques. The implementation of two-input AND/NAND, OR/NOR and XOR/XNOR gates and a multiplexer using Adiabatic Complementary Pass-transistor Logic is shown in Figure 17. Only the CPL evaluation blocks are shown in Figure 17; the other transistors are omitted for simplicity.


Figure 17. Adiabatic CPL two-input gates (a) AND (b) OR (c) XOR and (d) multiplexer

Adiabatic CPL circuits are suitable for the design of flip-flops and sequential circuits because they use fewer transistors. Paper [12] presents the implementation of a D flip-flop using the ACPL technique, which is shown in Figure 18. The power-clock supply can be a pulse, trapezoidal, or sinusoidal signal. Pulse and trapezoidal signals are easy to analyze but difficult to generate; hence sinusoidal power clocks, which can be generated by simple LC circuits, are used.
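For a sinusoidal power clock generated by an LC circuit, the clock frequency is set by the resonance of the inductor with the capacitance it drives. The sketch below uses the ideal resonance formula f = 1 / (2 * pi * sqrt(L * C)); the inductance and capacitance values are assumptions for illustration and are not taken from paper [12].

```python
import math

def lc_resonant_frequency(inductance, capacitance):
    """Resonant frequency in Hz of an ideal LC tank: 1 / (2 * pi * sqrt(L * C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * capacitance))

# Assumed values: a 1 uH inductor resonating with a 100 pF clock load.
f_clk = lc_resonant_frequency(1e-6, 100e-12)
f_heavier_load = lc_resonant_frequency(1e-6, 400e-12)
print(f"power-clock frequency: {f_clk / 1e6:.2f} MHz")
```

Because the frequency scales with the inverse square root of the capacitance, quadrupling the load capacitance halves the power-clock frequency in this idealized model.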

VI. CONCLUSION

In this paper, different flip-flop topologies have been reviewed and evaluated based on performance metrics such as area, power, delay, and transistor count. The reviewed flip-flops have disadvantages in transistor count, delay, and power dissipation. The Gate-Diffusion-Input (GDI) technique is therefore adopted to reduce the transistor count, and adiabatic low power techniques have been presented to reduce the power dissipation. ACPL can be used in flip-flops because it requires fewer transistors and offers high performance. The implementation of the GDI and ACPL techniques in a D flip-flop and the comparison results have been discussed.

REFERENCES

[1] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," IEEE J. Solid-State Circuits, vol. 27, pp. 473-484, Apr. 1992.
[2] S. Kang, "Elements of Low Power Design for Integrated Systems," ISLPED '03, August 2003.
[3] D. Markovic, B. Nikolic, R. Brodersen, "Analysis and Design of Low-energy Flip-Flops," International Symposium on Low Power Electronics and Design, pp. 52-55, Aug. 2001.
[4] N. Weste and K. Eshraghian, Principles of CMOS digital design.

[5] A. Morgenshtein, A. Fish, I. A. Wagner, "Gate-Diffusion Input (GDI): A Power Efficient Method for Digital Combinatorial Circuits," IEEE Trans. VLSI, vol. 10, no. 5, pp. 566-581, October 2002.
[6] A. Morgenshtein, A. Fish, I. Wagner, "An Efficient Implementation of D-Flip-Flop Using the GDI Technique," ISCAS '04, pp. 673-676, May 2004.
[7] Sagi Fisher, Adam Teman, Dmitry Vaysman, Alexander Gertsman, Orly Yadid-Pecht, "Ultra Low power Sub-threshold Flip-Flop".
[8] Benjamin Gojman, "Adiabatic Logic," August 2004.
[9] Y. Moon and D. Jeong, "An efficient charge-recovery logic circuit," IEEE Journal of Solid-State Circuits, vol. 31, no. 4, pp. 514-522, 1996.
[10] J. P. Hu, L. Z. Cen, X. Liu, "A new type of low-power adiabatic circuit with complementary pass-transistor logic," Proc. 5th Inter. Conf. on ASIC, Beijing, China, pp. 1235-1238, 2003.
[11] Ling Wang, Jianping Hu, and Jing Dai, "A low-power multiplier using adiabatic CPL circuits," Integrated Circuits, 2007. ISIC '07, International Symposium, pp. 21-24, 2007.
[12] "High Performance Sequential Circuits with Adiabatic Complementary Pass-Transistor Logic (ACPL)"

K. Prathyusha completed her B.Tech in Electrical and Electronics Engineering from Lakireddy Balireddy College of Engineering, Mylavaram, Andhra Pradesh, India in 2009. She is now pursuing her Master of Technology (M.Tech) in VLSI Design at VIT University, Vellore, Tamil Nadu, India. Her interests include Digital Design, ASIC Design, and VLSI Testing, and she has carried out projects in the area of low power VLSI design. She is an active participant in technical events and has presented projects at the national level.

Pobbireddy Sameera completed her B.Tech in Electronics and Communication Engineering from Malineni Lakshmaih Engineering College Ongole, India in 2008. She is now pursuing her Master of Technology (M.Tech) in VLSI Design at VIT University, Vellore, Tamil Nadu, India. Her interests include Digital Design, ASIC Design, Low Power IC Design and RFIC Design.

Pavan Kumar.V completed his B.Tech in Electrical and Electronics Engineering from Lakireddy Balireddy College of Engineering, Mylavaram, Andhra Pradesh, India in 2009. He is now pursuing his Master of Technology (M.Tech) in VLSI Design at VIT University, Vellore, Tamil Nadu, India. His interests include Analog Design and Digital Design.
