Prof. P. Magesh Kannan et al. / (IJAEST) INTERNATIONAL JOURNAL OF ADVANCED ENGINEERING SCIENCES AND TECHNOLOGIES Vol No. 5, Issue No. 2, 310 - 314
VLSI Architecture and ASIC Implementation of ICE Encryption Algorithm
Prof. P. Magesh Kannan
VLSI Division, School of Electronics Engineering, VIT University, Vellore, India E-mail: mageshkannan.p@vit.ac.in
Venkatesan Sellappan
PG Student, VLSI Design, SENSE VIT University, Vellore, India E-mail: venkatesan.srec@gmail.com
ISSN: 2230-7818
Abstract: In modern security applications, cryptographic algorithms must be both secure and implementable in hardware. This paper proposes a hardware architecture for the Information Concealment Engine (ICE) encryption algorithm. ICE was designed for use in software applications, which are slow because of their use of modular arithmetic, so faster implementations are needed. These can be achieved in hardware. The system performs encryption and has been optimized for low hardware resources and high speed. The paper discusses ICE, DES and Triple DES and compares their performance. The proposed architecture has been implemented as an ASIC in 180 nm technology.
Keywords: ICE, Montgomery multiplication, modular multiplication, systolic architecture
I. INTRODUCTION
Security is a primary requirement of any wireless cryptographic protocol. To address this persistent problem, cryptographic algorithms are constructed to provide secure communication. There are many good algorithms with different uses and characteristics, but not all of them can be considered fully secure [1], [2]. ICE (Information Concealment Engine) is a block cipher published by Kwan in 1997. The algorithm is similar in structure to DES, but adds a key-dependent bit permutation to the round function. This permutation can be implemented efficiently in software. The ICE algorithm is not subject to patents, and its source code has been placed in the public domain.
The ICE algorithm was designed for use in software applications. Those applications, however, are slow because of their use of modular arithmetic [3], so there is a great need for faster implementations. These can be achieved in hardware. Because hardware implementations are generally faster and more reliable than software implementations, the outcome of a hardware design is of particular interest. In this paper, an architecture and VLSI implementation of the ICE encryption algorithm are proposed. The system performs both encryption and decryption and has been optimized for low hardware resources and high-speed performance. The proposed architecture gives very encouraging results in terms of speed and throughput, which makes the design useful in current applications that use DES as the basis of a cryptographic protocol [2].
ICE is a Feistel network with a block size of 64 bits. The standard ICE algorithm takes a 64-bit key and has 16 rounds. A fast variant, Thin-ICE, uses only 8 rounds. An open-ended variant, ICE-n, uses 16n rounds with a 64n-bit key. A published attack on Thin-ICE recovers the secret key using $2^{23}$ chosen plaintexts with a 25% success probability; with $2^{27}$ chosen plaintexts, the probability improves to 95%. For the standard version of ICE, an attack on 15 out of 16 rounds was found, requiring $2^{56}$ work and at most $2^{56}$ chosen plaintexts.
This paper discusses ICE, DES and Triple DES and compares their performance.
A. DES
The Data Encryption Standard (DES) is a 64-bit block cipher with a 64-bit key, of which 56 bits are effective; the remaining 8 bits are parity bits. Both ICE and DES use a standard Feistel structure, but one difference is that ICE applies a keyed permutation after the expansion function E. DES is a secret-key algorithm whose 56-bit effective key is seven characters long. When DES was introduced, it was believed that trying all 72,057,594,037,927,936 possible keys (about $7.2 \times 10^{16}$) would be impossible because computers could never become fast enough. In 1998 the Electronic Frontier Foundation (EFF) built a special-purpose machine that recovered a DES key by exhaustive search in less than three days. The machine cost less than $250,000 and searched over 88 billion keys per second.
B. Triple DES
The Triple-DES variant was developed after it became clear that DES by itself was too easy to crack. It uses three 56-bit DES keys, giving a total key length of 168 bits. Encryption using Triple-DES is
$\text{ciphertext} = E_{K_3}(D_{K_2}(E_{K_1}(\text{plaintext})))$
that is, DES encrypt with K1, DES decrypt with K2, then DES encrypt with K3. Because Triple-DES applies the DES algorithm three times (hence the name), Triple-DES takes
@ 2011 http://www.ijaest.iserp.org. All rights Reserved.
three times as long as standard DES. Decryption using Triple-DES applies the inverse operations in reverse order:
$\text{plaintext} = D_{K_1}(E_{K_2}(D_{K_3}(\text{ciphertext})))$
The paper is organized as follows. Section 2 introduces the ICE encryption algorithm. Section 3 describes the F function. Section 4 gives a thorough analysis of the architecture and the VLSI implementation. Section 5 describes Montgomery multiplication. Section 6 presents the performance results, and Section 7 gives some conclusions.
II. THE ICE ENCRYPTION ALGORITHM
The Feistel structure has three advantages. First, it gives a one-to-one mapping between plaintext and ciphertext, which is necessary for a cipher to be decryptable. Second, Feistel ciphers have been publicly cryptanalysed for more than two decades, and no systematic weakness has been uncovered. Finally, Feistel ciphers are reasonably fast and simple to implement in software. Speed and simplicity were two important design aims for ICE [3].
III. THE F FUNCTION
The ICE F function is similar in structure to the one used in DES, except for the keyed permutation described below [2]. The function as a whole is illustrated in Figure 1.
Figure 1. ICE F Function
A. The Expansion Function E
The 32-bit plaintext half is expanded to four 10-bit values, E1, E2, E3 and E4, in the following manner [2], [3]:
$E_1 = P_1 P_0 P_{31} P_{30} P_{29} P_{28} P_{27} P_{26} P_{25} P_{24}$
$E_2 = P_{25} P_{24} P_{23} P_{22} P_{21} P_{20} P_{19} P_{18} P_{17} P_{16}$
$E_3 = P_{17} P_{16} P_{15} P_{14} P_{13} P_{12} P_{11} P_{10} P_9 P_8$
$E_4 = P_9 P_8 P_7 P_6 P_5 P_4 P_3 P_2 P_1 P_0$
This expansion function was chosen because four 10-bit values were needed for the S-boxes, and it was reasonably fast to implement in software [3].
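The expansion above can be sketched in a few lines of Python. This is only an illustration under two assumptions that the text does not state explicitly: P0 is taken as the least significant bit of the 32-bit half, and the leftmost bit listed for each E value is taken as its most significant bit. The function name `expand` is hypothetical.

```python
def expand(p):
    """Expand a 32-bit half-block into four 10-bit values (E1, E2, E3, E4).

    Assumes P0 is the least significant bit of p (an assumption of this sketch).
    """
    p &= 0xFFFFFFFF
    e1 = ((p & 0x3) << 8) | (p >> 24)   # P1 P0 P31 ... P24
    e2 = (p >> 16) & 0x3FF              # P25 ... P16
    e3 = (p >> 8) & 0x3FF               # P17 ... P8
    e4 = p & 0x3FF                      # P9 ... P0
    return e1, e2, e3, e4


if __name__ == "__main__":
    # Bits P31 and P0 set: P0 lands in bit 8 of E1 and bit 0 of E4, P31 in bit 7 of E1.
    print([hex(v) for v in expand(0x80000001)])
```

Adjacent values overlap by two bits (for example P25 and P24 appear in both E1 and E2), which is how 32 input bits yield 40 output bits.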
ICE is a standard Feistel block cipher [2], [3] with a structure similar to DES. It takes a 64-bit plaintext, which is split into two 32-bit halves. In each round of the algorithm the right half and a 60-bit subkey are fed into the function F. The output of F is XORed with the left half, and then the halves are swapped. This is the Transformation Round of the ICE algorithm. This process is repeated for 16 rounds [2], [3]. However, the final swap is left out. The decryption process is the same, except that the subkeys are used in reverse order.
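The round structure just described (XOR the F output into the left half, swap, omit the final swap, and decrypt with reversed subkeys) can be illustrated with a generic Feistel skeleton. This is a minimal sketch, not the ICE cipher: `toy_f` is a hypothetical placeholder round function, and the subkeys are arbitrary integers rather than real 60-bit ICE subkeys.

```python
MASK32 = 0xFFFFFFFF


def toy_f(right, subkey):
    """Placeholder round function (NOT the ICE F function)."""
    return ((right * 0x9E3779B1) ^ subkey) & MASK32


def feistel(block, subkeys, f=toy_f):
    """Run a Feistel network over a 64-bit block, leaving out the final swap."""
    left, right = (block >> 32) & MASK32, block & MASK32
    for k in subkeys:
        # XOR F(right, k) into the left half, then swap the halves.
        left, right = right, left ^ f(right, k)
    # Undo the last swap so the same routine also decrypts.
    return (right << 32) | left


def encrypt(block, subkeys):
    return feistel(block, subkeys)


def decrypt(block, subkeys):
    # Same process, subkeys in reverse order.
    return feistel(block, list(reversed(subkeys)))


if __name__ == "__main__":
    keys = list(range(1, 17))            # 16 illustrative round keys
    ct = encrypt(0x0123456789ABCDEF, keys)
    print(hex(decrypt(ct, keys)))
```

The round trip works for any round function, which is why the Feistel structure guarantees a decryptable cipher regardless of how F is built.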
B. Keyed Permutation
After expansion, a keyed permutation is applied. The permutation subkey is 20 bits long, and is used to swap bits between E1 and E3, and between E2 and E4. When the odd bits of the permutation key are set, they swap the corresponding E1 bits with E3 bits; otherwise the key bits swap the corresponding E2 bits with E4 bits. For example, if bit 1 of the subkey is set, bit 0 of E1 and E3 will be swapped. If bit 2 of the subkey is set, bit 1 of E2 and E4 will be swapped [2], [3]. The values E1, E2, E3 and E4, after being permuted, are XORed with 40 bits of subkey, then used as input to the four S-boxes S1, S2, S3 and S4. Each S-box takes a 10-bit input and produces an 8-bit output [3].
C. The Permutation Function P
The four 8-bit S-box outputs are combined via a P-box into a 32-bit value, which becomes the output of the F function. The P-box, which is specified in table 1, was designed to maximize diffusion from each S-box, and to ensure that bits which are separated by 16 places never come from the same S-box, nor from S-boxes separated by two places (S1 and S3) [3].
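The keyed permutation described above amounts to a conditional bit swap between two 10-bit words. The sketch below shows only that swap primitive; how the 20-bit permutation subkey is split into the two 10-bit masks (`mask13` and `mask24`, both hypothetical names) is left to the caller, since the exact bit assignment is not restated here.

```python
def keyed_swap(a, b, mask):
    """Swap the bits of a and b at every position where mask has a 1."""
    t = (a ^ b) & mask      # positions that differ AND are selected
    return a ^ t, b ^ t


def keyed_permutation(e1, e2, e3, e4, mask13, mask24):
    """Apply the swap to the pairs (E1, E3) and (E2, E4).

    mask13 and mask24 are 10-bit masks derived from the 20-bit subkey;
    the derivation itself is an assumption left outside this sketch.
    """
    e1, e3 = keyed_swap(e1, e3, mask13 & 0x3FF)
    e2, e4 = keyed_swap(e2, e4, mask24 & 0x3FF)
    return e1, e2, e3, e4


if __name__ == "__main__":
    # Swap only bit 0 between E1 and E3; leave E2 and E4 untouched.
    print(keyed_permutation(0x3FF, 0x155, 0x000, 0x2AA, 0x001, 0x000))
```

In hardware, each selected bit position corresponds to a pair of 2-to-1 multiplexers driven by one key bit, which matches the multiplexer-based implementation mentioned later in the paper.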
D. The S-Boxes
The S-boxes in ICE are similar in structure to those used in LOKI [4] in their use of Galois field exponentiation. Each S-box takes a 10-bit input X. Bits $X_9$ and $X_0$ are concatenated to form the row selector R. Bits $X_8 \ldots X_1$ are concatenated to form the 8-bit column selector C. For each row R, there is an XOR offset value $O_R$ and a Galois field prime $P_R$ [2], [3]. The 8-bit output of an S-box for an input X is given by $(C \oplus O_R)^7 \bmod P_R$ under 8-bit Galois field arithmetic. The exponent 7 was chosen because it gives a one-to-one function [2], [3]. The XOR offsets for each row in each S-box are given in Table 1, while the prime numbers are specified in Table 2.
Table 1. The S-Box offset values
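The S-box rule above can be sketched as follows. The offset and modulus values are passed in as parameters because the contents of Tables 1 and 2 are not reproduced here; the example at the bottom uses a zero offset and the polynomial 0x11B (the well-known AES field polynomial) purely as stand-ins, not as ICE table values. Function names are hypothetical.

```python
def gf_mult(a, b, m):
    """Multiply a and b in GF(2^8) modulo the degree-8 polynomial m."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        a <<= 1
        if a & 0x100:       # degree reached 8: reduce by the modulus
            a ^= m
        b >>= 1
    return res


def gf_exp7(b, m):
    """Raise b to the 7th power in GF(2^8) using four multiplications."""
    x = gf_mult(b, b, m)    # b^2
    x = gf_mult(b, x, m)    # b^3
    x = gf_mult(x, x, m)    # b^6
    return gf_mult(b, x, m)  # b^7


def sbox(x, offsets, moduli):
    """10-bit input -> 8-bit output: (C xor O_R)^7 mod P_R."""
    row = (((x >> 9) & 1) << 1) | (x & 1)   # R = X9 X0
    col = (x >> 1) & 0xFF                   # C = X8 ... X1
    return gf_exp7(col ^ offsets[row], moduli[row])


if __name__ == "__main__":
    offsets = [0, 0, 0, 0]          # placeholder, not the Table 1 values
    moduli = [0x11B] * 4            # placeholder, not the Table 2 values
    print(hex(sbox(0b0000000110, offsets, moduli)))
```

Because 7 is coprime to 255 (the order of the multiplicative group of GF(2^8)), raising to the 7th power permutes the field, which is the one-to-one property the text refers to.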
Figure 2. The Proposed Feedback Architecture
IV. THE PROPOSED ARCHITECTURE
The proposed architecture is shown in Figure 2. This architecture performs both encryption and decryption on a 64-bit input block with a 64-bit input key. It uses an input register and an output register. Each of them stores the left and right halves for every round and swaps the two halves if needed [2]. The Key Expansion Unit creates the round keys from the 64-bit input key, following the ICE specification. The round keys are stored in the RAM for every round [2]. The encryption process is simple. In each clock cycle, the data stored in the input register are fed into the ICE Transformation Round along with the subkey stored in the RAM. This process is repeated for 16 rounds. At the final round, however, the Output Register is used to swap the left and right halves of the ICE Transformation Round output. The resulting value of the Output Register is the ciphertext. The decryption process is the same, except that the subkeys are used in reverse order [2].
The analysis of ICE shows that the main design effort lies in the ICE Transformation Round, and especially in the implementation of the F function, shown in Figure 3. The F function has four parts: the expansion function E, the keyed permutation, the S-boxes and the permutation function P [2]. The keyed permutation can be implemented easily using two 2-to-1 multiplexers, while the expansion and permutation functions are just rearrangements of wires. The main implementation cost of the F function therefore lies in the design of the S-boxes [2].
Table 2. The S-Box Galois Field Prime values
Because each S-box uses modular exponentiation to calculate its output, a VLSI architecture based on the Montgomery Multiplication algorithm is proposed [2]. This component computes the function $A^7 \bmod P$. Its architecture is pipelined, with 6 stages, and is based on the following algorithm:
FUNCTION A7modP(X, P)
1. A = MM(X, R', P)
2. B = MM(A, A, P)
3. C = MM(B, A, P)
4. D = MM(C, C, P)
5. E = MM(D, A, P)
6. Out = MM(E, 1, P)
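The six-step chain above can be checked numerically. In this sketch the Montgomery product is modelled with plain integer arithmetic as a·b·R⁻¹ mod P with R = 2^n, rather than with the hardware algorithm, and R' is taken to be R² mod P; the function names and the choice of an odd integer modulus are assumptions made for illustration only.

```python
def mm(a, b, p, n):
    """Reference Montgomery product: a * b * R^-1 mod p, with R = 2**n."""
    r_inv = pow(1 << n, -1, p)      # modular inverse (Python 3.8+), p must be odd
    return (a * b * r_inv) % p


def pow7_mod(x, p, n):
    """Compute x^7 mod p with the six Montgomery products of the listing."""
    r_prime = pow(1 << n, 2, p)     # R' = R^2 mod p, precalculated
    a = mm(x, r_prime, p, n)        # step 1: x*R     (Montgomery form)
    b = mm(a, a, p, n)              # step 2: x^2 * R
    c = mm(b, a, p, n)              # step 3: x^3 * R
    d = mm(c, c, p, n)              # step 4: x^6 * R
    e = mm(d, a, p, n)              # step 5: x^7 * R
    return mm(e, 1, p, n)           # step 6: back to a normal number


if __name__ == "__main__":
    print(pow7_mod(5, 251, 8), pow(5, 7, 251))
```

Each intermediate value carries exactly one factor of R, so the extra R⁻¹ introduced by every product cancels and only the final multiplication by 1 removes the remaining factor.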
Let A, B and M be the multiplicand, multiplier and modulus respectively, and let n be the bit length of the operands [9]. The preconditions of the Montgomery algorithm are as follows:
1) $0 < A, B < M$
2) $2^{n-1} < M < 2^n$ and $\gcd(M, 2) = 1$, so M must be an odd number.
Function MM(A, B, M)
1. R = 0
2. For k = 0 to n-1 do begin
3. q = (R_0 + A_k B_0) mod b
4. R = R + A_k B + q M
5. R = R / b
end
6. Return R
End
Here A_k is bit k of A, and R_0 and B_0 are the least significant bits of R and B. This algorithm is a modified version of the original Montgomery Multiplication algorithm. The base b is taken as radix 2 (b = 2), and the Montgomery constant is $R = 2^n$ (distinct from the accumulator R in the listing), where n is the bit length of the operands A and B [2].
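The radix-2 loop above translates almost line for line into Python. The sketch below follows the listing for b = 2 and adds one step that the listing does not contain: a final conditional subtraction, included here only so that the returned value is fully reduced and can be compared against a·b·2⁻ⁿ mod m. The function name is hypothetical.

```python
def montgomery_multiply(a, b, m, n):
    """Bit-serial radix-2 Montgomery product: a * b * 2^-n mod m."""
    assert m % 2 == 1 and (1 << (n - 1)) < m < (1 << n)
    r = 0
    for k in range(n):
        a_k = (a >> k) & 1
        q = (r + a_k * b) & 1           # q = (r_0 + a_k * b_0) mod 2
        r = (r + a_k * b + q * m) >> 1  # sum is even, so the shift is exact
    if r >= m:                          # extra reduction step (not in the listing)
        r -= m
    return r


if __name__ == "__main__":
    m, n = 251, 8
    expected = (100 * 200 * pow(1 << n, -1, m)) % m
    print(montgomery_multiply(100, 200, m, n), expected)
```

Adding q·m never changes the value modulo m but always makes the running sum even, which is why the division by 2 in each iteration needs no trial subtraction.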
Figure 3. The ICE Transformation Round
MM is the Montgomery Multiplication function [5-15] and $R' = R^2 \bmod P$ is a precalculated, fixed number. Step 1 is needed to transform the input value X into Montgomery format, and step 6 to change the Montgomery-formatted result E back into a normal number. The Montgomery Multiplication algorithm was implemented using a systolic architecture, shown in Figure 4 and Figure 5 [2].
V. MONTGOMERY MULTIPLICATION
Modular multiplication consists of two steps: one generates the product $P = A \times B$ and the other reduces this product modulo M [5-16]. A straightforward multiplier is based on an iterative adder-accumulator for the generated partial products. This solution is quite slow, as the final result is only available after n clock cycles, where n is the size of the operands [9]. The advantage of Montgomery's method is that no subtractions are needed to reduce the intermediate results. The disadvantage is that Montgomery's modular multiplication calculates $A \cdot B \cdot 2^{-n} \bmod M$ instead of $A \cdot B \bmod M$, so two Montgomery modular multiplications are required for one modular multiplication [6]. The algorithm computes the product of two integers modulo a third one without performing division by M, yielding the reduced product using a series of additions [9].
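The statement that two Montgomery multiplications give one ordinary modular multiplication can be made explicit. Writing $R = 2^n$ and $\mathrm{MM}(A, B) = A \cdot B \cdot R^{-1} \bmod M$, and assuming the constant $R^2 \bmod M$ is precomputed, a second pass cancels the unwanted factor:

$$
\mathrm{MM}\bigl(\mathrm{MM}(A, B),\; R^2 \bmod M\bigr)
= (A B R^{-1}) \cdot R^2 \cdot R^{-1} \bmod M
= A B \bmod M .
$$

As a small check with illustrative values $M = 13$, $n = 4$, $A = 5$, $B = 7$: here $R = 16 \equiv 3$, $R^{-1} \equiv 9$ and $R^2 \equiv 9 \pmod{13}$. The first pass gives $5 \cdot 7 \cdot 9 = 315 \equiv 3$, and the second gives $3 \cdot 9 \cdot 9 = 243 \equiv 9 \pmod{13}$, which equals $5 \cdot 7 = 35 \equiv 9 \pmod{13}$.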
Figure 4. The Processing Element (a) and the PE for the calculation of q (b) of the Conventional Architecture
Introducing carry-save redundant logic, the modified Montgomery multiplication can be transformed to [8]:
Function MMcs(A, B, M)
1. C2in = 0, C1in = 0, Sin = 0
2. For k = 0 to n-1 do begin
3. q = (Sin_0 + C1in_0 + C2in_0 + A_k B_0) mod 2
4. C1 + C2 + S = C2in + C1in + Sin + A_k B + q M
5. C2in = C2/2, C1in = C1/2, Sin = S/2
End
6. Return C2in/2, C1in/2 and Sin/2
The Carry2 signal of a PE is connected to the next PE in the next row of PEs, the Carry1 signal is connected to the same PE in the next row (the same column), and the Sum signal is connected to the previous PE in the next row, starting from bit 0 [8].
Figure 5. The Systolic Conventional Architecture
VI. PERFORMANCE
The proposed architecture was captured in Verilog HDL. All internal components of the design were synthesized, placed and routed for ASIC using Cadence tools. The VLSI synthesis results are shown in Table 3, which compares frequency, power and area.
Table 3. Frequency, power and covered area comparison
Encryption Algorithm   Freq (GHz)   Power (pW)   Area
ICE                    3            68631.1      571191
DES                    1            76236.8      293458
TRIPLE DES             1            251144.4     223921
Table 3 compares the implementations in terms of power, covered area and operating frequency. The operating frequency of ICE is higher than that of the DES and Triple DES block ciphers.
VII. CONCLUSIONS
ICE is a symmetric-key block cipher designed for software applications. An efficient architecture for its VLSI implementation is proposed in this paper. It is designed for high clock speed and minimized area resources using the Montgomery multiplication method. The results show that the ICE algorithm used for this architecture can be implemented in hardware. The ASIC implementation is a system with an external clock of 3 GHz and a throughput of 12 Gbits/sec. Compared with other popular encryption algorithms, it is concluded that the performance of the ICE implementation is better than that of most block encryption algorithm implementations.
REFERENCES
[1] Bruce Schneier, Applied Cryptography: Protocols, Algorithms and Source Code in C, John Wiley & Sons, Second ed., New York, 1996.
[2] A. P. Fournaris, N. Sklavos and O. Koufopavlou, "VLSI Architecture and FPGA Implementation of ICE Encryption Algorithm,"
[3] M. Kwan, "The Design of the ICE Encryption Algorithm," in Proc. of Fast Software Encryption Workshop, 1997.
[4] L. Brown, J. Pieprzyk and J. Seberry, "LOKI: A Cryptographic Primitive for Authentication and Secrecy Applications," Advances in Cryptology: AUSCRYPT '90 Proceedings, Springer-Verlag, pp. 229-236, 1990.
[5] Peter L. Montgomery, "Modular multiplication without trial division," Mathematics of Computation, vol. 44, no. 170, pp. 519-521, 1985.
[6] David Narh Amanor, Christof Paar, Jan Pelzl, Viktor Bunimov and Manfred Schimmler, "Efficient hardware architectures for Modular Multiplication on FPGAs,"
[7] C. K. Koc, T. Acar and B. S. Kaliski, "Analyzing and Comparing Montgomery Multiplication Algorithms," IEEE Micro, vol. 16, no. 3, pp. 26-33, June 1996.
[8] A. P. Fournaris and O. Koufopavlou, "Montgomery Modular Multiplication Architectures and hardware implementations for an RSA cryptosystem,"
[9] Nadia Nedjah and Luiza De Macedo Mourelle, "Systolic hardware Implementation for the Montgomery Modular Multiplication,"
[10] S. E. Eldridge and C. D. Walter, "Hardware implementation of Montgomery's Modular Multiplication Algorithm," IEEE Transactions on Computers, 42(6):619-624, 1993.
[11] C. D. Walter, "Montgomery Exponentiation Needs No Final Subtraction," Electronics Letters, vol. 35, no. 21, pp. 1831-1832, October 1999.
[12] D. J. Guan, "Montgomery Algorithm for Modular Multiplication," Lecture notes, National Sun Yat Sen University, 2001.
[13] T. H. Cormen, C. E. Leiserson and R. L. Rivest, Introduction to Algorithms, The MIT Press, Cambridge, 1990.
[14] Shay Gueron, "Enhanced Montgomery Multiplication," in Proc. of Workshop on Cryptographic Hardware and Embedded Systems, San Francisco, August 13-15, 2002.
[15] Taek-Won Kwan, Chang-Seok You, Won-Seok Heo, Yong-Kyu Kang and Jun-Rim Choi, "Two implementation methods of a 1024-bit RSA cryptoprocessor Based on Modified Montgomery Algorithm," Proceedings of 2001 IEEE ISCAS '01, May 6-9, Sydney, Australia.