ISSN (ONLINE): 2454-9762 ISSN (PRINT): 2454-9762 Available online at www.ijarmate.com
International Journal of Advanced Research in Management, Architecture, Technology and Engineering (IJARMATE) Vol. 2, Issue 3, March 2016
Implementation of Congestion Aware Routing Algorithm Using For NoC
Benila Arockia Selvi. A1, Sugapriya. R2
P.G. Scholar, Department of ECE, Vandayar Engineering College, Thanjavur, India1
Assistant Professor, Department of ECE, Vandayar Engineering College, Thanjavur, India2
Abstract— The effect of process variation (PV) on delay is a major cause of performance degradation in advanced technologies. The performance of different routing algorithms is determined with and without PV for various traffic patterns. The saturation throughput, congestion, and average message delay are used as performance metrics. PV increases the average message delay by up to 90% and decreases the saturation throughput by up to 29% compared with the nominal characteristics of different routing algorithms. A global routing algorithm is proposed for asynchronous network-on-chip design. Global routing is adaptive, low cost, and scalable. The global routing algorithm outperforms different adaptive routing algorithms in average delay and congestion for various traffic patterns. Global routing achieves up to 12%–32% lower average message delay than other routing algorithms. Moreover, the proposed scheme improves the saturation throughput by up to 11%–82% compared with PDCR routing algorithms.
Index Terms— Asynchronous design, PDCR Algorithm, congestion, delay, network on chip (NoC), process variation (PV), global routing algorithms
I. INTRODUCTION
The International Technology Roadmap for Semiconductors presents the process variation (PV) parameters as a critical challenge for IC manufacture [1]. Systematic and random variations are the two sources of PV [2]. As technology scales down, random variation becomes significantly larger than systematic variation [3]. Random variation appears in logic gates and interconnects. The impact of random PV emerges at both low and high levels of design. One of the key factors in designing a network on chip (NoC) is the routing algorithm. An efficient routing algorithm is required to achieve high performance. Hence, ignoring the impact of PV during the design of a routing algorithm results in unexpected average message delay and saturation throughput. Average message delay and saturation throughput are the two metrics used to evaluate the performance of a routing algorithm. The saturation throughput occurs when no additional messages can be injected successfully into the network [4].
Moreover, the average message delay increases exponentially beyond the network saturation point [4], [5]. As a hardware solution, a new router design is proposed to mitigate the PV impact [6]. In [7], a variation-adaptive variable-cycle router configures its cycle latency adaptively according to the spatial PV to increase the network frequency in a synchronous network. An adaptive routing algorithm for multicore NoC architectures is presented in [8] to reduce the saturation bandwidth degradation caused by PV. In [9], a source routing algorithm is introduced to enhance the communication speed in a NoC based on the PV.
Fig. 1. Asynchronous design with the RCU block.
To the best of our knowledge, the work presented in this paper is the first to investigate the impact of PV on different routing algorithms. Moreover, a novel adaptive routing algorithm that is aware of both PV and congestion is proposed for asynchronous NoC designs to reduce the effect of PV. The presented algorithm is applicable to any source of PV, since the technique is insensitive to the source of the variation. The novel routing algorithm uses the PV and congestion information as metrics to select the suitable output port (OP), as shown in Fig. 1. This paper is organized as follows. In Section II, PV in asynchronous NoC design is determined. The novel routing algorithm is described in Section III. A circuit-level implementation of asynchronous routers is used to analyze the delay of the NoC router with and without PV, as described in Section IV. In Section V, simulation results are provided. Conclusions are drawn in Section VI.
All Rights Reserved © 2016 IJARMATE
II. PROCESS VARIATION IN ASYNCHRONOUS NOC DESIGN
As a consequence of PV in NoC design, the maximum operating frequency of individual cores varies in the traditional synchronous NoC. A synchronous router sends a grant message in every clock cycle. Therefore, the frequencies of all routers in a synchronous design must match the lowest frequency. As a result, the saturation throughput and the network performance degrade as the impact of PV increases in a fully synchronous design. The effect of PV on the network performance of a synchronous NoC is very systematic. An asynchronous NoC, on the other hand, requires deeper analysis. The asynchronous NoC router sends a grant message when it gets a request or after it finishes transmitting a flit. Consequently, each router in an asynchronous NoC design can operate at its own maximum frequency, since the communication between the routers is organized by the handshaking process. To determine the delay variation under PV conditions, asynchronous NoC routers are designed. The asynchronous router is divided into an input port (IP), an output port (OP), and a routing control unit (RCU). The IP includes a dual-to-single converter, an asynchronous single-rail FIFO, and a single-to-dual converter. The OP is composed of C-elements and a dual-rail module. Bidirectional point-to-point interconnects are used for the communication between any two routers or between a router and a local resource. In addition, external lines are added to each router to exchange information about the congestion and the delay with PV (DPV) with its neighbors, as described in the following section. The handshake protocols are the bundled-data encoding for the single-rail protocol and the delay-insensitive encoding for the dual-rail protocol. Handshaking signals are necessary to synchronize data transfer between processing elements (PEs) in an asynchronous design. Global interconnects are distributed among the routers to transfer data and acknowledgment (ACK) signals.
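As a rough illustration of the frequency argument in this section, the sketch below contrasts the two clocking styles. The function names and the example frequencies are hypothetical and are not taken from the paper.

```python
def synchronous_frequency(max_freqs_ghz):
    """Fully synchronous NoC: every router is clocked at the slowest router's rate."""
    return min(max_freqs_ghz)


def asynchronous_frequencies(max_freqs_ghz):
    """Asynchronous NoC: handshaking lets each router run at its own maximum rate."""
    return list(max_freqs_ghz)


# Hypothetical per-router maximum frequencies under PV, in GHz.
freqs = [1.00, 0.92, 1.05, 0.81]
sync_freq = synchronous_frequency(freqs)
async_freqs = asynchronous_frequencies(freqs)
print("synchronous:", sync_freq, "asynchronous:", async_freqs)
```

Under this simplified view, a single slow router limits the whole synchronous network, whereas in the asynchronous case it only limits the paths that pass through it.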
The RCU is used to implement the routing algorithm and select the suitable OP for an incoming message. The structure of the asynchronous router is implemented to evaluate the delay with and without PV, as presented in Section IV. Random variation has two primary components: 1) gate variation and 2) interconnect variation. Random variation of logic gates changes the threshold voltage and effective channel length of the transistor. Interconnect variation occurs in the interconnect dimensions [width (W), height (H), space (S), and dielectric thickness (T)]. One of the major problems in NoC design is the considerable mismatch between two identical devices that occurs when the amount of random variation increases. PV increases the variance of delay compared with nominal values. The impact of gate delay variation on the asynchronous NoC router is described in Section II-A. Interconnect delay variation must also be considered when calculating the delay variation of the asynchronous NoC router, as described in Section II-B.
A. Router Delay Variation
The threshold voltage (Vth) and effective channel length (Lgate) are statistically independent and follow a Gaussian distribution. Negligible spatial correlation exists between Lgate and Vth of devices. The variation in circuit delay is evaluated using Monte Carlo (MC) simulation. The delay due to PV in the asynchronous router is taken as the average of the delays over all MC iterations, since the handshaking protocols are responsible for transferring valid data before starting the following transmission. Assuming a large NoC size, the total number of ports in the network, M, is determined by multiplying the number of routers by the number of ports per router. For a network of M ports, the average (or mean) delay determines the delay of any port. The circuit delay for each router is determined as follows [10]:
$$D_{rout} = \frac{1}{M} \sum_{i=0}^{M-1} D_{rout-i} \qquad (1)$$
where 0 ≤ i < M, Drout−i is the critical path delay in MC iteration i, and Drout is the average of the delay magnitudes.
B. Interconnect Delay Variation
Assuming that the dimension parameters are statistically independent and follow a Gaussian distribution, the variation in the electrical model of the interconnect (resistance, inductance, and capacitance) is determined. For data and ACK lines, the interconnect delay for each iteration i, Dint−i, is evaluated by considering the whole path from driver to load, including the RLC model of the line and the inserted repeaters. For M iterations, the average of the evaluated interconnect delay is given by
$$D_{int} = \frac{1}{M} \sum_{i=0}^{M-1} D_{int-i} \qquad (2)$$
The total delay for the asynchronous router, DASR, is calculated as follows:
$$D_{ASR} = D_{rout} + D_{int} \qquad (3)$$
The total delay under PV has a significant impact on circuit performance [10]. Christo Ananth et al. [11] proposed a secure hash message authentication code (HMAC) that avoids certificate revocation list (CRL) checking in vehicular ad hoc networks (VANETs). The group signature scheme is widely used in VANETs for secure communication, but existing systems based on it incur a verification delay from CRL checking. To overcome this delay, that work uses an HMAC, which avoids the time-consuming CRL checking and also ensures the integrity of messages. The HMAC and a digital signature algorithm are combined to make the scheme more secure. In this scheme, the group private keys are distributed by the roadside units (RSUs), which also manage the vehicles in a localized manner. Finally, cooperative message authentication is used among entities, in
which each vehicle only needs to verify a small number of messages, thus greatly alleviating the authentication burden. From another point of view, delay variation is the major cause of performance deterioration in different routing algorithms, as demonstrated in Section V. A novel routing algorithm should therefore be aware of the variation parameters (the DPV of the router and of the interconnect). It should also be able to adapt to congestion. Therefore, a novel adaptive routing algorithm is proposed and described in the following section.
III. PDCR ALGORITHM
Deterministic routing algorithms [12], in contrast to most adaptive routing algorithms [13], [14], define the path from the source to the destination irrespective of the congestion in the network. Adaptive routing algorithms outperform deterministic ones, since adaptive algorithms aim to select the less congested paths to balance the load in the network, especially under realistic traffic loads. However, taking only the congestion into consideration is not an effective methodology, since random PV leads to different delays for each router and interconnect in the network topology. An adaptive routing algorithm that ignores the DPV can therefore select a path with low congestion but high delay, which reduces the overall NoC performance. For this reason, the adaptive routing algorithm should be aware of both the DPV and the congestion to determine the most appropriate path. The process variation delay and congestion aware routing (PDCR) algorithm is introduced as a novel routing algorithm for asynchronous NoC routers [15]. PDCR gathers information about the congestion and DPV of the adjacent neighbors to make its routing decision. The DPV can be determined using test flit (TF) messages. The TF fields are described in Section III-A. The PDCR algorithm has different parameters, as defined and discussed in Section III-B.
The PDCR algorithm is described in Section III-C. The evaluation metrics for different routing algorithms are described in Section III-D.
A. Test Flit Description
A globally asynchronous locally synchronous technique is used: the asynchronous NoC design applies the handshake protocols between each pair of adjacent routers and provides a synchronous interface to each PE. The local clocks in the PEs are used to determine the timestamp (TS) that measures the DPV. The local clocks in the PEs are usually much faster than the communication speed, and the skew between those local clocks has only a minor effect on determining the variation in the delay. Each TF carries the TS that is stored in the PV table (PVT) of each router in the network. The calculation of the TS and the description of each entry in the PVT are presented in this section.
Assume a set R of routers in the network and a set P of communication ports. Let Pin and Pout be the sets of IPs and OPs of the current router (currR), where currR ∈ R. The output directions for each router are Pout = {N, E, W, S, C}, where N, E, W, S, and C are the north, east, west, south, and core OP directions, respectively. Each router sends one TF that carries the takeoff time to its neighbor routers. When the TF is received, each neighbor calculates the delay. Subsequently, each neighbor replies to the source router with a new TF that includes the TS value. Therefore, each router retains the delay values under PV toward its neighbors (the delay between its OPs and the IPs of its neighbors). The TS from the current router to each adjacent neighbor router (ANR) is given by
$$TS_d = D_{int-out}(currR) + D_{int-inp}(d), \quad d \in \{N, E, W, S\}, \quad \forall\, currR \in R \qquad (4)$$
where d is the direction of the OPs of the current router based on its position in the mesh topology, Dint−out(currR) is the DPV for the OP of the current router, and Dint−inp(d) is the DPV for the IP of each ANR. For mesh topology, the number of OPs differs for inside, border, and corner routers. Inside routers have five ports: four ports are connected to neighboring routers and one port to the PE. Border routers have four ports and corner routers have three ports. Each router contains a PVT, which consists of four entries. Each entry in the table contains an adjacent neighbor d ∈ Pout and its TS, TSd ∈ DPV. Border and corner routers have fewer adjacent neighbors, so NULL is used to fill the empty entries in their tables. The PVTs are determined at initialization time, when each router communicates with its neighbors to determine the value of the DPV. This process does not need to run more than once, since the values of the DPV do not change during the normal operation of the NoC and are independent of the routing algorithm. Therefore, the initialization time is incurred once. The estimated values can be sent using different methods, as defined in [16], depending on the implementation of the NoC router; this is outside the scope of this paper. A separate communication link is the method chosen here for sending the estimated DPV values and the congestion information between the routers. The calculation of the DPV and the congestion values is presented in Section III-B.
B. Modeling of DPV and Congestion
PDCR selects the OP based on information acquired about the DPV and congestion from the ANRs. The congestion can be
determined using the free buffer space of the IP of the neighbor router. The PVd for each OP is the DPV of the current router plus the TS with PV. PVd is given by
$$PV_d = D_{rout}(currR) + TS_d \qquad (5)$$
where Drout(currR) is the delay of the current router with PV. PDCR selects, from the admissible OPs, an OP that has the lowest congestion and the lowest DPV. However, there may not be an OP that satisfies both conditions: some OPs may have low congestion but high PV, or vice versa. This creates a dilemma in selecting the appropriate OP from the perspective of both congestion and DPV. Therefore, PDCR depends on two threshold values, PVthr and Cthr, where PVthr is the DPV threshold and Cthr is the congestion threshold. The OP at each router is chosen to balance avoiding congested regions against avoiding an OP with a considerable DPV. Thus, PDCR may prefer the OP with the lowest congestion provided that its DPV is acceptable, that is, within the specified threshold, as presented in Section III-C. The threshold values are determined at each router to select the suitable Pout to route the message. Assume that PDCR compares two admissible OPs (Pout|i and Pout|j) to choose the suitable Pout, where the congestion Ci of one port does not exceed the congestion Cj of the other port, while the DPV of the first port, PVi, is higher than that of the other port, PVj, by a value λPV:
$$C_i \le C_j, \qquad PV_i \ge PV_j + \lambda_{PV} \qquad (6)$$
Pout|i can then be the acceptable OP to route the message when
$$\lambda_{PV} \le PV_{thr} \qquad (7)$$
From (6) and (7),
$$PV_i - PV_j \le PV_{thr} \qquad (8)$$
Therefore, the DPV difference λPV between two admissible OPs should be less than or equal to PVthr for the message to be routed on this OP. The value of PVthr is calculated based on the average of the differences between the PV delays for each pair of ANR and currR. PVthr is given by
where PVi is the PV delay between the currR and the ANR in the i-direction and n is the number of ANRs of the current router. In addition, the value of Cthr is defined as follows:
where Ci is the congestion of the neighbor in the i-direction.
C. PDCR Procedure
The proposed algorithm can be divided into two procedures: 1) determining the target node (TN) and 2) the selection criterion for the OP. The details of the procedures are described in Sections III-C1 and III-C2.
1) Determining Target Node: At the source router, a random intermediate (IM) router is chosen between the source and the destination as an IM station during the message trip. The message therefore has two phases (ph0 and ph1) when it is routed from the source to the destination. In ph0, the message is routed from the source to the IM node. In ph1, the message is forwarded from the IM router to the destination router. This technique is used to avoid congested regions [17], [18]. In PDCR, a uniform random distribution function is used to select the random IM router between the source and the destination. In addition, phase (ph) and IM fields are added to each message to retain the message phase and the IM router identification (ID). Each router needs to determine whether the TN is the IM router or the destination router. When a router forwards the message to the TN, it applies the XY and YX routing algorithms to calculate the OP direction, which is encoded as an integer (e.g., N = 0). The integer value of the output direction is denoted by Pxy when the XY routing algorithm is used, and by Pyx when the YX routing algorithm is used to route the message to the TN. The pseudocode of the TN computation is shown in Fig. 2. The default value of the ph field of the message is zero. The ph field is changed from ph0 to ph1 in any of the following cases:
1) the current router is the IM router;
2) the currR is in the same row as the destination router (rx == dx);
3) the currR is in the same column as the destination router (ry == dy);
where rx and ry are the X and Y coordinates of the current router, and dx and dy are the X and Y coordinates of the destination node. Any one of the three conditions is sufficient to set the ph field to one, in which case the TN is assigned the destination router ID. When none of the three conditions holds, the ph field remains zero and the TN is assigned the IM field of the message.
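The XY and YX sub-algorithms used to obtain Pxy and Pyx can be sketched as follows. Only N = 0 is stated in the text; the remaining integer codes (taken from the order of Pout) and the axis orientation (x grows eastward, y grows northward) are assumptions made for illustration, as are the function names.

```python
from enum import IntEnum


class Dir(IntEnum):
    # N = 0 is given in the text; the other codes are assumed from the order of Pout.
    N = 0
    E = 1
    W = 2
    S = 3
    C = 4


def _x_step(rx, tx):
    # Assumption: x increases toward the east.
    return Dir.E if tx > rx else Dir.W


def _y_step(ry, ty):
    # Assumption: y increases toward the north.
    return Dir.N if ty > ry else Dir.S


def xy_direction(curr, target):
    """XY routing: correct the X offset first, then the Y offset."""
    (rx, ry), (tx, ty) = curr, target
    if rx != tx:
        return _x_step(rx, tx)
    if ry != ty:
        return _y_step(ry, ty)
    return Dir.C  # already at the target: deliver to the core


def yx_direction(curr, target):
    """YX routing: correct the Y offset first, then the X offset."""
    (rx, ry), (tx, ty) = curr, target
    if ry != ty:
        return _y_step(ry, ty)
    if rx != tx:
        return _x_step(rx, tx)
    return Dir.C


p_xy = xy_direction((0, 0), (2, 2))
p_yx = yx_direction((0, 0), (2, 2))
print(int(p_xy), int(p_yx))
```

When the current router and the TN differ in only one coordinate, both functions return the same direction, so there is nothing left for the selection step to decide.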
Fig. 2. Pseudocode of the TN computation procedure.
Fig. 4. Pseudocode of OP selection based on DPV and congestion.
Fig. 3. Exploiting the same path more than one time.
The last two conditions prevent the packet from using the same path more than once on its way to the destination. This situation is illustrated by the following example, shown in Fig. 3. The source node, destination node, and IM node are at (0, 1), (2, 3), and (2, 0), respectively. The path is calculated from srcID to the IM node based on DPV and congestion. The source (0, 1) routes the packet to (1, 1) and (2, 1) as the next hops. Then, the packet is forwarded to the IM (2, 0). In the second phase, the packet is routed from the IM node to destID. Based on DPV and congestion, the packet is sent from the IM (2, 0) to (2, 1). Therefore, the path between (2, 0) and (2, 1) is used more than once, which increases the number of hops between srcID and destID and increases the communication delay. In this case, the IM node and the destination node are in the same row. Consequently, when the packet reaches node (2, 1), the phase is changed from ph0 to ph1 and the TN is assigned the destination router ID (2, 3) instead of the IM (2, 0), as shown in Fig. 3. The last two ph conditions are therefore used to avoid such a scenario. Moreover, PDCR guarantees deadlock freedom: adopting the XY and YX routing algorithms as sub-algorithms ensures the deadlock-free condition [19], [20].
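The TN computation and the phase-change conditions can be sketched as below, using the coordinates of the example above as a check. This is a minimal sketch rather than the pseudocode of Fig. 2; the function name and the tuple-based message fields are hypothetical.

```python
def compute_target_node(curr, dest, im, ph):
    """Return the updated phase and the target node (TN) for a message.

    curr, dest, im are (x, y) router coordinates; ph is 0 (toward IM) or 1
    (toward destination). Following the text, equal X coordinates are the
    "same row" condition and equal Y coordinates the "same column" condition.
    """
    rx, ry = curr
    dx, dy = dest
    if ph == 0 and (curr == im or rx == dx or ry == dy):
        ph = 1  # any one of the three conditions switches the phase
    tn = dest if ph == 1 else im
    return ph, tn


# Example from the text: source (0, 1), destination (2, 3), IM (2, 0).
src, dest, im = (0, 1), (2, 3), (2, 0)
print(compute_target_node(src, dest, im, 0))     # still heading to the IM node
print(compute_target_node((2, 1), dest, im, 0))  # rx == dx, so the phase switches
```

Once the phase has switched to ph1, it is never reset, so the message keeps the destination as its TN for the rest of the trip.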
Fig. 5. Example for the same direction to OPs.
2) Selection Criterion: After applying the XY and YX routing algorithms, PDCR distinguishes between the two output directions (Pxy, Pyx) based on the congestion and DPV. At each router, the congestion (Cxy) of the neighbor router and the DPV (xyPV) between the currR and the neighbor router when the XY routing algorithm is used are compared with the congestion (Cyx) of the neighbor router and the DPV (yxPV) between the currR and the neighbor router when the YX routing algorithm is used. Fig. 4 contains the pseudocode of the selection criterion for the OP based on DPV and congestion. Comparing two ports using these six parameters leads to three main scenarios. First, if the output direction Pxy equals the output direction Pyx, PDCR routes the message in this direction, as shown in Fig. 5. If the IM is (0, 3) and the destination ID is (3, 3), then the OP Pxy equals Pyx = (1, 3). Second, with different OP directions, if the congestion Cxy equals Cyx, the DPV is used to choose the next hop. The output direction with the lowest DPV is always chosen as the route direction. Consequently, if the xyPV of the next router is less than yxPV, the message is routed in the XY direction, and vice versa. When xyPV equals yxPV, PDCR chooses the next hop direction randomly between Pxy and Pyx using a uniform random distribution function. Third, with dissimilar OP directions and congestion values (Cxy and Cyx), PDCR routes the message in the Pxy direction if Cxy is less than Cyx and xyPV satisfies one of the following criteria.
1) The xyPV is less than or equal to yxPV.
2) The xyPV is greater than yxPV by an acceptable value, that is, by no more than PVthr.
The opposite conditions apply if the Pyx direction is chosen by PDCR. If no output satisfies the previous criteria, the following conditions are applied. The Pxy direction is chosen to route the message to the TN when xyPV is less than yxPV and Cxy satisfies one of the following criteria.
1) Cxy is less than Cyx.
2) Cxy is greater than Cyx by an acceptable value, that is, by no more than Cthr.
If the congestion values (Cxy and Cyx) are not equal and the last two conditions do not produce an output direction, the route is selected randomly between Pxy and Pyx. The opposite conditions apply if the Pyx direction is chosen by PDCR. To evaluate the PDCR algorithm against other routing algorithms, two evaluation metrics are presented in Section III-D.
TABLE I SUFFICIENT NUMBER OF MC ITERATIONS
TABLE II PROCESS VARIATION PARAMETERS
D. Evaluation Metrics
Average message delay and saturation throughput are the two metrics used to evaluate the performance of routing algorithms [21]–[24]. The saturation throughput occurs when no additional messages can be injected successfully into the network. It can be measured as the injection rate at which the average message delay reaches twice the average zero-load delay (the lower bound on the average message delay) [21], [25], [26]. The average message delay [4] is determined at an injection rate of non-saturated traffic (IRNT), that is, below the saturation throughput point [21]. When the injection rate reaches the saturation point, the average message delay increases exponentially, as shown in Section V. The message delay is determined using a TS assigned to each generated message at the source node and read when the message is received at the destination node. The average message delay Dav is given by
$$D_{av} = \frac{1}{k} \sum_{i=1}^{k} D_i$$
where k is the total number of messages received at the destination nodes and Di is the delay of message i. The standard deviation of the average message delay, σAMD, represents the deviation from the average message delay. Because the routing algorithms have different average message delays, comparing their delay variation using the standard deviation is misleading, since it expresses an absolute measurement. Therefore, AMDvar represents the variation as a percentage of the mean value of the message delay for different routing algorithms, as presented in Section V. AMDvar is given by
$$AMD_{var} = \frac{\sigma_{AMD}}{\mu_{AMD}} \times 100\%$$
where AMDvar is the variation of the average message delay, σAMD is the standard deviation of the average message delay, and µAMD is the mean value of the message delay.
The impact of PV on the circuit implementation of the asynchronous router is demonstrated in Section IV. In Section V, the influence of PV on the performance of different routing algorithms and PDCR is presented.
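The OP selection criterion of Section III-C2 can be sketched as a single decision function. This is one reading of the prose description, not the pseudocode of Fig. 4: the function name and argument order are hypothetical, congestion is assumed to be a number where larger means more congested, and "acceptable value" is taken to mean a difference of at most the corresponding threshold.

```python
import random


def select_output_port(p_xy, p_yx, c_xy, c_yx, pv_xy, pv_yx, pv_thr, c_thr, rng):
    """Choose between the XY and YX output directions from congestion and DPV."""
    # Scenario 1: both sub-algorithms agree on the direction.
    if p_xy == p_yx:
        return p_xy
    # Scenario 2: equal congestion, so the lower DPV wins (random on a tie).
    if c_xy == c_yx:
        if pv_xy < pv_yx:
            return p_xy
        if pv_yx < pv_xy:
            return p_yx
        return rng.choice([p_xy, p_yx])
    # Scenario 3a: take the less congested port if its DPV is acceptable.
    if c_xy < c_yx and (pv_xy <= pv_yx or pv_xy - pv_yx <= pv_thr):
        return p_xy
    if c_yx < c_xy and (pv_yx <= pv_xy or pv_yx - pv_xy <= pv_thr):
        return p_yx
    # Scenario 3b: otherwise take the lower-DPV port if its congestion is acceptable.
    if pv_xy < pv_yx and (c_xy < c_yx or c_xy - c_yx <= c_thr):
        return p_xy
    if pv_yx < pv_xy and (c_yx < c_xy or c_yx - c_xy <= c_thr):
        return p_yx
    # No criterion produced a direction: pick one uniformly at random.
    return rng.choice([p_xy, p_yx])


rng = random.Random(0)
# XY port is less congested and only slightly slower than the YX port.
choice = select_output_port(1, 0, c_xy=1, c_yx=3, pv_xy=6.0, pv_yx=5.0,
                            pv_thr=2.0, c_thr=1, rng=rng)
print(choice)
```

The ordering of the checks gives congestion priority over DPV, which matches the order in which the text lists the criteria; the thresholds only decide how much of the other quantity may be traded away.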
IV. CIRCUIT LEVEL IMPLEMENTATION FOR ASYNCHRONOUS ROUTERS
Advanced Design System (ADS) tools are used to build the asynchronous router netlists. A 32-nm fabrication technology is used to implement the circuit [27]. ADS tools are used to implement the routers, and a distributed RLC model of the inter-router channel is used to implement complete NoC designs. Data/ACK interconnects are routed in the semiglobal metal layer (M8). The parameters of this layer are used to implement the interconnect model [28]. These parameters are W = 286 nm, S = 286 nm, H = 571 nm, and T = 585 nm for 32-nm technology. The model parameter values (resistance R, capacitance C, and inductance L) of the interconnect are calculated as R = 309.95 Ω, C = 0.369 pF, and L = 1.1 nH for a 0.59-mm interconnect [29]. Using this setup and technology and without PV, the nominal delay of the asynchronous router is 6 ns. Repeaters are used to divide the long interconnect into equal short segments. The delay of the longest interconnects is increased as the number of repeaters increases. The delay of the interconnect for mesh topology is 184 ps [29]. This interconnect length requires eight repeaters. The delay of the routers and that of the interconnects under PV parameters are modeled using ADS MC simulation. The number of routers and the number of ports in each router for mesh topology, in addition to the sufficient number of MC
iterations, are reported in Table I. The sufficient number of MC iterations equals the number of routers multiplied by the port count. The number of routers depends on the number of PEs. Assuming 16 PEs, the number of iterations is set to 100 for more accurate circuit results, although the sufficient number of iterations is less than 100, as listed in Table I. To determine the probability distribution function of the delay and the delay variation, the PV parameters are modeled using a Gaussian distribution. The variations of the process parameters for logic gates and interconnects are listed in Table II [1], [30].
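The iteration count and the averaging of per-iteration delays can be sketched as follows. The sketch does not reproduce the ADS flow: instead of sampling device and interconnect parameters and simulating the circuit, it draws each per-iteration delay directly from a Gaussian, which is a simplifying assumption. The relative spreads (0.10 and 0.30) and all function names are illustrative; the nominal delays (6 ns and 184 ps), the 16 routers, the five ports, and the 100 iterations come from the text.

```python
import random
import statistics


def sufficient_mc_iterations(num_routers, ports_per_router):
    """Sufficient MC iterations = number of routers x port count."""
    return num_routers * ports_per_router


def sample_delays(nominal, rel_sigma, iterations, rng):
    # Hypothetical stand-in for one circuit-level MC run per iteration.
    return [rng.gauss(nominal, rel_sigma * nominal) for _ in range(iterations)]


def delay_variation_percent(delays):
    """Variation expressed as a percentage of the mean delay."""
    return 100.0 * statistics.pstdev(delays) / statistics.fmean(delays)


rng = random.Random(0)
needed = sufficient_mc_iterations(16, 5)  # 16 routers, five ports each
iterations = 100                          # value used in the text, above `needed`

rout = sample_delays(6e-9, 0.10, iterations, rng)    # router delays
intc = sample_delays(184e-12, 0.30, iterations, rng)  # interconnect delays

d_rout = statistics.fmean(rout)  # average router delay
d_int = statistics.fmean(intc)   # average interconnect delay
d_asr = d_rout + d_int           # total asynchronous router delay
print(needed, d_asr, delay_variation_percent(rout))
```

Replacing `sample_delays` with results exported from a circuit simulator would leave the averaging and the variation computation unchanged.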
The probability distribution function of delay for the IP, the OP, and the interconnect of the asynchronous router is shown in Fig. 6 [29]. The delay variation Delayvar represents the variation as a percentage of the mean value of the delay [29]. The delay variation in the interconnect has a higher impact on circuit performance than the variation in logic as technology scales down. The delay variation at 32 nm for the asynchronous router and the interconnect in mesh topology is 10.27% and 35.35%, respectively [29]. The behavior of different routing algorithms is demonstrated under two situations (nominal and with PV), as shown in Section V.
Fig. 6. Probability distribution function of delay using different architectures for (a) IP of asynchronous router, (b) OP of asynchronous router, and (c) inter-router interconnect.
V. SIMULATION RESULTS
Different values of delay for routers and interconnects are calculated using the 32-nm fabrication technology. PV parameters are modeled using a Gaussian distribution. The PDCR algorithm and all the other routing algorithms are implemented in the heterogeneous NoC simulator (HNoCS) [31]. HNoCS is based on OMNeT++ [32], which supports modeling of asynchronous NoC routers. An 8 × 8 mesh network is constructed using HNoCS. Different traffic patterns (uniform, transpose, bit reverse, and bit complement) are applied to achieve a fair comparison between the algorithms. Traffic sources generate eight-flit packets. In addition, each FIFO buffer has a capacity of four flits. To guarantee the accuracy of the results, the simulation at each injection rate has been repeated 100 times with different traffic scenarios (generated randomly based on a uniform distribution). The average message delay has been calculated at each injection rate for all of the routing algorithms. To evaluate the performance of the PDCR algorithm, its average message delay and saturation throughput are compared with those of four other well-known routing algorithms, namely, odd–even (OE) [20], Romm [17], minimally adaptive XY (MAXY) [13], and dynamic adaptive deterministic (DyAD) [14]. In Section V-A, the effect of PV on the performance of different routing algorithms is provided. A comparison between PDCR and different routing algorithms is presented in Section V-B.
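Destination generation for three of the traffic patterns used in this section (uniform, bit reverse, and bit complement) can be sketched as below. The sketch assumes that each node is identified by an integer address, so an 8 × 8 mesh uses 6-bit addresses; that addressing scheme and the function names are assumptions, not details given in the paper.

```python
import random


def bit_reverse(addr, bits):
    """Destination whose address is the source address with its bit order reversed."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (addr & 1)
        addr >>= 1
    return out


def bit_complement(addr, bits):
    """Destination whose address is the bitwise complement of the source address."""
    return ~addr & ((1 << bits) - 1)


def uniform_destination(src, num_nodes, rng):
    """Any node other than the source, chosen with equal likelihood."""
    dst = rng.randrange(num_nodes - 1)
    return dst if dst < src else dst + 1


BITS = 6  # 8 x 8 mesh -> 64 nodes -> 6 address bits (assumed addressing)
rng = random.Random(0)
for src in (1, 6, 63):
    print(src, bit_reverse(src, BITS), bit_complement(src, BITS),
          uniform_destination(src, 1 << BITS, rng))
```

Both bit-manipulation patterns are their own inverse, so applying either mapping twice returns the source address.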
Fig. 7. Average message delay for different routing algorithms with/without PV under (a) uniform, (b) transpose, (c) bit complement, and (d) bit reverse traffic patterns.
A. Impact of PV on the Performance of Routing Algorithms

The average message delay as a function of injection rate is determined for the different routing algorithms with and without PV under various traffic patterns. Uniform, transpose, bit complement, and bit reverse are the assumed traffic patterns. In the uniform traffic pattern, every source node sends messages to the other nodes with equal likelihood. The destination address for the transpose, bit complement, and bit reverse traffic patterns is determined by manipulating the bits of the source address [1]. In the transpose traffic profile, for an n × n mesh network, a source at location (i, j) only sends data packets to the node at location (n − 1 − j, n − 1 − i). For a source node with bit address {b3, b2, b1, b0}, the traffic is sent to destination {b0, b1, b2, b3} for the bit reverse pattern and to destination {¬b3, ¬b2, ¬b1, ¬b0} for the bit complement pattern.

Romm, an oblivious routing algorithm, randomly picks an intermediate (IM) node located between the source and destination nodes to avoid congestion. OE, a partially adaptive turn-model routing algorithm, prohibits the east-to-north and east-to-south turns at any router located in an even column, and the north-to-west and south-to-west turns at any router located in an odd column. DyAD (dubbed from dynamic adaptive deterministic switching) judiciously switches between deterministic and adaptive routing based on the network congestion condition. Finally, MAXY is an adaptive routing algorithm that bases its decisions on congestion. OE, Romm, MAXY, and DyAD are tested with and without applying PV on the mesh topology.

PV affects the performance of the routing algorithms in different ways, as demonstrated in this section under the uniform, transpose, bit complement, and bit reverse traffic patterns shown in Fig. 7(a)–(d), respectively. As shown in Fig. 7, DPV increases the average delay relative to nominal. The saturation throughput and the average message delay of the different routing algorithms without PV (no process variation, NPV) and with PV are listed in Tables III and IV, respectively, under the various traffic patterns. The percentage change of both evaluation metrics under PV relative to the nominal (NPV) values is also listed in Tables III and IV.

Table III. Impact of the PV on the Saturation Throughput for Different Routing Algorithms

Table IV. Impact of the PV on the Average Message Delay for Different Routing Algorithms

As shown in Table III, PV reduces the saturation throughput of the different routing algorithms by at least 14% relative to the nominal case under the different traffic patterns. In addition, PV has a high impact on some algorithms, such as OE, which saturates at a 31% lower injection rate than nominal under the uniform traffic pattern. The average saturation throughput over the traffic patterns is also calculated. On average, Romm is the algorithm most affected by PV: it saturates at an injection rate 29% lower than nominal across the different traffic patterns, as reported in Table III.

The average message delay is determined for all routing algorithms at IRNT, as described. The impact of PV on the average message delay differs among the routing algorithms, as listed in Table IV. Because of PV, the average message delay increases by 28%–140% under the various traffic patterns. On average, OE has the highest increase in average message delay, 90% relative to nominal across the traffic patterns. In all cases, PV has a significant impact on the performance of the routing algorithm. Consequently, a routing algorithm should have information about the DPV of the routers and interconnects to avoid the negative effects on saturation throughput and average message delay. The novel routing algorithm based on DPV and congestion is simulated under the same conditions and compared with the other routing algorithms in Section V-B.

B. Simulation Results of PDCR

The performance of PDCR compared with the different routing algorithms with PV under the uniform, transpose, bit complement, and bit reverse traffic patterns is shown in Fig. 8(a)–(d), respectively. As shown in the panels of Fig. 8, PDCR outperforms the other routing algorithms and achieves an improvement in both the average message delay and the saturation point. The saturation throughput of PDCR under the various traffic patterns is listed in Table V, which also reports the percentage improvement of PDCR over the other routing algorithms with PV.

Fig. 8. Average message delay for PDCR and different routing algorithms with PV under (a) uniform, (b) transpose, (c) bit complement, and (d) bit reverse traffic patterns.

Table V. Improvement in Saturation Throughput with PDCR

PDCR performs better than the other routing algorithms under the uniform traffic pattern. As reported in Table V, its saturation throughput under uniform traffic exceeds that of Romm, MAXY, OE, and DyAD by 40%, 20%, 60%, and 30%, respectively. Under the transpose traffic pattern, PDCR continues to perform better than the other routing algorithms except for MAXY, which outperforms PDCR by 9% in saturation throughput. MAXY relies on the minimum distance and on congestion to reach the destination. Transpose traffic concentrates the load on individual source–destination pairs and, according to its traffic formula, places the destination on the diagonal of the source node, so the absolute differences between the source and the destination in the X and Y coordinates are the same. This situation suits the MAXY routing algorithm, because MAXY selects the next node, in either the X-direction or the Y-direction, through congestion arbitration. Consequently, the next step of MAXY is always toward the other direction, which reduces the absolute difference in the other coordinate. In that case, MAXY tends to create a zigzag path between each pair of source and destination nodes and can therefore reach its destination readily with minimum average message delay. Unlike PDCR, MAXY has considerable instability issues with other traffic patterns such as the bit complement pattern.

For the bit complement traffic pattern, PDCR outperforms the other routing strategies, with the saturation throughput increasing by between 12.5% and 25%. Furthermore, PDCR has a higher saturation throughput than the other adaptive routing schemes under the bit reverse traffic pattern: the saturation throughput increases by 41%, 16%, 83%, and 66% compared with Romm, MAXY, OE, and DyAD, respectively.

Moreover, PDCR has a lower average message delay under the various traffic patterns. As reported in Table VI, the average message delay of PDCR is determined at IRNT under the various traffic patterns, and the reduction in average message delay is reported. For uniform traffic, PDCR reduces the average message delay by up to 31% compared with OE. Under the transpose traffic pattern, the improvement in average message delay with PDCR versus OE is 47%. For the bit complement traffic pattern, under non-saturated traffic conditions, PDCR gives an improvement ranging from 6% to 39% in average message delay. Under the bit reverse traffic pattern, PDCR has a lower average message delay, with an improvement of 28.3% compared with OE. On average, PDCR reduces the average message delay by between 12% and 32% compared with the other approaches.

Table VI. Improvement in Average Message Delay with PDCR

The variation of the average message delay for PDCR and the different routing algorithms under the various traffic patterns is shown in Fig. 9. The OE algorithm has a higher AMDvar since it depends on deterministic minimal paths between the source–destination pairs, which increases the average message delay variation. As a consequence of basing PDCR on DPV and congestion, PDCR outperforms the other algorithms under all traffic patterns, except MAXY under transpose traffic, as mentioned above. As shown in Fig. 9, the variation of the average message delay for PDCR under the various traffic patterns is approximately 5% or less.

Fig. 9. Average message delay variation for PDCR and different routing algorithms with PV under various traffic patterns.

VI. CONCLUSION

Delay variation in logic gates and interconnects results from PV and impacts NoC design. The delay variation is a major reason for the deterioration of the performance of routing algorithms.
PV decreases the saturation throughput and increases the average message delay relative to nominal. This paper presents the first study of the influence of PV on different routing algorithms. Due to PV, the different routing algorithms saturate at a lower injection rate relative to nominal under the various traffic patterns. The saturation throughput of the different routing algorithms decreases with PV by between 15% and 31% under uniform traffic, 14% and 30% under transpose traffic, 22% and 28% under bit complement traffic, and 14% and 29% under bit reverse traffic. In addition, the average message delay of the different routing algorithms increases with PV by between 50% and 91% for uniform traffic, 29% and 77% for transpose traffic, 77% and 140% for bit complement traffic, and 34% and 79% for bit reverse traffic. On average, PV decreases the saturation throughput by up to 29%, 20%, 24%, and 18% for Romm, MAXY, OE, and DyAD, respectively, compared with the nominal values. Moreover, the average message delay increases compared with the nominal characteristics by up to 48% for Romm, 63% for MAXY, 90% for OE, and 84% for DyAD. A novel routing algorithm (PDCR) is implemented based on DPV and congestion. PDCR enhances the saturation throughput by up to 44%, 13%, 82%, and 54% compared with Romm, MAXY, OE, and DyAD, respectively. In addition, PDCR reduces the average message delay by up to 24%, 13%, 32%, and 15% compared with Romm, MAXY, OE, and DyAD, respectively. The PDCR routing algorithm is adaptive, low cost, and scalable for asynchronous NoC design.

REFERENCES

[1] International Technology Roadmap for Semiconductors. [Online]. Available: http://www.itrs.net/Links/2013ITRS/Home2013.htm, accessed 2013.
[2] D. Boning and S. Nassif, “Models of process variations in device and interconnect,” in Design of High-Performance Microprocessor Circuits, A. Chandrakasan, Ed. IEEE Press, 2000.
[3] R. Garg and S. P. Khatri, Springer-Verlag, 2010, pp. 153–171.
[4] P. P. Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh, “Performance evaluation and design trade-offs for network-on-chip interconnect architectures,” IEEE Trans. Comput., vol. 54, no. 8, pp. 1025–1040, Aug. 2005.
[5] M. Palesi, R. Holsmark, S. Kumar, and V. Catania, “Application specific routing algorithms for networks on chip,” IEEE Trans. Parallel Distrib. Syst., vol. 20, no. 3, pp. 316–330, Mar. 2009.
[6] C. Nicopoulos et al., “On the effects of process variation in network-on-chip architectures,” IEEE Trans. Dependable Secure Comput., vol. 7, no. 3, pp. 240–254, Aug. 2010.
[7] Y. Nakata, H. Kawaguchi, and M. Yoshimoto, “A process-variation-adaptive network-on-chip with variable-cycle routers and variable-cycle pipeline adaptive routing,” IEICE Trans. Electron., vol. E95.C, no. 4, pp. 523–533.
[8] Y. Markovsky, Y. Patel, and J. Wawrzynek, “Using adaptive routing to compensate for performance heterogeneity,” in Proc. ACM/IEEE Int. Symp. Netw.-Chip, May 2009, pp. 12–21.
[9] A. Sharifi and M. Kandemir, “Process variation-aware routing in NoC based multicores,” in Proc. ACM/EDAC/IEEE Design Autom. Conf., Jun. 2011, pp. 924–929.
[10] R. Ezz-Eldin, M. A. El-Moursy, and H. F. A. Hamed, “Asynchronous high throughput NoC under high process variation,” in Proc. IEEE 20th Int. Conf. Electron., Circuits, Syst. (ICECS), Dec. 2013, pp. 361–364.
[11] C. Ananth and M. D. Priyadharshini, “A secure hash message authentication code to avoid certificate revocation list checking in vehicular adhoc networks,” Int. J. Appl. Eng. Res. (IJAER), vol. 10, special issue 2, pp. 1250–1254, 2015.
[12] W. Dally and B. Towles, Principles and Practices of Interconnection Networks. San Mateo, CA, USA: Morgan Kaufmann, 2004.
[13] N. Rameshan, A. Biyani, M. Gaur, V. Laxmi, and M. Ahmed, “QoS aware minimally adaptive XY routing for NoC,” in Proc. Int. Conf. Adv. Comput. Commun., 2009, pp. 1–3.
[14] J. Hu and R. Marculescu, “DyAD - Smart routing for networks-on-chip,” in Proc. 41st Annu. Design Autom. Conf., Jul. 2004, pp. 260–263.
[15] R. Ezz-Eldin, M. A. El-Moursy, and H. F. A. Hamed, “Novel routing algorithm for minimum on delay with process variation and congestion in asynchronous NoC,” in Proc. IEEE Int. Conf. High Perform. Commun., Aug. 2015.
[16] S. Ranka et al., “Contemporary computing,” in Communications in Computer and Information Science, vol. 40. Berlin, Germany: Springer-Verlag, Aug. 2009.
[17] T. Nesson and S. L. Johnsson, “ROMM routing on mesh and torus networks,” in Proc. ACM Symp. Parallel Algorithms Archit., 1995, pp. 275–287.
[18] L. G. Valiant, “A scheme for fast parallel communication,” SIAM J. Comput., vol. 11, no. 2, pp. 350–361, 1982.
[19] M. Dehyadgari, M. Nickray, A. Afzali-Kusha, and Z. Navabi, “Evaluation of pseudo adaptive XY routing using an object oriented model for NOC,” in Proc. 17th Int. Conf. Microelectron., Dec. 2005, pp. 13–15.
[20] G. M. Chiu, “The odd-even turn model for adaptive routing,” IEEE Trans. Parallel Distrib. Syst., vol. 11, no. 7, pp. 729–738, Jul. 2000.
[21] E.-J. Chang, H.-K. Hsin, S.-Y. Lin, and A.-Y. Wu, “Path-congestion-aware adaptive routing with a contention prediction scheme for network-on-chip systems,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 33, no. 1, pp. 113–126, Jan. 2014.
[22] G. Ascia, V. Catania, M. Palesi, and D. Patti, “Implementation and analysis of a new selection strategy for adaptive routing in networks-on-chip,” IEEE Trans. Comput., vol. 57, no. 6, pp. 809–820, Jun. 2008.
[23] Perform. Anal. Syst. Softw., Apr. 2013, pp. 86–96.
[24] A.-Y. Wu et al., “Regional ACO-based cascaded adaptive routing for load balancing in mesh-based network-on-chip systems,” IEEE Trans. Comput., Mar. 2014.
[25] H.-K. Hsin, E.-J. Chang, and A.-Y. Wu, “Implementation of ACO-based selection with backward-ant mechanism for adaptive routing in network-on-chip systems,” IEEE Embedded Syst. Lett., vol. 5, no. 3, pp. 46–49, Sep. 2013.
[26] L. Shang, L.-S. Peh, and N. K. Jha, “PowerHerd: A distributed scheme for dynamically satisfying peak-power constraints in interconnection networks,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 25, no. 1, pp. 92–110, Jan. 2006.
[27] Predictive Technology Model. [Online]. Available: http://www.eas.asu.edu/~ptm, accessed 2011.
[28] The Nangate Open Cell Library. [Online]. Available: https://www.si2.org/openeda.si2.org/projects/nangatelib/, accessed 2008.
[29] R. Ezz-Eldin, M. A. El-Moursy, and H. F. A. Hamed, “High throughput asynchronous NoC design under high process variation,” Integr., VLSI J., vol. 49, pp. 1–13, Mar. 2015.
[30] G. Chen et al., “Predictions of CMOS compatible on-chip optical interconnect,” Integr., VLSI J., vol. 40, no. 4, pp. 434–446, Jul. 2007.
[31] Y. Ben-Itzhak, E. Zahavi, I. Cidon, and A. Kolodny, “HNOCS: Modular open-source simulator for heterogeneous NoCs,” in Proc. Int. Conf. Embedded Comput. Syst., Jul. 2012, pp. 51–57.
[32] A. Varga et al., “The OMNeT++ discrete event simulation system,” in Proc. Eur. Simulation Multiconf., Jun. 2001, pp. 319–324.
AUTHORS BIOGRAPHY
A. Benila Arockia Selvi received the B.E. degree in Electronics & Communication Engineering from Vandayar Engineering College, Tamilnadu, India, in 2014 and the Master’s degree in VLSI Design from Vandayar Engineering College, Tamilnadu, India, in 2016. Her area of interest is NoC design.
R. Sugapriya received the B.E. degree in Electronics & Communication Engineering from Annai Terasa College of Engineering, Tamilnadu, India, in 2011 and the Master’s degree in Applied Electronics from Jeyaram College of Engineering and Technology, Tamilnadu, India, in 2013. She is currently working as an Assistant Professor in the Department of Electronics & Communication Engineering at Vandayar Engineering College, Tamilnadu, India. Her area of interest is embedded systems.