Advances in Computational Sciences and Technology
ISSN 0973-6107 Volume 3 Number 2 (2010) pp. 223–235
© Research India Publications
/acst.htm
Run-Time Reconfigurable Pipelined Modified
Baugh-Wooley Multipliers
Aswathy Sudhakar1 and D. Gokila2
1,2VLSI Design Group, Department of Electronics and Communication Engineering Amrita School of Engineering, Amrita Vishwa Vidyapeetham,
Coimbatore-641105, India
E-mail:, 2d_gokila@cb.amrita.edu
Abstract
FPGA implementations of multipliers cause an area overhead problem, because multipliers are implemented separately for separate functionalities even when they perform the same calculation. This also increases power dissipation. Moreover, FPGA device sizes can only increase as fast as silicon process technology allows. Run-Time Reconfiguration removes this multiplier redundancy and is a very effective way of reducing FPGA area and thereby power consumption. In this paper, the basic structure of the Baugh-Wooley (BW) multiplier, which is shown to be the most suitable choice for multiplier reconfiguration, is modified to optimize constraints such as speed, area, accuracy and power consumption. Reconfiguration of the Baugh-Wooley multiplier is explained with respect to the optimum reconfiguration constraints. Techniques such as pipelining, true rounding and 2-D pipeline gating are incorporated in the modified BW multiplier design to optimize these constraints. The large speed overhead caused by reconfiguration, which limits its use in practical applications, is overcome with pipeline registers, placed so as to control the speed overhead that arises when the multiplier functionalities are reconfigured. Non-reconfigurable, non-reconfigurable pipelined and reconfigurable 8-stage pipelined modified BW multiplier architectures are compared for n = 8, 16, 32 and 64. The results show that partial run-time reconfiguration reduces area by 31-40%, pipelining improves speed by 4-5 times, and run-time reconfiguration reduces overall power dissipation by 2-5%.
Keywords: Run-Time Reconfiguration, Baugh-Wooley Multiplier, Fixed-Width Multiplier, Reconfiguration Speed, Sub-Word Multiplication, 2-D Pipeline Gating and Pipeline Register.
Introduction
Digital designs can be implemented on the configurable logic platform provided by FPGAs. FPGA-based designs are beginning to dominate almost all fields because of advantages such as simplicity, portability, high performance and low power dissipation. As their complexity relentlessly increases, FPGAs are being used in application domains that require intensive arithmetic, such as signal processing. FPGAs have now evolved to the point where they can host an entire system-on-chip. Recent advances in VLSI technology allow very complex digital-signal-processing algorithms to be implemented cost-effectively. At the same time, achieving high speed, reduced area and power efficiency becomes a major design challenge. FPGAs are very costly in terms of area and, more importantly, power consumption, and an era of multi-million-gate FPGAs is expected. Power consumption in a chip increases with the number of gates and also with the implementation of redundant modules that repeatedly serve the same functionality. This also increases heat dissipation, which can damage the device. A very effective solution is the reconfiguration of FPGAs.
Reconfiguration of multipliers in an FPGA reduces both power dissipation and area overhead, mainly by limiting the number of redundant modules. There are two levels of reconfiguration architecture [1]: design-time (static configuration of some architectural parameters) and run-time (dynamic reconfiguration of the overall functionality or algorithm implemented by the array). Dynamic reconfiguration is heavily used to implement a given computation; it exploits time multiplexing of the FPGA resources and gives the cells software-oriented flexibility. Dynamic or run-time reconfiguration (RTR) [1] is defined as the ability to change or modify the functional configuration of a device on the fly. Multiplication is a vital function in practically any DSP system, so the reconfiguration of multiplier blocks is considered here to improve the reconfiguration constraints. The variety of multiplier architectures introduced in [2, 3, 4] makes an efficient multiplier selection for a particular function necessary. These comparisons show that the Baugh-Wooley multiplier is the most power- and area-efficient, and that it is the best-suited choice for reconfiguration because of its high speed and regular architecture. Reconfiguration of BW multipliers [5] keeping the n+w most significant columns [6, 10] is performed to reduce power dissipation. However, this reduces accuracy in low-resolution fixed-width multiplication because of the redundant bits in the sign-extension circuit, which limits its reconfiguration efficiency. In [7], the accuracy degradation in fixed-width multipliers was reduced by truncation with rounding, which achieves accuracy almost equal to that of full rounding with little added circuit complexity. The 2-D pipeline gating [8] technique in the Baugh-Wooley reconfigurable multiplier minimizes the low-resolution errors because it improves the regularity of the computing architecture. However, it incurs area overhead and increased redundancy, which are minimized in [9] by the sub-word multiplication technique. There, the limited pipelining structures fail to achieve high speed, and 2-D pipeline gating causes a timing mismatch that increases latency. In [11], a BW multiplier included techniques to save power with truncated multiplication, but it has poor accuracy and increased redundancy. In [12], the accuracy degrades because of variation in the coordination of bit processing in the input samples. Thus, for almost all of the techniques discussed, the implementations incur a reconfiguration overhead that greatly decreases speed through increased latency, making reconfiguration difficult in practice.
In this paper, the Baugh-Wooley multiplier, the multiplier best suited for reconfiguration, is modified so that it can be reconfigured among three arithmetic functions. The three configuration modes (CM) are: 1) one n x n fixed-width multiplier, 2) two n/2 x n/2 fixed-width multipliers, and 3) one n/2 x n/2 full-precision multiplier. The constraints considered for optimization are power dissipation, area, speed and accuracy. Methods such as pipelining, reconfiguration, true rounding and 2-D pipelining are incorporated in the modified BW architecture to optimize these performance constraints and to overcome the large speed overhead of reconfiguration. Non-reconfigurable, non-reconfigurable pipelined and reconfigurable pipelined modified BW multipliers are also compared for n = 8, 16, 32 and 64.
The rest of the paper is organized as follows. The modified BW multiplier design is explained in Section 2. Section 3 gives the design of the reconfigurable modified BW multiplier with optimum constraints. Section 4 presents implementation results in terms of power, area and speed, together with comparisons. Section 5 concludes the paper.
Modified Baugh-Wooley (BW) Multiplier Design
The pipelined modified BW multiplier design for n=8 is shown in Figure 1(a). The techniques added to the conventional BW multiplier are pipelining, truncation with rounding and 2-D pipelining. The effect of pipelining on multiplication is shown in Figure 1(b). Registers, termed pipeline registers, are placed between the multiplication stages according to the required functionality, and the same technique is used here to modify the basic multiplier design for higher speed. As Figure 1(a) shows, the generated partial-product bits are stored in pipeline registers: after segregation of the partial products, a bank of pipeline registers follows the partial-product block. This avoids the delay and latency caused by the two's-complement calculation. Because each block that follows a pipeline register takes its input from that register, the preceding blocks can already operate on the next data item, which considerably reduces the processing time for a large stream of successive inputs. Pipelining also ensures that the partial-product bits are ready for summation simultaneously, removing the errors due to latency, which would otherwise propagate to the following stages of calculation. The pipeline registers shorten the combinational delay between register stages, thereby improving the speed. Here the multiplier module itself is pipelined along with the pipelined reconfigurable structure.
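The throughput argument can be made concrete with a small cycle-level model. This is a Python sketch, not the paper's hardware description: it assumes one register after every stage and a hypothetical three-stage split of an unsigned 4 x 4 multiplication (gen_pp, add_pairs and final_add are illustrative names), rather than the 8-stage partition used in the design.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def simulate_pipeline(stages: Sequence[Callable[[Any], Any]],
                      inputs: Sequence[Any]) -> List[Tuple[int, Any]]:
    """Clock a linear pipeline with one register after every stage.

    A new input enters on every clock; returns (cycle, result) pairs in the
    order results leave the last pipeline register.
    """
    regs: List[Optional[Any]] = [None] * len(stages)  # None marks an empty slot (bubble)
    outputs: List[Tuple[int, Any]] = []
    stream = list(inputs) + [None] * len(stages)       # trailing bubbles flush the pipe
    for cycle, new_input in enumerate(stream, start=1):
        # Walk from the last register back to the first, so each stage reads
        # the value its predecessor latched on the previous clock edge.
        for i in range(len(stages) - 1, 0, -1):
            regs[i] = None if regs[i - 1] is None else stages[i](regs[i - 1])
        regs[0] = None if new_input is None else stages[0](new_input)
        if regs[-1] is not None:
            outputs.append((cycle, regs[-1]))
    return outputs


# Hypothetical three-stage split of an unsigned 4 x 4 multiplication.
def gen_pp(ab):        # stage 1: generate the four shifted partial products
    a, b = ab
    return [(a if (b >> j) & 1 else 0) << j for j in range(4)]


def add_pairs(pp):     # stage 2: first level of the adder tree
    return [pp[0] + pp[1], pp[2] + pp[3]]


def final_add(s):      # stage 3: final addition
    return s[0] + s[1]


print(simulate_pipeline([gen_pp, add_pairs, final_add], [(3, 5), (7, 9), (15, 15)]))
# → [(3, 15), (4, 63), (5, 225)]
```

The first product appears only after three clocks (the pipeline latency), but from then on one product completes on every clock, which is where the speed gain for long streams of input data comes from.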
True rounding, or truncation with rounding of the partial products [8], improves the accuracy of the final output while also reducing power dissipation. True rounding adds a ‘1’ at the nth least significant bit position of the product, which controls the error introduced when truncation is performed. After segregation, the negative partial products are inverted and added to the positive partial products. The inversion is done by 2-D pipelining [7] to maintain the regularity of the design. Adding the partial products formed in this way gives the final product. The design therefore gains considerably in speed from pipelining and in accuracy from true rounding.
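The inversion and rounding steps follow from the standard modified Baugh-Wooley identity, reproduced here as a textbook reference form rather than as the exact bit layout of Figure 1 or Figure 2. With $A = -a_{n-1}2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$ and B defined likewise, each negative row $-2^{n-1}\sum_{k=0}^{n-2} x_k 2^k$ equals $2^{n-1}\sum_{k=0}^{n-2}\overline{x_k}\,2^k + 2^{n-1} - 2^{2n-2}$, so the two negative rows become inverted bits plus the constant $2^{n} - 2^{2n-1} \equiv 2^{n} + 2^{2n-1} \pmod{2^{2n}}$. True rounding then adds $2^{n-1}$ (the ‘1’ at the nth least significant bit) before the upper n bits are kept, with P read as a signed value:

```latex
\begin{aligned}
P = A \cdot B \equiv{}& a_{n-1} b_{n-1}\, 2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} a_i b_j\, 2^{i+j} \\
&+ \sum_{k=0}^{n-2} \left( \overline{a_k b_{n-1}} + \overline{a_{n-1} b_k} \right) 2^{k+n-1} + 2^{n} + 2^{2n-1} \pmod{2^{2n}}, \\
P_{\mathrm{r}} ={}& \left\lfloor \frac{P + 2^{n-1}}{2^{n}} \right\rfloor .
\end{aligned}
```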
Figure 1: (a) Efficient fixed-width modified BW multiplier for n=8; (b) Pipelining in multiplication.
Design of Reconfigurable Modified BW Multiplier for Optimum Constraints
The performance constraints considered here are: 1) power dissipation, 2) area, and 3) speed. The partial-product array diagram for an n x n modified BW multiplier is shown in Figure 2: Figure 2(a) shows the generalized array and Figure 2(b) shows the array for n=8. Figure 2(c) shows the partial products that remain after truncation; here the LSB columns are masked off by truncation. The parameter w sets the length of the retained product, i.e. the n+w most significant columns are kept. In this design, w=1.
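The truncation step can be checked numerically with a behavioral Python sketch. It assumes the textbook modified Baugh-Wooley array (not necessarily the exact bit arrangement of Figure 2) and reads w as the number of columns kept beyond the n output columns, as in the n+w scheme cited in the introduction. The names bw_array, fixed_width_bw and rounded_exact are hypothetical. Because every entry of the array has non-negative weight, discarding columns can only lower the sum, so the error against the exactly rounded product is one-sided.

```python
from typing import List, Tuple


def to_signed(v: int, bits: int) -> int:
    """Interpret the low `bits` bits of v as a two's-complement number."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def bw_array(a: int, b: int, n: int) -> List[Tuple[int, int]]:
    """(column, bit) entries of a textbook modified Baugh-Wooley array
    for n-bit two's-complement operands a and b (assumed layout)."""
    ab = [(a >> i) & 1 for i in range(n)]  # >> sign-extends negative ints
    bb = [(b >> j) & 1 for j in range(n)]
    entries = [(i + j, ab[i] & bb[j]) for i in range(n - 1) for j in range(n - 1)]
    entries.append((2 * n - 2, ab[n - 1] & bb[n - 1]))
    for k in range(n - 1):                      # inverted (negative) partial products
        entries.append((k + n - 1, 1 - (ab[k] & bb[n - 1])))
        entries.append((k + n - 1, 1 - (ab[n - 1] & bb[k])))
    entries += [(n, 1), (2 * n - 1, 1)]         # correction constants 2^n + 2^(2n-1)
    return entries


def fixed_width_bw(a: int, b: int, n: int, w: int) -> int:
    """Keep the n + w most significant columns, add the true-rounding '1'
    at bit n-1, and return the upper n bits as a signed value."""
    cut = n - w                                  # columns 0 .. cut-1 are discarded
    total = sum(bit << col for col, bit in bw_array(a, b, n) if col >= cut)
    total += 1 << (n - 1)
    return to_signed(total >> n, n)              # masking = taking the sum mod 2^(2n)


def rounded_exact(a: int, b: int, n: int) -> int:
    """Reference: full-precision product, rounded to its upper n bits."""
    return (a * b + (1 << (n - 1))) >> n


n = 6
rng = range(-(1 << (n - 1)), 1 << (n - 1))
errors = [rounded_exact(a, b, n) - fixed_width_bw(a, b, n, w=1) for a in rng for b in rng]
print(min(errors), max(errors))
```

Setting w=n keeps every column and reproduces the exactly rounded product; a smaller w trades accuracy for fewer partial-product bits to add.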
Figure 2: (a) Partial Product Array Diagram for an n-bit modified BW Multiplier, (b) Partial Product Array Diagram for an 8-bit modified BW Multiplier and (c) Partial products after truncation.
Figure 3: Design of reconfigurable modules (MUL1, MUL2 and MUL3).
The modules are defined by the sub-word multiplication technique [6], as shown in Figure 4. Here X = [x7 x6 … x0] and Y = [y7 y6 … y0] are the two 8-bit inputs. The sub-words are X1 = [x7 x6 x5 x4], X0 = [x3 x2 x1 x0], Y1 = [y7 y6 y5 y4] and Y0 = [y3 y2 y1 y0]. The MUL1 and MUL2 modules perform fixed-width 4x4 multiplication, and the MUL3 module performs full-precision 4x4 multiplication. These modules are used to reconfigure between the functionalities.
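A behavioral Python sketch of the three configuration modes follows. It models arithmetic only, not gates, and it rests on assumptions not stated in this section: in modes 2 and 3 each sub-word is treated as an independent signed n/2-bit operand, MUL1 is assumed to take (X1, Y1), MUL2 (X0, Y0), and MUL3 the low sub-words (X0, Y0). Fixed-width outputs use true rounding. The function names are hypothetical.

```python
def to_signed(v: int, bits: int) -> int:
    """Interpret the low `bits` bits of v as a two's-complement number."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def round_fixed_width(p: int, width: int) -> int:
    """True rounding: add '1' at bit width-1, then keep the upper bits."""
    return (p + (1 << (width - 1))) >> width


def reconfigurable_multiply(mode: int, x: int, y: int, n: int = 8):
    """Hypothetical behavioral model of configuration modes CM1-CM3."""
    h = n // 2
    x1, x0 = to_signed(x >> h, h), to_signed(x, h)   # sub-words X1, X0
    y1, y0 = to_signed(y >> h, h), to_signed(y, h)   # sub-words Y1, Y0
    if mode == 1:   # one n x n fixed-width product
        return round_fixed_width(to_signed(x, n) * to_signed(y, n), n)
    if mode == 2:   # two n/2 x n/2 fixed-width products (assumed MUL1, MUL2)
        return round_fixed_width(x1 * y1, h), round_fixed_width(x0 * y0, h)
    if mode == 3:   # one n/2 x n/2 full-precision product (assumed MUL3)
        return x0 * y0
    raise ValueError(f"unknown configuration mode {mode}")
```

For example, reconfigurable_multiply(2, 0x35, 0x26) treats the operands as the packed pairs (3, 5) and (2, 6) and returns the two rounded 4-bit products (0, 2), while mode 3 on the same operands returns the full-precision product 30.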