
FiPRe: An Implementation Model to Enable Self-Reconfigurable Applications

Leandro Möller, Ney Calazans, Fernando Moraes, Eduardo Brião, Ewerson Carvalho, Daniel Camozzato

Pontifícia Universidade Católica do Rio Grande do Sul (FACIN-PUCRS),

Av. Ipiranga, 6681 - P 30/Bl 4 - 90619-900 - Porto Alegre – RS– BRASIL

{moller,calazans,moraes,briao,ecarvalho,camozzato}@inf.pucrs.br

Abstract. ASIPs and reconfigurable processors are architectural choices to extend the capabilities of a given processor. ASIPs suffer from fixed hardware after design, while reconfigurable processors suffer from the lack of a pre-established instruction set, making them difficult to program. As an intermediate choice, reconfigurable coprocessor systems (RCSs) contain dedicated hardware (coprocessors) coupled to a standard processor core to accelerate specific tasks, allowing hardware functionalities to be inserted or substituted at execution time. This paper proposes a generic model for RCSs, targeted to reconfigurable devices with self-reconfiguration capabilities. A proof-of-concept case study is presented as well.

1. Introduction

A single processor may meet the requirements of several embedded system scenarios if it is somehow parameterizable. Application-specific instruction set processors (ASIPs) and reconfigurable instruction set processors (RISPs) [1] are two opposite forms of implementing processors with regard to runtime parameterization trade-offs. ASIPs provide flexibility and performance at the cost of extra silicon area for each new function directly supported in hardware. If the application requires a new specific functionality, the ASIP must be redesigned. RISPs are processors where some or all instructions are implemented as dedicated hardware and loaded on demand, according to the software execution flow. Here, the highest degree of flexibility is achieved. However, the lack of a pre-established fixed instruction set makes it harder to generate object code for new applications. This occurs because each new function must be supported simultaneously by the dedicated hardware and by the compiler.

An intermediate solution, named reconfigurable coprocessor systems (RCSs), is addressed in this paper. Like ASIPs and RISPs, RCSs contain dedicated hardware (coprocessors) to accelerate specific tasks. However, these coprocessors are not fixed at design time as in ASIPs. In both RISPs and RCSs, it is possible to insert or substitute hardware functionalities at execution time without having to redesign the processor. Contrary to what happens in RISPs, RCSs contain a standard processor core with a fixed instruction set, enabling the use of standard compilers.

The processor and the parameterizable parts are loosely coupled in RCSs and tightly coupled in RISPs and ASIPs. Additionally, ASIPs and RISPs are inherently sequential approaches, while RCSs may benefit from the parallel execution of the processor software and the dedicated computations in each coprocessor. In this case, communication between the processor and the coprocessors can be achieved through the use of interrupts.
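The loose, interrupt-driven coupling described above can be illustrated with a small software model. This is a hypothetical sketch, not the R8R mechanism: a Python thread stands in for a coprocessor computing in parallel with the processor, and a callback stands in for the completion interrupt. The names `Coprocessor`, `start` and `on_interrupt` are illustrative assumptions.

```python
import threading


class Coprocessor:
    """Hypothetical model of a loosely coupled coprocessor.

    A Python thread stands in for the dedicated hardware; the callback
    passed to start() plays the role of the completion interrupt.
    """

    def __init__(self, operation):
        self.operation = operation

    def start(self, operands, on_interrupt):
        # Launch the computation and return immediately (non-blocking call).
        worker = threading.Thread(
            target=lambda: on_interrupt(self.operation(*operands))
        )
        worker.start()
        return worker


results = []
done = threading.Event()


def on_interrupt(value):
    # "Interrupt service routine": store the result, then signal the processor.
    results.append(value)
    done.set()


multiplier = Coprocessor(lambda a, b: a * b)
multiplier.start((1234, 5678), on_interrupt)

# Meanwhile, the processor keeps executing its own software.
software_sum = sum(range(100))

done.wait()  # block only when the coprocessor result is actually needed
print(software_sum, results[0])
```

The processor only waits at the point where it needs the coprocessor result, which is the kind of parallelism that sequential ASIP and RISP instruction execution does not offer.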

A potential performance bottleneck faced by embedded applications in RISPs and RCSs is the latency to perform hardware reconfiguration, which can be orders of magnitude longer than the time to perform application atomic operations. To reduce or eliminate this problem, RISPs and RCSs assume the existence of an infrastructure to control the storage and the dynamic loading of hardware configurations, usually called a configuration controller [2].

Consider the current trend to increase the number of embedded processors in SoCs, leading to the concept of “sea of processors” systems [3], and add to this the arguments above. From these, it is possible to justify the objective of this paper, which comprises proposing a generic implementation model for RCSs, called FiPRe, and introducing a case study used to evaluate the ideas behind the model.

2. The FiPRe Implementation Model and the R82R Case Study

The FiPRe (Fixed core Processor connected to Reconfigurable Coprocessors) model is conceived to allow self-reconfigurable applications implemented as RCSs, and can be understood from the case study example in Fig. 1. First, there is a Fixed Region, containing the processor that initiates reconfiguration actions. This region also contains a configuration controller (CC), which manages the details of the reconfiguration process. External devices that provide input/output capabilities for the embedded system are also part of the model. Besides, a memory is needed to store coprocessor bitstreams; this block is named Configuration Memory. Finally, the model assumes the existence of a Reconfigurable Region that contains a subset of configured coprocessors. This region presents data exchange and configuration interfaces to the rest of the system.

A fixed instruction set processor provides advantages in terms of code and hardware reuse, because neither the processor nor the compiler needs to be changed when developing coprocessors to achieve performance and functionality goals.

The CC handles coprocessor selection and dismissal requests issued by the processor. When a selection is executed, the configuration memory is accessed and a coprocessor bitstream is sent to the configuration interface.
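The CC behavior described above can be sketched as a small behavioral model. This is a minimal sketch, assuming a dictionary-based configuration memory keyed by coprocessor name and a configuration interface modeled as a list of (area, bitstream) writes. The class and method names (`ConfigurationController`, `select`, `dismiss`) and the placeholder byte strings are hypothetical; the real CC is a hardware block driven by the instructions of Table 1, which are not reproduced here.

```python
class ConfigurationController:
    """Hypothetical software model of the FiPRe configuration controller (CC)."""

    def __init__(self, configuration_memory, num_areas):
        # configuration_memory: coprocessor name -> partial bitstream (bytes)
        self.configuration_memory = configuration_memory
        # Which coprocessor occupies each reconfigurable area (None = empty).
        self.areas = [None] * num_areas
        # Stand-in for the device configuration interface: (area, bitstream) writes.
        self.configuration_interface = []

    def select(self, name, area):
        """Load coprocessor `name` into `area`, unless it is already there."""
        if self.areas[area] == name:
            return False  # already configured: no reconfiguration needed
        bitstream = self.configuration_memory[name]  # access configuration memory
        self.configuration_interface.append((area, bitstream))  # send bitstream
        self.areas[area] = name
        return True

    def dismiss(self, area):
        """Mark `area` as free so that another coprocessor may be loaded there."""
        self.areas[area] = None


memory = {"mul": b"\x01" * 4, "div": b"\x02" * 4, "sqrt": b"\x03" * 4}
cc = ConfigurationController(memory, num_areas=2)  # two areas, as in R82R
cc.select("mul", 0)
cc.select("sqrt", 1)
cc.select("mul", 0)  # already loaded: nothing is sent
cc.dismiss(1)
cc.select("div", 1)
print(cc.areas, len(cc.configuration_interface))
```

Skipping the transfer when the requested coprocessor is already resident is a design choice of this sketch, motivated by the high cost of each reconfiguration.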

In order to evaluate the FiPRe model for implementing RCS embedded systems, an example case study, named R82R, was designed and implemented. The system was implemented in a single Virtex-II device, with the exception of the Configuration Memory. The case study employed a soft core processor customized for the FiPRe model. The changes made to the original processor were the addition of specific instructions to support reconfiguration (Table 1) and of a specific external interface to the […]

Fig. 1. General structure of the R8NR system.

Fig. 1 displays the organization of the R82R system. The system is composed of three main modules: a host computer, providing an interface with the system user; a configuration memory, containing all partial bitstreams used during system execution; and the FPGA, containing the fixed and reconfigurable regions of the R82R system. The fixed part in the FPGA comprises the R8R processor [4], its local memory, containing instructions and data, a system bus controlled by an arbiter, and peripherals (the serial interface and the configuration controller). The serial interface peripheral provides communication with the host computer through an RS232 serial link. The CC is a specialized hardware module acting as a slave of both the R8R and the host computer; the latter fills the configuration memory before system execution starts.

The R8R processor was wrapped to provide communication with (i) the local memory; (ii) the system bus; (iii) the CC; and (iv) the reconfigurable region. The interface to the reconfigurable areas comprises three identical sets of signals, interconnected through special components furnished by the FPGA vendor, called bus macros.

Table 1. Instructions added to the R8 processor in order to produce the R8R processor.

3. Results

The R8NR system has been prototyped and is operational in two versions, with one and two reconfigurable areas, respectively (R81R and R82R). A V2MB1000 board from Insight-Memec was employed in the prototyping process.

Fig. 2 compares hardware and software versions of three 16/32-bit arithmetic modules (multiplication, division and square root) in terms of execution time versus the number of operations executed. Note that for each module the execution time grows linearly, but with different slopes for the software and hardware implementations. Also, the reconfiguration time adds a fixed latency (~10 ms) to the hardware implementations. This fixed latency approximates the measured time to configure one FPGA area dedicated to holding one coprocessor.
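The linear trends described above can be summarized by a simple cost model. The symbols are introduced here for illustration and do not appear in the paper: n is the number of operations, c_sw and c_hw are the per-operation execution times in software and hardware, and t_rec ≈ 10 ms is the fixed reconfiguration latency. Assuming c_sw > c_hw, the hardware version becomes faster once n exceeds a break-even point n*:

```latex
\begin{aligned}
t_{sw}(n) &= n\,c_{sw}\\
t_{hw}(n) &= t_{rec} + n\,c_{hw}\\
t_{hw}(n) < t_{sw}(n) &\iff n\,(c_{sw} - c_{hw}) > t_{rec}
  \iff n > n^{*} = \frac{t_{rec}}{c_{sw} - c_{hw}}
\end{aligned}
```

Under this model, n* grows with the reconfiguration latency and shrinks as the per-operation gain c_sw − c_hw increases.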

The hardware reconfiguration latency, 10 ms, depends on the size of the reconfigurable area partial bitstream and on the performance of the CC module. This graph was plotted for a CC working at 24 MHz and for reconfigurable bitstreams of 46 Kbytes, corresponding to a reconfigurable coprocessor with an area of roughly one tenth of the employed million-gate device.
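As a rough consistency check on these figures, the effective configuration throughput implied by the measured latency can be estimated. This sketch rests on two simplifying assumptions that the paper does not state: 1 Kbyte = 1024 bytes, and the whole 10 ms is spent transferring the 46-Kbyte bitstream.

```python
BITSTREAM_BYTES = 46 * 1024  # 46-Kbyte partial bitstream (assuming 1 K = 1024)
LATENCY_S = 10e-3            # measured reconfiguration latency, ~10 ms
CC_CLOCK_HZ = 24e6           # CC clock frequency, 24 MHz

# Bytes delivered to the configuration interface per second.
throughput_bytes_per_s = BITSTREAM_BYTES / LATENCY_S
# CC clock cycles spent, on average, per bitstream byte.
cycles_per_byte = CC_CLOCK_HZ * LATENCY_S / BITSTREAM_BYTES

print(f"{throughput_bytes_per_s / 1e6:.2f} MB/s, "
      f"{cycles_per_byte:.1f} CC cycles per byte")
```

Because the estimate is linear in both quantities, halving either the bitstream size or the number of CC cycles per byte would, under the same assumptions, halve the reconfiguration latency.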

Fig. 2. Execution time versus the number of operations for three arithmetic modules, multiplication, division and square root, implemented in HW (hw suffix) and SW (sw suffix).

The break-even point for each functionality determines when a given hardware implementation starts to be advantageous over a plain software implementation, based on the number of times this hardware is employed before it is reconfigured. From the graph, it can be seen that the multiplication, division and square root coprocessors become advantageous starting from 750, 260 and 200 executions, respectively, without an intervening reconfiguration step. Consider applying a filter (e.g. edge detection or smoothing) over an 800x600-pixel image. If only one operation is applied per pixel, 480,000 operations are executed, easily justifying the use of a hardware coprocessor. This simple experiment highlights how, in practice, it is possible to take advantage of RCSs, achieving performance gains, flexibility and system area savings.
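The reported break-even points can also be turned around to estimate the per-operation time each coprocessor saves. Assuming a linear cost model in which the break-even point is n* = t_rec / (c_sw − c_hw), with t_rec ≈ 10 ms the reconfiguration latency and c_sw, c_hw the software and hardware per-operation times, the saving c_sw − c_hw equals t_rec / n*. This is a back-of-the-envelope sketch: the savings are inferred from the break-even points rather than measured, and the helper names are hypothetical.

```python
T_REC_S = 10e-3  # fixed reconfiguration latency (~10 ms)

# Break-even points read from Fig. 2 (executions without reconfiguration).
BREAK_EVEN = {"multiplication": 750, "division": 260, "square root": 200}


def saving_per_op(break_even_ops, t_rec=T_REC_S):
    """Per-operation time saved, c_sw - c_hw, from n* = t_rec / (c_sw - c_hw)."""
    return t_rec / break_even_ops


def hardware_pays_off(num_ops, break_even_ops):
    """True when num_ops exceeds the break-even point n*."""
    return num_ops > break_even_ops


pixels = 800 * 600  # one operation per pixel of an 800x600 image
for name, n_star in BREAK_EVEN.items():
    # Net gain = n * (c_sw - c_hw) - t_rec, i.e. t_sw(n) - t_hw(n).
    net_gain_s = pixels * saving_per_op(n_star) - T_REC_S
    print(f"{name}: {saving_per_op(n_star) * 1e6:.1f} us/op saved, "
          f"net gain for {pixels} ops = {net_gain_s:.2f} s, "
          f"pays off: {hardware_pays_off(pixels, n_star)}")
```

With 480,000 operations, every module is several hundred times past its break-even point, so the fixed 10 ms reconfiguration cost becomes negligible.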

The R82R case study was synthesized using Leonardo Spectrum. The area report data for the fixed modules is presented in Table 2.

Table 2. Area report data for an XC2V1000 FPGA and for 0.35 µm ASIC CMOS technology.

Module            ASIC Gates   LUTs    FFs   %LUTs
R82R                    6331   1020    555    9.96
Memory                  3139    307    366    2.99
Serial Interface        5430    616    607    6.01
CC                      2790    493    218    4.81
Arbiter                  157     27     15    0.26
Total                  17847   2443   1761   23.85
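The %LUTs column is consistent with each module's LUT count divided by the 10,240 LUTs of the XC2V1000 (5,120 slices with two LUTs each), truncated to two decimal places. The short check below reproduces the column from the LUT counts; the truncation rule is inferred from the numbers and is not stated in the paper.

```python
DEVICE_LUTS = 10_240  # LUTs in a Xilinx XC2V1000 (5,120 slices x 2 LUTs)

lut_counts = {
    "R82R": 1020,
    "Memory": 307,
    "Serial Interface": 616,
    "CC": 493,
    "Arbiter": 27,
    "Total": 2443,
}


def percent_luts(luts, device_luts=DEVICE_LUTS):
    """Percentage of device LUTs, truncated (not rounded) to two decimals."""
    hundredths = luts * 10_000 // device_luts  # integer math avoids float error
    return f"{hundredths // 100}.{hundredths % 100:02d}"


for module, luts in lut_counts.items():
    print(f"{module:<17} {percent_luts(luts):>6}")
```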

Configuration controllers (CCs) found in the literature are mostly software implementations. The CC proposed here was implemented in hardware, has a small area footprint (around 3,000 gates) and is expected to present superior performance over software versions in terms of reconfiguration speed. Another important advantage of implementing the CC in hardware is that the embedded processor is free to execute tasks in parallel during the reconfiguration process.

The Modular Design method [5], employed for generating partial bitstreams, limits the minimum size of a reconfigurable area: the smallest reconfigurable area contains 1280 LUTs (each column contains 320 LUTs). Nevertheless, the implemented coprocessors use on average 140 LUTs. Therefore, much larger coprocessors could be implemented in these areas, for example simple dedicated processors, FFT operators and image filters.
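The area figures above can be put in proportion with a short calculation. It assumes the XC2V1000 device total of 10,240 LUTs, that is, 32 columns of 320 LUTs; this device figure is not given in this paragraph of the paper.

```python
DEVICE_LUTS = 10_240   # XC2V1000 LUT count (assumed device data)
LUTS_PER_COLUMN = 320  # from the text
MIN_AREA_LUTS = 1280   # minimum reconfigurable area, from the text
AVG_COPROC_LUTS = 140  # average coprocessor size, from the text

columns_in_min_area = MIN_AREA_LUTS // LUTS_PER_COLUMN  # columns per minimum area
area_share_of_device = MIN_AREA_LUTS / DEVICE_LUTS      # fraction of the device
coproc_utilization = AVG_COPROC_LUTS / MIN_AREA_LUTS    # fraction of the area used

print(columns_in_min_area,
      f"{area_share_of_device:.1%}",
      f"{coproc_utilization:.1%}")
```

Under these assumptions, the minimum area spans four columns and one eighth of the device, while an average coprocessor occupies only about 11% of it, which leaves ample room for larger coprocessors.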

4. Conclusions

The major contribution of the present work is the FiPRe model for RCSs. One advantage of the model is the parallelism between the processor and the coprocessors, enabling the use of non-blocking operations.

Also, the compiler does not need to be changed when a new coprocessor is added. On the other hand, an increased communication latency may be observed, since the system parts are loosely coupled.

Also, since RCSs are reconfigurable systems, they can potentially reduce the final system cost, as the user can employ smaller configurable devices and download partial bitstreams on demand. In addition, partial system reconfiguration makes it possible to benefit from a virtual hardware approach, in the same manner that present-day computers benefit from the use of virtual memory.
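The virtual-memory analogy can be made concrete with a demand-loading sketch: reconfigurable areas play the role of page frames, partial bitstreams the role of pages, and a reconfiguration the role of a page fault. The least-recently-used (LRU) replacement policy is an assumption borrowed from virtual memory; the paper does not specify any replacement policy, and `VirtualHardware` and `use` are hypothetical names.

```python
from collections import OrderedDict


class VirtualHardware:
    """Hypothetical LRU manager for coprocessors in reconfigurable areas."""

    def __init__(self, num_areas):
        self.num_areas = num_areas
        self.loaded = OrderedDict()  # coprocessor name -> None, oldest use first
        self.reconfigurations = 0    # analogous to page faults

    def use(self, name):
        if name in self.loaded:
            self.loaded.move_to_end(name)  # hit: mark as most recently used
            return
        if len(self.loaded) == self.num_areas:
            self.loaded.popitem(last=False)  # evict the least recently used
        self.loaded[name] = None  # load its partial bitstream (a reconfiguration)
        self.reconfigurations += 1


vh = VirtualHardware(num_areas=2)  # two areas, as in R82R
for op in ["mul", "div", "mul", "sqrt", "div", "mul"]:
    vh.use(op)
print(list(vh.loaded), vh.reconfigurations)
```

Each miss costs one reconfiguration latency (about 10 ms in the prototype), so, as with page faults, such a scheme only pays off when coprocessors are reused many times between misses.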

Application areas for RCSs are another crucial point. Unless sound applications are developed that show a real advantage of using RCSs over conventional solutions, such as RISPs and ASIPs, RCSs will remain no more than an academic exercise on an interesting subject area. Ongoing work includes performance measurement of benchmarks and improvements to the configuration controller to reduce the time spent on partial reconfiguration.

5. References

[1] F. Barat; R. Lauwereins. Reconfigurable instruction set processors: a survey. In: Rapid System Prototyping (RSP’00), pp. 168-173, 2000.

[2] D. Robinson; P. Lysaght. Modeling and Synthesis of Configuration Controllers for Dynamically Reconfigurable Logic Systems using the DCS CAD Framework. In: 9th Field-Programmable Logic and Applications (FPL’99), 1999.

[3] J. Henkel. Closing the SoC Design Gap. IEEE Computer, vol. 36(9), pp. 119-121, 2003.

[4] F. Moraes; N. Calazans. R8 Processor Architecture and Organization Specification and Design Guidelines. 2003. www.inf.pucrs.br/~gaph/Projects/R8/public/R8_arq_spec_eng.pdf

[5] Xilinx, Inc. Two Flows for Partial Reconfiguration: Module Based or Difference