Optimization of Vertical and Horizontal Beamforming Kernels on the PowerPC G4 Processor with AltiVec Technology
EE382C: Embedded Software Systems
Literature Survey
David Brunke
Young Cho
Applied Research Laboratories:
The University of Texas at Austin
Abstract
Real-time digital sonar beamforming is a computationally intensive algorithm that has been implemented in the past primarily in custom embedded hardware.  With recent advancements in native signal processing extensions for general-purpose processors, it is possible to implement sonar beamforming using off-the-shelf hardware.  A current implementation on a Sun UltraSPARC multiprocessor suggests a very promising platform for transitioning such applications to general-purpose systems.  This paper proposes to continue the previous implementation by modifying the beamforming kernels to use AltiVec, a new native signal processing extension for the PowerPC.  AltiVec is a Single Instruction Multiple Data (SIMD) architecture capable of executing up to four 32-bit floating-point multiply and accumulate (MAC) operations per instruction.  Kernels that exploit these features of AltiVec are expected to achieve significant speedup over previous implementations on other general-purpose processors.
1.0 Introduction
In the last few decades, the Digital Signal Processor (DSP) market has grown substantially to meet the demand of the high-performance signal processing community. Such growth in the signal processing market has allowed the general computing communities to incorporate the technology into their applications.  As a result, general-purpose processors began to embed signal processing architectures into their own processing cores to provide economical system solutions for computationally intensive applications [1].
Real-time digital sonar beamforming is one such application: once feasible only on custom hardware, it can now be successfully implemented on commercial, off-the-shelf computers with native signal processing extensions.  One recent implementation uses a commercial general-purpose 8-way symmetric multiprocessor (SMP) workstation from Sun Microsystems [2].  The beamforming kernels exploit the inherent data parallelism by using Single Instruction Multiple Data (SIMD) arithmetic operations available in the Visual Instruction Set (VIS) extensions to the UltraSPARC processor.  Using a sixteen-processor, 333-MHz UltraSPARC Enterprise server, a real-time beamformer delivering 4 GFLOPS on 160 MB/s of streaming data was realized.
The goal of our research is to further explore the effectiveness of the embedded extensions by optimizing and assessing the performance of beamforming kernels using AltiVec from PowerPC, which is one of the newest native signal processing instruction sets.  We also plan to analyze the results obtained from the two embedded signal processing extensions to assess their architectural advantages and disadvantages.
Fig. 1:  Digital Interpolation Beamformer (diagram blocks: Sensor Array, Interpolate, Time Delay τ1 ... τN, summation ∑, Beam)
2.0 Beamforming Approach
Conventional sonar beamformers use the signals collected from sensor elements to determine from what direction the sonar signal returns after reflecting off an object.  This conventional horizontal time-domain beamforming algorithm consists of appropriately delaying and summing the weighted outputs of an array of sensor elements.  The weighting of the sensor outputs helps to improve the spatial response [3].
The problem with this conventional approach is that it requires a sample rate that is several times the Nyquist rate for adequate time delay resolution.  This is undesirable because it requires additional bandwidth for the overall system.  A practical solution employs digital interpolation with Finite Impulse Response (FIR) interpolation filters to achieve a satisfactory time delay resolution [3].  This solution is shown in Fig. 1.  Analog data is sampled at a given sampling interval, and sampling is then followed by interpolation, time delay, and summation.
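As a rough illustration of how interpolation supplies delay resolution finer than one sample, the following C sketch evaluates a signal delayed by a non-integer number of samples using the simplest FIR interpolator, a two-point (linear) one.  The function name delayed_sample and its parameters are hypothetical, and the treatment of samples before the start of the buffer as zero is an assumption made only for this sketch.

```c
#include <stddef.h>

/* Hypothetical helper: value of signal x at time index k after a delay of
 * (d + frac) samples, where d is the integer part of the delay and frac,
 * in [0, 1), is the fractional part.  A two-point (linear) FIR interpolator
 * blends the two neighbouring samples.  Samples outside the buffer are
 * treated as zero (an assumption of this sketch). */
static float delayed_sample(const float *x, size_t len, size_t k,
                            size_t d, float frac)
{
    /* x[k - d]: the sample at the integer part of the delay */
    float s0 = (k >= d && k - d < len) ? x[k - d] : 0.0f;
    /* x[k - d - 1]: the sample one step further back in time */
    float s1 = (k >= d + 1 && k - d - 1 < len) ? x[k - d - 1] : 0.0f;

    return (1.0f - frac) * s0 + frac * s1;
}
```

With frac = 0 the helper reduces to a plain integer delay, so the interpolator adds resolution without requiring a higher input sample rate.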
3.0 Native Signal Processing Extensions
Many high-performance embedded applications are programmed on systems with a few general-purpose processors as system controllers and a larger number (possibly hundreds) of specialized DSPs to perform scientific calculations.  However, this type of system has many disadvantages due to different programming platforms and unequal performance advances in the two separate technologies.  Therefore, many manufacturers of high-performance general-purpose processors are integrating sets of native signal processing instructions onto their processor cores to offer solutions requiring fewer processors.
3.1 UltraSPARC VIS
The Visual Instruction Set (VIS) is a set of signal processing instructions based on the SIMD architecture.  The floating-point data path of the UltraSPARC processor core is enhanced with graphics units to support VIS.  Although the graphics units share the register file with the floating-point units, they are distinguished from the floating-point units by performing fixed-point vector arithmetic.  VIS provides over 50 new CPU instructions, including format conversions, arithmetic and logic instructions, address handling, memory access instructions, and several others.  The 64-bit registers can be partitioned into 2, 4, or 8 data words, so a single instruction can operate on multiple words.  Thus, VIS can achieve up to four times speedup with 8-bit by 16-bit fixed-point multiplication using the SIMD arithmetic logic [4].
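As a rough illustration of partitioned arithmetic, the following C sketch emulates four independent 16-bit additions packed into one 64-bit word using ordinary integer operations.  It is not VIS code: the function name padd16 is hypothetical, and the wrap-around (modulo 2^16) behavior of each lane is an assumption of this sketch.  Hardware with partitioned instructions performs the equivalent work in a single operation.

```c
#include <stdint.h>

/* Hypothetical helper: add four 16-bit lanes packed in 64-bit words.
 * Each lane wraps around modulo 2^16 and carries never cross lanes. */
static uint64_t padd16(uint64_t a, uint64_t b)
{
    const uint64_t top = 0x8000800080008000ULL; /* top bit of each lane */

    /* Add the low 15 bits of every lane; the sum cannot overflow a lane. */
    uint64_t low = (a & ~top) + (b & ~top);

    /* Restore the top bit of each lane: a15 XOR b15 XOR carry-in. */
    return low ^ ((a ^ b) & top);
}
```

The masking step is what a partitioned adder does in hardware by cutting the carry chain at lane boundaries.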
3.2 PowerPC AltiVec
AltiVec is the native signal processing extension for PowerPC processors.  This short-vector SIMD architecture is embedded into a general-purpose RISC processor core to add powerful signal processing capabilities.  Unlike the UltraSPARC VIS, AltiVec vector logic is not an enhancement to an existing arithmetic unit.  It is sectioned into a separate sub-unit of the processor, as are the floating-point and integer units.  The vector unit adds more than 150 new SIMD instructions to the PowerPC instruction set for advanced signal processing programming.
Figure 2: Block Diagram of PowerPC AltiVec Unit Architecture (diagram blocks: 32 by 128-bit Vector Register File; 128-bit registers InA, InB, and InC; Vector Logic Unit; 128-bit register Out)
The vector unit has its own register file of 32 registers, each 128 bits wide, which allows execution of up to four 32-bit floating-point MAC operations per instruction [5].  AltiVec is potentially a much more powerful signal processing extension than VIS due to its greater logic resources.
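To make the "four MACs per instruction" idea concrete, here is a portable C model of a 128-bit register holding four 32-bit floats, together with a dot product built on a four-lane multiply-accumulate.  The names vec4f, mac4, and dot4 are hypothetical and the loops only emulate what the hardware does in parallel.  On an actual AltiVec target, the corresponding C intrinsic is vec_madd from <altivec.h>, which this sketch does not use so that it compiles on any platform.

```c
/* Portable model of one 128-bit vector register: four 32-bit floats. */
typedef struct { float lane[4]; } vec4f;

/* Hypothetical model of a four-lane multiply-accumulate:
 * out[i] = a[i] * b[i] + c[i] for each of the four lanes. */
static vec4f mac4(vec4f a, vec4f b, vec4f c)
{
    vec4f out;
    for (int i = 0; i < 4; ++i)
        out.lane[i] = a.lane[i] * b.lane[i] + c.lane[i];
    return out;
}

/* Dot product of x and w, both of length n (assumed a multiple of 4).
 * Each loop iteration issues one modelled vector MAC, i.e. four scalar
 * MACs; the four partial sums are combined once at the end. */
static float dot4(const float *x, const float *w, int n)
{
    vec4f acc = {{0.0f, 0.0f, 0.0f, 0.0f}};
    for (int i = 0; i < n; i += 4) {
        vec4f a = {{x[i], x[i + 1], x[i + 2], x[i + 3]}};
        vec4f b = {{w[i], w[i + 1], w[i + 2], w[i + 3]}};
        acc = mac4(a, b, acc);
    }
    return acc.lane[0] + acc.lane[1] + acc.lane[2] + acc.lane[3];
}
```

Keeping four running partial sums and reducing them only once is the usual way to keep every lane busy in a SIMD dot product.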
4.0 Current Implementation
The current implementation that we are going to build upon adds vertical beamforming to the approach in [3] to enable projection of a 3-D underwater image [2].  In addition, the interpolation for the horizontal beamforming kernel is simplified by using a two-point FIR filter for the digital interpolation.  A two-point FIR filter is possible without a critical loss of resolution because the sampling rate is set to two times the Nyquist sampling rate.  The overall system description is shown in Fig. 3.
Figure 3: Block Diagram of the 3-D sonar beamformer (diagram blocks: Sensor Elements, Three Staves, Vertical Beamformer V, three Horizontal Beamformers H, Fan 1 Beam, Fan 2 Beam, Fan 3 Beam; data-rate labels: 40 MB/s, 32 MB/s, 24 MB/s)
4.1 Vertical Beamforming
The vertical beamformer computes three sets of data, each of which is sent to a horizontal beamformer.  Thus, for each column of 10 vertical transducers (staves), three dot products are computed, one with each of three coefficient vectors; each contributes to the vertical resolution.  In the current implementation, a given sample for 80 horizontal elements with 10 vertical transducers requires 2400 MACs (80 x 10 x 3).  After the vertical beamforming, the output needs to be in floating-point format for the horizontal beamformer.  Therefore, integer-to-floating-point type conversion needs to be performed on the result [2].
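The arithmetic above can be sketched in C as three dot products per column followed by an integer-to-float conversion.  This is only an illustrative scalar version, not the implementation in [2]: the function name, the array layout, and the choice of 16-bit inputs with a 32-bit accumulator are all assumptions.

```c
#include <stdint.h>

enum { N_COLS = 80, N_STAVES = 10, N_FANS = 3 };

/* Hypothetical scalar sketch of one vertical beamforming step.
 * in[col][stave]   : one integer sample per transducer (16-bit assumed)
 * coef[fan][stave] : one integer coefficient vector per output fan
 * out[fan][col]    : floating-point results for the horizontal beamformers
 * Returns the number of multiply-accumulate operations performed. */
static long vertical_beamform(int16_t in[N_COLS][N_STAVES],
                              int16_t coef[N_FANS][N_STAVES],
                              float out[N_FANS][N_COLS])
{
    long macs = 0;
    for (int f = 0; f < N_FANS; ++f) {
        for (int c = 0; c < N_COLS; ++c) {
            int32_t acc = 0;                 /* integer accumulator */
            for (int s = 0; s < N_STAVES; ++s) {
                acc += (int32_t)in[c][s] * coef[f][s];
                ++macs;
            }
            out[f][c] = (float)acc;          /* integer-to-float conversion */
        }
    }
    return macs;
}
```

For the dimensions quoted in the text the function returns 2400, matching the MAC count per sample.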
4.2 Horizontal Beamforming
In the current implementation, the time delays can be determined by projecting the semi-circular array of sensor elements onto an axis perpendicular to the pointing direction for each beam.  The time delay is then defined as the distance from each element to the perpendicular axis divided by the speed of sound.  Not all of the elements are used to compute a given beam, because the response of some elements in the beam's direction is relatively small and including them would only add unnecessary calculation [6].
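A minimal sketch of this delay computation follows, under assumptions that the text does not state: element positions are given in Cartesian coordinates relative to the array center, the perpendicular axis passes through that center, and the pointing direction is supplied as a unit vector.  With those assumptions the distance from an element to the axis is the component of its position along the pointing direction.  The function names and parameters are hypothetical.

```c
/* Hypothetical helper: steering delay in seconds for one element.
 * (ex, ey) : element position in metres relative to the array centre
 * (ux, uy) : unit vector along the beam's pointing direction
 * c        : speed of sound in metres per second
 * The distance to the axis through the centre, perpendicular to the
 * pointing direction, is the projection of the position onto (ux, uy). */
static double element_delay_s(double ex, double ey,
                              double ux, double uy, double c)
{
    double distance = ex * ux + ey * uy;
    return distance / c;
}

/* Hypothetical helper: split a non-negative delay into an integer number
 * of samples and a fractional remainder in [0, 1) at sample rate fs_hz. */
static void split_delay(double delay_s, double fs_hz,
                        long *whole, double *frac)
{
    double samples = delay_s * fs_hz;
    *whole = (long)samples;            /* truncation equals floor here */
    *frac = samples - (double)*whole;
}
```

The integer part corresponds to a whole-sample shift, while the fractional part is what an interpolation filter has to supply.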
A two-point FIR filter is used to perform the time delay interpolation to achieve the desired horizontal steering delay resolution.  A total of 61 beams are formed from 80 elements, with 50 horizontal elements used per beam.  Each beam sample, b_i[k], is made up of a two-point FIR filter for each sensor element x_n, which weights, delays, and adds two time samples.  The weightings, w_kn0 and w_kn1, are determined by the delay fraction and beamformer shading.  The delay, τ_in, for each beam i and sensor n, is an integer sample delay.  Thus, the beam sample output b_i[k] is given by the following equation, where each
