
A custom computing solution to automated visual inspection of silicon wafers

Peter Athanas, Lynn Abbott, Mark Cherbaka, Bharadwaj Pudipeddi, Kevin Paar

Virginia Polytechnic Institute and State University

Department of Electrical Engineering

Blacksburg, Virginia 24061-0111

ABSTRACT

This paper illustrates, through a specific example, the utility and effectiveness of using the reconfigurable capabilities of an FPGA-based custom computing platform as part of an industrial inspection system. The inspection task examined here is typical of many industrial inspection tasks that require the identification of several different types of product failures on a manufacturing assembly line. As in many inspection tasks, a number of features must be examined, each of which may require unique signal processing. A custom computing platform can provide the demanding signal processing performance while maintaining the capability of rapidly reconfiguring for an assortment of tasks.

Keywords: custom computing machines, real-time image processing, industrial inspection

1. MOTIVATION

The failures that occur manifest themselves in a number of ways which, for the application discussed here, have unique visual properties and require special image processing for each class of flaw. The computational requirements for this inspection are immense, and are too demanding for contemporary general-purpose machines. Various off-the-shelf image processing solutions do quite well in addressing a subset of the tasks; however, due to the wide variety of operations required, ready-made solutions with fixed buses and computing units fail to be general-purpose enough to address all of the image processing kernels. Custom computing machines feature programmable computing units and interconnection resources which are readily reprogrammed under software control. While a custom computing engine will never equal the performance of a full ASIC solution, a single platform can be rapidly reconfigured for a number of applications. Reconfiguring from one task to another does not require physical changes, but is accomplished by downloading a hardware personalization database to the custom computing platform, which can be done in just a few seconds.

The platform used here to provide this capability is an experimental custom computing platform 1,2. Custom computing platforms are emerging as a class of computing engine that can provide near application-specific computational performance, and can also be configured to accommodate a wide variety of tasks. With a custom computing platform, not only can the specific operations be custom designed (for function and size), but the data paths can also be customized for individual applications. The platform is an attached processor featuring programmable processing elements and programmable communication paths. The system utilizes arrays of RAM-based field-programmable gate arrays (FPGAs), crossbar networks, and distributed memory as a means of accomplishing the above goals. Even though it was not designed specifically for image processing, this platform possesses architectural properties that make it well suited for the computation and data transfer rates that are characteristic of this class of problems.

This paper illustrates, through the detection of a particular category of failures, the utility of using a custom computing platform. The category examined here is the identification of visual discontinuities on silicon wafer surfaces. The discontinuities typically manifest themselves as minute scratches, fissures, or fractures.

Figure 1: A summary of common wafer fabrication flaws.

2. SILICON WAFER INSPECTION

The inspection task examined here relates to the manufacture of silicon wafers, which are used as a commodity by integrated circuit foundries. One factor that influences the yield of an integrated circuit is the quality of the base wafer prior to processing. Flaws in the base wafer undoubtedly lead to component failures. Wafer flaws are minute. To an untrained eye, even under a magnifying glass, a bad wafer is indistinguishable from a good wafer. Inspection of wafers by hand is labor intensive, subject to fatigue, and highly prone to misclassification. An automated inspection system is immune to fatigue, and provides consistency in performance.

A wafer that is ready for personalization at an IC foundry is typically a six-inch circular disk approximately 1 mm thick, and resembles a hand mirror in that one side (sometimes both) is highly polished and reflective. The process of creating a wafer from molten silicon up to this point requires many steps, all of which can introduce flaws into the wafer structure. In a typical Czochralski process 3, a large silicon ingot is grown from a seed. The ingot is shaped and sliced. The wafer slices pass through several grinding, etching, and polishing steps prior to their first inspection. Even though a base wafer at this point is far less complex than a wafer having completed fabrication at an IC foundry, the failure modes are numerous, and the ways the failure modes manifest themselves in the wafers are even more complex. Some of the faults that are examined in the inspection process (which are the target for the automated inspection process) are summarized in Table 1.

Each of the flaws is extremely subtle; nonetheless, a subset of the flaws is exaggerated for illustration in Figure 1. The inspection process is further challenged because of the nature of the silicon wafers. As mentioned, the wafers are finely polished, and reflect visible light. Because of this, the illumination of wafers-under-test must be done with care to eliminate perturbations introduced by the lighting sources. The wafers considered here range from 4 to 6 inches in diameter.

Table 1: A list of the flaws that may occur in the wafer manufacturing process.

2.1 Project parameters

Because the flaws are subtle, meticulous inspection is required. Extremely high resolution imagery and highly controlled lighting are required to catch many of the flaws, which are on the order of a micron in size. In an automated inspection system, a set of monochrome cameras, each capable of providing images of 512×512 resolution, can be used. For appropriate sampling, the cameras have a field-of-view of approximately one square inch. The wafer can be rotated and strobed under the cameras in order to cover the entire wafer surface. The camera data, once digitized, is presented to the platform, which will be dynamically programmed to perform the necessary image processing functions to detect the above listed faults. The inspection results are then conveyed back to a host computer which will make the final judgment on the fate of the wafer. The inspection time for each wafer is limited to under eight seconds. A large number of computations on many sequences of images must be performed within this sub-eight-second window.
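To give a feel for the data volume behind the eight-second limit, the following sketch works out the raw pixel budget. It assumes a single camera stream and the 30 frames per second rate quoted in Section 3; the function name and the returned dictionary keys are illustrative and do not come from the paper.

```python
# Rough throughput budget for one camera stream (illustrative sketch).
# Assumptions: one 512x512 monochrome stream at 30 frames/s (the rate quoted
# in Section 3) and the full eight-second inspection window.

def inspection_budget(width=512, height=512, frame_rate_hz=30, window_s=8):
    """Return pixel counts implied by the stated project parameters."""
    pixels_per_frame = width * height
    pixels_per_second = pixels_per_frame * frame_rate_hz
    frames_in_window = frame_rate_hz * window_s
    pixels_in_window = pixels_per_frame * frames_in_window
    return {
        "pixels_per_frame": pixels_per_frame,
        "pixels_per_second": pixels_per_second,
        "frames_in_window": frames_in_window,
        "pixels_in_window": pixels_in_window,
    }

budget = inspection_budget()
for name, value in budget.items():
    print(f"{name}: {value:,}")
```

Under these assumptions a single stream delivers roughly 7.9 million pixels per second, so any per-pixel operation must complete at that sustained rate to avoid dropping frames.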

3. DISCONTINUITY DETECTION

In this section, a strategy is presented to detect straight-line discontinuities visually present on the wafer surface. The method developed for this process is illustrated in Figure 2. In the process shown, the high spatial frequency energy associated with discontinuities is accentuated by the edge detector process. Unfortunately, other image data are also accentuated. Despite the attempts to make the inspection area free of contaminants, it is expected that a low concentration of free dust particles will be present and may reside on the surface of the wafer during inspection; these are not to be categorized as defects. The method developed to cope with these problems was to apply the straight-line Hough transform to the filtered edge data, and determine if there is sufficient evidence of collinear high-frequency spectral components, which would be characteristic of this class of flaw. Since only a “pass/fail” answer is needed as the final output, a technique was developed that reduces the Hough calculation.
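The idea of using Hough votes for a pass/fail decision can be sketched in software as follows. This is a minimal illustration, not the authors' reduced Hough technique, which the text does not detail: the early exit on reaching a vote threshold, the function name has_collinear_edges, and the parameters min_votes and n_theta are all assumptions made for the example.

```python
import math

def has_collinear_edges(edge_points, width, height, min_votes, n_theta=180):
    """Return True if at least min_votes edge points vote for one (theta, rho) line.

    edge_points is an iterable of (x, y) coordinates of pixels flagged as edges.
    The early return is an assumed shortcut suited to a pass/fail answer.
    """
    offset = int(math.ceil(math.hypot(width, height)))  # rho may be negative
    n_rho = 2 * offset + 1
    trig = [(math.cos(math.pi * t / n_theta), math.sin(math.pi * t / n_theta))
            for t in range(n_theta)]
    accumulator = [[0] * n_rho for _ in range(n_theta)]
    for x, y in edge_points:
        for ti, (c, s) in enumerate(trig):
            rho_index = int(round(x * c + y * s)) + offset
            accumulator[ti][rho_index] += 1
            if accumulator[ti][rho_index] >= min_votes:
                return True  # enough collinear evidence: flag the wafer
    return False

scratch = [(i, i) for i in range(50)]                    # a diagonal scratch
dust = [(3, 7), (20, 41), (50, 12), (33, 60), (10, 10)]  # isolated particles
print(has_collinear_edges(scratch, 64, 64, min_votes=40))
print(has_collinear_edges(dust, 64, 64, min_votes=40))
```

Isolated dust particles spread their votes across many accumulator cells, while a scratch concentrates them in one cell, which is why a vote threshold can separate the two cases.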

The speed and the area of the solution had to satisfy the system constraints. The wafer images are presented to the platform at a rate of 30 frames per second. It was desired to process at real-time data rates, or close enough to real-time so as to minimize the number of dropped frames. Furthermore, the solution had to use the fewest FPGAs possible to allow other concurrent operations. The remainder of this section details the construction of each of the components given the time/area constraints. Some familiarity with the platform architecture is assumed 1,2,4. In each of the operations presented, VHDL 5 was used for task specification and modeling. Synopsys synthesis tools were used to transform the VHDL behavior into executable code for the platform 13.

Figure 2: Conceptual pipeline approach to discontinuity detection.

3.1 Low-pass (smoothing) filter

The purpose of the smoothing filter is to remove noise in a given image. Smoothing is usually desirable in situations where the camera is in a noisy environment, where the signals propagate on noisy media, or where the images are otherwise noisy. In this case, fine dust particles on the wafer surface can add “salt and pepper” noise 7,8. Smoothing filter templates can vary in size and structure, yet the computational burden in computing the filter output increases geometrically as the template enlarges. Each output pixel value takes on the weighted sum of its value and its neighbors' values, determined by the filter template. The values can all be weighted equally or can favor the center of the template. The net effect of the smoothing filter is to "blur" the image and thus eliminate noise. For wafer inspection, we want to smooth the image enough to eliminate false edge detection while preserving the image so that actual edges (scratches, etc.) are still detectable. The smoothing filter template specified in this case is 4 × 4 with uniform unity weights. The upper right middle square is used as the center pixel. For the top and bottom edges, the image passes through without being smoothed, while for the sides it is smoothed using wrap-around.
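A software model of the filter described above might look like the following sketch. Two points are interpretations rather than facts from the text: the "upper right middle square" is taken to be row 1, column 2 of the 4 × 4 template (zero-indexed), and the side wrap-around is modeled as wrapping within the same row. The function name smooth_4x4 is illustrative.

```python
def smooth_4x4(image):
    """Apply a 4x4 uniform (all-ones) smoothing template to a list-of-rows image.

    Assumed template placement: the center is row 1, column 2 of the template,
    so the window covers rows y-1..y+2 and columns x-2..x+1.
    Rows without a full window (top and bottom) pass through unchanged.
    Columns wrap around within the row (an assumed reading of "wrap-around").
    """
    height = len(image)
    width = len(image[0])
    out = [row[:] for row in image]          # start from a pass-through copy
    for y in range(1, height - 2):           # rows that have a full 4-row window
        for x in range(width):
            total = 0
            for dy in range(-1, 3):
                for dx in range(-2, 2):
                    total += image[y + dy][(x + dx) % width]
            out[y][x] = total >> 4           # divide the 16-term sum by 16
    return out

noisy = [[0] * 8 for _ in range(8)]
noisy[3][3] = 160                            # a single "salt" pixel
smoothed = smooth_4x4(noisy)
print(smoothed[3][3])
```

A single bright pixel is spread over a 4 × 4 neighborhood at one sixteenth of its amplitude, which is what suppresses isolated dust responses before edge detection.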

The goal for the smoothing filter was to run at real-time speed. Since a new pixel is available every clock cycle, one output pixel must be produced every clock cycle to keep up with the real-time stream. A design of the smoothing filter was implemented, simulated, and synthesized to run on the custom computing platform.

3.1.1 High level approach

The input data is presented to the platform in raster scan order. For a 4 × 4 template, the previous 3 lines plus 4 values need to be buffered. To meet the goal of one result every clock cycle, new data needs to be read into RAM while a pipelined adder accepts 16 values every clock cycle. Since 16 values must be read at a time, at least 12 RAM units plus the 4 final values (in registers) are needed to accomplish the 16 simultaneous reads. All 16 values in the template are added together, then divided by 16 to obtain the final value for a given location. This is done in hardware by implementing an inverted tree of depth 4 of 16-bit adders. The final sum is shifted right by four bits to obtain the correct value in the same 8-bit format as the input. Since data must be produced at the same rate that new data is accepted, a scheme was needed to allow the RAM to be read and written at the same time. The size of each of the 12 RAM units was cut in half and the number of RAM units doubled to allow interleaving. This way, 12 of the RAM units are being read while the other 12 are available for updating data. The array of RAM units can be thought of as three rows of 8 RAM units. The storage cell count is a multiple of 512, which is the image width, and allows for minimal control.
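The adder tree and final shift can be modeled in a few lines. This is a behavioral sketch only; it does not model the pipeline registers or the RAM interleaving, and the function name adder_tree_mean is illustrative.

```python
def adder_tree_mean(values):
    """Sum 16 values with a pairwise (inverted) adder tree, then shift right by 4.

    Returns (mean, depth) where depth is the number of adder levels used.
    """
    if len(values) != 16:
        raise ValueError("the 4x4 template supplies exactly 16 values")
    level = list(values)
    depth = 0
    while len(level) > 1:
        # Each level halves the number of partial sums: 16 -> 8 -> 4 -> 2 -> 1.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        depth += 1
    return level[0] >> 4, depth

print(adder_tree_mean([255] * 16))
```

The largest possible sum is 16 × 255 = 4080, which fits comfortably in the 16-bit adders mentioned above, and the four-bit shift returns the result to the 8-bit range.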

3.1.2 Low level approach

The above algorithm was partitioned over three chips. The first chip accepts the input data from the camera, and contains the first 12 RAM units, while the second chip accepts data from Chip 1, and contains the second 12 RAM units. The third chip handles all control and performs the additions.

Chips 1 and 2 are very similar. They each contain an 8-state machine that reads a new pixel and stores it in a register. The state machine then broadcasts a "column" of data consisting of the new pixel and the three pixels above the current pixel that are currently stored in RAM. Using the platform's extensive programmable communication resources and crossbar network, Chip 1 broadcasts a column of pixels each cycle for 4 of the 8 states, and passes a new pixel to Chip 2 for the other 4 states. Chip 2 either passes the column of Chip 1's data to Chip 3, or produces its own column of data. This approach gives each chip data in and out every clock cycle. When the data is produced, the new pixel in the register is stored into a local RAM. A circular buffer is implemented with the on-chip RAMs, thus overwriting unneeded pixels. In this way, control logic is kept to a minimum. Only one state is required per column of RAM, as opposed to one state per RAM in a per-RAM circular buffer approach. This reduces the number of necessary states by 67%.
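The 67% figure is consistent with the RAM layout described earlier if each column groups three RAM units (three rows of 8 units, 24 in total). Under that assumption, with N_col denoting the number of columns and N_RAM the number of RAM units (symbols introduced here for illustration):

```latex
\[
  1 - \frac{N_{\mathrm{col}}}{N_{\mathrm{RAM}}}
  = 1 - \frac{8}{24}
  = \frac{2}{3}
  \approx 67\%
\]
```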

Chip 3 uses a global counter to keep track of boundary conditions. If in a boundary condition, the pixel is simply passed through (for lack of data to do the operation); otherwise, the smoothed value is available and used as the output. The 4-level adder is pipelined for maximum speed.

The final approach has been verified through simulation and through operation, and runs at frame rate with a latency of approximately three scan lines. The output stream can then be presented to the Roberts operator.

3.2 The Roberts operator

This section describes the implementation of an edge detection algorithm using the Roberts operator 7 on the custom computing platform. Edge detection is the determination of the points of an image which lie on edges in the image, where edges are the borders between regions with a large frequency difference. It is one of the major steps required for feature extraction in an image. This particular design is developed for automatic flaw detection in IC wafers. It should be noted that the Roberts operator is essentially a non-linear high-pass filter, and tends to accentuate all discontinuities in the image, including dust particles that may be present on the wafer surface.

The input to this design is a 512×512 image arriving in raster order at the rate of one pixel per clock cycle. Every incoming pixel is tested using the Roberts operator, and an output of ‘1’ is generated if it is an edge point and an output of ‘0’ otherwise. Because of the pipeline processing in the design, output pixels are generated every clock cycle after an initial latency of seven pixel clocks. The whole design is modeled in VHDL, and synthesized on one FPGA of the platform.

The Roberts operator consists of two masks, W1 and W2:

W1(n) = | I(n-1) - I(n-512) |    (1)

W2(n) = | I(n) - I(n-513) |,    (2)

where n denotes the position of the central pixel, and I(k) denotes the intensity of the kth pixel. Both are the responses produced by the introduction of the nth input pixel. The final value is obtained from the two responses using the following equation:

W(n) = [ (W1(n) + W2(n)) / 2.0 ]    (3)

If this value is greater than a threshold, then the nth pixel is considered an edge point. Therefore,

IF W(n) > THRESHOLD THEN

F(n) = ‘1’;

ELSE

F(n) = ‘0’;

END IF;

where F(n) is the output corresponding to the nth pixel.
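A behavioral reference model of the thresholded Roberts operator on a raster-ordered pixel stream is sketched below. It uses the difference form for both masks, consistent with the comparator-and-subtractor datapath described in Section 3.2.1, and it emits zeros until both diagonal neighbors exist. The function name roberts_edges, the default threshold, and the use of integer division for the bracket in equation (3) are assumptions for illustration.

```python
def roberts_edges(pixels, width=512, threshold=32):
    """Return a list of 0/1 edge flags F(n) for a flat, raster-ordered image.

    pixels[n] is I(n). Offsets width and width+1 play the role of 512 and 513.
    The threshold value is a placeholder; the paper does not state one.
    Note: at column 0 the index n-1 refers to the end of the previous row;
    the text does not say how this case is treated, so it is left as-is here.
    """
    flags = []
    for n in range(len(pixels)):
        if n < width + 1:
            flags.append(0)              # not enough history yet
            continue
        w1 = abs(pixels[n - 1] - pixels[n - width])
        w2 = abs(pixels[n] - pixels[n - width - 1])
        w = (w1 + w2) // 2               # assumed integer form of equation (3)
        flags.append(1 if w > threshold else 0)
    return flags

# A 4x4 test image with a vertical step between columns 1 and 2.
step = [0, 0, 200, 200] * 4
print(roberts_edges(step, width=4, threshold=50)[4:8])
```

Running the model on small synthetic images like the step above is a convenient way to cross-check a hardware simulation pixel by pixel.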

3.2.1 Details of the Roberts implementation

Two goals of this task were to implement the entire algorithm on one FPGA, and to operate at 30 frames per second. Since the input frame is 512 pixels by 512 pixels, and the Roberts templates are 2×2, a minimum buffer size of 514 pixels (or bytes) is required; hence, four circular RAMs, each 128×8, and two 8-bit registers are created on the chip for this purpose. On-chip RAMs are used instead of off-chip memory because of the concurrent reads and writes that are necessary for full-speed processing. One drawback is that the on-chip RAMs consume considerable space. The RAMs are created using Xilinx hard macros 6 for better processing speed and more effective routing. To distinguish the start and end of frames, an 18-bit counter is necessary, which can index all 512×512 pixels.

Zeros are produced for the first 513 pixels to address image boundary conditions. This ensures that the image boundary will not be mistaken for a flaw. Processing starts with the arrival of the 514th pixel. After that, the first pixel is discarded, making room for the 515th pixel in a circular fashion. The process of discarding the least recently stored pixel on arrival of the latest pixel continues, allowing a constant buffer size of 514.
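The constant-size buffer can be modeled as a ring of width + 2 slots (514 for a 512-pixel line, matching four 128-entry RAMs plus two registers). The sketch below is a software analogy, not the RAM-and-register structure itself; the class name CircularPixelBuffer and its push method are illustrative. It returns None while the buffer fills, where the hardware outputs zeros.

```python
class CircularPixelBuffer:
    """Ring buffer holding the most recent width + 2 pixels of a raster stream."""

    def __init__(self, width=512):
        self.width = width
        self.size = width + 2            # 514 slots for a 512-pixel line
        self.slots = [0] * self.size
        self.count = 0                   # number of pixels received so far

    def push(self, pixel):
        """Store pixel n, overwriting pixel n - size, and return the four taps.

        Returns None for the first width + 1 pixels, otherwise the tuple
        (I(n), I(n-1), I(n-width), I(n-width-1)) needed by the two masks.
        """
        n = self.count
        self.slots[n % self.size] = pixel
        self.count = n + 1
        if n < self.width + 1:
            return None

        def tap(k):
            return self.slots[k % self.size]

        return (tap(n), tap(n - 1), tap(n - self.width), tap(n - self.width - 1))

buf = CircularPixelBuffer(width=4)
results = [buf.push(10 * n) for n in range(12)]
print(results[5], results[11])
```

Because every pixel the masks need lies within the last width + 2 arrivals, overwriting the oldest slot never destroys data that is still required.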

The absolute response of each mask is calculated by using a comparator, a multiplexer and an 8-bit subtractor. A comparator compares the two inputs and its output is directed to the multiplexer so that finally the subtractor always subtracts the smaller of the two from the greater. The two responses are then fed to a 9-bit adder which determines the resultant response. Due to the demands on the speed of computations, all the functional operations are pipelined. Figure 3 shows the different stages of the pipeline.
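The comparator, multiplexer, and subtractor arrangement amounts to an unsigned absolute difference that never underflows. A minimal behavioral sketch follows; the function names are illustrative and pipelining is not modeled.

```python
def abs_diff_8bit(a, b):
    """Unsigned absolute difference built from compare, select, and subtract."""
    a_is_larger = a >= b                                  # comparator
    larger, smaller = (a, b) if a_is_larger else (b, a)   # multiplexer
    return larger - smaller                               # subtractor, never negative

def combined_response(w1, w2):
    """Sum of two 8-bit mask responses, held in 9 bits."""
    total = w1 + w2
    assert total < (1 << 9)      # 255 + 255 = 510 always fits in 9 bits
    return total

print(abs_diff_8bit(10, 200), combined_response(255, 255))
```

Ordering the operands before subtracting keeps the datapath unsigned throughout, so no sign bit or two's-complement negation is needed.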
