Abstract
Computer manipulation of images is generally defined as digital image processing (DIP). DIP is employed in a variety of applications, including video surveillance, target recognition, and image enhancement. Algorithms used in image processing include convolution, edge detection and contrast enhancement. These are usually implemented in software, but may also be implemented in special-purpose hardware to reduce execution time. In this work a Canny edge detection architecture has been developed on reconfigurable hardware and modeled using a C-like hardware description language called Handel-C.
The proposed architecture is capable of producing one edge pixel every clock cycle. The hardware model was implemented using the DK2 IDE on the RC1000 Xilinx Virtex FPGA based board. The algorithm was tested on standard image processing benchmarks and the significance of the results is discussed.
1. Introduction: Digital image processing is an ever-expanding and dynamic area, with applications reaching into our everyday life: medicine, space exploration, surveillance, automated industry inspection and many more. Application-specific hardware implementation offers much greater speed than a software implementation. Implementing complex computation tasks in hardware, and exploiting the parallelism and pipelining in algorithms, yields significant reductions in execution time. The two hardware design technologies are full-custom hardware design and semi-custom hardware design; the latter uses programmable devices such as digital signal processors (DSPs) and field programmable gate arrays (FPGAs). Full-custom design (FCD) offers the highest performance, but its complexity and cost are very high; an FCD cannot be changed after fabrication, and its design time is also very long. FCDs are therefore used in high-volume commercial applications. FPGAs, by contrast, are reprogrammable. Hardware design techniques such as parallelism and pipelining can be exploited on an FPGA, which is not possible in dedicated DSP designs. Implementing image processing algorithms on reconfigurable hardware minimizes time-to-market cost, enables rapid prototyping of complex algorithms, and simplifies debugging and verification. FPGAs are therefore an ideal choice for implementing real-time image processing algorithms. Handel-C, a new C-like hardware description language introduced by Celoxica, allows the designer to focus on the specification of an algorithm rather than adopting a structural approach to coding. The goal of this work is to implement image processing algorithms such as the median filter, morphological operations, convolution and edge detection on an FPGA using Handel-C.
2. Image Processing Algorithms: This section discusses the theory of the most commonly used image processing algorithms: 1) Filtering, 2) Morphological Operations, 3) Convolution and 4) Edge Detection.
2.1 Median Filter: A median filter is a non-linear digital filter which preserves sharp signal changes and is very effective at removing impulse noise. Linear filters cannot remove this type of noise without affecting the distinguishing characteristics of the signal. A standard median operation is implemented by sliding a window of odd size over an image. At each window position the sampled values of the signal or image are sorted, and the median of the samples replaces the sample in the center of the window.
The main problem of the median filter is its high computational cost. Execution times are reduced by implementing median filters on FPGAs.
2.2 Morphological Operation: The most basic building blocks for many morphological operators are erosion and dilation. Erosion, as the name suggests, shrinks or erodes an object in an image; dilation, on the other hand, grows the image object. Both of these operations depend on the structuring element and how it fits within the object.
2.3 Convolution Operation: In image processing, convolution is used to implement operators whose output pixel values are simple linear combinations of certain input pixel values of the image. The basic idea is that a window of some finite size is scanned over an image. The output pixel value is the weighted sum of the input pixels within the window, where the weights are the values of the filter assigned to every pixel of the window. The window with its weights is called the convolution mask.
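As a concrete illustration, the windowed weighted sum can be sketched in C (a software model only; the function and parameter names are illustrative, and border handling is assumed to be done elsewhere):

```c
/* Apply a 3x3 convolution mask at position (x, y) of a grayscale image
 * stored row-major. The result is the weighted sum of the window pixels,
 * normalized by a divisor (e.g. the sum of the mask weights). */
int convolve3x3(const unsigned char *img, int width,
                int x, int y, int mask[3][3], int divisor)
{
    int sum = 0;
    for (int i = -1; i <= 1; i++)
        for (int j = -1; j <= 1; j++)
            sum += mask[i + 1][j + 1] * img[(y + i) * width + (x + j)];
    return sum / divisor;
}
```

With an all-ones mask and divisor 9 this computes the 3x3 box average; a Gaussian mask with the appropriate divisor gives the smoothing used later in the edge detector.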
2.4 Edge Detection: First the image is smoothed by Gaussian convolution. A simple 2-D first-derivative operator is then applied to the smoothed image to highlight regions of the image with high first spatial derivatives. Edges give rise to ridges in the gradient magnitude image. The algorithm then tracks along the top of these ridges and sets to zero all pixels that are not actually on the ridge top, so as to give a thin line in the output, a process known as non-maximal suppression. A 5x5 Gaussian convolution mask of standard deviation σ = 1.4 is used for smoothing. The horizontal and vertical gradients are calculated using the differences between adjacent pixels; one way to find edges is to explicitly use a {-1, +1} operator. Prewitt masks are based on the idea of the central difference
(I(x, y+1) - I(x, y-1)) / 2,
which corresponds to the convolution kernel [-1, 0, +1] (up to the factor of 1/2).
These convolutions are applied to the result of the smoothing stage to obtain the horizontal (dx) and vertical (dy) gradients. The two components of the gradient determined in the previous stage are used to compute the magnitude and direction. Classically, the arctangent is employed to calculate the direction of the gradient, but the arctangent is a very complex operation which increases the logic depth and delay. Instead, the values and signs of the components of the gradient are used to determine the direction. Consider a pixel Px,y whose derivatives are dx and dy; the gradient at P is approximated as shown in Figure 2.4. A comparison is made between the actual pixel and its neighbors along the direction of the gradient.
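The arctangent-free direction test can be sketched in C as follows. This is a software model with illustrative names; the 45-degree sector boundaries used here (compare |dx| against |dy| scaled by 2, i.e. a shift in hardware) are a simplification of the exact arctangent thresholds:

```c
#include <stdlib.h>

/* Quantize the gradient direction into one of four sectors
 * (0 = near-horizontal gradient, 1 = +45 deg diagonal,
 *  2 = near-vertical gradient, 3 = -45 deg diagonal)
 * using only signs, comparisons and a doubling (left shift),
 * avoiding the costly arctangent. */
int gradient_sector(int dx, int dy)
{
    int ax = abs(dx), ay = abs(dy);
    if (2 * ay <= ax) return 0;                /* gradient mostly along x */
    if (2 * ax <= ay) return 2;                /* gradient mostly along y */
    return ((dx > 0) == (dy > 0)) ? 1 : 3;     /* sign agreement picks diagonal */
}

/* |dx| + |dy|: a common shift/add-friendly magnitude approximation. */
int gradient_mag(int dx, int dy) { return abs(dx) + abs(dy); }
```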
The center pixel Px,y is considered to be an edge if its gradient magnitude is greater than the magnitudes of both of its neighbors along the gradient direction; if either condition is not satisfied, the center pixel is eliminated. The output image of this stage still contains isolated single pixels and is usually thresholded to decide which edges are significant.
3. Implementation Resources
3.1 Handel-C: Handel-C is essentially an extended subset of the standard ANSI-C language, specifically designed for use in a hardware environment. Unlike other C-to-FPGA tools, which rely on going via several intermediate stages, Handel-C allows hardware to be targeted directly from software, allowing a more efficient implementation to be created. The Handel-C compiler comes packaged with the Celoxica DK1 development environment.
3.2 Targets Supported by Handel-C: Handel-C supports two targets. The first is a simulator target that allows development and testing of code without the need to use any hardware. This is supported by a debugger and other tools. The second target is the synthesis of a netlist for input to place and route tools. Place and route is the process of translating a netlist into a hardware layout. This allows the design to be translated into configuration data for particular chips.
4. Hardware Implementation: The algorithms implemented in this work use the moving-window operator. A moving-window operator processes one pixel of the image at a time, changing its value by some function of a local region of pixels, and moves over the image to process every pixel. A 3x3 moving window is used for the median filtering, morphological and edge detection algorithms, and a 5x5 moving window is used in the Gaussian smoothing operation. For a pipelined implementation of the image processing algorithms, all the pixels in the moving window must be accessible at the same time on every clock cycle. A 2D matrix of first-in, first-out (FIFO) buffers is used to create the effect of moving an entire window of pixels through memory on every clock cycle. The architecture of the 3x3 moving window uses two FIFO buffers. The size of each FIFO buffer is W - M, where W is the width of the image and M the size of the (M x M) window. All the values of the window become accessible on every clock cycle once the two FIFO buffers are full.
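A cycle-accurate software model of this buffering scheme can be sketched in C. Each "push" models one clock cycle: the incoming pixel enters the top window row, pixels leaving a window row enter the FIFO for that row, and pixels leaving a FIFO enter the next window row. The struct and function names are illustrative, and a small image width is used for demonstration:

```c
#define W 8   /* image width (small, for illustration) */
#define M 3   /* window size */

/* 3x3 moving window fed by two line FIFOs of length W - M,
 * so all nine window pixels are visible on every clock. */
typedef struct {
    unsigned char win[M][M];           /* window registers; [r][0] is newest */
    unsigned char fifo[M - 1][W - M];  /* two circular line FIFOs */
    int head;                          /* shared circular index */
} Window3x3;

void window_push(Window3x3 *w, unsigned char px)
{
    unsigned char into[M];
    into[0] = px;
    for (int r = 0; r < M - 1; r++) {
        unsigned char leaving = w->win[r][M - 1]; /* leaves row r */
        into[r + 1] = w->fifo[r][w->head];        /* emerges after W-M cycles */
        w->fifo[r][w->head] = leaving;
    }
    w->head = (w->head + 1) % (W - M);
    for (int r = 0; r < M; r++) {                 /* shift each window row */
        for (int c = M - 1; c > 0; c--)
            w->win[r][c] = w->win[r][c - 1];
        w->win[r][0] = into[r];
    }
}
```

After streaming pixels in raster order, win[r][c] holds the pixel r rows above and c columns to the left of the most recent input, which is exactly the alignment the window operators need.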
4.1 Median Filter: A median filter is implemented by sliding a window of odd size over the image. A 3x3 window size is chosen for the median filter implementation because it is small enough to fit onto the target FPGAs. The median filtering operation sorts the pixel values in the window into ascending order and picks the middle value, which replaces the center pixel of the window. The most efficient way to accomplish the sorting is with a system of hardware compare/sort units, which sorts a window of pixels into ascending order.
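The compare/sort approach can be modeled in C with a fixed network of compare-exchange units (the fixed pass count stands in for the fixed hardware depth; an optimized hardware network would use fewer comparators, so this is a sketch rather than the paper's exact design):

```c
/* One compare/sort unit: outputs min and max of its two inputs. */
static void cswap(unsigned char *a, unsigned char *b)
{
    if (*a > *b) { unsigned char t = *a; *a = *b; *b = t; }
}

/* Median of the nine window pixels: a fixed-depth network of adjacent
 * compare-exchanges sorts the values into ascending order, and the
 * middle element (index 4) replaces the center pixel. */
unsigned char median9(unsigned char p[9])
{
    for (int pass = 0; pass < 9; pass++)
        for (int i = 0; i + 1 < 9; i++)
            cswap(&p[i], &p[i + 1]);
    return p[4];
}
```

This illustrates why the median filter removes impulse noise: a single outlier in the window ends up at one extreme of the sorted order and never reaches the middle position.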
4.2 Morphological Operations: The basic morphological operators are erosion and dilation; applied to a grayscale image they are called grayscale erosion and grayscale dilation. Grayscale erosion is performed by a minimum filter, whereas grayscale dilation is performed by a maximum filter. In a 3x3 minimum filter the center pixel is replaced by the minimum value of the pixels in the window; in a 3x3 maximum filter the center pixel is replaced by the maximum value of the pixels in the window.
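A minimal C sketch of the two filters over one window position (function names are illustrative):

```c
/* Grayscale erosion: replace the center pixel with the window minimum. */
unsigned char min9(const unsigned char p[9])
{
    unsigned char m = p[0];
    for (int i = 1; i < 9; i++)
        if (p[i] < m) m = p[i];
    return m;
}

/* Grayscale dilation: replace the center pixel with the window maximum. */
unsigned char max9(const unsigned char p[9])
{
    unsigned char m = p[0];
    for (int i = 1; i < 9; i++)
        if (p[i] > m) m = p[i];
    return m;
}
```

In hardware, each loop iteration maps onto one compare/select unit, so the whole filter is a small tree of comparators evaluated in parallel.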
4.3 Convolution Operation: Convolution is a very complex operation that requires huge computational power. To calculate one pixel for a given mask of size m x n, m * n multiplications, m * n - 1 additions and one division are required. A single-cycle divider or multiplier produces a large amount of hardware and long delays through deep logic. To improve the performance of the convolution operation it is necessary to reduce the multiplication and division operations. Multiplication and division can be done using bit shifting, but this is only possible with powers of 2.
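For example, a multiplication by a constant weight can be decomposed into a sum of power-of-two terms, and a division by a power of two becomes a right shift; the constants below are illustrative, not the paper's actual mask values:

```c
/* Constant multiplication by shift-and-add:
 * 12*x = 8x + 4x = (x << 3) + (x << 2). */
unsigned mul12(unsigned x) { return (x << 3) + (x << 2); }

/* Division by a power of two is a right shift for non-negative values:
 * x / 128 == x >> 7. */
unsigned div128(unsigned x) { return x >> 7; }
```

In hardware the shifts are just wiring, so each weight costs only a few adders instead of a full multiplier, which is why the shift-based variants in Table 5.1 run at higher clock frequencies.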
4.4 Edge Detection: The edge detector operation consists of four stages: 1. Image Smoothing, 2. Vertical and Horizontal Gradient Calculation, 3. Directional Non-Maximum Suppression, 4. Threshold.
4.4.1 Image Smoothing: Smoothing of the image is achieved by a 5x5 Gaussian convolution. A 5x5 moving-window operator is used, and four FIFO buffers are employed to access all the pixels in the 5x5 window at the same time. Since the design is pipelined, Gaussian smoothing starts once the FIFO buffers are full; that is, the output is produced after a latency of twice the image width plus three clock cycles. The output of this stage is fed as input to the next stage.
4.4.2 Vertical and Horizontal Gradient Calculation: An 8-bit pixel, produced in row order of the image on every clock cycle by the image smoothing stage, is the input to this stage. The gradient calculation introduces negative numbers. In Handel-C, negative numbers are handled easily using signed data types; a negative number is interpreted as the 2's complement of the number. In the design, an extra bit is used for signed numbers compared to the unsigned 8-bit numbers. Two gradient values are calculated for each pixel, one for the vertical direction and the other for the horizontal. The 9 bits of the vertical gradient and the 9 bits of the horizontal gradient are concatenated to produce an 18-bit value. Since the whole design is pipelined, an 18-bit number is generated on every clock cycle, which forms the input to the next stage.
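The concatenation of the two 9-bit two's-complement gradients into one 18-bit word, and the sign-extending unpack needed downstream, can be modeled in C (in Handel-C the concatenation is a single `@` operation; the masks here are the software equivalent):

```c
#include <stdint.h>

/* Pack two 9-bit two's-complement gradients into one 18-bit word:
 * bits 17..9 hold dx, bits 8..0 hold dy. */
uint32_t pack_gradients(int dx, int dy)
{
    return ((uint32_t)(dx & 0x1FF) << 9) | (uint32_t)(dy & 0x1FF);
}

/* Unpack with sign extension from 9 bits back to a full int. */
int unpack_dx(uint32_t word)
{
    int v = (int)((word >> 9) & 0x1FF);
    return (v & 0x100) ? v - 0x200 : v;
}

int unpack_dy(uint32_t word)
{
    int v = (int)(word & 0x1FF);
    return (v & 0x100) ? v - 0x200 : v;
}
```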
4.4.3 Directional Non-Maximum Suppression: The output of the previous stage is the input to this stage. In order to access all the pixels in the 3x3 window at the same time, two 18-bit-wide FIFO buffers, each of length equal to the image width minus three, are employed. Once the direction of the gradient is known, the values of the pixels in the neighborhood of the pixel under analysis are interpolated. A pixel whose gradient magnitude is not a local maximum is eliminated. The comparison is made between the actual pixel and its neighbors along the direction of the gradient.
4.4.4 Threshold: The output obtained from the non-maximum suppression stage contains single edge pixels which contribute to noise; these can be eliminated by thresholding. To find a connected path between a weak edge pixel and a strong edge pixel, a 3x3 window operator is used. If the center pixel is a weak edge pixel and any of its neighbors is a strong edge pixel, the weak edge pixel is considered a strong edge pixel. The resultant image is an image with optimal edges.
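The per-window decision can be sketched in C as follows; the two threshold values and the function name are illustrative, since the paper does not state its thresholds:

```c
/* Hysteresis thresholding on one 3x3 window; p[4] is the center pixel.
 * A strong pixel (>= hi) is always an edge; a weak pixel (>= lo) is
 * promoted to an edge only if some neighbor in the window is strong. */
int is_edge(const unsigned char p[9], unsigned char lo, unsigned char hi)
{
    if (p[4] >= hi) return 1;      /* strong: keep */
    if (p[4] < lo)  return 0;      /* below both thresholds: discard */
    for (int i = 0; i < 9; i++)    /* weak: look for a strong neighbor */
        if (i != 4 && p[i] >= hi) return 1;
    return 0;
}
```

Isolated weak responses (noise) have no strong neighbor and are discarded, while weak pixels that continue a strong edge are kept, yielding connected edge contours.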
5. Results: The image processing algorithms discussed above were modeled in Handel-C using the DK2 environment. The design was implemented on the RC1000-PP Xilinx Virtex-E FPGA based hardware. The timing results of the image processing algorithms on a 256x256 grayscale Lena image are shown in Table 5.1. The hardware implementation of the algorithms is compared with a software implementation on a Pentium III 1.3 GHz machine using Visual C++ version 6.0 without any optimization. The FPGA solution for the image processing algorithms is up to 20 times faster than the software implementation. Figure 5.1 shows the output images of the hardware implementation.
| Algorithm | Variant | FPGA Freq [MHz] | FPGA Time [ms] | Pentium III Freq [MHz] | Pentium III Time [ms] |
| --- | --- | --- | --- | --- | --- |
| Median Filter, Morphological Operation | - | 25.9 | 2.56 | 1300 | 51 |
| Gaussian Convolution | A | 25.9 | 2.68 | 1300 | 31 |
| Gaussian Convolution | B | 42 | 1.57 | 1300 | 31 |
| Gaussian Smoothing | C | 42.03 | 1.58 | 1300 | 16 |
| Gaussian Smoothing | D | 50.99 | 1.31 | 1300 | 16 |
| Edge Detection | - | 16 | 4.21 | 1300 | 47 |

Table 5.1: Timing results on the Xilinx Virtex-E FPGA and the Pentium III
A: Direct division by 115. B: Division using right shift (>> 7). C: Direct multiplication. D: LUT-based multiplication.