Convolution of large 3D images on GPU and its decomposition

biomed - Pavel Karas, David Svoboda

12 pages

English

Description

In this article, we propose a method for computing convolution of large 3D images. The convolution is performed in a frequency domain using a convolution theorem. The algorithm is accelerated on a graphic card by means of the CUDA parallel computing model. Convolution is decomposed in a frequency domain using the decimation in frequency algorithm. We pay attention to keeping our approach efficient in terms of both time and memory consumption and also in terms of memory transfers between CPU and GPU, which have a significant influence on overall computational time. We also study the implementation on multiple GPUs and compare the results between the multi-GPU and multi-CPU implementations.

Topics

Convolution

Decomposition

Fourier transform

FFT

GPU

CUDA

Information

Published by	biomed
Published on	01 January 2011
Number of reads	11
Language	English

Extract

Karas and Svoboda EURASIP Journal on Advances in Signal Processing 2011, 2011:120
http://asp.eurasipjournals.com/content/2011/1/120

RESEARCH Open Access

Convolution of large 3D images on GPU and its decomposition

Pavel Karas* and David Svoboda

Abstract

In this article, we propose a method for computing convolution of large 3D images. The convolution is performed in a frequency domain using a convolution theorem. The algorithm is accelerated on a graphic card by means of the CUDA parallel computing model. Convolution is decomposed in a frequency domain using the decimation in frequency algorithm. We pay attention to keeping our approach efficient in terms of both time and memory consumption and also in terms of memory transfers between CPU and GPU, which have a significant influence on overall computational time. We also study the implementation on multiple GPUs and compare the results between the multi-GPU and multi-CPU implementations.

Keywords: convolution, decomposition, Fourier transform, FFT, GPU, CUDA

1 Introduction

The convolution of two signals can be employed for blurring images, deconvolving blurred images, edge detection, noise suppression, and in many other applications [1-3]. For example, a cross-correlation and a phase-correlation (which are important methods of image registration) are both very similar to a convolution since they have basically the same mathematical meaning except that a convolution involves reversing a signal [1, p. 211]. The convolution of large signals is also used for simulating image formation in optical systems such as light microscopes [4]. The convolution is a common method used in image processing; however, its computation is very time-consuming for large images.

Graphic cards can be employed for accelerating the computation. Some of the algorithms can be found in the NVIDIA whitepaper [5]. Here, a so-called naïve convolution and a convolution with a separable kernel are described, along with their optimized GPU implementation in CUDA. These algorithms can be used in many applications, such as fast computation of Canny edge detection [6,7]. However, these approaches are not suitable for general large kernels.

In optical microscopy, we often deal with both large input signals and kernels. Thus, in this article, we discuss the time complexity of a convolution with emphasis on large 3D images. We recall the convolution theorem and its positive effect on the time complexity. For example, having a signal of 1000 × 1000 × 100 voxels and a filter kernel of 100 × 100 × 100 voxels, which is common in optical microscopy, the calculation using the convolution theorem takes tens of seconds, instead of several days, on the most recent CPU architecture. Even better times can be obtained using graphic cards. The GPU-based convolution using the convolution theorem is described in [8]. As indicated by the authors, the FFT-based approach is suitable for large non-separable kernels.

The essential part of the algorithm described above is the Fourier transform. The first attempt to compute the fast Fourier transform on graphics hardware was described in [9]. The implementation was written in OpenGL and Cg shading languages and tested in the convolution application. The comparison of convolution in spatial and frequency domain (for the description of both approaches refer to the following section) was made in [10]. A significant speedup was achieved by implementing the algorithm on GPU, using HLSL and DirectX. Recently, the NVIDIA® CUDA programming model [11] along with the CUFFT library [12] offers a framework for implementing convolution in a straightforward manner. Besides CUFFT, other FFT libraries for GPU were developed, such as [13] and [14]. The

* Correspondence: xkaras1@fi.muni.cz
Centre for Biomedical Image Analysis, Faculty of Informatics, Masaryk University, Botanicka 68a, Brno, Czech Republic

© 2011 Karas and Svoboda; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
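The introduction leans on two ideas: that a convolution can be computed as a pointwise product in the frequency domain, and that doing so is far cheaper for large kernels. The following Python sketch illustrates both on a small scale. It is not the authors' implementation: the function names are hypothetical, a naive DFT stands in for an FFT library such as CUFFT, and the cost model (2 operations per multiply-add, about 5·N·log2(N) operations per complex FFT, three transforms in total, zero-padding to signal size plus kernel size minus one along each axis) is an assumption chosen only for a rough comparison.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT library)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def direct_convolve(signal, kernel):
    """Linear convolution computed straight from the definition."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(kernel):
            out[i + j] += s * h
    return out


def fourier_convolve(signal, kernel):
    """Linear convolution computed via the convolution theorem."""
    # Zero-pad both inputs so that circular wrap-around cannot occur.
    n = len(signal) + len(kernel) - 1
    f = dft(list(signal) + [0.0] * (n - len(signal)))
    g = dft(list(kernel) + [0.0] * (n - len(kernel)))
    # Pointwise product in the frequency domain, then transform back.
    product = [a * b for a, b in zip(f, g)]
    return [c.real for c in dft(product, inverse=True)]


def estimated_operations(signal_shape, kernel_shape):
    """Rough operation counts (direct, fourier) under the assumed cost model."""
    signal_voxels = math.prod(signal_shape)
    kernel_voxels = math.prod(kernel_shape)
    padded = math.prod(s + k - 1 for s, k in zip(signal_shape, kernel_shape))
    direct = 2 * signal_voxels * kernel_voxels
    # Two forward FFTs, one inverse FFT, and one complex pointwise product.
    fourier = 3 * 5 * padded * math.log2(padded) + 6 * padded
    return direct, fourier


if __name__ == "__main__":
    print(direct_convolve([1.0, 2.0, 3.0], [0.5, 0.25]))
    print(fourier_convolve([1.0, 2.0, 3.0], [0.5, 0.25]))
    direct, fourier = estimated_operations((1000, 1000, 100), (100, 100, 100))
    print(f"direct / fourier operation ratio: {direct / fourier:.0f}")
```

Under these assumptions, the estimate for the 1000 × 1000 × 100 signal and 100 × 100 × 100 kernel favours the frequency-domain route by roughly three orders of magnitude in operation count. Actual run times also depend on memory traffic and on the padded sizes an FFT library handles efficiently, so the ratio should be read as an indication only.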