Case Studies


Learn how ArrayFire has worked in real code, including applications in academia, finance, government, life sciences, manufacturing, media, and oil & gas. With ArrayFire, you get the best in GPU and accelerator computing, ensuring real success in your business and research objectives.



Academia


Accelerating LTE Simulation
Tsinghua University
Speedup: 3X


lte system

Accelerating LTE Simulation

Authors: Yuan Gao, Yin Sun, Chun Hui Zhou, Xin Su, Xi Bin Xu, Shi Dong Zhou, Tsinghua University
Speedup: 3X

Fast simulations are a driving force in several research projects. However, long simulation times can be a drag on many of them. This article takes the work on 3GPP LTE System Simulation by Yuan Gao et al. (Tsinghua University, Beijing) as an example and demonstrates how AccelerEyes software can significantly improve simulator performance and lead to faster validation times in simulation projects.

Last Updated: 8 Aug 2011

High Performance Compressive Sensing
Rice University
Speedup: 5X


Great Wall of China

High Performance Compressive Sensing

Authors: Nabor Reyna and Wotao Yin from Rice University
Speedup: 5X

This work deals with reconstruction of signals using partial Fourier matrices (RecPF). The major computational components of the algorithm involve shrinkage and FFTs. AccelerEyes software is employed to accelerate this compute-heavy code.
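As a rough illustration of the shrinkage component mentioned above, here is a minimal sketch in plain Python. It assumes "shrinkage" means the usual soft-thresholding operator; the function name `shrink` and the sample values are hypothetical, and this is not the RecPF code or an AccelerEyes API.

```python
def shrink(values, threshold):
    """Soft-thresholding: move each value toward zero by `threshold`.

    Values whose magnitude is at most `threshold` become exactly zero.
    """
    out = []
    for v in values:
        magnitude = abs(v) - threshold
        if magnitude <= 0:
            out.append(0.0)
        else:
            out.append(magnitude if v > 0 else -magnitude)
    return out


coeffs = [3.0, -0.5, 0.2, -2.0]
print(shrink(coeffs, 1.0))  # [2.0, 0.0, 0.0, -1.0]
```

Each element is processed independently of the others, which is why this kind of step maps well onto a GPU.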

Last Updated: 27 Jul 2011

Power System Simulations
Indian Institute of Technology, Roorkee, India
Speedup: 35X


power flow image

Power Flow on the GPU

Authors: Indian Institute of Technology, Roorkee
Speedup: 35X

Power flow studies are one of the most important aspects of power system planning and operation. The power flow reveals the sinusoidal steady-state characteristics of the entire system (voltages, real and reactive power generated and absorbed, and line losses), elucidating the voltage magnitudes and angles at each bus, the generation of each generating unit, and real and reactive power losses in the system. All this is necessary to ensure the security, economy, and control of electrical energy distribution. Learn how AccelerEyes software can deliver order-of-magnitude performance improvements over CPU-based solutions.

 

Last Updated: 18 Apr 2010

Antenna Array Simulations
University of Naples Federico II
Speedup: 4.5X


Echo Generators

Design and simulate echo generators

Authors: A. Capozzoli, C. Curcio, A. Liseno at University of Naples Federico II
Speedup: 24X

Antenna array design involves repeated simulation to tune the many parameters involved, and waiting around for simulations to finish is no fun. Offloading the optimization problem onto the GPU cuts that time down significantly. In their recent paper, Capozzoli, Curcio, and Liseno of University of Naples Federico II demonstrated how a simple modification to their echo generator array simulation took advantage of the GPU to bring immediate speedups.

Last Updated: 20 Jul 2011

Laplace Transform Inversion
Acunum Algorithms and Simulations
Speedup: 3.8X


Laplace Transform Inversion on the GPU

Laplace Transform Inversion on the GPU

Authors: Patrick Kano and Moysey Brio at Acunum Algorithms and Simulations
Speedup: 3.8X

The numerical inversion of the Laplace transform is a long-standing problem due to its implicit ill-posedness. Patrick Kano and Moysey Brio of Acunum Algorithms and Simulations, with their experience in computational methods and algorithm development, found a solution that not only works, but is very fast.

Last Updated: 13 May 2011

Compressed Sensing for Image Reconstruction
College of Engineering, Roorkee, India
Speedup: 8X


Compressed Sensing Algorithms

Compressed Sensing Algorithms

Authors: Kuldeep Yadav, Ankush Mittal, M.A. Ansar and Avi Srivastava, College of Engineering, Roorkee, India
Speedup: 8X

Compressed sensing is critical in areas such as medical image reconstruction, image acquisition, and sensor networks. A compressed sensing algorithm built on Basis Pursuit shows over 8X speedup when run on an NVIDIA GPU.

Last Updated: 5 May 2011

Fat/Water Reconstruction for Medical Images
Case Western Reserve University
Speedup: 11.6X


Fat/Water Reconstruction

Improved Fat/Water Reconstruction Algorithm

Authors: D. H. Johnson, S. Narayan, C. A. Flask and D. L. Wilson, Case Western Reserve University
Speedup: 11.6X

Case Western Reserve University researchers turned to GPUs running AccelerEyes software to develop a fast and robust version of the "Iterative Decomposition of water and fat with an Echo Asymmetry and Least-squares" (IDEAL) reconstruction algorithm. The algorithm relies on many image processing routines for reconstruction and was shown to achieve very high speedups.

Last Updated: 25 Mar 2011

Finance


Option Pricing
Koch Supply & Trading
Speedup: 51.8X


koch logo

Option Pricing

Authors: Koch Supply & Trading
Speedup: 51.8x

Andrew Shin, Market Risk Manager of Koch Supply & Trading, achieves significant performance increases on option pricing algorithms using AccelerEyes software to accelerate his code with GPUs. Andrew says, "My buddy and I are, at best, novice programmers and we couldn't imagine having to figure out how to code all this in CUDA." But he found AccelerEyes software to be straightforward. With these results, he says he can see AccelerEyes software and GPUs populating Koch's mark-to-futures cube, which contains its assets, simulations, and simulated asset prices.

 

Last Updated: 13 Aug 2012

GPU Computing in Automated Trader
Automated Trader
Speedup: 37.5X


finance chart

GPU Computing in Automated Trader

Authors: Automated Trader
Speedup: 37.5x

The Q1 2012 issue of Automated Trader contains an excellent Mashup piece reviewing software for algorithmic trading. The article provides a wonderful glimpse into the 1-2 month adventure of Andy Webb, Automated Trader's founder, and the Wrecking Crew as they built a fast trading platform from several technologies. The full trading platform they built was quite extensive. The part that caught our eye was the core computational component of the pipeline. That component involved permuting 1,000 potential pairs, with cointegration tests over 350 time windows on each potential pair.

 

Last Updated: 28 Feb 2012

GPUs in Quantitative Analytics and Finance
Private Bank


finance chart

The world of quantitative finance is all about getting accurate results really, really fast. AccelerEyes is working with one of the largest banks in Spain to maximize their output using GPUs. Click the link below for an overview of the uses of GPU computing in finance.

 

Last Updated: 17 Mar 2010

Government


Powering Mars Research
NASA and UAA in Anchorage
Speedup: 5X


mars images

Powering Mars Research

Authors: NASA and UAA in Anchorage
Speedup: 5X

The main thrust of this research is improving Mars rover image compression via GPUs and genetic algorithms. With AccelerEyes software and GPUs, the researchers were able to achieve 5X speedups on the larger data sizes. The algorithm works by pairing neighboring pixels with a random one and then adjusting the random pixel based on whether it incrementally improves the original image. Babb described the algorithm as an embarrassingly parallel process, ideally suited to GPU acceleration. He estimates he has been able to achieve a 20 to 30 percent error reduction in subjects like fingerprints and satellite imagery.

Last Updated: 6 Aug 2012

Radar Image Formation
System Planning Corporation
Speedup: ~45X


radar image formation

Radar Image Formation

Authors: Gary Rubin and Earl Sager - System Planning Corporation
Speedup: ~45X

Radar imaging is computationally intensive. As a result, many imaging algorithms apply FFT-based approximations. While efficient, these algorithms sacrifice data fidelity for speed. Other algorithms better preserve information, but are often too slow for many applications. At System Planning Corporation (SPC), we have implemented a SAR/ISAR imaging routine based on the Backprojection algorithm. Using AccelerEyes software, we have demonstrated speedups of roughly 45x for large datasets.

Last Updated: 26 May 2010

Radar Clutter Reduction
System Planning Corporation
Speedup: 5X - 10X


marine navigation radar

Radar Clutter Reduction

Authors: David Berger and Gary Rubin - System Planning Corporation
Speedup: 5 to 10x

System Planning Corporation (SPC) uses AccelerEyes software to accelerate radar processing algorithms. The system processes raw data from marine navigation radars using a variety of thresholding techniques to extract real targets from clutter. This involves highly data-parallel processing in which each radar pulse is subjected to the same computations; very few operations occur across multiple pulses. Using AccelerEyes software, SPC has achieved 10x speed improvements relative to a Core i7-920 CPU and 5x improvements relative to a realtime DSP implementation.
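The data-parallel structure described above, where every pulse gets the same computation and pulses do not interact, can be sketched as follows. This is an illustration only: the mean-level threshold rule, the `scale` factor, and the name `detect_targets` are assumptions, not SPC's actual thresholding techniques.

```python
def detect_targets(pulse, scale=3.0):
    """Return indices of range bins whose return exceeds `scale` times the pulse mean.

    Hypothetical rule for illustration; real clutter processing is more involved.
    """
    level = sum(pulse) / len(pulse)
    return [i for i, value in enumerate(pulse) if value > scale * level]


pulses = [
    [1.0, 1.2, 0.9, 9.0, 1.1, 1.0],  # one strong return in bin 3
    [0.8, 1.0, 1.1, 1.0, 0.9, 1.2],  # clutter only
]

# Each pulse is handled on its own, so the pulses could be processed in parallel.
hits = [detect_targets(p) for p in pulses]
print(hits)  # [[3], []]
```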

Last Updated: 26 May 2010

Novel Algorithms for Linear Algebra
SAIC
Speedup: 3.5X


LU Decomposition

Novel Algorithms for LU Decomposition

Authors: Nolan Davis and Daniel Redig, SAIC
Speedup: 3.5X

Nolan Davis and Daniel Redig at SAIC recently presented work on Hybrid GPU/Multicore Solutions for Large Linear Algebra Problems where they developed a novel algorithm for LU decomposition, one of the most important routines in linear algebra. They presented a Hybrid CPU/GPU computing approach, where problems too large to fit in GPU memory can also be solved faster than using only the CPU.

Last Updated: 26 Jul 2011

Geolocation
BAE Systems
Speedup: 17X


Geolocation visualization

Geolocation

Authors: BAE Systems
Speedup: 17X

Geolocation is the identification of the real-world geographic location of a target of interest. In this application, the system receives the signal with an array of several antennas and computes the direction of arrival of the radio energy by measuring the time difference of arrival (or the phase difference) at the different antennas.
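For intuition, the phase-difference measurement described above can be turned into an angle for the simplest case of two antennas and a plane wave, using the textbook relation phase_diff = 2 * pi * spacing * sin(angle) / wavelength. The sketch below assumes that two-antenna model; the function and parameter names are hypothetical and this is not the BAE Systems implementation.

```python
import math


def direction_of_arrival(phase_diff_rad, spacing_m, wavelength_m):
    """Angle from broadside (radians) for a plane wave arriving at two antennas."""
    s = phase_diff_rad * wavelength_m / (2.0 * math.pi * spacing_m)
    if abs(s) > 1.0:
        raise ValueError("phase difference is inconsistent with this antenna spacing")
    return math.asin(s)


# Half-wavelength spacing: a 90-degree phase difference maps to 30 degrees off broadside.
angle = direction_of_arrival(math.pi / 2, spacing_m=0.5, wavelength_m=1.0)
print(round(math.degrees(angle), 6))  # 30.0
```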

 

Last Updated: 13 Apr 2009

Tsunami Modeling
University of Minnesota, Boise State, Saint Scholastica, and NCAR
Speedup: 3X - 5X


Tsunami image

Tsunami Modeling

Authors: University of Minnesota, Boise State, Saint Scholastica, and NCAR
Speedup: 3 to 5X

Natural disasters like tsunamis commonly strike with little warning. Most people underrate tsunamis as major hazards, sometimes wrongly believing that they occur infrequently and only along distant coasts. Tsunamis are usually caused by earthquakes. Seismic signals can give some margin of warning, since tsunami waves travel at 1/30 the speed of seismic waves. Still, there is little time between the creation of a tsunami and its impact, making fast processing critical to producing effective warning systems. AccelerEyes software was used to run an RBF simulation on the GPU with a time to solution not achievable with the alternatives.

 

Last Updated: 20 Dec 2009

Life Sciences


Parallelized Gene Predictors
University of Quebec
Speedup: 43X



Authors: University of Quebec
Speedup: 43X

Computerized approaches to studying the human genome are challenged by the exploding amount of data, which doubles roughly every 6 months. To deal with these burgeoning datasets, demand for faster processing continues to grow. This work focuses on predicting genes using frequency analysis with FFTs and with an equivalent technique known as Goertzel's algorithm. The aim of the paper is to offer geneticists and molecular biologists tools for the prediction or identification of new genes using existing complementary strategies. The criteria for these tools are speed, reliability, accuracy, and ease of use, so that they require little training.
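To show why Goertzel's algorithm is described as equivalent to the FFT for this purpose, the sketch below computes the power of a single DFT bin with the Goertzel recurrence and compares it against a direct DFT sum. The base-indicator encoding and the choice of the N/3 bin are assumptions made for illustration, not details taken from the paper.

```python
import cmath
import math


def goertzel_power(samples, k):
    """Squared magnitude of DFT bin k, computed with Goertzel's recurrence."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def dft_power(samples, k):
    """Reference value: the same bin computed from the DFT definition."""
    n = len(samples)
    total = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
    return abs(total) ** 2


# Hypothetical input: 1 where base 'A' occurs in "ATGATGATGATG", else 0.
sequence = "ATGATGATGATG"
indicator = [1.0 if base == "A" else 0.0 for base in sequence]
k = len(indicator) // 3  # bin corresponding to a repeat every 3 bases

print(round(goertzel_power(indicator, k), 6))  # 16.0
print(round(dft_power(indicator, k), 6))       # 16.0
```

Goertzel needs only one multiply-add recurrence per sample for the chosen bin, which is attractive when a handful of frequencies matter rather than the whole spectrum.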

Last Updated: 26 Jun 2012

Pathology advances with GPUs
Northeastern University
Speedup: 100X+



Pathology advances with GPUs

Authors: Laboratory for Spectral Diagnosis at Northeastern University
Speedup: 100X+

One element of the hyperspectral image analysis workflow that requires more than a traditional desktop workstation or personal computer is Hierarchical Cluster Analysis (HCA). HCA requires a large amount of data space and substantial computation time (~11 hours) for typical datasets using a single-processor personal computer. Rather than following the traditional approach of moving to a lower-level programming language like C or C++ and complex parallel programming paradigms such as OpenMP or the Message Passing Interface (MPI), the lab utilized graphics processing units, or GPUs, and the AccelerEyes software platform. The solution allowed the lab to dramatically increase the performance of the analysis while substantially decreasing the amount of calendar time to reach the desired results.

Last Updated: 27 May 2010

Hepatitis C Virus - mutation modeling
Centers for Disease Control and Prevention
Speedup: ~20X


hepatitis C

Hepatitis C Virus - mutation modeling

Authors: CDC Research and Development Team
Speedup: ~20X

This case study provides a look at biological research regarding coordinated mutations of the Hepatitis C Virus (HCV). AccelerEyes provided collaborative R&D resources and greatly improved the speed of this HCV research with the use of parallelization, reducing the computing time from 40 days to less than 1 day. Most importantly, the conclusion of the case study illustrates that the relative price-performance of personal supercomputers that leverage GPUs and AccelerEyes software provides a compelling solution versus other architectures and approaches.

Last Updated: 10 Sep 2010

Accelerating the SPM package for Neuroimaging
Georgia Institute of Technology
Speedup: 3.5X


fmri image analysis

fMRI with SPM in Neuroimaging

Authors: Georgia Institute of Technology
Speedup: 3.5X

The Georgia Tech team explores the value of AccelerEyes software and GPUs for fMRI workflows within SPM (Statistical Parametric Mapping), a popular software package widely used in neuroscience research.

 

Last Updated: 8 Mar 2010

Medical Image Compression
Indian Institute of Technology, Roorkee, India
Speedup: 38X


skull image

Medical Image Compression

Authors: Jaideep Singh, Ipseeta Aruni, R. Balasubramanian - IIT - Roorkee, India
Speedup: 38x

This study presents the acceleration of a Haar wavelet-based image compression algorithm for medical imaging on the Graphics Processing Unit (GPU) using AccelerEyes software. Due to bandwidth and image size constraints of medical imaging systems, image compression plays a vital role in reducing the bit rate of transmission or storage. Wavelet-based image compression provides the most promising approach for high quality image compression.
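The idea behind Haar wavelet compression can be sketched on a single row of pixel values: split the row into pairwise averages and differences, then discard small differences. This is a minimal one-level, one-dimensional illustration in plain Python with hypothetical names and data; the study's actual algorithm and thresholds are not given here.

```python
def haar_step(signal):
    """One level of an unnormalized Haar transform (signal length must be even)."""
    averages = [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]
    return averages, details


def inverse_haar_step(averages, details):
    """Rebuild the signal from pairwise averages and differences."""
    out = []
    for a, d in zip(averages, details):
        out.extend([a + d, a - d])
    return out


def compress(signal, threshold):
    """Zero out detail coefficients smaller than `threshold` (the lossy step)."""
    averages, details = haar_step(signal)
    kept = [d if abs(d) >= threshold else 0.0 for d in details]
    return averages, kept


row = [10.0, 12.0, 50.0, 52.0, 30.0, 10.0, 8.0, 8.0]
averages, details = compress(row, threshold=2.0)
print(averages)  # [11.0, 51.0, 20.0, 8.0]
print(details)   # [0.0, 0.0, 10.0, 0.0]
print(inverse_haar_step(averages, details))
# [11.0, 11.0, 51.0, 51.0, 30.0, 10.0, 8.0, 8.0]
```

Only one of the four detail coefficients survives, yet the reconstructed row stays close to the original, which is the property compression relies on.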

Last Updated: 23 Jun 2010

Brain Displacement
Spencer Technologies
Speedup: 12X


Brain visualization

Brain Displacement

Authors: Spencer Technologies
Speedup: 12X

Spencer describes how AccelerEyes software facilitates the development of fast algorithms enabling observation of brain displacement across depth with sampling density that far surpasses previous benchmarks.

 

Last Updated: 23 Jan 2010

Multidimensional Scaling for Genomics
Leibniz Institute of Plant Genetics and Crop Plant Research
Speedup: 20X - 35X


Gene Expression visualization

Multidimensional Scaling for Genomics

Authors: Leibniz Institute of Plant Genetics and Crop Plant Research
Speedup: 20 to 35X

Multidimensional scaling (MDS) is a general computing technique that turns a distance matrix into a set of reconstructed points, usually in a low-dimensional space, whose pairwise distances approximate the original ones. AccelerEyes software is used to enhance execution of the HiT-MDS procedure and delivers considerable performance improvement.

 

Last Updated: 26 Jun 2009

Drug Delivery Model
Georgia Institute of Technology
Speedup: 70X


Cell simulation visualization for Drug Discovery

Drug Delivery Model

Authors: Georgia Institute of Technology
Speedup: 70X

In this work, the researchers simulate the delivery of a novel nanoparticle chemotherapy drug to cancerous tissue. Simulation allows scientists to predict experimental outcomes and thus reduce the cost of development and time to clinical relevance. The simulation model includes blood vessels, tumor cells, and healthy cells, along with an engine to calculate the spatial distributions of both drug and oxygen. AccelerEyes software is used to speed up the diffusion calculations for the drug and oxygen within the tissue.
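A diffusion calculation of the kind mentioned above is often written as a repeated neighbor-averaging update over a grid. The sketch below is a one-dimensional explicit finite-difference version with no-flux boundaries; the scheme, the function name `diffuse`, and all parameter values are assumptions for illustration, not the researchers' model.

```python
def diffuse(concentration, diffusivity, dx, dt, steps):
    """Explicit finite-difference diffusion in 1D with no-flux boundaries."""
    alpha = diffusivity * dt / (dx * dx)
    if alpha > 0.5:
        raise ValueError("time step too large for a stable explicit update")
    c = list(concentration)
    for _ in range(steps):
        # Mirror the edge cells so that nothing diffuses out of the domain.
        padded = [c[0]] + c + [c[-1]]
        c = [
            padded[i] + alpha * (padded[i - 1] - 2.0 * padded[i] + padded[i + 1])
            for i in range(1, len(padded) - 1)
        ]
    return c


initial = [0.0, 0.0, 1.0, 0.0, 0.0]  # all of the drug starts in the middle cell
print(diffuse(initial, diffusivity=1.0, dx=1.0, dt=0.25, steps=1))
# [0.0, 0.25, 0.5, 0.25, 0.0]
```

Every cell's new value depends only on its immediate neighbors from the previous step, so all cells can be updated at once, which is what makes this workload a good fit for GPUs.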

 

Last Updated: 9 Nov 2008

Biomedical Infrared Spectroscopy
University of Manchester and Nofima Mat, Norway
Speedup: Hours of runtime reduction


cancer cell image

Biomedical Infrared Spectroscopy

Authors: University of Manchester and Nofima Mat, Norway
Speedup: Hours of runtime reduction

The authors present an iterative algorithm that applies full Mie scattering theory and avoids noise accumulation by integrating a curve-fitting step. AccelerEyes software and NVIDIA GPUs are leveraged to reduce the time added by the curve-fitting step.

Last Updated: 20 May 2010

Manufacturing


Feature Learning on Images
Stanford University
Speedup: Hours of runtime reduction


Feature Learning

Feature Learning Architectures with GPU-acceleration

Authors: Andrew Ng, Stanford University
Speedup: Ability to process many images in parallel

Stanford researchers in Andrew Ng's group used GPUs and AccelerEyes software to speed up their work on Feature Learning Architectures. They decided to use AccelerEyes software for this study because of the need to quickly evaluate many architectures on thousands of images. AccelerEyes software taps into the immense computing power of GPUs and speeds up research utilizing many images.

Last Updated: 9 Apr 2011

Tomography of Vegetation - Filtered Back-Projection and Non-Uniform FFTs
Universita di Napoli Federico II
Speedup: 10X


Filtered Back-Projection

Tomography of Vegetation - Filtered Back-Projection and Non-Uniform FFTs

Authors: Drs. Capozzoli, Curcio, di Vico, and Liseno, Universita di Napoli Federico II
Speedup: 10X

In order to investigate changes of forest biomass, scientists use microwave tomography to image the vegetation. At the smallest scale, individual plants can be imaged to investigate branching and growth, but even synthetic aperture radar can reveal large-scale changes in regional ecology. To the right, you can see the experimental setup to image an individual plant.

Last Updated: 16 Aug 2011

Action Recognition with Independent Subspace Analysis
Stanford University
Speedup: 4.4X


Feature Learning

Action Recognition with Independent Subspace Analysis

Authors: Quoc Le, Will Zou, Serena Yeung, Andrew Ng, Stanford University
Speedup: 4.4X

In a paper at CVPR 2011 entitled "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis", the authors explain how their unsupervised feature learning algorithm competes with other algorithms that are hand-crafted or use learned features. For training, they used a multi-layered stacked convolutional ISA (independent subspace analysis) network. ISA is used for learning features from image patches without supervision.

Last Updated: 19 Aug 2011

Media & Computer Vision


Fast Computer Vision with OpenCV and ArrayFire
OpenCV Blogger
Speedup: 10X


OpenCV image

Fast Computer Vision with OpenCV and ArrayFire

Authors: OpenCV Blogger
Speedup: ~10X

The OpenCV library is the de facto standard for computer vision and image processing research projects. OpenCV includes several hundred computer vision algorithms aimed at real-time vision applications. This case study shows how to use OpenCV and ArrayFire together. A simple example application demonstrates using OpenCV for webcam access and ArrayFire for some basic processing routines and displaying results.

 

Last Updated: 24 Aug 2011

Video Processing
Google
Speedup: 10X - 20X


Google Video Processing image

Video Processing

Authors: Google and Stanford University
Speedup: 10 to 20X

Video content analysis is the basis for categorizing videos and enabling search by content. Growing interest in using sparse-coding methods to extract motion features in video in support of video content analysis led to the application of AccelerEyes software to improve performance by substantially accelerating the solution of the L1-regularized least-squares optimization problem.
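For readers unfamiliar with the L1-regularized least-squares problem named above, it minimizes 0.5 * ||A x - b||^2 + lam * ||x||_1 over x. One standard way to solve it is iterative soft-thresholding (ISTA), sketched below in plain Python. ISTA is used here only as an example; the source does not say which solver the authors used, and the names `ista` and `soft_threshold` are hypothetical.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, b, lam, iterations=500):
    """Minimize 0.5*||A x - b||^2 + lam*||x||_1 with iterative soft-thresholding."""
    rows, cols = len(A), len(A[0])
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A,
    # so its reciprocal is a safe (if conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * cols
    for _ in range(iterations):
        residual = [sum(A[i][j] * x[j] for j in range(cols)) - b[i] for i in range(rows)]
        gradient = [sum(A[i][j] * residual[i] for i in range(rows)) for j in range(cols)]
        x = [soft_threshold(x[j] - step * gradient[j], step * lam) for j in range(cols)]
    return x


# With an identity matrix the exact answer is soft_threshold(b, lam) per entry.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [3.0, 0.5, -2.0]
solution = ista(A, b, lam=1.0)
print([round(v, 6) for v in solution])  # [2.0, 0.0, -1.0]
```

The small entry of b is driven exactly to zero, which is the sparsity-inducing behavior that makes this formulation useful for sparse coding. The matrix-vector products dominate the cost and are the part that benefits from GPU acceleration.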

 

Last Updated: 13 Jan 2010

Action Recognition with Independent Subspace Analysis
Stanford University
Speedup: 4.4X


Feature Learning

Action Recognition with Independent Subspace Analysis

Authors: Quoc Le, Will Zou, Serena Yeung, Andrew Ng, Stanford University
Speedup: 4.4X

In a paper at CVPR 2011 entitled "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis", the authors explain how their unsupervised feature learning algorithm competes with other algorithms that are hand-crafted or use learned features. For training, they used a multi-layered stacked convolutional ISA (independent subspace analysis) network. ISA is used for learning features from image patches without supervision.

Last Updated: 19 Aug 2011

Music Beat Analysis
Georgia Tech
Speedup: 15X


Music Beat Analyzer

Music Beat Analysis

Authors: Vidhur Vohra - Georgia Tech
Speedup: 15X

Did you ever wonder how the music visualizer in your media player works? Watching it pulsate in synchrony with the beats of the song is almost as entertaining as listening to the song itself! Researchers have been attempting to detect beats in audio signals for many years, and there are many techniques available, from the simplest (and least accurate) to more complicated algorithms that are highly accurate. All algorithms, though, perform some form of signal processing and frequency analysis, applications highly suited to GPU Computing.

Last Updated: 11 Aug 2011

Optimization methods for deep learning
Stanford Artificial Intelligence Laboratory
Speedup: Improved Accuracy


SAIL image

Optimization methods for deep learning

Authors: Stanford Artificial Intelligence Laboratory
Speedup: Improved Accuracy

Researchers at SAIL (Stanford Artificial Intelligence Laboratory) have done it again. They have successfully used AccelerEyes software to speed up the training part of Deep Learning algorithms. In their paper titled "On Optimization Methods for Deep Learning", they experiment with some of the well-known training algorithms and demonstrate their scalability across parallel architectures (GPUs as well as multi-machine networks). The algorithms include SGD (stochastic gradient descent), L-BFGS (limited-memory BFGS, used for solving non-linear problems), and CG (conjugate gradient).

 

Last Updated: 20 Sep 2011

Feature Learning on Images
Stanford University
Speedup: Hours of runtime reduction


Feature Learning

Feature Learning Architectures with GPU-acceleration

Authors: Andrew Ng, Stanford University
Speedup: Ability to process many images in parallel

Stanford researchers in Andrew Ng’s group used AccelerEyes software to speed up their work on Feature Learning Architectures. They decided to use AccelerEyes software for this study because of the need to quickly evaluate many architectures on thousands of images. AccelerEyes software taps into the immense computing power of GPUs and speeds up research utilizing many images.

Last Updated: 9 Apr 2011

Digital Holography for Imaging
National University of Ireland, Maynooth
Speedup: 17X


Digital Holography

Digital Holography

Authors: Nitesh Pandey, Damien Kelly, Bryan Hennelly and Thomas Naughton from the National University of Ireland, Maynooth
Speedup: 17X

Digital holography is a powerful imaging technique with many new applications such as true 3D display. It allows the capture of both amplitude and phase information of the light reflected off the surface of 3D objects. Researchers at the National University of Ireland, Maynooth are developing techniques based on digital holography for 3D display applications. Reconstructing large digital holograms can be computationally intensive on CPUs, but GPUs running AccelerEyes software offer amazing possibilities.

Last Updated: 30 Apr 2011

Oil & Gas


3D Mantle Convection - Geodynamics
Boise State, University of Colorado, University of Minnesota
Speedup: 2.5X - 4.5X


3D Mantle Convection image

3D Mantle Convection - Geodynamics

Authors: Boise State, University of Colorado, University of Minnesota
Speedup: 2.5 to 4.5X

The authors introduce a GPU implementation of a three-dimensional mantle convection model at a high Rayleigh number to the solid earth geophysics community. They outline code development time, compare performance of CPUs versus GPUs, and deliver powerful visualizations.

 

Last Updated: 10 Feb 2010

Ground Water Simulations
Louisiana State University
Speedup: >20X


lattice boltzmann model

Lattice Boltzmann Models - Ground Water Simulations

Authors: Kevin R. Tubbs and Frank T-C. Tsai at Louisiana State University
Speedup: >20X

A lattice Boltzmann method (LBM) for solving the shallow water equations and the advection-dispersion equation is developed and implemented on graphics processing unit (GPU)-based architectures. The proposed LBM is implemented on an NVIDIA Computing Processor in a single-GPU workstation. GPU computing is performed using AccelerEyes software. Mass transport with velocity-dependent dispersion in shallow water flow is simulated by combining the MRT-LBM model and the TRT-LBM model. The GPU parallel performance increases as the grid size increases. The results indicate the promise of the GPU-accelerated LBM for modeling mass transport phenomena in shallow water flows.

Last Updated: 1 Dec 2010

Shallow Water Fluid Flow
Louisiana State University
Speedup: 10X


Fluid Flow visualization

Shallow Water Fluid Flow

Authors: Louisiana State University
Speedup: 10X

A lattice Boltzmann method (LBM) is developed on high performance computing (HPC) environments for three-dimensional shallow water flow fields coupled to mass transport. LBM is an attractive method for solving the multilayered shallow water equations because the extension to multiple layers is straightforward and keeps all of the simplicities and advantages of the LBM. The work covers the LBM in mass transport in shallow water flows and the LBM performance on central processing unit (CPU)-based and graphics processing unit (GPU)-based HPC environments.

 

Last Updated: 6 Sep 2009