ArrayFire CUDA C/C++ Documentation

Quick Links
Release Notes
Getting Started
Quick Reference
Parallel Loops: gfor

ArrayFire is the fastest GPU matrix library with the simplest API.

ArrayFire is the fastest GPU software

  • It contains the fastest implementations of hundreds of matrix, signal, and image processing routines, which enable it to outperform CPU libraries such as IPP, MKL, Eigen, and Armadillo.
  • It is optimized for any CUDA-enabled GPU. The same code will run on laptops, desktops, or servers.
  • It includes thousands of lines of highly-tuned device code.
  • It performs run-time analysis of your code to increase arithmetic intensity and memory throughput while avoiding unnecessary temporary allocations.
  • It combines and enhances all the best CUDA libraries available, including the fastest FFT, BLAS, and LAPACK implementations.

ArrayFire is the easiest-to-use GPU software

  • A few lines of ArrayFire code accomplish what would take 10 to 100 times as many lines in raw CUDA.
  • It is easier than templated programming and goes farther than simple directive-based approaches, while also outperforming them.
  • It can be used in C/C++ applications by itself or integrated with your existing CUDA code.

ArrayFire is the most comprehensive GPU software

  • It has hundreds of functions to make your code faster, including arithmetic, linear algebra, statistics, signal processing, image processing, and related algorithms.
  • It supports single- and double-precision floating point values, complex numbers, booleans, and 32-bit signed and unsigned integers.
  • It supports manipulating vectors, matrices, and N-dimensional arrays.
  • It can execute loop iterations in parallel with gfor.

See it in action...

Here is a stripped-down example of Monte Carlo estimation of pi:

    #include <stdio.h>
    #include <arrayfire.h>
    using namespace af;

    int main() {

        // create GPU arrays
        int n = 20000000; // use 20 million random samples
        array x = randu(n), y = randu(n);

        // count the samples that fell inside the unit circle
        // ... calculate on GPU, return result to CPU
        float pi = 4.0 * sum<float>(hypot(x,y) < 1) / n;

        printf("pi = %g\n", pi);
        return 0;
    }
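
For comparison, the same estimate can be written as a serial CPU loop using only the C++ standard library. This is a minimal sketch, not part of ArrayFire: the function name estimate_pi and its seed parameter are hypothetical, chosen for illustration. It makes explicit what the array expression above computes in one line: draw n points uniformly in the unit square, count those whose distance from the origin is below 1, and scale the resulting fraction by 4.

```cpp
#include <cmath>
#include <random>

// Hypothetical helper (not an ArrayFire function): serial CPU version of
// the Monte Carlo estimate shown above.
double estimate_pi(long n, unsigned seed) {
    std::mt19937 gen(seed);                                 // seeded generator
    std::uniform_real_distribution<double> unit(0.0, 1.0);  // samples in [0, 1)

    long inside = 0;
    for (long i = 0; i < n; ++i) {
        double x = unit(gen);
        double y = unit(gen);
        // same test as hypot(x,y) < 1 in the array version
        if (std::hypot(x, y) < 1.0) {
            ++inside;
        }
    }

    // the fraction inside the quarter circle approximates pi / 4
    return 4.0 * static_cast<double>(inside) / static_cast<double>(n);
}
```

Calling estimate_pi(20000000, 42u) from main mirrors the sample count used above. The loop handles one sample per iteration, whereas the array version expresses the comparison and the sum over all samples at once.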

Download and Requirements

Download the latest stable version of the library and view documentation online. See Release Notes for a list of changes in each version. You can also download nightly builds including the latest features and bug fixes.

Supported Platforms:

  • Windows (32 and 64-bit) - XP, Vista, 7
  • Linux (32 and 64-bit) - Ubuntu 10+, Fedora 10+, OpenSUSE 11+, RHEL 5+, CentOS 5+, SLES 10+
  • Mac OS X (64-bit) - Snow Leopard (10.6.x)

Requirements:

The Getting Started tutorial walks through more detailed steps on getting your first example to compile and run. You can also browse through the examples online.

Support