A lot of ArrayFire users have been interested in the usage of ArrayFire in partnership with OpenGL for graphics computation. In the long run, we do plan to expand further on the interoperablilty and make it easier through ArrayFire. For now, we have developed a small example to expand on the usage of the CUDA-OpenGL interop API to assist in the interop operations between ArrayFire and OpenGL.

Some of the advantage of direct ArrayFire-OpenGL interop are:

  • Faster data transfers: Since the OpenGL buffers as well as ArrayFire data reside on the GPU, we can use a direct device to device copy rather than using the CPU as an intermediate and the relatively slow PCIe interface.
  • Offscreen rendering: It is commonly known that rendering to screen is a slow process. Using the API shown in this example, we can render directly of off screen framebuffers and retrieve the data in ArrayFire.
  • Transform feedback: This example also shows the usage of Transform feedback buffers in OpenGL. Transform feedback buffers allow the user to retrieve data from in-between the rendering pipeline, for example getting back the transformed vertices after the operations in vertex shaders.

Since a lot of data transfers to CPU are eliminated, we can do a lot more processing faster.

Let us now discuss the example. This example is relatively simple and renders a single triangle.

Code Walkthrough

The code for this example can be found at our github page.

The code relies upon the following libraries:

  • ArrayFire 2.0
  • CUDA-OpenGL Interop (CUDA 5.5)
  • GLFW 3
  • GLEW (1.x, latest version recommended)

Let us walk through the example step by step. Please note that any steps that do not contribute to the interop and are just common OpenGL code will not be covered in this blog.


int device = argc > 1? atoi(argv[1]):0;

int width = 400, height = 300, channels = 4;
int num_triangles = 1, num_vertices = 3;

array af_indices = array(3, indices);
array af_vertices = array(3, num_vertices, vertices);	//3 floats per vertex, 3 vertices
array af_colors = array(3, num_vertices, colors);		//3 floats per color, 3 colors
array af_image = constant(0.0, height, width, channels);
array pkd_image = constant(0.0, channels, width, height);
array af_project = constant(-1.0, 3, num_vertices);

This section initializes ArrayFire and create arrays for the basic data structure. Since ArrayFire stores data as column major, we decided that the arrays will have each vertex as one column, ie. if you had an array of 10 vertices with 3 coordinates each, then your array would have dimensions 3×10.

Most of the array names are self explanatory. pkd_image is used to store the packed image data returned by OpenGL through the framebuffer. af_image is used to store the returned image in the ArrayFire format (unpacked). af_project stores the result of the projection returned after the vertex shader stage (using transform feedback). The size of project is same as the vertex buffer.

bool onscreen = true;			//Change to false for offscreen renderering
init_glfw(width, height, channels, onscreen);

The onscreen flag is a useful one. When it is set to true, the render happens on the GLFW window. Although not a part of the code, it is possible to retrieve the rendered image back into ArrayFire. However, turning this flag to false results in offscreen rendering. This means rendering is done directly to a software framebuffer rather than to the screen, hence it is faster. It also allows the rendered image to be retrieved back into ArrayFire. If you wish to display this image then the ArrayFire image() function can be used (will be discussed a little later).

init_glfw is used to initialize the GLFW and GLEW context.

Initialize Buffers

GLuint vertex_b = 0;
cudaGraphicsResource_t vertex_cuda;
create_buffer(vertex_b, GL_ARRAY_BUFFER, &vertex_cuda, 3 * num_vertices * sizeof(float), GL_DYNAMIC_DRAW);

These lines are an abstraction to create the OpenGL buffers and bind them to their respective CUDA Graphics Resource object.

void create_buffer(GLuint& buffer, GLenum buffer_target, cudaGraphicsResource_t* cuda_resource, const unsigned size, GLenum buffer_usage, const void* data = NULL)
    glGenBuffers(1, &buffer);
    glBindBuffer(buffer_target, buffer);
    glBufferData(buffer_target, size, data, buffer_usage);
    CUDA(cudaGraphicsGLRegisterBuffer(cuda_resource, buffer, cudaGraphicsRegisterFlagsNone));
    glBindBuffer(buffer_target, 0);

This section of code shows the creation and registering of the OpenGL buffer with CUDA. The buffer generation and bindings are done exactly as in regular OpenGL. The cudaGraphicsGLRegisterBuffer registers the buffer with CUDA. If using a texture or renderbuffer or the like, then use cudaGraphicsGLRegisterImage.

// Render Buffer
GLuint render_b = 0;
cudaGraphicsResource_t render_cuda;
create_buffer(render_b, GL_RGBA32F, width, height, &render_cuda);

// Depth Buffer - for off screen rendering
GLuint depth_b = 0;
create_buffer(depth_b, GL_DEPTH_COMPONENT, width, height);

In the code above, the framebuffer and depthbuffer are initialized which can now be used for off screen rendering.

Framebuffer, Transform Feedback and Shader Loading

//Required for framebuffer copy
GLuint frame_b = 0;
bind_framebuffer(render_b, depth_b, frame_b);\

// Initialize shaders
init_program("shader.vert", "shader.frag");

// Initialize transform feedback
init_projection(transform_b, project_b);

This section of the code is used to initialize depth and framebuffers for off screen rendering.  It also loads and compiles the shaders and initializes the transform feedback buffers.

Render Loop


GLFW allows the user to control to render loop. So we use a simple while loop with a GLFW terminating condition.

Copy Data to OpenGL Buffers

float* d_vertices = af_vertices.device<float>();
			 3 * num_vertices * sizeof(float));

In the render loop, we first copy the vertex and color data to their respective OpenGL buffers. In the copy_from_device_pointer function, we have:

template<typename T>
void copy_from_device_pointer(cudaGraphicsResource_t cuda_resource,
                              T& d_ptr,
                              GLuint buffer_target,
                              const unsigned size)
    CUDA(cudaGraphicsMapResources(1, &cuda_resource));
    bool is_mapped = true;
    if (buffer_target == GL_RENDERBUFFER) {
        cudaArray* array_ptr = NULL;
        CUDA(cudaGraphicsSubResourceGetMappedArray(&array_ptr, cuda_resource, 0, 0));
        CUDA(cudaMemcpyToArray(array_ptr, 0, 0, d_ptr, size, cudaMemcpyDeviceToDevice));
    } else {
        T* opengl_ptr = NULL;
        CUDA(cudaGraphicsResourceGetMappedPointer((void**)&opengl_ptr, (size_t*)&size, cuda_resource));
        CUDA(cudaMemcpy(opengl_ptr, d_ptr, size, cudaMemcpyDeviceToDevice));
    unmap_resource(cuda_resource, is_mapped);

In this function, we first “map” the cuda resource of the buffer using the cudaGraphicsMapResources function, ie. enable the buffer for access by CUDA. This allows CUDA to copy memory to and from the buffer. Once completed, we must unmap the resources.

Other Memory Copying Notes

If using a renderbuffer or texture, the memory needs to be copied using CUDA textures of cudaArrays and then using the cudaMemcpy(To/From)Array function. If using other buffers like GL_ARRAY_BUFFER, we can simply get the pointer using the cudaGraphicsResourceGetMappedPointer function and then copy it using the cudaMemcpyDeviceToDevice flag.

Copying from OpenGL buffers to CUDA is also done in the same way. See the copy_to_device_pointer() function in the code.

In the code that follows the memory copies, we call the render function which renders the data to screen or framebuffer. It uses relatively well know OpenGL calls and hence I will not be covering it in the blog. Any questions can be asked on the ArrayFire forums. Once the rendering is complete, we copy the transform feedback memory and the framebuffer memory back to ArrayFire.

On exit, delete the buffers and exit.

Compile and Run


On Linux, the Makefile is provided. In the Makefile, please update the paths to ArrayFire (AF_PATH), CUDA and GLFW (GLFW_LIB_PATH).

As is, the makefile assumes that CUDA, GLFW and GLEW is installed in the default locations.

To compile and run:



On Windows, you can create a CUDA project in Visual Studio. Then add the locations of the include and library directories of ArrayFire, GLFW and GLEW in the project properties. The code is OS-independent and should run without any modifications.


The output, in case of onscreen rendering should be a rotating triangle. A sample screenshot is shown below.

ArrayFire CUDA OpenGL Interop

Here are some useful resources to look up:

Please feel free to ask any questions about this interop example on the ArrayFire forums. We are working on a larger, more extensible interop code that we will release upon request.

Known issues: The ArrayFire graphics API is known to not work as intended when using external OpenGL context. It is advised that visualizations be limited. If they still do not work, the saveimage function can be a fallback, albeit a last resort fallback.

Github Linkhttps://github.com/arrayfire/ArrayFire-OpenGL-Interop

{ Comments on this entry are closed }



We are thrilled to announce the official release ArrayFire v2.0, our biggest and best product ever!

ArrayFire v2.0 adds full commercial support for OpenCL devices including all AMD APUs and AMD FireProTM graphics, CUDA GPUs from NVIDIA, and other OpenCL devices from Imagination, Freescale, ARM, Intel, and Apple.

ArrayFire is a CUDA and OpenCL library designed for maximum speed without the hassle of writing time-consuming CUDA and OpenCL device code.  With ArrayFire’s library functions, developers can maximize productivity and performance. Each of ArrayFire’s functions has been hand-tuned by CUDA and OpenCL experts.

Announcing ArrayFire for OpenCL

  • Support for all of ArrayFire’s function library (with a few exceptions)
  • Same API as ArrayFire for CUDA enabling seamless interoperability
  • Just-In-Time (JIT) compilation of kernels for top performance
  • Specific tuning for Intel Xeon Phi coprocessors
  • Accelerated algorithms for image processing, signal processing, visualization, and more

Updates to ArrayFire for CUDA

  • New signal and image processing functions
  • Faster transpose and matrix multiplication
  • Enhanced debugging support for GDB and Visual Studio
  • Better examples and documentation

Maybe the upgrade you wanted isn’t mentioned here. Don’t despair! Check out our release notes, as well as our new and improved documentation, for a complete list of the many ArrayFire v2.0 enhancements available.

To learn more about which licensing option would be the best for your needs, visit our ArrayFire licensing page.

Just getting started with GPU computing? Need an extra hand on a project? Tap into our deep parallel computing expertise and vast code base by setting up a free technical consultation today. 

We’re always looking to make ArrayFire even better—let us know your thoughts through this short survey. We promise it’ll be worth your while!

Stay tuned for more exciting news from the ArrayFire gang coming soon…

{ Comments on this entry are closed }

Partners Magnify the SC13 Experience

November 21, 2013

Yesterday, we posted photos from our exhibit. Today was the last day of SC13, and we want to tip our hat to the wonderful partners that magnified our SC13 experience. Creative Consultants, Mellanox, and Allinea Creative Consultants ran an ArrayFire demo across several nodes using Mellanox interconnect. The demo was a multi-node, multi-GPU lattice boltzmann simulation. Allinea […]

Read the full article →

Photos from SC13

November 21, 2013

SC13 was awesome this week! Tomorrow is the last day of the exhibition. For those of you that did not make it to the show, here are some pictures from our exhibit: The AccelerEyes Booth ——————————————————————————————————– ArrayFire OpenCL Demo on ARM Mali ——————————————————————————————————– ArrayFire CUDA Demo on NVIDIA K40 ——————————————————————————————————– ArrayFire OpenCL Demo on Intel […]

Read the full article →

APU 2013 – Day 3 Recap

November 14, 2013

Big announcement here at #APU13! AMD CTO, Mark Papermaster, just announced 2 additions to the 2014 Mobile APU roadmap http://t.co/sWHMhb9AAe — AMD (@AMD) November 13, 2013 Today was the final day of AMD’s APU 2013 conference. The theme of today was mostly focused on gaming topics, so it was not as relevant to technical computing as yesterday. […]

Read the full article →

APU 2013 – Day 2 Recap

November 13, 2013

Today was the first full day of AMD’s APU 2013 conference. It was a whirlwind of heterogeneous computing. From the morning keynotes, three particular salient points stuck out to us: Mike Muller, CTO at ARM, talked about heterogeneous computing. He said it nicely with, “Heterogeneous computing is the future. It has also been our past, […]

Read the full article →

APU 2013 – Day 1 Recap

November 12, 2013

AMD’s APU 2013 kicked off today with keynotes and a welcome reception. The developer summit is themed as the epicenter of heterogeneous computing. AMD has a world class CPU and a world class GPU and is pushing the industry forward by combining both of those devices into the same chip, the APU. AMD’s APUs are programmable […]

Read the full article →

ArrayFire v2.0 Release Candidate Now Available for Download

October 7, 2013

ArrayFire v2.0 is now available for download. The second iteration of our free, fast, and simple GPU library now supports both CUDA and OpenCL devices. Major Updates ArrayFire now works on OpenCL enabled devices New and improved documentation Optimized for new GPUs–NVIDIA Kepler (K20) and AMD Tahiti (7970) New in ArrayFire OpenCL Same APIs as ArrayFire […]

Read the full article →

ArrayFire for Defense and Intelligence Applications – Joint Webinar Recap

September 30, 2013

In case you missed it, hundreds of attendees recently joined us in a special joint webinar with NVIDIA. The webinar was led by Kyle Spafford, a Senior Developer at AccelerEyes. Kyle detailed how GPU computing can be implemented in the defense and intelligence fields. Kyle specifically addressed enabling unique solutions for applications related to video analysis, recognition, […]

Read the full article →

Joint Webinar with NVIDIA – ArrayFire for Defense and Intelligence Applications

September 4, 2013

AccelerEyes and NVIDIA invite you to participate in a joint webinar designed to help you learn about ArrayFire, a productive, easy-to-use GPU software library for C, C++, and Fortran. Major defense and intelligence institutions are discovering just how effective GPU computing can be in enabling unique solutions for applications related to video analysis, recognition, and tracking. During this informative […]

Read the full article →