GTC is coming up…

The GPU Technology Conference (GTC) starts later this month and is sure to generate a new level of excitement and energy around GPU computing. The conference includes over 250 technology sessions presented by industry, government, and academic technology leaders. AccelerEyes is pleased to be well represented at this year’s conference by our technical leadership and a number of our customers. If you plan to attend the conference, be sure to include the sessions outlined below on your agenda.

In addition to being well represented, we are also flattered to see that others in the market have recognized that GPU computing with MATLAB delivers clear productivity gains and that the performance improvements made possible by GPUs are a reality today. Most notably, The MathWorks will share its vision and capabilities for GPU computing with MATLAB during the conference, which should increase the visibility of and demand for the technology worldwide. We encourage everyone to attend the session to learn about their new offering.

AccelerEyes will be demoing Jacket at Table #56, and we hope that you will stop by to see the latest and greatest Jacket technology during the conference.

Jacketized GTC Sessions

2132 – Accelerating Biologically Inspired Computer Vision Models

Join us for a discussion on applying commodity-server-based clusters and GPU-based clusters to simulating computer vision algorithms at a scale that approaches that of biological vision. We consider the limitations of each technology, survey approaches taken thus far, and suggest new hybrid models and programming frameworks to overcome current limitations and substantially improve performance.

Speaker: Tom Dean, Google Inc.
Topic: Computer Vision, Machine Learning & Artificial Intelligence
Time: Tuesday, September 21st, 11:00 – 11:50

2268 – Think Data-Parallel! Building Data-Parallel Code with M

Discover and leverage the parallelism inherent in pre-existing code. Oftentimes, parallelism is hidden in seemingly serial programs. This is due to obfuscation via indexing or looping, wherein the parallelism is seemingly non-existent. Several real-world examples of seemingly serial code demonstrate simple yet surprisingly effective rules for detecting potential parallelism.

For each example, learn how to express the code at a higher, more concise level in M by vectorizing computations. We give several canned techniques of vectorization for many common, and sometimes very difficult, use cases.

Learn how such vectorization concisely brings the parallelism of code to the forefront and transforms programs that might originally have been difficult to run on a SIMT device into programs well suited to execution on the GPU. GPU speedups will be shown using Jacket.

Speaker: Gallagher Pryor, AccelerEyes
Topic: General Interest
Time: Tuesday, September 21st, 15:30 – 15:50
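
To give a flavor of the kind of rewrite session 2268 describes, here is a minimal sketch. It is an illustration only, not material from the talk: it uses plain Python lists as a stand-in for M arrays, the function names are hypothetical, and it shows only the restructuring, not a GPU speedup. The index loop looks serial, yet every output element depends only on the input, so the whole computation can be expressed as one element-wise sum over three shifted slices.

```python
def three_point_sum_loop(a):
    """Three-point moving sum written as an index loop; it looks serial."""
    out = [0.0] * max(len(a) - 2, 0)
    for i in range(1, len(a) - 1):
        # Each iteration reads only the input, never a previous output,
        # so the iterations are independent of one another.
        out[i - 1] = a[i - 1] + a[i] + a[i + 1]
    return out


def three_point_sum_vectorized(a):
    """Same result as one element-wise sum over three shifted slices.

    In M this would be a single expression such as
    a(1:end-2) + a(2:end-1) + a(3:end).
    """
    left, middle, right = a[:-2], a[1:-1], a[2:]
    return [l + m + r for l, m, r in zip(left, middle, right)]


signal = [1.0, 2.0, 4.0, 8.0, 16.0]
assert three_point_sum_loop(signal) == three_point_sum_vectorized(signal)
```

Once the computation is written in the second form, the independence of the output elements is visible at a glance, which is the property a data-parallel device needs.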

2300 – High-Performance Compressive Sensing using Jacket

This talk will present my ongoing work in the L1-optimization group at Rice University. The purpose of the work is to merge compressive sensing (CS), for image/signal reconstruction, with GPU computation, using NVIDIA’s GPUs to enhance CS technology.

This talk will cover basic concepts in compressive sensing and how easily they can be adapted to run on the GPU, in particular with Jacket (by AccelerEyes). We will then cover some of our numerical experiments that encompass the use of different flavors of algorithms.

Speaker: Nabor Reyna
Topics: Imaging, Tools & Libraries
Time: Wednesday, September 22nd, 10:30 – 10:50

2201 – A Case Study of Accelerating Matlab Based Applications using GPUs

Learn how to accelerate Matlab-based applications using GPUs. We cover a popular neuro-imaging software package called SPM and show how to use CUDA and Jacket to speed up computationally intensive Matlab applications.

Speaker: Aniruddha Dasgupta, Georgia Institute of Technology
Topic: Medical Imaging & Visualization
Time: Wednesday, September 22nd, 16:00 – 16:50

2271 – Compose CUDA Masterpieces! Write better, Leverage More

Not all CUDA code is created equal. Learn how to step up your CUDA game. Also, learn how to build large, multi-person CUDA projects for your organization.

In very clear descriptions, learn the difference between naïve GPU code, intermediate GPU code, and advanced GPU mastery. We show how careful construction of CUDA kernels can affect application performance.

We also discuss how Jacket tools greatly facilitate the development of CUDA-based projects.

Finally, we will debut the Jacket runtime’s new C/C++ library. With this library, the technical computing functions in Jacket’s MATLAB engine are made available in C/C++.

Speaker: James Malcolm, AccelerEyes
Topic: Tools & Libraries
Time: Thursday, September 23rd, 16:00 – 16:50

2100 – Hybrid GPU/Multicore Solutions for Large Linear Algebra Problems

Large linear algebra problems may be solved using recursive block decomposition in which GPUs efficiently compute the sub-blocks and multicore CPUs put the sub-blocks back together within a large shared memory space. This talk will present benchmark results for such a hybrid approach, implemented in Matlab® and using Jacket® to access the GPU compute power.

Speaker: Nolan Davis, SAIC
Topics: High Performance Computing, Algorithms & Numerical Techniques, Signal processing
Time: Thursday, September 23rd, 16:00 – 16:50
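
For readers unfamiliar with recursive block decomposition, the idea in session 2100 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: it picks matrix multiplication as the example problem (the abstract does not say which problems are covered), runs entirely on the CPU in plain Python, and all function names are hypothetical. The leaf multiply stands in for the work a GPU would do on each sub-block, and the join step stands in for the CPU-side reassembly in shared memory.

```python
def leaf_multiply(a, b):
    """Multiply two small dense blocks directly.

    Stands in for the step a GPU would perform in a hybrid scheme.
    """
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add_blocks(a, b):
    """Element-wise sum of two equally sized blocks."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_quadrants(m):
    """Split an even-sized square matrix into four equal quadrants."""
    h = len(m) // 2
    top, bottom = m[:h], m[h:]
    return ([r[:h] for r in top], [r[h:] for r in top],
            [r[:h] for r in bottom], [r[h:] for r in bottom])


def join_quadrants(c11, c12, c21, c22):
    """Reassemble four quadrants into one matrix (the CPU-side step)."""
    return ([l + r for l, r in zip(c11, c12)] +
            [l + r for l, r in zip(c21, c22)])


def block_multiply(a, b, leaf_size=2):
    """Recursively multiply square matrices a and b of equal size."""
    n = len(a)
    # Small or odd-sized blocks are multiplied directly.
    if n <= leaf_size or n % 2 != 0:
        return leaf_multiply(a, b)
    a11, a12, a21, a22 = split_quadrants(a)
    b11, b12, b21, b22 = split_quadrants(b)

    def mul(x, y):
        return block_multiply(x, y, leaf_size)

    # Each quadrant of the product is a sum of two sub-block products.
    c11 = add_blocks(mul(a11, b11), mul(a12, b21))
    c12 = add_blocks(mul(a11, b12), mul(a12, b22))
    c21 = add_blocks(mul(a21, b11), mul(a22, b21))
    c22 = add_blocks(mul(a21, b12), mul(a22, b22))
    return join_quadrants(c11, c12, c21, c22)


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
assert block_multiply(a, identity) == a
assert block_multiply(a, a) == leaf_multiply(a, a)
```

The eight sub-block products at each level are independent of one another, which is what would let them be dispatched to a GPU while the CPU handles the bookkeeping.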

Following our recent Jacket v1.4 Fermi architecture release, many of you requested data comparing the new NVIDIA Fermi-based Tesla C2050 versus the older Tesla C1060.

Over the years, AccelerEyes has developed an extensive suite of benchmark MATLAB applications, which are included in every Jacket installation. Using this suite of tests, we compared performance of the C2050 vs C1060 and are pleased to report the results here. We hope this information will be useful to Jacket programmers.

All tests were run on the same standard workstation with Jacket 1.4. The only thing that changed was the actual GPU board. In every case the C2050 beat the C1060. Double-precision examples on the Fermi-based board outperformed the older board by at least 50% in every case and by more than 2x in many cases.

Note: ECC was enabled on the Fermi boards

In addition to the standard Jacket examples, matrix multiplication with SGEMM and DGEMM was performed and plotted in the following charts. This matrix multiply implementation was developed in-house at AccelerEyes and outperforms both CUBLAS and MAGMA considerably (see the MTIMES benchmarks). Special thanks to Torben Larsen for the benchmarking results.

[Charts: 2050 vs 1060 floating point performance]

As we generate or receive more comparison data we will communicate results.

Jacket for MATLAB now available for NVIDIA Fermi!

July 14, 2010

We are pleased to announce Jacket 1.4, with support for the latest NVIDIA graphics processing units based on the Fermi architecture (Tesla 20-series and GeForce GTX 4xx-series). NVIDIA’s release of the Fermi architecture brings with it 448 computational cores, increased IEEE-754 floating-point arithmetic precision, error-correcting memory for reliable computation, and enhanced memory caching mechanisms. Highlights [...]

SGEMM, MTIMES & CUBLAS performance on the GPU

June 24, 2010

AccelerEyes is focused not only on providing the easiest-to-use GPU programming platform for CUDA-capable GPUs by leveraging the MATLAB® language; our engineering organization is also always looking for ways to improve performance in all areas of the Jacket platform. A case in point is some recent work with matrix multiplication, specifically [...]

GCOMPILE & GPROFILE: A Sneak Peek

June 18, 2010

The research and engineering teams at AccelerEyes have prepared some exciting new additions for Jacket. These additions will enable you to get even more leverage out of NVIDIA GPUs for computing in MATLAB. Over the past few years we’ve had the pleasure of working alongside scientists and engineers using Jacket, and have learned a [...]

Jacket accelerating life science and defense applications

May 28, 2010

With IBM’s decision this week to integrate Tesla technology into its high performance computing line, there should be no doubt that GP-GPU computing is more than a fad: organizations solving technical problems are able to solve them more productively and efficiently than ever before with GPUs. AccelerEyes’ customers are experiencing this firsthand with the [...]

Rapid Application Development platform for GPGPUs – Jacket with MATLAB®

May 23, 2010

If you’re a MATLAB user and want to apply your applications to NVIDIA GPUs for performance improvement but don’t want to write C, C++, or CUDA code, attend this seminar to learn more about Jacket for MATLAB – http://www.accelereyes.com/resources/junewebinar

NVIDIA Fermi with CUDA and OpenCL

May 10, 2010

In December of 2008, we did a blog post answering questions from customers and prospects about the use of OpenCL for Jacket.  If you have not reviewed that blog post to gain some insight into our progress you can access it here – http://blog.accelereyes.com/blog/2008/12/30/opencl/. Some things have changed since that original post.  For example, NVIDIA [...]

Vectorizing MATLAB Code for GPU Computing

May 5, 2010

Over the last couple of months we have participated in some pretty amazing stories about dramatic speedups for customer M-codes. For example, one customer went from 400 minutes of run time to 20 seconds, and then further optimization dropped their run time from 20 seconds to 65 milliseconds! This signifies a greater than 1000x performance [...]

Power Flow with Jacket & MATLAB on the GPU!

April 27, 2010

Learn how Jacket, GPUs, and MATLAB can deliver orders-of-magnitude performance improvements over CPU-based solutions for power flow studies. AccelerEyes, in collaboration with the Indian Institute of Technology in Roorkee, has developed this case study to illustrate the ability to study power flow models on graphics processing units using Jacket and MATLAB. Implementation on the [...]
