The research and engineering teams at AccelerEyes have prepared some exciting new additions for Jacket. These additions will let you get even more leverage out of NVIDIA GPUs for computing in MATLAB. Over the past few years we've had the pleasure of working alongside scientists and engineers using Jacket, and we have learned a great deal about how people want to use MATLAB and GPUs. These new additions make significant progress toward letting programmers profile GPU computations and exert greater control over compilation and kernel execution inside Jacket.
In this post, we introduce GPROFILE and GCOMPILE for Jacket ahead of their upcoming release. A brief overview of these two new functions follows.
GPROFILE: One-of-a-kind Profiling and Tuning Tool for GPUs and MATLAB
Historically, Jacket's optimizations have happened behind the scenes. Feedback from programmers on the Jacket Forums led us to develop a tool that lets users monitor their scripts and view runtime details that only GPROFILE reports:
- CPU vs. GPU timings for each GPU function,
- differences between the results of the CPU and GPU versions of each function (for easy numerical verification and debugging),
- separate CPU/GPU timings for each input size.
This diagnostic output will be available in the immediate future as:
- HTML reports viewable in a browser,
- color-coded reports on the MATLAB command line.
These reports immediately identify the areas of code that are taking advantage of the GPU, how much benefit the GPU is giving, and also identify areas of code that are slow and need attention. Coupled with the Jacket Tips and Code Vectorization resources, GPROFILE reports add a new level of high-performance tuning for Jacket codes.
GPROFILE Console Report
Below is an example GPROFILE session. To profile Jacket code, the profiler is first enabled with the GPROFILE ON command (the GUI version, gprofview, will be presented in a later blog post). Then the code to be profiled is executed: here we are running a simple k-means clustering script. GPROFILE REPORT (the base reporting mechanism on the console) gives a simple overview of performance on a per-command basis.
Here we see that, on average, the GPU calls to SUBSREF, BSXFUN, TIMES, PERMUTE, EQ, and FIND outperformed their CPU versions by varying factors, as indicated by their green coloring. Red lines indicate commands that, on average, performed worse on the GPU than on the CPU. These are most likely caused by minuscule timings (as with SUBSASGN at the end of the list), odd memory arrangements (as with MIN), or small data sizes (as with SUBSASGN on line 17; the data size report is not shown for SUBSASGN). In this case, GPROFILE is particularly useful because it identifies MIN as something that could be done differently to get the most work out of the GPU. In the future, GPROFILE will also suggest ways to improve performance in such situations.
By adding keywords (command names, file names, and line numbers) to GPROFILE REPORT, you can drill down to information on a per-line basis and on a per-use-case basis, as shown on the last line of the screenshot. Here we see that TIMES was executed on two matrices of size 23 million by 5. This information is very important for code development because the speed of data-parallel programs, and the best choice of algorithm, is very sensitive to data size, arrangement, and shape. Drilling down to CPU vs. GPU timings per data size and per use case is another unique feature of GPROFILE. It lets programmers optimize algorithms for each of their use cases, or otherwise better leverage their GPUs by maximizing the volume of data funneled through their code.
The final unique feature of GPROFILE is the identification of lazily evaluated functions, that is, functions that Jacket compiles directly to PTX at run time, as shown in the last column of the GPROFILE REPORT output. By using these functions as much as possible and chaining them together, you minimize memory latency in the GPU code that Jacket runs.
GCOMPILE
GCOMPILE is a new feature that allows MATLAB developers to pre-compile sections of performance critical code. After isolating a critical section, developers format the code and pass it through GCOMPILE for analysis. The compiled function can then be applied with less overhead.
Let's jump into an example. At present, one of the key features missing from GFOR is support for if-statements. With GCOMPILE, Jacket can handle such statements. For example, suppose you wanted to threshold an image and needed an if-statement to do it. The compiler takes your critical function as a string, so you need a utility such as verbatim.m to capture the function text into a string.
code = verbatim;
%{
function out = main(in, threshold)
  if in > threshold
    out = threshold / 5;
  else
    out = sin(in);
  end
end
%}
% compile once and keep the function handle for later use
fn = gcompile(code);

thresh = 42;
n = size(volume, 3);             % volume: an existing stack of n image slices
for i = 1:n
  img = volume(:,:,i);           % extract one image slice
  out(:,:,i) = fn(img, thresh);  % apply the compiled function to the slice
end
The compiled function is applied element-wise: inside main(), the variable in holds a single scalar element of img, while threshold is the same MATLAB scalar, 42, for every element.
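To check GPU results against the CPU (the same kind of comparison GPROFILE reports), the element-wise semantics of the threshold kernel can be reproduced in plain, vectorized MATLAB with a logical mask. This is a sketch, not Jacket code, and the small `volume` stack below is made-up test data standing in for the `volume` in the loop above.

```matlab
% CPU reference for the threshold kernel (plain MATLAB, no Jacket required).
% 'volume' is hypothetical test data: a 4 x 4 x 3 stack with values from 0 to 84.
volume = reshape(linspace(0, 84, 48), 4, 4, 3);
thresh = 42;

ref = sin(volume);         % else-branch: sin(in) for every element
mask = volume > thresh;    % elements that take the if-branch
ref(mask) = thresh / 5;    % if-branch: threshold / 5
```

Vectorizing with a mask evaluates both branches and then selects per element, which is what the if-statement expresses for each scalar in. Assuming `out` was computed from the same `volume`, `max(abs(double(out(:)) - ref(:)))` gives the largest CPU/GPU difference.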
Another example kernel might look like this:
code = verbatim;
%{
function D = code(A, B, C)
D = A * C + cos(B); % all element-wise
end
%}
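Because the kernel body runs on one element at a time, its `*` multiplies scalars, so the equivalent whole-array MATLAB expression uses `.*` rather than matrix multiplication. A minimal CPU reference, assuming hypothetical same-sized inputs `A`, `B`, and `C`:

```matlab
% CPU reference for the second kernel (plain MATLAB, no Jacket required).
% A, B, C are hypothetical 4 x 4 inputs; any same-sized arrays would work.
A = magic(4);
B = reshape(0:15, 4, 4) / 4;
C = 2 * ones(4);

% The kernel's '*' acts per element, which corresponds to MATLAB's '.*'.
D = A .* C + cos(B);
```

For example, `D(1,1)` is `16 * 2 + cos(0)`, which is 33.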
Jacket v1.4 release candidates are rolling out with new on-the-fly compilation technology; GCOMPILE is the first of many extensions building on that technology. Please contact support if you would like to be included in the pre-release testing. GCOMPILE is scheduled for inclusion in the first release candidate after v1.4. The initial version supports only a subset of the M language.
Here are some other planned features; be sure to tell us what else you want in the comments section or on our forums.
- Reading functions from disk
- Nested functions
- Reductions (e.g., sum/min/max)
- For-loops, subscripting, etc.
~Puyan Lotfi, ~Brett Lucey (gcompile)
~Gallagher Pryor, ~Joe Uhl (gprofile)

