A case study in CUDA optimization

by James Malcolm on February 20, 2010


Jimi Malcolm, VP of Engineering and co-founder of AccelerEyes, takes about 15 minutes to share strategies for getting the most performance out of CUDA code. Watch the video below to see what goes into planning CUDA development for performance. Jimi uses median filtering as the case study.

[Embedded video; requires the Flash Player.]


Rohit February 23, 2010 at 10:01 pm

Great tutorial! I wish it had been available 1.5 years ago, when I was evaluating motion estimation algorithms on CUDA.

One thing missing from the evaluation is what happens if you keep the bubble sort but force the nvcc compiler to use 9 registers and prevent it from spilling them to memory. It would be interesting to see whether that improves or degrades performance.

malcolm March 4, 2010 at 5:06 pm

It’s easy to force nvcc to use at most 9 registers, but then it’ll just spill more into lmem (local memory). There’s no way to prevent it from spilling to lmem except changing the algorithm so it demands fewer registers.

But I like your thinking, so I put together another set of experiments, some new and some old. I got a little carried away, so I made a new post.

Thanks for the suggestion!
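
For readers who want to try the register experiment discussed above, here is a minimal sketch of a 3x3 median filter that bubble-sorts a 9-element per-thread array. It is not the code from the video: the names `median9_bubble`, `median3x3`, and the `HD` macro are hypothetical, and the image is assumed to be a row-major array of floats. Building it with `nvcc -Xptxas -v` should report per-kernel register and local-memory usage, and adding `-maxrregcount=N` caps the register count. Whether the array ends up in registers or in lmem depends on the compiler version and target architecture.

```cpp
// Expands to CUDA qualifiers under nvcc and to nothing under a host-only
// compiler, so the sorting routine can be checked on the CPU as well.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Median of a 3x3 window using a full bubble sort on a local copy.
// The 9-element array is what competes for registers on the GPU.
HD inline float median9_bubble(const float* window) {
    float v[9];
    for (int i = 0; i < 9; ++i) v[i] = window[i];

    // Each pass bubbles the largest remaining value to the end.
    for (int pass = 0; pass < 8; ++pass) {
        for (int j = 0; j < 8 - pass; ++j) {
            if (v[j] > v[j + 1]) {
                float t = v[j];
                v[j] = v[j + 1];
                v[j + 1] = t;
            }
        }
    }
    return v[4];  // middle element of the sorted window
}

#ifdef __CUDACC__
// One thread per output pixel. Border pixels are skipped and left unwritten
// (an assumption made to keep the sketch short).
__global__ void median3x3(const float* in, float* out, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1) return;

    // Gather the 3x3 neighborhood around (x, y).
    float w[9];
    int k = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            w[k++] = in[(y + dy) * width + (x + dx)];

    out[y * width + x] = median9_bubble(w);
}
#endif
```

Comparing the `-Xptxas -v` output with and without a register cap is one way to see the spilling behavior described in the reply above. An algorithm that needs fewer live values, such as a partial selection instead of a full sort, would be the kind of change that reduces register demand.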
