<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Using Parallel For Loops (parfor) with MATLAB® and Jacket</title>
	<atom:link href="http://blog.accelereyes.com/blog/2010/02/10/using-parallel-for-loops-parfor-with-matlab-and-jacket/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.accelereyes.com/blog/2010/02/10/using-parallel-for-loops-parfor-with-matlab-and-jacket/</link>
	<description>Helpful posts about GPU computing. Discussion of Jacket and ArrayFire. Real speedups on real code!</description>
	<lastBuildDate>Wed, 02 Feb 2011 15:32:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: brett</title>
		<link>http://blog.accelereyes.com/blog/2010/02/10/using-parallel-for-loops-parfor-with-matlab-and-jacket/comment-page-1/#comment-120</link>
		<dc:creator>brett</dc:creator>
		<pubDate>Fri, 26 Feb 2010 15:10:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.accelereyes.com/blog/?p=128#comment-120</guid>
		<description>Let me explain a little about the ideal configuration first.  If you want to
use Jacket within a parfor loop, you should have as many GPUs as
the number of workers you have started in PCT.  For example, if you have
a 4-CPU machine configured to use a MATLAB pool of 4 workers,
you would want 4 GPUs.

If you have fewer GPUs than CPUs, we recommend using fewer
workers.  If you use more workers than GPUs, you will run into a situation
called thrashing, where the different workers compete with each other
for access to the GPUs.  The end result is poor performance, because the
GPUs spend too much time switching from task to task instead of working
on one at a time.

So, let&#039;s assume you have an optimal configuration with matching numbers of
CPUs and GPUs, &quot;n&quot; of each.  In this case, with Jacket MGL, each worker
would have its own GPU.  Your parfor loop will then split the iterations across
the CPU/GPU pairs, and each worker would handle 1/n of the
iterations.  Assuming you use the Jacket gsingle or gdouble data types, the
workers would then offload your FFTs to their GPUs for computation.

With small matrices, your FFTs may not see much speedup, but with
large matrices, we&#039;ve seen very significant performance improvements.</description>
		<content:encoded><![CDATA[<p>Let me explain a little about the ideal configuration first.  If you want to
use Jacket within a parfor loop, you should have as many GPUs as
the number of workers you have started in PCT.  For example, if you have
a 4-CPU machine configured to use a MATLAB pool of 4 workers,
you would want 4 GPUs.</p>
<p>If you have fewer GPUs than CPUs, we recommend using fewer
workers.  If you use more workers than GPUs, you will run into a situation
called thrashing, where the different workers compete with each other
for access to the GPUs.  The end result is poor performance, because the
GPUs spend too much time switching from task to task instead of working
on one at a time.</p>
<p>So, let&#8217;s assume you have an optimal configuration with matching numbers of
CPUs and GPUs, &#8220;n&#8221; of each.  In this case, with Jacket MGL, each worker
would have its own GPU.  Your parfor loop will then split the iterations across
the CPU/GPU pairs, and each worker would handle 1/n of the
iterations.  Assuming you use the Jacket gsingle or gdouble data types, the
workers would then offload your FFTs to their GPUs for computation.</p>
<p>With small matrices, your FFTs may not see much speedup, but with
large matrices, we&#8217;ve seen very significant performance improvements.</p>
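<p>To make the worker/GPU arithmetic above concrete, here is a small sketch in Python (not MATLAB, and not Jacket or PCT code). It assumes you know how many CPU cores and GPUs the machine has, starts one worker per GPU as recommended above, and splits the loop indices into roughly equal blocks, pairing worker w with a hypothetical GPU index w. The function names are made up for illustration, and parfor in MATLAB chooses its own iteration schedule, which need not be contiguous blocks.</p>
<pre>
```python
# Hypothetical illustration only: these helpers are not part of
# MATLAB, PCT, or Jacket. They model the "one worker per GPU" rule.


def recommended_pool_size(num_cpus, num_gpus):
    """Start one worker per GPU, but never more workers than CPU cores."""
    size = min(num_cpus, num_gpus)
    if not size > 0:
        raise ValueError("need at least one CPU core and one GPU")
    return size


def split_iterations(num_iters, num_workers):
    """Split loop indices 1..num_iters into contiguous per-worker blocks.

    Each worker gets about num_iters / num_workers indices and is paired
    with its own (hypothetical, 0-based) GPU index, so no GPU is shared.
    """
    base, extra = divmod(num_iters, num_workers)
    plan = {}
    start = 1
    for w in range(num_workers):
        # The first `extra` workers take one leftover iteration each.
        count = base + (0 if w >= extra else 1)
        plan[w] = {"gpu": w, "iterations": list(range(start, start + count))}
        start += count
    return plan


if __name__ == "__main__":
    workers = recommended_pool_size(num_cpus=8, num_gpus=4)
    for w, job in split_iterations(10, workers).items():
        print(f"worker {w} -> GPU {job['gpu']}: iterations {job['iterations']}")
```
</pre>
<p>For example, on a machine with 8 cores and 4 GPUs, <code>recommended_pool_size(8, 4)</code> returns 4, and <code>split_iterations(10, 4)</code> gives workers 0 and 1 three iterations each and workers 2 and 3 two each, so no GPU is shared between workers.</p>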
]]></content:encoded>
	</item>
	<item>
		<title>By: none</title>
		<link>http://blog.accelereyes.com/blog/2010/02/10/using-parallel-for-loops-parfor-with-matlab-and-jacket/comment-page-1/#comment-115</link>
		<dc:creator>none</dc:creator>
		<pubDate>Thu, 25 Feb 2010 21:29:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.accelereyes.com/blog/?p=128#comment-115</guid>
		<description>How does CUDA/Jacket handle scheduling if I have, say, 8 CPUs all doing CUDA FFTs within a parfor loop?  I&#039;ve tried this before (without Jacket), but the computer locked up (I have a separate video card).

Is this kind of functionality supported and/or stable?</description>
		<content:encoded><![CDATA[<p>How does CUDA/Jacket handle scheduling if I have, say, 8 CPUs all doing CUDA FFTs within a parfor loop?  I&#8217;ve tried this before (without Jacket), but the computer locked up (I have a separate video card).</p>
<p>Is this kind of functionality supported and/or stable?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

