Thursday, July 28, 2011

Strange but true...

I had assumed that running code in a guest OS rather than natively would carry a fairly heavy performance penalty. This, then, was a pleasant surprise:

# of cores   Native/Guest   Multi-processor directives in      Execution time (secs)
1            Guest          Top-level loop                     76.4
4            Guest          Top-level loop                     20.1
4            Native         Top-level loop                     19.8
4            Native         Matrix multiplication subroutine   23.0


On four cores, the same code runs only about 1.5% slower in the guest (20.1 secs) than it does natively (19.8 secs).

Also interesting: putting the multi-processing directives into particularly intensive subroutines (like matrix multiplication) but not into the top-level program does give a substantial improvement, not that different from multi-processing at the top level with no MP in the subroutine (23.0 versus 19.8 secs on four native cores). (Of course, this assumes the program calls the routine frequently.) My concern was that the overhead of flipping into and out of the MP environment each time an MP subroutine is called would negate its benefits, but at least in this case it does not appear to be a problem.

So if I identify the routines that are called frequently and are particularly intensive, I can improve performance in the subroutine libraries without having to worry about it when coding the main routine. (Of course, I probably should have written more efficient code to begin with, but that's another story.)
