Channel: EngineerZone: Message List

Re: compiled code (pragma optimize_for_speed) not optimal


Hi Christoph,

 

If you first load the loop bound into a local variable, then #pragma no_alias isn't necessary (in fact, it isn't helping you anyway, as I noted above). Using #pragma loop_count doesn't reduce the number of instructions in the loop itself (I always get a single-instruction loop when the upper bound is a local variable). However, it does help with the code around the loop: you're correct that, without it, a check that count > 0 is required to decide whether to enter the loop at all. Compare the following, without the loop_count pragma:

 

  R0 = W[P1] (X);
  CC = R0 <= 0;
  if CC jump .P33L3 ;

.P33L6:
  P1 = R0;
  R0 = 0;
  LOOP .P33L2L LC0 = P1;

.P33L2:
  LOOP_BEGIN .P33L2L;
  [P0++] = R0;
  LOOP_END .P33L2L;

 

with the following when I use the pragma:

 

  R2 = R2 - R2 (NS) || R1 = W[P1] (X);
  P1 = R1;
  P0 = R0;
  LOOP .P33L2L LC0 = P1;

.P33L2:
  LOOP_BEGIN .P33L2L;
  [P0++] = R2;
  LOOP_END .P33L2L;

 

The loop kernel is identical, but the latter code reaches the loop slightly more efficiently: the compare and conditional jump that guard loop entry are gone.
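
For reference, here is a minimal sketch of the kind of source that should produce the second form. It is not your code: the struct, its field names and the function name are hypothetical, chosen only to match the 16-bit load of the bound and the 32-bit stores in the listings above. I'm also assuming that unknown loop_count parameters may be left blank, so please check that against the compiler manual for your tools version.

```c
#include <assert.h>

/* Hypothetical container; names are assumptions for illustration only. */
struct block {
    short count;   /* 16-bit loop bound, read once before the loop */
    int  *data;    /* 32-bit elements to be cleared                */
};

void clear_block(const struct block *b)
{
    int *p = b->data;
    int  n = b->count;   /* copy the bound into a local variable */
    int  i;

    /* Debug-build guard for the promise made to the compiler below;
       it compiles away when NDEBUG is defined. */
    assert(n >= 1);

    /* Promise a minimum trip count of 1; maximum and modulo are left
       blank (assumed to be permitted). Compilers that do not know
       this pragma should simply ignore it. */
#pragma loop_count(1,,)
    for (i = 0; i < n; i++)
        p[i] = 0;
}
```

Note that the pragma is a promise rather than a check: if the count can ever be zero or negative at run time, the guard around the loop is needed and the pragma should not be used.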

 

If you're seeing different behaviour from this, perhaps you could post a complete, compilable example, and specify the compiler options and tool versions you are using?

 

All the best,

 

Michael.

