Hi Mahesh,
If I understand the code correctly, the first instruction "s0 = dm(i0, m1);" is NOT a dummy read.
You need to initialize both r0 and s0 before SIMD mode takes effect. f0 is the floating-point name for the same register as r0, and s0 is its counterpart in the second processing element. If r0 is not initialized, the instruction "dm(i0,m3)=f0;" will store whatever stale value r0 happened to hold before these lines of code were executed.
This dm(i0,m3)=f0 write is critical because it handles the case where the pointer i0 is at the last element of the delay line. Circular buffering wraps the pointer back to the beginning, but a SIMD read at that position would still cross the circular buffer boundary. This write guarantees that the last delay-line element PLUS ONE contains the same data as the first delay-line element, so that SIMD read picks up the correct value.
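For anyone trying to picture the boundary case, here is a rough model of the "last element PLUS ONE" trick in plain Python rather than SHARC assembly. It assumes, as described above, that the second word of a SIMD access comes from the next consecutive address and is not wrapped by the circular-buffer logic. The names (delay, delay_write, pair_read) and the length N = 4 are hypothetical and only for illustration; they are not taken from the original code.

```python
# Hypothetical model of a delay line with one guard slot; names and N are
# illustrative assumptions, not the actual SHARC buffer layout.
N = 4  # assumed delay-line length

# N delay-line slots plus one guard slot (index N) that mirrors slot 0.
delay = [0.0] * (N + 1)


def delay_write(idx: int, x: float) -> None:
    """Store x in slot idx (0..N-1) and keep the guard slot equal to slot 0."""
    delay[idx] = x
    if idx == 0:
        delay[N] = x  # "last element PLUS ONE" holds the same data as the first


def pair_read(idx: int) -> tuple[float, float]:
    """Model a SIMD-style paired read: two consecutive addresses, no wrap."""
    return delay[idx], delay[idx + 1]  # at idx == N-1 this reads the guard slot


for i in range(N):
    delay_write(i, 10.0 * (i + 1))  # slots hold 10.0, 20.0, 30.0, 40.0

print(pair_read(N - 1))  # (40.0, 10.0)
```

If the guard slot were never written, the second value of pair_read(N - 1) would be whatever was left in delay[N], which is the same kind of stale-data problem as leaving r0 uninitialized.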
I think I understand it; I just want the explanation to be clear for others who might run into this issue.
Thanks,
-Brian