IR signal bursts are usually several hundred microseconds in length. IR
receivers are reported to tolerate a 10% error in the incoming signal. Still,
this meant we needed to measure time to within a few tens of microseconds.
Because we sought to integrate with the IOIO framework while handicapping it as
little as possible, we wanted to avoid allocating a dedicated timer to measure
time. Thus, we decided to use dead loops to delay execution where necessary
(i.e. to hold the IR LED on or off for the right amount of time).
Microchip ships two macros with its libraries, DelayMs(x) and Delay10us(x), that do exactly that – they expand to dead loops that should last for x milliseconds or x tens of microseconds. When we first implemented our transmitter, as a stand-alone program with hard-coded burst values, we used the Delay10us macro successfully and managed to turn on a DVD device. However, when we took the same hard-coded values and, instead of inserting them directly into the macros, passed them in through a variable, the signal stopped working. We used a phone camera to check whether the IR LED was turning on at all, and noticed that with the variable version the generated signal was significantly longer. This can be seen in the following videos: hard-coded vs. variable.
The delay loop has the following form:
#define Delay10us(x) \
{ \
unsigned long _dcnt; \
_dcnt=x*((unsigned long)(0.00001/(1.0/GetInstructionClock())/6)); \
while(_dcnt--); \
}
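As a rough check of the arithmetic in this macro, the sketch below reproduces the same expression as an ordinary C function. The 16 MHz instruction clock is an assumption made for illustration (on the target the value comes from GetInstructionClock()), and the function names are hypothetical. The divisor is the number of cycles one loop iteration is assumed to cost, which is 6 in the macro.

```c
/* Assumed instruction clock in Hz, for illustration only; the real value
   is whatever GetInstructionClock() returns on the target. */
#define ASSUMED_INSTRUCTION_CLOCK_HZ 16000000UL

/* Same arithmetic as the macro: the number of instruction cycles in 10 us,
   divided by the assumed cost of one loop iteration, truncated to an
   integer by the cast. */
unsigned long iterations_per_10us(unsigned long clock_hz,
                                  unsigned long cycles_per_iteration)
{
    return (unsigned long)(0.00001 / (1.0 / clock_hz) / cycles_per_iteration);
}

/* Loop count for a delay of x tens of microseconds. */
unsigned long delay_count(unsigned long x,
                          unsigned long clock_hz,
                          unsigned long cycles_per_iteration)
{
    return x * iterations_per_10us(clock_hz, cycles_per_iteration);
}
```

Under these assumptions, 10 us is 160 instruction cycles, so a 6-cycle loop gives 160/6 = 26.67, which the cast truncates to 26 iterations. The truncation alone makes every delay about 2.5% short, and any mismatch between the assumed and the real cost of an iteration scales the delay by the same ratio.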
We used this macro in a series of instructions to generate the IR pulses. At first we thought the difference stemmed from the fact that, when the value is hard-coded, the compiler simply optimizes away the entire _dcnt calculation. This soon seemed improbable, though: even if the value isn't hard-coded, the compiler can still evaluate most of the _dcnt calculation at compile time, leaving only the multiplication by x, which should cost no more than one or two instructions per burst (one MUL instruction, which takes at most 2 cycles, plus maybe another instruction to get the variable into a register). To verify this, we calculated the _dcnt values up front and then passed them to the macro, which now looked like this:
{ \
unsigned long _dcnt; \
_dcnt=x; \
while(_dcnt--); \
}
Indeed, this did not improve the speed of the signal (at least not noticeably).
We moved on to take a closer look at the machine code of this while loop. Eventually, we realized that the loop actually compiles differently when the value for _dcnt comes from a variable rather than being hard-coded. The loop's compiled code for the two versions can be seen here:
Hard-coded value                    | Variable value
00526A 500061 SUB W0, #0x1, W0      | 005266 510161 SUB W2, #0x1, W2
00526C 5880E0 SUBB W1, #0x0, W1     | 005268 5981E0 SUBB W3, #0x0, W3
00526E 400FE1 ADD W0, #0x1, [W15]   | 00526A B83808 MUL.UU W7, W8, W0
005270 488FE0 ADDC W1, #0x0, [W15]  | 00526C 430081 ADD W6, W1, W1
005272 3AFFFB BRA NZ, 0x526A        | 00526E 420000 ADD W4, W0, W0
                                    | 005270 4A8081 ADDC W5, W1, W1
                                    | 005272 500061 SUB W0, #0x1, W0
                                    | 005274 5880E0 SUBB W1, #0x0, W1
                                    | 005276 510F80 SUB W2, W0, [W15]
                                    | 005278 598F81 SUBB W3, W1, [W15]
                                    | 00527A 3AFFF5 BRA NZ, 0x5266
The variable version on the right is more than twice the size (11 instructions versus 5), which is why we were able to see a difference between the two using just a camera – each loop iteration took roughly twice as long. To test this, we changed the divisor in the _dcnt calculation of the delay macro from 6 to 12 to compensate for this oddity, and the signal started working again.
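The divisors 6 and 12 can be related to the two listings with a simple cost model. The sketch below assumes that every instruction takes one cycle except the taken branch that closes the loop, which takes two. That is an assumption about the processor core and is not stated in the text above, and the function name is hypothetical.

```c
/* Assumed cost model: one cycle per instruction, two cycles for the taken
   branch at the end of the loop body. */
unsigned int loop_cycles(unsigned int instruction_count)
{
    const unsigned int taken_branch_cycles = 2;

    /* All instructions except the branch cost one cycle each. */
    return (instruction_count - 1) + taken_branch_cycles;
}
```

Under this model the 5-instruction loop costs 6 cycles per iteration and the 11-instruction loop costs 12, which would be consistent with the original divisor of 6 and the compensating divisor of 12.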
We weren't able to determine exactly why this was happening, so we were faced with a problem: using a modified macro is fragile, and future versions of the compiler might behave differently, breaking our solution. On the other hand, we still didn't want to take over a timer module completely. Finally, we consulted Sivan, who suggested the following solution: upon program initialization, temporarily use a timer to measure how many cycles a dead loop with a known number of iterations takes, and use this value to calibrate a delay function for all future delay calls. This method should have a small enough error that IR signals still work, while not requiring a dedicated timer, since the timer is no longer needed after initialization. This is what we eventually implemented in the code.
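The calibration idea can be sketched roughly as follows. This is a minimal illustration and not the code we shipped: every name in it is hypothetical, the tick source stands in for whatever hardware timer is borrowed during initialization, and it is assumed to return a free-running 32-bit count. The scale factor is kept in Q8 fixed point (value times 256) to avoid floating point at run time.

```c
#include <stdint.h>

/* Hypothetical tick source: on the target this would read a hardware timer
   that is used only during initialization. */
typedef uint32_t (*tick_source_fn)(void);

/* The dead loop being measured. The counter is volatile so the compiler
   cannot optimize the loop away. */
void dead_loop(uint32_t iterations)
{
    volatile uint32_t count = iterations;
    while (count--) {
    }
}

/* Loop iterations per 10 us in Q8 fixed point, derived from a measurement:
   `iterations` passes of the loop took `elapsed_ticks` timer ticks, and the
   timer advances `ticks_per_10us` ticks every 10 us. */
uint32_t scale_from_measurement(uint32_t iterations,
                                uint32_t elapsed_ticks,
                                uint32_t ticks_per_10us)
{
    if (elapsed_ticks == 0) {
        elapsed_ticks = 1; /* guard against division by zero */
    }
    return (uint32_t)((((uint64_t)iterations * ticks_per_10us) << 8)
                      / elapsed_ticks);
}

/* Run once at start-up: time a known number of iterations. */
uint32_t calibrate(tick_source_fn read_ticks, uint32_t ticks_per_10us)
{
    const uint32_t iterations = 1000;
    uint32_t start = read_ticks();
    dead_loop(iterations);
    uint32_t elapsed = read_ticks() - start; /* unsigned, so one wrap is fine */
    return scale_from_measurement(iterations, elapsed, ticks_per_10us);
}

/* Convert a delay in tens of microseconds into a loop count. */
uint32_t delay_iterations(uint32_t scale_q8, uint32_t tens_of_us)
{
    return (uint32_t)(((uint64_t)scale_q8 * tens_of_us) >> 8);
}

/* Delay for the requested number of tens of microseconds. */
void delay_10us(uint32_t scale_q8, uint32_t tens_of_us)
{
    dead_loop(delay_iterations(scale_q8, tens_of_us));
}
```

For example, if 1000 iterations are measured at 12000 ticks with 160 ticks per 10 us, the scale is about 13.33 iterations per 10 us, and a delay of 56 tens of microseconds maps to 746 iterations. Because the measurement and every later delay go through the same dead_loop function, the loop is compiled only once, so the calibration sees the same loop that the delays later run.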