A while ago I built a little remote to open my garage door (well, not actually my garage door but the one of the house I live in, but who cares). But the greedy kid I am, I wasn’t satisfied with just opening the garage door with it. After all it has four buttons, and so I planned on also using it as a remote for my yet to be built home automation system (a few parts of it already exist, but there is a lot of work to be done before I can really call it that).
The problem was that it had a nasty bug which was really hard to track down: it would work as intended for some time (sometimes hours, sometimes even days), but then, all of a sudden, it would hang. And not only did it cease to work, it also stayed in some state where it consumed lots of power and ate up the battery in no time. This was far from usable in real life.
I had heard of a little thing called a watchdog timer, which could reset the MCU if it hung. So the first thing I did was to add this watchdog timer to my code. Sure enough, that didn’t fix my problem, so I tried everything I could think of, from incrementing counters in EEPROM after every “real” instruction in the code to hooking up oscilloscopes and logic analyzers. But the only thing I found out was that somehow, sometimes, it would get stuck in some weird kind of reset loop.
When I ran out of ideas of what else to try, I started badgering people on various internet forums. And after trying all sorts of changes to the code that people suggested, someone came up with the idea of deliberately causing a reset (among a few other things). I implemented that idea in my code, and what happened was that now, every time I caused the reset, the MCU would hang. That may not sound like a good thing at first, but at least now I could reproduce the bug!
After that, all it took was a bit of searching the web until I found out about a register called MCUSR, the MCU status register. Turns out that when the watchdog resets the MCU, it sets a flag (WDRF) in this register to record what caused the reset. But that flag doesn’t get cleared by the reset itself, and as long as it is set the watchdog stays enabled, which in my case meant the MCU kept resetting itself until the battery is dead or the world ends, whichever comes first.
So in the end all I had to do was to set MCUSR to zero right at the beginning of the setup() routine, and all of a sudden the reset would be performed as intended and the MCU would work again. For now. I’ll have to give it a few weeks to really call this case closed, but I’m pretty confident that this was it.
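For illustration, here is a minimal sketch of what such a fix can look like. It is not the actual code from the download, and it assumes an AVR chip whose avr-libc headers define MCUSR and WDRF (the ATmega328P does, for example). The variable resetFlags and the helper lastResetWasWatchdog() are made-up names, the wdt_disable() call is an extra precaution that goes beyond just zeroing the register, and the non-AVR branch only provides stand-ins so the snippet also compiles on a PC.

```cpp
#include <stdint.h>

#if defined(__AVR__)
  #include <avr/io.h>    // MCUSR, WDRF
  #include <avr/wdt.h>   // wdt_disable()
#else
  // Stand-ins for building off-target; these are NOT the real registers.
  static volatile uint8_t MCUSR = 0;
  #define WDRF 3
  static bool watchdogEnabled = true;
  static inline void wdt_disable() { watchdogEnabled = false; }
#endif

// Hypothetical: copy of the reset flags, kept so the cause can be checked later.
static uint8_t resetFlags = 0;

void setup() {
  resetFlags = MCUSR;  // remember why the MCU restarted
  MCUSR = 0;           // clear all reset flags, including WDRF
  wdt_disable();       // assumption: keep the watchdog off until it is re-enabled on purpose
  // ... the rest of the normal setup would follow here
}

void loop() {
}

// Hypothetical helper: true if the last restart was caused by the watchdog.
bool lastResetWasWatchdog() {
  return (resetFlags & (1 << WDRF)) != 0;
}
```

Saving the flags before clearing them is optional, but it makes it possible to tell a watchdog reset apart from a normal power-up later on.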
All that’s left to do is to say a big THANK YOU to all the people who tried to help me solve this mystery, and especially JohnO on the Jeelabs forum, who came up with the idea of causing that reset on purpose (although he might have had something completely different in mind than crashing the MCU every time the reset fired).
The updated code can be downloaded here.