STM32 HW Watchdog

boikonur · January 4, 2025, 5:25pm

Hey all, it has been a while.
Recently I noticed that when my kids play with my sabers, brief SD reconnects due to hard hits :D, or some software-related glitches (event flooding), leave the saber stuck.
I also noticed we don’t use the hardware watchdog, which could help recover from exactly these states. It could also enable some pretty nice error-handling mechanisms.
I presume it has already been considered and deemed unusable, but I will give it a try.
Let me know what you think.

profezzorn · January 4, 2025, 8:33pm

I did experiment with some sort of hardware watchdog a long time ago. Might have been on the teensy though. I don’t actually remember what prompted me to do so. At the time, I couldn’t get it to work properly, and I don’t know that anybody has really asked for it since then. At least until now.

Anyways, I think having support for it would be nice, although obviously it would be better if it just didn’t crash at all. Also, I think I would require users who want a watchdog to enable it with a define, because otherwise some users will just not report problems.

I would have some comments on the code if it was a pull request, but the idea seems sound.

boikonur · January 7, 2025, 8:43am

I actually first approached it by triggering a reset in the HardFault handler, but I saw there is a USB polling chunk in there, and I risk introducing a boot loop.
So my idea is to enable the HW watchdog and also back it with a software watchdog, e.g. a WDT task that does the WDT feeding and is notified by all the other tasks that could get stuck. I need to give it more thought.
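A minimal sketch of that software-watchdog idea, assuming each supervised task checks in once per pass and a supervisor feeds the hardware watchdog only when all of them have done so. `SoftWatchdog` and its methods are hypothetical names for illustration, not existing ProffieOS code, and the hardware feed is passed in as a callback so the logic can be tried on a PC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: tracks check-ins from up to 32 supervised tasks.
class SoftWatchdog {
 public:
  // Reserve a check-in bit for one supervised task and return its id.
  int Register() {
    assert(count_ < 32);
    int id = count_++;
    required_ |= Bit(id);
    return id;
  }

  // Called by a task each time it completes a pass without getting stuck.
  void CheckIn(int id) { seen_ |= Bit(id); }

  // Called periodically by the supervisor. Feeds the hardware watchdog only
  // if every registered task has checked in since the last feed.
  // Returns true if the hardware watchdog was fed.
  template <typename FeedFn>
  bool Poll(FeedFn feed_hardware) {
    if ((seen_ & required_) != required_) return false;
    seen_ = 0;  // start a new supervision window
    feed_hardware();
    return true;
  }

  // Bitmask of tasks that have not checked in yet (handy for logging).
  uint32_t Missing() const { return required_ & ~seen_; }

 private:
  static uint32_t Bit(int id) { return uint32_t{1} << id; }
  uint32_t required_ = 0;
  uint32_t seen_ = 0;
  int count_ = 0;
};
```

If any one task stops checking in, `Poll()` stops feeding and the hardware watchdog eventually resets the board, while `Missing()` says which task went quiet.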

Btw, I had never paid attention to the protothread mechanism you did for the OS, a nice little real-time Duff’s device. Amazing idea!

profezzorn · January 7, 2025, 9:31am

It should be enough to just have a SaberBase instance with a watchdog kick in the loop(). Everything except a few interrupts runs from loop(), so if loop() gets stuck, we want the watchdog to do something.
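A rough sketch of what the start and kick could look like at the register level, assuming the STM32 independent watchdog (IWDG) and a nominal 32 kHz LSI clock. The `IwdgRegs` struct is a stand-in that mirrors the first four IWDG registers so this compiles on a PC; on the board the vendor header's IWDG peripheral would be used instead. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the first IWDG registers (KR, PR, RLR, SR, in that order).
struct IwdgRegs {
  volatile uint32_t KR;   // key register
  volatile uint32_t PR;   // prescaler code: 0 => /4, 1 => /8, ... 6 => /256
  volatile uint32_t RLR;  // 12-bit reload value
  volatile uint32_t SR;   // nonzero while PR/RLR updates are pending
};

constexpr uint32_t kKeyStart  = 0xCCCC;  // start the watchdog
constexpr uint32_t kKeyUnlock = 0x5555;  // allow writes to PR and RLR
constexpr uint32_t kKeyReload = 0xAAAA;  // reload the counter ("kick")

// Timeout in milliseconds, assuming the LSI runs at its nominal 32 kHz.
constexpr uint32_t TimeoutMs(uint32_t pr_code, uint32_t reload) {
  return (reload + 1) * (4u << pr_code) * 1000u / 32000u;
}

// Start the watchdog. Once started it cannot be stopped until reset.
inline void WatchdogStart(IwdgRegs& wd, uint32_t pr_code, uint32_t reload) {
  wd.KR = kKeyStart;
  wd.KR = kKeyUnlock;
  wd.PR = pr_code;
  wd.RLR = reload & 0xFFF;
  while (wd.SR != 0) {}  // wait for the new values to take effect
  wd.KR = kKeyReload;
}

// Call once per loop() iteration; if loop() hangs, the chip resets.
inline void WatchdogKick(IwdgRegs& wd) { wd.KR = kKeyReload; }
```

With prescaler code 4 (/64) and a reload of 999, `TimeoutMs` gives 2000 ms, so loop() would have to stall for about two seconds before a reset. The start call would sit behind an opt-in define so that nobody gets a watchdog without asking for it.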

There is also some code somewhere that can tell you why a reset occurred (like, if it’s a sleep wakeup). I’m not sure if that code can tell that a reset was caused by the watchdog though.

boikonur · January 7, 2025, 1:02pm

The reset reason is recorded in the RCC register, which can be read on boot here:
arduino-proffieboard/system/STM32L4xx/Source/stm32l4_system.c
(it also checks the WDT reset flag).
That would allow the basic function to at least print the reset reason on the next boot.
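For illustration, decoding the reset flags could look roughly like this. The bit positions are the RCC_CSR reset flags as I understand them for STM32L4 parts and should be checked against the reference manual for the exact chip. `ResetCause`, `DecodeReset` and `Name` are made-up names, not what stm32l4_system.c uses. The function takes the register value as an argument so it can be tested off-target.

```cpp
#include <cassert>
#include <cstdint>

// RCC_CSR reset flags (assumed STM32L4 layout). The flags stay set until
// software clears them by writing RMVF (bit 23).
constexpr uint32_t kPinRst  = 1u << 26;  // NRST pin
constexpr uint32_t kBorRst  = 1u << 27;  // brown-out / power-on
constexpr uint32_t kSftRst  = 1u << 28;  // software reset
constexpr uint32_t kIwdgRst = 1u << 29;  // independent watchdog
constexpr uint32_t kWwdgRst = 1u << 30;  // window watchdog
constexpr uint32_t kLpwrRst = 1u << 31;  // illegal low-power entry

enum class ResetCause {
  kUnknown, kPin, kPowerOrBrownout, kSoftware,
  kIndependentWatchdog, kWindowWatchdog, kLowPower
};

// The pin flag is usually set together with the other causes, so the more
// specific flags are checked first and the pin flag last.
inline ResetCause DecodeReset(uint32_t csr) {
  if (csr & kIwdgRst) return ResetCause::kIndependentWatchdog;
  if (csr & kWwdgRst) return ResetCause::kWindowWatchdog;
  if (csr & kSftRst)  return ResetCause::kSoftware;
  if (csr & kLpwrRst) return ResetCause::kLowPower;
  if (csr & kBorRst)  return ResetCause::kPowerOrBrownout;
  if (csr & kPinRst)  return ResetCause::kPin;
  return ResetCause::kUnknown;
}

// Short text for printing the cause on the next boot.
inline const char* Name(ResetCause cause) {
  switch (cause) {
    case ResetCause::kIndependentWatchdog: return "independent watchdog";
    case ResetCause::kWindowWatchdog:      return "window watchdog";
    case ResetCause::kSoftware:            return "software reset";
    case ResetCause::kLowPower:            return "low-power reset";
    case ResetCause::kPowerOrBrownout:     return "power-on or brown-out";
    case ResetCause::kPin:                 return "reset pin";
    default:                               return "unknown";
  }
}
```

On the board, the argument would be the RCC_CSR value captured early in boot, before the flags are cleared.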
For more verbose recording I’m considering two options:

1. The RTC_BKP0R backup register, which is enough to also store some info at reset time, for example which task got stuck, the PC, or other useful registers and values (but I checked and the 2.2 board doesn’t have VBAT supplied, and I’m not sure if it’s the same with the new board).
2. Flash, but I’m afraid of wear due to frequent events.

profezzorn · January 7, 2025, 7:21pm

The flash in the Proffieboards is quite resilient.
If you have a pool of flash memory and append one record at a time, but erase only one half at a time, the number of erase cycles would be at least 10x lower than the number of records written, so it would take a very long time to wear out the flash.
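To put rough numbers on that, here is a small PC-side simulation of the append-then-erase-half scheme. It does not touch real flash. `TwoHalfLog` is a hypothetical name, and the 16 records per half used in the example below is an arbitrary assumption, not a Proffieboard figure. The point is only that the record-to-erase ratio is roughly the number of records that fit in one half.

```cpp
#include <cassert>
#include <cstdint>

// Simulates an append-only log over a flash pool split into two halves.
// Records go into the active half; when it is full, the other half is
// erased and becomes the active one. Both halves start out erased.
class TwoHalfLog {
 public:
  explicit TwoHalfLog(uint32_t records_per_half)
      : capacity_(records_per_half) {
    assert(capacity_ > 0);
  }

  // Append one record, erasing the other half first if the active one is full.
  void Append() {
    if (used_ == capacity_) {
      active_ ^= 1;         // switch to the other half
      ++erases_[active_];   // which has to be erased before reuse
      used_ = 0;
    }
    ++used_;
    ++records_;
  }

  uint32_t records() const { return records_; }
  uint32_t total_erases() const { return erases_[0] + erases_[1]; }
  uint32_t erases_of_half(int half) const { return erases_[half]; }

 private:
  uint32_t capacity_;
  uint32_t used_ = 0;
  uint32_t records_ = 0;
  uint32_t erases_[2] = {0, 0};
  int active_ = 0;
};
```

With 16 records per half, writing 1600 records costs 99 erases in total, split almost evenly between the two halves, so each half sees about 1/32 as many erase cycles as records written.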

That said, I’m not sure we care enough about the crash reasons to do that.