Random SD card mounting failures on OS7, with an odd fix?

I’ve had this specific saber for about a month. This issue only started happening yesterday. One of two things happens:

  1. I get random SD card not found errors upon de-activating the killswitch. Lights on chassis/hilt turn on.

  2. I get no verbal error, and nothing seemingly happens when I de-activate the killswitch. No lights, saber doesn’t work. This doesn’t happen if I take the SD card out, however. Also, if I attempt to plug it into the computer in this “frozen” state, my PC does not register the board.

I was able to narrow this down to the board being the likely cause. Things I’ve tried:

  • 3 known-good SD cards: Kingston Industrial, SanDisk Ultra, and an E123.

  • Formatted all of them

  • Tried older configs which did not have these issues

  • sdtest in serial monitor shows no issues with the reading ability on any of the cards… 1000-1400kb/s depending on the card.

The saber functions as normal once the SD card manages to mount.

Onto the weird fix:

Upon defining ‘VERBOSE_SD_ERRORS’ in my config (using the master branch of ProffieOS), my issues literally vanished into thin air. I was able to get successful consecutive boots counting to over 100, never got a single error or a freeze.

I re-wrote the same config to my board, without ‘VERBOSE_SD_ERRORS’, and I was having issues again. More than half the time, my board was failing to mount the SD card.

And I again put the ‘VERBOSE_SD_ERRORS’ define back in my config, wrote it to board, and I got another 100+ successful consecutive boots.
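
For reference, here is a minimal sketch of where the define sits, assuming the usual ProffieOS config layout with a CONFIG_TOP section. The comment line is a placeholder, not the actual contents of war_config.h.

```cpp
#ifdef CONFIG_TOP
// ... board include and the rest of the existing defines stay unchanged ...

// The define discussed above. Per this thread it is only available on the
// current master branch, not on the official 7.8 release.
#define VERBOSE_SD_ERRORS
#endif
```

Removing that one line and re-uploading gives the "without" case described above; nothing else in the config changes between the two runs.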

Lastly, here is my config that is currently written to the board.

war_config.h (59.0 KB)

That is indeed more than a little weird.
What version of ProffieOS do you have? 7.8?

Yep I’m on 7.8 here

Currently uploading a vid for proof - will post back here in 10-20 mins.

watchable but still processing + timestamps in desc to be mindful of ppl’s time

Interesting!
Are you uploading as Serial only, or Serial + Mass Storage, or Serial + WebUSB or some other variant?

I ask because I had a board myself recently that was throwing SD card errors, prompting me to create the thread:

However I managed to still get the error with Verbose logging on. But I found that uploading with WebUSB off (i.e. Serial only) fixed it.

Prof, is there some common factor at the root of these observations that could be connected somehow?

I’ve only ever used Serial + WebUSB, I’ll try some configs on serial only shortly w/o verbose errors

So interestingly, using only Serial eliminated all of my SD card not found errors. However, half my attempts to get the saber to boot ended with the freeze error I described above.

Also I noticed a new issue:

My saber doesn’t wake up from idle using ‘IDLE_OFF_TIME’ if I leave it for a few minutes in that state. The saber becomes unresponsive. Leaving it plugged in via USB, I was able to let it go to idle and turn it back on 30 minutes later.

I used to be able to leave it for long periods of time in idle, and it would turn right back on.

This is a known issue with the 7.9 test and the current master, I believe, but you get it on official 7.8?

Ah, thank you for that. No, I didn’t get it on 7.8; I am using the current master, which is the likely culprit. Since the VERBOSE_SD_ERRORS define is limited to the current master, I will just learn to live with it until it is fixed!

I disabled the low clock speeds that cause freezing on the master branch.

Greatly appreciated :bowing_man:

OK, so it would appear that a pattern is emerging. Until recently I always used Serial + WebUSB too, in order to give customers the opportunity to tweak without diving in too deep, but it was only recently when I had this problem again that I tried Serial only and got the same result as you, i.e. no SD card errors.

So Prof, is it premature to ask, have we finally managed to lure this gremlin out from the shadows and into the microwave - so that all we have to do now is figure out how to nuke it? :open_mouth: :pray: :crossed_fingers:

So, I have long suspected that maybe there is a buffer overrun somewhere in the serial code.
Obviously I haven’t found such a bug; if I had, I would have fixed it.

A buffer overrun in the serial code could cause modifications to whatever is before and/or after it in memory, which could have essentially random effects. However, most of the time, the thing that was before/after would be the same thing, and maybe, just maybe, it’s the SDSPI memory that is getting trampled, causing problems if the circumstances are just right…
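
To make the suspicion concrete, here is a small, purely illustrative sketch of how an unchecked write into one buffer can trample whatever sits next to it. The struct layout, the names (`FakeMemory`, `serial_rx`, `sdspi_state`) and the marker value are all hypothetical; this is not ProffieOS code and says nothing about where the real buffers live.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout: a serial receive buffer that happens to sit directly
// in front of some SD/SPI driver state. Not the real ProffieOS memory map.
struct FakeMemory {
  char serial_rx[8];       // stand-in for a serial buffer
  uint8_t sdspi_state[4];  // stand-in for SD driver state
};

// Start with an empty serial buffer and a known marker in the SD state.
inline FakeMemory fresh_memory() {
  FakeMemory mem;
  std::memset(mem.serial_rx, 0, sizeof(mem.serial_rx));
  std::memset(mem.sdspi_state, 0xA5, sizeof(mem.sdspi_state));
  return mem;
}

// Buggy copy: trusts `len` and never checks it against the buffer size.
// It writes through a byte pointer to the whole struct, so the demo stays
// inside the object as long as len <= sizeof(FakeMemory).
inline void unchecked_receive(FakeMemory& mem, const char* data, size_t len) {
  assert(len <= sizeof(FakeMemory));
  unsigned char* base = reinterpret_cast<unsigned char*>(&mem);
  std::memcpy(base + offsetof(FakeMemory, serial_rx), data, len);
}

// Fixed copy: clamps the length to the buffer size and reports what it kept.
inline size_t checked_receive(FakeMemory& mem, const char* data, size_t len) {
  if (len > sizeof(mem.serial_rx)) len = sizeof(mem.serial_rx);
  std::memcpy(mem.serial_rx, data, len);
  return len;
}

// True if the SD state still holds the marker bytes set by fresh_memory().
inline bool sd_state_intact(const FakeMemory& mem) {
  static const uint8_t expected[4] = {0xA5, 0xA5, 0xA5, 0xA5};
  return std::memcmp(mem.sdspi_state, expected, sizeof(expected)) == 0;
}
```

In this toy layout, a 10-byte message passed to `unchecked_receive` overwrites the first two bytes of `sdspi_state`, while a 4-byte message leaves it untouched. That is the "only if the circumstances are just right" part: the bug is always present, but it only does damage when the input is long enough.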

Chasing buffer overflows, especially in large code bases, is always fun :stuck_out_tongue:

Interesting to see these issues being possibly figured out though.

just letting you know i tracked down the issue - in my case it was likely driver related since i have zero issues with the saber in question when i use my other pc to write configs to the board

also worth mentioning that i think the solutions before were a fluke even if there is a statistical substance to them. at a certain point, all solutions i mentioned prior stopped working and i kept encountering read errors.

it seems like something is going wrong in the write process: presently, ~90% of writes to the board using the problematic PC end up with issues. sometimes i get a good one, and there are no issues until it comes time to re-upload to the board, but obviously this isn’t ideal if you’re constantly fiddling with the config.

i also sent the same saber in to someone else for some other issue, and they also had no issue with sd card read errors when they wrote to the board multiple times.