Out of curiosity, can you hear the “p” in voice_pack from your .wav files? I know I will probably need a hearing-aid thingy soon, so my ears are not the best at “judging” sounds.
You need to understand that the source files for LPC conversion are sampled at only 8000 Hz (not the nice full-blown 44100 Hz we’re used to).
A lot of information gets lost, particularly high frequency stuff like the pronunciation sounds of s, k, t, etc…
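To put a number on that, here is a rough sketch of the Nyquist limit for both sample rates. The function names are made up for illustration, and the 6000 Hz figure is just an assumed stand-in for hiss-like consonant energy, not a measurement.

```cpp
// Highest frequency a given sample rate can represent (the Nyquist limit).
constexpr double NyquistHz(double sample_rate_hz) {
  return sample_rate_hz / 2.0;
}

// True if a tone at freq_hz can survive sampling at sample_rate_hz.
constexpr bool Representable(double freq_hz, double sample_rate_hz) {
  return freq_hz < NyquistHz(sample_rate_hz);
}

// 8000 Hz audio tops out at 4000 Hz; 44100 Hz audio at 22050 Hz.
static_assert(NyquistHz(8000.0) == 4000.0, "8 kHz audio stops at 4 kHz");
static_assert(!Representable(6000.0, 8000.0), "6 kHz content is lost at 8 kHz");
static_assert(Representable(6000.0, 44100.0), "6 kHz content is fine at 44.1 kHz");
```

So anything above 4000 Hz in the original recording simply cannot be stored in the 8000 Hz source file, before LPC even gets involved.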
The breathy pop of a p does not come through easily. I had to artificially amplify the v and “ce” of voice, and the k of pack. I tried the p too, but in conversion 0.5 dB makes a big difference, and it got blown out and sounded bad.
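For a sense of scale, here is a minimal sketch of what a 0.5 dB boost does to 16-bit samples. The helper names are hypothetical, and hard clipping is only one plausible way a boosted consonant can get “blown out”; the LPC encoder may react in other ways too.

```cpp
#include <cmath>
#include <cstdint>

// Convert a gain in decibels to a linear amplitude factor.
inline double DbToLinear(double db) { return std::pow(10.0, db / 20.0); }

// Apply gain to one 16-bit sample, saturating instead of wrapping around.
inline int16_t ApplyGain(int16_t sample, double db) {
  double v = sample * DbToLinear(db);
  if (v > 32767.0) return 32767;     // clipped at the positive rail
  if (v < -32768.0) return -32768;   // clipped at the negative rail
  return static_cast<int16_t>(std::lround(v));
}
```

0.5 dB is only about a 6% increase in amplitude, but a sample that is already near full scale (say 32000) gets pushed past the 16-bit limit and flattened.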
TL;DR expectations need to be lowered (heavily) when making these things.
For what it’s worth, it actually sounds pretty good considering.
It’s not just the low sampling frequency; the LPC algorithm is also bad at consonants.
In LPC, consonants basically become slightly filtered random noise. Sometimes this works OK, but a lot of the time it comes out somewhere between two consonants, and it’s hard to tell if it’s a p or a t, an f or a “th”, etc. Even the old clips have this problem sometimes.
I wasn’t complaining, just asking if you could hear the “p” or not (and wondering if I needed a hearing-aid thingy sooner rather than later, hopefully later).
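Here is a simplified sketch of why that happens. It assumes a plain direct-form all-pole filter rather than the lattice filter real LPC speech chips use, and every name in it is hypothetical. The point is only that a voiced sound is driven by a pitch pulse train, while an unvoiced sound (a consonant) is driven by noise, so the filter coefficients are the only thing separating one consonant from another.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Tiny 16-bit Galois LFSR used as a noise source: returns +1 or -1.
struct Noise {
  uint16_t state = 0xACE1;
  int Next() {
    bool lsb = (state & 1) != 0;
    state >>= 1;
    if (lsb) state ^= 0xB400;
    return lsb ? 1 : -1;
  }
};

// Synthesize n samples of one frame.
// a: all-pole filter coefficients, y[i] = gain * e[i] - sum(a[k] * y[i-k-1]).
// pitch_period == 0 means "unvoiced": the excitation e[i] is noise.
std::vector<double> SynthFrame(const std::vector<double>& a, double gain,
                               int pitch_period, int n, Noise& noise) {
  std::vector<double> out(n, 0.0);
  for (int i = 0; i < n; i++) {
    double excitation;
    if (pitch_period == 0) {
      excitation = noise.Next();                          // consonant: noise
    } else {
      excitation = (i % pitch_period == 0) ? 1.0 : 0.0;   // vowel: pulses
    }
    double y = gain * excitation;
    for (std::size_t k = 0; k < a.size(); k++) {
      int past = i - static_cast<int>(k) - 1;
      if (past >= 0) y -= a[k] * out[past];
    }
    out[i] = y;
  }
  return out;
}
```

With only a handful of coarse coefficients shaping that noise, two different consonants can easily end up with nearly the same output.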
I’ve already updated your talkie codes in my local code, just need to update my Github branch.
True, the “f” in “font” sometimes makes it sound like “hont”, but I guess there is some “expectation bias” going on: you expect to hear “font”, so you do “hear” “font”.
Edit:
I don’t find the talkie code to be “fairly simple”. And I didn’t mean designing new algorithms, but I do see a bunch of “if” and “else”, and I was wondering whether it would be possible to trim “if: do something, else: do something else” down to just “do one thing”, because we know what applies to the one chosen “voice_data” line. For example, some voice_data arrays are const unsigned char (the ones created by you) and the others are const uint8_t; some have a rate of 15, some have another rate. So after we pick one voice_data line, we will know its rate and its type, and then we can start trimming all the “if” and “else” branches that don’t apply to that line.
Fun fact, those both mean the same thing!
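A quick way to check that claim, as a sketch: the C++ standard does not strictly promise that uint8_t is unsigned char, but on mainstream compilers (including the usual ARM toolchains) it is a typedef for it. The byte values and names below are placeholders, not real Talkie data.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Fails to compile on any platform where the two types differ.
static_assert(std::is_same<uint8_t, unsigned char>::value,
              "uint8_t is expected to be a typedef for unsigned char");

// Placeholder byte arrays declared both ways (hypothetical data).
const unsigned char kDeclaredAsUnsignedChar[] = {0x01, 0x02, 0x03};
const uint8_t kDeclaredAsUint8[] = {0x01, 0x02, 0x03};

// One function accepts both arrays, because the element type is the same.
inline unsigned Sum(const uint8_t* data, std::size_t n) {
  unsigned total = 0;
  for (std::size_t i = 0; i < n; i++) total += data[i];
  return total;
}
```

So the difference in how the arrays are declared should not require any extra branches in the code that reads them.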
Where in the code do we ever check for voice pack “not found”? That would be a prop thing, right?
Would something like if (!SFX&_mnum) suffice? And then it could throw ProffieOSErrors::voice_pack_not_found()?
edit - Wait. If it doesn’t exist we just get “font directory not found” and a serial printout that fontPath/common was NOT FOUND!
That’s fine.
Imho “font directory not found” should mean exactly that a directory listed in your font path does not exist.
That’s what sound_library::init() is for.
All code that needs a sound library should call sound_library::init() on the version of the sound library that it needs. The sound library then checks that the version is high enough here:
Having no sound pack counts as version 0, so if no voice pack is found, an “error in font directory” error is generated here:
Currently there is no distinction between not finding a voice pack and having a voice pack with a version number that is too low, but that’s obviously fairly easy to fix.
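As a sketch of what that distinction could look like (all names here are hypothetical and not taken from ProffieOS; it only assumes, as described above, that a missing voice pack reports version 0):

```cpp
// Hypothetical result of checking a voice pack against a required version.
enum class VoicePackStatus { kOk, kNotFound, kVersionTooLow };

// found_version: version reported by the voice pack, 0 if none was found.
// required_version: minimum version the calling code needs.
inline VoicePackStatus CheckVoicePack(int found_version, int required_version) {
  if (found_version >= required_version) return VoicePackStatus::kOk;
  if (found_version == 0) return VoicePackStatus::kNotFound;
  return VoicePackStatus::kVersionTooLow;
}
```

The caller could then pick a different spoken error for kNotFound than for kVersionTooLow instead of using the same one for both.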
Where do we go from here? I have uploaded all the “talkie/errors/KEEP_MINIMUM_TALKIE_ONLY” changes to my GitHub branch.
I know that not everything is perfect yet, but should I submit it as a PR anyway so we can continue the discussion there?
Probably a good idea.
Conceptually I have no problems with a define like this, but seeing the code makes things a lot more concrete…
PR submitted, but I just realized that @NoSloppy also did a PR with similar changes for the discussed errors. It would probably be better to handle his PR first, IMHO.