So, I realized that g++ has a bunch of internal parameters that can be set with the --param flag. The version of g++ we use has 300 of them.
So I did what any normal person would do, and wrote a program that would figure out which parameters can make the program smaller.
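For anyone curious what such a search might look like, here is a minimal Python sketch of one possible approach: try each candidate value for each parameter, and keep a setting only when the measured size gets smaller. This is not the actual program. The function names, the candidate lists, and the use of object-file size as a stand-in for firmware size are all assumptions made for illustration.

```python
import os
import subprocess
import tempfile


def object_size(source, flags, compiler="g++"):
    """Compile `source` to an object file and return its size in bytes.

    Hypothetical helper: file size is only a rough proxy for flash usage,
    and a real setup would measure the fully linked firmware instead.
    """
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "out.o")
        subprocess.run([compiler, "-c", source, "-o", out, *flags], check=True)
        return os.path.getsize(out)


def greedy_search(base_flags, candidates, measure):
    """Greedy one-parameter-at-a-time search.

    candidates: dict mapping a parameter name to a list of values to try.
    measure:    callable that takes a flag list and returns a size.
    A setting is kept only if it makes the measured size strictly smaller.
    """
    flags = list(base_flags)
    best = measure(flags)
    for name, values in candidates.items():
        best_flag = None
        for value in values:
            flag = f"--param={name}={value}"
            size = measure(flags + [flag])
            if size < best:
                best, best_flag = size, flag
        if best_flag is not None:
            flags.append(best_flag)
    return flags, best
```

It could be driven with something like `greedy_search(["-Os"], {"case-values-threshold": [0, 1, 2]}, lambda f: object_size("test.cpp", f))`, where `test.cpp` is a placeholder file name. A greedy search like this can miss combinations that only help together, so treat it as a starting point.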
So far, my program has found options that will save about 4k compared to the “smallest size” option by applying these parameters:
-Os --param=case-values-threshold=0 --param=early-inlining-insns=0 --param=gcse-cost-distance-ratio=1 --param=gcse-unrestricted-cost=2 --param=ipa-cp-value-list-size=9 --param=ipa-sra-max-replacements=16 --param=iv-always-prune-cand-set-bound=20 --param=jump-table-max-growth-ratio-for-size=299 --param=large-function-insns=5398 --param=large-stack-frame=512 --param=large-stack-frame-growth=500 --param=lto-min-partition=1001 --param=max-combine-insns=4 --param=max-completely-peeled-insns=201 --param=max-crossjump-edges=201 --param=max-cse-insns=1001 --param=max-cse-path-length=20 --param=max-early-inliner-iterations=1 --param=max-hoist-depth=300 --param=max-inline-functions-called-once-insns=1000 --param=max-inline-functions-called-once-loop-depth=3 --param=max-inline-insns-single=1399 --param=max-jump-thread-duplication-stmts=1 --param=max-jump-thread-paths=6 --param=max-predicted-iterations=1010 --param=max-stores-to-merge=1300 --param=max-tail-merge-comparisons=100 --param=max-tail-merge-iterations=0 --param=min-crossjump-insns=1 --param=partial-inlining-entry-probability=35 --param=scev-max-expr-size=10 --param=sched-autopref-queue-depth=0 --param=sink-frequency-threshold=100 --param=sra-max-propagations=8 --param=sra-max-scalarization-size-Osize=200 --param=switch-conversion-max-branch-ratio=1 --param=tree-reassoc-width=2 --param=uninlined-function-insns=1 --param=uninlined-thunk-insns=20
Some of these parameters probably do nothing, and some of them may have a large impact on program speed, but at least we have some options now. We just have to figure out what does what.
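One way to weed out the parameters that do nothing would be to drop them one at a time and see whether the size changes. A minimal sketch, assuming you already have some `measure` function that builds with a given flag list and returns the size (both `prune_flags` and `measure` are hypothetical names, not part of any existing tool):

```python
def prune_flags(base_flags, param_flags, measure):
    """Remove every --param flag whose removal does not make the build larger.

    base_flags:  flags that always stay, e.g. ["-Os"].
    param_flags: the --param=... flags to test.
    measure:     callable that takes a flag list and returns a size.
    """
    kept = list(param_flags)
    best = measure(list(base_flags) + kept)
    for flag in list(param_flags):
        trial = [f for f in kept if f != flag]
        size = measure(list(base_flags) + trial)
        if size <= best:
            # Dropping this flag costs nothing, so it was not helping.
            kept, best = trial, size
    return kept, best
```

Whatever survives is the set of flags that actually affects size. This says nothing about speed, which still has to be checked separately.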
So, 4k isn’t bad, but I also had a go at reducing the size for “-O3”, which so far has produced these flags:
-O3 --param=case-values-threshold=0 --param=dse-max-alias-queries-per-store=257 --param=early-inlining-insns=5 --param=inline-heuristics-hint-percent=100 --param=inline-unit-growth=2 --param=ipa-cp-value-list-size=16 --param=max-completely-peeled-insns=20 --param=max-inline-insns-single=7 --param=max-jump-thread-duplication-stmts=1 --param=max-rtl-if-conversion-unpredictable-cost=80
These flags save a whopping 142k compared to plain -O3. (But may also make the program a lot slower of course.)
These parameters open up a lot of options for tweaking. However, figuring out which options are good ones and which ones hurt performance a lot is going to take some work.
If anybody wants to experiment with these options, you can go to your arduino15 directory, then go packages → proffieboard → hardware → stm32l4 → 4.6 and then open up boards.txt. In there you can find lines which have “.optimize” in them. These lines specify what flags to use. There are individual lines for each board, and also individual lines for each option in the optimization menu.
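If you would rather not hunt through the file by hand, a few lines of Python can list the relevant entries. This is only a sketch: `optimize_lines` is a made-up helper name, and you have to supply the path to your own boards.txt, since the location of the arduino15 directory depends on your system.

```python
def optimize_lines(path):
    """Return (line_number, text) for each non-comment line containing '.optimize'."""
    found = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for number, line in enumerate(f, start=1):
            if ".optimize" in line and not line.lstrip().startswith("#"):
                found.append((number, line.rstrip("\n")))
    return found
```

For example, `for number, text in optimize_lines("boards.txt"): print(number, text)` prints each matching line with its line number, so you know where to edit. Keeping a backup copy of boards.txt before changing it is a good idea.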
If you do experiment with it, please report your results here. Also, I recommend using the “top” command to see if there is an impact on the speed of the code or not.