Masterpiece Theater presents: SVFs

profezzorn · October 24, 2021, 5:27am

It was a dark and stormy night, and I was trying to figure out why styles use so much RAM. So I used a debugger to print out everything that is allocated in a style. It turns out that there are a whole bunch of variables called “value” or something similar.

So, it turns out that a whole bunch of the files in functions/ look something like this:

class RandomF {
public:
  void run(BladeBase* blade) { value_ = random(32768); }
  int getInteger(int led) { return value_; }
private:
  int value_;
};

As you can see, run() calculates a value and stores it in value_, and then getInteger() returns it. It’s done this way so that RandomF will return the same value for every LED. Of course, most of the time when we use functions like this, we don’t actually need the function to return the same value for every LED, because all we do is call getInteger(0) on it from some other run() function. Here is an example that does both:

template<class MAX>
class SwingSpeedX {
public:
  void run(BladeBase* blade) {
    max_.run(blade);
    float v = fusor.swing_speed() / max_.getInteger(0);
    value_ = clampi32(v * 32768, 0, 32768);
  }
  int getInteger(int led) { return value_; }
  MAX max_;
  int value_;
};

See how it only calls max_.getInteger(0) from run()? So if we were to use a style like SwingSpeedX<RandomF>, storing the random value in RandomF’s value_ is entirely wasted. This might not be a great example, but there are lots of cases like this in the code. So, what can be done about it?
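To make the waste concrete, here is a minimal sketch that compiles without ProffieOS. `BladeBase`, `fake_random()`, `fake_swing_speed()` and `clamp_int()` are hypothetical stand-ins for the real blade class, `random()`, `fusor.swing_speed()` and `clampi32()`; they are assumptions for illustration, not the real APIs. All it shows is that the combined object carries two stored ints, even though the inner one is only ever read from run().

```cpp
#include <cassert>

// Hypothetical stand-ins so this compiles without ProffieOS.
struct BladeBase {};
inline int fake_random(int max) { return max / 2; }   // deterministic, not random
inline float fake_swing_speed() { return 100.0f; }    // pretend sensor reading
inline int clamp_int(float x, int lo, int hi) {
  if (x < lo) return lo;
  if (x > hi) return hi;
  return static_cast<int>(x);
}

class RandomF {
public:
  void run(BladeBase*) { value_ = fake_random(32768); }
  int getInteger(int) { return value_; }
private:
  int value_;
};

template<class MAX>
class SwingSpeedX {
public:
  void run(BladeBase* blade) {
    max_.run(blade);
    float v = fake_swing_speed() / max_.getInteger(0);
    value_ = clamp_int(v * 32768, 0, 32768);
  }
  int getInteger(int) { return value_; }
private:
  MAX max_;
  int value_;
};

// One int for the inner cached value, one for the outer result.
static_assert(sizeof(SwingSpeedX<RandomF>) == 2 * sizeof(int),
              "the inner value_ is stored even though only run() reads it");

// Runs one frame and returns the result seen by LED 0.
inline int demo_swing() {
  BladeBase blade;
  SwingSpeedX<RandomF> f;
  f.run(&blade);
  assert(f.getInteger(0) == f.getInteger(5));  // same value for every LED
  return f.getInteger(0);
}
```

With these made-up inputs, the inner int is written and read once per run() and never consulted per LED, which is the storage the rest of the post tries to get rid of.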

Well, the first step is to split up RandomF and similar classes into two parts: One that calculates the value, and one that stores the value so that getInteger() will work properly, and later we can work on optimizing away the storing part…

We’ll need a new API for the calculating part, and this new API is called an SVF, which means “Single Value Function”. For RandomF, the new SVF looks like this:

class RandomFSVF {
public:
  int calculate(BladeBase* blade) { return random(32768); }
  void run(BladeBase* blade) {}
};

So the new thing here is the calculate function, which is sort of like getInteger(0), or like a run() function which returns a value.

The part that stores the value looks like this:

class SingleValueBase {
public:
  int getInteger(int led) { return value_; }
  int value_;
};

// Converts an SVF to a FUNCTION
template<class SVF>
class SingleValueAdapter : public SingleValueBase {
public:
  void run(BladeBase* blade) {
    single_value_function_.run(blade);
    value_ = single_value_function_.calculate(blade);
  }
  PONUA SVF single_value_function_;
};

Breaking out the getInteger function into a non-templated class means that the compiler doesn’t have to generate one for each templated class, which could potentially save flash memory.

With this we can now replace the RandomF class with this:

using RandomF = SingleValueAdapter<RandomFSVF>;
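As a rough off-device check that the split version still behaves like the original, the sketch below drives the adapter the way a style would: run() once per frame, then getInteger() for each LED. `BladeBase` is an empty stand-in, and `CounterSVF` is a made-up SVF that returns a new number on every calculate() call (standing in for random()), so it is easy to see that the value stays frozen across LEDs within a frame. The PONUA macro from the original is left out here because its definition isn't shown in this post.

```cpp
#include <cassert>

struct BladeBase {};  // hypothetical stand-in for the ProffieOS class

// Made-up SVF: yields 1, 2, 3, ... so every calculate() call is distinguishable.
class CounterSVF {
public:
  int calculate(BladeBase*) { return ++calls_; }
  void run(BladeBase*) {}
private:
  int calls_ = 0;
};

class SingleValueBase {
public:
  int getInteger(int) { return value_; }
  int value_ = 0;
};

// Same shape as the adapter in the post, minus the PONUA macro.
template<class SVF>
class SingleValueAdapter : public SingleValueBase {
public:
  void run(BladeBase* blade) {
    single_value_function_.run(blade);
    value_ = single_value_function_.calculate(blade);
  }
  SVF single_value_function_;
};

using CounterF = SingleValueAdapter<CounterSVF>;

// Runs one frame and reports whether all LEDs saw the same value.
inline bool frame_is_uniform(CounterF& f, BladeBase* blade, int num_leds) {
  f.run(blade);  // calculate() happens exactly once, here
  int first = f.getInteger(0);
  for (int led = 1; led < num_leds; led++) {
    if (f.getInteger(led) != first) return false;
  }
  return true;
}

inline void demo_adapter() {
  BladeBase blade;
  CounterF f;
  assert(frame_is_uniform(f, &blade, 144));
  assert(f.getInteger(0) == 1);  // first frame saw the first calculated value
}
```

The value only changes when run() is called again, which is the same contract the hand-written RandomF had.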

So far we haven’t actually saved anything, but wait, there’s more…

We have a class that turns an SVF into a Function, so how about a class that turns a Function into an SVF:

// Converts a FUNCTION to an SVF
template<class FUNC>
class SVFWrapper {
public:
  void run(BladeBase* blade) { f_.run(blade); }
  int calculate(BladeBase* blade) { return f_.getInteger(0); }
private:
  PONUA FUNC f_;
};

Now, in functions like SwingSpeedX above we can use the SVFWrapper to convert any function into an SVF, and now we can finally do some optimization:

template<class SVF>
class SVFWrapper<SingleValueAdapter<SVF>> : public SVF {};

This is called “template specialization”, and in this particular case we use it to make SVFWrapper<SingleValueAdapter<SOME_CLASS>> skip both the adapter and the wrapper and just use SOME_CLASS directly. So if we convert every function that can be an SVF into an SVF, and every location that calls getInteger(0) into a user of SVFWrapper, we can reduce the RAM usage a fair amount.
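Here is a small, self-contained sketch of that collapse, assuming an empty `BladeBase` stand-in and two made-up classes (`FortyTwoSVF` and `PlainFortyTwoF`) that exist only for this demonstration. PONUA is omitted again. The static_assert compares the wrapper around a hand-written function, which still has to keep its value_, against the wrapper around an adapter, which the specialization reduces to the bare SVF.

```cpp
#include <cassert>

struct BladeBase {};  // hypothetical stand-in for the ProffieOS class

// Made-up stateless SVF used only for this demonstration.
class FortyTwoSVF {
public:
  int calculate(BladeBase*) { return 42; }
  void run(BladeBase*) {}
};

class SingleValueBase {
public:
  int getInteger(int) { return value_; }
  int value_ = 0;
};

template<class SVF>
class SingleValueAdapter : public SingleValueBase {
public:
  void run(BladeBase* blade) {
    single_value_function_.run(blade);
    value_ = single_value_function_.calculate(blade);
  }
  SVF single_value_function_;
};

// General case: wrap any FUNCTION so it can be used as an SVF.
template<class FUNC>
class SVFWrapper {
public:
  void run(BladeBase* blade) { f_.run(blade); }
  int calculate(BladeBase*) { return f_.getInteger(0); }
private:
  FUNC f_;
};

// Special case: a wrapper around an adapter collapses to the bare SVF.
template<class SVF>
class SVFWrapper<SingleValueAdapter<SVF>> : public SVF {};

// A hand-written FUNCTION that the specialization cannot see through.
class PlainFortyTwoF {
public:
  void run(BladeBase*) { value_ = 42; }
  int getInteger(int) { return value_; }
private:
  int value_ = 0;
};

using Slow = SVFWrapper<PlainFortyTwoF>;                   // still keeps value_
using Fast = SVFWrapper<SingleValueAdapter<FortyTwoSVF>>;  // stores nothing

static_assert(sizeof(Fast) < sizeof(Slow), "specialized wrapper stores nothing");

// Both wrappers expose the same SVF interface and give the same answer.
inline void demo_wrapper() {
  BladeBase blade;
  Slow slow;
  Fast fast;
  slow.run(&blade);
  fast.run(&blade);
  assert(slow.calculate(&blade) == 42);
  assert(fast.calculate(&blade) == 42);
}
```

Code that holds an `SVFWrapper<MAX>` member doesn't need to know which of the two versions it got; it just calls run() and calculate().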

But wait, there’s more…

A lot of the time, there is a Scale<> or something similar in between the SVF and the place that calls getInteger(0), which ends up defeating the optimization we just made. Fortunately, I just learned some new template tricks that let us create a ScaleSVF class, and then transform this:

Scale<SingleValueAdapter<A>, SingleValueAdapter<B>, SingleValueAdapter<C>>

(Where A, B & C are SVFs) into:

SingleValueAdapter<ScaleSVF<A, B, C>>
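For a feel of what a ScaleSVF might look like, here is a rough sketch, not the code from scale.h. It ASSUMES that Scale<F, A, B> linearly maps F's 0..32768 output onto the range A..B; the real arithmetic may differ. `BladeBase` and `ConstSVF` are made-up stand-ins, and PONUA is omitted.

```cpp
#include <cassert>
#include <cstdint>

struct BladeBase {};  // hypothetical stand-in for the ProffieOS class

// Made-up constant SVF so the example has inputs to scale.
template<int N>
class ConstSVF {
public:
  int calculate(BladeBase*) { return N; }
  void run(BladeBase*) {}
};

// Sketch of a scaling SVF: combines three SVFs and stores no result itself.
// ASSUMPTION: F is in 0..32768 and is mapped linearly onto A..B.
template<class F, class A, class B>
class ScaleSVF {
public:
  void run(BladeBase* blade) {
    f_.run(blade);
    a_.run(blade);
    b_.run(blade);
  }
  int calculate(BladeBase* blade) {
    std::int64_t f = f_.calculate(blade);
    std::int64_t a = a_.calculate(blade);
    std::int64_t b = b_.calculate(blade);
    return static_cast<int>(a + f * (b - a) / 32768);
  }
private:
  F f_;
  A a_;
  B b_;
};

class SingleValueBase {
public:
  int getInteger(int) { return value_; }
  int value_ = 0;
};

template<class SVF>
class SingleValueAdapter : public SingleValueBase {
public:
  void run(BladeBase* blade) {
    single_value_function_.run(blade);
    value_ = single_value_function_.calculate(blade);
  }
  SVF single_value_function_;
};

// The transformed form: one adapter (one stored int) around the whole expression.
using HalfwayF =
    SingleValueAdapter<ScaleSVF<ConstSVF<16384>, ConstSVF<1000>, ConstSVF<3000>>>;

inline int demo_scale() {
  BladeBase blade;
  HalfwayF f;
  f.run(&blade);
  assert(f.getInteger(0) == f.getInteger(10));  // still one value per frame
  return f.getInteger(0);
}
```

The point of the transformed form is that only the outermost adapter stores a value; the three inputs are calculated on demand inside run().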

If you’re curious about how that actually works, go read the scale.h file in the profezzorn/ProffieOS repository on GitHub, at commit 3857923df32c68d962855331eb1222d6f61f458f.

mcarcher · October 24, 2021, 8:37pm

Wow, most of that went over my head. What kind of improvement could you expect from a completely optimized style?

profezzorn · October 24, 2021, 9:08pm

Not quite clear yet.
On a large style, this seems to save a few hundred bytes of RAM, perhaps 15% or so. It should also speed things up a bit, but I have not measured how much. Flash-memory wise, the savings are fairly modest, almost neutral.

profezzorn · October 24, 2021, 9:10pm

Oh, and if it wasn’t clear: the SVF optimizations are entirely invisible from the config file, so style writers never have to worry about them. It’s entirely internal to how ProffieOS operates.

profezzorn · October 24, 2021, 9:17pm

Also, I left out the really tricky part, which is the new template trick I learned that lets you do something similar to class specialization, but where the result is an entirely different class. This turns out to be required for these optimizations to work recursively through several layers of functions, but it takes fairly intimate knowledge of C++ to understand it…
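For readers who want a hint anyway: one common C++ way to get "specialization that yields an entirely different class" is a helper struct that only names a type, plus an alias template on top of it. The sketch below is a guess at the general shape, not a copy of scale.h; `ScaleFinder`, `ScaleBase` and `PerLedFunction` are hypothetical names, and all the classes are empty placeholders because only the types matter here.

```cpp
#include <type_traits>

// Hypothetical empty placeholders; only their names matter for this demonstration.
template<class SVF> class SingleValueAdapter {};
template<class F, class A, class B> class ScaleSVF {};
template<class F, class A, class B> class ScaleBase {};  // made-up general version
class RandomFSVF {};
class PerLedFunction {};

// Helper whose only job is to name a type. General case: use the plain version.
template<class F, class A, class B>
struct ScaleFinder {
  typedef ScaleBase<F, A, B> type;
};

// Partial specialization: all three arguments are adapters,
// so name a completely different class instead.
template<class F, class A, class B>
struct ScaleFinder<SingleValueAdapter<F>, SingleValueAdapter<A>, SingleValueAdapter<B>> {
  typedef SingleValueAdapter<ScaleSVF<F, A, B>> type;
};

// The public name is just an alias for whatever the helper picked.
template<class F, class A, class B>
using Scale = typename ScaleFinder<F, A, B>::type;

using R = SingleValueAdapter<RandomFSVF>;

// All-adapter arguments collapse into a single adapter.
static_assert(std::is_same<Scale<R, R, R>,
    SingleValueAdapter<ScaleSVF<RandomFSVF, RandomFSVF, RandomFSVF>>>::value,
    "all-SVF case");

// Anything else falls back to the general class.
static_assert(std::is_same<Scale<PerLedFunction, R, R>,
    ScaleBase<PerLedFunction, R, R>>::value,
    "general case");

// Because the result is itself an adapter, the rewrite applies again when nested.
static_assert(std::is_same<Scale<Scale<R, R, R>, R, R>,
    SingleValueAdapter<ScaleSVF<ScaleSVF<RandomFSVF, RandomFSVF, RandomFSVF>,
                                RandomFSVF, RandomFSVF>>>::value,
    "nested case");
```

The last static_assert is the recursive part: since `Scale<R, R, R>` already is a `SingleValueAdapter<...>`, an outer Scale<> around it matches the specialization too, so a whole tree of scales ends up under one adapter.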

LyleStyle · October 28, 2021, 3:54pm

Your programming skills are the scariest thing I’ve seen this Halloween.