Latency of a virtual function call in a C++ class

I did a tiny experiment to find out why my WS2812B library does not work the way I want… Under the hood, it calls a virtual method, defined only for “bitbanging LED strips”, that writes the bytes to the LEDs.

The test code

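 // Volatile counter: keeps the compiler from optimizing the loops away.
 volatile uint32_t aaa = 0;

 // 1) Reference: the bare loop.
 uint32_t start = millis();
 for (int32_t i = 0; i < 10000000; i++) {
   aaa++;
 }
 uint32_t delta = millis() - start;
 Serial.println(delta);

 // 2) Same loop plus a call to an empty virtual method.
 start = millis();
 for (int32_t i = 0; i < 10000000; i++) {
   aaa++;
   foo.doNothing(GrbPixels);
 }
 delta = millis() - start;
 Serial.println(delta);

 // 3) The real bitbanging call: 9 bytes (3 pixels), only 100000 iterations.
 start = millis();
 for (int32_t i = 0; i < 100000; i++) {
   aaa++;
   foo.sendBytes(9, GrbPixels);
 }
 delta = millis() - start;
 Serial.println(delta);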

Results

16348
29554
23273

The first number is the reference time spent running the bare for loop. To keep the compiler from optimizing the loop away, I update a volatile variable inside it. The second number adds a call to an empty virtual function of my class. From the difference between the two, a bare call takes 1.3206usec, or 21 cycles at 16MHz. My bitbanging function takes 232.73usec (3723 cycles) to push 3 pixels. Each bit takes about 18 cycles as I configured it, so 3 pixels (72 bits) should take 1296 cycles, or 81usec in total. That means there’s an extra 33 cycles per bit of unaccounted work being done.
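The arithmetic behind those figures can be replayed at compile time. This is only a sketch: the helper names are made up for illustration, a 16MHz clock is assumed, and results are truncated to whole cycles. As in the text, the sendBytes() figure still includes the loop overhead.

```cpp
#include <cstdint>

// Assumption: 16 MHz clock (stock Arduino Uno), i.e. 16 cycles per microsecond.
constexpr uint32_t kCyclesPerMicro = 16;

// Average time per loop iteration in nanoseconds, from a millis() delta.
constexpr uint32_t nsPerIteration(uint32_t deltaMs, uint32_t iterations) {
  return static_cast<uint32_t>(deltaMs * 1000000ULL / iterations);
}

// Nanoseconds to whole CPU cycles (truncated).
constexpr uint32_t nsToCycles(uint32_t ns) {
  return static_cast<uint32_t>(static_cast<uint64_t>(ns) * kCyclesPerMicro / 1000);
}

// Deltas printed by the test code, in milliseconds.
constexpr uint32_t kEmptyLoopMs = 16348;  // 10,000,000 iterations
constexpr uint32_t kVirtualMs   = 29554;  // 10,000,000 iterations
constexpr uint32_t kSendBytesMs = 23273;  //    100,000 iterations

// Cost of the virtual call alone: subtract the empty-loop reference.
constexpr uint32_t kCallNs     = nsPerIteration(kVirtualMs - kEmptyLoopMs, 10000000);
constexpr uint32_t kCallCycles = nsToCycles(kCallNs);

// Cost of one sendBytes(9, ...) iteration (loop overhead not subtracted).
constexpr uint32_t kSendNs     = nsPerIteration(kSendBytesMs, 100000);
constexpr uint32_t kSendCycles = nsToCycles(kSendNs);

// 9 bytes = 72 bits at the configured 18 cycles per bit.
constexpr uint32_t kBits           = 9 * 8;
constexpr uint32_t kExpectedCycles = kBits * 18;
constexpr uint32_t kExtraPerBit    = (kSendCycles - kExpectedCycles) / kBits;

static_assert(kCallCycles == 21, "virtual call: about 21 cycles");
static_assert(kSendCycles == 3723, "sendBytes(9, ...): about 3723 cycles");
static_assert(kExpectedCycles == 1296, "72 bits x 18 cycles");
static_assert(kExtraPerBit == 33, "unaccounted cycles per bit");
```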

Since the LED strip treats a pause of about 55 cycles as a reset, the 21 cycles of function call plus those 33 lost cycles are enough to throw the strip into reset mode when we call the function twice in a row. Bummer…

If I use a standard class method (not virtual), I get 232.21usec, a saving of 8 cycles. So there’s no point in avoiding virtual functions. I will however…

By converting my port pin from a class variable to a constant (via a class template), my function drops to 211.83usec or 3389 cycles, an overhead of 2093 cycles, or 29 cycles per bit. The LED strip starts working. I bet at 8MHz this won’t work as I’m 3 cycles shy of the limit.
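To illustrate the difference between the two approaches, here is a minimal sketch. It is not the library's actual code: the class names and the fake port register are hypothetical, chosen so the example builds on any host. With the port and bit known at compile time, an AVR compiler can typically set or clear the pin with a single sbi/cbi instruction, while a run-time pointer and mask need a load, a logic operation and a store. That is one plausible source of the saved cycles, not a confirmed explanation.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a hardware port register so the sketch builds on any host.
// On an AVR this role would be played by PORTD.
volatile uint8_t fakePortD = 0;

// Variant 1: port and pin are member variables, known only at run time.
class RuntimePinStrip {
 public:
  RuntimePinStrip(volatile uint8_t* port, uint8_t pin)
      : port_(port), mask_(static_cast<uint8_t>(1u << pin)) {}
  void setHigh() { *port_ = static_cast<uint8_t>(*port_ | mask_); }
  void setLow() { *port_ = static_cast<uint8_t>(*port_ & ~mask_); }

 private:
  volatile uint8_t* port_;
  uint8_t mask_;
};

// Variant 2: port and pin are template parameters, i.e. compile-time constants.
template <volatile uint8_t* Port, uint8_t Pin>
class ConstPinStrip {
 public:
  static void setHigh() { *Port = static_cast<uint8_t>(*Port | kMask); }
  static void setLow() { *Port = static_cast<uint8_t>(*Port & ~kMask); }

 private:
  static constexpr uint8_t kMask = static_cast<uint8_t>(1u << Pin);
};

// Both variants behave identically; only the generated code differs.
void demoPinVariants() {
  RuntimePinStrip a(&fakePortD, 6);
  a.setHigh();
  assert(fakePortD == 0x40);
  a.setLow();
  assert(fakePortD == 0x00);

  ConstPinStrip<&fakePortD, 6>::setHigh();
  assert(fakePortD == 0x40);
  ConstPinStrip<&fakePortD, 6>::setLow();
  assert(fakePortD == 0x00);
}
```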

I’m still not sure where those cycles are being spent. However, now I can write:

ws2812LedStrip<D,6> foo; // Port D, pin 6, aka Uno digital pin 6
// 8 pixels, 3 bytes each, in G, R, B order
uint8_t GrbPixels[3*8] = {0x00,0x00,0x80, 0x00,0x00,0x80, 0x00,0x80,0x00, 0x80,0x00,0x00, 0x80,0x00,0x00, 0x80,0x00,0x00, 0x00,0x40,0x40, 0x00,0x10,0x10};

cli();                        // no interrupts while bitbanging
foo.sendBytes(9, GrbPixels);  // first 3 pixels
foo.sendBytes(9, GrbPixels);  // same 3 pixels again, further down the strip
sei();
delay(1000);

I can repeat a pattern on my LED strip by calling sendBytes() multiple times.
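The repetition generalizes to a loop. The sketch below is an illustration under assumptions: RecordingStrip is a hypothetical stand-in that records bytes instead of driving a pin, and only the sendBytes(count, bytes) call shape is taken from the code above. On the real hardware the loop would sit between cli() and sei(), and the loop overhead would also have to fit within the strip's reset window.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the strip: records bytes instead of bitbanging.
struct RecordingStrip {
  std::vector<uint8_t> sent;
  void sendBytes(uint16_t count, const uint8_t* bytes) {
    sent.insert(sent.end(), bytes, bytes + count);
  }
};

// Send the same short pattern `repeats` times in a row.
template <typename Strip>
void repeatPattern(Strip& strip, const uint8_t* pattern,
                   uint16_t count, uint16_t repeats) {
  for (uint16_t i = 0; i < repeats; ++i) {
    strip.sendBytes(count, pattern);
  }
}

// Usage: 4 repeats of a 3-pixel pattern cover 12 pixels from 9 bytes of data.
std::size_t demoRepeat() {
  static const uint8_t pattern[9] = {0x00, 0x00, 0x80, 0x00, 0x00,
                                     0x80, 0x00, 0x80, 0x00};
  RecordingStrip strip;
  repeatPattern(strip, pattern, 9, 4);
  assert(strip.sent.size() == 36);
  return strip.sent.size() / 3;  // number of pixels sent
}
```

The point of the design is memory: the pattern buffer stays at 9 bytes no matter how many pixels end up lit.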

However, I have a similar function that uses a palette, and I cannot call it multiple times, even though it internally calls sendBytes() multiple times. Even with just that, I can already display a thousand pixels with an Uno before running out of memory.
