To implement bit-banging I need to generate code with precise timing: every instruction’s cycle count matters.
Inline assembly directives are new to me, and I got some feedback from the Arduino community.
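It turns out inline assembly is not needed: PORTB |= 1<<pin compiles directly to a single sbi instruction (cbi for clearing a bit), provided pin is a compile-time constant, so the cost is a known 2 cycles. For delays, __builtin_avr_delay_cycles(n) generates an optimal spin loop; n must also be a compile-time constant.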
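
As a rough sketch of how the two pieces fit together, here is one way to toggle a pin with fixed delays in between. TX_PIN, BIT_CYCLES and the function names are hypothetical, chosen only for illustration. The non-AVR branch is a stand-in so the file also compiles on a desktop; it has no timing meaning.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
/* avr-gcc builtin: burns exactly n CPU cycles, n must be a constant. */
#define DELAY_CYCLES(n) __builtin_avr_delay_cycles(n)
#else
/* Host stand-ins (assumption, for off-target compilation only). */
static volatile uint8_t PORTB;
static unsigned long cycles_waited;
#define DELAY_CYCLES(n) (cycles_waited += (n))
#endif

#define TX_PIN     3     /* hypothetical pin number, compile-time constant */
#define BIT_CYCLES 16UL  /* hypothetical half-period in CPU cycles */

/* With a constant pin and optimization enabled, avr-gcc is expected to
   emit a single sbi here (2 cycles on classic AVR cores). */
static inline void pin_high(void)
{
    PORTB |= (uint8_t)(1u << TX_PIN);
}

/* Likewise expected to become a single cbi. */
static inline void pin_low(void)
{
    PORTB &= (uint8_t)~(1u << TX_PIN);
}

/* One high/low pulse. The delays do not include the cycles spent in
   sbi/cbi themselves, so subtract those if the period must be exact. */
static inline void pulse_bit(void)
{
    pin_high();
    DELAY_CYCLES(BIT_CYCLES);
    pin_low();
    DELAY_CYCLES(BIT_CYCLES);
}
```

It is worth checking the generated assembly (for example with avr-objdump -d) to confirm that sbi/cbi really are emitted for the target chip and optimization level, since the 2-cycle figure depends on that.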