Small test this morning.
Now, let's drop the resolution.
Merge the two.
I think you might be able to adjust the encoding of individual MCUs programmatically; if you keep the ROIs on 8×8 pixel boundaries, the difference would not be so big.
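Keeping an ROI on 8×8 boundaries means expanding its rectangle outward until every edge lands on a block edge. Below is a minimal sketch of that arithmetic. The function name `align_roi_to_blocks` is hypothetical. It assumes `block=8`, which matches the JPEG MCU size only without chroma subsampling; with 4:2:0 subsampling the MCU is 16×16, so you would pass `block=16`.

```python
def align_roi_to_blocks(x, y, w, h, img_w, img_h, block=8):
    """Expand the ROI (x, y, w, h) outward so all edges lie on block boundaries.

    Hypothetical helper. The result is clamped to the image size, so the
    right/bottom edge may stop at the image border if it is not a multiple
    of `block`.
    """
    x0 = (x // block) * block                          # round left edge down
    y0 = (y // block) * block                          # round top edge down
    x1 = min(-(-(x + w) // block) * block, img_w)      # round right edge up
    y1 = min(-(-(y + h) // block) * block, img_h)      # round bottom edge up
    return x0, y0, x1 - x0, y1 - y0


# An ROI at (10, 5) of size 20x20 grows to (8, 0) of size 24x32.
print(align_roi_to_blocks(10, 5, 20, 20, 640, 480))
```

Rounding outward rather than inward keeps the whole original ROI inside the aligned region, at the cost of encoding a few extra pixels at the edges.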
Computing a delta against a previously received key-frame would definitely make the image even smaller; the reference frame gives a reasonable idea of what the delta would look like, and therefore of the bandwidth saving.
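One simple way to get a feel for that saving is to compare the current frame with the key-frame block by block and count how many 8×8 blocks actually changed. The sketch below does that. It substitutes a block-level "skip unchanged blocks" scheme for a true pixel delta, which may not be what the approach above has in mind. It assumes grayscale frames of equal size stored as nested lists. The names `changed_blocks` and `changed_fraction` and the `threshold` parameter are hypothetical. The resulting fraction is only a rough proxy, because the compressed size of each block depends on its content.

```python
def changed_blocks(key, cur, block=8, threshold=0):
    """Return (bx, by) block indices where `cur` differs from the key-frame.

    `key` and `cur` are same-sized 2-D lists of grayscale values (an assumed
    representation). A block counts as changed if any pixel in it differs
    by more than `threshold`.
    """
    height, width = len(cur), len(cur[0])
    changed = []
    for by in range(0, height, block):
        for bx in range(0, width, block):
            if any(
                abs(cur[y][x] - key[y][x]) > threshold
                for y in range(by, min(by + block, height))
                for x in range(bx, min(bx + block, width))
            ):
                changed.append((bx // block, by // block))
    return changed


def changed_fraction(key, cur, block=8, threshold=0):
    """Fraction of blocks that would need re-sending (rough bandwidth proxy)."""
    height, width = len(cur), len(cur[0])
    total = (-(-height // block)) * (-(-width // block))  # ceil in both axes
    return len(changed_blocks(key, cur, block, threshold)) / total


key = [[0] * 32 for _ in range(16)]   # 32x16 frame -> 4x2 = 8 blocks
cur = [row[:] for row in key]
cur[3][20] = 255                       # one pixel changes, inside block (2, 0)
print(changed_blocks(key, cur), changed_fraction(key, cur))
```

A non-zero `threshold` lets small sensor noise be ignored, so that nearly static scenes do not force every block to be re-sent.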