Ahh, the good old analog days; all monitors were 4:3, and all cameras provided 4:3 aspect video (for the sake of argument, let’s just assume this was universally true, and forget about PAL, NTSC, SECAM and all the other issues). Since the cameras matched the monitors, there was no letterboxing, and a 2×2 grid would neatly show 4 feeds.
If you had a character generator, the text would be placed “on top of the video”, which was fortunate, because placing it outside the frame would push the combined aspect of the text and the video closer to 4:4 and cause a mismatch in aspect ratios. Mismatched aspect ratios mean that we need to letterbox, crop or squeeze the video (making people look fat or skinny).
It seems as if most people hate letterboxing and prefer a slightly squeezed image (I HATE squeezing!), and some televisions will use a non-linear scaling that keeps the video in the center natural, while the poor people at the edges look extra fat! This – to me – is the weirdest option, but it is the one my dad most often selects when watching sports!
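To make the non-linear option concrete, here is a tiny sketch of how such a “panoramic” stretch might map horizontal positions: the center is left nearly untouched while the edges absorb all the extra width. The cubic weighting is purely my assumption for illustration; actual TVs use their own proprietary curves.

```python
# Hypothetical non-linear stretch for fitting 4:3 content on a 16:9 screen.
# x is a normalized source coordinate in [-1, 1] (0 = center of the image).

def panoramic_x(x, stretch=(16 / 9) / (4 / 3)):
    """Map a source x to screen space. Near x = 0 the slope is ~1
    (no visible distortion); the cubic term supplies the extra
    width needed at the edges, so edge content gets stretched."""
    extra = stretch - 1.0
    return x + extra * x ** 3

print(panoramic_x(0.1))  # center: barely moved
print(panoramic_x(1.0))  # edge: pushed out to fill the full 16:9 width
```

The point of the curve is visible in the numbers: a pixel 10% from center moves almost nowhere, while the outermost pixel travels a third of the way further out – exactly the “people at the edge look extra fat” effect.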
Today we have every conceivable combination of aspect ratios. A 4:3 video feed on an ultra-widescreen monitor is perhaps the most extreme case: either almost half the screen is letterboxed, or the feed is distorted to the point that people look extremely overweight when standing and emaciated when horizontal.
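The “almost half the screen” claim is easy to check with a few lines of arithmetic. This sketch computes the fraction of a screen lost to black bars when a feed is scaled to fit without distortion (the ratios are illustrative, not tied to any particular monitor):

```python
# Fraction of the screen covered by black bars when a feed is
# letterboxed/pillarboxed (scaled to fit with no distortion).

def bar_fraction(feed_w, feed_h, screen_w, screen_h):
    """Scale the feed uniformly until it just fits the screen,
    then return the share of screen area left black."""
    scale = min(screen_w / feed_w, screen_h / feed_h)
    shown_w, shown_h = feed_w * scale, feed_h * scale
    return 1 - (shown_w * shown_h) / (screen_w * screen_h)

# A 4:3 feed on a 21:9 ultra-widescreen monitor:
print(round(bar_fraction(4, 3, 21, 9), 2))  # → 0.43
```

So roughly 43% of a 21:9 monitor is wasted on a 4:3 feed – close enough to “almost half” to make the point.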
One of the solutions we are working on is a more dynamic presentation of the video. I think we can probably use analytics to determine what we show on the screen. Imagine a parking lot: why not scale and crop as determined by analytics? As people walk across the lot, the analytics provide a set of ROIs to the renderer. The renderer can then decide either to scale and crop to fit a bounding box around the ROIs – OR – to split the feed into 2 “virtual feeds”, each showing a smaller bounding box. Naturally, we’d need to smooth out the movement of the “virtual camera”, but I am sure it can be done.
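The bounding-box-and-smoothing idea can be sketched in a few small functions: merge the ROIs into one box, grow that box to the display’s aspect ratio, and low-pass filter the crop between frames so the virtual camera glides rather than jumps. All names and the smoothing constant here are my own assumptions, not a shipping API.

```python
# Sketch of the analytics-driven "virtual camera".
# ROIs and boxes are (x, y, w, h) tuples in feed coordinates.

def bounding_box(rois):
    """Smallest box enclosing every ROI reported for this frame."""
    x0 = min(x for x, y, w, h in rois)
    y0 = min(y for x, y, w, h in rois)
    x1 = max(x + w for x, y, w, h in rois)
    y1 = max(y + h for x, y, w, h in rois)
    return (x0, y0, x1 - x0, y1 - y0)

def fit_aspect(box, aspect):
    """Grow the box to the target aspect ratio around its center,
    so the crop can be scaled to the display without distortion."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    if w / h < aspect:
        w = h * aspect   # too tall: widen
    else:
        h = w / aspect   # too wide: heighten
    return (cx - w / 2, cy - h / 2, w, h)

def smooth(prev, target, alpha=0.2):
    """Exponential smoothing: move only a fraction of the way toward
    the new crop each frame, so the virtual camera doesn't jump."""
    return tuple(p + alpha * (t - p) for p, t in zip(prev, target))

crop = (0, 0, 16, 9)                  # start by showing the whole feed
rois = [(2, 2, 1, 2), (10, 4, 1, 2)]  # two people walking across the lot
target = fit_aspect(bounding_box(rois), 16 / 9)
crop = smooth(crop, target)
```

Splitting into two “virtual feeds” would just mean clustering the ROIs first and running `fit_aspect`/`smooth` per cluster instead of on the single merged box.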