Tuesday, December 20, 2016

Improve the performance of short seeks in ExoPlayer

Test environment:


1.     [Movie]: 4K timer video, streamed as MPEG DASH from https://www.youtube.com/watch?v=uo9dAIQR3g8.
2.     [Device]: HTC One X9u running Android 6.0 (API level 23).
3.     [Code Base]: Branch = release-v2

Overview:


A simplified overview of ExoPlayer’s pipeline is given in fig 1.
To feed input buffers (in feedInputBuffer()), the media codec renderer calls readSource(), which executes readData() of ChunkSampleStream and then of DefaultTrackOutput, and finally reads a fetched sample out of dataQueue.
On the other side, the loading thread puts demuxed samples into dataQueue by calling sampleData().
Each track (ChunkSampleStream) owns a member named sampleQueue (of class type DefaultTrackOutput). sampleQueue keeps track of the buffered data through dataQueue and InfoQueue. InfoQueue is a circular queue that keeps the write index for sampleData() (upstream) and the read index for readData() (downstream).
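
The read/write index bookkeeping described above can be illustrated with a minimal sketch. This is not ExoPlayer's actual InfoQueue: the class name InfoQueueSketch, its methods, and the fixed capacity are assumptions made for illustration, and it stores only sample timestamps.

```java
// Hypothetical sketch of a circular metadata queue; NOT ExoPlayer's InfoQueue.
final class InfoQueueSketch {
    private final long[] timesUs; // presentation time of each buffered sample
    private int readIndex;        // next slot the renderer side will consume
    private int writeIndex;       // next slot the loading thread will fill
    private int count;            // number of samples currently buffered

    InfoQueueSketch(int capacity) {
        timesUs = new long[capacity];
    }

    // Upstream (loading thread): record one demuxed sample.
    // Assumption: a full queue rejects the write instead of growing.
    synchronized boolean write(long timeUs) {
        if (count == timesUs.length) {
            return false;
        }
        timesUs[writeIndex] = timeUs;
        writeIndex = (writeIndex + 1) % timesUs.length; // wrap around
        count++;
        return true;
    }

    // Downstream (renderer): take the oldest timestamp, or -1 if empty.
    synchronized long read() {
        if (count == 0) {
            return -1;
        }
        long timeUs = timesUs[readIndex];
        readIndex = (readIndex + 1) % timesUs.length; // wrap around
        count--;
        return timeUs;
    }

    synchronized int size() {
        return count;
    }
}
```

Both methods are synchronized because, as in the pipeline above, the writer and the reader run on different threads.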


Fig 1

Issue description:


Roughly the internal buffer of the playback pipeline is divided into two parts.
The first is within the sampleQueue.
The second is within the codec.

On my HTC One X9u, MediaCodec has 5 input buffers and 3 output buffers. Approximately, the HW codec (MediaCodec) can hold at most ~0.5 seconds of 4K frames.
Note that typical movies (e.g. MPEG DASH streams from YouTube) have a segment duration of ~5 seconds. This implies that the maximum interval between key frames may exceed 5 seconds.

In the simplified internal buffer model at the bottom of fig 1, each red rectangle represents a key frame.

The performance degradation in question is shown below in fig 2.
Fig 2

It happens when the seek target (shown in blue) is within sampleQueue but NO key frame precedes it. In this case we fall into the condition “if (sampleCountToKeyframe == -1)” and clean up all the data we have fetched (including flushing the codec). This makes us lose a lot of data that may have been costly to download over a wireless network.
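
To make the “sampleCountToKeyframe == -1” condition concrete, here is a minimal sketch of such a search. It is not ExoPlayer's code: the class name, the array-based signature, and the assumption that buffered timestamps are sorted in ascending order are all illustrative.

```java
// Hypothetical sketch of the key frame search; NOT ExoPlayer's implementation.
final class KeyframeSearchSketch {
    /**
     * Returns the index (relative to the read position) of the latest key
     * frame at or before targetUs, or -1 if the target lies outside the
     * buffered range or no buffered key frame precedes it.
     * Assumption: timesUs is sorted in ascending order.
     */
    static int sampleCountToKeyframe(long[] timesUs, boolean[] isKeyframe, long targetUs) {
        int n = timesUs.length;
        if (n == 0 || targetUs < timesUs[0] || targetUs > timesUs[n - 1]) {
            return -1; // seek target is not inside the buffered data
        }
        int result = -1;
        for (int i = 0; i < n && timesUs[i] <= targetUs; i++) {
            if (isKeyframe[i]) {
                result = i; // remember the most recent key frame so far
            }
        }
        return result;
    }
}
```

In the fig 2 case the buffered samples between the read position and the seek target contain no key frame, so the loop never updates result and the method returns -1 even though the target itself is buffered.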

Note that movies such as those from YouTube are often encoded with a segment length of (a little more than) 5 seconds. That is long enough for users to hit the case in fig 2.

When this condition occurs, a large amount of data is wasted.
For example, if you have preloaded DEFAULT_MAX_BUFFER_MS = 30 seconds of data, or DEFAULT_VIDEO_BUFFER_SIZE = 12.5 MB of data, dropping all of it and re-fetching hurts mobile users without an unlimited data plan.

It also adds a noticeable delay for the re-fetch. The following links show a comparison between the original and improved methods; the improvement is easy to see when comparing them.

Type
link
After optimization
Before optimization

Fig 3 gives the flowchart of the original method.
Fig 4 gives the flowchart of the improved method.


Figure 3: original method


Figure 4: improved method

A few points about fig 4 need clarification.
1. We do NOT implement A in this proposal, since it only helps when the seek target is merely ~0.5-0.8 seconds ahead of the current position (depending on the device).
2. If check 1 returns "TRUE" but check 2 returns "FALSE", we are in the case of fig 2.
3. At step B, we do NOT flush the codec; instead we rely on latestResetPosition = seek target to filter out the out-of-range (decode-only) samples at the renderer.

Proposed improvement:


To do better, we check whether
1.     There is NO discontinuity in the downloaded data.
2.     The seek target is within the sampleQueue.
3.     There is NO key frame before it.

Once all conditions are satisfied, we do NOT flush the downloaded data within sampleQueue & codec as before. Instead, we set the seek time as the renderer's threshold = latestResetPosition, and filter out the samples before latestResetPosition (since they are decode-only and need NOT be rendered).
The threshold = latestResetPosition plays the same role as GStreamer’s segment stop at the renderer.
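
The three conditions and the resulting choice can be summarized in a small decision sketch. The class, the enum, and the action names are hypothetical and only mirror the description above; they are not taken from the ExoPlayer code base.

```java
// Hypothetical sketch of the seek decision; names are illustrative only.
final class SeekDecisionSketch {
    enum Action { SKIP_TO_KEYFRAME, KEEP_AND_FILTER, FLUSH_AND_REFETCH }

    static Action decide(boolean contiguous, boolean targetInQueue, boolean keyframeBeforeTarget) {
        if (!targetInQueue) {
            // The target is not buffered at all: nothing can be reused.
            return Action.FLUSH_AND_REFETCH;
        }
        if (keyframeBeforeTarget) {
            // A key frame precedes the target: skip to it, as before.
            return Action.SKIP_TO_KEYFRAME;
        }
        // No key frame before the target (the fig 2 case). Keep the buffered
        // data only if it is contiguous; otherwise fall back to flushing.
        return contiguous ? Action.KEEP_AND_FILTER : Action.FLUSH_AND_REFETCH;
    }
}
```

Only the KEEP_AND_FILTER branch is new in this sketch; the other two branches correspond to what the original method already does.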

In fact, filtering the out-of-range samples in the renderer instead of in sampleQueue is more efficient. It helps us to
1. avoid unnecessary network traffic spent on re-fetching;
2. save power;
3. reduce the response time of a seek.
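
As a rough illustration of the renderer-side filter, the sketch below drops decoded frames whose timestamp is earlier than latestResetPosition. The class and method names are assumptions made for this example; the real renderer works on codec output buffers rather than on plain timestamp arrays.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the renderer-side filter; NOT ExoPlayer's renderer.
final class RenderFilterSketch {
    private long latestResetPositionUs; // threshold set by the last seek

    // On seek, remember the target instead of flushing buffers and codec.
    void onSeek(long seekTargetUs) {
        latestResetPositionUs = seekTargetUs;
    }

    // Frames before the threshold are decode-only and must not be shown.
    boolean shouldRender(long presentationTimeUs) {
        return presentationTimeUs >= latestResetPositionUs;
    }

    // Returns the timestamps that would actually be rendered.
    List<Long> render(long[] decodedTimesUs) {
        List<Long> rendered = new ArrayList<>();
        for (long timeUs : decodedTimesUs) {
            if (shouldRender(timeUs)) {
                rendered.add(timeUs);
            }
        }
        return rendered;
    }
}
```

For example, after onSeek(2500), decoded frames at 1000 and 2000 are dropped while frames at 3000 and 4000 are rendered; the dropped frames still pass through the decoder so that later frames can reference them.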
