Thursday, December 22, 2016

Improve the performance of far seek for ExoPlayer

ExoPlayer

Issue report = https://github.com/google/ExoPlayer/issues/2253

Check in:

Merged #2318.


Environment:


1.     [Movie]: 4K timer MPEG DASH streaming from https://www.youtube.com/watch?v=uo9dAIQR3g8.
2.     [Device]: HTC One X9u with Android 6.0 API = 23.
3.     [Code Base]: Branch = release-v2


Issue description:


Here we examine the upstream (loading) side to see where it could be optimized.



fig 1: overview of the player pipeline

Fig 1 is the overview of the player pipeline. Please refer to http://programmingmemojohnchang.blogspot.tw/2016/12/improve-performance-of-short-seek-of.html.


fig 2

As shown in fig 2, the rectangle with a dashed border is the latest media chunk we are downloading. It has not been completely downloaded yet, which is why it is drawn with a dashed border.

Originally, if the seek target lies beyond the end of sampleQueue on the upstream side, we clear the whole sampleQueue and then refetch starting from the media chunk that contains the seek target.
The refetched media chunk is exactly the same as the last media chunk (the one with the dashed border).

Obviously, this wastes bandwidth on redundant data (the part shown in green). If we instead keep the data of the last, partially downloaded media chunk, we can avoid downloading that part again.

If the conditions below are satisfied, we can instead locate the last key frame within sampleQueue and drop the out-of-range (decode-only) samples at the renderer:
1.     The seek target lies within the last media chunk.
2.     The nearest key frame preceding the seek target is either still within sampleQueue or has already been sent to the decoder (or rendered).

To take advantage of this, we add a new function named skipToLastKeyframe(), which tries to find the last key sample within sampleQueue. We also add a function named isWithinLastChunk() to check whether the seek target lies within the last media chunk.
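The two helpers are only named above. The following is a minimal sketch of what they might check. It is a hypothetical re-creation, not the code merged in #2318. It assumes sampleQueue can be modeled as a plain list of (timeUs, isKeyframe) samples, and that the start and end times of the last chunk are known. The class FarSeekSketch, the Sample type and those fields are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of one track's sample queue. This is NOT
 * ExoPlayer's DefaultTrackOutput; it only illustrates the two checks above.
 */
public class FarSeekSketch {

  /** One buffered sample: presentation time plus a key-frame flag. */
  static final class Sample {
    final long timeUs;
    final boolean isKeyframe;

    Sample(long timeUs, boolean isKeyframe) {
      this.timeUs = timeUs;
      this.isKeyframe = isKeyframe;
    }
  }

  final List<Sample> samples = new ArrayList<>(); // samples not yet read downstream
  long lastChunkStartUs; // assumed: start time of the last (partially loaded) chunk
  long lastChunkEndUs;   // assumed: end time of that chunk, e.g. from the manifest

  /** Returns true if the seek target falls inside the last media chunk. */
  boolean isWithinLastChunk(long seekTargetUs) {
    return seekTargetUs >= lastChunkStartUs && seekTargetUs < lastChunkEndUs;
  }

  /**
   * Drops queued samples before the last key frame and returns that key
   * frame's time, or -1 if the queue holds no key frame (the key frame may
   * already be in the decoder). Samples from the key frame up to the seek
   * target would then be decoded but not rendered.
   */
  long skipToLastKeyframe() {
    for (int i = samples.size() - 1; i >= 0; i--) {
      if (samples.get(i).isKeyframe) {
        samples.subList(0, i).clear(); // discard everything before the key frame
        return samples.get(0).timeUs;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    FarSeekSketch q = new FarSeekSketch();
    q.lastChunkStartUs = 5_200_000;
    q.lastChunkEndUs = 10_400_000;
    q.samples.add(new Sample(5_200_000, true)); // the chunk starts with a key frame
    q.samples.add(new Sample(5_233_000, false));
    q.samples.add(new Sample(5_266_000, false));

    long seekTargetUs = 7_800_000; // beyond the buffered samples, inside the last chunk
    if (q.isWithinLastChunk(seekTargetUs)) {
      long keyframeUs = q.skipToLastKeyframe();
      System.out.println("resume from key frame at " + keyframeUs + " us");
      // prints: resume from key frame at 5200000 us
    }
  }
}
```

Because every queued sample precedes the seek target in this scenario, the last key frame in the queue is also the nearest one before the target.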


Test environment:


1. [Movie]: 4K timer MPEG DASH streaming from https://www.youtube.com/watch?v=uo9dAIQR3g8.
2. [Device]: HTC One X9u with Android 6.0 API = 23.
3. [Code Base]: Branch = release-v2
4. [Bitrate]: fixed to the maximum one, 22361348 bits/sec.


Test result:



fig 3


fig 4


fig 5: comparison


The experimental results are analyzed here.

Below we explain the meaning of the results in fig 3 & 4, and how we chose the test conditions.


How do we choose the seek target?

Each time we trigger a seek, we call getBufferedPositionUs() to get the buffered position T and seek to T + 50 ms. This ensures the seek target is always outside the range of sampleQueue.
The code is shown below (I put it in seekToInternal()).

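    if (Util.upstreamOptimizationExploration) {
      // Test-only flag: seek 50 ms past the buffered position T so the target is outside sampleQueue.
      // In most cases the video track has the least buffered data, so T is effectively video's position.
      periodPositionUs = loadingPeriodHolder.mediaPeriod.getBufferedPositionUs() + 50000 /* 50 ms */;
    }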


1. [search target, media chunk start]:
the average distance between the seek target and the start of the last media chunk during the test.
2. data loss:
how much data we flushed, from the beginning of the last media chunk to the end of sampleQueue.
3. AllRenderersReady:
the time from delivering the seek until all renderers are ready (each has rendered its first frame). At this point you can see the first frame at the seek target, but playback has NOT started yet.
4. HaveSufficientBuffer:
the time from delivering the seek until enough data is buffered; at this point playback actually starts.
5. network speed:
the network speed measured while probing the performance improvement.

Explanation of the results:
Assuming the seek target is uniformly distributed within a segment, and since a segment is typically ~5.2 seconds long, the seek target will on average be ~2.6 seconds away from the start of its containing media chunk. Items (1) & (2) match this expectation.
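The averaging argument can be written out explicitly. The sketch below assumes the seek target t is uniform over a chunk of duration D = 5.2 s. As an extra simplification not made in the text, it also assumes a constant bitrate R equal to the fixed test bitrate. The real distribution is front-heavy because of the key frame, so the data figure is only a rough estimate.

```latex
\mathbb{E}[d] = \int_0^{D} \frac{t}{D}\,dt = \frac{D}{2} \approx \frac{5.2\ \text{s}}{2} = 2.6\ \text{s}

\mathbb{E}[\text{data loss}] \approx R \cdot \frac{D}{2}
  = 22361348\ \tfrac{\text{bit}}{\text{s}} \times 2.6\ \text{s}
  \approx 5.81 \times 10^{7}\ \text{bit} \approx 7.27\ \text{MB}
```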

(3) can be explained by the fact that bits are not uniformly distributed within a segment. Since the key frame at the front of a media chunk is usually larger than the other samples, the first half of a chunk carries more data than the second half. Hence downloading the first half takes longer than downloading the second half.

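(4) reflects the relationship between network speed and video bitrate: downloading 2.6 seconds of 22361348 bits/sec video at a network speed of 19380250 bits/sec should take
(22361348 × 2.6) / 19380250 ≈ 2.999936 seconds, compared with the measured 2.914 seconds of item (4).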
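The same back-of-the-envelope estimate can be written as a tiny program. It assumes, as the formula does, that the network speed of item (5) is in bits per second. The class name and method are illustrative.

```java
import java.util.Locale;

/** Estimates how long it takes to download a stretch of media over a given link. */
public class DownloadTimeEstimate {

  /** Seconds needed to fetch contentSec of media encoded at bitrateBps over a networkBps link. */
  static double downloadSeconds(double bitrateBps, double contentSec, double networkBps) {
    double bits = bitrateBps * contentSec; // amount of data before the seek target
    return bits / networkBps;
  }

  public static void main(String[] args) {
    double t = downloadSeconds(22_361_348, 2.6, 19_380_250);
    System.out.printf(Locale.ROOT, "%.3f s%n", t); // prints 3.000 s
  }
}
```

The estimate ignores request latency and the front-heavy bit distribution, so it should only roughly match the measured value.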

Finally, the experimental results show an improvement of about:
1.     3162.865 ms to render the first frame.
2.     1233.17 ms to start playback.

In total, it saves ~4396 ms for the 4K test case, a speed-up of:
((4228.225 + 2914) / (1065.36 + 1680.33)) ≈ 2.60, i.e. about 2.6 times faster.

Figures 6 & 7 summarize the difference between the original and proposed schemes.
They show where the saving comes from.





fig 6: original method - redownload the last chunk 





fig 7: proposed method - avoid the redundant download

Tuesday, December 20, 2016

Improve the performance of short seek for ExoPlayer

Test environment:


1.     [Movie]: 4K timer MPEG DASH streaming from https://www.youtube.com/watch?v=uo9dAIQR3g8.
2.     [Device]: HTC One X9u with Android 6.0 API = 23.
3.     [Code Base]: Branch = release-v2

Overview:


A simplified overview of ExoPlayer's pipeline is given in fig 1.
For the MediaCodec renderer, to feed input buffers (in feedInputBuffer()), readSource() is called. It calls readData() of ChunkSampleStream and then of DefaultTrackOutput, which finally reads a fetched sample out of dataQueue.
On the other hand, the loading thread puts demuxed samples into dataQueue by calling sampleData().
Each track (ChunkSampleStream) owns a member named sampleQueue (of type DefaultTrackOutput). sampleQueue keeps track of the buffered data through dataQueue and InfoQueue. InfoQueue is a circular queue that keeps the write index for sampleData() (upstream) and the read index for readData() (downstream).


Fig 1
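To make the read/write-index idea concrete, here is a hypothetical, simplified ring buffer of sample timestamps. It is not ExoPlayer's InfoQueue, which also keeps per-sample metadata such as offsets, sizes and flags. The class RingInfoQueue and its methods are illustrative. Unlike the real queue, which is shared between the loading and playback threads, this sketch is not thread-safe.

```java
/**
 * Hypothetical ring buffer of sample times with separate write (upstream)
 * and read (downstream) indices. Not thread-safe; for illustration only.
 */
public class RingInfoQueue {
  private final long[] timesUs;
  private int readIndex;  // downstream: next slot a readData()-like call consumes
  private int writeIndex; // upstream: next slot a sampleData()-like call fills
  private int size;       // number of buffered samples

  public RingInfoQueue(int capacity) {
    timesUs = new long[capacity];
  }

  /** Upstream side: appends one sample time; returns false when the ring is full. */
  public boolean write(long timeUs) {
    if (size == timesUs.length) {
      return false;
    }
    timesUs[writeIndex] = timeUs;
    writeIndex = (writeIndex + 1) % timesUs.length; // wrap around
    size++;
    return true;
  }

  /** Downstream side: removes and returns the oldest sample time, or -1 if empty. */
  public long read() {
    if (size == 0) {
      return -1;
    }
    long timeUs = timesUs[readIndex];
    readIndex = (readIndex + 1) % timesUs.length; // wrap around
    size--;
    return timeUs;
  }

  public static void main(String[] args) {
    RingInfoQueue q = new RingInfoQueue(2);
    q.write(0);
    q.write(33_000);
    System.out.println(q.write(66_000)); // false: the ring is full
    System.out.println(q.read());        // 0
    System.out.println(q.write(66_000)); // true: reuses the slot just freed
  }
}
```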

Issue description:


Roughly the internal buffer of the playback pipeline is divided into two parts.
The first is within the sampleQueue.
The second is within the codec.

On my HTC One X9u, MediaCodec has 5 input buffers and 3 output buffers, so the HW codec (MediaCodec) can hold at most ~0.5 seconds of 4K frames.
Note that movies in general (e.g. MPEG DASH streams from YouTube) have a segment duration of ~5 seconds. This implies that the maximum interval between key frames may be over 5 seconds.

In fig 1, in the simplified internal buffer model at the bottom, each red rectangle represents a key frame.

The performance degradation in question is shown in fig 2.
Fig 2

It happens when the seek target (shown in blue) is within sampleQueue but NO key frame precedes it there. In this case we fall into the branch "if (sampleCountToKeyframe == -1)" and clear all the data we have fetched (including flushing the codec). This throws away a lot of data that may have been hard to download over a wireless network.

Note that movies such as those from YouTube are often encoded with segments of about (a little more than) 5 seconds. That is not short, so users may well hit the case in fig 2.

When this happens, a lot of data is wasted.
For example, if you have preloaded DEFAULT_MAX_BUFFER_MS = 30 seconds of media, or DEFAULT_VIDEO_BUFFER_SIZE = 12.5 MB of data, dropping all of it and refetching hurts mobile users who do not have an unlimited data plan.

It also adds a noticeable delay for the refetch. The following links show the comparison between the original and improved methods; the improvement is easy to see by comparing them.

Type                 | Link
After optimization   |
Before optimization  |

Fig 3 gives the flowchart of original method.
Fig 4 gives the flowchart of improved method.


Figure 3: original method


Figure 4: improved method

A few points about fig 4 need clarification:
1. We do NOT implement A in this proposal, since it only helps when the seek target is merely ~0.5-0.8 seconds ahead of the current position (depending on the device).
2. If check 1 returns TRUE but check 2 returns FALSE, we are in the case of fig 2.
3. At step B, we do NOT flush the codec. Instead we rely on latestResetPosition (= the seek target) to filter out the out-of-range (decode-only) samples at the renderer.
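Check 1, check 2 and step B of fig 4 can be sketched as a small decision function. This is a hypothetical illustration, not the actual patch. It models sampleQueue as arrays of sample times and key-frame flags, and the names ShortSeekDecision, Action and decide() are made up for the example. Step A is left out, as in the proposal.

```java
/** Hypothetical sketch of the short-seek decision in fig 4; not ExoPlayer API. */
public class ShortSeekDecision {

  enum Action {
    SEEK_TO_KEYFRAME_IN_QUEUE,    // check 1 and check 2 pass: original skip within sampleQueue
    KEEP_DATA_FILTER_AT_RENDERER, // step B: the fig 2 case, no flush
    FLUSH_AND_REFETCH             // target not buffered: original behavior
  }

  static Action decide(long[] timesUs, boolean[] keyframes, long targetUs) {
    int n = timesUs.length;
    // Check 1: is the seek target within the buffered samples?
    boolean inQueue = n > 0 && targetUs >= timesUs[0] && targetUs <= timesUs[n - 1];
    if (!inQueue) {
      return Action.FLUSH_AND_REFETCH;
    }
    // Check 2: is there a key frame at or before the target in the queue?
    for (int i = n - 1; i >= 0; i--) {
      if (timesUs[i] <= targetUs && keyframes[i]) {
        return Action.SEEK_TO_KEYFRAME_IN_QUEUE;
      }
    }
    // Step B: keep sampleQueue and the codec; set latestResetPosition = targetUs.
    return Action.KEEP_DATA_FILTER_AT_RENDERER;
  }

  public static void main(String[] args) {
    long[] times = {0, 33_000, 66_000, 100_000};
    boolean[] noKeyframe = {false, false, false, false}; // key frame already sent to the codec
    System.out.println(decide(times, noKeyframe, 66_000)); // KEEP_DATA_FILTER_AT_RENDERER
  }
}
```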

Proposed improvement:


To do better, we check whether:
1.     There is NO discontinuity in the downloaded data.
2.     The seek target is within sampleQueue.
3.     There is NO key frame before it in sampleQueue.

When all conditions are satisfied, we do NOT flush the downloaded data in sampleQueue and the codec as before. Instead, we use the seek time as the renderer's threshold, latestResetPosition, and filter out the samples before latestResetPosition (they are decode-only and need NOT be rendered).
The threshold latestResetPosition plays the same role as the segment start in GStreamer, where buffers outside the segment are clipped at the renderer (sink).
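Below is a hypothetical sketch of the renderer-side threshold. Every sample is still decoded, because later frames depend on earlier ones, but only frames at or after latestResetPosition are shown. The class DecodeOnlyFilter and its methods are illustrative and do not correspond to ExoPlayer's renderer API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical renderer-side filter: decode everything, render only from the threshold on. */
public class DecodeOnlyFilter {
  private long latestResetPositionUs;

  /** Called on a short seek in the fig 2 case: no flush, just move the threshold. */
  void onShortSeek(long seekTargetUs) {
    latestResetPositionUs = seekTargetUs;
  }

  /** True if a decoded frame should be shown; earlier frames were only needed as references. */
  boolean shouldRender(long presentationTimeUs) {
    return presentationTimeUs >= latestResetPositionUs;
  }

  public static void main(String[] args) {
    DecodeOnlyFilter filter = new DecodeOnlyFilter();
    filter.onShortSeek(100_000);
    List<Long> rendered = new ArrayList<>();
    for (long t : new long[] {33_000, 66_000, 100_000, 133_000}) {
      if (filter.shouldRender(t)) {
        rendered.add(t); // frames before the threshold are decoded but dropped
      }
    }
    System.out.println(rendered); // [100000, 133000]
  }
}
```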

In fact, filtering out-of-range samples in the renderer instead of in sampleQueue is more efficient. It helps us to
1. avoid unnecessary network traffic for refetching;
2. save power;
3. reduce the response time of a seek.

Saturday, December 10, 2016

ExoPlayer memo

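    // Load the extractor by name; if its class is not on the classpath, it is simply skipped.
    List<Class<? extends Extractor>> extractorClasses = new ArrayList<>();
    try {
      extractorClasses.add(
          Class.forName("com.google.android.exoplayer2.extractor.mkv.MatroskaExtractor")
              .asSubclass(Extractor.class));
    } catch (ClassNotFoundException e) {
      // Extractor not found.
    }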

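I. How do we get a Class object? There are three ways:
1. Call the getClass() method of Object. This is the most common way to obtain a Class object. For example:     MyObject x;     Class c1 = x.getClass();

2. Use the static forName() method of class Class to obtain the Class object that corresponds to a string. For example:      Class c2 = Class.forName("MyObject"); here "MyObject" must be the (fully qualified) name of a class or interface, and forName() throws ClassNotFoundException if no such type exists.

3. The third way is very simple. If T is any Java type, then T.class is the matching Class object. For example:     Class cl1 = Manager.class;     Class cl2 = int.class;     Class cl3 = Double[].class;     Note: a Class object actually describes a type, and that type is not necessarily a class or an interface. For example, int.class above is an object of type Class. For historical reasons, getName() on an array type returns an odd-looking name (Double[].class.getName() returns "[Ljava.lang.Double;").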
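The three ways can be checked with a short, self-contained program. Here java.lang.String stands in for MyObject and Manager, which are placeholder names. The class name ClassObjectDemo is illustrative.

```java
public class ClassObjectDemo {

  /** Returns true if all three ways yield the very same Class object for String. */
  static boolean allThreeAgree() {
    try {
      Class<?> c1 = "hello".getClass();                // 1. Object.getClass()
      Class<?> c2 = Class.forName("java.lang.String"); // 2. Class.forName(), fully qualified name
      Class<?> c3 = String.class;                      // 3. class literal T.class
      return c1 == c2 && c2 == c3;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(allThreeAgree());          // true
    System.out.println(int.class.getName());      // int (a Class object for a primitive type)
    System.out.println(Double[].class.getName()); // [Ljava.lang.Double;
  }
}
```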

asSubclass

public <U> Class<? extends U> asSubclass(Class<U> clazz)

Casts this Class object to represent a subclass of the class represented by the specified class object. Checks that the cast is valid, and throws a ClassCastException if it is not. If this method succeeds, it always returns a reference to this class object.

This method is useful when a client needs to "narrow" the type of a Class object to pass it to an API that restricts the Class objects that it is willing to accept. A cast would generate a compile-time warning, as the correctness of the cast could not be checked at runtime (because generic types are implemented by erasure).
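Here is a small sketch of the pattern used in the extractor snippet above, built from standard classes so it runs anywhere. java.lang.Thread stands in for an Extractor implementation and Runnable for the Extractor interface. The helper loadRunnableClass() is illustrative. asSubclass() fails fast with ClassCastException for a class of the wrong type, while an unchecked cast would only produce a compile-time warning.

```java
public class AsSubclassDemo {

  /**
   * Loads a class by name and narrows it to Class<? extends Runnable>.
   * Returns null if the class is missing or does not implement Runnable.
   */
  static Class<? extends Runnable> loadRunnableClass(String name) {
    try {
      return Class.forName(name).asSubclass(Runnable.class);
    } catch (ClassNotFoundException e) {
      return null; // no such class on the classpath
    } catch (ClassCastException e) {
      return null; // the class exists but is not a Runnable
    }
  }

  public static void main(String[] args) {
    System.out.println(loadRunnableClass("java.lang.Thread"));    // class java.lang.Thread
    System.out.println(loadRunnableClass("java.lang.String"));    // null
    System.out.println(loadRunnableClass("com.example.Missing")); // null
  }
}
```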