Changeset 291813 in webkit


Timestamp: Mar 24, 2022 1:50:52 PM
Author: Jean-Yves Avenard
Message:

Safari can't play video completely at bilibili.com
https://bugs.webkit.org/show_bug.cgi?id=236440
rdar://88761053

Reviewed by Jer Noble.

Source/WebCore:

Video frames were incorrectly evicted during a call to appendBuffer
as the Source Buffer incorrectly assumed a discontinuity was present.

When appending data to a source buffer, the MSE specification describes a
method to detect discontinuities in the Coded Frame Processing algorithm
(https://www.w3.org/TR/media-source/#sourcebuffer-coded-frame-processing),
step 6:
"

  • If last decode timestamp for track buffer is set and decode timestamp is less than last decode timestamp:

OR

  • If last decode timestamp for track buffer is set and the difference between decode timestamp and last decode timestamp is greater than 2 times last frame duration.

"
The issue is what defines the last frame duration: is it the duration of
the frame last seen in the coded frame processing loop, or of the frame
whose presentation timestamp is just before the one we are currently
processing?
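
As a rough illustration of the two quoted conditions, the check could be
sketched as follows. This is a simplified model, not WebKit code:
`TrackState`, `isDiscontinuity`, and the use of integer milliseconds are
assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified track buffer state; all times are in milliseconds.
struct TrackState {
    std::optional<std::int64_t> lastDecodeTimestamp;
    std::optional<std::int64_t> lastFrameDuration;
};

// Returns true when either of the two quoted conditions holds for a new frame.
bool isDiscontinuity(const TrackState& track, std::int64_t decodeTimestamp)
{
    // Both conditions require the last decode timestamp to be set.
    if (!track.lastDecodeTimestamp)
        return false;

    // First condition: the decode timestamp went backwards.
    if (decodeTimestamp < *track.lastDecodeTimestamp)
        return true;

    // Second condition: the gap exceeds twice the last frame duration.
    // Assumption of this sketch: with no recorded duration, no gap is reported.
    if (!track.lastFrameDuration)
        return false;
    assert(*track.lastFrameDuration >= 0);
    return decodeTimestamp - *track.lastDecodeTimestamp > 2 * *track.lastFrameDuration;
}
```

Whichever frame supplies `lastFrameDuration` directly sets the threshold,
which is why the ambiguity matters.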

H.264 and HEVC have a concept of B-frames: frames that depend on a future
frame in order to be decoded.
Their presence can be identified in the container by a frame whose
presentation timestamp is higher than that of the frame following it in
decode order.
They present a challenge because the frame prior to the current one in
presentation order may only be found several frames back in decode order.
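
To make the reordering concrete, the sketch below models a stream as a list
of presentation timestamps stored in decode order. The names
`hasReordering` and `decodeDistanceToPresentationPredecessor`, and the
sample values in the note that follows, are hypothetical and only
illustrate the idea; the second helper only looks at frames already seen in
decode order.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Presentation timestamps listed in decode order (hypothetical representation).
using PresentationTimes = std::vector<int>;

// Returns true if some frame is presented after the frame that follows it
// in decode order, which indicates reordered (B-frame style) content.
bool hasReordering(const PresentationTimes& pts)
{
    for (std::size_t i = 0; i + 1 < pts.size(); ++i) {
        if (pts[i] > pts[i + 1])
            return true;
    }
    return false;
}

// Among the frames decoded before `index`, finds the one presented most
// recently before pts[index] and returns how many frames back in decode
// order it sits. Returns nullopt if no earlier-decoded frame precedes it.
std::optional<std::size_t> decodeDistanceToPresentationPredecessor(const PresentationTimes& pts, std::size_t index)
{
    assert(index < pts.size());
    std::optional<std::size_t> best;
    for (std::size_t i = 0; i < index; ++i) {
        if (pts[i] >= pts[index])
            continue;
        if (!best || pts[i] > pts[*best])
            best = i;
    }
    if (!best)
        return std::nullopt;
    return index - *best;
}
```

For the decode-order sequence {0, 4, 2, 1, 3}, the frame with presentation
timestamp 1 (index 3) finds its presentation predecessor, timestamp 0,
three frames back in decode order.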
Bug 181891 attempted to fix a similar issue, and used the longest
"decode duration" as a workaround to detect discontinuities in the content.
It mentioned adopting the same technique as Mozilla's MSE implementation,
but Mozilla also skips discontinuity detection within a media segment
(https://www.w3.org/TR/media-source/#media-segment, which for fMP4 is a
single moof box), an approach that can't be achieved with CoreMedia's
AVStreamDataParser.
As mentioned in bug 181891, CoreMedia ignores the decode timestamps' delta
and adjusts the samples' durations so that there's no discontinuity in the
demuxed samples' presentation time, causing false positives in the gap
detection algorithm.

Bilibili uses HEVC content with an encoding that generates lots of
B-frames, with a very wide sliding window (up to 12 frames seen).
By using the longest frame duration found in either presentation or
decode duration as the threshold to identify a discontinuity, we can
properly parse the content and not incorrectly evict appended frames.
(As a side note, the use of HEVC with B-frames is peculiar, as not all
hardware supports it.)
It is difficult to determine here whether the issue lies within bilibili's
content or CoreMedia's output, though the responsibility more than
likely lies with bilibili.
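
The approach described above can be sketched in isolation as follows. This
is a simplified model rather than the WebKit implementation: `GroupState`,
`recordSample`, and `exceedsGapThreshold` are hypothetical names, times are
plain integer milliseconds, and updating the last decode timestamp inside
`recordSample` is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical per-track state for the current coded frame group (milliseconds).
struct GroupState {
    std::optional<std::int64_t> lastDecodeTimestamp;
    std::optional<std::int64_t> greatestFrameDuration;
};

// Gap check that uses the greatest duration seen so far as the threshold.
bool exceedsGapThreshold(const GroupState& state, std::int64_t decodeTimestamp)
{
    if (!state.lastDecodeTimestamp)
        return false;
    if (decodeTimestamp < *state.lastDecodeTimestamp)
        return true;
    return state.greatestFrameDuration.has_value()
        && decodeTimestamp - *state.lastDecodeTimestamp > 2 * *state.greatestFrameDuration;
}

// Records an accepted sample: the threshold grows to the larger of the decode
// delta and the sample's own (presentation) duration, and never shrinks.
void recordSample(GroupState& state, std::int64_t decodeTimestamp, std::int64_t frameDuration)
{
    assert(frameDuration >= 0);
    if (state.lastDecodeTimestamp) {
        std::int64_t decodeDelta = decodeTimestamp - *state.lastDecodeTimestamp;
        std::int64_t candidate = std::max(decodeDelta, frameDuration);
        state.greatestFrameDuration = state.greatestFrameDuration
            ? std::max(*state.greatestFrameDuration, candidate)
            : candidate;
    }
    state.lastDecodeTimestamp = decodeTimestamp;
}
```

With made-up numbers: after samples at decode times 0, 40 and 80 with
durations 40, 40 and 400, the threshold is 2 x 400, so a next decode time of
480 is not treated as a gap, whereas a threshold based on the 40 ms decode
delta alone would have flagged it.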

Test: media/media-source/media-mp4-hevc-bframes.html

  • platform/graphics/SourceBufferPrivate.cpp:

(WebCore::SourceBufferPrivate::TrackBuffer::TrackBuffer):
(WebCore::SourceBufferPrivate::resetTrackBuffers):
(WebCore::SourceBufferPrivate::didReceiveSample):

  • platform/graphics/SourceBufferPrivate.h:

LayoutTests:

  • media/media-source/content/test-bframes-hevc-manifest.json: Added.
  • media/media-source/content/test-bframes-hevc.mp4: Added.
  • media/media-source/media-mp4-hevc-bframes-expected.txt: Added.
  • media/media-source/media-mp4-hevc-bframes.html: Added.
Location: trunk
Files: 4 added, 5 edited

  • trunk/LayoutTests/ChangeLog

    r291807 r291813  
    +2022-03-24  Jean-Yves Avenard  <jya@apple.com>
    +
    +        Safari can't play video completely at bilibili.com
    +        https://bugs.webkit.org/show_bug.cgi?id=236440
    +        rdar://88761053
    +
    +        Reviewed by Jer Noble.
    +
    +        * media/media-source/content/test-bframes-hevc-manifest.json: Added.
    +        * media/media-source/content/test-bframes-hevc.mp4: Added.
    +        * media/media-source/media-mp4-hevc-bframes-expected.txt: Added.
    +        * media/media-source/media-mp4-hevc-bframes.html: Added.
    +
     2022-03-24  Matteo Flores  <matteo_flores@apple.com>
     
  • trunk/LayoutTests/platform/glib/TestExpectations

    r291759 r291813  
     
     webkit.org/b/218317 media/media-source/media-source-trackid-change.html [ Failure ]
    +webkit.org/b/238201 media/media-source/media-mp4-hevc-bframes.html [ Failure ]
     
     webkit.org/b/211995 fast/images/animated-image-mp4.html [ Failure Timeout ]
  • trunk/Source/WebCore/ChangeLog

    r291811 r291813  
    +2022-03-24  Jean-Yves Avenard  <jya@apple.com>
    +
    +        Safari can't play video completely at bilibili.com
    +        https://bugs.webkit.org/show_bug.cgi?id=236440
    +        rdar://88761053
    +
    +        Reviewed by Jer Noble.
    +
    +        Video frames were incorrectly evicted during a call to appendBuffer
    +        as the Source Buffer incorrectly assumed a discontinuity was present.
    +
    +        When appending data to a source buffer, the MSE specs describe a method
    +        to detect discontinuities in the Coded Frame Processing algorithm
    +        (https://www.w3.org/TR/media-source/#sourcebuffer-coded-frame-processing)
    +        step 6:
    +        "
    +        - If last decode timestamp for track buffer is set and decode timestamp
    +          is less than last decode timestamp:
    +        OR
    +        - If last decode timestamp for track buffer is set and the difference
    +          between decode timestamp and last decode timestamp is greater than
    +          2 times last frame duration.
    +        "
    +        The issue being what defines the last frame duration.
    +        Is it the frame last seen in the coded frame processing loop or the frame
    +        whose presentation timestamp is just before the one we are currently
    +        processing.
    +
    +        H264 and HEVC have a concept of b-frames: that is a frame that depends
    +        on a future frame to be decoded.
    +        Those frames are found in the container and can be identified by their
    +        presentation timestamp higher than the frame following in decode order.
    +        Those present a challenge as the frame prior the current one in
    +        presentation order, may actually only be found several frames back in
    +        decode order.
    +        Bug 181891 attempted to fix a similar issue, and used the longest
    +        "decode duration" as a workaround to detect discontinuity in the content.
    +        It mentioned adopting the same technique as in Mozilla's MSE
    +        implementation, but Mozilla also skip discontinuity detection within a
    +        media segment (https://www.w3.org/TR/media-source/#media-segment which for
    +        fMP4 is a single moof box) an approach that can't be achieved with
    +        CoreMedia's AVStreamDataParser.
    +        As mentioned in bug 181891, CoreMedia ignore the decode timestamps' delta
    +        and juggles with the sample's duration so that there's no discontinuity
    +        in the demuxed samples' presentation time, causing false positive in the
    +        gap detection algorithm.
    +
    +        Bilibili uses HEVC content, and uses an encoding that generate lots
    +        of b-frames, with a very wide sliding window (seen up to 12 frames).
    +        By using the longest frame duration found in either presentation or
    +        decode duration as threshold to identify a discontinuity, we can
    +        properly parse the content and not incorrectly evict appended frames.
    +        (As a side note, the use of HEVC with B-Frames is peculiar as not all
    +        hardware support it.)
    +        It is difficult to identify here if the issue is within the bilibili's
    +        content or CoreMedia's output, though the responsibility more than
    +        likely lies with bilibili.
    +
    +        Test: media/media-source/media-mp4-hevc-bframes.html
    +
    +        * platform/graphics/SourceBufferPrivate.cpp:
    +        (WebCore::SourceBufferPrivate::TrackBuffer::TrackBuffer):
    +        (WebCore::SourceBufferPrivate::resetTrackBuffers):
    +        (WebCore::SourceBufferPrivate::didReceiveSample):
    +        * platform/graphics/SourceBufferPrivate.h:
    +
     2022-03-24  Alan Bujtas  <zalan@apple.com>
     
  • trunk/Source/WebCore/platform/graphics/SourceBufferPrivate.cpp

    r291225 r291813  
     SourceBufferPrivate::TrackBuffer::TrackBuffer()
         : lastDecodeTimestamp(MediaTime::invalidTime())
    -    , greatestDecodeDuration(MediaTime::invalidTime())
    +    , greatestFrameDuration(MediaTime::invalidTime())
         , lastFrameDuration(MediaTime::invalidTime())
         , highestPresentationTimestamp(MediaTime::invalidTime())
    …
         for (auto& trackBufferPair : m_trackBufferMap.values()) {
             trackBufferPair.get().lastDecodeTimestamp = MediaTime::invalidTime();
    -        trackBufferPair.get().greatestDecodeDuration = MediaTime::invalidTime();
    +        trackBufferPair.get().greatestFrameDuration = MediaTime::invalidTime();
             trackBufferPair.get().lastFrameDuration = MediaTime::invalidTime();
             trackBufferPair.get().highestPresentationTimestamp = MediaTime::invalidTime();
    …
             // ↳ If last decode timestamp for track buffer is set and the difference between decode timestamp and
             // last decode timestamp is greater than 2 times last frame duration:
    -        MediaTime decodeDurationToCheck = trackBuffer.greatestDecodeDuration;
    -
    -        if (decodeDurationToCheck.isValid() && trackBuffer.lastFrameDuration.isValid()
    -            && (trackBuffer.lastFrameDuration > decodeDurationToCheck))
    -            decodeDurationToCheck = trackBuffer.lastFrameDuration;
    -
             if (trackBuffer.lastDecodeTimestamp.isValid() && (decodeTimestamp < trackBuffer.lastDecodeTimestamp
    -            || (decodeDurationToCheck.isValid() && abs(decodeTimestamp - trackBuffer.lastDecodeTimestamp) > (decodeDurationToCheck * 2)))) {
    +            || (trackBuffer.greatestFrameDuration.isValid() && decodeTimestamp - trackBuffer.lastDecodeTimestamp > (trackBuffer.greatestFrameDuration * 2)))) {
     
                 // 1.6.1:
    …
                     trackBuffer.get().lastDecodeTimestamp = MediaTime::invalidTime();
                     // 1.6.3 Unset the last frame duration on all track buffers.
    -                trackBuffer.get().greatestDecodeDuration = MediaTime::invalidTime();
    +                trackBuffer.get().greatestFrameDuration = MediaTime::invalidTime();
                     trackBuffer.get().lastFrameDuration = MediaTime::invalidTime();
                     // 1.6.4 Unset the highest presentation timestamp on all track buffers.
    …
             }
     
    -        // NOTE: the spec considers "Coded Frame Duration" to be the presentation duration, but this is not necessarily equal
    -        // to the decoded duration. When comparing deltas between decode timestamps, the decode duration, not the presentation.
    +        // NOTE: the spec considers the need to check the last frame duration but doesn't specify if that last frame
    +        // is the one prior in presentation or decode order.
    +        // So instead, as a workaround we use the largest frame duration seen in the current coded frame group (as defined in https://www.w3.org/TR/media-source/#coded-frame-group.
             if (trackBuffer.lastDecodeTimestamp.isValid()) {
                 MediaTime lastDecodeDuration = decodeTimestamp - trackBuffer.lastDecodeTimestamp;
    -            if (!trackBuffer.greatestDecodeDuration.isValid() || lastDecodeDuration > trackBuffer.greatestDecodeDuration)
    -                trackBuffer.greatestDecodeDuration = lastDecodeDuration;
    +            if (!trackBuffer.greatestFrameDuration.isValid())
    +                trackBuffer.greatestFrameDuration = std::max(lastDecodeDuration, frameDuration);
    +            else
    +                trackBuffer.greatestFrameDuration = std::max({ trackBuffer.greatestFrameDuration, frameDuration, lastDecodeDuration });
             }
     
  • trunk/Source/WebCore/platform/graphics/SourceBufferPrivate.h

    r291111 r291813  
             WTF_MAKE_STRUCT_FAST_ALLOCATED;
             MediaTime lastDecodeTimestamp;
    -        MediaTime greatestDecodeDuration;
    +        MediaTime greatestFrameDuration;
             MediaTime lastFrameDuration;
             MediaTime highestPresentationTimestamp;