Changeset 177000 in webkit


Timestamp:
Dec 8, 2014 5:31:37 PM (9 years ago)
Author:
dino@apple.com
Message:

[Apple] Use Accelerate framework to speed-up FEGaussianBlur
https://bugs.webkit.org/show_bug.cgi?id=139310
<rdar://problem/18434594>

PerformanceTests:

Reviewed by Simon Fraser.

Add an interactive performance test that measures the speed of a set
of blur operations on generated images.

  • Interactive/blur-filter-timing.html: Added.

Source/WebCore:

<rdar://problem/18434594>

Reviewed by Simon Fraser.

Using Apple's Accelerate framework provides faster blurs
than the parallel jobs approach, especially since r168577
which started performing retina-accurate filters.

Using Accelerate.framework to replace the existing box blur (what
we use to approximate Gaussian blurs) gets about a 20% speedup on
desktop-class machines, and a 2x to 6x speedup on iOS hardware.
The exact gain depends on the size of the content being blurred,
but it is an improvement in both cases.

The change is to intercept the platformApply function on
FEGaussianBlur and send it off to Accelerate.

There is an interactive performance test: PerformanceTests/Interactive/blur-filter-timing.html

  • platform/graphics/filters/FEGaussianBlur.cpp:

(WebCore::kernelPosition): Move this to a file static function from the .h.
(WebCore::accelerateBoxBlur): The Accelerate implementation.
(WebCore::standardBoxBlur): The default generic/standard implementation.
(WebCore::FEGaussianBlur::platformApplyGeneric): Use accelerate or the default form.
(WebCore::FEGaussianBlur::platformApply): Don't try the parallelJobs approach if Accelerate is available.

  • platform/graphics/filters/FEGaussianBlur.h:

(WebCore::FEGaussianBlur::kernelPosition): Deleted. Move into the .cpp.

Source/WTF:

<rdar://problem/18434594>

Reviewed by Simon Fraser.

Add a HAVE_ACCELERATE flag, true on Apple platforms.

  • wtf/Platform.h:
Location:
trunk
Files:
1 added
6 edited

  • trunk/PerformanceTests/ChangeLog

    r176077 r177000  
    @@ -1,2 +1,14 @@
    +2014-12-08  Dean Jackson  <dino@apple.com>
    +
    +        [Apple] Use Accelerate framework to speed-up FEGaussianBlur
    +        https://bugs.webkit.org/show_bug.cgi?id=139310
    +
    +        Reviewed by Simon Fraser.
    +
    +        Add an interactive performance test that measures the speed of a set
    +        of blur operations on a generated images.
    +
    +        * Interactive/blur-filter-timing.html: Added.
    +
     2014-11-13  Zalan Bujtas  <zalan@apple.com>
     
  • trunk/Source/WTF/ChangeLog

    r176982 r177000  
    @@ -1,2 +1,14 @@
    +2014-12-08  Dean Jackson  <dino@apple.com>
    +
    +        [Apple] Use Accelerate framework to speed-up FEGaussianBlur
    +        https://bugs.webkit.org/show_bug.cgi?id=139310
    +        <rdar://problem/18434594>
    +
    +        Reviewed by Simon Fraser.
    +
    +        Add a HAVE_ACCELERATE flag, true on Apple platforms.
    +
    +        * wtf/Platform.h:
    +
     2014-12-08  Myles C. Maxfield  <mmaxfield@apple.com>
     
  • trunk/Source/WTF/wtf/Platform.h

    r176031 r177000  
    @@ -1093,3 +1093,7 @@
     #endif
     
    +#if PLATFORM(COCOA)
    +#define HAVE_ACCELERATE 1
    +#endif
    +
     #endif /* WTF_Platform_h */
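
The flag added to wtf/Platform.h is a compile-time switch, so code that depends on it is selected by the preprocessor rather than at run time. A minimal sketch of that pattern follows. It uses a plain `#if HAVE_ACCELERATE` test instead of WebKit's `HAVE(ACCELERATE)` macro form, defaults the flag to 0 so it compiles on any platform, and `compiledBlurPath` is a hypothetical helper that does not exist in WebKit.

```cpp
#include <string>

// Assumption: a platform header (wtf/Platform.h in the changeset) would set
// this to 1 on Cocoa platforms. It defaults to 0 here so the sketch compiles
// anywhere.
#ifndef HAVE_ACCELERATE
#define HAVE_ACCELERATE 0
#endif

// Hypothetical helper: reports which blur implementation was compiled in.
std::string compiledBlurPath()
{
#if HAVE_ACCELERATE
    return "accelerate";
#else
    return "standard";
#endif
}
```

Because the choice is made by the preprocessor, the path that is not selected never reaches the compiler, which is why the `<Accelerate/Accelerate.h>` include in FEGaussianBlur.cpp sits behind the same guard.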
  • trunk/Source/WebCore/ChangeLog

    r176999 r177000  
    @@ -1,2 +1,34 @@
    +2014-12-08  Dean Jackson  <dino@apple.com>
    +
    +        [Apple] Use Accelerate framework to speed-up FEGaussianBlur
    +        https://bugs.webkit.org/show_bug.cgi?id=139310
    +        <rdar://problem/18434594>
    +
    +        Reviewed by Simon Fraser.
    +
    +        Using Apple's Accelerate framework provides faster blurs
    +        than the parallel jobs approach, especially since r168577
    +        which started performing retina-accurate filters.
    +
    +        Using Accelerate.framework to replace the existing box blur (what
    +        we use to approximate Gaussian blurs) gets about a 20% speedup on
    +        desktop class machines, but between a 2x-6x speedup on iOS hardware.
    +        Obviously this depends on the size of the content being blurred,
    +        but it is still good.
    +
    +        The change is to intercept the platformApply function on
    +        FEGaussianBlur and send it off to Accelerate.
    +
    +        There is an interactive performance test: PerformanceTests/Interactive/blur-filter-timing.html
    +
    +        * platform/graphics/filters/FEGaussianBlur.cpp:
    +        (WebCore::kernelPosition): Move this to a file static function from the .h.
    +        (WebCore::accelerateBoxBlur): The Accelerate implementation.
    +        (WebCore::standardBoxBlur): The default generic/standard implementation.
    +        (WebCore::FEGaussianBlur::platformApplyGeneric): Use accelerate or the default form.
    +        (WebCore::FEGaussianBlur::platformApply): Don't try the parallelJobs approach if Accelerate is available.
    +        * platform/graphics/filters/FEGaussianBlur.h:
    +        (WebCore::FEGaussianBlur::kernelPosition): Deleted. Move into the .cpp.
    +
     2014-12-08  Beth Dakin  <bdakin@apple.com>
     
  • trunk/Source/WebCore/platform/graphics/filters/FEGaussianBlur.cpp

    r173397 r177000  
    @@ -31,4 +31,8 @@
     #include "TextStream.h"
     
    +#if HAVE(ACCELERATE)
    +#include <Accelerate/Accelerate.h>
    +#endif
    +
     #include <runtime/JSCInlines.h>
     #include <runtime/TypedArrayInlines.h>
     
    @@ -45,4 +49,32 @@
     
     namespace WebCore {
    +
    +inline void kernelPosition(int blurIteration, unsigned& radius, int& deltaLeft, int& deltaRight)
    +{
    +    // Check http://www.w3.org/TR/SVG/filters.html#feGaussianBlurElement for details.
    +    switch (blurIteration) {
    +    case 0:
    +        if (!(radius % 2)) {
    +            deltaLeft = radius / 2 - 1;
    +            deltaRight = radius - deltaLeft;
    +        } else {
    +            deltaLeft = radius / 2;
    +            deltaRight = radius - deltaLeft;
    +        }
    +        break;
    +    case 1:
    +        if (!(radius % 2)) {
    +            deltaLeft++;
    +            deltaRight--;
    +        }
    +        break;
    +    case 2:
    +        if (!(radius % 2)) {
    +            deltaRight++;
    +            radius++;
    +        }
    +        break;
    +    }
    +}
     
     FEGaussianBlur::FEGaussianBlur(Filter* filter, float x, float y, EdgeModeType edgeMode)
     
    @@ -263,13 +295,49 @@
     }
     
    -inline void FEGaussianBlur::platformApplyGeneric(Uint8ClampedArray* srcPixelArray, Uint8ClampedArray* tmpPixelArray, unsigned kernelSizeX, unsigned kernelSizeY, IntSize& paintSize)
    -{
    -    int stride = 4 * paintSize.width();
    +#if HAVE(ACCELERATE)
    +inline void accelerateBoxBlur(const Uint8ClampedArray* src, Uint8ClampedArray* dst, unsigned kernelSize, int stride, int effectWidth, int effectHeight)
    +{
    +    // We must always use an odd radius.
    +    if (kernelSize % 2 != 1)
    +        kernelSize += 1;
    +
    +    vImage_Buffer effectInBuffer;
    +    effectInBuffer.data = src->data();
    +    effectInBuffer.width = effectWidth;
    +    effectInBuffer.height = effectHeight;
    +    effectInBuffer.rowBytes = stride;
    +
    +    vImage_Buffer effectOutBuffer;
    +    effectOutBuffer.data = dst->data();
    +    effectOutBuffer.width = effectWidth;
    +    effectOutBuffer.height = effectHeight;
    +    effectOutBuffer.rowBytes = stride;
    +
    +    // Determine the size of a temporary buffer by calling the function first with a special flag. vImage will return
    +    // the size needed, or an error (which are all negative).
    +    size_t tmpBufferSize = vImageBoxConvolve_ARGB8888(&effectInBuffer, &effectOutBuffer, 0, 0, 0, kernelSize, kernelSize, 0, kvImageEdgeExtend | kvImageGetTempBufferSize);
    +    if (tmpBufferSize <= 0)
    +        return;
    +
    +    void* tmpBuffer = fastMalloc(tmpBufferSize);
    +    vImageBoxConvolve_ARGB8888(&effectInBuffer, &effectOutBuffer, tmpBuffer, 0, 0, kernelSize, kernelSize, 0, kvImageEdgeExtend);
    +    vImageBoxConvolve_ARGB8888(&effectOutBuffer, &effectInBuffer, tmpBuffer, 0, 0, kernelSize, kernelSize, 0, kvImageEdgeExtend);
    +    vImageBoxConvolve_ARGB8888(&effectInBuffer, &effectOutBuffer, tmpBuffer, 0, 0, kernelSize, kernelSize, 0, kvImageEdgeExtend);
    +    WTF::fastFree(tmpBuffer);
    +
    +    // The final result should be stored in src.
    +    if (dst == src) {
    +        ASSERT(src->length() == dst->length());
    +        memcpy(dst->data(), src->data(), src->length());
    +    }
    +}
    +#endif
    +
    +inline void standardBoxBlur(Uint8ClampedArray* src, Uint8ClampedArray* dst, unsigned kernelSizeX, unsigned kernelSizeY, int stride, IntSize& paintSize, bool isAlphaImage, EdgeModeType edgeMode)
    +{
         int dxLeft = 0;
         int dxRight = 0;
         int dyLeft = 0;
         int dyRight = 0;
    -    Uint8ClampedArray* src = srcPixelArray;
    -    Uint8ClampedArray* dst = tmpPixelArray;
     
         for (int i = 0; i < 3; ++i) {
     
    @@ -280,7 +348,7 @@
                     boxBlurNEON(src, dst, kernelSizeX, dxLeft, dxRight, 4, stride, paintSize.width(), paintSize.height());
                 else
    -                boxBlur(src, dst, kernelSizeX, dxLeft, dxRight, 4, stride, paintSize.width(), paintSize.height(), true, m_edgeMode);
    +                boxBlur(src, dst, kernelSizeX, dxLeft, dxRight, 4, stride, paintSize.width(), paintSize.height(), true, edgeMode);
     #else
    -            boxBlur(src, dst, kernelSizeX, dxLeft, dxRight, 4, stride, paintSize.width(), paintSize.height(), isAlphaImage(), m_edgeMode);
    +            boxBlur(src, dst, kernelSizeX, dxLeft, dxRight, 4, stride, paintSize.width(), paintSize.height(), isAlphaImage, edgeMode);
     #endif
                 std::swap(src, dst);
     
    @@ -293,7 +361,7 @@
                     boxBlurNEON(src, dst, kernelSizeY, dyLeft, dyRight, stride, 4, paintSize.height(), paintSize.width());
                 else
    -                boxBlur(src, dst, kernelSizeY, dyLeft, dyRight, stride, 4, paintSize.height(), paintSize.width(), true, m_edgeMode);
    +                boxBlur(src, dst, kernelSizeY, dyLeft, dyRight, stride, 4, paintSize.height(), paintSize.width(), true, edgeMode);
     #else
    -            boxBlur(src, dst, kernelSizeY, dyLeft, dyRight, stride, 4, paintSize.height(), paintSize.width(), isAlphaImage(), m_edgeMode);
    +            boxBlur(src, dst, kernelSizeY, dyLeft, dyRight, stride, 4, paintSize.height(), paintSize.width(), isAlphaImage, edgeMode);
     #endif
                 std::swap(src, dst);
     
    @@ -301,10 +369,23 @@
         }
     
    -    // The final result should be stored in srcPixelArray.
    -    if (dst == srcPixelArray) {
    +    // The final result should be stored in src.
    +    if (dst == src) {
             ASSERT(src->length() == dst->length());
             memcpy(dst->data(), src->data(), src->length());
         }
    -
    +}
    +
    +inline void FEGaussianBlur::platformApplyGeneric(Uint8ClampedArray* srcPixelArray, Uint8ClampedArray* tmpPixelArray, unsigned kernelSizeX, unsigned kernelSizeY, IntSize& paintSize)
    +{
    +    int stride = 4 * paintSize.width();
    +
    +#if HAVE(ACCELERATE)
    +    if (kernelSizeX == kernelSizeY && (m_edgeMode == EDGEMODE_NONE || m_edgeMode == EDGEMODE_DUPLICATE)) {
    +        accelerateBoxBlur(srcPixelArray, tmpPixelArray, kernelSizeX, stride, paintSize.width(), paintSize.height());
    +        return;
    +    }
    +#endif
    +
    +    standardBoxBlur(srcPixelArray, tmpPixelArray, kernelSizeX, kernelSizeY, stride, paintSize, isAlphaImage(), m_edgeMode);
     }
     
     
    @@ -318,4 +399,5 @@
     inline void FEGaussianBlur::platformApply(Uint8ClampedArray* srcPixelArray, Uint8ClampedArray* tmpPixelArray, unsigned kernelSizeX, unsigned kernelSizeY, IntSize& paintSize)
     {
    +#if !HAVE(ACCELERATE)
         int scanline = 4 * paintSize.width();
         int extraHeight = 3 * kernelSizeY * 0.5f;
     
    @@ -379,4 +461,5 @@
             // Fallback to single threaded mode.
         }
    +#endif
     
         // The selection here eventually should happen dynamically on some platforms.
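
The run-time side of the dispatch in platformApplyGeneric comes down to two small rules: the Accelerate path is taken only when both kernel sizes match and the edge mode is none or duplicate, and accelerateBoxBlur rounds an even kernel size up to the next odd value. The sketch below restates those two rules in portable C++ so they can be checked in isolation. `EdgeMode`, `canUseAcceleratePath` and `oddKernelSize` are hypothetical names introduced for illustration, not WebCore identifiers.

```cpp
// Hypothetical stand-in for WebCore's EdgeModeType; only the meaning of the
// values mirrors the changeset.
enum class EdgeMode { Unknown, Duplicate, Wrap, None };

// Mirrors the condition that guards the accelerateBoxBlur() call in
// platformApplyGeneric(): equal kernel sizes, and edge mode none or duplicate.
bool canUseAcceleratePath(unsigned kernelSizeX, unsigned kernelSizeY, EdgeMode edgeMode)
{
    return kernelSizeX == kernelSizeY
        && (edgeMode == EdgeMode::None || edgeMode == EdgeMode::Duplicate);
}

// Mirrors the "always use an odd radius" adjustment at the top of
// accelerateBoxBlur(): even sizes are bumped up by one, odd sizes are kept.
unsigned oddKernelSize(unsigned kernelSize)
{
    return kernelSize % 2 ? kernelSize : kernelSize + 1;
}
```

For example, `canUseAcceleratePath(4, 6, EdgeMode::None)` is false, so a blur with different horizontal and vertical kernel sizes would still go through standardBoxBlur.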
  • trunk/Source/WebCore/platform/graphics/filters/FEGaussianBlur.h

    r173397 r177000  
    @@ -71,5 +71,4 @@
         FEGaussianBlur(Filter*, float, float, EdgeModeType);
     
    -    static inline void kernelPosition(int boxBlur, unsigned& std, int& dLeft, int& dRight);
         inline void platformApply(Uint8ClampedArray* srcPixelArray, Uint8ClampedArray* tmpPixelArray, unsigned kernelSizeX, unsigned kernelSizeY, IntSize& paintSize);
     
     
    @@ -81,32 +80,4 @@
     };
     
    -inline void FEGaussianBlur::kernelPosition(int boxBlur, unsigned& std, int& dLeft, int& dRight)
    -{
    -    // check http://www.w3.org/TR/SVG/filters.html#feGaussianBlurElement for details
    -    switch (boxBlur) {
    -    case 0:
    -        if (!(std % 2)) {
    -            dLeft = std / 2 - 1;
    -            dRight = std - dLeft;
    -        } else {
    -            dLeft = std / 2;
    -            dRight = std - dLeft;
    -        }
    -        break;
    -    case 1:
    -        if (!(std % 2)) {
    -            dLeft++;
    -            dRight--;
    -        }
    -        break;
    -    case 2:
    -        if (!(std % 2)) {
    -            dRight++;
    -            std++;
    -        }
    -        break;
    -    }
    -}
    -
     } // namespace WebCore
     
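
To make the role of kernelPosition concrete, here is a minimal one-dimensional sketch of the three-pass box blur that the changeset describes as its approximation of a Gaussian blur. The `kernelPosition` body is taken from the diff; `boxPass` and `threePassBoxBlur` are hypothetical helpers. The window bounds `[i - deltaLeft, i + deltaRight)` and the edge-duplicating behaviour are assumptions of this sketch, not a transcription of WebCore's boxBlur.

```cpp
#include <algorithm>
#include <vector>

// Same logic as the changeset's kernelPosition(): chooses how far the box
// extends to the left and right of the current sample on each pass.
static void kernelPosition(int blurIteration, unsigned& radius, int& deltaLeft, int& deltaRight)
{
    switch (blurIteration) {
    case 0:
        if (!(radius % 2)) {
            deltaLeft = radius / 2 - 1;
            deltaRight = radius - deltaLeft;
        } else {
            deltaLeft = radius / 2;
            deltaRight = radius - deltaLeft;
        }
        break;
    case 1:
        if (!(radius % 2)) {
            deltaLeft++;
            deltaRight--;
        }
        break;
    case 2:
        if (!(radius % 2)) {
            deltaRight++;
            radius++;
        }
        break;
    }
}

// Hypothetical helper: one box pass over a 1-D signal. Assumes the window is
// [i - deltaLeft, i + deltaRight) and that out-of-range samples repeat the
// nearest edge value.
static std::vector<double> boxPass(const std::vector<double>& in, int deltaLeft, int deltaRight)
{
    const int count = static_cast<int>(in.size());
    std::vector<double> out(in.size());
    for (int i = 0; i < count; ++i) {
        double sum = 0;
        for (int k = i - deltaLeft; k < i + deltaRight; ++k)
            sum += in[std::max(0, std::min(k, count - 1))];
        out[i] = sum / (deltaLeft + deltaRight);
    }
    return out;
}

// Hypothetical driver: three successive box passes, with the offsets for each
// pass supplied by kernelPosition().
std::vector<double> threePassBoxBlur(std::vector<double> signal, unsigned kernelSize)
{
    if (!kernelSize || signal.empty())
        return signal;
    int deltaLeft = 0;
    int deltaRight = 0;
    for (int pass = 0; pass < 3; ++pass) {
        kernelPosition(pass, kernelSize, deltaLeft, deltaRight);
        signal = boxPass(signal, deltaLeft, deltaRight);
    }
    return signal;
}
```

With an odd kernel size the same centred box is applied three times. With an even size the first two passes are offset in opposite directions and the third is one sample wider, which keeps the combined result centred on the original sample.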