Changeset 206539 in webkit


Timestamp:
Sep 28, 2016, 1:30:44 PM (9 years ago)
Author:
fpizlo@apple.com
Message:

Optimize B3->Air lowering of Fence on ARM
https://bugs.webkit.org/show_bug.cgi?id=162342

Reviewed by Geoffrey Garen.

This gives us comprehensive support for standalone fences on x86 and ARM. The changes are as
follows:

  • Sets in stone the rule that the heaps of a B3::Fence tell you what the fence protects. If the fence reads, it protects motion of stores. If the fence writes, it protects motion of loads. This allows us to express for example load-load fences in a portable way: on x86 they will just block B3 optimizations and emit no code, while on ARM you will get some fence.
  • Adds comprehensive support for WTF-style fences in the ARM assembler. I simplified it just a bit to match what B3, the main client, knows. There are three fences: MemoryFence, StoreFence, and LoadFence. On x86, MemoryFence is ortop (lock or $0x0, (%rsp)) while StoreFence and LoadFence emit no code. On ARM64, MemoryFence and LoadFence are dmb ish while StoreFence is dmb ishst.
  • Tests! To test this, I needed to teach the disassembler how to disassemble dmb ish and dmb ishst. I think that the canonical way to do it would be to create a group for dmb and then teach that group how to decode the operands. But I don't actually know all of the ways of encoding dmb, so I'd rather that unrecognized encodings fall through to the ".long blah" bailout. So, this creates explicit matching rules for "dmb ish" and "dmb ishst", which is the most conservative thing we can do.
  • assembler/ARM64Assembler.h:

(JSC::ARM64Assembler::dmbISH):
(JSC::ARM64Assembler::dmbISHST):
(JSC::ARM64Assembler::dmbSY): Deleted.

  • assembler/MacroAssemblerARM64.h:

(JSC::MacroAssemblerARM64::memoryFence):
(JSC::MacroAssemblerARM64::storeFence):
(JSC::MacroAssemblerARM64::loadFence):

  • assembler/MacroAssemblerX86Common.h:

(JSC::MacroAssemblerX86Common::storeFence):
(JSC::MacroAssemblerX86Common::loadFence):

  • b3/B3FenceValue.h:
  • b3/B3LowerToAir.cpp:

(JSC::B3::Air::LowerToAir::lower):

  • b3/air/AirOpcode.opcodes:
  • b3/testb3.cpp:

(JSC::B3::testMemoryFence):
(JSC::B3::testStoreFence):
(JSC::B3::testLoadFence):
(JSC::B3::run):
(JSC::B3::testX86MFence): Deleted.
(JSC::B3::testX86CompilerFence): Deleted.

  • disassembler/ARM64/A64DOpcode.cpp:

(JSC::ARM64Disassembler::A64DOpcodeDmbIsh::format):
(JSC::ARM64Disassembler::A64DOpcodeDmbIshSt::format):

  • disassembler/ARM64/A64DOpcode.h:

(JSC::ARM64Disassembler::A64DOpcodeDmbIsh::opName):
(JSC::ARM64Disassembler::A64DOpcodeDmbIshSt::opName):
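
The heap rule and the per-architecture mapping described in the message can be sketched as a small standalone function. This is an illustrative sketch only: FenceKind, Arch, selectFence and instructionFor are hypothetical names, not the B3 or Air types, and the instruction strings are taken from the message and tests in this changeset.

```cpp
#include <cassert>
#include <string>

// Hypothetical names for illustration; these are not the B3 or Air types.
enum class FenceKind { None, StoreFence, LoadFence, MemoryFence };
enum class Arch { X86_64, ARM64 };

// A fence that only reads protects motion of stores, a fence that only
// writes protects motion of loads, and a fence that does both is a full fence.
constexpr FenceKind selectFence(bool reads, bool writes)
{
    if (!reads && !writes)
        return FenceKind::None;
    if (!writes)
        return FenceKind::StoreFence;
    if (!reads)
        return FenceKind::LoadFence;
    return FenceKind::MemoryFence;
}

// Instruction text per architecture as described in the message.
// An empty string means that no code is emitted.
inline std::string instructionFor(FenceKind kind, Arch arch)
{
    switch (kind) {
    case FenceKind::None:
        return "";
    case FenceKind::StoreFence:
        return arch == Arch::ARM64 ? "dmb ishst" : "";
    case FenceKind::LoadFence:
        return arch == Arch::ARM64 ? "dmb ish" : "";
    case FenceKind::MemoryFence:
        return arch == Arch::ARM64 ? "dmb ish" : "lock or $0x0, (%rsp)";
    }
    assert(false && "unhandled FenceKind");
    return "";
}
```

For example, selectFence(true, false) yields FenceKind::StoreFence, which maps to "dmb ishst" on ARM64 and to no instruction on x86.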

Location:
trunk/Source/JavaScriptCore
Files:
10 edited

  • trunk/Source/JavaScriptCore/ChangeLog

    r206533 r206539  
    @@ -1,2 +1,58 @@
    +2016-09-28  Filip Pizlo  <fpizlo@apple.com>
    +
    +        Optimize B3->Air lowering of Fence on ARM
    +        https://bugs.webkit.org/show_bug.cgi?id=162342
    +
    +        Reviewed by Geoffrey Garen.
    +
    +        This gives us comprehensive support for standalone fences on x86 and ARM. The changes are as
    +        follows:
    +
    +        - Sets in stone the rule that the heaps of a B3::Fence tell you what the fence protects. If the
    +          fence reads, it protects motion of stores. If the fence writes, it protects motion of loads.
    +          This allows us to express for example load-load fences in a portable way: on x86 they will just
    +          block B3 optimizations and emit no code, while on ARM you will get some fence.
    +
    +        - Adds comprehensive support for WTF-style fences in the ARM assembler. I simplified it just a bit
    +          to match what B3, the main client, knows. There are three fences: MemoryFence, StoreFence, and
    +          LoadFence. On x86, MemoryFence is ortop while StoreFence and LoadFence emit no code. On ARM64,
    +          MemoryFence and LoadFence are dmb ish while StoreFence is dmb ishst.
    +
    +        - Tests! To test this, I needed to teach the disassembler how to disassemble dmb ish and dmb
    +          ishst. I think that the canonical way to do it would be to create a group for dmb and then teach
    +          that group how to decode the operands. But I don't actually know what are all of the ways of
    +          encoding dmb, so I'd rather that unrecognized encodings fall through to the ".long blah"
    +          bailout. So, this creates explicit matching rules for "dmb ish" and "dmb ishst", which is the
    +          most conservative thing we can do.
    +
    +        * assembler/ARM64Assembler.h:
    +        (JSC::ARM64Assembler::dmbISH):
    +        (JSC::ARM64Assembler::dmbISHST):
    +        (JSC::ARM64Assembler::dmbSY): Deleted.
    +        * assembler/MacroAssemblerARM64.h:
    +        (JSC::MacroAssemblerARM64::memoryFence):
    +        (JSC::MacroAssemblerARM64::storeFence):
    +        (JSC::MacroAssemblerARM64::loadFence):
    +        * assembler/MacroAssemblerX86Common.h:
    +        (JSC::MacroAssemblerX86Common::storeFence):
    +        (JSC::MacroAssemblerX86Common::loadFence):
    +        * b3/B3FenceValue.h:
    +        * b3/B3LowerToAir.cpp:
    +        (JSC::B3::Air::LowerToAir::lower):
    +        * b3/air/AirOpcode.opcodes:
    +        * b3/testb3.cpp:
    +        (JSC::B3::testMemoryFence):
    +        (JSC::B3::testStoreFence):
    +        (JSC::B3::testLoadFence):
    +        (JSC::B3::run):
    +        (JSC::B3::testX86MFence): Deleted.
    +        (JSC::B3::testX86CompilerFence): Deleted.
    +        * disassembler/ARM64/A64DOpcode.cpp:
    +        (JSC::ARM64Disassembler::A64DOpcodeDmbIsh::format):
    +        (JSC::ARM64Disassembler::A64DOpcodeDmbIshSt::format):
    +        * disassembler/ARM64/A64DOpcode.h:
    +        (JSC::ARM64Disassembler::A64DOpcodeDmbIsh::opName):
    +        (JSC::ARM64Disassembler::A64DOpcodeDmbIshSt::opName):
    +
     2016-09-28  Joseph Pecoraro  <pecoraro@apple.com>
     
  • trunk/Source/JavaScriptCore/assembler/ARM64Assembler.h

    r206525 r206539  
    @@ -1497,7 +1497,12 @@
         }
     
    -    ALWAYS_INLINE void dmbSY()
    -    {
    -        insn(0xd5033fbf);
    +    ALWAYS_INLINE void dmbISH()
    +    {
    +        insn(0xd5033bbf);
    +    }
    +
    +    ALWAYS_INLINE void dmbISHST()
    +    {
    +        insn(0xd5033abf);
         }
     
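
The three instruction words in this diff differ only in bits 11:8, which is the CRm field that carries the barrier option of an A64 DMB instruction. The sketch below shows how the words are composed; BarrierOption, dmbBase, encodeDMB and barrierOptionOf are hypothetical names chosen for illustration and are not part of ARM64Assembler.

```cpp
#include <cassert>
#include <cstdint>

// Barrier option values carried in the CRm field (bits 11:8) of a DMB instruction.
enum class BarrierOption : uint32_t {
    ISHST = 0xa, // inner shareable, orders stores against stores
    ISH = 0xb,   // inner shareable, orders loads and stores
    SY = 0xf,    // full system
};

// DMB instruction word with the CRm field cleared (hypothetical constant name).
constexpr uint32_t dmbBase = 0xd50330bf;

// Compose a DMB instruction word from a barrier option.
constexpr uint32_t encodeDMB(BarrierOption option)
{
    return dmbBase | (static_cast<uint32_t>(option) << 8);
}

// Extract the barrier option from a word that is known to be a DMB.
inline BarrierOption barrierOptionOf(uint32_t word)
{
    assert((word & ~0xf00u) == dmbBase);
    return static_cast<BarrierOption>((word >> 8) & 0xf);
}

// The words used by the removed dmbSY() and the added dmbISH() and dmbISHST().
static_assert(encodeDMB(BarrierOption::SY) == 0xd5033fbf, "dmb sy");
static_assert(encodeDMB(BarrierOption::ISH) == 0xd5033bbf, "dmb ish");
static_assert(encodeDMB(BarrierOption::ISHST) == 0xd5033abf, "dmb ishst");
```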
  • trunk/Source/JavaScriptCore/assembler/MacroAssemblerARM64.h

    r206525 r206539  
    @@ -3216,9 +3216,24 @@
         }
     
    +    // We take memoryFence to mean acqrel. This has acqrel semantics on ARM64.
         void memoryFence()
         {
    -        m_assembler.dmbSY();
    -    }
    -
    +        m_assembler.dmbISH();
    +    }
    +
    +    // We take this to mean that it prevents motion of normal stores. That's a store fence on ARM64 (hence the "ST").
    +    void storeFence()
    +    {
    +        m_assembler.dmbISHST();
    +    }
    +
    +    // We take this to mean that it prevents motion of normal loads. Ideally we'd have expressed this
    +    // using dependencies or half fences, but there are cases where this is as good as it gets. The only
    +    // way to get a standalone load fence instruction on ARM is to use the ISH fence, which is just like
    +    // the memoryFence().
    +    void loadFence()
    +    {
    +        m_assembler.dmbISH();
    +    }
     
         // Misc helper functions.
  • trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h

    r206525 r206539  
    @@ -2635,4 +2635,14 @@
             m_assembler.lock();
             m_assembler.orl_im(0, 0, X86Registers::esp);
    +    }
    +
    +    // We take this to mean that it prevents motion of normal stores. So, it's a no-op on x86.
    +    void storeFence()
    +    {
    +    }
    +
    +    // We take this to mean that it prevents motion of normal loads. So, it's a no-op on x86.
    +    void loadFence()
    +    {
         }
     
  • trunk/Source/JavaScriptCore/b3/B3FenceValue.h

    r206226 r206539  
    @@ -42,5 +42,7 @@
         // the lowering of a Fence based on the heaps. For example, if a fence does not write anything
         // then it is understood to be a store-store fence. On x86, this may lead us to not emit any
    -    // code, while on ARM we may emit a cheaper fence (dmb ishst instead of dmb ish).
    +    // code, while on ARM we may emit a cheaper fence (dmb ishst instead of dmb ish). We will do
    +    // the same optimization for load-load fences, which are expressed as a Fence that writes but
    +    // does not read.
         //
         // This abstraction allows us to cover all of the fences on x86 and all of the standalone fences
    @@ -66,5 +68,4 @@
         // dmb ish and dmb ishst. You can emit a dmb ishst by using a Fence with an empty write heap.
         // Otherwise, you will get a dmb ish.
    -    // FIXME: Make this work right on ARM. https://bugs.webkit.org/show_bug.cgi?id=162342
         // FIXME: Add fenced memory accesses. https://bugs.webkit.org/show_bug.cgi?id=162349
         // FIXME: Add a Depend operation. https://bugs.webkit.org/show_bug.cgi?id=162350
  • trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp

    r206226 r206539  
    @@ -2050,8 +2050,16 @@
             case Fence: {
                 FenceValue* fence = m_value->as<FenceValue>();
    -            if (isX86() && !fence->write)
    +            if (!fence->write && !fence->read)
                     return;
    -            // FIXME: Optimize this on ARM.
    -            // https://bugs.webkit.org/show_bug.cgi?id=162342
    +            if (!fence->write) {
    +                // A fence that reads but does not write is for protecting motion of stores.
    +                append(StoreFence);
    +                return;
    +            }
    +            if (!fence->read) {
    +                // A fence that writes but does not read is for protecting motion of loads.
    +                append(LoadFence);
    +                return;
    +            }
                 append(MemoryFence);
                 return;
  • trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes

    r206226 r206539  
    @@ -845,4 +845,6 @@
     
     MemoryFence /effects
    +StoreFence /effects
    +LoadFence /effects
     
     Jump /branch
  • trunk/Source/JavaScriptCore/b3/testb3.cpp

    r206274 r206539  
    @@ -13053,5 +13053,5 @@
     }
     
    -void testX86MFence()
    +void testMemoryFence()
     {
         Procedure proc;
    @@ -13060,12 +13060,17 @@
     
         root->appendNew<FenceValue>(proc, Origin());
    -    root->appendNew<Value>(proc, Return, Origin());
    +    root->appendNew<Value>(proc, Return, Origin(), root->appendIntConstant(proc, Origin(), Int32, 42));
     
         auto code = compile(proc);
    -    checkUsesInstruction(*code, "lock or $0x0, (%rsp)");
    +    CHECK_EQ(invoke<int>(*code), 42);
    +    if (isX86())
    +        checkUsesInstruction(*code, "lock or $0x0, (%rsp)");
    +    if (isARM64())
    +        checkUsesInstruction(*code, "dmb    ish");
         checkDoesNotUseInstruction(*code, "mfence");
    -}
    -
    -void testX86CompilerFence()
    +    checkDoesNotUseInstruction(*code, "dmb    ishst");
    +}
    +
    +void testStoreFence()
     {
         Procedure proc;
    @@ -13074,9 +13079,30 @@
     
         root->appendNew<FenceValue>(proc, Origin(), HeapRange::top(), HeapRange());
    -    root->appendNew<Value>(proc, Return, Origin());
    +    root->appendNew<Value>(proc, Return, Origin(), root->appendIntConstant(proc, Origin(), Int32, 42));
     
         auto code = compile(proc);
    +    CHECK_EQ(invoke<int>(*code), 42);
         checkDoesNotUseInstruction(*code, "lock");
         checkDoesNotUseInstruction(*code, "mfence");
    +    if (isARM64())
    +        checkUsesInstruction(*code, "dmb    ishst");
    +}
    +
    +void testLoadFence()
    +{
    +    Procedure proc;
    +
    +    BasicBlock* root = proc.addBlock();
    +
    +    root->appendNew<FenceValue>(proc, Origin(), HeapRange(), HeapRange::top());
    +    root->appendNew<Value>(proc, Return, Origin(), root->appendIntConstant(proc, Origin(), Int32, 42));
    +
    +    auto code = compile(proc);
    +    CHECK_EQ(invoke<int>(*code), 42);
    +    checkDoesNotUseInstruction(*code, "lock");
    +    checkDoesNotUseInstruction(*code, "mfence");
    +    if (isARM64())
    +        checkUsesInstruction(*code, "dmb    ish");
    +    checkDoesNotUseInstruction(*code, "dmb    ishst");
     }
     
    @@ -14511,6 +14537,4 @@
             RUN(testBranchBitAndImmFusion(Load, Int64, 1, Air::BranchTest32, Air::Arg::Addr));
     
    -        RUN(testX86MFence());
    -        RUN(testX86CompilerFence());
         }
     
    @@ -14520,4 +14544,8 @@
         }
     
    +    RUN(testMemoryFence());
    +    RUN(testStoreFence());
    +    RUN(testLoadFence());
    +
         if (tasks.isEmpty())
             usage();
  • trunk/Source/JavaScriptCore/disassembler/ARM64/A64DOpcode.cpp

    r172813 r206539  
    @@ -1,4 +1,4 @@
     /*
    - * Copyright (C) 2012 Apple Inc. All rights reserved.
    + * Copyright (C) 2012, 2016 Apple Inc. All rights reserved.
      *
      * Redistribution and use in source and binary forms, with or without
    @@ -85,4 +85,6 @@
         OPCODE_GROUP_ENTRY(0x15, A64DOpcodeCompareAndBranchImmediate),
         OPCODE_GROUP_ENTRY(0x15, A64DOpcodeHint),
    +    OPCODE_GROUP_ENTRY(0x15, A64DOpcodeDmbIsh),
    +    OPCODE_GROUP_ENTRY(0x15, A64DOpcodeDmbIshSt),
         OPCODE_GROUP_ENTRY(0x16, A64DOpcodeUnconditionalBranchImmediate),
         OPCODE_GROUP_ENTRY(0x16, A64DOpcodeUnconditionalBranchRegister),
    @@ -824,4 +826,18 @@
     }
     
    +const char* A64DOpcodeDmbIsh::format()
    +{
    +    appendInstructionName("dmb");
    +    appendString("ish");
    +    return m_formatBuffer;
    +}
    +
    +const char* A64DOpcodeDmbIshSt::format()
    +{
    +    appendInstructionName("dmb");
    +    appendString("ishst");
    +    return m_formatBuffer;
    +}
    +
     // A zero in an entry of the table means the instruction is Unallocated
     const char* const A64DOpcodeLoadStore::s_opNames[32] = {
  • trunk/Source/JavaScriptCore/disassembler/ARM64/A64DOpcode.h

    r206525 r206539  
    @@ -1,4 +1,4 @@
     /*
    - * Copyright (C) 2012 Apple Inc. All rights reserved.
    + * Copyright (C) 2012, 2016 Apple Inc. All rights reserved.
      *
      * Redistribution and use in source and binary forms, with or without
    @@ -510,4 +510,28 @@
     };
     
    +class A64DOpcodeDmbIsh : public A64DOpcode {
    +public:
    +    static const uint32_t mask = 0xffffffff;
    +    static const uint32_t pattern = 0xd5033bbf;
    +
    +    DEFINE_STATIC_FORMAT(A64DOpcodeDmbIsh, thisObj);
    +
    +    const char* format();
    +
    +    const char* opName() { return "dmb"; }
    +};
    +
    +class A64DOpcodeDmbIshSt : public A64DOpcode {
    +public:
    +    static const uint32_t mask = 0xffffffff;
    +    static const uint32_t pattern = 0xd5033abf;
    +
    +    DEFINE_STATIC_FORMAT(A64DOpcodeDmbIshSt, thisObj);
    +
    +    const char* format();
    +
    +    const char* opName() { return "dmb"; }
    +};
    +
     class A64DOpcodeLoadStore : public A64DOpcode {
     private:
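
The effect of using mask 0xffffffff in the two classes above can be illustrated with a standalone matcher: only the exact instruction word matches, and any other dmb encoding falls through to a raw dump. This is a sketch under assumptions, not the A64DOpcode machinery: MatchRule, rules and disassemble are hypothetical names, and the exact text of the ".long" fallback is assumed rather than taken from the disassembler.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for an opcode matching rule: an instruction word
// matches when (word & mask) == pattern.
struct MatchRule {
    uint32_t mask;
    uint32_t pattern;
    const char* text;
};

// Exact-match rules: a mask of 0xffffffff accepts a single encoding only.
static const MatchRule rules[] = {
    { 0xffffffff, 0xd5033bbf, "dmb    ish" },
    { 0xffffffff, 0xd5033abf, "dmb    ishst" },
};

inline std::string disassemble(uint32_t word)
{
    for (const MatchRule& rule : rules) {
        if ((word & rule.mask) == rule.pattern)
            return rule.text;
    }
    // No rule matched: dump the raw word. The output format here is an assumption.
    char buffer[32];
    int written = std::snprintf(buffer, sizeof(buffer), ".long  %08x", static_cast<unsigned>(word));
    assert(written > 0 && written < static_cast<int>(sizeof(buffer)));
    (void)written;
    return buffer;
}
```

With these rules, the word 0xd5033fbf (the dmb sy encoding that this changeset stops emitting) is not recognized and is printed as a raw word, which is the conservative behavior the message describes.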