Changeset 206274 in webkit


Timestamp: Sep 22, 2016 2:11:42 PM
Author: fpizlo@apple.com
Message:

Fences on x86 should be a lot cheaper
https://bugs.webkit.org/show_bug.cgi?id=162417

Reviewed by Mark Lam and Geoffrey Garen.
Source/JavaScriptCore:

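It turns out that:

    lock; orl $0, (%rsp)

does everything that we wanted from:

    mfence

Like mfence, a locked read-modify-write keeps earlier stores from being reordered with later
loads, which is the only reordering that x86's TSO memory model permits for ordinary memory
accesses. And it's a lot faster. When I tried mfence for making object visiting
concurrent-GC-TSO-friendly, it was a 9% regression on Octane/splay. But when I tried ortop
(the lock; orl sequence above), it was neutral. So, we should use ortop from now on.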

This part of the change is for the JITs. MacroAssembler::memoryFence() appears to always
mean something like an acqrel fence, so it's safe to make this use ortop. Since B3's Fence
compiles to Air MemoryFence, which is just MacroAssembler::memoryFence(), this also changes
B3 codegen.

  • assembler/MacroAssemblerX86Common.h:
    (JSC::MacroAssemblerX86Common::memoryFence):
  • assembler/X86Assembler.h:
    (JSC::X86Assembler::lock):
  • b3/testb3.cpp:
    (JSC::B3::testX86MFence):
    (JSC::B3::testX86CompilerFence):

Source/WTF:

It turns out that:

    lock; orl $0, (%rsp)

does everything that we wanted from:

    mfence

And it's a lot faster. When I tried mfence for making object visiting concurrent-GC-TSO-friendly,
it was a 9% regression on Octane/splay. But when I tried ortop, it was neutral.
So, we should use ortop from now on.

This part of the change just affects our Atomics. I also changed this in the JITs.

  • wtf/Atomics.h:
    (WTF::x86_ortop):
    (WTF::storeLoadFence):
    (WTF::x86_mfence): Deleted.

Location: trunk/Source
Files: 6 edited

  • trunk/Source/JavaScriptCore/ChangeLog

    --- r206268
    +++ r206274
    @@ -1,2 +1,34 @@
    +2016-09-22  Filip Pizlo  <fpizlo@apple.com>
    +
    +        Fences on x86 should be a lot cheaper
    +        https://bugs.webkit.org/show_bug.cgi?id=162417
    +
    +        Reviewed by Mark Lam and Geoffrey Garen.
    +
    +        It turns out that:
    +
    +            lock; orl $0, (%rsp)
    +
    +        does everything that we wanted from:
    +
    +            mfence
    +
    +        And it's a lot faster. When I tried mfence for making object visiting concurrent-GC-TSO-
    +        friendly, it was a 9% regression on Octane/splay. But when I tried ortop, it was neutral.
    +        So, we should use ortop from now on.
    +
    +        This part of the change is for the JITs. MacroAssembler::memoryFence() appears to always
    +        mean something like an acqrel fence, so it's safe to make this use ortop. Since B3's Fence
    +        compiles to Air MemoryFence, which is just MacroAssembler::memoryFence(), this also changes
    +        B3 codegen.
    +
    +        * assembler/MacroAssemblerX86Common.h:
    +        (JSC::MacroAssemblerX86Common::memoryFence):
    +        * assembler/X86Assembler.h:
    +        (JSC::X86Assembler::lock):
    +        * b3/testb3.cpp:
    +        (JSC::B3::testX86MFence):
    +        (JSC::B3::testX86CompilerFence):
    +
     2016-09-22  Joseph Pecoraro  <pecoraro@apple.com>
     
  • trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h

    --- r205656
    +++ r206274
    @@ -2630,7 +2630,10 @@
         }
     
    +    // We take memoryFence to mean acqrel. This has acqrel semantics on x86.
         void memoryFence()
         {
    -        m_assembler.mfence();
    +        // lock; orl $0, (%rsp)
    +        m_assembler.lock();
    +        m_assembler.orl_im(0, 0, X86Registers::esp);
         }
     
  • trunk/Source/JavaScriptCore/assembler/X86Assembler.h

    --- r205283
    +++ r206274
    @@ -250,4 +250,5 @@
             OP_CALL_rel32                   = 0xE8,
             OP_JMP_rel32                    = 0xE9,
    +        PRE_LOCK                        = 0xF0,
             PRE_SSE_F2                      = 0xF2,
             PRE_SSE_F3                      = 0xF3,
    @@ -2685,4 +2686,9 @@
         }
     
    +    void lock()
    +    {
    +        m_formatter.prefix(PRE_LOCK);
    +    }
    +
         void mfence()
         {
  • trunk/Source/JavaScriptCore/b3/testb3.cpp

    --- r206226
    +++ r206274
    @@ -13063,5 +13063,6 @@
     
         auto code = compile(proc);
    -    checkUsesInstruction(*code, "mfence");
    +    checkUsesInstruction(*code, "lock or $0x0, (%rsp)");
    +    checkDoesNotUseInstruction(*code, "mfence");
     }
     
    @@ -13076,4 +13077,5 @@
     
         auto code = compile(proc);
    +    checkDoesNotUseInstruction(*code, "lock");
         checkDoesNotUseInstruction(*code, "mfence");
     }
  • trunk/Source/WTF/ChangeLog

    --- r206249
    +++ r206274
    @@ -1,2 +1,28 @@
    +2016-09-22  Filip Pizlo  <fpizlo@apple.com>
    +
    +        Fences on x86 should be a lot cheaper
    +        https://bugs.webkit.org/show_bug.cgi?id=162417
    +
    +        Reviewed by Mark Lam and Geoffrey Garen.
    +
    +        It turns out that:
    +
    +            lock; orl $0, (%rsp)
    +
    +        does everything that we wanted from:
    +
    +            mfence
    +
    +        And it's a lot faster. When I tried mfence for making object visiting concurrent-GC-TSO-
    +        friendly, it was a 9% regression on Octane/splay. But when I tried ortop, it was neutral.
    +        So, we should use ortop from now on.
    +
    +        This part of the change just affects our Atomics. I also changed this in the JITs.
    +
    +        * wtf/Atomics.h:
    +        (WTF::x86_ortop):
    +        (WTF::storeLoadFence):
    +        (WTF::x86_mfence): Deleted.
    +
     2016-09-21  Alexey Proskuryakov  <ap@apple.com>
     
  • trunk/Source/WTF/wtf/Atomics.h

    --- r205921
    +++ r206274
    @@ -168,5 +168,5 @@
     #elif CPU(X86) || CPU(X86_64)
     
    -inline void x86_mfence()
    +inline void x86_ortop()
     {
     #if OS(WINDOWS)
    @@ -177,5 +177,7 @@
         MemoryBarrier();
     #else
    -    asm volatile("mfence" ::: "memory");
    +    // This has acqrel semantics and is much cheaper than mfence. For exampe, in the JSC GC, using
    +    // mfence as a store-load fence was a 9% slow-down on Octane/splay while using this was neutral.
    +    asm volatile("lock; orl $0, (%%rsp)" ::: "memory");
     #endif
     }
    @@ -183,5 +185,5 @@
     inline void loadLoadFence() { compilerFence(); }
     inline void loadStoreFence() { compilerFence(); }
    -inline void storeLoadFence() { x86_mfence(); }
    +inline void storeLoadFence() { x86_ortop(); }
     inline void storeStoreFence() { compilerFence(); }
     inline void memoryBarrierAfterLock() { compilerFence(); }