Changeset 206274 in webkit

Timestamp:

Sep 22, 2016, 2:11:42 PM

Author:

fpizlo@apple.com

Message:

Fences on x86 should be a lot cheaper
https://bugs.webkit.org/show_bug.cgi?id=162417

Reviewed by Mark Lam and Geoffrey Garen.
Source/JavaScriptCore:

It turns out that:

lock; orl $0, (%rsp)

does everything that we wanted from:

mfence

And it's a lot faster. When I tried mfence for making object visiting concurrent-GC-TSO-friendly,
it was a 9% regression on Octane/splay. But when I tried ortop (the lock-prefixed orl of 0 into
the top of the stack shown above), it was performance-neutral. So, we should use ortop from now on.
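The property the message relies on can be illustrated with the classic store-buffering litmus test. This is a minimal sketch, assuming GCC/Clang inline asm on x86-64; other targets fall back to std::atomic_thread_fence, which is an assumption of the sketch and not what the changeset does. The names storeLoadFenceSketch and storeBufferingObserved are hypothetical. Each thread stores to one flag, fences, then loads the other flag. x86-TSO lets a load complete before an earlier store to a different address becomes visible, so without a store-load fence both loads may return 0; with the locked OR between them, that outcome is ruled out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for WTF::x86_ortop(): a store-load fence.
inline void storeLoadFenceSketch()
{
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
    // Locked read-modify-write of the word at the top of the stack. OR-ing in 0
    // leaves the value unchanged; the lock prefix keeps earlier stores ordered
    // before later loads, which plain x86 stores and loads do not guarantee.
    asm volatile("lock; orl $0, (%%rsp)" ::: "memory", "cc");
#else
    // Portable fallback for non-x86-64 builds (assumption, not from the changeset).
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Store-buffering litmus test. Returns true if the forbidden outcome
// (both loads reading 0) was ever observed in `iterations` runs.
inline bool storeBufferingObserved(int iterations)
{
    assert(iterations > 0);
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;
        std::thread a([&] {
            x.store(1, std::memory_order_relaxed);
            storeLoadFenceSketch(); // store to x must be visible before loading y
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread b([&] {
            y.store(1, std::memory_order_relaxed);
            storeLoadFenceSketch(); // store to y must be visible before loading x
            r2 = x.load(std::memory_order_relaxed);
        });
        a.join(); // join() makes r1/r2 safe to read here
        b.join();
        if (r1 == 0 && r2 == 0)
            return true;
    }
    return false;
}
```

A run that never sees the 0/0 outcome is evidence rather than proof. Because each iteration spawns fresh threads, the racy window is narrow, so deleting the fence will not necessarily reproduce the reordering quickly.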

This part of the change is for the JITs. MacroAssembler::memoryFence() appears to always
mean something like an acqrel fence, so it's safe to make this use ortop. Since B3's Fence
compiles to Air MemoryFence, which is just MacroAssembler::memoryFence(), this also changes
B3 codegen.

* assembler/MacroAssemblerX86Common.h:
(JSC::MacroAssemblerX86Common::memoryFence):
* assembler/X86Assembler.h:
(JSC::X86Assembler::lock):
* b3/testb3.cpp:
(JSC::B3::testX86MFence):
(JSC::B3::testX86CompilerFence):

Source/WTF:

It turns out that:

lock; orl $0, (%rsp)

does everything that we wanted from:

mfence

* wtf/Atomics.h:
(WTF::x86_ortop):
(WTF::storeLoadFence):
(WTF::x86_mfence): Deleted.

Location:

trunk/Source

Files:

6 edited

JavaScriptCore/ChangeLog (modified) (1 diff)
JavaScriptCore/assembler/MacroAssemblerX86Common.h (modified) (1 diff)
JavaScriptCore/assembler/X86Assembler.h (modified) (2 diffs)
JavaScriptCore/b3/testb3.cpp (modified) (2 diffs)
WTF/ChangeLog (modified) (1 diff)
WTF/wtf/Atomics.h (modified) (3 diffs)

trunk/Source/JavaScriptCore/ChangeLog

r206268 → r206274
+2016-09-22  Filip Pizlo  <fpizlo@apple.com>
+        Fences on x86 should be a lot cheaper
+        https://bugs.webkit.org/show_bug.cgi?id=162417
+        Reviewed by Mark Lam and Geoffrey Garen.
+        It turns out that:
+            lock; orl $0, (%rsp)
+        does everything that we wanted from:
+            mfence
+        And it's a lot faster. When I tried mfence for making object visiting concurrent-GC-TSO-
+        friendly, it was a 9% regression on Octane/splay. But when I tried ortop, it was neutral.
+        So, we should use ortop from now on.
+        This part of the change is for the JITs. MacroAssembler::memoryFence() appears to always
+        mean something like an acqrel fence, so it's safe to make this use ortop. Since B3's Fence
+        compiles to Air MemoryFence, which is just MacroAssembler::memoryFence(), this also changes
+        B3 codegen.
+        * assembler/MacroAssemblerX86Common.h:
+        (JSC::MacroAssemblerX86Common::memoryFence):
+        * assembler/X86Assembler.h:
+        (JSC::X86Assembler::lock):
+        * b3/testb3.cpp:
+        (JSC::B3::testX86MFence):
+        (JSC::B3::testX86CompilerFence):
 -09-22  Joseph Pecoraro  <pecoraro@apple.com>

trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h

r205656 → r206274
     }
+    // We take memoryFence to mean acqrel. This has acqrel semantics on x86.
     void memoryFence()
     {
-        m_assembler.mfence();
+        // lock; orl $0, (%rsp)
+        m_assembler.lock();
+        m_assembler.orl_im(0, 0, X86Registers::esp);
     }

trunk/Source/JavaScriptCore/assembler/X86Assembler.h

r205283 → r206274
         OP_CALL_rel32                   = 0xE8,
         OP_JMP_rel32                    = 0xE9,
+        PRE_LOCK                        = 0xF0,
         PRE_SSE_F2                      = 0xF2,
         PRE_SSE_F3                      = 0xF3,
 …
+    }
+    void lock()
+    {
+        m_formatter.prefix(PRE_LOCK);
+    }
     void mfence()
+    {
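To make the size and shape of the new sequence concrete, here is a minimal byte-level sketch of what memoryFence() emits before and after this change. TinyX86Buffer and its methods are hypothetical and are not JSC's X86Assembler API. The sketch assumes the sign-extended imm8 form of OR (opcode 0x83 /1), which an encoder would normally pick for a zero immediate. The old sequence is the 3-byte mfence (0F AE F0); the new one is the LOCK prefix (0xF0, the PRE_LOCK value added in this diff) followed by a 4-byte orl $0, (%rsp), where addressing through rsp requires a SIB byte.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical miniature of the two encodings involved; not JSC's X86Assembler API.
struct TinyX86Buffer {
    std::vector<std::uint8_t> bytes;

    // MFENCE: 0F AE F0.
    void mfence()
    {
        bytes.push_back(0x0F);
        bytes.push_back(0xAE);
        bytes.push_back(0xF0);
    }

    // LOCK prefix, the same byte as PRE_LOCK (0xF0) in the changeset.
    void lock() { bytes.push_back(0xF0); }

    // orl $imm, (%rsp) using the sign-extended imm8 form: 0x83 /1 ib.
    void orlImm8ToTopOfStack(int imm)
    {
        assert(imm >= -128 && imm <= 127); // this sketch only handles the imm8 form
        bytes.push_back(0x83);
        bytes.push_back(0x0C); // ModRM: mod=00, reg=001 (/1 = OR), rm=100 (SIB follows)
        bytes.push_back(0x24); // SIB: scale=1, no index, base=rsp
        bytes.push_back(static_cast<std::uint8_t>(imm));
    }
};

// Before r206274: memoryFence() emitted mfence.
inline std::vector<std::uint8_t> oldMemoryFenceBytes()
{
    TinyX86Buffer buffer;
    buffer.mfence();
    return buffer.bytes;
}

// After r206274: lock(); orl_im(0, 0, esp).
inline std::vector<std::uint8_t> newMemoryFenceBytes()
{
    TinyX86Buffer buffer;
    buffer.lock();
    buffer.orlImm8ToTopOfStack(0);
    return buffer.bytes;
}
```

The new sequence is five bytes against mfence's three, so the improvement reported in the message is about execution cost rather than code size.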

trunk/Source/JavaScriptCore/b3/testb3.cpp

r206226 → r206274
     auto code = compile(proc);
-    checkUsesInstruction(*code, "mfence");
+    checkUsesInstruction(*code, "lock or $0x0, (%rsp)");
+    checkDoesNotUseInstruction(*code, "mfence");
+}
 …
     auto code = compile(proc);
+    checkDoesNotUseInstruction(*code, "lock");
     checkDoesNotUseInstruction(*code, "mfence");
+}
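The two test updates amount to assertions over the disassembly of the compiled procedure. This sketch assumes checkUsesInstruction and checkDoesNotUseInstruction behave like a plain substring search over that text; usesInstruction, sampleFenceDisassembly and sampleCompilerFenceDisassembly are hypothetical, and the sample listings are made up for illustration rather than captured from B3.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for testb3's checks, assuming a plain substring match.
inline bool usesInstruction(const std::string& disassembly, const std::string& text)
{
    assert(!text.empty()); // an empty needle would match any listing
    return disassembly.find(text) != std::string::npos;
}

// Made-up listing (not real B3 output) for a procedure with a B3 Fence after this change.
inline std::string sampleFenceDisassembly()
{
    return "    push %rbp\n"
           "    mov %rsp, %rbp\n"
           "    lock or $0x0, (%rsp)\n"
           "    pop %rbp\n"
           "    ret\n";
}

// Made-up listing for a procedure whose fence only needs to constrain the compiler.
inline std::string sampleCompilerFenceDisassembly()
{
    return "    push %rbp\n"
           "    mov %rsp, %rbp\n"
           "    pop %rbp\n"
           "    ret\n";
}
```

If the real helpers also match substrings, checkDoesNotUseInstruction(*code, "lock") would trip on any line that merely contains those letters, so the needle has to be distinctive for the code being tested.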

trunk/Source/WTF/ChangeLog

r206249 → r206274
+2016-09-22  Filip Pizlo  <fpizlo@apple.com>
+        Fences on x86 should be a lot cheaper
+        https://bugs.webkit.org/show_bug.cgi?id=162417
+        Reviewed by Mark Lam and Geoffrey Garen.
+        It turns out that:
+            lock; orl $0, (%rsp)
+        does everything that we wanted from:
+            mfence
+        And it's a lot faster. When I tried mfence for making object visiting concurrent-GC-TSO-
+        friendly, it was a 9% regression on Octane/splay. But when I tried ortop, it was neutral.
+        So, we should use ortop from now on.
+        This part of the change just affects our Atomics. I also changed this in the JITs.
+        * wtf/Atomics.h:
+        (WTF::x86_ortop):
+        (WTF::storeLoadFence):
+        (WTF::x86_mfence): Deleted.
 -09-21  Alexey Proskuryakov  <ap@apple.com>

trunk/Source/WTF/wtf/Atomics.h

r205921 → r206274
 #elif CPU(X86) || CPU(X86_64)
-inline void x86_mfence()
+inline void x86_ortop()
+{
 #if OS(WINDOWS)
 …
     MemoryBarrier();
 #else
-    asm volatile("mfence" ::: "memory");
+    // This has acqrel semantics and is much cheaper than mfence. For example, in the JSC GC, using
+    // mfence as a store-load fence was a 9% slow-down on Octane/splay while using this was neutral.
+    asm volatile("lock; orl $0, (%%rsp)" ::: "memory");
 #endif
+}
 …
 inline void loadLoadFence() { compilerFence(); }
 inline void loadStoreFence() { compilerFence(); }
-inline void storeLoadFence() { x86_mfence(); }
+inline void storeLoadFence() { x86_ortop(); }
 inline void storeStoreFence() { compilerFence(); }
 inline void memoryBarrierAfterLock() { compilerFence(); }
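In this diff, storeLoadFence() is the only fence in the family that emits an instruction on x86. For ordinary write-back memory, x86-TSO already preserves load-load, load-store and store-store order, and only lets a later load complete before an earlier store is visible. A minimal table-style sketch of that mapping follows; FenceKind, x86NeedsHardwareFence and x86FenceLowering are hypothetical names, and the strings describe the non-Windows expansion shown in the diff.

```cpp
#include <cassert>
#include <string>

// The four WTF fence kinds touched by the Atomics.h diff.
enum class FenceKind { LoadLoad, LoadStore, StoreLoad, StoreStore };

// x86-TSO (ordinary write-back memory) preserves every ordering except
// store -> later load, so only StoreLoad needs a real instruction.
constexpr bool x86NeedsHardwareFence(FenceKind kind)
{
    return kind == FenceKind::StoreLoad;
}

// What each WTF fence expands to on non-Windows x86 after this changeset.
inline std::string x86FenceLowering(FenceKind kind)
{
    switch (kind) {
    case FenceKind::LoadLoad:
    case FenceKind::LoadStore:
    case FenceKind::StoreStore:
        return "compilerFence()"; // compiler-only barrier; no instruction emitted
    case FenceKind::StoreLoad:
        return "lock; orl $0, (%rsp)"; // x86_ortop(), previously mfence
    }
    assert(false && "unknown FenceKind");
    return {};
}

static_assert(x86NeedsHardwareFence(FenceKind::StoreLoad), "store-load needs a real fence on x86");
static_assert(!x86NeedsHardwareFence(FenceKind::StoreStore), "store-store is already ordered on x86");
```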
