wiki:b3Discussion2015

Version 1 (modified by achristensen@apple.com, 6 years ago)

added b3 discussion notes

Geoff: Ars Technica data - JSC much faster on similar CPUs if we’re drawing to the screen, slightly slower for purely computational tasks

pizlo:
FTL motivation: use a C-like compiler to do final optimizations
10-50x as much time is spent in LLVM as in JSC code when compiling with FTL
B3 goal: reduce compiling time by 5x, this would increase scores on losing benchmarks
wrote new compiler from scratch, 10x faster compiling time than LLVM
B3 uses better instruction selector, register allocator
LLVM instruction selector uses most of its compile time
not done yet, probably measuring data within a month
targeting all 64-bit architectures, right now it works best on x86_64, working on ARM64
B3 IR has two div instructions, chillDiv does double division then converts to int, ARM div is more like chillDiv
JSC regex engine slows down JetStream and Octane2 benchmarks
Kraken should speed up
LLVM doesn’t do or want tail duplication optimization, which would speed up JSC
parallel compiling with LLVM has a memory bottleneck, but works ok on computers with lots of CPU cores
Octane2 needs better garbage collection and better regex engine
goal is to be faster than Edge on all benchmarks
B3 - “barebones backend” - SSA “static single assignment” compiler like LLVM
Air - assembly intermediate representation

register allocation, macro assembler

“bacon, butter, biscuit” - appetizer at a restaurant in Campbell
Air probably takes more memory than B3
B3 is lowered to Air
B3 is the equivalent of LLVM IR
Air is the equivalent of MC
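
A rough sketch of what the chillDiv note seems to describe, for 32-bit signed operands. The helper name `chillDiv` below is hypothetical and is not B3's implementation. It assumes that "double division then converts to int" means JavaScript-style truncation, so the two cases where plain integer division misbehaves (division by zero, and INT32_MIN / -1) produce a defined result instead of trapping. That is also how ARM's integer divide instruction behaves, which would explain the "ARM div is more like chillDiv" remark.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper for illustration only, not B3 code.
// Models the result of dividing as doubles and converting back to int32.
constexpr int32_t chillDiv(int32_t numerator, int32_t denominator)
{
    // x / 0 as doubles is +/-Infinity or NaN, which converts to 0.
    // Plain integer division by zero is undefined in C++ and traps on x86.
    if (denominator == 0)
        return 0;

    // INT32_MIN / -1 is 2^31 as a double, which wraps back to INT32_MIN.
    // Plain integer division overflows here and also traps on x86.
    if (numerator == std::numeric_limits<int32_t>::min() && denominator == -1)
        return std::numeric_limits<int32_t>::min();

    // Every other case matches ordinary truncating integer division.
    return numerator / denominator;
}
```

On x86_64 a compiler has to emit the two checks around the hardware divide to get this behavior. On ARM64 the hardware result already matches, so the checks are presumably unnecessary there.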

Michael: calling conventions!
We’re going to follow a calling convention. Almost C calling convention.
use more registers for arguments to prevent storing/loading on stack, which slows down code when calling functions
plan to follow calling conventions so that LLInt or baseline JIT can be called from anywhere
sometimes requires shuffling registers around so that parameters are in the correct registers
JSC on 64-bit platforms dedicates two registers to tag values, which are in callee-saved registers and need to be pushed and popped when using LLVM because LLVM has different register allocation than JSC
coalesces unnecessary mov operations
tail call optimizations (unrelated) allow recursive functions without adding to the stack each time
if a calling function knows about the allocated registers in the function it calls, it could break calling convention and do more optimizations, but if the function is recompiled, anything that calls it would need to be recompiled (this is what inlining is for)
ARMv7 doesn’t have enough registers for this optimization to be useful. i386 doesn’t have any registers for passing arguments.
pizlo: If we needed this optimization enough, we could invent our own calling convention.
pizlo: If a function is well-behaved, we should raise its inlining threshold
profiling bug - need profiling on math
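
To illustrate the tail call note, here is a minimal sketch in C++. The function names are made up for this example and have nothing to do with JSC's implementation. The first function makes its recursive call in tail position. The second shows the loop that a tail call optimization effectively turns it into, so the stack does not grow with each call. C++ compilers are allowed but not required to do this transformation.

```cpp
#include <cstdint>

// Tail-recursive sum of 1..n. The recursive call is the last thing the
// function does, so nothing in the current frame is needed afterwards.
uint64_t sumToRecursive(uint64_t n, uint64_t accumulator = 0)
{
    if (n == 0)
        return accumulator;
    return sumToRecursive(n - 1, accumulator + n); // call in tail position
}

// What the optimized version amounts to: the "call" becomes a jump back to
// the top of the function, reusing the same frame on every iteration.
uint64_t sumToLoop(uint64_t n)
{
    uint64_t accumulator = 0;
    while (n != 0) {
        accumulator += n;
        --n;
    }
    return accumulator;
}
```

Without the optimization, `sumToRecursive` uses one stack frame per step and can overflow the stack for large `n`. The loop form uses constant stack space.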

rniwa: es6 slides