Changes between Version 1 and Version 2 of b3Discussion2015


Timestamp: Nov 12, 2015 10:08:33 AM
Author: achristensen@apple.com
Comment: added newlines
Geoff:
Ars Technica data - JSC much faster on similar CPUs if we’re drawing to the screen, slightly slower for purely computational tasks

pizlo:
FTL motivation: use a C-like compiler to do the final optimizations
10-50x as much time spent in llvm as in JSC code when compiling with FTL
b3 goal: reduce compile time by 5x, which would increase scores on losing benchmarks
wrote a new compiler from scratch, 10x faster compile time than llvm
b3 uses a better instruction selector and register allocator
llvm’s instruction selector uses most of its compile time
not done yet, probably measuring data within a month
targeting all 64-bit architectures; right now it works best on x86_64, working on arm64
b3 IR has two div instructions; chillDiv does double division then converts to int, and arm’s div is more like chillDiv
JSC’s regex engine slows down the JetStream and Octane2 benchmarks
Kraken should speed up
llvm doesn’t do (or want) the tail duplication optimization, which would speed up JSC
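For reference, a minimal before/after sketch of tail duplication (illustrative pseudocode, not b3 or llvm IR): a block shared by several predecessors is copied into each of them, letting each copy be simplified with what that path knows.

```
// Before: two blocks share the tail T
B1: x = 1; goto T
B2: x = 2; goto T
T:  y = x + 1; return y

// After tail duplication: each predecessor gets its own copy of T,
// so each copy can be constant-folded along its path
B1: x = 1; return 2
B2: x = 2; return 3
```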
parallel compiling with llvm has a memory bottleneck, but works ok on computers with lots of CPU cores
Octane2 needs better garbage collection and a better regex engine
goal is to be faster than Edge on all benchmarks
b3 - “barebones backend” - an ssa (“static single assignment”) compiler like llvm
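For context, a minimal illustration of what “static single assignment” means (illustrative pseudocode, not b3 syntax): every variable is assigned exactly once, so each use names exactly one definition, which simplifies the analyses an optimizing backend runs.

```
// Ordinary code            // Same code in SSA form
x = a + 1                   x1 = a + 1
x = x * 2                   x2 = x1 * 2
y = x + x                   y1 = x2 + x2
```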
air - assembly intermediate representation
  register allocation, macro assembler
“bacon, butter, biscuit” - appetizer at a restaurant in Campbell
air probably takes more memory than b3
b3 is lowered to air
b3 is the equivalent of llvm ir
air is the equivalent of llvm mc

Michael:
calling conventions! We’re going to follow a calling convention. Almost the C calling convention.
use more registers for arguments to avoid stores/loads to the stack, which slow down code when calling functions
plan to follow calling conventions so that the LLINT or baseline JIT can be called from anywhere
sometimes this requires shuffling registers around so that parameters end up in the correct registers
JSC on 64-bit platforms dedicates two registers to tag values; these are callee-saved registers and need to be pushed and popped when using llvm, because llvm has different register allocation than JSC
coalesces unnecessary mov operations
tail call optimizations (unrelated) allow recursive functions without adding to the stack each time
if a calling function knows about the registers allocated in the callee, it could break the calling convention and do more optimizations, but if the callee is recompiled, anything that calls it would need to be recompiled; this is what inlining is for
armv7 doesn’t have enough registers for this optimization to be useful, and i386 doesn’t have any registers for passing arguments. pizlo: If we needed this optimization enough, we could invent our own calling convention.
pizlo: If a function is well-behaved, we should raise its inlining threshold
profiling bug - need profiling on math

rniwa: es6 slides