
SquirrelFish

SquirrelFish is an incremental rewrite of JavaScriptCore that turns it into a bytecode interpreter. It is a direct-dispatch register VM. It is already showing promising performance results, even at a fairly early stage, and we are working to fix the remaining blockers to merging it to trunk.
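To make "direct-dispatch register VM" concrete, here is a toy sketch. It assumes a made-up five-opcode instruction set (op_load_int, op_add, op_less, op_jtrue, op_ret) and encoding that are purely illustrative, not SquirrelFish's real ones. Operands name slots in a register array instead of stack positions, and direct dispatch uses the GCC/Clang "labels as values" extension (the same `&&label` / `goto *ptr` feature used further down this page): each opcode word is rewritten once into its handler's address, so every dispatch is a single indirect jump.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for a toy register machine (not SquirrelFish's real set).
enum Opcode : intptr_t { op_load_int, op_add, op_less, op_jtrue, op_ret, numOpcodes };

// An instruction stream word: an opcode (before linking), a handler
// address (after linking), or an operand.
union Instruction {
    intptr_t operand;
    void* handler;
};

// Operand counts per opcode, used to locate opcode words while linking.
static const int operandCount[numOpcodes] = { 2, 3, 3, 2, 1 };

#define NEXT_INSTRUCTION() goto *vPC->handler

intptr_t run(std::vector<Instruction> code, int numRegisters)
{
    static void* const handlers[numOpcodes] = {
        &&do_load_int, &&do_add, &&do_less, &&do_jtrue, &&do_ret
    };
    // Link: replace each opcode with the address of its handler.
    for (std::size_t i = 0; i < code.size();) {
        intptr_t op = code[i].operand;
        assert(op >= 0 && op < numOpcodes);
        code[i].handler = handlers[op];
        i += 1 + operandCount[op];
    }

    std::vector<intptr_t> r(static_cast<std::size_t>(numRegisters), 0);
    Instruction* vPC = code.data();
    NEXT_INSTRUCTION();

do_load_int: // r[dst] = immediate
    r[vPC[1].operand] = vPC[2].operand;
    vPC += 3;
    NEXT_INSTRUCTION();
do_add:      // r[dst] = r[a] + r[b]
    r[vPC[1].operand] = r[vPC[2].operand] + r[vPC[3].operand];
    vPC += 4;
    NEXT_INSTRUCTION();
do_less:     // r[dst] = r[a] < r[b]
    r[vPC[1].operand] = r[vPC[2].operand] < r[vPC[3].operand];
    vPC += 4;
    NEXT_INSTRUCTION();
do_jtrue:    // if (r[cond]) jump to absolute instruction index
    if (r[vPC[1].operand])
        vPC = code.data() + vPC[2].operand;
    else
        vPC += 3;
    NEXT_INSTRUCTION();
do_ret:      // return r[src]
    return r[vPC[1].operand];
}

#undef NEXT_INSTRUCTION

// Usage: sum of 0..n-1 as a do-while loop over registers r0..r4.
intptr_t sumBelow(intptr_t n)
{
    std::vector<Instruction> code = {
        {op_load_int}, {0}, {0},      // r0 = 0 (i)
        {op_load_int}, {1}, {0},      // r1 = 0 (sum)
        {op_load_int}, {2}, {1},      // r2 = 1
        {op_load_int}, {3}, {n},      // r3 = n
        {op_add}, {1}, {1}, {0},      // index 12: r1 = r1 + r0
        {op_add}, {0}, {0}, {2},      // r0 = r0 + r2
        {op_less}, {4}, {0}, {3},     // r4 = r0 < r3
        {op_jtrue}, {4}, {12},        // if (r4) goto 12
        {op_ret}, {1},                // return r1
    };
    return run(code, 5);
}
```

`sumBelow(10)` returns 45. Because operands are register numbers, `op_add` reads and writes registers directly instead of pushing and popping, which is the main instruction-count advantage a register VM has over a stack VM.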

Blockers to merge to trunk

See SquirrelFishBlockers.

Lost optimizations

Some optimizations have been temporarily lost as part of the squirrelfish work. Other speedups seem to make up for them, so they could be restored either before or after merging back to trunk.

  • optimized multiscope access (bug 18645)
  • optimized access to global built-ins (originally done in r31226) (bug 18646)
  • static type inference of things like "numeric less than" and "string add" (bug 18647)

Additional optimization ideas

See SquirrelFishPerfIdeas.

What people are working on:

Geoff is working on:

Fixing scope chain handling and JS regression test failures.

Cameron is working on:

JS regression test failures.

Better code generation. We have been pondering whether to have a separate peephole optimization pass or to incorporate peephole optimization into code generation. Either way, we should look at some code generation algorithms based on tile matching. We also want to choose an approach that will be compatible with planned extensions, e.g. superinstructions.

Oliver is working on:

Getters & setters.

Sam is working on (when he sees fit to do so):

Maciej is working on:

Organizing remaining blocker issues.

Assorted notes (some of these may be obsolete)

Oliver's patch to fix up "this" allocation was a 0.3% regression (http://trac.webkit.org/changeset/33541). We believe that allocating pre-capacity for the register file should fix this. Let's do that and check whether it gives back the 0.3%.

"with" and "catch" scopes are not marked during GC.

Optimize dynamic scopes that aren't closures so that they don't save the environment on return.

Statically detect presence of "with" and/or "catch" in the parser.

Evaluation of a script is supposed to produce a value. This requires storing the value of the last value-producing statement to execute. We need to detect the last top-level value-producing statement in a program, and save its value. Basically, that just means passing an explicit "dst" register to its emitCode function.
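A minimal sketch of the "pass an explicit dst register" idea, assuming hypothetical node classes, an `emitCode(Generator&, int dst)` signature, a `kIgnoredResult` convention, and r0 as the completion register (none of these are the real JavaScriptCore classes). It only handles straight-line top-level statements; completion values flowing out of `if`, loops, and so on would need more care.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical register id meaning "the caller doesn't want the value".
constexpr int kIgnoredResult = -1;

struct Generator {
    std::vector<std::string> code;
    int nextTemp = 1;  // r0 is reserved for the program's completion value
    int newTemporary() { return nextTemp++; }
};

struct StatementNode {
    virtual ~StatementNode() = default;
    virtual bool producesValue() const = 0;
    // Emit code; if dst != kIgnoredResult, leave the statement's value in dst.
    virtual void emitCode(Generator&, int dst) const = 0;
};

// `<constant>;` standing in for any expression statement.
struct ExprStatementNode : StatementNode {
    int value;
    explicit ExprStatementNode(int v) : value(v) {}
    bool producesValue() const override { return true; }
    void emitCode(Generator& g, int dst) const override
    {
        // Still evaluate into a temporary when ignored (side effects may matter).
        int target = dst == kIgnoredResult ? g.newTemporary() : dst;
        g.code.push_back("load r" + std::to_string(target) + ", " + std::to_string(value));
    }
};

// `var x;` produces no value.
struct VarDeclNode : StatementNode {
    bool producesValue() const override { return false; }
    void emitCode(Generator& g, int) const override { g.code.push_back("nop"); }
};

// Only the last top-level value-producing statement gets dst = r0.
std::vector<std::string> emitProgram(const std::vector<std::unique_ptr<StatementNode>>& body)
{
    Generator g;
    int last = -1;
    for (int i = 0; i < static_cast<int>(body.size()); ++i)
        if (body[i]->producesValue())
            last = i;
    g.code.push_back("load r0, undefined");  // result if nothing produces a value
    for (int i = 0; i < static_cast<int>(body.size()); ++i)
        body[i]->emitCode(g, i == last ? 0 : kIgnoredResult);
    g.code.push_back("end r0");
    return g.code;
}
```

For the program `1; 2; var x;`, the first statement is evaluated into a temporary, `2` is loaded straight into r0, and `var x;` leaves r0 alone, so the script's value is 2 without any runtime bookkeeping.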

Make const work -- const info has to go in the symbol table, so writes to const vars can turn to no-ops at compile time.
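A sketch of const info living in the symbol table, assuming a hypothetical `SymbolTableEntry` with an `isConst` flag and a toy `CodeGenerator` that emits textual `mov` instructions. The right-hand side is assumed to have been evaluated already (for its side effects); only the store is dropped at compile time.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical symbol table entry: register index plus a read-only flag.
struct SymbolTableEntry {
    int registerIndex;
    bool isConst;
};

class CodeGenerator {
public:
    void declareVar(const std::string& name) { declare(name, false); }
    void declareConst(const std::string& name) { declare(name, true); }

    // The declaration's own initializer must still store, even for const.
    void emitInitializer(const std::string& name, int src) { emitStore(m_symbolTable.at(name), src); }

    // Assignment `name = r<src>`; returns false if it compiled to a no-op.
    bool emitAssignment(const std::string& name, int src)
    {
        const SymbolTableEntry& entry = m_symbolTable.at(name);
        if (entry.isConst)
            return false;  // write to a const var: emit nothing
        emitStore(entry, src);
        return true;
    }

    const std::vector<std::string>& code() const { return m_code; }

private:
    void declare(const std::string& name, bool isConst)
    {
        int index = static_cast<int>(m_symbolTable.size());
        bool inserted = m_symbolTable.emplace(name, SymbolTableEntry{index, isConst}).second;
        assert(inserted);  // redeclaration is not modeled in this sketch
        (void)inserted;
    }

    void emitStore(const SymbolTableEntry& entry, int src)
    {
        m_code.push_back("mov r" + std::to_string(entry.registerIndex) + ", r" + std::to_string(src));
    }

    std::unordered_map<std::string, SymbolTableEntry> m_symbolTable;
    std::vector<std::string> m_code;
};
```

Because the check happens at code generation time, the interpreter never sees a "store to const" instruction, so there is no runtime cost for const semantics.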

Is it safe for Lists to store a direct pointer to the register file? What if the register file reallocates?

Change the conservative mark of the register file to an exact mark -- use zero fill plus type tagging to know whether to mark a register.
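A sketch of exact marking, assuming a hypothetical one-bit tagging scheme (0 = never written, low bit 1 = immediate integer, otherwise an aligned `Cell*`); JavaScriptCore's real value encoding differs. Zero fill means untouched registers are recognizably empty, and the tag means an integer that happens to look like a heap address is never treated as a pointer, which is exactly what a conservative scan cannot rule out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical heap object; alignment guarantees the low pointer bit is 0.
struct alignas(8) Cell {
    bool marked = false;
};

// Hypothetical tagged register encoding:
//   0          -> never written (register file is zero-filled)
//   low bit 1  -> immediate integer, stored as (value << 1) | 1
//   otherwise  -> pointer to a Cell
using Register = uintptr_t;

inline Register makeInt(intptr_t i) { return (static_cast<uintptr_t>(i) << 1) | 1; }

inline Register makeCell(Cell* c)
{
    Register r = reinterpret_cast<uintptr_t>(c);
    assert(r != 0 && (r & 1) == 0);
    return r;
}

inline bool isCell(Register r) { return r != 0 && (r & 1) == 0; }

class RegisterFile {
public:
    explicit RegisterFile(std::size_t capacity) : m_registers(capacity, 0) {}  // zero fill
    Register& operator[](std::size_t i) { return m_registers[i]; }

    // Exact mark: only registers tagged as cells are followed. Returns the
    // number of cells marked.
    std::size_t markLive()
    {
        std::size_t count = 0;
        for (Register r : m_registers) {
            if (isCell(r)) {
                reinterpret_cast<Cell*>(r)->marked = true;
                ++count;
            }
        }
        return count;
    }

private:
    std::vector<Register> m_registers;
};
```

The zero fill has to be maintained when frames are popped and reused (or registers cleared on entry); otherwise stale cell pointers left by a dead frame would keep garbage alive.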

Automatic conversion of "this" to the global object doesn't work.

If we do enough work to prove it's viable compatibility-wise, we can remove support for function.caller and reduce support for function.arguments such that we only support it in a statically detectable form inside "function". (See js1_4/Functions/function-001.js for details on the origins of function.arguments.) That way, we could avoid all runtime tracking of call frames, simplifying the engine, and enable function inlining and other optimizations.

Does global/eval code always need a full scope chain, since global eval may be called at any time?

Phase out implementsCall in favor of having all clients use an inline function that calls getCallData.

Phase out implementsConstruct in favor of having all clients use an inline function that calls getConstructData.
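A sketch of the replacement helpers. The `CallType`/`ConstructType` enums, `CallData`/`ConstructData` structs, and the `FunctionObject` class below are simplified stand-ins modeled loosely on JavaScriptCore's getCallData()/getConstructData() protocol; their exact members and signatures are assumptions.

```cpp
#include <cassert>

// Simplified stand-ins; the real JavaScriptCore types carry more data.
enum CallType { CallTypeNone, CallTypeHost, CallTypeJS };
enum ConstructType { ConstructTypeNone, ConstructTypeHost, ConstructTypeJS };
struct CallData { void* function = nullptr; };
struct ConstructData { void* function = nullptr; };

class JSObject {
public:
    virtual ~JSObject() = default;
    virtual CallType getCallData(CallData&) { return CallTypeNone; }
    virtual ConstructType getConstructData(ConstructData&) { return ConstructTypeNone; }
};

// Hypothetical callable, constructible object.
class FunctionObject : public JSObject {
public:
    CallType getCallData(CallData&) override { return CallTypeJS; }
    ConstructType getConstructData(ConstructData&) override { return ConstructTypeJS; }
};

// Inline replacements for the implementsCall()/implementsConstruct() virtuals.
inline bool implementsCall(JSObject* object)
{
    assert(object);
    CallData callData;
    return object->getCallData(callData) != CallTypeNone;
}

inline bool implementsConstruct(JSObject* object)
{
    assert(object);
    ConstructData constructData;
    return object->getConstructData(constructData) != ConstructTypeNone;
}
```

With this design, each class overrides one virtual per operation, so "can it be called?" and "how do I call it?" can never disagree.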

For memory's sake, functions should probably shrink the register file when they return, but doing so causes a minor performance regression.

Make built-in object construct functions non-virtual, since their callers inside the engine know their types.

Pointers to registers and labels become invalid if the register or label vector resizes.
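One common fix, sketched here as an assumption rather than the planned design, is to hand out indices instead of pointers and resolve them at each use; the same reasoning applies to the earlier question about Lists pointing directly into the register file. `RegisterIndex` and `RegisterVector` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical handle naming a register by position, not by address.
struct RegisterIndex {
    std::size_t index;
};

class RegisterVector {
public:
    RegisterIndex newRegister(int initialValue)
    {
        // push_back may reallocate: any int* previously taken into m_registers dangles.
        m_registers.push_back(initialValue);
        return RegisterIndex{m_registers.size() - 1};
    }

    // Resolve the handle at the point of use; don't keep the returned
    // reference across a call that can add registers.
    int& operator[](RegisterIndex r)
    {
        assert(r.index < m_registers.size());
        return m_registers[r.index];
    }

private:
    std::vector<int> m_registers;
};
```

Holding `RegisterIndex first` across a thousand `newRegister` calls stays valid, whereas `int* p = &regs[first]` taken before those calls would likely point into freed storage afterwards.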

Mark constant pools for global and eval code

Avoid copying the register file when adding globals by keeping spare capacity at the beginning of the register file, just like at the end.
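A sketch of spare capacity at the front, assuming a hypothetical layout `[ headroom | globals | frames ... ]` in which globals are addressed at negative offsets from a fixed base. Adding a global just moves the start of the used region into the headroom; only when the headroom runs out does anything get copied, and even then offsets from the base stay valid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical register file; m_base is where frames start.
class RegisterFile {
public:
    RegisterFile(std::size_t headroom, std::size_t frameCapacity)
        : m_buffer(headroom + frameCapacity, 0), m_base(headroom), m_start(headroom) {}

    // Adds a global and returns its offset from base (always negative).
    std::ptrdiff_t addGlobal(int value)
    {
        if (m_start == 0)
            growHeadroom();  // the only path that copies anything
        m_buffer[--m_start] = value;
        return static_cast<std::ptrdiff_t>(m_start) - static_cast<std::ptrdiff_t>(m_base);
    }

    int& at(std::ptrdiff_t offsetFromBase)
    {
        std::ptrdiff_t i = static_cast<std::ptrdiff_t>(m_base) + offsetFromBase;
        assert(i >= static_cast<std::ptrdiff_t>(m_start));
        assert(i < static_cast<std::ptrdiff_t>(m_buffer.size()));
        return m_buffer[static_cast<std::size_t>(i)];
    }

    std::size_t copies() const { return m_copies; }

private:
    // Doubles the buffer with all new space in front, shifting base by the
    // same amount so existing offsets from base remain correct.
    void growHeadroom()
    {
        std::size_t extra = std::max<std::size_t>(m_buffer.size(), 8);
        std::vector<int> bigger(m_buffer.size() + extra, 0);
        std::copy(m_buffer.begin(), m_buffer.end(),
                  bigger.begin() + static_cast<std::ptrdiff_t>(extra));
        m_buffer.swap(bigger);
        m_base += extra;
        m_start += extra;
        ++m_copies;
    }

    std::vector<int> m_buffer;
    std::size_t m_base;
    std::size_t m_start;
    std::size_t m_copies = 0;
};
```

With 4 slots of headroom, the first four globals cost no copying; the fifth triggers one copy, after which earlier globals and frame registers are still reachable through their old offsets.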

GC mark for possibly uninitialized register file

Add relevant files to AllInOneFile.cpp.

Remove irrelevant files.

Replace resolve_base_and_func:

  • Statically detect functions that use "this", and emit an "op_fix_this" instruction for them that does the isActivationObject check. Normal function calls don't need to do it.
  • For built-in functions, have a "thisObject" accessor on List that lazily fixes up "this", or just fix up "this" in the native function invocation code, since most native functions use "this".
  • Remove resolve_base_and_func, and use resolve_base_and_value in its place.

What things should go in dedicated local variables? CodeBlock::jsValues? CodeBlock::identifiers?

VarStatementNode should just be nixed in favor of AssignmentNode. (Note, though, that VarStatementNode produces undefined, whereas an ExpressionStatement would produce the assigned value.)

Remove ::execute, ::evaluate, and ::optimizeVariableAccess.

Future optimizations:

Find a way to put pre-capacity at the beginning of the register file, so we can add new global symbols without having to move or copy anything.

Use RefPtr to indicate use of a register -- moves to unreferenced registers should be stripped or folded into other instructions.

  • i++ => ++i
  • less, jtrue => jless
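A sketch of the two rewrites as a peephole pass. The instruction struct, the `label` pseudo-op (jump targets are label ids, so deleting instructions never breaks them), and the `readsOf` use count standing in for RefPtr-based tracking are all assumptions. It also assumes registers are local to this code list, so a register nobody reads is truly dead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical three-address instruction; unused fields are -1.
// For "label", `target` holds the label's own id.
struct Insn {
    std::string op;
    int dst = -1, src1 = -1, src2 = -1, target = -1;
};

// Number of instructions reading register r (stand-in for RefPtr use counts).
static int readsOf(const std::vector<Insn>& code, int r)
{
    int n = 0;
    for (const Insn& insn : code)
        n += (insn.src1 == r) + (insn.src2 == r);
    return n;
}

std::vector<Insn> peephole(const std::vector<Insn>& in)
{
    std::vector<Insn> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const Insn& cur = in[i];
        // less rT, a, b ; jtrue rT, L  =>  jless a, b, L  (rT read only by the jtrue)
        if (cur.op == "less" && i + 1 < in.size()) {
            const Insn& next = in[i + 1];
            if (next.op == "jtrue" && next.src1 == cur.dst && readsOf(in, cur.dst) == 1) {
                out.push_back({"jless", -1, cur.src1, cur.src2, next.target});
                ++i;  // consumed both instructions
                continue;
            }
        }
        // post_inc rT, x  =>  pre_inc x  (the old value in rT is never read)
        if (cur.op == "post_inc" && readsOf(in, cur.dst) == 0) {
            out.push_back({"pre_inc", cur.src1, cur.src1});
            continue;
        }
        out.push_back(cur);
    }
    return out;
}
```

Because a label is itself an instruction in the stream, a jump target between `less` and `jtrue` makes them non-adjacent, so the pass never fuses across a point that something else can jump to.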

Optimize out redundant initializations of vars -- often, the var initialization will be dead code. Any read of a variable before its initialization can statically become "load undefined".

A single run of SunSpider performs 1,191,803 var initializations.

In the buckets below, -1 means "never happened".

var buckets: [846461] [40445] [350197] [9412] [7531] [50] [9] [178] [35000] [3] [1022] [3] [4499] [-1] [1353] [-1] [-1] [1851] [-1] [0] [-1] [0] [-1] [0] [-1] [0] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1]

fun buckets: [1297008] [7] [3] [2] [1] [0] [-1] [0] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [0] [-1] [1] [-1] [-1] [0] [-1] [0] [-1] [-1] [-1] [-1] [-1] [999] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] ]

For resolve-evaluate-put, we can have a { DontCare, Clean, Dirty } switch: get the slot, and if the switch is DontCare, set it to Clean; evaluate; then set the slot directly only if the switch is still Clean.
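One possible reading of this note, sketched under loud assumptions: the cached slot is a raw pointer into property storage that may be reallocated while the right-hand side runs, so anything that reshapes the storage flips a Clean switch to Dirty, and the put re-resolves the slot in that case. The `Object` class, its vector-backed storage, and `resolveEvaluatePut` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

enum class SlotState { DontCare, Clean, Dirty };

// Hypothetical object: values live in a vector, so adding a property can
// reallocate storage and invalidate previously returned slot pointers.
class Object {
public:
    SlotState slotState = SlotState::DontCare;

    int* getSlot(const std::string& name)
    {
        auto it = m_offsets.find(name);
        if (it == m_offsets.end()) {
            it = m_offsets.emplace(name, m_storage.size()).first;
            m_storage.push_back(0);  // may move every existing value
            if (slotState == SlotState::Clean)
                slotState = SlotState::Dirty;  // someone holds a cached slot
        }
        return &m_storage[it->second];
    }

private:
    std::unordered_map<std::string, std::size_t> m_offsets;
    std::vector<int> m_storage;
};

// name = evaluate(): resolve first, evaluate, then put. Returns true if the
// cached slot pointer was used for the put.
template <typename Evaluate>
bool resolveEvaluatePut(Object& base, const std::string& name, Evaluate evaluate)
{
    int* slot = base.getSlot(name);
    bool ownsSwitch = base.slotState == SlotState::DontCare;  // nested puts don't own it
    if (ownsSwitch)
        base.slotState = SlotState::Clean;
    int value = evaluate();  // may add properties to `base`
    bool clean = ownsSwitch && base.slotState == SlotState::Clean;
    if (ownsSwitch)
        base.slotState = SlotState::DontCare;
    if (!clean)
        slot = base.getSlot(name);  // cached pointer may be stale: look it up again
    *slot = value;
    return clean;
}
```

In the common case (evaluation doesn't reshape the object) the put reuses the slot found during resolution; only a disturbing evaluation pays for a second lookup.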

Instead of branching to see whether code has been emitted, just start out with a stub that does the emitting when invoked.
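A sketch of the self-replacing stub, assuming a hypothetical `FunctionBody` with an `entry` function pointer; "code generation" is faked by filling a vector. The first call goes through `compileStub`, which generates code and repoints `entry`, so later calls jump straight to the compiled code with no "has this been compiled?" test.

```cpp
#include <cassert>
#include <vector>

struct FunctionBody;
using EntryPoint = int (*)(FunctionBody&, int);

// Hypothetical function representation.
struct FunctionBody {
    EntryPoint entry;
    std::vector<int> bytecode;  // stands in for generated code
    int compileCount = 0;
};

// Toy "execution": the generated code is a list of constants to add.
static int executeCompiled(FunctionBody& f, int arg)
{
    int result = arg;
    for (int k : f.bytecode)
        result += k;
    return result;
}

static int compileStub(FunctionBody& f, int arg)
{
    assert(f.compileCount == 0);  // the stub should only ever run once
    ++f.compileCount;
    f.bytecode = {1, 2, 3};     // pretend code generation
    f.entry = executeCompiled;  // patch: later calls bypass the stub
    return executeCompiled(f, arg);
}

inline FunctionBody makeFunction() { return FunctionBody{compileStub, {}, 0}; }

// Callers always go through `entry`; there is no compiled/not-compiled branch.
inline int call(FunctionBody& f, int arg) { return f.entry(f, arg); }
```

The call is an indirect jump either way; the stub trick just removes the extra conditional from every call after the first.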

single, shared constant pool

At least for loops with fewer iterations, it would probably be a win to duplicate the loop condition at the start and end of the loop.
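A sketch comparing the two layouts of `while (cond) body;` by counting dispatched instructions, assuming hypothetical pseudo-ops (`cond`, `body`, `jfalse`, `jtrue`, `jmp`) with absolute jump targets. With the test only at the top, each iteration costs `cond`, `jfalse`, `body`, `jmp`; with the condition duplicated at the bottom, it costs `body`, `cond`, `jtrue`, saving one jump per iteration at the price of emitting the condition twice.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical pseudo-bytecode with an optional absolute jump target.
struct Op {
    std::string name;
    int target = -1;
};

// while (cond) body;  test only at the top.
std::vector<Op> emitTopTested()
{
    return { {"cond"}, {"jfalse", 4}, {"body"}, {"jmp", 0}, {"end"} };
}

// Same loop with the condition duplicated at the bottom.
std::vector<Op> emitInverted()
{
    return { {"cond"}, {"jfalse", 5}, {"body"}, {"cond"}, {"jtrue", 2}, {"end"} };
}

// Runs the code where `cond` is true for the first `iterations` evaluations;
// returns how many instructions were dispatched (not counting "end").
int countDispatches(const std::vector<Op>& code, int iterations)
{
    int dispatched = 0, condsTrue = 0;
    bool flag = false;
    int pc = 0;
    while (code[pc].name != "end") {
        const Op& op = code[pc];
        ++dispatched;
        if (op.name == "cond") {
            flag = condsTrue < iterations;
            if (flag)
                ++condsTrue;
            ++pc;
        } else if (op.name == "jfalse") {
            pc = flag ? pc + 1 : op.target;
        } else if (op.name == "jtrue") {
            pc = flag ? op.target : pc + 1;
        } else if (op.name == "jmp") {
            pc = op.target;
        } else {
            ++pc;  // body
        }
    }
    return dispatched;
}
```

For n iterations this gives 4n + 2 dispatches for the top-tested form and 3n + 2 for the inverted form (14 versus 11 when n = 3); both cost 2 when the loop body never runs.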

Perhaps we should have a distinguished "condition code" register for expressions in a boolean context. Relational and logical operators can output directly to the condition code register; other opcodes need an extra instruction. Jump instructions can read implicitly from the condition code register. That way, less no longer writes to r0; it just puts a bool in the condition code register.
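A toy machine with a condition-code flag, assuming hypothetical opcodes `LessCC` (relational op writing the flag directly), `TestCC` (the extra instruction needed for non-relational values), and `JumpIfCC` (reads the flag implicitly). No general-purpose register is written to hold the comparison result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcodes for a machine with a dedicated condition-code flag.
enum class Op { LessCC, TestCC, JumpIfCC, Add, Halt };

struct Insn {
    Op op;
    int a = 0, b = 0, target = 0;
};

struct Machine {
    std::vector<int> r;
    bool cc = false;  // condition-code register

    void run(const std::vector<Insn>& code)
    {
        assert(!code.empty() && code.back().op == Op::Halt);
        for (std::size_t pc = 0; code[pc].op != Op::Halt;) {
            const Insn& i = code[pc];
            switch (i.op) {
            case Op::LessCC:   cc = r[i.a] < r[i.b]; ++pc; break;  // no register written
            case Op::TestCC:   cc = r[i.a] != 0; ++pc; break;      // extra insn for plain values
            case Op::JumpIfCC: pc = cc ? static_cast<std::size_t>(i.target) : pc + 1; break;
            case Op::Add:      r[i.a] += r[i.b]; ++pc; break;
            case Op::Halt:     break;  // unreachable: the loop stops at Halt
            }
        }
    }
};
```

A counting loop `do { i += 1; } while (i < n);` becomes `Add`, `LessCC`, `JumpIfCC`, where the register form would need `less r_tmp, i, n` followed by `jtrue r_tmp`.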

Can't you just make all opcodes have variants that use constant table operands directly?

A named function expression can just enter its name into the symbol table instead of adding an object to the scope chain.

Shrink instructions -- we usually don't need a whole word to store int values. Perhaps use tagging of opcodes to encode the first operand, with special workaround instructions for when whole words are needed.
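A sketch of one possible packing, assuming a hypothetical 32-bit encoding: the low 7 bits hold the opcode, bit 7 flags a "wide" form, and the high 24 bits hold a signed first operand. Operands outside the 24-bit range use the wide form, whose operand occupies the next full word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding constants.
constexpr uint32_t kOpcodeBits = 8;
constexpr uint32_t kWideFlag = 0x80;           // set in the low byte for wide forms
constexpr int32_t kMaxNarrow = (1 << 23) - 1;  // signed 24-bit range
constexpr int32_t kMinNarrow = -(1 << 23);

void emit(std::vector<uint32_t>& out, uint8_t opcode, int32_t operand)
{
    assert(opcode < kWideFlag);
    if (operand >= kMinNarrow && operand <= kMaxNarrow) {
        out.push_back((static_cast<uint32_t>(operand) << kOpcodeBits) | opcode);
    } else {
        out.push_back(opcode | kWideFlag);            // workaround: wide form
        out.push_back(static_cast<uint32_t>(operand));
    }
}

// Decodes the instruction at code[pc]; returns the index of the next one.
std::size_t decode(const std::vector<uint32_t>& code, std::size_t pc, uint8_t& opcode, int32_t& operand)
{
    uint32_t word = code[pc];
    opcode = static_cast<uint8_t>(word & 0x7F);
    if (word & kWideFlag) {
        operand = static_cast<int32_t>(code[pc + 1]);
        return pc + 2;
    }
    // Relies on two's complement and arithmetic right shift, which GCC and
    // Clang provide (and C++20 guarantees) to sign-extend the 24-bit field.
    operand = static_cast<int32_t>(word) >> kOpcodeBits;
    return pc + 1;
}
```

Small operands such as 42 or -7 fit in one word alongside the opcode; 100000000 does not, so it takes the two-word wide form.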

GCC is crazy:

For the program

for (var i = 0; i < 100000000; ++i)
    ;

at r31276 of the squirrelfish branch, adding the line

Machine.cpp:354         scopeChain = new (&returnInfo[6]) ScopeChain(function->scope()); // scope chain for this activation

causes a ~25% slowdown

We should write a reduction of this issue for the compiler team and see what they have to say.

Revision 31432 was a 1.4% performance regression because it moved the register vector from a local to a parameter. Making the register vector a data member has the same effect. WTF?

Exception handling throw logic has to pass vPC to a function and assign the result to vPC, e.g.:

  if (!(vPC = throwException(codeBlock, k, scopeChain, registers, r, vPC)))

But this causes a 25% regression on the empty for loop test above, despite never being executed. To avoid this, we need to do:

void* throwTarget;
...
void Machine::privateExecute(..)
{
    ...
    // in address table initialiser
    throwTarget = &&gcc_dependency_hack;
    ...
    BEGIN_OPCODE(op_throw) {
        ...
        if (!(exceptionTarget = throwException(codeBlock, k, scopeChain, registers, r, vPC))) { ... }
        ...
        goto *throwTarget;
    }
    gcc_dependency_hack:
    {
        vPC = exceptionTarget;
        NEXT_OPCODE;
    }
}

Without this _indirect_ goto, we get a 25% regression; with a direct goto, we still get an 18% regression.

Last modified on May 16, 2008 6:46:46 PM
