


List of tasks in no order -- pick one, tell everyone what you're doing:

Geoff is planning on:

squirrelfish unit tests

Function calls to native functions (requires revamping the List class to be an alias to the register file -- Geoff has some ideas for making this fast)

Arguments object

Cameron is working on:

Investigating performance regressions caused by the introduction of new opcodes. In particular, a simple for loop with no body regresses about 25%. It seemed at first that this has to do with the fact that these instructions call arbitrary external functions, but some odd performance differences still occur without these calls. Done: the regression seems to be related to inlining of large function bodies into Machine::privateExecute(). In order to solve this, we have to remove ALWAYS_INLINE from some larger functions, and move some opcodes out into individual functions marked NEVER_INLINE. This is suboptimal, and we still don't know the exact reason why it is happening, but hopefully it will be easy enough to work around for now. Geoff and I worked this out and he landed r31277, which fixes the problems we have seen thus far.

Implementing all of the left-over binary and unary operations.

Better code generation. We have been pondering whether to have a separate peephole optimization pass or to incorporate peephole optimization into code generation. Either way, we should look at some code generation algorithms based on tile matching. We also want to choose an approach that will be compatible with planned extensions, e.g. superinstructions.

Oliver is working on:

Zero-cost exception handling, using a table -- waiting on native function call support

break and continue

Sam is working on (when he sees fit to do so):

Implement more "emitCode" functions, along with support in the CodeGenerator and the Machine. IfNode would be a good place to start.

You could take something from Geoff, or make something up yourself, or do one of these:

Optimize dynamic scopes that aren't closures not to save the environment on return

Statically detect presence of "with" and/or "catch" in the parser.

For functions that don't use "with" and/or "catch" (and that don't require activation objects), just use the function's scope chain directly, instead of creating a meaningless copy that will never be modified.

Evaluation of a script is supposed to produce a value. This requires storing the value of the last value-producing statement to execute. We need to detect the last top-level value-producing statement in a program, and save its value. Basically, that just means passing an explicit "dst" register to its emitCode function.

Make const work -- const info has to go in the symbol table, so writes to const vars can turn to no-ops at compile time.
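A minimal sketch of the const idea above, assuming a symbol table keyed by name with a per-symbol const flag. `Symbol`, `Move`, and `Generator` are hypothetical names for illustration, not the real CodeGenerator classes. The declaration's own initializer still writes the register; only later assignments are dropped. The right-hand side of a dropped assignment would still need to be evaluated for side effects, which this sketch does not model.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical symbol table entry: which register holds the var, and whether it is const.
struct Symbol { int reg; bool isConst; };

// Hypothetical "move src into dst" instruction.
struct Move { int dst; int src; };

class Generator {
public:
    void declare(const std::string& name, bool isConst)
    {
        int reg = static_cast<int>(m_symbols.size());
        m_symbols.insert({ name, Symbol { reg, isConst } });
    }

    // The initializer in the declaration always writes, const or not.
    void emitInitialize(const std::string& name, int src)
    {
        m_code.push_back(Move { m_symbols.at(name).reg, src });
    }

    // A later assignment to a const var emits nothing. Returns whether code was emitted.
    bool emitAssign(const std::string& name, int src)
    {
        const Symbol& symbol = m_symbols.at(name);
        if (symbol.isConst)
            return false;
        m_code.push_back(Move { symbol.reg, src });
        return true;
    }

    std::size_t instructionCount() const { return m_code.size(); }

private:
    std::map<std::string, Symbol> m_symbols;
    std::vector<Move> m_code;
};
```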

Make global code work, including:

global code should only emit "overwrite var with undefined" if the var doesn't exist already.
Global code needs to deal with new vars being added. Stupid solution: always allocate a new vector, add new, then copy old. Better solution: provide pre-capacity to the existing vector. Alternative solution: just renumber the vars -- does any code depend on the old numbers? 

Function call should store offset of R, not R, since vector may reallocate. This probably solves most problems related to new evaluations in same global object, since they all occur beneath function calls

  • arguments object also holds a pointer into the register file -- probably needs to be an indirect index, instead
  • activation objects also hold pointers into the register file -- ditto
  • lists, if we decide not to make them copy
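The reallocation hazard above can be sketched as follows, assuming the register file is backed by a std::vector. `Register`, `RegisterFile`, and `CallFrame` are hypothetical stand-ins, not the real classes. The frame saves an offset from the base of the file and recomputes the pointer on demand, so it survives a reallocation that would leave a saved raw pointer dangling.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only.
using Register = long;

struct RegisterFile {
    std::vector<Register> registers;

    // Growing may reallocate the storage and move every register.
    void grow(std::size_t extra) { registers.resize(registers.size() + extra); }
};

// Saves the caller's register window as an offset from the base of the file.
struct CallFrame {
    std::size_t callerOffset;

    Register* callerRegisters(RegisterFile& file) const
    {
        return file.registers.data() + callerOffset;
    }
};

// Writes a value, grows the file, then reads the value back through the saved offset.
Register roundTripThroughGrowth(Register value)
{
    RegisterFile file;
    file.registers.resize(4);
    CallFrame frame { 2 };
    *frame.callerRegisters(file) = value;
    file.grow(1 << 16); // likely reallocates; a raw Register* saved earlier would dangle
    return *frame.callerRegisters(file);
}
```

The same indirection would apply to the arguments object and activation objects, at the cost of one add per access.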

Verify that current function gets marked by virtue of being in the register file

Make functions mark their CodeBlocks' constant pools

Change conservative mark of register file to exact mark -- use zero fill plus type tagging to know whether to mark a register
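One way the exact mark could look, as a rough sketch only. It assumes each register carries a small type tag whose zero value means "empty", so a zero-filled register is never mistaken for a pointer. `Tag`, `Register`, `Cell`, and `markRegisters` are hypothetical names, not the real types.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical heap cell with a mark bit.
struct Cell { bool marked = false; };

// Tag value 0 is Empty, so zero-filled registers are skipped by the marker.
enum class Tag : std::uint8_t { Empty = 0, Int, CellPointer };

struct Register {
    Tag tag = Tag::Empty;
    union {
        std::int64_t integer;
        Cell* cell;
    };
    Register() : integer(0) { }
};

// Marks only registers tagged as cell pointers. Returns how many cells were newly marked.
std::size_t markRegisters(std::vector<Register>& registers)
{
    std::size_t newlyMarked = 0;
    for (Register& r : registers) {
        if (r.tag != Tag::CellPointer || !r.cell)
            continue; // Empty or Int: never treated as a pointer, whatever its bits are
        if (!r.cell->marked) {
            r.cell->marked = true;
            ++newlyMarked;
        }
    }
    return newlyMarked;
}
```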

Must mark all scope chains in all active scopes -- can do this by walking up the scopeChain pointers in the register file

List should just be a pointer and a length. This allows us to avoid copying arguments when calling from JS to native code. Most list clients know the size of the list in advance, so they can statically stack-allocate their data, and then vend a pointer and a length. A few clients don't know the size of the list until runtime. They can use a JSCellArray, which is a JSCell that holds a pointer to a fixed-size calloc'd array, which it marks. This might not work exactly as stated once we store types in registers, since the JSValues won't be immediately adjacent anymore.
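A minimal sketch of the pointer-and-length List, for the common case where the caller knows the size in advance. `JSValue` is reduced to a double here, and `ArgList`, `nativeSum`, and `callWithThreeArgs` are hypothetical names; the JSCellArray case is not shown.

```cpp
#include <cstddef>

using JSValue = double; // hypothetical stand-in for the real value type

// A non-owning view: just a pointer and a length, so nothing is copied.
class ArgList {
public:
    ArgList(const JSValue* values, std::size_t size)
        : m_values(values)
        , m_size(size)
    {
    }

    std::size_t size() const { return m_size; }

    // Out-of-range reads return 0 in this sketch; real code would return undefined.
    JSValue at(std::size_t i) const { return i < m_size ? m_values[i] : 0; }

private:
    const JSValue* m_values;
    std::size_t m_size;
};

// A native function reads its arguments through the view.
JSValue nativeSum(const ArgList& args)
{
    JSValue total = 0;
    for (std::size_t i = 0; i < args.size(); ++i)
        total += args.at(i);
    return total;
}

// A client that knows the size statically stack-allocates and vends pointer + length.
JSValue callWithThreeArgs(JSValue a, JSValue b, JSValue c)
{
    JSValue storage[3] = { a, b, c };
    return nativeSum(ArgList(storage, 3));
}
```

When the call comes from JS, the pointer could point straight into the register file instead of `storage`, which is where the copy is avoided.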

Pointers to registers and labels become invalid if the register or label vector resizes.

GC mark for constant pools

GC mark for possibly uninitialized register file

Add relevant files to AllInOneFile.cpp.

Remove irrelevant files.

What things should go in dedicated local variables? CodeBlock::jsValues? CodeBlock::identifiers?

VarStatementNode should just be nixed in favor of AssignmentNode.

Remove ::execute, ::evaluate, ::optimizeVariableAccess

Future optimizations:

Use RefPtr to indicate use of register -- moves to un-refed registers should be stripped or consolidated to other instructions.

  • i++ => ++i
  • less, jtrue => jless
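The less, jtrue => jless rewrite could be sketched as a peephole pass like the one below. The instruction layout and the `unreferenced` table (standing in for the RefPtr-based "nobody else uses this register" information) are assumptions for illustration. Jump targets are assumed to be symbolic label ids, so removing an instruction needs no fix-up, and no label is assumed to sit between the less and the jtrue.

```cpp
#include <cstddef>
#include <vector>

enum class Op { Less, JTrue, JLess, Other };

// Hypothetical operand layout:
//   Less:  a = dst, b = lhs, c = rhs
//   JTrue: a = condition register, b = target label
//   JLess: a = lhs, b = rhs, c = target label
struct Instruction {
    Op op;
    int a;
    int b;
    int c;
};

// Fuses "less dst, lhs, rhs; jtrue dst, target" into "jless lhs, rhs, target"
// when dst has no other users, so its value is dead after the jump.
std::vector<Instruction> fuseLessJTrue(const std::vector<Instruction>& in, const std::vector<bool>& unreferenced)
{
    std::vector<Instruction> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const Instruction& cur = in[i];
        if (cur.op == Op::Less && i + 1 < in.size()) {
            const Instruction& next = in[i + 1];
            bool dstIsDead = cur.a >= 0
                && static_cast<std::size_t>(cur.a) < unreferenced.size()
                && unreferenced[cur.a];
            if (next.op == Op::JTrue && next.a == cur.a && dstIsDead) {
                out.push_back({ Op::JLess, cur.b, cur.c, next.b });
                ++i; // skip the jtrue that was folded in
                continue;
            }
        }
        out.push_back(cur);
    }
    return out;
}
```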

optimize out redundant initializations of vars -- often, the var initialization will be dead code. any read of variable before init can statically become "load undefined".

a single run of SunSpider performs 1,191,803 var initializations

-1 means "never happened"

var buckets: [846461] [40445] [350197] [9412] [7531] [50] [9] [178] [35000] [3] [1022] [3] [4499] [-1] [1353] [-1] [-1] [1851] [-1] [0] [-1] [0] [-1] [0] [-1] [0] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1]

fun buckets: [1297008] [7] [3] [2] [1] [0] [-1] [0] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [0] [-1] [1] [-1] [-1] [0] [-1] [0] [-1] [-1] [-1] [-1] [-1] [999] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] [-1] ]

for resolve-evaluate-put, we can have a { DontCare, Clean, Dirty } switch -- get slot and if DontCare, set clean, evaluate, set slot if clean

instead of branching to see if you've emitted code, just start out with a stub that does that emitting when invoked.
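A sketch of the stub idea, assuming each function body holds an entry-point pointer that starts out aimed at an emitting stub. `FunctionBody`, `emitThenRun`, and `runEmittedCode` are hypothetical names, and the "emitted code" is faked with a trivial function.

```cpp
struct FunctionBody;
using EntryPoint = int (*)(FunctionBody&, int);

struct FunctionBody {
    EntryPoint entry; // starts out pointing at the stub
    int emitCount;    // how many times code generation ran
};

// Stands in for executing the emitted bytecode.
int runEmittedCode(FunctionBody&, int arg)
{
    return arg * 2;
}

// The stub: emits code on first call, then repoints entry so later calls
// go straight to the emitted code with no "have we emitted yet?" branch.
int emitThenRun(FunctionBody& body, int arg)
{
    ++body.emitCount;            // stands in for code generation
    body.entry = runEmittedCode;
    return body.entry(body, arg);
}
```

A caller always invokes `body.entry(body, arg)`; only the first call pays for emission.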

single, shared constant pool

At least for loops with fewer iterations it would probably be a win to duplicate the loop condition at the start and end of the loop
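To illustrate the loop-condition duplication above, here is a sketch that writes both loop shapes with explicit gotos and counts the jump instructions each one executes. The names and the counting scheme are made up for illustration. With the condition only at the top, every iteration runs a conditional test plus an unconditional back edge; with the condition duplicated at the start and end, every iteration runs a single conditional back edge.

```cpp
struct LoopStats {
    long sum;
    long jumpsExecuted;
};

// Condition tested only at the top: test + unconditional back edge per iteration.
LoopStats conditionAtTop(long n)
{
    LoopStats s = { 0, 0 };
    long i = 0;
top:
    ++s.jumpsExecuted; // conditional exit test
    if (!(i < n))
        goto end;
    s.sum += i;
    ++i;
    ++s.jumpsExecuted; // unconditional jump back to the test
    goto top;
end:
    return s;
}

// Condition duplicated at entry and at the bottom: one conditional jump per iteration.
LoopStats conditionDuplicated(long n)
{
    LoopStats s = { 0, 0 };
    long i = 0;
    ++s.jumpsExecuted; // entry test, executed once
    if (!(i < n))
        goto end;
top:
    s.sum += i;
    ++i;
    ++s.jumpsExecuted; // conditional back edge
    if (i < n)
        goto top;
end:
    return s;
}
```

For n iterations the first shape executes 2n + 1 jumps and the second n + 1, at the cost of emitting the condition twice.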

Perhaps we should have a distinguished "condition code" register for expressions in a boolean context. For relational and logical operators we can output directly to the condition code register; for other opcodes you get an extra instruction. Jump instructions can read implicitly from the condition code. That way less avoids writing to r0 and just puts a bool in the condition code register.

Can't you just make all opcodes have variants that use constant table operands directly?

A named function expression can just enter its name into the symbol table instead of adding an object to the scope chain.

Shrink instructions -- usually, don't need a whole word to store int values. Perhaps use tagging of opcodes to encode the first operand. Special work-around instructions when whole words are needed

GCC is crazy:

For the program

for (var i = 0; i < 100000000; ++i);

at r31276 of the squirrelfish branch, adding the line

Machine.cpp:354         scopeChain = new (&returnInfo[6]) ScopeChain(function->scope()); // scope chain for this activation

causes a ~25% slowdown

We should write a reduction of this issue for the compiler team, and see what they have to say
Revision 31432 was a 1.4% performance regression because it moved the register vector from a local to a parameter. Making the register vector a data member has the same effect. WTF?