SquirrelFishPerfIdeas – WebKit

Version 12 (modified by ggaren@apple.com, 18 years ago)

Ideas for new optimizations for SquirrelFish

Reduce the cost of calling with too many arguments. At least one SunSpider test does this a lot. Some possibilities:
- Allocate a few extra parameter slots for every function, so passing just a few extra arguments is cheap.
- If we can limit or remove the functionality of randomFunc.arguments, then for functions that do not themselves use arguments we could just overwrite the extra args.

Make variants of instructions that can read directly from the constant pool, to avoid all the "load" insns you get when dealing with constants.
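A minimal sketch of the idea, assuming a toy register machine rather than the real SquirrelFish bytecode. The names Opcode, Insn, and run are hypothetical. AddConst reads its second operand straight from the constant pool, so the separate LoadConst disappears from the instruction stream.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical opcodes for a toy register VM (not the SquirrelFish opcode set).
enum class Opcode { LoadConst, Add, AddConst };

struct Insn {
    Opcode op;
    std::size_t dst; // destination register
    std::size_t a;   // register index, or constant index for LoadConst
    std::size_t b;   // register index for Add, constant index for AddConst
};

// Executes `code` and returns how many instructions were dispatched.
std::size_t run(const std::vector<Insn>& code,
                const std::vector<double>& constants,
                std::vector<double>& regs) {
    std::size_t dispatched = 0;
    for (const Insn& in : code) {
        ++dispatched;
        switch (in.op) {
        case Opcode::LoadConst: // copy a constant into a register
            regs[in.dst] = constants[in.a];
            break;
        case Opcode::Add:       // both operands come from registers
            regs[in.dst] = regs[in.a] + regs[in.b];
            break;
        case Opcode::AddConst:  // second operand read directly from the pool
            regs[in.dst] = regs[in.a] + constants[in.b];
            break;
        }
    }
    return dispatched;
}
```

With this layout, "r0 = r0 + 10" takes one dispatch (AddConst) instead of two (LoadConst, then Add). The cost is a larger opcode set.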

Atomize constant strings, so any given string is only in the constant pool once.
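One way this could look, as a sketch only: a pool that hands back the existing index when the same string is added again. ConstantPool and intern are hypothetical names, and the real constant pool holds JS values rather than std::string.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical constant pool that stores each distinct string exactly once.
class ConstantPool {
public:
    // Returns the pool index for `s`, adding it only if it is not present yet.
    std::size_t intern(const std::string& s) {
        auto it = index_.find(s);
        if (it != index_.end())
            return it->second; // already atomized: reuse the existing slot
        std::size_t id = strings_.size();
        strings_.push_back(s);
        index_.emplace(s, id);
        return id;
    }

    const std::string& at(std::size_t id) const { return strings_[id]; }
    std::size_t size() const { return strings_.size(); }

private:
    std::vector<std::string> strings_;                    // pool storage
    std::unordered_map<std::string, std::size_t> index_;  // string -> pool index
};
```

Two uses of the same literal then share one pool entry, and string equality between pool constants reduces to an index comparison.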

Maybe sorting opcode implementations by frequency of use would make things faster.

Avoid creation of wrapper objects for primitives just to call methods, when this can be avoided. ES4 defines a way that seems like it could work with 100% back compatibility (make the wrapper object only when "this" is accessed inside a method.) A slightly simpler possibility is to reuse a single wrapper but taint it when used as "this" so that it's known not to be safe to use as the shared one any more.

Larger ideas:

Store primitive types directly in registers, along with type info. Tamarin's 64-bit NaN encoding trick may work well here. We could have both instructions that rely on a statically inferred type, and dynamic type inference, which uses instructions that assume a particular type is more likely and optimize for that case with checks.
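For illustration, a small sketch of a 64-bit NaN encoding. The tag value and the names Value, boxInt, and boxDouble are assumptions made for this example and are not taken from Tamarin or JavaScriptCore. Doubles are stored unchanged; an int32 lives in the payload of a NaN bit pattern that ordinary arithmetic never produces once real NaNs are canonicalized.

```cpp
#include <cstdint>
#include <cstring>

// Assumed layout: top 16 bits == 0xFFF9 marks an int32 payload; every other
// bit pattern is interpreted as a double.
constexpr std::uint64_t kTagMask      = 0xFFFF000000000000ULL;
constexpr std::uint64_t kIntTag       = 0xFFF9000000000000ULL;
constexpr std::uint64_t kCanonicalNaN = 0x7FF8000000000000ULL;

struct Value { std::uint64_t bits; };

inline Value boxDouble(double d) {
    Value v;
    if (d != d) {               // any NaN: store one canonical pattern so it
        v.bits = kCanonicalNaN; // can never collide with the int tag
        return v;
    }
    std::memcpy(&v.bits, &d, sizeof v.bits);
    return v;
}

inline Value boxInt(std::int32_t i) {
    std::uint32_t low;
    std::memcpy(&low, &i, sizeof low);
    return Value{kIntTag | low};
}

inline bool isInt(Value v)    { return (v.bits & kTagMask) == kIntTag; }
inline bool isDouble(Value v) { return !isInt(v); }

inline std::int32_t asInt(Value v) {
    std::uint32_t low = static_cast<std::uint32_t>(v.bits); // keep low 32 bits
    std::int32_t i;
    std::memcpy(&i, &low, sizeof i);
    return i;
}

inline double asDouble(Value v) {
    double d;
    std::memcpy(&d, &v.bits, sizeof d);
    return d;
}
```

A type check is then a mask and compare on the register itself, with no heap cell to dereference.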

Explicit vtables. Right now we use C++ virtual methods for polymorphic behavior of JS types. An explicit vtable could store per-type pure data as well as functions, turning some things that are currently virtual method calls into simple pointer derefs.
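A rough sketch of what an explicit vtable might look like, with hypothetical names (TypeInfo, Cell, kNumberType) that do not correspond to the actual JavaScriptCore classes. The per-type table holds plain data next to function pointers, so a query like "is this a number?" becomes a field read instead of a virtual call.

```cpp
struct Cell;

// Hypothetical per-type table: pure data alongside function pointers.
struct TypeInfo {
    const char* typeName;            // pure data
    bool isNumber;                   // pure data: answered by a pointer deref
    double (*toNumber)(const Cell*); // behavior: still an indirect call
};

struct Cell {
    const TypeInfo* type; // explicit vtable pointer
    double payload;       // number value, or 1.0/0.0 for booleans
};

double numberToNumber(const Cell* c)  { return c->payload; }
double booleanToNumber(const Cell* c) { return c->payload != 0.0 ? 1.0 : 0.0; }

const TypeInfo kNumberType  = {"number",  true,  numberToNumber};
const TypeInfo kBooleanType = {"boolean", false, booleanToNumber};

// No function call: two loads and a test.
inline bool isNumber(const Cell* c) { return c->type->isNumber; }

// Polymorphic behavior still dispatches through the table.
inline double toNumber(const Cell* c) { return c->type->toNumber(c); }
```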

Better codegen framework. We don't have a great way to pick one of several candidate instructions; a "tile matching" algorithm may be a good way to do it. This could enable super-instructions, type-specialized instructions, and handling of the fact that you may want different codegen in value, condition and void contexts.

Cull frameworks like dojo, jQuery, MochiKit, Scriptaculous, Prototype, YUI, and ASP.NET AJAX for patterns to optimize for. For example, ASP.NET AJAX is obsessed with the "arguments" object.

Produce code that expects but verifies int values when we see expressions like "<<" and "++".
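A sketch of "expect but verify" for "++", assuming a simplified register representation (the Reg struct and increment function are hypothetical). The fast path handles the int case after a cheap check; anything else, including int32 overflow, falls back to the generic double path.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical register contents: either an int32 or a double.
struct Reg {
    bool isInt;
    std::int32_t i;
    double d;
};

// Increment specialized for ints, with a guard for the cases it cannot handle.
Reg increment(Reg v) {
    // Fast path: the operand is an int and the result still fits in int32.
    if (v.isInt && v.i != std::numeric_limits<std::int32_t>::max())
        return Reg{true, v.i + 1, 0.0};

    // Slow path: generic numeric addition on doubles.
    double d = v.isInt ? static_cast<double>(v.i) : v.d;
    return Reg{false, 0, d + 1.0};
}
```

The verification is what keeps the specialization safe: a wrong guess costs a branch, not a wrong answer.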

Optimize for inner functions that don't close over outer functions: don't create an activation for the outer function.

Optimize for inner functions that only close over a few free variables: don't copy or create scope chain entries for the rest.

Simple static type analysis should be easy for temporaries: when a node produces a known exact type (either because it is a constant or due to the nature of the operator), it could annotate the register, and instructions depending on that value could assume that type. Handling locals would be trickier, though, as you need some dataflow analysis.
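As a sketch of the temporaries case, assuming a toy expression tree (NodeKind, Node, and inferType are hypothetical names, not the real parser nodes). Constants and operators with a fixed result type yield a known type; locals stay Unknown because tracking them would need the dataflow analysis mentioned above.

```cpp
enum class Type { Unknown, Number, Boolean, String };
enum class NodeKind { NumberConst, StringConst, Local, Add, Sub, Less };

// Hypothetical AST node; lhs/rhs are null for leaves.
struct Node {
    NodeKind kind;
    const Node* lhs;
    const Node* rhs;
};

// Returns the exact result type of `n` when it is statically known.
Type inferType(const Node& n) {
    switch (n.kind) {
    case NodeKind::NumberConst: return Type::Number;  // constant: type is known
    case NodeKind::StringConst: return Type::String;
    case NodeKind::Local:       return Type::Unknown; // needs dataflow analysis
    case NodeKind::Sub:         return Type::Number;  // "-" always yields a number
    case NodeKind::Less:        return Type::Boolean; // "<" always yields a boolean
    case NodeKind::Add: {
        // "+" is numeric addition or string concatenation, depending on operands.
        Type l = inferType(*n.lhs);
        Type r = inferType(*n.rhs);
        if (l == Type::Number && r == Type::Number) return Type::Number;
        if (l == Type::String || r == Type::String) return Type::String;
        return Type::Unknown; // conservative answer for everything else
    }
    }
    return Type::Unknown;
}
```

For example, in "1 + (x - 1)" the subtraction is known to produce a number even though x is not, so the addition could be emitted as a number-specialized instruction.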

In principle, local registers for variables not captured by a closure could be reused for temporaries (or other locals) after last use, but this would require live range analysis to determine and a smarter register allocator to make use of.

Perhaps removing any need for a PIC branch inside Machine::privateExecute would earn back a general purpose register for GCC to allocate.

Analysis of SunSpider tests that show little improvement

These tests don't show as much improvement from SquirrelFish as expected (though in some cases they are as much as 7% improved).

3d-morph: Suffering from missing cross-scope access optimization and DontDelete global optimization (9.4% deep time in resolve()). Probably suffering from lack of static type inference (lots of time in jsNumberCell, JSImmediate, NumberImp::toNumber, etc.).

access-nbody: Major factor seems to be lack of type specialization (lots of time in number-related stuff). A huge proportion of time is flat time in privateExecute. Some hit (1%?) from lack of multiscope.

date-format-tofte: Lots of time spent in parsing and code generation for eval. Can codegen itself be optimized? Also lots of time in makeFunction(); a big chunk of this is making the empty prototype for the function object, as well as setting the special properties (prototype, constructor, length). Perhaps those could be handled in a smarter way. There is also function call overhead for FuncDeclNode::makeFunction itself; perhaps it should be inlined. It is suspicious that call overhead for makeFunction would be a bottleneck: is it getting called more often than it should? Also taking some hit from lack of multiscope.

date-format-xparb: taking a 5% hit from lack of multiscope optimization. Also spending an awful lot of time in string appends.

math-partial-sums: Lots of time in resolve/resolveBase (lack of global opt + possible bug in test). Hit from lack of type specialization. This is taking a fair hit from the VM_EXCEPTION_CHECK call in put_prop_id. We should move this onto our new zero-cost exception check scheme (pass vPC in and out etc).

regexp-dna: practically all time in this test is in the regexp engine. Will need regexp engine hacking to improve.

string-base64: 5% of time spent in slow global lookup. Other improvement opportunities: some form of immediate for one-char strings; avoid allocating a fresh StringInstance wrapper all the time.

string-fasta: Lots of time spent in resolve/resolveBaseAndFunc. Probably needs multiscope optimization. Lots of time is spent on getEnumerablePropertyNames (used by for..in) - 25%. Bears investigating.

string-tagcloud: String.replace passes excess arguments to its function and this is slow in our new calling convention. Creates StringInstance wrappers. Lots of time spent in parsing (~15%).

string-unpack-code: String.replace passes excess arguments to its function and this is slow in our new calling convention. A few percent eaten by resolve (but appears to end up on an activation - nested funcs or eval?). Fair bit of time making identifiers.

string-validate-input: Multiscope. This is spending a fair amount of time in string concatenation, also.
