Posts tagged JavaScript

JavaScript’s two array types

Jul 16th, 2012

Imagine a BitSet constructor with an overloaded API for setting bits:

var bits = new BitSet();

bits.set(4);
bits.set([1, 4, 8, 17]);

The interface for BitSet.prototype.set is:

// set :: (number | [number]) -> undefined

Now imagine a StringSet constructor with an overloaded API for adding strings:

var set = new StringSet();

set.add('foo');
set.add(['foo', 'bar', 'baz']);
set.add({ foo: true, bar: true, baz: true });

The interface for StringSet.prototype.add is something like:

// add :: (string | [string] | object) -> undefined

These both look pretty similar, but there’s a critical difference. Think about how you might implement BitSet.prototype.set:

BitSet.prototype.set = function set(x) {
    // number case
    if (typeof x === 'number') {
        this._add1(x);
        return;
    }
    // array case
    for (var i = 0, n = x.length; i < n; i++) {
        this._add1(x[i]);
    }
};

Now think about how you might implement StringSet.prototype.add:

StringSet.prototype.add = function add(x) {
    // string case
    if (typeof x === 'string') {
        this._add1(x);
        return;
    }
    // array case
    if (/* hmmmm... */) {
        for (var i = 0, n = x.length; i < n; i++) {
            this._add1(x[i]);
        }
        return;
    }
    // object case
    for (var key in x) {
        if ({}.hasOwnProperty.call(x, key)) {
            this._add1(key);
        }
    }
};

What’s the difference? BitSet.prototype.set doesn’t have to test whether its argument is an array. It’ll work for any object that acts like an array (i.e., one with indexed properties and a numeric length property). It’ll accept an arguments object, a NodeList, a custom object you create that acts like an array, or even a primitive string.
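For example, a hypothetical setMany helper (made up here for illustration) can forward its arguments object straight through:

function setMany() {
    // arguments is array-like: it has a numeric length and indexed properties
    bits.set(arguments);
}

setMany(1, 4, 8, 17); // equivalent to bits.set([1, 4, 8, 17])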

But StringSet.prototype.add actually needs a test to see if x is an array. How do you distinguish between arrays and objects when JavaScript arrays are objects?

One answer you’ll sometimes see is what I call “duck testing”: use some sort of heuristic that probably indicates the client intended the argument to be an array:

if (typeof x.length === 'number') {
    // ...
}

Beware the word “probably” in programming! Duck testing is a horribly medieval form of computer science:

For example, what happens when a user passes in a dictionary object that contains the key 'length'?

symbolTable.add({ a: 1, i: 1, length: 1 });

The user clearly intended this to be the dictionary case, but the duck test saw a numeric 'length' property and gleefully proclaimed “it’s an array!”

This comes down to the difference between nominal and structural types.

A nominal type is a type that has a unique identity or “brand.” It carries a tag with it that can be atomically tested to distinguish it from other types.

A structural type, also known as a duck type, is a kind of interface: it’s just a contract that mandates certain behaviors, but doesn’t say anything about what specific implementation is used to provide that behavior. The reason people have such a hard time figuring out how to test for structural types is that they are designed specifically not to be testable!
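A quick sketch of the contrast (sum is a made-up example function):

// Nominal: test the brand directly.
Array.isArray([1, 2, 3]);             // true
Array.isArray({ 0: 'a', length: 1 }); // false, despite acting array-like

// Structural: don't test, just use the contract.
function sum(xs) {
    var total = 0;
    for (var i = 0, n = xs.length; i < n; i++) {
        total += xs[i];
    }
    return total;
}

sum([1, 2, 3]);                 // 6
sum({ 0: 1, 1: 2, length: 2 }); // 3: any array-like works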

There are a few common scenarios in dynamically typed languages where you need to do dynamic type testing, such as error checking, debugging, and introspection. But the most common case is when implementing overloaded APIs like the set and add methods above.

The BitSet.prototype.set method treats arrays as a structural type: its argument can be any kind of value whatsoever, as long as it has indexed properties and a corresponding length. But StringSet.prototype.add overloads array and object types, so it has to check for “arrayness.” And you can’t reliably check for structural types.

It’s specifically when you overload arrays and objects that you need a predictable nominal type test. One answer would be to punt and change the API so the client has to explicitly tag the variants:

set.add({ key: 'foo' });
set.add({ array: ['foo', 'bar', 'baz'] });
set.add({ dict: { foo: true, bar: true, baz: true } });
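This overloads three different object types that can be distinguished by their relevant property names, so the dispatch needs no arrayness test at all. A rough sketch, reusing the internal _add1 helper from the examples above:

StringSet.prototype.add = function add(x) {
    // tagged string case
    if ('key' in x) {
        this._add1(x.key);
        return;
    }
    // tagged array case
    if ('array' in x) {
        for (var i = 0, n = x.array.length; i < n; i++) {
            this._add1(x.array[i]);
        }
        return;
    }
    // tagged dictionary case
    for (var key in x.dict) {
        if ({}.hasOwnProperty.call(x.dict, key)) {
            this._add1(key);
        }
    }
};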

Or you could get rid of overloading altogether:

set.add('foo');
set.addArray(['foo', 'bar', 'baz']);
set.addDict({ foo: true, bar: true, baz: true });

But these APIs are heavier and clunkier. Rather than rigidly avoiding overloading arrays and objects, the lighter-weight approach is to use JavaScript’s latent notion of a “true” array: an object whose [[Class]] internal property is "Array". That internal property serves as the brand for a built-in nominal type of JavaScript. And it’s a pretty good candidate for a universally available nominal type: clients get the concise array literal syntax, and the ES5 Array.isArray function (which can be shimmed pretty reliably in older JavaScript engines) provides the exact test needed to implement the API.
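Concretely, the mystery test in StringSet.prototype.add becomes Array.isArray(x), and the classic shim leans on Object.prototype.toString reporting the [[Class]] internal property:

// Shim for pre-ES5 engines.
if (typeof Array.isArray !== 'function') {
    Array.isArray = function isArray(x) {
        return {}.toString.call(x) === '[object Array]';
    };
}

// The array case in StringSet.prototype.add, filled in:
if (Array.isArray(x)) {
    for (var i = 0, n = x.length; i < n; i++) {
        this._add1(x[i]);
    }
    return;
}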

But this test is very different from the structural type accepted by BitSet.prototype.set. For example, you can’t pass an arguments object to StringSet.prototype.add:

MyClass.prototype.update = function update() {
    this.wibbles.add(arguments);
};

This code clearly means to pass arguments as an array, but it’ll get interpreted as a dictionary. Similarly, you can’t pass a NodeList, or a primitive string, or any other JavaScript value that acts array-like.

In other words, JavaScript has two latent concepts of array types. Library writers should clearly document when their APIs accept any array-like value (i.e., the structural type) and when they require a true array (i.e., the nominal type). That way clients know whether they need to convert array-like values to true arrays before passing them in.
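Before Array.from (below), the standard idiom for that conversion was to borrow Array.prototype.slice; for example, the update method above could be fixed like this:

MyClass.prototype.update = function update() {
    // slice copies any array-like value into a true array
    this.wibbles.add(Array.prototype.slice.call(arguments));
};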

As a final note, ES6’s Array.from API will do that same conversion directly. This makes it convenient, for example, to fix the update method above:

MyClass.prototype.update = function update() {
    this.wibbles.add(Array.from(arguments));
};

Thanks to Rick Waldron for helping me come to this understanding during an awesome IRC conversation this morning.

Tweaking the JavaScript AST API

Jul 3rd, 2012

A couple years ago I created a JavaScript parser API and implemented SpiderMonkey’s Reflect.parse library. Since then, there have been a couple of pure JavaScript implementations of the API, including Zach Carter’s reflect.js and Ariya Hidayat’s Esprima parser.
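To give a flavor of the API, Reflect.parse takes a string of source code and returns a plain object AST; for example:

var ast = Reflect.parse("f(x)");
ast.type;                    // "Program"
ast.body[0].type;            // "ExpressionStatement"
ast.body[0].expression.type; // "CallExpression"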

Over time, I’ve gotten a bunch of good critiques about the API from people. I probably don’t want to make any huge changes, but there are a few small changes that would be nice:

  • Bug 770567 - rename callee to constructor to match the documentation
  • Bug 742612 - separate guarded/unguarded catch clauses
  • Bug 745678 - range-based location info

Ariya is graciously willing to change Esprima to keep in sync with SpiderMonkey. But some of these would affect existing clients of either library. I wanted to post this publicly to ask if anyone would be opposed to us making these changes. Ariya and I would make sure to be very clear about when we’re making each change, and we’d try to batch the changes so that people don’t have to repeatedly update their code.

Feel free to leave a comment if you are using Esprima or Reflect.parse and have thoughts about this.

Synchronous module loading in ES6

Mar 29th, 2012

One of the great features of ES6 modules is the direct-style module loading syntax:

import map from "underscore.js";
... map(a, f) ...

This makes it as frictionless as possible to grow or refactor your code into multiple modules, and to pull third-party modules into an existing codebase. It also provides a common module format that can be shared between the browser and JS servers like Node.

But this direct style requires a module’s dependencies to be loaded before it can execute. That is, it’s a synchronous module load. Put in the context of a script tag, this would make it all too easy to block page rendering on I/O:

<script>
import $ from "jquery.js";
$('#myelement').css({ 'background-color': 'yellow' });
</script>

Dropping this syntax into the browser would be an invitation to jank. Thanks to insight from Luke Hoban, I think we have the right approach to this for ES6, which is in fact similar to our approach to avoiding turning eval into a synchronous I/O operation.

In previous versions of ECMAScript, there’s only one syntactic category of program that you can evaluate, called Program in the grammar. In ES6, we’ll define a restricted version of the syntax to be used in synchronous settings, which makes it illegal to do synchronous loads. Within a blocking script, the only access to modules is via the dynamic loading API:

<script>
System.load("jquery.js", function($) {
    $('#myelement').css({ 'background-color': 'yellow' });
});
</script>

This eliminates the footgun, and all of your modules can themselves use the synchronous loading syntax. For example, if jquery.js wants to use a module — say, a data structure library — it can go ahead and load it synchronously:

// jquery.js
import Stack from "utils.js";
... new Stack() ...

But still, this restriction at the top level loses the convenience of directly importing modules from scripts. Thing is, in an asynchronous context, there’s nothing wrong with doing a synchronous load. So just as the asynchronously loaded jquery.js can use the synchronous syntax, we can also allow it in a defer or async script:

<script async>
import $ from "jquery.js";
$('#myelement').css({ 'background-color': 'yellow' });
</script>

This allows the full flexibility and expressiveness of ES6 embedded in HTML, without any hazard of blocking page rendering for I/O.

The eval function for ES6 will work the same way, disallowing synchronous loading syntax in the grammar it recognizes, to prevent turning it into a synchronous API. We’ll also add an asynchronous version of eval that, like script async, recognizes the full grammar.

Why coroutines won’t work on the web

Dec 14th, 2011

The topic of coroutines (or fibers, or continuations) for JavaScript comes up from time to time, so I figured I’d write down my thoughts on the matter. I admit to having a soft spot for crazy control-flow features like continuations, but they’re unlikely ever to make it into ECMAScript. With good reason.

The big justification for coroutines in JavaScript is non-blocking I/O. As we all know, asynchronous I/O leads to callback APIs, which lead to nested lambdas, which lead to… the pyramid of doom:

range.on("preheat", function() {
    pot.on("boil", function() {
        rice.on("cooked", function() {
            dinner.serve(rice);
        });
    });
});

Whereas, if you look at the README for node-fibers, you’ll see this pleasant-looking example:

Fiber.run(function() {
    console.log('wait...' + new Date);
    sleep(1000);
    console.log('ok...' + new Date);
});

That looks pretty sweet. It’s a synchronous version of setTimeout that doesn’t block the main thread. It seems like a nice combination of the sequential style of synchronous code with the responsiveness of non-blocking I/O. Why wouldn’t we want something like this in ECMAScript?

Coroutines are almost as pre-emptive as threads

Part of the beauty of JavaScript’s event loop is that there’s a very clear synchronization point for reaching a stable state in your programs: the end of the current turn. You can go ahead and leave things in a funky intermediate state for as long as you like, and as long as you stitch everything back up in time for the next spin of the event loop, no other code can run in the meantime. That means you can be sure that while your object is lying in pieces on the floor, nobody else can poke at it before you put it back together again.

Once you add coroutines, you never know when someone might call yield. Any function you call has the right to pause and resume you whenever it wants, even after any number of spins of the event loop. Now any time you find yourself modifying state, you start worrying that calling a function might interrupt some code you intended to be transactional. Take something as simple as swapping a couple of fields of an object:

var tmp = obj.foo;
obj.foo = obj.bar;
obj.bar = munge(tmp);

What happens if munge does a yield and only resumes your code after a few other events fire? Those events could interact with obj, and they’d see it in this intermediate state where both obj.foo and obj.bar are the same value, because obj.bar hasn’t yet been updated.
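To make the hazard concrete, here’s a hypothetical munge written against a fibers-style API (Fiber.yield is node-fibers’ suspension primitive; the rest is contrived for illustration):

function munge(v) {
    // Suspend this fiber. The event loop keeps spinning, and any code
    // that runs in the meantime can observe obj mid-swap.
    Fiber.yield();
    return v + 1;
}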

We’ve seen this movie before. This is just like Java’s threads, where any time you’re working with state, you have to worry about who might try to touch your data before it reaches a stable point. To be fair, life is actually far worse in Java, where almost every single basic operation of the language can be pre-empted. But still, with coroutines, every function call becomes a potential pre-emption point.

Host frames make coroutines unportable

And then there’s the implementation problem. Unless your JavaScript engine doesn’t use a stack (and they all do), coroutines would have to be able to save a stack on the heap and restore it back on the stack later. But what if JavaScript code calls into code implemented in the host language (usually C++)? Some engines implement functions like Array.prototype.forEach in C++. How would they handle code like this?

Fiber.run(function() {
    array.forEach(function(x) {
        console.log('wait: ' + x);
        sleep(1000);
        console.log('ok: ' + x);
    });
});

Other languages with coroutines take different approaches. Lua allows implementations to throw an error if user code tries to suspend host activations. For JavaScript this would simply be unportable, since different engines implement different parts of their standard libraries in C++.

The Scheme community tends to demand a lot from their continuations, so they expect functions like for-each and map to be suspendable. This could mean either forcing all the standard libraries to be self-hosted, or using more complicated implementation strategies than traditional stacks.

Simply put: browser vendors are not going to do this. Modern JS engines are extraordinary feats of engineering, and rearchitecting their entire stack mechanism is just not realistic. Then when you consider that these changes could hurt performance of ordinary function calls, well… end of discussion.

Shallow coroutines to the rescue

OK, back to the pyramid of doom. It really does kind of suck. I mean, you could name and lift out your functions, but then you break up the sequential flow even worse, and you get a combinatorial explosion of function arguments for all those upvars.

This is why I’m excited about generators. Generators are a lot like coroutines, with one important difference: they only suspend their own function activation. In ES6, yield isn’t a function that anyone can use, it’s a built-in operator that only a generator function can use. With generators, calling a JS function is as benign as it ever was. You never have to worry that a function call might yield and stop you from doing what you were trying to do.
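A minimal sketch, using the function* syntax planned for ES6:

// yield suspends only counter's own activation; callers of next()
// are never themselves suspended.
function* counter() {
    var i = 0;
    while (true) {
        yield i++;
    }
}

var g = counter();
g.next().value; // 0
g.next().value; // 1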

But it’s still possible to build an API similar to node-fibers. This is the idea of task.js. The fibers example looks pretty similar in task.js:

Task(function*() {
    console.log('wait... ' + new Date);
    yield sleep(1000);
    console.log('ok... ' + new Date);
}).run();

The big difference is that the sleep function doesn’t implicitly yield; instead, it returns a promise. The task then explicitly yields the promise back to the task.js scheduler. When the promise is fulfilled, the scheduler wakes the task back up to continue. Hardly any wordier than node-fibers, but with the benefit that you can always tell when and what you’re suspending.
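For illustration, a promise-returning sleep can be as simple as this (written here with the now-standard Promise constructor, which postdates task.js):

// Resolves after ms milliseconds instead of blocking.
function sleep(ms) {
    return new Promise(function (resolve) {
        setTimeout(resolve, ms);
    });
}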

Coroutines no, generators yes

Coroutines are not going to happen in JavaScript. They would break one of the best features of JavaScript: the simplicity of the event loop execution model. And the demands they would place on current engines for portability are simply unrealistic. But generator functions are easy to add to existing engines, they have none of the portability issues of coroutines, and they give you just enough power to write non-blocking I/O in a synchronous style without being “threads lite.”