We will be hosting a virtual doc sprint to work on these pages next Thursday, August 23rd. If you enjoy writing documentation or coming up with bite-sized example programs to demonstrate new language features, please join us! A few of us will be on US Eastern time, starting around 9 to 10am (UTC-5), and others will come online on US Pacific time, around 9am (UTC-8). You’re welcome to join us for any part of the day.
We’ll be hanging out all day in the #jsdocs channel on irc.mozilla.org. Hope you can join us!
These both look pretty similar, but there’s a critical difference. Think about how you might implement BitSet.prototype.set:
    BitSet.prototype.set = function set(x) {
        // number case
        if (typeof x === 'number') {
            this._add1(x);
            return;
        }
        // array case
        for (var i = 0, n = x.length; i < n; i++) {
            this._add1(x[i]);
        }
    };
Now think about how you might implement StringSet.prototype.add:
What’s the difference? BitSet.prototype.set doesn’t have to test whether its argument is an array. It’ll work for any object that acts like an array (i.e., has indexed properties and a numeric length property). It’ll even accept values like an arguments object, a NodeList, some custom object you create that acts like an array, or even a primitive string.
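To make the structural case concrete, here is a minimal sketch. The BitSet constructor, the _add1 helper, and the has method are my assumptions for illustration (their real implementations aren't shown here); only set follows the code above.

```javascript
// Hypothetical minimal BitSet: members are stored as bits in 32-bit words.
function BitSet() {
  this._words = [];
}

// Assumed helper: add a single non-negative integer.
BitSet.prototype._add1 = function (n) {
  var word = n >>> 5; // index of the 32-bit word holding bit n
  this._words[word] = (this._words[word] || 0) | (1 << (n & 31));
};

// Assumed helper: membership test.
BitSet.prototype.has = function (n) {
  return ((this._words[n >>> 5] || 0) & (1 << (n & 31))) !== 0;
};

BitSet.prototype.set = function set(x) {
  // number case
  if (typeof x === 'number') {
    this._add1(x);
    return;
  }
  // array case: anything with indexed properties and a length works
  for (var i = 0, n = x.length; i < n; i++) {
    this._add1(x[i]);
  }
};

var bits = new BitSet();
bits.set(3);                                  // a number
bits.set([4, 5]);                             // a true array
bits.set({ 0: 40, 1: 41, length: 2 });        // a custom array-like object
(function () { bits.set(arguments); })(7, 8); // an arguments object
```

None of the four calls needs an array test, because set never asks what kind of object it was handed.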
But StringSet.prototype.add actually needs a test to see if x is an array. How do you distinguish between arrays and objects when JavaScript arrays are objects?
One answer you’ll sometimes see is what I call “duck testing”: use some sort of heuristic that probably indicates the client intended the argument to be an array:
    if (typeof x.length === 'number') {
        // ...
    }
Beware the word “probably” in programming! Duck testing is a horribly medieval form of computer science.
For example, what happens when a user passes in a dictionary object that contains the key 'length'?
    symbolTable.add({ a: 1, i: 1, length: 1 });
The user clearly intended this to be the dictionary case, but the duck test saw a numeric 'length' property and gleefully proclaimed “it’s an array!”
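Here is a small sketch of that misfire. The looksLikeArray function is a hypothetical name I'm giving to the duck test above; it isn't part of any library.

```javascript
// The heuristic from above: anything with a numeric length "is" an array.
function looksLikeArray(x) {
  return typeof x.length === 'number';
}

var dictionary = { a: 1, i: 1, length: 1 }; // intended as a set of keys
var realArray = ['a', 'i'];

var duckOnDictionary = looksLikeArray(dictionary); // misclassified as an array
var duckOnArray = looksLikeArray(realArray);       // correctly classified
```

Both calls give the same answer, so the heuristic can't tell the two cases apart.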
This comes down to the difference between nominal and structural types.
A nominal type is a type that has a unique identity or “brand.” It carries a tag with it that can be atomically tested to distinguish it from other types.
A structural type, also known as a duck type, is a kind of interface: it’s just a contract that mandates certain behaviors, but doesn’t say anything about what specific implementation is used to provide that behavior. The reason people have such a hard time figuring out how to test for structural types is that they are designed specifically not to be testable!
There are a few common scenarios in dynamically typed languages where you need to do dynamic type testing, such as error checking, debugging, and introspection. But the most common case is when implementing overloaded APIs like the set and add methods above.
The BitSet.prototype.set method treats arrays as a structural type: they can be any kind of value whatsoever as long as they have indexed properties and a corresponding length property. But StringSet.prototype.add overloads array and object types, so it has to check for “arrayness.” And you can’t reliably check for structural types.
It’s specifically when you overload arrays and objects that you need a predictable nominal type test. One answer would be to punt and change the API so the client has to explicitly tag the variants:
This overloads three different object types that can be distinguished by their relevant property names. Or you could get rid of overloading altogether:
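Both alternatives might look something like the following sketch. Everything here is an assumption for illustration: the StringSet internals, the tag names string, array, and dict, and the method names addTagged, addString, addArray, and addDict.

```javascript
// Hypothetical string set; all names are illustrative, not from a real library.
function StringSet() {
  this._members = Object.create(null);
}
StringSet.prototype._add1 = function (s) { this._members[s] = true; };
StringSet.prototype.has = function (s) { return s in this._members; };

// Alternative 1: the client tags which variant it is passing.
StringSet.prototype.addTagged = function (variant) {
  if ('string' in variant) {
    this.addString(variant.string);
  } else if ('array' in variant) {
    this.addArray(variant.array);
  } else if ('dict' in variant) {
    this.addDict(variant.dict);
  }
};

// Alternative 2: no overloading at all, one method per kind of argument.
StringSet.prototype.addString = function (s) { this._add1(s); };
StringSet.prototype.addArray = function (a) {
  for (var i = 0, n = a.length; i < n; i++) {
    this._add1(a[i]);
  }
};
StringSet.prototype.addDict = function (d) {
  for (var key in d) {
    if (Object.prototype.hasOwnProperty.call(d, key)) {
      this._add1(key);
    }
  }
};

var symbolTable = new StringSet();
symbolTable.addTagged({ string: 'foo' });
symbolTable.addTagged({ array: ['bar', 'baz'] });
symbolTable.addTagged({ dict: { a: 1, i: 1, length: 1 } });
symbolTable.addArray(['quux']);
```

Neither version ever has to guess whether an object is an array, but every call site pays for that with extra syntax.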
But these APIs are heavier and clunkier. Rather than rigidly avoiding overloading arrays and objects, the lighter-weight approach is to use JavaScript’s latent notion of a “true” array: an object whose [[Class]] internal property is "Array". That internal property serves as the brand for a built-in nominal type of JavaScript. And it’s a pretty good candidate for a universally available nominal type: clients get the concise array literal syntax, and the ES5 Array.isArray function (which can be shimmed pretty reliably in older JavaScript engines) provides the exact test needed to implement the API.
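A minimal sketch of the overloaded method built on that test, assuming a hypothetical StringSet whose internals (_members, _add1, has) are mine and not from any real library:

```javascript
// Hypothetical string set; internals are assumptions for illustration.
function StringSet() {
  this._members = Object.create(null);
}
StringSet.prototype._add1 = function (s) { this._members[s] = true; };
StringSet.prototype.has = function (s) { return s in this._members; };

StringSet.prototype.add = function add(x) {
  // string case
  if (typeof x === 'string') {
    this._add1(x);
    return;
  }
  // array case: a nominal test, so only true arrays take this path
  if (Array.isArray(x)) {
    for (var i = 0, n = x.length; i < n; i++) {
      this._add1(x[i]);
    }
    return;
  }
  // dictionary case: every own key becomes a member
  for (var key in x) {
    if (Object.prototype.hasOwnProperty.call(x, key)) {
      this._add1(key);
    }
  }
};

var symbolTable = new StringSet();
symbolTable.add('foo');
symbolTable.add(['bar', 'baz']);
symbolTable.add({ a: 1, i: 1, length: 1 }); // now treated as a dictionary
```

The dictionary with a length key lands in the dictionary case, which is what the client meant.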
But this test is very different from the structural type accepted by BitSet.prototype.set. For example, you can’t pass an arguments object to StringSet.prototype.add:
This code clearly means to pass arguments as an array, but it’ll get interpreted as a dictionary. Similarly, you can’t pass a NodeList, or a primitive string, or any other JavaScript value that acts array-like.
In other words, JavaScript has two latent concepts of array types. Library writers should clearly document when their APIs accept any array-like value (i.e., the structural type) and when they require a true array (i.e., the nominal type). That way clients know whether they need to convert array-like values to true arrays before passing them in.
As a final note, ES6’s Array.from API will do that exact conversion. This would make it very convenient, for example, for the update method above to be fixed:
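A minimal sketch of such a fix, assuming a hypothetical table type whose update method forwards all of its arguments to a string set. The names MyTable, keys, and update, and the StringSet internals, are illustrative assumptions.

```javascript
// Hypothetical string set whose add method requires a true array.
function StringSet() {
  this._members = Object.create(null);
}
StringSet.prototype.has = function (s) { return s in this._members; };
StringSet.prototype.add = function add(x) {
  if (typeof x === 'string') {
    this._members[x] = true;
  } else if (Array.isArray(x)) {
    for (var i = 0, n = x.length; i < n; i++) {
      this._members[x[i]] = true;
    }
  } else {
    for (var key in x) {
      if (Object.prototype.hasOwnProperty.call(x, key)) {
        this._members[key] = true;
      }
    }
  }
};

// Hypothetical client type.
function MyTable() {
  this.keys = new StringSet();
}

MyTable.prototype.update = function update() {
  // Passing `arguments` directly would hit the dictionary case and add "0", "1", ...
  // Array.from turns the array-like into a true array first.
  this.keys.add(Array.from(arguments));
};

var table = new MyTable();
table.update('x', 'y');
```

The conversion happens at the call site, so add itself can keep its strict nominal test.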
A couple years ago I created a JavaScript parser API and implemented SpiderMonkey’s Reflect.parse library. Since then, there have been a couple of pure JavaScript implementations of the API, including Zach Carter’s reflect.js and Ariya Hidayat’s Esprima parser.
Over time, I’ve gotten a bunch of good critiques about the API from people. I probably don’t want to make any huge changes, but there are a couple of small changes that would be nice:
Bug 770567 - rename callee to constructor to match the documentation
Bug 742612 - separate guarded/unguarded catch clauses
Ariya is graciously willing to change Esprima to keep in sync with SpiderMonkey. But some of these would affect existing clients of either library. I wanted to post this publicly to ask if there’s anyone who would be opposed to us making the change. Ariya and I would make sure to be very clear about when we’re making the change, and we’d try to batch the changes so that people don’t have to keep repeatedly updating their code.
Feel free to leave a comment if you are using Esprima or Reflect.parse and have thoughts about this.
I haven’t spoken enough about the rationale for declarative, static module resolution in ES6 modules. Since multiple module systems exist in pure JS, the concept of modules that involve new syntax is coming across as foreign to people. I’d like to explain the motivation.
First, a quick explanation of what this is about. In a pure-JS system like CommonJS, modules are just objects, and whatever definitions they export can be imported by a client with object property lookup:
    var { stat, exists, readFile } = require('fs');
By contrast, in the ES6 module system, modules are not objects, they’re declarative collections of code. Importing definitions from a module is also declarative:
    import { stat, exists, readFile } from 'fs';
This import is resolved at compile time—that is, before the script starts executing. All the imports and exports of the declarative module dependency graph are resolved before execution. (There’s also an asynchronous dynamic loading API; it’s of course important to be able to defer module loading to runtime. But this post is about the resolution of a declarative module dependency graph.)
On the origin of specs
Node leaders are arguing that we should take more incremental, evolutionary steps, that we should hew more closely to the module systems that exist today. I have a lot of sympathy for the “pave the cowpaths” philosophy, and I often argue for it. But the module systems people have built for JavaScript to date did not have the option of modifying the language. We have an opportunity to move JS in directions where a purely dynamic system could never go.
What are some of those directions?
Fast lookup
Static imports (whether via import or references like m.foo) can always be compiled like variable references. In a dynamic module system, an explicit dereference like m.foo will be an object reference, which will generally require polymorphic inline cache (PIC) guards. If you copy the imported values into locals, they’ll be more optimizable in some cases, but with static modules you always predictably get early binding. Keeping module references as cheap as variable references makes modular programs faster and avoids imposing a tax on modular code.
Early variable checking
Having variable references, including imports and exports, checked before a script starts running is, in my experience, very useful for making sure the basic top-level structure of a program is sane. JavaScript is almost statically scoped, and this is our one and only chance to get there. James Burke dismisses this as a kind of shallow type checking, which he claims is not enough to be useful. My experience in other languages says otherwise — it is super useful! Variable checking is a nice sweet spot where you can still write expressive dynamic programs, but catch really basic and common errors. As Anton Kovalyov points out, unbound variable reporting is a popular feature in JSHint, and it’s so much nicer not to have to run a separate linter to catch these bugs.
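To illustrate the kind of basic error I mean, here is a sketch of a misspelled import in a dynamic system. The fsLike object is a hypothetical stand-in for whatever a require call would return, and readFyle is a deliberate typo.

```javascript
// Stand-in for the object a dynamic module system hands back (hypothetical).
var fsLike = {
  stat: function () { return 'stat'; },
  exists: function () { return true; },
  readFile: function () { return 'contents'; }
};

// Misspelled name: property lookup quietly yields undefined, with no error.
var { readFile, readFyle } = fsLike;

var failure = null;
try {
  readFyle('notes.txt'); // the mistake only surfaces here, at the call site
} catch (e) {
  failure = e;
}
```

With a declarative import of a name the module doesn't export, the same typo is reported before any of the program runs.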
Cyclic dependencies
Allowing cyclic dependencies between modules is really important. Mutual recursion is a fact of programming. It occurs sometimes without you even noticing it. If you try splitting up your program into modules and the system breaks because it can’t handle cycles, the easiest workaround is just to keep everything together in one big module. Module systems should not prevent programmers from splitting up their program however they see fit. They should not provide disincentives from writing modular programs.
This isn’t impossible with dynamic systems, but it tends to be something I see treated as an afterthought by alternative proposals. It’s something we’ve thought very carefully about for ES6. Also, declarative modules allow you to pre-initialize more of the module structure before executing any code, so that you can give better errors if someone refers to a not-yet-assigned export. For example, a let binding throws if you refer to it before it’s been assigned, and you get a clear error message. This is much easier to diagnose than referring to a property of a dynamic module object that just isn’t even there yet, getting undefined, and having to trace the eventual error back to the source.
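The difference between the two failure modes can be sketched without any module system at all. The function and variable names below are illustrative only.

```javascript
// Reading a let binding before its declaration has run throws right away.
function readLetTooEarly() {
  var probe = function () { return value; };
  try {
    probe(); // `value` is in scope but not yet initialized
  } catch (e) {
    return e.name;
  }
  let value = 42;
  return value;
}

// A dynamic module object that is still being filled in just hands back undefined.
var partiallyLoadedModule = {};
var helper = partiallyLoadedModule.helper;

var letResult = readLetTooEarly();
```

The first failure points at the exact reference that came too early; the second only shows up later, wherever helper finally gets used.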
Future-compatibility for macros
One of the things I would love to see in JavaScript’s future is the ability for programmers to come up with their own custom syntax extensions without having to wait for TC39 to add it. Today, people invent new syntax by writing their own compilers. But this is extremely hard to do, and you can’t use different syntax features from different compilers in a single source file.
With macros, you might implement, say, a new cond syntax that makes a nicer alternative to chaining ? : conditionals, and share that via a library:
The cond macro would preprocess this into a chain of conditionals before the program runs. Preprocessing doesn’t work with purely dynamic modules:
    var cond = require('cond.js');
    ...
    // impossible to preprocess because we haven't evaluated the require!
    var type = cond { /* etc */ };
Future-compatibility for types
I joined TC39 in the ill-fated ES4 days, when the committee was working on an optional type system for JS. It was built on sketchy foundations and ultimately fell apart. One of the things that was really lacking was a module system where you could draw a boundary around a section of code and say “this part needs to be type-checked.” Otherwise you never knew if more code was going to be appended later.
Why types? Here’s one reason: JS is fast and getting faster, but it only gets harder to predict performance. With experiments like LLJS, my group at Mozilla is playing with dialects of JS that use types to pre-compile offline and generate some pretty funky JS code optimized for current JITs. But if you could just directly write your high-performance kernels in a typed dialect of JS, modern compilers could go to town with it.
With declarative resolution, you can import and export typed definitions and they can all be checked at compile-time. Dynamic imports can’t be statically checked.
Inter-language modularity
Some people may not care about or want features like macros or types. But JavaScript has to serve many different programmers who come with many different development practices and needs. And one of the ways it can do so is by allowing people to use their own languages that compile to JavaScript. So even if macros or types aren’t in the future of the ECMAScript standard, it’d be pretty great if you could use statically typed or macro-enabled dialects of JS offline that compile to browser-compatible JS. People are already doing this kind of thing today with the Closure compiler’s type checking, or the Roy language, or ClojureScript. A static module system is more universally and straightforwardly compatible with a wider range of languages.
Costs and benefits
The above are some of the benefits that I see to declarative module resolution. Isaac Schlueter says the import syntax adds nothing. That’s unfair and wrong. It’s there for a purpose. I don’t believe that a declarative import syntax is a high cost for the benefit both to ES6 and to potential future editions.
PS: What’s all this about Python?
One last thing: people keep claiming that the ES6 module system came from Python. I don’t even have very much experience with Python. And Python’s modules are more mutable and their scope is more dynamic. Personally, I’ve drawn inspiration from Racket, which has gotten lots of mileage out of its declarative module system. They’ve leveraged static modules to build a macro system, early variable checking, optimized references, dynamic contracts with module-based blame reporting, multi-language interoperability, and a statically typed dialect.
I’m not interested in making JavaScript into some other language. But you can learn a lot from studying precedent in other languages. I’ve seen firsthand the benefits you can get from a declarative module system in a dynamic language.
This feels a little bit like the web platform having opened a door to hell and Zombies running out of it. I wonder if we can ever close it again.
– Malte Ubl
Let’s see if we can. I’ve had a bunch of productive conversations since my post the other day.
I talked about how specifying little-endian would force big-endian browser vendors to choose one implementation strategy—emulate little-endian by byte-swapping and try to optimize as best they can—and concluded that it was better to let them decide for themselves and see how the market shakes out before specifying. But that doesn’t take into account the cost to web developers, which should always be the first priority (mea culpa).
Leaving it unspecified or forcing developers to opt in to a specified endianness taxes developers: it leaves them open to the possibility of their sites breaking on systems they likely can’t even test on, or forces them to make sure they pass the argument (in which case, they’d always be one forgotten argument away from possible bustage on some platform they can’t test on).
Imagine that instead of defaulting to unspecified behavior, we defaulted to little-endian—which is the de facto semantics of the web today—but apps could opt in to big-endian with an optional argument. Then a carefully-written app could use this (in combination with, say, a navigator.endianness feature test API) to decide which byte order would give them better performance. On little-endian systems, they’d use little-endian, on big-endian systems, they’d use big-endian. Less carefully-written apps that just went with the default might get some performance degradation in big-endian platforms, but we don’t actually know how bad it would be. But crucially, there would be no way to accidentally break your app’s behavior.
But let me take it one step further. I don’t even think we know that that additional option will be needed. For now, we don’t even know of any big-endian user agents that are implementing WebGL, nor do we know if byte-swapping will be prohibitively expensive. Until then, I say any additional API surface area is premature optimization. YAGNI.
In summary: let’s prioritize web developers over hypothetical performance issues on hypothetical browsers. Typed arrays should be standardized as little-endian—full stop.
But the typed arrays spec doesn’t specify a byte order. So a browser on a big-endian system (say, a PowerPC console like Xbox or PS3) is allowed to print 0. In short: casting an ArrayBuffer to different types is unportable by default. It’s up to web developers to canonicalize bytes for different architectures.
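The same kind of cast can be turned into a small probe for the platform's byte order. This is a sketch; isLittleEndian is a name I'm choosing here, not a standard API.

```javascript
// Write a 32-bit 1 in the platform's native byte order, then look at the first byte.
function isLittleEndian() {
  var buffer = new ArrayBuffer(4);
  new Uint32Array(buffer)[0] = 1;
  return new Uint8Array(buffer)[0] === 1; // lowest byte first means little-endian
}

var littleEndian = isLittleEndian();
```

Code that casts buffers between element types would need a check like this to canonicalize its bytes by hand.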
Now, we could just require typed arrays to be little-endian, once and for all. After all, almost all platforms are little-endian these days. The few big-endian platforms could just automatically reorder bytes for all typed array accesses. But this would have to be made to work with WebGL, which works by sending application-generated buffers to the GPU. In order to make this work on a big-endian architecture, little-endian-encoded ArrayBuffer data would need to be translated when sending back and forth to the GPU. Technically, this might be possible, but there’s really no evidence that it would have acceptable performance.
On the other hand, can we really trust that web applications will write portable code? Imagine a hashing algorithm that builds an internal ArrayBuffer and casts it to different types. If the code isn’t written portably, it’ll break on a browser implementing big-endian typed arrays.
This leaves big-endian browsers with a nasty decision: try to emulate little-endian typed arrays to protect against unportable application logic, and suffer the complexity and performance costs of translating data back and forth to the GPU, or just hope that not too many web pages break. Or perhaps surface an annoying decision to users: do you want to run this application in fast mode or correct mode?
For now, we should let browser vendors on big-endian systems make that decision, and not force the decision through the spec. If they end up all choosing to emulate little-endian, I’ll be happy to codify that in the standards. As I understand it, TenFourFox can’t support WebGL, so there the best decision is probably to emulate little-endianness. On an Xbox, I would guess WebGL performance would be a higher priority than web sites using internal ArrayBuffers. But I’m not sure. I’d say this is a decision for big-endian browsers to make, but I would greatly welcome their input.
In the meantime, we should do everything we can to make portability more attractive and convenient. For working with I/O, where you need explicit control over endianness, applications can use DataView. For heterogeneous data, there’ll be ES6 structs. Finally, I’d like to add an option for ArrayBuffers and typed arrays to be given an optional explicit endianness:
    var buffer = new ArrayBuffer(1024, "little"); // a little-endian buffer
    var a1 = new Uint32Array(buffer);
    a1[0] = 1;
    var a2 = new Uint8Array(buffer);
    a2[0]; // must be 1, regardless of system architecture
With the endianness specified explicitly, you can still easily write portable logic even when casting—without having to canonicalize bytes yourself. Emscripten and Mandreel could benefit from this increased portability, for example, and I think crypto algorithms would as well. I’ll propose this extension to Khronos and TC39, and discuss it with JavaScript engine implementors.
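For the I/O case mentioned earlier, DataView already takes the byte order as an explicit argument on every multi-byte access. A minimal sketch, with variable names of my own choosing:

```javascript
var buffer = new ArrayBuffer(4);
var view = new DataView(buffer);

view.setUint32(0, 1, true);          // third argument true: little-endian
var firstByteLE = view.getUint8(0);  // the low byte is stored first

view.setUint32(0, 1, false);         // false: big-endian
var firstByteBE = view.getUint8(0);  // the high byte is stored first
var lastByteBE = view.getUint8(3);   // the low byte is stored last
```

Because the byte order is stated at each access, the results don't depend on the architecture the code runs on.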
I’ve never really understood what “homoiconic” is supposed to mean. People often say something like “the syntax uses one of the language’s basic data structures.” That’s a category error: syntax is not a data structure, it’s just a representation of data as text. Or you hear “the syntax of the language is the same as the syntax of its data structures.” But S-expressions don’t “belong” to Lisp; there’s no reason why Perl or Haskell or JavaScript couldn’t have S-expression libraries. And every parser generates a data structure, so if you have a Python parser in Python, then is Python homoiconic? Is JavaScript?
Maybe there’s a more precise way to define homoiconicity, but frankly I think it misses the point. What makes Lisp’s syntax powerful is not the fact that it can be represented as a data structure, it’s that it’s possible to read it without parsing.
Wait, what?
It’s hard to explain these concepts with traditional terminology, because the distinction between reading and parsing simply doesn’t exist for languages without macros.
Parsing vs reading: the compiler’s view
In almost every non-Lispy language ever, the front end of every interpreter and compiler looks pretty much the same:
Take the text, run it through a parser, and you get out an AST. But that’s not how it works when you have macros. You simply can’t produce an AST without expanding macros first. So the front-end of a Lispy language usually looks more like:
What’s this intermediate syntax tree? It’s an almost entirely superficial understanding of your program: it basically does paren-matching to create a tree representing the surface nesting structure of the text. This is nowhere near an AST, but it’s just enough for the macro expansion system to do its job.
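To show how little a reader has to know, here is a toy one sketched in JavaScript. It is an illustration under my own assumptions (atoms are kept as plain strings, and there are no string literals, quotes, or comments), not how any real Lisp reader is implemented.

```javascript
// Split source text into parens and atoms.
function tokenize(text) {
  return text.replace(/\(/g, ' ( ').replace(/\)/g, ' ) ').trim().split(/\s+/);
}

// Build nested arrays from the token stream by matching parens.
function read(text) {
  var tokens = tokenize(text);
  var pos = 0;

  function readForm() {
    if (pos >= tokens.length) {
      throw new SyntaxError('unexpected end of input');
    }
    var token = tokens[pos++];
    if (token === ')') {
      throw new SyntaxError('unexpected )');
    }
    if (token !== '(') {
      return token; // an atom
    }
    var list = [];
    while (tokens[pos] !== ')') {
      list.push(readForm());
    }
    pos++; // consume the closing paren
    return list;
  }

  var form = readForm();
  if (pos !== tokens.length) {
    throw new SyntaxError('trailing input');
  }
  return form;
}

var tree = read('(for (key obj) (print key))');
```

The result is just nested arrays of strings. Nothing in read knows what for or print mean.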
Parsing vs reading: the macro expander’s view
If you see this statement in the middle of a JavaScript program:
    for (let key in obj) {
        print(key);
    }
you know for sure that it’s a ForInStatement, as defined by the spec (I’m using let because… ES6, that’s why). If you know the grammar of JavaScript, you know the entire structure of the statement. But in Scheme, we could implement for as a macro. When the macro expander encounters:
    (for (key obj)
      (print key))
it knows nothing about the contents of the expression. All it knows is the macro definition of for. But that’s all it needs to know! The expander just takes the two subtrees, (key obj) and (print key), and passes them as arguments to the for macro.
This macro works by pattern matching: it expects two sub-trees, the first of which can itself be broken down into two identifier nodes x and e1, and it expands into the for-each expression. So when the expander calls the macro with the above example, the result of expansion is:
    (for-each (λ (key) (print key)) obj)
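The expander's side of this can be sketched in JavaScript too, using nested arrays of strings to stand in for the tree the reader produces. The macros table and the expand function are hypothetical names, and this toy ignores hygiene and binding forms entirely.

```javascript
// Macro table: each macro receives its argument subtrees, still unparsed.
var macros = {
  // (for (x e1) e2)  =>  (for-each (λ (x) e2) e1)
  'for': function (binding, body) {
    var x = binding[0];
    var e1 = binding[1];
    return ['for-each', ['λ', [x], body], e1];
  }
};

function expand(form) {
  if (!Array.isArray(form)) {
    return form; // atoms expand to themselves
  }
  var head = form[0];
  if (typeof head === 'string' &&
      Object.prototype.hasOwnProperty.call(macros, head)) {
    // Hand the subtrees to the macro, then keep expanding its output.
    return expand(macros[head].apply(null, form.slice(1)));
  }
  return form.map(expand);
}

var expanded = expand(['for', ['key', 'obj'], ['print', 'key']]);
```

Notice that expand never inspects the subtrees it passes along. Finding where each argument begins and ends is all the expander has to do; interpreting them is the macro's job.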
The power of the parenthesis
If you’ve ever wondered why Lisp weirdos are so inexplicably attached to their parentheses, this is what it’s all about. Parentheses make it unambiguous for the expander to understand what the arguments to a macro are, because it’s always clear where the arguments begin and end. It knows this without needing to understand anything about what the macro definition is going to do. Imagine trying to define a macro expander for a language with syntax like JavaScript’s. What should the expander do when it sees:
    quux (mumble, flarg) [1, 2, 3] { foo: 3 } grunch /wibble/i
How many arguments does quux take? Is the curly-braced argument a block statement or an object literal? Is the thing at the end an arithmetic expression or a regular expression literal? These are all questions that can’t be answered in JavaScript without knowing your parsing context—and macros obscure the parsing context.
None of this is to say that it’s impossible to design a macro system for languages with non-Lispy syntax. My point is just that the power of Lisp’s (Scheme’s, Racket’s, Clojure’s, …) macros comes not from being somehow tied to a central data structure of the language, but rather from the expander’s ability to break up a macro call into its separate arguments and then let the macro do all the work of parsing those arguments. In other words, homoiconicity isn’t the point, read is.
One of the great features of ES6 modules is the direct style module loading syntax:
    import map from "underscore.js";
    ... map(a, f) ...
This makes it as frictionless as possible to grow or refactor your code into multiple modules, and to pull third-party modules into an existing codebase. It also makes a common module format that can be shared between the browser and JS servers like Node.
But this direct style requires loading its dependencies before it can execute. That is, it’s a synchronous module load. Put in the context of a script tag, this would make it all too easy to block page rendering on I/O:
Throwing this syntax into the browser like this would be an invitation to jank. Thanks to insight from Luke Hoban, I think we have the right approach to this for ES6, which is in fact similar to our approach to avoiding turning eval into a synchronous I/O operation.
In previous versions of ECMAScript, there’s only one syntactic category of program that you can evaluate, called Program in the grammar. In ES6, we’ll define a restricted version of the syntax to be used in synchronous settings, which makes it illegal to do synchronous loads. Within a blocking script, the only access to modules is via the dynamic loading API:
This eliminates the footgun, and all of your modules can themselves use the synchronous loading syntax. For example, if jquery.js wants to use a module—say, a data structure library—it can go ahead and load it synchronously:
But still, this restriction on the top-level loses the convenience of directly importing modules from scripts. Thing is, in an asynchronous context, there’s nothing wrong with doing a synchronous load. So just like the asynchronously loaded jquery.js can use the synchronous syntax, we can also allow it in a defer or async script:
This allows the full flexibility and expressiveness of ES6 embedded in HTML, without any hazard of blocking page rendering for I/O.
The eval function for ES6 will work the same way, disallowing synchronous loading syntax in the grammar it recognizes, to prevent turning it into a synchronous API. We’ll also add an asynchronous version of eval that, like script async, recognizes the full grammar.
If I remember right, today is my two year anniversary working full time at Mozilla. And it works out to about six years of working with Mozilla and TC39. I could stop and get sentimental, but there’s work to do.
The topic of coroutines (or fibers, or continuations) for JavaScript comes up from time to time, so I figured I’d write down my thoughts on the matter. I admit to having a soft spot for crazy control-flow features like continuations, but they’re unlikely ever to make it into ECMAScript. With good reason.
The big justification for coroutines in JavaScript is non-blocking I/O. As we all know, asynchronous I/O leads to callback APIs, which lead to nested lambdas, which lead to… the pyramid of doom: