Posts tagged binary data

The little-endian web!

Apr 25th, 2012

This feels a little bit like the web platform having opened a door to hell and Zombies running out of it. I wonder if we can ever close it again. – Malte Ubl

Let’s see if we can. I’ve had a bunch of productive conversations since my post the other day.

I talked about how specifying little-endian would force big-endian browser vendors to choose one implementation strategy — emulate little-endian by byte-swapping and try to optimize as best they can — and concluded that it was better to let them decide for themselves and see how the market shakes out before specifying. But that doesn’t take into account the cost to web developers, which should always be the first priority (mea culpa).

Leaving it unspecified or forcing developers to opt in to a specified endianness taxes developers: it leaves them open to the possibility of their sites breaking on systems they likely can’t even test on, or forces them to make sure they pass the argument (in which case, they’d always be one forgotten argument away from possible bustage on some platform they can’t test on).

Imagine that instead of defaulting to unspecified behavior, we defaulted to little-endian (which is the de facto semantics of the web today), but apps could opt in to big-endian with an optional argument. Then a carefully written app could use this (in combination with, say, a navigator.endianness feature test API) to pick whichever byte order gives it better performance: little-endian on little-endian systems, big-endian on big-endian systems. Less carefully written apps that just went with the default might see some performance degradation on big-endian platforms, though we don’t actually know how bad it would be. Crucially, there would be no way to accidentally break your app’s behavior.
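There’s no navigator.endianness today. Under the current unspecified behavior, though, typed arrays expose the native byte order, so a stand-in for such a feature test can be sketched as below. The name detectEndianness is made up for illustration and is not a proposed API. Note that if typed arrays were standardized as little-endian, this probe would always answer "little", which is exactly why a separate feature test would be needed.

```javascript
// Hypothetical helper (illustration only): reports the byte order that
// typed arrays use on this platform by writing a 16-bit value and checking
// which of its two bytes comes first in the underlying buffer.
function detectEndianness() {
  var probe = new Uint16Array([0x0102]);
  var probeBytes = new Uint8Array(probe.buffer);
  // Little-endian stores the low byte (0x02) first.
  return probeBytes[0] === 0x02 ? "little" : "big";
}

console.log(detectEndianness()); // "little" or "big", depending on the platform
```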

But let me take it one step further. I don’t even think we know that that additional option will be needed. For now, we don’t know of any big-endian user agents that are implementing WebGL, nor do we know whether byte-swapping will be prohibitively expensive. Until we do, I say any additional API surface area is premature optimization. YAGNI.

In summary: let’s prioritize web developers over hypothetical performance issues on hypothetical browsers. Typed arrays should be standardized as little-endian — full stop.

The little-endian web?

Apr 24th, 2012

Here’s the deal: typed arrays are not fully portable. On most browsers, this code will print 1:

var a1 = new Uint32Array([1]);
var a2 = new Uint8Array(a1.buffer);
console.log(a2[0]); // 1 on little-endian systems; may be 0 on big-endian

But the typed arrays spec doesn’t specify a byte order. So a browser on a big-endian system (say, a PowerPC-based console like the Xbox 360 or PS3) is allowed to print 0. In short: casting an ArrayBuffer to different types is unportable by default. It’s up to web developers to canonicalize bytes for different architectures.
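Canonicalizing bytes by hand looks roughly like this. It’s a minimal sketch, and the helper names writeUint32LE and readUint32LE are mine for illustration, not part of any spec. The point is that shifts and masks operate on values rather than memory layout, so the result is the same on every architecture.

```javascript
// Illustrative helpers (hypothetical names): store and load a 32-bit
// unsigned integer as four bytes, least significant byte first, without
// depending on the platform's native byte order.
function writeUint32LE(target, offset, value) {
  target[offset]     = value & 0xff;
  target[offset + 1] = (value >>> 8) & 0xff;
  target[offset + 2] = (value >>> 16) & 0xff;
  target[offset + 3] = (value >>> 24) & 0xff;
}

function readUint32LE(source, offset) {
  // The final ">>> 0" converts the signed 32-bit result back to unsigned.
  return (source[offset] |
          (source[offset + 1] << 8) |
          (source[offset + 2] << 16) |
          (source[offset + 3] << 24)) >>> 0;
}

var portable = new Uint8Array(4);
writeUint32LE(portable, 0, 1);
console.log(portable[0]);               // 1 on every architecture
console.log(readUint32LE(portable, 0)); // 1
```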

Now, we could just require typed arrays to be little-endian, once and for all. After all, almost all platforms are little-endian these days. The few big-endian platforms could just automatically reorder bytes for all typed array accesses. But this would have to be made to work with WebGL, which works by sending application-generated buffers to the GPU. In order to make this work on a big-endian architecture, little-endian-encoded ArrayBuffer data would need to be translated when sending back and forth to the GPU. Technically, this might be possible, but there’s really no evidence that it would have acceptable performance.
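To make that reordering concrete: emulating little-endian on a big-endian machine means performing a swap like the following on every multi-byte access. This is only a JavaScript sketch of what an engine would do in native code, and bswap32 is a hypothetical name.

```javascript
// Illustration only: reverse the four bytes of a 32-bit unsigned integer.
function bswap32(x) {
  return (((x & 0xff) << 24) |
          ((x & 0xff00) << 8) |
          ((x >>> 8) & 0xff00) |
          ((x >>> 24) & 0xff)) >>> 0; // ">>> 0" keeps the result unsigned
}

console.log(bswap32(1).toString(16));          // "1000000"
console.log(bswap32(0x11223344).toString(16)); // "44332211"
```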

On the other hand, can we really trust that web applications will write portable code? Imagine a hashing algorithm that builds an internal ArrayBuffer and casts it to different types. If the code isn’t written portably, it’ll break on a browser implementing big-endian typed arrays.

This leaves big-endian browsers with a nasty decision: try to emulate little-endian typed arrays to protect against unportable application logic, and suffer the complexity and performance costs of translating data back and forth to the GPU, or just hope that not too many web pages break. Or perhaps surface an annoying decision to users: do you want to run this application in fast mode or correct mode?

For now, we should let browser vendors on big-endian systems make that decision, and not force the decision through the spec. If they end up all choosing to emulate little-endian, I’ll be happy to codify that in the standards. As I understand it, TenFourFox can’t support WebGL, so there the best decision is probably to emulate little-endianness. On an Xbox 360, I would guess WebGL performance would be a higher priority than web sites using internal ArrayBuffers. But I’m not sure. I’d say this is a decision for big-endian browsers to make, but I would greatly welcome their input.

In the meantime, we should do everything we can to make portability more attractive and convenient. For working with I/O, where you need explicit control over endianness, applications can use DataView. For heterogeneous data, there’ll be ES6 structs. Finally, I’d like to add an option for ArrayBuffers and typed arrays to be given an optional explicit endianness:

var buffer = new ArrayBuffer(1024, "little"); // a little-endian buffer
var a1 = new Uint32Array(buffer);
a1[0] = 1;
var a2 = new Uint8Array(buffer);
a2[0]; // must be 1, regardless of system architecture

With the endianness specified explicitly, you can still easily write portable logic even when casting — without having to canonicalize bytes yourself. Emscripten and Mandreel could benefit from this increased portability, for example, and I think crypto algorithms would as well. I’ll propose this extension to Khronos and TC39, and discuss it with JavaScript engine implementors.
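For comparison, here’s what explicit control looks like with DataView as it exists today. The last argument to the getters and setters selects little-endian when true and big-endian when false or omitted. This is just a small sketch with arbitrary offsets and values, not a recommendation for how to structure I/O code.

```javascript
var buffer = new ArrayBuffer(8);
var view = new DataView(buffer);

view.setUint32(0, 1, true);  // bytes 0..3, little-endian
view.setUint32(4, 1, false); // bytes 4..7, big-endian

var raw = new Uint8Array(buffer);
console.log(raw[0]); // 1: low byte first in the little-endian word
console.log(raw[7]); // 1: low byte last in the big-endian word

// Reading back with the matching flag gives the same value on any platform.
console.log(view.getUint32(0, true));  // 1
console.log(view.getUint32(4, false)); // 1
```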