Saturday, December 10, 2011

Typed Arrays by Default in Emscripten

Emscripten has several ways of compiling code into JavaScript, for example, it can use typed arrays or not (for more, see Code Generation Modes). I merged the 'ta2 by default' branch into master in Emscripten just now, which makes one of the typed array modes the default. I'll explain here the reason for that, and the results of it.

Originally Emscripten did not use typed arrays. When I began to write it, typed arrays were supported only in Firefox and Chrome, and even there they were of limited benefit due to lack of optimization and incomplete implementation. Perhaps more importantly, it was not clear whether they would ever be universally supported in all browsers. So to generate code that truly runs everywhere, Emscripten did not use typed arrays, it generated "plain vanilla" JavaScript.

However, that has changed. Firefox and Chrome now have mature and well-performing implementations of typed arrays, and Opera and Safari are very close to the same. Importantly, Microsoft has said that IE10 will support typed arrays. So typed arrays are becoming ubiquitous, and have a bright future.

The main benefits of using typed arrays are speed and code compatibility. Speed is simply a cause of JS engines being able to optimize typed arrays better than normal ones, both in how they are laid out in memory and how they are accessed. Compatibility stems from the fact that by using typed arrays with a shared buffer, you can get the same memory behavior as C has, for example, you can read an 8-bit byte from the middle of a 32-bit int and get the same result C would get. It's possible to do that without typed arrays, but it would be much, much slower. (There is however a downside to such C-like memory access: Your code, if it was not 100% portable in the first place, may depend on the CPU endianness.)

Because of those benefits, I worked towards using typed arrays by default. To get there, I had to fix various problems with accessing 64-bit values, which are only a problem when doing C-like memory access, because unaligned 64-bit reads and writes do not work (due to how the typed arrays API is structured). The settings I64_MODE and DOUBLE_MODE control reading those 64-bit values: If set to 1, reads and writes will be in two 32-bit parts, in a safe way.

Another complication is that typed arrays cannot be resized. So when sbrk() is called to a value that is larger than the max size, we can't easily enlarge the typed arrays we are using. The current implementation will create new typed arrays and copy the old values into them, which will work but is potentially slow.

Typed arrays have already worked in Emscripten for a long time (in two modes, even, shared and non-shared buffers), but the issues mentioned in the previous two paragraphs limited their use in some areas. So the recent work has been to smooth over all the missing pieces, to make typed arrays ready as the default mode.

The current default in Emscripten, after the merge, is to use typed arrays (in mode 2, with a shared buffer, that is, C-like memory access), and all the other settings are set to safe values (I64_MODE and DOUBLE_MODE are both 1), etc. This means that all the code that worked out of the box before will continue to work, and additional code will now work out of the box as well. Note that this is just the defaults: If your makefile sets all the Emscripten settings itself (like defining whether to use typed arrays or not, etc.), then nothing will change.

The only thing to keep in mind with this change is that by default, you will need typed arrays to run the generated code. If you want your code, right now, to run in the most places, you should set USE_TYPED_ARRAYS to 0 to disable typed arrays. Another possible issue is that not all JS console environments support typed arrays: Recent versions of SpiderMonkey and Node.js do, but the V8 shell has some issues (note that this is just a problem in the commandline shell, not in Chrome), so if you test your generated code using d8 then it will not work. Instead, you can test it in a browser, or by using Node.js or the SpiderMonkey shell for now.

5 comments:

  1. "(There is however a downside to such C-like memory access: Your code, if it was not 100% portable in the first place, may depend on the CPU endianness.)"
    I wonder if this would have been a good opportunity to standardize either big or little endian (outside file I/O) for JS. I mean honestly, it's 2011, can we remove this headache already? If the interpreter and JIT know that the architecture has the opposite endianness, can't it dynamically convert the code to that (if it also knows the code will assume a particular endianness). That might not be the best thing in the world for performance, but the advantage of lowering the entrance barrier might outweigh that disadvantage.

    ReplyDelete
  2. Have you considered trying something like CCured, or moving in that direction, to try to generate more idiomatic JS?

    ReplyDelete
  3. @Ver: I think the standards committee has been debating things like that. The concern though is speed, as you said. Hopefully there will be a solution there.

    @Robert: I don't know much about CCured, how do you think it could help here?

    ReplyDelete
  4. This comment has been removed by the author.

    ReplyDelete
  5. This comment has been removed by the author.

    ReplyDelete