I made JSON.parse() 2x faster

March 04, 2023

Part of my job is to make JavaScript things go fast. Speed is a feature, and when working in an interpreted language, squeezing every last bit of performance can be the difference between a great product and unusable garbage.

Anyway, how cool would it be to make JavaScript itself go faster? I’m not a C++ programmer, but that didn’t stop me before, so I thought I’d give it a try anyway!

The objective

We’ll make a common operation, JSON.parse(), faster. There are real-world use cases where you’d want to parse large JSONs, such as the initial login to an offline-first app, or deserializing persisted state at launch. Making it twice as fast could have a real impact on users, as opposed to optimizing some obscure microbenchmark.

The JavaScript engine we’ll target is Hermes, used primarily by React Native. Improving V8 (Chrome, Node) or JSC (Safari, Bun) would have greater impact, but Hermes is relevant to my work.

Let’s dive deep.

Step 1: Parse JSON using SIMD instructions

The first thing I tried was to use simdjson.

If you’ve never heard of it, simdjson is (to my knowledge) the world’s fastest JSON parser. It can process gigabytes of JSON per second thanks to clever use of SIMD instructions. Basically, instead of processing input character by character, it works in chunks of up to 512 bits (64 characters) per CPU instruction! Imagine trying to find the end of a JSON string. One way is to increment an index until buffer[i] == '"'. Another is to use a SIMD instruction that compares a 64-byte chunk of text to 64 repeated " characters and reports which bytes are equal. As you can imagine, the second way is absurdly more performant.

Of course, in reality, it’s far, far more complicated – check out Daniel Lemire’s blog if you want to learn how to use juicy APIs like _mm256_cmpgt_epi8 to do SIMD magic yourself.
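
To make that concrete, here’s a minimal sketch of the quote-scanning trick using 128-bit SSE2 intrinsics (16 bytes per instruction rather than simdjson’s 64, and nothing like simdjson’s real code, which also has to handle escape sequences):

#include <emmintrin.h> // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Returns the index of the first '"' in buf, or len if there is none.
// Assumes buf is padded to a multiple of 16 bytes so the loads are safe
// (this is exactly why simdjson requires padded_string inputs).
size_t findQuote(const char *buf, size_t len) {
  const __m128i quotes = _mm_set1_epi8('"'); // 16 copies of the '"' byte
  for (size_t i = 0; i < len; i += 16) {
    __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i *>(buf + i));
    // Compare all 16 bytes at once; mask gets one bit per matching byte
    uint32_t mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, quotes));
    if (mask != 0)
      return i + __builtin_ctz(mask); // position of the first match
  }
  return len;
}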

Meanwhile, let’s breathe a sigh of relief that someone already did that hard work for us, and let’s hook the simdjson parser up to Hermes VM object creation:

// Much simplified version of the real thing, for clarity

using namespace simdjson;
using namespace simdjson::ondemand;

HermesValue jsonParse(Runtime &runtime, Handle<StringPrimitive> jsonString)
{
  // Create a simdjson-compatible string representation
  auto stringRef = jsonString->getStringRef<char>();
  auto json = padded_string(stringRef.data(), stringRef.size());

  // Parse with simdjson
  simdjson::ondemand::parser parser;
  auto doc = parser.iterate(json);
  return parseValue(runtime, doc);
}
// … rest of the code
// Turn simdjson values into HermesValues
HermesValue parseValue(Runtime &runtime, ondemand::value &value)
{
  switch (value.type()) {
  case json_type::array:
    return parseArray(runtime, value.get_array());
  case json_type::object:
    return parseObject(runtime, value.get_object());
  case json_type::string: {
    std::string_view string = value.get_string();
    UTF8Ref hermesString{string.data(), string.size()};
    return StringPrimitive::createEfficient(runtime, hermesString);
  }
  case json_type::boolean:
    return HermesValue::encodeBoolValue(value.get_bool());
  case json_type::number:
    return HermesValue::encodeDoubleValue(value.get_double());
  case json_type::null:
    return HermesValue::encodeNullValue();
  }
}

// Turn simdjson arrays into JS arrays
HermesValue parseArray(Runtime &runtime, ondemand::array &array)
{
  // Create empty array
  auto jsArray = *JSArray::create(runtime, 4, 0);

  uint32_t index = 0;
  for (ondemand::value value : array) {
    // Create index and value as HermesValues
    auto jsIndex = HermesValue::encodeDoubleValue(index);
    auto jsValue = parseValue(runtime, value);

    // Assign to array
    JSObject::defineOwnComputedPrimitive(jsArray, runtime, jsIndex, /* ... */, jsValue);
    index++;
  }

  return jsArray.getHermesValue();
}

// Turn simdjson objects into JS objects
HermesValue parseObject(Runtime &runtime, ondemand::object &object)
{
  auto jsObject = runtime.makeHandle(JSObject::create(runtime));

  for (auto field : object) {
    // Create key and value
    std::string_view key = field.unescaped_key();
    auto jsKey = /* same as string creation in parseValue */
    auto jsValue = parseValue(runtime, field.value());

    // Assign to object
    JSObject::defineOwnComputedPrimitive(jsObject, runtime, jsKey, /* ... */, jsValue);
  }

  return jsObject.getHermesValue();
}

Okay, so that was a bit long for a blog post, and I skipped some of the boring details.

But if you read the code, it’s pretty straightforward, even if you don’t know C++. It’s basically just glue between simdjson’s representations of values and those of Hermes.

But did it work?

Well yes! It did (mostly) work just fine!

But it wasn’t faster. In my benchmarks, the new implementation was ~20% slower than the original. Oof.

Why didn’t simdjson help?

We need to dive deeper to understand why a fast JSON parser isn’t enough to make JSON.parse() fast.

JavaScript, like many 1990s inventions, made an unfortunate choice of string encoding: UTF-16 (technically, UCS-2 initially).

Today, UTF-8 is king, and any JSON you’ll receive from the web (or read from disk) will almost certainly be encoded as UTF-8. Then, as it’s ingested into the JavaScript world, it has to be transcoded into UTF-16. Even the most basic operation, string.length, is specified to behave consistently with UTF-16, so if a JavaScript runtime kept the original UTF-8 string and simply returned its size in bytes as the length, it would be all wrong! (For example, "👍".length is 2, because the emoji takes two UTF-16 code units, even though it’s four bytes in UTF-8.)

To be more precise, the major JavaScript VMs (V8, JSC, and Hermes) keep strings in one of two representations: UTF-16 or ASCII. The latter is safe, since the lengths and character positions of ASCII strings are the same as in UTF-16, and it takes half the memory, so it’s used as an optimization when we know a string contains no non-ASCII characters.

simdjson, on the other hand, uses UTF-8. So we need to transcode the input string, and then, to hand parsed strings back to JavaScript, transcode the output back to UTF-16 (or ASCII, if possible).

[Diagram: the multiple Unicode transcoding steps: UTF-8 -> UTF-16 -> UTF-8 -> JSON.parse() -> UTF-8 -> UTF-16]

Takeaway: simdjson did speed up JSON parsing (quite a lot, in fact!); it’s just that its gains were outweighed by the extra transcoding steps, which the original implementation doesn’t need.

Step 2: When in doubt, just use more SIMD

Let me introduce you to another of Daniel Lemire’s libraries: simdutf. Can it transcode gigabytes of Unicode per second? You bet!

Once again, in front of its hardcore engineering lies a fairly straightforward API:

// First, estimate how much capacity we need to transcode
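// (In this sketch, str is assumed to be a const char16_t* holding utf16_size
// UTF-16 code units, matching simdutf's convert_utf16_to_utf8 signature)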
auto utf8_capacity = simdutf::utf8_length_from_utf16(str, utf16_size);

// Then, allocate a string buffer
std::unique_ptr<char[]> utf8_str{new char[utf8_capacity]};

// Now actually transcode
auto utf8_size = simdutf::convert_utf16_to_utf8(str, utf16_size, utf8_str.get());

Result? The improved JSON.parse() is now ~1.2–1.3x faster than the Hermes original! Neat!

Step 3: Draw the rest of the goddamn owl

[Image: the “draw the rest of the owl” meme. Step 1: Draw two circles. Step 2: Draw the rest of the goddamn owl.]

I originally planned to stop here, with a respectable perf improvement that only required me to glue together other people’s code. But I felt that a blog post titled “I made JSON.parse() 1.2x faster” didn’t quite have the same ring to it, so I kept going until I reached 2x.

Here are some of the improvements I made:

ASCII fast path. If the input string is ASCII (a subset of UTF-8), reuse it directly instead of transcoding.

Reduce copying. Initially, the input string was copied three times: once to ensure the string doesn’t move during garbage collection, a second time by the UTF transcoder, and a third time by simdjson::padded_string to add the padding the simdjson parser requires. I consolidated these into a single allocation, with an analogous improvement on the output side.

Reduce heap allocations. llvh::SmallVector<T, N> is a neat data structure. With a lot of data, it works just like std::vector (a heap-allocated resizable array), but with N items or fewer, it skips the heap allocation (expensive) and keeps the data on the stack (cheap). I used it for transcoding output strings, since most such strings are small. Sometimes this simple technique can do wonders.
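
For illustration, here’s roughly what that pattern looks like for the UTF-8-to-UTF-16 direction (a sketch: utf8 stands for any input string view, and 64 is an arbitrary inline capacity):

// With 64 or fewer code units, the buffer lives entirely on the stack
llvh::SmallVector<char16_t, 64> out;
out.resize(simdutf::utf16_length_from_utf8(utf8.data(), utf8.size()));
auto size = simdutf::convert_utf8_to_utf16(utf8.data(), utf8.size(), out.data());
out.resize(size); // shrink to the actual transcoded length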

Improved isAllASCII check. When creating output strings from UTF-8 using StringPrimitive::createEfficient, Hermes first checks whether all characters are ASCII. If so, we avoid transcoding to UTF-16. I tried replacing this check with simdutf::validate_ascii, but it was slower! I suspect that simdutf is faster for large strings, but has too much overhead for, say, an 8-byte string.

isAllASCII is quite clever already. Rather than checking each character for char <= 127, it loads 4 bytes into a uint32_t, then checks chunk & 0x80808080u (if the most significant bit of any of the four chars is 1, the string isn’t ASCII). Does that sound familiar? Yep, it’s basically SIMD, just using normal 32-bit registers. I extended this to 64 bits and removed the memory alignment requirement (as far as I can tell, there’s no perf penalty for unaligned access on arm64).
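
Here’s a minimal sketch of that extended check (my reconstruction of the idea, not Hermes’s actual code); std::memcpy is the portable way to express an unaligned load:

#include <cstdint>
#include <cstring>

bool isAllASCII(const char *begin, const char *end) {
  // Check 8 bytes at a time: any byte with its top bit set means non-ASCII
  while (end - begin >= 8) {
    uint64_t chunk;
    std::memcpy(&chunk, begin, 8); // unaligned load, compiles to a single mov/ldr
    if (chunk & 0x8080808080808080ull)
      return false;
    begin += 8;
  }
  // Handle the leftover bytes one at a time
  for (; begin != end; ++begin) {
    if (*begin & 0x80)
      return false;
  }
  return true;
}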

ASCII faster path? (Fail.) At one point, I added a condition that if the input string is all ASCII, then all output strings must be ASCII too (letting us skip transcoding). But that was wrong! I forgot that JSON allows Unicode characters to be escaped as "\uFFFF".

Object.defineProperty fast path. Parsed fields were set onto JS objects using JSObject::defineOwnComputedPrimitive, the equivalent of unknownObject[unknownKey] = value in JS. This method has to work with anything we can throw at it: the object could be a plain object or an array, and it could be frozen or sealed. The key might be a string, a number, or something silly like an object. In fact, the property might already be defined, and not writable! But we don’t have to worry about any of that inside JSON.parse(). We know we have a plain object, a string key, and that all property descriptors are default. The only unusual case we need to check for is that a key could be redefined.

JSArray::setElementAt. Analogously to objects, there are a lot of assumptions we can safely make when setting values on an array: that it really is an array (with indexed storage, unlike plain objects), that keys are sequential numbers, that there are no holes, etc.
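
Under those assumptions, the loop in parseArray can collapse to roughly this (a sketch in the same simplified style as the code above; error handling omitted):

uint32_t index = 0;
for (ondemand::value value : array) {
  auto jsValue = runtime.makeHandle(parseValue(runtime, value));
  // No descriptors, no redefinition checks: just write into indexed storage
  JSArray::setElementAt(jsArray, runtime, index++, jsValue);
}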

SymbolID fast path. In Hermes, object keys are all deduplicated and turned into SymbolIDs. Basically, instead of keeping each key of each object as its own string in memory, there’s only one copy, and objects just refer to it by ID. This is great for memory efficiency (and provides a nice boost for startup time, since symbols in source code can be uniqued ahead-of-time), but bad for JSON.parse() performance.

Initially, a raw key string would be turned into a StringPrimitive, then a Handle<>, passed to defineOwnComputedPrimitive, then valueToSymbolID, then turned back into a StringPrimitive, another Handle, then back into a raw string, only to finally be looked up in the hash map by IdentifierTable::getOrCreateIdentifier. Needless to say, we can go straight to the IdentifierTable.

Inlining. When a function is small and hot (often called), it’s often worth telling the compiler to inline it. I did that with some JSON helper functions.
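
For example (an illustrative helper, not one of the actual functions I touched, assuming llvh keeps LLVM’s LLVM_ATTRIBUTE_ALWAYS_INLINE macro):

// Force-inline a tiny, hot helper instead of trusting compiler heuristics
LLVM_ATTRIBUTE_ALWAYS_INLINE
inline bool isJSONDigit(char c) {
  return c >= '0' && c <= '9';
}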

Reusing simdjson objects. I put a simdjson::ondemand::parser into RuntimeCommonStorage to reuse it between JSON.parse() calls. This reduces overhead when parsing small JSONs.
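
Conceptually (the member name here is illustrative):

struct RuntimeCommonStorage {
  // ...other per-runtime state...
  simdjson::ondemand::parser jsonParser; // kept alive between JSON.parse() calls
                                         // so its internal buffers get reused
};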

Reducing Handles. Because we’re working in the VM of a garbage-collected language, objects behave in an unusual way. HermesValues are only safe to use for a very short time: specifically, until the next heap allocation. At that point, the garbage collector could kick in and move surviving objects to another location in memory. To avoid this problem, we use a Handle<>, which keeps a pointer to the underlying object but is updated by the GC if needed. Handles are low-overhead, but sometimes we know it’s safe to avoid them.
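
A sketch of the hazard (simplified; parseKey stands in for any code producing a raw value):

HermesValue key = parseKey(/* ... */);   // raw value: only valid until the next allocation
HermesValue val = parseValue(/* ... */); // may allocate -> may trigger GC -> key may dangle!

// Safe version: root the key in a Handle first; the GC updates Handles when it moves objects
Handle<> keyHandle = runtime.makeHandle(parseKey(/* ... */));
HermesValue val2 = parseValue(/* ... */); // keyHandle remains valid across this allocation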

Some of these were 0.5% improvements, others were >10% speedups. But taken all together, the goal was reached!

Final results!

[Chart: experiment results]

I’ve measured this with a simple benchmark on an arm64 machine.

Oh no, I have competition…

Since I started hacking on the JSON parser (February 17), the Hermes team added a few performance improvements of their own…

Interestingly, their changes are completely unrelated to mine! While I used simdjson as the low-level parser, with simdutf to patch up the text encoding issue, Michael from the Hermes team took advantage of the fact that if you have a bespoke parser, you can optimize it for your particular use case. Specifically, you can calculate the string hash (used to convert keys to SymbolIDs) and check whether it’s ASCII incrementally as you parse the input JSON string, to avoid scanning memory multiple times.

What’s the difference? Comparing against the latest main, the results are a bit worse: not 1.9-2.6x better, but closer to 1.7-2.3x.

UTF-8 fast path?

While we’ve discussed improving JSON.parse() itself, in reality it seems unlikely to me that someone would construct a large raw JSON string from within JavaScript. I posit that almost all such JSONs either come from fetch()’s result.json() or are read from disk. In other words, they come to you as UTF-8.

From this perspective, the fact that simdjson uses UTF-8 is a good thing! We just didn’t measure the whole process, and conveniently ignored that it’s wasteful to convert data to UTF-16 to create a JS string, only to discard it after parsing.

We could make an easy speed (and memory/GC pressure) improvement by exposing a JSON.parse()-like function that takes a UTF-8 ArrayBuffer or Blob as input, either via Hermes’s jsi C++ API (to be used by React Native’s internals) or via global.HermesInternal (to be used by a fetch() polyfill).
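
Sketched as a jsi-level entry point (a hypothetical signature; nothing like this exists today):

// Parse JSON directly from UTF-8 bytes, skipping the UTF-16 round-trip entirely
jsi::Value parseJSONFromUTF8(jsi::Runtime &runtime, const uint8_t *utf8, size_t size);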

Appendix: More improvement ideas

It’s important to understand that most of what JSON.parse() does is not actually JSON parsing (at least not with simdjson). Rather, it’s constructing JS objects, looking up SymbolIDs and HiddenClass transitions, heap allocations, copying memory, Unicode transcoding, etc. (In fact, if you wanted to squeeze max performance out of JSON parsing, you’d want to use arrays of values, not objects, and make it all ASCII.)

Here’s a random collection of ideas which might work (or maybe not).

Speculation. JavaScript VMs regularly use speculation techniques to speed up JS execution. The basic idea is that you inspect what a function does (for example, what types its arguments have), and assume (speculate) that it’s going to behave the same way in the future. Making these assumptions allows for massive speedups. Of course, we need to detect when an assumption is broken and, if so, take the slow path — but this still works well, because most of the time it won’t be. Could we use speculation-like techniques to speed up JSON.parse()?

About 20-25% of the execution time is spent looking up SymbolIDs from key strings. My theory is that many large JSONs consist of arrays of objects with the same shape, i.e. the same keys appearing in the same order. If that’s true, we could inspect one object and remember its keys with their matching SymbolIDs, then for subsequent objects check this speculation table first, falling back to the IdentifierTable lookup if speculation fails. My intuition is that this should be significantly faster if the speculation is right most of the time.

I actually prototyped this! A fairly simple algorithm: first, wait for enough objects to be parsed to deem speculation worthwhile, then inspect one object and set a goodObjects counter to 2. Subsequent objects speculate, incrementing or decrementing the counter depending on whether speculation worked. If we hit zero, we back off and wait a while before attempting to speculate again. This way, if speculation doesn’t work for a particular JSON, there’s only a small performance penalty. But if speculation helps, not every object has to match, only most of them. This could be made more sophisticated, e.g. to work with objects that have smaller objects as values (we want to inspect the larger object, not its children), tolerate some skipped keys, add exponential backoff, etc. My prototype kinda worked, but it wasn’t faster. I didn’t spend much time on it though; I’m sure it can be improved.
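
The counter logic looked roughly like this (a simplified reconstruction of the prototype; matchesShape stands in for comparing an object’s keys, in order, against the remembered shape):

struct ShapeSpeculation {
  std::vector<SymbolID> keys; // the remembered object shape
  int goodObjects = 0;        // confidence counter
  int backoffRemaining = 0;   // objects to skip after speculation fails
};

bool shouldSpeculate(ShapeSpeculation &spec, ondemand::object &object) {
  if (spec.backoffRemaining > 0) {
    spec.backoffRemaining--; // backing off: use the regular IdentifierTable path
    return false;
  }
  if (matchesShape(spec.keys, object)) {
    spec.goodObjects++;
    return true;
  }
  if (--spec.goodObjects <= 0) {
    spec.goodObjects = 0;
    spec.backoffRemaining = 32; // wait a while before trying again
  }
  return false;
}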

More speculation? Objects in Hermes are backed by a HiddenClass, which describes what keys they contain. HiddenClasses are uniqued, so all objects of the same shape share the exact same HiddenClass, which is good for memory efficiency. For that to work, HiddenClasses are immutable: every time you assign a new property to an object, its HiddenClass changes (a TransitionMap is looked up to find the right one). These lookups, alongside other manipulations of HiddenClasses, take up another 10-20% of execution time. But we’ve just established that HiddenClasses are shared, and that most objects have the same shape. Can we cache them alongside SymbolIDs for well-speculated objects?

Changing object creation order. Currently, we create JavaScript values immediately after parsing them. This is efficient from a parsing perspective, but again, most of the time is spent creating objects. What if we parsed an entire object first? We’d know all of its keys, so we wouldn’t have to change the JSObject multiple times; we could start with the correct HiddenClass right off the bat. It’s just an idea; I haven’t dug deep enough to say whether it makes sense.

Reduce copying. There’s still some inefficiency in this area: when creating output strings from UTF-8, we first transcode, then copy into a JS string. Could we allocate a right-sized JS string first and transcode directly into it?

Final words

Wow, that was a long post! I hope it was interesting and that you learned something about how Hermes works.

Discussion: Hacker News, Twitter, Mastodon.

As a reminder, I’m not a C++ programmer or a VM engineer, so I apologize for any mistakes, oversimplifications, and ideas obviously foolish to someone actually experienced in this area.

Could this work be upstreamed into Hermes? I don’t know. Adding simdjson and simdutf might be out of the question for the project due to their size; however, some improvements are back-portable to the current parser. It’s also possible that some of my optimizations, while they worked in my tests, are actually incorrect and could break under some circumstances. I made a pull request to discuss these concerns.

My thanks to Michał Pierzchała, Gilad Ronat, and Palid for proof-reading this post.

Published March 04, 2023.