100 lines
6.6 KiB
HTML
100 lines
6.6 KiB
HTML
<h1>Introducing the JetStream benchmark suite</h1>
|
|
|
|
<p>Today we are introducing a new WebKit JavaScript benchmark test suite,
|
|
<a href="http://www.browserbench.org/JetStream">JetStream</a>. JetStream codifies what
|
|
our de facto process has been — to combine latency and throughput benchmarks with roughly
|
|
equal weighting, and capturing both metrics of traditional JavaScript programming styles as
|
|
well as new JavaScript-based technologies that have captured our imaginations. Scores on
|
|
JetStream are a good indicator of the performance users would see in advanced web applications
|
|
like games.</p>
|
|
|
|
<p>Optimizing the performance of our JavaScript engine is a high priority for the WebKit
|
|
project. Examples of some of the improvements we introduced in the last year include
|
|
<a href="https://bugs.webkit.org/show_bug.cgi?id=112839">concurrent compilation</a>,
|
|
<a href="https://bugs.webkit.org/show_bug.cgi?id=121074">generational GC</a>, and the
|
|
<a href="https://bugs.webkit.org/show_bug.cgi?id=112840">FTL JIT</a>. Engineering such
|
|
improvements requires focus: we try to prioritize high-impact projects over building and
|
|
maintaining complex optimizations that have smaller benefits.
|
|
Thus, we motivate performance work with
|
|
benchmarks that illustrate the kinds of workloads that WebKit users will likely encounter.
|
|
This philosophy of benchmark-driven development has long been part of WebKit.</p>
|
|
|
|
<h2>The previous state of JavaScript benchmarking</h2>
|
|
|
|
<p>As we made enhancements to the WebKit JavaScript engine, we found that no single
|
|
benchmark suite was entirely representative of the scenarios that we wanted to improve. We
|
|
like <a href="https://www.webkit.org/perf/sunspider/sunspider.html">SunSpider</a> for its
|
|
coverage of commonly-used language constructs and for the fact that its running time is
|
|
representative of the running time of code on the web, but it falls short for measuring
|
|
peak throughput. We like <a href="https://developers.google.com/octane/">Octane</a>, but it
|
|
skews too far in the other direction: it's useful for determining our engine's peak
|
|
throughput but isn't sensitive enough to the performance you'd be most likely to see on typical
|
|
web workloads. It also downplays novel JavaScript technologies like asm.js; only one of
|
|
Octane's 15 benchmarks was an asm.js test, and this test ignores floating point
|
|
performance.</p>
|
|
|
|
<p>Finding good asm.js benchmarks is difficult. Even though
|
|
<a href="https://github.com/kripken/emscripten">Emscripten</a> is gaining
|
|
mindshare, its tests are long-running and until recently, lacked a web harness.
|
|
So we built our own asm.js benchmarks by using tests
|
|
from the <a href="http://llvm.org/">LLVM</a>
|
|
<a href="http://llvm.org/docs/TestingGuide.html#test-suite">test suite</a>.
|
|
These C and C++ tests are
|
|
used by LLVM developers to track performance improvements of the clang/LLVM compiler stack.
|
|
Emscripten itself uses LLVM to
|
|
generate JavaScript code. This makes the LLVM test suite particularly appropriate for testing
|
|
how well a JavaScript engine handles native code. Another benefit of our new tests is that they
|
|
are much quicker to run than the Emscripten test suite.</p>
|
|
|
|
<p>Having good JavaScript benchmarks allows us to confidently pursue ambitious improvements to
|
|
WebKit. For example, SunSpider guided our
|
|
concurrent compilation work, while the asm.js tests and Octane's throughput tests motivated
|
|
our work on the FTL JIT. But allowing our testing to be based on
|
|
a hodgepodge of these different benchmark suites has become impractical. It's
|
|
difficult
|
|
to tell contributors what they should be testing if there is no unified test suite
|
|
that can tell them if their change had the desired effect on performance. We want one test
|
|
suite that can report one score in the end, and we want this one score to be representative
|
|
of WebKit's future direction.</p>
|
|
|
|
<h2>Designing the new JetStream benchmark suite</h2>
|
|
|
|
<p>Different WebKit components require different approaches to measuring performance.
|
|
In some cases, the obvious approach works pretty well: for example, many layout and
|
|
rendering optimizations can be driven by measuring page load time on representative web pages.
|
|
But measuring the performance of a programming language implementation requires
|
|
more subtlety. We want to increase the benchmarks' sensitivity to core engine improvements, but
|
|
not so much so that we lose perspective on how those engine improvements play out in real
|
|
web sites. We want to minimize the opportunities for system noise to throw off our measurements,
|
|
but anytime a workload is inherently prone to noise, we want a benchmark to show this.
|
|
We want our benchmarks to represent a high-fidelity approximation of the workloads that
|
|
WebKit users are likely to care about.</p>
|
|
|
|
<p>JetStream combines a variety of JavaScript benchmarks, covering a variety of advanced
|
|
workloads and programming techniques, and reports a single score that balances them using a
|
|
geometric mean. Each test is run three times and scores are reported with 95% confidence
|
|
intervals. Each benchmark measures a distinct workload, and no single optimization technique
|
|
is sufficient to speed up all benchmarks. Some benchmarks demonstrate tradeoffs, and aggressive
|
|
or specialized optimization for one benchmark might make another benchmark slower. Demonstrating
|
|
trade-offs is crucial for our work. As discussed in my
|
|
<a href="https://www.webkit.org/blog/3362/introducing-the-webkit-ftl-jit/">previous post about
|
|
our new JIT compiler</a>, WebKit tries to dynamically adapt to workload using different
|
|
execution tiers. But this is never perfect. For example, while our new FTL JIT compiler
|
|
gives us fantastic speed-ups on peak throughput tests, it does cause slight regressions in
|
|
some ramp-up tests. New optimizations for advanced language runtimes often run into such
|
|
trade-offs, and our goal with JetStream is to have a benchmark that informs us about
|
|
the trade-offs that we are making.</p>
|
|
|
|
<p>JetStream includes benchmarks from the SunSpider 1.0.2 and Octane 2 JavaScript
|
|
benchmark suites. It also includes benchmarks from the LLVM compiler open source
|
|
project, compiled to JavaScript using Emscripten 1.13. It also includes a
|
|
benchmark based on the Apache Harmony open source project's HashMap, hand-translated to
|
|
JavaScript. More information about the benchmarks included in JetStream is
|
|
available on the <a href="http://www.browserbench.org/JetStream-1.0/in-depth.html">JetStream
|
|
In Depth</a> page.</p>
|
|
|
|
<p>We're excited to be introducing this new benchmark. To run it, simply visit
|
|
<a href="http://www.browserbench.org/JetStream/">browserbench.org/JetStream</a>. You can
|
|
<a href="http://bugs.webkit.org/">file bugs</a> against the benchmark using WebKit's bug
|
|
management system under the Tools/Tests component.</p>
|