* Add more test cases. Categories we'd like to cover (with reasonably
real-world tests, preferably not microbenchmarks) include:

(X marks the ones that are fairly well covered now.)

X math (general)
X bitops
X 3-d (the math bits)
- crypto / encoding
X string processing
- regexps
- date processing
- array processing
- control flow
- function calls / recursion
- object access (unclear if it is possible to make a realistic
benchmark that isolates this)

I'd specifically like to add all the computer language shootout
tests that Mozilla is using.

* Normalize tests. Most of the test cases available have a repeat
count of some sort, so the time they take can be tuned. The tests
should be tuned so that each category contributes about the same
total, and so each test in each category contributes about the same
amount. The question is, what implementation should be the baseline?
My current thought is to either pick some specific browser on a
specific platform (IE 7 or Firefox 2, perhaps), or try to target the
average that some set of same-generation release browsers get on
each test. The latter is more work. IE 7 is probably a reasonable
normalization target, since it is the latest version of the most
popular browser, so results on this benchmark will tell you how much
you have to gain or lose by using a different browser.

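The tuning step could be sketched as a small helper: given baseline
per-test timings (measured on whatever reference implementation is
chosen), derive repeat counts so every test contributes roughly the
same wall-clock total. The function name and the 500 ms target are
hypothetical, not part of the harness.

```python
def tuned_repeats(baseline_ms, target_ms=500):
    """Pick a repeat count per test so that, on the baseline
    implementation, each test runs for roughly target_ms total.
    baseline_ms maps test name -> single-iteration time in ms."""
    return {name: max(1, round(target_ms / ms))
            for name, ms in baseline_ms.items()}
```

A test already slower than the target gets a repeat count of 1
rather than 0, so it is never dropped entirely.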
* Instead of using the standard error, the correct way to calculate
a 95% confidence interval for a small sample is the t-test
<http://en.wikipedia.org/wiki/Student%27s_t-test>. Basically this
involves multiplying the standard error by a value from a 2-tailed
t-distribution table instead of by 1.96; a table is available at
<http://www.medcalc.be/manual/t-distribution.php>.

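As a sketch of that calculation (the table below is an illustrative
subset of the standard two-tailed 95% critical values, indexed by
degrees of freedom; a real implementation would carry the full table):

```python
import math

# Two-tailed critical t values at the 95% level, by degrees of
# freedom (sample size - 1). Illustrative subset of a standard table.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def confidence_interval_95(samples):
    """Return (mean, half_width) of a 95% confidence interval,
    using the t-distribution instead of the normal's 1.96."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    std_error = math.sqrt(variance / n)
    t = T_95.get(n - 1, 1.96)  # large samples: normal approximation
    return mean, t * std_error
```

For five samples this multiplies the standard error by 2.776 rather
than 1.96, widening the interval to reflect the small sample size.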
* Add support to compare two different engines (or two builds of the
same engine) interleaved.

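A minimal sketch of the interleaving idea, assuming the engines are
wrapped as callables (in practice each would shell out to a
command-line JavaScript interpreter); the function and parameter
names are hypothetical:

```python
import statistics
import time

def compare_interleaved(run_a, run_b, runs=5):
    """Alternate runs of two engines within each iteration, so slow
    environmental drift (thermal throttling, background load) is
    spread roughly evenly across both instead of biasing whichever
    engine happens to run last."""
    times = {"a": [], "b": []}
    for _ in range(runs):
        for key, run in (("a", run_a), ("b", run_b)):
            start = time.perf_counter()
            run()
            times[key].append(time.perf_counter() - start)
    return {key: statistics.mean(ts) for key, ts in times.items()}
```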
* Add support to compare two existing sets of saved results.

* Allow repeat count to be controlled from the browser-hosted version
and the WebKitTools wrapper script.

* Add support to run only a subset of the tests (both command-line and
web versions).

* Add a profile mode for the command-line version that runs the tests
repeatedly in the same command-line interpreter instance, for ease
of profiling.

* Make the browser-hosted version prettier, both in general design and
maybe using bar graphs for the output.

* Make it possible to track changes over time and generate a per-test
graph showing the result and error bar for each version.

* Hook up to automated testing / buildbot infrastructure.

* Possibly... add the ability to download iBench from its original
server, pull out the JS test content, preprocess it, and add it as a
category to the benchmark.

* Profit.