Monday, June 15, 2015

Mission accomplished: IonPower kicks incredibly large gluteus

I like kicking big butts on big endian processors and I cannot lie. You other brothers/sisters/etc. are unable to deny.

Remember that our primary target was to make our new IonPower JavaScript JIT backend eclipse our previous speed champion, JaegerMonkeyPPC, on our chosen classic V8 benchmark. It's always hard to judge debugging builds because of all the extra chaff they carry, but as of early this morning I was able to get a G5-optimized build of TenFourFox 38 up (more on that in a moment, which builders need to read). Let's get a drum roll, maestro. The test system is a Quad G5 with performance set to Highest. Here is our current browser (31.7.0) running the current JIT compiler, PPCBC:

% /Applications/TenFourFoxG5.app/Contents/MacOS/js --no-ion -f run.js
Richards: 208
DeltaBlue: 579
Crypto: 365
RayTrace: 574
EarleyBoyer: 628
RegExp: 622
Splay: 932
NavierStokes: 439
----
Score (version 7): 502

PPCBC beats the interpreter, which our long-suffering friends running Firefox in PowerPC Linuxland must still contend with, by a long shot. But PPCBC was no JaegerMonkey. Here is 17.0.11:

% /Applications/TenFourFoxG5-17.0.11.app/Contents/MacOS/js -m -n -f run.js
Richards: 3160
DeltaBlue: 8364
Crypto: 3036
RayTrace: 1784
EarleyBoyer: 3602
RegExp: 553
Splay: 4115
NavierStokes: 1383
----
Score (version 7): 2519

Yup, that was good times. But are you ready for awesome?

% mozilla-38b/obj-ff-dbg/dist/bin/js -f run.js
Richards: 7258
DeltaBlue: 19207
Crypto: 2857
RayTrace: 15730
EarleyBoyer: 8873
RegExp: 1202
Splay: 3960
NavierStokes: 3456
----
Score (version 7): 5561

That is 2.21 times faster than JaegerMonkey (versions 10-22) and over eleven (!!) times faster than PPCBC (versions 24-36). I don't think you even want to see how much faster it is than the interpreter ... but what the heck!

% mozilla-38b/obj-ff-dbg/dist/bin/js --no-ion --no-baseline --no-native-regexp -f run.js
Richards: 44.7
DeltaBlue: 172
Crypto: 80.9
RayTrace: 113
EarleyBoyer: 173
RegExp: 192
Splay: 341
NavierStokes: 135
----
Score (version 7): 134

That's over 41 times faster. I can't wait to get this hot little rocket in your hot little hands. Mind the fire. PowerPC got back.
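
For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the overall scores and speedups from the per-test numbers in the transcripts above. It assumes the V8 suite's overall score is the geometric mean of the individual test scores, which the figures above appear to bear out. The dictionary keys and helper names are purely illustrative, and because the transcripts round each score to three significant figures, the recomputed totals can differ from the reported ones by a point or so.

```python
from math import prod  # Python 3.8 or later

# Per-test scores copied from the four transcripts above, in the order
# Richards, DeltaBlue, Crypto, RayTrace, EarleyBoyer, RegExp, Splay, NavierStokes.
scores = {
    "interpreter":  [44.7, 172, 80.9, 113, 173, 192, 341, 135],
    "PPCBC":        [208, 579, 365, 574, 628, 622, 932, 439],
    "JaegerMonkey": [3160, 8364, 3036, 1784, 3602, 553, 4115, 1383],
    "IonPower":     [7258, 19207, 2857, 15730, 8873, 1202, 3960, 3456],
}

def geometric_mean(values):
    """Return the n-th root of the product of n values."""
    return prod(values) ** (1.0 / len(values))

# Overall score per engine, assuming geometric-mean aggregation.
overall = {name: geometric_mean(vals) for name, vals in scores.items()}

def speedup(new, old):
    """Ratio of two overall scores (higher scores are better)."""
    return overall[new] / overall[old]

for name, value in overall.items():
    print(f"{name}: overall score is roughly {value:.0f}")

for old in ("JaegerMonkey", "PPCBC", "interpreter"):
    print(f"IonPower vs {old}: {speedup('IonPower', old):.2f}x")
```

The same `scores` table can also be compared test by test if you want to see where the gains are largest and where they are not.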

The only target we missed is that Baseline-only performance is still about 25% slower than PPCBC, which admittedly was highly optimized for that purpose. I think this tradeoff will be acceptable. :)

Builders are warned: 38 will probably require some tweaks to bintools due to a problem with N_SECT and gcc 4.6. I'm currently experimenting with a modified strip that doesn't choke on these non-standard object files and will provide this to you when the beta becomes available. In addition, I've pretty much settled on gcc 4.9 for post-38 and we will likely use a combination of MacPorts and Sevan's pkgsrc builds for the new build system, if there is one. More on that after 38 launches successfully.

8 comments:

  1. This comment has been removed by the author.

  2. My anaconda don't want none if it ain't got buns, hon!

  3. It'd be interesting to see the score comparison of the G3-optimized builds to see if the same 11x improvement holds.

    Replies
    1. The improvement includes the bonus from off-thread compilation which obviously is extremely profitable on multiprocessor Macs. Also, the G5 has hardware square root available to IonPower, which PPCBC didn't have (and neither do G3 or G4). The differential on a G3 or uniprocessor G4 will not be nearly as high, but it should still be substantially faster.

  4. You're awesome. I mean, like, REALLY AWESOME.

  5. I'm wondering how the G5 Quad compares to the benchmark scores achieved on Intel Macs running OS X.

  6. Just for comparison, this is a 2.0 GHz Intel Celeron with Win XP.
    http://postimg.org/image/410zhxsif/

