Rubinius Retort

By now, a good many of you have read Charles’ breakdown of Ruby implementations.
If you haven’t, please go read at least the Rubinius section before reading the rest of this post, as it is largely a response to that.

Now, on to Charles’ section on Rubinius:

Evan Phoenix’s Rubinius project is an effort to implement Ruby using as much Ruby code as possible. It is not, as professed, “Ruby in Ruby” anymore. Rubinius started out as a 100% Ruby implementation of Ruby that bootstrapped and ran on top of MatzRuby. Over time, though the “Ruby in Ruby” moniker has stuck, Rubinius has become more or less half C and half Ruby. It boasts a stackless bytecode-based VM (compare with Ruby 1.9, which does use the C stack), a “better” generational, compacting garbage collector, and a good bit more Ruby code in the core libraries, making several of the core methods easier to understand, maintain, and implement in the first place.

A little background is in order, to put things straight. Rubinius began as a hobby, back in February of 2006 (Same month I got married, that’s how I recall).
At RubyConf 2006, I gave a presentation on what was then the initial work, which at that point constituted 3 bodies of work.

  1. A VM written in Ruby, using RubyInline to access some raw operations. Slower than you can imagine.
  2. A VM written in C, created by hand-translating the Ruby code into C. Parts of this work were originally done using a translator program I’d written, which tried to convert the Ruby VM into C mechanically. This proved beyond my skill and available time, and I felt it was more important to have something running.
  3. A kernel of ruby code, implementing 95% of the core library / kernel / class library of 1.8. The terminology for this part has always been fuzzy in the Ruby community. Rubinius calls this the kernel, some call it the standard library, some the class library. It’s the implementations of the builtin classes such as Array, Hash, etc.

It’s plainly true that today, the VM is about 22,000 lines, the kernel 23,000 lines. I’ve never hidden this fact from anyone; in fact I’ve put those numbers directly into presentations. That’s been true for pretty much the entire public life of the project. The initial Ruby prototype was only ever run by me.

I do believe, though, that it can still claim “Ruby in Ruby”. When I present on Rubinius or am asked about this, the response I give is:
What is Ruby?
The typical response is that Ruby is 3 things:

  • A syntax
  • An execution model
  • A kernel

Again, let’s have some context. When I began this project, there was buzz about improving things like String and Array. In 1.8, this requires diving down into C right off the bat. Plus, consider languages such as C++ and Java. Java largely claims to be written in Java, since almost the entire class library is written in Java. This lets it evolve faster, because there is no mismatch between Java user code and the Java class library.
It is this that we typically mean when we talk about “Ruby in Ruby”. If I’ve not explained this well enough in person and in writing, I take full responsibility for this misunderstanding.
There is the long-term goal of having a VM which is mechanically generated from Ruby code, in the same way Squeak’s VM is written. Since RubyConf 2006 there has been no additional work on this, but there is a very good reason for that.

Rubinius today has around 150 people who have received commit rights. The vast, vast majority of their work has been in the kernel, because this is the largest part of the whole system. And probably 95% of that work has been writing Ruby code. This means that for pretty much all contributors, helping with Rubinius means writing Ruby code. And thus to them, it is Ruby in Ruby.

The promise of Rubinius is pretty large. If it can be made compatible, and made to run fast, it might represent a better Ruby VM than YARV. Because a fair portion of Rubinius is actually implemented in Ruby, being able to run Ruby code fast would mean all code runs faster. And the improved GC would solve some of the scaling issues Ruby 1.8 and Ruby 1.9 will face.

Rubinius also brings some other innovations. The one most likely to see general visibility is Rubinius’s Multiple-VM API. JRuby has supported MVM from the beginning, since a JRuby runtime is “just another Java object”. But Evan has built simple MVM support in Rubinius and put a pretty nice API on it. That API is the one we’re currently looking at improving and making standard for user-land MVM in JRuby and Ruby 1.9. Rubinius has also shown that taking a somewhat more Smalltalk-like approach to Ruby implementation is feasible.

But here be dragons.

In the 1.5 years since Rubinius was officially named and born into the Ruby world, it has not yet met any of these promises. It is not generally faster than Ruby 1.8, though it performs pretty well on some low-level microbenchmarks. It is not implemented in Ruby: the current VM is written in C and the codebase hosts as much C code as it does Ruby code. Evan’s work on a C++ rewrite of the VM will make Rubinius the first C++-based Ruby implementation. It has not reached the Rails singularity yet, though they may achieve it for RailsConf (probably in the same cobbled-together state JRuby did at JavaOne 2006…or maybe a bit better). And the second Rails inflection point–running Rails faster than Ruby 1.8–is still far away.

Charles once again gets my hackles up, though his points are true. We’ve yet to run Rails. We’ve yet to run significant Ruby code faster than 1.8. I am finishing up a C++ rewrite of the VM.

I’ve addressed the Ruby in Ruby phraseology above, so let’s move past that.

Performance is improving at a slow, regular pace. This is because of 2 factors:

  • VM improvements. Adding more caches and more VM logic to make its constructs faster. This happens far less frequently than:
  • Ruby code improvements. This happens quite often, because we have so many people working in the kernel. These kinds of improvements will get us a long way, but not the entire way to 1.8 performance. That’s where VM improvements help.
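To give a feel for what "adding more caches" can mean, here is a minimal sketch of one common VM technique, a monomorphic call-site cache, modeled in plain Ruby. This is an illustration under stated assumptions, not the Rubinius implementation: the `CallSite` class and its counters are hypothetical, and a real cache would also have to handle method redefinition and singleton methods, which this sketch ignores.

```ruby
# Hypothetical model of a call-site cache: remember the last receiver class
# and the method found for it, and skip the lookup while the class repeats.
class CallSite
  attr_reader :hits, :misses

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
    @hits = 0
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    if klass.equal?(@cached_class)
      @hits += 1                       # fast path: reuse the cached method
    else
      @misses += 1                     # slow path: full lookup, then cache it
      @cached_class = klass
      @cached_method = klass.instance_method(@name)
    end
    @cached_method.bind(receiver).call(*args)
  end
end

site = CallSite.new(:upcase)
%w[a b c].each { |s| site.call(s) }    # one miss, then two hits
site.call(:sym)                        # different class: miss, cache replaced
puts "hits=#{site.hits} misses=#{site.misses}"   # prints "hits=2 misses=2"
```

The payoff depends on how often one call site keeps seeing the same receiver class, which is why this kind of change belongs in the VM rather than in kernel code.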

Again, he brings up the sizes of the VM in comparison to the kernel. This will be the last time I address this in this post. Ruby is a dynamic language, which boasts a very rich, featureful kernel. Its syntax and constructs allow for short, succinct algorithms.
So while, yes, the kernel is the same number of lines as the VM, it’s not unreasonable to say that it probably constitutes 10x the functionality. This is because the written Ruby code is shorter and easier to understand. That’s the whole point of this project, to make the core of it easier to work on and evolve.

Compatibility is not going to be a problem for Rubinius. They’ve worked very hard from the beginning to match Ruby behavior, even launching a Ruby specification suite project to officially test that behavior using Ruby 1.8 as the standard. I have no doubt Rubinius will be able to run Rails and most other Ruby apps people throw at it. And despite Evan’s frequent cowboy attitude to language compatibility (such as his early refusal to implement left-to-right evaluation ordering, a fatal decision that led to the current VM rework), compatibility is likely to be a simple matter of time and effort, driven by the spec suite and by actual applications, as people start running real code on Rubinius.

A quick personal response to a personal attack. I never once refused to implement left-to-right evaluation ordering; this is a bald-faced lie. It’s totally true that Rubinius today is right-to-left, because that was much easier to implement way back in the day when the project began. As we started to work on ActiveRecord, we found that there was code that appeared to depend on left-to-right ordering, so I brought it up with matz. And now I’m in the midst of changing it. Truth be told, I should have done my research back when the project started; it would have been easier to fix this then than now.
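For anyone wondering how code can depend on evaluation order at all, here is a small sketch. The `note` and `pair` helpers are hypothetical and exist only for illustration; this is not the ActiveRecord code in question. It shows argument expressions with side effects, where the order of evaluation changes what the method receives.

```ruby
# Records the order in which argument expressions are evaluated.
ORDER = []

def note(label)
  ORDER << label   # side effect: remember when this argument was evaluated
  label
end

def pair(a, b)
  [a, b]
end

pair(note(:first), note(:second))
p ORDER   # left-to-right (1.8 behavior) prints [:first, :second]

# Here the order changes the result itself, not just a log:
queue = [1, 2]
p pair(queue.shift, queue.shift)   # left-to-right prints [1, 2]
```

A right-to-left VM would record `[:second, :first]` and return `[2, 1]` from the second call, so any code written against 1.8 that leans on this would quietly misbehave.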

But I take issue with Charles’ statement that I’m operating fast and loose with language compatibility. We have an awesome team working on RubySpecs, which will end up being a definitive reference for 1.8 behavior. I will always be the first one to defend their behavior, and get Rubinius implementing it properly.

That’s not to say that Rubinius hasn’t made temporary pragmatic decisions in implementation in the past. We absolutely have, and in time those are corrected.
Perhaps Charles mistakes my pragmatism and Montana upbringing for a cowboy attitude.

Performance is going to be a much harder problem for Rubinius. In order for Rubinius to perform well, method invocation must be extremely fast. Not just faster than Ruby 1.8 or Ruby 1.9, but perhaps an order of magnitude faster than the fastest Ruby implementations. The simple reason for this is that with so much of the core classes implemented in Ruby, Rubinius is doing many times more dynamic invocations than any other implementation. If a given String method represents one or two dynamic calls in JRuby or Ruby 1.8, it may represent twenty in Rubinius…and sometimes more. All that dispatch has a severe cost, and on most benchmarks involving heavily Ruby-based classes Rubinius has absolutely dismal performance–even with call-site optimizations that finally pushed JRuby’s performance to Ruby 1.9 levels. A few benchmarks I’ve run from JRuby’s suite must be ratcheted down a couple orders of magnitude to even complete.

He’s absolutely correct. We have a ways to go, but I don’t believe we can’t get there. Others before us have made it work, and I think so shall we.
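To make the dispatch cost concrete, here is a toy sketch. The `TinyArray` class is hypothetical and is not Rubinius kernel code. It builds one "kernel" method purely out of two "primitives" and counts how many dynamic calls a single high-level call turns into.

```ruby
# Hypothetical toy class: each primitive bumps a counter so we can see how
# many sends one kernel-level method costs.
class TinyArray
  attr_reader :sends

  def initialize(*items)
    @items = items   # backing storage; stands in for VM-level memory
    @sends = 0
  end

  # "Primitives": the only methods that touch storage directly.
  def size
    @sends += 1
    @items.length
  end

  def at(index)
    @sends += 1
    @items[index]
  end

  # A "kernel" method written only in terms of the primitives above.
  def include?(target)
    i = 0
    while i < size
      return true if at(i) == target
      i += 1
    end
    false
  end
end

list = TinyArray.new(:a, :b, :c, :d)
list.include?(:d)
puts list.sends   # prints 8: four size calls plus four at calls
```

One `include?` call on a four-element list already costs eight counted sends, before counting the `==` comparisons. A C implementation would do the same scan as a single loop with no dispatch at all, which is exactly the gap a fast method invocation path has to close.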

And the Rubinius team knows this. Over the past few months, more and more core methods have been reimplemented in C as “primitives”, sometimes because they have to be to interact with C-level memory and VM constructs, but frequently for performance reasons. So the “Ruby in Ruby” implementation has evolved away from that ideal rather than towards it, and performance is still not acceptable for most applications. In theory, none of this should be insurmountable. Smalltalk VMs run significantly faster than most Ruby implementations and still implement all or most of the core in Smalltalk. Even the JVM, largely associated with the statically-typed Java language, is essentially an optimized dynamic language VM, and the majority of Java’s core is implemented in Java…often behind interfaces and abstractions that require a good dynamic runtime. But these projects have hundreds of man-years behind them, where Rubinius has only a handful of full-time and part-time enthusiastic Rubyists, most with no experience in implementing high-performance language runtimes. And Evan is still primarily responsible for everything at the VM level.

Of course, it would be folly to suggest that the Rubinius team should focus on performance before compatibility. The “Ruby in Ruby” meme needs to die (seriously!), but other than that Rubinius is an extremely promising implementation of Ruby. Its performance is terrible for most apps, but not all that much worse than JRuby’s performance was when we reached the Rails singularity ourselves. And its design is going to be easier to evolve than comparable C implementations, assuming that people other than Evan learn to really understand the VM core. I believe the promise of Rubinius is certainly great enough to continue the project, even if the perils are going to present some truly epic challenges for Evan and company to overcome.

Thank you for the kind words of encouragement, Charles. We’re getting there.
I want to say briefly as well that Charles and I are good friends; I just wanted to clear the air slightly and get everyone on the same page.

14 thoughts on “Rubinius Retort”

  1. I always repeat in my presentations that Rubinius is destined to be the best of all implementations. As you said, Smalltalk did it once, I see no reason for you guys not being able to replicate it. On the other hand I always think that maybe Charles and Avi have both come up with a different aspect that should be considered. Charles says that both SmalltalkVMs and JVMs had hundreds of man-hours behind them, and Rubinius really don’t have that much man power. Avi once suggested that maybe the easy way out was to use an already existing VM such as Strongtalk. Have you ever considered such a route?

    Contrary to Charles, I think the ‘Ruby on Ruby’ meme is worthwhile.

    Kudos for all the hard work. I am still holding my breath for the Rails-singularity. That’s when the project should gain more traction and contributors: once it shows the first milestone towards full compatibility.

  2. A few responses to the retort! 🙂

    * Your 150 committers claim needs to be softened a bit. There are 150+ committers, but by your own admission there are maybe 20 active in a given period. And by my measurements, the majority of work (as much as 3:1 over the past month) has been spent working on the specs rather than on anything Rubinius-specific. So it’s not like there’s an army of man-machines cranking out fresh new Rubinius kernel code.
    * Evan did refuse to fix eval ordering, even after I warned it would be a problem and provided example code that would break. And break it did, at least in part forcing the current rewrite. In the end, though, it’s probably not even a bad thing: the new VM is looking great, and will probably be easier to follow and certainly easier to test (it’s TDD all the way this time). But who said what isn’t important. Both Evan and I have been burned by our “pragmatic” decisions, and we’ve both had to backtrack, fix bugs, and sometimes replace whole subsystems as a result. We’re both learning, and hopefully we’re able to help each other along the way.
    * And as I updated on my blog and commented a few other places, I don’t believe the ideal of “Ruby in Ruby” is a bad one. Far from it, I believe “RiR” is a great goal to work toward, and I’d even like to achieve it in JRuby, either by reimplementing pieces or by providing a JVM/JRuby backend for Rubinius. The meme I think needs to die, however, is that Rubinius is already 100% Ruby in Ruby or that it’s somehow automatically better. That’s not the case yet. Maybe in the future it will be.

  3. The biggest problem I’ve seen with the “Ruby in Ruby” approach is that it makes the whole system more fragile to core methods replacement, something that ActiveSupport loves to do. Proof is the SafeMathOperators compiler plugin that protects Fixnum against division overrides.

  4. I don’t think I’ve ever heard anywhere that Rubinius was better because it was “Ruby in Ruby”, or that it would be better than JRuby, even. Ever.

    Truth be told, more often than not, myself and I’m sure others would think it would perform worse simply for the fact that Ruby is slow, but this is flawed thinking to some degree.

    I look forward to both (and others) improving a great deal over time. Keep your dicks in your pants and keep up the good work. 🙂

  5. Do you think that VM or JRuby will be stick?

    Well, Sun, er “JAVA” has hired itself 1-3 top-tier fanboys, so at least until it changes its ticker symbol to “FAIL” (1-3 years), you’d better hope non-J Rubies survive…

  6. nothanks you snarky cow. The JRuby crew do good work and should be congratulated for it.

    Bring on the implementations I say. The more the better. Pitfalls be damned, it’s worth the risk if we can see Ruby move beyond the poor and oft-pummeled MRI 1.8.

  7. Mr eel: no need to name call. JRuby is doing a great job, when did I say they weren’t.

    I’m a big advocate of multiple Ruby implementations, it means that the community and ecosystem are healthy.

  8. evan, i think Mr. Eel was referring to the dude who posted as “nothanks”. I take Mr. Eel’s point to be that the hatin’ should stop, and that JRuby team has done good work pushing the Ruby world towards greener pastures than the 1.8 vm, since everyone is always criticizing it!

  9. I think we have some wires crossed Evan, I was being rude to a cheeky commenter ‘nothanks’, not yourself.

    That’s what I get for being a bastard.

  10. It’s interesting that your comparison between c and ruby is based on lines of code, when we all know that ruby’s lines of code do far more than c’s lines of code.

    In other words,{|a| a.items.array_of_forefathers}.flatten.uniq

    is far more powerful than c can hope to be in one line of code.

    Perhaps you shouldn’t be comparing this, but rather POTENTIAL lines of c code that the relevant ruby code would require if it were written in c?

    I’d wager it’d be a darn sight more than the same number.

    Zenunit Sensei
