Archive for April 2008
By now, a good deal of you have read Charles breakdown of Ruby implementations.
If you haven’t please go read at least the Rubinius section before reading the rest of this post, as it is largely a response to that.
Now, on to Charles section on Rubinius:
Evan Phoenix’s Rubinius project is an effort to implement Ruby using as much Ruby code as possible. It is not, as professed, “Ruby in Ruby” anymore. Rubinius started out as a 100% Ruby implementation of Ruby that bootstrapped and ran on top of MatzRuby. Over time, though the “Ruby in Ruby” moniker has stuck, Rubinius has become more or less half C and half Ruby. It boasts a stackless bytecode-based VM (compare with Ruby 1.9, which does use the C stack), a “better” generational, compacting garbage collector, and a good bit more Ruby code in the core libraries, making several of the core methods easier to understand, maintain, and implement in the first place.
A little background is in order, to put things straight. Rubinius began as a hobby, back in February of 2006 (Same month I got married, that’s how I recall).
At RubyConf 2006, I gave a presentation on what was then the initial work, which at that point constitute 3 bodies of work.
- A VM written in ruby, using RubyInline to access some raw operations. More slow that you can imagine.
- A VM written in C, created by hand translating the ruby code into C. Parts of this work were originally done using a translator program I’d written, which tried to convert the VM in ruby into C mechanically. This proved beyond my skill and time level, thus I felt it was more important to have something running.
- A kernel of ruby code, implementing 95% of the core library / kernel / class library of 1.8. The terminology for this part has always been fuzzy in the Ruby community. Rubinius calls this the kernel, some call it the standard library, some the class library. It’s the implementations of the builtin classes such as Array, Hash, etc.
It’s plainly true that today, the VM is about 22,000 lines, the kernel 23,000 lines. I’ve never hidden this fact from anyone; in fact I’ve put those numbers directly into presentations. That’s been true for pretty much the entire life of the project in the public. The initial ruby prototype was only even run by me.
I do though believe that it still can claim “Ruby in Ruby”. When I present on Rubinius or am asked about this, the response I give is:
What is Ruby?
The typically response is that Ruby is 3 things:
- A syntax
- An execution model
- A kernel
Again, lets have some context. When I began this project, there was buzz about improving things like String and Array. In 1.8, this requires diving down into C right off the bat. Plus, consider languages such as C++ and Java. Java largely claims to be written in Java, since almost the entire class library is written in Java. This lets it evolve faster, because there is no mismatch between Java user code and the Java class library.
It is this that we typically talk about “Ruby in Ruby”. If I’ve not explained this well enough in person and in type, I take full responsibility for this misunderstanding.
There is the long term goal of having a VM which is mechanically generated from Ruby code, in the same way Squeak’s VM is written. But after that RubyConf 2006, there has been no additional work on this, but there is a very good reason for that.
Rubinius today has around 150 people who have received commit rights. The vast, vast majority of their work has been in the kernel, because this is the largest part of the whole system. And probably 95% of that work has been writing Ruby code. This means that for pretty much all contributers, helping with Rubinius means writing Ruby code. And thus to them, it is Ruby in Ruby.
The promise of Rubinius is pretty large. If it can be made compatible, and made to run fast, it might represent a better Ruby VM than YARV. Because a fair portion of Rubinius is actually implemented in Ruby, being able to run Ruby code fast would mean all code runs faster. And the improved GC would solve some of the scaling issues Ruby 1.8 and Ruby 1.9 will face.
Rubinius also brings some other innovations. The one most likely to see general visibility is Rubinius’s Multiple-VM API. JRuby has supported MVM from the beginning, since a JRuby runtime is “just another Java object”. But Evan has built simple MVM support in Rubinius and put a pretty nice API on it. That API is the one we’re currently looking at improving and making standard for user-land MVM in JRuby and Ruby 1.9. Rubinius has also shown that taking a somewhat more Smalltalk-like approach to Ruby implementation is feasible.
But here be dragons.
In the 1.5 years since Rubinius was officially named and born into the Ruby world, it has not yet met any of these promises. It is not generally faster than Ruby 1.8, though it performs pretty well on some low-level microbenchmarks. It is not implemented in Ruby: the current VM is written in C and the codebase hosts as much C code as it does Ruby code. Evan’s work on a C++ rewrite of the VM will make Rubinius the first C++-based Ruby implementation. It has not reached the Rails singularity yet, though they may achieve it for RailsConf (probably in the same cobbled-together state JRuby did at JavaOne 2006…or maybe a bit better). And the second Rails inflection point–running Rails faster than Ruby 1.8–is still far away.
Charles once again gets my hackles up, thought his points are true. We’ve yet to run rails. We’ve yet to run significant Ruby code faster than 1.8. I am finishing up a C++ rewrite of the VM.
I’ve addressed the Ruby in Ruby phraseology above, so lets move past that.
Performance is improving at a slow, regular pace. This is because of 2 factors:
- VM improvements. Adding more caches, more VM logic to make it’s constructs faster. This happens far more infrequently than:
- Ruby code improvements. This happens quite often, because we have so many people working in the kernel. These kinds of improvements will get us a long way, but not the entire way to 1.8 performance. That’s where VM improvements help.
Again, he brings up the sizes of the VM in comparison to the kernel. This will be the last time I address this in this post. Ruby is a dynamic language, which boasts a very rich, featureful kernel. It’s syntax and constructs allow for short, succinct algorithms.
So while, yes, the kernel is the same number of lines as the VM, it’s not unreasonable to say that it probably constitutes 10x the functionality. This is because the written Ruby code is shorter and easier to understand. That’s the whole point of this project, to make the core of it easier to work on and evolve.
Compatibility is not going to be a problem for Rubinius. They’ve worked very hard from the beginning to match Ruby behavior, even launching a Ruby specification suite project to officially test that behavior using Ruby 1.8 as the standard. I have no doubt Rubinius will be able to run Rails and most other Ruby apps people throw at it. And despite Evan’s frequent cowboy attitude to language compatibility (such as his early refusal to implement left-to-right evaluation ordering, a fatal decision that led to the current VM rework), compatibility is likely to be a simple matter of time and effort, driven by the spec suite and by actual applications, as people start running real code on Rubinius.
A quick personal response to a personal attack. I never once refused to implement left-to-right evaluation ordering, this is a bald faced lie. It’s totally true that Rubinius today is right-to-left, because that was much easier to implement way back in the day when the project began. As we started to work on ActiveRecord, we found that there was code that appear to depend on left-to-right ordering, so I brought it up with matz. And now I’m in the midst of changing it. Truth be told, I should have done my research back when the project started, it would have been easier to fix this then than now.
But I take issue with Charles statement that I’m operating fast and loose with language compatibility. We have an awesome team working on RubySpecs, which will end up being a definitive reference for 1.8 behavior. I will always be the first one to defend their behavior, and get Rubinius implementing it properly.
That’s not to say that Rubinius in the past has made temporary pragmatic decisions in implementation. We absolutely have, and in time those are corrected.
Perhaps Charles mistakes my pragmatism and Montana upbringing for a cowboy attitude.
Performance is going to be a much harder problem for Rubinius. In order for Rubinius to perform well, method invocation must be extremely fast. Not just faster than Ruby 1.8 or Ruby 1.9, but perhaps an order of magnitude faster than the fastest Ruby implementations. The simple reason for this is that with so much of the core classes implemented in Ruby, Rubinius is doing many times more dynamic invocations than any other implementation. If a given String method represents one or two dynamic calls in JRuby or Ruby 1.8, it may represent twenty in Rubinius…and sometimes more. All that dispatch has a severe cost, and on most benchmarks involving heavily Ruby-based classes Rubinius has absolutely dismal performance–even with call-site optimizations that finally pushed JRuby’s performance to Ruby 1.9 levels. A few benchmarks I’ve run from JRuby’s suite must be ratcheted down a couple orders of magnitude to even complete.
He’s absolutely correct. We have a ways to go, but I don’t believe we can’t get there. Others before us have made it work, and I think so shall we.
And the Rubinius team knows this. Over the past few months, more and more core methods have been reimplemented in C as “primitives”, sometimes because they have to be to interact with C-level memory and VM constructs, but frequently for performance reasons. So the “Ruby in Ruby” implementation has evolved away from that ideal rather than towards it, and performance is still not acceptable for most applications. In theory, none of this should be insurmountable. Smalltalk VMs run significantly faster than most Ruby implementations and still implement all or most of the core in Smalltalk. Even the JVM, largely associated with the statically-typed Java language, is essentially an optimized dynamic language VM, and the majority of Java’s core is implemented in Java…often behind interfaces and abstractions that require a good dynamic runtime. But these projects have hundreds of man-years behind them, where Rubinius has only a handful of full-time and part-time enthusiastic Rubyists, most with no experience in implementing high-performance language runtimes. And Evan is still primarily responsible for everything at the VM level.
Of course, it would be folly to suggest that the Rubinius team should focus on performance before compatibility. The “Ruby in Ruby” meme needs to die (seriously!), but other than that Rubinius is an extremely promising implementation of Ruby. Its performance is terrible for most apps, but not all that much worse than JRuby’s performance was when we reached the Rails singularity ourselves. And its design is going to be easier to evolve than comparable C implementations, assuming that people other than Evan learn to really understand the VM core. I believe the promise of Rubinius is certainly great enough to continue the project, even if the perils are going to present some truly epic challenges for Evan and company to overcome.
Thank you for the kind works of encouragement Charles. We’re getting there.
I want to say briefly as well that Charles and I are good friends, I just wanted to clear the air slightly and get everyone on the same page.
Sitting here in Copenhagen, at RubyFools, I thought I’d share a technique that I’ve known about for some time, but seems to not have gotten into the normal ruby vernacular.
This trick is the use of super in methods contained in a Module. Consider the following code:
module N def go puts "N#go" super end end class B def go puts "B#go" end end class A < B include N end A.new.go
This is HIGHLY useful implementing rails plugins, where normally you’d use alias_method_chain to change a method directly inside ActiveRecord::Base. Instead, simply call super in the method that provides the new functionality, and when your module is included in your module class (which is a subclass of ActiveRecord::Base), and the main, ActiveRecord::Base implementation will be called.
NOTE: This trick only works if the method you wish to wrap is located in a superclass of the class you have defined the module in. IE, if N were included in B directly rather than A, N#go would never be called.