Here are some notes on dipping into Rust to speed up a function in R.
Quick points:
- Rust is a practical option for speeding up expensive functions in higher-level languages like R and Python.
- LLMs make translating existing functions into unfamiliar languages much more straightforward (and the existing functions help validate that the translations work).
- R’s rextendr supports both making Rust functions available as a package and dropping a small Rust function inline into an R file.
- There are considerations other than speed: using two languages affects ease of maintenance and review, so the trade-offs need weighing.
- The best candidates are small, conceptually simple but computationally expensive calculations that you want to run many times.
--
Chris Hanretty has a blog post about how to examine and optimise a function in R, ending up at around 0.2-0.3 seconds for his optimised version of the function.
This is a well-defined, simple problem that takes too long to run, which made it a good candidate for seeing how to do it in Rust, and for seeing what the ecosystem for accessing Rust from R is like compared to Python (which I’m a bit more familiar with).
Why use Rust
A key reason to drop simple functions down into a lower-level language like Rust is that computers are a lot faster than you think they are. Some of the difference is the trade-off between ease of use and speed in higher-level languages, but some of it is that you sometimes have to reshape your problem to take advantage of the approaches that are fast in that language.
In R (and in data processing with pandas in Python), loops are discouraged not because they’re the wrong logical approach, but because vectorised functions provide an interface to lower-level compiled code rather than running at the more abstract Python/R level. If we drop down and write our own compiled function, we can do the thing we were actually trying to do in the first place.
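To make that concrete, here’s a toy illustration (not from the original post) of the same calculation written both ways in R:

```r
# The explicit loop runs at the R interpreter level; the vectorised version
# hands the same loop to compiled code underneath. Toy example only.
x <- runif(1e6)

abs_sum_loop <- function(x) {
  total <- 0
  for (xi in x) total <- total + abs(xi)
  total
}

abs_sum_vectorised <- function(x) sum(abs(x))
```

Both give the same answer; the difference is purely where the loop runs.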
As it turns out, the rextendr package is great at this: it not only provides bindings if you want to build a full Rust module, but also does something horrible behind the scenes so that you can just drop a Rust function into the document and it will take care of compiling it for you. So for simple things you don’t need to grapple with a complete Rust project, and you can keep the whole source in one file (which is better for simplicity and replication).
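As a minimal illustration of how that looks (this is essentially the toy example from the rextendr documentation, not the function from this post):

```r
library(rextendr)

# Compile a one-line Rust function and make it callable from R.
rust_function("fn add(a: f64, b: f64) -> f64 { a + b }")

add(2.5, 4.7)
#> [1] 7.2
```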
The obvious big downside is that you have to learn a whole new programming language, but if you’re just dropping in, this is much easier than it used to be. LLMs are really good at translating between languages (and at suggesting approaches for optimisation). It is much easier to read programming languages you don’t understand than to write them, especially if you already have a working version of the function in another language to validate the results.
Using Rust from inside R
So here is the function that takes vectors of party and voter positions and finds the average distance from each voter to their nearest party, with the Rust code sitting inside an R file.
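(The code below is a reconstruction of the kind of function described rather than the exact code from the post; the function and argument names are mine.)

```r
# For each voter, find the distance to the nearest party, then average those
# minimum distances. A hedged reconstruction, not the post's original code.
rextendr::rust_function("
  fn mean_min_distance(voters: Vec<f64>, parties: Vec<f64>) -> f64 {
      let mut total = 0.0;
      for v in &voters {
          let mut best = f64::INFINITY;
          for p in &parties {
              let d = (v - p).abs();
              if d < best {
                  best = d;
              }
          }
          total += best;
      }
      total / voters.len() as f64
  }
")
```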
There’s some idiomatic Rust stuff in there, but it’s basically: take vector a, loop through vector b, calculate distances, take the smallest, then average. The most immediately unclear bit is all the & symbols, which is how Rust borrows a variable by reference rather than taking ownership of it (you start to get a feel for this when you do more Rust, but for these purposes it doesn’t matter: it works fine).
When we drop that into the original code, we get a roughly 10x speed-up on the previously optimised version, down to about 0.03 seconds (once the function has compiled).
Need for speed
We can go further than that: the original blog notes you could also get an answer using integration, but that’s slower in R. Switching to Rust, it’s not! And after a bit of consulting ChatGPT on how to make it even faster (e.g. some shortcuts using SIMD to do the same operation on multiple bits of data at once), we can get a lot more speed here.
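For flavour, here’s a rough sketch of the integration idea (my reconstruction, not the post’s code): instead of averaging over sampled voters, integrate the distance to the nearest party against the voter distribution over a grid. Treating voters as normally distributed, and using a simple trapezoid rule, are illustrative assumptions here.

```r
# Sketch only: numerically integrate |x - nearest party| against an assumed
# normal voter distribution, using a simple trapezoid rule. The distribution
# and its parameters are assumptions for illustration.
rextendr::rust_function("
  fn mean_min_distance_integrate(parties: Vec<f64>, mu: f64, sigma: f64) -> f64 {
      let n = 10_000usize;           // number of grid intervals
      let lo = mu - 6.0 * sigma;     // integrate over roughly +/- 6 sd
      let hi = mu + 6.0 * sigma;
      let step = (hi - lo) / n as f64;
      let mut total = 0.0;
      for i in 0..=n {
          let x = lo + i as f64 * step;
          // distance from this grid point to the nearest party
          let mut best = f64::INFINITY;
          for p in &parties {
              let d = (x - p).abs();
              if d < best { best = d; }
          }
          // normal density at x
          let z = (x - mu) / sigma;
          let dens = (-0.5 * z * z).exp()
              / (sigma * (2.0 * std::f64::consts::PI).sqrt());
          // trapezoid rule: half weight at the endpoints
          let w = if i == 0 || i == n { 0.5 } else { 1.0 };
          total += w * best * dens * step;
      }
      total
  }
")
```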
(And the further you go beyond your comfort level in the new language, the more helpful a working slow version of the function is for validating that everything is working.)
There’s some cheating in the timing here: each time the script is run it has to recompile the Rust function, which is slow. Moving the two functions off into their own library means we can compile and install just once, and then use those functions in the R script like we would any other package.
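That migration looks roughly like this (using the standard rextendr helpers; the package name is a placeholder):

```r
# One-off setup: scaffold an R package with a Rust backend.
# "partydistance" is a made-up package name for illustration.
usethis::create_package("partydistance")
rextendr::use_extendr()   # adds src/rust/ with Cargo.toml and the Makevars glue

# ...move the Rust functions into src/rust/src/lib.rs...

rextendr::document()      # compile the Rust code and generate the R wrappers
devtools::install()       # install once; scripts then just library(partydistance)
```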
Benchmarking a range of approaches in the final library (a sketch of the kind of comparison is below the list) gives the following:
- Pure R Reference: 138 ms
- Rust Monte Carlo: 10.1 ms
- Rust Integration: 0.0180 ms
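The timings above are from the final library; a comparison in that shape could be run with the bench package, something like this sketch (the function names are the hypothetical ones from the earlier sketches, plus a pure R mean_min_distance_r that isn’t shown here):

```r
library(bench)
library(partydistance)  # the hypothetical package from the sketch above

parties <- c(-1, 0, 2)
voters  <- rnorm(1e5)

bench::mark(
  r_reference      = mean_min_distance_r(voters, parties),       # pure R version (not shown)
  rust_monte_carlo = mean_min_distance(voters, parties),         # Rust translation
  rust_integration = mean_min_distance_integrate(parties, 0, 1), # Rust integration version
  check = FALSE  # the approaches answer the same question in slightly different ways
)
```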
So here we can see that a simple “just copy the approach over” gives a quick win for a speed-up. But leaning into an approach that wouldn’t have been optimal at all in R can unlock big improvements. Computers are really fast!
Speed isn’t everything
Having explored what’s possible, it’s worth noting that there are reasons not to do this if you’ve got an approach that works at a usable speed. This approach has more complicated build dependencies, and your code is now less reviewable and maintainable by another R person. If you can get fast enough in one language, that’s probably the best approach. But if you need speed, the options are on the table.
And speed isn’t just about saving time, but about opening up new approaches: when something can run 1,000 times a second, you can build tools with more real-time adjustment of parameters.
The best use case for Rust isn’t necessarily writing a whole thing, but identifying something small, conceptually simple but computationally expensive, that you want to run lots of times. That's not every problem you have, but it might be some.
Header image: Photo by Julian Hochgesang on Unsplash