R 4.1.0 is out! And if version 4.0.0 made history with the revolutionary change of stringsAsFactors = FALSE, the big splashy news in this version is the implementation of a native pipe.
The new pipe
The “pipe” is one of the most distinctive qualities of tidyverse/dplyr code. I’m sure you’ve used or seen something like this:
library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg))
## # A tibble: 3 x 2
##     cyl   mpg
##   <dbl> <dbl>
## 1     4  26.
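For comparison, here is a minimal sketch of what the native pipe looks like, assuming R 4.1.0 or later. The object names (mean_mpg, same_call, by_cyl) are only for illustration, and the dplyr part is guarded in case the package is not installed.

```r
# Requires R >= 4.1.0, where the native pipe |> was introduced.

# The native pipe passes the left-hand side as the first argument
# of the call on the right-hand side.
mean_mpg <- mtcars |>
  subset(cyl == 4, select = mpg) |>
  colMeans()

# It is a purely syntactic transformation: the parser rewrites
# x |> f(y) into f(x, y) before anything is evaluated.
same_call <- identical(quote(x |> f(y)), quote(f(x, y)))

# dplyr verbs take the data as their first argument, so they work
# with |> as well (guarded in case dplyr is not installed).
if (requireNamespace("dplyr", quietly = TRUE)) {
  by_cyl <- mtcars |>
    dplyr::group_by(cyl) |>
    dplyr::summarise(mpg = mean(mpg))
}
```

Because the rewrite happens at parse time, the native pipe needs no extra package, which is the main practical difference from %>%.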
My girlfriend and I are watching Star Trek: The Next Generation (TNG). The first season is pretty lame, but it gets better further down the line. That piqued my curiosity: is that impression shared by the rest of The Internets? So I decided to download the ratings of every TNG episode from IMDb. I quickly realised that IMDb provides much more than just the mean rating: it also has the full rating histogram and demographic breakdowns.
Some time ago, someone I follow on Twitter posted about having to buy a whole book of rules to tease out grammatical gender in German. Further down the replies, someone reminisced about trying (and failing) to learn German just by listening to Rammstein’s lyrics. I studied about drei Jahre of German at the same time I started listening to Rammstein and other German-speaking bands and I’ve always found Rammstein’s lyrics to be surprisingly simple.
I’ve been an R user for a few years now and the data.table package has been my staple package for most of that time. In this post I wanted to talk about why almost every script and RMarkdown report I write starts with:
library(data.table)
My memory issues
I started working on my licentiate thesis (the Argentinian equivalent of a Master’s degree) around mid-2016. I had been using R for school work and fun for some time and knew that I wanted to perform all my analysis in R and write my thesis in RMarkdown.
For my research I needed to download gridded weather data from ERA-Interim, which is a big dataset generated by the ECMWF. Getting long-term data through their website is very time-consuming and requires a lot of clicks. Thankfully, I came across the nifty ecmwfr R package, which allowed me to do it with ease. One of the great things about open source is that users can also be collaborators, so I made a few suggestions and offered some code.
The metamer package implements Matejka and Fitzmaurice’s (2017) algorithm for generating datasets with distinct appearance but identical statistical properties. I propose to call them “metamers” as an analogy with the colorimetry concept.
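To give a flavour of the idea, here is a heavily simplified base R sketch. It does not use the metamer package’s own functions, and it leaves out the part of the published algorithm that steers the points towards a target shape. It only nudges one point at a time and keeps the change when the rounded summary statistics stay the same. The function summary_stats, the rounding to two decimals, the step size and the number of iterations are all assumptions made for illustration.

```r
set.seed(42)

# Summary statistics to preserve, rounded to two decimals
# (the rounding is an assumption of this sketch).
summary_stats <- function(d) {
  round(c(mean(d$x), mean(d$y), sd(d$x), sd(d$y), cor(d$x, d$y)), 2)
}

start  <- data.frame(x = rnorm(30), y = rnorm(30))
target <- summary_stats(start)

current <- start
for (i in seq_len(5000)) {
  # Nudge one randomly chosen point by a small amount.
  candidate <- current
  row <- sample(nrow(candidate), 1)
  candidate$x[row] <- candidate$x[row] + rnorm(1, sd = 0.05)
  candidate$y[row] <- candidate$y[row] + rnorm(1, sd = 0.05)

  # Keep the nudge only if the rounded statistics are unchanged.
  if (identical(summary_stats(candidate), target)) {
    current <- candidate
  }
}
```

After the loop, current has drifted away from start while summary_stats(current) still matches target, so the two datasets are metamers of each other in the sense used above.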
tl;dr: The functionality shown in this post is now in the ggnewscale package! 📦. You can find the original code in this gist.
A somewhat common annoyance for some ggplot2 users is the lack of support for multiple colour and fill scales. Perusing Stack Overflow, you can find many questions relating to this issue:
Unfortunately, this deluge of questions is met with a shortage of conclusive answers, most of them being some variation of “you can’t, but here’s how to hack it or visualise the data differently”.
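As a rough illustration of what the ggnewscale package mentioned in the tl;dr makes possible, here is a minimal sketch with two colour scales in one plot. It assumes ggplot2 and ggnewscale are installed. The data frames a and b and their columns are made up for the example.

```r
library(ggplot2)
library(ggnewscale)

set.seed(1)
# Two made-up datasets, each with its own variable mapped to colour.
a <- data.frame(x = runif(20), y = runif(20), temp = rnorm(20))
b <- data.frame(x = runif(20), y = runif(20),
                group = sample(c("u", "v"), 20, replace = TRUE))

p <- ggplot(mapping = aes(x, y)) +
  # First layer: continuous colour scale.
  geom_point(data = a, aes(colour = temp)) +
  scale_colour_viridis_c() +
  # Layers added from here on get a fresh colour scale.
  new_scale_color() +
  # Second layer: discrete colour scale.
  geom_point(data = b, aes(colour = group), shape = 17) +
  scale_colour_manual(values = c(u = "black", v = "red"))
```

Printing p should draw both layers with separate colour legends. new_scale_fill() plays the same role for fill scales.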