Why didn’t you tell me, R?

Another day in debugging hell. You know when you ask people “What’s wrong?” and they respond with “Nothing.”, even though there clearly is something wrong? That is how I like to think about missing error messages in places where there clearly should be one. The issue I ran into with my code today was the following:

(not_cool_r <- data.frame(col1 = 1:3, incomplete_colname = 4:6, incomp = 7:9))
##   col1 incomplete_colname incomp
## 1    1                  4      7
## 2    2                  5      8
## 3    3                  6      9

Now, we can access the column incomplete_colname like this:

not_cool_r$incomplete_colname
## [1] 4 5 6

But you know what also works?

This:

not_cool_r$incomplete_colnam
## [1] 4 5 6

Or this:

not_cool_r$incomplete_col
## [1] 4 5 6

In fact …

not_cool_r$incomplete_colname
not_cool_r$incomplete_colnam
not_cool_r$incomplete_colna
not_cool_r$incomplete_coln
not_cool_r$incomplete_col
not_cool_r$incomplete_co
not_cool_r$incomplete_c
not_cool_r$incomplete_
not_cool_r$incomplete
not_cool_r$incomplet
not_cool_r$incomple
not_cool_r$incompl # still gives column incomplete_colname (unique partial match)
not_cool_r$incomp # exact match: gives column incomp
not_cool_r$incom # gives NULL from here on because it could be one of two cols
not_cool_r$inco
not_cool_r$inc

The first call that gives us a different output is the one with incomp, because it exactly matches the name of the column incomp.

not_cool_r$incomp
## [1] 7 8 9

Leave out even more letters and the result is NULL, because now R can’t tell whether we meant the column incomplete_colname or incomp.

not_cool_r$incom
## NULL
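
The three outcomes above (unique partial match, exact match, ambiguous prefix) follow the same rule as base R’s pmatch(), so it is a handy way to check what an abbreviated name would resolve to. A minimal sketch, assuming the same three column names as in the example; which_col is a hypothetical helper name, not something from base R:

```r
# Column names of the example data frame from above
cols <- c("col1", "incomplete_colname", "incomp")

# Hypothetical helper: which column would a (possibly abbreviated) name resolve to?
which_col <- function(name, columns) {
  idx <- pmatch(name, columns)  # NA if nothing matches or the match is not unique
  if (is.na(idx)) NA_character_ else columns[idx]
}

which_col("incompl", cols)  # unique partial match: "incomplete_colname"
which_col("incomp", cols)   # exact match wins: "incomp"
which_col("incom", cols)    # ambiguous prefix: NA
```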

Why is this a problem? In my case, I called a column that wasn’t there. This should have given me an error, but instead, R silently used a different column whose name happened to start with the column name I actually wanted to call.

How to prevent this

Sure, you can get around this with clever naming. However, with many variable and column names, you might lose track at some point. At least I didn’t think that sub_id (subject ID) and sub_i (subject intercept) would be a problem.

So, a more foolproof way is to access the column via [[]], which matches names exactly by default and will give you NULL if the column doesn’t exist.

not_cool_r[["incomplete_colname"]]
## [1] 4 5 6
not_cool_r[["incomplete_colnam"]]
## NULL
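
Since [[]] returns NULL without complaining, a typo can still slip through further down the line. If you would rather get an error right away, a small wrapper can check the name first. This is only a sketch; strict_col is a hypothetical helper, not a base R function, and df is a throwaway copy of the example data:

```r
# Hypothetical helper: return a column only if its name matches exactly,
# otherwise stop with an informative error.
strict_col <- function(df, name) {
  if (!name %in% names(df)) {
    stop("Column '", name, "' not found. Available columns: ",
         paste(names(df), collapse = ", "), call. = FALSE)
  }
  df[[name]]
}

df <- data.frame(col1 = 1:3, incomplete_colname = 4:6, incomp = 7:9)

strict_col(df, "incomplete_colname")      # returns the column
try(strict_col(df, "incomplete_colnam"))  # stops instead of returning NULL
```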

Or use [ , ], which will give you an error (but don’t use [], see here).

not_cool_r[ , "incomplete_colname"]
## [1] 4 5 6
try(not_cool_r[ , "incomplete_colnam"])
## Error in `[.data.frame`(not_cool_r, , "incomplete_colnam") : 
##   undefined columns selected
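
If you want to keep using $, base R also has a global option, warnPartialMatchDollar, that makes partial matches produce a warning instead of passing silently. A minimal sketch, assuming you would normally set the option at the top of a script or in your .Rprofile; df is a throwaway copy of the example data, and the wording of the warning may differ between R versions:

```r
df <- data.frame(col1 = 1:3, incomplete_colname = 4:6, incomp = 7:9)

# Ask R to warn whenever `$` resolves a name by partial matching
old_opts <- options(warnPartialMatchDollar = TRUE)

# Capture the warning message instead of the column, to show that one is raised
res <- tryCatch(
  df$incomplete_colnam,
  warning = function(w) conditionMessage(w)
)
res

# The full column name still works without any warning
df$incomplete_colname

# Restore the previous setting
options(old_opts)
```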

For you tidyverse kids out there: The standard tidy syntax doesn’t do this …

library(tidyverse)
not_cool_r %>% 
  mutate(new_col = incomplete_colname * 2)
##   col1 incomplete_colname incomp new_col
## 1    1                  4      7       8
## 2    2                  5      8      10
## 3    3                  6      9      12
try(
  not_cool_r %>% 
    mutate(new_col = incomplete_colnam * 2)
)
## Error : object 'incomplete_colnam' not found

… and tibbles will give you a warning.

not_cool_r <- tibble(col1 = 1:3, incomplete_colname = 4:6, incomp = 7:9)

not_cool_r$incomplete_colname
## [1] 4 5 6
not_cool_r$incomplete_colnam
## Warning: Unknown or uninitialised column: 'incomplete_colnam'.
## NULL

Find the .Rmd here.