r - Behavior of ggplot2 aes() in combination with facet_grid() when passing a variable with dollar sign notation to aes()


I am doing some analysis in ggplot2 at the moment for a project, and by chance I stumbled across some (for me) weird behavior that I cannot explain. When I write aes(x = cyl, ...), the plot looks different than if I pass the same variable using aes(x = mtcars$cyl, ...). When I remove facet_grid(am ~ .), both graphs are the same again. The code below is modeled after the code in my project and generates the same behavior:

    library(dplyr)
    library(ggplot2)

    data = mtcars

    test.data = data %>%
      select(-hp)

    ggplot(test.data, aes(x = test.data$cyl, y = mpg)) +
      geom_point() +
      facet_grid(am ~ .) +
      labs(title="graph 1 - dollar sign notation")

    ggplot(test.data, aes(x = cyl, y = mpg)) +
      geom_point() +
      facet_grid(am ~ .) +
      labs(title="graph 2 - no dollar sign notation")

Here is a picture of graph 1: graph 1 - dollar sign notation

Here is a picture of graph 2: graph 2 - no dollar sign notation

I found that I can work around this problem by using aes_string() instead of aes() and passing the variable names as strings, but I would like to understand why ggplot is behaving that way.

Thanks a lot in advance! I feel uncomfortable if I do not understand this properly...

tl;dr

Never use [ or $ inside aes().
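As a minimal sketch of what to write instead: refer to columns by their bare names, and if the column name is only available as a string, index the `.data` pronoun. This assumes ggplot2 3.0.0 or later, where `.data` is available inside aes(); the variable `xvar` is hypothetical and only there for illustration.

```r
library(ggplot2)

# Bare column name: the lookup happens inside whatever data the layer receives.
p_bare <- ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point() +
  facet_grid(am ~ .)

# Column name held in a string (xvar is a hypothetical variable):
# index the .data pronoun instead of reaching for mtcars[[xvar]] or mtcars$cyl.
xvar <- "cyl"
p_string <- ggplot(mtcars, aes(x = .data[[xvar]], y = mpg)) +
  geom_point() +
  facet_grid(am ~ .)
```

Both plots keep x tied to the rows of the data that ggplot2 is actually working with, so the facets and the x values should stay in step.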


Consider this illustrative example, where the facetting variable f is purposely in a non-obvious order with respect to x:

d <- data.frame(x=1:10, f=rev(letters[gl(2,5)])) 

Now contrast what happens with these two plots,

    p1 <- ggplot(d) +
      facet_grid(.~f, labeller = label_both) +
      geom_text(aes(x, y=0, label=x, colour=f)) +
      ggtitle("good mapping")

    p2 <- ggplot(d) +
      facet_grid(.~f, labeller = label_both) +
      geom_text(aes(d$x, y=0, label=x, colour=f)) +
      ggtitle("$ corruption")


We can get a better idea of what's happening by looking at the data.frame created internally by ggplot2 for each panel,

    ggplot_build(p1)[["data"]][[1]][, c("x", "PANEL")]
        x PANEL
    1   6     1
    2   7     1
    3   8     1
    4   9     1
    5  10     1
    6   1     2
    7   2     2
    8   3     2
    9   4     2
    10  5     2

    ggplot_build(p2)[["data"]][[1]][, c("x", "PANEL")]
        x PANEL
    1   1     1
    2   2     1
    3   3     1
    4   4     1
    5   5     1
    6   6     2
    7   7     2
    8   8     2
    9   9     2
    10 10     2

The second plot has the wrong mapping because, when ggplot creates the data.frame for each panel, it picks the x values in the "wrong" order.

This occurs because the use of $ breaks the link between the various variables being mapped (ggplot must assume it's an independent variable which, for all it knows, could come from an arbitrary, disconnected source). Since the data.frame in this example is not ordered according to the factor f, the subset data.frames used internally for each panel assume the wrong order.
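The broken link can be imitated in base R. The sketch below is only a loose analogy and not ggplot2's actual internals: it assumes the per-panel regrouping behaves like sorting the rows by f, and it evaluates each expression against the regrouped data the way a mapping would be. The names `by_panel`, `x_bare` and `x_dollar` are made up for this illustration.

```r
d <- data.frame(x = 1:10, f = rev(letters[gl(2, 5)]))

# Stand-in for the per-panel regrouping: rows sorted by the facetting variable.
by_panel <- d[order(d$f), ]

# A bare name is looked up inside the regrouped data, so it follows the rows.
x_bare <- eval(quote(x), by_panel)

# d is not a column of by_panel, so d$x resolves to the original, unsorted d.
x_dollar <- eval(quote(d$x), by_panel)

# Side by side: x_bare stays paired with its f, x_dollar does not.
data.frame(f = by_panel$f, x_bare, x_dollar)
```

In this analogy `x_bare` moves together with `f`, while `x_dollar` keeps the original row order and ends up attached to the wrong group, which is the same kind of mismatch seen in the second plot.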

