r - Behavior of ggplot2 aes() in combination with facet_grid() when passing a variable with dollar sign notation to aes()
I am doing some analysis in ggplot2 at the moment for a project, and by chance I stumbled across some (for me) weird behavior that I cannot explain. When I write aes(x = cyl, ...), the plot looks different than if I pass the same variable using aes(x = mtcars$cyl, ...). When I remove facet_grid(am ~ .), both graphs are the same again. The code below is modeled after the code in my project and generates the same behavior:
library(dplyr)
library(ggplot2)

data = mtcars
test.data = data %>% select(-hp)

ggplot(test.data, aes(x = test.data$cyl, y = mpg)) +
  geom_point() +
  facet_grid(am ~ .) +
  labs(title = "graph 1 - dollar sign notation")

ggplot(test.data, aes(x = cyl, y = mpg)) +
  geom_point() +
  facet_grid(am ~ .) +
  labs(title = "graph 2 - no dollar sign notation")
Here is a picture of graph 1:
Here is a picture of graph 2:
I found that I can work around this problem by using aes_string() instead of aes() and passing the variable names as strings, but I would like to understand why ggplot is behaving this way.
Thanks a lot in advance! I feel uncomfortable if I do not understand this properly...
tl;dr: never use [ or $ inside aes().
Consider this illustrative example, where the facetting variable f is purposely in a non-obvious order with respect to x:
d <- data.frame(x=1:10, f=rev(letters[gl(2,5)]))
Now contrast what happens with these two plots:
p1 <- ggplot(d) +
  facet_grid(.~f, labeller = label_both) +
  geom_text(aes(x, y=0, label=x, colour=f)) +
  ggtitle("good mapping")

p2 <- ggplot(d) +
  facet_grid(.~f, labeller = label_both) +
  geom_text(aes(d$x, y=0, label=x, colour=f)) +
  ggtitle("$ corruption")
We can get a better idea of what's happening by looking at the data.frame created internally by ggplot2 for each panel:
ggplot_build(p1)[["data"]][[1]][,c("x","panel")]
    x panel
1   6     1
2   7     1
3   8     1
4   9     1
5  10     1
6   1     2
7   2     2
8   3     2
9   4     2
10  5     2

ggplot_build(p2)[["data"]][[1]][,c("x", "panel")]
    x panel
1   1     1
2   2     1
3   3     1
4   4     1
5   5     1
6   6     2
7   7     2
8   8     2
9   9     2
10 10     2
The second plot has the wrong mapping, because when ggplot creates the data.frame for each panel, it picks the x values in the "wrong" order.
This occurs because the use of $ breaks the link between the various variables being mapped (ggplot must assume it's an independent variable, which for all it knows could come from an arbitrary, disconnected source). Since the data.frame in this example is not ordered according to the factor f, the subset data.frames used internally for each panel assume the wrong order.
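The effect can be imitated in base R without ggplot2. The snippet below is only a rough analogy and not ggplot2's actual internal code: it assumes that the faceting step regroups the rows of d by f, and it compares a column that travels with those rows against an external vector that keeps its original order. The names ord, panel_data, good_x and bad_x are made up for this illustration.

```r
# Same data as in the example: x = 1..5 belongs to f = "b", x = 6..10 to f = "a"
d <- data.frame(x = 1:10, f = rev(letters[gl(2, 5)]))

# Assumption: faceting regroups the rows by f (all "a" rows first, then "b")
ord <- order(d$f)
panel_data <- d[ord, ]

# A column referenced by name is reordered together with its rows
good_x <- panel_data$x

# A vector pulled out with $ is detached from the rows and keeps its old order
bad_x <- d$x

# Side by side: good_x matches f, bad_x is simply pasted on in original order
comparison <- data.frame(f = panel_data$f, good_x = good_x, bad_x = bad_x)
print(comparison)
```

In the printed comparison, good_x follows the regrouped rows, while bad_x is still 1 to 10 and therefore ends up next to the wrong values of f, which matches the pattern in the panel data of p2.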