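Partially wide/long formats
The melt/dcast combination
The easiest way to convert between wide and long formats is probably the gather/spread pair. However, when the data are neither fully long nor fully wide, but only partially so, the melt/dcast pair is the better tool.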
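For reference, the gather/spread round trip mentioned above can be sketched as follows. The data frame `wide` and its values are made up for illustration; `gather` and `spread` are tidyr functions (now superseded in tidyr, but still available).

```r
library(tidyr)

# Hypothetical toy data: one row per subject, one column per measurement
wide <- data.frame(id = c(1, 2),
                   height = c(170, 182),
                   weight = c(65, 80))

# Wide -> long: every column except id is stacked into key/value pairs
long <- gather(wide, key = "measure", value = "value", -id)

# Long -> wide: the inverse operation, one column per distinct key
wide_again <- spread(long, key = measure, value = value)
```

This works well when every non-id column is stacked at once; the partially wide cases below need a little more control.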
library(dplyr)
library(reshape2)
Data
x1s <- c('A', 'B', 'C')
x2s <- c('x', 'y', 'z', 'w')
x3s <- c('F1', 'F2', 'F3', 'F4')
df <- expand.grid(x1 = factor(x1s),
                  x2 = factor(x2s),
                  x3 = factor(x3s))
v1s <- rnorm(length(x1s))
v2s <- rnorm(length(x2s))
v3s <- rnorm(length(x3s))
df <- df %>% mutate(value = v1s[as.numeric(x1)] +
                      v2s[as.numeric(x2)] +
                      v3s[as.numeric(x3)] + rnorm(nrow(df)))
# df$value <-
# v1s[as.numeric(df$x1)] +
# v2s[as.numeric(df$x2)] +
# v3s[as.numeric(df$x3)]
head(df)
##   x1 x2 x3     value
## 1  A  x F1 -3.687556
## 2  B  x F1 -2.425758
## 3  C  x F1 -4.196482
## 4  A  y F1  1.092759
## 5  B  y F1  3.354659
## 6  C  y F1  1.776936
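The data-generating code indexes each effect vector by `as.numeric()` of a factor. As a small sketch of what that does: factor levels are sorted alphabetically, so the integer codes follow the sorted order rather than the order in which the labels were typed. The `effects` vector here is hypothetical.

```r
x2s <- c('x', 'y', 'z', 'w')
f <- factor(x2s)

levels(f)        # sorted alphabetically: "w" "x" "y" "z"
as.numeric(f)    # integer codes follow the level order: 2 3 4 1

# Hypothetical per-level effects, indexed by the integer codes
effects <- c(10, 20, 30, 40)
picked <- effects[as.numeric(f)]
picked           # 20 30 40 10
```

So in the data above, the first element of `v2s` is the effect for level `w`, not for `x`. This does not matter for random effects, but it is worth keeping in mind when the effects are set by hand.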
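In df, x1, x2, and x3 are conditions and value holds the measured value. However, x3 can just as well be read as a label for different kinds of measurement. For example, think of the levels F1, F2, F3, and F4 of x3 as height, weight, sight, and BMI.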
require(dplyr)
df <- df %>%
  mutate(x3 = ifelse(x3 == "F1", 'height',
              ifelse(x3 == "F2", 'weight',
              ifelse(x3 == "F3", 'sight', 'BMI'))))
head(df)
##   x1 x2     x3     value
## 1  A  x height -3.687556
## 2  B  x height -2.425758
## 3  C  x height -4.196482
## 4  A  y height  1.092759
## 5  B  y height  3.354659
## 6  C  y height  1.776936
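The nested ifelse above does the job, but the same relabeling can also be written with a named lookup vector, which may be easier to read as the number of levels grows. This is only an alternative sketch; the names `lookup` and `recoded` are made up for illustration.

```r
x3 <- factor(c('F1', 'F2', 'F3', 'F4', 'F1'))

# Named vector: names are the old labels, values are the new ones
lookup <- c(F1 = 'height', F2 = 'weight', F3 = 'sight', F4 = 'BMI')

# Index by the character labels (not the factor codes) and drop the names
recoded <- unname(lookup[as.character(x3)])
recoded   # "height" "weight" "sight" "BMI" "height"
```

Converting with `as.character()` first matters: indexing with the factor itself would use its integer codes instead of its labels.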
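When the data are fully long like this, making them partially wide is easy. Simply put the variables that should be spread into columns on the right-hand side of ~ in the dcast formula.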
require(reshape2)
# Call dcast through the reshape2 namespace so that an attached data.table does not intercept it.
df2 <- reshape2::dcast(df, x1 ~ x2 + x3, value.var = 'value')
head(df2)
##   x1     w_BMI     w_height  w_sight  w_weight      x_BMI  x_height    x_sight   x_weight    y_BMI y_height
## 1  A 0.2497362 -0.002433557 1.626930 1.6627601 -2.6162418 -3.687556 -0.8271366 -2.2951541 2.399217 1.092759
## 2  B 2.1830400  1.370493919 1.948437 1.6044455 -0.2824331 -2.425758 -0.7175963  0.9055963 3.539909 3.354659
## 3  C 0.1299226 -0.103315795 3.650791 0.5405934 -1.2623258 -4.196482 -1.4819258 -3.1036973 3.160218 1.776936
##    y_sight   y_weight      z_BMI   z_height    z_sight    z_weight
## 1 2.792275 -0.2188165 -2.8345123 -1.7443293  0.5987874 -0.39041870
## 2 2.967254  3.4699891 -1.1376816  0.8465476  2.0376010  1.27049918
## 3 3.699912  1.0492078 -0.3630513 -3.0043727 -1.4373376  0.06806562
# Here only x3 is spread into columns; x1 and x2 remain as id columns.
df3 <- reshape2::dcast(df, x1 + x2 ~ x3, value.var = 'value')
head(df3)
##   x1 x2        BMI       height      sight     weight
## 1  A  w  0.2497362 -0.002433557  1.6269303  1.6627601
## 2  A  x -2.6162418 -3.687555600 -0.8271366 -2.2951541
## 3  A  y  2.3992172  1.092758736  2.7922748 -0.2188165
## 4  A  z -2.8345123 -1.744329327  0.5987874 -0.3904187
## 5  B  w  2.1830400  1.370493919  1.9484374  1.6044455
## 6  B  x -0.2824331 -2.425758012 -0.7175963  0.9055963
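To see how the formula controls the layout without random noise in the way, here is a small sketch on a made-up data frame (`toy` and its values are hypothetical). Variables on the left of ~ stay as id columns; the combinations of the variables on the right become the new column names, joined with an underscore.

```r
library(reshape2)

# Hypothetical fully long data: 2 x 2 x 2 conditions, one value per row
toy <- expand.grid(x1 = c('A', 'B'),
                   x2 = c('x', 'y'),
                   x3 = c('height', 'weight'),
                   stringsAsFactors = FALSE)
toy$value <- seq_len(nrow(toy))

# x2 and x3 both go to the columns: one row per x1
wide_a <- reshape2::dcast(toy, x1 ~ x2 + x3, value.var = 'value')
dim(wide_a)     # 2 rows, 5 columns

# Only x3 goes to the columns: one row per (x1, x2) pair
wide_b <- reshape2::dcast(toy, x1 + x2 ~ x3, value.var = 'value')
dim(wide_b)     # 4 rows, 4 columns
```

Passing `value.var` explicitly is a design choice here: it keeps dcast from having to guess which column holds the values.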
If the given data are not fully long, the first step is to make them fully long. Let's use melt for that.
# Call melt through the reshape2 namespace so that an attached data.table does not intercept it.
df2_melted1 <- reshape2::melt(df2, id.vars = 'x1')
require(tidyr)
# The melted column names look like "w_BMI"; split them back into x2 and x3 at the underscore.
df2_melted2 <- reshape2::melt(df2, id.vars = 'x1') %>%
  separate(col = 'variable', sep = '_', into = c('x2', 'x3'))
head(df2_melted2)
##   x1 x2     x3        value
## 1  A  w    BMI  0.249736187
## 2  B  w    BMI  2.183040009
## 3  C  w    BMI  0.129922564
## 4  A  w height -0.002433557
## 5  B  w height  1.370493919
## 6  C  w height -0.103315795
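As an alternative sketch, the melt-then-separate step can also be done in one call with tidyr's `pivot_longer`, which splits the column names while reshaping. The data frame `wide` and its values are made up for illustration. Both approaches assume that the level names themselves contain no underscore.

```r
library(tidyr)

# Hypothetical partially wide data: column names encode x2 and x3 as "x2_x3"
wide <- data.frame(x1 = c('A', 'B'),
                   x_height = c(1, 2),
                   x_weight = c(3, 4),
                   y_height = c(5, 6),
                   y_weight = c(7, 8))

# Stack every column except x1 and split each name at the underscore
long <- pivot_longer(wide,
                     cols = -x1,
                     names_to = c('x2', 'x3'),
                     names_sep = '_',
                     values_to = 'value')
```

Note that `pivot_longer` orders the result row by row of the input, whereas melt stacks it column by column, so the rows come out in a different order even though the content is the same.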
# Only x1 is declared as an id variable here, so the factor x2 is melted together with the numeric columns.
df3_melted1 <- reshape2::melt(df3, id.vars = 'x1')
## Warning: attributes are not identical across measure variables; they will be dropped
# Declaring both x1 and x2 as id variables melts only the measurement columns.
df3_melted2 <- reshape2::melt(df3, variable.name = 'x3', value.name = 'value', id.vars = c('x1', 'x2'))
# The key and value arguments of gather correspond to variable.name and value.name in melt.
head(df3_melted2)
##   x1 x2  x3      value
## 1  A  w BMI  0.2497362
## 2  A  x BMI -2.6162418
## 3  A  y BMI  2.3992172
## 4  A  z BMI -2.8345123
## 5  B  w BMI  2.1830400
## 6  B  x BMI -0.2824331
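The correspondence between melt's `variable.name`/`value.name` and gather's `key`/`value` can be checked on a small example. This is a sketch on made-up data (`wide`, `m`, and `g` are hypothetical names), not a claim that the two functions are interchangeable in every case.

```r
library(reshape2)
library(tidyr)

# Hypothetical partially wide data with two id columns
wide <- data.frame(x1 = c('A', 'A', 'B', 'B'),
                   x2 = c('x', 'y', 'x', 'y'),
                   height = c(1, 2, 3, 4),
                   weight = c(5, 6, 7, 8))

# reshape2: id columns are named, everything else is melted
m <- reshape2::melt(wide, id.vars = c('x1', 'x2'),
                    variable.name = 'x3', value.name = 'value')

# tidyr: the id columns are excluded with a minus sign instead
g <- tidyr::gather(wide, key = 'x3', value = 'value', -x1, -x2)

# Same values in the same order; melt returns x3 as a factor, gather as character
same_values <- all(m$value == g$value)
same_labels <- all(as.character(m$x3) == g$x3)
```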