Partially wide/long data: combining melt and dcast
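The easiest way to convert between wide and long formats is probably the gather/spread combination. When the data is only partly wide or partly long, rather than fully long or fully wide, you need the melt/dcast combination instead.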
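The sketch below makes the contrast concrete on a small hypothetical data frame. The name toy and its values are made up for illustration. spread() takes a single key column, so spreading x2 and x3 together needs a unite() step first. dcast() accepts both variables on the right-hand side of ~.

```r
library(tidyr)
library(reshape2)

# Hypothetical fully long toy data: every (x1, x2, x3) combination once
toy <- expand.grid(x1 = c("A", "B"), x2 = c("u", "v"), x3 = c("h", "w"),
                   stringsAsFactors = FALSE)
toy$value <- seq_len(nrow(toy))

# spread() takes one key column, so x2 and x3 are merged into "key" first
wide_spread <- spread(unite(toy, "key", x2, x3, sep = "_"), key, value)

# dcast() takes both variables on the right-hand side of ~ directly
wide_dcast <- reshape2::dcast(toy, x1 ~ x2 + x3, value.var = "value")

names(wide_spread)  # "x1" "u_h" "u_w" "v_h" "v_w"
names(wide_dcast)   # "x1" "u_h" "u_w" "v_h" "v_w"
```

Both produce the same columns here. With dcast, though, the combination happens inside the formula.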
library(dplyr)
library(reshape2)
Data
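# Levels of the three condition variables
x1s <- c('A', 'B', 'C')
x2s <- c('x', 'y', 'z', 'w')
x3s <- c('F1', 'F2', 'F3', 'F4')
# All 3 x 4 x 4 = 48 combinations of the conditions
df <- expand.grid(x1 = factor(x1s),
                  x2 = factor(x2s),
                  x3 = factor(x3s))
# One random effect per level of each condition
v1s <- rnorm(length(x1s))
v2s <- rnorm(length(x2s))
v3s <- rnorm(length(x3s))
# value = sum of the three effects plus noise
df <- df %>% mutate(value = v1s[as.numeric(x1)] +
                      v2s[as.numeric(x2)] +
                      v3s[as.numeric(x3)] + rnorm(nrow(df)))
# Equivalent base R assignment, without the noise term:
# df$value <-
#   v1s[as.numeric(df$x1)] +
#   v2s[as.numeric(df$x2)] +
#   v3s[as.numeric(df$x3)]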
head(df)
##   x1 x2 x3     value
## 1  A  x F1 -3.687556
## 2  B  x F1 -2.425758
## 3  C  x F1 -4.196482
## 4  A  y F1  1.092759
## 5  B  y F1  3.354659
## 6  C  y F1  1.776936
In df, x1, x2, and x3 are conditions and value holds the observed value. You can also read x3 as distinguishing different measurements. For example, the levels F1, F2, F3, and F4 of x3 could stand for height, weight, sight, and BMI.
require(dplyr)
# Relabel the levels of x3 as measurement names (x3 becomes a character column)
df <- df %>%
  mutate(x3 = ifelse(x3 == "F1", 'height',
              ifelse(x3 == "F2", 'weight',
              ifelse(x3 == "F3", 'sight', 'BMI'))))
head(df)
##   x1 x2     x3     value
## 1  A  x height -3.687556
## 2  B  x height -2.425758
## 3  C  x height -4.196482
## 4  A  y height  1.092759
## 5  B  y height  3.354659
## 6  C  y height  1.776936
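Nested ifelse() calls grow awkward as the number of labels increases. A named lookup vector is one alternative. The sketch below is a hypothetical stand-alone example: codes and lookup are illustrative names that do not appear in the original code.

```r
# Hypothetical stand-alone example: map codes to measurement names with a
# named lookup vector instead of nested ifelse() calls
codes <- factor(c("F1", "F2", "F3", "F4", "F1"))
lookup <- c(F1 = "height", F2 = "weight", F3 = "sight", F4 = "BMI")

# Convert the factor to character first: indexing with a factor directly
# would use its integer codes rather than its labels
measures <- unname(lookup[as.character(codes)])
print(measures)  # "height" "weight" "sight" "BMI" "height"
```

On the data above, the same idea would be df$x3 <- unname(lookup[as.character(df$x3)]), run in place of the mutate/ifelse step.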
When the data is fully long like this, making it partially wide is easy. Put the variables you want spread into columns on the right-hand side of ~ in the dcast formula below.
require(reshape2)
# Call reshape2::dcast explicitly: if data.table is also attached, its dcast
# generic intercepts the call and emits a deprecation warning.
# value.var names the value column instead of letting dcast guess it.
df2 = reshape2::dcast(df, x1 ~ x2 + x3, value.var = 'value')
head(df2)
##   x1     w_BMI     w_height  w_sight  w_weight      x_BMI  x_height    x_sight   x_weight    y_BMI y_height
## 1  A 0.2497362 -0.002433557 1.626930 1.6627601 -2.6162418 -3.687556 -0.8271366 -2.2951541 2.399217 1.092759
## 2  B 2.1830400  1.370493919 1.948437 1.6044455 -0.2824331 -2.425758 -0.7175963  0.9055963 3.539909 3.354659
## 3  C 0.1299226 -0.103315795 3.650791 0.5405934 -1.2623258 -4.196482 -1.4819258 -3.1036973 3.160218 1.776936
##    y_sight   y_weight      z_BMI   z_height    z_sight    z_weight
## 1 2.792275 -0.2188165 -2.8345123 -1.7443293  0.5987874 -0.39041870
## 2 2.967254  3.4699891 -1.1376816  0.8465476  2.0376010  1.27049918
## 3 3.699912  1.0492078 -0.3630513 -3.0043727 -1.4373376  0.06806562
df3 = reshape2::dcast(df, x1 + x2 ~ x3, value.var = 'value')
head(df3)
##   x1 x2        BMI       height      sight     weight
## 1  A  w  0.2497362 -0.002433557  1.6269303  1.6627601
## 2  A  x -2.6162418 -3.687555600 -0.8271366 -2.2951541
## 3  A  y  2.3992172  1.092758736  2.7922748 -0.2188165
## 4  A  z -2.8345123 -1.744329327  0.5987874 -0.3904187
## 5  B  w  2.1830400  1.370493919  1.9484374  1.6044455
## 6  B  x -0.2824331 -2.425758012 -0.7175963  0.9055963
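The shape of each dcast result follows from the formula. The left-hand side of ~ identifies the rows, and every combination of the right-hand-side variables becomes a column. The sketch below checks that arithmetic by rebuilding the same 3 × 4 × 4 design with a deterministic value column. This is an assumption made so the check does not depend on rnorm(): value is just the row number.

```r
library(reshape2)

# Same 3 x 4 x 4 design as df, but value is simply the row number
# (an assumption for illustration, so nothing depends on rnorm())
d <- expand.grid(x1 = c("A", "B", "C"),
                 x2 = c("x", "y", "z", "w"),
                 x3 = c("height", "weight", "sight", "BMI"),
                 stringsAsFactors = FALSE)
d$value <- seq_len(nrow(d))

# Left of ~ identifies rows; each combination on the right becomes a column
wide1 <- reshape2::dcast(d, x1 ~ x2 + x3, value.var = "value")
wide2 <- reshape2::dcast(d, x1 + x2 ~ x3, value.var = "value")

dim(wide1)  # 3 17: 3 levels of x1; 1 id column + 4 x 4 value columns
dim(wide2)  # 12 6: 3 x 4 (x1, x2) pairs; 2 id columns + 4 value columns
```

Because each (x1, x2, x3) combination occurs exactly once, no aggregation happens. Each cell of the wide table holds one original value.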
If the data you are given is not fully long, first convert it to fully long form with melt.
# variable holds the combined column names such as w_BMI
df2_melted1 <- reshape2::melt(df2, id.vars = 'x1')
require(tidyr)
# Melt, then split the combined names (e.g. w_BMI) back into x2 and x3
df2_melted2 <- reshape2::melt(df2, id.vars = 'x1') %>%
  separate(col = 'variable', sep = '_', into = c('x2', 'x3'))
head(df2_melted2)
##   x1 x2     x3        value
## 1  A  w    BMI  0.249736187
## 2  B  w    BMI  2.183040009
## 3  C  w    BMI  0.129922564
## 4  A  w height -0.002433557
## 5  B  w height  1.370493919
## 6  C  w height -0.103315795
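A round trip is a useful check that melt followed by separate really recovers the fully long data. The sketch below assumes the same 3 × 4 × 4 design, again with value set to the row number so the comparison is deterministic. It also assumes that no level name contains an underscore, since separate splits on "_".

```r
library(reshape2)
library(tidyr)

# Same design as above, with value = row number (assumption, for a
# deterministic check)
d <- expand.grid(x1 = c("A", "B", "C"),
                 x2 = c("x", "y", "z", "w"),
                 x3 = c("height", "weight", "sight", "BMI"),
                 stringsAsFactors = FALSE)
d$value <- seq_len(nrow(d))

# Partially wide, then back to fully long with melt() + separate()
wide <- reshape2::dcast(d, x1 ~ x2 + x3, value.var = "value")
long <- separate(reshape2::melt(wide, id.vars = "x1"),
                 col = "variable", sep = "_", into = c("x2", "x3"))

# Put both long versions in the same row order before comparing
sort_long <- function(z) z[order(z$x1, z$x2, z$x3), c("x1", "x2", "x3", "value")]
a <- sort_long(d)
b <- sort_long(long)

nrow(b)  # 48, one row per combination
all(a$x1 == b$x1 & a$x2 == b$x2 & a$x3 == b$x3 & a$value == b$value)  # TRUE
```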
# Only x1 is given as an id variable, so x2 is stacked together with the measurements
df3_melted1 <- reshape2::melt(df3, id.vars = 'x1')
## Warning: attributes are not identical across measure variables; they will be dropped
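The attributes warning comes from listing only x1 in id.vars. The factor column x2 is then treated as one more measurement and stacked together with the numeric columns. The sketch below shows the difference on a tiny hypothetical data frame shaped like df3; w and its values are made up. The first call should emit the same attributes warning.

```r
library(reshape2)

# Hypothetical data shaped like df3: two id columns and two measurements
w <- data.frame(x1 = c("A", "A"), x2 = factor(c("x", "y")),
                height = c(1.5, 1.6), weight = c(60, 70))

# Only x1 as id: x2 is treated as a measure variable and stacked with the
# numeric columns (this is what triggers the attributes warning)
bad <- reshape2::melt(w, id.vars = "x1")

# x1 and x2 as ids: only height and weight are stacked
good <- reshape2::melt(w, id.vars = c("x1", "x2"))

nrow(bad)              # 6 = 2 rows x 3 measure columns
nrow(good)             # 4 = 2 rows x 2 measure columns
levels(good$variable)  # "height" "weight"
```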
# Keep both x1 and x2 as id variables and name the new key/value columns
df3_melted2 <- reshape2::melt(df3, id.vars = c('x1', 'x2'),
                              variable.name = 'x3', value.name = 'value')
# gather()'s key and value arguments correspond to melt()'s variable.name and value.name
head(df3_melted2)
##   x1 x2  x3      value
## 1  A  w BMI  0.2497362
## 2  A  x BMI -2.6162418
## 3  A  y BMI  2.3992172
## 4  A  z BMI -2.8345123
## 5  B  w BMI  2.1830400
## 6  B  x BMI -0.2824331
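gather()'s key and value arguments correspond to melt()'s variable.name and value.name. The sketch below checks this side by side on a small hypothetical data frame (w is made up for illustration). Both calls should produce the same long data. One difference is that gather() returns the key column as character, while melt() returns it as a factor.

```r
library(reshape2)
library(tidyr)

# Hypothetical partially wide data: two id columns, two measurements
w <- data.frame(x1 = c("A", "B"), x2 = c("x", "x"),
                height = c(1.5, 1.6), weight = c(60, 70),
                stringsAsFactors = FALSE)

# melt(): variable.name / value.name name the two new columns
m <- reshape2::melt(w, id.vars = c("x1", "x2"),
                    variable.name = "x3", value.name = "value")

# gather(): the same two names are given as key / value
g <- gather(w, key = "x3", value = "value", height, weight)

names(m)                            # "x1" "x2" "x3" "value"
all(as.character(m$x3) == g$x3)     # TRUE (m$x3 is a factor, g$x3 a character)
all(m$value == g$value)             # TRUE
```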