i have large set of financial data has hundreds of columns. have cleaned , sorted data based on date. here simplified example:
df1 <- data.frame(matrix(vector(),ncol=5, nrow = 4)) colnames(df1) <- c("date","0.4","0.3","0.2","0.1") df1[1,] <- c("2000-01-31","0","0","0.05","0.07") df1[2,] <- c("2000-02-29","0","0.13","0.17","0.09") df1[3,] <- c("2000-03-31","0.03","0.09","0.21","0.01") df1[4,] <- c("2004-04-30","0.05","0.03","0.19","0.03") df1 date 0.4 0.3 0.2 0.1 1 2000-01-31 0 0 0.05 0.07 2 2000-02-29 0 0.13 0.17 0.09 3 2000-03-31 0.03 0.09 0.21 0.01 4 2000-04-30 0.05 0.03 0.19 0.03
i assigned individual weights (based on market value raw data) column headers, because don’t care company names , need weights calculating result.
my ultimate goal get: 1. sum of weighted returns; , 2. sum of weights when returns non-zero. being said, below result want get:
date sweightedr sweights 1 2000-01-31 0.017 0.3 2 2000-02-29 0.082 0.6 3 2000-03-31 0.082 1 4 2000-04-30 0.07 1
for instance, sweightedr 2000-01-31 = 0.4x0+0.3x0+0.2x0.05+0.1x0.07, , sweights = 0.2+0.1.
my initial idea assign weights each column wcol2 <- 0.4
, use cbind
create new columns , use c(as.matrix() %*% )
sums. realize impossible there hundreds of columns. advice or suggestion appreciated!
here's simple solution using matrix multiplications (as suggesting yourself).
first of all, data seem of character
type , i'm not sure it's real case real data, first convert appropriate type
df1[-1] <- lapply(df1[-1], type.convert)
next, convert column names numeric class too
vec <- as.numeric(names(df1)[-1])
finally, create new columns in 2 simple steps. indeed has matrix conversion overhead, maybe should work matrices in first place. either way, vectorized
df1["sweightedr"] <- as.matrix(df1[, -1]) %*% vec df1["sweights"] <- (df1[, -c(1, ncol(df1))] > 0) %*% vec df1 # date 0.4 0.3 0.2 0.1 sweightedr sweights # 1 2000-01-31 0.00 0.00 0.05 0.07 0.017 0.3 # 2 2000-02-29 0.00 0.13 0.17 0.09 0.082 0.6 # 3 2000-03-31 0.03 0.09 0.21 0.01 0.082 1.0 # 4 2004-04-30 0.05 0.03 0.19 0.03 0.070 1.0
or, convert long format first (here's data.table
example), though believe less efficient row operations
library(data.table) res <- melt(setdt(df1), id = 1l, variable.factor = false )[, c("value", "variable") := .(as.numeric(value), as.numeric(variable))] res[, .(sweightedr = sum(variable * value), sweights = sum(variable * (value > 0))), = date] # date sweightedr sweights # 1: 2000-01-31 0.017 0.3 # 2: 2000-02-29 0.082 0.6 # 3: 2000-03-31 0.082 1.0 # 4: 2004-04-30 0.070 1.0
Comments
Post a Comment