r - Assign different values to a large number of columns -


i have large set of financial data has hundreds of columns. have cleaned , sorted data based on date. here simplified example:

df1 <- data.frame(matrix(vector(),ncol=5, nrow = 4)) colnames(df1) <- c("date","0.4","0.3","0.2","0.1") df1[1,] <- c("2000-01-31","0","0","0.05","0.07") df1[2,] <- c("2000-02-29","0","0.13","0.17","0.09") df1[3,] <- c("2000-03-31","0.03","0.09","0.21","0.01") df1[4,] <- c("2004-04-30","0.05","0.03","0.19","0.03") df1         date    0.4    0.3    0.2    0.1 1 2000-01-31      0      0   0.05   0.07 2 2000-02-29      0   0.13   0.17   0.09 3 2000-03-31   0.03   0.09   0.21   0.01 4 2000-04-30   0.05   0.03   0.19   0.03 

i assigned individual weights (based on market value raw data) column headers, because don’t care company names , need weights calculating result.

my ultimate goal get: 1. sum of weighted returns; , 2. sum of weights when returns non-zero. being said, below result want get:

        date    sweightedr    sweights     1 2000-01-31         0.017         0.3 2 2000-02-29         0.082         0.6 3 2000-03-31         0.082           1 4 2000-04-30          0.07           1 

for instance, sweightedr 2000-01-31 = 0.4x0+0.3x0+0.2x0.05+0.1x0.07, , sweights = 0.2+0.1.

my initial idea assign weights each column wcol2 <- 0.4, use cbind create new columns , use c(as.matrix() %*% ) sums. realize impossible there hundreds of columns. advice or suggestion appreciated!

here's simple solution using matrix multiplications (as suggesting yourself).

first of all, data seem of character type , i'm not sure it's real case real data, first convert appropriate type

df1[-1] <- lapply(df1[-1], type.convert) 

next, convert column names numeric class too

vec <- as.numeric(names(df1)[-1]) 

finally, create new columns in 2 simple steps. indeed has matrix conversion overhead, maybe should work matrices in first place. either way, vectorized

df1["sweightedr"] <- as.matrix(df1[, -1]) %*% vec df1["sweights"] <- (df1[, -c(1, ncol(df1))] > 0) %*% vec df1 #         date  0.4  0.3  0.2  0.1 sweightedr sweights # 1 2000-01-31 0.00 0.00 0.05 0.07      0.017      0.3 # 2 2000-02-29 0.00 0.13 0.17 0.09      0.082      0.6 # 3 2000-03-31 0.03 0.09 0.21 0.01      0.082      1.0 # 4 2004-04-30 0.05 0.03 0.19 0.03      0.070      1.0 

or, convert long format first (here's data.table example), though believe less efficient row operations

library(data.table) res <- melt(setdt(df1), id = 1l, variable.factor = false             )[, c("value", "variable") := .(as.numeric(value), as.numeric(variable))] res[, .(sweightedr = sum(variable * value),         sweights = sum(variable * (value > 0))), = date]  #          date sweightedr sweights # 1: 2000-01-31      0.017      0.3 # 2: 2000-02-29      0.082      0.6 # 3: 2000-03-31      0.082      1.0 # 4: 2004-04-30      0.070      1.0 

Comments