R - Fast concatenation of thousands of files by columns


I am using R to cbind ~11000 files using:

dat <- do.call('bind_cols',lapply(lfiles,read.delim)) 

which is unbelievably slow. I am using R because my downstream processing (creating plots, etc.) is in R. What are some fast alternatives for concatenating thousands of files by columns?

I have 3 types of files that I want this done for. They look like this:

[centos@ip data]$ head c021_0011_001786_tumor_rnaseq.abundance.tsv
target_id           length  eff_length  est_counts  tpm
enst00000619216.1   68      26.6432     10.9074     5.69241
enst00000473358.1   712     525.473     0           0
enst00000469289.1   535     348.721     0           0
enst00000607096.1   138     15.8599     0           0
enst00000417324.1   1187    1000.44     0.0673096   0.000935515
enst00000461467.1   590     403.565     3.22654     0.11117
enst00000335137.3   918     731.448     0           0
enst00000466430.5   2748    2561.44     162.535     0.882322
enst00000495576.1   1319    1132.44     0           0

[centos@ip data]$ head c021_0011_001786_tumor_rnaseq.rsem.genes.norm_counts.hugo.tab
gene_id     c021_0011_001786_tumor_rnaseq
tspan6      1979.7185
tnmd        1.321
dpm1        1878.8831
scyl3       452.0372
c1orf112    203.6125
fgr         494.049
cfh         509.8964
fuca2       1821.6096
gclc        1557.4431

[centos@ip data]$ head cpbt_0009_1_tumor_rnaseq.rsem.genes.norm_counts.tab
gene_id             cpbt_0009_1_tumor_rnaseq
ensg00000000003.14  2005.0934
ensg00000000005.5   5.0934
ensg00000000419.12  1100.1698
ensg00000000457.13  2376.9100
ensg00000000460.16  1536.5025
ensg00000000938.12  443.1239
ensg00000000971.15  1186.5365
ensg00000001036.13  1091.6808
ensg00000001084.10  1602.7165

Thanks!

For fast reading of the files, you can use fread from data.table, and then rbind the list of data.tables using rbindlist, specifying idcol=TRUE to provide a grouping variable that identifies each of the datasets:

library(data.table)
dt <- rbindlist(lapply(lfiles, fread), idcol=TRUE)
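Since rbindlist stacks the files by rows, one way to get back to the column-wise layout asked for in the question is to reshape the long table with dcast. The following is only a rough sketch under some assumptions: it uses small made-up demo files shaped like the two-column norm_counts files (the names sampleA, sampleB, sampleC and the helper read_one are hypothetical), it renames the per-sample column to a common name so that every table has identical column names before binding, and it assumes each gene_id appears once per file.

```r
library(data.table)

# Hypothetical demo data: three small two-column, tab-separated files
# shaped like the *.norm_counts.tab files in the question.
tmp <- tempfile("demo_")
dir.create(tmp)
genes <- c("tspan6", "tnmd", "dpm1")
samples <- c("sampleA", "sampleB", "sampleC")
for (i in seq_along(samples)) {
  d <- data.table(gene_id = genes, value = c(1, 2, 3) * i)
  setnames(d, "value", samples[i])  # second column is named after the sample
  fwrite(d, file.path(tmp, paste0(samples[i], ".tab")), sep = "\t")
}
lfiles <- list.files(tmp, pattern = "\\.tab$", full.names = TRUE)

# Read one file and give its columns common names, so that every
# table passed to rbindlist has the same column names.
read_one <- function(f) {
  d <- fread(f)
  setnames(d, c("gene_id", "value"))
  d
}

# Stack all files in long format; naming the list elements makes
# the id column hold the file name instead of an integer index.
long <- rbindlist(setNames(lapply(lfiles, read_one), basename(lfiles)),
                  idcol = "file")

# Reshape long -> wide: one row per gene, one column per file.
wide <- dcast(long, gene_id ~ file, value.var = "value")
print(wide)
```

For the real data, lfiles would be the existing vector of file paths, and the abundance.tsv files would need a single value column (for example tpm) chosen before reshaping.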
