performance - Translating vector elements in R using correspondence table

What is an efficient, simple way in R to do the following:

1. Read in a two-column data file.
2. Use this information to build a kind of translation dictionary, like a Python dict.
3. Apply the translation to the content of a vector in order to obtain the translated vector, possibly for several vectors using the same correspondence information.
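For reference, the three steps above can be sketched in base R with a named character vector playing the role of the dictionary. This is only a minimal sketch under stated assumptions: the inline text stands in for the real data file, and the column names `key` and `value` are made up for the example.

```r
# Step 1: read a two-column table. Inline text stands in for a real file;
# the column names "key" and "value" are hypothetical.
corr <- read.table(
  text = "a A\nb B\nc C\nd D",
  col.names = c("key", "value"),
  stringsAsFactors = FALSE
)

# Step 2: build the lookup as a named character vector (the names are the keys).
my_dict <- setNames(corr$value, corr$key)

# Step 3: translate by indexing with the keys; unname() drops the key labels.
my_vect <- c("b", "c", "c")
translated <- unname(my_dict[my_vect])
translated
```

The same `my_dict` can be reused to translate any number of vectors.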

I thought the hash package could help me do that, but I'm unsure how to perform step 3 correctly.

Say my initial vector is my_vect and my hash is my_dict. I tried the following: values(my_dict, keys=my_vect)

The following observations make me doubt that I'm doing this in the proper way:

1. The operation seems slow (more than 1 second on a powerful desktop computer for a vector of 582 entries and a hash of 46665 entries).
2. It results in something that doesn't look homogeneous with my_vect: while my_vect appears to be "indexed by numbers" (I mean the integer numbers between square brackets that appear on the side of the values when displaying data in the interactive console), the result of calling values as above still somehow looks like a dictionary: each displayed translated value has the original value (i.e. the hash key) displayed above it. I want only the values.
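To illustrate the second observation without relying on any package, here is a small base R sketch. It assumes the keys and values are held in two parallel character vectors (`keys` and `vals` are names chosen for this example). Indexing a named vector keeps the keys as names, whereas a lookup through `match()` yields a plain vector.

```r
keys    <- c("a", "b", "c", "d")
vals    <- c("A", "B", "C", "D")
my_vect <- c("b", "c", "c")

# Indexing a named vector keeps the keys as names, which the console
# prints above each value.
named_result <- setNames(vals, keys)[my_vect]
names(named_result)

# match() returns the position of each element of my_vect in keys,
# so indexing vals with those positions gives a vector without names.
plain_result <- vals[match(my_vect, keys)]
names(plain_result)
```

Both forms hold the same translated values; only the presence of names differs.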

Edit:

If I understand correctly, R has a way of using "names" instead of numerical indices for vectors, and what I obtain using the values function is such a vector with names. This seems to work for what I wanted to do, although I imagine it takes more memory than necessary.

I tried the libraries hash and hashmap, and the second seemed more efficient.

A small usage example:

> library(hashmap)
> keys = c("a", "b", "c", "d")
> values = c("A", "B", "C", "D")
> my_dict <- hashmap(keys, values)
> my_vect <- c("b", "c", "c")
> translated <- my_dict$find(my_vect)
> translated
[1] "B" "C" "C"

To build the dictionary from a table obtained using read.table, the option stringsAsFactors = FALSE of read.table has to be used, otherwise weird things happen (see the discussion in the comments of https://stackoverflow.com/a/38838271/1878788).
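One way factors can produce surprising lookups is shown below in base R. This is an assumption about the kind of problem involved; the linked discussion may concern a different effect specific to hashmap. When a factor is used as an index, R uses its integer codes rather than its labels.

```r
my_dict <- c(a = "A", b = "B", c = "C", d = "D")

chr_keys <- c("d", "b")
fct_keys <- factor(chr_keys)   # levels are sorted, so "b" has code 1 and "d" has code 2

by_name <- my_dict[chr_keys]                 # looked up by name
by_code <- my_dict[fct_keys]                 # looked up by integer codes 2, 1
restored <- my_dict[as.character(fct_keys)]  # converting back restores the name lookup
```

Here `by_name` holds "D" and "B", while `by_code` silently holds "B" and "A". Note that since R 4.0.0 the default of `stringsAsFactors` in read.table is FALSE, so the option only needs to be set explicitly on older versions.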
