What is an efficient and simple way in R to do the following?
- read in a two-column data file
- use this information to build a kind of translation dictionary, like a Python dict
- apply the translation to the content of a vector in order to obtain a translated vector, possibly for several vectors using the same correspondence information
I thought the hash package would help me do that, but I'm unsure whether I perform step 3 correctly.
Say my initial vector is my_vect, and my hash is my_dict.
I tried the following: values(my_dict, keys=my_vect)
The following observations make me doubt that I'm doing it the proper way:
- The operation seems slow (more than 1 second on a powerful desktop computer for a vector of 582 entries and a hash of 46665 entries).
- It results in something that doesn't look homogeneous with my_vect: while my_vect appeared with "indexed numbers" (I mean the integer numbers between square brackets that appear on the side of the values when displaying data in the interactive console), the result of calling values as above appears as something that still somehow looks like a dictionary: each displayed translated value has the original value (i.e. the hash key) displayed above it. I only want the values.
Edit:
If I understand correctly, R has a way of using "names" instead of numerical indices for vectors, and what I obtain using the values function is such a vector with names. It seems to work for what I wanted to do, although I imagine it takes more memory than necessary.
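To illustrate the "names" behaviour described in the edit, here is a minimal base R sketch. It does not use the hash package: it assumes a plain named character vector as the dictionary (the values "A" to "D" are made up for illustration) and shows that unname() strips the keys so that only the translated values remain.

```r
# Hypothetical dictionary stored as a named character vector:
# the names are the keys, the elements are the translations.
my_dict <- c(a = "A", b = "B", c = "C", d = "D")
my_vect <- c("b", "c", "c")

# Indexing by name keeps the keys attached as names, which is why
# the console shows each key above its translated value.
named <- my_dict[my_vect]
print(named)

# unname() drops the names and leaves a plain character vector.
translated <- unname(named)
print(translated)
```

Whether the names cost a noticeable amount of memory depends on the size of the vectors; dropping them with unname() avoids carrying them around in any case.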
I tried the libraries hash and hashmap, and the second seemed more efficient.
A small usage example:
> library(hashmap)
> keys = c("a", "b", "c", "d")
> values = c("A", "B", "C", "D")
> my_dict <- hashmap(keys, values)
> my_vect <- c("b", "c", "c")
> translated <- my_dict$find(my_vect)
> translated
[1] "B" "C" "C"
To build the dictionary from a table obtained using read.table, the option stringsAsFactors = FALSE of read.table has to be used, otherwise weird things happen (see the discussion in the comments of https://stackoverflow.com/a/38838271/1878788).
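As a rough sketch of the reading step, the snippet below parses a two-column table with stringsAsFactors = FALSE and checks that both columns come back as character vectors. The inline tab-separated data and the column names key and value are assumptions made for the example; a real call would pass a file name instead of text =. The translation itself is done here with base R's match() rather than hashmap, so the sketch runs without extra packages.

```r
# Hypothetical tab-separated content standing in for the two-column file.
raw <- "a\tA\nb\tB\nc\tC\nd\tD"

# Read the table, keeping strings as character vectors, not factors.
tbl <- read.table(text = raw, sep = "\t",
                  col.names = c("key", "value"),
                  stringsAsFactors = FALSE)

# Both columns should be plain character vectors.
stopifnot(is.character(tbl$key), is.character(tbl$value))

# Translate a vector: match() gives the row of each key,
# which is then used to pick the corresponding value.
my_vect <- c("b", "c", "c")
translated <- tbl$value[match(my_vect, tbl$key)]
print(translated)
```

Since R 4.0.0, stringsAsFactors defaults to FALSE, but setting it explicitly keeps the behaviour the same on older versions. Keys that are absent from the table come back as NA with this approach.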