What is an efficient, simple way in R to do the following?
- read in a two-column data file
- use this information to build a kind of translation dictionary, like a Python dict
- apply this translation to the content of a vector in order to obtain the translated vector, possibly for several vectors using the same correspondence information
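For reference, these three steps can be sketched in base R alone, using a named character vector as the lookup table. This is only a minimal sketch: the file contents, the tab separator, and the names dict_file, tbl, my_dict and translate are assumptions made for illustration.

```r
# Hypothetical two-column, tab-separated file, written here so the sketch is self-contained.
dict_file <- tempfile(fileext = ".tsv")
writeLines(c("a\tapple", "b\tbanana", "c\tcherry"), dict_file)

# Step 1: read the file, forcing both columns to plain character vectors.
tbl <- read.table(dict_file, sep = "\t", header = FALSE,
                  col.names = c("key", "value"),
                  colClasses = "character",
                  stringsAsFactors = FALSE)

# Step 2: build the lookup table, i.e. the values named by their keys.
my_dict <- setNames(tbl$value, tbl$key)

# Step 3: translate by indexing with the keys; unname() drops the keys from the result.
translate <- function(x, dict) unname(dict[x])

translate(c("b", "c", "c"), my_dict)
translate(c("a", "z"), my_dict)  # a key missing from the table gives NA
```

The same my_dict can be reused for any number of vectors, since indexing a named vector by character keys is vectorized.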
I thought the hash package would help me do that, but I'm unsure whether I perform step 3 correctly.
Say my initial vector is my_vect and my hash is my_dict. I tried the following: values(my_dict, keys=my_vect)
The following observations make me doubt that I'm doing it in the proper way:
- The operation seems slow (more than 1 second on a powerful desktop computer, for a vector of 582 entries and a hash of 46665 entries).
- It results in something that doesn't look homogeneous with my_vect: while my_vect appeared "indexed by numbers" (I mean the integer numbers between square brackets that appear on the side of the values when displaying the data in the interactive console), the result of calling values as above appears as something that still somehow looks like a dictionary: each displayed translated value has the original value (i.e. the hash key) displayed above it. I only want the values.
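The keys displayed above each value are what R calls the names attribute of a vector, and base R's unname() removes them. The sketch below mimics the situation with a plain named vector. The name res is hypothetical and merely stands in for the result of the values call, on the assumption that this result is an ordinary named vector.

```r
# Stand-in for the named result of a lookup: each value carries its key as a name.
res <- c(b = "banana", c = "cherry", c = "cherry")

names(res)            # the keys that are printed above the values

# Drop the names to keep only the values.
plain <- unname(res)
names(plain)          # NULL: only the values remain
```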
Edit:
If I understand correctly, R has a way of using "names" instead of numerical indices for vectors, and this is what I obtain using the values function: such a vector with names. This seems to work for what I wanted to do, although I imagine it takes more memory than necessary.
I tried the libraries hash and hashmap, and the second one seemed more efficient.
A small usage example:
    > library(hashmap)
    > keys = c("a", "b", "c", "d")
    > values = c("a", "b", "c", "d")
    > my_dict <- hashmap(keys, values)
    > my_vect <- c("b", "c", "c")
    > translated <- my_dict$find(my_vect)
    > translated
    [1] "b" "c" "c"

To build the dictionary from a table obtained using read.table, the option stringsAsFactors = FALSE of read.table has to be used, otherwise weird things happen (see the discussion in the comments of https://stackoverflow.com/a/38838271/1878788).
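One plausible illustration of how factors can give surprising lookups (not necessarily the exact issue discussed in the linked comments) is that base R treats a factor used as an index as its integer codes rather than its labels. The sketch below shows this with a named vector standing in for the dictionary. All names and data are hypothetical. Since R 4.0.0, read.table defaults to stringsAsFactors = FALSE, so the option mainly matters on older versions.

```r
# A small lookup table: values named by their keys.
lookup <- c(x = "ex", a = "ay", b = "bee")

keys_chr <- c("b", "a")
keys_fct <- factor(keys_chr)    # levels are sorted: "a" has code 1, "b" has code 2

lookup[keys_chr]                # indexes by name: the entries for "b" and "a"
lookup[keys_fct]                # indexes by integer code: positions 2 and 1 instead
lookup[as.character(keys_fct)]  # converting back to character restores lookup by name
```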