python - Understanding NumPy's interpretation of string data types


Let's say I have a bytes object that represents some data, and I want to convert it to a NumPy array via np.genfromtxt. I am having trouble understanding how I should handle strings in this case. Let's start with the following:

    from io import BytesIO
    import numpy as np

    text = b'test, 5, 1.2'
    types = ['str', 'i4', 'f4']
    np.genfromtxt(BytesIO(text), delimiter=',', dtype=types)

This does not work. It raises

    TypeError: data type not understood

If I change types to

    types = ['c', 'i4', 'f4']

then the NumPy call returns

    array((b't', 5, 1.2000000476837158),
          dtype=[('f0', 'S1'), ('f1', '<i4'), ('f2', '<f4')])

So it works, but I am only getting the first letter of the string, obviously.

If I use c8 or c16 for the dtype of test, I get

    array(((nan+0j), 5, 1.2000000476837158),
          dtype=[('f0', '<c8'), ('f1', '<i4'), ('f2', '<f4')])

which is garbage. I've tried using a and U, with no success. How in the world would I get genfromtxt to recognize and save elements as a string?


Edit: I assume part of the issue is that this is a bytes object. However, if I instead use a normal string as text, and use StringIO rather than BytesIO, then genfromtxt raises the error:

    TypeError: Can't convert 'bytes' object to str implicitly

In a Python 3 session:

    In [568]: text = b'test, 5, 1.2'

    # I don't need BytesIO since genfromtxt works with a list of
    # byte strings, as from text.splitlines()

    In [570]: np.genfromtxt([text], delimiter=',', dtype=None)
    Out[570]:
    array((b'test', 5, 1.2),
          dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])

If left to its own devices, genfromtxt deduces that the 1st field should be S4, that is, 4 bytestring characters.
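As a rough sketch of the same deduction on text input: newer NumPy releases (1.14 and later, as far as I know) accept an `encoding` argument in genfromtxt, so a StringIO source can be read directly and the string column comes back as unicode rather than bytes. The variable names here are only for illustration.

```python
from io import StringIO
import numpy as np

text = "test, 5, 1.2"

# dtype=None asks genfromtxt to guess a type for each column.
# encoding="utf-8" is an assumption about the NumPy version (>= 1.14);
# with it, the string column is stored as unicode (kind 'U').
arr = np.genfromtxt(StringIO(text), delimiter=",", dtype=None, encoding="utf-8")

print(arr.dtype)   # structured dtype with fields f0, f1, f2
print(arr["f0"])   # the string field
```

The integer and float widths that genfromtxt picks depend on the platform, so the exact dtype printed may differ from the session above.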

I could also be explicit with the types:

    In [571]: types = ['S4', 'i4', 'f4']
    In [572]: np.genfromtxt([text], delimiter=',', dtype=types)
    Out[572]:
    array((b'test', 5, 1.2000000476837158),
          dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f4')])

    In [573]: types = ['S10', 'i', 'f']
    In [574]: np.genfromtxt([text], delimiter=',', dtype=types)
    Out[574]:
    array((b'test', 5, 1.2000000476837158),
          dtype=[('f0', 'S10'), ('f1', '<i4'), ('f2', '<f4')])

    In [575]: types = ['U10', 'int', 'float']
    In [576]: np.genfromtxt([text], delimiter=',', dtype=types)
    Out[576]:
    array(('test', 5, 1.2),
          dtype=[('f0', '<U10'), ('f1', '<i4'), ('f2', '<f8')])

I can specify either S or U (unicode), but I also have to specify the length. I don't think there's a way to let genfromtxt deduce the length, except with dtype=None. I'd have to dig into the code to see how it deduces the string length.
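One possible workaround, sketched under the assumption that the data fits in memory as a list of byte-string lines, is to measure the longest value in the string column first and build the S type from that. The helper `string_dtype_for` is hypothetical, not part of NumPy.

```python
import numpy as np

def string_dtype_for(lines, column, delimiter=b","):
    """Hypothetical helper: return an 'S<n>' type string wide enough
    for the longest (stripped) value found in the given column."""
    width = max(len(line.split(delimiter)[column].strip()) for line in lines)
    return "S{}".format(max(width, 1))

lines = [b"test, 5, 1.2", b"longer, 7, 3.4"]

# Explicit numeric types, with the string width measured from the data.
types = [string_dtype_for(lines, 0), "i4", "f4"]
arr = np.genfromtxt(lines, delimiter=",", dtype=types)

print(types[0])
print(arr["f0"])
```

This reads the data twice, once to measure and once to parse, which is fine for small inputs but may not suit large files.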

I could also create this array with np.array, by making a tuple of the substrings and giving it the correct dtype:

    In [599]: np.array(tuple(text.split(b',')), dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])
    Out[599]:
    array((b'test', 5, 1.2),
          dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])
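The same idea can be extended to several lines by building one tuple per line. This is only a sketch: the second data line, the S5 width, and the `parse_line` helper are made up for illustration, and the numeric fields are converted explicitly in Python rather than relying on NumPy to parse the byte strings.

```python
import numpy as np

lines = [b"test, 5, 1.2", b"other, 7, 3.5"]
dtype = [("f0", "S5"), ("f1", "<i4"), ("f2", "<f8")]

def parse_line(line):
    """Hypothetical helper: split one comma-separated line into a typed tuple."""
    name, count, value = line.split(b",")
    # int() and float() accept byte strings and ignore surrounding whitespace.
    return (name.strip(), int(count), float(value))

# A list of tuples plus a structured dtype gives one record per line.
records = np.array([parse_line(line) for line in lines], dtype=dtype)

print(records["f0"])
print(records["f1"])
```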
