I'm trying to work through a simple word count problem, and I'm trying to figure out whether it can be done using map(), filter(), and reduce() exclusively.
I'm following the wordRDD example (the list is the one used in Spark):
mylst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']
All I need is to count the words and present them in tuple format:
counts = [('cats', 1), ('elephants', 1), ('rats', 1), ('rats', 1), ('cats', 1), ('cats', 1)]
I tried a simple map() with a lambda:
counts = mylst.map(lambda x: (x, <here is the problem>))
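For reference, plain Python lists have no .map method (that is a Spark RDD method); with the built-in map() the pairing step alone can be sketched as:

```python
mylst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']

# Pair each word with 1 using only map() and a lambda
counts = list(map(lambda x: (x, 1), mylst))
# counts == [('cats', 1), ('elephants', 1), ('rats', 1),
#            ('rats', 1), ('cats', 1), ('cats', 1)]
```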
I might have the syntax wrong, or maybe I'm confused. P.S.: this isn't a duplicate question; the other answers give suggestions using if/else or list comprehensions.
Thanks for the help.
You don't need map(..) at all. You can use reduce(..):
>>> from collections import defaultdict
>>> from functools import reduce
>>> def function(obj, x):
...     obj[x] += 1
...     return obj
...
>>> reduce(function, mylst, defaultdict(int)).items()
dict_items([('elephants', 1), ('rats', 2), ('cats', 3)])
You can then iterate over the result.
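Spelled out as a self-contained script, with the imports included and the iteration at the end:

```python
from collections import defaultdict
from functools import reduce

mylst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']

def function(obj, x):
    # Accumulate a running count per word in the defaultdict
    obj[x] += 1
    return obj

word_counts = reduce(function, mylst, defaultdict(int))

# Iterate over the (word, count) pairs
for word, n in word_counts.items():
    print(word, n)
```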
However, there's a better way of doing it: collections.Counter.
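A sketch of the Counter approach, which does the counting in a single call:

```python
from collections import Counter

mylst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']
counts = Counter(mylst)

# most_common() orders the pairs by frequency, highest first
print(counts.most_common())
# [('cats', 3), ('rats', 2), ('elephants', 1)]
```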