python - How to cluster URLs that have been split into binary vectors based on their paths with scikit-learn?
I have a large list of URLs, and I want to group them by similarity for later use. I have already grouped them by domain, and I tried looking at the paths and files as well. However, the method I used took too long because it iterated through every character in each word.
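Rather than walking through every character, the standard library can split a path into words in a couple of calls. This is a minimal sketch, assuming that words are separated by any non-alphanumeric character (`/`, `-`, `_`, `.`, ...), that bare URLs such as `reddit.com/r/python` carry no scheme, and that case should be ignored; `path_words` is a hypothetical helper name.

```python
import re
from urllib.parse import urlsplit


def path_words(url):
    """Hypothetical helper: split a URL's path into lowercased words."""
    # urlsplit only recognises the host when a scheme or '//' is present,
    # so prepend '//' to bare URLs like 'reddit.com/r/python'.
    if "://" not in url:
        url = "//" + url
    path = urlsplit(url).path
    # Assumption: any run of non-alphanumeric characters separates words.
    return [w.lower() for w in re.split(r"[^A-Za-z0-9]+", path) if w]


print(path_words("reddit.com/r/eve/submit"))
print(path_words("https://example.com/blog/my-first_post.html"))
```

Each URL is processed with one regex split, so the cost grows with the total length of the paths rather than with nested per-character loops.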
I have split every URL's path into the words that make it up and put them into a data frame. The data frame has one column for every unique word that appears in any URL's path, and each row is filled with 0s and 1s depending on whether that word appears in that URL. I thought one way to group them would be to pass this data frame to scikit-learn so I can use KMeans.
Edit: here is an example of the data frame:
| URL                     | r | python | eve | submit |
|-------------------------|---|--------|-----|--------|
| reddit.com/r/python     | 1 | 1      | 0   | 0      |
| reddit.com/r/eve        | 1 | 0      | 1   | 0      |
| reddit.com/r/eve/submit | 1 | 0      | 1   | 1      |