python - Approach where features are a combination of text (labels) and numerical data
I'm trying to figure out an approach for a data set that includes text (or, more precisely, labels) and numeric data. For example, in my data set I have city, state, and lat/lon, and I want to classify. This is supervised, so I have labels (y) for the data.
So in this case, the text is not a bag of words or anything like that. Each value is a label, more like 0, 1, ... However, I don't think I want to give the algorithm the idea that these are real values. I have tried a couple of different algorithms, including svm.SVC, LinearSVC, and DecisionTree. For the SVM, I converted city and state to numeric values using a couple of different methods, including LabelEncoder. That doesn't seem right intuitively, and I'm not satisfied with the score.
Any thoughts or input would be appreciated.
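It looks like you are looking for OneHotEncoder. For an explanation, take a look at the "Encoding categorical features" section of the docs. The idea is to make a column for each city, with 0/1 values indicating whether the sample belongs to that city, so the classifier never sees the categories as ordered numbers. You might also be interested in DictVectorizer.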