python - NLTK WordNet synset for a word phrase
I'm working with the Python NLTK WordNet API. I'm trying to find the best synset that represents a group of words.
If I need to find the best synset for something like "school & office supplies", I'm not sure how to go about this. So far I've tried finding the synsets for the individual words and then computing the best lowest common hypernym, like this:
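```python
import itertools

from nltk import pos_tag, word_tokenize
from nltk.corpus import wordnet


def get_wordnet_pos(treebank_tag):
    # Assumed helper (not shown in the question): map a Penn Treebank tag
    # to a WordNet POS constant, or None for tags WordNet does not cover.
    return {"J": wordnet.ADJ, "V": wordnet.VERB,
            "N": wordnet.NOUN, "R": wordnet.ADV}.get(treebank_tag[:1])


def find_best_synset(category_name):
    text = word_tokenize(category_name)
    tags = pos_tag(text)

    # One list of candidate synsets per content word.
    node_synsets = []
    for word, tag in tags:
        pos = get_wordnet_pos(tag)
        if not pos:
            continue  # skip tokens such as "&" that have no WordNet POS
        synsets = wordnet.synsets(word, pos=pos)
        if synsets:  # an empty list would make itertools.product yield nothing
            node_synsets.append(synsets)

    max_score = 0
    max_synset = None
    max_combination = None
    # Try every way of picking one synset per word...
    for combination in itertools.product(*node_synsets):
        # ...and score every pair of synsets within that choice.
        for test in itertools.combinations(combination, 2):
            # path_similarity returns None when no path exists, so fall back to 0.
            score = wordnet.path_similarity(test[0], test[1]) or 0
            if score > max_score:
                max_score = score
                max_combination = test
                max_synset = test[0].lowest_common_hypernyms(test[1])
    return max_synset  # a list of synsets, as returned by lowest_common_hypernyms
```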
However, this doesn't work well, and it is also costly. Are there ways to figure out which synset best represents multiple words together?
Thanks for the help!
Apart from what was already said in the comments, I think the way you select the best hyperonym might be flawed. The synset you end up with is not the lowest common hyperonym of all the words, but only of 2 of them.
Let's stick with your example of "school & office supplies". Each word in the expression has a number of synsets, so the variable node_synsets will look something like the following:
[[school_1, school_2], [office_1, office_2, office_3], [supply_1]]
In this example, there are 6 ways to combine each synset with the others:
[(school_1, office_1, supply_1), (school_1, office_2, supply_1), (school_1, office_3, supply_1), (school_2, office_1, supply_1), (school_2, office_2, supply_1), (school_2, office_3, supply_1)]
These are the triples you iterate over in the outer for loop (with itertools.product). If the expression has 4 words, you iterate over quadruples; with 5 it's quintuples, and so on.
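To make the outer loop concrete, here is a small sketch that runs itertools.product over the placeholder names from the example. Plain strings stand in for real nltk Synset objects purely for illustration. It also hints at why the approach gets costly: the number of combinations is the product of the per-word synset counts.

```python
import itertools
import math

# Placeholder names from the example above; in the real code each entry
# would be an nltk Synset returned by wordnet.synsets().
node_synsets = [
    ["school_1", "school_2"],
    ["office_1", "office_2", "office_3"],
    ["supply_1"],
]

# Outer loop: one tuple per way of choosing a single synset for every word.
combinations = list(itertools.product(*node_synsets))
for combination in combinations:
    print(combination)

# The count is the product of the per-word synset counts (2 * 3 * 1 here),
# so every extra word multiplies the work.
print(len(combinations), math.prod(len(s) for s in node_synsets))  # 6 6
```

The tuples come out in the same order as the list above, because itertools.product varies the last iterable fastest.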
Now, in the inner for loop, you pair off the synsets within each triple. For the first one, that gives:
[(school_1, office_1), (school_1, supply_1), (office_1, supply_1)]
...and then you determine the lowest hyperonym for each pair. So in the end you might get the lowest hyperonym of, say, school_2 and office_1, which might be some kind of institution. That is not very meaningful, because it doesn't take any synset of the last word into account.
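The inner loop can be sketched the same way, again with placeholder strings rather than real synsets. itertools.combinations(triple, 2) yields exactly the three pairs listed above, and since each pair is scored on its own, the winning pair alone decides which hypernym is returned.

```python
import itertools
import math

# First triple from the walk-through (placeholder names, not real synsets).
triple = ("school_1", "office_1", "supply_1")

# Inner loop: every unordered pair inside one combination.
pairs = list(itertools.combinations(triple, 2))
print(pairs)
# [('school_1', 'office_1'), ('school_1', 'supply_1'), ('office_1', 'supply_1')]

# Each pair is scored in isolation, so the hypernym of the winning pair
# is computed from two synsets only; the third word plays no part in it.
# With n words there are n * (n - 1) / 2 pairs per combination.
print(math.comb(len(triple), 2))  # 3
```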
Maybe you should try to find the lowest common hyperonym of all three words, in each combination of their synsets, and take the one scoring best among them.
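A minimal sketch of that suggestion follows, under assumptions of my own. The "lowest common hyperonym of all words" is taken to be the deepest synset that appears among the hypernyms of every synset in a combination. The score used to pick among combinations is simply that hypernym's max_depth(), so a more specific shared hypernym counts as better. The helper names lowest_common_hypernyms_all and find_best_synset_all are hypothetical. ToySynset is a made-up stand-in exposing the same hypernyms() and max_depth() methods as nltk's Synset, so the example runs without the WordNet corpus; its mini hierarchy is invented and is not WordNet's.

```python
import itertools


class ToySynset:
    """Hypothetical stand-in for nltk's Synset so the sketch runs without WordNet data."""

    def __init__(self, label, *parents):
        self.label = label
        self._parents = list(parents)

    def hypernyms(self):
        return self._parents

    def max_depth(self):
        # Length of the longest hypernym path to a root (a root has depth 0).
        return 1 + max((p.max_depth() for p in self._parents), default=-1)


def lowest_common_hypernyms_all(synsets):
    """Deepest hypernyms shared by *every* synset (only hypernyms()/max_depth() are used)."""

    def ancestors(synset):
        # The synset itself plus all of its transitive hypernyms.
        seen, stack = {synset}, [synset]
        while stack:
            for parent in stack.pop().hypernyms():
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    common = set.intersection(*(ancestors(s) for s in synsets))
    if not common:
        return []
    deepest = max(h.max_depth() for h in common)
    return [h for h in common if h.max_depth() == deepest]


def find_best_synset_all(node_synsets):
    """Keep the deepest hypernym shared by all words, over every combination (assumed score)."""
    best, best_depth = None, -1
    for combination in itertools.product(*node_synsets):
        for hypernym in lowest_common_hypernyms_all(combination):
            depth = hypernym.max_depth()
            if depth > best_depth:
                best, best_depth = hypernym, depth
    return best


# A made-up mini hierarchy (NOT WordNet's real one) for the three words.
entity = ToySynset("entity")
artifact = ToySynset("artifact", entity)
structure = ToySynset("structure", artifact)
group = ToySynset("group", entity)
institution = ToySynset("institution", group)

school_1 = ToySynset("school (institution)", institution)
school_2 = ToySynset("school (building)", ToySynset("building", structure))
office_1 = ToySynset("office (institution)", institution)
office_2 = ToySynset("office (room)", ToySynset("room", structure))
supply_1 = ToySynset("supply (artifact)", artifact)

node_synsets = [[school_1, school_2], [office_1, office_2], [supply_1]]

# Pairwise, school_1 and office_1 meet at "institution", ignoring "supply"...
print([h.label for h in lowest_common_hypernyms_all([school_1, office_1])])
# ...but across all three words, the shared hypernym must also cover supply_1.
print(find_best_synset_all(node_synsets).label)
```

With NLTK installed, the two helper functions should also accept real synsets, for example find_best_synset_all([wordnet.synsets(w, pos=wordnet.NOUN) for w in ("school", "office", "supply")]); the result depends on the WordNet version, so no output is shown for it. This still enumerates every combination, so it addresses the meaning problem rather than the cost.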