python - NLTK Wordnet Synset for word phrase


I'm working with the Python NLTK WordNet API. I'm trying to find the best synset that represents a group of words.

If I need to find the best synset for "school & office supplies", I'm not sure how to go about this. So far I've tried finding the synsets for the individual words and then computing the best lowest common hypernym like this:

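    import itertools

    from nltk import pos_tag, word_tokenize
    from nltk.corpus import wordnet


    def find_best_synset(category_name):
        text = word_tokenize(category_name)
        tags = pos_tag(text)

        # get_wordnet_pos (not shown) maps a POS tag to a WordNet POS,
        # or returns a falsy value for tags WordNet does not cover.
        node_synsets = []
        for word, tag in tags:
            pos = get_wordnet_pos(tag)
            if not pos:
                continue
            node_synsets.append(wordnet.synsets(word, pos=pos))

        max_score = 0
        max_synset = None
        max_combination = None
        for combination in itertools.product(*node_synsets):
            for test in itertools.combinations(combination, 2):
                score = wordnet.path_similarity(test[0], test[1])
                # path_similarity returns None when no path exists
                if score is not None and score > max_score:
                    max_score = score
                    max_combination = test
                    max_synset = test[0].lowest_common_hypernyms(test[1])
        return max_synset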

However, this doesn't really work, and it is very costly. Are there any ways to figure out which synset best represents multiple words together?

Thanks for your help!

Apart from what I said in the comments already, I think the way you select the best hyperonym might be flawed. The synset you end up with is not the lowest common hyperonym of all the words, but only of two of them.

Let's stick with your example of "school & office supplies". For each word in the expression you get a number of synsets. So the variable node_synsets will look something like the following:

    [[school_1, school_2], [office_1, office_2, office_3], [supply_1]]

In this example, there are 6 ways to combine each synset with any of the others:

    [(school_1, office_1, supply_1),
     (school_1, office_2, supply_1),
     (school_1, office_3, supply_1),
     (school_2, office_1, supply_1),
     (school_2, office_2, supply_1),
     (school_2, office_3, supply_1)]

These triples are what you iterate over in the outer for loop (with itertools.product). If the expression has 4 words, you iterate over quadruples; with 5 words it's quintuples, etc.

Now, with the inner for loop, you pair off each triple. The first one is:
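To make the outer loop concrete, here is a small sketch that uses plain strings as stand-ins for the synsets (the names `school_1`, `office_1`, etc. are placeholders from the example above, not real WordNet identifiers):

```python
import itertools

# Placeholder strings standing in for the synsets of each word
node_synsets = [
    ["school_1", "school_2"],
    ["office_1", "office_2", "office_3"],
    ["supply_1"],
]

# One element from each inner list, in every possible arrangement
combinations = list(itertools.product(*node_synsets))

print(len(combinations))  # 2 * 3 * 1 = 6
for combination in combinations:
    print(combination)
```

The number of combinations is the product of the synset counts per word, which is why the approach gets expensive quickly for longer phrases or highly ambiguous words.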

    [(school_1, office_1),
     (school_1, supply_1),
     (office_1, supply_1)]

... and you determine the lowest hyperonym among each pair. So in the end you will get the lowest hyperonym of, say, school_2 and office_1, which might be some kind of institution. This is probably not very meaningful, as it doesn't consider any synset of the last word.

Maybe you should try to find the lowest common hyperonym of all three words, in each combination of their synsets, and take the one scoring best among them.
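The pairing step can be sketched the same way, again with placeholder strings instead of real synsets. It shows that every pair drawn from a triple leaves one word out, so whichever pair wins, the result ignores that word:

```python
import itertools

# First triple from the example, with strings as stand-ins for synsets
triple = ("school_1", "office_1", "supply_1")

pairs = list(itertools.combinations(triple, 2))

for pair in pairs:
    # Whatever is not in the pair plays no role in that pair's hyperonym
    left_out = [s for s in triple if s not in pair]
    print(pair, "ignores", left_out)
```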
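A minimal sketch of that suggestion follows. The function names are hypothetical, and two choices here are my assumptions rather than anything stated above: "lowest" is taken to mean the greatest `max_depth()`, and the same depth is used as the score for comparing combinations. It relies only on the `hypernym_paths()` and `max_depth()` methods of NLTK `Synset` objects.

```python
import itertools


def ancestors(synset):
    """Return the synset itself plus every hyperonym above it."""
    return {s for path in synset.hypernym_paths() for s in path}


def lowest_common_hypernym(synsets):
    """Deepest synset that is an ancestor of ALL given synsets, or None."""
    if not synsets:
        return None
    common = set.intersection(*(ancestors(s) for s in synsets))
    # Assumption: "lowest" means the greatest depth below the root.
    return max(common, key=lambda s: s.max_depth(), default=None)


def find_best_common_hypernym(node_synsets):
    """Try every combination of synsets and keep the deepest shared hyperonym."""
    best, best_depth = None, -1
    for combination in itertools.product(*node_synsets):
        candidate = lowest_common_hypernym(combination)
        # Assumption: a deeper (more specific) hyperonym scores better.
        if candidate is not None and candidate.max_depth() > best_depth:
            best, best_depth = candidate, candidate.max_depth()
    return best
```

Here `node_synsets` is the same list of lists built in the question, for example `[wordnet.synsets(w, pos=wordnet.NOUN) for w in ("school", "office", "supplies")]`. Synsets with different parts of speech share no hyperonyms, so such combinations simply yield no candidate. Note that this still walks the full product of synsets, so it addresses the correctness concern, not the cost.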

