python - Generate sequence by indices / one-hot encoding


I have a sequence s = [4, 3, 1, 0, 5] and num_classes = 6, and I want to generate a NumPy matrix m of shape (len(s), num_classes) where m[i, j] = 1 if s[i] == j else 0.

Is there a function in NumPy to which I can pass s and num_classes?

This is called 1-of-K or one-hot encoding.


Timeit results:

    def b():
        m = np.zeros((len(s), num_classes))
        m[np.arange(len(s)), s] = 1
        return m

    In [57]: timeit.timeit(lambda: b(), number=1000)
    Out[57]: 0.012787103652954102

    In [61]: timeit.timeit(lambda: (np.array(s)[:, None] == np.arange(num_classes)) + 0, number=1000)
    Out[61]: 0.018411874771118164
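The two timed expressions can be checked against each other. The following is a minimal sketch, assuming s and num_classes as given in the question; the name by_broadcast is made up here for illustration, and timings are left out because they vary by machine.

```python
import numpy as np

s = [4, 3, 1, 0, 5]
num_classes = 6

def b():
    # Fancy-indexing approach: start from zeros, set one entry per row.
    m = np.zeros((len(s), num_classes))
    m[np.arange(len(s)), s] = 1
    return m

def by_broadcast():
    # Hypothetical name for the second timed expression.
    # A column of labels compared with a row of class ids broadcasts
    # to a boolean (len(s), num_classes) array; "+ 0" turns it into integers.
    return (np.array(s)[:, None] == np.arange(num_classes)) + 0

# The values agree, although b() returns floats and by_broadcast() integers.
same = np.array_equal(b(), by_broadcast())
```

Both produce the same 0/1 pattern; they differ only in dtype and in how much intermediate work NumPy has to do.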

Since you want a single 1 per row, you can fancy-index using arange(len(s)) along the first axis and s along the second:

    import numpy as np

    s = [4, 3, 1, 0, 5]
    n = len(s)
    k = 6
    m = np.zeros((n, k))
    m[np.arange(n), s] = 1   # one assignment per (row, column) pair
    m
    =>
    array([[ 0.,  0.,  0.,  0.,  1.,  0.],
           [ 0.,  0.,  0.,  1.,  0.,  0.],
           [ 0.,  1.,  0.,  0.,  0.,  0.],
           [ 1.,  0.,  0.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  0.,  0.,  1.]])

    m.nonzero()
    =>
    (array([0, 1, 2, 3, 4]), array([4, 3, 1, 0, 5]))

This can be thought of as indexing with the (row, column) pairs (0,4), (1,3), (2,1), (3,0), (4,5).
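To get something that takes s and num_classes directly, as the question asks, the fancy-indexing steps can be wrapped in a small function. This is only a sketch: one_hot and its dtype parameter are names chosen here, not a NumPy function. Indexing the rows of an identity matrix with np.eye(num_classes)[s] is shown as an alternative that gives the same result.

```python
import numpy as np

def one_hot(s, num_classes, dtype=float):
    """Hypothetical helper: return a (len(s), num_classes) one-hot matrix."""
    s = np.asarray(s)
    m = np.zeros((s.size, num_classes), dtype=dtype)
    # Pair row i with column s[i] and set that single entry to 1.
    m[np.arange(s.size), s] = 1
    return m

s = [4, 3, 1, 0, 5]
m = one_hot(s, 6)

# Alternative: row j of the identity matrix is the one-hot vector for class j,
# so selecting rows by s builds the same matrix.
m_eye = np.eye(6)[s]
```

The wrapper assumes every value in s is an integer in the range 0 to num_classes - 1; a value outside that range raises an IndexError, or, if negative, wraps around to count from the end.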

