Populate a matrix of occurrences from arrays of row/column indices

Problem description:

I'm looking for an efficient way to create a matrix of occurrences from two arrays of indices: one holds the row indices into this matrix, the other holds the column indices.

For example, I have:

# the matrix will be 4x3 in this example
# array of row indices, with values from 0 to 3
[0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3]
# array of column indices, with values from 0 to 2
[0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2]

And I need to create a matrix of occurrences like this:

[[1  0  0]
 [0  2  0]
 [0  1  2]
 [2  1  5]]

I can create an array of one-hot vectors in a simple way, but I can't get it to work when a (row, column) pair occurs more than once:

import numpy as np

n_rows    = 4
n_columns = 3
#data
rows    = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3])
columns = np.array([0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2])
#empty matrix
new_matrix = np.zeros([n_rows, n_columns])
#adding 1 for each [row, column] occurrence:
new_matrix[rows, columns] += 1
print(new_matrix)

This returns:

[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  1.  1.]
 [ 1.  1.  1.]]

It seems that indexing and adding a value like this doesn't work when an index occurs more than once, even though printing with a repeated index seems to work just fine:

print(new_matrix[rows, :])

[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  1.  0.]
 [ 0.  1.  1.]
 [ 0.  1.  1.]
 [ 0.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
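A minimal sketch of what appears to be happening, assuming NumPy's documented behaviour that `a[idx] += 1` is evaluated as `a[idx] = a[idx] + 1`. The names `counts`, `idx` and `tmp` are made up for this illustration. The right-hand side is computed once from the old values, so a repeated index is simply overwritten with the same result instead of being incremented again:

```python
import numpy as np

counts = np.zeros(3)
idx = np.array([1, 1, 1])   # the same position listed three times

# counts[idx] += 1 behaves like the two steps below:
tmp = counts[idx] + 1       # computed once from the old values: [1., 1., 1.]
counts[idx] = tmp           # position 1 is overwritten three times with 1.0

print(counts)               # position 1 ends up at 1, not 3
```

Reading with a repeated index, as in the print above, is unaffected because nothing is written back.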

So maybe I'm missing something here? Or can this not be done this way, so that I need to look for another approach?

Use np.add.at, specifying a tuple of indices:

>>> np.add.at(new_matrix, (rows, columns), 1)
>>> new_matrix
array([[ 1.,  0.,  0.],
       [ 0.,  2.,  0.],
       [ 0.,  1.,  2.],
       [ 2.,  1.,  5.]])

np.add.at operates on the array in place and is unbuffered: it adds 1 once for every occurrence of an index pair in the (rows, columns) tuple, so repeated pairs accumulate instead of being counted only once.
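For reuse, the same idea can be wrapped in a small function. This is only a sketch: `occurrence_matrix` is a hypothetical helper name, not part of NumPy, and the integer dtype is an assumption made so the counts print without decimals. The result is cross-checked against an alternative that flattens each (row, column) pair to a single index with np.ravel_multi_index and counts with np.bincount:

```python
import numpy as np

def occurrence_matrix(rows, columns, n_rows, n_columns):
    """Count how often each (row, column) pair occurs (hypothetical helper)."""
    counts = np.zeros((n_rows, n_columns), dtype=int)
    # Unbuffered add: every repeated (row, column) pair contributes its own 1.
    np.add.at(counts, (rows, columns), 1)
    return counts

rows    = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3])
columns = np.array([0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2])

result = occurrence_matrix(rows, columns, 4, 3)
print(result)

# Alternative for comparison: map each pair to a flat index, count the
# flat indices, then reshape the counts back to the matrix shape.
flat = np.ravel_multi_index((rows, columns), (4, 3))
alt = np.bincount(flat, minlength=4 * 3).reshape(4, 3)
print(np.array_equal(result, alt))
```

Which variant is faster likely depends on the array sizes and the NumPy version, so it is worth timing both on real data.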