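Assume the input to a convolutional neural network is x with x.shape=(1,3,6,6): one 3-channel image with a resolution of 6*6.
The convolution kernel is w with w.shape=(2,3,3,3): two 3-channel kernels of size 3*3.
Below, the multi-dimensional array ma stands in for the product of w and x after both are unfolded into matrices. That product has shape=(16,2): assuming stride 1 and no padding, there are 4*4=16 output positions, with one column per kernel. Here it is reshaped to (1,4,4,2).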
import numpy as np
ma = np.arange(32).reshape(1,4,4,2)
[
[
[
[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7]
],
[
[ 8, 9],
[10, 11],
[12, 13],
[14, 15]
],
[
[16, 17],
[18, 19],
[20, 21],
[22, 23]
],
[
[24, 25],
[26, 27],
[28, 29],
[30, 31]
]
]
]
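To see where a (16,2) product could come from, here is a minimal sketch of the unfold-and-multiply step. It assumes stride 1 and no padding, fills x and w with arbitrary values, and uses the illustrative names cols, w_mat, prod and nhwc, none of which appear in the original text.

```python
import numpy as np

# Arbitrary test data with the shapes from the text.
x = np.arange(1 * 3 * 6 * 6, dtype=np.float64).reshape(1, 3, 6, 6)
w = np.arange(2 * 3 * 3 * 3, dtype=np.float64).reshape(2, 3, 3, 3)

# Unfold x: each 3-channel 3*3 window becomes one row of 27 values.
# Assumption: stride 1, no padding, so there are 4*4 windows.
cols = np.array([
    x[0, :, i:i + 3, j:j + 3].ravel()
    for i in range(4)
    for j in range(4)
])                                # shape (16, 27)

# Unfold w: each kernel becomes one column of 27 values.
w_mat = w.reshape(2, -1).T        # shape (27, 2)

prod = cols @ w_mat               # shape (16, 2)
nhwc = prod.reshape(1, 4, 4, 2)   # same layout as ma

print(prod.shape, nhwc.shape)
```

Each row of prod holds the two kernel responses at one output position, which is why the kernel (channel) index ends up as the last axis.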
The actual output is obtained by transposing it, so that the channel axis comes right after the batch axis:
out = ma.transpose(0,3,1,2)
This can be understood as two successive axis swaps: (0,1,2,3)->(0,3,2,1)->(0,3,1,2)
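Before going through the two swaps, a quick check of what the permutation does to individual elements may help. This is only a sketch. Reading the axes of ma as (N, H, W, C) is my interpretation of the shapes above, not something the text states.

```python
import numpy as np

ma = np.arange(32).reshape(1, 4, 4, 2)   # read as (N, H, W, C)
out = ma.transpose(0, 3, 1, 2)           # read as (N, C, H, W)

print(out.shape)  # (1, 2, 4, 4)

# Every element keeps its value; only the order of its indices changes.
for c in range(2):
    for h in range(4):
        for w in range(4):
            assert out[0, c, h, w] == ma[0, h, w, c]

# First row of the second channel's 4*4 feature map.
print(out[0, 1, 0])  # [1 3 5 7]
```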
1. Swap axis 1 and axis 3 (axes counted from 0): axis 1 runs along (0,8,16,24) and axis 3 runs along (0,1). After the swap:
[
[
[
[ 0, 8, 16, 24],
[ 2, 10, 18, 26],
[ 4, 12, 20, 28],
[ 6, 14, 22, 30]
],
[
[ 1, 9, 17, 25],
[ 3, 11, 19, 27],
[ 5, 13, 21, 29],
[ 7, 15, 23, 31]
]
]
]
2. Swap axis 2 and axis 3: axis 2 runs along (0,2,4,6), and axis 3 now runs along (0,8,16,24). After the swap:
[
[
[
[ 0, 2, 4, 6],
[ 8, 10, 12, 14],
[16, 18, 20, 22],
[24, 26, 28, 30]
],
[
[ 1, 3, 5, 7],
[ 9, 11, 13, 15],
[17, 19, 21, 23],
[25, 27, 29, 31]
]
]
]
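The two steps above can be checked directly in NumPy. In this short sketch the names step1 and step2 are illustrative only. It confirms that chaining the two swaps gives the same array as the single call transpose(0,3,1,2).

```python
import numpy as np

ma = np.arange(32).reshape(1, 4, 4, 2)

step1 = ma.transpose(0, 3, 2, 1)      # step 1: swap axes 1 and 3
step2 = step1.transpose(0, 1, 3, 2)   # step 2: swap axes 2 and 3
out = ma.transpose(0, 3, 1, 2)        # the one-shot permutation

print(step1[0, 0, 0])                 # [ 0  8 16 24]
print(step2[0, 0, 0])                 # [0 2 4 6]
print(np.array_equal(step2, out))     # True
```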
The hand-drawn figure below may help in understanding this: