Original article was published by /u/moonkraken on Deep Learning
I’ve been working through the cs231n course assignments for some time. What intrigues me is this:
scores = X.dot(W) + b, where
X is an N x D matrix,
W is a D x C matrix, and
b is the bias vector of dimension C.
For clarity of notation, dscores => dL/dscores (where L is the loss), and
dW => dL/dW.
The given solution states that:
dW = np.dot(X.T, dscores), whose origin doesn’t make sense to me. They say they used the chain rule to get this result, but I don’t see how.
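To make the question concrete, here is a minimal toy setup of my own (not from the assignment) where I checked the expression numerically. I used L = sum(scores**2) as a stand-in loss just so that dscores = dL/dscores has a closed form (2 * scores); the shapes N, D, C are arbitrary.

```python
import numpy as np

# Toy setup (my own, hypothetical): verify np.dot(X.T, dscores)
# against a numerical gradient, using L = np.sum(scores**2) as a
# stand-in loss so that dscores = dL/dscores = 2 * scores.
np.random.seed(0)
N, D, C = 4, 3, 2
X = np.random.randn(N, D)
W = np.random.randn(D, C)
b = np.random.randn(C)

scores = X.dot(W) + b      # shape (N, C)
dscores = 2 * scores       # dL/dscores for L = sum(scores**2)
dW = np.dot(X.T, dscores)  # shape (D, C), the expression in question

# Centered-difference numerical gradient of L with respect to W
h = 1e-5
dW_num = np.zeros_like(W)
for i in range(D):
    for j in range(C):
        W[i, j] += h
        L_plus = np.sum((X.dot(W) + b) ** 2)
        W[i, j] -= 2 * h
        L_minus = np.sum((X.dot(W) + b) ** 2)
        W[i, j] += h
        dW_num[i, j] = (L_plus - L_minus) / (2 * h)

print(np.allclose(dW, dW_num, atol=1e-4))  # the two gradients agree
```

So the formula clearly works, and the shapes line up ((D, N) times (N, C) gives (D, C), matching W), but I’d like to understand the chain-rule derivation behind it rather than just verifying it.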
Any help is highly appreciated!