demystifying derivatives of softmax - 2



stage 2 of peeking into the derivatives of softmax

HELLO FRIEND!! You are now at stage 2 of computing the derivatives of the softmax function. It is recommended to read stage 1 before starting here.

!! Disclaimer !! This is my first technical article, and I am open to constructive criticism.

Continuing from the couple of hypotheses we framed in stage 1, we now apply them to compute the Jacobian under the two conditions i == j and i != j.

Property 1
Property 2
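For reference, assuming the same notation as stage 1 (S = softmax(a), with S_i the i-th output) and that property 1 covers the case i == j while property 2 covers i != j (following the order they were introduced above), these are the standard softmax Jacobian identities:

\frac{\partial S_i}{\partial a_j} = S_i \,(1 - S_i) \quad \text{if } i = j,
\qquad
\frac{\partial S_i}{\partial a_j} = -\,S_i \, S_j \quad \text{if } i \neq j,

or, written compactly with the Kronecker delta, \partial S_i / \partial a_j = S_i(\delta_{ij} - S_j).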

I understand, friend, you might be bored with the equations by now. It is time to prove the above properties. Torch is coming!!!!
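Before the Jacobian loop, a small amount of setup is needed. The actual input values come from stage 1, so the vector below is only an illustrative assumption (the printed Jacobian further down corresponds to stage 1's input, not to these numbers):

import numpy as np
import torch

### illustrative input vector; stage 1 used its own values, so treat these
### numbers as an assumption rather than the original data
a = torch.tensor([0.2, 1.0, 0.5], requires_grad=True)
s = torch.softmax(a, dim=0)   ### softmax outputs whose jacobian we want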

### empty array to store the jacobian ###
jb = np.empty(shape=(a.shape[0], a.shape[0]))

### the tricky part is here: autograd.grad works on one (scalar) output at a
### time, so we loop over each output s[i], take its gradient w.r.t. the whole
### input a, and store it as row i of the array above.
for i in range(a.shape[0]):
    jb[i] = torch.autograd.grad(outputs=s[i], inputs=a, retain_graph=True)[0].numpy()
print(jb)
### array([[ 0.12033208, -0.0799501 , -0.04038198],
###        [-0.0799501 ,  0.24489387, -0.16494377],
###        [-0.04038198, -0.16494377,  0.20532575]])
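As a side note (not part of the loop above), recent PyTorch releases also provide torch.autograd.functional.jacobian (added around version 1.5), which computes the same matrix in a single call; a minimal sketch, assuming the same tensor a as above:

from torch.autograd.functional import jacobian

### full jacobian of softmax at a, computed in one call
full_jac = jacobian(lambda x: torch.softmax(x, dim=0), a)
print(full_jac)   ### should match jb up to floating-point noise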

We can verify both properties by substituting the values of (S_i, S_j) into the formulas above and comparing the result with the autograd Jacobian, as in the sketch below.
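A minimal sketch of that check, assuming the s and jb computed above: the closed-form Jacobian S_i(δ_ij − S_j) is just a diagonal matrix built from s minus the outer product of s with itself.

### closed-form jacobian: diag(s) - s s^T, i.e. S_i * (delta_ij - S_j)
s_np = s.detach().numpy()
analytic_jb = np.diag(s_np) - np.outer(s_np, s_np)
### the autograd result and the closed form should agree up to float error
print(np.allclose(jb, analytic_jb))   ### True

See you in another post. Goodbye, friend!!!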