News

However, because the input-to-hidden weight gradients are influenced by the values of the hidden-to-output gradients, the input-to-hidden gradients are indirectly changed when using CE instead of SE.