Introduction - If you have any usage issues, please Google them yourself
ADAM (Adaptive Moment Estimation) is another adaptive learning rate algorithm, which combines momentum gradient.
The descent method, which uses different learning rates in different parameter directions and retains the gradients of previous iterations, is very good.
It is suitable for sparse data.