Source: Deep Learning on Medium

I have one more question. Is there any particular reason for using the Adam optimizer? I tried training on similar data for Chicago, and with Adam the gradients exploded. After going through some of the literature, I found that SGD seems to be more robust for this kind of data, so I tried it with the following hyperparameters:

optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True, clipvalue=0.5)

I was surprised to see that the model reaches a good val_loss in a significantly lower number of epochs. I just thought I'd share in case you have any interesting comments here. Thanks! https://www.dropbox.com/s/8ug2nugz08arknz/Screenshot%202020-02-18%2017.48.34.png?dl=0
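For anyone curious what the clipvalue part is doing: here is a minimal NumPy sketch of SGD with Nesterov momentum and per-element gradient value clipping, mirroring the semantics of Keras's SGD(..., clipvalue=0.5). The toy loss f(w) = w**2 and all the variable names are my own assumptions for illustration, not anything from the model above.

```python
import numpy as np

def sgd_nesterov_step(w, v, grad, lr=0.01, momentum=0.9, clipvalue=0.5):
    # clipvalue clips each gradient component to [-clipvalue, clipvalue],
    # capping the update size so a huge gradient cannot blow up the weights.
    g = np.clip(grad, -clipvalue, clipvalue)
    # velocity update, then the Nesterov "look-ahead" correction
    v = momentum * v - lr * g
    w = w + momentum * v - lr * g
    return w, v

# toy quadratic loss f(w) = w**2, so the raw gradient 2*w starts at 10
# and gets clipped to 0.5 on the first steps
w, v = 5.0, 0.0
for _ in range(1000):
    w, v = sgd_nesterov_step(w, v, 2.0 * w)

print(abs(w))  # converges toward the minimum at 0
```

The clipping is what tames the early steps: the raw gradient is 10, but the applied gradient never exceeds 0.5, which is exactly the kind of safeguard that can prevent the exploding behaviour seen with an unclipped optimizer.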