Mixing policy gradient and Q-learning

Source: Deep Learning on Medium

Policy gradient algorithms are a large family of reinforcement learning algorithms, including REINFORCE, A2C/A3C, PPO, and others. Q-learning is…
