(Desk) rejected!!!



The NeurIPS conference instituted a new reviewing policy this year:

“In order to cope with the growing number of submissions, this year we will adopt an “early desk-reject” process involving only Area Chairs and Senior Area Chairs. Area Chairs will be responsible for identifying papers that are very likely to be rejected, and Senior Area Chairs will cross check the selections. These papers will not be further reviewed, and authors will be notified immediately.” https://nips.cc/Conferences/2020/CallForPapers

I received emails today notifying me that both of my submissions were in fact desk rejected. In case you are interested, versions of both of these papers are up on arXiv: Fast Complete Algorithm for Multiplayer Nash Equilibrium and Fictitious Play Outperforms Counterfactual Regret Minimization.

Here is the content from the desk rejection of the first paper:

“1. Reasons of desk rejection

This submission is summarily rejected because:

– There is not a significant contribution with respect to previous work; for example, the paper presents an incremental method, the gains are not significant, or the main theoretical results are variations of known results.

— — — — — — — — — — — — — –

– additional comment (optional):

“The paper proposes a straight-forward extension of [21], more multiplayer games. The main formulation is directly quoted from the original paper. The novelty is unclear. The new formulation is described by example and its correctness is not sufficiently discussed nor proven. The results are significant, but structure and presentation can be improved.”

The review states “The main formulation is directly quoted from the original paper.” This is false. I quote directly from [21] when presenting that paper’s algorithm for solving two-player games (I could have described that algorithm in my own words, but I see no problem with quoting their original description with attribution). I then present my new formulation for multiplayer games, as the title and content of the paper indicate. The two-player algorithm is based on a linear mixed-integer feasibility program, while the new multiplayer algorithm is based on a (non-convex) quadratic program, and the two require completely different approaches to solve. It is totally false that the main multiplayer algorithm formulation is directly quoted from the “original paper.”
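To make the distinction concrete, here is the standard way of writing Nash equilibrium as a feasibility problem, which both kinds of formulation build on (this is a generic sketch of the well-known conditions, not the exact program from either paper). A mixed-strategy profile $\sigma = (\sigma_1, \ldots, \sigma_n)$ for an $n$-player game with pure-strategy sets $S_i$ and utilities $u_i$ is a Nash equilibrium if and only if

\sigma_i(s_i) \geq 0, \qquad \sum_{s_i \in S_i} \sigma_i(s_i) = 1 \qquad \text{for all } i \text{ and } s_i \in S_i,

u_i(s_i, \sigma_{-i}) \leq u_i(\sigma) \qquad \text{for all } i \text{ and } s_i \in S_i,

where $u_i(\sigma) = \sum_{s} u_i(s) \prod_j \sigma_j(s_j)$. With two players, each expected-utility term is linear in the lone opponent’s probabilities, which is what allows the incentive constraints to be encoded in a linear mixed-integer feasibility program. With three or more players, the terms contain products of several players’ probabilities, so the constraints become nonlinear and a fundamentally different solution approach is required.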

The review then states “The new formulation is described by example and its correctness is not sufficiently discussed nor proven.” This again is totally incorrect. As you can see in the paper, I present the full formulation for 3, 4, and 5 players (and describe clearly how it naturally extends to n players); I do not just present the formulation by example. The correctness of the formulation follows straightforwardly from the same argument used to show correctness of the prior two-player formulation.

Here is the content from the desk rejection of the second paper:

1. Reasons of desk rejection

This submission is summarily rejected because:

— — — — — — — — — — — — — –

– additional comment (optional):

“The premise of this paper is to refute the common claim that CFR performs empirically better than fictitious play. In these claims, there is an implicit assumption that the algorithms are used within the class for which they are intended and known to converge (two-player zero-sum games).

This paper presents a straw-man argument to challenge this claim: run CFR and fictitious play across a specific selection of games which include many where both algorithms are known not to converge, and use the empirical results as evidence for a counter-argument to challenge the claim. In the same vein, most of these are normal-form games.

A thorough empirical comparison between these two algorithms is certainly a worthwhile pursuit, but the content should be reformulated to reflect a different narrative. Or, for games outside two-player zero-sum, a different metric should be used that matches the convergence guarantees of each algorithm (such as approximate coarse-correlated equilibria).”

The reviewer makes some interesting comments. In the first sentence, the reviewer summarizes the goal of the paper as refuting a “common” claim that CFR performs empirically better than fictitious play. I have actually only seen that claim stated once (which I cite in the final paragraph of the introduction). The premise of the paper is simply to show that fictitious play outperforms CFR (as the title states), not to refute any particular claim. Furthermore, the prior paper makes the general claim, “Fictitious Play has weaker theoretical convergence guarantees than CFR, and in practice converges slower,” without any evidence or qualification that the statement refers to a specific class of games (note that one of the authors, Professor Sandholm, has authored papers on applying both CFR and fictitious play to multiplayer games).

The reviewer also claims that both algorithms were “intended” to solve two-player zero-sum games. Certainly CFR was initially developed as an algorithm for solving two-player zero-sum games. But it has since also been applied to multiplayer games, including by the agent Pluribus, which recently defeated strong human players decisively in six-player poker. Fictitious play was not even initially developed with the goal of computing Nash equilibrium; it was developed as a learning rule that was subsequently proven to converge to Nash equilibrium in several game classes (one of which is two-player zero-sum games). It has since been applied successfully to compute Nash equilibrium in several multiplayer games (I cite several of these). I am certainly allowed to study the question of whether CFR or fictitious play performs better in multiplayer games, regardless of what the algorithms were originally intended for.
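For readers who have not encountered it, here is a minimal sketch of fictitious play as a learning rule in a two-player normal-form game (illustrative only; the payoff matrices and iteration count are placeholders, and this is not the implementation used in the paper):

```python
import numpy as np

def fictitious_play(A, B, iterations=10000):
    """Fictitious play in a two-player normal-form game.

    A[i, j] is the row player's payoff and B[i, j] is the column player's payoff
    when the row player plays action i and the column player plays action j.
    Returns the empirical average strategy of each player.
    """
    m, n = A.shape
    row_counts = np.zeros(m)
    col_counts = np.zeros(n)
    row_counts[0] += 1  # arbitrary initial actions
    col_counts[0] += 1
    for _ in range(iterations):
        # each player best-responds to the opponent's empirical average strategy
        col_strategy = col_counts / col_counts.sum()
        row_strategy = row_counts / row_counts.sum()
        best_row = int(np.argmax(A @ col_strategy))
        best_col = int(np.argmax(row_strategy @ B))
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Example: matching pennies (two-player zero-sum), where the average strategies
# converge to the unique Nash equilibrium (1/2, 1/2) for both players.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y = fictitious_play(A, -A)
print(x, y)
```

Each player repeatedly best-responds to the opponent’s empirical average strategy; in two-player zero-sum games (among other classes) the average strategies are known to converge to a Nash equilibrium.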

Regarding the comment,

“This paper presents a straw-man argument to challenge this claim: run CFR and fictitious play across a specific selection of games which include many where both algorithms are known not to converge, and use the empirical results as evidence for a counter-argument to challenge the claim. In the same vein, most of these are normal-form games.”

I ran the algorithms on games from the standard GAMUT generator, which covers many important game classes and is widely used for evaluating game-theoretic algorithms. In fact, I used the exact same game classes, and similar parameter settings, as prior experiments by Professor Sandholm on evaluating algorithms for multiplayer games. Moreover, fictitious play converged to equilibrium on nearly all of the games in the experiments, so it is false that the algorithms are “known not to converge” on these games.
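For concreteness, “converged to equilibrium” means the Nash approximation error of the computed strategy profile dropped to (essentially) zero. Here is a minimal sketch of the standard way to measure that error in a normal-form game (a generic illustration, not necessarily the exact code or metric used in the paper):

```python
import itertools
import numpy as np

def nash_approximation_error(utilities, strategies):
    """Epsilon of a mixed-strategy profile in an n-player normal-form game.

    utilities[i] is an n-dimensional array giving player i's payoff at each
    pure-strategy profile; strategies[i] is player i's mixed strategy.
    Epsilon is the most any single player could gain by deviating to a pure
    strategy; the profile is an exact Nash equilibrium when epsilon is 0.
    """
    n = len(strategies)
    sizes = [len(s) for s in strategies]
    epsilon = 0.0
    for i in range(n):
        expected = 0.0                  # player i's expected payoff under the profile
        deviation = np.zeros(sizes[i])  # expected payoff of each pure deviation
        for profile in itertools.product(*[range(k) for k in sizes]):
            prob_others = np.prod([strategies[j][profile[j]] for j in range(n) if j != i])
            payoff = utilities[i][profile]
            expected += prob_others * strategies[i][profile[i]] * payoff
            deviation[profile[i]] += prob_others * payoff
        epsilon = max(epsilon, float(deviation.max() - expected))
    return epsilon

# Example: in matching pennies, the uniform profile is an exact Nash equilibrium
# (epsilon 0), while both players always playing action 0 gives epsilon 2.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
utilities = [A, -A]
print(nash_approximation_error(utilities, [np.array([0.5, 0.5])] * 2))
print(nash_approximation_error(utilities, [np.array([1.0, 0.0])] * 2))
```

A small epsilon means no player can gain much by deviating, which is exactly the sense in which an algorithm “approximates” a Nash equilibrium.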

Normal-form games are the most basic and widely studied game representation, so it seems very natural to study them. Results from normal-form games often carry over to more complicated representations such as extensive form, but regardless, the question of whether fictitious play or CFR performs better in normal-form games is clearly interesting in its own right. Comparing the algorithms in extensive-form games is also more challenging because there are many different variants (the algorithms can be combined with various forms of sampling and deep learning), and there is no established benchmark database of games on which to compare them, as there is with the GAMUT generator for normal-form games (often the algorithms are compared on specific simplified two-player versions of games such as poker). It seems natural to study the core versions of the algorithms in a simple setting where they can be clearly assessed across a broad selection of realistic game classes that have been previously studied.

Finally, the reviewer states, “Or, for games outside two-player zero-sum, a different metric should be used that matches the convergence guarantees of each algorithm (such as approximate coarse-correlated equilibria).” Here I think the reviewer misses an important point. Neither of these algorithms was created with the goal of computing “approximate coarse-correlated equilibrium” in multiplayer games. As mentioned above, fictitious play was developed as a learning algorithm, and CFR was developed for computing Nash equilibrium in two-player zero-sum games. It was only later, in search of some sort of theoretical guarantee for multiplayer games, that both algorithms were proven to converge to an “approximate coarse-correlated equilibrium,” which is an arcane solution concept that I have never seen applied outside of this context. The concept is only really “interesting” because it is the only guarantee that has been developed for these algorithms, so it does not seem interesting to compare how closely the algorithms converge empirically to a “coarse-correlated equilibrium,” a concept that is essentially only interesting because of the algorithms themselves.
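For reference, here is the standard definition the reviewer is alluding to (the textbook definition, not anything specific to either paper). A coarse-correlated equilibrium is a single distribution $\sigma$ over joint action profiles (which, unlike a Nash equilibrium, need not be a product of independent per-player strategies) such that no player can gain by committing in advance to a fixed deviation:

\mathbb{E}_{s \sim \sigma}\left[u_i(s)\right] \;\geq\; \mathbb{E}_{s \sim \sigma}\left[u_i(s_i', s_{-i})\right] \qquad \text{for every player } i \text{ and every action } s_i' \in S_i.

Every Nash equilibrium is a coarse-correlated equilibrium, but the converse does not hold, which is why convergence to an approximate coarse-correlated equilibrium is a much weaker guarantee than convergence to Nash equilibrium.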

Nash equilibrium is the central solution concept in game theory, even for multiplayer games, despite its limitations. It is a very natural, and interesting, question to ask whether CFR or fictitious play performs better in terms of approximating Nash equilibrium, especially as both algorithms have been applied for the goal of doing exactly that. I personally have applied fictitious play to successfully approximate Nash equilibrium in several multiplayer poker and national security scenarios, and CFR was recently applied to multiplayer poker with the goal of approximating Nash equilibrium.

The conference had also instituted a policy that at least one author of each submission would be required to review papers, so now that both of my papers have been desk rejected, I am obviously no longer reviewing. I guess this new desk-reject system is a win-win-win. I would rather receive my rejections now (giving me more time to submit the papers to another venue) and avoid spending many unpaid hours reviewing six papers than wait several months for the same rejections. NeurIPS benefits by removing these weak submissions and reducing the load on reviewers. And the authors of other NeurIPS submissions also benefit, because an incompetent reviewer has now been removed from the pool (certainly someone who writes two papers so bad that they are both desk rejected is not qualified to review papers for that conference). So it appears this new desk-reject system is a big success. It is exciting to see the AI conferences employing all these brilliant new methods to ensure higher quality of the publications. I can’t wait to see what’s in store with AAAI’s new “Two-phase reviewing” process!