The AI conference system is flawed

Original article was published by Sam Ganzfried on Artificial Intelligence on Medium


It is widely known that there is a lot of “random noise” in AI conference paper acceptances. For example, an experiment run at NIPS several years ago showed that paper acceptances were much closer to random chance than expected. In that experiment, some papers were reviewed by two independent committees, and the two committees disagreed on 57% of the accepted papers: “Relative to what people expected, 57% is actually closer to a purely random committee, which would only disagree on 77.5% of the accepted papers on average.” For more details of that study see here.
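To see where a figure like 77.5% can come from, here is a minimal simulation sketch. It assumes each committee accepts a fixed fraction of papers uniformly at random, and it assumes an acceptance rate of 22.5%, which is inferred from the quoted 77.5% (1 - 0.225 = 0.775) rather than stated in this article. The function name and parameters are hypothetical.

```python
import random


def random_committee_disagreement(n_papers=100_000, accept_rate=0.225, seed=0):
    """Fraction of committee A's accepted papers that committee B rejects,
    when both committees accept papers uniformly at random.

    accept_rate=0.225 is an assumption inferred from the quoted 77.5% figure.
    """
    rng = random.Random(seed)
    n_accept = round(n_papers * accept_rate)

    # Each committee independently picks its accepted set at random.
    accepted_a = set(rng.sample(range(n_papers), n_accept))
    accepted_b = set(rng.sample(range(n_papers), n_accept))

    # Papers accepted by A but not by B are the "disagreements".
    rejected_by_b = len(accepted_a - accepted_b)
    return rejected_by_b / n_accept


if __name__ == "__main__":
    print(f"{random_committee_disagreement():.3f}")
```

Under these assumptions the printed value should land close to 0.775, since a paper accepted by one random committee is rejected by the other with probability 1 minus the acceptance rate.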

In addition to this documented randomness, reviewing may also be biased in favor of certain types of papers and against others in ways that do not track paper quality. For example, worse papers might be accepted because they are on certain topics, have certain authors, use certain buzzwords, or follow a certain structure, while better papers that fail to do these things are rejected. One can only conjecture how often this occurs, or whether it occurs at all: no experiments have been done, and performing them would be nearly impossible, since they would require clear agreement on the correct definition of bias for this setting and on which features are pertinent.

It is nearly impossible to make any conclusive claims about the extent to which these issues of “randomness” and “bias” arise. For every experiment on one dataset, there could easily be another experiment on a different dataset showing a different conclusion. Furthermore, performing such experiments would likely involve a lot of time spent performing additional reviews and collecting and analyzing data. It is unlikely that a data-based approach would be feasible or be able to conclusively determine whether the system is flawed.

However, logical reasoning is infallible, and I will present a clear logical argument why the AI conference reviewing system is flawed.

I’m not going to lie, I’ve had a lot of papers rejected recently. As two examples, “Fast Complete Algorithm for Multiplayer Nash Equilibrium,” which presents a new complete algorithm for computing Nash equilibria in multiplayer games that outperforms all prior approaches, was just rejected at EC, NeurIPS, and AAAI (with extremely negative reviews at each). And “Algorithm for Computing Approximate Nash Equilibrium in Continuous Games with Application to Continuous Blotto,” which presents the first algorithm for solving the continuous version of the widely studied Blotto game, was also just rejected at AAAI. Obviously a couple of rejections in this high-variance process don’t mean much. But all of my papers have been getting rejected for years now, often with extremely negative reviews. In fact, I haven’t had a first-author paper accepted at a “top-tier” AI conference in over 5 years, despite writing several papers that I think are great. Of course, my anecdotal experience over the past several years does not prove anything.

There are two distinct possibilities.

Case 1: I have been writing good papers.

Case 2: I have been writing bad papers.

Now, I’m an open-minded person. While I think that many of my papers are great, I’m aware that there is a possibility that I am wrong and that they are actually not very good. It’s been consistently shown in the psychology literature that people overestimate their own abilities, and I’m aware that this may be the case for me. I don’t really think this is the case, but I acknowledge that it is a possibility.

Suppose Case 1 is true. This means that I am writing good papers that are consistently getting extremely negative reviews. This would clearly imply that the system is flawed.

Now suppose Case 2 is true, and my papers are actually bad and the reviews are correct. This would imply that I am simply not a very good researcher: if I am consistently writing papers that merit very poor reviews, then I am a bad researcher. While I certainly hope this is not the case, I am aware that it is a possibility.

Nearly every year, several of the top conferences invite me to serve on the program committee to review papers for them. For example, this year AAAI, WWW, AAMAS, and a NeurIPS workshop invited me to serve on the PC, and the NeurIPS conference also invited me to be a reviewer.

Assuming we are in case 2, we have already shown that I must be a bad researcher. But then this means that all of the top conferences are recruiting bad researchers to review papers and play a pivotal role in determining acceptance outcomes. If bad researchers are determining which papers should be accepted, then the system is clearly flawed.

So we have shown that under both cases the system is flawed, and clearly either Case 1 or Case 2 must be true. While I think my research is great, I’m open to the possibility that it is actually terrible; but I have shown that the system is flawed even if this is the case. So I have shown conclusively, using a rigorous logical argument, that the current AI conference system is flawed.
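The shape of this argument is a proof by cases, which can be sketched in Lean 4. The proposition names `Good` and `Flawed` and the hypothesis names are hypothetical labels chosen for illustration; the two implications are taken as hypotheses, corresponding to the Case 1 and Case 2 reasoning above.

```lean
-- Good   : "I have been writing good papers" (Case 1); ¬Good is Case 2.
-- Flawed : "the AI conference system is flawed".
theorem system_flawed (Good Flawed : Prop)
    (cases : Good ∨ ¬Good)      -- either Case 1 or Case 2 holds
    (h1 : Good → Flawed)        -- Case 1: good papers get very negative reviews
    (h2 : ¬Good → Flawed)       -- Case 2: a bad researcher is recruited to review
    : Flawed :=
  cases.elim h1 h2
```

Here `Or.elim` does the case split: whichever case holds, the matching hypothesis yields `Flawed`.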