Reviewing Advice for NAACL-HLT

We are re-posting the reviewing advice from EMNLP 2020.

The intention of this page is to provide some advice to reviewers, so that we can identify the best research to be presented at the conference and provide constructive feedback that helps authors further improve their papers. We recognize and appreciate the effort reviewers contribute, and hope to make it more beneficial. We want all authors to feel delight when they read the peer reviews of their papers.

This is not the first attempt to educate reviewers. Many major conferences have included advice to reviewers, in NLP and in other fields, and there’s also plentiful advice relating to journal reviewing. Within the field of NLP, we would highlight:

Discursive advice in ACL 2017 from leading lights in the field: Mirella Lapata, Marco Baroni, Yoav Artzi, Emily Bender, Joel Tetreault, Ani Nenkova, and Tim Baldwin
Two example good reviews from NAACL 2018 presented in their reviewing form
A podcast by Noah Smith about peer reviews

Please take the time to look through these excellent resources.

We hope reiterating some dos and don’ts here can help reviewers as well as authors.

First, evaluate the paper’s contributions. This is where you will use your NLP domain knowledge. We advise that you should not accept papers just because their reported results are better, or because they appear to be mathematically sophisticated. These properties are neither sufficient nor necessary to constitute a contribution. We also advise that you should not reject papers just because their results are not better than the state of the art. In previous *ACL conferences, some reviewers placed too much emphasis on SOTA performance, giving low scores to any system that failed to reach it. While we aim to publish the very best work, a more constructive question to ask is “state of which art?”. As discussed in this blog post, a paper could offer a step forward in terms of efficiency, generalizability, interpretability, and many other criteria. A convincing contribution of any kind should not be rejected only for not topping the leaderboards.

Regarding different kinds of contributions, here’s what Prof. Philip Resnik at the University of Maryland says:

I think there would be significant value in encouraging reviewers to think explicitly about the nature of the contribution, and what questions then need to be asked. As a first pass for consideration/discussion:

Is this research making a scientific contribution? If so:

What is the phenomenon in the world that the authors are seeking to improve our understanding of?

What do we now know about this phenomenon that we did not know before?

Is this research making an engineering contribution? If so:

What is the real-world problem (or set of problems) that this work is making progress on solving?

Alternatively, if it’s not targeting a current real-world problem, what real-world problem(s) will this work help enable solutions of?

Is this research making a theoretical (e.g. mathematical) contribution? If so:

What do we know now that we did not know before?

How does this theoretical or mathematical advance connect to either scientific or engineering goals? (See above.)

Work in computational linguistics might include a mixture of scientific, engineering, and theoretical contributions, rather than just one. But, I am suggesting, if a paper does not make a contribution in any of those three categories, with the sub-bullets having understandable answers, one should seriously consider whether it belongs at the conference.

Second, consider these other important points when reading the paper and writing your review:

Check what the paper’s claims are, and how the content of the paper supports those claims. If the paper claims X and there is a performance increase, is that really because of X?
Be specific in your comments. For example, if you think the authors have neglected to cite key papers, then provide these references in your review. It might be obvious to you, but it’s often less clear to the authors. Being specific will help the authors to formulate a cogent response to the review, and to fix these problems in their paper. And it is worth noting that the authors are not obliged to cite or draw comparisons with contemporaneous work (i.e. appearing within 3 months of submission), especially if it is not published in a peer-reviewed venue.
Be constructive in your advice. Stating that some aspect of the paper is done badly can be helpful in the gatekeeping aspect of review (providing grounds for rejection), but it tends to be less helpful to the authors. Some suggestions of how the authors might improve these problematic aspects can allow them to develop the work into something considerably better.
Be kind in your language, even when being critical. It’s easy to get carried away, and write something nasty that you would never say to someone’s face. Try to be polite in your feedback.

Finally, it’s becoming more common for people to share reviews on social media, especially when the reviews reject the work on spurious grounds. This leads us to advise that the following are often invalid bases for rejecting a paper:

The paper’s language or writing style. Please focus on the paper’s substance. We understand that there may be times when the language or writing style is so poor that reviewers cannot understand the paper’s content and substance. In that case, it is fine to reject the paper, but you should only do so after making a concerted effort to understand it.
The paper’s work is on a language other than English. We care about NLP for any language.
The paper’s results are not better than SOTA. Please look at the paper’s contributions and findings, as discussed above and in this blog post.
The paper does not use a particular method (e.g., deep learning). No one particular method is a requirement for good work. If you believe a particular method is needed, please justify why. Think about what the paper’s contributions are, and bear in mind that having a diversity of methods is not a bad thing.
The paper’s method is too simple. Our goal is not to design the most complex method. Again, think about what the paper’s contributions and findings are. Often the papers with the simplest methods are the most cited. If a simple method outperforms more complex methods from prior work, then this is often an important finding.
The paper’s topic is narrow or outdated. Please be open-minded. We do not want the whole community to chase a trendy topic. Look at the paper’s contributions and consider what impact it may have on our community.
The paper’s topic is completely new, such that there’s no prior art or all the prior art has been done in another field. We are interested in papers that tread new ground.
The paper is a resource paper. In a field that relies on supervised machine learning as much as NLP, development of datasets is as important as modeling work. This blog post discusses what can and cannot be grounds for dismissing a resource paper.

Please refrain from using the reasons above as primary grounds for rejection when writing your reviews. We will ensure authors are aware of these guidelines and can reference them during the author rebuttal period. ACs will be checking reviews carefully based on the above criteria, and may ask that you revise your review, or that you provide objective reasons to justify your positions.

We hope these tips are helpful to reviewers, and that more authors will appreciate the insightful feedback they get from their reviews, with fewer frustrated authors complaining about review quality.

Additional Resources

To read more about the general advice on reviewing, we recommend the following resources:

NeurIPS not only instructs reviewers as to what to include in their reviews, but also gives examples of useful reviewer comments in their Reviewer Guidelines, organized by evaluation criterion, such as “Contributions of the submission”, “Quality of the submission”, “Clarity”, “Originality”, and “Significance”. This will be particularly useful for new reviewers, and also for authors.
ICML gives some examples of good reviews. Please refer to Part 2 of their Reviewer Guidelines.
A blog article by Pat Thomson about journal review, which shares a broadly similar procedure to conference review. The article is split into three parts, covering how to read the paper critically, how to decide on revisions required and recommendations to the PC, and how to write constructive feedback.
Wiley Publishing has a cute video with 10 tips for first-time reviewers. A list of those tips can be found in the pdf document. One tip that is useful for EVERYONE is tip 8: “Look at the Conclusion First”. Wiley advocates doing so because the Conclusion will give you a good idea whether the research is an exciting development within its own field. But another reason for doing so is to see what the paper is claiming to have done: This often differs from the Abstract and Introduction, which may make more impressive claims than the work actually supports.
Elsevier’s “Researcher Academy” has produced a video about “How to write a helpful peer review report”, presented by Zoë Mullan, Founding editor and Editor-in-chief of the open access journal “The Lancet Global Health”. To watch the video, you have to sign in to join the academy (which is free).
Some interesting findings about paper reviews can be found in an early article published in the Journal of the American Medical Association (JAMA), “What Makes a Good Reviewer and a Good Review for a General Medical Journal?”.

What any of the articles and videos indicated above will give you is CONFIDENCE that you’re doing the right thing. (Early-career reviewers will be happy to hear that one of the conclusions of the JAMA paper is that “Younger age also was an independent predictor for editors’ quality assessments”.)