While researchers know that contemporary natural language processing models aren’t as accurate as their leaderboard performance makes them appear, there hasn’t been a structured way to test them. The best paper award at ACL 2020 went to Prof. Sameer Singh and collaborators Marco Tulio Ribeiro of Microsoft Research and Tongshuang Wu and Carlos Guestrin of the University of Washington for their paper Beyond Accuracy: Behavioral Testing of NLP Models with CheckList. Their CheckList framework uses a matrix of general linguistic capabilities and test types to reveal weaknesses in state-of-the-art cloud AI systems.
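To make the idea concrete, here is a minimal conceptual sketch of two of CheckList's test types, a Minimum Functionality Test (MFT, simple input/expected-label pairs) and an invariance test (INV, where a harmless perturbation such as a name swap should not change the prediction). The toy sentiment model and helper functions below are illustrative assumptions, not the CheckList library's actual API:

```python
def toy_sentiment_model(text):
    """Stand-in for a real NLP model: returns 'pos' or 'neg'.

    A hypothetical keyword-based classifier, used only so the
    test harness below has something to run against.
    """
    negative_words = {"terrible", "awful", "bad", "hate"}
    tokens = set(text.lower().replace(".", "").split())
    return "neg" if tokens & negative_words else "pos"


def mft(model, cases):
    """Minimum Functionality Test: check simple (input, expected label) pairs."""
    return [(text, expected) for text, expected in cases if model(text) != expected]


def inv(model, pairs):
    """Invariance test: the label should survive a meaning-preserving edit."""
    return [(a, b) for a, b in pairs if model(a) != model(b)]


# An MFT probes a basic capability (here: recognizing negative vocabulary);
# an INV probes robustness (here: swapping a person's name).
mft_cases = [("I hate this movie.", "neg"), ("What a wonderful film.", "pos")]
inv_pairs = [("John loved the plot.", "Mary loved the plot.")]

mft_failures = mft(toy_sentiment_model, mft_cases)
inv_failures = inv(toy_sentiment_model, inv_pairs)
print(mft_failures)  # empty list means every MFT case passed
print(inv_failures)  # empty list means the prediction was invariant
```

Each cell of CheckList's matrix pairs a capability (vocabulary, negation, named entities, and so on) with a test type like these; the real library additionally generates test cases at scale from templates.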
Read more: https://www.ics.uci.edu/community/news/view_news?id=1817