Belz, A., Thomson, C., Reiter, E., Abercrombie, G., Alonso-Moral, J. M., Arvan, M., Braggaar, A., Cieliebak, M., Clark, E., van Deemter, K., Dinkar, T., Dušek, O., Eger, S., Fang, Q., Gao, M., Gatt, A., Gkatzia, D., González-Corbelle, J., Hovy, D., ... Yang, D. (2023). Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP. In S. Tafreshi, A. Akula, J. Sedoc, A. Drozd, A. Rogers, & A. Rumshisky (Eds.), The Fourth Workshop on Insights from Negative Results in NLP (pp. 1–10). Association for Computational Linguistics.