Katharina Kann

Assistant Professor of Computer Science · University of Colorado Boulder · CV · katharina.kann@colorado.edu
Website last updated on 01/28/2023.

Universal Natural Language Processing: How can we build natural language processing systems that work for all of the world’s languages?

While an enormous amount of time, effort, and resources has been invested in developing technology for English, other languages are often overlooked. I am convinced that, in order to make NLP technologies accessible and useful for a wider and more diverse variety of users, more emphasis should be put on developing models for languages besides English, including low-resource languages. Thus, an important goal of my research is to develop computational approaches that perform well across a large variety of languages, which might differ from English in their typology as well as in the amount of available resources.

Deep Learning · Multilingual NLP · Computational Morphology · NLP for Educational Applications · Language Grounding · NLP for Medical Applications · Low-resource Machine Translation


Publications

2023
  • Abteen Ebrahimi, Arya D. McCarthy, Arturo Oncevay, John E. Ortega, Luis Chiruzzo, Rolando Coto-Solano, Gustavo A. Giménez-Lugo, and Katharina Kann. Meeting the Needs of Low-Resource Languages: Exploring Automatic Alignments via Pretrained Models. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, May 2023 (to appear).

2022
  • Katharina Kann, Abteen Ebrahimi, Manuel Mager, Arturo Oncevay, John E. Ortega, Annette Rios, Angela Fan, Ximena Gutierrez-Vasques, Luis Chiruzzo, Gustavo A. Giménez-Lugo, Ricardo Ramos, Ivan Vladimir Meza Ruiz, Elisabeth Mager, Vishrav Chaudhary, Graham Neubig, Alexis Palmer, Rolando Coto-Solano, and Ngoc Thang Vu. AmericasNLI: Machine translation and natural language inference systems for Indigenous languages of the Americas. In Frontiers in Artificial Intelligence, 2022.

  • Katharina Kann, Shiran Dudy, and Arya D. McCarthy. A Major Obstacle for NLP Research: Let’s Talk about Time Allocation! In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, November 2022.

  • Adam Wiemerslage, Shiran Dudy, and Katharina Kann. A Comprehensive Comparison of Neural Networks as Cognitive Models of Inflection. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, November 2022.

  • Rajat Bhatnagar, Ananya Ganesh, and Katharina Kann. CHIA: CHoosing Instances to Annotate for Machine Translation. In Findings of the 2022 Conference on Empirical Methods in Natural Language Processing, November 2022.

  • Trevor A. Hall, Maria Valentini, Eliana Colunga, and Katharina Kann. Generate Me a Bedtime Story: Leveraging Natural Language Processing for Early Vocabulary Enhancement. In Proceedings of the Workshop on NLP for Positive Impact, Abu Dhabi, November 2022.

  • Katharina Kann, Abteen Ebrahimi, Kristine Stenzel, and Alexis Palmer. Machine Translation Between High-resource Languages in a Language Documentation Setting. In Proceedings of the First Workshop on Applying NLP to Field Linguistics, Gyeongju, October 2022.

  • Ananya Ganesh, Hugh Scribner, Jasdeep Singh, Katherine Goodman, Jean Hertzberg, and Katharina Kann. Response Construct Tagging: NLP-Aided Assessment for Engineering Education. In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications, Seattle, July 2022.

  • Katharina Kann, Abteen Ebrahimi, Joewie J. Koh, Shiran Dudy, and Alessandro Roncone. Open-domain Dialogue Generation: What We Can Do, Cannot Do, And Should Do Next. In Proceedings of the 4th Workshop on NLP for Conversational AI, Dublin, May 2022.

  • Abteen Ebrahimi, Manuel Mager, Arturo Oncevay, Vishrav Chaudhary, Luis Chiruzzo, Angela Fan, John Ortega, Ricardo Ramos, Annette Rios, Ivan Vladimir Meza Ruiz, Gustavo A. Giménez-Lugo, Elisabeth Mager, Graham Neubig, Alexis Palmer, Rolando Coto-Solano, Thang Vu, and Katharina Kann. AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, May 2022.

  • Yoshinari Fujinuma, Jordan Lee Boyd-Graber, and Katharina Kann. How Does Multilingual Pretraining Affect Cross-Lingual Transferability? In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, May 2022.

  • Adam Wiemerslage, Miikka Silfverberg, Changbing Yang, Arya D. McCarthy, Garrett Nicolai, Eliana Colunga, and Katharina Kann. Morphological Processing of Low-Resource Languages: Where We Are and What's Next. In Findings of the 60th Annual Meeting of the Association for Computational Linguistics, May 2022.

  • Manuel Mager, Arturo Oncevay, Elisabeth Mager, Katharina Kann, and Thang Vu. BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages. In Findings of the 60th Annual Meeting of the Association for Computational Linguistics, May 2022.

2021
  • Cory Paik, Stephane Aroca-Ouellette, Alessandro Roncone, and Katharina Kann. The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, November 2021.

  • Atul Kr Ojha, Chao-Hong Liu, Katharina Kann, John Ortega, Sheetal Shatam, and Theodorus Fransen. Findings of the LoResMT 2021 Shared Task on COVID and Sign Language for Low-resource Languages. In Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages, online, August 2021.

  • Adam Wiemerslage, Arya McCarthy, Alexander Erdmann, Garrett Nicolai, Manex Agirrezabal, Miikka Silfverberg, Mans Hulden, and Katharina Kann. Findings of the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering. In Proceedings of the Shared Task of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, online, August 2021.

  • Andrew Gerlach, Adam Wiemerslage and Katharina Kann. Paradigm Clustering with Weighted Edit Distance. In Proceedings of the Shared Task of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, online, August 2021.

  • Abteen Ebrahimi and Katharina Kann. How to Adapt Your Pretrained Multilingual Model to 1600 Languages. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, online, August 2021.

  • Rajat Bhatnagar, Ananya Ganesh, and Katharina Kann. Don’t Rule Out Monolingual Speakers: A Method For Crowdsourcing Machine Translation Data. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, online, August 2021.

  • Stephane Aroca-Ouellette, Cory Paik, Alessandro Roncone, and Katharina Kann. PROST: Physical Reasoning of Objects through Space and Time. In Findings of the 59th Annual Meeting of the Association for Computational Linguistics, August 2021.

  • Ananya Ganesh, Martha Palmer, and Katharina Kann. What Would a Teacher Do? Predicting Future Talk Moves. In Findings of the 59th Annual Meeting of the Association for Computational Linguistics, August 2021.

  • Manuel Mager, Arturo Oncevay, Abteen Ebrahimi, John Ortega, Annette Rios, Angela Fan, Ximena Gutierrez-Vasques, Luis Chiruzzo, Gustavo Giménez-Lugo, Ricardo Ramos, Ivan Vladimir Meza Ruiz, Rolando Coto-Solano, Alexis Palmer, Elisabeth Mager-Hois, Vishrav Chaudhary, Graham Neubig, Ngoc Thang Vu, and Katharina Kann. Findings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, online, June 2021.

  • Katharina Kann and Mauro M. Monsalve-Mercado. Coloring the Black Box: What Synesthesia Tells Us about Character Embeddings. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, online, April 2021.

  • Beilei Xiang, Changbing Yang, Yu Li, Alex Warstadt and Katharina Kann. CLiMP: A Benchmark for Chinese Language Model Evaluation. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, online, April 2021.

2020
  • Nikhil Prabhu and Katharina Kann. Making a Point: Pointer-Generator Transformers for Disjoint Vocabularies. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 9th International Joint Conference on Natural Language Processing Student Research Workshop, online, December 2020. Best Paper Award.

  • Jason Phang*, Iacer Calixto*, Phu Mon Htut, Yada Pruksachatkun, Haokun Liu, Clara Vania, Katharina Kann and Samuel R. Bowman. English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 9th International Joint Conference on Natural Language Processing, online, December 2020.

  • Rajat Agarwal and Katharina Kann. Acrostic Poem Generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, online, November 2020.

  • Manuel Mager, Özlem Çetinoğlu and Katharina Kann. Tackling the Low-resource Challenge for Canonical Segmentation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, online, November 2020.

  • Sarah Moeller, Ling Liu, Changbing Yang, Katharina Kann and Mans Hulden. IGT2P: From Interlinear Glossed Texts to Paradigms. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, online, November 2020.

  • Katharina Kann*, Arya D. McCarthy*, Garrett Nicolai and Mans Hulden. The SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion. In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, online, July 2020.

  • Nikhil Prabhu and Katharina Kann. Frustratingly Easy Multilingual Grapheme-to-Phoneme Conversion. In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, online, July 2020.

  • Assaf Singer and Katharina Kann. The NYU-CUBoulder Systems for SIGMORPHON 2020 Task 0 and Task 2. In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, online, July 2020.

  • Manuel Mager and Katharina Kann. The IMS–CUBoulder System for the SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion. In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, online, July 2020.

  • Anhad Mohananey*, Katharina Kann* and Samuel R. Bowman. Self-Training for Unsupervised Parsing with PRPN. In Proceedings of the 16th International Conference on Parsing Technologies, online, July 2020.

  • Yada Pruksachatkun*, Jason Phang*, Haokun Liu*, Phu Mon Htut*, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann and Samuel R. Bowman. Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, online, July 2020.

  • Huiming Jin, Liwei Cai, Yihui Peng, Chen Xia, Arya McCarthy and Katharina Kann. Unsupervised Morphological Paradigm Completion. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, online, July 2020.

  • Katharina Kann, Samuel R. Bowman and Kyunghyun Cho. Learning to Learn Morphological Inflection for Resource-Poor Languages. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, New York, USA, February 2020.

  • Katharina Kann*, Ophélie Lacroix* and Anders Søgaard. Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, New York, USA, February 2020.

  • Katharina Kann. Acquisition of Inflectional Morphology in Artificial Neural Networks With Prior Knowledge. In Proceedings of the Meeting of the Society for Computation in Linguistics, New Orleans, USA, January 2020.

2019
  • Johannes Bjerva, Katharina Kann and Isabelle Augenstein. Transductive Auxiliary Task Self-Training for Neural Multi-Task Models. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing, Hong Kong, China, November 2019.

  • Katharina Kann, Anhad Mohananey, Kyunghyun Cho and Samuel R. Bowman. Neural Unsupervised Parsing Beyond English. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing, Hong Kong, China, November 2019.

  • Katharina Kann, Kyunghyun Cho and Samuel R. Bowman. Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Hong Kong, China, November 2019.

  • Yadollah Yaghoobzadeh, Katharina Kann, T. J. Hazen, Eneko Agirre and Hinrich Schütze. Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, August 2019.

  • Manuel Mager, Özlem Çetinoğlu and Katharina Kann. Subword-Level Language Identification for Intra-Word Code-Switching. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, USA, June 2019.

  • Katharina Kann*, Alex Warstadt*, Adina Williams* and Samuel R. Bowman. Verb Argument Structure Alternations in Word and Sentence Embeddings. In Proceedings of the Meeting of the Society for Computation in Linguistics, New York, USA, January 2019.

2018
  • Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sebastian Mielke, Garrett Nicolai, Miikka Silfverberg, David Yarowsky, Jason Eisner and Mans Hulden. The CoNLL-SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection. In Proceedings of the SIGNLL Conference on Computational Natural Language Learning, Brussels, Belgium, October/November 2018.

  • Katharina Kann, Stanislas Lauly and Kyunghyun Cho. The NYU System for the CoNLL-SIGMORPHON 2018 Shared Task on Universal Morphological Reinflection. In Proceedings of the CoNLL-SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection, Brussels, Belgium, October/November 2018.

  • Katharina Kann and Hinrich Schütze. Neural Transductive Learning and Beyond: Morphological Generation in the Minimal-Resource Setting. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October/November 2018.

  • Katharina Kann, Sascha Rothe and Katja Filippova. Sentence-Level Fluency Evaluation: References Help, But Can Be Spared! In Proceedings of the SIGNLL Conference on Computational Natural Language Learning, Brussels, Belgium, October/November 2018.

  • Manuel Mager, Elisabeth Mager, Alfonso Medina-Urrea, Ivan Meza and Katharina Kann. Lost in Translation: Analysis of Information Loss During Machine Translation Between Polysynthetic and Fusional Languages. In Proceedings of All Together Now? Computational Modeling of Polysynthetic Languages, Santa Fe, USA, August 2018.

  • Katharina Kann, Johannes Bjerva, Isabelle Augenstein, Barbara Plank and Anders Søgaard. Character-level Supervision for Low-resource POS Tagging. In Proceedings of the 1st Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing, Melbourne, Australia, July 2018.

  • Yadollah Yaghoobzadeh, Katharina Kann and Hinrich Schütze. Evaluating Word Embeddings in Multi-label Classification Using Fine-grained Name Typing. In Proceedings of the 3rd Workshop on Representation Learning for NLP, Melbourne, Australia, July 2018.

  • Katharina Kann*, Jesus Manuel Mager Hois*, Ivan Vladimir Meza Ruiz and Hinrich Schütze. Fortification of Neural Morphological Segmentation Models for Polysynthetic Minimal-Resource Languages. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, USA, June 2018.

2017
  • Katharina Kann and Hinrich Schütze. Unlabeled Data for Morphological Generation With Character-Based Sequence-to-Sequence Models. In Proceedings of the 1st Workshop on Subword and Character Level Models in NLP, Copenhagen, Denmark, September 2017.

  • Huiming Jin and Katharina Kann. Exploring Cross-Lingual Transfer of Morphological Knowledge In Sequence-to-Sequence Models. In Proceedings of the 1st Workshop on Subword and Character Level Models in NLP, Copenhagen, Denmark, September 2017.

  • Katharina Kann and Hinrich Schütze. The LMU System for the CoNLL-SIGMORPHON 2017 Shared Task on Universal Morphological Reinflection. In Proceedings of the CoNLL-SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection, Vancouver, Canada, August 2017.

  • Toms Bergmanis, Katharina Kann, Hinrich Schütze and Sharon Goldwater. Training Data Augmentation for Low-Resource Morphological Inflection. In Proceedings of the CoNLL-SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection, Vancouver, Canada, August 2017.

  • Katharina Kann, Ryan Cotterell and Hinrich Schütze. One-Shot Neural Cross-Lingual Transfer for Paradigm Completion. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada, August 2017.

  • Katharina Kann, Ryan Cotterell and Hinrich Schütze. Neural Multi-Source Morphological Reinflection. In Proceedings of the 2017 Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, April 2017.

2016
  • Katharina Kann, Ryan Cotterell and Hinrich Schütze. Neural Morphological Analysis: Encoding-Decoding Canonical Segments. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, USA, November 2016.

  • Katharina Kann and Hinrich Schütze. MED: The LMU System for the SIGMORPHON 2016 Shared Task on Morphological Reinflection. In Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, Berlin, Germany, August 2016.

  • Katharina Kann and Hinrich Schütze. Single-Model Encoder-Decoder with Explicit Morphological Representation for Reinflection. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, August 2016.

Research Group

Current and past members of the Kann Natural Language Processing Group (NALA), my research group, are listed on NALA's website.

Travels

New York, USA · February 2020 · AAAI

New Orleans, USA · January 2020 · LSA

Hong Kong, China · November 2019 · EMNLP

Philadelphia, USA · October 2019 · CLunch

Ann Arbor, USA · October 2019 · Michigan AI Symposium

Guanajuato, Mexico · October 2019 · PLAGAA

Prospective Students

If you are a student interested in working with me, please read the following first.

  • If you are a student at CU Boulder and interested in an independent study: You should have a look at the interests of my PhD students, which are described on our group's page. Then, send an email to the PhD student whose interests are closest to yours, cc'ing me. Describe your relevant background and two project ideas in a few sentences. Add "[ISCUBNALA]" to the subject of your email.
  • If you are not a student at CU Boulder and want to start a PhD with me: You should apply directly to CU Boulder's PhD program in computer science and list me as a potential advisor. You can increase your chances of acceptance by also contacting me by email around the application deadline. In this case, "[PHDCUBNALA]" should be the first word of your email's subject.
  • Everyone else: If you want to send me an email about any other topic and we have not previously been in contact, adding "[MISCCUBNALA]" to your email's subject will increase your chances of receiving an answer.