Grzegorz Chrupała
I am an assistant professor at the Department of Cognitive Science and Artificial Intelligence at Tilburg University.
My research is focused on computational language learning, specifically:
- Learning language from multimodal signals such as speech and vision;
- Analysis of representations emerging in multilayer recurrent neural networks.
I received my PhD from the School of Computing at Dublin City University. After that I worked as a researcher at the Spoken Language Systems group at Saarland University.
See full bio.
In my free time I do nature and travel photography: see my Google+ Photography collection, and my Instagram.
News
Upcoming
- The First Workshop on Analyzing and Interpreting Neural Networks for NLP (organized by Tal Linzen, Afra Alishahi and Grzegorz Chrupała) will be collocated with EMNLP 2018 in Brussels.
Recent
- Paper accepted at CoNLL 2018: Lessons learned in multilingual grounded language learning.
- July 17: I gave a talk at the University of Amsterdam about grounded representation learning for spoken language.
- June 27: Jörg Tiedemann gave a talk at the TiCC colloquium.
- March 24: Talk at CIMeC on Neural representations of form and meaning in spoken language.
- Two papers accepted at Coling 2018: Revisiting the Hierarchical Multiscale LSTM and Style Obfuscation by Invariance.
- Koel Dutta Chowdhury, a PhD student from DCU Adapt Center, joins my team for a summer research visit.
- I am area co-chair for Discourse and Dialogue, Summarization and Generation, and Multimodal NLP and Speech at EMNLP 2018.
- I am on the program committee for CoNLL 2018.
- April 18: Arianna Bisazza gave a talk on Hints of linguistic structure in models of language and translation at the TiCC colloquium.
- March 14: Odette Scharenborg gave a talk on Computational modelling of human spoken-word recognition at the TiCC colloquium.
- Mahsa Afsharizade, a PhD student from Kashan University in Iran, joined my team for a 6-month research visit.
- I am general chair of Benelearn 2018, collocated with BNAIC at Jheronimus Academy of Data Science (JADS) in ’s-Hertogenbosch.
Publications
Profiles: Google Scholar | Semantic Scholar
Selected papers
- Ákos Kádár, Desmond Elliott, Marc-Alexandre Côté, Grzegorz Chrupała & Afra Alishahi. 2018. Lessons learned in multilingual grounded language learning. CoNLL.
Paper | Code
- Afra Alishahi, Marie Barking and Grzegorz Chrupała. 2017. Encoding of phonology in a recurrent neural model of grounded speech. CoNLL. Best Paper Award.
Paper | Slides | Code
- Grzegorz Chrupała, Lieke Gelderloos and Afra Alishahi. 2017. Representations of language in a model of visually grounded speech signal. ACL.
Paper | Slides | Code
- Ákos Kádár, Grzegorz Chrupała and Afra Alishahi. 2017. Representation of linguistic form and function in recurrent neural networks. Computational Linguistics, 43(4):761-780.
Paper | Code
- Grzegorz Chrupała. 2014. Normalizing tweets with edit scripts and recurrent neural embeddings. ACL.
Paper | Slides
Complete list
- Grzegorz Chrupała, Lieke Gelderloos, Ákos Kádár & Afra Alishahi. 2019. On the difficulty of a distributional semantics of spoken language. Accepted for the Proceedings of the Society for Computation in Linguistics (SCiL) 2019.
Preprint
- Ákos Kádár, Desmond Elliott, Marc-Alexandre Côté, Grzegorz Chrupała & Afra Alishahi. 2018. Lessons learned in multilingual grounded language learning. CoNLL.
Paper
- Ákos Kádár, Marc-Alexandre Côté, Grzegorz Chrupała & Afra Alishahi. 2018. Revisiting the Hierarchical Multiscale LSTM. Coling.
Paper
- Chris Emmery, Enrique Manjavacas & Grzegorz Chrupała. 2018. Style Obfuscation by Invariance. Coling.
Paper
- Chris Emmery, Grzegorz Chrupała and Walter Daelemans. 2017. Simple Queries as Distant Labels for Detecting Gender on Twitter. EMNLP Workshop on Noisy User-generated Text.
Paper | Code
- Afra Alishahi, Marie Barking and Grzegorz Chrupała. 2017. Encoding of phonology in a recurrent neural model of grounded speech. CoNLL.
Paper | Slides | Code
- Grzegorz Chrupała, Lieke Gelderloos and Afra Alishahi. 2017. Representations of language in a model of visually grounded speech signal. ACL.
Paper | Slides | Code
- Ákos Kádár, Grzegorz Chrupała and Afra Alishahi. 2017. Representation of linguistic form and function in recurrent neural networks. Computational Linguistics, 43(4):761-780.
Paper | Code
- Lieke Gelderloos and Grzegorz Chrupała. 2016. From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning. Coling.
Paper
- Angeliki Lazaridou, Grzegorz Chrupała, Raquel Fernández and Marco Baroni. 2016. Multimodal Semantic Learning from Child-Directed Input. NAACL.
Paper
- Gabriele Trovato, Grzegorz Chrupała and Atsuo Takanishi. 2016. Application of the Naive Bayes Classifier for Representation and Use of Heterogeneous and Incomplete Knowledge in Social Robotics. Robotics, 5(1).
Paper
- Ákos Kádár, Afra Alishahi and Grzegorz Chrupała. 2015. Learning word meanings from images of natural scenes. Traitement Automatique des Langues 55(3).
Paper
- Ákos Kádár, Grzegorz Chrupała and Afra Alishahi. 2015. Linguistic Analysis of Multi-modal Recurrent Neural Networks. EMNLP Vision and Language workshop.
Abstract | Poster
- Antoaneta Baltadzhieva and Grzegorz Chrupała. 2015. Predicting the quality of questions on Stackoverflow. RANLP.
Paper
- Grzegorz Chrupała, Ákos Kádár, Afra Alishahi. 2015. Learning language through pictures. ACL.
Paper | Slides
- Antoaneta Baltadzhieva and Grzegorz Chrupała. 2015. Question Quality in Community Question Answering Forums: A Survey. ACM SIGKDD Explorations Newsletter 17(1):8-13.
Paper
- Utsab Barman, Joachim Wagner, Grzegorz Chrupała and Jennifer Foster. 2014. DCU-UVT: Word-Level Language Classification with Code-Mixed Data. Shared Task: Language Identification in Code-Switched Data, EMNLP Workshop on Computational Approaches to Code Switching.
Paper
- Grzegorz Chrupała. 2014. Normalizing tweets with edit scripts and recurrent neural embeddings. ACL.
Paper | Slides
- Huijing Deng and Grzegorz Chrupała. 2014. Semantic approaches to software component retrieval with English queries. LREC.
Paper | Poster | Code
- Benjamin Roth, Tassilo Barth, Grzegorz Chrupała, Martin Gropp, Dietrich Klakow. 2014. RelationFactory: A Fast, Modular and Effective System for Knowledge Base Population. EACL software demos.
Paper | Code
- Kilian Evang, Valerio Basile, Grzegorz Chrupała, Johan Bos. 2013. Elephant: Sequence Labeling for Word and Sentence Segmentation. EMNLP.
Paper | Poster | Code
- Grzegorz Chrupała. 2013. Text segmentation with character-level text embeddings. ICML Workshop on Deep Learning for Audio, Speech and Language Processing.
Paper | Poster
- Benjamin Roth, Grzegorz Chrupała, Michael Wiegand and Mittul Singh. 2012. Generalizing from Freebase and Patterns using Distant Supervision for Slot Filling. TAC.
Paper
- Afra Alishahi and Grzegorz Chrupała. 2012. Concurrent Acquisition of Word Meaning and Lexical Categories. EMNLP-CoNLL.
Paper | Poster
- Grzegorz Chrupała. 2012. Hierarchical clustering of word class distributions. NAACL-HLT Workshop on the Induction of Linguistic Structure.
Paper | Code
- Grzegorz Chrupała. 2012. Learning from evolving data streams: online triage of bug reports. EACL.
Paper | Slides | Data | Code
- Fang Xu, Stefan Kazalski, Grzegorz Chrupała, Benjamin Roth, Xujian Zhao, Michael Wiegand and Dietrich Klakow. 2011. Saarland University Spoken Language Systems Group at TAC KBP 2011. TAC.
Paper
- Grzegorz Chrupała. 2011. Efficient induction of probabilistic word classes with LDA. IJCNLP.
Paper | Slides | Code
- Grzegorz Chrupała, Saeedeh Momtazi, Michael Wiegand, Stefan Kazalski, Fang Xu, Benjamin Roth, Alexandra Balahur and Dietrich Klakow. 2010. Saarland University Spoken Language Systems at the Slot Filling Task of TAC KBP 2010. TAC.
Paper
- Grzegorz Chrupała, Georgiana Dinu and Benjamin Roth. 2010. Enriched syntax-based meaning representation for answer extraction. SIGIR Workshop: Query Representation and Understanding.
Paper | Poster
- Grzegorz Chrupała and Afra Alishahi. 2010. Online Entropy-based Model of Lexical Category Acquisition. CoNLL.
Paper | Slides | Code
- Georgiana Dinu and Grzegorz Chrupała. 2010. Relatedness curves for acquiring paraphrases. ACL workshop GEMS.
Paper
- Djamé Seddah, Grzegorz Chrupała, Özlem Çetinoğlu, Josef van Genabith and Marie Candito. 2010. Lemmatization and Lexicalized Statistical Parsing of Morphologically Rich Languages: the Case of French. NAACL SPMRL workshop.
Paper
- Grzegorz Chrupała and Dietrich Klakow. 2010. A Named Entity Labeler for German: exploiting Wikipedia and distributional clusters. LREC.
Paper | Code
- Afra Alishahi and Grzegorz Chrupała. 2009. Lexical Category Acquisition as an Incremental Process. PsychoCompLA-2009, Cogsci.
Paper
- Michael Wiegand, Saeedeh Momtazi, Stefan Kazalski, Fang Xu, Grzegorz Chrupała and Dietrich Klakow. 2008. The Alyssa System at TAC QA 2008. TAC.
Paper
- Grzegorz Chrupała, Georgiana Dinu and Josef van Genabith. 2008. Learning Morphology with Morfette. LREC.
Paper | Code
- Grzegorz Chrupała and Josef van Genabith. 2007. Using very large corpora to detect raising and control verbs. LFG.
Paper
- Grzegorz Chrupała, Nicolas Stroppa, Josef van Genabith and Georgiana Dinu. 2007. Better Training for Function Labeling. RANLP.
Paper | Code
- Grzegorz Chrupała. 2006. Simple Data-Driven Context-Sensitive Lemmatization. SEPLN.
Paper
- Grzegorz Chrupała and Josef van Genabith. 2006. Using Machine-Learning to Assign Function Labels to Parser Output for Spanish. COLING/ACL.
Paper
- Grzegorz Chrupała and Josef van Genabith. 2006. Improving Treebank-Based Automatic LFG Induction for Spanish. LFG.
Paper
- Xavier Carreras, Lluís Màrquez and Grzegorz Chrupała. 2004. Hierarchical Recognition of Propositional Arguments with Perceptrons. CoNLL.
Paper
- Anthony Pym and Grzegorz Chrupała. 2005. The quantitative analysis of translation flows in the age of an international language. In Less Translated Languages, Albert Branchadell and Lovell Margaret West (eds.), 27-38. John Benjamins.
- Grzegorz Chrupała. 2003. Perl Scripting in Translation Project Management. Across Languages and Cultures, 4(1): 109-132.
Paper
- Grzegorz Chrupała and Lidia Cámara. 2003. STAR Transit XV. In Entornos Informáticos de la Traducción Profesional, Gloria Corpas Pastor and María-José Varela Salinas (eds.). Atrio, Granada.
Theses
- Grzegorz Chrupała. 2008. Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. PhD dissertation, Dublin City University.
PDF
- Grzegorz Chrupała. 2003. Acquiring Verb Subcategorization from Spanish Corpora. DEA Thesis, University of Barcelona.
PDF
- Grzegorz Chrupała. 1998. Bibliotheca in Fabula. The library motive in La biblioteca de Babel, The British Museum is Falling Down and Il nome della rosa. MA Thesis, University of Silesia.
HTML
Team
PhD students
- Chris Emmery. NLP for cybersecurity.
- Ákos Kádár. Language and vision. Co-supervised with Afra Alishahi.
- Koel Dutta Chowdhury. Multimodal translation. Co-supervised with Yvette Graham (DCU).
Selected MSc theses
- Mark van der Laan. 2018. Encoding of speaker identity in a Neural Network model of Visually Grounded Speech perception. Tilburg University.
- Bart Broere. 2017. Syntactic properties of skip-thought vectors. Tilburg University.
- Lieke Gelderloos. 2016. Levels of representation in a recurrent neural model of visually grounded language learning. Tilburg University. See also Coling 2016 paper: From phonemes to images: levels of representation in a recurrent neural model of visually grounded language learning.
- Kia Eisinga. 2016. Predicting Runway Allocation with Support Vector Machine and Logistic Regression. Tilburg University.
- Ákos Kádár. 2014. Grounded learning for source code component retrieval. Tilburg University
- Antoaneta Baltadzhieva. 2014. Predicting question quality in question answering forums. Tilburg University
- Lucas Vergeest. 2014. Using N-grams and word embeddings for Twitter hashtag suggestion. Tilburg University
- Huijing Deng. 2013. Probabilistic Models of API Retrieval. Saarland University. (See also Deng and Chrupała. 2014. Semantic approaches to software component retrieval with English queries. LREC.)
Software
Morfette is used by many people for morphological analysis. Sequor is also popular for sequence labeling, and comes with named entity models for German and English. Elephant provides models for word and sentence segmentation for English, Dutch and Italian. RelationFactory implements a state-of-the-art relation extraction and knowledge base population pipeline. Many of the other packages are quite specialized and are of interest mostly if you are a researcher working on problems similar to mine.
- Visually grounded speech: speech recognition without transcripts.
- Funktional: minimalistic toolkit for functionally composable neural network layers with Theano.
- RelationFactory: End-to-end relation extraction and knowledge base population pipeline.
- Elephant: word and sentence boundary detection with character-level text embeddings.
- Elman: a version of Tomas Mikolov's recurrent neural network language model modified to output hidden layer activations.
- Hiera: hierarchical clustering of word-class probability distributions.
- Colada: implements online and minibatch word class induction using Latent Dirichlet Allocation (LDA) with an online Gibbs sampler.
- Ladybug: Online (incremental) triage of bug reports.
- LDA-wordclass: Soft word-class induction with Latent Dirichlet Allocation
- Lingo: Haskell NLP utilities
- Delta-H: Online entropy-based model of lexical category acquisition
- Sequor: a perceptron-based sequence labeler with a flexible feature template language. It is meant mainly for NLP applications such as Part of Speech tagging, syntactic chunking or Named Entity labeling. Includes:
  - SemiNER: a semi-supervised Named Entity labeler (with pre-trained models for German and English)
- Morfette: a tool for supervised learning of inflectional morphology. Comes with pre-trained models for Spanish and French.
- Funtag: Add grammatical function labels to constituency parse trees
Invited Talks
- Grounded representation learning for spoken language. University of Amsterdam. July 2018.
- Neural representations of form and meaning in spoken language. CIMeC, Italy. March 2018.
- Analyzing neural representations in models of grounded language learning. Symposium at University of Groningen. December 2017.
- Linguistic interpretability in neural models of grounded language learning. EMNLP Workshop on Building Linguistically Generalizable NLP Systems. September 2017. Slides
- Processing Multimodal Data with Recurrent and Convolutional Neural Networks. VU (Vrije Universiteit Amsterdam), November 2016.
- Representation of language form, structure and meaning in multimodal recurrent neural networks. Donders Institute for Brain, Cognition and Behavior in Nijmegen, November 2016. Slides
- Language representations in visually grounded neural models. Department of Linguistics, Ruhr Universitaet Bochum, October 2016.
- Investigating language representations in visually grounded neural models. Ecole Normale Supérieure, September 2016. Slides
- Learning language from multimodal signals. Data Science Seminar Tilburg, March 2016. Slides
- Visually grounded linguistic representations. UvA, December 2015. Slides
- Learning visually grounded word and sentence representations. CIMeC, May 2015. Slides
- Learning text embeddings with recurrent neural language models. Colloquium series on machine learning, pattern recognition, and computer vision at TU Delft, May 2014. Slides
- Introduction to Question Answering. Guest lecture for Language in the Digital Age course, Tilburg University, October 2013. Slides
- Learning word classes for semisupervised learning of NLP tasks. Textkernel, February 2012. Slides
Tutorials
- Gentle introduction to R for data analysis. TiCC colloquium, March 2015.
- Machine learning for NLP and MT (with Nicolas Stroppa, Google, Zurich). EU META Network of Excellence workshop, Barcelona, Spain, October 2010.
- Introduction to classification and sequence labeling. International Research Training Group - Annual Meeting, Irsee, Germany, June 2009. Half-day intensive tutorial for graduate students covering basic machine learning techniques useful for NLP. Slides
- Machine Learning for NLP. Centre for Next-Generation Localisation, Dublin, Ireland, March 2009. Two-day intensive tutorial for graduate students covering a selection of machine learning techniques useful for NLP. Slides
Bio
Grzegorz Chrupała is an Assistant Professor at the Department of Cognitive Science and Artificial Intelligence at Tilburg University. Previously he did postdoctoral research at the Spoken Language Systems group at Saarland University. He received his doctoral degree from the School of Computing at Dublin City University in 2008. In his recent research he has focused on computational models of language learning from multimodal signals such as speech and vision and on the analysis and interpretability of representations emerging in multilayer recurrent neural networks. He regularly serves on program committees of major NLP and AI conferences, workshops and journals. He was an area chair at ACL 2017 (Machine Learning) and at EMNLP 2018 (Multimodal NLP and Speech), a general chair for Benelearn 2018, and co-organizer of BlackboxNLP 2018 (workshop on analyzing and interpreting neural networks for NLP).
Contact
Grzegorz Chrupała
Department of Cognitive Science and Artificial Intelligence
Tilburg University
PO Box 90153
5000 LE Tilburg
The Netherlands
Twitter: @gchrupala
Web: grzegorz.chrupala.me
Phone: +31 13 466 8020
Email: g.chrupala@uvt.nl