Wenjuan Han

I am a PhD student at ShanghaiTech University, where I am advised by Kewei Tu. I did my bachelor's at the Nanjing University of Posts and Telecommunications.

Email  /  CV  /  LinkedIn


My research interests are in natural language processing and machine learning. My current research focuses on probabilistic/neural models and parsers for modeling different aspects of intelligence: (1) grammar-based representation, inference, and unsupervised learning; and (2) the application of unsupervised learning approaches with hidden variables in a variety of AI areas, including grammar induction and clustering. Representative papers are highlighted.

Dependency Grammar Induction with Neural Lexicalization and Big Training Data
Wenjuan Han, Yong Jiang, Kewei Tu, Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017

We study the impact of big models (in terms of the degree of lexicalization) and big data (in terms of the training corpus size) on dependency grammar induction. We experimented with L-DMV, a lexicalized version of Dependency Model with Valence (Klein and Manning, 2004) and L-NDMV, our lexicalized extension of the Neural Dependency Model with Valence (Jiang et al., 2016). We find that L-DMV only benefits from very small degrees of lexicalization and moderate sizes of training corpora. L-NDMV can benefit from big training data and lexicalization of greater degrees, especially when enhanced with good model initialization, and it achieves a result that is competitive with the current state-of-the-art.

Combining Generative and Discriminative Approaches to Unsupervised Dependency Parsing via Dual Decomposition
Yong Jiang, Wenjuan Han, Kewei Tu, Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017

Unsupervised dependency parsing aims to learn a dependency parser from unannotated sentences. Existing work focuses on either learning generative models using the expectation-maximization algorithm and its variants, or learning discriminative models using the discriminative clustering algorithm. In this paper, we propose a new learning strategy that learns a generative model and a discriminative model jointly based on the dual decomposition method. Our method is simple and general, yet effective in capturing the advantages of both models and improving their learning results. We tested our method on the UD treebank and achieved state-of-the-art performance on thirty languages.

Unsupervised Neural Dependency Parsing
Yong Jiang, Wenjuan Han, Kewei Tu, Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016

Unsupervised dependency parsing aims to learn a dependency grammar from text annotated with only POS tags. Various features and inductive biases are often used to incorporate prior knowledge into learning. One useful type of prior information is that there exist correlations between the parameters of grammar rules involving different POS tags. Previous work employed manually designed features or special prior distributions to encode such information. In this paper, we propose a novel approach to unsupervised dependency parsing that uses a neural model to predict grammar rule probabilities based on distributed representation of POS tags. The distributed representation is automatically learned from data and captures the correlations between POS tags. Our experiments show that our approach outperforms previous approaches utilizing POS correlations and is competitive with recent state-of-the-art approaches on nine different languages.
