The Challenge is for you to develop a topic model that goes beyond a common approach, latent Dirichlet allocation (LDA). Your topic model (“Solution”) must (a) meet the Challenge Objectives, (b) follow the Challenge Instructions and Requirements, and (c) incorporate the Key Deliverables.

Challenge Objectives:

Topic models, methods that extract themes from unstructured text data, often provide a first layer of insights. Two main requirements regarding model output present a challenge to traditional topic models:

  1. Correlations between topics, which are present in many corpora, violate the assumption of topic independence made by many topic models, including the popular latent Dirichlet allocation (“LDA”).
  2. Hierarchical models, which provide insights on multiple levels along the spectrum of broad to detailed, are more useful than a single, coarse, high-level segmentation of documents into large thematic bins.

Therefore, the two main objectives in this Challenge are for your Solution to appropriately handle correlated topics and to generate subtopics (in addition to topics).
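
To make the first objective concrete, here is a minimal sketch of one known way to let topics co-vary: drawing topic proportions from a logistic-normal distribution (a correlated Gaussian pushed through a softmax), as in correlated topic models. It is an illustration only, not a required or recommended method for the Challenge. The three topics, the `rho` parameter, and all function names are assumptions made for this example.

```python
import math
import random


def softmax(values):
    """Map real-valued scores to a probability vector."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_topic_proportions(rng, rho):
    """Draw proportions over 3 hypothetical topics.

    Topics 0 and 1 have correlation `rho` before the softmax;
    topic 2 is independent of both (an assumption of this sketch).
    """
    z0, z1, z2 = (rng.gauss(0.0, 1.0) for _ in range(3))
    eta0 = z0
    eta1 = rho * z0 + math.sqrt(1.0 - rho * rho) * z1
    eta2 = z2
    return softmax([eta0, eta1, eta2])


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
draws_pos = [sample_topic_proportions(rng, rho=0.9) for _ in range(2000)]
draws_neg = [sample_topic_proportions(rng, rho=-0.9) for _ in range(2000)]

# Correlation between the proportions of topics 0 and 1 across documents.
corr_pos = pearson([d[0] for d in draws_pos], [d[1] for d in draws_pos])
corr_neg = pearson([d[0] for d in draws_neg], [d[1] for d in draws_neg])
print("rho=+0.9:", round(corr_pos, 2), " rho=-0.9:", round(corr_neg, 2))
```

Changing `rho` changes how strongly the two topics tend to appear together in the same document. A Dirichlet prior, by contrast, has no parameter that sets the relationship between a specific pair of topics.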

Instructions and Requirements:

When creating your Solution, you may use a novel combination of existing machine learning and/or statistical methods, or develop your own novel method in order to extract and/or represent thematic information from the text. Either way, the output needs to include:

  1. a distribution of topics over documents, and
  2. a distribution of words over topics.
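
As a rough illustration of the shape these two outputs might take, the sketch below builds a document-topic matrix and a topic-word matrix from made-up counts. It assumes the usual reading of the two items: one probability vector over topics per document, and one probability vector over the vocabulary per topic. The vocabulary, the counts, and the helper `normalize_rows` are all hypothetical and do not come from the Challenge materials.

```python
def normalize_rows(counts):
    """Turn each row of non-negative counts into a probability distribution."""
    result = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # No evidence for this row: fall back to a uniform distribution.
            result.append([1.0 / len(row)] * len(row))
        else:
            result.append([c / total for c in row])
    return result


# Hypothetical vocabulary and counts, for illustration only.
vocabulary = ["loan", "rate", "campus", "student"]
doc_topic_counts = [[8, 2], [1, 9], [5, 5]]          # 3 documents x 2 topics
topic_word_counts = [[6, 4, 0, 0], [0, 1, 5, 4]]     # 2 topics x 4 words

doc_topic = normalize_rows(doc_topic_counts)    # each row sums to 1 over topics
topic_word = normalize_rows(topic_word_counts)  # each row sums to 1 over words

for k, row in enumerate(topic_word):
    # Show the most probable word for each topic.
    best = max(range(len(row)), key=row.__getitem__)
    print(f"topic {k}: top word = {vocabulary[best]}")
```

A hierarchical Solution would likely need more than these two flat matrices, for example a mapping from each subtopic to its parent topic, but the exact representation is left to the participant.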


For more information and to apply, visit https://www.mindsumo.com/contests/campus-analytics-challenge-2019