Topic Browsing StartUp Café

You can go straight to the StartUp Café topic browser, or take a moment to read the explanation below ...

What is this?

This is an experimental browsable index for a large collection of documents, applied to the StartUp Café blog, as an example.

For something different, see our wikitopics browser for the Wikileaks cablegate corpus.

For optimal viewing, use an up-to-date WebKit browser: Google Chrome or a recent Safari WebKit nightly build.

How do I use it?

The site is under development and will change, hopefully rapidly.

We start from a collection of documents (each viewed as a bag of words), and use Latent Dirichlet Allocation (LDA) to model each document as a mixture of a number of topics. A topic is a probability distribution over words. Once we choose a fixed number of topics, LDA provides a set of topics and the proportions in which they should be mixed in each document to best approximate our collection.
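To make the idea concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written in plain Python. It is an illustration only, not the code behind this site: the function name fit_lda, the hyperparameters alpha and beta, the iteration count, and the toy documents are all assumptions chosen for the example.

```python
import random


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents (lists of words) by collapsed Gibbs sampling.

    Hypothetical helper for illustration. Returns (vocab, theta, phi), where
    theta[d][k] is the share of topic k in document d and phi[k][v] is the
    probability of word vocab[v] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]        # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]   # times word v is assigned to topic k
    topic_total = [0] * K                      # tokens assigned to topic k overall
    assignments = []                           # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(K)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][v] + beta)
                    / (topic_total[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates of the mixing proportions and the topics.
    theta = [
        [(n + alpha) / (len(doc) + K * alpha) for n in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(n + beta) / (topic_total[k] + V * beta) for n in topic_word[k]]
        for k in range(K)
    ]
    return vocab, theta, phi


# Toy corpus (made up for this example): each document is a bag of words.
docs = [
    "startup funding investors pitch funding".split(),
    "investors pitch startup valuation".split(),
    "coffee espresso beans roast coffee".split(),
    "espresso roast beans grinder".split(),
]
vocab, theta, phi = fit_lda(docs, num_topics=2)

for d, row in enumerate(theta):
    print(d, [round(p, 2) for p in row])
```

Each row of theta gives the proportions in which the topics are mixed in one document, and each row of phi is one topic, that is, a probability distribution over the vocabulary. Both kinds of row sum to 1. Real collections need many more documents and topics, and a tuned library implementation rather than this sketch.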

The interface presents topics in one column and documents in the other. Hovering over a document highlights relevant topics. Selecting a topic highlights relevant documents.
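One plausible way to decide what counts as "relevant" is to apply a cutoff to the topic proportions of each document. The sketch below assumes that rule; the function names, the 0.2 threshold, and the sample proportions are hypothetical and are not taken from the browser's actual code.

```python
def relevant_topics(theta, doc_index, threshold=0.2):
    """Topics whose share of the given document is at least `threshold`."""
    return [k for k, share in enumerate(theta[doc_index]) if share >= threshold]


def relevant_documents(theta, topic_index, threshold=0.2):
    """Documents in which the given topic has a share of at least `threshold`."""
    return [d for d, row in enumerate(theta) if row[topic_index] >= threshold]


# Made-up topic proportions: one row per document, one column per topic.
theta = [
    [0.90, 0.10, 0.00],
    [0.30, 0.30, 0.40],
    [0.05, 0.15, 0.80],
]

print(relevant_topics(theta, 0))     # hovering over document 0
print(relevant_documents(theta, 2))  # selecting topic 2
```

With these sample values, document 0 is dominated by a single topic, while document 1 is spread across all three, so hovering over it would highlight every topic. A lower threshold highlights more items; a higher one keeps only the strongest links.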