You can go straight to the OpenBiz topic browser, or take a moment to read the explanation below.

What is this?

This is an experimental browsable index for a large collection of documents. As an example, it is applied here to a set of technology transfer case studies.

For a larger-scale example, see our wikitopics browser for the Wikileaks cablegate corpus.

For optimal viewing, use an up-to-date WebKit browser: Google Chrome or a recent Safari WebKit nightly build.

How do I use it?

The site is under development and will change, hopefully rapidly.

We start from a collection of documents (each viewed as a bag of words) and use Latent Dirichlet Allocation (LDA) to model each document as a mixture of topics. A topic is a probability distribution over words. Once we fix the number of topics, LDA provides a set of topics, together with the proportions in which they should be mixed in each document, that best approximates our collection.
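To make the idea concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, one common approximate inference method. It is not necessarily what this site runs. The function name lda_gibbs, the hyperparameters alpha and beta, the iteration count, and the toy corpus are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists (bags of words). Returns (theta, phi, vocab):
    theta[d][k] is the proportion of topic k in document d, and
    phi[k][w] is the probability of word vocab[w] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # times word w is assigned to topic k
    topic_total = [0] * num_topics                      # tokens assigned to topic k overall
    assignments = []                                    # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = index[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1

    # Turn the final counts into smoothed probability estimates.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n + beta) / (topic_total[k] + V * beta) for n in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, vocab


# Toy corpus, invented for this example.
docs = [
    ["patent", "license", "patent", "royalty"],
    ["startup", "venture", "funding", "startup"],
    ["patent", "license", "startup", "funding"],
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda w: dist[w], reverse=True)[:3]
    print("topic", k, [vocab[w] for w in top])
```

Each row of theta is one document's topic mixture and each row of phi is one topic's distribution over words, so every row sums to 1. On a corpus this small the topics found are not meaningful; the sketch only shows the shape of the inputs and outputs.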

The interface presents topics in one column and documents in the other. Hovering over a document highlights its relevant topics. Selecting a topic highlights its relevant documents.
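One plausible way to decide what counts as "relevant" is to threshold the document-topic proportions. The sketch below assumes a cutoff of 0.25 and a small hand-written proportions table; the function names, the cutoff, and the numbers are hypothetical and do not describe how the site itself chooses what to highlight.

```python
def relevant_topics(theta, doc, threshold=0.25):
    """Topics whose proportion in the given document meets the threshold."""
    return [k for k, p in enumerate(theta[doc]) if p >= threshold]


def relevant_documents(theta, topic, threshold=0.25):
    """Documents in which the given topic's proportion meets the threshold."""
    return [d for d, row in enumerate(theta) if row[topic] >= threshold]


# Hypothetical proportions: one row per document, one column per topic.
theta = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.6, 0.1],
]

print(relevant_topics(theta, 0))      # topics to highlight when hovering over document 0
print(relevant_documents(theta, 1))   # documents to highlight when selecting topic 1
```

A fixed cutoff is the simplest choice; ranking by proportion and keeping the top few would be an equally reasonable alternative.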