Runs the Gibbs sampler for the LDA-COL (Collocation) model.

`[ WP,DP,WC,C,Z ] = GibbsSamplerLDACOL( WS , DS , SI , WW , T , N , ALPHA , BETA , GAMMA0, GAMMA1
, DELTA , SEED , OUTPUT );`

will run the Gibbs sampler for the LDA-COL model on a word stream specified by word `WS`, document `DS`, and status `SI` indices. All these vectors are of size 1 x `n` where `n` is the number of word tokens. In `SI`, a status value of 1 at position `k` indicates that the word at position `k` can form a collocation with the word at position `k`-1. This indicator is necessary to flag (with value SI=0) word positions that cross document boundaries or that cross word
positions that originally had stop words in between. `WW` is a `W` x `W` sparse matrix where `W` is the number of words in the vocabulary. `WW(i,j)` contains the count of the number of times that word `i` follows word `j` in the word stream.

The output matrices `WP` and `DP` contain the word-topic and document-topic counts for all words that were assigned to the topic route. `WC( i )` counts the number of times that word `i` led to a collocation with the next word in the word stream. `C` is of size 1 x `n` where `C( i )`=0 if word token `i` was assigned to the topic route, and `C( i )`=1 otherwise. `Z` is of size 1 x `n` and contains the topic assignments for all word tokens

`[ WP,DP,WC,C,Z ] = GibbsSamplerLDACOL( WS , DS , SI , WW , T , N , ALPHA , BETA , GAMMA0, GAMMA1
, DELTA , SEED , OUTPUT , C_IN , Z_IN );`

will run the model without random initialization. The initial conditions are specified by `C_IN` and `Z_IN`. This allows the model to continue from a previous state and provides the possibility of extracting multiple Gibbs samples
from a single Markov chain.

Notes

`N` determines the number of iterations to run the Gibbs sampler

`ALPHA`, `BETA`, `DELTA`, `GAMMA0` and `GAMMA1` are the hyperparameters in the model.

`SEED` sets the seed for the random number generator

`OUTPUT` determines the screen output by the sampler 0 = no output provided 1 = show the iteration number only 2 = show all output

The sampler uses its own random number generator and setting the seed for this function will not influence the random number seed for Matlab functions

References

Tom Griffiths, Technical Report, July 18, 2005.