This example shows how to run the LDA Gibbs sampler on a small dataset to produce multiple Gibbs samples, both from within a single Markov chain and from different chains.
Choose the dataset
dataset = 1; % 1 = psych review abstracts, 2 = NIPS papers

if (dataset == 1)
    % load the psych review data in bag of words format
    load 'bagofwords_psychreview';
    % load the psych review vocabulary
    load 'words_psychreview';
elseif (dataset == 2)
    % load the nips dataset
    load 'bagofwords_nips';
    % load the nips vocabulary
    load 'words_nips';
end
Set the number of topics
T=50;
Set the hyperparameters
BETA=0.01;
ALPHA=50/T;
What output to show (0=no output; 1=iterations; 2=all output)
OUTPUT = 1;
The number of iterations
BURNIN   = 100; % the number of iterations before taking samples
LAG      = 10;  % the lag between samples
NSAMPLES = 2;   % the number of samples for each chain
NCHAINS  = 2;   % the number of chains to run
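As a quick check on what these settings imply, the sketch below works out the run length per chain and the number of sample files. It assumes the loop structure used in this example, where the first sample is taken right after burn-in and each later sample follows LAG further iterations. The names `iterPerChain` and `nFiles` are illustrative and not part of the toolbox.

```matlab
BURNIN   = 100; % the number of iterations before taking samples
LAG      = 10;  % the lag between samples
NSAMPLES = 2;   % the number of samples for each chain
NCHAINS  = 2;   % the number of chains to run

% first sample comes right after burn-in; each later sample costs LAG more iterations
iterPerChain = BURNIN + (NSAMPLES - 1) * LAG;  % illustrative name

% one file is saved per sample per chain
nFiles = NSAMPLES * NCHAINS;                   % illustrative name

fprintf( 'Iterations per chain: %d\n' , iterPerChain );
fprintf( 'Sample files written: %d\n' , nFiles );
```

With the values above this gives 110 iterations per chain and 4 saved files.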
The starting seed number
SEED = 1;

for c=1:NCHAINS
    SEED = SEED + 1;
    N = BURNIN;
    fprintf( 'Running Gibbs sampler for burnin\n' );
    [ WP,DP,Z ] = GibbsSamplerLDA( WS , DS , T , N , ALPHA , BETA , SEED , OUTPUT );

    fprintf( 'Continue to run sampler to collect samples\n' );
    for s=1:NSAMPLES
        filename = sprintf( 'lda_chain%d_sample%d' , c , s );
        fprintf( 'Saving sample #%d from chain #%d: filename=%s\n' , s , c , filename );
        comm = sprintf( 'save ''%s'' WP DP Z ALPHA BETA SEED N Z T s c' , filename );
        eval( comm );

        WPM{ s , c } = WP;

        if (s < NSAMPLES)
            N = LAG;
            SEED = SEED + 1; % important -- change the seed between samples !!
            [ WP,DP,Z ] = GibbsSamplerLDA( WS , DS , T , N , ALPHA , BETA , SEED , OUTPUT , Z );
        end
    end
end
Running Gibbs sampler for burnin
Iteration 0 of 100
Iteration 10 of 100
Iteration 20 of 100
Iteration 30 of 100
Iteration 40 of 100
Iteration 50 of 100
Iteration 60 of 100
Iteration 70 of 100
Iteration 80 of 100
Iteration 90 of 100
Continue to run sampler to collect samples
Saving sample #1 from chain #1: filename=lda_chain1_sample1
Iteration 0 of 10
Saving sample #2 from chain #1: filename=lda_chain1_sample2
Running Gibbs sampler for burnin
Iteration 0 of 100
Iteration 10 of 100
Iteration 20 of 100
Iteration 30 of 100
Iteration 40 of 100
Iteration 50 of 100
Iteration 60 of 100
Iteration 70 of 100
Iteration 80 of 100
Iteration 90 of 100
Continue to run sampler to collect samples
Saving sample #1 from chain #2: filename=lda_chain2_sample1
Iteration 0 of 10
Saving sample #2 from chain #2: filename=lda_chain2_sample2
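To see which seed and iteration count end up in each saved file, the loop can be dry-run without calling the sampler. The sketch below mirrors the control flow of the sampling loop but only records the bookkeeping. The `schedule` cell array is a hypothetical helper introduced here, and no toolbox function is called.

```matlab
BURNIN = 100; LAG = 10; NSAMPLES = 2; NCHAINS = 2;

SEED = 1;
schedule = cell( 0 , 3 ); % illustrative: one row of {filename, SEED, N} per saved sample

for c=1:NCHAINS
    SEED = SEED + 1;      % new seed for the burn-in of each chain
    N = BURNIN;
    for s=1:NSAMPLES
        filename = sprintf( 'lda_chain%d_sample%d' , c , s );
        schedule( end+1 , : ) = { filename , SEED , N };
        fprintf( '%s: SEED=%d N=%d\n' , filename , SEED , N );
        if (s < NSAMPLES)
            N = LAG;          % later samples only run LAG more iterations
            SEED = SEED + 1;  % and use a fresh seed
        end
    end
end
```

Every sampler call gets its own seed: chain 1 uses seeds 2 and 3, and chain 2 uses seeds 4 and 5, so no call repeats the random number stream of an earlier one.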
Inspect the first few topics of a few samples
for c=1:NCHAINS
    for s=1:NSAMPLES
        [S] = WriteTopics( WPM{s,c} , BETA , WO );
        fprintf( 'Example topics of chain %d sample %d\n' , c , s );
        S(1:5)
    end
end
Example topics of chain 1 sample 1
ans =
'perceptual conditions patterns result organization'
'theories similarity proposed psychological dimensions'
'word words network semantic model'
'problems research strategies empirical theoretical'
'models data based simple rules'
Example topics of chain 1 sample 2
ans =
'perceptual conditions result patterns psychological'
'theories similarity proposed shown dimensions'
'word model words network semantic'
'research problems theoretical strategies methods'
'models data based rules simple'
Example topics of chain 2 sample 1
ans =
'account spatial defined objects series'
'problem problems related variables psychological'
'information processing stage motion rt'
'visual perception perceptual target masking'
'social approach levels system principles'
Example topics of chain 2 sample 2
ans =
'account alternative objects terms spatial'
'problem problems variables independent solving'
'information processing motion stage stages'
'visual perception perceptual target masking'
'social approach levels specific empirical'
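The listings suggest that successive samples from one chain keep a topic at the same index, while the same index in a different chain can hold an unrelated topic. As a rough illustration, the sketch below counts shared top words for topic 3, using strings copied from the output above. The variable names are illustrative, and word overlap is only a crude similarity measure assumed here for the comparison.

```matlab
% top words of topic 3, copied from the listings above
c1s1 = strsplit( 'word words network semantic model' );      % chain 1, sample 1
c1s2 = strsplit( 'word model words network semantic' );      % chain 1, sample 2
c2s1 = strsplit( 'information processing stage motion rt' ); % chain 2, sample 1

% number of top words the two topics have in common
sameChain  = numel( intersect( c1s1 , c1s2 ) );
crossChain = numel( intersect( c1s1 , c2s1 ) );

fprintf( 'Shared words within chain 1: %d\n' , sameChain );
fprintf( 'Shared words across chains:  %d\n' , crossChain );
```

For these strings the within-chain overlap is 5 of 5 words and the cross-chain overlap is 0, so topics from different chains should be matched by content, not by index, before they are compared or averaged.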