Example 2 of applying the author-topic (AT) model

This example shows how to run the AT Gibbs sampler, collect multiple samples from the same chain and from different chains, and reload the saved samples for inspection.

load 'bagofwords_nips'; % load the NIPS word-document count matrix WD
load 'authordoc_nips';  % load the author-document matrix AD for NIPS
load 'words_nips';      % load the vocabulary WO
load 'authors_nips';    % load the author names AN
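
As a quick optional check, the variables just loaded can be listed. This assumes the four .mat files define WD (the sparse word-document count matrix), AD (the sparse author-document matrix), WO (the vocabulary), and AN (the author names), which are the names used later in this example.

whos WD AD WO AN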

The base name for the text files that will show the topic-word and author-topic distributions (one file is written per chain and sample below)

topicfilebase = 'topics_nips_at';

Set the number of topics

T = 50;

Set the hyperparameters

BETA  = 0.01; % Dirichlet prior on the topic-word distributions
ALPHA = 50/T; % Dirichlet prior on the author-topic distributions

The number of iterations

BURNIN   = 50; % the number of burn-in iterations before the first sample is taken
LAG      = 10; % the number of iterations between successive samples
NSAMPLES = 2;  % the number of samples to take from each chain
NCHAINS  = 2;  % the number of chains to run
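
With these settings each chain runs one burn-in of BURNIN iterations and then LAG further iterations before every additional sample. The line below is only an illustrative sketch of the resulting schedule; TOTALITER is not used elsewhere in this example.

TOTALITER = BURNIN + (NSAMPLES-1)*LAG; % = 50 + 1*10 = 60 Gibbs sweeps per chain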

What output to show (0=no output; 1=iterations; 2=all output)

OUTPUT = 1;

The starting seed number

SEED = 1;

% Convert the sparse word-document count matrix WD into the token-stream
% representation used by the sampler: WS(k) is the word index and DS(k)
% the document index of the k-th token
[ WS , DS ] = SparseMatrixtoCounts( WD );
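
A minimal toy illustration of this representation, assuming SparseMatrixtoCounts expands each nonzero count WD(w,d) into that many token entries (the exact token ordering does not matter to the sampler); WDtoy, WStoy, and DStoy are illustrative names only.

% Toy matrix: 2 words x 2 documents; word 1 occurs twice in document 1,
% word 2 occurs once in document 1 and three times in document 2
WDtoy = sparse( [ 2 0 ; 1 3 ] );
[ WStoy , DStoy ] = SparseMatrixtoCounts( WDtoy );
% Expected result (up to token ordering):
%   WStoy = [ 1 1 2 2 2 2 ]   word index of each token
%   DStoy = [ 1 1 1 2 2 2 ]   document index of each token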

The following loop might require 30-45 minutes of compute time

fprintf( 'The following computation might take 30-45 minutes...\n' );

for c=1:NCHAINS
    SEED = SEED + 1;
    N = BURNIN;
    fprintf( 'Running Gibbs sampler for burnin\n' );
    % WP : word-topic count matrix, AT : author-topic count matrix,
    % Z : topic assignment of each token, X : author assignment of each token
    [ WP , AT , Z , X ] = GibbsSamplerAT( WS , DS , AD , T , N , ALPHA , BETA , SEED , OUTPUT );

    fprintf( 'Continue to run sampler to collect samples\n' );
    for s=1:NSAMPLES
        filename = sprintf( 'lda_chain%d_sample%d' , c , s );
        fprintf( 'Saving sample #%d from chain #%d: filename=%s\n' , s , c , filename );
        save( filename , 'WP' , 'AT' , 'Z' , 'X' , 'ALPHA' , 'BETA' , 'SEED' , 'N' , 'T' , 's' , 'c' );

        if (s < NSAMPLES)
           N = LAG;
           SEED = SEED + 1; % important -- change the seed between samples !!
           % continue the same chain from its current state by passing in Z and X
           [ WP , AT , Z , X ] = GibbsSamplerAT( WS , DS , AD , T , N , ALPHA , BETA , SEED , OUTPUT , Z , X );
        end
    end
end
The following computation might take 30-45 minutes...
Running Gibbs sampler for burnin
	Iteration 0 of 50
	Iteration 10 of 50
	Iteration 20 of 50
	Iteration 30 of 50
	Iteration 40 of 50
Continue to run sampler to collect samples
Saving sample #1 from chain #1: filename=lda_chain1_sample1
	Iteration 0 of 10
Saving sample #2 from chain #1: filename=lda_chain1_sample2
Running Gibbs sampler for burnin
	Iteration 0 of 50
	Iteration 10 of 50
	Iteration 20 of 50
	Iteration 30 of 50
	Iteration 40 of 50
Continue to run sampler to collect samples
Saving sample #1 from chain #2: filename=lda_chain2_sample1
	Iteration 0 of 10
Saving sample #2 from chain #2: filename=lda_chain2_sample2
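
Each saved sample is an ordinary .mat file, so its contents can be inspected before loading. The optional check below assumes the first sample of chain 1 was written to the current directory, as in the run above.

% List the variables stored in one of the saved sample files
whos( '-file' , 'lda_chain1_sample1' )
% Expected variables: WP AT Z X ALPHA BETA SEED N T s c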

Load in the samples and show some topic-word and author-topic distributions

for c=1:NCHAINS
    for s=1:NSAMPLES
        filename = sprintf( 'lda_chain%d_sample%d' , c , s );
        fprintf( 'Loading sample #%d from chain #%d: filename=%s\n' , s , c , filename );
        load( filename );

        % Collect the word-topic and author-topic count matrices, their
        % hyperparameters, and the matching labels (vocabulary and author names)
        WPM{1}   = WP;    WPM{2}   = AT;
        BETAM(1) = BETA;  BETAM(2) = ALPHA;
        WOM{1}   = WO;    WOM{2}   = AN;
        %%
        % Write the word-topic and author-topic distributions for this sample
        % to its own text file
        txtfilename = sprintf( '%s_chain%d_sample%d.txt' , topicfilebase , c , s );
        [ SM ] = WriteTopicsMult( WPM , BETAM , WOM , 7 , 0.7 , 4 , txtfilename );

        fprintf( 'The first ten topic-word distributions:\n' );
        SM{1}(1:10)

        fprintf( 'The first ten author-topic distributions:\n' );
        SM{2}(1:10)
    end
end
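
The WP and AT matrices in each sample are raw count matrices. The lines below sketch one standard way to turn a single loaded sample into smoothed point estimates of the distributions; phi, theta, and W are illustrative names and are not part of the toolbox.

% phi(w,t)   : estimated probability of word w under topic t
% theta(t,a) : estimated probability of topic t for author a
W     = size( WP , 1 ); % vocabulary size
phi   = ( WP + BETA )  ./ repmat( sum( WP , 1 ) + W*BETA , W , 1 );
theta = ( AT + ALPHA )' ./ repmat( sum( AT , 2 )' + T*ALPHA , T , 1 );
% each column of phi and of theta should sum to one (up to numerical error)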