This example shows how to run the LDA Gibbs sampler on a small dataset to extract a set of topics and display the most likely words per topic. It also writes the results to a file.
Choose the dataset
dataset = 1; % 1 = psych review abstracts 2 = NIPS papers
if (dataset == 1)
    % load the psych review data in bag of words format
    load 'bagofwords_psychreview';
    % load the psych review vocabulary
    load 'words_psychreview'
elseif (dataset == 2)
    % load the nips dataset
    load 'bagofwords_nips';
    % load the nips vocabulary
    load 'words_nips'
end
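The sampler call later in this example takes the variables WS, DS and WO, which the files loaded above are expected to provide. The following is a minimal sketch of what a token-stream bag-of-words encoding can look like, assuming WS(k) is the vocabulary index of the k-th word token, DS(k) is the index of the document that token belongs to, and WO is the vocabulary. That layout is an assumption here, and the two toy documents are made up for illustration; they are not part of either dataset.

```matlab
% Toy corpus (hypothetical): two tiny documents given as cell arrays of words
docs = { {'memory','recall','memory'}, {'visual','attention'} };

% Vocabulary: the sorted unique words, forced into a row cell array
allWords = [docs{:}];
WO = unique(allWords);
WO = WO(:)';

% One entry per word token: WS holds the word index, DS the document index
WS = [];
DS = [];
for d = 1:numel(docs)
    [~, idx] = ismember(docs{d}, WO);        % positions of the words in WO
    WS = [WS, idx(:)'];                      %#ok<AGROW>
    DS = [DS, d * ones(1, numel(idx))];      %#ok<AGROW>
end

disp(WS);
disp(DS);
```

Under this encoding both vectors have one entry per word token, so repeated words in a document simply appear as repeated entries.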
Set the number of topics
T=50;
Set the hyperparameters
BETA=0.01;
ALPHA=50/T;
The number of iterations
N = 100;
The random seed
SEED = 3;
What output to show (0=no output; 1=iterations; 2=all output)
OUTPUT = 1;
This function might need a few minutes to finish
tic
[ WP,DP,Z ] = GibbsSamplerLDA( WS , DS , T , N , ALPHA , BETA , SEED , OUTPUT );
toc
Iteration 0 of 100
Iteration 10 of 100
Iteration 20 of 100
Iteration 30 of 100
Iteration 40 of 100
Iteration 50 of 100
Iteration 60 of 100
Iteration 70 of 100
Iteration 80 of 100
Iteration 90 of 100
Elapsed time is 14.615118 seconds.
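Once the sampler has finished, the document-topic output can be turned into per-document topic proportions. The sketch below assumes DP is a documents-by-topics matrix of counts (how many word tokens in each document were assigned to each topic) and applies the usual smoothed estimate with ALPHA. The small DP and the ALPHA value are toy values chosen for illustration, not results from this run.

```matlab
% Toy document-topic counts (hypothetical): 2 documents x 3 topics
DP = sparse([8 2 0; 1 1 8]);
ALPHA = 0.5;                 % illustrative value only

T = size(DP, 2);             % number of topics
counts = full(DP);

% Smoothed estimate: theta(d,t) = (DP(d,t) + ALPHA) / (sum_t DP(d,t) + T*ALPHA)
theta = bsxfun(@rdivide, counts + ALPHA, sum(counts, 2) + T * ALPHA);

disp(theta);
```

Each row of theta sums to one, so it can be read as the topic mixture of that document under this single sample.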
Just in case, save the resulting information from this sample
if (dataset==1)
    save 'ldasingle_psychreview' WP DP Z ALPHA BETA SEED N;
end
if (dataset==2)
    save 'ldasingle_nips' WP DP Z ALPHA BETA SEED N;
end
Put the 7 most likely words per topic in the cell array S
[S] = WriteTopics( WP , BETA , WO , 7 , 0.7 );
fprintf( '\n\nMost likely words in the first ten topics:\n' );
Most likely words in the first ten topics:
Show the most likely words in the first ten topics
S( 1:10 )
ans =
'memory recognition retrieval recall items item list'
'representations representation bias single scale known relation'
'information processing action levels automatic components controlled'
'theory theories predictions account explain formation esteem'
'visual attention brain mechanism selection color attentional'
'speech system target motor masking neural relative'
'network semantic ability test lexical predictions normal'
'states related emotional positive primary arousal motivation'
'control conditioning responses conditions avoidance procedures suggested'
'stimulus effects shown concepts trial free generalization'
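A listing like the one above can be approximated by hand, which may help when interpreting WP. This is a minimal sketch, not the implementation of WriteTopics: it assumes WP is a words-by-topics count matrix and WO a cell array holding the vocabulary, smooths the counts with BETA, and sorts each topic's column. The toy WP and WO below are made up for illustration.

```matlab
% Toy inputs (hypothetical): 5 vocabulary words, 2 topics
WO = {'memory','recall','visual','attention','theory'};
WP = sparse([10 0; 7 1; 0 9; 1 6; 3 3]);
BETA = 0.01;
K = 3;                       % number of words to list per topic

[W, T] = size(WP);
counts = full(WP);

% Smoothed estimate of P(word | topic); each column sums to one
phi = bsxfun(@rdivide, counts + BETA, sum(counts, 1) + W * BETA);

topWords = cell(T, 1);
for t = 1:T
    [~, order] = sort(phi(:, t), 'descend');      % most probable words first
    topWords{t} = strjoin(WO(order(1:K)), ' ');
    fprintf('Topic %d: %s\n', t, topWords{t});
end
```

Because the smoothing adds the same constant to every word in a topic, the ranking within a topic is the same as the ranking of the raw counts; the smoothing only matters when the probabilities themselves are needed.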
Write the topics to a text file
WriteTopics( WP , BETA , WO , 10 , 0.7 , 4 , 'topics.txt' );
fprintf( '\n\nInspect the file ''topics.txt'' for a text-based summary of the topics\n' );
Inspect the file 'topics.txt' for a text-based summary of the topics