Illustrates the "bars" example as discussed in Griffiths, T., & Steyvers, M. (2004). Finding Scientific Topics. Proceedings of the National Academy of Sciences, 101 (suppl. 1), 5228-5235.
This program creates a dataset of 500 images, each containing 25 pixels in a 5 x 5 grid. The intensity of each pixel is a non-negative integer count (at most 100 here, since 100 pixels are sampled per image). This dataset has exactly the same form as a word-document co-occurrence matrix constructed from a collection of documents: each image is a document, each pixel is a word, and the intensity of a pixel is the frequency of that word in the document. The images are generated in two steps. First, a set of 10 topics is defined, corresponding to horizontal and vertical bars (five of each). Second, for each image, a multinomial distribution over topics is sampled from a Dirichlet distribution with hyperparameter ALPHA, and 100 pixels (words) are sampled. For each pixel, a topic is drawn from that distribution, followed by a pixel drawn from the topic.
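The correspondence with a word-document matrix can be made concrete with a minimal sketch. The script stores one entry per sampled pixel (token), holding its word index and its document index. Such a token list collapses into a word-by-document count matrix. The toy names wtoy, dtoy, Wtoy and Dtoy are hypothetical and chosen for illustration only. accumarray and sparse are standard MATLAB functions.

```matlab
% Hypothetical toy data: one entry per token (pixel draw)
wtoy = [ 3 ; 3 ; 7 ; 1 ; 7 ; 7 ];   % word (pixel) index of each token
dtoy = [ 1 ; 1 ; 1 ; 2 ; 2 ; 2 ];   % document (image) index of each token
Wtoy = 25;                          % vocabulary size (5 x 5 pixels)
Dtoy = 2;                           % number of documents (images)

% counts(v, j) = number of times word v occurs in document j;
% accumarray sums the 1s that land on the same (word, document) pair
counts = accumarray( [ wtoy dtoy ] , 1 , [ Wtoy Dtoy ] );

% the same matrix in sparse form, which suits large vocabularies
S = sparse( wtoy , dtoy , 1 , Wtoy , Dtoy );
```

Each column of counts, reshaped to the 5 x 5 grid, is one image.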
Parameters
D = 500;      % number of images to create
NX = 5;       % number of columns in the displayed image (one vertical bar per column)
NY = 5;       % number of rows in the displayed image (one horizontal bar per row)
ns = 100;     % the number of pixels to sample per image
ALPHA = 1;    % ALPHA hyperparameter
BETA = 1;     % BETA hyperparameter
NITER = [ 0 1 5 10 50 100 ]; % Gibbs iterations for each separate run of the sampler
DSHOW = 60;   % maximum number of images to show
NROWS = 5;    % number of rows in image display
NCOLS = 12;   % number of columns in image display
seed = 4;     % seed for the random number generator

% the number of topics is determined by the number of rows and columns of
% the images
T = NX+NY;
W = NX*NY;    % the number of pixels in an image

% legacy seeding syntax; newer MATLAB releases recommend rng( seed ) instead
rand( 'state' , seed );
randn( 'state' , seed );
Create artificial topics. Each topic is a horizontal or vertical bar in an image
topics = zeros( T , W ); % one row per topic, one column per pixel
i = 0;                   % running topic index
Creating the vertical bars
for ii=1:NX
    i = i + 1;
    topicmatrix = zeros( NX , NY )';
    topicmatrix( : , ii ) = 1;
    topics( i , : ) = topicmatrix( : );
end
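The vertical-bar loop sets, for bar ii, the pixel indices (ii-1)*NY+1 through ii*NY. The sketch below checks that both bar families can also be written in closed form with kron. It repeats the vertical loop from this section and the horizontal loop from the "Creating the horizontal bars" section. The sizes NX = 4 and NY = 3 are hypothetical and deliberately unequal, so that a row/column mix-up would show. Because it reuses the script's variable names, run it in a fresh workspace.

```matlab
NX = 4; NY = 3;   % hypothetical non-square size; the script uses 5 x 5

% Loops mirroring the script: vertical bars, then horizontal bars
vloop = zeros( NX , NX*NY );
for ii = 1:NX
    topicmatrix = zeros( NX , NY )';   % NY x NX
    topicmatrix( : , ii ) = 1;          % column ii -> vertical bar
    vloop( ii , : ) = topicmatrix( : );
end
hloop = zeros( NY , NX*NY );
for ii = 1:NY
    topicmatrix = zeros( NX , NY )';
    topicmatrix( ii , : ) = 1;          % row ii -> horizontal bar
    hloop( ii , : ) = topicmatrix( : );
end

% Closed forms: vertical bar ii covers indices (ii-1)*NY+1 : ii*NY;
% horizontal bar ii covers indices ii, ii+NY, ..., ii+(NX-1)*NY
vbars = kron( eye( NX ) , ones( 1 , NY ) );
hbars = kron( ones( 1 , NX ) , eye( NY ) );
```

With these, the full topic matrix is simply [ vbars ; hbars ].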
Creating the horizontal bars
for ii=1:NY
    i = i + 1;
    topicmatrix = zeros( NX , NY )';
    topicmatrix( ii , : ) = 1;
    topics( i , : ) = topicmatrix( : );
end

% generate the images
fmaps = zeros( D , NX , NY ); % pixel counts for each image
w = zeros( D * ns , 1 );      % word (pixel) index of each token
d = zeros( size( w ));        % document (image) index of each token
n = 0;
for j=1:D
    % sample a multinomial distribution from a dirichlet prior
    % (drchrnd is not a MATLAB built-in; a sampler with this name must be on the path)
    theta = drchrnd( repmat( ALPHA , 1 , T ) , 1 );
    cumtheta = cumsum( theta );
    for k=1:ns
        % sample a topic z; scaling by cumtheta(end) guards against rounding
        ran = rand * cumtheta( end );
        z = find( ran <= cumtheta , 1 , 'first' );
        % sample a word from topic z
        cumphi = cumsum( topics( z , : ));
        ran = rand * cumphi( end );
        index = find( ran <= cumphi , 1 , 'first' );
        n = n + 1;
        w( n ) = index; % storing word index for the n-th token
        d( n ) = j;     % storing document index
        % convert the pixel index to (x, y) coordinates and update the image
        y = mod( index-1 , NY )+1;
        x = floor( (index-1)/NY )+1;
        fmaps( j,x,y ) = fmaps( j,x,y ) + 1;
    end
end
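The generation loop calls drchrnd, which is not part of base MATLAB. This script uses ALPHA = 1, and for that case a simple stand-in exists. A Dirichlet(1, ..., 1) draw can be obtained by normalizing independent Exponential(1) variables, since Exponential(1) is Gamma(1, 1). The sketch below draws a few such theta vectors and then samples a topic from each by inverse-CDF lookup, as the inner loop does. The names nsamples, g and z are hypothetical. For other values of ALPHA, Gamma(ALPHA, 1) draws are needed instead, for example from gamrnd in the Statistics and Machine Learning Toolbox.

```matlab
T = 10;          % number of topics, as in the script
nsamples = 5;    % hypothetical number of theta draws

% -log(U) with U ~ Uniform(0,1) is Exponential(1), i.e. Gamma(1,1);
% normalizing each row gives a Dirichlet(1,...,1) sample
g = -log( rand( nsamples , T ) );
theta = g ./ repmat( sum( g , 2 ) , 1 , T );

% inverse-CDF sampling of one topic per theta row
cumtheta = cumsum( theta , 2 );
z = zeros( nsamples , 1 );
for r = 1:nsamples
    ran = rand * cumtheta( r , end );
    z( r ) = find( ran <= cumtheta( r , : ) , 1 , 'first' );
end
```

repmat is used instead of implicit expansion so that the sketch also runs on MATLAB releases before R2016b.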
Display topics (i.e., bars) used to generate the images
figure( 1 ); clf; colormap( gray );
maxcount = max( topics( : ));
for j=1:T
    subplot( 2+length(NITER) , T , j );
    fmap = reshape( topics( j , : ) , NY , NX );
    imagesc( fmap , [ 0 maxcount ] );
    axis off; axis square;
    if (j==1)
        title( 'Topics used to create image mixtures' , ...
            'HorizontalAlignment' , 'left' );
    end
end
Display a subset of the artificial images created
maxcount = max( fmaps( : ));
figure( 2 ); clf; colormap( gray );
for wh=1:DSHOW
    subplot( NROWS+1 , NCOLS , wh );
    fmap = squeeze( fmaps( wh,:,:))';
    imagesc( fmap , [ 0 maxcount ] );
    axis off; axis square;
    if (wh==1)
        title( 'A subset of images created by mixing topics' , ...
            'HorizontalAlignment' , 'left' );
    end
end
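Figure 1 displays a topic's pixel vector with reshape( ..., NY, NX ), while figure 2 displays squeeze( fmaps( wh,:,: ))'. The sketch below checks that the two conventions agree, using a hypothetical random image with unequal sizes NX = 4 and NY = 3. It also checks that the script's mod/floor index arithmetic matches ind2sub( [NY NX], index ). Because it reuses the script's variable names, run it in a fresh workspace.

```matlab
NX = 4; NY = 3; W = NX*NY;   % hypothetical non-square size
ns = 50;                     % hypothetical number of pixel draws
idx = randi( W , ns , 1 );   % random pixel (word) indices for one image

% Accumulate the image as the script does, via fmaps(j,x,y)
fmap3 = zeros( 1 , NX , NY );
xv = floor( (idx-1)/NY )+1;
yv = mod( idx-1 , NY )+1;
for k = 1:ns
    fmap3( 1 , xv(k) , yv(k) ) = fmap3( 1 , xv(k) , yv(k) ) + 1;
end
shown = squeeze( fmap3( 1 , : , : ))';   % as in figure 2: NY x NX

% The same image from a count vector, displayed as figure 1 displays topics
counts = accumarray( idx , 1 , [ W 1 ] );
viaReshape = reshape( counts , NY , NX );

% The mod/floor arithmetic is ind2sub on an NY x NX grid
[ yy , xx ] = ind2sub( [ NY NX ] , idx );
```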
Run topic model and display extracted topics
figure( 3 ); clf; colormap gray;
NN = 0;
for N=NITER
    % run the Gibbs sampler for N iterations; the last argument controls
    % screen output (0 requests none)
    [ WP,DP,Z ] = GibbsSamplerLDA( w , d , T , N , ALPHA , BETA , seed , 0 );
    NN = NN + 1;
    % full() converts sparse counts for display; it is a no-op for full matrices
    maxcount = full( max( WP( : )));
    for j=1:T
        subplot( length(NITER) , T , (NN-1)*T+j );
        fmap = full( reshape( WP( : , j ) , NY , NX ));
        imagesc( fmap , [ 0 maxcount ] );
        axis off; axis square;
        if (j==1)
            title( sprintf( 'Topics after %d iterations' , N ), ...
                'HorizontalAlignment' , 'left' );
        end
    end
end
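The figures show raw counts from WP. Griffiths and Steyvers (2004) estimate the topic-word and document-topic distributions by smoothing such counts with the hyperparameters:

- phi(w, j) = (WP(w, j) + BETA) / (sum over w of WP(w, j) + W*BETA)
- theta(d, j) = (DP(d, j) + ALPHA) / (sum over j of DP(d, j) + T*ALPHA)

The sketch below applies these estimates to small hypothetical count matrices. It assumes WP is W x T, as the script's use of WP( : , j ) implies, and assumes DP is D x T. Because it reuses the script's variable names, run it in a fresh workspace.

```matlab
% Hypothetical toy counts standing in for GibbsSamplerLDA output
W = 4; T = 2; D = 3;
WP = [ 5 0 ; 3 1 ; 0 6 ; 0 2 ];   % W x T word-topic counts
DP = [ 4 1 ; 3 2 ; 1 6 ];         % D x T document-topic counts (assumed orientation)
ALPHA = 1; BETA = 1;

% phi(:, j) is the estimated word distribution of topic j (columns sum to 1)
phi = ( full( WP ) + BETA ) ./ repmat( full( sum( WP , 1 )) + W*BETA , W , 1 );

% theta(d, :) is the estimated topic mixture of document d (rows sum to 1)
theta = ( full( DP ) + ALPHA ) ./ repmat( full( sum( DP , 2 )) + T*ALPHA , 1 , T );
```

Reshaping phi( : , j ) to NY x NX gives a normalized version of the topic images in figure 3.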