Example 1 of applying topic models to images

Illustrates the "bars" example as discussed in Griffiths, T., & Steyvers, M. (2004). Finding Scientific Topics. Proceedings of the National Academy of Sciences, 101 (suppl. 1), 5228-5235.

This program creates a dataset of 500 images, each containing 25 pixels in a 5 x 5 grid. The intensity of each pixel is a nonnegative integer count. The dataset has exactly the same form as a word-document co-occurrence (count) matrix built from a collection of documents: each image is a document, each pixel is a word, and the intensity of a pixel is that word's frequency in the document. The images are generated as follows. First, a set of 10 topics is defined, corresponding to the 5 vertical and 5 horizontal bars. Then, for each image, a multinomial distribution over topics is sampled from a Dirichlet distribution with hyperparameter ALPHA, and 100 pixel tokens (words) are sampled: for each token, a topic is drawn from the image's topic distribution, and then a pixel is drawn from that topic.
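The generative process just described can be sketched in a few lines of Python. This is an illustrative, standard-library-only sketch rather than the script's own code: the helper names (sample_dirichlet, sample_categorical, generate_image) are hypothetical, indices are 0-based, and the Dirichlet draw uses the standard construction of normalizing independent Gamma(alpha, 1) variates, which is assumed here to stand in for the drchrnd helper the MATLAB code calls.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) by normalizing
    independent Gamma(alpha_k, 1) draws (standard construction)."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]

def sample_categorical(weights, rng):
    """Return a 0-based index drawn with probability proportional to weights
    (inverse-CDF sampling, like find(ran <= cumsum(...), 1, 'first'))."""
    r = rng.random() * sum(weights)
    cumulative = 0.0
    for k, wk in enumerate(weights):
        cumulative += wk
        if r < cumulative:
            return k
    # Guard against floating-point round-off: fall back to the last
    # index that has positive weight.
    return max(k for k, wk in enumerate(weights) if wk > 0)

def generate_image(topics, alpha, n_tokens, rng):
    """Generate one 'document': a list of pixel indices (the words)."""
    theta = sample_dirichlet([alpha] * len(topics), rng)  # topic mixture for this image
    pixels = []
    for _ in range(n_tokens):
        z = sample_categorical(theta, rng)                  # pick a topic
        pixels.append(sample_categorical(topics[z], rng))   # pick a pixel from that topic
    return pixels
```

For example, generate_image(topics, 1.0, 100, random.Random(4)) returns 100 pixel indices for one image, given any list of topic weight vectors; the seed value is arbitrary.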

Parameters

D  = 500;  % number of images to create

NX = 5;    % number of columns in image (x direction; one vertical bar per column)
NY = 5;    % number of rows in image (y direction; one horizontal bar per row)

ns = 100;  % the number of pixels to sample per image

ALPHA = 1;   % ALPHA hyperparameter
BETA  = 1;   % BETA hyperparameter

NITER = [ 0 1 5 10 50 100 ]; % number of Gibbs sampling iterations for each successive run

DSHOW = 60; % maximum number of images to show
NROWS = 5;  % number of rows in image display
NCOLS = 12; % number of columns in image display

seed  = 4; % seed for the random number generator

% the number of topics is determined by the number of rows and columns of
% the images
T  = NX+NY;
W  = NX*NY; % the number of pixels in an image

rand( 'state' , seed );
randn( 'state' , seed );

Create artificial topics. Each topic is a horizontal or vertical bar in the image.

topics = zeros( T , W );

i = 0;

Creating the vertical bars

for ii=1:NX
   i = i + 1;
   topicmatrix = zeros( NY , NX );     % NY rows by NX columns
   topicmatrix( : , ii ) = 1;          % switch on every pixel in column ii
   topics( i , : ) = topicmatrix( : ); % flatten in column-major order
end

Creating the horizontal bars

for ii=1:NY
   i = i + 1;
   topicmatrix = zeros( NY , NX );     % NY rows by NX columns
   topicmatrix( ii , : ) = 1;          % switch on every pixel in row ii
   topics( i , : ) = topicmatrix( : ); % flatten in column-major order
end
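The column-major layout used above is easy to get wrong when porting the example. The following Python sketch (hypothetical function make_bar_topics, 0-based indices) builds the same NX + NY bar topics, storing pixel (x, y) at flat index x*NY + y, which is the order MATLAB's topicmatrix( : ) produces for an NY-by-NX matrix.

```python
def make_bar_topics(nx, ny):
    """Return nx + ny topics, each a flat list of length nx*ny with 1s on one bar.

    Pixel (column x, row y), both 0-based, lives at flat index x*ny + y,
    matching MATLAB's column-major topicmatrix(:) for an NY-by-NX matrix.
    """
    topics = []
    for x in range(nx):          # vertical bars: one full column each
        t = [0] * (nx * ny)
        for y in range(ny):
            t[x * ny + y] = 1
        topics.append(t)
    for y in range(ny):          # horizontal bars: one full row each
        t = [0] * (nx * ny)
        for x in range(nx):
            t[x * ny + y] = 1
        topics.append(t)
    return topics
```

With the parameters above, make_bar_topics(5, 5) yields 10 topics of 25 pixels each, and every bar covers 5 pixels.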

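fmaps = zeros( D , NX , NY ); % pixel counts for each image, indexed (image, x, y)
w     = zeros( D * ns , 1 );  % word (pixel) index of each token
d     = zeros( size( w ));    % document (image) index of each token
n     = 0;                    % running token counter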
for j=1:D
    % sample a multinomial distribution over topics from a Dirichlet prior
    % (drchrnd is not built into MATLAB; it must be available on the path)
    theta = drchrnd( repmat( ALPHA , 1 , T ) , 1 );
    cumtheta  = cumsum( theta );
    for k=1:ns
        ran  = rand * cumtheta( end ); % scale by the total to guard against round-off
        z    = find( ran <= cumtheta , 1 , 'first' ); % sample a topic for this token
        % sample a word from topic z
        cumphi = cumsum( topics( z , : ));
        ran    = rand * cumphi( end );
        index  = find( ran <= cumphi , 1 , 'first' );
        n = n + 1;
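        w( n )  = index; % store the word (pixel) index of the n-th token
        d( n )  = j;     % store the document (image) index of the n-th token
        % convert the column-major pixel index into (x, y) coordinates
        y = mod( index-1 , NY )+1;
        x = floor( (index-1)/NY )+1;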
        fmaps( j,x,y ) = fmaps( j,x,y ) + 1;
    end
end
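To make the equivalence with a word-document count matrix concrete, here is a hedged Python sketch (hypothetical helper names, 0-based indices) that turns parallel token lists like w and d into a per-document count matrix, and converts a flat column-major pixel index into (x, y) coordinates the way the loop above does with mod and floor.

```python
def tokens_to_counts(w, d, n_docs, n_words):
    """Build an n_docs x n_words count matrix from parallel token lists.

    w[n] is the 0-based word (pixel) index of token n and d[n] its 0-based
    document (image) index, mirroring the MATLAB vectors w and d.
    """
    counts = [[0] * n_words for _ in range(n_docs)]
    for word, doc in zip(w, d):
        counts[doc][word] += 1
    return counts

def flat_to_xy(index, ny):
    """Convert a 0-based flat pixel index to 0-based (x, y), column-major."""
    return index // ny, index % ny
```

Each row of the resulting matrix is one image's pixel histogram, the same information fmaps holds in three-dimensional form; a topic model only ever sees these counts (or, equivalently, the token lists).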

Display topics (i.e., bars) used to generate the images

figure( 1 ); clf;
colormap( gray );
maxcount = max( topics( : ));
for j=1:T
    subplot( 2+length(NITER) , T , j );
    fmap = reshape( topics( j , : ) , NY , NX );
    imagesc( fmap , [ 0 maxcount ] );
    axis off;
    axis square;
    if (j==1) title( 'Topics used to create image mixtures' , ...
            'HorizontalAlignment' , 'left' ); end
end

Display a subset of the artificial images created

maxcount = max( fmaps( : ));
figure( 2 ); clf;
colormap( gray );
for wh=1:DSHOW
    subplot( (NROWS) +1 ,  NCOLS , wh );
    fmap = squeeze( fmaps( wh,:,:))';
    imagesc( fmap , [ 0 maxcount ] );
    axis off;
    axis square;

    if (wh==1) title( 'A subset of images created by mixing topics' , ...
            'HorizontalAlignment' , 'left' ); end
end

Run topic model and display extracted topics

figure( 3 ); clf;
colormap gray;

NN = 0;
for N=NITER
    [ WP,DP,Z ] = GibbsSamplerLDA( w , d , T , N , ALPHA , BETA , seed , 0 );
    NN = NN + 1;
    maxcount = full( max( WP( : ))); % WP is returned as a sparse W x T count matrix
    for j=1:T
        subplot( length(NITER) , T , (NN-1)*T+j );
        fmap = full( reshape( WP( : , j )' , NY , NX ));
        imagesc( fmap , [ 0 maxcount ] );
        axis off;
        axis square;

        if (j==1) title( sprintf( 'Topics after %d iterations' , N ), ...
                'HorizontalAlignment' , 'left' ); end
    end
end
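The internals of GibbsSamplerLDA are not shown in this script. As a hedged illustration only, the Python sketch below implements the collapsed Gibbs update described by Griffiths & Steyvers (2004): each token's topic is resampled from P(z_i = j | z_-i, w) ∝ (n_wj + BETA) / (n_j + W*BETA) * (n_dj + ALPHA), where n_wj counts tokens of word w assigned to topic j, n_j counts all tokens assigned to j, and n_dj counts tokens in document d assigned to j, all excluding token i. (The document-length denominator in their equation is the same for every j and is dropped.) The function name gibbs_lda and its argument order are hypothetical; wp and dp are named to echo WP and DP, and indices are 0-based.

```python
import random

def gibbs_lda(w, d, n_topics, n_words, n_docs, n_iter, alpha, beta, seed=0):
    """Collapsed Gibbs sampling for LDA (Griffiths & Steyvers, 2004 update).

    w, d: parallel lists of 0-based word and document indices, one per token.
    Returns (wp, dp, z): word-topic counts, document-topic counts, and the
    topic assignment of each token.
    """
    rng = random.Random(seed)
    wp = [[0] * n_topics for _ in range(n_words)]   # word-topic counts
    dp = [[0] * n_topics for _ in range(n_docs)]    # document-topic counts
    ztot = [0] * n_topics                           # total tokens per topic
    z = []
    # Random initial assignment of every token to a topic
    for word, doc in zip(w, d):
        t = rng.randrange(n_topics)
        z.append(t)
        wp[word][t] += 1
        dp[doc][t] += 1
        ztot[t] += 1
    wbeta = n_words * beta
    for _ in range(n_iter):
        for i, (word, doc) in enumerate(zip(w, d)):
            t = z[i]
            # Remove token i from all counts
            wp[word][t] -= 1
            dp[doc][t] -= 1
            ztot[t] -= 1
            # Unnormalized full conditional for each topic j
            probs = [(wp[word][j] + beta) / (ztot[j] + wbeta) * (dp[doc][j] + alpha)
                     for j in range(n_topics)]
            # Inverse-CDF draw of the new topic
            r = rng.random() * sum(probs)
            t, acc = 0, probs[0]
            while r >= acc and t < n_topics - 1:
                t += 1
                acc += probs[t]
            # Add token i back under its new topic
            z[i] = t
            wp[word][t] += 1
            dp[doc][t] += 1
            ztot[t] += 1
    return wp, dp, z
```

With n_iter=0 this sketch returns only the random initial assignment. A point estimate of each topic's pixel distribution can be read off the counts as (wp[w][j] + beta) / (ztot_j + W*beta), which is why plotting a column of WP directly already shows the shape of each recovered bar.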