LION COMMUNITY USAGE CASE

LIONbook: Linear regression: Handwritten digit recognition.

This exercise accompanies chapter 4 of the LIONbook [LIONbook] and the "Learning from data" (LFD) book [aml] and course by Prof. Yaser Abu-Mostafa (Caltech).

Recognizing handwritten digits is a classic and very useful use case in learning from data. One starts from 16x16 pixel images of digits from the US Postal Service Zip Code Database, each labeled with its class (0, 1, 2, ..., 9), and learns a map that generalizes to new digits, that is, digits not present in the learning database. To measure this generalization, the original set is randomly split into two subsets: one for training, the other for testing the performance.
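The random split can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce the tutorial files: the function name split_train_test, the test fraction, and the seed are assumptions chosen for the example.

```python
import random


def split_train_test(examples, test_fraction=0.25, seed=0):
    """Randomly split examples into (train, test) lists.

    test_fraction and seed are illustrative defaults, not values
    taken from the tutorial.
    """
    shuffled = list(examples)              # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy usage: eight placeholder examples identified by an index.
train, test = split_train_test(range(8))
print(len(train), "training examples,", len(test), "test examples")
```

Fixing the seed keeps the split reproducible, so the same test examples are held out every time the experiment is repeated.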

Using linear regression in LIONoso is straightforward if you are already familiar with the user interface. The following description is a tutorial, but be sure to read the first pages of the LIONoso manual (included in the software distribution) to familiarize yourself with the user interface.

Step 1: Load data files and linear regression, connect and set parameters

Linear regression is available in LIONoso through the polynomial fit factory.

To load the data files and the polynomial fit factory (the creator of linear models), drag the corresponding "CSV file" and "polynomial fit factory" to the workbench on the right, and connect them by drawing an arrow. The files one-five-train.csv and one-five-test.csv contain the intensity and symmetry features for the "one" and "five" digits only. Intensity summarizes the overall number of black pixels; symmetry describes how similar a character is to its mirror image, as described in [aml]. These data were made available by the neural network group at AT&T research labs (thanks to Yann LeCun).
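As a rough illustration of the two features, the following Python sketch computes them for a small image stored as a list of rows. The exact definitions used to build the CSV files are not given here, so the sketch assumes pixel values from 0 (white) to 1 (black), takes intensity as the average pixel value, and takes symmetry as the negative mean absolute difference between the image and its left-right mirror image. The function names are hypothetical.

```python
def intensity(image):
    """Average pixel value of the image (more ink gives a higher value)."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def symmetry(image):
    """Negative mean absolute difference between the image and its
    left-right mirror image; 0 means perfectly symmetric.

    This definition is an assumption made for illustration.
    """
    total = 0.0
    count = 0
    for row in image:
        for a, b in zip(row, row[::-1]):  # compare each row with its mirror
            total += abs(a - b)
            count += 1
    return -total / count


# Toy 4x4 images (real images in the tutorial are 16x16).
vertical_bar = [[0, 1, 1, 0]] * 4   # symmetric, like a "one"
left_heavy = [[1, 1, 0, 0]] * 4     # all ink on the left half

for name, img in [("vertical_bar", vertical_bar), ("left_heavy", left_heavy)]:
    print(name, intensity(img), symmetry(img))
```

Under these assumptions a centered vertical stroke scores 0 on symmetry, while a shape with all its ink on one side scores -1.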

The parameters for the "polynomial fit factory" are: inputs (Symmetry and Intensity), output (Class), and Polynomial degree 1 (for a linear model). Click "Start training" to create the linear model (shown with the same icon, but without the gear symbol).
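Outside LIONoso, a polynomial fit of degree 1 can be read as ordinary least squares on the two inputs plus a constant term. The sketch below solves the normal equations in plain Python. It illustrates the technique and is not LIONoso's implementation; the function names and the toy data are made up for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(points, targets):
    """Least-squares weights [w0, w1, ...] for target ~ w0 + sum(wi * xi)."""
    rows = [[1.0] + list(p) for p in points]  # prepend the constant term
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve(xtx, xty)


def predict(weights, point):
    """Output of the linear model for one input point."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], point))


# Hypothetical (intensity, symmetry) pairs with Class 1 or 5.
points = [(0.10, -0.05), (0.15, -0.10), (0.12, -0.08),
          (0.35, -0.40), (0.40, -0.55), (0.30, -0.45)]
classes = [1, 1, 1, 5, 5, 5]

weights = fit_linear(points, classes)
print("weights:", weights)
print("output for first point:", predict(weights, points[0]))
```

The model output is a real number, not a class label; turning it into "one" or "five" requires a threshold, as done in the next step.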

Step 2: Analyze the mapping

Right-click the linear model icon and select "New sweeper panel". Click on Dashboard to visualize the output sweeper: the inputs of the linear regression are swept over a range, and the output is color-coded to illustrate how the input space is mapped to the output. After putting a threshold at the middle value, the space is split into two areas, corresponding to the digits "one" and "five".
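The thresholding step can be sketched as follows, assuming the Class column holds the values 1 and 5, so that the middle value is 3. The weights below are hypothetical placeholders, not values produced by LIONoso, and the function names are made up for the example.

```python
def linear_output(weights, intensity, symmetry):
    """Real-valued output of a linear model with weights (w0, w_int, w_sym)."""
    w0, w_int, w_sym = weights
    return w0 + w_int * intensity + w_sym * symmetry


def classify(output, low_class=1, high_class=5):
    """Map the real-valued output to a class by thresholding at the midpoint."""
    threshold = (low_class + high_class) / 2  # 3 for classes 1 and 5
    return high_class if output > threshold else low_class


def sweep(weights, intensities, symmetries):
    """Classify every point of a grid: one row per symmetry value."""
    return [[classify(linear_output(weights, i, s)) for i in intensities]
            for s in symmetries]


weights = (1.0, 8.0, -2.0)  # hypothetical weights, for illustration only
intensities = [0.0, 0.1, 0.2, 0.3, 0.4]
symmetries = [0.0, -0.2, -0.4, -0.6]

# Text version of the sweeper: each cell shows the predicted class.
for row in sweep(weights, intensities, symmetries):
    print(" ".join(str(c) for c in row))
```

Because the model is linear, the boundary between the two areas is a straight line in the intensity-symmetry plane: the set of points where the output equals the threshold.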

Step 3: Visualize target classification of training set

Right-click the icon of the "one-five-train" table and select "New panel - Bubble". Click on Dashboard to visualize the Bubble-chart. Drag the variables Intensity and Symmetry (if not already visualized), and color data with the Class parameter.

Step 4: Visualize classification of test set

Open the one-five-test.csv file to get the "one-five-test" table. Connect the new table to the linear node to get the predicted output class. Right-click the icon with the prediction table and select "New panel - Bubble". Click on Dashboard to visualize the Bubble-chart. Drag the variables Intensity and Symmetry (if not already visualized), and color data with the Class parameter.

As expected from the sweeper display, the predicted classification (right plot) corresponds to the linear classification. Additional insight into the linear model, the classification error, the classification statistics, etc. can be obtained by associating the results in the tables with the desired LIONoso plots. The following image shows the results for the "one-five-test.csv" file containing the test digits: a bar chart with subclasses showing how, in the test file, each desired "Class-target" value is classified into the different "Class" values. For example, the limited amount of confusion between digits "1" and "5" is immediately spotted. The whole experiment takes less than one minute in LIONoso.
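The information shown by the bar chart with subclasses can also be tabulated directly. The sketch below counts how often each "Class-target" value is assigned to each predicted "Class" value and computes the classification error. The two lists are made-up placeholders, not results from the tutorial data, and the function names are hypothetical.

```python
from collections import Counter


def confusion_counts(targets, predictions):
    """Count (target, predicted) pairs, e.g. (1, 5) = a "one" read as "five"."""
    return Counter(zip(targets, predictions))


def error_rate(targets, predictions):
    """Fraction of examples whose predicted class differs from the target."""
    wrong = sum(1 for t, p in zip(targets, predictions) if t != p)
    return wrong / len(targets)


# Placeholder values standing in for the "Class-target" and "Class" columns.
class_target = [1, 1, 1, 5, 5, 5]
class_predicted = [1, 1, 5, 5, 5, 1]

counts = confusion_counts(class_target, class_predicted)
for (target, predicted), n in sorted(counts.items()):
    print(f"target {target} classified as {predicted}: {n}")
print("error rate:", error_rate(class_target, class_predicted))
```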

Example data.

References:

[LIONbook] The LION way
Roberto Battiti and Mauro Brunato. LIONlab, University of Trento, Feb 2014.
[aml] Learning from data
Yaser S. Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin. 2012.
Download the LIONoso-ready data file: zipcode-digits-0-5.lion