LION COMMUNITY USAGE CASE

LIONbook: K nearest neighbors: Iris class identification.

This is an exercise associated with the LIONbook.
If you are new to the LIONoso software, you may want to download it to do the exercise.

This is perhaps the best-known (simple) data set in machine learning. The task is to predict the class of an iris plant. The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant (Iris Setosa, Iris Versicolor, Iris Virginica). Each instance has four numeric attributes, all geometric measures of the flower: sepal length, sepal width, petal length, and petal width, each in cm.
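To make the layout of the data concrete, here is a minimal Python sketch that parses rows of this shape into numeric features plus a class label. The column names and the three sample rows are assumptions made for illustration; the header of the actual LIONoso data file may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical header and a three-row excerpt (one row per class).
# The real file has 50 instances per class and may name its columns differently.
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,Class
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

def load_rows(text):
    """Parse CSV text into (features, label) pairs with numeric features."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        label = record.pop("Class")
        features = [float(value) for value in record.values()]
        rows.append((features, label))
    return rows

rows = load_rows(SAMPLE_CSV)
class_counts = Counter(label for _, label in rows)
print(len(rows[0][0]), "attributes per instance")
print(dict(class_counts))
```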

Step 1: Load data files, load KNN factory, start training

Drag the icons in the left control window to the right "workbench" part. Connect the KNN (K-nearest neighbors) model factory to the training file (by dropping it on the file icon, or by drawing an arrow between them). When done, click on "start training", as in the figure.
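For readers curious about what a K-nearest neighbors model does with the training file, the idea can be sketched in a few lines of Python. This is a plain majority-vote version with Euclidean distance and k = 3, both of which are assumptions; it is not a description of how LIONoso implements its KNN factory. The training rows are a small excerpt, three per class, in place of the full training file.

```python
import math
from collections import Counter

# Small excerpt of training rows:
# (sepal length, sepal width, petal length, petal width) in cm, plus the class.
TRAIN = [
    ((5.1, 3.5, 1.4, 0.2), "Iris-setosa"),
    ((4.9, 3.0, 1.4, 0.2), "Iris-setosa"),
    ((4.7, 3.2, 1.3, 0.2), "Iris-setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Iris-versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "Iris-versicolor"),
    ((6.9, 3.1, 4.9, 1.5), "Iris-versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "Iris-virginica"),
    ((5.8, 2.7, 5.1, 1.9), "Iris-virginica"),
    ((7.1, 3.0, 5.9, 2.1), "Iris-virginica"),
]

def knn_predict(train, query, k=3):
    """Return the majority class among the k training points closest to query."""
    # "Training" a plain KNN model amounts to storing the examples;
    # the work happens at prediction time.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tied vote, most_common keeps the class seen first, i.e. the nearer one.
    return votes.most_common(1)[0][0]

small_flower = knn_predict(TRAIN, (5.0, 3.4, 1.5, 0.2))
large_flower = knn_predict(TRAIN, (6.5, 3.0, 5.8, 2.2))
print(small_flower)
print(large_flower)
```

The value of k and the distance measure are the main choices in such a model; a larger k smooths the decision boundaries at the cost of blurring small classes.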

Step 2: Visualize how the input space is subdivided into three classes.

Right-click on the KNN classifier produced and select "new sweeper panel".

Move between "workbench" and "dashboard" views by clicking on the top-left tabs. In the dashboard, in the "sweeper" visualization, each of the three output classes is shown in a different color. Move the sliders in the bottom-left control window to change the values of some input parameters. The image visualizes a two-dimensional slice of the input space; the color of each point corresponds to its predicted class.
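The idea of a two-dimensional slice can be reproduced outside the tool with a rough Python sketch: two inputs are held fixed (the role of the sliders), the other two are swept over a grid, and every grid point is classified. The choice of which attributes to fix, the grid ranges, the letter symbols, and the small training excerpt are all assumptions for illustration, and the simple majority-vote KNN here is not necessarily what LIONoso uses.

```python
import math
from collections import Counter

# Small excerpt of training rows (sepal length, sepal width, petal length, petal width).
TRAIN = [
    ((5.1, 3.5, 1.4, 0.2), "Iris-setosa"),
    ((4.9, 3.0, 1.4, 0.2), "Iris-setosa"),
    ((4.7, 3.2, 1.3, 0.2), "Iris-setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Iris-versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "Iris-versicolor"),
    ((6.9, 3.1, 4.9, 1.5), "Iris-versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "Iris-virginica"),
    ((5.8, 2.7, 5.1, 1.9), "Iris-virginica"),
    ((7.1, 3.0, 5.9, 2.1), "Iris-virginica"),
]

# One character per class stands in for the three colors of the sweeper.
SYMBOLS = {"Iris-setosa": "s", "Iris-versicolor": "v", "Iris-virginica": "V"}

def knn_predict(train, query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def sweep_slice(train, sepal_length, sepal_width, steps=8):
    """Classify a grid over petal length (columns) and petal width (rows)."""
    grid = []
    for row in range(steps):
        # Top row holds the largest petal width, like a y axis pointing up.
        petal_width = 2.5 - row * 2.5 / (steps - 1)
        line = ""
        for col in range(steps):
            petal_length = 1.0 + col * 6.0 / (steps - 1)
            query = (sepal_length, sepal_width, petal_length, petal_width)
            line += SYMBOLS[knn_predict(train, query)]
        grid.append(line)
    return grid

# Fixing the two sepal measures plays the role of the sliders.
grid = sweep_slice(TRAIN, sepal_length=6.0, sepal_width=3.0)
for line in grid:
    print(line)
```

Calling `sweep_slice` again with different fixed values corresponds to moving a slider: the same plane is redrawn at a different position in the four-dimensional input space.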

Step 3: Test the KNN classifier.

Load the iris-test.csv file and connect it to the KNN classifier to get an output table. The produced table will contain a column with the desired output ("Class-target") and a new column with the predicted output ("Class").

In this simple case, all 30 test examples are correctly recognized: the "Class" and "Class-target" columns coincide.
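If the output table is exported, the comparison between the two columns can be checked with a short Python sketch like the one below. Only the column names "Class-target" and "Class" come from this exercise; the three rows are a hypothetical excerpt, and the export format is an assumption.

```python
import csv
import io

# Hypothetical excerpt of an output table; the real table has 30 rows
# and also contains the input attributes.
OUTPUT_TABLE = """Class-target,Class
Iris-setosa,Iris-setosa
Iris-versicolor,Iris-versicolor
Iris-virginica,Iris-virginica
"""

def count_matches(table_text):
    """Return (number of rows where prediction equals target, total rows)."""
    rows = list(csv.DictReader(io.StringIO(table_text)))
    correct = sum(1 for row in rows if row["Class"] == row["Class-target"])
    return correct, len(rows)

correct, total = count_matches(OUTPUT_TABLE)
print(f"{correct} of {total} test examples correctly recognized")
```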

Step 4: Explore the data.

Go to the dashboard. Feel free to drag attributes from the left control window to the dashboard panel and to try the different visualizations.

A bubblechart combined with a "parallel coordinates" display is shown below. Handles along the parallel coordinates can be used to filter the data.
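Filtering with handles on parallel coordinates amounts to keeping only the rows whose value on each constrained axis lies between the two handles. A minimal Python sketch of that idea follows; the attribute names, the handle positions, and the five-row excerpt are assumptions chosen for illustration.

```python
# Small excerpt of rows with hypothetical attribute names.
ROWS = [
    {"petal_length": 1.4, "petal_width": 0.2, "Class": "Iris-setosa"},
    {"petal_length": 4.7, "petal_width": 1.4, "Class": "Iris-versicolor"},
    {"petal_length": 4.5, "petal_width": 1.5, "Class": "Iris-versicolor"},
    {"petal_length": 6.0, "petal_width": 2.5, "Class": "Iris-virginica"},
    {"petal_length": 5.1, "petal_width": 1.9, "Class": "Iris-virginica"},
]

def filter_rows(rows, ranges):
    """Keep rows whose value lies inside [low, high] on every constrained axis."""
    return [
        row
        for row in rows
        if all(low <= row[name] <= high for name, (low, high) in ranges.items())
    ]

# Each entry mimics a pair of handles placed on one axis.
selected = filter_rows(
    ROWS,
    {"petal_length": (4.0, 5.5), "petal_width": (1.0, 2.0)},
)
for row in selected:
    print(row)
```

Axes that are absent from `ranges` are left unconstrained, which matches leaving the handles of an axis at its extremes.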

Familiarize yourself with the LIONoso interface and have fun.

Example data.

References:

[1] Fisher, R.A. "The use of multiple measurements in taxonomic problems." Annals of Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to Mathematical Statistics" (John Wiley, NY, 1950).
[LIONbook] Roberto Battiti and Mauro Brunato. "The LION way." LIONlab, University of Trento, Feb 2014.