Monday, May 22, 2017

Machine Learning. Support Vector Machines (Optical Character Recognition).

It is important to mention that this series of posts began as a personal way of practicing R programming and machine learning. Subsequently, feedback from the community urged me to continue performing these exercises and sharing them. The bibliography and corresponding authors are cited at all times, and this series is a way of honoring them and giving them the credit they deserve for their work.
 
We will develop a support vector machine example. The example was originally published in "Machine Learning in R" by Brett Lantz, Packt Publishing, 2015 (Open Source: Community Experience Distilled).


The example is about optical character recognition.

 
We will carry out the exercise verbatim as published in the aforementioned reference.

For more details on the support vector machine algorithm, it is recommended to check the aforementioned reference or any other bibliography of your choice.



### "Machine Learning in R" by Brett Lantz,
### PACKT publishing 2015
### (open source community experience destilled)



### install and load required packages
install.packages("kernlab")

library(kernlab)

### read and explore the data
letters <- read.csv("letterdata.csv")
str(letters)
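
### side note (not in the original book): with R >= 4.0,
### read.csv() no longer converts strings to factors, so the
### target column may have to be converted to a factor
### explicitly before training
letters$letter <- factor(letters$letter)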

### SVMs require all features to be numeric and each
### feature to be scaled to a fairly small interval;
### the kernlab package performs the rescaling
### automatically
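
### side note (not in the original book): a quick way to see why
### rescaling matters is to inspect the raw feature ranges with
### summary(); rescaling by hand with scale() is shown only for
### illustration, since ksvm() already rescales by default
### (scaled = TRUE)
summary(letters[2:17])                        # the 16 numeric features
summary(as.data.frame(scale(letters[2:17])))  # manually rescaled copy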


### split the data into training and testing sets
### 80% - 20%
### data is already randomized

letters_train <- letters[1:16000, ]
letters_test <- letters[16001:20000, ]
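
### side note (not in the original book): the fixed split above
### works because the rows come pre-shuffled; if they did not, a
### random 80/20 split could be made instead, for example (shown
### with separate object names so it does not alter the exercise):
set.seed(123)   # arbitrary seed for reproducibility
rand_idx <- sample(nrow(letters), 0.8 * nrow(letters))
letters_train_rnd <- letters[rand_idx, ]
letters_test_rnd  <- letters[-rand_idx, ]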


### training a model on the data
### simple linear kernel function

letter_classifier <- ksvm(letter ~ ., data = letters_train,
                          kernel = "vanilladot")
letter_classifier


### evaluating model performance
letter_predictions <- predict(letter_classifier, letters_test)

head(letter_predictions)

table(letter_predictions, letters_test$letter)
agreement <- letter_predictions == letters_test$letter
table(agreement)
prop.table(table(agreement))
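
### side note (not in the original book): since agreement is a
### logical vector, the overall accuracy can also be obtained
### directly
mean(agreement)   # proportion of test letters classified correctly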


### improving model performance
### Gaussian RBF kernel

letter_classifier_rbf <- ksvm(letter ~ ., data = letters_train,
                              kernel = "rbfdot")
letter_predictions_rbf <- predict(letter_classifier_rbf,
                                  letters_test)
agreement_rbf <- letter_predictions_rbf == letters_test$letter
table(agreement_rbf)
prop.table(table(agreement_rbf))
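
### side note (not in the original book): accuracy can often be
### pushed a bit further by tuning ksvm()'s cost parameter C
### (default 1); C = 5 below is only an arbitrary example value
letter_classifier_rbf_c5 <- ksvm(letter ~ ., data = letters_train,
                                 kernel = "rbfdot", C = 5)
letter_predictions_rbf_c5 <- predict(letter_classifier_rbf_c5,
                                     letters_test)
mean(letter_predictions_rbf_c5 == letters_test$letter)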


### simply by changing the kernel function, accuracy
### increased from 84% to 93%


You can get the dataset at:
https://github.com/pakinja/Data-R-Value