Wednesday, May 3, 2017

Machine Learning Classification with 1R and RIPPER Rule Learners (Edible/Poisonous Mushrooms)

We will develop a classification example using the 1R and RIPPER rule-learning algorithms. The exercise was originally published in "Machine Learning with R" by Brett Lantz, Packt Publishing, 2015 (open source, community experience distilled).

The example classifies mushrooms as edible or poisonous.

We follow the exercise as published in that book, adding a few simple transformations to the dataset.

For more details on the algorithms, see the same reference.


### install and load required packages
#install.packages("RWeka")

library(RWeka)

### read and explore the data
mushrooms <- read.csv("mushrooms.csv", stringsAsFactors = TRUE)
str(mushrooms)

### drop the $veil_type column (it does not provide
### useful information)

mushrooms$veil_type <- NULL
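
A column that takes a single value in every row cannot help separate the classes. Below is a minimal sketch of how such columns can be spotted before dropping them. It uses a made-up data frame (`toy` and its values are hypothetical, not the mushroom data):

```r
# Hypothetical toy data: veil_type has the same value in every row.
toy <- data.frame(
  type      = factor(c("e", "p", "e", "p")),
  odor      = factor(c("n", "f", "a", "f")),
  veil_type = factor(c("p", "p", "p", "p"))
)

# Count the distinct levels of each factor column.
level_counts <- sapply(toy, nlevels)

# Columns with fewer than two levels carry no information for classification.
constant_cols <- names(level_counts)[level_counts < 2]
constant_cols

# Drop them by name.
toy[constant_cols] <- NULL
names(toy)
```

On the real data, the same check could be run as `sapply(mushrooms, nlevels)` right after `read.csv()`.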

### look at the distribution of the mushroom type
### class variable

table(mushrooms$type)

### data transformation

# The function below maps the $type codes to full names

transformType <- function(key){
  switch (as.character(key),
          'p' = 'poisonous',
          'e' = 'edible'
  )
}

# The function below maps the $odor codes to full names
transformOdor <- function(key){
  switch (as.character(key),
          'a' = 'almond',
          'l' = 'anise',
          'c' = 'creosote',
          'y' = 'fishy',
          'f' = 'foul',
          'm' = 'musty',
          'n' = 'none',
          'p' = 'pungent',
          's' = 'spicy'
  )
}

### apply transformations
mushrooms$type <- sapply(mushrooms$type, transformType)
mushrooms$odor <- sapply(mushrooms$odor, transformOdor)

mushrooms$type <- as.factor(mushrooms$type)
mushrooms$odor <- as.factor(mushrooms$odor)
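
The same recoding can also be done without calling a function once per row. Below is a minimal sketch that relabels the factor levels through a named lookup vector. It assumes a small made-up factor (`codes` and `odor_names` are illustrative names, not part of the original exercise):

```r
# Hypothetical sample of odor codes stored as a factor.
codes <- factor(c("a", "n", "f", "n", "p"))

# Named lookup vector: names are the codes, values are the full labels.
odor_names <- c(a = "almond", l = "anise", c = "creosote", y = "fishy",
                f = "foul", m = "musty", n = "none", p = "pungent",
                s = "spicy")

# Relabel the levels in one step; the row values follow automatically.
levels(codes) <- unname(odor_names[levels(codes)])
codes
```

Because only the levels are rewritten, the column stays a factor and no second `as.factor()` call is needed.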
 

### training a model on the data

### use the 1R implementation in the RWeka package
### called OneR()


### consider all possible features to predict $type
mushroom_1R <- OneR(type ~ ., data = mushrooms)
mushroom_1R
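
To make the idea behind 1R concrete, here is a minimal base-R sketch. For each feature it predicts the majority class of every feature value, counts the errors, and keeps the feature with the fewest errors. The data frame `toy` and the helper `one_feature_errors()` are hypothetical, and this is an illustration of the idea, not the code that `OneR()` runs internally:

```r
# Hypothetical toy data (not the mushroom dataset).
toy <- data.frame(
  odor = c("none", "foul", "none", "almond", "foul", "none"),
  cap  = c("flat", "flat", "bell", "bell", "flat", "bell"),
  type = c("edible", "poisonous", "edible", "edible", "poisonous", "poisonous"),
  stringsAsFactors = TRUE
)

# Errors made by one feature: for each feature value, predict the
# majority class and count the rows that fall outside that majority.
one_feature_errors <- function(feature, class) {
  counts <- table(feature, class)
  sum(counts) - sum(apply(counts, 1, max))
}

features <- setdiff(names(toy), "type")
errors <- sapply(features, function(f) one_feature_errors(toy[[f]], toy$type))
errors

# 1R keeps the single feature with the fewest errors.
best <- names(which.min(errors))
best

# The rule itself: the majority class for each value of the chosen feature.
counts <- table(toy[[best]], toy$type)
rule <- colnames(counts)[apply(counts, 1, which.max)]
names(rule) <- rownames(counts)
rule
```

Ties are broken here by taking the first minimum, which is a simplification chosen for this sketch.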


### 1R one rule selected:
 




### evaluating model performance
summary(mushroom_1R)
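
Note that summary() scores the model on the same data it was trained on. At its core, such an evaluation is a confusion matrix plus the share of correct predictions. Below is a minimal base-R sketch on made-up vectors (`actual` and `predicted` are hypothetical, not output from the model):

```r
# Hypothetical true classes and predictions for six mushrooms.
actual    <- factor(c("edible", "edible", "poisonous",
                      "poisonous", "edible", "poisonous"))
predicted <- factor(c("edible", "edible", "poisonous",
                      "edible", "edible", "poisonous"))

# Confusion matrix: rows are the true classes, columns the predictions.
confusion <- table(actual = actual, predicted = predicted)
confusion

# Accuracy: correctly classified instances divided by all instances.
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy

# Poisonous mushrooms predicted as edible are the dangerous mistakes.
dangerous <- confusion["poisonous", "edible"]
dangerous
```

For this problem the off-diagonal cell of poisonous mushrooms classified as edible deserves more attention than the overall accuracy.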



### improving model performance
### JRip() is a Java-based implementation of the RIPPER
### rule learning algorithm

mushroom_JRip <- JRip(type ~ ., data = mushrooms)

### RIPPER selected rules (9):
mushroom_JRip
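
A RIPPER model is read as an ordered list of rules: the first rule that matches an example decides its class, and a default class covers everything left over. Below is a minimal sketch of that logic. The data frame `toy`, the two rules, and the default label are all hypothetical and are not the rules learned by `JRip()`:

```r
# Hypothetical toy data (not the mushroom dataset).
toy <- data.frame(
  odor      = c("foul", "none", "none", "almond"),
  gill_size = c("broad", "narrow", "broad", "broad"),
  stringsAsFactors = FALSE
)

# Hypothetical ordered rules: each 'covers' function returns TRUE
# for the rows that the rule matches.
rules <- list(
  list(covers = function(d) d$odor == "foul",        label = "poisonous"),
  list(covers = function(d) d$gill_size == "narrow", label = "poisonous")
)
default_label <- "edible"

# Start with the default, then apply the rules from last to first so
# that the earliest matching rule has the final say.
predicted <- rep(default_label, nrow(toy))
for (r in rev(rules)) {
  predicted[r$covers(toy)] <- r$label
}
predicted
```

On real data, `predict(mushroom_JRip, newdata)` performs the classification; the loop above only illustrates how an ordered rule list is interpreted.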




You can get the example and the dataset at:
https://github.com/pakinja/Data-R-Value