Thursday, May 18, 2017

Machine Learning. Artificial Neural Networks (Strength of Concrete).

It is important to mention that this post series began as a personal way of practicing R programming and machine learning. Subsequently, feedback from the community encouraged me to continue performing these exercises and sharing them. The bibliography and corresponding authors are cited at all times, and this post series is a way of honoring them and giving them the credit they deserve for their work.
 
We will develop an artificial neural network example. The example was originally published in "Machine Learning in R" by Brett Lantz, Packt Publishing 2015 (open source community experience distilled).


The example we will develop is about predicting the strength of concrete based on the ingredients used to make it.

We will carry out the exercise verbatim as published in the aforementioned reference.

For more details on the artificial neural network algorithms, it is recommended to check the aforementioned reference or any other bibliography of your choice.


### "Machine Learning in R" by Brett Lantz,
### PACKT publishing 2015
### (open source community experience destilled)
### based on: Yeh IC. "Modeling of Strength of
### High Performance Concrete Using Artificial
### Neural Networks." Cement and Concrete Research
### 1998; 28:1797-1808.

### Strength of concrete example
### relationship between the ingredients used in
### concrete and the strength of finished product

### Dataset
### Compressive strength of concrete
### UCI Machine Learning Data Repository
### http://archive.ics.uci.edu/ml


### install and load required packages

#install.packages("neuralnet")

library(neuralnet)

### read and explore the data
concrete <- read.csv("concrete.csv")
str(concrete)

### neural networks work best when the input data
### are scaled to a narrow range around zero

### normalize the dataset values

normalize <- function(x){
  return((x - min(x)) / (max(x) - min(x)) )
}

### apply normalize() to the dataset columns
concrete_norm <- as.data.frame(lapply(concrete, normalize))

### confirm and compare normalization
summary(concrete_norm$strength)
summary(concrete$strength)

### split the data into training and testing sets
### 75% - 25%

concrete_train <- concrete_norm[1:773, ]
concrete_test <- concrete_norm[774:1030, ]

### training model on the data
concrete_model <- neuralnet(strength ~ cement + slag +
              ash + water + superplastic + coarseagg +
              fineagg + age, data = concrete_train)

### visualize the network topology

plot(concrete_model)



### there is one input node for each of the eight
### features, followed by a single hidden node and
### a single output node that predicts the concrete
### strength
### at the bottom of the figure, R reports the number
### of training steps and an error measure called
### the sum of squared errors (SSE)
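
### a quick check (not part of the original code, assuming the layout
### of neuralnet's result matrix): the error and the number of training
### steps reported in the plot are also stored in the fitted object
head(concrete_model$result.matrix, 3)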

### evaluating model performance

### predictions

model_results <- compute(concrete_model, concrete_test[1:8])
predicted_strength <- model_results$net.result

### because this is a numeric prediction problem rather
### than a classification problem, we cannot use a confusion
### matrix to examine model accuracy
### obtain correlation between our predicted concrete strength
### and the true value

cor(predicted_strength, concrete_test$strength)

### a correlation close to 1 indicates a strong linear relationship
### between the two variables
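
### a complementary measure (not part of the original exercise): the
### mean absolute error between predicted and actual normalized strength
mae <- function(actual, predicted) {
  mean(abs(actual - predicted))
}
mae(concrete_test$strength, predicted_strength)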

### improving model performance
### increase the number of hidden nodes to five

concrete_model2 <- neuralnet(strength ~ cement + slag +
                 ash + water + superplastic + coarseagg +
            fineagg + age, data = concrete_train, hidden = 5)


plot(concrete_model2)




### SSE has been reduced significantly

### predictions
model_results2 <- compute(concrete_model2, concrete_test[1:8])
predicted_strength2 <- model_results2$net.result

### performance
cor(predicted_strength2, concrete_test$strength)
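
### a small sketch (not part of the original post): the predictions are on
### the 0-1 normalized scale; reversing the min-max scaling expresses them
### in the original strength units (unnormalize() is our own helper)
unnormalize <- function(x, orig) {
  return(x * (max(orig) - min(orig)) + min(orig))
}
head(unnormalize(predicted_strength2, concrete$strength))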

### notice that results can differ because neuralnet
### begins with random weights
### if you'd like to match results exactly, use set.seed(12345)
### before building the neural network 
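
### a minimal sketch of that suggestion (the object name
### concrete_model_seeded is ours, not from the original code)
set.seed(12345)
concrete_model_seeded <- neuralnet(strength ~ cement + slag +
                 ash + water + superplastic + coarseagg +
                 fineagg + age, data = concrete_train, hidden = 5)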

  
You can get the example and the dataset in:
https://github.com/pakinja/Data-R-Value

 

Wednesday, May 17, 2017

Training Neural Networks with Backpropagation. Original Publication.

Neural networks have been a very important area of scientific study, advanced by different disciplines such as mathematics, biology, psychology, computer science, etc.
The study of neural networks leapt from theory to practice with the emergence of computers.
Training a neural network by adjusting the weights of its connections is computationally very expensive, so application to practical problems had to wait until the mid-1980s, when a more efficient training algorithm was discovered.

That algorithm is now known as backpropagation of errors, or simply backpropagation.

One of the most cited articles on this algorithm is:

Learning representations by back-propagating errors
David E. Rumelhart, Geoffrey E. Hinton & Ronald J. Williams
Nature 323, 533-536 (09 October 1986)



Although it is a very technical article, anyone who wants to study and understand neural networks should work through this material.

I share the entire article in:
https://github.com/pakinja/Data-R-Value

Tuesday, May 16, 2017

Machine Learning. Stock Market Data, Part 3: Quadratic Discriminant Analysis and KNN.

It is important to mention that this post series began as a personal way of practicing R programming and machine learning. Subsequently, feedback from the community encouraged me to continue performing these exercises and sharing them. The bibliography and corresponding authors are cited at all times, and this post series is a way of honoring them and giving them the credit they deserve for their work.

We will develop quadratic discriminant analysis and K-nearest neighbors examples. The examples were originally published in "An Introduction to Statistical Learning. With applications in R" by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. Springer 2015.




The example we will develop is about classifying whether the market will rise (Up) or fall (Down).
 

We will carry out the exercise verbatim as published in the aforementioned reference and only with slight changes in the coding style.

For more details on the models, algorithms and parameters interpretation, it is recommended to check the aforementioned reference or any other bibliography of your choice.



### "An Introduction to Statistical Learning.
### With applications in R" by Gareth James,
### Daniela Witten, Trevor Hastie and Robert Tibshirani.
### Springer 2015.


### install and load required packages

library(ISLR)
library(psych)
library(MASS)
library(class)

### split the dataset into train and test sets
train <- (Smarket$Year < 2005)
Smarket.2005 <- Smarket[!train, ]
Direction.2005 <- Smarket$Direction[!train]

### Quadratic Discriminant Analysis

### perform quadratic discriminant analysis QDA on the stock
### market data

qda.fit <- qda(Direction ~ Lag1 + Lag2, data = Smarket, subset = train)

### the output does not contain the coefficients of the linear discriminants,
### because the QDA classifier involves a quadratic, rather than a linear,
### function of the predictors

qda.fit

### predictions
qda.class <- predict(qda.fit, Smarket.2005)$class
table(qda.class, Direction.2005)
mean(qda.class == Direction.2005)

### QDA predictions are accurate almost 60 % of the time,
### even though the 2005 data was not used to fit the model
### quite impressive for stock market data, which is known to
### be quite hard to model accurately
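
### a quick sketch (not part of the original exercise): the same
### confusion matrix broken down into per-class accuracy, i.e. the
### fraction of true Down and Up days predicted correctly
qda.tab <- table(qda.class, Direction.2005)
diag(qda.tab) / colSums(qda.tab)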

### K-Nearest Neighbors

### split the dataset into train and test sets

train.X <- cbind(Smarket$Lag1, Smarket$Lag2)[train, ]
test.X <- cbind(Smarket$Lag1, Smarket$Lag2)[!train, ]
train.Direction <- Smarket$Direction[train]

### knn() function can be used to predict the market’s movement
### for the dates in 2005

set.seed(1)
knn.pred <- knn(train.X, test.X, train.Direction, k = 1)
table(knn.pred, Direction.2005)                  
(83+43)/252                  

### the results using K = 1 are not very good, since only 50 % of the
### observations are correctly predicted

### try k=3

knn.pred <- knn(train.X, test.X, train.Direction, k = 3)
table(knn.pred, Direction.2005)
mean(knn.pred == Direction.2005)

### the results have improved slightly. But increasing K further
### turns out to provide no further improvements. It appears that for
### this data, QDA provides the best results of the methods that we
### have examined so far.
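
### a quick sketch (not part of the original exercise) to check that
### claim: test-set accuracy for a few values of k
for (k in c(1, 3, 5, 7, 9)) {
  set.seed(1)
  pred <- knn(train.X, test.X, train.Direction, k = k)
  cat("k =", k, "accuracy =", round(mean(pred == Direction.2005), 3), "\n")
}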

You can get the example in:
https://github.com/pakinja/Data-R-Value 

Monday, May 15, 2017

Machine Learning. Stock Market Data, Part 2: Linear Discriminant Analysis.

It is important to mention that this post series began as a personal way of practicing R programming and machine learning. Subsequently, feedback from the community encouraged me to continue performing these exercises and sharing them. The bibliography and corresponding authors are cited at all times, and this post series is a way of honoring them and giving them the credit they deserve for their work.
 
We will develop a linear discriminant analysis example. The exercise was originally published in "An Introduction to Statistical Learning. With applications in R" by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. Springer 2015.

The example we will develop is about classifying whether the market will rise (Up) or fall (Down).
 
We will carry out the exercise verbatim as published in the aforementioned reference and only with slight changes in the coding style.

For more details on the models, algorithms and parameters interpretation, it is recommended to check the aforementioned reference or any other bibliography of your choice.



### "An Introduction to Statistical Learning.
### with applications in R" by Gareth James,
### Daniela Witten, Trevor Hastie and Robert Tibshirani.
### Springer 2015.


### install and load required packages

library(ISLR)
library(psych)
library(MASS)

### perform linear discriminant analysis LDA on the stock
### market data

train <- (Smarket$Year < 2005)
Smarket.2005 <- Smarket[!train, ]
Direction.2005 <- Smarket$Direction[!train]

lda.fit <- lda(Direction ~ Lag1 + Lag2, data = Smarket, subset = train)
lda.fit





### LDA indicates that 49.2% of training observations
### correspond to days during which the market went down

### group means suggest that there is a tendency for the
### previous 2 days' returns to be negative on days when the
### market increases, and a tendency for the previous days'
### returns to be positive on days when the market declines

### the coefficients of linear discriminants provide the
### linear combination of Lag1 and Lag2 that is used to form
### the LDA decision rule

### if (−0.642 * Lag1 − 0.514 * Lag2) is large, then the LDA
### classifier will predict a market increase, and if it is
### small, then the LDA classifier will predict a market decline

### plot() function produces plots of the linear discriminants,
### obtained by computing (−0.642 * Lag1 − 0.514 * Lag2) for
### each of the training observations

plot(lda.fit)
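
### a small sketch (not part of the original exercise): the discriminant
### values described above can also be computed by hand from lda.fit$scaling;
### up to the constant shift used for centering, they correspond to the
### values shown by plot(lda.fit)
train.ld <- as.matrix(Smarket[train, c("Lag1", "Lag2")]) %*% lda.fit$scaling
head(train.ld)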




### predictions

### the LDA and logistic regression predictions are almost identical
lda.pred <- predict(lda.fit, Smarket.2005)
names(lda.pred)

lda.class <- lda.pred$class
table(lda.class, Direction.2005)
mean(lda.class == Direction.2005)

### applying a 50% threshold to the posterior probabilities allows
### us to recreate the predictions in lda.pred$class

sum(lda.pred$posterior[, 1] >= 0.5)
sum(lda.pred$posterior[, 1] < 0.5)

### posterior probability output by the model corresponds to
### the probability that the market will decrease

lda.pred$posterior[1:20 ,1]
lda.class[1:20]

### use a posterior probability threshold other than 50 % in order
### to make predictions

### suppose that we wish to predict a market decrease only if we
### are very certain that the market will indeed decrease on that
### day, say, if the posterior probability is at least 90%

sum(lda.pred$posterior[, 1] > 0.9)

### No days in 2005 meet that threshold! In fact, the greatest
### posterior probability of decrease in all of 2005 was 52.02%
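
### a small sketch (not part of the original exercise): building a class
### vector with the 90% threshold; since no day exceeds it, every
### prediction is "Up"
lda.class90 <- rep("Up", nrow(Smarket.2005))
lda.class90[lda.pred$posterior[, 1] > 0.9] <- "Down"
table(lda.class90, Direction.2005)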


You can get the example in:
https://github.com/pakinja/Data-R-Value 
 

Sunday, May 14, 2017

Machine Learning. Stock Market Data, Part 1: Logistic Regression.

It is important to mention that this post series began as a personal way of practicing R programming and machine learning. Subsequently, feedback from the community encouraged me to continue performing these exercises and sharing them.
The bibliography and corresponding authors are cited at all times, and this post series is a way of honoring them and giving them the credit they deserve for their work.

We will develop a logistic regression example. The exercise was originally published in "An Introduction to Statistical Learning. With applications in R" by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. Springer 2015.


The example we will develop is about predicting whether the market will rise (Up) or fall (Down).

We will carry out the exercise verbatim as published in the aforementioned reference and only with slight changes in the coding style.

For more details on the models, algorithms and parameters interpretation, it is recommended to check the aforementioned reference or any other bibliography of your choice.


### "An Introduction to Statistical Learning.
### With applications in R" by Gareth James,
### Daniela Witten, Trevor Hastie and Robert Tibshirani.
### Springer 2015.


### install and load required packages

library(ISLR)
library(psych)

### explore the dataset
names(Smarket)
dim(Smarket)
summary(Smarket)

### correlation matrix
cor(Smarket[,-9])


### correlations between the lag variables and today's
### returns are close to zero
### the only substantial correlation is between $Year
### and $Volume
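
### a quick check (not part of the original code) of that single
### substantial correlation
cor(Smarket$Year, Smarket$Volume)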

plot(Smarket$Volume, main= "Stock Market Data",
     ylab = "Volume")



 

### scatterplots, distributions and correlations
pairs.panels(Smarket)




### fit a logistic regression model to predict $Direction
### using $Lag1 through $Lag5 and $Volume
### glm(): generalized linear model function
### family=binomial => logistic regression

glm.fit <- glm(Direction~Lag1+Lag2+Lag3+Lag4+Lag5+Volume,
               data = Smarket, family = binomial)
summary(glm.fit)

### the smallest p-value is associated with Lag1
### the negative coefficient for this predictor suggests
### that if the market had a positive return yesterday,
### then it is less likely to go up today
### at a value of 0.15, the p-value is still relatively large,
### and so there is no clear evidence of a real association
### between $Lag1 and $Direction


### explore fitted model coefficients
coef(glm.fit)
summary(glm.fit)$coef
summary(glm.fit)$coef[ ,4]

### predict the probability that the market will go up,
### given values of the predictors

glm.probs <- predict(glm.fit, type="response")
glm.probs[1:10]
contrasts(Smarket$Direction)

### these values correspond to the probability of the market
### going up, rather than down, because the contrasts()
### function indicates that R has created a dummy variable with
### a 1 for Up

### create a vector of class predictions based on whether the
### predicted probability of a market increase is greater than
### or less than 0.5

glm.pred <- rep ("Down", 1250)
glm.pred[glm.probs > .5] <- "Up"

### confusion matrix in order to determine how many observations
### were correctly or incorrectly classified

table(glm.pred, Smarket$Direction)
mean(glm.pred == Smarket$Direction)

### model correctly predicted that the market would go up on 507
### days and that it would go down on 145 days, for a total of
### 507 + 145 = 652 correct predictions
### logistic regression correctly predicted the movement of the
### market 52.2% of the time
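
### quick check (not part of the original code): the overall training
### accuracy implied by the confusion matrix counts above
(507 + 145) / 1250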

### to better assess the accuracy of the logistic regression model
### in this setting, we can fit the model using part of the data,
### and then examine how well it predicts the held out data

train <- (Smarket$Year < 2005)
Smarket.2005 <- Smarket[!train, ]
dim(Smarket.2005)

Direction.2005 <- Smarket$Direction[!train]

glm.fit <- glm(Direction~Lag1+Lag2+Lag3+Lag4+Lag5+Volume,
               data = Smarket, family = binomial, subset = train)
glm.probs <- predict(glm.fit, Smarket.2005, type = "response")

### compute the predictions for 2005 and compare them to the
### actual movements of the market over that time period
glm.pred <- rep("Down", 252)
glm.pred[glm.probs > 0.5] <- "Up"
table(glm.pred, Direction.2005)

mean(glm.pred == Direction.2005)
mean(glm.pred != Direction.2005)

### we would not generally expect to be able to use previous days'
### returns to predict future market performance

### refit the logistic regression using just $Lag1 and $Lag2,
### which seemed to have the highest predictive power in the
### original logistic regression model
glm.fit <- glm(Direction ~ Lag1 + Lag2 , data = Smarket,
               family = binomial, subset = train)
glm.probs <- predict(glm.fit, Smarket.2005 , type = "response")
glm.pred <- rep("Down", 252)
glm.pred[glm.probs > 0.5] <- "Up"
table(glm.pred, Direction.2005)
mean(glm.pred == Direction.2005)

### results appear to be a little better: 56%

### if we want to predict the returns associated with particular
### values of $Lag1 and $Lag2

predict(glm.fit, newdata = data.frame(Lag1 = c(1.2, 1.5),
           Lag2 = c(1.1, -0.8)), type = "response")



You can get the example in:
https://github.com/pakinja/Data-R-Value