Thursday, January 12, 2017

Simulating some synthetic data.

In many cases we need data with certain characteristics to develop a model, do research, test an algorithm, or simply practice.
Here I show an example of how to generate synthetic data that you can adapt to generate your own.


We will need the ggplot2 library to display our data:

library(ggplot2)

Now we define the dimensions of the array we need:

lrows <- 3035
lcols <- 11


In this case, 3035 rows and 11 columns.

Next we create the array, with every entry initialized to zero:

syn_data <- array(data = 0, dim = c(lrows, lcols))

At this point every entry of syn_data is 0.
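In R a two-dimensional array is simply a matrix, so a quick sanity check of the new object could look like the sketch below. It recreates lrows, lcols and syn_data so it runs on its own; the name same_as_matrix is just an illustrative helper variable.

```r
lrows <- 3035
lcols <- 11

# A 2-D array filled with zeros, as in the post
syn_data <- array(data = 0, dim = c(lrows, lcols))

# A 2-D array is a matrix, so matrix() builds an identical object
same_as_matrix <- identical(syn_data, matrix(0, nrow = lrows, ncol = lcols))

print(dim(syn_data))        # number of rows and columns
print(is.matrix(syn_data))  # a 2-D array counts as a matrix
print(all(syn_data == 0))   # every entry starts at zero
print(same_as_matrix)
```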


Now let's name each field (column):

colnames(syn_data) <- c("ONE", "NUMBER", "R1", "R2", "R3",
                        "R4", "R5", "R6", "NINE", "TEN", "ELEVEN")


Now let's assign values to some columns. 

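# NUMBER: a descending sequence from lrows down to 1
syn_data[, 2] <- seq(lrows, 1)
# ONE, NINE, TEN and ELEVEN: uniform random values in the given ranges
syn_data[, 1] <- runif(lrows, 0.0, 7.5)
syn_data[, 9] <- runif(lrows, 10, 100)
syn_data[, 10] <- runif(lrows, 5.0, 50)
syn_data[, 11] <- runif(lrows, 30.0, 60.0)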
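Because runif(n, min, max) draws from a uniform distribution between min and max, one way to confirm that each column landed in its intended range is to compare the column ranges with those limits. The sketch below recreates the same columns, addressing them by name instead of by position (equivalent once colnames() has been set); the bounds list and in_range vector are illustrative helpers, not part of the original script.

```r
lrows <- 3035
syn_data <- array(data = 0, dim = c(lrows, 11))
colnames(syn_data) <- c("ONE", "NUMBER", "R1", "R2", "R3",
                        "R4", "R5", "R6", "NINE", "TEN", "ELEVEN")

# Same assignments as above, using column names
syn_data[, "NUMBER"] <- seq(lrows, 1)
syn_data[, "ONE"]    <- runif(lrows, 0.0, 7.5)
syn_data[, "NINE"]   <- runif(lrows, 10, 100)
syn_data[, "TEN"]    <- runif(lrows, 5.0, 50)
syn_data[, "ELEVEN"] <- runif(lrows, 30.0, 60.0)

# Intended lower and upper limit of each uniform column
bounds <- list(ONE = c(0, 7.5), NINE = c(10, 100),
               TEN = c(5, 50), ELEVEN = c(30, 60))

# TRUE for a column when all of its values lie inside its limits
in_range <- sapply(names(bounds), function(col) {
  r <- range(syn_data[, col])
  r[1] >= bounds[[col]][1] && r[2] <= bounds[[col]][2]
})

print(in_range)
print(range(syn_data[, "NUMBER"]))
```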


Each line fills one column: NUMBER gets the integers from 3035 down to 1, while ONE, NINE, TEN and ELEVEN get uniformly distributed random values between the limits passed to runif().



For columns R1 to R6 I want each entry to be a random integer between 1 and 56, which we assign with the following for and while loops:

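# Fill R1 to R6 (columns 3 to 8) one entry at a time
for (i in 1:lrows) {
  j <- 1
  while (j <= 6) {
    # j + 2 maps j = 1..6 onto columns R1..R6
    syn_data[i, j + 2] <- sample(1:56, 1)
    j <- j + 1
  }
}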
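The double loop makes 3035 × 6 separate calls to sample(). A vectorized alternative, sketched below as one possible improvement, draws all the values in a single call. replace = TRUE is required because we need far more than 56 draws, and with it every draw is an independent uniform pick from 1:56, the same distribution as sample(1:56, 1) in the loop (though not the same values for a given seed).

```r
lrows <- 3035
syn_data <- array(data = 0, dim = c(lrows, 11))
colnames(syn_data) <- c("ONE", "NUMBER", "R1", "R2", "R3",
                        "R4", "R5", "R6", "NINE", "TEN", "ELEVEN")

# Draw lrows * 6 integers at once and fill columns 3 to 8 (R1..R6);
# the vector is laid into the sub-matrix column by column
syn_data[, 3:8] <- sample(1:56, lrows * 6, replace = TRUE)

print(head(syn_data[, 3:8]))
```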


Now all 11 columns are filled, with R1 to R6 holding integers from 1 to 56.


Our synthetic data set is now complete. Next, let's process it.
 
First we convert the array to a data frame:

syn_data <- as.data.frame(syn_data)

Now let's compute, for each row, the mean of columns R1 to R6 and collect these means in a vector:

# Preallocate one slot per row
smeans <- numeric(lrows)

for (i in 1:lrows) {
  # Columns 3 to 8 are R1 to R6
  smeans[i] <- sum(syn_data[i, 3:8]) / 6
}
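Base R's rowMeans() computes the same per-row average without an explicit loop. The sketch below fills R1 to R6 with the vectorized sampling shown for the earlier step (an assumption made so the snippet runs on its own), computes the means both ways and checks that they agree; smeans_loop is an illustrative name.

```r
lrows <- 3035
syn_data <- array(data = 0, dim = c(lrows, 11))
syn_data[, 3:8] <- sample(1:56, lrows * 6, replace = TRUE)
syn_data <- as.data.frame(syn_data)

# Loop version from the post, with the result vector preallocated
smeans_loop <- numeric(lrows)
for (i in 1:lrows) {
  smeans_loop[i] <- sum(syn_data[i, 3:8]) / 6
}

# Vectorized version: rowMeans() averages each row of columns 3 to 8
smeans <- unname(rowMeans(syn_data[, 3:8]))

print(all.equal(smeans, smeans_loop))
```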



Finally we visualize the vector of means:
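The original figure is not reproduced in this post. As one possible way to draw it with the ggplot2 library loaded at the start, the sketch below plots a histogram of the means; the choice of a histogram, the bin width and the labels are my assumptions, not the author's plot. Since each value is the average of six independent draws from 1:56, the histogram should look roughly bell-shaped around 28.5, the mean of 1:56.

```r
library(ggplot2)

lrows <- 3035
# Row means of six uniform integer draws from 1:56, as in the post
smeans <- rowMeans(matrix(sample(1:56, lrows * 6, replace = TRUE), ncol = 6))

# ggplot2 works with data frames, so wrap the vector in one first
means_df <- data.frame(mean = smeans)

p <- ggplot(means_df, aes(x = mean)) +
  geom_histogram(binwidth = 1, fill = "steelblue", colour = "white") +
  labs(title = "Row means of R1 to R6", x = "Mean", y = "Count")

print(p)
```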



This is a very crude example, and the element-by-element loops make it inefficient in R, but it is a start. It is up to you to improve it and adapt it to your needs.

You can download the script for this example from:
https://github.com/pakinja 

Tuesday, January 10, 2017

Exercise: Simulating and Fitting a Parabolic Shot.

In this exercise we assume that, through an experiment, we measured the height and the distance of a projectile's motion.

We define the vectors "Height" and "Distance", which hold our physical measurements.
We then fit a second-order (quadratic) model, "lmr".

We define a grid of heights, "nh", on which to evaluate the equation of motion.
Using the coefficients of the fitted model we evaluate the equation of motion on that grid to obtain "fit", and finally we plot the result.
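Written out, the second-order model fitted below and the curve "fit" evaluated on the grid nh take this form, where the three estimated coefficients are lmr$coefficients[1], lmr$coefficients[2] and lmr$coefficients[3], and lm() chooses them by ordinary least squares over the seven measured pairs:

```latex
\widehat{\text{Distance}}(h) = \hat\beta_0 + \hat\beta_1 h + \hat\beta_2 h^2,
\qquad
(\hat\beta_0, \hat\beta_1, \hat\beta_2)
  = \arg\min_{\beta_0, \beta_1, \beta_2}
    \sum_{i=1}^{7} \left( d_i - \beta_0 - \beta_1 h_i - \beta_2 h_i^2 \right)^2
```

Here h_i and d_i are the i-th entries of Height and Distance.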


Height = c(100, 200, 300, 450, 600, 800, 1000)
Distance = c(253, 337, 395, 451, 495, 535, 576) 
 

lmr = lm(Distance ~ Height + I(Height^2))
lmr
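Printing lmr shows the three coefficients. If you want to work with them directly, a small sketch using the standard accessors coef() and summary() follows; the variable names b and r2 are illustrative, and R² is just one common measure of how well the quadratic curve follows the points.

```r
Height <- c(100, 200, 300, 450, 600, 800, 1000)
Distance <- c(253, 337, 395, 451, 495, 535, 576)

lmr <- lm(Distance ~ Height + I(Height^2))

# coef() returns the same named vector as lmr$coefficients
b <- coef(lmr)
print(b)

# Proportion of the variance in Distance explained by the fit
r2 <- summary(lmr)$r.squared
print(r2)
```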
 

nh = seq(100, 1000, 10)
 

fit = lmr$coefficients[1] + lmr$coefficients[2]*nh +
  lmr$coefficients[3]*nh^2
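Building the curve by hand from the coefficients works, but predict() evaluates a fitted lm model on new data directly. The sketch below computes the curve both ways and checks that they match; note that the column in newdata must be called Height, the name of the predictor in the formula. fit_pred is an illustrative name.

```r
Height <- c(100, 200, 300, 450, 600, 800, 1000)
Distance <- c(253, 337, 395, 451, 495, 535, 576)
lmr <- lm(Distance ~ Height + I(Height^2))
nh <- seq(100, 1000, 10)

# Manual evaluation, as in the post
fit <- lmr$coefficients[1] + lmr$coefficients[2] * nh +
  lmr$coefficients[3] * nh^2

# predict() evaluates the same model on the new heights
fit_pred <- predict(lmr, newdata = data.frame(Height = nh))

print(all.equal(unname(fit), unname(fit_pred)))
```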
 

fit

plot(Height, Distance, col = "red", main="Parabolic Shot Experiment")
 

lines(nh, fit, lty=1, col = "blue")


Get the example at:
https://github.com/pakinja