Tuesday, January 3, 2017

Function to create an amortization fund table in R

We create a function to build an amortization (sinking) fund table. First we establish the precision with which we want to work:

options(digits = 7)

Then we specify that we do not want to work in scientific notation:

options(scipen = 999)

We declare vectors to store the interest, the accumulated amount, and the fund balance:

I <- numeric(); CA <- numeric(); SF <- numeric()

We give values for the amount, the number of periods, and the interest rate:

M <- 2400000
n <- 50
i <- 3.7/100
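The periodic deposit R of a sinking fund satisfies M = R * ((1 + i)^n - 1) / i, so R = M * i / ((1 + i)^n - 1). A quick sanity check of the formula with the values above:

```r
M <- 2400000
n <- 50
i <- 3.7 / 100

# periodic deposit of the sinking fund
R <- M * i / ((1 + i)^n - 1)

# accumulating the deposits recovers the target amount M
R * ((1 + i)^n - 1) / i   # → 2400000
```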

We create the function:

f.amort <- function(M, i, n) {
  R <<- M * i / ((1 + i)^n - 1)   # periodic deposit
  I[1] <<- 0
  I[2] <<- R * i
  CA[1] <<- R
  SF[1] <<- R
  for (k in 1:(n - 1)) {
    CA[k+1] <<- R + I[k+1]         # deposit plus interest earned
    SF[k+1] <<- SF[k] + CA[k+1]    # running fund balance
    if (k < n - 1) {
      I[k+2] <<- SF[k+1] * i
    }
  }
}

f.amort(M, i, n)

tabla <- cbind(I, CA, SF)
rtotal <- R * n
totales <- c(rtotal, sum(I), sum(CA), 0)
renta <- rep(R, n)

tabla <- cbind(renta, tabla)
tabla <- rbind(tabla, totales)
colnames(tabla) <- c("Rent", "Interest", "Accumulated Amount", "Closing Balance")

You can get the script in:

Brief tutorial to perform descriptive statistics using R with two examples.

Using a pair of databases, we will do a brief and basic descriptive statistics analysis using R.
At the end of the article you will find links to both the script and the databases so that you can perform the exercise yourself.

Install and load the packages we are going to use
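A minimal sketch of this step, assuming the two packages used later in the post ("sm" for binning() and "plotrix" for pie3D()):

```r
# "sm" provides binning(), "plotrix" provides pie3D(); both are used below.
pkgs <- c("sm", "plotrix")
to.get <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(to.get) > 0) install.packages(to.get)

library(sm)
library(plotrix)
```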



It is necessary to define the working directory, that is, the directory where your databases are stored. It can be defined through a function:


However, if we are working in RStudio, it is easy to define our working directory: at the top of our program we choose Tools -> Set Working Directory -> Choose Directory ...
Then select the folder where the data is located.
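The base function setwd() does this; the path below is only a hypothetical example to be replaced with your own folder:

```r
# Hypothetical path -- point it at the folder that holds your databases
setwd("C:/Users/yourname/Documents/data")
getwd()   # confirm the working directory changed
```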

Once we have defined the working directory, we read the data:

First Example

OAS <- read.csv("OAS.csv", header = T)

In this case the data is stored in CSV (comma-separated values) format, a very efficient way to handle fairly large databases.
The first row of the file contains the variable names; this is why I write header = T. It is also possible to refer to the variables directly by name, by means of:
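Presumably this is done with attach(), since the plotting calls below reference Population and StatesMembers without the OAS$ prefix. A sketch with a small illustrative data frame (the real post reads these columns from OAS.csv):

```r
# Toy stand-in for the OAS data frame read from OAS.csv
OAS <- data.frame(StatesMembers = c("Brazil", "Mexico"),
                  Population = c(207e6, 127e6))

attach(OAS)
Population   # now resolves without writing OAS$Population
```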


Now we plot a first graph: a vertical bar chart.

barplot(Population, names.arg = StatesMembers, las = 2, cex.axis = 0.7, cex.names = 0.6, col = terrain.colors(length(Population)), main = "Population of states belonging to the OAS", horiz = F)

Now a horizontal bar chart.

barplot(Population, names.arg = StatesMembers, las = 1, cex.axis = 0.7, cex.names = 0.6, col = terrain.colors(length(Population)), main = "Population of states belonging to the OAS", horiz = T)

Now a pie (sector) chart, using pie3D() from the plotrix package.

pie3D(Population[1:5], labels = StatesMembers[1:5], explode = 0, main = "Population of states belonging to the OAS", col = terrain.colors(length(StatesMembers[1:5])))

Second Example

We now consider the heights of 100 students.
First read the data:

est <- read.csv("height.csv")

We calculate the frequency table. We will need the "sm" library, and we define a function that receives the data and the number of class intervals:

freq.tab <- function(data, n.int){
  raw.tab <- binning(data, nbins = n.int)
  tab <- list()
  tab$intleft <- raw.tab$breaks[1:n.int]            # left endpoints of the class intervals
  tab$intright <- raw.tab$breaks[2:(n.int + 1)]     # right endpoints
  tab$mc <- raw.tab$x                               # class marks
  tab$freq <- raw.tab$table.freq                    # absolute frequencies
  tab$freqrel <- raw.tab$table.freq / length(data)  # relative frequencies
  tab$freqacum <- cumsum(raw.tab$table.freq / length(data))  # cumulative relative frequencies
  tab
}

For example, if you have 6 class intervals, write:

freq.tab(est$Height, 6)

This automatically gives us a list of objects:
first the class intervals, then the class marks, the frequencies, the relative frequencies, and the cumulative frequencies.
We can vary the number of class intervals, for example:

freq.tab(est$Height, 9)
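The same kind of table can also be built with base R alone, without the "sm" package, using cut() and table(). A sketch with simulated heights in place of the height.csv data:

```r
# Simulated heights standing in for est$Height from height.csv
set.seed(1)
heights <- rnorm(100, mean = 170, sd = 8)

n.int <- 6
# equal-width class intervals spanning the data
breaks <- seq(min(heights), max(heights), length.out = n.int + 1)

freq <- table(cut(heights, breaks, include.lowest = TRUE))  # absolute frequencies
freqrel <- freq / length(heights)                           # relative frequencies
freqacum <- cumsum(freqrel)                                 # cumulative relative frequencies
```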

Now we graph the histogram of the data "est":

hist(est$Height, xlab = "Sample", ylab = "", main = "Frequency Histogram", col = terrain.colors(13), border = "white", cex.lab = 0.7, cex.main = 0.9, cex.axis = 0.7)

In this case, R considers 8 class intervals.

You can find this material in:

Sunday, January 1, 2017

Script to convert integer data from a data frame column into a digit matrix.

At some point I found the need to manipulate and analyze each digit of a series of integer values, perform statistics with each of them, and on some occasions add zeros at the beginning of each number.

So I set myself the task of writing the following script, where data is the data frame and Numero is the column of interest.
Several cases are handled, since numbers with fewer than five digits must be padded with leading zeros.

First we declare the array, filled with zeros:

ar <- array(data = 0, dim = c(length(data$Numero), 5))

for (m in 1:length(data$Numero)) {

  # split the m-th number into its digits
  digits <- as.numeric(strsplit(as.character(data$Numero[m]), "")[[1]])
  le <- length(digits)

  # shift shorter numbers to the right so the leading columns stay zero
  if (le == 5) {
    for (l in 1:5) ar[m, l] <- digits[l]
  } else if (le == 4) {
    for (l in 1:4) ar[m, l + 1] <- digits[l]
  } else if (le == 3) {
    for (l in 1:3) ar[m, l + 2] <- digits[l]
  } else if (le == 2) {
    for (l in 1:2) ar[m, l + 3] <- digits[l]
  } else if (le == 1) {
    ar[m, 5] <- digits[1]
  }
}

We visualize the final matrix:

ar
So with this we create an n x 5 array (for integers of up to five digits), and now we are able to perform statistics with each digit.
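For what it is worth, the same n x 5 digit matrix can be produced more compactly by zero-padding with sprintf() before splitting. A sketch under the same assumptions (integers of at most five digits in a column named Numero; the data frame below is only illustrative):

```r
# Toy data frame standing in for the real one
data <- data.frame(Numero = c(42, 12345, 7))

# pad every number to five characters with leading zeros
padded <- sprintf("%05d", data$Numero)   # "00042" "12345" "00007"

# split each padded string into digits and stack the rows into an n x 5 matrix
ar <- t(sapply(strsplit(padded, ""), as.numeric))
ar
```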

You can get the script in:

Welcome to Data R Value


Thank you very much for reading this blog, which will be dedicated to everything related to the R programming language and to the field of data science. I will be publishing scripts, hints, algorithms, and much more. I will also be publishing data analyses that contribute to forming better societies and to increasing people's quality of life.