Introduction to the R programming language

R is an open source programming language designed to perform statistical analysis and data manipulation efficiently. This post is a brief introduction to this scripting language.

Installation

I use a Debian-based environment, where R can be installed with the apt-get install command:

    $ sudo apt-get install r-base r-base-dev

You can also download the latest version from the R project website. When I wrote this post, the latest R version was 3.0.0:

    wget https://cran.r-project.org/src/base/R-3/R-3.0.0.tar.gz
    tar -zxvf R-3.0.0.tar.gz

Note: Versions for Windows and Mac are also available for download at the R project website.

Basic Steps

Interacting with the console

To launch the console, just type the R command at your shell prompt:


    [manuel@manuel-ThinkPad-T540p ~/src/r/R-3.0.0]$ R

    R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
    Copyright (C) 2013 The R Foundation for Statistical Computing
    Platform: x86_64-pc-linux-gnu (64-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.

      Natural language support but running in an English locale

    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.

    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.

Asking for help

The command help.start() will start a browser with all relevant R documentation.

We can also use the help command to retrieve information about a particular command:

    > help(rnorm)

    The Normal Distribution

    Description:

         Density, distribution function, quantile function and random
         generation for the normal distribution with mean equal to ‘mean’
         and standard deviation equal to ‘sd’.

    Usage:

         dnorm(x, mean = 0, sd = 1, log = FALSE)
         pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
         qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
         rnorm(n, mean = 0, sd = 1)


If we are looking for something but don't know its exact name, we can use the help.search() command:

    > help.search("uniform")

    Help files with alias or concept or title matching ‘uniform’ using
    fuzzy matching:


    stats::Uniform          The Uniform Distribution

We can get examples of how to use a given function with the example() command:

    > example("mean")

    mean> x <- c(0:10, 50)

    mean> xm <- mean(x)

    mean> c(xm, mean(x, trim = 0.10))
    [1] 8.75 5.50
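
As a small complement to the help commands above, here is a sketch of two other ways to look things up from the console. The functions apropos() and args() are part of base R; the pattern "^rnorm" is just an illustrative choice.

```r
# ?rnorm is shorthand for help(rnorm), and ??uniform is shorthand
# for help.search("uniform").

# apropos() lists the names of objects matching a regular expression.
matches <- apropos("^rnorm")
print(matches)

# args() shows the argument list of a function without opening its help page.
print(args(rnorm))
```

This is handy when we remember part of a name but not the full spelling.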

Performing simple Operations

We can use the R console to evaluate arithmetic expressions:

    > 5 * 2 / 20  + 100
    [1] 100.5
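
To make the evaluation order of the expression above explicit, here is a minimal sketch. The variable names are only for illustration.

```r
# Multiplication and division bind tighter than addition, and operators
# of equal precedence are evaluated from left to right.
a <- 5 * 2 / 20 + 100         # evaluated as ((5 * 2) / 20) + 100
b <- ((5 * 2) / 20) + 100     # the same expression, fully parenthesized
c_val <- 5 * 2 / (20 + 100)   # parentheses change the result

print(a)        # 100.5
print(a == b)   # TRUE
print(c_val)    # 10 / 120

# A few other arithmetic operators
print(7 %/% 2)  # integer division: 3
print(7 %% 2)   # remainder: 1
print(2 ^ 10)   # exponentiation: 1024
```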

In the code below we use runif, the random generation function for the continuous uniform distribution, to generate 10 random numbers between 5 and 9:

    > runif(10, min=5, max=9)
     [1] 8.578047 8.660792 5.952743 8.026758 6.712412 6.712905 8.608257 5.123351
     [9] 6.274227 5.357889
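
Because the numbers are random, you will get different values each time you run the command. If you need repeatable results, one option is to fix the seed of the random number generator with set.seed() first. This is a sketch; the seed value 42 is arbitrary.

```r
# Fix the seed so that the random draws are repeatable.
set.seed(42)
first <- runif(10, min = 5, max = 9)

# Resetting the same seed reproduces exactly the same draws.
set.seed(42)
second <- runif(10, min = 5, max = 9)

print(identical(first, second))        # same seed, same numbers
print(all(first >= 5 & first <= 9))    # every draw lies between 5 and 9
```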

We can assign a value to a variable using the = operator:

    total = 3 * 4

We can also use the <- operator, which is the more common convention in R code:

    total <- 3 * 4
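
The two operators behave the same way at the top level, but they differ inside a function call. The sketch below illustrates this; double_it and input_value are hypothetical names introduced only for the example.

```r
# Hypothetical helper used only to illustrate the difference.
double_it <- function(input_value) input_value * 2

# At the top level both operators create a variable.
total_a = 3 * 4
total_b <- 3 * 4
same_result <- identical(total_a, total_b)

# Inside a call, = names the argument and creates no variable ...
r1 <- double_it(input_value = 21)
created_by_equals <- exists("input_value", inherits = FALSE)

# ... while <- assigns in the calling environment and then passes the value on.
r2 <- double_it(input_value <- 21)
created_by_arrow <- exists("input_value", inherits = FALSE)

print(c(same_result, created_by_equals, created_by_arrow))
```

In a fresh session this prints TRUE FALSE TRUE: only the <- form leaves a variable called input_value behind.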

A first R Session

We can use the function c to combine its arguments into a vector. Call help(c) for a detailed explanation:

    > my_set = c("d", "a", "t", "a")
    > my_set
    [1] "d" "a" "t" "a"

    > my_set[0]
    character(0)

    > my_set[1]
    [1] "d"

    > my_set[5]
    [1] NA

    > my_set[4]
    [1] "a"

    > my_set[1:3]
    [1] "d" "a" "t"

Note that vector indices in R start at 1. As the session above shows, index 0 returns an empty vector and an index past the end returns NA.
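
Beyond single positions and ranges, vectors accept a few other kinds of index. Here is a short sketch using the same my_set vector; the variable names on the left are only for illustration.

```r
my_set <- c("d", "a", "t", "a")

n <- length(my_set)               # number of elements: 4
without_first <- my_set[-1]       # a negative index drops that element
picked <- my_set[c(1, 3)]         # a vector of positions selects several elements
only_a <- my_set[my_set == "a"]   # a logical condition keeps the matching elements
last <- my_set[length(my_set)]    # the last element

print(without_first)   # "a" "t" "a"
print(picked)          # "d" "t"
print(only_a)          # "a" "a"
print(last)            # "a"
```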

Built-in R Datasets

R ships with a number of built-in datasets, which can be listed with the data() command:

    > data()

    Data sets in package ‘datasets’:

    AirPassengers           Monthly Airline Passenger Numbers 1949-1960
    BJsales                 Sales Data with Leading Indicator
    BJsales.lead (BJsales)
                            Sales Data with Leading Indicator
    BOD                     Biochemical Oxygen Demand
    CO2                     Carbon Dioxide Uptake in Grass Plants
    ChickWeight             Weight versus age of chicks on different diets
    DNase                   Elisa assay of DNase
    EuStockMarkets          Daily Closing Prices of Major European Stock
                            Indices, 1991-1998

    ...

We will ask R about the dataset WorldPhones:

    > help(WorldPhones)

    The World's Telephones

    Description:

         The number of telephones in various regions of the world (in
         thousands).

    Usage:

         WorldPhones
         
    Format:

         A matrix with 7 rows and 8 columns.  The columns of the matrix
         give the figures for a given region, and the rows the figures for
         a year.

         The regions are: North America, Europe, Asia, South America,
         Oceania, Africa, Central America.

         The years are: 1951, 1956, 1957, 1958, 1959, 1960, 1961.

    Source:

         AT&T (1961) _The World's Telephones_.
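
Since WorldPhones is a matrix with named rows and columns, we can inspect it directly from the console. The following is a minimal sketch; the variable names are only for illustration, and "1951" and "Europe" are labels taken from the help text above.

```r
# WorldPhones is available by default through the datasets package.
shape <- dim(WorldPhones)          # number of rows and columns
years <- rownames(WorldPhones)     # row labels: the years
regions <- colnames(WorldPhones)   # column labels: the regions

print(shape)
print(years)
print(regions)

# Rows and columns can be selected by name.
phones_1951 <- WorldPhones["1951", ]      # all regions for the year 1951
europe_series <- WorldPhones[, "Europe"]  # Europe across all years

print(phones_1951)
print(europe_series)
```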

We can use the code below to generate a very simple graphic using this dataset:


    require(graphics)
    matplot(rownames(WorldPhones), WorldPhones, type = "b", log = "y",
            xlab = "Year", ylab = "Number of telephones (1000's)")
    legend(1951.5, 80000, colnames(WorldPhones), col = 1:6, lty = 1:5,
           pch = rep(21, 7))
    title(main = "World phones data: log scale for response")


The command below runs a demo that shows a wide range of different graphics:

    > demo(graphics)

R functions

The code below shows how to create a simple function that will return the square of a number:

    square = function (x) {
        # function that receives a number and returns its square
        return (x * x)
    }

    > square(10)
    [1] 100
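
Functions can also take default argument values, and most arithmetic in R works element by element on vectors. The sketch below shows both; power is a hypothetical generalization of square, not a built-in function.

```r
# power() is a hypothetical generalization of square(): the default
# exponent of 2 makes power(x) behave like square(x).
power <- function(x, exponent = 2) {
    # Without an explicit return(), the last evaluated expression is returned.
    x ^ exponent
}

print(power(10))                # 100
print(power(2, exponent = 3))   # 8
print(power(1:4))               # applied to each element: 1 4 9 16
```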

References