# Introduction to the R programming language

R is an open source programming language designed to perform statistical analysis and data manipulation efficiently. This post is a brief introduction to this scripting language.

## Installation

I use a Debian-based environment, where I can simply install R using the `apt-get install` command:

    $ sudo apt-get install r-base r-base-dev

You can also download the latest version from the R project website. When I wrote this post, the latest R version was 3.0.0:

    wget https://cran.r-project.org/src/base/R-3/R-3.0.0.tar.gz
    tar -zxvf R-3.0.0.tar.gz

**Note:** There are versions for Windows and Mac available for download at the R project website.

## Basic Steps

### Interacting with the console

To launch the console, all you have to do is type the `R` command into your terminal:

    [manuel@manuel-ThinkPad-T540p ~/src/r/R-3.0.0]$ R

    R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
    Copyright (C) 2013 The R Foundation for Statistical Computing
    Platform: x86_64-pc-linux-gnu (64-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.

      Natural language support but running in an English locale

    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.

    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.
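
Once the console is running, a quick sanity check is to ask R which version is actually in use, since the version installed by the package manager may differ from the tarball you downloaded. The snippet below is a minimal sketch that relies only on base R; the variable name `major_minor` is just an illustration.

```r
# Print the full version string of the running interpreter.
print(R.version.string)

# The individual components are available as well.
major_minor <- paste(R.version$major, R.version$minor, sep = ".")
print(major_minor)

# To leave the console, call q(). It is commented out here so the
# snippet can be pasted without ending the session.
# q()
```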

### Asking for help

The command `help.start()` will start a browser with all relevant R documentation.

We can also use the `help()` command to retrieve information about a particular command:

    > help(rnorm)

    The Normal Distribution

    Description:

         Density, distribution function, quantile function and random
         generation for the normal distribution with mean equal to ‘mean’
         and standard deviation equal to ‘sd’.

    Usage:

         dnorm(x, mean = 0, sd = 1, log = FALSE)
         pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
         qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
         rnorm(n, mean = 0, sd = 1)

If we are looking for something but don't know its exact name, we can use the `help.search()` command:

    > help.search("uniform")

    Help files with alias or concept or title matching ‘uniform’ using
    fuzzy matching:

    stats::Uniform          The Uniform Distribution

We can get examples of how to use a given function with the `example()` command:

    > example("mean")

    mean> x <- c(0:10, 50)

    mean> xm <- mean(x)

    mean> c(xm, mean(x, trim = 0.10))
    [1] 8.75 5.50
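
The help commands above are interactive, but a couple of related base R functions return plain values that are easy to inspect from a script. The sketch below uses `apropos()` and `formals()`, which are not mentioned in the original post; the variable names are only illustrative.

```r
# apropos() lists objects on the search path whose names match a pattern.
unif_names <- apropos("unif")
print(unif_names)

# formals() returns a function's arguments and their default values.
rnorm_args <- names(formals(rnorm))
print(rnorm_args)

# Interactive shortcuts (commented out because they open the help viewer):
# ?rnorm        # same as help(rnorm)
# ??uniform     # same as help.search("uniform")
```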

### Performing simple Operations

We can use the R console to evaluate arithmetic expressions:

    > 5 * 2 / 20 + 100
    [1] 100.5
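
The result above depends on operator precedence: multiplication and division are evaluated before addition. The following sketch makes the grouping explicit and shows a few other arithmetic operators; the variable names `a` and `b` are only for illustration.

```r
# Multiplication and division bind tighter than addition.
a <- 5 * 2 / 20 + 100      # grouped as (5 * 2 / 20) + 100
b <- 5 * 2 / (20 + 100)    # parentheses change the grouping

print(a)
print(b)

# A few other arithmetic operators.
print(2 ^ 10)     # exponentiation
print(17 %% 5)    # remainder
print(17 %/% 5)   # integer division
```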

In the code below we use the continuous uniform distribution function `runif` to generate 10 random numbers between 5 and 9:

    > runif(10, min=5, max=9)
     [1] 8.578047 8.660792 5.952743 8.026758 6.712412 6.712905 8.608257 5.123351
     [9] 6.274227 5.357889
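
Because `runif` draws random numbers, every call returns different values. If you need a repeatable result, one common approach is to fix the generator state with `set.seed()` first. This is a minimal sketch; the seed value 42 is an arbitrary choice, not something from the original session.

```r
# Fix the random number generator state so the draw is repeatable.
set.seed(42)
first_draw <- runif(10, min = 5, max = 9)

# Resetting the same seed reproduces exactly the same numbers.
set.seed(42)
second_draw <- runif(10, min = 5, max = 9)

print(identical(first_draw, second_draw))
print(range(first_draw))   # smallest and largest value drawn
```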

We can assign a value to a variable using the traditional assignment operator `=`:

    total = 3 * 4

We can use the `<-` operator to perform the assignment as well:

    total <- 3 * 4
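
At the top level the two operators behave the same, but they differ inside a function call: there `=` names an argument, while `<-` still performs an assignment. A small sketch of that difference, with illustrative variable names (`m1`, `m2` and the two flags are not from the post):

```r
# Make sure no variable called x exists in the current environment.
if (exists("x", inherits = FALSE)) rm(x)

# Inside a call, `=` only names the argument; no variable x is created.
m1 <- mean(x = c(1, 2, 3))
x_after_equals <- exists("x", inherits = FALSE)

# With `<-`, x is assigned first and then passed to mean().
m2 <- mean(x <- c(4, 5, 6))
x_after_arrow <- exists("x", inherits = FALSE)

print(c(m1, m2))
print(c(x_after_equals, x_after_arrow))
```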

### A first R Session

We can use the function `c` to combine its arguments into a vector (a simple dataset). You can call the command `help(c)` to see a detailed explanation:

    > my_set = c("d", "a", "t", "a")
    > my_set
    [1] "d" "a" "t" "a"
    > my_set[0]
    character(0)
    > my_set[1]
    [1] "d"
    > my_set[5]
    [1] NA
    > my_set[4]
    [1] "a"
    > my_set[1:3]
    [1] "d" "a" "t"

**Note that vector indices in R start at 1.** As the session shows, index 0 returns an empty vector, and an index past the end returns `NA`.
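
Beyond single positions and ranges, vectors can also be indexed with negative numbers, with several positions at once, or with a logical condition. The sketch below reuses the same `my_set` vector to illustrate these forms:

```r
my_set <- c("d", "a", "t", "a")

print(length(my_set))            # number of elements
print(my_set[-1])                # negative index: everything except the first
print(my_set[c(1, 4)])           # pick several positions at once
print(my_set[my_set == "a"])     # logical index: keep the matching elements
print(which(my_set == "a"))      # positions of the matches
```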

## Built-in R Datasets

R ships with a growing number of built-in datasets, which can be listed with the `data()` command:

    > data()

    Data sets in package ‘datasets’:

    AirPassengers           Monthly Airline Passenger Numbers 1949-1960
    BJsales                 Sales Data with Leading Indicator
    BJsales.lead (BJsales)  Sales Data with Leading Indicator
    BOD                     Biochemical Oxygen Demand
    CO2                     Carbon Dioxide Uptake in Grass Plants
    ChickWeight             Weight versus age of chicks on different diets
    DNase                   Elisa assay of DNase
    EuStockMarkets          Daily Closing Prices of Major European Stock
                            Indices, 1991-1998
    ...
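
The same listing is also available as a regular R object, which is handy when you want to search it from code. A minimal sketch, assuming the default `datasets` package is attached (as it is in a standard session); the name `available` is illustrative:

```r
# data() returns an object whose `results` matrix describes each dataset.
available <- data(package = "datasets")$results
print(head(available[, c("Item", "Title")]))

# Built-in datasets can be used by name without loading anything first.
print(head(BOD))
```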

We will ask R about the dataset `WorldPhones`:

    > help(WorldPhones)

    The World's Telephones

    Description:

         The number of telephones in various regions of the world (in
         thousands).

    Usage:

         WorldPhones

    Format:

         A matrix with 7 rows and 8 columns.  The columns of the matrix
         give the figures for a given region, and the rows the figures for
         a year.

         The regions are: North America, Europe, Asia, South America,
         Oceania, Africa, Central America.

         The years are: 1951, 1956, 1957, 1958, 1959, 1960, 1961.

    Source:

         AT&T (1961) _The World's Telephones_.
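
Since `WorldPhones` is a matrix with named rows and columns, it can be inspected and sliced directly from the console. The following sketch shows a few ways to look at it; the choice of year and region is arbitrary.

```r
print(dim(WorldPhones))        # number of rows and columns
print(rownames(WorldPhones))   # the years
print(colnames(WorldPhones))   # the regions

# Select a single cell by row name and column name.
print(WorldPhones["1951", "Europe"])

# Select a whole column: Europe across all years.
print(WorldPhones[, "Europe"])
```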

We can use the code below to generate a very simple plot from this dataset:

    require(graphics)
    matplot(rownames(WorldPhones), WorldPhones, type = "b", log = "y",
            xlab = "Year", ylab = "Number of telephones (1000's)")
    legend(1951.5, 80000, colnames(WorldPhones), col = 1:6, lty = 1:5,
           pch = rep(21, 7))
    title(main = "World phones data: log scale for response")

The code below runs a demo showing a wide range of different graphics:

    > demo(graphics)

## R functions

The code below shows how to create a simple function that returns the square of a number:

    square = function (x) {
        # function that receives a number and returns its square
        return (x * x)
    }

    > square(10)
    [1] 100
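
Two details are worth knowing when writing functions like this one: arithmetic in R is vectorized, so the same function works on a whole vector, and arguments can have default values. The sketch below redefines `square` in a shorter form and adds a hypothetical `power` helper that is not part of the original post.

```r
# The value of the last expression is returned, so return() is optional.
square <- function(x) {
  x * x
}

# Hypothetical helper, not from the post: an exponent with a default value.
power <- function(x, exponent = 2) {
  x ^ exponent
}

print(square(1:5))                  # works element by element on a vector
print(power(3))                     # uses the default exponent
print(power(2, exponent = 3))       # overrides the default
print(sapply(c(1, 2, 3), square))   # apply the function to each element
```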