K-nearest neighbours algorithm in R

The K-nearest neighbours algorithm (k-NN) is a method used for classification and regression. This post is about how to implement a simple classification solution using the R programming language.

K-nearest neighbors algorithm
Image Source: Wikipedia

How k-NN works

k-NN is classified as a Lazy Learning algorithm: it takes a dataset and keeps the training examples for later use. In this post we will analyze a rain forecast dataset; below are 5 of its 100 examples:

Temperature   Humidity   Wind Speed   Rain
6             85         30           no
14            90         35           no
15            86         8            yes
21            56         15           yes
17            67         9            yes

k-NN uses the examples provided by the dataset to make predictions, comparing the features of a new query with the features of previous examples. To visualize this, we can build a two-dimensional coordinate space where one axis is the weighted sum of all features and the other axis is the outcome. The graphic below represents 5 points, where the "y" axis is the weighted sum of the features (Temperature, Humidity, Wind Speed) and the "x" axis represents the outcome "yes" or "no":

dataset graphic
Weighted sum of the features (Temperature, Humidity, Wind Speed)

Similarity measure and Majority Voting

k-NN finds the k closest neighbours to a given input feature vector; a common way to do that is by applying the Euclidean distance. The decision to classify a new forecast query as "yes" or "no" is made by the votes of the k nearest neighbours: if the majority of neighbours belong to "yes" (1), the outcome will be "yes"; otherwise it will be "no". It is not required, but highly recommended, to choose an odd number for the parameter k.
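As a rough sketch of these two ideas, here are two small helper functions written only for illustration (they are hypothetical and not part of the final example, which relies on the knn function from the class package):

    # Euclidean distance between two numeric feature vectors
    euclidean_distance <- function (a, b) {
        return ( sqrt(sum((a - b) ^ 2)) )
    }

    # classifies a query by majority vote among its k nearest training rows
    knn_predict <- function (train_features, train_labels, query, k = 3) {
        # distance from the query to every training example
        distances <- apply(train_features, 1, function (row) euclidean_distance(row, query))
        # labels of the k nearest neighbours
        nearest <- train_labels[order(distances)[1:k]]
        # the most frequent label wins the vote
        return ( names(which.max(table(nearest))) )
    }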

The image below shows how a red point A is closer to "no" or 0 and a blue point B is closer to 1 or "yes":

closests
Red point A is closer to "no" or 0 and a blue point B is closer to 1 or "yes"

k-NN implementation

Loading input file

The function below will load the dataset file:

    load_dataset <- function (file_name) {
        # opens a csv file
        dataset <- read.csv(file_name, header=FALSE)
        return (dataset)
    }

Normalizing data and Reordering Randomly

Values in the dataset are numbers in different ranges. The next function applies min-max normalization to rescale every value to a number between 0 and 1, and transforms "yes" and "no" into 1 and 0. After normalizing, the function returns a randomly reordered version of the dataset:

    normalize_dataset  <- function (dataset) {

        normalize_number <- function (column) {
            # applies min-max normalization using the formula
            # zi = (xi - min(x)) / ( max(x) - min(x))
            max = max(column)
            min = min(column)
            return ( (column - min) / (max - min) )
        }

        normalize_yes_no  <- function (column) {
            #turns yes to 1 and no to 0
            n = length(column)
            result = vector(length=n)

            for(i in 1:n) {
                if ( column[i] == "yes") {
                    result[i] = 1
                } else {
                    result[i] = 0
                }
            }

            return (result)
        }

        dataset[1] <- normalize_number(dataset[1])
        dataset[2] <- normalize_number(dataset[2])
        dataset[3] <- normalize_number(dataset[3])
        dataset[4] <- normalize_yes_no(dataset[,4])

        randomnize_dataset <- function(dataset) {
            # reorder dataset randomly
            rnumbers <- runif(nrow(dataset))
            return(dataset[order(rnumbers), ])
        }

        return (randomnize_dataset(dataset))
    }

Training Examples vs Test Examples

We will split the dataset into 2 datasets: one for training and another for testing. It is considered good practice to keep at least 10% of the dataset for testing purposes; in this example we will use 15%. Because we have 100 examples, the test dataset will have 15 rows and the training dataset 85:

    dataset_train <- dataset[1:85,]
    dataset_test <- dataset[86:100,]

Now we need to create the target vectors for the training and test datasets; the targets contain the forecast ("yes" -> 1, "no" -> 0):

    dataset_train_target <- dataset[1:85, 4]
    dataset_test_target <- dataset[86:100, 4]

We can use the knn function from the class package to run the algorithm, passing the datasets that we created previously and some value of k:

    # loading package
    require(class)
    model1 <- knn(train=dataset_train, test=dataset_test, cl=dataset_train_target, k=3)

The model1 variable will contain the predictions for the test dataset. Now we can check how well knn predicted by comparing against the original test targets:

  result = table(dataset_test_target, model1)

The result is the table below; as we can see, there is a perfect match between the original targets and the predicted values:

                       model1
    dataset_test_target  0  1
                      0 12  0
                      1  0  3
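As a quick extra check (not part of the original script), we could also compute the accuracy directly from this confusion matrix:

    # proportion of correct predictions: sum of the diagonal over the total
    accuracy <- sum(diag(result)) / sum(result)
    accuracy
    # [1] 1   (all 15 test examples were predicted correctly)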

Source code of this example can be found at this link

References

Introduction to programming language R

R is an open source programming language designed to perform statistical analysis and data manipulation efficiently. This post is a brief introduction to this scripting language.

Installation

I use a Debian-based environment where I can simply install R using the apt-get install command:

 $sudo apt-get install r-base r-base-dev 

You can also download the latest version from the R project website. When I wrote this post, the latest R version was 3.0.0:

    wget https://cran.r-project.org/src/base/R-3/R-3.0.0.tar.gz
    tar -zxvf R-3.0.0.tar.gz

Note: There are versions for Windows and Mac available for download at the R project website.

Basic Steps

Interacting with the console

To launch the console, the only step you have to follow is to type the R command in your terminal:


    [manuel@manuel-ThinkPad-T540p ~/src/r/R-3.0.0]$ R

    R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
    Copyright (C) 2013 The R Foundation for Statistical Computing
    Platform: x86_64-pc-linux-gnu (64-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.

      Natural language support but running in an English locale

    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.

    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.

Asking for help

The command help.start() will start a browser with all relevant R documentation.
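For example, from an R session:

    > help.start()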

We can use the help command to retrieve information about a particular function as well:

    > help(rnorm)

    The Normal Distribution

    Description:

         Density, distribution function, quantile function and random
         generation for the normal distribution with mean equal to ‘mean’
         and standard deviation equal to ‘sd’.

    Usage:

         dnorm(x, mean = 0, sd = 1, log = FALSE)
         pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
         qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
         rnorm(n, mean = 0, sd = 1)


If we are looking for something but we don't know the exact name, we can use the help.search() command:

    > help.search("uniform")

    Help files with alias or concept or title matching ‘uniform’ using
    fuzzy matching:


    stats::Uniform          The Uniform Distribution

We can get examples of how to use a certain function with the command example():

    > example("mean")

    mean> x <- c(0:10, 50)

    mean> xm <- mean(x)

    mean> c(xm, mean(x, trim = 0.10))
    [1] 8.75 5.50

Performing simple Operations

We can use the R console to calculate arithmetic operations:

    > 5 * 2 / 20  + 100
    [1] 100.5

In the code below we use the continuous uniform distribution function runif to generate 10 random numbers between 5 and 9:

    > runif(10, min=5, max=9)
    [1] 8.578047 8.660792 5.952743 8.026758 6.712412 6.712905 8.608257 5.123351
    [9] 6.274227 5.357889

We can assign some value to a variable using the traditional assignment = operator:

    total = 3 * 4

We can use the operator <- to perform the assignment as well:

    total  <- 3 * 4

A first R Session

We can use the function c to take a list of parameters and build a dataset (vector). You can call the command help(c) to see a detailed explanation:

    > my_set = c("d", "a", "t", "a")
    > my_set
    [1] "d" "a" "t" "a"

    > my_set[0]
    character(0)

    > my_set[1]
    [1] "d"

    > my_set[5]
    [1] NA

    > my_set[4]
    [1] "a"

    > my_set[1:3]
    [1] "d" "a" "t"

Note that vector indices in R start at 1.

Built in R Datasets

R has an increasing number of built-in datasets which can be accessed with the data() command:

    > data()

    Data sets in package ‘datasets’:

    AirPassengers           Monthly Airline Passenger Numbers 1949-1960
    BJsales                 Sales Data with Leading Indicator
    BJsales.lead (BJsales)
                            Sales Data with Leading Indicator
    BOD                     Biochemical Oxygen Demand
    CO2                     Carbon Dioxide Uptake in Grass Plants
    ChickWeight             Weight versus age of chicks on different diets
    DNase                   Elisa assay of DNase
    EuStockMarkets          Daily Closing Prices of Major European Stock
                            Indices, 1991-1998

    ...

We will ask R about the dataset WorldPhones:

    > help(WorldPhones)

    The World's Telephones

    Description:

         The number of telephones in various regions of the world (in
         thousands).

    Usage:

         WorldPhones
         
    Format:

         A matrix with 7 rows and 8 columns.  The columns of the matrix
         give the figures for a given region, and the rows the figures for
         a year.

         The regions are: North America, Europe, Asia, South America,
         Oceania, Africa, Central America.

         The years are: 1951, 1956, 1957, 1958, 1959, 1960, 1961.

    Source:

         AT&T (1961) _The World's Telephones_.

We can use the code below to generate a very simple graphic using this dataset:


     require(graphics)
     matplot(rownames(WorldPhones), WorldPhones, type = "b", log = "y", xlab = "Year", ylab = "Number of telephones (1000's)")
     legend(1951.5, 80000, colnames(WorldPhones), col = 1:6, lty = 1:5, pch = rep(21, 7))
     title(main = "World phones data: log scale for response")


The code below will show us a demo with a wide range of different graphics:

 > demo(graphics)

R functions

The code below shows how to create a simple function that will return the square of a number:

    square = function (x) {
        # function that receives a number and return its square
        return (x * x)
    }

    > square(10)
    [1] 100

References

Deploying a scala application using docker

Docker is an open-source project that automates the deployment of applications inside software containers on the Linux operating system. This post is about how to deploy an akka Scala application on Docker.

Setting up Docker environment

Install docker client

    wget https://get.docker.com/builds/Linux/x86_64/docker-latest
    sudo ln -s /home/manuel/src/dockermachine/docker-latest /usr/bin/docker

Installing Docker Machine

Docker Machine lets you create Docker hosts on your computer, cloud providers or inside your own data center:

    mkdir ~/src/dockermachine
    cd ~/src/dockermachine
    wget  https://github.com/docker/machine/releases/download/v0.3.0/docker-machine_linux-amd64
    sudo ln -s /home/manuel/src/dockermachine/docker-machine_linux-amd64 /usr/bin/docker-machine

We will create a Docker host on VirtualBox, which will run a lightweight Linux distribution (boot2docker) with the Docker daemon installed, using the command below:

    $docker-machine create --driver virtualbox myhost

    Creating CA: /home/manuel/.docker/machine/certs/ca.pem
    Creating client certificate: /home/manuel/.docker/machine/certs/cert.pem
    Image cache does not exist, creating it at /home/manuel/.docker/machine/cache...
    No default boot2docker iso found locally, downloading the latest release...
    Downloading https://github.com/boot2docker/boot2docker/releases/download/v1.7.0/boot2docker.iso to /home/manuel/.docker/machine/cache/boot2docker.iso...
    Creating VirtualBox VM...
    Creating SSH key...
    Starting VirtualBox VM...
    Starting VM...
    To see how to connect Docker to this machine, run: docker-machine env myhost

Note: In order to create the Docker host we will use VirtualBox; its installation is straightforward and won't be covered in this post.

We can verify that the docker machine was created successfully:

    $ docker-machine ls
    NAME    ACTIVE   DRIVER       STATE     URL                         SWARM
    myhost   *        virtualbox   Running   tcp://192.168.99.100:2376  

Now we can point our Docker client to the host machine by running the command below:

    eval "$(docker-machine env myhost)"

We can access the docker machine with the command below:

    docker-machine ssh myhost

To stop and start docker machine:

  docker-machine stop myhost
  docker-machine start myhost

We can test our Docker installation by running the nginx container:

     docker run -d -p 8000:80 nginx
     docker ps

Installing an image

We can list the images available on our Docker host by running the command below:

    $ docker images

We can download a debian image from Docker hub to the docker host machine with the command below:

    docker pull debian:latest

Scala akka Simple tcp server

The code below implements a simple tcp echo server using reactive streams:

    package io.ntdata

    import akka.actor.ActorSystem
    import akka.pattern.ask
    import akka.stream.ActorFlowMaterializer
    import akka.stream.scaladsl.{ Flow, Sink, Source, Tcp }
    import akka.util.ByteString
    import scala.concurrent.duration._
    import scala.util.{ Failure, Success }

    object TcpEcho {

      /**
       * Use without parameters to start both client and
       * server.
       *
       * Use parameters `server 0.0.0.0 6001` to start server listening on port 6001.
       *
       * Use parameters `client 127.0.0.1 6001` to start client connecting to
       * server on 127.0.0.1:6001.
       *
       */
      def main(args: Array[String]): Unit = {
        if (args.isEmpty) {
          val system = ActorSystem("ClientAndServer")
          val (address, port) = ("127.0.0.1", 6000)
          server(system, address, port)
          client(system, address, port)
        } else {
          val (address, port) =
            if (args.length == 3) (args(1), args(2).toInt)
            else ("127.0.0.1", 6000)
          if (args(0) == "server") {
            val system = ActorSystem("Server")
            server(system, address, port)
          } else if (args(0) == "client") {
            val system = ActorSystem("Client")
            client(system, address, port)
          }
        }
      }

      def server(system: ActorSystem, address: String, port: Int): Unit = {
        implicit val sys = system
        import system.dispatcher
        implicit val materializer = ActorFlowMaterializer()

        val handler = Sink.foreach[Tcp.IncomingConnection] { conn =>
          println("Client connected from: " + conn.remoteAddress)
          conn handleWith Flow[ByteString]
        }

        val connections = Tcp().bind(address, port)
        val binding = connections.to(handler).run()

        binding.onComplete {
          case Success(b) =>
            println("Server started, listening on: " + b.localAddress)
          case Failure(e) =>
            println(s"Server could not bind to $address:$port: ${e.getMessage}")
            system.shutdown()
        }

      }

      def client(system: ActorSystem, address: String, port: Int): Unit = {
        implicit val sys = system
        import system.dispatcher
        implicit val materializer = ActorFlowMaterializer()

        val testInput = ('a' to 'z').map(ByteString(_))

        val result = Source(testInput).via(Tcp().outgoingConnection(address, port)).
          runFold(ByteString.empty) { (acc, in) ⇒ acc ++ in }

        result.onComplete {
          case Success(result) =>
            println(s"Result: " + result.utf8String)
            println("Shutting down client")
            system.shutdown()
          case Failure(e) =>
            println("Failure: " + e.getMessage)
            system.shutdown()
        }
      }
    }

We can use the sbt-assembly plugin to generate a fat jar, which will be the artifact deployed on our Docker container:

    sbt assembly

After building the fat jar, running the project is simple:

    java -jar target/scala-2.11/ReactiveTCP-assembly-0.0.1.jar server 0.0.0.0 6001

Building an image from a Dockerfile

We can create an image of our own instead of using an existing one. To do that, we must create a file named Dockerfile; this file will contain the instructions required to build our image:

    mkdir akka-image
    touch akka-image/Dockerfile
    cp tcp-server/target/scala-2.11/ReactiveTCP-assembly-0.0.1.jar akka-image

Our Dockerfile looks like this:

    # Debian based image that is able to run an akka app
    FROM debian:latest
    MAINTAINER Manuel Ignacio Franco Galeano 

    #install oracle jdk
    RUN echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true |  /usr/bin/debconf-set-selections 

    RUN echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | tee /etc/apt/sources.list.d/webupd8team-java.list

    RUN echo "deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | tee -a /etc/apt/sources.list.d/webupd8team-java.list
    RUN   apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys EEA14886
    RUN   apt-get update -y
    RUN   apt-get install -y oracle-java8-installer 
    RUN   apt-get install -y oracle-java8-set-default

    USER daemon

    # Adding fatjar file
    ADD ReactiveTCP-assembly-0.0.1.jar /opt/

    # Expose port where server will be running
    EXPOSE 6001

    CMD [ "java", "-jar", "/opt/ReactiveTCP-assembly-0.0.1.jar", "server", "0.0.0.0",  "6001" ]

To generate the image, run:

    docker build -t simpletcp:v1 akka-image/

After the image is created, we can create a container by running:

    docker run -d -p 6000:6001 simpletcp:v1

References

Extracting data from twitter with Scala, Akka Toolkit and Reactive Streams

Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure, and the Akka toolkit team is working on an implementation of this standard. At the moment of writing this post (June 2015) the API is still experimental but functional. This post is about how to extract information from Twitter using Scala, the Akka toolkit and Reactive Streams.

Interacting with Twitter

There is no official Twitter client for Scala, but we can use Twitter4j, an unofficial Java client that gets the job done. Below is an example of how you can use Twitter4j in Scala:

    import twitter4j.TwitterFactory
    import twitter4j.Twitter
    import twitter4j.conf.ConfigurationBuilder

    ...
    ...

    val cb = new ConfigurationBuilder()
    cb.setDebugEnabled(true)
      .setOAuthConsumerKey("YOUR KEY HERE")
      .setOAuthConsumerSecret("YOUR SECRET HERE")
      .setOAuthAccessToken("YOUR ACCESS TOKEN")
      .setOAuthAccessTokenSecret("YOUR ACCESS TOKEN SECRET")
    val tf = new TwitterFactory(cb.build())
    val twitter = tf.getInstance()
 

We will build a command line application that searches tweets and hashtags by some term and processes this information.

Akka streams

At the moment of publication of this post (June 2015), akka streams is still experimental, so in order to use it we will need to add the experimental module as a dependency to our sbt config file:

    "com.typesafe.akka" % "akka-stream-experimental_2.11" % "1.0-RC3"

Basic concepts

There are some basic concepts, taken from the experimental akka stream documentation, that are very important to understand in order to implement reactive streams applications successfully:

  • Source: A processing stage with exactly one output emitting data elements whenever downstream processing stages are ready to receive them.
  • Sink: A processing stage with exactly one input, requesting and accepting data elements possibly slowing down the upstream producer of elements.
  • Flow: A processing stage which has exactly one input and output, which connects its up- and downstreams by transforming the data elements flowing through it.

Building and Running a Flow from a String

In order to run a stream we need at least one input Source and one output Sink. In the example below, we create a Source from a String iterator obtained by splitting a string. It is possible to manipulate a Source, as we can see in this example: we remove extra blank spaces with trim and then convert every line to uppercase. A Source by itself can't do anything; a Flow or a Sink is required to continue the computation. In the example code below we send the strings processed by the Source to a Sink using the method runForeach:

    implicit val system = ActorSystem("Sys")
    import system.dispatcher

    implicit val materializer = ActorFlowMaterializer()

    val text =
      """I saw the best minds of my generation destroyed by madness,
      starving hysterical naked,
      dragging themselves through the negro streets at dawn looking for an angry fix,
      angel headed hipsters burning for the ancient heavenly connection to the starry dynamo in the machinery of night"""

    //build a source from an iterator
    Source(() => text.split(",").iterator)
      //transform every line
      .map(s => s.trim())
      .map(_.toUpperCase)
      //send every line to a sink
      .runForeach(s => println( s + "\n") )
      //shutdown when finish
      .onComplete(_ => system.shutdown())

Building a simple Graph

Graphs are useful when we want to perform any kind of fan-in ("multiple inputs") or fan-out ("multiple outputs") operations. In the code below We will use the source that We created before and We will send every line to a text file and to the console:

    implicit val system = ActorSystem("Sys")
    import system.dispatcher

    implicit val materializer = ActorFlowMaterializer()

    val text =
      """I saw the best minds of my generation destroyed by madness,
      starving hysterical naked,
      dragging themselves through the negro streets at dawn looking for an angry fix,
      angel headed hipsters burning for the ancient heavenly connection to the starry dynamo in the machinery of night"""



    // take a ByteString as input and write to disk
    val fileSink = SynchronousFileSink(new File("/tmp/outscala.txt"), true)

    // take a String as input and print it to the console
    val consoleSink = Sink.foreach(println)

    //takes a string and convert it to ByteString
    val flowConverter = Flow[String].map(s => ByteString.fromString(s))

    val g = FlowGraph.closed(fileSink, consoleSink)((_,cons) => cons) { implicit builder =>

      (f, console) =>
      import FlowGraph.Implicits._

      //build a source from an iterator
      val in = Source(List(text))
        //transform every line
        .map(s => s.trim())
        .map(_.toUpperCase)


      //Broadcast that take 1 entry and send it to 2 outputs
      val bcast = builder.add(Broadcast[String](2))

      in ~> bcast
            bcast ~> flowConverter ~> f
            bcast ~> console
    }.run()

    g.onComplete {
      case Success(_) =>
        system.shutdown()
      case Failure(e) =>
        println(s"Failure: ${e.getMessage}")
        system.shutdown()
    }

Reactive tweets

Our goal is to build an akka application that uses reactive streams to get tweets from Twitter and classify them by author and content. To do that, we will create a tweets source querying data from Twitter and we will maintain 3 indexes: tweets, authors, hashtags. A simplified diagram of the process looks like this:



                                             +----------+
                                    +------- | Authors  |
                                    |        +----------+
                                    |
                                    |
                               +---------+
    +----------------+         | Tweets  |
    | Twitter Source |---------+---------+
    +----------------+              |
                                    |    +-----------+
                                    +----| Hashtags  |
                                         +-----------+

Configuring the client

We will define access keys and tokens as environment variables:

  def getClient(): Twitter = {
    val api_key = sys.env("REACTIVE_TWITTER_API_KEY")
    val api_secret = sys.env("REACTIVE_TWITTER_API_SECRET")
    val access_token = sys.env("REACTIVE_TWITTER_ACCESS_TOKEN")
    val access_token_secret = sys.env("REACTIVE_TWITTER_ACCESS_TOKEN_SECRET")
    val cb = new ConfigurationBuilder()
    cb.setDebugEnabled(true)
      .setOAuthConsumerKey(api_key)
      .setOAuthConsumerSecret(api_secret)
      .setOAuthAccessToken(access_token)
      .setOAuthAccessTokenSecret(access_token_secret)
    val tf = new TwitterFactory(cb.build())
    tf.getInstance()
  }

Building the source

    val client = getClient()
    val query = new Query(term);
    val tweets = Source( () => client.search(query).getTweets.asScala.iterator)

Building the Graph

    val tweetPath = Paths.get(dataDir, "tweets.txt")
    val tweetSink = SynchronousFileSink(new File(tweetPath.toString), true)
    val authorPath = Paths.get(dataDir, "authors.txt")
    val authorSink = SynchronousFileSink(new File(authorPath.toString), true)
    val hashtagPath = Paths.get(dataDir, "hashtags.txt")
    val hashtagSink = SynchronousFileSink(new File(hashtagPath.toString), true)

    val tweetFlow = Flow[Status]

    val twitterGraph = FlowGraph.closed(tweets, tweetSink)( (tweets, tSink) => tSink) { implicit builder =>
      (in, tSink) =>
      import FlowGraph.Implicits._

    val DELIMITER = "==="

      //Broadcast that takes 1 entry and sends it to 3 outputs
      val bcast = builder.add(Broadcast[Status](3))

      in ~> bcast
            //bcast ~> tweetToStrFlow ~> autSink
            bcast ~> tweetFlow
              .map(t => ByteString.fromString(
                t.getId + DELIMITER  + t.getText + DELIMITER + t.getSource + "\n")) ~> tSink

            bcast ~> tweetFlow
            .map(t => ByteString.fromString(
              t.getId + DELIMITER + t.getUser.getId + DELIMITER
              + DELIMITER + t.getUser.getScreenName + DELIMITER + t.getUser.getName + "\n")) ~> authorSink

            bcast ~> tweetFlow
              .filter(t => t.getHashtagEntities.length > 0 )
              .map(t => {
                var lineList = ArrayBuffer.empty[String]
                lineList.append(t.getId + DELIMITER)
                t.getHashtagEntities.foreach(h => lineList.append(h.getText + DELIMITER))
                lineList.append("\n")
                ByteString.fromString(lineList.mkString(""))
              }) ~> hashtagSink
    }.run()

Running the code

The first thing required is to set up the environment variables with the Twitter access values:

    export REACTIVE_TWITTER_API_KEY=API_KEY
    export REACTIVE_TWITTER_API_SECRET=API_SECRET
    export REACTIVE_TWITTER_ACCESS_TOKEN=ACCESS_TOKEN
    export REACTIVE_TWITTER_ACCESS_TOKEN_SECRET=TOKEN_SECRET

Then you can execute sbt:

    sbt clean "run --dataDir /tmp --query internet&colombia"

Source code can be found at this link

References

Gradient Descent Algorithm in Akka Toolkit

Gradient descent is an algorithm that minimizes functions: it starts at some initial random point and iterates until it finds the smallest error margin. This algorithm is commonly used to minimize the error in statistical analysis such as linear regression. In this post we will implement the gradient descent algorithm using the akka toolkit.

Linear Regression and Cost Function

We can use the standard equation y = mx + b to model our set of points; nevertheless, the precision of our model depends on how accurate the values m and b are. In other words, we will need to find the best m and b values. A cost function J(θ0, θ1) is a function that is used to measure how well a line fits the data.

In order to calculate the cost function we will iterate over all (x, y) points and sum the square distances between each point's y value and the candidate line's y value (computed as mx + b).
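Written as a formula, and matching what the calculateCost function below computes (a mean of squared errors over the N points, without the conventional 1/2 factor), the cost is:

    J(\theta_0, \theta_1) = \frac{1}{N} \sum_{i=1}^{N} \big( y_i - (\theta_1 x_i + \theta_0) \big)^2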

  case class Point(x: Double, y: Double)
  case class Intercept(theta0: Double, theta1: Double, cost: Double)

  //y = mx + b
  //hθ(x)= θ0 + θ1x
  def calculateCost(intercept: Intercept, numberList: List[Point]): Intercept = {
    // Calculate error for a particular set of θ0 and θ1
    var totalCost: Double = 0.0
    for (i <- 0 to numberList.size - 1) {
      totalCost += math.pow( (numberList{i}.y - (intercept.theta1 *  numberList{i}.x + intercept.theta0)), 2)
    }
    val result = Intercept(intercept.theta0, intercept.theta1, totalCost / numberList.size)
    return result
  }

Gradient descent Algorithm

This algorithm is used to minimize the cost function, finding the best values (θ0, θ1) for our standard equation hθ(x) = θ0 + θ1x.

Before starting to code, we will write down the steps required to implement this algorithm:

  1. Select random starting values for θ0 and θ1.
  2. Take the gradient at your location.
  3. Move your location in the opposite direction of your gradient just a bit. Specifically, multiply the gradient by a small value alpha (the learning rate) and subtract the result from your current values. The value alpha should ideally be passed by configuration so that it is easy to change in order to adjust (tune) the algorithm (see the update rule sketched after this list).
  4. Repeat steps 2 and 3 until repeating them more no longer improves the result much.
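In symbols, and consistent with the stepGradient code below, each iteration computes the partial derivatives of the cost function and moves both parameters a small step of size alpha in the opposite direction:

    \frac{\partial J}{\partial \theta_0} = -\frac{2}{N} \sum_{i=1}^{N} \big( y_i - (\theta_1 x_i + \theta_0) \big)

    \frac{\partial J}{\partial \theta_1} = -\frac{2}{N} \sum_{i=1}^{N} x_i \big( y_i - (\theta_1 x_i + \theta_0) \big)

    \theta_0 := \theta_0 - \alpha \frac{\partial J}{\partial \theta_0}, \qquad
    \theta_1 := \theta_1 - \alpha \frac{\partial J}{\partial \theta_1}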

  def stepGradient(
    intercept: Intercept, numberList: List[Point], learningRate: Double): Intercept = {

    var theta0Gradient: Double = 0.0
    var theta1Gradient: Double = 0.0
    var x: Double = 0.0
    var y: Double = 0.0
    val N = numberList.size.toDouble

    for (i <- 0 to numberList.size - 1) {
      x = numberList{i}.x
      y = numberList{i}.y
      //b gradient
      theta0Gradient += -(2/N) * (y - ((intercept.theta1 * x) + intercept.theta0))
      //m gradient
      theta1Gradient += -(2/N) * x * (y - ((intercept.theta1 * x) + intercept.theta0))
    }
    return Intercept(
      intercept.theta0 - (learningRate * theta0Gradient),
      intercept.theta1 - (learningRate * theta1Gradient), 0.0)
  }

  def calculateGradient(
    intercept: Intercept, numberList: List[Point],
    learningRate: Double, nrOfIterations: Int): Intercept = {

    log.info("Calculating gradient with starting values {} {}", intercept.theta0, intercept.theta1)
    var result: Intercept = intercept
    for (i <- 0 to nrOfIterations) {
      result = stepGradient(result, numberList, learningRate)
    }
    log.info("Gradient result {} {}", result.theta0, result.theta1)
    return result;
  }

Algorithm implementation using Akka toolkit

We are going to follow the steps described in my previous post, akka toolkit introduction, so I won't cover basic steps such as the initial setup.

In order to implement the algorithm we will need to define 6 messages:

  • ComputeIntercept: sent to the Master actor to start the calculation.
  • CalculateGradient: sent from the Master actor to the Worker actors to compute the gradient.
  • CalculateCost: sent from the Master actor to the Worker actors to compute error cost.
  • GradientResult: sent from the Worker actors to the Master actor containing the result from the worker’s gradient calculation.
  • CostResult: sent from the Worker actors to the Master actor containing the result from the worker’s cost calculation.
  • InterceptResult: sent from the Master actor to the Listener actor containing the values (θ0, θ1) for our standard equation hθ(x)= θ0 + θ1x, the cost of using these values and how long the calculation took.

Below is the complete implementation:


    package com.notempo1320

    import akka.actor._
    import akka.event.Logging
    import akka.routing.RoundRobinRouter

    import org.clapper.argot._
    import ArgotConverters._

    import scala.concurrent.duration._
    import scala.collection.mutable.ArrayBuffer
    import scala.io.{Source, BufferedSource}

    import java.nio.file.{Paths, Files}
    import java.io._

    case class Point(x: Double, y: Double)
    case class Intercept(theta0: Double, theta1: Double, cost: Double)

    object InterceptApp extends App {

      sealed trait InterceptMessage

      case object ComputeIntercept extends InterceptMessage

      case class CalculateCost(intercept: Intercept,
            numberList: List[Point]) extends InterceptMessage

      case class CostResult(intercept: Intercept) extends InterceptMessage

      case class CalculateGradient(
        intercept: Intercept, numberList: List[Point], learningRate: Double,
        nrOfIterations: Int) extends InterceptMessage

      case class GradientResult(intercept: Intercept) extends InterceptMessage

      case class InterceptResult(interceptResult: Intercept, duration: Duration)


      class Worker extends Actor {
          val log = Logging(context.system, this)

          //y = mx + b
          //hθ(x)= θ0 + θ1x
          def calculateCost(intercept: Intercept, numberList: List[Point]): Intercept = {
            // Calculate error for a particular set of θ0 and θ1
            var totalCost: Double = 0.0
            for (i <- 0 to numberList.size -1) {
              totalCost += math.pow( (numberList{i}.y - (intercept.theta1 *  numberList{i}.x + intercept.theta0)), 2)
            }
            val result = Intercept(intercept.theta0, intercept.theta1, totalCost / numberList.size)
            log.info("CalculateCost for intercept {} {} = {}", result.theta0, result.theta1, result.cost)
            return result
          }

          def stepGradient(
            intercept: Intercept, numberList: List[Point], learningRate: Double): Intercept = {

            var theta0Gradient: Double = 0.0
            var theta1Gradient: Double = 0.0
            var x: Double = 0.0
            var y: Double = 0.0
            val N = numberList.size.toDouble

            for (i <- 0 to numberList.size - 1) {
              x = numberList{i}.x
              y = numberList{i}.y
              //b gradient
              theta0Gradient += -(2/N) * (y - ((intercept.theta1 * x) + intercept.theta0))
              //m gradient
              theta1Gradient += -(2/N) * x * (y - ((intercept.theta1 * x) + intercept.theta0))
            }
            return Intercept(
              intercept.theta0 - (learningRate * theta0Gradient),
              intercept.theta1 - (learningRate * theta1Gradient), 0.0)
          }

          def calculateGradient(
            intercept: Intercept, numberList: List[Point],
            learningRate: Double, nrOfIterations: Int): Intercept = {

            log.info("Calculating gradient with starting values {} {}", intercept.theta0, intercept.theta1)
            var result: Intercept = intercept
            for (i <- 0 to nrOfIterations) {
              result = stepGradient(result, numberList, learningRate)
            }
            log.info("Gradient result {} {}", result.theta0, result.theta1)
            return result;
          }

          def receive = {
            // ! means “fire-and-forget”. Also known as tell.
            case CalculateGradient(intercept: Intercept, numberList: List[Point],
              learningRate: Double, nrOfIterations: Int) =>
              sender ! GradientResult(calculateGradient(intercept, numberList, learningRate, nrOfIterations))

            case CalculateCost(intercept: Intercept, numberList: List[Point]) =>
              sender ! CostResult(calculateCost(intercept, numberList))
          }

      }


      class Master(nrOfWorkers: Int, nrOfIterations: Int,
          learningRate: Double, numberList: List[Point], listener: ActorRef)
        extends Actor {
        val log = Logging(context.system, this)
        var initialIntercept: Intercept = _
        var partialResults = ArrayBuffer.empty[Intercept]
        // Every worker will perform nrOfIterations / nrOfWorkers
        var nrOfCalculations: Int = nrOfIterations / nrOfWorkers
        var nrOfResults: Int = 0
        var theta1: Double = 0.0
        val start: Long = System.currentTimeMillis

        override def preStart() = {
          log.info("Starting master")
          super.preStart()
        }

        // create a round-robin router to make it easier to spread out the work between the workers
        val workerRouter = context.actorOf(
          Props[Worker].withRouter(RoundRobinRouter(nrOfWorkers)), name = "workerRouter")

        def generate_random_intercept_point(): Double = {
          // Semi random generation based on number of x,y points
          return (
            numberList.size.toDouble /
            ((util.Random.nextInt(numberList.size) + 1) * (numberList.size + util.Random.nextInt(numberList.size))))
        }


        def receive = {
          case ComputeIntercept =>
            for(i <- 1 to nrOfWorkers + 1) {
              //get random initial intercept
              initialIntercept = Intercept(
                generate_random_intercept_point(), generate_random_intercept_point(), 0.0)

              if (i == 1) {
                // First execution loop computes initial value of cost function
                log.info("CalculateCost call for initial intercept theta0:{} theta1:{}",
                  initialIntercept.theta0, initialIntercept.theta1)
                workerRouter ! CalculateCost(initialIntercept, numberList)
              }

              workerRouter ! CalculateGradient(
                initialIntercept, numberList, learningRate, nrOfIterations)
            }

          case GradientResult(intercept) =>
            // calculate cost for every result
            log.info("CalculateCost call for intercept theta0:{} theta1:{}", intercept.theta0, intercept.theta1)
            workerRouter ! CalculateCost(intercept, numberList)

          case CostResult(result) =>
            log.info("CostResult ...")
            partialResults.append(result)
            nrOfResults += 1
            if (nrOfResults == nrOfWorkers + 1) {
              // Find the point with minimum error
              var error: Double = partialResults{0}.cost
              var position: Int = 0
              for (i <-0 to partialResults.size -1) {
                log.info("partial cost {} i {}", partialResults{i}.cost, i)
                if (partialResults{i}.cost < error) {
                  error = partialResults{i}.cost
                  position = i
                }
              }

              // Send the result to the listener
              listener ! InterceptResult(
                partialResults{position}, duration = (System.currentTimeMillis - start).millis)

              // Stops this actor and all its supervised children
              context.stop(self)
            }
        }
      }

      class Listener extends Actor {
        val log = Logging(context.system, this)

        override def preStart() = {
          log.info("Starting listener")
          super.preStart()
        }

        def receive = {
          case InterceptResult(intercept: Intercept, duration: Duration) =>
            log.info("total duration {}", duration)
            log.info(
              "GradientDescent final result theta0: {} theta1: {} cost: {}",
              intercept.theta0, intercept.theta1, intercept.cost)
            context.system.shutdown()
        }
      }


      def calculate(nrOfWorkers: Int, nrOfIterations: Int, learningRate: Double, fileName: String) {
        val numberList = Source.fromFile(fileName).getLines().map(
          s => s.split(",")).map(x => Point( x{0}.toDouble, x{1}.toDouble)).toList

        // Create an Akka system
        val system = ActorSystem("GradientDescentSystem")

        // create the result listener, which will log the result and will shutdown the system
        val listener = system.actorOf(Props[Listener], name = "listener")

       // create the master
        val master = system.actorOf(Props(new Master(
          nrOfWorkers, nrOfIterations, learningRate, numberList, listener)),
          name = "master")

        // start the calculation
        master ! ComputeIntercept

      }

       // Main program
      override def main(args: Array[String]) {
        val parser = new ArgotParser("gradient", preUsage=Some("Version 1.0"))
        val fileName = parser.option[String](List("f", "filename"), "filename", "filename")
        val nrOfWorkers = parser.option[Int](List("w", "workers"), "workers", "workers")
        val nrOfIterations = parser.option[Int](List("i", "iterations"), "itetarions", "iterations")
        val learningRate = parser.option[Double](List("r", "rate"), "rate", "rate")
        parser.parse(args)
        calculate(
          nrOfWorkers=nrOfWorkers.value.get, nrOfIterations=nrOfIterations.value.get,
          learningRate=learningRate.value.get, fileName=fileName.value.get)
      }
    }

Running the application

I am going to use the same dataset that I used on a previous post about linear regression. In that post, I used Octave to compute the cost function for my dataset and I got the results θ0 = 2.1368e+05 and θ1 = 2.3589e+03.

To run the akka application as a console command, all you have to do is call it using sbt:

 sbt "run --rate 0.0002  --iterations 200000 --workers 4 --filename ../total_by_period.csv"

I tried different parameter combinations and these were the results that I got:

Learning Rate | Number of Iterations | Number of Workers | θ0 Result          | θ1 Result          | Cost                  | Execution time
0.0001        | 1000                 | 2                 | 13444.522450507255 | 15733.153261356721 | 1.4915513928506067E10 | 82 milliseconds
0.0001        | 10000                | 2                 | 96714.66045646855  | 10171.263617220522 | 6.991494692323004E9   | 170 milliseconds
0.0001        | 10000                | 4                 | 96714.64937053746  | 10171.264357686756 | 6.9914914322857065E9  | 290 milliseconds
0.0002        | 10000                | 4                 | 149323.8010837764  | 6657.324029165672  | 4.129770277424713E9   | 250 milliseconds
0.0001        | 100000               | 4                 | 213137.42909120218 | 2394.999439961037  | 2.8873926496464314E9  | 795 milliseconds
0.0002        | 100000               | 4                 | 213677.0403639862  | 2358.957007201501  | 2.887304849208148E9   | 854 milliseconds
0.0002        | 200000               | 4                 | 213678.41665775442 | 2358.865079960366  | 2.887304848639881E9   | 997 milliseconds
0.0002        | 400000               | 4                 | 213678.41666654532 | 2358.8650793731726 | 2.887304848639881E9   | 861 milliseconds

The data above showed me that this algorithm works better as I increase the number of iterations, and that after 200000 iterations there are no significant improvements. Maybe the outcome would be different with another set of data. The source code of this example can be found at this link

References

Restful services in Scala with Spray.io

Spray is a suite of lightweight Scala libraries providing client and server-side REST/HTTP support on top of the Akka toolkit. It provides a set of integrated components for most REST/HTTP needs. In this post we are going to create a simple restful API with spray.

Getting Started

Spray can run as a standalone service or inside a servlet container. There is a spray template project on github that can be used as a starting point. This github repository has different branches, one for every possible configuration; we will clone this repo and use the branch on_spray-can_1.3_scala-2.11. This branch contains the structure for a standalone spray-can setup with Scala 2.11 + Akka 2.3 + spray 1.3.

     git clone git@github.com:spray/spray-template.git
     git checkout on_spray-can_1.3_scala-2.11

Running tests, starting and stopping the application

Running tests:

    sbt test

To run the server you need to execute sbt:

    sbt
    > re-start
    [info] Application demo not yet started
    [info] Starting application demo in the background ...
    demo Starting com.example.Boot.main()
    [success] Total time: 0 s, completed 15-Apr-2015 16:11:08
    > demo [INFO] [04/15/2015 16:11:09.323] [on-spray-can-akka.actor.default-dispatcher-3] [akka://on-spray-can/user/IO-HTTP/listener-0] Bound to localhost/127.0.0.1:8080

You can now request the application using the curl command:

        $ curl http://localhost:8080

Stopping application:

    sbt
    > re-stop

User Resource Example

We are going to implement a basic CRUD for a user resource. Just Create and Get for now.

Json serializer

In order to handle json serialization, we will use spray-json to create the object com.notempo1320.ApiJsonProtocol, which will implement a custom json protocol:


    package com.notempo1320

    import spray.httpx.SprayJsonSupport._
    import spray.httpx.SprayJsonSupport
    import spray.json._
    import DefaultJsonProtocol._

    case class User(var id: Option[Long], username: String, email: String)

    object ApiJsonProtocol extends DefaultJsonProtocol with SprayJsonSupport {
      //jsonFormatX depends of number of parameters that the object receives
      implicit val userFormat = jsonFormat3(User)

    }

User Service

We are going to create a trait that extends spray.routing.HttpService and implements the RESTful HTTP logic. This trait uses functionality from the spray-routing module, which provides a high-level routing DSL for defining RESTful web services.

    // this trait defines our service behavior independently from the service actor
    trait UserService extends HttpService {
      var userList = new ListBuffer[User]()

      val userRoute =
        path("user") {
          get {
            respondWithMediaType(`application/json`) {

              userList.append(User(Option(util.Random.nextInt(10000).toLong), "user1", "email1"))
              userList.append(User(Option(util.Random.nextInt(10000).toLong), "user2", "email2"))
              complete(userList.toList.toJson.compactPrint)
            }
          } ~
          post {
            entity(as[User]) { user =>
              val user2 = User(Option(util.Random.nextInt(10000).toLong), user.username, user.email)
              userList += user2
              respondWithMediaType(`application/json`) {
                complete(StatusCodes.Created, user2.toJson.compactPrint)
              }
            }
          }
        }
    }

The code above uses routes to deliver the request to the appropriate component.

User Actor

Now we will create an Actor that will wrap UserService logic:

    // we don't implement our route structure directly in the service actor because
    // we want to be able to test it independently, without having to spin up an actor
    class UserActor extends Actor with UserService {

      // the HttpService trait defines only one abstract member, which
      // connects the services environment to the enclosing actor or test
      def actorRefFactory = context

      // this actor only runs our route, but you could add
      // other things here, like request stream processing
      // or timeout handling
      def receive = runRoute(userRoute)
    }

Api class

The Api class is a console command which is responsible for running the spray-can server:

    package com.notempo1320

    import akka.actor.{ActorSystem, Props}
    import akka.io.IO
    import spray.can.Http
    import akka.pattern.ask
    import akka.util.Timeout
    import scala.concurrent.duration._

    object Api extends App {

      // we need an ActorSystem to host our application in
      implicit val system = ActorSystem("on-spray-can")

      // create and start our service actor
      val service = system.actorOf(Props[UserActor], "user-service")

      implicit val timeout = Timeout(5.seconds)
      // start a new HTTP server on port 8080 with our service actor as the handler
      IO(Http) ? Http.Bind(service, interface = "localhost", port = 7000)
    }

Updating Configuration file build.sbt


    organization  := "com.notempo1320"

    version       := "0.1"

    scalaVersion  := "2.11.6"

    scalacOptions := Seq("-unchecked", "-deprecation", "-encoding", "utf8")

    libraryDependencies ++= {
      val akkaV = "2.3.9"
      val sprayV = "1.3.3"
      Seq(
        "io.spray"            %%  "spray-can"     % sprayV,
        "io.spray"            %%  "spray-routing" % sprayV,
        "io.spray"            %%  "spray-json"    % "1.3.1",
        "io.spray"            %%  "spray-httpx"   % sprayV,
        "io.spray"            %%  "spray-testkit" % sprayV  % "test",
        "com.typesafe.akka"   %%  "akka-actor"    % akkaV,

        "com.typesafe.akka"   %%  "akka-testkit"  % akkaV   % "test",
        "org.specs2"          %%  "specs2-core"   % "2.3.11" % "test"
      )
    }



    Revolver.settings
    mainClass in (Compile, run) := Some("com.notempo1320.Api")
    mainClass in Revolver.reStart := Some("com.notempo1320.Api")

Creating a user

    curl -i -H "Accept: application/json" -H "Content-Type: application/json" -X POST -d '{"username": "myname", "email": "mymail@server.com"}' http://localhost:7000/user

Getting a list of users

  curl http://localhost:7000/user

You can find source code of this example here

References

Akka toolkit introduction

Akka is an open-source toolkit and runtime simplifying the construction of concurrent and distributed applications on the JVM. Akka supports multiple programming models for concurrency, but it emphasizes actor-based concurrency, with inspiration drawn from Erlang. In this post we will create a service that listens for comma separated lines; the system must split every line and maintain on disk an index of words with the number of times each word has been sent.

Creating the project

We are going to use sbt, a build tool for Java and Scala, to manage our project. The first step is to create a directory named word-service and, inside this directory, a file build.sbt with the lines below:

    name := "Word service"

    version := "1.0"

    scalaVersion := "2.11.6"

    resolvers += "Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/"


    libraryDependencies += "com.typesafe.akka" % "akka-actor_2.11" % "2.3.4"

After creating the build.sbt file, we need to create the directory src/main/scala in which we will store the Scala source files. Inside this directory we must create a file named WordCount.scala, then we can add some initial imports to this file:

    import akka.actor._
    import akka.routing.RoundRobinRouter
    import scala.concurrent.duration._

Creating the messages

The proposed design consists of one Master actor responsible for starting the computation by creating a set of Worker actors. It splits the work into discrete chunks that are sent to these Workers following a round robin policy. The master waits until all the workers have completed their work and sent back their results for aggregation. When the computation is completed, the master sends the result to the Listener, which is responsible for maintaining a filesystem index of every word.

In order to execute these calculations we will need to define four messages:

  • Calculate: sent to the Master actor to start the calculation.
  • Work: sent from the Master actor to the Worker actors containing the work assignment.
  • Result: sent from the Worker actors to the Master actor containing the result from the worker’s calculation.
  • Index: sent from the Master actor to the Listener actor containing an array of words and how long the calculation took.

Case classes in scala are very useful to create immutable objects, so we will use them to create messages.

  sealed trait WordIndexMessage

  case object Calculate extends WordIndexMessage

  case class Work(line: String) extends WordIndexMessage

  case class Result(words: Array[String]) extends WordIndexMessage

  case class Index(words: ArrayBuffer[String], duration: Duration)

Creating the worker

The Worker is created by mixing in the Actor trait and defining the receive method, which defines the message handler.

  class Worker extends Actor {
      // count words
      def receive = {
        // ! means “fire-and-forget”. Also known as tell.
        case Work(line) => sender ! Result(splitLine(line)) // perform the work
      }

      def splitLine(line: String): Array[String] = {
        line.split(",");
      }
  }

Creating the master

The Master actor receives 2 parameters in its constructor:

  • nrOfWorkers: defining how many workers we should start up.
  • itLines: String Iterator that contains the chunks to send out to the workers.

The Master actor receives the itLines iterator from a caller, then it creates nrOfWorkers workers and sends the lines to be processed; after all lines are processed it sends the result to the listener. The code below describes the Master actor:

  class Master(nrOfWorkers: Int, itLines: Iterator[String], listener: ActorRef)
    extends Actor {
    val messagesPercall = 100
    var words = ArrayBuffer.empty[String]
    val start: Long = System.currentTimeMillis
    var nrOfResults: Int = _
    var nrOfLines: Int = 0

    // create a round-robin router to make it easier to spread out the work between the workers
    val workerRouter = context.actorOf(
      Props[Worker].withRouter(RoundRobinRouter(nrOfWorkers)), name = "workerRouter")


    def receive = {
      case Calculate =>
        //send all lines to the workers in groups of messagesPercall items
        while (itLines.nonEmpty) {
          for (line <- itLines.take(messagesPercall)) {
            nrOfLines += 1
            workerRouter ! Work(line)
          }
        }

      case Result(value) =>
        words.appendAll(value)
        nrOfResults += 1
        if (nrOfResults == nrOfLines) {
          // Send the result to the listener
          listener !Index(words, duration = (System.currentTimeMillis - start).millis)
          // Stops this actor and all its supervised children
          context.stop(self)
        }
    }
  }

Creating the result listener

Result Listener receives the list of words produced by the workers and updates a file system based index:

   class Listener extends Actor {
    val rootDir = "/tmp/wordcount"
    var nrOfWords: Int = 1;
    var src: Source = _;

    def indexWord(word: String) {
      val filePath = Paths.get(rootDir, word)
      def computeValue() {
        def wordExist(): Boolean = {
          return Files.exists(filePath)
        }

        //If the word file does not exist, create it and start the count at 1
        if (!wordExist()) {
          Files.createFile(filePath)
          nrOfWords = 1
        } else {
          //otherwise read the stored count and increment it
          src = Source.fromFile(filePath.toString())
          nrOfWords = src.getLines().toList{0}.toInt + 1
          src.close()
        }
      }

      def updateValue() {
        val file = new File(filePath.toString())
        val bw = new BufferedWriter(new FileWriter(file))
        bw.write(nrOfWords.toString())
        bw.close()
      }

      computeValue()
      updateValue()
    }

    def receive = {
      case Index(words, duration) =>
        words.foreach(indexWord)
        println("\nCalculation time: %s".format(duration))
        context.system.shutdown()
    }
  }

Bootstrap the calculation

We can now create an object named Indexer which extends the App trait in Scala; this object can be run from the command line:


    object Indexer extends App {

      calculate(nrOfWorkers = 4)

      // actors and messages ...
      def calculate(nrOfWorkers: Int) {

        //Open file with words
        val wordLines = Source.fromInputStream(getClass.getResourceAsStream("/words.txt")).getLines()

        // Create an Akka system
        val system = ActorSystem("WordCountSystem")

        // create the result listener, which will print the result and will shutdown the system
        val listener = system.actorOf(Props[Listener], name = "listener")

        // create the master
        val master = system.actorOf(Props(new Master(
          nrOfWorkers, wordLines, listener)),
          name = "master")

        // start the calculation
        master ! Calculate
      }
    }

Below is the full source code:


    package com.notempo1320.demo

    import akka.actor._
    import akka.routing.RoundRobinRouter
    import scala.concurrent.duration._

    import java.nio.file.{Paths, Files}
    import java.io._

    import scala.collection.mutable.ArrayBuffer
    import scala.io.{Source, BufferedSource}


    object Indexer extends App {

      calculate(nrOfWorkers = 4)


      sealed trait WordIndexMessage

      case object Calculate extends WordIndexMessage

      case class Work(line: String) extends WordIndexMessage

      case class Result(words: Array[String]) extends WordIndexMessage

      case class Index(words: ArrayBuffer[String], duration: Duration)

      class Worker extends Actor {

          def splitLine(line: String): Array[String] = {
            line.split(",");
          }

          // count words
          def receive = {
            // ! means “fire-and-forget”. Also known as tell.
            case Work(line) => sender ! Result(splitLine(line)) // perform the work
          }

      }


      class Master(nrOfWorkers: Int, itLines: Iterator[String], listener: ActorRef)
        extends Actor {
        val messagesPercall = 100
        var words = ArrayBuffer.empty[String]
        val start: Long = System.currentTimeMillis
        var nrOfResults: Int = _
        var nrOfLines: Int = 0

        // create a round-robin router to make it easier to spread out the work between the workers
        val workerRouter = context.actorOf(
          Props[Worker].withRouter(RoundRobinRouter(nrOfWorkers)), name = "workerRouter")


        def receive = {
          case Calculate =>
            // send all the lines to the workers in batches of messagesPercall items
            while (itLines.nonEmpty) {
              for (line <- itLines.take(messagesPercall)) {
                nrOfLines += 1
                workerRouter ! Work(line)
              }
            }

          case Result(value) =>
            words.appendAll(value)
            nrOfResults += 1
            if (nrOfResults == nrOfLines) {
              // Send the result to the listener
              listener ! Index(words, duration = (System.currentTimeMillis - start).millis)
              // Stops this actor and all its supervised children
              context.stop(self)
            }
        }
      }



      class Listener extends Actor {
        val rootDir = "/tmp/wordcount"
        var nrOfWords: Int = 1
        var src: Source = _

        def indexWord(word: String) {
          val filePath = Paths.get(rootDir, word)
          def computeValue() {
            def wordExist(): Boolean = Files.exists(filePath)

            // If the word file does not exist, create it and start the count at 1;
            // otherwise read the stored count and increment it
            if (!wordExist()) {
              Files.createFile(filePath)
              nrOfWords = 1
            } else {
              src = Source.fromFile(filePath.toString())
              nrOfWords = src.getLines().toList(0).toInt + 1
              src.close()
            }
          }

          def updateValue() {
            val file = new File(filePath.toString())
            val bw = new BufferedWriter(new FileWriter(file))
            bw.write(nrOfWords.toString())
            bw.close()
          }

          computeValue()
          updateValue()
        }

        def receive = {
          case Index(words, duration) =>
            words.foreach(indexWord)
            println("\nCalculation time: %s".format(duration))
            context.system.shutdown()
        }
      }

      def calculate(nrOfWorkers: Int) {

        //Open file with words
        val wordLines = Source.fromInputStream(getClass.getResourceAsStream("/words.txt")).getLines()

        // Create an Akka system
        val system = ActorSystem("WordCountSystem")

        // create the result listener, which will print the result and will shutdown the system
        val listener = system.actorOf(Props[Listener], name = "listener")

        // create the master
        val master = system.actorOf(Props(new Master(
          nrOfWorkers, wordLines, listener)),
          name = "master")

        // start the calculation
        master ! Calculate
      }

    }

Compiling and running the code

    $sbt clean compile
    $sbt run

References

Apache Spark RDD Introduction

Spark Resilient Distributed Datasets (RDDs) are fault-tolerant collections of elements that can be operated on in parallel. This post is an introduction to processing data using RDD operations.

RDD basics

An RDD is an immutable, distributed collection of objects that can be processed across multiple partitions of the cluster. For testing purposes we can create a simple RDD in memory:

    sc = SparkContext("local", "Rdd intro")
    data = [x for x in xrange(100)]
    distData = sc.parallelize(data)

We can manipulate distData in order to get a collection of prime numbers:


    def is_prime(n):
        # 0 and 1 are not prime; 2 and 3 are
        if n < 2:
            return False
        if n in (2, 3):
            return True

        # check divisors up to the square root of n
        for d in xrange(2, int(n ** 0.5) + 1):
            if n % d == 0:
                return False
        return True

    primeData = distData.filter(is_prime)

RDDs are recomputed every time they are accessed; if we want to reuse an RDD it is better to persist it:


    from math import sqrt
    from pyspark import StorageLevel

    def get_fibonacci(n):
        # closed-form (Binet) formula for the nth Fibonacci number
        return ((1 + sqrt(5)) ** n - (1 - sqrt(5)) ** n) / (2 ** n * sqrt(5))

    primeData.persist(StorageLevel.MEMORY_ONLY)
    fibonacciData = primeData.map(get_fibonacci)
    print fibonacciData.collect()

Using Spark RDD to analyze data from Open Data Colombia

In a previous post we learned how to download the Open Data Colombia database; now we can use Spark to analyze it. We don't know what kind of information is inside this data, all we know is that it is JSON encoded, so a good starting point is to find out which keys appear in the datasets:


    from operator import add
    from pyspark import SparkContext, StorageLevel
    from sets import Set
    import json
    import os



    sc = SparkContext("local", "Open Data Colombia")


    def to_json(line):
        try:
            j = json.loads(line)
            return j
        except Exception as e:
            return {'error': line}

    def extract_all_keys(structure):
        if not structure:
            return []
        key_set = []
        def traverse(k_set, st):
            for key, value in st.iteritems():
                k_set.append(key)
                if dict == type(value):
                    traverse(k_set, value)
                elif list == type(value):
                    for v in value:
                        # nested lists may contain plain values, only recurse into dicts
                        if dict == type(v):
                            traverse(k_set, v)
            return k_set
        return traverse(key_set, structure)

    def merge_keys(l1, l2):
        return l1 + l2

    def process_files(data_dir):
        """
        Step 1. Load json files from location directory.
        Step 2. Apply a mapping function to extract all keys from each file
        Step 3. Count how many times a key exists on all files
        Step 4. Order dataset
        """
        input = sc.textFile(data_dir)
        data = input.map(
            to_json).map(
                extract_all_keys).flatMap(
                    lambda x: [(k, 1) for k in x]).reduceByKey(
                        lambda x, y: x + y)

        print 'there are %s unique keys\n' % data.count()

        d2 = data.map(lambda x: (x[1], x[0],)).sortByKey(False)
        print d2.take(100)
        sc.stop()

    def main(data_dir):
        process_files(data_dir)

    if __name__ == "__main__":
        # directory where opendata catalog is stored
        open_data_dir = '/notempo/opendata/*'
        main(open_data_dir)

References

Downloading open data Colombia database using python

Open Data Colombia is a website that contains public information about public entities in Colombia; the information can be used for commercial or non-commercial purposes. This post is about how to download the whole dataset to a local hard drive using Python.

Getting container lists

A container is a government division responsible for one or more catalogs. We can get the list of containers using the curl console command:

    curl http://servicedatosabiertoscolombia.cloudapp.net/v1/

The result is in AtomPub format; we can use Python to get the list and parse the response:

    import httplib2
    import xml.etree.ElementTree as ET

    def get_container_list():
        """
        Get container list from open data Colombia
        """
        h = httplib2.Http()
        (resp_headers, content) = (
            h.request("http://servicedatosabiertoscolombia.cloudapp.net/v1/",
            "GET"))
        root = ET.fromstring(content)
        return [ child.get('href') for child in root[0]]

Getting list of Catalogs

We can get the catalog list by making a GET request for every container:

 curl http://servicedatosabiertoscolombia.cloudapp.net/v1/Ministerio_de_tecnologias_de_informacion_y_las_comunicaciones

We can use Python to traverse the container list and get all catalogs:

    def get_catalog_list(container_list):
        """
        Get Catalog list
        """
        h = httplib2.Http()
        url = "http://servicedatosabiertoscolombia.cloudapp.net/v1/"
        catalogs_by_entity = {}

        for container in container_list:
            catalog_list = []
            if container is not None:
                (resp_headers, content) = h.request(url + container, "GET")
                try:
                    root = ET.fromstring(content)
                    for child in root[0]:
                        if 'href' in child.attrib:
                            catalog_list.append(child.attrib['href'])
                except Exception as e:
                    print e
                    print '\n'
                    print container
                catalogs_by_entity[container] = catalog_list
        return catalogs_by_entity

Getting the data

We can make a GET request to http://servicedatosabiertoscolombia.cloudapp.net/v1/container/set?$format=json in order to get the data:

curl http://servicedatosabiertoscolombia.cloudapp.net/v1/Ministerio_de_tecnologias_de_informacion_y_las_comunicaciones/dataset3

We can use a Python script to get all the data from all entities and store it on our hard drive:

    def get_catalogs(catalogs_by_entity, directory='./'):
        """
        Get catalog data and save it in disk
        """
        h = httplib2.Http()
        root_path = os.path.join(directory, 'opendata')
        # remove directory if exist
        if os.path.exists(root_path):
            shutil.rmtree(root_path)
        os.mkdir(root_path)
        url = "http://servicedatosabiertoscolombia.cloudapp.net/v1/"
        for k, v in catalogs_by_entity.items():
            dir_path = os.path.join(root_path, k)
            os.makedirs(dir_path)
            for cat in v:
                (resp_headers, content) = h.request(
                    url + k + '/' + cat + '?$format=json', "GET")
                fname = cat + '.json'
                f = open (os.path.join(dir_path, fname), 'w')
                f.write(content)
                f.close()

Source code can be found at this link

References

Apache Spark Introduction

Apache Spark is an open-source cluster computing framework. It has been designed to perform generic big data processing with great performance and it is well suited to machine learning algorithms. In this post we will cover basic aspects such as components, installation and the Python console.

Spark Components

Spark Core and Resilient Distributed Datasets

Spark Core contains the basic functionality of the project. It provides distributed task dispatching, scheduling, and basic I/O. Spark Core is also where the Resilient Distributed Dataset (RDD) API lives; this API hides the complexity inherent to distributed systems and exposes functionality through client APIs in Java, Python and Scala. From the point of view of applications, data does not appear to be distributed across several nodes: it seems to live on the same machine and can be accessed much like a local file.

Spark SQL

This component provides query support: data can be accessed using SQL or a domain-specific language to manipulate SchemaRDDs in Scala, Java, or Python. It is very powerful because standard SQL queries and SchemaRDD operations can be mixed in order to express complex data manipulation processes, as sketched below.
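
A minimal sketch of that mixing, assuming a local SparkContext like the one used in the examples below (Spark 1.2, where SQL support is exposed through a SQLContext and SchemaRDDs) and a tiny made-up dataset, could look like this:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext, Row

    sc = SparkContext("local", "Spark SQL sketch")
    sqlContext = SQLContext(sc)

    # build a SchemaRDD from an ordinary RDD of Rows and register it as a table
    people = sc.parallelize([Row(name="ana", age=30), Row(name="luis", age=12)])
    sqlContext.inferSchema(people).registerTempTable("people")

    # a standard SQL query and plain RDD operations can be combined
    adults = sqlContext.sql("SELECT name FROM people WHERE age >= 18")
    print adults.map(lambda row: row.name).collect()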

Spark Streaming

This component ingests streaming data in mini-batches and performs RDD transformations on those mini-batches.

MLlib Machine Learning Library

This component implements many common machine learning and statistical algorithms to simplify large scale machine learning pipelines.
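
As an illustration only, a minimal sketch (assuming a local SparkContext, the KMeans algorithm from pyspark.mllib and a tiny made-up dataset) could be:

    from pyspark import SparkContext
    from pyspark.mllib.clustering import KMeans

    sc = SparkContext("local", "MLlib sketch")

    # two obvious groups of 2D points
    points = sc.parallelize([[0.0, 0.0], [0.1, 0.2], [9.0, 9.1], [9.2, 8.9]])

    # train a k-means model with 2 clusters and print the centers it found
    model = KMeans.train(points, 2, maxIterations=10)
    print model.clusterCenters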

GraphX

This component provides an API for expressing graph computation.

Installation


You can download and uncompress the latest version from the Spark website, or you can use a Chef recipe that will install it for you.

First Steps

We will assume that you have downloaded and installed spark-1.2.1 with support for Hadoop 2.4 in the /opt directory; let's check what is inside:

    $cd /opt/spark-1.2.1-bin-hadoop2.4/
    $ls
    bin  conf  data  ec2  examples  lib  LICENSE  NOTICE  python  README.md  RELEASE  sbin  

Python Interpreter

You can use the command below to interact with the Python shell:

    $ bin/pyspark
    Python 2.7.3 (default, Mar 13 2014, 11:03:55) 
    [GCC 4.7.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    Spark assembly has been built with Hive, including Datanucleus jars on classpath
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.2.1
      /_/

    Using Python version 2.7.3 (default, Mar 13 2014 23:03:55)
    SparkContext available as sc  

Let's open /opt/spark-1.2.1-bin-hadoop2.4/README.md and try some basic operations:

    >>> text = sc.textFile("README.md")
    >>> print text
    README.md MappedRDD[1] at textFile at NativeMethodAccessorImpl.java:-2
    >>> text.count()
    98
    >>> text.first()
    u'# Apache Spark'

    >>> lines_with_spark = text.filter(lambda x: 'Spark' in x)
    >>> lines_with_spark.count()
    19

    >>> text.filter(lambda x: 'Spark' in x).count()
    19
    

Find the line with the most words:

    >>>text.map(lambda line: len(line.split())).reduce(lambda a, b: a if (a > b) else b)
    14

Find the largest odd word count among all lines:

    >>>text.map(lambda line: len(line.split())  if len(line.split()) % 2 != 0 else 0   ).reduce(lambda a, b: a if (a > b) else b)
    13

We can create a python file named process_file.py with the code below:


    from operator import add
    from pyspark import SparkContext

    logFile = "/opt/spark-1.2.1-bin-hadoop2.4/README.md"  # Should be some file on your system
    sc = SparkContext("local", "Simple App")
    logData = sc.textFile(logFile).cache()

    # count lines that have an odd number of words
    odd_words_count = logData.filter(lambda line: len(line.split()) % 2 != 0).count()

    # get (word, count) pairs for every unique word
    unique_words = logData.flatMap(lambda line: line.split(' ')).map(lambda x: (x, 1)).reduceByKey(add).collect()

    print "there are %s lines with odd number of words\n" % odd_words_count
    print "there are %s unique words\n" % len(unique_words)
    print "and the unique words are :" 
    for w in unique_words:
        print "%s %s times" % w

We can execute this program with the lines below:

    $bin/spark-submit \
    --master local[4] \
    process_file.py

References

Linear regression internet adoption in Colombia

Linear regression is a statistical method useful for making predictions. In this post we will fit a linear regression to public data from the Colombian government about Internet growth.

Tools to use

  • Python: we will use it to format the data.
  • Octave: we will use it to generate the regression.

Formatting the data

Data from the Colombian government contains several variables about different topics. We need to process this data and generate a simple CSV file where the first column is the number of months since the first measure (Oct 2011) and the second column is the number of Internet subscribers for that period. The first step is to get the raw data:

    http://estrategiaticolombia.co/estadisticas/opendata/dat_dedicado_general.json

Now we can use a simple Python script to format the data to feed Octave:


from decimal import *

import argparse
import json
import csv

def total_by_period(jf):
    """
    Process dedicated internet statistics in Colombia from json file downloaded from
    http://estrategiaticolombia.co/estadisticas/opendata/dat_dedicado_general.json
    And generate total  by period
    """
    i = 0
    data_dict = {}

    def proccess_record(line):
        """
        Process every line inside file in order to get a consolidate
        """
        j_line = json.loads(line)[0]
        period_key = '{}-{}'.format(j_line['ANHO'], j_line['PERIODO'])
        j_line['SUSCRIPTORES'] = Decimal(j_line['SUSCRIPTORES'])
        if period_key not in data_dict:
            record = {
                    'total': j_line['SUSCRIPTORES']
            }
            data_dict[period_key]  = record

        else:
            val = j_line['SUSCRIPTORES']
            data_dict[period_key]['total'] += val

    with open(jf, 'r') as f:
        for line in f:
            i +=1

            try:
                proccess_record(line)
            except Exception as e:
                print e
                print 'error processing line {}'.format(i)

            if i > 10000:
                f.close()
                break

    # We must build rows in the format (x, y): x is the number of months since
    # the first measure and y is the total number of subscribers for that
    # period; every period in the source data represents 3 months
    period_list = []
    keys = data_dict.keys()
    keys.sort()
    period_index = 0
    with open('total_by_period.csv', 'w') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',')
        # every period represents 3 months
        for n_index in range(len(keys)):
            val = data_dict[keys[n_index]]
            row = (period_index, str(val['total']))
            spamwriter.writerow(row)
            period_index += 3




def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--task', default='total_by_period')
    parser.add_argument('--file', default='./dat_dedicado_general.json')
    args = parser.parse_args()

    task_dict = {
        'total_by_period': total_by_period,
    }

    task_dict[args.task](args.file)

if __name__ == "__main__":
    main()

Generating Regression

We can now use Octave to generate the linear regression. The first step is to load the data and create two vectors, one with the x values and another with the y values:

    % Load csv file
    data = load('total_by_period.csv');

    % Define x and y
    x = data(:,1);
    y = data(:,2);

The next step is to create a plot function and call it:

    % Create a function to plot the data
    function plotData(x,y)
    plot(x,y,'rx','MarkerSize',8); % Plot the data
    end

    % Plot the data

    plotData(x,y);

    ylabel('Internet adoption in Colombia');
    xlabel('Months since october 2011');

    fprintf('Program paused. Press enter to continue.\n');

    pause;

This function plots the raw data.

The next step is to calculate the regression:

    % Count how many data points we have
    m = length(x); % Add a column of all ones (intercept term) to x
    X = [ones(m, 1) x];

    % Calculate theta

    theta = (pinv(X'*X))*X'*y

For our sample data the theta values are [2.1368e+05, 2.3589e+03]; this is our linear regression. Now we can plot it:

    hold on; % this keeps our previous plot of the training data visible 
    plot(X(:,2), X*theta, '-');
    legend('Training data', 'Linear regression');
    hold off % Do not put any more plots on this figure
    pause;

The resulting plot shows the fitted line over the training data.

Tabulating some data

We can plug the number of months into the fitted line to tabulate some predicted values (a quick Python check follows the table below):

    internet growth = 2.1368e+05 * month + 2.3589e+03
    Months since October 2011    Number of users
    48 (Oct 2015)                10258998.9
    57 (Jun 2016)                12182118.9
    72 (Oct 2017)                15387318.9
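
The quick check mentioned above, plugging the coefficients into a few lines of Python, reproduces the tabulated values:

    # reproduce the table using the coefficients of the fitted line above
    a, b = 2.1368e+05, 2.3589e+03
    for month in (48, 57, 72):
        print month, a * month + b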

References

Scala Introduction

Scala is an object-functional programming language for general software applications. Scala programs run on the JVM, so it is easy to build hybrid applications with Java and Scala components running under the same Java Virtual Machine. This post is a brief introduction to the language.

Installation

    sudo apt-get install scala

Hello world

We will create a file named HelloWorld.scala and copy in the code below:

    object HelloWorld {
      def main(args: Array[String]) {
        println("Hello, world!")
      }
    }

We can compile it with scalac HelloWorld.scala; this command will generate a class file named HelloWorld.class. After that we can execute it with scala HelloWorld.

Simple TCP echo server Using java.net.*

We will write a simple TCP echo server; first of all, let's code a Java version:


    import java.lang.*;
    import java.io.*;
    import java.net.*;

    class Server {
        public static void main(String args[]) {
            try {
                int serverPort = 4020;
                ServerSocket serverSocket = new ServerSocket(serverPort);
                serverSocket.setSoTimeout(10000);

                while(true) {
                    System.out.println("Waiting for client on port " + serverSocket.getLocalPort() + "...");

                    Socket server = serverSocket.accept();
                    System.out.println("Just connected to " + server.getRemoteSocketAddress());

                    PrintWriter toClient =
                        new PrintWriter(server.getOutputStream(),true);
                    BufferedReader fromClient =
                        new BufferedReader(
                                new InputStreamReader(server.getInputStream()));
                    String line = fromClient.readLine();
                    System.out.println("Server received: " + line);
                    toClient.println(line);
                    server.close();
                }

            } catch(Exception e) {
                System.out.print(e.getMessage());
            }
        }
    }

We can write the same code in Scala using the java.net.* package from Java:


import java.net.{ ServerSocket, SocketException, SocketTimeoutException, Socket }
import java.io.{ PrintWriter, BufferedReader, InputStreamReader, OutputStreamWriter }

object EchoServer {
  var port = 1320

  def run() {
    val serverSocket = new ServerSocket(port)

    def process( socket: Socket) {
      val bufferedReader = new BufferedReader(new InputStreamReader(socket.getInputStream))
      var line = ""
      var msg = ""
        do {
          line = bufferedReader.readLine()
          msg += line + "\n"
        } while (line != "")

      val out: PrintWriter = new PrintWriter(new OutputStreamWriter(socket.getOutputStream))
      out.println(msg)
      out.flush()
    }

    def loop() {
      while (true) {
        val socket = serverSocket.accept()
        process(socket)
        socket.close()
      }
    }
    loop()
  }

  def main(args: Array[String]) {

    if(args.length > 0) {
      val arglist = args.toList
      port = arglist(0).toInt
    }
    run()
  }

}

References

Compiling less files with Gulpjs

Gulp is a Node.js library useful for automating workflows. In this post we will see how to use it to compile Bootstrap less files.

Project Structure

Inside the test folder we have an index.html file that imports the Bootstrap CSS from public/css/bootstrap.css. The bootstrap-3.3.2/less directory contains the less files that we have to compile in order to generate that CSS file:

    test/
    ├── bootstrap-3.3.2
    │   ├── dist
    │   ├── fonts
    │   ├── js
    │   ├── less
    │   │   ├── bootstrap.less
    │   │   ├── mixins.less
    └── index.html
    └── gulpfile.js
    └── public
        └── css

Our Goal

What we want to do is to load test/bootstrap-3.3.2/less/bootstrap.less and generate public/css/bootstrap.css. We want this to happen automatically, so that every time we modify a .less file inside test/bootstrap-3.3.2/less/ the CSS file is regenerated.

Libraries Installation

    npm install less
    npm install gulp
    npm install gulp-less
    npm install gulp-sourcemaps
    npm install gulp-watch

Gulp file

Inside test folder, We will create a file named gulpfile.js:

    var gulp = require('gulp');

    gulp.task('default', function() {
      // place code for your default task here
    });

Compiling less

    var gulp = require('gulp'),
        less = require('gulp-less'),
        path = require('path'),
        sourcemaps = require('gulp-sourcemaps'),
        less_paths = {};

    // paths where less files are allocated

    less_paths = [
            __dirname + '/bootstrap-3.3.2/less/bootstrap.less'
        ]

    gulp.task('default', ['less:static']);


    gulp.task('less:static', function () {
        return gulp.src(less_paths)
            .pipe(sourcemaps.init())
            .pipe(less())
            .pipe(sourcemaps.write('./'))
            .pipe(gulp.dest(__dirname + '/public/css'));

    });

Now you can compile less and generate the static CSS by executing the gulp command inside the test directory. Gulp will look for gulpfile.js and execute the tasks defined as default.

Detecting changes and re compiling automatically

We can create another task that will watch for changes to the less files, so that if you change any less file, gulp will regenerate the CSS automatically. To do that we will add another function to gulpfile.js:

    gulp.task('check', function() {
        // watch every .less file under the bootstrap less directory
        gulp.watch(__dirname + '/bootstrap-3.3.2/less/**/*.less', ['less:static']);
    });

gulpfile.js can be found at this link. Many thanks to my friend Raúl who helped me with this post.

References

  • Gulp's documentation
  • Less Framework
  • Gulp less
  • Building with gulp
    AWS SES Introduction

    Amazon SES is a service that allows AWS users to send outbound bulk email. This post covers the basics of the service.

    Verifying your email

    AWS requires email verification in order to guarantee that emails are sent only by the address owners. The verification process is quick and simple: you only have to log in to your AWS account and follow the steps described in Verifying Email Addresses in Amazon SES.

    Grant sending access to a group

    It is a good idea to have different groups for different tasks; below is a JSON policy that can be used to grant SES access to a group:

        {
            "Version": "2012-10-17",
            "Statement":[{
               "Effect":"Allow",
               "Action":["ses:SendEmail", "ses:SendRawEmail"],
               "Resource":"*"
               }
            ]
         }
    

    To assign the policy to a group we can execute the command below:

        $aws iam put-group-policy --group-name webapp --policy-document file://ses-policy.json --policy-name ses-webapp
    

    AWS-cli and SES

    We can access AWS SES using the aws console client; to see a complete list of available commands you can type aws ses help. The first thing we can do is list the identities associated with the AWS account using aws ses list-identities:

        $aws ses list-identities
    
        {
            "Identities": [
                "email@mydomail.com"
            ]
        }
    
    

    Sending messages

    aws ses send-email --from email@mydomail.com --to user@box.com --subject 'test message' --text 'hello world'  --html 'hello world'
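
    The same message can also be sent programmatically. As a minimal sketch with the boto library (an assumption, since this post only uses the CLI; the sender address must already be verified and the us-east-1 SES endpoint is assumed):

        import boto.ses

        # connect to the SES endpoint and send a simple text/html message
        conn = boto.ses.connect_to_region('us-east-1')
        conn.send_email(
            'email@mydomail.com', 'test message', 'hello world',
            ['user@box.com'], html_body='hello world')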
    

    References

    AWS SQS Introduction

    Amazon Simple Queue Service (Amazon SQS) is a distributed message queuing service that supports programmatic sending of messages via web service applications as a way to communicate over the Internet. This post is about how to use this service.

    Creating Admin Security group and User

    It is a good security practice to create an admin group and a user associated with it; this user will be responsible for administrative tasks such as creating or deleting queues.

    Creating Admins group

      aws iam create-group --group-name Admins
    

    Setting admin access rule for Admin group

    We must define a policy that contains a set of permissions and then assign that policy to a certain group. To create the admin policy, first of all we must create a file named admin-policy.json and paste the code below:

        {
          "Version": "2012-10-17",
          "Statement": [{
            "Effect": "Allow",
            "Action": ["*"],
            "Resource": ["*"]
          }]
        }
    

    Now we can assign the policy defined in the JSON file to the Admins group:

        aws iam put-group-policy --group-name Admins --policy-document file://admin-policy.json --policy-name AdminRoot
    

    Creating Admin User

    After creating the Admins group it is a good idea to create an admin user and add it to that group; it is not recommended to use AWS with your main account access keys.

      aws iam create-user  --user-name mycloudadmin
      aws iam add-user-to-group --user-name mycloudadmin --group-name Admins
    

    SQS and aws-cli

    By typing aws sqs help we can see all the options available for this service. We can create a queue with the command below:

        aws sqs create-queue --queue-name MyQueue --region us-west-2
    
    

    We can verify the queue creation with the command aws sqs list-queues

    Creating a user with limited access to the queue

    First of all we need to create a group named webapp:

      aws iam create-group --group-name webapp

    Setting the SQS access policy for the webapp group

    We will set up an access policy that allows the webapp group to send, receive and delete messages from MyQueue; we will create a sqs-policy.json file:

        {
           "Version":"2012-10-17",
           "Statement" : [
              {
                 "Effect":"Allow",
                 "Action":["SQS:SendMessage",
                           "SQS:ReceiveMessage", "SQS:DeleteMessage"],
                 "Resource":"arn:aws:sqs:*:123456789012:MyQueue"
              }
           ]
        }
    

    You must change the resource ARN according to your own data; the format is arn:aws:sqs:region:account_ID:queue_name.

        aws iam put-group-policy --group-name webapp --policy-document file://sqs-policy.json --policy-name webappSQS
    

    Creating the user

      aws iam create-user  --user-name myweb
      aws iam create-access-key --user-name myweb
      aws iam add-user-to-group --user-name myweb --group-name webapp
    

    Sending a message to the queue

        aws sqs  send-message --queue-url https://us-west-2.queue.amazonaws.com/123456789012/MyQueue --message-body '{"msg": "hello world"}'
    

    Reading a message from the queue

        aws sqs  receive-message --queue-url https://us-west-2.queue.amazonaws.com/123456789012/MyQueue --max-number-of-messages 10
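
    The queue can also be used programmatically. As a minimal sketch with the boto library (an assumption, since this post only uses the CLI; it reuses the MyQueue queue created above and expects boto credentials to be configured in ~/.boto):

        import boto.sqs
        from boto.sqs.message import Message

        # connect to the region where the queue lives and look it up by name
        conn = boto.sqs.connect_to_region('us-west-2')
        queue = conn.get_queue('MyQueue')

        # send a message and then read up to 10 messages back
        m = Message()
        m.set_body('{"msg": "hello world"}')
        queue.write(m)

        for msg in queue.get_messages(10):
            print msg.get_body()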
    

    References

    Installing Chef Server on AWS using Python

    This post is about how to install Chef server on an AWS EC2 instance using the Fabric Python library.

    Create EC2 Instance

    In this link you will find all the steps required to create and start an EC2 instance using Python.

    Installing Chef server using Fabric Library

    Fabric is a Python library and command-line tool that makes it easy to use SSH for application deployment or systems administration tasks.

    Fabric installation

        pip install fabric
    

    First Steps with fabric

    First of all we must create a file named fabfile.py; we will add the code below to this file in order to check the Linux box type:

        from fabric.api import run

        def show_host_type():
            run('uname -s')
    

    We can run fab to see what happens:

        $fab show_host_type  -i ~/.ssh/my_key.pem --host ec2-36-37-35-346.us-west-2.compute.amazonaws.com -u ubuntu
        [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] Executing task 'show_host_type'
        [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] run: uname -s
        [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: Linux
        [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: 
    
    
    Done.
    

    Encrypting EC2 Volume

    Chef components such as recipes, data bags, etc., usually store sensitive data about our infrastructure; for that reason we will encrypt the block storage that we attached in the first step, when we created the EC2 instance with Python.

    In our example we will encrypt the /dev/xvdf device using LUKS. The first step is setting up the encryption on the device; to do that, we add another task to the fabfile:

        def setup_luks(device):
            cmd  = 'sudo cryptsetup -y --cipher blowfish luksFormat {}'.format(device)
            run(cmd)
    
    
    fab setup_luks:'/dev/xvdf' -i ~/.ssh/my_key.pem --host ec2-36-37-35-346.us-west-2.compute.amazonaws.com -u ubuntu
    
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] Executing task 'setup_luks'
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] run: sudo cryptsetup -y --cipher blowfish luksFormat /dev/xvdf
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: 
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: WARNING!
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: ========
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: This will overwrite data on /dev/xvdf irrevocably.
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: 
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: Are you sure? (Type uppercase yes): YES
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: Enter passphrase: 
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: Verify passphrase: 
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: 
    
    
    Done.
    Disconnecting from ec2-54-69-12-111.us-west-2.compute.amazonaws.com... done.
    
    

    The next step is to verify the device; in order to do this we will add another task to the fabfile:

        def verify_luks_device(device):
            cmd = 'sudo file -s {0}'.format(device)
            run(cmd)
    
        fab verify_luks_device:'/dev/xvdf' -i ~/.ssh/my_key.pem --host ec2-36-37-35-346.us-west-2.compute.amazonaws.com -u ubuntu
    
    [ec2-54-69-12-111.us-west-2.compute.amazonaws.com] Executing task 'verify_luks_device'
    [ec2-54-69-12-111.us-west-2.compute.amazonaws.com] run: sudo file -s /dev/xvdf
    [ec2-54-69-12-111.us-west-2.compute.amazonaws.com] out: /dev/xvdf: LUKS encrypted file, ver 1 [blowfish, cbc-plain, sha1] UUID: e20d71a4-0a4a-43ba-84cc-1ab3470e845f
    
    

    Now we know that we have a valid LUKS encrypted device; the next step is to open the device. To do that we add another task to the fabfile:

        def open_luks_device(device, storedev):
            cmd = 'sudo cryptsetup luksOpen {} {}'.format(device, storedev)
            run(cmd)
    
    
        fab open_luks_device:'/dev/xvdf,mydevice' -i ~/.ssh/my_key.pem --host ec2-36-37-35-346.us-west-2.compute.amazonaws.com -u ubuntu
    
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] Executing task 'open_luks_device'
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] run: sudo cryptsetup luksOpen /dev/xvdf myvol
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: Enter passphrase for /dev/xvdf: 
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: 
    

    Now we have the device mapped at /dev/mapper/mydevice; the next step is to format the device:

        def format_luks_device(storedev):
            run('sudo mkfs.ext4 -m 0 /dev/mapper/{}'.format(storedev))
    
        fab format_luks_device:'mydevice' -i ~/.ssh/my_key.pem --host ec2-36-37-35-346.us-west-2.compute.amazonaws.com -u ubuntu
    
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] Executing task 'format_luks_device'
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] run: sudo mkfs.ext4 -m 0 /dev/mapper/mydevice
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: mke2fs 1.42.9 (4-Feb-2014)
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: Filesystem label=
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: OS type: Linux
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: Block size=4096 (log=2)
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: Fragment size=4096 (log=2)
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: Stride=0 blocks, Stripe width=0 blocks
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: 655360 inodes, 2620928 blocks
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: 0 blocks (0.00%) reserved for the super user
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: First data block=0
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: Maximum filesystem blocks=2684354560
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: 80 block groups
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: 32768 blocks per group, 32768 fragments per group
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: 8192 inodes per group
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: Superblock backups stored on blocks: 
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out:         32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: 
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: Allocating group tables: done                            
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: Writing inode tables: done                            
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: Creating journal (32768 blocks): done
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: Writing superblocks and filesystem accounting information: done 
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: 
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] out: 
    

    The next step is to back up the LUKS header and store it on our local box; we can use the task below:

        def backup_luks_header_device(device):
            # note: this task also needs get() from fabric.api
            run('sudo cryptsetup luksHeaderBackup --header-backup-file /home/ubuntu/luksbackup {}'.format(device))
            run('sudo chown ubuntu /home/ubuntu/luksbackup')
            get('/home/ubuntu/luksbackup', './')
            run('sudo rm /home/ubuntu/luksbackup')
    
    fab backup_luks_header_device:'/dev/xvdf' -i ~/.ssh/mykey.pem --host ec2-36-37-35-346.us-west-2.compute.amazonaws.com -u ubuntu                                                                                                                                                                         
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] Executing task 'backup_luks_header_device'                                                                                                   
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] run: sudo cryptsetup luksHeaderBackup --header-backup-file /home/ubuntu/luksbackup /dev/xvdf                                                 
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] run: sudo chown ubuntu /home/ubuntu/luksbackup                                                                                               
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] download: /home/manuel/src/blog/2015/01/chef-aws-python/luksbackup <- /home/ubuntu/luksbackup                                                
    
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] run: rm /home/ubuntu/luksbackup
    

    The last step is to mount the volume:

        def mount_luks_device(storedev, mount_point, owner):
            run('sudo mkdir -p {}'.format(mount_point))
            run('sudo mount /dev/mapper/{} {}'.format(storedev, mount_point))
            run('sudo chown {} {}'.format(owner, mount_point))
    
    fab mount_luks_device:'mydevice,/secure,ubuntu' -i ~/.ssh/mykey.pem --host ec2-36-37-35-346.us-west-2.compute.amazonaws.com -u ubuntu
    
    
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] Executing task 'mount_luks_device'
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] run: sudo mkdir -p /secure
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] run: sudo mount /dev/mapper/mydevice /secure
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] run: sudo chown ubuntu /secure
    
    

    If you want to unmount and close the device you must execute the task below:

        def unmount_luks_device(storedev, mount_point):
            run('sudo umount {}'.format(mount_point))
            run('sudo cryptsetup luksClose {}'.format(storedev))
    
        fab unmount_luks_device:'mydevice,/secure' -i ~/.ssh/mykey.pem --host ec2-36-37-35-346.us-west-2.compute.amazonaws.com -u ubuntu
    
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] Executing task 'unmount_luks_device'
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] run: sudo umount /secure
    [ec2-36-37-35-346.us-west-2.compute.amazonaws.com] run: sudo cryptsetup luksClose mydevice
    

    Chef Server Installation

    The objective is to automate the steps required to install Chef server:

        def install_chef_server():
            run('sudo apt-get install ruby')
            run('wget https://web-dl.packagecloud.io/chef/stable/packages/ubuntu/trusty/chef-server-core_12.0.1-1_amd64.deb')
            run('sudo dpkg -i /home/ubuntu/chef-server-core_12.0.1-1_amd64.deb')
            run('sudo chef-server-ctl reconfigure')
    

    The fabfile.py can be found at this link

    References

    AWS EC2 and Python Introduction

    This post is about how to create an Ubuntu 14.04 EC2 instance on AWS using Python.

    Python Boto

    Boto is a Python interface to Amazon Web Services; it is widely used and has good documentation. The first step is to install the library:

        $pip install boto
    

    Boto Configuration

    Create a ~/.boto file with these contents:

        [Credentials]
        aws_access_key_id = YOURACCESSKEY
        aws_secret_access_key = YOURSECRETKEY
    

    Getting started with boto

        >>> import boto.ec2
    
        >>> for region in  boto.ec2.regions():
        ...     print region
        ... 
        RegionInfo:us-east-1
        RegionInfo:cn-north-1
        RegionInfo:ap-northeast-1
        RegionInfo:eu-west-1
        RegionInfo:ap-southeast-1
        RegionInfo:ap-southeast-2
        RegionInfo:us-west-2
        RegionInfo:us-gov-west-1
        RegionInfo:us-west-1
        RegionInfo:eu-central-1
        RegionInfo:sa-east-1
    
        >>> ec2 = boto.ec2.connect_to_region('us-west-2')
    
        >>> for zone in  ec2.get_all_zones():
        ...     print zone
        ... 
        Zone:us-west-2a
        Zone:us-west-2b
        Zone:us-west-2c
    
    
        >>> for sec in ec2.get_all_security_groups():
        ...     print sec
        ... 
        SecurityGroup:default
    
    

    Creating Security Groups

    We can create a Python script that gets or creates a security group by name:

        import boto
        import boto.ec2
    
        import argparse
    
    
        def get_or_create_security_group(region ,group_name):
            """
            Search by group_name; if it doesn't exist, it is created
            """
            try:
                ec2 = boto.ec2.connect_to_region(region)
                group = ec2.get_all_security_groups(groupnames=[group_name])[0]
            except ec2.ResponseError, e:
                if e.code == 'InvalidGroup.NotFound':
                    group = ec2.create_security_group(
                        group_name, 'group {} for region {}'.format(group_name, region))
                else:
                    raise
            return group
    
    
        def main():
            parser = argparse.ArgumentParser()
            parser.add_argument('--name', required=True)
            parser.add_argument('--region',  default='us-west-2')
            args = parser.parse_args()
            print get_or_create_security_group(args.region, args.name)
    
    
        if __name__ == '__main__':
            main()
    
    

    We can call this script to create a chef group:

      python create_security_group.py --name=my-chef-group  --region=us-west-2
    

    After that we can grant access to certain CIDR addresses with the script below:

    
        import boto
        import boto.ec2
    
        import argparse
    
    
        def authorize_security_group(
                ip_protocol='tcp', region='us-west-2', name=None, from_port=None,
                to_port=None, cidr_ip=None):
            """
            Authorize cidr_ip access to a certain security group
            """
            ec2 = boto.ec2.connect_to_region(region)
            group = ec2.get_all_security_groups(groupnames=[name])[0]
            return group.authorize(ip_protocol=ip_protocol, from_port=from_port, to_port=to_port, cidr_ip=cidr_ip)
    
    
        def main():
            parser = argparse.ArgumentParser()
            parser.add_argument('--name', required=True)
            parser.add_argument('--region',  default='us-west-2')
            parser.add_argument('--ip_protocol', default='tcp')
            parser.add_argument('--from_port', required=True)
            parser.add_argument('--to_port', required=True)
            parser.add_argument('--cidr_ip', required=True)
            args = parser.parse_args()
            print authorize_security_group(
                ip_protocol=args.ip_protocol, region=args.region, name=args.name,
                from_port=args.from_port, to_port=args.to_port, cidr_ip=args.cidr_ip)
    
    
        if __name__ == '__main__':
            main()
    

    Now we can grant SSH access to the chef group:

      python autorize_security_group.py --name=chef-group --ip_protocol=tcp --from_port=22 --to_port=22 --cidr_ip=my_ip/32 --region=us-west-2
    

    We can revoke access to the group with the script below:

    
        import boto
        import boto.ec2
    
        import argparse
    
    
        def revoke_from_security_group(
                ip_protocol='tcp', region='us-west-2', name=None, from_port=None,
                to_port=None, cidr_ip=None):
            """
            Revoque cidr_ip access to a certain security group
            """
            ec2 = boto.ec2.connect_to_region(region)
            group = ec2.get_all_security_groups(groupnames=[name])[0]
            return group.revoke(ip_protocol=ip_protocol, from_port=from_port, to_port=to_port, cidr_ip=cidr_ip)
    
    
        def main():
            parser = argparse.ArgumentParser()
            parser.add_argument('--name', required=True)
            parser.add_argument('--region',  default='us-west-2')
            parser.add_argument('--ip_protocol', default='tcp')
            parser.add_argument('--from_port', required=True)
            parser.add_argument('--to_port', required=True)
            parser.add_argument('--cidr_ip', required=True)
            args = parser.parse_args()
            print revoke_from_security_group(
                ip_protocol=args.ip_protocol, region=args.region, name=args.name,
                from_port=args.from_port, to_port=args.to_port, cidr_ip=args.cidr_ip)
    
    
        if __name__ == '__main__':
            main()
    

    We can revoke SSH access to the chef group whenever we want:

    python revoke_security_group.py --name=chef-group --ip_protocol=tcp --from_port=22 --to_port=22 --cidr_ip=my_ip/32 --region=us-west-2

    Creating ssh keypairs

    We need an SSH keypair in order to access EC2 instances. Keypairs are created per region, which means we need a keypair for every region where we want to create EC2 instances; we can create one with the script below:

        import boto
        import boto.ec2
    
        import argparse
    
    
        def get_or_create_key_pair(key_name=None, key_dir='~/.ssh', region='us-west-2'):
            try:
                ec2 = boto.ec2.connect_to_region(region)
                key = ec2.get_all_key_pairs(keynames=[key_name])
            except ec2.ResponseError, e:
                if e.code == 'InvalidKeyPair.NotFound':
                    # Create an SSH key to use when logging into instances.
                    key =  ec2.create_key_pair(key_name)
                    if not key.save(key_dir):
                        print('Key could not be created\n')
                        raise
                else:
                    raise
            return key
    
    
        def main():
            parser = argparse.ArgumentParser()
    
            parser.add_argument('--key_name', required=True)
            parser.add_argument('--key_dir', required=False)
            parser.add_argument('--region',  default='us-west-2')
            args = parser.parse_args()
    
            key = get_or_create_key_pair(
                key_name=args.key_name, key_dir=args.key_dir, region=args.region)
    
            print(key)
    
        if __name__ == '__main__':
            main()
    
    
      python create_key_pair.py --region=us-west-2 --key_name=manuel_uswest2 --key_dir='~/.ssh'
    

    Creating EBS Volume

    Now we need to create the EBS volume that will be attached to the EC2 instance; we can use the script below:

    
        import boto
        import boto.ec2
    
        import argparse
    
    
        def create_volume(
                region='us-west-2', zone='us-west-2a', size_gb=None, snapshot=None,
                volume_type='standard', iops=None, dry_run=False):
    
            ec2 = boto.ec2.connect_to_region(region)
            v = ec2.create_volume(
                size_gb, zone, snapshot=snapshot, volume_type=volume_type, iops=iops,
                dry_run=dry_run)
            return v
    
    
        def main():
            parser = argparse.ArgumentParser()
    
            parser.add_argument('--size_gb', type=int,required=True)
            parser.add_argument('--region',  default='us-west-2')
            parser.add_argument('--zone', default='us-west-2a')
            parser.add_argument('--volume_type',  default='standard')
            parser.add_argument('--snapshot')
            args = parser.parse_args()
            v = create_volume(
                region=args.region, zone=args.zone, size_gb=args.size_gb,
                snapshot=args.snapshot, volume_type=args.volume_type)
            print(v)
    
        if __name__ == '__main__':
            main()
    
      $python create_ebs_volume.py --region=us-west-2 --zone=us-west-2a --size_gb=10 --volume_type=standard
    
      Volume:vol-d543d3a1
    

    Creating EC2 Instance

    First of all, we need to choose an AMI image; in this case we will use the AMI for Ubuntu 14.04 with code ami-e55100d5. You can see at this link all the Ubuntu AMIs available by version and region.

    After choosing the right AMI you can use the script below to create the instance:

        $python launch_instance.py --region=us-west-2 --ami=ami-e55100d5 --instance_type=t1.micro --key_name=manuel_uswest2 --tag=mychefserver --group_name=chef- --ssh_port=22 --cidr_ip=37.46.83.126/32 
    
        .
        .
        .
        Instance:i-5da46734
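
    The launch_instance.py script itself lives in the linked repository; a minimal sketch of what such a script might do with boto (the function below is hypothetical and only mirrors the command-line arguments shown above) is:

        import boto.ec2

        # hypothetical sketch, the real launch_instance.py lives in the linked repository
        def launch_instance(region='us-west-2', ami=None, instance_type='t1.micro',
                            key_name=None, tag=None, group_name=None):
            # connect to the region and start a single instance of the chosen AMI
            ec2 = boto.ec2.connect_to_region(region)
            reservation = ec2.run_instances(
                ami, key_name=key_name, instance_type=instance_type,
                security_groups=[group_name])
            instance = reservation.instances[0]
            # tag the instance so it is easy to identify in the console
            if tag:
                instance.add_tag('Name', tag)
            return instance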
    

    Attach Volume to instance

    AWS instances have a root hard drive, but when an instance is terminated the root volume associated with that instance disappears. In order to keep important data safe it is important to attach an additional volume. We can do that with the script below:

    
        import boto
        import boto.ec2
    
        import argparse
    
    
        def attach_volume(
                region='us-west-2', volume_id=None, instance_id=None,
                device_name='sdb'):
    
            ec2 = boto.ec2.connect_to_region(region)
            volume = ec2.get_all_volumes(volume_ids=volume_id)[0]
            volume.attach(instance_id, device_name)
            return volume
    
        def main():
            parser = argparse.ArgumentParser()
    
            parser.add_argument('--region',  default='us-west-2')
            parser.add_argument('--volume_id',  required=True)
            parser.add_argument('--instance_id',  required=True)
            parser.add_argument('--device_name',  default='sdb')
    
            args = parser.parse_args()
            v = attach_volume(
                region=args.region, volume_id=args.volume_id,
                device_name=args.device_name, instance_id=args.instance_id)
            print(v)
    
        if __name__ == '__main__':
            main()
    
        $python attach_volume.py --region=us-west-2 --volume_id=vol-d543d3a1 --instance_id=i-5da46734 --device_name=sdb
    

    The code for this post can be found at this link

    References

    Go Programming Language Introduction

    Golang is a programming language initially developed at Google. One of its creators is Ken Thompson, who also helped create the C programming language. It is a statically typed language with syntax loosely derived from C, adding garbage collection, type safety, some dynamic-typing capabilities, additional built-in types such as variable-length arrays and key-value maps, and a large standard library. In this post we will cover installation and the construction of a basic website.

    Installation

    We will install and configure it on a Debian-based system. You can refer to the official documentation, where instructions for other platforms are available.

    Installing the language

      $ sudo apt-get install golang  
    

    Installing the tools

    After installing the language you must download the tools tarball and extract it into /usr/local:

      $ wget https://storage.googleapis.com/golang/go1.3.3.linux-amd64.tar.gz
      $ sudo mv go1.3.3.linux-amd64.tar.gz /usr/local/
      $ cd /usr/local
      $ sudo  tar -xzf go1.3.3.linux-amd64.tar.gz
    

    Add /usr/local/go/bin to the PATH environment variable

        export PATH=$PATH:/usr/local/go/bin
    

    Hello world

    Create a file named hello.go and copy the code:

      package main
      import "fmt"
    
      func main() {
          fmt.Printf("hello, world\n")
      }
    

    Run the program:

      $ go run hello.go  
    

    Simple Website

    We will use the standard net/http package to write a simple HTTP server. First of all we will create a file named server.go and add the following lines:

        package main
    
        import (
            "fmt"
            "io/ioutil"
            "net/http"
        )
    

    Now we will create a struct to store every page:

        type Page struct {
            Title string
            Body  []byte
        }
    

    Every page is represented by a title, which is a string, and a body, which is represented as a slice (sequence) of bytes. A detailed explanation of slices can be found at this link.

    Now we create a function that reads the body from a file; for simplicity we assume that the filename is the page's title:

        func loadPage(title string) (*Page, error) {
            filename := title + ".html"
            body, err := ioutil.ReadFile(filename)
            if err != nil {
                return nil, err
            }
            return &Page{Title: title, Body: body}, nil
        }
    

    Now we add the code that will handle the HTTP request and return the HTML content:

    
        func viewHandler(w http.ResponseWriter, r *http.Request) {
            title := r.URL.Path[len("/view/"):]
            p, err := loadPage(title)
            if err != nil {
                // the page file could not be loaded, return a 404
                http.NotFound(w, r)
                return
            }
            fmt.Fprintf(w, "%s", p.Body)
        }
    
    

    Finally, we add the main function, where we create the HTTP server:

        func main() {
            http.HandleFunc("/view/", viewHandler)
            http.ListenAndServe(":8080", nil)
        }
    

    We can build the program with the command go build server.go and then run it by typing ./server.

    You can test the server with a simple curl request:

        $curl http://localhost:8080/view/go-intro
    

    You can see the source code at this link

    References

    Dropwizard Api Rest Example

    Dropwizard is a Java framework used to build RESTful APIs. This post shows an example application that can be found at this link.

    Maven project management

    Apache Maven is a software project management and comprehension tool. You can learn the basics at this link.

    Create maven project

    mvn archetype:generate -DgroupId=com.notempo1320 \
        -DartifactId=user-rest -DarchetypeArtifactId=maven-archetype-quickstart \
        -DinteractiveMode=false
    

    Maven project configuration

    Add a dropwizard.version property to your POM with the current version of Dropwizard (which is 0.7.0 at the moment of writing this post):

        <properties>
            <dropwizard.version>0.7.0</dropwizard.version>
        </properties>

    Add the dropwizard libraries as dependencies:

      <dependencies>
        <dependency>
          <groupId>com.h2database</groupId>
          <artifactId>h2</artifactId>
          <version>1.3.170</version>
        </dependency>
        <dependency>
          <groupId>io.dropwizard</groupId>
          <artifactId>dropwizard-core</artifactId>
          <version>${dropwizard.version}</version>
        </dependency>
        <dependency>
          <groupId>io.dropwizard</groupId>
          <artifactId>dropwizard-db</artifactId>
          <version>${dropwizard.version}</version>
        </dependency>
        <dependency>
          <groupId>io.dropwizard</groupId>
          <artifactId>dropwizard-hibernate</artifactId>
          <version>${dropwizard.version}</version>
        </dependency>
        <dependency>
          <groupId>io.dropwizard</groupId>
          <artifactId>dropwizard-auth</artifactId>
          <version>${dropwizard.version}</version>
        </dependency>
        <dependency>
          <groupId>io.dropwizard</groupId>
          <artifactId>dropwizard-client</artifactId>
          <version>${dropwizard.version}</version>
        </dependency>
        <dependency>
          <groupId>io.dropwizard</groupId>
          <artifactId>dropwizard-testing</artifactId>
          <version>${dropwizard.version}</version>
          <scope>test</scope>
        </dependency>
      </dependencies>

    Dropwizard project

    Creating A Configuration Class

        public class AppConfiguration extends io.dropwizard.Configuration {
            private URI endpointUri = null;
    
            @NotNull
            @JsonProperty("endpoint")
            private  String endpoint;
    
            @Valid
            @NotNull
            @JsonProperty("database")
            private DataSourceFactory database = new DataSourceFactory();
    
            public DataSourceFactory getDataSourceFactory() {
                return database;
            }
    
            public URI getEndpoint() throws java.net.URISyntaxException {
                if (null == this.endpointUri) {
                    this.endpointUri = new URI(this.endpoint);
                }
                return this.endpointUri;
            }
        }
    

    Creating An Application Class

    The App class is the entry point to the REST API. The example below shows how to configure the Hibernate bundle, the command line interfaces, an authentication provider and a basic Person resource:

    
        public class App extends Application<AppConfiguration> {

            private final HibernateBundle<AppConfiguration> hibernate;

            public App() {
                this.hibernate = new HibernateBundle<AppConfiguration>(
                    Person.class, User.class) {
                    @Override
                    public DataSourceFactory getDataSourceFactory(
                        AppConfiguration configuration) {
                        return configuration.getDataSourceFactory();
    
                    }
                };
    
            }
    
            @Override
            public void initialize(Bootstrap<AppConfiguration> bootstrap) {
    
                bootstrap.getObjectMapper().registerModule(new MrBeanModule());
                bootstrap.getObjectMapper()
                        .setPropertyNamingStrategy(
                            PropertyNamingStrategy.CAMEL_CASE_TO_LOWER_CASE_WITH_UNDERSCORES);
                bootstrap.addBundle(this.hibernate);
                bootstrap.addCommand(new CreateUserCommand());
                bootstrap.addCommand(new ListUserCommand());
    
    
            }
    
            @Override
            public void run(AppConfiguration configuration, Environment environment)
                throws Exception {
    
                final GenericDAO<User> userDao =
                        new UserHibernateDAO(this.hibernate.getSessionFactory());
                final BaseFacade<User> userFacade = new UserFacade(userDao);

                final GenericDAO<Person> dao =
                        new PersonHibernateDAO(this.hibernate.getSessionFactory());
                final BaseFacade<Person> personFacade = new PersonFacade(dao);
    
                environment.jersey().register(
                    new PersonResource(personFacade, configuration)
                );
    
                environment.jersey().register(
                    new BasicAuthProvider<>(new SimpleAuthenticator(userFacade), "ntrest"));
    
    
            }
            public static void main(String[] args) throws Exception {
                new App().run(args);
            }
        }
    
    

    Model entities

    Base Entity

    To simplify serialization and common operations, all models will inherit from a base class:

    
        public abstract class BaseModel {
             public UUID randomUUID() {
                return UUID.randomUUID();
            }
    
    
        }
    
    

    Person Entity

    
        @Entity
        @Table(
            name = "persons",
            uniqueConstraints =
                {@UniqueConstraint(columnNames={"username", "email"})}
        )
    
        public class Person extends BaseModel {
            @Id
            @GeneratedValue(strategy=GenerationType.IDENTITY)
            @Column
            private Long    id;
    
            @Column
            @NotNull @Length(min = 8)
            private String email;
    
            @Column
            @NotNull @Length(min = 8)
            private String username;
    
            @Column
            private String token;
    
            @Column(name="first_name")
            private String firstName;
    
            @Column(name="last_name")
            private String lastName;
    
            @Column
            @NotNull
            private boolean active;
    
            @JsonProperty
            public Long getId() {
                return id;
            }
    
            @JsonProperty
            public String getUsername() {
                return this.username;
            }
    
            @JsonProperty
            public String getEmail() {
                return this.email;
            }
    
            @JsonProperty
            public String getToken() {
                return this.token;
            }
    
            @JsonProperty
            public String getFirstName() {
                return this.firstName;
            }
    
            @JsonProperty
            public String getLastName() {
                return this.lastName;
            }
    
            @JsonProperty
            public boolean getActive() {
                return this.active;
            }
    
            public void setUsername(String username) {
                this.username = username;
            }
    
            public void setEmail(String email) {
                this.email = email;
            }
    
            public void setToken(String token) {
                this.token = token;
            }
    
            public void setActive(boolean active) {
                this.active = active;
            }
    
            public void setFirstName(String firstName) {
                this.firstName = firstName;
            }
    
            public void setLastName(String lastName) {
                this.lastName = lastName;
            }
    
        }
    
    

    User Entity

    
    
        @Entity
        @Table(
            name = "users",
            uniqueConstraints =
                {@UniqueConstraint(columnNames={"username", "email", "token"})}
        )
    
    
        public class User extends BaseModel {
           @Id
            @GeneratedValue(strategy=GenerationType.IDENTITY)
            @Column
            private Long    id;
    
            @Column(name = "email")
            @NotNull @Length(min = 8)
            private String email;
    
            @Column(name = "username")
            @NotNull @Length(min = 8)
            private String username;
    
            @Column(name = "token")
            @NotNull
            private String token;
    
            @Column(name = "is_active")
            @NotNull
            private boolean active;
    
    
            public Long getId() {
                return id;
            }
    
            public String getUsername() {
                return this.username;
            }
    
            public String getEmail() {
                return this.email;
            }
    
            public String getToken() {
                return this.token;
            }
    
            public void setUsername(String username) {
                this.username = username;
            }
    
            public void setEmail(String email) {
                this.email = email;
            }
    
            public void setToken(String token) {
                this.token = token;
            }
    
            public boolean getActive() {
                return this.active;
            }
    
            public void setActive(boolean active) {
                this.active = active;
            }
    
        }
    

    Data Access layer

    Generic DAO

    
        public interface GenericDAO<T> {
            public T create(T obj);

            public Optional<T> findById(Long id);

            public List<T> findByParams(Optional<Map<String, Object>> params);

            public T update(T obj);

            public void delete(T obj);

            public long count();
        }
    

    Hibernate base DAO

    
    
        abstract public class BaseHibernateDAO<T> extends AbstractDAO<T> {
    
            private static final Logger LOGGER =
                LoggerFactory.getLogger(BaseHibernateDAO.class);
    
            public BaseHibernateDAO(SessionFactory sessionFactory) {
                super(sessionFactory);
            }
    
            public abstract T create(T obj);
    
            public abstract Optional<T> findById(Long id);
    
            public List<T> findByParams(
                Optional<Map<String, Object>> params) {
                Criteria criteria = criteria();
                LOGGER.info("Executing findByParams");
                if(params.isPresent()) {
    
                    Map<String, Object> mapParams = params.get();
                    for (Map.Entry<String, Object> entry: mapParams.entrySet()) {
                        criteria.add(
                            Restrictions.eq(entry.getKey(), entry.getValue()));
                    }
                }
    
                return criteria.list();
            }
    
            public abstract T update(T obj);
    
            public abstract void delete(T obj);
    
            public long count() {
                return (Long) criteria().
                    setProjection(Projections.rowCount()).uniqueResult();
            }
    
        }
    

    User DAO

    
    
        public class UserHibernateDAO extends BaseHibernateDAO<User>
            implements GenericDAO<User> {
    
            @Inject
            public UserHibernateDAO(SessionFactory factory) {
                super(factory);
            }
    
            @Override
            public User create(User obj){
                return persist(obj);
            }
    
            @Override
            public Optional<User> findById(Long id) {
                return Optional.fromNullable(get(id));
            }
    
            @Override
            public User update(User obj) {
                return persist(obj);
            }
    
            @Override
            public void delete(User obj) {
                currentSession().delete(obj);
            }
        }
    
    
    

    Person DAO

    
    
        public class PersonHibernateDAO extends BaseHibernateDAO<Person>
            implements GenericDAO<Person> {
    
            @Inject
            public PersonHibernateDAO(SessionFactory factory) {
                super(factory);
            }
    
            @Override
            public Person create(Person obj){
                obj.setToken(obj.randomUUID().toString());
                return persist(obj);
            }
    
            @Override
            public Optional<Person> findById(Long id) {
                return Optional.fromNullable(get(id));
            }
    
    
            @Override
            public Person update(Person obj) {
                return persist(obj);
            }
    
            @Override
            public void delete(Person obj) {
                currentSession().delete(obj);
            }
        }
    

    Facade Layer

    Base Facade

    
        public interface BaseFacade<T> {

            public T create(T obj);

            public Optional<T> findById(Long id);

            public List<T> findByParams(Optional<Map<String, Object>> params);

            public T update(T obj);

            public void delete(T obj);

            public long count();
        }
    

    User Facade

    
    
        public class UserFacade implements BaseFacade<User> {
            private GenericDAO<User> dao;

            @Inject
            public UserFacade(GenericDAO<User> dao) {
                this.dao = dao;
            }
    
            public User create(User model) {
                return dao.create(model);
            }
    
            public Optional<User> findById(Long id) {
                return dao.findById(id);
            }
    
            public List<User> findByParams(Optional<Map<String, Object>> params) {
                return dao.findByParams(params);
            }
    
            public User update(User obj) {
                return dao.update(obj);
            }
    
            public void delete(User obj) {
                dao.delete(obj);
            }
    
            public long count() {
                return dao.count();
            }
    
    
        }
    
    

    Person facade

    
        public class PersonFacade implements BaseFacade<Person> {
            private GenericDAO<Person> dao;

            @Inject
            public PersonFacade(GenericDAO<Person> dao) {
                this.dao = dao;
            }
    
            public Person create(Person model) {
                return dao.create(model);
            }
    
            public Optional<Person> findById(Long id) {
                return dao.findById(id);
            }
    
            public List<Person> findByParams(Optional<Map<String, Object>> params) {
                return dao.findByParams(params);
            }
    
            public Person update(Person obj) {
                return dao.update(obj);
            }
    
            public void delete(Person obj) {
                dao.delete(obj);
            }
    
            public long count() {
                return dao.count();
            }
    
        }
    

    Command Line Interfaces

    For security reasons, I don't want to expose admin user creation over HTTP, so the only way to create and list admin users is through a command line interface.

    Command Line to create Users

    
        public class CreateUserCommand extends ConfiguredCommand<AppConfiguration> {
    
            private static final Logger LOGGER =
                LoggerFactory.getLogger(CreateUserCommand.class);
    
            private GuiceBundle guiceBundle;
    
            public CreateUserCommand() {
                super("create_user", "Create a user that can access the app");
    
    
            }
    
            @Override
            public void configure(Subparser subparser) {
                super.configure(subparser);
                subparser.addArgument("-u", "--username")
                         .help("admin username");
            }
    
    
            @Override
            protected void run(Bootstrap<AppConfiguration> bootstrap,
                               Namespace namespace,
                               AppConfiguration configuration) {
                AnnotationConfiguration dbConfig = null;
                SessionFactory factory = null;
                try {
                    dbConfig = HibernateConfig.getConfig(configuration);
                    dbConfig.addAnnotatedClass(User.class);
                    factory = dbConfig.buildSessionFactory();
                    final GenericDAO<User> dao =
                        new UserHibernateDAO(factory);
                    final UserFacade facade = new UserFacade(dao);
    
                    String username = namespace.getString("username");
                    String pass1;
                    String pass2;
                    User obj = new User();
    
                    factory.getCurrentSession().beginTransaction();
                    System.out.print("\n Creating user: \n");
                    System.out.print("\n type your username: \n");
                    BufferedReader in = new BufferedReader(
                        new InputStreamReader(System.in));
                    obj.setUsername(in.readLine());
    
                    System.out.print("\n type your email: ");
                    in = new BufferedReader(new InputStreamReader(System.in));
                    obj.setEmail(in.readLine());
    
                    obj.setToken(obj.randomUUID().toString());
    
                    User createdUser = dao.create(obj);
                    factory.getCurrentSession().getTransaction().commit();
                    System.out.print("\nUser successfully created:\n");
    
                    StringWriter writer = new StringWriter();
                    bootstrap.getObjectMapper().writeValue(writer, createdUser);
                    System.out.print(writer.toString());
                    System.out.print("\n");
                    System.exit(0);
                } catch (Exception ex) {
                    ex.printStackTrace();
                    factory.getCurrentSession().getTransaction().rollback();
                    System.exit(1);
                }
    
            }
    
        }
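
    Once the project is packaged, the command can be invoked from the jar. This is only a sketch: the jar name is an assumption based on the Maven artifactId above, config.yml is an assumed configuration file, and the command then prompts for the username and email on standard input:

      $ mvn package
      $ java -jar target/user-rest-1.0-SNAPSHOT.jar create_user config.yml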
    

    Command Line to List Admin Users

    
    
        public class ListUserCommand extends ConfiguredCommand<AppConfiguration> {
    
            private static final Logger LOGGER =
                LoggerFactory.getLogger(ListUserCommand.class);
    
            private GuiceBundle guiceBundle;
    
            public ListUserCommand() {
                super("list_users", "List all admin users");
    
    
            }
    
            @Override
            public void configure(Subparser subparser) {
                super.configure(subparser);
            }
    
    
            @Override
            protected void run(Bootstrap<AppConfiguration> bootstrap,
                               Namespace namespace,
                               AppConfiguration configuration) {
                AnnotationConfiguration dbConfig = null;
                SessionFactory factory = null;
                try {
                    dbConfig = HibernateConfig.getConfig(configuration);
                    dbConfig.addAnnotatedClass(User.class);
                    factory = dbConfig.buildSessionFactory();
                    final GenericDAO<User> dao =
                        new UserHibernateDAO(factory);
                    final UserFacade facade = new UserFacade(dao);
    
    
                    factory.getCurrentSession().beginTransaction();
                    System.out.print("\n Admin User List \n");
    
                    List<User> users = facade.findByParams(
                        Optional.fromNullable(null));
                    StringWriter writer = new StringWriter();
                    bootstrap.getObjectMapper().writeValue(writer, users);
                    System.out.print(writer.toString());
                    System.out.print("\n");
                    System.exit(0);
                } catch (Exception ex) {
                    ex.printStackTrace();
                    factory.getCurrentSession().getTransaction().rollback();
                    System.exit(1);
                }
    
            }
    
        }
    

    Rest Resources

    Base Resource

    
    
    
        public interface BaseResource<T> {
    
            @POST
            public Response create(@Auth User user, @Valid T model, @Context UriInfo info);
    
            @GET
            public List<T> list(@Auth User user) throws InternalErrorException;
    
            @GET
            @Path("/{id}")
            public T retrieve(@Auth User user, @PathParam("id") Long id)
                throws ResourceNotFoundException, InternalErrorException;
    
            @PUT
            @Path("/{id}")
            public T update(@Auth User user, @PathParam("id") Long id,
                @Valid T entity) throws ResourceNotFoundException,
                InternalErrorException;
    
            @DELETE
            @Path("/{id}")
            public Response delete(@Auth User user, @PathParam("id") Long id)
                throws ResourceNotFoundException, InternalErrorException;
        }
    

    Person Resource

    
    
        @Path("/persons")
        @Produces(MediaType.APPLICATION_JSON)
        @Consumes(MediaType.APPLICATION_JSON)
        public class PersonResource implements BaseResource<Person> {
    
            private static final Logger LOGGER =
                LoggerFactory.getLogger(PersonResource.class);
    
            private final BaseFacade<Person> facade;
            private final URI resourceUri;
            private final AppConfiguration config;
    
            @Inject
            public PersonResource(BaseFacade<Person> facade, AppConfiguration config)
                throws java.net.URISyntaxException {
                this.resourceUri = config.getEndpoint().resolve("/persons");
                this.facade = facade;
                this.config = config;
            }
    
            @Override
            @POST
            @UnitOfWork
            public Response create(@Auth User user, @Valid Person model,
                @Context UriInfo info) {
    
                LOGGER.info("Creating person with admin user {}", user.getUsername());
                model.setActive(true);
                Person obj = facade.create(model);
                URI resource = HttpUtils.getCreatedResourceURI(info,
                    resourceUri, obj.getId());
    
                LOGGER.info("Person with id {} created", obj.getId());
                return Response.created(resource).entity(
                    new SerializedModel<>("person", obj)).build();
            }
    
            @Override
            @GET
            @UnitOfWork
            public List<Person> list(@Auth User user) throws InternalErrorException {
                LOGGER.info("Getting person list with admin user {}",
                    user.getUsername());
                return facade.findByParams(Optional.fromNullable(null));
            }
    
            @GET
            @Override
            @Path("/{id}")
            @UnitOfWork
            public Person retrieve(@Auth User user, @PathParam("id") Long id)
                throws ResourceNotFoundException, InternalErrorException {
    
                LOGGER.info("Retreiving person {} with admin user {}", id,
                    user.getUsername());
    
                Optional<Person> op = facade.findById(id);
                if (!op.isPresent()) {
                    throw new ResourceNotFoundException();
                }
                return op.get();
    
            }
    
            @PUT
            @Override
            @Path("/{id}")
            @UnitOfWork
            public Person update(@Auth User user, @PathParam("id") Long id,
                @Valid Person model) throws ResourceNotFoundException,
                InternalErrorException {
    
                LOGGER.info("Updating person {} with admin user {}", id,
                    user.getUsername());
    
                Optional<Person> op = facade.findById(id);
    
                if (!op.isPresent()) {
                    throw new ResourceNotFoundException();
                }
    
                return facade.update(model);
            }
    
    
            @DELETE
            @Override
            @Path("/{id}")
            @UnitOfWork
            public Response delete(@Auth User user, @PathParam("id") Long id)
                throws ResourceNotFoundException, InternalErrorException {
    
                LOGGER.info("Deleteing person {} with admin user {}", id,
                    user.getUsername());
                Optional<Person> op = facade.findById(id);
                if (!op.isPresent()) {
                    throw new ResourceNotFoundException();
                }
    
                Person obj = op.get();
                facade.delete(obj);
                return Response.ok().build();
            }
        }
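
    As with the Go server earlier, the API can be exercised with curl once the application is running. This is only a sketch: the jar and config names are assumptions, port 8080 is the Dropwizard default, and the username:token pair stands in for whatever Basic Auth credentials SimpleAuthenticator expects for an admin user created with the create_user command:

      $ java -jar target/user-rest-1.0-SNAPSHOT.jar server config.yml
      $ curl -u username:token http://localhost:8080/persons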
    


    OpenStack Nova API Introduction

    OpenStack is a free and open-source cloud computing platform that can be accessed and managed through a RESTful API. This post is an introduction to this API using a well-known OpenStack implementation at Rackspace.

    Prerequisites: You need to open a Rackspace account at https://cart.rackspace.com/cloud/

    Generating API Keys

    The first thing we need to do is generate a valid key to access the API. To do that, we must log in to the control panel and follow these instructions.

    Note: for security reasons it is recommended to create a separate user only for API access and generate a key for this user.

    Command Line Rackspace Nova API Client Tool

    Nova is the project name for OpenStack Compute. The command line tool is a python application that can be installed via pip:

      pip install rackspace-novaclient
    

    You can check your installation by running the nova help command.
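
    For example:

      $ nova help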

    Configuring Nova API Client

    We need to define environment variables:

        export OS_AUTH_URL=https://identity.api.rackspacecloud.com/v2.0/
        export OS_AUTH_SYSTEM=rackspace
        export OS_REGION_NAME=DFW
        export OS_USERNAME=
        export OS_TENANT_NAME=
        export NOVA_RAX_AUTH=1
        export OS_PASSWORD=
        export OS_PROJECT_ID=
        export OS_NO_CACHE=1  
    

    Note: the tenant_id is your account number. Check if your configuration is correct:

        nova credentials
    

    OpenStack Flavors

    Virtual hardware templates are called "flavors" in OpenStack, defining sizes for RAM, disk, number of cores, and so on. To get a list of flavors, use the command nova flavor-list.
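
    For example (the flavors available depend on your account and region):

      $ nova flavor-list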

    OpenStack Images

    A virtual machine image is a single file which contains a virtual disk that has a bootable operating system installed on it. To get a list of images, run the command nova image-list.
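
    For example:

      $ nova image-list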

    SSH keypair

    In order to access virtual machines, an SSH keypair is required. You can create one using the command nova keypair-add keypairname:

        nova keypair-add key_test >> key_file.pem
    

    Managing Servers

    List Servers

    We can use the command nova list to get a list of all the servers that we have; at the beginning this list will be empty.
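
    For example:

      $ nova list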

    Create a server

    We want to create a 512MB Standard Instance (CODE 2) with Debian 7 Wheezy (CODE 06cbc0a2-a906-4e6a-8ed7-bd7c952c9f81). We will pass the ssh key as well in order to grant ssh access:

        nova boot testServer --image 06cbc0a2-a906-4e6a-8ed7-bd7c952c9f81 --flavor 2 --key-name key_test
    

    The result of this operation includes a server id; we can check the status of this server with the command nova show server-id. At the beginning the server status is BUILD, which means that the server is being built. Run this command until the status changes to ACTIVE; this can take some minutes.
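
    For example, using the id returned by nova boot in place of server-id:

      $ nova show server-id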

    SSH Access

    When the server changes its status to ACTIVE, you can access it via SSH with the root user and the public IP:

        ssh root@public-ip -i yourkey.pem
    
        Linux debian 3.2.0-4-amd64 #1 SMP Debian 3.2.60-1+deb7u3 x86_64
    
        The programs included with the Debian GNU/Linux system are free software;
        the exact distribution terms for each program are described in the
        individual files in /usr/share/doc/*/copyright.
    
        Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
        permitted by applicable law.
        Last login: Thu Jan  1 00:00:10 1970
        root@testserver:~# 
    

    Note: This server is insecure by default. If you plan to follow this guide to deploy a production server, you must configure the security policies.

    Securing your server

    Change root password

        #passwd
    

    Adding an admin user:

        #adduser admin
    

    Add the user to the sudo group:

        # usermod -a -G sudo admin
    

    Update Apt

        #apt-get update
        #apt-get upgrade
    

    SSH Config

    Edit /etc/ssh/sshd_config and change the following values:

    
      Port 9999
      PermitRootLogin no
      PasswordAuthentication no
    
    

    Firewall config

    By default, all ports are open on Debian systems; to verify that, you can use the iptables command:

        admin@testserver:~$ sudo iptables -L
        Chain INPUT (policy ACCEPT)
        target     prot opt source               destination         
    
        Chain FORWARD (policy ACCEPT)
        target     prot opt source               destination         
    
        Chain OUTPUT (policy ACCEPT)
        target     prot opt source               destination  
    

    Grant access to ssh port:

      sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
      sudo iptables -I INPUT -p tcp --dport 9999 -m state --state NEW,ESTABLISHED -j ACCEPT  
    

    Grant HTTP (port 80) access:

     sudo iptables -A INPUT -p tcp --dport 80 -j ACCEPT
    

    Reject any other external traffic

     sudo iptables -A INPUT -j DROP
    

    Allow internal traffic:

      iptables -I INPUT 1 -i lo -j ACCEPT  
    

    Verify iptables config:

        admin@testserver:~$ sudo iptables -L -v
        Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
        pkts bytes target     prot opt in     out     source               destination         
           15  1148 ACCEPT     tcp  --  any    any     anywhere             anywhere             tcp dpt:9999 state NEW,ESTABLISHED
            6   568 ACCEPT     tcp  --  any    any     anywhere             anywhere             tcp dpt:9999 state NEW,ESTABLISHED
            0     0 ACCEPT     all  --  any    any     anywhere             anywhere             state RELATED,ESTABLISHED
            0     0 ACCEPT     tcp  --  any    any     anywhere             anywhere             tcp dpt:http
            1    44 DROP       all  --  any    any     anywhere             anywhere            
    
        Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
         pkts bytes target     prot opt in     out     source               destination         
    
        Chain OUTPUT (policy ACCEPT 3 packets, 348 bytes)
         pkts bytes target     prot opt in     out     source               destination
    

    Save your rules to the iptables.rules file in the /etc directory

        #iptables-save > /etc/iptables.rules
    

    Edit or create the file /etc/network/if-pre-up.d/iptaload, a script that applies the rules at server start-up, and add the following lines:

    
        #!/bin/sh
        iptables-restore < /etc/iptables.rules
        exit 0
    

    Give execution permission to the file:

      sudo chmod +x /etc/network/if-pre-up.d/iptaload
    

    Restart the server and verify the configuration with sudo iptables -L.
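
    For example (reconnect over SSH on port 9999 after the reboot):

      $ sudo reboot
      $ sudo iptables -L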


    Amazon AWS EC2 Introduction

    Amazon Web Services (AWS) is a collection of remote computing services (also called web services) that together make up a cloud computing platform, offered over the Internet by Amazon.com. Because AWS is huge and complex, it is not possible to cover all the topics in a single post, so we will cover here the basics of Amazon EC2.

    Open a free account

    This post assumes that you have an Amazon account. If you don't have one, you can create one for free by following the instructions in the video Create an AWS account.

    Getting your access key ID and secret access key

    It is not good practice to use your account keys to access your resources, because these keys have access to your billing information, unlimited resources, etc. It is better to create another user with limited access and different keys. The article IAM Best Practices contains a list of recommended practices to avoid security issues with your keys.

    Create individual IAM users

    Inside your dashboard, go to the Users option, click the Create New Users button, write the username and confirm. AWS will create a new IAM user with a new keypair. Click Download Keys and store them in a secure location.

    Installing and configuring AWS Command Line Interface

    awscli is a command line interface written in Python that is used to send commands to and get information from AWS. Like any standard Python package, it can be installed via pip:

      pip install awscli
    

    You can test whether the installation was successful by executing the following command:

        aws help
    

    Configuring the AWS Command Line Interface

    You can set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or you can run aws configure:

        $aws configure
    
        AWS Access Key ID [None]: IGMAOKYKYYLNEDIEYDF
        AWS Secret Access Key [None]: ASDA=JSADFJJJ;Jj;kdfu=asdf
        Default region name [None]: us-west-1
        Default output format [None]: json
    

    Amazon Elastic Compute Cloud EC2

    Amazon EC2 provides scalable virtual private servers using Xen. The user can create on demand one or many virtual servers, which can be deleted on demand as well. Resources for an instance such as disk, network interfaces, CPU and memory can be changed at any moment, so it is possible to scale an application at low cost. All the physical infrastructure is run by Amazon, so as a user you don't have to take care of aspects such as physical security, maintenance, etc.

    The command aws ec2 help shows you all the options that you have to manage EC2 instances.
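
    For example:

      $ aws ec2 help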

    Listing EC2 Regions

      $ aws ec2 describe-regions
    
    
        {
            "Regions": [
                {
                    "Endpoint": "ec2.eu-west-1.amazonaws.com", 
                    "RegionName": "eu-west-1"
                }, 
                {
                    "Endpoint": "ec2.sa-east-1.amazonaws.com", 
                    "RegionName": "sa-east-1"
                }, 
                {
                    "Endpoint": "ec2.us-east-1.amazonaws.com", 
                    "RegionName": "us-east-1"
                }, 
                {
                    "Endpoint": "ec2.ap-northeast-1.amazonaws.com", 
                    "RegionName": "ap-northeast-1"
                }, 
                {
                    "Endpoint": "ec2.us-west-2.amazonaws.com", 
                    "RegionName": "us-west-2"
                }, 
                {
                    "Endpoint": "ec2.us-west-1.amazonaws.com", 
                    "RegionName": "us-west-1"
                }, 
                {
                    "Endpoint": "ec2.ap-southeast-1.amazonaws.com", 
                    "RegionName": "ap-southeast-1"
                }, 
                {
                    "Endpoint": "ec2.ap-southeast-2.amazonaws.com", 
                    "RegionName": "ap-southeast-2"
                }
            ]
        }  
    

    Security Groups

    A security group acts like a virtual firewall that controls access to one or more instances. You can define custom rules to manage the traffic associated with each instance, and an instance can be associated with one or more security groups.

    Listing Security Groups

    You can use the command aws ec2 describe-security-groups to get a list of your security groups:

    
      $ aws ec2 describe-security-groups
        {
            "SecurityGroups": [
                {
                    "IpPermissionsEgress": [
                        {
                            "IpProtocol": "-1", 
                            "IpRanges": [
                                {
                                    "CidrIp": "0.0.0.0/0"
                                }
                            ], 
                            "UserIdGroupPairs": []
                        }
                    ], 
                    "Description": "default VPC security group", 
                    "IpPermissions": [
                        {
                            "IpProtocol": "-1", 
                            "IpRanges": [], 
                            "UserIdGroupPairs": [
                                {
                                    "UserId": "495087967", 
                                    "GroupId": "sg-85u5mch"
                                }
                            ]
                        }
                    ], 
                    "GroupName": "default", 
                    "VpcId": "vpc-34985c", 
                    "OwnerId": "098765454", 
                    "GroupId": "sg-adfa3545"
                }
            ]
        }  
    
    
    

    Creating Security Groups

    We will create a security group named security_test; to do that, we use the command aws ec2 create-security-group:

        $ aws ec2 create-security-group --group-name "security_test" --description "Security test for blog"
    
        {
            "return": "true", 
            "GroupId": "sg-fdasd"
        }
    

    Giving ssh access to groups

    We will want to access EC2 instances via SSH; to do that, we must add SSH access to the security group associated with an instance. We can use the command aws ec2 authorize-security-group-ingress:

         $aws ec2 authorize-security-group-ingress --group-name security_test --protocol tcp --port 22 --cidr 0.0.0.0/0
    
      {
        "return": "true"
      }
    
    

    Note: the 0.0.0.0/0 address range is ok for short-term testing purposes but unacceptable for production.

    SSH keypair

    In order to access EC2 instances, an SSH keypair is required. You can create one using the command aws ec2 create-key-pair:

        $aws ec2 create-key-pair --key-name manuel_dev_key --query 'KeyMaterial' --output text > manuel_key.pem
    
    

    Amazon Machine Images (AMIs)

    An AMI is a template that contains a software configuration (operating system, applications, etc.). From an AMI you can create an instance, which is a running copy of this template. One AMI can be used to create one or more instances.

    Amazon Linux AMIs

    Amazon Linux is a Linux flavor pretty similar to Red Hat/CentOS, with the advantage that it is developed and maintained by Amazon, so if you choose one of these AMIs you will get for free:

    • Security updates released by Amazon
    • Drivers tuned for optimal performance inside the Amazon environment
    • AWS support

    For this post we will choose the AMI ami-a8d3d4ed, which is available only in the us-west-1 region. Here is a link where you can find a list of Amazon Linux AMIs for every region.

    You can see the AMI details using the aws ec2 describe-images command:

      $aws ec2 describe-images --image-ids '["ami-a8d3d4ed"]'
        {
            "Images": [
                {
                    "VirtualizationType": "paravirtual", 
                    "Name": "amzn-ami-pv-2014.03.2.x86_64-ebs", 
                    "Hypervisor": "xen", 
                    "ImageOwnerAlias": "amazon", 
                    "ImageId": "ami-a8d3d4ed", 
                    "RootDeviceType": "ebs", 
                    "State": "available", 
                    "BlockDeviceMappings": [
                        {
                            "DeviceName": "/dev/sda1", 
                            "Ebs": {
                                "DeleteOnTermination": true, 
                                "SnapshotId": "snap-d6bea21a", 
                                "VolumeSize": 8, 
                                "VolumeType": "standard", 
                                "Encrypted": false
                            }
                        }
                    ], 
                    "Architecture": "x86_64", 
                    "ImageLocation": "amazon/amzn-ami-pv-2014.03.2.x86_64-ebs", 
                    "KernelId": "aki-880531cd", 
                    "OwnerId": "137112412989", 
                    "RootDeviceName": "/dev/sda1", 
                    "Public": true, 
                    "ImageType": "machine", 
                    "Description": "Amazon Linux AMI x86_64 PV EBS"
                }
            ]
        }  
    
    
    

    Listing EC2 Instances

    We can query the list of instances for our account/user using the following command:

        $aws ec2 describe-instances
    
        {
            "Reservations": []
        }  
    

    Note: Because we haven't created any instances yet, the returned list is empty.

    Running EC2 Instances

    The command aws ec2 run-instances has a lot of parameters that won't be covered in this post. To run an instance you will need some basic params:

      $aws ec2 run-instances --image-id ami-a8d3d4ed --instance-type t1.micro --key-name manuel_dev_key --security-groups security_test 
    

    Connecting to EC2 instances via ssh

    Run the command aws ec2 describe-instances and check the public URL for your instance in the PublicDnsName value; then you can connect via SSH with your key and the ec2-user user.
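
    A minimal lookup might look like this (the instance id shown is the one used later in this post for termination; yours will differ):

      $ aws ec2 describe-instances --instance-ids i-3f522861

    With the DNS name from the output, the SSH connection looks like: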

      ssh -i ./manuel_key.pem ec2-user@ec2-856-3632-56665-fdad.us-west-1.compute.amazonaws.com  
    
        Last login: Sat Aug 30 16:15:27 2014 from xx.xxx
    
           __|  __|_  )
           _|  (     /   Amazon Linux AMI
          ___|\___|___|
    
    

    Terminating an instance

    Use the command aws ec2 terminate-instances, passing the instance id as a parameter:

    
      $ aws ec2 terminate-instances --instance-ids i-3f522861
     
        {
            "TerminatingInstances": [
                {
                    "InstanceId": "i-3f522861", 
                    "CurrentState": {
                        "Code": 32, 
                        "Name": "shutting-down"
                    }, 
                    "PreviousState": {
                        "Code": 16, 
                        "Name": "running"
                    }
                }
            ]
        }  
    
    


    Development Django Box with Vagrant and Ubuntu

    Vagrant is an open source tool for virtual development environments. In this guide we will create an Ubuntu virtual machine for Django development that will be provisioned with Chef.

    VirtualBox Installation

    The preferred virtualization system for Vagrant is VirtualBox. It works with other virtualization tools as well; you can refer to the official documentation for more information. If you don't already have VirtualBox installed on your local box, you can install the package using your package manager:

        $sudo apt-get install virtualbox
    

    Installing Vagrant

    My suggestion is not to install from the package manager: Vagrant is a tool that releases new versions very quickly, and it is always better to get the latest version from the official website. So go to http://www.vagrantup.com/downloads.html, choose the version for your system, download it and install it. In my case, I use Ubuntu 64 bits:

      $sudo dpkg -i vagrant_1.6.3_x86_64.deb  
    

    Adding a box

    Before creating an environment you need to install a Vagrant box on your local machine. You can browse the Vagrant Cloud website to find more boxes. For this post we will use an Ubuntu box:

        $ vagrant box add hashicorp/precise64
    
        Vagrant is upgrading some internal state for the latest version.
        Please do not quit Vagrant at this time. While upgrading, Vagrant
        will need to copy all your boxes, so it will use a considerable
        amount of disk space. After it is done upgrading, the temporary disk
        space will be freed.
    
        Press ctrl-c now to exit if you want to remove some boxes or free
        up some disk space.
    
        Press any other key to continue.
        ==> box: Loading metadata for box 'hashicorp/precise64'
            box: URL: https://vagrantcloud.com/hashicorp/precise64
        This box can work with multiple providers! The providers that it
        can work with are listed below. Please review the list and choose
        the provider you will be working with.
    
        1) hyperv
        2) virtualbox
        3) vmware_fusion
        Enter your choice: 2
        ==> box: Adding box 'hashicorp/precise64' (v1.1.0) for provider: virtualbox
            box: Downloading: https://vagrantcloud.com/hashicorp/precise64/version/2/provider/virtualbox.box
        ==> box: Successfully added box 'hashicorp/precise64' (v1.1.0) for 'virtualbox'!
    

    Create an Ubuntu box Development Environment

    You can create a new environment in seconds with the command vagrant init and then tell Vagrant what kind of box you want.

        $vagrant init
    
        A `Vagrantfile` has been placed in this directory. You are now    
        ready to `vagrant up` your first virtual environment! Please read
        the comments in the Vagrantfile as well as documentation on
        `vagrantup.com` for more information on using Vagrant.
    

    This command creates a file named Vagrantfile. We should edit this file in order to set up our virtual machine according to our needs. The first step is to tell Vagrant that we want an Ubuntu box: edit the file, find the option config.vm.box and set it according to the box you want:

        config.vm.box = "hashicorp/precise64"
    

    Up and SSH

        $vagrant up
    
        Bringing machine 'default' up with 'virtualbox' provider...
        ==> default: Importing base box 'hashicorp/precise64'...
        ==> default: Matching MAC address for NAT networking...
        ==> default: Checking if box 'hashicorp/precise64' is up to date...
        ==> default: Setting the name of the VM: vagrant_default_1407584461236_45519
        ==> default: Clearing any previously set network interfaces...
        ==> default: Preparing network interfaces based on configuration...
            default: Adapter 1: nat
        ==> default: Forwarding ports...
            default: 22 => 2222 (adapter 1)
        ==> default: Booting VM...
        ==> default: Waiting for machine to boot. This may take a few minutes...
            default: SSH address: 127.0.0.1:2222
            default: SSH username: vagrant
            default: SSH auth method: private key
        ==> default: Machine booted and ready!
        ==> default: Checking for guest additions in VM...
            default: The guest additions on this VM do not match the installed version of
            default: VirtualBox! In most cases this is fine, but in rare cases it can
            default: prevent things such as shared folders from working properly. If you see
            default: shared folder errors, please make sure the guest additions within the
            default: virtual machine match the version of VirtualBox you have installed on
            default: your host and reload your VM.
            default: 
            default: Guest Additions Version: 4.2.0
            default: VirtualBox Version: 4.3
        ==> default: Mounting shared folders...
            default: /vagrant => /home/manuel/src/notempo/ntchef/vagrant
    
        $vagrant ssh
        Welcome to Ubuntu 12.04 LTS (GNU/Linux 3.2.0-23-generic x86_64)
    
         * Documentation:  https://help.ubuntu.com/
        Welcome to your Vagrant-built virtual machine.
        Last login: Fri Sep 14 06:23:18 2012 from 10.0.2.2
    

    Synced Folders

    You can tell Vagrant which of your local folders you want to be synchronized on your virtual machine. In this example we will set up the local directory ~/src/django, where a simple Django project will live. In order to set up this directory you have to stop your local box with vagrant halt and edit the Vagrantfile. Find the option config.vm.synced_folder and modify it; the first argument is where your local directory is and the second is where it will be synchronized on the virtual machine:

          config.vm.synced_folder "~/src/django", "/vagrant_data"
    

    Provisioning

    Below is the list of tools/frameworks/libraries that we need to install in order to set up a Django dev box:

    • Python
    • Python virtual env with django framework and postgresql driver
    • Postgresql db
    • Supervisor

    Django vagrant chef

    Chef is a systems and cloud infrastructure automation framework. You can find an introduction at this link. We will use Chef to provision our box, so every time the machine starts, Chef will be executed and, if there is a change in the infrastructure, Chef will update our box automatically.

    Django vagrant chef is a Chef project that contains cookbooks to set up an Ubuntu machine for Django development with Django/PostgreSQL. You can find the documentation here.

        $cd ~/src
        $git clone https://github.com/maigfrga/django-vagrant-chef.git
    
    
    Django vagrant chef Attributes

    Python attributes:

        ['python']['install_method'] = 'package'
        ['python']['venv_dir'] = '/svr/env/'
    

    Postgresql attributes

        ['django_postgresql']['database'] = ''
        ['django_postgresql']['user'] = ''
        ['django_postgresql']['password'] = ''
    
    

    Packages to be installed on the virtualenv

        ['django']['python']['packages'] = ["ipdb", "http2"]
    

    Django app details

        ['django']['app']['project_home'] = '/vagrant_data/project'
        ['django']['app']['settings'] = 'project.settings'
        ['django']['app']['port'] = '8080'
    

    Supervisor

        ['supervisor']['inet_username'] = ''
        ['supervisor']['inet_password'] = ''
    

    Configuring vagrant to use Django vagrant chef

    Install the omnibus plugin, which takes care of installing Chef on the box, and the librarian-chef plugin:

      $vagrant plugin install vagrant-omnibus  
      $vagrant plugin install vagrant-librarian-chef
    

    Edit the Vagrantfile, set up the location of the django-vagrant-chef project, and tell Vagrant which recipes should be executed:

      config.omnibus.chef_version = :latest
      config.librarian_chef.cheffile_dir = '~/src/django-vagrant-chef'
      config.vm.provision "chef_solo" do |chef|
        chef.cookbooks_path = ["~/src/django-vagrant-chef/cookbooks", "~/src/django-vagrant-chef/site-cookbooks"]
        chef.roles_path = "~/src/django-vagrant-chef/roles"
        chef.data_bags_path = "~/src/django-vagrant-chef/data_bags"
        chef.add_role('django')
          chef.json = {
            "postgresql" => {
              "password" => {
                "postgres" => "iloverandompasswordsbutthiswilldo"
              }
            },
            "python" => {
                "install_method" => "package",
                "venv_dir" => "/svr/env/",
            },
            "django" => {
                "python" => {
                    "packages" => ["ipdb", "http2"]
                },
                "app" => {
                    "project_home" => "/vagrant_data/project",
                    "settings" => "project.settings",
                    "port" => "8080"
    
               }
            },
            "supervisor" => {
                "inet_username" => "manuel",
                "inet_password" => "manuel"
            }
          }
      end
    

    Networking

    We will set up a custom IP address for the Vagrant box and forward port 80 from our localhost to port 8080, where the Django app is running in the Vagrant box (port forwarding is configured with Vagrant's forwarded_port network option). All the docs about Vagrant networking are at this link.

    Configuring a private network address

        config.vm.network "private_network", ip: "192.168.33.10"
    

    Reloading box

    Every time you start the Vagrant box with the vagrant up command, configuration changes are applied; if the Vagrant box is already up, you can reload the configuration with vagrant reload --provision.
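
    For example, after editing the Vagrantfile:

      $ vagrant reload --provision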

    At this link you can find the Vagrantfile used in this guide.


    Chef Server Introduction

    Chef is a system and cloud infrastructure automation framework that makes it easy to deploy servers and applications to any physical, virtual, or cloud location. This guide is a simple introduction to the key concepts of Chef. Extensive documentation can be found at this link.

    Chef Components

    Nodes

    A node is any physical, virtual, or cloud machine that is configured to be maintained by a chef-client.

    Workstations

    A workstation is a computer that is configured to run Knife, to synchronize with the chef-repo, and to interact with a single Chef server. Common tasks performed on a workstation are:

    • Developing cookbooks and recipes.
    • Keeping the chef-repo synchronized with version source control.
    • Using Knife to upload items from the chef-repo to the Chef server.
    • Configuring organizational policy, including defining roles and environments and ensuring that critical data is stored in data bags.
    • Interacting with nodes.

    Knife

    Knife is a command-line tool that provides an interface between a local chef-repo and the Chef server. Knife helps users to manage:

    • Nodes.
    • Cookbooks and recipes.
    • Roles.
    • Stores of JSON data (data bags), including encrypted data.
    • Environments.
    • Cloud resources, including provisioning.
    • The installation of the chef-client on management workstations.
    • Searching of indexed data on the Chef server.

    Repository

    The chef-repo is the location in which the following data objects are stored:

    • Cookbooks (including recipes, versions, cookbook attributes, resources, providers, libraries, and templates).
    • Roles.
    • Data bags.
    • Environments.
    • Configuration files (for clients, workstations, and servers).

    The Hosted Server

    The Chef server stores cookbooks, the policies that are applied to nodes, and metadata that describes each registered node that is being managed by the chef-client. Nodes use the chef-client to ask the Chef server for configuration details, such as recipes, templates, and file distributions. The chef-client then does as much of the configuration work as possible on the nodes themselves (and not on the Chef server).

    Cookbooks

    A cookbook is the fundamental unit of configuration and policy distribution. Each cookbook defines a scenario and contains all of the components that are required to support that scenario, including:

    • Attribute values that are set on nodes.
    • Definitions that allow the creation of reusable collections of resources.
    • File distributions.
    • Libraries that extend the chef-client and/or provide helpers to Ruby code.
    • Recipes that specify which resources to manage and the order in which those resources will be applied.
    • Custom resources and providers.
    • Templates.
    • Versions.
    • Metadata about recipes (including dependencies), version constraints, supported platforms, and so on.

    Chef Example

    In this example we use CentOS 6.5 as the Chef server. We will configure an Ubuntu server as a workstation and then we will deploy a C compiler on another Ubuntu node.

    Basic Network Configuration

    Hostname       IP
    chef-server    10.0.0.10
    station1       10.0.0.20
    db-node        10.0.0.30

    Chef Server Installation

    Go to http://www.getchef.com/chef/install/. Select the platform and version, download the rpm file and install it:

      $wget    https://opscode-omnibus-packages.s3.amazonaws.com/el/6/x86_64/chef-server-11.1.3-1.el6.x86_64.rpm
      $sudo rpm -Uvh chef-server-11.1.3-1.el6.x86_64.rpm
    

    Configure server

     $sudo chef-server-ctl reconfigure
    

    Chef workstation Installation

    Go to http://www.getchef.com/chef/install/. Select download chef-client, choose the platform and version, and follow the instructions:

      $curl -L https://www.opscode.com/chef/install.sh | sudo bash  
    

    Verify installation:

        $chef-client -v
        
    

    Setting up the chef Repo

    Install git:

        $sudo apt-get install git

    Clone the chef-repo:

      $git clone git://github.com/opscode/chef-repo.git
    
    Create .chef Directory

    The .chef directory is used to store three files: knife.rb, ORGANIZATION-validator.pem and USER.pem.

    Where ORGANIZATION and USER represent strings that are unique to each organization. These files must be present in the .chef directory in order for a workstation to be able to connect to a Chef server. To create the .chef directory:

      $sudo mkdir -p ~/chef-repo/.chef  
    

    Add .chef to the .gitignore file to prevent uploading the contents of the .chef folder to github. For example:

     $echo '.chef' >> ~/chef-repo/.gitignore
    
    Get Config Files

    Copy the files /etc/chef-server/chef-validator.pem and /etc/chef-server/admin.pem from chef-server to a secure location on station1 and execute the command knife configure --initial. This command will ask for the chef-validator.pem file location:

        $ knife configure --initial
        Overwrite /home/manuel/.chef/knife.rb? (Y/N)y
        Please enter the chef server URL: [https://station1:443] https://chef-server:443
        Please enter a name for the new user: [manuel] 
        Please enter the existing admin name: [admin] 
        Please enter the location of the existing admin's private key: [/etc/chef-server/admin.pem] 
        Please enter the validation clientname: [chef-validator] 
        Please enter the location of the validation key: [/etc/chef-server/chef-validator.pem] 
        Please enter the path to a chef repository (or leave blank): 
        Creating initial API user...
        Please enter a password for the new user: 
        Created user[manuel]
        Configuration file written to /home/manuel/.chef/knife.rb
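
    The generated ~/chef-repo/.chef/knife.rb contains the connection settings for the workstation. It looks roughly like the sketch below; the exact paths and names depend on the answers given above:

        node_name                'manuel'
        client_key               '/home/manuel/.chef/manuel.pem'
        validation_client_name   'chef-validator'
        validation_key           '/etc/chef-server/chef-validator.pem'
        chef_server_url          'https://chef-server:443'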
        
    
    Add Ruby to the $PATH environment variable
        $echo 'export PATH="/opt/chef/embedded/bin:$PATH"' >> ~/.bash_profile && source ~/.bash_profile
        
    
    Verify the chef-client install
        cd ~/chef-repo
        knife client list
        
    

    Bootstrap a Node

    A node is any physical, virtual, or cloud machine that is configured to be maintained by a chef-client. There are two ways to install the chef-client on a node:

    • Use the knife bootstrap subcommand to bootstrap a node using the omnibus installer.
    • Use an unattended install to bootstrap a node from itself, without using SSH.

    This post will cover the knife bootstrap install option.

    The knife bootstrap command is a common way to install the chef-client on a node. The default for this approach assumes that the node can access the Chef website so that it may download the chef-client package from that location.

    In this post we will bootstrap an Ubuntu node and set up a C compiler.

    Run the bootstrap command

    The knife bootstrap command is used to SSH into the target machine, and then do what is needed to allow the chef-client to run on the node. It will install the chef-client executable (if necessary), generate keys, and register the node with the Chef server. To set up the Chef node, execute the following command:

        $ knife bootstrap db-node -x username -P password --sudo
    

    Verify the installation:

        $knife client show db-node
    

    Opscode C Compiler Cookbook

    Opscode Build Essential is a GitHub repository created by the Opscode team that provides out-of-the-box C compiler provisioning. The first step is to clone the cookbook and its dependency repositories on the workstation:

        $mkdir ~/cookbooks
        $cd ~/cookbooks
        $git clone https://github.com/opscode-cookbooks/apt.git
        $git clone https://github.com/opscode-cookbooks/build-essential.git
        $git clone https://github.com/opscode-cookbooks/openssl.git
        $git clone https://github.com/sethvargo/chef-sugar.git
    

    Edit .chef/knife.rb and add the location of the cookbooks folder:

      cookbook_path  ['/home/manuel/chef-repo/cookbooks/', '/home/manuel/cookbooks']
    

    Upload cookbooks to the chef-server:

        $knife cookbook upload apt
        $knife cookbook upload build-essential
        $knife cookbook upload chef-sugar
        $knife cookbook upload openssl
    

    Adding a recipe to the node run_list

    We will add the C compiler recipe build-essential to the node's run_list:

        $knife node run_list add  db-node  recipe[build-essential::default]
        db-node:
          run_list: recipe[build-essential::default]
    
        $knife node show db-node
    
    

    After that we must log in to db-node to run the updated run_list. First of all, we will check that the C compiler is not already installed:

        $ gcc
        The program 'gcc' is currently not installed. You can install it by typing:
        sudo apt-get install gcc
    

    Now we can execute the chef-client command, which will connect to the Chef server and update the node state:

        $sudo chef-client
    

    After the execution we can check whether the C compiler was installed:

        $gcc -v
        Using built-in specs.
        COLLECT_GCC=gcc
        COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
        Target: x86_64-linux-gnu
        Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.2-19ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
        Thread model: posix
        gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
    

    OpenStack Introduction for Ubuntu Part IV

    This is the fourth post of the OpenStack introduction series. The first part covers general concepts, basic configuration and the Identity Service installation, the second part continues with the Image and Compute services, and the third part covers the dashboard and Block Storage configuration. This part is about the Object Storage service. At this link a complete installation guide can be found.

    Add Object Storage

    Object Storage service

    The Object Storage service is a storage system for large amounts of unstructured data through a RESTful HTTP API. It includes the following components:

    • Proxy servers (swift-proxy-server). Accepts Object Storage API and raw HTTP requests to upload files, modify metadata, and create containers.
    • Account servers (swift-account-server). Manage accounts defined with the Object Storage service.
    • Container servers (swift-container-server). Manage a mapping of containers, or folders, within the Object Storage service.
    • Object servers (swift-object-server). Manage actual objects, such as files, on the storage nodes.
    • Periodic processes that perform general maintenance tasks (auditors, updaters, reapers).

    Systems Requirements

    In this guide we won't go into sizing, but for a real production environment you need to carefully study the system requirements defined at this link.

    Plan networking for Object Storage

    This network will have one proxy and three storage nodes with the following IP addresses:

        192.168.0.13    swift-proxy
        192.168.0.14    storage1
        192.168.0.15    storage2
        192.168.0.16    storage3
    

    The parameter STORAGE_LOCAL_NET_IP is the local IP address of each storage node.

    Other networking options can be found at this link.

    Example Object Storage installation architecture

    • Node: A host machine that runs one or more OpenStack Object Storage services.
    • Proxy node: Runs Proxy services.
    • Storage node: Runs Account, Container, and Object services.
    • Ring: A set of mappings between OpenStack Object Storage data to physical devices.
    • Replica: A copy of an object. By default, three copies are maintained in the cluster.
    • Zone: A logically separate section of the cluster, related to independent failure characteristics.

    Note: for this guide we will install one proxy node which runs the swift-proxy-server processes and three storage nodes that run the swift-account-server, swift-container-server, and swift-object-server processes which control storage of the account databases, the container databases, as well as the actual stored objects.

    Edit /etc/hosts on all nodes (controller, block1, storage1, ...):

        127.0.0.1       localhost
        192.168.0.10    controller
        192.168.0.11    compute1
        192.168.0.12    block1
        192.168.0.13    swift-proxy
        192.168.0.14    storage1
        192.168.0.15    storage2
        192.168.0.16    storage3
    

    Edit /etc/hostname on each node and set the hostname to swift-proxy, storage1, storage2 or storage3 respectively.

    General Installation steps

    Add Open Stack repositories

       # apt-get install python-software-properties
       # add-apt-repository cloud-archive:havana 
       # apt-get update && apt-get dist-upgrade
       # reboot
    

    Create a swift user that the Object Storage Service can use to authenticate with the Identity Service. Execute the following commands on the controller node:

       # keystone user-create --name=swift --pass=SWIFT_PASS \
      --email=swift@example.com
      # keystone user-role-add --user=swift --tenant=service --role=admin 
    
        +----------+----------------------------------+
        | Property |              Value               |
        +----------+----------------------------------+
        |  email   |        swift@example.com         |
        | enabled  |               True               |
        |    id    | b64f304b791d485ea960d8a0296bb63d |
        |   name   |              swift               |
        +----------+----------------------------------+
    
    

    Create a service entry for the Object Storage Service:

       # keystone service-create --name=swift --type=object-store \
      --description="Object Storage Service" 
    
        +-------------+----------------------------------+
        |   Property  |              Value               |
        +-------------+----------------------------------+
        | description |      Object Storage Service      |
        |      id     | e54246ea9eb64d5ca002cbf5481dd5eb |
        |     name    |              swift               |
        |     type    |           object-store           |
        +-------------+----------------------------------+
    
    

    Specify an API endpoint for the Object Storage Service by using the returned service ID.

    # keystone endpoint-create \
      --service-id=the_service_id_above \
      --publicurl='http://swift-proxy:8080/v1/AUTH_%(tenant_id)s' \
      --internalurl='http://swift-proxy:8080/v1/AUTH_%(tenant_id)s' \
      --adminurl=http://swift-proxy:8080
    
        +-------------+----------------------------------------------+
        |   Property  |                    Value                     |
        +-------------+----------------------------------------------+
        |   adminurl  |            http://controller:8080            |
        |      id     |       e8f5a0654b7f4af6a9c2357250dd5c63       |
        | internalurl | http://controller:8080/v1/AUTH_%(tenant_id)s |
        |  publicurl  | http://controller:8080/v1/AUTH_%(tenant_id)s |
        |    region   |                  regionOne                   |
        |  service_id |       e54246ea9eb64d5ca002cbf5481dd5eb       |
        +-------------+----------------------------------------------+
    
    

    Create the configuration directory on all swift nodes:

       # mkdir -p /etc/swift 
    

    Create /etc/swift/swift.conf on all nodes:

        [swift-hash]
        # random unique string that can never change (DO NOT LOSE)
        swift_hash_path_suffix = afLIeftgibit
    

    Note: The suffix value in /etc/swift/swift.conf should be set to some random string of text to be used as a salt when hashing to determine mappings in the ring. This file must be the same on every node in the cluster!
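
    A convenient way to generate such a suffix is the same openssl command used for passwords elsewhere in this series, for example:

       # openssl rand -hex 10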

    Install and configure Storage nodes

    Install Storage node packages:

       # apt-get install swift swift-account swift-container swift-object xfsprogs 
    

    For each device on the node that you want to use for storage, set up the XFS volume (/dev/sdb is used as an example). Use a single partition per drive.

        # fdisk /dev/sdb
        # mkfs.xfs /dev/sdb1
        # echo "/dev/sdb1 /srv/node/sdb1 xfs noatime,nodiratime,nobarrier,logbufs=8 0 0" >> /etc/fstab
        # mkdir -p /srv/node/sdb1
        # mount /srv/node/sdb1
        # chown -R swift:swift /srv/node
    

    Create /etc/rsyncd.conf:

        uid = swift
        gid = swift
        log file = /var/log/rsyncd.log
        pid file = /var/run/rsyncd.pid
        address = STORAGE_LOCAL_NET_IP
         
        [account]
        max connections = 2
        path = /srv/node/
        read only = false
        lock file = /var/lock/account.lock
         
        [container]
        max connections = 2
        path = /srv/node/
        read only = false
        lock file = /var/lock/container.lock
         
        [object]
        max connections = 2
        path = /srv/node/
        read only = false
        lock file = /var/lock/object.lock
    

    Edit the following line in /etc/default/rsync:

        RSYNC_ENABLE=true
    

    Start the rsync service:

       # service rsync start 
       # mkdir -p /var/swift/recon
      # chown -R swift:swift /var/swift/recon
    
    

    Install and configure the proxy node

    The proxy server takes each request and looks up locations for the account, container, or object and routes the requests correctly. The proxy server also handles API requests. You enable account management by configuring it in the /etc/swift/proxy-server.conf file.

    Install swift-proxy service:

       # apt-get install swift-proxy memcached python-keystoneclient python-swiftclient python-webob 
    

    Modify memcached to listen on the default interface on a local, non-public network. Edit this line in the /etc/memcached.conf file: change

       -l 127.0.0.1 
    

    to:

       -l PROXY_LOCAL_NET_IP 
    

    Restart the memcached service:

       # service memcached restart 
    

    Create /etc/swift/proxy-server.conf:

       [DEFAULT]
        bind_port = 8080
        user = swift
         
        [pipeline:main]
        pipeline = healthcheck cache authtoken keystoneauth proxy-server
         
        [app:proxy-server]
        use = egg:swift#proxy
        allow_account_management = true
        account_autocreate = true
         
        [filter:keystoneauth]
        use = egg:swift#keystoneauth
        operator_roles = Member,admin,swiftoperator
         
        [filter:authtoken]
        paste.filter_factory = keystoneclient.middleware.auth_token:filter_factory
         
        # Delaying the auth decision is required to support token-less
        # usage for anonymous referrers ('.r:*').
        delay_auth_decision = true
         
        # cache directory for signing certificate
        signing_dir = /home/swift/keystone-signing
         
        # auth_* settings refer to the Keystone server
        auth_protocol = http
        auth_host = controller
        auth_port = 35357
         
        # the service tenant and swift username and password created in Keystone
        admin_tenant_name = service
        admin_user = swift
        admin_password = SWIFT_PASS
         
        [filter:cache]
        use = egg:swift#memcache
        memcache_servers = PROXY_LOCAL_NET_IP
    
        [filter:catch_errors]
        use = egg:swift#catch_errors
         
        [filter:healthcheck]
        use = egg:swift#healthcheck 
    

    Create the account, container, and object rings. The builder command creates a builder file with a few parameters.

        # cd /etc/swift
        # swift-ring-builder account.builder create 18 3 1
        # swift-ring-builder container.builder create 18 3 1
        # swift-ring-builder object.builder create 18 3 1 
    

    The first parameter (18) is the partition power: the ring will contain 2^18 partitions. Set this value based on the total amount of storage you expect your entire ring to use. The second value (3) is the number of replicas of each object, and the last value (1) is the minimum number of hours before a partition can be moved more than once.
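
    As a rough sizing sketch (this rule of thumb comes from the Swift documentation, not from this guide): estimate the maximum number of drives the ring will ever contain, multiply by 100, and round up to the next power of two.

        # e.g. expecting at most 1,000 drives:
        #   1,000 x 100 = 100,000 partitions
        #   next power of two: 2^17 = 131,072  -> partition power 17
        # the value 18 used above (2^18 = 262,144) covers roughly 2,600 drives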

    For every storage device on each node add entries to each ring:

        # swift-ring-builder account.builder add zZONE-STORAGE_LOCAL_NET_IP:6002[RSTORAGE_REPLICATION_NET_IP:6005]/DEVICE 100
        # swift-ring-builder container.builder add zZONE-STORAGE_LOCAL_NET_IP_1:6001[RSTORAGE_REPLICATION_NET_IP:6004]/DEVICE 100
        # swift-ring-builder object.builder add zZONE-STORAGE_LOCAL_NET_IP_1:6000[RSTORAGE_REPLICATION_NET_IP:6003]/DEVICE 100
    

    In this guide every storage node has a partition in Zone 1. Their IP addresses are 192.168.0.14, 192.168.0.15 and 192.168.0.16, and there is no separate replication network. The mount point of the partition is /srv/node/sdb1 and the path in /etc/rsyncd.conf is /srv/node/, so DEVICE is sdb1 and the commands are:

    
        # swift-ring-builder account.builder add z1-192.168.0.14:6002/sdb1 100
        # swift-ring-builder container.builder add z1-192.168.0.14:6001/sdb1 100
        # swift-ring-builder object.builder add z1-192.168.0.14:6000/sdb1 100
    
        # swift-ring-builder account.builder add z1-192.168.0.15:6002/sdb1 100
        # swift-ring-builder container.builder add z1-192.168.0.15:6001/sdb1 100
        # swift-ring-builder object.builder add z1-192.168.0.15:6000/sdb1 100
    
        # swift-ring-builder account.builder add z1-192.168.0.16:6002/sdb1 100
        # swift-ring-builder container.builder add z1-192.168.0.16:6001/sdb1 100
        # swift-ring-builder object.builder add z1-192.168.0.16:6000/sdb1 100
    

    Verify the ring contents for each ring:

       # cd /etc/swift/
    
       # swift-ring-builder account.builder
       # swift-ring-builder container.builder
       # swift-ring-builder object.builder 
    
        account.builder, build version 1
        262144 partitions, 3.000000 replicas, 1 regions, 1 zones, 1 devices, 100.00 balance
        The minimum number of hours before a partition can be reassigned is 1
        Devices:    id  region  zone      ip address  port  replication ip  replication port      name weight partitions balance meta
                     0       1     1    192.168.0.16  6002    192.168.0.16              6002      sdb1 100.00          0 -100.00 
    
    

    Rebalance the rings:

        # swift-ring-builder account.builder rebalance
        # swift-ring-builder container.builder rebalance
        # swift-ring-builder object.builder rebalance
    

    Copy the account.ring.gz, container.ring.gz, and object.ring.gz files to each of the Proxy and Storage nodes in /etc/swift.
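
    One way to do this is with scp from the proxy node (host names as used in this guide):

        # scp /etc/swift/*.ring.gz storage1:/etc/swift/
        # scp /etc/swift/*.ring.gz storage2:/etc/swift/
        # scp /etc/swift/*.ring.gz storage3:/etc/swift/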

    Make sure the swift user owns all configuration files:

       # chown -R swift:swift /etc/swift 
    

    Restart service:

        #service swift-proxy stop
        #service swift-proxy start 
    

    Start services on the storage nodes

        # swift-init all start
    

    Verify the installation

    You can run these commands from the proxy server or any server that has access to the Identity Service.

      #swift stat  
       Account: AUTH_fe13b472ad9e43e2aa8c71e7cc1c5f7c
    Containers: 0
       Objects: 0
         Bytes: 0
    Content-Type: text/plain; charset=utf-8
    X-Timestamp: 1402842465.82440
    X-Put-Timestamp: 1402842465.82440
        
    

    Create and upload a test.txt file

        $echo "this is a test" > test.txt
        $swift upload myfiles test.txt 
    
        $swift download myfiles  
    

    OpenStack Introduction for Ubuntu Part III

    This is the third post of the OpenStack introduction series. The first part covers general concepts, basic configuration and the Identity Service installation, and the second part continues with the Image and Compute services. This part is about the dashboard and Block Storage configuration. At this link a complete installation guide can be found.

    Install the Dashboard

    The dashboard is a web interface that enables cloud administrators and users to manage various OpenStack resources and services. The dashboard enables web-based interactions with the OpenStack Compute cloud controller through the OpenStack APIs.

    As root, install the Apache server and the dashboard on the node that can contact the Identity Service:

       # apt-get install apache2 libapache2-mod-wsgi  memcached  openstack-dashboard 
    
       # apt-get remove --purge openstack-dashboard-ubuntu-theme 
    

    Modify the value of CACHES['default']['LOCATION'] in /etc/openstack-dashboard/local_settings.py to match the ones set in /etc/memcached.conf

       CACHES = {
        'default': {
            'BACKEND' : 'django.core.cache.backends.memcached.MemcachedCache',
            'LOCATION' : '127.0.0.1:11211'
            }
        }
    

    Update ALLOWED_HOSTS in local_settings.py to include the addresses from which you wish to access the dashboard, and point OPENSTACK_HOST at the controller:

       ALLOWED_HOSTS = ['localhost', 'my-desktop'] 
       OPENSTACK_HOST = "controller" 
    

    You can now access the dashboard at http://controller/horizon . Log in with the credentials of any user that you created with the OpenStack Identity Service.

    Add the Block Storage Service

    The OpenStack Block Storage Service works through the interaction of a series of daemon processes named cinder-* that reside persistently on the host machine or machines. You can run the binaries from a single node or across multiple nodes. You can also run them on the same node as other OpenStack services.

    Block Storage Service

    The Block Storage Service enables management of volumes, volume snapshots, and volume types. It includes the following components:

    • cinder-api: Accepts API requests and routes them to cinder-volume for action.
    • cinder-volume: Responds to requests to read from and write to the Block Storage database.
    • cinder-scheduler daemon: Picks the optimal block storage provider node on which to create the volume.
    • Messaging queue: Routes information between the Block Storage Service processes.

    Configure a Block Storage Service controller

    Install packages

       # apt-get install cinder-api cinder-scheduler 
    

    Configure Block Storage to use your MySQL database. Edit the /etc/cinder/cinder.conf file and add the following key under the [database] section

       [database]
        ...
        connection = mysql://cinder:CINDER_DBPASS@controller/cinder 
    

    Create database and user

      # mysql -u root -p
        mysql> CREATE DATABASE cinder;
        mysql> GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'localhost' \
        IDENTIFIED BY 'CINDER_DBPASS';
        mysql> GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'%' \
        IDENTIFIED BY 'CINDER_DBPASS';

    Create tables

       # cinder-manage db sync 
    

    Create a cinder user in the Identity Service:

       # keystone user-create --name=cinder --pass=CINDER_PASS --email=cinder@example.com
        +----------+----------------------------------+
        | Property |              Value               |
        +----------+----------------------------------+
        |  email   |        cinder@example.com        |
        | enabled  |               True               |
        |    id    | ff2562ccad9d490aaf214e5e5e063186 |
        |   name   |              cinder              |
        +----------+----------------------------------+
       
       # keystone user-role-add --user=cinder --tenant=service --role=admin 
    

    Add the credentials to the /etc/cinder/api-paste.ini file in the [filter:authtoken] section:

        [filter:authtoken]
        paste.filter_factory=keystoneclient.middleware.auth_token:filter_factory
        auth_host=controller
        auth_port = 35357
        auth_protocol = http
        auth_uri = http://controller:5000
        admin_tenant_name=service
        admin_user=cinder
        admin_password=CINDER_PASS
    

    Configure Block Storage to use the RabbitMQ message broker by setting these configuration keys in the [DEFAULT] configuration group of the /etc/cinder/cinder.conf file

       [DEFAULT]
        ...
        rpc_backend = cinder.openstack.common.rpc.impl_kombu
        rabbit_host = controller
        rabbit_port = 5672
        rabbit_userid = guest
        rabbit_password = RABBIT_PASS 
    

    Register the Block Storage Service with the Identity Service so that other OpenStack services can locate it.

       # keystone service-create --name=cinder --type=volume --description="Cinder Volume Service" 
        +-------------+----------------------------------+
        |   Property  |              Value               |
        +-------------+----------------------------------+
        | description |      Cinder Volume Service       |
        |      id     | 5f4ac31dd0334c9d89e72375f165c26d |
        |     name    |              cinder              |
        |     type    |              volume              |
        +-------------+----------------------------------+
    

    Use the id property to create the endpoint:

       # keystone endpoint-create \
      --service-id=the_service_id_above \
      --publicurl=http://controller:8776/v1/%\(tenant_id\)s \
      --internalurl=http://controller:8776/v1/%\(tenant_id\)s \
      --adminurl=http://controller:8776/v1/%\(tenant_id\)s 
    

    Register a service and endpoint for version 2 of the Block Storage Service API.

       # keystone service-create --name=cinderv2 --type=volumev2 \
      --description="Cinder Volume Service V2" 
    
        +-------------+----------------------------------+
        |   Property  |              Value               |
        +-------------+----------------------------------+
        | description |     Cinder Volume Service V2     |
        |      id     | 46d56ead76f94608b5ff8432b156b865 |
        |     name    |             cinderv2             |
        |     type    |             volumev2             |
        +-------------+----------------------------------+
    
    

    Register endpoint

           # keystone endpoint-create \
      --service-id=the_service_id_above \
      --publicurl=http://controller:8776/v2/%\(tenant_id\)s \
      --internalurl=http://controller:8776/v2/%\(tenant_id\)s \
      --adminurl=http://controller:8776/v2/%\(tenant_id\)s 
    
    

    Restart the cinder service

       # service cinder-scheduler restart
       # service cinder-api restart 
    

    Configure a Block Storage Service node

    We will set up a new node that contains the disk that serves volumes. Update /etc/hosts on the controller and on the new node:

        127.0.0.1       localhost
        192.168.0.10    controller
        192.168.0.11    compute1
        192.168.0.12    block1
    

    Install the required LVM packages in the block1 node

       # apt-get install lvm2 
    

    Create the LVM physical and logical volumes. This guide assumes a second disk /dev/sdb that is used for this purpose

        # pvcreate /dev/sdb
        # vgcreate cinder-volumes /dev/sdb 
    

    Add a filter entry to the devices section of /etc/lvm/lvm.conf to keep LVM from scanning devices used by virtual machines. Each item in the filter array starts with either a for accept or r for reject. Entries for the physical volumes that are required on the Block Storage host therefore start with a, and the array must end with "r/.*/" to reject any device not listed. In this example, /dev/sda1 is the volume where the operating system for the node resides, while /dev/sdb is the volume reserved for cinder-volumes.

       devices {
        ...
        filter = [ "a/sda1/", "a/sdb/", "r/.*/"]
        ...
        } 
    

    Install the appropriate packages for the Block Storage Service on block1 node

       # apt-get install cinder-volume 
    

    Edit /etc/cinder/api-paste.ini and locate the [filter:authtoken] section:

       [filter:authtoken]
        paste.filter_factory=keystoneclient.middleware.auth_token:filter_factory
        auth_host=controller
        auth_port = 35357
        auth_protocol = http
        admin_tenant_name=service
        admin_user=cinder
        admin_password=CINDER_PASS 
    

    Configure Block Storage to use the RabbitMQ message broker by setting these configuration keys in the [DEFAULT] configuration group of the /etc/cinder/cinder.conf file

       [DEFAULT]
        ...
        rpc_backend = cinder.openstack.common.rpc.impl_kombu
        rabbit_host = controller
        rabbit_port = 5672
        rabbit_userid = guest
        rabbit_password = RABBIT_PASS 
    

    Configure Block Storage to use your MySQL database. Edit the /etc/cinder/cinder.conf file and set the connection key in the [database] section:

       [database]
        ...
        connection = mysql://cinder:CINDER_DBPASS@controller/cinder 
    

    Configure Block Storage to use the Image Service. Edit the /etc/cinder/cinder.conf file and update the glance_host option in the [DEFAULT] section:

    [DEFAULT]
    ...
    glance_host = controller
    

    Restart the cinder service:

       # service cinder-volume restart
       # service tgt restart 
    

    OpenStack Introduction for Ubuntu Part II

    This is the second post of the OpenStack introduction series. The first part covers general concepts, basic configuration and the Identity Service installation; this second part continues with the Image and Compute services. At this link a complete installation guide can be found.

    Configure the Image Service

    The OpenStack Image Service enables users to discover, register, and retrieve virtual machine images. Also known as the glance project, the Image Service offers a REST API that enables you to query virtual machine image metadata and retrieve an actual image.

    Image Service overview

    • glance-api. Accepts Image API calls for image discovery, retrieval, and storage.
    • glance-registry. Stores, processes, and retrieves metadata about images. Metadata includes size, type, and so on.
    • Database. Stores image metadata.
    • Storage repository for image files. You can use the Object Storage Service as the image repository, but the Image Service also supports normal file systems, RADOS block devices, Amazon S3, and HTTP.

    Install the Image Service

    The OpenStack Image Service acts as a registry for virtual disk images. Users can add new images or take a snapshot of an image from an existing server for immediate storage. Use snapshots for back up and as templates to launch new servers.

    Install the Image Service on the controller node
       # apt-get install glance python-glanceclient 
    

    Edit /etc/glance/glance-api.conf and /etc/glance/glance-registry.conf and change the [DEFAULT] section to configure database connection:

        ...
        [DEFAULT]
        ...
        # SQLAlchemy connection string for the reference implementation
        # registry server. Any valid SQLAlchemy connection string is fine.
        # See: http://www.sqlalchemy.org/docs/05/reference/sqlalchemy/connections.html#sqlalchemy.create_engine
        sql_connection = mysql://glance:GLANCE_DBPASS@controller/glance
        ...
    

    Note: You should create a new glance database and a user with access to it if they don't already exist. After that, you can create the database tables for the Image Service:

       # glance-manage db_sync 
    

    Create a glance user that the Image Service can use to authenticate with the Identity Service. Choose a password and specify an email address for the glance user. Use the service tenant and give the user the admin role.

        # keystone user-create --name=glance --pass=GLANCE_PASS --email=glance@example.com
    
        +----------+----------------------------------+
        | Property |              Value               |
        +----------+----------------------------------+
        |  email   |        glance@example.com        |
        | enabled  |               True               |
        |    id    | 3c6813f8f5c84379adfc23132562efce |
        |   name   |              glance              |
        +----------+----------------------------------+
    
       # keystone user-role-add --user=glance --tenant=service --role=admin 
    

    Configure the Image Service to use the Identity Service for authentication. Edit the /etc/glance/glance-api.conf and /etc/glance/glance-registry.conf files. Replace GLANCE_PASS with the password you chose for the glance user in the Identity Service. Add the following keys under the [keystone_authtoken] section:

        [keystone_authtoken]
        ...
        auth_uri = http://controller:5000
        auth_host = controller
        auth_port = 35357
        auth_protocol = http
        admin_tenant_name = service
        admin_user = glance
        admin_password = GLANCE_PASS
    

    Add the following key under the [paste_deploy] section:

       flavor = keystone 
    

    Add the credentials to the /etc/glance/glance-api-paste.ini and /etc/glance/glance-registry-paste.ini files. Edit each file to set the following options in the [filter:authtoken] section .

       [filter:authtoken]
        paste.filter_factory=keystoneclient.middleware.auth_token:filter_factory
        auth_host=controller
        admin_user=glance
        admin_tenant_name=service
        admin_password=GLANCE_PASS
    

    Register the service and create the endpoint:

       # keystone service-create --name=glance --type=image --description="Glance Image Service" 
    
        +-------------+----------------------------------+
        |   Property  |              Value               |
        +-------------+----------------------------------+
        | description |       Glance Image Service       |
        |      id     | d0c6e90350f1454596ed2200421a044b |
        |     name    |              glance              |
        |     type    |              image               |
        +-------------+----------------------------------+
    
    

    Use the id property returned for the service to create the endpoint:

       # keystone endpoint-create --service-id=the_service_id_above \
         --publicurl=http://controller:9292 --internalurl=http://controller:9292 \
        --adminurl=http://controller:9292 
    
        +-------------+----------------------------------+
        |   Property  |              Value               |
        +-------------+----------------------------------+
        |   adminurl  |      http://controller:9292      |
        |      id     | 0a4e70031d4d4840a4c888053b221e1d |
        | internalurl |      http://controller:9292      |
        |  publicurl  |      http://controller:9292      |
        |    region   |            regionOne             |
        |  service_id | d0c6e90350f1454596ed2200421a044b |
        +-------------+----------------------------------+
    
    

    Restart the glance service with its new settings.

       # service glance-registry restart
       # service glance-api restart 
    

    Verify the Image Service installation

    We need to download at least one virtual machine image compatible with OpenStack. For this test we will use CirrOS, a small image suited to testing OpenStack services. More information about building and downloading images can be found at this link.

        $ mkdir images
        $ cd images/
        $ wget http://cdn.download.cirros-cloud.net/0.3.1/cirros-0.3.1-x86_64-disk.img 
    

    To upload the image to the Image Service you need to pass the following parameters:

    • --name: The name of the image.
    • --disk-format: Specifies the format of the image file. Valid formats include qcow2, raw, vhd, vmdk, vdi, iso, aki, ari, and ami.
    • --container-format: Specifies the container format. Valid formats include: bare, ovf, aki, ari and ami.
    • --is-public: if true, all users can use the image; if false, only administrators can. The upload command is:
       # glance image-create --name="CirrOS 0.3.1" --disk-format=qcow2 \
      --container-format=bare --is-public=true < cirros-0.3.1-x86_64-disk.img 
    
        +------------------+--------------------------------------+
        | Property         | Value                                |
        +------------------+--------------------------------------+
        | checksum         | d972013792949d0d3ba628fbe8685bce     |
        | container_format | bare                                 |
        | created_at       | 2014-04-10T15:43:23.652215           |
        | deleted          | False                                |
        | deleted_at       | None                                 |
        | disk_format      | qcow2                                |
        | id               | 2c0955c1-3811-451a-927e-4681bf26eca0 |
        | is_public        | True                                 |
        | min_disk         | 0                                    |
        | min_ram          | 0                                    |
        | name             | CirrOS 0.3.1                         |
        | owner            | None                                 |
        | protected        | False                                |
        | size             | 13147648                             |
        | status           | active                               |
        | updated_at       | 2014-04-10T15:43:24.347303           |
        +------------------+--------------------------------------+
    
    

    Verify:

    # glance image-list
    +--------------------------------------+--------------+-------------+------------------+----------+--------+
    | ID                                   | Name         | Disk Format | Container Format | Size     | Status |
    +--------------------------------------+--------------+-------------+------------------+----------+--------+
    | 2c0955c1-3811-451a-927e-4681bf26eca0 | CirrOS 0.3.1 | qcow2       | bare             | 13147648 | active |
    +--------------------------------------+--------------+-------------+------------------+----------+--------+
    
    

    Configure Compute services

    The Compute service is a cloud computing fabric controller. It interacts with the Identity Service for authentication, the Image Service for images, and the Dashboard for the user and administrative interface. Access to images is limited by project and by user; quotas are limited per project.

    Compute services components

    API
    • nova-api service: Accepts and responds to end user compute API calls.
    • nova-api-metadata service: Accepts metadata requests from instances.
    Compute core
    • nova-compute process: worker daemon that creates and terminates virtual machine instances through hypervisor APIs.
    • nova-scheduler process: Takes a virtual machine instance request from the queue and determines on which compute server host it should run.
    • nova-conductor module: Mediates interactions between nova-compute and the database. Aims to eliminate direct accesses to the cloud database made by nova-compute.
    Networking for VMs
    • nova-network worker daemon: Accepts networking tasks from the queue and performs tasks to manipulate the network, such as setting up bridging interfaces or changing iptables rules.
    • nova-dhcpbridge script: Tracks IP address leases and records them in the database by using the dnsmasq dhcp-script facility.
    Console interface
    • nova-consoleauth daemon: Authorizes tokens for users that console proxies provide.
    • nova-novncproxy daemon: Provides a proxy for accessing running instances through a VNC connection. Supports browser-based novnc clients.
    • nova-xvpnvncproxy daemon: A proxy for accessing running instances through a VNC connection. Supports a Java client specifically designed for OpenStack.
    • nova-cert daemon: Manages x509 certificates.
    Command-line clients and other interfaces
    • nova client: Enables users to submit commands as a tenant administrator or end user.
    • nova-manage client: Enables cloud administrators to submit commands.

    Install Compute controller services

    Install Compute packages:

       # apt-get install nova-novncproxy novnc nova-api \
         nova-ajax-console-proxy nova-cert nova-conductor \
         nova-consoleauth nova-doc nova-scheduler \
        python-novaclient 
    

    Edit the /etc/nova/nova.conf file and add these lines to the [database] and [keystone_authtoken] sections:

    
        ...
        [database]
        # The SQLAlchemy connection string used to connect to the database
        connection = mysql://nova:NOVA_DBPASS@controller/nova
        [keystone_authtoken]
        auth_host = controller
        auth_port = 35357
        auth_protocol = http
        admin_tenant_name = service
        admin_user = nova
        admin_password = NOVA_PASS
    
    

    Configure the Compute Service to use the RabbitMQ message broker by setting these configuration keys in the [DEFAULT] configuration group of the /etc/nova/nova.conf file:

        rpc_backend = nova.rpc.impl_kombu
        rabbit_host = controller
        rabbit_password = RABBIT_PASS 
    

    Create the Compute service tables:

       # nova-manage db sync 
    

    Set the my_ip, vncserver_listen, and vncserver_proxyclient_address configuration options to the internal IP address of the controller node: Edit the /etc/nova/nova.conf file and add these lines to the [DEFAULT] section:

         ...
        [DEFAULT]
        ...
        my_ip=192.168.0.10
        vncserver_listen=192.168.0.10
        vncserver_proxyclient_address=192.168.0.10 
    
    

    Create a nova user that Compute uses to authenticate with the Identity Service. Use the service tenant and give the user the admin role

       # keystone user-create --name=nova --pass=NOVA_PASS --email=nova@example.com
    
        +----------+----------------------------------+
        | Property |              Value               |
        +----------+----------------------------------+
        |  email   |         nova@example.com         |
        | enabled  |               True               |
        |    id    | 64e4c1ff707449a2b470d82d1269a91b |
        |   name   |               nova               |
        +----------+----------------------------------+
    
    
    
       # keystone user-role-add --user=nova --tenant=service --role=admin 
    

    Configure Compute to use these credentials with the Identity Service running on the controller. Replace NOVA_PASS with your Compute password. Edit the [DEFAULT] section in the /etc/nova/nova.conf file to add this key:

       [DEFAULT]
        ...
        auth_strategy=keystone 
    

    Add the credentials to the /etc/nova/api-paste.ini file. Add these options to the [filter:authtoken] section:

       [filter:authtoken]
        paste.filter_factory = keystoneclient.middleware.auth_token:filter_factory
        auth_host = controller
        auth_port = 35357
        auth_protocol = http
        auth_uri = http://controller:5000/v2.0
        admin_tenant_name = service
        admin_user = nova
        admin_password = NOVA_PASS 
    

    Register Compute with the Identity Service so that other OpenStack services can locate it. Register the service and specify the endpoint:
    
       # keystone service-create --name=nova --type=compute \
      --description="Nova Compute service" 
    
        +-------------+----------------------------------+
        |   Property  |              Value               |
        +-------------+----------------------------------+
        | description |       Nova Compute service       |
        |      id     | 4835b37554764710b7fbc669ced638c7 |
        |     name    |               nova               |
        |     type    |             compute              |
        +-------------+----------------------------------+
    
    
    

    Use the id property that is returned to create the endpoint.

       # keystone endpoint-create \
      --service-id=the_service_id_above \
      --publicurl=http://controller:8774/v2/%\(tenant_id\)s \
      --internalurl=http://controller:8774/v2/%\(tenant_id\)s \
      --adminurl=http://controller:8774/v2/%\(tenant_id\)s 
    
    
        +-------------+-----------------------------------------+
        |   Property  |                  Value                  |
        +-------------+-----------------------------------------+
        |   adminurl  | http://controller:8774/v2/%(tenant_id)s |
        |      id     |     d347a210e04945b0b17c7c7ec1f95aca    |
        | internalurl | http://controller:8774/v2/%(tenant_id)s |
        |  publicurl  | http://controller:8774/v2/%(tenant_id)s |
        |    region   |                regionOne                |
        |  service_id |     4835b37554764710b7fbc669ced638c7    |
        +-------------+-----------------------------------------+
    
    

    Restart Compute services:

        # service nova-api restart
        # service nova-cert restart
        # service nova-consoleauth restart
        # service nova-scheduler restart
        # service nova-conductor restart
        # service nova-novncproxy restart 
    

    To verify your configuration, list available images:

      # nova image-list 
        +--------------------------------------+--------------+--------+--------+
        | ID                                   | Name         | Status | Server |
        +--------------------------------------+--------------+--------+--------+
        | 2c0955c1-3811-451a-927e-4681bf26eca0 | CirrOS 0.3.1 | ACTIVE |        |
        +--------------------------------------+--------------+--------+--------+
    
    

    OpenStack Introduction for Ubuntu Part I

    OpenStack is an open source cloud-computing project for public and private clouds managed by the OpenStack Foundation; the main goal of the project is to provide an Infrastructure as a Service (IaaS) platform. This post series is an introduction to installing the platform on Ubuntu 12.04 (LTS). At this link a complete installation guide can be found.

    OpenStack Services

    • OpenStack Dashboard (http://www.openstack.org/software/openstack-dashboard/): Graphical interface to access, provision and automate cloud-based resources.
    • OpenStack Compute (http://www.openstack.org/software/openstack-compute/): Cloud operating system that enables enterprises and service providers to offer on-demand computing resources, by provisioning and managing large networks of virtual machines.
    • OpenStack Networking (http://www.openstack.org/software/openstack-networking/): Pluggable, scalable and API-driven system for managing networks and IP addresses.
    • OpenStack Storage (http://www.openstack.org/software/openstack-storage/): Object and Block storage for use with servers and applications.
    • Identity Service (http://www.openstack.org/software/openstack-shared-services/): Central directory of users mapped to the OpenStack services they can access.
    • Image Service (http://www.openstack.org/software/openstack-shared-services/): Provides discovery, registration and delivery services for disk and server images.
    • Telemetry Service (http://www.openstack.org/software/openstack-shared-services/): Aggregates usage and performance data across the services deployed in an OpenStack cloud.
    • Orchestration Service (http://www.openstack.org/software/openstack-shared-services/): Template-driven engine that allows application developers to describe and automate the deployment of infrastructure.

    Basic architecture with OpenStack Networking (Neutron)

    • The controller node runs the Identity Service, Image Service, dashboard, and management portions of Compute and Networking. It also contains the associated API services, MySQL databases, and messaging system.
    • The network node runs the Networking plug-in agent and several layer 3 agents that provision tenant networks and provide services to them, including routing, NAT, and DHCP. It also handles external (internet) connectivity for tenant virtual machines.
    • The compute node runs the hypervisor portion of Compute, which operates tenant virtual machines. By default, Compute uses KVM as the hypervisor. The compute node also runs the Networking plug-in agent, which operates tenant networks and implements security groups.

    Networking

    Configuring Internal network

    A conventional configuration requires two network interfaces on every machine: one for internal (management) communication and another for external/internet communication. For simplicity we will assume that these interfaces are eth0 and eth1. An example /etc/network/interfaces configuration file for the controller node could be:

    # Internal Network
        auto eth0
        iface eth0 inet static
        address 192.168.0.10
        netmask 255.255.255.0
    # External Network
        auto eth1
        iface eth1 inet static
        address 10.0.0.10
        netmask 255.255.255.0
    

    Use the hostname command to set the host name:

       # hostname controller 
    

    To configure this host name to be available when the system reboots, you must specify it in the /etc/hostname file, which contains a single line with the host name.

    For a simple example, edit the /etc/hosts file and add all the hostnames; real systems should use DNS or a configuration management system like Chef to maintain this host list:

        127.0.0.1       localhost
        192.168.0.10    controller
        192.168.0.11    compute1
    

    Network Time Protocol (NTP)

    To synchronize services across multiple machines, you must install NTP on every node. After installing it, edit /etc/ntp.conf on the other nodes and change the server directive to use the controller node as the time source.

       # apt-get install ntp 
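
    For example, /etc/ntp.conf on a compute or storage node could contain a line like the following (a sketch; the controller itself should keep pointing at public NTP servers):

        server controller iburst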
    

    Passwords

    Every service requires a password. You can generate random passwords with the openssl rand -hex 10 command. The complete list of passwords you need to define in this guide is:

    Passwords
    Password name Description
    Database password (no variable used) Root password for the database
    RABBIT_PASS Password of user guest of RabbitMQ
    KEYSTONE_DBPASS Database password of Identity service
    ADMIN_PASS Password of user admin
    GLANCE_DBPASS Database password for Image Service
    GLANCE_PASS Password of Image Service user glance
    NOVA_DBPASS Database password for Compute service
    NOVA_PASS Password of Compute service user nova
    DASH_DBPASS Database password for the dashboard
    CINDER_DBPASS Database password for the Block Storage Service
    CINDER_PASS Password of Block Storage Service user cinder
    NEUTRON_DBPASS Database password for the Networking service
    NEUTRON_PASS Password of Networking service user neutron
    HEAT_DBPASS Database password for the Orchestration service
    HEAT_PASS Password of Orchestration service user heat
    CEILOMETER_DBPASS Database password for the Telemetry service
    CEILOMETER_PASS Password of Telemetry service user ceilometer

    MySQL Install

    MySQL is required to store the OpenStack services' information.

    MySQL Controller Setup

       # apt-get install python-mysqldb mysql-server 
    

    MySQL Node Setup

       apt-get install python-mysqldb 
    

    Open Stack Packages

    The Ubuntu Cloud Archive is a special repository that allows you to install newer releases of OpenStack on the stable supported version of Ubuntu.

       # apt-get install python-software-properties
       # add-apt-repository cloud-archive:havana 
       # apt-get update && apt-get dist-upgrade
       # reboot
    

    Messaging server

    Install RabbitMQ and change its default password.

       # apt-get install rabbitmq-server 
       # rabbitmqctl change_password guest RABBIT_PASS
    

    Configure the Identity Service

    Identity Service concepts

    The Identity Service performs the following functions:

    • User management: Tracks users and their permissions.
    • Service catalog: Provides a catalog of available services with their API endpoints.

    Install the Identity Service

    Install the OpenStack Identity Service on the controller node:

       # apt-get install keystone 
    

    Edit /etc/keystone/keystone.conf and change the [sql] section.

        [sql]
        # The SQLAlchemy connection string used to connect to the database
        connection = mysql://keystone:KEYSTONE_DBPASS@controller/keystone
    

    Create the keystone database and a keystone database user:

      # mysql -u root -p
        mysql> CREATE DATABASE keystone;
        mysql> GRANT ALL PRIVILEGES ON keystone.* TO 'keystone'@'localhost' \
        IDENTIFIED BY 'KEYSTONE_DBPASS';
        mysql> GRANT ALL PRIVILEGES ON keystone.* TO 'keystone'@'%' \
        IDENTIFIED BY 'KEYSTONE_DBPASS'; 
    

    Create the database tables for the Identity Service:

        # keystone-manage db_sync
    

    Define an authorization token to use as a shared secret between the Identity Service and other OpenStack services. Use openssl to generate a random token and store it in the configuration file:

       # openssl rand -hex 10 
    

    Edit /etc/keystone/keystone.conf and change the [DEFAULT] section, replacing ADMIN_TOKEN with the results of the command

       [DEFAULT]
        # A "shared secret" between keystone and other openstack services
        admin_token = ADMIN_TOKEN 
    

    Restart the keystone service:

       # service keystone restart

    Define users, tenants, and roles

    We don't have any users yet for the Identity Service, but we can connect to it using the admin access token. In order to do that, we need to export the following environment variables:

       # export OS_SERVICE_TOKEN=ADMIN_TOKEN
       # export OS_SERVICE_ENDPOINT=http://controller:35357/v2.0 
    

    Create a tenant for an administrative user and a tenant for other OpenStack services to use.

        # keystone tenant-create --name=admin --description="Admin Tenant"
        # keystone tenant-create --name=service --description="Service Tenant"
    

    Create an administrative user called admin. Choose a password for the admin user and specify an email address for the account.

       # keystone user-create --name=admin --pass=ADMIN_PASS --email=admin@example.com 
    

    Create a role for administrative tasks called admin.

        # keystone role-create --name=admin
    

    Add roles to users. Users always log in with a tenant, and roles are assigned to users within tenants. Add the admin role to the admin user when logging in with the admin tenant.

       # keystone user-role-add --user=admin --tenant=admin --role=admin 
    

    Define services and API endpoints

    You must register each service in your OpenStack installation using the next commands:

    • keystone service-create. Describes the service.
    • keystone endpoint-create. Associates API endpoints with the service.

    You must also register the Identity Service itself. Use the OS_SERVICE_TOKEN environment variable, as set previously, for authentication.

    Create a service entry for the Identity Service
       # keystone service-create --name=keystone --type=identity --description="Keystone Identity Service"
    
        +-------------+----------------------------------+
        |   Property  |              Value               |
        +-------------+----------------------------------+
        | description |    Keystone Identity Service     |
        |      id     | 19072b6631ec4c0b9d3f93a8b89efd41 |
        |     name    |             keystone             |
        |     type    |             identity             |
        +-------------+----------------------------------+
    
    
    Specify an API endpoint for the Identity Service

    Use the returned service ID. When you specify an endpoint, you provide URLs for the public API, internal API, and admin API.

       # keystone endpoint-create \
      --service-id=19072b6631ec4c0b9d3f93a8b89efd41\
      --publicurl=http://controller:5000/v2.0 \
      --internalurl=http://controller:5000/v2.0 \
      --adminurl=http://controller:35357/v2.0
    
    
        +-------------+----------------------------------+
        |   Property  |              Value               |
        +-------------+----------------------------------+
        |   adminurl  |   http://controller:35357/v2.0   |
        |      id     | dcff14339da049fbb42031f4b1490250 |
        | internalurl |   http://controller:5000/v2.0    |
        |  publicurl  |   http://controller:5000/v2.0    |
        |    region   |            regionOne             |
        |  service_id | 19072b6631ec4c0b9d3f93a8b89efd41 |
        +-------------+----------------------------------+
    

    Note: As you add other services to your OpenStack installation, call these commands to register the services with the Identity Service

    Verify the Identity Service installation

    Unset the OS_SERVICE_TOKEN and OS_SERVICE_ENDPOINT environment variables

        $ unset OS_SERVICE_TOKEN OS_SERVICE_ENDPOINT 
    

    You can now use regular username-based authentication. Request an authentication token using the admin user and the password you chose during the earlier administrative user-creation step. You should receive a token in response, paired with your user ID. This verifies that keystone is running on the expected endpoint, and that your user account is established with the expected credentials.

       $ keystone --os-username=admin --os-password=ADMIN_PASS \
      --os-auth-url=http://controller:35357/v2.0 token-get 
    

    Verify that authorization is behaving as expected by requesting authorization on a tenant.

      $ keystone --os-username=admin --os-password=ADMIN_PASS \
      --os-tenant-name=admin --os-auth-url=http://controller:35357/v2.0 token-get 
    

    You can also set your --os-* variables in your environment to simplify command-line usage

        export OS_USERNAME=admin
        export OS_PASSWORD=ADMIN_PASS
        export OS_TENANT_NAME=admin
        export OS_AUTH_URL=http://controller:35357/v2.0 
    

    Verify that the variables have been set correctly by running the keystone command without the --os options

       $ keystone token-get 
    

    Verify that your admin account has authorization to perform administrative commands.

       $ keystone user-list 
    

    You can continue with the second part at this link

    Asynchronous Database code with Tornado and Postgresql

    Writing asynchronous code takes more than just using a framework like Tornado: every library we use to talk to databases, web clients, etc., must support the asynchronous style too. momoko is a wrapper for Psycopg2 that allows using PostgreSQL with Tornado asynchronously. Full documentation can be found at http://momoko.61924.nl/en/latest/

    Initializing Database connection

    We can use the momoko.Pool function to get an asynchronous PostgreSQL connection pool: http://momoko.61924.nl/en/latest/api.html#connections

    
        from daos import UserDAO
        from tornado import httpserver, ioloop, options, web, gen
        from tornado.escape import json_decode
    
        import json
        import psycopg2
        import momoko
    
    
    
        class Application(web.Application):
            def __init__(self):
                handlers = [
                    (r"/create-table/?", PostgresTableHandler),
                    (r"/delete-table/?", PostgresTableHandler),
                    (r"/user/?", PostgresUserHandler),
                    (r"/user/([0-9]+)", PostgresUserHandler)
                ]
                web.Application.__init__(self, handlers)
                dsn = 'dbname=ds_test user=db_test password=test ' \
                      'host=localhost port=5432'
                self.db = momoko.Pool(dsn=dsn, size=5)
    
    
        class PostgresHandler(web.RequestHandler):
            SUPPORTED_METHODS = ("GET", "POST", "DELETE", "PUT")
    
            @gen.coroutine
            def prepare(self):
                if self.request.headers.get("Content-Type") == "application/json":
                    try:
                        self.json_args = json_decode(self.request.body)
                    except Exception as error:
                        self.finish('invalid request')
            @property
            def db(self):
                return self.application.db
    
    
    

    A simple DAO class

    
        from tornado import gen
    
        import momoko
        import string
        import random
    
        class UserDAO(object):
            def __init__(self, db):
                self.db = db
    
            def _get_random_str(self, size=10):
                return ''.join(random.choice(string.ascii_uppercase + string.digits)
                               for x in range(size))
    
            @gen.coroutine
            def get(self, id):
                sql = """
                    SELECT id, username, email, password
                    FROM users_user
                    WHERE id=%s
                """
                cursor = yield momoko.Op(self.db.execute, sql, (id,))
                desc = cursor.description
                result = [dict(zip([col[0] for col in desc], row))
                                 for row in cursor.fetchall()]
    
                cursor.close()
                return result
    
            @gen.coroutine
            def get_list(self):
                sql = """
                    SELECT id, username, email, password
                    FROM users_user
                """
                cursor = yield momoko.Op(self.db.execute, sql)
                desc = cursor.description
                result = [dict(zip([col[0] for col in desc], row))
                                 for row in cursor.fetchall()]
    
                cursor.close()
                return result
    
            @gen.coroutine
            def create(self):
                sql = """
                    INSERT INTO users_user (username, email, password)
                    VALUES (%s, %s, %s  )
                """
                username = self._get_random_str()
                email = '{0}@{1}.com'.format(self._get_random_str(),
                                             self._get_random_str())
                password = self._get_random_str()
                cursor = yield momoko.Op(self.db.execute, sql, (username, email, password))
                return cursor
    
    
            @gen.coroutine
            def update(self, id, data={}):
                fields = ''
                for key in data.keys():
                    fields += '{0}=%s,'.format(key)
    
                sql = """
                    UPDATE users_user
                    SET {0}
                    WHERE id=%s
                """.format(fields[0:-1])
                params = list(data.values())
                params.append(id)
                cursor = yield momoko.Op(self.db.execute, sql, params)
                return cursor
    
    
            @gen.coroutine
            def delete_table(self):
                sql = """
                    DROP TABLE IF EXISTS users_user;
                    DROP SEQUENCE IF EXISTS user_id;
                """
                cursor = yield momoko.Op(self.db.execute, sql)
                return cursor
    
            @gen.coroutine
            def delete(self, id):
                sql = """
                    DELETE
                    FROM users_user
                    WHERE id=%s
                """
                cursor = yield momoko.Op(self.db.execute, sql, (id,))
                cursor.close()
                return ''
    
            @gen.coroutine
            def create_table(self, callback=None):
                sql = """
                    CREATE SEQUENCE  user_id;
                    CREATE TABLE IF NOT EXISTS users_user (
                        id integer PRIMARY KEY DEFAULT nextval('user_id') ,
                        username  varchar(80) UNIQUE,
                        email  varchar(80) UNIQUE,
                        password  varchar(80) 
                    );
                    ALTER SEQUENCE user_id OWNED BY users_user.id;
                """
                cursor = yield momoko.Op(self.db.execute, sql)
                return cursor
    
    

    Table RequestHandler

    This handler has two methods: one to create the table and another to delete it:

        class PostgresTableHandler(PostgresHandler):
    
            @gen.coroutine
            def post(self):
                dao = UserDAO(self.db)
                cursor = yield (dao.create_table())
                if not cursor.closed:
                    self.write('closing cursor')
                    cursor.close()
                self.finish()
    
            @gen.coroutine
            def delete(self):
                dao = UserDAO(self.db)
                cursor = yield (dao.delete_table())
                if not cursor.closed:
                    self.write('closing cursor')
                    cursor.close()
                self.finish()
    
    

    User RequestHandler

    Handles simple CRUD operations using 4 HTTP methods (GET, PUT, POST, DELETE):

        class PostgresUserHandler(PostgresHandler):
            @gen.coroutine
            def get(self, id=None):
                dao = UserDAO(self.db)
                if not id:
                    dict_result = yield (dao.get_list())
                else:
                    dict_result = yield (dao.get(id))
                self.write(json.dumps(dict_result))
                self.finish()
    
            @gen.coroutine
            def post(self):
                dao = UserDAO(self.db)
                cursor = yield (dao.create())
                if not cursor.closed:
                    self.write('closing cursor')
                    cursor.close()
                self.finish()
    
            @gen.coroutine
            def put(self, id=None):
                if not hasattr(self, 'json_args'):
                    self.write('invalid request')
                    self.finish()
                else:
                    dao = UserDAO(self.db)
                    if id:
                        result = yield (dao.update(id, data=self.json_args))
                        dict_result = yield (dao.get(id))
                        self.write(json.dumps(dict_result))
                    else:
                        self.write('invalid user')
                    self.finish()
    
            @gen.coroutine
            def delete(self, id=None):
                if id:
                    dao = UserDAO(self.db)
                    result = yield (dao.delete(id))
                    self.write('user deleted')
                else:
                    self.write('invalid user')
                self.finish()
    

    Using the Application

    Creating the table

    
        $ curl  -X POST  localhost:8004/create-table
    

    Creating random users

       $ curl  -X POST  localhost:8004/user 
    

    Get user list

    
       $ curl  -X GET  localhost:8004/user
        [{"id": 1, "email": "OUGI14GG3Q@BYQMP4RC1B.com", "password": "ND3ISHYVJA", "username": "W30HNJAYI4"}, {"id": 2, "email": "J2LQPR0Z94@VC0XP3DZV8.com", "password": "CCJVGTB9CF", "username": "3LRZDYYD52"}] 
    
    

    Get user by id

       $ curl  -X GET  localhost:8004/user/1
        [{"id": 1, "email": "OUGI14GG3Q@BYQMP4RC1B.com", "password": "ND3ISHYVJA", "username": "W30HNJAYI4"}]     
    
    

    Update user

       $ curl -X PUT -H "Content-Type: application/json"   localhost:8004/user/2 -d '{"username": "userupdated577"}'
        [{"id": 2, "email": "J2LQPR0Z94@VC0XP3DZV8.com", "password": "CCJVGTB9CF", "username": "userupdated577"}] 
    
    
    

    Delete user

       $ curl  -X DELETE  localhost:8004/user/1
        user deleted 
    
    

    Delete table

       $ curl  -X DELETE  localhost:8004/delete-table
    
    

    Example code can be found at this link

    Asynchronous Programming with Tornado Framework

    In order to understand asynchronous programming, it is important to understand how synchronous programming works and why it is a problem on high-traffic websites.

    A Simple Web Application Example

    Let's think about a web application built with a popular web framework (Django, Flask, Ruby on Rails, Java servlets, etc.). This application runs as a server waiting for requests, either directly or behind a web server like Apache or Nginx. Our example application has to do the following 4 tasks on every user request:

    1. Create user in database
    2. Send confirmation email to user.
    3. Share social profile on social networks using external API providers.
    4. Send response to the user

    Synchronous Approach

    Every time a client makes a request to a synchronous web application, a thread takes care of that request and starts performing the tasks sequentially. Let's suppose that connecting to the database, sending information, and waiting for its response in order to perform the first step takes 20 ms; during that time the thread cannot do any other task because it is waiting for the database response, so this thread is BLOCKED.

    After the database response the thread can continue its work, so now it is time to send the email; connecting to the SMTP server, sending the email, and getting the response can take 60 ms, during which the thread is blocked again. The same thing happens when sending information to the social APIs, but this time the thread has to wait 200 ms. In total the thread will be inactive and BLOCKED for 20 ms + 60 ms + 200 ms = 280 ms.

    The next image shows the workflow for this approach

    Code example

    The model

    A simple user class that emulates the delay:

        import time
    
    
        class User(object):
            def save(self):
                time.sleep(0.02)   # emulates the 20 ms database call

            def send_email(self):
                time.sleep(0.06)   # emulates the 60 ms SMTP call

            def social_api(self):
                time.sleep(0.2)    # emulates the 200 ms external API call
    
    
    The Server

    A simple HTTP server built with the standard Python library http://docs.python.org/3/library/http.server.html; any conventional Python web framework (Django, Flask, Bottle, Pylons, etc.) follows this philosophy:

        from models import User
    
    
        from datetime import datetime
    
        import http.server
        import socketserver
    
        class SyncServer(http.server.BaseHTTPRequestHandler):
            def do_GET(self, *args, **kwargs):
                user = User()
                user.save()
                user.send_email()
                user.social_api()
                self.send_response(200, message='ok')
                self.end_headers()
    
        def run():
            PORT = 8000
    
            Handler = SyncServer
    
            httpd = socketserver.TCPServer(("", PORT), Handler)
    
            print("serving at port", PORT)
            httpd.serve_forever()
    
    
        if __name__ == '__main__':
            run()
    
    

    Asynchronous Approach

    Every time a request is made, the thread does not execute the tasks itself; instead, it defers the execution of all the required tasks to another moment. It has been really hard for me to understand what this means, so I'll try to explain it in my own words, which may not fit the exact technical definition: the thread does not execute the tasks required to get the job done; instead, it makes a list of those tasks and passes control to another component, which actually performs them and reports back to the main thread when the work is finished, so the response can be sent to the client.

    The big difference in this approach is that the main thread is not blocked waiting for the database, SMTP, and API responses, so it can receive requests from other clients in the meantime. That can make a really big performance difference on heavily loaded web applications. It is not black magic: in both approaches some component has to wait 280 ms for the database access, the SMTP connection, and the social API calls. The difference is that in the synchronous approach the main thread is blocked waiting for the response of every component, while in the asynchronous approach the main thread transfers the responsibility for those tasks to another component and can continue processing client requests in the meantime. When a task finishes, the main thread is notified so it can finish the request and deliver the response to the client.

    The biggest challenge when we want to write asynchronous code is to change our mental model and understand that the code is not executed by the main thread; instead, the main thread transfers the control flow to another component that will execute the code in the future and report the result to a callback function that must be defined. After the main thread delegates the responsibility for this execution, it is free to continue its normal work and does not wait for the response.
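
    To make the delegation idea concrete, below is a minimal, hypothetical sketch (not part of the Tornado application discussed later) where a slow task is handed to a worker thread together with a callback; the main thread keeps working while the task runs and the callback is invoked when the result is ready:

        import threading
        import time


        def slow_task():
            # emulates a blocking operation (database, SMTP, external API)
            time.sleep(0.28)
            return 'task result'


        def run_async(task, callback):
            # delegate the task to another component (a worker thread) and
            # invoke the callback when it finishes
            def worker():
                callback(task())
            threading.Thread(target=worker).start()


        def on_done(result):
            print('callback received:', result)


        run_async(slow_task, on_done)
        # the main thread is not blocked and can keep doing other work
        print('main thread keeps working while the task runs')
        time.sleep(0.5)  # keep the process alive until the callback fires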

    The next image shows the workflow for this approach

    Futures python module

    The concurrent.futures module provides a high-level interface for asynchronously executing callables. This module was introduced in Python 3.2 but is available for older releases via pip; the official documentation for this module is at this link: http://docs.python.org/3.4/library/concurrent.futures.html
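
    As a quick illustration of the module, independent of Tornado, here is a minimal sketch that submits three hypothetical tasks (with the same 20 ms, 60 ms, and 200 ms delays used in this post) to a thread pool and collects the results as they complete:

        import time
        from concurrent.futures import ThreadPoolExecutor, as_completed


        def save_user():
            time.sleep(0.02)     # emulated database call
            return 'user saved'


        def send_email():
            time.sleep(0.06)     # emulated SMTP call
            return 'email sent'


        def social_api():
            time.sleep(0.2)      # emulated external API call
            return 'profile shared'


        # each submit() returns a Future; the main thread is free until
        # it decides to collect the results
        with ThreadPoolExecutor(max_workers=3) as executor:
            futures = [executor.submit(task)
                       for task in (save_user, send_email, social_api)]
            for future in as_completed(futures):
                print(future.result())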

    Code Example using Tornado Framework

    The Model

    We use Tornado's concurrent futures feature http://www.tornadoweb.org/en/stable/concurrent.html to make our code asynchronous:

        import time
        import datetime
        from tornado.concurrent import return_future
    
        class AsyncUser(object):
            @return_future
            def save(self, callback=None):
                time.sleep(0.02)
                result = datetime.datetime.utcnow()
                callback(result)
    
            @return_future
            def send_email(self, callback=None):
                time.sleep(0.06)
                result = datetime.datetime.utcnow()
                callback(result)
    
            @return_future
            def social_api(self, callback=None):
                time.sleep(0.2)
                result = datetime.datetime.utcnow()
                callback(result)
    
    

    The @return_future decorator makes a function that returns via callback return a Future instead. The wrapped function should take a callback keyword argument and invoke it with one argument when it has finished. To signal failure, the function can simply raise an exception (which will be captured by the StackContext and passed along to the Future).
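
    As a small hypothetical sketch of the failure case (FragileService is made up, not part of the model above), an exception raised inside a @return_future method can be caught around the yield inside a coroutine:

        from tornado import gen
        from tornado.concurrent import return_future
        from tornado.ioloop import IOLoop


        class FragileService(object):
            @return_future
            def might_fail(self, callback=None):
                # instead of invoking the callback, signal failure by raising
                raise ValueError('something went wrong')


        @gen.coroutine
        def caller():
            try:
                yield FragileService().might_fail()
            except ValueError as error:
                print('caught:', error)


        IOLoop.instance().run_sync(caller)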

    The Server

    tornado.gen is a generator-based interface to make it easier to work in an asynchronous environment. The documentation can be found at this link: http://www.tornadoweb.org/en/stable/gen.html

    
        import tornado.httpserver
        import tornado.ioloop
        import tornado.options
        import tornado.web
    
        from tornado import gen
    
        from models import AsyncUser
    
        class Application(tornado.web.Application):
            def __init__(self):
                handlers = [
                    (r"/", UserHandler),
                ]
                tornado.web.Application.__init__(self, handlers)
    
    
        class UserHandler(tornado.web.RequestHandler):
    
            @gen.coroutine
            def get(self):
                user = AsyncUser()
                response = yield (user.save())
    
                response2 = yield (user.send_email())
                response3 = yield (user.social_api())
                self.finish()
    
        def main():
            http_server = tornado.httpserver.HTTPServer(Application())
            PORT = 8001
            print("serving at port", PORT)
            http_server.listen(PORT)
            tornado.ioloop.IOLoop.instance().start()
    
    
        if __name__ == "__main__":
            main()
    
    
    
    

    Apache Benchmark comparison

    Synchronous Server

        $ ab -c 12 -n 120  127.0.0.1:8000/
    
        Server Software:        BaseHTTP/0.6
        Server Hostname:        127.0.0.1
        Server Port:            8000
    
        Document Path:          /
        Document Length:        0 bytes
    
        Concurrency Level:      12
        Time taken for tests:   62.697 seconds
        Complete requests:      120
        Failed requests:        0
        Write errors:           0
        Total transferred:      10920 bytes
        HTML transferred:       0 bytes
        Requests per second:    1.91 [#/sec] (mean)
        Time per request:       6269.746 [ms] (mean)
        Time per request:       522.479 [ms] (mean, across all concurrent requests)
        Transfer rate:          0.17 [Kbytes/sec] received
    
        Connection Times (ms)
                      min  mean[+/-sd] median   max
        Connect:        0   59 396.0      0    3006
        Processing:   281 3890 9879.3   1971   61132
        Waiting:      281 3890 9879.3   1970   61131
        Total:        282 3949 10215.0   1971   62697
    
    

    Asynchronous Server

        $ ab -c 12 -n 120 127.0.0.1:8001/
    
        Server Software:        TornadoServer/3.1.1
        Server Hostname:        127.0.0.1
        Server Port:            8001
    
        Document Path:          /
        Document Length:        0 bytes
    
        Concurrency Level:      12
        Time taken for tests:   33.895 seconds
        Complete requests:      120
        Failed requests:        0
        Write errors:           0
        Total transferred:      23280 bytes
        HTML transferred:       0 bytes
        Requests per second:    3.54 [#/sec] (mean)
        Time per request:       3389.531 [ms] (mean)
        Time per request:       282.461 [ms] (mean, across all concurrent requests)
        Transfer rate:          0.67 [Kbytes/sec] received
    
        Connection Times (ms)
                      min  mean[+/-sd] median   max
        Connect:        0    0   0.1      0       0
        Processing:  3185 3371  58.6   3389    3392
        Waiting:     3185 3371  58.6   3389    3392
        Total:       3185 3371  58.6   3389    3392
    
        Percentage of the requests served within a certain time (ms)
          50%   3389
          66%   3390
          75%   3390
          80%   3390
          90%   3391
          95%   3391
          98%   3391
          99%   3391
    
    
    

    Results comparison

    +-----------------------+----------------+----------------+
    | Item                  | Synchronous    | Asynchronous   |
    +-----------------------+----------------+----------------+
    | Concurrency Level     | 12             | 12             |
    | Complete requests     | 120            | 120            |
    | Time taken for tests  | 62.697 seconds | 33.895 seconds |
    | Requests per second   | 1.91           | 3.54           |
    +-----------------------+----------------+----------------+

    Resources

    Arduino Introduction

    Arduino is an open source hardware platform for prototyping that makes it easy for a non electronic engineer to build interactive objects. This post is an introduction; most of the information has been taken from the Arduino website. Some of the images have been created with Fritzing.

    IDE Installation

    The Arduino IDE is multiplatform; check this link to find the instructions for your platform: http://arduino.cc/en/Main/Software

    Blinking an LED

    The most basic example to test the Arduino is blinking an LED; a complete tutorial about this can be found at this link

    The circuit

    Attach a 220-ohm resistor to pin 13. Then attach the long leg of an LED (the positive leg, called the anode) to the resistor. Attach the short leg (the negative leg, called the cathode) to ground.

    The code

    Open the Arduino IDE and paste the following code

    const int LED = 13; // LED connected to
                        // digital pin 13
    
    void setup()
    {
      pinMode(LED, OUTPUT);    // sets the digital
                               // pin as output
    }
    
    
    /**1. Turns on the led
       2. Wait one second
       3. Turns off the led
       4. Wait one second
    **/
    void loop()
    {
      digitalWrite(LED, HIGH); 
      delay(1000);               
      digitalWrite(LED, LOW);  
      delay(1000);              
    }
    

    Compile and upload the code to the Arduino board.

    Saying hello world in morse code

    Morse code has been used for years to communicate visually over great distances. Let's improve the blink sketch to say "hello world", which in Morse code is:

    • H: ....
    • E: .
    • L: .-..
    • O: ---
    • W: .--
    • R: .-.
    • D: -..
    .... . .-.. .-.. ---  .-- --- .-. .-.. -..

    For every dash the LED will stay on for 1.6 seconds and for every dot 0.8 seconds; letters are separated by a 0.5-second pause and words by a 1.5-second pause.

    The circuit

    It is the same circuit as the previous one, but this time it uses a breadboard:

    The code

        const int LED = 13; // LED connected to
                            // digital pin 13
    
    
        void setup()
        {
          pinMode(LED, OUTPUT);    // sets the digital
                                   // pin as output
        }
    
        void point(){
          digitalWrite(LED, HIGH); 
          delay(800);               
          digitalWrite(LED, LOW);                  
          delay(200);
        }
    
        void dash(){
          digitalWrite(LED, HIGH); 
          delay(1600);               
          digitalWrite(LED, LOW);                
          delay(200);
        }
    
    
    
        void space(){
          delay(500);               
        }
    
        /**print "hello world" in morse code
        **/
        void sayHello(){
          point();
          point();
          point();
          point();
          space();
          point();
          space();
          point();
          dash();
          point();
          point();
          space();
          point();
          dash();
          point();
          point();
          space();
          dash();
          dash();
          dash();
          space();
          space();
          space();
          point();
          dash();
          dash();
          space();
          dash();
          dash();
          dash();
          space();
          point();
          dash();
          point();
          space();
          point();
          dash();
          point();
          point();
          space();
          dash();
          point();
          point();
          space();
          space();
          space();
            
    
        }
    
        void loop()
        {
          sayHello();
        }
        
    

    Putting a Switch in the Circuit

    The source code of these examples is here

    Installing Archlinux

    Modern Linux distributions are really easy to install; almost every flavor has a guided process that helps a user set up a Linux box with a series of steps that deploy the system almost automatically. That is really helpful, but you don't have real control over what is actually happening. Archlinux is a Linux distribution with a different philosophy, where the user has full control of the system. This post is a simple summary of my personal experience installing an Archlinux box following the official documentation.

    Making a bootable usb

        sudo dd bs=4M if=/home/manuel/Downloads/archlinux-2013.10.01-dual.iso of=/dev/sdx && sync
    

    Loading keyboard layout

        loadkeys us
        loadkeys es
    

    Setting up the hard drive

    Partitioning hard drive

    The complete instructions to make the partitions are here

        check hard disk size
        fdisk -l | grep Disk
    
        /boot 300MB
        SWAP 2GB
        /var 22GB
        / 24 GB
        /home 280GB
    
        gdisk /dev/sda
    
        ? for help
    
        o to create a new empty GUID partition table
        n to add new partition
        i show info
        w write disk partition table
    
        lsblk to check all partitions
    
        https://wiki.archlinux.org/index.php/File_Systems#Step_2:_create_the_new_file_system
    
        mkfs -t ext2 /dev/sda1
        mkfs -t ext4 /dev/sda2
        mkfs -t ext4 /dev/sda3
        mkfs -t ext4 /dev/sda4
        mkswap /dev/sda5
    

    Encrypting hard drive

    https://wiki.archlinux.org/index.php/Disk_Encryption https://wiki.archlinux.org/index.php/Dm-crypt_with_LUKS

    Secure wipe of the hard disk drive

    https://wiki.archlinux.org/index.php/Securely_wipe_disk

    dd if=/dev/zero of=/dev/sdX iflag=nocache oflag=direct bs=4096
    
    Encrypting and mapping a system partition

    https://wiki.archlinux.org/index.php/Dm-crypt_with_LUKS#Encrypting_a_system_partition

    cryptsetup --cipher aes-xts-plain64 --key-size 512 --hash sha512 --iter-time 5000 --use-random --verify-passphrase luksFormat /dev/sda2
    cryptsetup --cipher aes-xts-plain64 --key-size 512 --hash sha512 --iter-time 5000 --use-random --verify-passphrase luksFormat /dev/sda3
    cryptsetup --cipher aes-xts-plain64 --key-size 512 --hash sha512 --iter-time 5000 --use-random --verify-passphrase luksFormat /dev/sda4
    
    Open encrypted partitions:
    cryptsetup open --type luks /dev/sda2 root
    cryptsetup open --type luks /dev/sda3 var
    cryptsetup open --type luks /dev/sda4 home
    
    Format partitions:
    mkfs -t ext4 /dev/mapper/root
    mkfs -t ext4 /dev/mapper/var
    mkfs -t ext4 /dev/mapper/home
    
    Mounting Partitions:
    mount -t ext4 /dev/mapper/root /mnt
    mkdir -p /mnt/var /mnt/home
    mount -t ext4 /dev/mapper/var /mnt/var
    mount -t ext4 /dev/mapper/home /mnt/home
    

    Note: /boot partition can not be encrypted

    Encrypting the Swap partition

    Edit /etc/crypttab

    swap         /dev/hdx4        /dev/urandom            swap,cipher=aes-cbc-essiv:sha256,size=256
    
    Mounting /boot Partition:
    mkdir /mnt/boot
    mount -t ext2 /dev/sda1 /mnt/boot
    

    Connect provisionally to the internet

    wifi-menu
    

    Install the base system

    Edit /etc/pacman.d/mirrorlist and configure your mirrors, then run:

    pacstrap /mnt base
    

    Generate an fstab:

    genfstab -p /mnt >> /mnt/etc/fstab

    Chroot into the new system

    arch-chroot /mnt

    Timezone configuration

    Symlink /etc/localtime to /usr/share/zoneinfo/Zone/SubZone. Replace Zone and Subzone to your liking.

    ln -s /usr/share/zoneinfo/Europe/Dublin /etc/localtime

    Uncomment the selected locale in /etc/locale.gen and set the locale preferences in /etc/locale.conf https://wiki.archlinux.org/index.php/Locale#Setting_system-wide_locale

    Edit /etc/locale.conf:

    LANG=en_IE.UTF-8
    LANGUAGE="en_IE:es_CO:en"
    LC_COLLATE="C"

    The last step is to update the system locales by running locale-gen

    Configure initramfs

    Edit /etc/mkinitcpio.conf and add the encrypt hook to the HOOKS line: HOOKS="base udev ... encrypt ... filesystems ...". Then execute mkinitcpio -p linux

    Set a root password with passwd

    Setting wireless connection

    Complete information at this link

    Check the driver status with lspci -k. Also check the output of the ip link command to see if a wireless interface is present (e.g. wlan0, wlp2s1, ath0), then bring it up with ip link set wlan0 up

    Wireless setup

    Install iw tool
    pacman -S iw
    
    Get wireless device name
       $ iw dev
        phy#0
        Interface wlp5z1
    		ifindex 4
    		wdev 0x1
    		addr 6c:88:14:6e:d9:90
    		type managed
    		channel 6 (2437 MHz), width: 20 MHz, center1: 2437 MHz 
    
    Check device status
    $ iw dev  wlp5z1 link
    Not connected.
    
    Check what access points are available:
        iw dev wlp5z1 scan | less
    
    Install wpa_supplicant:
    pacman -S wpa_supplicant
    
    Connection to wpa/wpa2 network

    Edit /etc/wpa_supplicant.conf according to the options obtained in the scan process

    Detailed info can be viewed at https://wiki.archlinux.org/index.php/Wireless_Setup

    Install and configure Syslinux boot loader

        pacman -S syslinux
        pacman -S gptfdisk
        syslinux-install_update -i -a -m
    
        /boot/syslinux/syslinux.cfg
    
         PROMPT 1
         TIMEOUT 50
         DEFAULT arch
         
         LABEL arch
                 LINUX ../vmlinuz-linux
                 APPEND root=/dev/sda2 rw
                 INITRD ../initramfs-linux.img
         
         LABEL archfallback
                 LINUX ../vmlinuz-linux
                 APPEND root=/dev/sda2 rw
                 INITRD ../initramfs-linux-fallback.img
    
    

    Install video driver

    Check the video chipset with lspci | grep VGA and install the proper driver; in this example, the Intel driver: pacman -S xf86-video-intel

    Install network manager

    https://wiki.archlinux.org/index.php/NetworkManager

    pacman -S networkmanager

    Install Gnome and graphical login screen

    https://wiki.archlinux.org/index.php/GNOME

    pacman -S gnome gnome-extra network-manager-applet
    pacman -S gstreamer
    pacman -S --force gst-libav
    pacman -S vlc
    pacman -S flashplugin
    pacman -S ttf-dejavu
    

    Basic development Libraries and utils

    pacman -S linux-headers
    pacman -S gcc make automake
    

    Adding a user and setting up sudo

    useradd -m -g users -s /bin/bash manuel
    pacman -S sudo
    

    Edit the configuration file using visudo and add the relevant user information:

    manuel ALL=(ALL) ALL

    Introduction to Twisted Framework

    Twisted is an event-driven networking engine written in Python. A lot of information covering many different topics can be viewed at the project's website. This post shows a basic example based on the official documentation.

    Time UUID server example

    Let's suppose we have a distributed system with more than one application server that needs to insert records into a database cluster shared by the entire system. The primary key of some tables in this database is of the Universally Unique Identifier (UUID) type; these keys have to be unique, so we decided to develop a service that returns a time-based UUID that is sequentially different for every request, as shown in the next figure:
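
    As a quick illustration (using only the Python standard library rather than the time_uuid package used below), time-based UUIDs embed a timestamp, so a value generated later in the same process carries a greater time component:

        import uuid

        first = uuid.uuid1()
        second = uuid.uuid1()
        print(first)
        print(second)
        # uuid1 values embed a 60-bit timestamp, so later UUIDs compare
        # as newer through their .time attribute
        print(first.time < second.time)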

    Writing The Server

    The Protocol

    The main goal is to implement network protocol parsing and handling for TCP servers. This protocol will be a subclass of twisted.internet.protocol.Protocol. An instance of the protocol class is instantiated per connection, on demand, and will go away when the connection is finished. This means that persistent configuration is not saved in the Protocol.

        from twisted.internet import reactor
        from twisted.internet.protocol import Factory
        from twisted.protocols.basic import LineReceiver
    
        import datetime, time_uuid
    
        class TimeUUId(LineReceiver):
            def __init__(self):
                self.options = {'C': self.handle_GETUUID, 'X': self.handle_CLOSE}
    
            def connectionMade(self):
                message = """
                          Welcome to Time UUID generator :-)\r\n
                          press C to get a time UUID
                          press X to exit
                          """
                self.transport.write(message)
    
            def lineReceived(self, line):
                if line.upper() in self.options:
                    self.options[line.upper()]()
                else:
                    self.transport.write('Invalid option :-( \n')
    
            def handle_CLOSE(self):
                self.transport.write('good bye :-) \n')
                self.transport.loseConnection()
    
            def handle_GETUUID(self):
                uuid =time_uuid.TimeUUID.with_utc(datetime.datetime.utcnow())
                self.transport.write(str(uuid))
    
    

    The Factory

    A design goal must be the possibility of exposing the service on multiple ports or network addresses. That's the reason why the configuration will be defined using a Factory class that is a subclass of twisted.internet.protocol.Factory. The buildProtocol method of the Factory is used to create a Protocol for each new connection.

        class TimeUUIdFactory(Factory):
    
            protocol = TimeUUId
    
            def __init__(self):
                pass
    
            def buildProtocol(self, addr):
                return TimeUUId()
    

    Starting the Server

        reactor.listenTCP(1320, TimeUUIdFactory())
        reactor.run()
    

    Testing the Server

        $ telnet 127.0.0.1 1320
        Trying 127.0.0.1...
        Connected to 127.0.0.1.
        Escape character is '^]'.
    
                          Welcome to Time UUID generator :-)
    
                          press C to get a time UUID
                          press X to exit
                          c
        ecaa6a3a-35ea-11e3-918a-002185c69ff8
        c
        ee078390-35ea-11e3-9cf9-002185c69ff8
        f
        Invalid option :-( 
        x
        good bye :-) 
        Connection closed by foreign host.
    

    Writing the client

    The protocol

    The class that implements the protocol parsing and handling is the protocol class, which is a subclass of twisted.internet.protocol.Protocol

        from twisted.internet import reactor
        from twisted.internet.protocol import ClientFactory
        from twisted.protocols.basic import LineReceiver
    
        class TimeUUIdClient(LineReceiver):
    
            welcome = 'Welcome to Time UUID generator :-)'
    
            def send_option(self, option):
                self.sendLine(option.replace(' ', ''))
    
            def ask_option(self):
                print 'press C to get a time UUID or X to exit\n'
                option = raw_input()
                if option.upper() == 'C':
                    self.send_option(option.upper())
                elif option.upper() == 'X':
                    print 'good bye :-)\n'
                    self.transport.loseConnection()
                else:
                    print 'invalid option good bye :-(\n'
                    self.transport.loseConnection()
    
        def connectionMade(self):
            # nothing to do on connection; the server's welcome message
            # is handled in dataReceived
            pass
    
            def dataReceived(self, line):
                if self.welcome in  line :
                    self.ask_option()
                else:
                    print line
                    self.ask_option()
    

    The Factory

        class TimeUUIdClientFactory(ClientFactory):
            def startedConnecting(self, connector):
                print 'Started to connect.'
    
            def buildProtocol(self, addr):
                print 'Connected.'
                return TimeUUIdClient()
    
            def clientConnectionLost(self, connector, reason):
                reactor.stop()
    
            def clientConnectionFailed(self, connector, reason):
                reactor.stop()
    

    Running the client

        # create factory protocol and application
        factory = TimeUUIdClientFactory()
    
        # connect factory to this host and port
        reactor.connectTCP('localhost', 1320, factory)
    
        # run client
        reactor.run()
    

    Source code can be viewed at this link.

    Command Pattern, Java and Python Example

    From Wikipedia In object-oriented programming, the command pattern is a behavioral design pattern in which an object is used to represent and encapsulate all the information needed to call a method at a later time. This information includes the method name, the object that owns the method and values for the method parameters. This post shows an example implemented on Java and Python.

    Multilevel Undo Example

    Let's suppose we have an editor where a user can copy, cut, and paste text on a screen. If all user actions in this program are implemented as command objects, the program can keep a stack of the most recently executed commands. When the user wants to undo a command, the program simply pops the most recent command object and executes its undo() method.

    Java Implementation

    The Receiver

    The receiver is the object that will be manipulated by a command; in this particular example the receiver is a Screen object.

        package co.ntweb;
    
        public class Screen {
            private StringBuffer text;
            private StringBuffer clipBoard;
    
            public Screen(String text){
                this.text = new StringBuffer(text);
                this.clipBoard = new StringBuffer();
            }
    
            public void clearClipBoard(){
                this.clipBoard.delete(0, this.clipBoard.length());
            }
    
            public void cut(int start, int end){
                clearClipBoard();
                this.clipBoard.append(this.text.substring(start, end));
                this.text.delete(start, end);
            }
    
            public void copy(int start, int end){
                clearClipBoard();
                this.clipBoard.append(this.text.substring(start, end));
            }
    
            public void paste(int offset){
                this.text.insert(offset, getClipBoard());
            }
    
            public String getClipBoard(){
                return this.clipBoard.toString();
            }
    
            public String toString(){
                return this.text.toString();
            }
    
            public void setText(String text){
                this.text = new StringBuffer(text);
            }
    
            public int length(){
                return this.text.length();
            }
        }
    
    Command Objects

    A command object implements all the logic that the command needs; this includes the undo instructions and the screen state before the command is executed.

    
        package co.ntweb;
    
        public interface Command {
            void execute();
            void undo();
        }
    
    
        package co.ntweb;
    
        import co.ntweb.Command;
        import co.ntweb.Screen;
    
        public class CopyCommand implements Command{
            private Screen screen;
            private String previous_status;
            private int start;
            private int end;
    
            public CopyCommand(Screen screen, int start, int end){
                this.screen = screen;
                this.previous_status = screen.toString();
                this.start = start;
                this.end = end;
            }
    
            public void execute(){
                this.screen.copy(this.start, this.end);
            }
    
            public void undo(){
                this.screen.clearClipBoard();
            }
        }
    
    
    
        package co.ntweb;
    
        import co.ntweb.Command;
        import co.ntweb.Screen;
    
        public class CutCommand implements Command{
            private Screen screen;
            private String previousStatus;
            private int start;
            private int end;
    
            public CutCommand(Screen screen, int start, int end){
                this.screen = screen;
                this.previousStatus = screen.toString();
                this.start = start;
                this.end = end;
            }
    
            public void execute(){
                this.screen.cut(this.start, this.end);
            }
    
            public void undo(){
                this.screen.clearClipBoard();
                this.screen.setText(this.previousStatus);
            }
        }
    
    
    
        package co.ntweb;
    
        import co.ntweb.Command;
        import co.ntweb.Screen;
    
        public class PasteCommand implements Command{
            private Screen screen;
            private String previousStatus;
            private int offset;
    
            public PasteCommand(Screen screen, int offset){
                this.screen = screen;
                this.previousStatus = screen.toString();
                this.offset = offset;
            }
    
            public void execute(){
                this.screen.paste(this.offset);
            }
    
            public void undo(){
                this.screen.clearClipBoard();
                this.screen.setText(this.previousStatus);
            }
        }
    
    
    
    Invoker Object

    The invoker is the button, toolbar button, or menu item clicked, or the shortcut key pressed by the user.

        package co.ntweb;
    
        import co.ntweb.Command;
    
        import java.util.LinkedList;
    
        public class ScreenInvoker{
            private LinkedList<Command> history = new LinkedList<Command>();
    
            public ScreenInvoker(){
            }
    
            public void storeAndExecute(Command cmd) {
              this.history.add(cmd);
              cmd.execute();
            }
    
            public void undoLast(){
                this.history.removeLast().undo();
            }
    
        }
    
    Pattern usage
            Screen editor = new Screen("hello world!!");
            ScreenInvoker client = new ScreenInvoker();
    
            CutCommand cutCommand = new CutCommand(editor, 5, 11);
            client.storeAndExecute(cutCommand);
            assertEquals("hello!!", editor.toString());
    
            PasteCommand pasteCommand = new PasteCommand(editor, 0);
            client.storeAndExecute(pasteCommand);
            assertEquals(editor.toString(), " worldhello!!");
    
            CopyCommand copyCommand = new CopyCommand(editor, 0, editor.length());
            client.storeAndExecute(copyCommand);
    
            PasteCommand pasteCommand2 = new PasteCommand(editor, 0);
            client.storeAndExecute(pasteCommand2);
            assertEquals(editor.toString(), " worldhello!! worldhello!!");
    
            //undo last paste
            client.undoLast();
            assertEquals(editor.toString(), " worldhello!!");
    
            //undo copy
            client.undoLast();
            assertEquals(editor.toString(), " worldhello!!");
    
            //undo paste
            client.undoLast();
            assertEquals("hello!!", editor.toString());
    
            //undo cut
            client.undoLast();
            assertEquals("hello world!!", editor.toString());
    

    Python Implementation

    The Receiver
        class Screen(object):
            def __init__(self, text=''):
                self._text = text
                self._clip_board = ''
    
            @property
            def text(self):
                return self._text
    
            @property
            def clipboard(self):
                return self._clip_board
    
            def copy(self, start=0, end=0):
                self._clip_board = self._text[start:end]
    
            def cut(self, start=0, end=0):
                self._clip_board = self._text[start:end]
                self._text = self._text[:start] + self._text[end:]
    
            def paste(self, offset=0):
                self._text = self._text[:offset] + self._clip_board + self._text[offset:]
    
            def clear_clipboard(self):
                self._clip_board = ''
    
            def length(self):
                return len(self._text)
    
            def __str__(self):
                return self._text
    
    The Commands
        class Command(object):
            _previous_status = ''
    
            def execute(self):
                pass
    
            def undo(self):
                pass
    
    
        class CopyCommand(Command):
            def __init__(self, screen, start=0, end=0):
                self._screen = screen
                self._start = start
                self._end = end
                self._previous_status = screen.text
    
            def execute(self):
                self._screen.copy(start=self._start, end=self._end)
    
            def undo(self):
                self._screen.clear_clipboard()
    
        class CutCommand(Command):
            def __init__(self, screen, start=0, end=0):
                self._screen = screen
                self._start = start
                self._end = end
                self._previous_status = screen.text
    
            def execute(self):
                self._screen.cut(start=self._start, end=self._end)
    
            def undo(self):
                self._screen.clear_clipboard()
                self._screen._text = self._previous_status
    
        class PasteCommand(Command):
            def __init__(self, screen, offset=0):
                self._screen = screen
                self._offset = offset
                self._previous_status = screen.text
    
            def execute(self):
                self._screen.paste(offset=self._offset)
    
            def undo(self):
                self._screen.clear_clipboard()
                self._screen._text = self._previous_status
    
    The Invoker
        class ScreenInvoker(object):
            def __init__(self):
                self._history = []
    
            def store_and_execute(self, command):
                command.execute()
                self._history.append(command)
    
            def undo_last(self):
                if self._history:
                    self._history.pop().undo()
    
    Pattern Usage
        screen = Screen('hello world!!')
        client = ScreenInvoker()
    
        cut_command = CutCommand(screen, start=5, end=11)
        client.store_and_execute(cut_command)
    
        paste_command = PasteCommand(screen, offset=0)
        client.store_and_execute(paste_command)
    
        copy_command = CopyCommand(screen, start=0, end=screen.length())
        client.store_and_execute(copy_command)
    
        paste_command2 = PasteCommand(screen, offset=0)
        client.store_and_execute(paste_command2)
    
        #undo last paste
        client.undo_last()
    
        #undo copy
        client.undo_last()
        self.assertEquals(' worldhello!!', screen.text)
    
        #undo paste
        client.undo_last()
        self.assertEquals('hello!!', screen.text)
    
        #undo cut
        client.undo_last()
        self.assertEquals('hello world!!', screen.text)
    

    Source code can be checked here

    Observer Pattern, Java and Python Example

    From Wikipedia: The observer pattern is a software design pattern in which an object, called the subject, maintains a list of its dependents, called observers, and notifies them automatically of any state changes, usually by calling one of their methods. It is mainly used to implement distributed event handling systems. This post shows an example implemented on Java and Python.

    Notification System Example

    Let's say we have a social network; every time a user makes a change-password request, an email and an SMS message should be sent to the user.

    Java Implementation

    The Subject:

        package co.ntweb.maigfrga.observer;
    
        import java.util.Observable;
    
        /* Subject implementation Observer pattern*/
    
        public class NotificationSubject extends Observable
        {
            public void sendMessage(String message){
                System.out.println( "Change status and notify observers" );
                setChanged();
                notifyObservers(message);
            }
    
        }
    

    Email observer:

        package co.ntweb.maigfrga.observer;
    
        import java.util.Observable;
        import java.util.Observer;
    
        public class EmailObserver implements Observer {
            private String message;
    
            public void update(Observable obj, Object arg) {
                if (arg instanceof String) {
                    message = (String) arg;
                    System.out.println("\nSend message by  Email: " + message );
                }
            }
        }
    

    SMS Observer

        package co.ntweb.maigfrga.observer;
    
        import java.util.Observable;
        import java.util.Observer;
    
        public class SMSObserver implements Observer {
            private String message;
    
            public void update(Observable obj, Object arg) {
                if (arg instanceof String) {
                    message = (String) arg;
                    System.out.println("\nSend message by  SMS: " + message );
                }
            }
        }
    

    Registering the observers

        NotificationSubject subject = new NotificationSubject();
        EmailObserver email_observer = new EmailObserver();
        SMSObserver sms_observer = new SMSObserver();
    
        subject.addObserver(email_observer);
        subject.addObserver(sms_observer);
        subject.sendMessage("Hello world :-)");
    

    Python Implementation

    The Subject:

        class Subject(object):
            """Observer pattern http://en.wikipedia.org/wiki/Observer_pattern
            """
            def __init__(self, *args, **kwargs):
                pass

            def register(self, observer):
                pass

            def unregister(self, observer):
                pass

            def notify_all(self, *args, **kwargs):
                pass


        class NotificationSubject(Subject):
            def __init__(self, *args, **kwargs):
                self._observers = []

            def register(self, observer):
                self._observers.append(observer)

            def unregister(self, observer):
                self._observers.remove(observer)

            def notify_all(self, message):
                for observer in self._observers:
                    observer.notify(message)

            def send_message(self, message):
                print 'notification subject'
                self.notify_all(message)
    

    The Observers

        class Observer(object):
            def __init__(self, *args, **kwargs):
                pass
    
            def notify(self, *args, **kwargs):
                pass
    
    
        class EmailObserver(Observer):
            def notify(self, message):
                print '\n Send Email message'
    
        class SMSObserver(Observer):
            def notify(self, message):
                print '\n Send SMS message'
    

    Registering the observers

        subject = NotificationSubject()
        email_observer = EmailObserver()
        subject.register(email_observer)
        sms_observer = SMSObserver()
        subject.register(sms_observer)
        subject.send_message('Hello world :-)')
    

    Source code can be found at this link

    Counting, Indexing and Ordering in Cassandra

    Traditional databases are very good at performing common tasks like ordering, filtering, and counting. These tasks are not always easy in Cassandra, because different records of the same table are distributed across different nodes and the performance could be unacceptable. This post tries to show some examples of when it is possible to perform these tasks.

    Counting

    Let's create a product table and, after a couple of inserts, count how many records are stored.

        CREATE TABLE product  (
          id uuid PRIMARY KEY,
          category text,
          bar_code text,
          name text,
          price double
         );
    
        CREATE TABLE purchase_order  (
          id uuid ,
          product_id uuid,
          product_name text,
          quantity int,
          PRIMARY KEY(id, product_id)
        );
    
        INSERT INTO product(id, category, bar_code, name, price) 
            VALUES(bdf3e070-4f75-11e1-a5c5-00215a17aed0, 'mobile', '4F5F4', 'TABLET ORANGE', 300);
    
        INSERT INTO product(id, category, bar_code, name, price) 
            VALUES(d86cb040-0e02-11e0-9677-361934cac9ba, 'mobile', '4F5F4', 'PHONE BANANA', 100);
    
        INSERT INTO product(id, category, bar_code, name, price) 
            VALUES(4066b980-2f09-11e0-8551-361934cac9ba, 'tv', '4F5F4', 'SMART TV', 100);
    
        cqlsh:store> SELECT COUNT(*) FROM product;
    
         count
        -------
             3
    

    Now let's see what happens after 20000 inserts:

        cqlsh:store> SELECT COUNT(*) FROM product;
    
         count
        -------
         10000
    
        Default LIMIT of 10000 was used. Specify your own LIMIT clause to get more results.
    
    
        cqlsh:store> SELECT COUNT(*) FROM product limit 5000;
    
         count
        -------
         5000
    
    
        cqlsh:store> SELECT COUNT(*) FROM product limit 80000;
    
         count
        -------
         20000
    

    Cassandra counts record by record across the nodes, applying a default limit, so the result may not be the real total: if you set a limit that is not big enough, the count will be wrong. Now let's try to count after 6 million inserts:

        cqlsh:store> SELECT COUNT(*) FROM product limit 6000000;
        Request did not complete within rpc_timeout.
    

    Counter Table

    It's a special kind of table that contains a primary key, which can be of any Cassandra data type, and one or more columns of type counter:

        CREATE TABLE counter_store(
            object text PRIMARY KEY,
            count counter
        );
    

    Inserting Data and Incrementing Counter

    Only UPDATE operations are allowed on counter tables; the example shows how to increment a counter:

    
        UPDATE counter_store set count= count + 1 where object='product';
    
        cqlsh:store> select * from counter_store ;
    
         object  | count
        ---------+-------
         product |     1
    
        UPDATE counter_store set count = count + 3 where object='product';
    
        cqlsh:store> select * from counter_store ;
    
         object  | count
        ---------+-------
         product |     4
    
    
        UPDATE counter_store set count= count + 1 where object='purchase_order';
    
    
        cqlsh:store> select * from counter_store ;
    
         object         | count
        ----------------+-------
                product |     4
         purchase_order |     1
    

    Indexing

    Let's try to get all products for the mobile category:

        cqlsh:store> SELECT * FROM product WHERE category='mobile';
        Bad Request: No indexed columns present in by-columns clause with Equal operator
    

    This means that, by default, it is not possible to query by a non-primary-key column; this can be solved by creating a secondary index over the fields to be filtered. Every secondary index creates a hidden table to handle the index in order to improve the performance:

        cqlsh:store> CREATE INDEX product_category_idx ON product(category);
    
        cqlsh:store> SELECT * FROM product WHERE category='mobile';
    
         id                                   | bar_code | category | name          | price
        --------------------------------------+----------+----------+---------------+-------
         d86cb040-0e02-11e0-9677-361934cac9ba |    4F5F4 |   mobile |  PHONE BANANA |   100
         bdf3e070-4f75-11e1-a5c5-00215a17aed0 |    4F5F4 |   mobile | TABLET ORANGE |   300
    

    When it is OK to use a Cassandra index

    Indexes are a good fit for querying tables where a lot of records can potentially match the query criteria, e.g. when there are many products within a given category. But if there are many categories with only a few products each, the index will not be effective.
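
    As a rough sketch (the serial_number column and the product_by_serial_number table are hypothetical, not part of the store model above), an index suits a low-cardinality column such as category, while a value that is almost unique per row is better served by its own lookup table:

        -- category has few distinct values and many matching rows per value,
        -- so the secondary index created above suits this query
        SELECT * FROM product WHERE category='mobile';

        -- a (hypothetical) serial_number would be nearly unique per row; instead
        -- of indexing it, model a lookup table keyed by the value itself
        CREATE TABLE product_by_serial_number (
          serial_number text PRIMARY KEY,
          product_id uuid
        );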

    Ordering

    Let's try to order products by price:

        cqlsh:store> SELECT * FROM product order by price;
        Bad Request: ORDER BY is only supported when the partition key is restricted by an EQ or an IN.
    

    This error message means that ORDER BY is only possible when the query criteria are restricted with a WHERE clause using an equality or IN condition on the partition key. That makes no sense on the product table, because the id is unique and there is exactly one product per id. Maybe this table is a better fit for a relational database? Let's create some purchase orders and see if we can order that table:

    
        INSERT INTO purchase_order(id, product_id, product_name, quantity)
            VALUES(5e15135e-4b72-11e1-b84d-00215a17aed0, bdf3e070-4f75-11e1-a5c5-00215a17aed0, 'TABLET ORANGE', 3);
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity)
            VALUES(5e15135e-4b72-11e1-b84d-00215a17aed0, d86cb040-0e02-11e0-9677-361934cac9ba, 'PHONE BANANA', 8);
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity) 
            VALUES(5e15135e-4b72-11e1-b84d-00215a17aed0, 4066b980-2f09-11e0-8551-361934cac9ba, 'SMART TV', 1);
    
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity)
            VALUES(c0a874f0-392a-11e0-a0df-361934cac9ba, d86cb040-0e02-11e0-9677-361934cac9ba, 'PHONE BANANA', 2);
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity) 
            VALUES(c0a874f0-392a-11e0-a0df-361934cac9ba, 4066b980-2f09-11e0-8551-361934cac9ba, 'SMART TV', 3);
    
    
        cqlsh:store> select * from purchase_order ;
    
         id                                   | product_id                           | product_name  | quantity
        --------------------------------------+--------------------------------------+---------------+----------
         5e15135e-4b72-11e1-b84d-00215a17aed0 | d86cb040-0e02-11e0-9677-361934cac9ba |  PHONE BANANA |        8
         5e15135e-4b72-11e1-b84d-00215a17aed0 | 4066b980-2f09-11e0-8551-361934cac9ba |      SMART TV |        1
         5e15135e-4b72-11e1-b84d-00215a17aed0 | bdf3e070-4f75-11e1-a5c5-00215a17aed0 | TABLET ORANGE |        3
         c0a874f0-392a-11e0-a0df-361934cac9ba | d86cb040-0e02-11e0-9677-361934cac9ba |  PHONE BANANA |        2
         c0a874f0-392a-11e0-a0df-361934cac9ba | 4066b980-2f09-11e0-8551-361934cac9ba |      SMART TV |        3
    
    
        cqlsh:store> select * from purchase_order where id=c0a874f0-392a-11e0-a0df-361934cac9ba;
    
         id                                   | product_id                           | product_name | quantity
        --------------------------------------+--------------------------------------+--------------+----------
         c0a874f0-392a-11e0-a0df-361934cac9ba | d86cb040-0e02-11e0-9677-361934cac9ba | PHONE BANANA |        2
         c0a874f0-392a-11e0-a0df-361934cac9ba | 4066b980-2f09-11e0-8551-361934cac9ba |     SMART TV |        3
    
    
    

    Now let's try to order purchase orders by quantity:

        cqlsh:store> select * from purchase_order where id=c0a874f0-392a-11e0-a0df-361934cac9ba order by quantity;
        Bad Request: Order by is currently only supported on the clustered columns of the PRIMARY KEY, got quantity
    

    Ordering records is only possible when a WHERE clause is applied first and the ordering column is one of the clustering columns of a compound primary key:

        DROP TABLE purchase_order;
    
        CREATE TABLE purchase_order  (
          id uuid ,
          product_id uuid,
          product_name text,
          quantity int,
          PRIMARY KEY(id, quantity, product_id)
        );
    
    
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity)
            VALUES(5e15135e-4b72-11e1-b84d-00215a17aed0, bdf3e070-4f75-11e1-a5c5-00215a17aed0, 'TABLET ORANGE', 3);
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity)
            VALUES(5e15135e-4b72-11e1-b84d-00215a17aed0, d86cb040-0e02-11e0-9677-361934cac9ba, 'PHONE BANANA', 8);
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity) 
            VALUES(5e15135e-4b72-11e1-b84d-00215a17aed0, 4066b980-2f09-11e0-8551-361934cac9ba, 'SMART TV', 1);
    
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity)
            VALUES(c0a874f0-392a-11e0-a0df-361934cac9ba, d86cb040-0e02-11e0-9677-361934cac9ba, 'PHONE BANANA', 2);
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity) 
            VALUES(c0a874f0-392a-11e0-a0df-361934cac9ba, 4066b980-2f09-11e0-8551-361934cac9ba, 'SMART TV', 3);
    
    
    
        select * from purchase_order where id=c0a874f0-392a-11e0-a0df-361934cac9ba order by quantity;
    
        cqlsh:store> select * from purchase_order where id=c0a874f0-392a-11e0-a0df-361934cac9ba order by quantity;
    
         id                                   | quantity | product_id                           | product_name
        --------------------------------------+----------+--------------------------------------+--------------
         c0a874f0-392a-11e0-a0df-361934cac9ba |        2 | d86cb040-0e02-11e0-9677-361934cac9ba | PHONE BANANA
         c0a874f0-392a-11e0-a0df-361934cac9ba |        3 | 4066b980-2f09-11e0-8551-361934cac9ba |     SMART TV
    
    
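    The same restricted query can also return rows in reverse order; a small sketch, reusing the purchase_order table and order id from the example above:

        -- quantity is a clustering column, so descending order is allowed
        SELECT * FROM purchase_order
            WHERE id=c0a874f0-392a-11e0-a0df-361934cac9ba
            ORDER BY quantity DESC;
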

    Some personal conclusions

    • Some common tasks like ordering and filtering are easy in relational databases but difficult or even impossible in Cassandra.
    • Think very carefully about how the data will be retrieved, rather than how it will be inserted, when you design a Cassandra data model.
    • Test your models: insert millions of records in a test environment and make sure you can retrieve the data.
    • Keep a counter column for every table whose record count you need to know.
    • Realize that a COUNT operation is slow and can be wrong, so design your model so that data is inserted in the order you will need to read it.

    CassandraDB Data Model Basics

    Creating a data model in Cassandra is similar to traditional databases: you can create tables, define schemas and query tables. But because of the distributed nature of Cassandra, concepts such as foreign keys don't exist and simple operations like joins are not possible; for this reason denormalization is a key factor when you design a Cassandra data model. Before starting you will need to install CassandraDB; you can find how to do it in this post.

    Primary Keys

    As in any traditional database, a CassandraDB primary key helps to index data so it can be queried quickly, but keep in mind that CassandraDB is a distributed database, so your data will be distributed across the nodes. Good candidates for a primary key are the UUID and TIMEUUID data types. As the data model designer, it is your responsibility to choose a primary key that ensures a correct distribution of records across the nodes, because Cassandra won't do it for you. If you choose a sequential integer as the primary key, similar to an SQL sequence, it is quite possible that almost all the records will be stored on one node, which will overload that node and break load balancing.
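
    A minimal sketch, assuming a hypothetical page_view table that is not part of the store model used below, of a table keyed by a TIMEUUID, which both distributes writes across the cluster and embeds the creation time of each record:

        -- hypothetical event table keyed by a timeuuid
        CREATE TABLE page_view (
          id timeuuid PRIMARY KEY,
          url text,
          user_id uuid
        );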

    A Product/Order basic model

        CREATE TABLE product  (
          id uuid PRIMARY KEY,
          bar_code text,
          name text,
          price double
         );
    
    
        CREATE TABLE purchase_order  (
          id uuid PRIMARY KEY,
          product_id uuid,
          product_name text,
          quantity int
        );
    

    Let's try to insert some products:

        INSERT INTO product(id, bar_code, name, price) VALUES(bdf3e070-4f75-11e1-a5c5-00215a17aed0,
                                                              '4F5F4', 'TABLET', 300);
    
        INSERT INTO product(id, bar_code, name, price) VALUES(0ba10600-6712-11e1-b653-00215a17aed0,
                                                             '4Fu84', 'TV', 800);
    
        INSERT INTO product(id, bar_code, name, price) VALUES(14bc9100-6712-11e1-990c-00215a17aed0,
                                                             '4R5F', 'PC', 400);
    
        INSERT INTO product(id, bar_code, name, price) VALUES(1a16bc50-6714-11e1-8366-00215a17aed0,
                                                             '8D4F', 'WATCH', 250);
    

    Now we can create an order:

        INSERT INTO purchase_order(id, product_id, product_name, quantity)
            VALUES(35643ec0-1690-11e2-947c-00215a17aed0, bdf3e070-4f75-11e1-a5c5-00215a17aed0, 'TABLET', 3);
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity) 
            VALUES(35643ec0-1690-11e2-947c-00215a17aed0, 0ba10600-6712-11e1-b653-00215a17aed0, 'TV', 8);
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity)
            VALUES(35643ec0-1690-11e2-947c-00215a17aed0, 14bc9100-6712-11e1-990c-00215a17aed0, 'PC', 4);
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity)
            VALUES(35643ec0-1690-11e2-947c-00215a17aed0, 1a16bc50-6714-11e1-8366-00215a17aed0, 'WATCH', 2);
    

    After 4 inserts we expect to see 4 records in the purchase_order table, one per product. Actually this assumption is wrong, as you will see:

        cqlsh:store> select * from purchase_order ;
    
         id                                   | product_id                           | product_name | quantity
        --------------------------------------+--------------------------------------+--------------+----------
         35643ec0-1690-11e2-947c-00215a17aed0 | 1a16bc50-6714-11e1-8366-00215a17aed0 |        WATCH |        2
    

    Now let's try to query all purchase orders that contain a given product:

        cqlsh:store> select * from purchase_order where product_id=08d181c0-dc83-11e1-9c22-00215a17aed0;
        Bad Request: No indexed columns present in by-columns clause with Equal operator
    

    Only one record is left because every insert used the same primary key (id), and in Cassandra an INSERT with an existing primary key simply overwrites the previous row. This suggests that the purchase_order design is wrong: if we need to store different products in the same purchase order, we need to create a table with a COMPOUND PRIMARY KEY:

    COMPOUND PRIMARY KEY

    Similar to traditional databases, a compound primary key is an index composed of two or more columns. The first column is known as the partition key and defines where in the cluster the record will be stored. The remaining column or columns of the PRIMARY KEY are clustering columns; CassandraDB uses them to build an ordered index, which grants very efficient retrieval of rows. Let's drop the purchase_order table and create it again with a compound primary key:

        DROP TABLE purchase_order;
    
        CREATE TABLE purchase_order  (
          id uuid,
          product_id uuid,
          product_name text,
          quantity int,
          PRIMARY KEY(id, product_id)
        );
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity)
            VALUES(35643ec0-1690-11e2-947c-00215a17aed0, bdf3e070-4f75-11e1-a5c5-00215a17aed0, 'TABLET', 3);
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity) 
            VALUES(35643ec0-1690-11e2-947c-00215a17aed0, 0ba10600-6712-11e1-b653-00215a17aed0, 'TV', 8);
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity)
            VALUES(35643ec0-1690-11e2-947c-00215a17aed0, 14bc9100-6712-11e1-990c-00215a17aed0, 'PC', 4);
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity)
            VALUES(35643ec0-1690-11e2-947c-00215a17aed0, 1a16bc50-6714-11e1-8366-00215a17aed0, 'WATCH', 2);
    

    Now we can query the purchase_order table again and see what happens:

        cqlsh:store> select * from purchase_order;
    
         id                                   | product_id                           | product_name | quantity
        --------------------------------------+--------------------------------------+--------------+----------
         35643ec0-1690-11e2-947c-00215a17aed0 | bdf3e070-4f75-11e1-a5c5-00215a17aed0 |       TABLET |        3
         35643ec0-1690-11e2-947c-00215a17aed0 | 0ba10600-6712-11e1-b653-00215a17aed0 |           TV |        8
         35643ec0-1690-11e2-947c-00215a17aed0 | 14bc9100-6712-11e1-990c-00215a17aed0 |           PC |        4
         35643ec0-1690-11e2-947c-00215a17aed0 | 1a16bc50-6714-11e1-8366-00215a17aed0 |        WATCH |        2
    

    Let's create another purchase order:

        INSERT INTO purchase_order(id, product_id, product_name, quantity)
            VALUES(5e15135e-4b72-11e1-b84d-00215a17aed0, 1a16bc50-6714-11e1-8366-00215a17aed0, 'WATCH', 8);
    
        INSERT INTO purchase_order(id, product_id, product_name, quantity) 
            VALUES(5e15135e-4b72-11e1-b84d-00215a17aed0, 0ba10600-6712-11e1-b653-00215a17aed0, 'TV', 1);
    
    
        cqlsh:store> select * from purchase_order;
    
         id                                   | product_id                           | product_name | quantity
        --------------------------------------+--------------------------------------+--------------+----------
         35643ec0-1690-11e2-947c-00215a17aed0 | bdf3e070-4f75-11e1-a5c5-00215a17aed0 |       TABLET |        3
         35643ec0-1690-11e2-947c-00215a17aed0 | 0ba10600-6712-11e1-b653-00215a17aed0 |           TV |        8
         35643ec0-1690-11e2-947c-00215a17aed0 | 14bc9100-6712-11e1-990c-00215a17aed0 |           PC |        4
         35643ec0-1690-11e2-947c-00215a17aed0 | 1a16bc50-6714-11e1-8366-00215a17aed0 |        WATCH |        2
         5e15135e-4b72-11e1-b84d-00215a17aed0 | 0ba10600-6712-11e1-b653-00215a17aed0 |           TV |        1
         5e15135e-4b72-11e1-b84d-00215a17aed0 | 1a16bc50-6714-11e1-8366-00215a17aed0 |        WATCH |        8
    

    Let's filter by order id:

        cqlsh:store> SELECT * from purchase_order where id=5e15135e-4b72-11e1-b84d-00215a17aed0;
    
         id                                   | product_id                           | product_name | quantity
        --------------------------------------+--------------------------------------+--------------+----------
         5e15135e-4b72-11e1-b84d-00215a17aed0 | 0ba10600-6712-11e1-b653-00215a17aed0 |           TV |        1
         5e15135e-4b72-11e1-b84d-00215a17aed0 | 1a16bc50-6714-11e1-8366-00215a17aed0 |        WATCH |        8
    

    Indexing

    Let's filter by order product_id:

        cqlsh:store> SELECT * FROM purchase_order WHERE product_id=0ba10600-6712-11e1-b653-00215a17aed0;
    
        Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
    

    In the query above we tried to get purchase orders by product_id, but the partition key is the order id, so there can be several orders containing the given product distributed across the nodes. Answering this query would require reading every purchase_order row on every node, checking whether it matches the criteria, and merging the results; CassandraDB refuses to do this by default because the performance would be unacceptable. A solution in this case is to create another table that stores the information by purchase_order id and product_id, and to create two indexes:

        CREATE TABLE product_purchase_order  (
          id uuid PRIMARY KEY,
          purchase_id uuid,
          product_id uuid,
          product_name text,
          quantity int
        );
    
        CREATE INDEX ON product_purchase_order(purchase_id);
        CREATE INDEX ON product_purchase_order(product_id);
    
    
        INSERT INTO product_purchase_order(id, purchase_id, product_id, product_name, quantity)
            VALUES(f5b87f80-913c-11e1-8995-00215a17aed0 ,35643ec0-1690-11e2-947c-00215a17aed0, 
                   bdf3e070-4f75-11e1-a5c5-00215a17aed0, 'TABLET', 3);
    
        INSERT INTO product_purchase_order(id, purchase_id, product_id, product_name, quantity)
            VALUES(54e793f0-a11a-11e1-bba9-00215a17aed0, 35643ec0-1690-11e2-947c-00215a17aed0,
                   0ba10600-6712-11e1-b653-00215a17aed0, 'TV', 8);
    
        INSERT INTO product_purchase_order(id, purchase_id, product_id, product_name, quantity)
            VALUES(9ba4eda2-38a6-11e2-b72b-00215a17aed0, 35643ec0-1690-11e2-947c-00215a17aed0, 
                   14bc9100-6712-11e1-990c-00215a17aed0, 'PC', 4);
    
        INSERT INTO product_purchase_order(id, purchase_id, product_id, product_name, quantity)
            VALUES(4b01bf90-7853-11e1-85e9-00215a17aed0, 35643ec0-1690-11e2-947c-00215a17aed0, 
                   1a16bc50-6714-11e1-8366-00215a17aed0, 'WATCH', 2);
    
        INSERT INTO product_purchase_order(id, purchase_id, product_id, product_name, quantity)
            VALUES(d33dc000-1d1e-11e2-943c-00215a17aed0, 5e15135e-4b72-11e1-b84d-00215a17aed0,
                   1a16bc50-6714-11e1-8366-00215a17aed0, 'WATCH', 8);
    
        INSERT INTO product_purchase_order(id, purchase_id, product_id, product_name, quantity) 
            VALUES(862fdbd0-26ac-11e2-95ee-00215a17aed0, 5e15135e-4b72-11e1-b84d-00215a17aed0, 0ba10600-6712-11e1-b653-00215a17aed0, 'TV', 1);
    
        
    

    Now we can find all the orders that contain a given product:

        cqlsh:store> SELECT *  FROM product_purchase_order WHERE product_id=0ba10600-6712-11e1-b653-00215a17aed0;
    
         id                                   | product_id                           | product_name | purchase_id                          | quantity
        --------------------------------------+--------------------------------------+--------------+--------------------------------------+----------
         54e793f0-a11a-11e1-bba9-00215a17aed0 | 0ba10600-6712-11e1-b653-00215a17aed0 |           TV | 35643ec0-1690-11e2-947c-00215a17aed0 |        8
         862fdbd0-26ac-11e2-95ee-00215a17aed0 | 0ba10600-6712-11e1-b653-00215a17aed0 |           TV | 5e15135e-4b72-11e1-b84d-00215a17aed0 |        1
    
    

    Behind the scenes, every index is backed by a new hidden table.
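
    As a rough mental model only (this is not the actual internal layout), the index on product_id behaves like a hidden lookup table that maps each indexed value back to the primary keys of the matching rows:

        -- conceptual sketch of the hidden table behind the product_id index
        CREATE TABLE product_purchase_order_by_product (
          product_id uuid,
          id uuid,
          PRIMARY KEY (product_id, id)
        );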

    Some personal conclusions

    • The information in this post is not enough; CassandraDB is too big and complex to cover fully here.
    • It is easy to insert data but difficult to query it in CassandraDB.
    • Think very carefully about the queries you will need when you are designing your database.
    • Because joins and relational operations are not allowed in Cassandra, don't be afraid to duplicate your data across models.
    • If you choose a primary/partition key data type other than uuid or timeuuid, make sure that the distribution of your data across the nodes will be uniform before putting your code into production.

    Resources

    Installing CassandraDB on Ubuntu

    CassandraDB is a powerful Java-based NoSQL database. It's open source, so we can download and install it, but by default there are no Cassandra packages in the Ubuntu repositories. This post shows how to install Cassandra on your Ubuntu box.

    Installing the JDK

    Oracle Java 1.6 is required for Cassandra to work properly. You will have to install it by hand, because the default Ubuntu JDK is OpenJDK; more information about installing the Oracle JDK can be found in this post.

    Adding and configuring the debian repository

    1. Edit /etc/apt/sources.list and add the Cassandra repository: deb http://debian.datastax.com/community stable main
    2. Add the DataStax repository key to your trusted repo keys: curl -L http://debian.datastax.com/debian/repo_key | sudo apt-key add -
    3. Update the repository index and install CassandraDB and its drivers: sudo apt-get update && sudo apt-get install python-cql=1.0.10-1 dsc=1.0.10 cassandra=1.0.10
    4. Cassandra starts by default after installation; to stop it run: sudo service cassandra stop

    Upgrading to the latest Cassandra version

    $ sudo apt-get update && sudo apt-get install cassandra
    

    Testing the connection

    We can connect to Cassandra using the cqlsh command:
        $ cqlsh 
        Connected to Test Cluster at localhost:9160.
        [cqlsh 2.0.0 | Cassandra 1.0.10 | CQL spec 2.0.0 | Thrift protocol 19.20.0]
        Use HELP for help.
        cqlsh> help
    
        Documented commands (type help ):
        ========================================
        ASSUME  DESC  DESCRIBE  EXIT  HELP  SELECT  SHOW  USE
    
        Miscellaneous help topics:
        ==========================
        DROP_INDEX                 CREATE                       DELETE_WHERE       
        ALTER_DROP                 DROP_KEYSPACE                UPDATE_USING       
        SELECT_EXPR                ALTER_ALTER                  UPDATE_WHERE       
        UUID_INPUT                 TYPES                        TIMESTAMP_OUTPUT   
        DELETE_COLUMNS             SELECT_COLUMNFAMILY          CONSISTENCYLEVEL   
        ALTER_ADD                  CREATE_COLUMNFAMILY_OPTIONS  CREATE_INDEX       
        ALTER_WITH                 BEGIN                        CREATE_KEYSPACE    
        APPLY                      UPDATE_SET                   ASCII_OUTPUT       
        DELETE_USING               UPDATE_COUNTERS              DROP               
        CREATE_COLUMNFAMILY_TYPES  TRUNCATE                     TIMESTAMP_INPUT    
        DROP_COLUMNFAMILY          INSERT                       ALTER              
        BLOB_INPUT                 TEXT_OUTPUT                  CREATE_COLUMNFAMILY
        SELECT_WHERE               UPDATE                       DELETE             
        BOOLEAN_INPUT              SELECT_LIMIT               
    
        cqlsh> SHOW 
        ASSUMPTIONS  HOST         VERSION      
        cqlsh> SHOW HOST 
        Connected to Test Cluster at localhost:9160.
        cqlsh> 
    

    Troubleshooting

    1. The first time I attempted to start Cassandra it did not work; the log messages said "the stack size specified is too small, Specify at least 160k", so I edited /etc/cassandra/cassandra-env.sh, looked for the JVM_OPTS line and changed the stack size to something over 160k. In my case I changed it to 256k: JVM_OPTS="$JVM_OPTS -Xss256k"

    Installing the Oracle JDK on Ubuntu

    Ubuntu boxes ship with OpenJDK by default, which is a great tool, but some tools require the Oracle JDK in order to run properly, e.g. CassandraDB. This post shows how to install the Oracle JDK on Ubuntu systems.

    Checking the currently installed Java version

    $ java --version
    java version "1.5.0"
    gij (GNU libgcj) version 4.6.3
    Copyright (C) 2007 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    

    Downloading and installing the Oracle JDK

    1. Create a directory on your local filesystem where the JDK will live: sudo mkdir -p /opt/oracle/jdk
    2. Visit the Oracle website, choose the JDK that suits your needs and download the file with the .bin extension.
    3. Change the permissions of the downloaded file and execute it under the /opt/oracle/jdk directory.
    4. Make symbolic links between the Oracle JDK executables and the /etc/alternatives directory: sudo ln -s /opt/oracle/jdk/jdk1.6.0_45/bin/java /etc/alternatives/javaoracle and sudo ln -s /opt/oracle/jdk/jdk1.6.0_45/bin/javac /etc/alternatives/javacoracle
    Every time you want to use the Oracle JDK you can define an alias: alias java=/etc/alternatives/javaoracle. Now you can check your Java version again:
     $java -version
     java version "1.6.0_45"
     Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
     Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)
    
    

    Creating a RESTful API With Flask

    Flask (http://flask.pocoo.org/) is a Python microframework that lets you create simple web applications quickly. It has some advantages over Django; one of them is RESTful support out of the box. This post shows a very simple API implementation using Flask.

    The first step is to create a base class that extends MethodView. More info about Flask class-based views can be found here: http://flask.pocoo.org/docs/views/#method-based-dispatching

    The code

        from flask import Flask, abort, jsonify, request
        from flask.views import MethodView
    
    
        class APIView(MethodView):
            """ Generic API, all the API views will inherit for this base class
                every dispatch method will return an invalid request message, every
                child class must implements this methods properly
                More info about flask class based views here
                http://flask.pocoo.org/docs/views/#method-based-dispatching.
            """
    
            ENDPOINT = '/mymusic/api/v1.0'
    
            def get(self):
                abort(400)
    
            def post(self):
                abort(400)
    
            def put(self):
                abort(400)
    
            def delete(self):
                abort(400)
    
            def json_response(self, data={}):
                return jsonify(data)
    

    The prefix for the API URLs is defined in the APIView.ENDPOINT constant; you can change it to whatever you want. After that, create a very simple resource called UserResource:

        class UserResource(object):
            def __init__(self, data={}, model=None, **kwargs):
                self._dict_data = data
                self._is_valid = True

            def is_valid(self):
                return self._is_valid

            def to_serializable_dict(self):
                return self._dict_data

            def add(self):
                for key, value in self._dict_data.items():
                    setattr(self, key, value)

            def update(self):
                for key, value in self._dict_data.items():
                    setattr(self, key, value)

            def delete(self):
                self._dict_data = {}

            @classmethod
            def get(cls, user_id):
                dict_data = {'id': user_id, 'name': 'user {0}'.format(user_id)}
                return UserResource(data=dict_data)

            @classmethod
            def get_list(cls, *args, **kwargs):
                user_list = []
                for x in range(4):
                    dict_data = {'id': x, 'name': 'user {0}'.format(x)}
                    resource = UserResource(data=dict_data)
                    user_list.append(resource.to_serializable_dict())

                return user_list
    

    The next step is to create a concrete implementation of APIView; we will call it UserAPIView:

        class UserAPIView(APIView):
            def get(self, user_id):
                """
                If GET request have the id: attribute, this view will search and return
                an user with this id, If id: atrribute is no setted will return list of
                users
                """
                if user_id:
                    user_resource = UserResource.get(user_id)
                    data_dict = {'user': user_resource.to_serializable_dict()}
                else:
                    data_dict = {'users': UserResource.get_list()}
                return self.json_response(data=data_dict)
    
    
            def post(self):
                """
                User creation is performed by the POST method.
                """
                response = {}
                user_resource = UserResource(request.json)
                if user_resource.is_valid():
                    try:
                        user_resource.add()
                        response['user'] = user_resource.to_serializable_dict()
                    except Exception as error:
                        pass
                return self.json_response(data=response)
    
            def put(self, user_id):
                """
                User modification using PUT method
                """
                response = {}
                user_resource = UserResource(request.json)
                if user_resource.is_valid():
                    try:
                        user_resource.update()
                        response['user'] = user_resource.to_serializable_dict()
                    except Exception as error:
                        pass
                return self.json_response(data=response)
    
            def delete(self, user_id):
                response = {}
                user_resource = UserResource(request.json)
                try:
                    user_resource.delete()
                    response['ok'] = 'record deleted'
                except Exception as error:
                    pass
                return self.json_response(data=response)
    

    Configuring the Routes

        app = Flask(__name__)
        user_view = UserAPIView.as_view('user_api')
        app.add_url_rule('{0}/users/'.format(UserAPIView.ENDPOINT), defaults={'user_id': None},
                          view_func=user_view, methods=['GET',])
    
        app.add_url_rule('{0}/users/'.format(UserAPIView.ENDPOINT), view_func=user_view, methods=['POST',])
    
        app.add_url_rule('{0}/users/<int:user_id>'.format(UserAPIView.ENDPOINT), view_func=user_view,
                         methods=['GET', 'PUT', 'DELETE'])
    
        def run_app():
            app.run()
    
        if __name__ == "__main__":
            run_app()
    

    Testing the API

    First we start the app:
        python flaskapi.py
    

    Getting user list

    curl localhost:5000/mymusic/api/v1.0/users/ will return:

        {
          "users": [
            {
              "id": 0, 
              "name": "user 0"
            }, 
            {
              "id": 1, 
              "name": "user 1"
            }, 
            {
              "id": 2, 
              "name": "user 2"
            }, 
            {
              "id": 3, 
              "name": "user 3"
            }
          ]
        }
    

    Getting a user

    curl localhost:5000/mymusic/api/v1.0/users/1 will return:

        {
          "user": {
            "id": 1,
            "name": "user 1"
          }
        }
    

    Creating a user

     curl -X POST localhost:5000/mymusic/api/v1.0/users/ -H "Content-Type: application/json" \
         -d '{"id": 500, "name": "maigfrga"}'
    

    will return:

        {
          "user": {
            "id": 500,
            "name": "maigfrga"
          }
        }
    

    Updating a user

     curl -X PUT localhost:5000/mymusic/api/v1.0/users/500 -H "Content-Type: application/json" \
         -d '{"id": 500, "name": "maigfrga modified"}'
    

    will return:

        {
          "user": {
            "id": 500,
            "name": "maigfrga modified"
          }
        }
    

    Deleting a user

     curl -X DELETE localhost:5000/mymusic/api/v1.0/users/500 -H "Content-Type: application/json" -d '{}'
    

    will return:

        {
          "ok": "record deleted"
        }