AWS CloudFront

I have started to work with AWS CloudFront with the goal of improving the response time of the store.

Initial expectations:

  • Have an easy way to engage with the services.
  • Integrated and clear invoices explaining the consumed services and unit prices.
  • Training.

Concerns:

  • Need to understand how AWS and Google services are evolving in the market.

A diagram to keep in mind:

AWS_CloudFront_Architectural_Overview

The initial impression is that Amazon is to this decade what Microsoft was to the 90s: they are making the cloud accessible to the whole world in an understandable way.

Update, December 2014

  • Amazon communicated that storage prices have decreased. This is the new normal in the market: the vendor proactively tells the customer that prices are going down.

Apache Hadoop

Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware.

All the modules in Hadoop are designed with the fundamental assumption that hardware failures (of individual machines, or whole racks of machines) are common and thus should be handled automatically in software by the framework. The framework addresses these challenges, and others:

  • Hadoop is scalable: it scales linearly, adding more nodes to the cluster to handle larger volumes of data.
  • Hadoop is accessible: it runs on large clusters of commodity machines or on cloud computing services such as Amazon’s Elastic Compute Cloud (EC2). If you are rich and can purchase an IBM Big Blue, good for you; the rest of us mortals have to use inexpensive servers that both store and process the data.
  • Hadoop is robust: it is architected with the assumption of frequent hardware failure, so it has been implemented to handle this type of failure.
  • Hadoop is simple: it allows users to quickly write efficient parallel code (see the sketch after this list). Hadoop’s accessibility and simplicity give it an edge over writing and running large distributed programs from scratch. Do not confuse “simple” with “easy”: it still requires intelligence and knowledge.
  • Hadoop is versatile: it embraces the “big data” challenge, where today we do not know what data we will be required to analyze tomorrow. Hadoop’s breakthrough is that it allows businesses and organizations to analyze data that was until recently considered useless.
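As an illustration of that simplicity, here is a minimal sketch of the classic word-count job written against the Hadoop MapReduce Java API; the class names (WordCount, TokenizerMapper, IntSumReducer) and the input/output paths passed on the command line are illustrative assumptions, not anything prescribed by the framework.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative word-count job: the framework takes care of splitting the input,
// scheduling map and reduce tasks across the cluster and retrying failed tasks.
public class WordCount {

  // Map phase: emit (word, 1) for every token in each input line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory, must not exist yet
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Roughly twenty lines of user code express the parallel logic; splitting the input, moving intermediate data between nodes and recovering from failed tasks come from the framework.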

Before Hadoop, big data

Big Data can be defined as high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. Big data is basically a new set of algorithms, topologies and ways of using resources that enable us to gather, analyze and present more data.

Every day we create an enormous amount of data, and the volume grows exponentially.

In the past, data was stored in relational databases, so handling it was a matter of simple queries to select data from a database. Today 80% of data is unstructured, and this percentage is growing: videos, pictures, comments, GPS traces, transactions, locations…

In this situation, individuals and companies face scenarios where the data is too big, moves too fast, or exceeds current processing capacity. The limits to handling the data come from:

  • Volume: overcoming the sheer size of the data requires scalable technologies and distributed approaches to querying or finding a given piece of data.
  • Velocity: we need to organize the resources (CPUs, memory, networks…) to process data in near real time, so that we can make decisions at the right moment.
  • Variety: unstructured data means that the well-known paradigms of data search are no longer valid. From this point on you can differentiate between “Hadoop people” and “RDBMS people”; they are different things.

Think about Facebook in the terms mentioned above:

  • Volume: the amount of data introduced by all the people and companies distributed around the world. Imagine how they have to organize all that information.
  • Velocity: as a user I want everything as fast as possible; imagine a Facebook that took 2 seconds to refresh a screen: nobody would use it.
  • Variety: Facebook tracks data of very different natures, and the number of those natures keeps growing.

Well, Facebook, Yahoo, Twitter, eBay… they all use Hadoop.

Before Hadoop, distributed processing

I was reading about Content Delivery Networks (CDN) when I found the Apache Hadoop project. I have been amazed by the nature of the project, where it comes from and all the tooling generated around it. It is a massive amount of information, but fascinating to me.

The Hadoop project comes from the need for more resources than a single machine can provide for a given goal. The solution has been to distribute the data and the processing of that data: if you need to process a huge amount of data and a single computer offers only limited processing cycles, you use a combined group of computers to run those processes in less time.
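To make the idea concrete, here is a deliberately simplified, single-machine sketch of the same divide-and-combine pattern: the data is split into slices, each worker processes one slice, and the partial results are merged at the end. Hadoop applies this pattern across many machines (and moves the computation to where the data lives); the class and variable names below are purely illustrative.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy, single-machine analogue of "divide the data, combine the results".
public class ParallelSum {
  public static void main(String[] args) throws Exception {
    long[] data = new long[10_000_000];
    for (int i = 0; i < data.length; i++) data[i] = i;   // stand-in for a huge data set

    int workers = Runtime.getRuntime().availableProcessors();
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    int chunk = (data.length + workers - 1) / workers;

    List<Future<Long>> partials = new ArrayList<>();
    for (int w = 0; w < workers; w++) {
      final int from = w * chunk;
      final int to = Math.min(from + chunk, data.length);
      // Each worker processes only its own slice of the data.
      Callable<Long> task = () -> {
        long sum = 0;
        for (int i = from; i < to; i++) sum += data[i];
        return sum;
      };
      partials.add(pool.submit(task));
    }

    long total = 0;
    for (Future<Long> f : partials) total += f.get();    // combine the partial results
    pool.shutdown();
    System.out.println("Total = " + total);
  }
}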

The major resources to consider in a distributed processing system are processor time, memory, hard drive space and network bandwidth. For instance, server virtualization is sophisticated software that detects idle CPU capacity on a rack of physical servers and parcels out virtual environments to utilize it.

There are many challenges in distributed processing when it is applied at large scale, and Hadoop faces them. It is important to mention these challenges to understand (or admire) what the Apache Hadoop project does:

  • One individual compute node may overheat, crash, experience hard drive failures, or run out of memory or disk space.
  • The network can experience partial or total failure if switches and routers break down, and network congestion can delay data transfer.
  • Multiple implementations or versions of client software may speak slightly different protocols from one another.
  • If the input data set is several terabytes, then this would require a thousand or more machines to hold it in RAM.
  • Intermediate data sets generated while performing a large-scale computation can take several times more space than the original input data.
  • Synchronization between multiple machines.

In each of the mentioned cases, the distributed system should be able to recover from the component failure or transient error condition and continue to make progress.
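A common building block for that kind of recovery is to retry a failed unit of work with a backoff instead of aborting the whole job. Below is a minimal, generic sketch of the pattern; the class name, method name and limits are assumptions for illustration, not Hadoop internals.

import java.util.concurrent.Callable;

// Generic retry-with-backoff helper: run a task and, if it throws a transient
// error, wait a bit and try again up to a fixed number of attempts.
public class Retry {
  public static <T> T withRetries(Callable<T> task, int maxAttempts, long initialDelayMs)
      throws Exception {
    Exception last = null;
    long delay = initialDelayMs;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return task.call();                 // success: make progress and return
      } catch (Exception e) {
        last = e;                           // transient failure: remember it and maybe retry
        if (attempt < maxAttempts) {
          Thread.sleep(delay);
          delay *= 2;                       // exponential backoff between attempts
        }
      }
    }
    throw last;                             // all attempts exhausted: surface the error
  }
}

Real frameworks layer much more on top of this (re-executing a task on a different node, reading a block from another replica), but the principle is the same: tolerate the failure and keep making progress.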

PTC acquires Thingworx

I have been following this company for some time; they are evolving in the IoT arena with a strong platform solution. I also follow the evolution of OpenRemote, and I am keen to see how it evolves.

Regarding ThingWorx, it was only a question of time before someone came and bought it. In the case of PTC, they already had a PLM solution, and with this acquisition they will move in the short term to offer M2M solutions, while looking for a long-term niche: Service Lifecycle Management.

All industries are moving to a clear “service” approach, and the manufacturing industry is no exception. So this acquisition is the natural move of an emerging company with a clear strategy.

Is it a risk? Sure!…

Today PTC has PLM, ALM, CAD, SLM and IoT; the only natural piece of the puzzle still missing is MES. Let’s see the next move.

Desafío sur del Torcal

It was the first time I faced the marathon distance. Doing it on a mountain trail is lighter than running it on asphalt, at least for me.

The fact that this race has very little elevation gain makes the challenge easier.

Very happy with how the day went: legs less tired than expected, although the soles of my feet were very hot during the last 10 km.

Time: 6:29:13.

Desafio_Sur_Torcal_2014