Tuesday, March 25, 2014

Find Which Type of Vagrant VM Is Running

I am really enjoying Vagrant.  It's one of those indispensable tools.  Today, however, I wanted to set up a CentOS VM for my application and I couldn't remember the version (box) name that I was using in my other VMs.  To find out, all you have to do is check the Vagrantfile of a previous VM.  Here's an example:
vim ~/vagrant_boxes/kafka/Vagrantfile
You will be able to see the version inside the file, in the config.vm.box setting.
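
If you have several VMs and don't want to open each Vagrantfile by hand, the same lookup can be scripted.  This is only a minimal sketch in Python: it assumes the box is declared with the usual `config.vm.box = "..."` line, and the helper names and the sample box name ("example-centos-box") are made up for illustration.

```python
import re
from pathlib import Path

# Matches lines such as:  config.vm.box = "some-box-name"
# Single quotes are accepted too. Settings like config.vm.box_url do not match.
BOX_PATTERN = re.compile(r"""^\s*\w+\.vm\.box\s*=\s*["']([^"']+)["']""")


def find_box_names(vagrantfile_text):
    """Return every box name assigned via `<config>.vm.box = "..."`."""
    names = []
    for line in vagrantfile_text.splitlines():
        # Skip Ruby comment lines so commented-out boxes are ignored.
        if line.lstrip().startswith("#"):
            continue
        match = BOX_PATTERN.match(line)
        if match:
            names.append(match.group(1))
    return names


def find_box_names_in_file(path):
    """Read a Vagrantfile from disk and return the box names it declares."""
    return find_box_names(Path(path).expanduser().read_text())


if __name__ == "__main__":
    # Hypothetical Vagrantfile content, used only to demonstrate the helper.
    sample = '''
Vagrant.configure("2") do |config|
  # config.vm.box = "commented-out-box"
  config.vm.box = "example-centos-box"
end
'''
    print(find_box_names(sample))
```

To check a real VM, you could call `find_box_names_in_file("~/vagrant_boxes/kafka/Vagrantfile")` with the path from the example above.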

Friday, March 21, 2014

Strata 2014 - Newbie Perspective

Marc Andreessen noticed that software is eating the world.  I see the same thing with Big Data: it is shaping the world around us.  It has been used in presidential elections, weather reports, consumer analysis and sentiment, fraud checks, etc.  The Strata conference is the epicenter of new technologies, use cases, and innovations related to Big Data.  I've been meaning to go for quite some time.  Previously, I purchased the videos from O'Reilly because I couldn't make it.  Thanks to my current company, 3C (they're pretty awesome), I was able to go along with five of my coworkers.  It's the place where you can meet the experts and the main committers and ask them questions.  If your eyes dilate when you talk about Hadoop, or you get excited when you need to solve a problem that involves a huge amount of data, including the famous "three V's" (volume, velocity, and variety), then this conference is for you.  This is a quick summary of my experience at the conference.

The conference revolved around four clusters:

  1. How quickly can you get the data into your system (ingest)
  2. How fast can you show the results
  3. It's all about presentation (charts)
  4. Big Data doesn't mean Hadoop


How Quickly Can You Get Data


The presentation that left me mesmerized was Spark!  I can't wait to use it.  It is a very compelling product and it's now backed by Cloudera.  With Spark you get the following:
  • A compute engine for Hadoop data - no need to reinvent the wheel
  • Speed! An engine that claims to be up to 100x faster than MapReduce
  • Sophistication: access to a library of sophisticated algorithms
  • A big community behind it; the most popular Big Data open source project (followed by Hadoop)
  • Learning from the big guys - Yahoo!, Conviva, and Cloudera are using it
Not to mention that it comes integrated with an analytics suite (Shark), large-scale graph processing (Bagel), and real-time analysis (Spark Streaming).  This is nice because rather than learning Hive, Hadoop, Mahout, and Storm, I only have to learn one programming paradigm.

How Fast Can You Show The Results


Twitter explained how they monitor millions of time series (with 5,700+ tweets per second).  The presentation was superb.  I found out that the stack they're using, named "Observability", is composed of Finagle, Cassandra, and a query language and execution engines based on Scala.  Although it is still a work in progress, the stack is about three years old.  I hope that they open-source the stack so I can get more context on how they monitor a large distributed system.

Another very interesting product was Google's BigQuery.  This was one of those presentations that we (my team and I) stumbled upon by accident.  The presentation showed how to use Google's toolkit (Freebase, Maps, and BigQuery) to do analytics.

It's All About Context, Results, or Charts


Another company that impressed me was Trifacta.  With their tool you can clean data, see the model (graph), and repeat the process depending on whether or not you see patterns.  The tool is targeted at data scientists, data wranglers, and data analysts.  It's a great tool for mining data but, most importantly, you can clean the data and show the results with relative ease.

IPython: This rekindled my interest in Python.  IPython notebooks are great for data scientists.  You can get code, text, and graphics all in one page, so it's the perfect tool to show quick results.  Not that Python wasn't already a popular language for data scientists.  The NumPy library provides a solid MATLAB-like matrix data structure, with efficient matrix and vector operations.  The Python ecosystem also offers other great libraries like SciPy and pandas.

Big Data != Hadoop


Two topics that opened my eyes were Mesos and YARN.  Mesos, which Twitter uses to manage its clusters, is similar to YARN (Yet Another Resource Negotiator).  Hadoop 2.0, with YARN, is becoming more of an environment and operating system, not just a MapReduce engine.  With YARN, the JobTracker is gone.  The ResourceManager takes over the JobTracker's resource-management job.  The ResourceManager (RM) is a scheduler - it allocates resources based on a pluggable scheduling algorithm.  The RM does not manage and monitor each application itself (that is left to a per-application ApplicationMaster), so it is strictly limited to arbitrating available resources.
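
To make the "pluggable scheduling algorithm" idea more concrete, here is a toy sketch in Python.  It is not YARN's API or how the real ResourceManager is implemented; every class and function name below is made up for illustration.  It only shows the general idea that the component handing out resources stays the same while the ordering policy is swapped in as a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    app: str        # hypothetical application name
    memory_mb: int  # memory the application is asking for


def fifo_policy(pending):
    """Serve requests in arrival order."""
    return list(pending)


def smallest_first_policy(pending):
    """Serve the smallest requests first (stable sort keeps arrival order on ties)."""
    return sorted(pending, key=lambda r: r.memory_mb)


class ToyResourceManager:
    """Arbitrates a single pool of memory; it does not run or monitor applications."""

    def __init__(self, total_memory_mb, policy=fifo_policy):
        self.free_memory_mb = total_memory_mb
        self.policy = policy  # the pluggable part: any function that orders requests
        self.pending = []

    def submit(self, request):
        self.pending.append(request)

    def allocate(self):
        """Grant every pending request that still fits, in policy order."""
        granted, still_pending = [], []
        for request in self.policy(self.pending):
            if request.memory_mb <= self.free_memory_mb:
                self.free_memory_mb -= request.memory_mb
                granted.append(request)
            else:
                still_pending.append(request)
        self.pending = still_pending
        return granted


def run(policy):
    """Submit the same three requests under a given policy and return who got resources."""
    rm = ToyResourceManager(total_memory_mb=4096, policy=policy)
    for app, mem in [("A", 3072), ("B", 2048), ("C", 1024)]:
        rm.submit(Request(app, mem))
    return [r.app for r in rm.allocate()]


if __name__ == "__main__":
    print(run(fifo_policy))            # ['A', 'C']
    print(run(smallest_first_policy))  # ['C', 'B']
```

The same three requests end up with different winners depending only on the policy function, which is the point of keeping the scheduling algorithm pluggable.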

One of our favorites (mine and two of my buddies') was the Netflix Data Platform talk by Kurt Brown.  It was a different and great presentation.  Rather than focusing on the technology side, they explained how the culture is intertwined with their technology stack and decisions.  For example, they talked about the reasons for using "the cloud".  There are the obvious reasons: it's cheaper, it's much more flexible (growth, a better place to do tests/spikes), and having multiple data centers is definitely a plus.  Also, Amazon and RackSpace have great services such as SQL, EMR, and S3.  But the main reason is "focus".  They are focused on getting movies and increasing their audience rather than on the "plumbing".  They expressed their commitment to "open-source software" (OSS).  They mentioned the great talent that they can get and how they can "manage their own destiny" by following these principles and using these tools.

Netflix explained their philosophy and how it's the "soul" of their decisions (technical and business).  For example, they keep keyboards, mice, and other peripherals in vending machines (they are free), so that everyone knows to "act in Netflix's best interest".  Furthermore, every decision or project needs to answer a basic question: "what value are you adding?"  They apply the rule "accept that things will break".  Because of this, they build safety nets around their systems.  Again, it was a very nice and interesting presentation.

I really enjoyed the conference.  I also just purchased the videos, which I highly recommend!  During the next few months, I'm going to try to learn some of these tools and present them at the Miami JVM Meetup.  Hopefully I'll see you there or, better yet, at Strata 2015.  If you're going to either one of these events, let's meet up, share a beer...or two, and discuss Big Data.  I promise that my eyes will dilate.



Monday, March 17, 2014

The Dilemma of Being a Good Samaritan or a Sucker

I have always liked helping people, but sometimes I start to wonder...am I a "good Samaritan" or a "sucker"?  Ever since I was little I have liked going out and talking.  My mom always said that I "talk a mile a minute".  What I have noticed is that nowadays a lot of people tend to talk to me and ask me for things.  Some people approach me to sell me something, and others to ask for help.  The truth is that many people say I am a "pleasant" guy, others say I am "likable" (I like that one), and sometimes I think I just have the face of a sucker.  For example, whenever I go to a mall, the people who run the kiosks always call out to me: "Sir, can I clean your watch?" or "Young man, I have this on special."  I always end up saying "no, thank you" with a smile and moving on, and for what...only to run into another two guys or girls who are going to ask me the same thing.  This is very common for me.  My wife always tells me to look down so they don't chase me, but even then!  The other day, walking through a parking lot with my family, a lady stopped me in the middle of the street and called me over.  Then she said, "Can you do me a favor?  My shoe came untied; can you tie it for me?"  And what did I do?  Well, I tied it...like a good sucker.  To console myself, the lady was an obese elderly woman.  But even so, out of all the people, it had to be me?  My latest "good Samaritan" scene was the other day when I went out for breakfast with my family.  As soon as I got out of the car, a man saw me and told me that he had locked his keys inside his car.  I know, not the brightest character, but I am super absent-minded too, so I understand.  "Can you take me home to get my spare keys?  I live very close."  My wife just looked at me, gave me a smile, and told me that in the meantime she would go get a table with the kids.

Honestly, many times I think, "I'm going to toughen up, act like a real jerk, and tell these people to go to hell."  But that's not who I am.  As I said, I love to talk!  When I did my latest good Samaritan "deed", I asked the man (somewhat older) how many people he had asked.  He answered that I was the first: "You look like a kind person."  I thought, "More like the face of a sucker."

After so much time like this, looking at it in retrospect, not only do I really enjoy helping, but things have also gone very well for me with my little deeds.  As I said, it feels good to help people.  Besides, I think it sets a good example for my two kids (I have a twelve-year-old boy and a three-year-old girl who thinks she's thirteen).  And even more so when you do it without thinking much about it or asking for anything in return.  Although I have never asked for anything, I always end up noticing nice things.  Like when the man who left his keys in the car paid for my breakfast, or last year when someone bought me a pair of $120 shoes just because I had the face of a good person.  Apparently, those of us with the face of a sucker have a lot of luck.

A hug!
Marcelo

Thursday, March 6, 2014

Disruptive Possibilities: How Big Data Changes Everything

I was looking forward to this book because of the title. I was under the impression that I was going to find concrete examples of how Big Data has affected and disrupted some industries. Best of all, I thought that I was going to read which industries will be impacted and how.  The book showed some examples at the end, but in my opinion, it leaves out something very important: speed and sophistication.

I just came back from Strata 2014, which is why I was looking forward to this book, and when I heard Matei Zaharia's keynote, it was all I needed to know about the current disruption of big data. Nowadays, big data storage is becoming commoditized, so the best value added is speed (how quickly you can get the answer to your problem) and sophistication (running the best algorithms on the data). The book doesn't mention this, but it might be because of its age - things are moving super quickly in Big Data.

Some of the things that the book does well:
  • Introduces some history about the Big Data problem
  • How it affected some of the siloed technologies like RDBMS
  • How they solve the scalability issue
If you are a manager or someone who has no understanding of the world of Big Data, then I would recommend it.  However, if you are a developer, data scientist, or data wrangler, then this book will be too basic.  The one thing that I highly recommend, if you are interested in this subject, is to attend Strata (or at least purchase the videos).

You can get the book here.

Happy reading,
Marcelo