Joe on Data » Joe Monti


Welcome

February 24, 2013, 1:15 pm

Welcome to my new blog, Joe on Data. Hopefully the title of the blog explains what it is about. In it I will be writing about data: how to manage it, how to store it, how to process...

View Article

Intel Launches Hadoop Distribution and Project Rhino

February 26, 2013, 11:35 am

Intel is apparently launching its own distribution of Hadoop as well as Project Rhino. Project Rhino is an “open-source effort to enhance security in Hadoop,” which makes Hadoop a more viable option...

View Article

Picking the Right Database for Your Application

February 26, 2013, 10:09 pm

One of the first things you need to decide when building a new application or major feature is how you are going to store and process the data for it, which means picking the right database for the...

View Article

Big Data is CRAP

February 27, 2013, 10:42 pm

I was watching an interesting “Leaders in Big Data” panel and one of the panelists, Charles Fan from VMware, had a great name for Big Data (or what you do with it): Create Replicate Append Process Very...

View Article

Playing With D3.js

March 5, 2013, 10:44 pm

D3.js is a “JavaScript library for manipulating documents based on data.” D3 works a little like jQuery, but is focused on providing a set of tools for binding data to visual components and a rich set...

View Article

Shell Analytics: command line tools for data analysis

December 17, 2013, 8:01 pm

Shell Analytics is the use of command line tools to analyse or manipulate data. Command line tools are invaluable for working with data, specifically files or command line programs which...

View Article

Amazon ElasticMapReduce and the Screen Command

April 7, 2014, 2:15 pm

How many times have you SSH’d into a remote GNU/Linux server and run a long-running command, only to come back to a stuck or disconnected SSH session? GNU Screen to the rescue! GNU Screen is perhaps one...

View Article

Memory Management in Hadoop MapReduce

May 22, 2014, 8:16 pm

If you ever have to write MapReduce jobs or custom UDF or SerDe classes for Hive in Java, you will want to re-use memory as much as possible, meaning as few object and array allocations as possible,...

View Article