Bill Kleyman

Managing Big-Data With Open-Source Tools

dstrait
7/3/2013 10:48:49 AM
User Rank
Platinum
Control
Are we really controlling the growth or are we surrendering to simple human behavior?
 
I've had this ticking away in the back of my head for years now. I can't get past the idea that, at least when it comes to data, most people are hoarders. 
 
An example: I encountered a person who was collecting clickstream data from a busy, multinational website. (This was back when clickstream was just starting to be a thing, maybe even a bit before that.) This fellow collected the data (the means aren't important), exported it and then burned it onto a CD-R because he didn't have enough storage on his computers. He then put the CD-R in his desk. This was a daily task for him. When we first interacted with him, he'd been doing this for months. He had a desk full of CD-Rs. He had no system for importing that data back into something that could do analysis, or even just perform simple searches. He didn't have a good idea of what metrics people would really be interested in. He had a lot of data he was convinced was useful. He was collecting the data because he could and the costs of storage weren't "too much". He was hoping that someone would come up with a reason to have the data, all the while performing his daily ritual.
 
IIRC, he stopped his project when we looked at the costs of creating a system and standing it up. (That would have been a beastly clustered RDBMS system with a second, identical warm failover system. SAN storage was required. There were chargebacks. All of that means "expensive" and time-consuming, and possibly unnecessary, but that was our standard.)
 
Nowadays, he would chuck that into a big data system running on cheaply scalable, virtualized, open-source software. He might not even need IT support, if he can buy cloud resources on a corporate expense account.


Bill Kleyman
7/2/2013 9:27:35 PM
User Rank
Blogger
Re: Managing Big-Data With Open-Source Tools
@hash.era - Well, I'd imagine there are a few ways to manage big data. In fact, big storage vendors are jumping into the big data game. Vendors like EMC and NetApp already offer direct integration with big data analytics engines.

These big data and BI engines can be resource intensive. Now, from a backup perspective - some of the more refined editions of big data management solutions aren't always free. As mentioned earlier, bringing in a solution that has backup, redundancy and even mirroring will probably cost an enterprise license. 

Bill Kleyman
7/2/2013 11:55:07 AM
User Rank
Blogger
Re: Hadoop: DIY or Commercial
@Cheryl - That's a really cool example! I was just mentioning how more open-source products are revolutionizing how we control and manage data.

MapR already has an edition called M7 NoSQL Edition. 

Bill Kleyman
7/2/2013 11:53:14 AM
User Rank
Blogger
Re: Hadoop: DIY or Commercial
@Michael.Steinhart - That's a great question. Many smaller organizations looking to start out are trying the free versions of these products. Remember, the Apache Hadoop "platform" is now commonly considered to consist of the Hadoop kernel, MapReduce, and the Hadoop Distributed File System (HDFS), along with related projects such as Apache Hive and Apache HBase that are built on top of it.

The product itself can be deployed at no charge. In many cases, it's the integration with other systems that comes with a cost. Furthermore, the ability to tie into other paid products will depend on the edition of the Hadoop platform that's being deployed.

Let's look at a specific example: MapR can be obtained for free or through two paid editions. The paid editions offer more features like instant node recovery, volume-based data management for tables, and snapshots for tables. Paid versions also include support.
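
For anyone new to the MapReduce model mentioned above, here is a rough sketch of the idea in plain Python. It is not Hadoop code and uses no Hadoop API; the sample records and the function names (map_phase, shuffle, reduce_phase, page_views) are made up for illustration, and everything runs in memory on one machine. A real cluster would spread the same three steps across many nodes.

```python
from collections import defaultdict

# Hypothetical clickstream records: (visitor_id, page) pairs.
CLICKS = [
    ("v1", "/home"),
    ("v2", "/home"),
    ("v1", "/pricing"),
    ("v3", "/home"),
    ("v2", "/pricing"),
    ("v3", "/docs"),
]


def map_phase(record):
    """Map step: emit one (page, 1) pair per click."""
    _visitor, page = record
    yield page, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one page."""
    return key, sum(values)


def page_views(records):
    """Run map, shuffle and reduce in sequence and return views per page."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(page_views(CLICKS))
```

The map and reduce functions only ever see one record or one key at a time, which is what lets a framework split the work over many machines.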

Cheryl
7/2/2013 2:18:18 AM
User Rank
Platinum
Re: Hadoop: DIY or Commercial
I wanted to mention one additional tool that I tried called CouchDB. It is also free and something that I was introduced to while I was working with Apple. It was enough to spark my interest in NoSQL, so I keep up with it even now. I always follow tools that relate to data.

Michael.Steinhart
6/30/2013 11:47:57 PM
User Rank
Editor
Hadoop: DIY or Commercial
Thanks for this helpful overview, Bill. I've seen Hadoop available free and also available with value-adds like support, deployment templates, and vertical-focused integration. Which do you recommend for a company that wants to leverage the platform for BI?

hash.era
6/30/2013 10:32:51 AM
User Rank
Platinum
Managing Big-Data With Open-Source Tools
@David: You mean a kind of backup plan, is that it?
