At one point, the Big Data trend (sorting and sifting large data sets with new tools to surface meaningful angles on stored information) was an enterprise-only story, but now businesses of all sizes are looking into tools that can help them glean meaningful insights from the data they store. As we’ve noted, the open source Hadoop project has been one of the big drivers of this trend, and has given rise to commercial companies that offer custom Hadoop distributions, support, training and more. Cloudera and Hortonworks are leading the pack among these Hadoop-focused companies.
Front ends for working with Hadoop, which make it easier to sift large data sets, are also appearing. Talend, which offers a number of open source middleware solutions, is out with a new one, and Microsoft is making it easier to work with Hadoop from the Excel spreadsheet.
Talend Open Studio for Big Data, which provides a front end for easily working with Hadoop to mine large data sets, has just been announced and is released under an Apache license. According to a post on Virtual Strategy:
“Talend Open Studio for Big Data is a powerful and versatile open source solution for data integration that dramatically improves the efficiency of integration job design through an easy-to-use graphical development environment. Talend Open Studio for Big Data provides native support for Hadoop Distributed File System (HDFS), Pig, HBase, Sqoop and Hive. By leveraging Hadoop’s MapReduce architecture for highly-distributed data processing, Talend generates native Hadoop code and runs data transformations directly inside Hadoop for maximum scalability. This feature enables organizations to easily combine Hadoop-based processing, with traditional data integration processes, either ETL or ELT-based, for superior overall performance.”
“Thanks to Talend Open Studio for Big Data, users of Hortonworks Data Platform will be able to greatly simplify the deployment of Hadoop. Talend Open Studio for Big Data abstracts the complexity of Hadoop and its ‘interfaces’ (specifically Pig, HBase, Sqoop and Hive) by allowing graphical design of the big data integration jobs, and generating native MapReduce code. It alleviates the need for a deep, technical understanding of MapReduce and the different components of Hadoop. And, equally important, it brings to the table over 450 connectors to ‘the rest’ of the information system – integrating enterprise data into Hadoop.”
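For readers who have not written MapReduce by hand, the sketch below gives a rough idea of the map, shuffle and reduce steps that a graphical tool hides. It is a plain Python illustration only: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example, it does not use Hadoop's API, and it is not the code Talend generates.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in one process (hypothetical helper)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hive queries data in Hadoop"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions run in parallel across many machines and the framework handles the grouping, which is the part that makes hand-written jobs harder than this single-process version suggests.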
Meanwhile, as noted in a GigaOM post, Rob Bearden, CEO of Hortonworks and former COO of JBoss and SpringSource, says that Hadoop “has an opportunity to be bigger than those two companies, as well as open source database MySQL, combined.” Hadoop has become an open source phenomenon.
Hortonworks is also working with Microsoft to link the Excel spreadsheet to Hadoop, according to Computerworld:
“Microsoft is developing a connector that will allow Excel users to download and analyze output from Hadoop, potentially opening the open-source data processing platform to a much wider audience. Microsoft is working on the connector with Hortonworks, a Yahoo spinoff that offers a Hadoop distribution and commercial support services.”
According to Computerworld, the connector will be an ODBC (Open Database Connectivity) driver that interacts with Hadoop through the Hive data warehouse system.
If you or your organization have been interested in working with Hadoop, the tools for doing so are becoming more varied and more approachable. As we noted here, Hadoop skills are highly valued in the tech job market right now, and we have also written about Hortonworks University, which focuses on teaching Hadoop skills. You can find a class near you and register here.