Month: May 2016

The Lambda Architecture and Big Data Quality

In my previous post about data quality in the Big Data era, we saw some of the challenges raised by the recently born data operating system that came with Hadoop 2.0 and YARN. In Part 2 of this series, I’d like to explore how this new framework changes the traditional landscape of the data quality dimensions. […]

Career Opportunities in Talend for Big Data: Your Guide to Bagging Top Talend ETL Jobs

With the world becoming more connected and data savvy with every passing year, there’s a rising need for businesses to efficiently manage the trillions of bytes of data that they capture and to gain insights from it. Talend helps businesses do exactly this while boosting developer productivity and reducing time-to-value for ETL data warehouse projects. […]

Talend and “The Data Vault”

In my previous blog “Beyond ‘The Data Vault’” I examined various data storage options and a practical architecture/design for an Enterprise Data Vault Warehouse. As you may have realized by now, I am quite smitten with this innovative data modeling methodology and recommend it to anyone who is developing a ‘Data Lake’ or Data Warehouse […]

Stop Chasing Perfection in Analytics. Here’s Why

A while back I wrote a blog post about another favorite topic of mine, DevOps, in which I discussed the notion of perfection being the enemy of ‘good enough’. After some conversations these last few weeks, I have reaffirmed my stance and broadened it to include everything, especially analytics. The things I hear time and […]

Introduction to Apache Beam

This blog is the first in a series of posts explaining the overarching goal and purpose of the Apache Beam project. In future posts, we will explain how to use Apache Beam to implement data processing jobs. When you have an existing big data platform, the continuous evolution of that platform is important. […]