Andy Burgin

I was going to organise the digital industry, but just went to loads of gigs instead

episode 006 – Updating data using ETL tools

If I look at the definition of “grey areas” they sometimes include tasks that don’t really fit one role in a team. One such task is “here’s a spreadsheet with the stockists data for website X – can we upload these?”.

Now it may be that it’s a complete set of stockists data that matches the existing fields in the database with a stockistid key field which is consistent with what you have. In this case it’s a simple export, upload and test job.

Unfortunately, it isn’t always as simple as that. It might not be a full set of data and just be updates, chances are it’s been dumped from some CRM system and the field definitions/format might not match whats there already. This can be quite a time consuming task.

So in this episode rather than loading data into a database, searching and mapping rows I’m going to try some tools built for the job. I’m going to use 4 ETL (Extract Transform and Load) tools:

Apatar
Talend Open Studio
Rapidminer
Pentaho

So armed with a dump of the northwind database supplier table I’m going to try 3 example updates – update contact info, update supplier address and add a new supplier.

Watch to find out how I got on.

Leave a Reply