Expanding the Tableau Data Pipeline: Auto-creation of a TDE from a CSV

The scenario

Imagine that you’ve pulled some data out of a database (choose your flavor) and now want to analyze it in Tableau. You notice, however, that it’s too massive to tinker with in Excel (yes, this actually happened).

So, you think: “Wouldn’t it be awesome to have Tableau generate a shared data source for me if I just point it to a csv and then choose what datatypes are in it?” At that point, you can analyze to your heart’s content. Bonus: it’s repeatable and, if your csv updates, so will your data source.

We can now use this as part of our data pipeline: we write some awesome query and want to share the results with our colleagues. Instead of sending a csv file to the crew, we just send them a Tableau data source.

The setup

You will need 2 config/base Tableau files (we’ve included them here):

  • XML file with the structure for a TDS
  • Basic/simple/dummy TDE file (really, it’s not dumb at all, as TDEs are amazing; rather, it’s just a basic ‘helper’ TDE we’ll use to package with the TDS for the TDSX)

csv_to_tde_1

 

csv_to_tde_2

The script (see our GitHub repo for the code):

  • Reads your csv file
  • Lets you choose the data type for each column; if you’d rather let Tableau work its magic, just set ‘Choose Data Type?’ to false and the Extract engine will come to the rescue
  • Updates the XML in the TDS
  • Packages the ‘helper’ TDE with the updated TDS into a TDSX
  • Publishes to Server and refreshes with the new data and new file
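
The first few steps can be sketched in Python as follows. This is a minimal illustration, not the script from the repo: the function names, the file names inside the archive, the simplified `<datasource>`/`<column>` layout and the type strings are all assumptions made for the example. It does rely on the fact that a TDSX is a zip archive holding a TDS and its extract. Publishing to Server is left out.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile


def read_header(csv_text):
    """Return the column names from the first row of CSV text."""
    return next(csv.reader(io.StringIO(csv_text)))


def update_tds(tds_xml, columns, datatypes=None):
    """Add one <column> element per CSV column to a TDS-like XML string.

    datatypes maps column name -> type string (hypothetical values here).
    Columns without an entry get no datatype attribute, which stands in for
    "let the extract engine decide".
    """
    root = ET.fromstring(tds_xml)
    for name in columns:
        attrs = {"name": "[{}]".format(name)}
        if datatypes and name in datatypes:
            attrs["datatype"] = datatypes[name]
        ET.SubElement(root, "column", attrs)
    return ET.tostring(root, encoding="unicode")


def package_tdsx(tds_xml, tde_bytes,
                 tds_name="source.tds", tde_name="Data/helper.tde"):
    """Zip the TDS and the helper TDE together; return the archive bytes.

    The entry names are placeholders chosen for this sketch.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(tds_name, tds_xml)
        archive.writestr(tde_name, tde_bytes)
    return buf.getvalue()
```

In practice you would read the header from the real csv, pass the base TDS file’s contents to `update_tds`, and write the bytes returned by `package_tdsx` to a `.tdsx` file before publishing.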

 

csv_to_tde_3

 

 

Don’t believe us? Watch the video (this is where we’ve set the ‘Choose Data Type’ to True)…

 

 

A Practical Example

With the ability to search the web for interesting and varied data sets, we run into CSVs a lot. The NYC Taxi data is no exception.

So, we grabbed a month’s worth of data (approx. 2 GB) and pulled it into Tableau (see image below). In 180 seconds, it was a shared data source. Nearly all of that time was the refresh on 12 million rows; the script itself took less than 3 seconds.

Another version of this example exported all the months, merged them and then made a data source on Tableau. Either way, all one needs is a csv file.
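
Merging the monthly exports is simple enough to sketch. The helper below is a hypothetical example (the name `merge_csv` and the strict header check are our assumptions, not part of the script): it streams rows from each file into one output and writes the header only once, so a multi-gigabyte merge never has to fit in memory.

```python
import csv
import io


def merge_csv(sources, out):
    """Append rows from each open CSV source to out, writing the header once.

    Returns the shared header, or None if every source was empty.
    """
    writer = csv.writer(out, lineterminator="\n")
    header = None
    for src in sources:
        reader = csv.reader(src)
        first = next(reader, None)
        if first is None:
            continue  # skip empty files
        if header is None:
            header = first
            writer.writerow(header)
        elif first != header:
            raise ValueError("column mismatch between files")
        writer.writerows(reader)  # stream the remaining rows
    return header
```

With real files, open each one with `newline=""` as the `csv` module documentation recommends, then point the script at the merged file.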

taxi_data

 

 

We’ll update this in future releases to support tables and custom SQL, and to make it a more robust pipeline.

 

CsvTdeCreator_Master

TdeContent