Imagine that you’ve pulled some data out of a database (choose your flavor) and now want to analyze it in Tableau. You notice, however, that it’s too massive to tinker with in Excel (yes, this actually happened).
So, you think: “Wouldn’t it be awesome to have Tableau generate a shared data source for me if I just point it to a csv and then choose what datatypes are in it?” At that point, you can analyze to your heart’s content. Bonus: it’s repeatable and, if your csv updates, so will your data source.
We can now use this as part of our data pipeline: we write some awesome query and want to share the results with our colleagues. Instead of sending a csv file to the crew, we just send them a Tableau data source.
You will need 2 config/base Tableau files (we’ve included them here):
- XML file with the structure for a TDS
- Basic/simple/dummy TDE file (really, it’s not dumb at all, as TDEs are amazing; rather, it’s just a basic ‘helper’ TDE we’ll use to package with the TDS for the TDSX)
The script (see our github repo for the script):
- Reads your csv file
- Lets you choose what data type you need for the columns; if you want to let Tableau work its magic, just set the ‘Choose Data Type?’ to false and the Extract engine will come to the rescue.
- Updates the XML in the TDS
- Packages the ‘helper’ TDE with the updated TDS
- Publishes to Server and refreshes with the new data and new file
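The first few steps above can be sketched in a handful of lines of Python. This is a minimal illustration, not the script from the repo: the function names are hypothetical, the `column` element and its `name`/`datatype` attributes are an assumed shape for the TDS template, and the simple type guess stands in for the interactive ‘Choose Data Type?’ step. The one firm fact it relies on is that a TDSX is a zip archive bundling the TDS with its extract. Publishing to Server is left out.

```python
import csv
import io
import itertools
import xml.etree.ElementTree as ET
import zipfile


def guess_datatype(values):
    """Guess a type name from sample strings (hypothetical stand-in for the user's choice)."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if values and all_parse(int):
        return "integer"
    if values and all_parse(float):
        return "real"
    return "string"


def read_columns(csv_text, sample_rows=100):
    """Return [(column_name, datatype)] from the header and a sample of rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    sample = list(itertools.islice(reader, sample_rows))
    columns = []
    for i, name in enumerate(header):
        # Ignore blanks and short rows when guessing the type.
        values = [row[i] for row in sample if i < len(row) and row[i] != ""]
        columns.append((name, guess_datatype(values)))
    return columns


def update_tds(template_xml, columns):
    """Replace the <column> children of the template root (assumed layout)."""
    root = ET.fromstring(template_xml)
    for old in root.findall("column"):
        root.remove(old)
    for name, datatype in columns:
        ET.SubElement(root, "column", {"name": f"[{name}]", "datatype": datatype})
    return ET.tostring(root, encoding="unicode")


def package_tdsx(tds_xml, tde_bytes, out_file,
                 tds_name="source.tds", tde_name="Data/helper.tde"):
    """Zip the updated TDS and the helper TDE together; member names are assumptions."""
    with zipfile.ZipFile(out_file, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(tds_name, tds_xml)
        archive.writestr(tde_name, tde_bytes)
```

Here `out_file` can be a path ending in `.tdsx` or any writable binary file object, and `tde_bytes` would be the contents of the helper TDE read from disk.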
Don’t believe us? Watch the video (this is where we’ve set the ‘Choose Data Type’ to True)…
A Practical Example
With the ability to search the web for interesting and varied data sets, we run into csv files a lot. The NYC Taxi data is no exception.
So, we grabbed a month’s worth of data (approx. 2 GB) and pulled it into Tableau (see image below). In 180 seconds, it was a shared data source; that time was for the refresh on 12 million rows, and the script itself took less than 3 seconds.
Another version of this example exported all the months, merged them, and then made a data source in Tableau. Either way, all one needs is a csv file.
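For the merged version, one way to combine monthly exports into a single csv is sketched below. It assumes every file shares the same header row; `merge_csvs` is a hypothetical helper, not part of the script described here.

```python
import csv
import io


def merge_csvs(sources, destination):
    """Concatenate CSV streams that share a header, writing the header once.

    Returns the number of data rows written.
    """
    writer = csv.writer(destination)
    header = None
    rows_written = 0
    for source in sources:
        reader = csv.reader(source)
        current = next(reader, None)
        if current is None:
            continue  # skip empty inputs
        if header is None:
            header = current
            writer.writerow(header)
        elif current != header:
            raise ValueError("CSV headers do not match")
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

With real files, open each input and the output with `newline=""` so the `csv` module controls line endings.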
We’ll update this in future releases to use tables/custom sql as well as make it a more robust pipeline.