Recreating the Dot Plot/Gantt/Box&Whisker/Bar Thing from the Washington Post

I recently came across a really cool plot in the Washington Post from Sahil Chinoy (@sahilchinoy) showing the various commute times from major metropolitan areas in the US. Couldn’t figure out exactly what the plot should be called, but I knew it was insightful, easily understood, and applicable to a variety of use cases in the world of data. For reference, here’s the link:

I don’t get to do much dashboard dev’ing any more, but I thought it’d be a fun challenge to try to recreate the visualization in Tableau. Here’s the breakdown of that effort.

Using the Sample – Superstore dataset provided in Desktop, I thought I’d illustrate how this kind of viz could be helpful in analyzing multi-dimensional datasets. The first use case that came to mind was looking at Sales per U.S. State per Segment. That meant looking at a max of 3 Segments across all of the states in the data, and I wanted to see how aggregated sales spanned the customer segments in each state. Questions we had the potential to answer with this approach would be things like:

  • Which states had a larger than normal sales spread between Segments?
  • Was there a pattern across the states in the ranking of customer segments?
  • Do large deltas between segments (in each state) simply correlate with gross aggregated sales per state?
  • etc

Let’s start with the finished product:

To get there, you need to create 4 calculated fields.

First, an LOD calc to get SUM(Sales) at the State and Segment granularity:

Next, you need the MIN() and MAX() of that LOD calc:

Finally, you need the difference between the MIN() and MAX() for each state:
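Since the screenshots of those calcs aren’t reproduced here, here’s a sketch of what the four fields might look like (the calc names match the steps below, but the exact formulas are my assumption):

Sales per State and Segment:
{ FIXED [State], [Segment] : SUM([Sales]) }

Min Sales per State and Segment:
{ FIXED [State] : MIN([Sales per State and Segment]) }

Max Sales per State and Segment:
{ FIXED [State] : MAX([Sales per State and Segment]) }

Diff between totals:
[Max Sales per State and Segment] - [Min Sales per State and Segment]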

Now you need to start dragging fields onto your sheet.

  • Put State on Rows
  • Put SUM(Sales) on Columns, and choose Circle as your mark
  • Put (previously created calc) Max Sales per State and Segment on Columns, and choose Gantt Bar as your mark
  • Create a Dual Axis for those two measures, and synchronize those axes
  • On the SUM(Sales) Marks card, drag Segment onto Color, and select your chosen color for the space between each state’s min and max amounts. This is the range per state.
  • On the Max Sales per State and Segment Marks card, drag the Diff between totals calculated measure to Size.
    • Right click on that calculated measure, and choose ‘Edit in Shelf’. Put a minus (-) in front of the measure. Without the negation, the bar starts at each state’s Max amount and extends to the right; negating it flips the bar into the correct position, where it spans the delta between the Min and Max segment amounts per state.
  • From there, just adjust your sizing and colors to your liking


Here’s the workbook if you’re interested in dissecting and recreating:

Getting a Count of Something on the Last Day of Each Month

Recently, I received a request to plot a count of our employees as of the last day of each month. A reasonable request for sure; however, Tableau didn’t offer a quick solution. I could have written the function in SQL, but thought I’d figure it out in Tableau as well.

Here’s how I got there:

My initial employee headcount plot was counting all unique Employee IDs for each month. On the surface, this gives the appearance of a count of employees as of the 1st day of every month. That appearance was deceiving though. In actuality, it was counting every unique Employee ID found in each month, even if they were terminated before the end of the month.

What I needed to do was identify the last day of every month, and then only count the unique Employee IDs on those particular dates, even if I was plotting monthly headcount.

First I created a calculated field that identifies the last day of every month:
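The screenshot of that calc isn’t shown here, but one common way to write it looks something like this (assuming a date field named [Date]; the original formula may differ):

Last Day of Month:
DATE(DATEADD('day', -1, DATEADD('month', 1, DATETRUNC('month', [Date]))))

DATETRUNC takes the date back to the first day of its month, DATEADD moves forward one month, and subtracting one day lands on the last day of the original month.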


Then I needed to create another calculated field that compares the last day of every month with the termination date of each employee. Here’s that:

[Last Day of Month] >= [Termination Date]

I put that calculated field on the Filters shelf, and set the value to ‘False’. This meant that if an employee was terminated prior to the end of the month, we would not include them in the count of employees we are making on the last day of the month. This is the key. This removes all those employees that didn’t stay active until the last day of the month.

If you are not working with Employee Headcounts, then the [Termination Date] field can be replaced with any other kind of date that denotes that an item is no longer part of the pool that should be counted.

Hope this was helpful.

Some links to create EDA Visualizations in PowerBI

Pareto Chart:








Box and Whisker Plots:


Dual Axis (Combo chart):

Small Change, Big Impact – Highlighting Values in a View

Have you ever created a chart and thought to yourself, “Man, I’d really like to have {insert dimension} stand out so it was immediately noticeable”? In my case, I was comparing companies based on dates and a select measure. While dragging and dropping all the necessary fields, I started thinking about how the customer could easily identify which line was their data vs their competitor’s. Sure, making their data a brighter color could work, but is it the best/only option? After playing around with different ideas, I decided that I wanted to make the client’s data line thicker than the others. But how could I make it a reality? Oftentimes, the simplest solution is overlooked. Below is how I decided to present the data.

*data presented is not real


The Data Format:


The Chart:


The Question: How can I make Mountain Dew stand out (other than just color)?

The Solution: I created a Calculated Field (let’s call it “IsCompany”). In the Calculated Field, using a CASE statement, identify the company you want to highlight; all others fall under the “ELSE”:
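The screenshot isn’t reproduced here, but the CASE statement likely resembles the following (assuming the dimension is named [Company]):

CASE [Company]
WHEN "Mountain Dew" THEN "Y"
ELSE "N"
END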


I then drag the IsCompany field onto Size and adjust the ‘N’ & ‘Y’ sizes so that Y is larger (the order should default to alphabetical, but I still adjust it to meet the desired sizing).


I also adjusted the coloring to make it stand out even more:


As you can see, the Client’s data is now immediately the focus. Finally, when placing the sheet into a Dashboard, leave out the IsCompany Size Legend:


One Set of Date Filters to Rule Them All

If you are a Tableau and Salesforce analyst/administrator, this post will likely resonate with you. Salesforce data sets tend to have more than a few date fields in them. That being said, it’s definitely not exclusive to that particular context and use case. Here goes…

The Problem:

You have a data set that contains multiple Date fields. You are creating several views for your dashboard, each of which uses a different date field to identify trends over time. On your dashboard, you don’t want to have several date filters…one for each view. Having multiple date filters on a singular dashboard causes clutter and confusion for your audience. However, since each view uses a different date field, adjusting one of the filters does nothing for the rest of the views that don’t contain that particular date field.

The Question:

How can I get all my views to be controlled by one filter, or in this case, one Begin Date filter and one End Date filter?

The Solution:

Create two parameters and several calculated fields (details of which are below) that allow you to control all of your date fields, and thus, the entirety of your dashboard.

First of all, you need to create two Date parameters, where you manually adjust the Minimum value to match the earliest date value in your entire data set. You’ll want to leave the Maximum value untouched, allowing it to adjust to whatever your audience chooses. Then change the Step Size to “1 Months”.

Secondly, you’ll need to create the following calculated field for each of your views on your dashboard:

[Date Field] >= [Start Date_Universal Filter] AND [Date Field] <= [End Date_Universal Filter]

Note: You need one of these calculated fields for each view. Use the particular date field that pertains to each view as your [Date Field].
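For example, if one view trends on [Created Date] and another on [Close Date] (hypothetical field names), you’d create one boolean field per view:

[Created Date] >= [Start Date_Universal Filter] AND [Created Date] <= [End Date_Universal Filter]

[Close Date] >= [Start Date_Universal Filter] AND [Close Date] <= [End Date_Universal Filter]

Each field then goes on the Filters shelf of its own view, kept to True, so adjusting the two parameters drives every view at once.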

For a step by step overview, here’s a quick how-to video using the Superstore data set for your viewing pleasure…


Thanks for taking a look. Hope it was helpful.


Use a Level of Detail Calculation to Move Your Row Totals Wherever You Want

When creating crosstab-like views in Tableau, it’s a common practice to add the row totals to your view. This allows users to understand the part-to-whole relationship between data points and their categorical total. Here’s a simple example:


This view is easily accomplished by going to the Analysis menu, and Totals > Show Row Grand Totals.

However, sometimes a user may request the ability to see their row totals closer to the left side of the view, and save themselves the effort of scanning and/or scrolling to the right of their screen.

If you use the ‘Show Row Grand Totals’ option, here’s where you run into trouble. When you attempt to move the Grand Totals column to anywhere but the far right side of the view, you quickly realize that it’s not really possible to do so. This calls for a little ingenuity. In this case, we’re going to use a level of detail calculation.

Here’s the formula: { FIXED [Dimension] : SUM([Measure]) }

Pretty straight-forward, but this level of detail calculation will allow us to create a movable Row Grand Totals column. You can position the newly-created column anywhere in the view. Here’s a how-to video, using Desktop 9.0, and Sample-Coffee Chain dataset.
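As a concrete version with that dataset, fixing on State and summing Sales would look like this (field names assumed from the Coffee Chain schema):

{ FIXED [State] : SUM([Sales]) }

Because the calc is fixed at the State level, it returns the same row total regardless of which other dimensions sit to its right, so you can drop the column anywhere in the view.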





Seasonality in Data – Cycle Plots

As the summer heat finally starts to abate, and the leaves ever so slightly begin to turn into different shades, my thoughts turn to seasonality in data sets. In Tableau, there are a few ways to highlight seasonality in your data. One of my favorites is called the Cycle Plot. If you’re a fan of Stephen Few, then you’re likely to be familiar with Cycle Plots already. If not, well, here’s an intro.

Cycle plots allow you to view patterns in your data across a particular component of the date/time in the dataset. For example:

  • How have my sales changed on each weekday over the past six months?
  • What is my site traffic in August for the past five years? And how does that compare to other months?

These are just a couple examples, but you get the picture. If you’re interested in date-oriented patterns in your data, then cycle plots can give you a good sense of where your peaks and valleys are. Here’s a quick how-to in Tableau:


DataOps-IST: Toolmaking and Delivery

“In God we trust; all others must bring data.”

-W. Edwards Deming

We should all have concerns about the modern business world. We are asking our colleagues to do things they are not capable of doing. We are setting unrealistic expectations for individual performance. As data professionals, we provide the material support, and then neglect to provide the requisite skill and training to take advantage of those resources. We ignore evidence that suggests it is much more difficult to master a new skill than we allow ourselves to believe. We allow managers to repeat buzzwords like ‘analytics’, ‘big data’, and ‘business intelligence’ without providing the infrastructure necessary to handle the weight of those terms.

We can do better. At Pluralsight, we are doing better; DataOps-IST enables this.

DataOps is a domain through which we can begin to tackle the problems associated with the waves of data, and the corresponding undue expectations we place upon our users.

There are three primary pillars of DataOps-IST:

In previous posts, we covered the first two pillars. Toolmaking and Delivery is the aspect of DataOps that first provides users with a resource, but along with that, also provides the underpinning of continual monitoring and improvement, which leads to optimal utilization of that resource.

If we are to accomplish the objective of increasing data literacy, analytical skill, and data-supported output throughout the organization, then it is imperative that the Data Team (however it manifests itself in your org) works towards a role that entails less report-making, and more ‘tool’ provisioning. To put it another way, they need to provide the means rather than the ends of the reporting cycle.

And those means include the following:

  • Shared Data Sources that users can access and analyze
  • Plug-and-Play Report Templates that allow for quick and sensible data visualizations
  • Data Models
  • Data Dictionaries that denote agreed upon contexts and definitions of fields, tables, views, etc.

Everything mentioned above, in addition to our previous posts, is only the beginning though. The last component of DataOps is the most important, for it affords us the opportunity of ceaseless evaluation of our previous efforts. It kicks off the invaluable cycle of assessment. How do we determine that the tools and resources we’ve provided our colleagues are of the utmost value and relevance? How do we ensure that our users are leveraging our data to provide the best possible end product?

We study the logs, of course. Logentries provides us the ability to do so. If we are falling short in any particular aspect of the process, our examination of the logs will show us that, even if it’s simply to identify and remove those legacy reports and leftover dashboards that inevitably take up too much space on your server. The monitoring and reevaluation step of DataOps allows us to get back to the data again, to begin the process anew, and ensure that our colleagues are continually equipped to ‘bring the data’ they need in order to do their work in a meaningful way.




DataOps-IST: Creating a Culture of Data Analysts

Today’s rapidly evolving workplace provides a context where our decision makers must be able to leverage their data to make optimal judgments. This is not limited to managers and organizational leaders. On the contrary, we are (or soon will be) all data-driven decision makers in our organizations.

The issue at hand is identifying and building an organizational framework that enables people, especially non-technical staff, to use data to become better at their work, to provide more meaningful outputs, and to boost their own sense of personal worth by giving them the ability to measure and analyze their (and their team’s) efforts. The way to accomplish this is through DataOps-IST.

In our previous post, we looked at the importance of understanding your data and how it’s used in your organization. Acquiring that knowledge is a critical first step in developing a data-centered environment. This knowledge underpins the next pillar of DataOps, and that is the social aspect.

Developing the social dimension of an analytics culture is indispensable. Without it, an organization will be left with regrettable and failed technology initiatives, resulting in sunk costs in software, equipment, and other tools. The social dimension is THE forgotten piece of the puzzle. The one that all managers pay lip service to, yet fail to truly develop or understand.

Perhaps the biggest obstacle standing before an organization that hopes to build a successful analytics and data-oriented culture is the lack of Data Literacy, which raises the question: how do you overcome it?

While there is certainly a multitude of ideas in the tech sphere regarding how we can improve the ability of our people to use data, what’s usually missing from those prescriptions is an understanding of the psychological and social aspects of getting people to engage with their data. Here’s how Pluralsight goes about fostering a sense of data-ownership in everyone:

            Make your data available to everyone in your organization

Allow your people to see the data that affects their work. Allow them to engage with the granularity that comprises the larger processes they participate in. Doing so will make them more informed, more skilled, and more enthusiastic about the ways they can improve the organization.

            Make analytical tools available to everyone

Data doesn’t belong to IT. It doesn’t belong to management. It doesn’t belong to your analysts. It belongs to everyone! Thus, you need to provide tools that promote your data. Those tools need to let your people create their own analyses and visualizations, and then allow them to freely share their data discoveries amongst their colleagues. (For this, Pluralsight uses Tableau and Logentries.)

            Remove the barriers between technical and non-technical staff

It is one thing to figuratively do this, but at Pluralsight, we literally remove physical barriers. This fosters an environment of increased communication and collaboration.

            Whenever possible, remove technical jargon from discussions, meetings, training, and documentation

Non-technical staff are reticent to engage with their data, and discouraged from doing so, because they, quite literally, do not know how to talk about it. Question: What percentage of people in your organization know what a data warehouse is? Or an ETL job? Or Big Data? Or SQL? Or Data Science? Or even analytics itself? On the other hand, what percentage of people in your organization are affected by these ideas and products? The latter number is obviously much higher than the former. These terms are ubiquitous in the tech realm, yet they are not nearly so in the rest of the business world. If we truly want our people to take ownership of their data, to analyze it, and to make decisions based on it, then we need to be able to speak differently to each other about it.

            Encourage all staff to engage in opportunities to develop their data skills

You must continually assess and improve the data skills of your employees. Does your staff know the basics in data manipulation, statistics, and data visualization? In order to achieve a viable ROI on analytics tools (such as Tableau), users must be able to successfully leverage the features that a tool provides. For example, we routinely provide training (live and archived video) to all staff. We then provide shared data sources that everyone in the company can access and analyze. Transparency of data goes hand-in-hand with the development of a strong data-centered culture.

            Recognize and reinforce any and all data curiosity

Every time an employee says something like, “I wonder how many times X happened before Y occurred,” we recognize this as an opportunity for data exploration and analysis, as well as an opportunity to encourage subsequent and ongoing research.

            Do not rely on an “Intuitive UI” to engage your people

System-based enhancements and optimizations are prerequisites for DataOps and data literacy, not a panacea.

            Leadership buy-in and fluency

From the CEO to our front-line managers, our leadership uses data to support and validate their decision making processes. When there is data to be had, they believe in discovering the truth, not creating it based on their intuition or hunches.

The so-called analytics skills gap is a very real thing. We have endless data, and not nearly enough skilled people to create value from it. Employing DataOps strategies fills this gap by fostering an environment of social engagement with data. The solution to the skills gap problem will not be found in traditional education, standard BI strategies, or from systems themselves. Instead, it will be found when we encourage the data-curious among us to take ownership of their data, improve their own skills, and become analysts themselves.


This post was originally published in May 2015.