Thursday, August 10, 2017

PowerBI vs Tableau vs R

Yesterday at the Nashville Analytics Summit I had the pleasure of demonstrating the strengths, weaknesses, similarities, and differences between Microsoft PowerBI, Tableau, and R.

The Setup

Last year when I spoke at the summit, I provided a rather in-depth review of of the DBIR data workflow.  One thing I noticed is the talk was further along in the data science process from most attendees who were still working in Tableau or even trying to decide what tool to use for their organization.  This year I decided to try and address that gap.

I recruited Kindall (a daily PowerBI user) and Ian (a daily Tableau user) to help me do a bake-off.  Eric, our moderator, would give us all a dataset we'd never seen (and it turned out, in a domain we don't work in) and some questions to answer.  We'd get them at 8:30 in the morning and then spend the day up until our talk at 4:15 analyzing the dataset and answering the questions.  (I got the idea from the fuzzing vs reverse engineering panel at Defcon a few years ago.)

The dataset was about 100,000 rows and 50 or so columns (about half medications given) related to medical stays involving diabetes.  The features were primarily factors of various sorts with a continuous feature for time in the hospital (the main variable of interest).

The Results

I'll skip most of the findings from the data as that wasn't really the point.  Instead I'll focus on the tools.  At a basic level, all three tools can create bar charts very quickly including color and alpha.  Tableau and PowerBI were very similar so I'll start there.

Tableau and PowerBI Similarities

  • Both are dashboard based
  • Both are driven from the mouse, dragging and dropping features into the dashboard
  • Both have a set of visualization types pre-defined that can be used
  • Both allow interactivity out of the box with clicking one chart subsetting others

Tableau and PowerBI Differences:

  • PowerBI is a bit more web-based.  It was easy to move from local to cloud and back.
  • PowerBI has more robust integration with other MS tools and will be familiar to excel users (though the formulas have some differences compared to excel as they are written in DAX).
    PowerBI keeps a history of actions that allow you to go backwards and see how you got where you are.
  • To share a dashboard in PowerBI you simply share a link to it.
  • Finally, PowerBI is pretty easy to use for free until you need to share dashboards.
  • Tableau Is more desktop application based.
  • You can publish dashboards to a server if you have the enterprise version or you can install the Tableau viewer app (however that still requires the receiver install software).  Also, sharing the actual workbook basically removes any security associated with your data.
  • Tableau dashboards can also be exported as PDFs but it is not the primary approach.
  • Tableau allows good organization of data within the GUI to help facilitate building the dashboard.
  • Tableau lacks the history though so there is no good way of telling how you did what you did.

Differences between R and Tableau/PowerBI

Most differences came between R and the other two tools
  • While PowerBI and Tableau are driven by the mouse and interact with a GUI, R is driven from the keyboard and interacts with a command-line.
  • In PowerBI or Tableau, initial investigation basically involves throwing features on the x and y axis and looking at the result.  Both provide the ability to look at the data table behind the dashboard but it's not really part of the workflow.  In R, you normally start at the data with something like `dplyr::glimpse()`, `summary()`, or `str()` which give you some summary statistics about the actual data.
  • In R you can build a dashboard similar to PowerBI or Tableau using the Shiny package, but it is _much_ harder.  Rather than be drag-and-drop, it is very manual.  To share the dashboard, the other person either needs Rstudio to run the app or you need a shiny server. (Shiny servers are free for a single concurrent user but cost money beyond that.)
  • R dashboards allow interaction, but it is again, more laborious.
  • R, however, you can actually do pretty much anything you want.  As an example, we discussed plotting the residuals of a regression.  In R it's a few lines.  In Tableau and PowerBI there was no straight-forward method at all.  The only options were to create a plot with a trend line (but no access to the underlying trend line model).  We discussed building more robust models such as a decision tree for classification.  Kindall found an option for it in PowerBI, but when she clicked it, it was basically just a link to R code.  Finally, the concept of tidyr::gather() (which combines a set of columns into two columns, 1 for the column names, and one for the column values) was both unknown and very appealing to Ian but unavailable in Tableau.)
  • R can install packages.  As far as we could tell, Tableau and PowerBI do not.  That means someone can add Joy plots to R on a whim.
  • In R, making the initial image is harder.  It's at least data plus an aesthetic plus a geom.  To get it to match the basic figure in PowerBI and Tableau is a lot harder, potentially adding theme information, possibly additional geoms for labeling columns, etc.  However, the amount of work to improve a figure in R scales linearly.  After you have matching figures across all three tools, if you wanted to, say, put a plot of points in the background with a lower opacity, that's a single line similar to `geom_jitter(alpha=0.01) + `.  Thats about the same amount of work as to make any other change.  In Tableau or PowerBI, it would be hours of messing with things to make such simple additions or modifications (if it's possible at all).  This is due to R's use of the Grammar of Graphics for figure generation.
  • Using the Grammar of Graphics, R can make incredible reports.  PDFs can be consumer quality. (Figures for the DBIR are mostly created in R with only minor updates to most figures by the layout team.)

Take-Aways

  1. The most important takeaway is that R is appropriate if you verbalize what you want to do, Tableau/PowerBI are appropriate if you can visualize the final outcome but don't know how to get there.  
    •  For example "I want to select subjects over 30, group them by gender, and calculate average age."  That can quickly be translated to R/dplyr verbs and implemented.  Regardless of how many things you want to do, if you can verbalize them, you can probably do them. 
    •  If you can visualize your final figure, you can drag and drop parts until you get to something close to what you want to do.  It's trial and error, but it's quick and  easy.  On the other hand, it only works for fairly straight-forward outcomes.
  2. PowerBI and Tableau are useful to quickly explore data.  R is useful if you want to dig deeper.
  3. Anything you can do in PowerBI and Tableau, you can do in R.  It's just going to be a lot harder.
  4. On the other hand, VERY quickly you hit things that R can do but Tableau or PowerBI cannot (at least directly).  The solution is that PowerBI and Tableau both support running R code internally.  This has it's own issues:
    • It requires a bit of setup.
    • If you learn the easy stuff in PowerBI or Tableau, but try to do the hard stuff in R, it'll be even harder because you don't know how to do the basics in R.
    • That said, once you've done the setup, you can probably just find how someone else has solved the problem in R and copy and paste it into your dashboard
    • Then, after the fact, you can go back through and teach yourself how the code actually did whatever hard thing you had it do.
  5. From a data model perspective, R is like excel while PowerBI and Tableau are like a database.  Let me demonstrate what I mean by example: 
    • When we started analyzing, the first thing the other two did was add a unique key to the data.  The reason is that without a key they aren't able to reference rows individually.  They tend toward bar charts because their tools automatically aggregate data.  They don't even think that they are summing/averaging the groups they are dragging in as it's done automatically
    • For myself using R, each row is inherently an observation.  As such as I only group explicitly and first create visualizations that are scatter plots, density plots, etc given my categorical variables by a single continuous variable.  On the other hand, Tableau and PowerBI make it very simple to link multiple tables and use all columns across all tables in a single figure.  In R, if you want to combine two data frames, you have to manually join them.
  6. Output: Tableau and PowerBI are designed primarily to produce dashboards.  Everything else is tacked on.  R is the opposite.  It's designed to produce reports of various types.  Dashboards and interactivity are tacked on.  That said, there is a lot of work going on to make R more interactive through the production of javascript-based visualizations.  I think are likely to see good dashboards in R with easy modification before we see easy high-quality report generation from PowerBI and Tableau.

Final Thoughts

This was a very good experience.  It was not a competition but an opportunity to see and discuss how the tools differed.  It was a lot of fun (though in some ways felt like a CTF, sitting in the vendor area doing the analysis.  Being under some pressure as you don't want to embarrass your tool (and by extension its other users).  I really wish I'd included Kibana or Splunk as I think they would have been different enough from PowerBI/Tableau or R to provide a unique perspective.  Ultimately I'm hoping it's something that I or the conference can do again as it was a great learning opportunity!

No comments:

Post a Comment