Friday, September 7, 2018

Data Analysis Template

This is just a quick post to share my Jupyter notebook analysis template.  I analyze a lot of different datasets in a short period, so keeping the analysis structure consistent is very helpful.  I'll walk through the sections quickly to share a bit about my process.

Title Section

In the title section, I have a block for any ideas to explore, specific things I intend to do, anything I need to request to be updated in the data, and any notes about the data.  These are all bulleted text boxes.

This section is VERY helpful when working on multiple datasets.  It's easy to forget what you were going to do or what you've already done, and the summary up front helps you pick up where you left off.

Preparation

Next is preparing the data.  No data comes ready for analysis.  Here I have blocks to read in the data, clean the resulting dataframe, and save it to an R data (Rda) object on disk.  The next time I need it, I just load the Rda and skip the cleaning.
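The read, clean, save, reload pattern can be sketched roughly as follows.  The workflow above uses R and an Rda file; this sketch swaps in Python with the standard-library pickle module as a stand-in.  The function names, the cleaning rule, and the file paths are all made up for illustration, and it assumes a simple rectangular CSV.

```python
import csv
import pickle
from pathlib import Path


def clean(rows):
    """Hypothetical cleaning step: trim whitespace, drop rows with empty fields."""
    cleaned = []
    for row in rows:
        row = {key.strip(): (value or "").strip() for key, value in row.items()}
        if all(row.values()):
            cleaned.append(row)
    return cleaned


def load_or_prepare(raw_path, cache_path):
    """Return cleaned rows, reusing the cached copy when one exists."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Later runs: skip reading and cleaning entirely.
        with cache_path.open("rb") as handle:
            return pickle.load(handle)

    # First run: read the raw file, clean it, and save the result to disk.
    with open(raw_path, newline="") as handle:
        rows = clean(list(csv.DictReader(handle)))
    with cache_path.open("wb") as handle:
        pickle.dump(rows, handle)
    return rows
```

Calling something like `load_or_prepare("raw.csv", "clean.pkl")` in the Preparation section pays the cleaning cost once.  If the cleaning rules change, delete the cached file so it gets rebuilt.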

Analysis

The analysis section is basically filled with mini experiments.  Each chunk is one.  As such, it's important that each has a bit of information in comments at the top of it:

  1. A description of the hypothesis being tested or explored.  Something like "looking at the distribution of the periodicity of events".
  2. Once it's done, describe the results.  Yes, the output should speak for itself, but you'll thank past you if you write down what you got from the analysis when you did it.  Something like "it looks like the periodicity is bimodal with one mode representing X and another representing Y."
  3. Add a comment with a UUID.  Seriously.  Every. Single. Block.  If it's something interesting you're going to put it in a document or a blog or something.  You want to be able to track it from beginning to end.  (Ours track from the report, through several drafts of the report, through drafts of the sections, to a figures rmarkdown file that generates all the figures, to an exploratory report where we created the original analysis.)  Seriously.  If you like it then you shoulda put a UUID on it.
  4. Now you can actually write the analysis code.
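The comment header from the steps above might look something like the sketch below, written in Python rather than R.  The helper names `block_header` and `find_uuid`, the comment layout, and the file patterns are assumptions for illustration, not part of the template itself.  The first helper generates a header with a fresh UUID to paste at the top of a block; the second searches a folder for every file that mentions a given UUID, which is one way to trace a figure from a report back to its exploratory block.

```python
import uuid
from pathlib import Path


def block_header(hypothesis, results="TODO: fill in once the block has run"):
    """Build the comment lines that open an analysis block."""
    # Random ID: paste the printed header into the block once and keep that
    # ID from then on, rather than regenerating it.
    block_id = uuid.uuid4()
    return "\n".join([
        f"# Hypothesis: {hypothesis}",
        f"# Results: {results}",
        f"# UUID: {block_id}",
    ])


def find_uuid(block_id, root=".", patterns=("*.ipynb", "*.Rmd", "*.md")):
    """List the files under root whose text mentions block_id."""
    hits = []
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            text = path.read_text(encoding="utf-8", errors="ignore")
            if block_id in text:
                hits.append(path)
    return sorted(hits)


print(block_header("looking at the distribution of the periodicity of events"))
```

The search is a plain substring match, so it only works if the UUID is copied verbatim into every draft, figure file, and report that uses the block.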

Appendixes

This is where I put all of the extra stuff.

Testing

I always have a testing block.  Throughout the analysis, you'll spend a lot of time testing stuff to make it work (or simply looking up things like the dimensions of your data and the column names).  Putting those in a testing block keeps you from coming back later and wondering what that block in your analysis was there for.

Lookups

Sometimes you have big, ugly lookups.  Putting them at the top clogs the Preparation section, so I tend to put them at the bottom.  You'll remember you forgot to run them when your analysis fails.
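As a small illustration, a lookup here is just a table mapping raw codes to readable labels.  The sketch below is in Python, and the `EVENT_LABELS` table, its codes, and the `label_event` helper are all invented for the example.

```python
# Lives in the Lookups appendix; run this cell before the analysis cells.
EVENT_LABELS = {
    "A1": "login",
    "A2": "logout",
    "B7": "password reset",
}


def label_event(code):
    """Translate a raw event code, keeping unknown codes visible."""
    return EVENT_LABELS.get(code, f"unknown ({code})")
```

If an analysis cell calls `label_event` before this cell has been run, Python raises a `NameError`, which is the reminder mentioned above.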

Backup

Really a parking lot for anything you don't want in another section, but don't want to delete.


Ultimately, if I were doing full modeling, I'd probably want a template that follows the process outlined in ModernDive.  However, for someone just getting into analysis, hopefully this helps!
