Friday, September 7, 2018

Data Analysis Template

This is just a quick blog to share my jupyter notebook analysis template.  I analyze a lot of different datasets in a short period, so having the analysis consistent is very helpful.  I'll walk through the sections quickly to share a bit about my process.

Title Section

In the title section, I have a block for any ideas to explore, specific things I intend to do, anything I need to request to be updated in the data, and any notes about the data.  These are all bulleted text boxes.

This section is VERY helpful for working on multiple datasets.  it's easy to forget what you were going to do or what you've done and the summary up front helps get you back in place.

Preparation

next is preparing the data.  No data comes ready for analysis.  Here I have blocks to read in the data, clean the created dataframe, save it to an R data (Rda) object on disk, and then, the next time I need it, I just load the Rda and skip the cleaning.

Analysis

The analysis section is basically filled with mini experiments.  each chuck is one.  As such, it's important that each have a bit of information in comments at the top of it:

  1. A description of the hypothesis being tested or explored.  Something like "looking at the distribution of the periodicity of events".
  2. Once it's done, describe the results.  Yes, the results should describe the results but you'll thank past you if you write down what you got from the analysis when you did it.  Something like "it looks like the periodicity is bimodal with one mode representing X and another representing Y."
  3. Add a comment with a UUID.  Seriously.  Every. Single. Block.  If it's something interesting you're going to put it in a document or a blog or something.  You want to be able to track it from beginning to end.  (Ours track from the report, through several drafts of the report, through drafts of the sections, to a figures rmarkdown file that generates all the figures, to an exploratory report where we created the original analysis.)  Seriously.  If you like it then you shoulda put a UUID on it.
  4. Now you can actually write the analysis code

Appendixes

This is where I put all of the extra stuff.

Testing

I always have a testing block.  Throughout the analysis, you'll spend a lot time testing stuff to make it work, (or simply looking up things like the dimensions of your data and the column names).  Putting those in a testing block keeps you from coming back later and wondering what the block in your analysis was there for.

Lookups

Sometimes you have big, ugly, lookups.  putting them at the top clogs the Preparation section, so I tend to put them at the bottom.  You'll remember you forgot to run them when your analysis fails.

Backup

Really a parking lot for anything you don't want in another section, but don't want to delete.


Ultimately, if I were doing full modeling, I'd probably want a template that follows the process outlined in Modern Dive.  However, for someone just getting into analysis, hopefully this helps!

20 comments:

  1. An all around complex information structure is utilized when considering huge measure of information where as a basic information structure is viewed as enough if the client information is little. ExcelR Data Science Courses

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete
  3. Actually I read it yesterday but I had some thoughts about it and today I wanted to read it again because it is very well written.

    business analytics course

    data analytics courses

    data science interview questions

    data science course in mumbai

    ReplyDelete
  4. Hey, thanks for this great article I really like this post and I love your blog and also Check data science course Data-Analytics course

    ReplyDelete
  5. wonderful article. Very interesting to read this article.I would like to thank you for the efforts you had made for writing this awesome article. This article resolved my all queries.
    Data science Interview Questions
    Data Science Course

    ReplyDelete
  6. Really nice and interesting post. I was looking for this kind of information and enjoyed reading this one. Keep posting. Thanks for sharing.

    data science course

    ReplyDelete
  7. wonderful article. Very interesting to read this article.I would like to thank you for the efforts you had made for writing this awesome article. This article resolved my all queries. keep it up.
    data analytics course in Bangalore

    ReplyDelete
  8. Very interesting to read this article.I would like to thank you for the efforts you had made for writing this awesome article. This article inspired me to read more. keep it up.
    Correlation vs Covariance
    Simple linear regression

    ReplyDelete
  9. Very interesting to read this article.I would like to thank you for the efforts you had made for writing this awesome article. This article inspired me to read more. keep it up.
    Correlation vs Covariance
    Simple linear regression
    data science interview questions

    ReplyDelete
  10. Great post i must say and thanks for the information. Education is definitely a sticky subject. However, is still among the leading topics of our time. I appreciate your post and look forward to more.

    Simple Linear Regression

    Correlation vs Covariance

    ReplyDelete
  11. Very interesting to read this article.I would like to thank you for the efforts you had made for writing this awesome article. This article inspired me to read more. keep it up.
    Correlation vs Covariance
    Simple linear regression
    data science interview questions

    ReplyDelete
  12. Thanks for sharing great information. I like your blog and highly recommend. We also offer best data science training in Hyderabaddata scientist courses

    ReplyDelete
  13. It has fully emerged to crown Singapore's southern shores and undoubtedly placed her on the global map of residential landmarks. I still scored the more points than I ever have in a season for GS. I think you would be hard pressed to find somebody with the same consistency I have had over the years so I am happy with that.
    data analytics courses

    ReplyDelete
  14. Amazing Article ! I would like to thank you for the efforts you had made for writing this awesome article. This article inspired me to read more. keep it up.
    Simple Linear Regression
    Correlation vs covariance
    data science interview questions
    KNN Algorithm
    Logistic Regression explained

    ReplyDelete
  15. Good Post! Thank you so much for sharing this pretty post, it was so good to read and useful to improve my knowledge as updated one, keep blogging.data science training in Hyderabad

    ReplyDelete
  16. I would like to thank you for getting my neurons conspicuous with this brilliant article that you have written which contains every potential points which needs to considered on the given topic. Thanks for chipping in such a brilliant writing!
    Data Science training in Mumbai
    Data Science course in Mumbai
    SAP training in Mumbai

    ReplyDelete
  17. Amazing Article ! I would like to thank you for the efforts you had made for writing this awesome article. This article inspired me to read more. keep it up.
    Correlation vs Covariance
    Simple Linear Regression
    data science interview questions
    KNN Algorithm
    Logistic Regression explained

    ReplyDelete
  18. Very nice blogs!!! i have to learning for lot of information for this sites…Sharing for wonderful information.Thanks for sharing this valuable information to our vision. You have posted a trust worthy blog keep sharing, data sciecne course in hyderabad

    ReplyDelete