Data-driven Science / Research: Explore -> Estimate -> Explain

Data-driven science is becoming increasingly popular.  Here are some of my notes/opinions on the issue.

Traditional Approach: Hypothesis -> Model -> Verification (based on data)
Data-driven Approach: Data -> Model

Personally, I would not quite consider “data -> model” to be science; it is closer to engineering *.  For it to be called science, some knowledge or understanding needs to be extracted from the model, i.e. data -> model -> knowledge.  I would prefer to describe the process as: explore -> estimate -> explain.

explore: mostly just looking at the data for various patterns, e.g. with unsupervised learning.

estimate: constructing a model that can accurately predict/estimate a certain output, e.g. with supervised learning.

explain: examining the obtained model, trying to understand why it works and what knowledge can be extracted from it.
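
As a rough illustration of the three steps, here is a minimal sketch in plain Python. Everything in it is my own assumption for the sake of the example: the synthetic data (a noisy line), the helper names (mean, pearson, fit_line), and the choice of a simple least-squares line as the model. Real projects would use richer exploration and models; the point is only that the last step reads something back out of the fitted model.

```python
import random


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Hypothetical data: a line y = 2x + 1 with Gaussian noise (seeded for repeatability).
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

# explore: look for a pattern in the raw data
r = pearson(xs, ys)
print(f"explore:  correlation between x and y = {r:.3f}")

# estimate: build a model that predicts y from x
slope, intercept = fit_line(xs, ys)
print(f"estimate: y ~ {slope:.2f} * x + {intercept:.2f}")

# explain: read knowledge back out of the fitted model
print(f"explain:  each unit increase in x adds about {slope:.2f} to y")
```

Stopping after the estimate step gives a working predictor; the explain step is what turns the fitted slope into a statement about the data.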


Another potentially large flaw of stopping at data -> model is that the resulting model is sometimes overfitted: it fits the data it was built from but generalizes poorly to new data, so its usefulness is reduced even further.
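
To make the overfitting point concrete, here is a small sketch, again in plain Python and again built on assumptions of my own: synthetic noisy data, and a 1-nearest-neighbour predictor (predict_1nn) chosen only because it memorizes its training set. It is not meant as a recommended model, just as an easy way to see the gap between error on the training data and error on fresh data.

```python
import random


def make_data(rng, n):
    """Hypothetical data: y = 2x + 1 plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys


def predict_1nn(train_x, train_y, x):
    """Return the output of the training point closest to x."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def mse(predicted, actual):
    """Mean squared error between predictions and true values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


rng = random.Random(0)
train_x, train_y = make_data(rng, 50)
test_x, test_y = make_data(rng, 50)  # fresh data the model has never seen

train_error = mse([predict_1nn(train_x, train_y, x) for x in train_x], train_y)
test_error = mse([predict_1nn(train_x, train_y, x) for x in test_x], test_y)

print(f"error on training data: {train_error:.3f}")
print(f"error on held-out data: {test_error:.3f}")
```

The model reproduces its own training points exactly, noise included, so the training error says nothing about how well it will do on new data. Only the held-out error does.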

* Unless new modeling techniques were developed in the process.

About Neil Rubens
