Data-driven science is becoming increasingly popular. Here are some of my notes/opinions on the issue.
Traditional Approach: Hypothesis -> Model -> Verification (based on data)
Data-driven Approach: Data -> Model
Personally, I would not quite consider “data -> model” to be science; it is closer to engineering*. For it to be called science, some knowledge or understanding needs to be extracted from the model, i.e. data -> model -> knowledge. I would prefer to describe it as: explore -> estimate -> explain.
explore: mostly consists of looking at the data for various patterns, i.e. unsupervised learning
estimate: constructing a model that accurately predicts/estimates a certain output, i.e. supervised learning
explain: examining the obtained model and trying to understand why it works and what knowledge can be extracted from it.
Another potentially large flaw of using only data -> model is that the resulting model is sometimes overfitted: it fits the noise in the training data and generalizes poorly to new data, which reduces its usefulness even further.
* Unless new modeling techniques were developed in the process.