Business Intelligence Series
Data Science is a collection of quantitative and qualitative methods,
techniques, algorithms, principles, processes and technologies used to analyze
and process raw and aggregated data to extract the information or knowledge it
contains. Its theoretical basis is rooted in mathematics (mainly statistics),
computer science and domain expertise, though it can include further aspects
related to communication, management, sociology, ecology, cybernetics, and
probably many other fields, as there’s enough space for experimentation and
translation of knowledge from one field to another.
The aim of Data Science is to extract valuable insights from data to support
decision-making and problem-solving and to drive innovation, and it can
probably achieve more in time. Reading between the lines, Data Science sounds
like a superhero that can solve all the problems out there, which frankly is
too good to be true! In theory everything is possible, while in practice there
are many hard limitations! Given any amount of data, the knowledge that can be
obtained from it is limited by many factors: the degree to which the data,
processes and models built reflect reality (and there can be many levels of
approximation), and the degree to which such data can be collected
consistently.
Moreover, even if the theoretical basis seems sound, the data, information or
knowledge that is not available can be the missing link that prevents any
sensible progress toward the goals set in Data Science projects. In some
cases, one might be aware of what's missing, though for a data scientist who
lacks the required domain knowledge, this can be a hard limit! This gap can
probably be bridged with sensemaking, exploration and experimentation
approaches, especially by applying models from other domains, though there are
no guarantees!
AI can help in this direction through its capacity to explore ideas or models
quickly. However, it's questionable how far the models built with AI can be
used if one can't build mechanistic mental models of the processes reflected
in the data. It's like devising an algorithm for winning small amounts at the
lottery: investing more money in the algorithm doesn't automatically imply
greater wins. Even if the performance occasionally improves, it's questionable
how much that improvement can be leveraged on each use.
Statistics has its utility when one studies data in aggregate and can predict
average behavior. It can’t be used to predict the occurrence of individual
events with high precision. Think how hard the prediction of earthquakes or
extreme weather is by just looking at a pile of data reflecting what’s
happening only in a certain zone!
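The difference between predicting average behavior and predicting individual
events can be illustrated with a small simulation. This is only a toy sketch
under stated assumptions, not a model of earthquakes or weather: it assumes
events occur independently with a fixed daily probability, and the names
`P_EVENT`, `N_DAYS`, `simulate_days`, `aggregate_error` and `daily_hit_rate`
are hypothetical, introduced here for illustration.

```python
import random

P_EVENT = 0.02     # hypothetical daily probability of a rare event (assumption)
N_DAYS = 100_000   # number of simulated days (assumption)


def simulate_days(n_days, p_event, seed=42):
    """Return a list of 0/1 flags, one per day; 1 means the event occurred."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_event else 0 for _ in range(n_days)]


def aggregate_error(days, p_event):
    """Relative error of the expected event count versus the observed count."""
    expected = p_event * len(days)
    observed = sum(days)
    return abs(observed - expected) / expected


def daily_hit_rate(days):
    """Share of event days caught by the most accurate constant daily forecast.

    With an event probability below 0.5, the most accurate constant forecast
    is 'no event': it is right on most days, yet it never catches an event.
    """
    forecast = 0  # always predict 'no event'
    event_days = [d for d in days if d == 1]
    hits = sum(1 for d in event_days if d == forecast)
    return hits / len(event_days) if event_days else 0.0


days = simulate_days(N_DAYS, P_EVENT)
print(f"relative error of the aggregate count: {aggregate_error(days, P_EVENT):.3f}")
print(f"share of event days caught day by day: {daily_hit_rate(days):.3f}")
```

Under these assumptions the total number of events over many days is
estimated closely, while knowing the probability alone says nothing about
which particular day an event will fall on.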
In theory, the more data one has from different geographical areas or
organizations, the more robust the models can become. However, no two
geographies and no two organizations are alike: business models, people,
events and other aspects make global models less applicable to local
contexts. Frankly, one has better chances of progress if a model is first
built with a local scope and then extended to a broader scope. Even then,
there can be differences between the behavior or phenomena at the micro and
at the macro level (see the laws of physics).
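The tension between global and local models can be sketched with synthetic
data. The example below is an illustration under invented assumptions, not a
result from real organizations: two hypothetical organizations respond to the
same input with different linear relationships, and a single pooled
least-squares line is compared with a line fitted per organization. All names
and parameter values (`make_org`, `fit_line`, `mse`, the slopes and
intercepts) are made up for the purpose.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(model, xs, ys):
    """Mean squared error of a fitted (intercept, slope) model on the data."""
    intercept, slope = model
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_org(slope, intercept, n, rng):
    """Generate synthetic (xs, ys) for one hypothetical organization."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


rng = random.Random(0)
# Two hypothetical organizations whose outcomes respond differently to x.
org_a = make_org(slope=2.0, intercept=1.0, n=200, rng=rng)
org_b = make_org(slope=-1.0, intercept=8.0, n=200, rng=rng)

# One global model fitted on the pooled data, one local model per organization.
global_model = fit_line(org_a[0] + org_b[0], org_a[1] + org_b[1])
local_a = fit_line(*org_a)
local_b = fit_line(*org_b)

for name, org, local in (("A", org_a, local_a), ("B", org_b, local_b)):
    print(f"org {name}: global MSE = {mse(global_model, *org):.2f}, "
          f"local MSE = {mse(local, *org):.2f}")
```

In this contrived setup the pooled model fits each organization worse than
its own local model, since the least-squares line fitted on one
organization's data minimizes the error on that data by construction. Real
data may behave differently, for example when local samples are too small to
fit reliably.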
This doesn’t mean that Data Science or AI-related knowledge is useless. The
knowledge accumulated by applying various techniques, models and programming
languages in problem-solving can be more valuable than the results obtained!
Experimentation is a must for organizations to innovate and to extend their
knowledge base. It’s also questionable how much of the respective knowledge
can be retained and put to good use. In the end, each organization must
determine this for itself!