In this paper, we describe and discuss opportunities for modernizing official statistics through big data. Big data comes in high volume, at high speed, and in high variety. High volume can lead to greater accuracy and more detail, high speed can lead to more frequent and more timely statistical estimates, and high variety can open up opportunities for statistics in new areas. However, there are also many challenges: uncontrollable changes in the sources threaten continuity and comparability, and the data are often only indirectly related to the phenomena of statistical interest. In addition, big data may be highly volatile and selective: the scope of the population they cover can change from day to day, causing unexplained spikes in the time series. Moreover, individual observations in these datasets often lack the variables needed to link them to other datasets or to population frames, which severely limits the possibilities for correcting selectivity and fluctuations. Finally, with the advance of big data and open data, the risk of disclosing individual data increases, posing new problems for statistical agencies.

Big data can thus be regarded as a nonprobability sample, and the use of these sources in official statistics requires a different approach than the traditional one based on surveys and censuses. The first approach is to accept big data for what it is: an imperfect, but very timely, indicator of developments in society. The second approach is to use formal models to extract information from these data. In recent years, mathematical and applied statisticians have developed many new methods for handling big data. National statistical agencies, however, have traditionally been reluctant to use models, except in special cases such as small area estimation.
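To make the selectivity problem and the model-based approach concrete, the following is a minimal illustrative sketch (not the authors' method) using hypothetical simulated data: a big data source that over-represents part of the population yields a biased naive estimate, and a simple post-stratification adjustment against known population shares of an auxiliary variable reduces that bias. All variable names and numbers are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: an auxiliary variable (e.g. age group) known from a
# population frame, and a target variable correlated with it.
N = 100_000
age_group = rng.choice([0, 1, 2], size=N, p=[0.3, 0.4, 0.3])  # young/middle/old
y = 50 + 10 * age_group + rng.normal(0, 5, size=N)            # target variable

# A "big data" source that covers the population selectively:
# younger units are far more likely to appear in the source.
p_include = np.array([0.60, 0.25, 0.05])[age_group]
in_source = rng.random(N) < p_include

true_mean = y.mean()
naive_mean = y[in_source].mean()          # biased: over-represents the young

# Simple model-based correction (post-stratification): reweight the big data
# source so its age-group composition matches the known population shares.
pop_share = np.bincount(age_group, minlength=3) / N
src_share = np.bincount(age_group[in_source], minlength=3) / in_source.sum()
weights = (pop_share / src_share)[age_group[in_source]]
adjusted_mean = np.average(y[in_source], weights=weights)

print(f"population mean : {true_mean:6.2f}")
print(f"naive big-data  : {naive_mean:6.2f}  (selectivity bias)")
print(f"post-stratified : {adjusted_mean:6.2f}  (bias largely removed)")
```

Note that such a correction is only possible when the big data observations carry an auxiliary variable that can be linked to a population frame; as discussed above, this linkage is often missing in practice.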