This study uses the 2015-2021 data breach reports of the U.S. Department of Health and Human Services to estimate and model breach sizes for each entity type with a Kernel-Generalized Pareto Distribution (GPD) mixture model, and to estimate the dependence of breach sizes across years with a D-Vine copula. The D-Vine copula can accommodate the complex dependence structures that the breach reports exhibit across all entity types. Before fitting the D-Vine copula, we first model breach sizes and estimate their parameters for each entity type with the kernel-GPD mixture model. This mixture model captures large breaches through the GPD tail and uses a non-parametric kernel density for smaller breaches. After a logarithmic transformation, breach sizes for the Business Associate and Healthcare Provider entity types have a short right tail of Weibull type, whereas the Health Plan type has a heavy right tail of Fréchet type. The model parameters for all three entity types were estimated by maximum likelihood cross-validation. The D-Vine copula estimates indicate positive dependence between breach sizes in different years.
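The kernel-GPD mixture described above can be illustrated with a minimal sketch. It assumes a Gaussian kernel for the body, a fixed threshold `u`, GPD parameters `sigma` and `xi` that are already fitted, and a tail fraction `phi_u` (the share of observations above `u`). Maximum likelihood cross-validation is applied here only to choose the kernel bandwidth from a grid; that scope, the data, and all function names are hypothetical illustrations, not the authors' implementation. The sign of `xi` gives the tail type the abstract refers to: `xi < 0` is a Weibull-type short tail and `xi > 0` is a Fréchet-type heavy tail.

```python
import math

def gauss_kernel(z):
    # Standard normal density used as the kernel (assumed choice).
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def kde(x, data, h):
    # Kernel density estimate at x with bandwidth h.
    return sum(gauss_kernel((x - xi) / h) for xi in data) / (len(data) * h)

def kde_cdf(x, data, h):
    # CDF of the Gaussian KDE, used to renormalize the body below u.
    return sum(0.5 * (1 + math.erf((x - xi) / (h * math.sqrt(2)))) for xi in data) / len(data)

def loo_log_likelihood(data, h):
    # Leave-one-out log-likelihood: each point is scored by the KDE of the others.
    n = len(data)
    total = 0.0
    for i, xi in enumerate(data):
        s = sum(gauss_kernel((xi - xj) / h) for j, xj in enumerate(data) if j != i)
        f = s / ((n - 1) * h)
        if f <= 0:
            return -math.inf
        total += math.log(f)
    return total

def mlcv_bandwidth(data, grid):
    # Maximum likelihood cross-validation: pick the bandwidth with the largest LOO log-likelihood.
    return max(grid, key=lambda h: loo_log_likelihood(data, h))

def gpd_pdf(y, sigma, xi):
    # GPD density for an exceedance y = x - u >= 0.
    if y < 0:
        return 0.0
    if abs(xi) < 1e-12:
        return math.exp(-y / sigma) / sigma  # exponential limit when xi = 0
    t = 1 + xi * y / sigma
    if t <= 0:
        return 0.0  # beyond the finite upper endpoint when xi < 0
    return t ** (-1 / xi - 1) / sigma

def tail_type(xi):
    # Domain of attraction implied by the GPD shape parameter.
    if xi > 0:
        return "Frechet (heavy tail)"
    if xi < 0:
        return "Weibull (short tail)"
    return "Gumbel (exponential tail)"

def mixture_pdf(x, bulk, h, u, sigma, xi, phi_u):
    # Body: KDE truncated at u and scaled to mass 1 - phi_u; tail: GPD scaled to mass phi_u.
    if x <= u:
        return (1 - phi_u) * kde(x, bulk, h) / kde_cdf(u, bulk, h)
    return phi_u * gpd_pdf(x - u, sigma, xi)
```

For example, on hypothetical log-scale breach sizes `bulk`, `mlcv_bandwidth(bulk, grid)` selects `h`, and `mixture_pdf(x, bulk, h, u, 1.0, -0.2, 0.1)` evaluates a density whose tail is of Weibull type. Splitting the density at `u` lets the kernel follow the irregular body of small breaches while the GPD alone governs extrapolation to very large breaches.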
Copyright © 2023