Abstract: The zero-inflated negative binomial distribution is characterized in this paper through a linear differential equation satisfied by its probability generating function.
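The abstract does not state the differential equation itself. As a rough illustration only, the following sketch derives one such first-order linear equation under a common parametrization that is assumed here, not taken from the paper: zero-inflation probability \pi, negative binomial size r > 0, success probability p, and q = 1 - p.

```latex
% Assumed parametrization (not from the abstract):
%   P(Y = 0) = \pi + (1 - \pi) p^{r}
%   P(Y = k) = (1 - \pi) \binom{k + r - 1}{k} p^{r} q^{k}, \quad k \ge 1, \quad q = 1 - p
\begin{align*}
G(s) &= \mathbb{E}\left[s^{Y}\right]
      = \pi + (1 - \pi)\left(\frac{p}{1 - q s}\right)^{r},
      \qquad |s| < \frac{1}{q}, \\
G'(s) &= (1 - \pi)\, r q\, p^{r} (1 - q s)^{-r-1}
       = \frac{r q}{1 - q s}\,\bigl(G(s) - \pi\bigr), \\
\text{hence}\quad (1 - q s)\, G'(s) &= r q\,\bigl(G(s) - \pi\bigr),
      \qquad G(1) = 1 .
\end{align*}
```

Under these assumptions, solving the equation with the boundary condition G(1) = 1 recovers the generating function above, which is the sense in which such an equation can characterize the distribution. The paper's own equation may be written in a different form.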
Funding: Supported by the Basic Performance Key Project, the Ministry of Science and Technology of the People’s Republic of China (No. 2006FY110300).
Abstract: Objective: Sub-health status has progressively gained more attention from both medical professionals and the public. Treating the number of sub-health symptoms as count data rather than dichotomous data helps to analyze findings in the sub-healthy population completely and accurately. This study aims to compare the goodness of fit of count outcome models to identify the optimum model for sub-health studies. Methods: The study sample was derived from a large-scale population survey on physiological and psychological constants conducted from 2007 to 2011 in 4 provinces and 2 autonomous regions in China. We constructed four count outcome models using SAS: the Poisson model, the negative binomial (NB) model, the zero-inflated Poisson (ZIP) model and the zero-inflated negative binomial (ZINB) model. The number of sub-health symptoms was used as the main outcome measure. The alpha dispersion parameter and the O test were used to identify over-dispersed data, and the Vuong test was used to evaluate the excess of zero counts. The goodness of fit of the regression models was determined by predictive probability curves and likelihood ratio test statistics. Results: Of all 78 307 respondents, 38.53% reported no sub-health symptoms. The mean number of sub-health symptoms was 2.98, and the standard deviation was 3.72. The statistic O in the over-dispersion test was 720.995 (P<0.001); the estimated alpha was 0.618 (95% CI: 0.600-0.636) comparing the ZINB model and the ZIP model; the Vuong test statistic Z was 45.487. These results indicated over-dispersion of the data and excessive zero counts in this sub-health study. The ZINB model had the largest log likelihood (-167 519), the smallest Akaike’s information criterion (335 112) and the smallest Bayesian information criterion (335 455), indicating the best goodness of fit. The predictive probabilities of the ZINB model fitted the observed counts best for most counts. The logit section of the ZINB model analysis showed that age, sex, occupation, smoking, alcohol drinking, ethnicity and obesity were determinants of the presence of sub-health symptoms; the negative binomial section of the ZINB model analysis showed that sex, occupation, smoking, alcohol drinking, ethnicity, marital status and obesity had significant effects on the severity of sub-health. Conclusions: All tests for goodness of fit and the predictive probability curve produced the same finding: the ZINB model was the optimum model for exploring the influencing factors of sub-health symptoms.
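The reported fit statistics can be cross-checked with a few lines of arithmetic. The sketch below uses the standard definitions AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L. The parameter count k = 37 is not stated in the abstract; it is an assumption inferred here from the reported AIC and log likelihood. The Poisson comparison at the end is likewise only an illustration of why the data look over-dispersed and zero-inflated.

```python
import math


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 log L."""
    return k * math.log(n) - 2 * loglik


# Figures reported in the abstract for the ZINB model.
loglik = -167519      # log likelihood (rounded in the abstract)
n = 78307             # number of respondents
# ASSUMPTION: k is not given in the abstract; 37 is inferred from
# AIC = 335112 and log L = -167519, i.e. k = (335112 - 335038) / 2.
k = 37

zinb_aic = aic(loglik, k)
zinb_bic = bic(loglik, k, n)

# Descriptive checks for over-dispersion and excess zeros.
mean, sd = 2.98, 3.72
variance_to_mean = sd ** 2 / mean   # > 1 suggests over-dispersion
poisson_zero = math.exp(-mean)      # P(Y = 0) under Poisson(mean)
observed_zero = 0.3853              # reported share with no symptoms

print(f"AIC = {zinb_aic:.0f}, BIC = {zinb_bic:.0f}")
print(f"variance/mean = {variance_to_mean:.2f}")
print(f"Poisson P(0) = {poisson_zero:.3f} vs observed {observed_zero:.3f}")
```

With the inferred k, both criteria agree with the reported values after rounding, which suggests the three reported statistics are mutually consistent. A plain Poisson model with the same mean would predict far fewer zeros than were observed.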
Funding: Funded by Asia–Pacific Forests Net (APFNET/2010/FPF/001) and the National Natural Science Foundation of China (Grant No. 31400552).
Abstract: The occurrence of lightning-induced forest fires during a time period is count data featuring over-dispersion (i.e., variance larger than the mean) and a high frequency of zero counts. In this study, we used six generalized linear models to examine the relationship between the occurrence of lightning-induced forest fires and meteorological factors in the Northern Daxing'an Mountains of China. The six models were the Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), Poisson hurdle (PH), and negative binomial hurdle (NBH) models. Goodness of fit was compared and tested among the six models using the Akaike information criterion (AIC), sum of squared errors, likelihood ratio test, and Vuong test. The predictive performance of the models was assessed and compared on independent validation data obtained by the data-splitting method. Based on AIC, the ZINB model best fitted the fire occurrence data, followed (in order of increasing AIC) by the NBH, ZIP, NB, PH, and Poisson models. The ZINB model was also best for predicting both zero counts and positive counts (≥1). The two hurdle models (PH and NBH) were better than the ZIP, Poisson, and NB models for predicting positive counts, but worse than these three models for predicting zero counts. Thus, the ZINB model was the first choice for modeling the occurrence of lightning-induced forest fires in this study, which implies that the excess zero counts of lightning-induced fires came from both structural and sampling zeros.
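The distinction between structural and sampling zeros can be made concrete with the ZINB probability mass function. The following is a minimal sketch using only the Python standard library; the parameter names `pi`, `r` and `p` and their values are hypothetical and chosen for illustration, not taken from the study.

```python
import math


def nb_pmf(k: int, r: float, p: float) -> float:
    """Negative binomial pmf: k failures before the r-th success,
    with success probability p (r may be non-integer)."""
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))


def zinb_pmf(k: int, pi: float, r: float, p: float) -> float:
    """ZINB pmf: with probability pi the count is a structural zero,
    otherwise it is drawn from NB(r, p)."""
    base = (1.0 - pi) * nb_pmf(k, r, p)
    return pi + base if k == 0 else base


# Hypothetical parameters, for illustration only.
pi, r, p = 0.3, 1.5, 0.4

structural_zero = pi                          # zeros from the inflation part
sampling_zero = (1.0 - pi) * nb_pmf(0, r, p)  # zeros from the count part
total_mass = sum(zinb_pmf(k, pi, r, p) for k in range(2000))
mean = sum(k * zinb_pmf(k, pi, r, p) for k in range(2000))

print(f"P(structural zero) = {structural_zero:.4f}")
print(f"P(sampling zero)   = {sampling_zero:.4f}")
print(f"P(Y = 0)           = {zinb_pmf(0, pi, r, p):.4f}")
```

In a hurdle model, by contrast, every zero comes from the binary part and the count part is truncated at zero, so there is no sampling-zero term. That difference is one way to read the abstract's conclusion that both kinds of zeros were present.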
Abstract: Road crash prediction models are useful tools in highway safety, given their potential for determining both the frequency of crash occurrence and the degree of crash severity. Crash frequency models predict the number of crashes that would occur on a specific road segment or intersection in a time period, while crash severity models generally explore the relationship between crash injury severity and contributing factors such as driver behavior, vehicle characteristics, roadway geometry, and road-environment conditions. Effective interventions to reduce the crash toll include design of safer infrastructure and incorporation of road safety features into land-use and transportation planning; improvement of vehicle safety features; improvement of post-crash care for victims of road crashes; and improvement of driver behavior, such as setting and enforcing laws relating to key risk factors, and raising public awareness. Despite the great efforts that transportation agencies put into preventive measures, the annual number of traffic crashes has not yet significantly decreased. For instance, 35,092 traffic fatalities were recorded in the US in 2015, an increase of 7.2% compared to the previous year. Given this trend, this paper presents an overview of road crash prediction models used by transportation agencies and researchers to gain a better understanding of the techniques used in predicting road accidents and the risk factors that contribute to crash occurrence.
Abstract: Objectives: This study empirically assesses the impact of changes in women’s characteristics, empowerment, and the availability and quality of health services on women’s decision to use antenatal care (ANC) and the frequency of that use during the period 2000-2008. Study Design: The study is a cross-sectional analytical study using the 2000 and 2008 Egypt Demographic and Health Surveys. Methods: The impact is assessed using zero-inflated negative binomial regression. In addition, factor analysis is used to construct some of the explanatory variables, such as the indicators of women’s empowerment and of the availability and quality of health services. Results: Utilization of antenatal health care services improved greatly from 2000 to 2008. Availability of health services is one of the main determinants of the number of antenatal care visits in 2008. The wealth index and the quality of health services play an important role in raising the level of antenatal care utilization in 2000 and 2008. However, the impact of terminated pregnancy on receiving ANC increased over time. Conclusions: Further research on the determinants of antenatal health care utilization is needed, using more up-to-date measures of women’s empowerment and of the availability and quality of health services. In order to improve the provision of antenatal health care services, it is important to understand barriers to antenatal health care utilization. Therefore, it is advisable to collect information from women about their reasons for not receiving antenatal care.
Funding: This work was partly supported by the National Natural Science Foundation of China (Grant No. 61772460) and the Ten Thousand Talent Program of Zhejiang Province (2018R52039).
Abstract: Crime risk prediction is helpful for urban safety and citizens’ quality of life. However, existing crime studies have focused on coarse-grained prediction and usually fail to capture the dynamics of urban crimes. The key challenge is data sparsity, since 1) not all crimes have been recorded, and 2) crimes usually occur with low frequency. In this paper, we propose an effective framework to predict fine-grained and dynamic crime risks on each road using heterogeneous urban data. First, to address the issue of unreported crimes, we propose a cross-aggregation soft-impute (CASI) method. Then, we use a novel crime risk measurement to capture the crime dynamics from the perspective of influence propagation, taking into consideration both time-varying and location-varying risk propagation. Based on the dynamically calculated crime risks, we design contextual features (i.e., POI distributions, taxi mobility, demographic features) from various urban data sources, and propose a zero-inflated negative binomial regression (ZINBR) model to predict future crime risks on roads. Experiments using real-world data from New York City show that our framework can accurately predict road crime risks and outperforms other baseline methods.
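The abstract does not give the ZINBR specification. As a hedged sketch of what a zero-inflated negative binomial regression typically optimizes, the code below evaluates the log likelihood under assumptions that are not taken from the paper: a log link for the count mean, a logit link for the zero-inflation probability, and the NB2 variance form Var = mu + alpha * mu^2. The function name `zinb_loglik` and all argument names are hypothetical.

```python
import math


def zinb_loglik(y, X, Z, beta, gamma, alpha):
    """Log likelihood of a ZINB regression (illustrative form).

    mu_i = exp(x_i . beta)                  count part, log link
    pi_i = 1 / (1 + exp(-z_i . gamma))      zero-inflation part, logit link
    alpha > 0 is the NB2 dispersion: Var = mu + alpha * mu**2
    """
    r = 1.0 / alpha
    total = 0.0
    for yi, xi, zi in zip(y, X, Z):
        mu = math.exp(sum(a * b for a, b in zip(xi, beta)))
        pi = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(zi, gamma))))
        p = r / (r + mu)  # NB success probability implied by mu and alpha
        log_nb = (math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
                  + r * math.log(p) + yi * math.log1p(-p))
        if yi == 0:
            # Zero: structural zero or a zero drawn from the NB part.
            total += math.log(pi + (1.0 - pi) * math.exp(log_nb))
        else:
            # Positive count: must come from the NB part.
            total += math.log1p(-pi) + log_nb
    return total


# Tiny made-up example: intercept-only design for both parts.
y = [0, 1]
X = [[1.0], [1.0]]
Z = [[1.0], [1.0]]
ll = zinb_loglik(y, X, Z, beta=[0.0], gamma=[0.0], alpha=1.0)
print(f"log likelihood = {ll:.4f}")
```

In practice the coefficients would be estimated by maximizing this function numerically, and the features in `X` and `Z` would be contextual variables such as those the abstract lists. The separate `Z` design matrix reflects the usual choice of letting the zero-inflation part depend on different covariates from the count part.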