Abstract: As the application software of meteorological satellite ground systems grows more varied, providing appropriate hardware resources and improving software efficiency are receiving increasing attention. This paper proposes a software classification method based on software operating characteristics. The method uses run-time resource consumption to describe how the software runs. First, principal component analysis (PCA) is used to reduce the dimension of the software running-feature data and to interpret the characteristic information. Then a modified K-means algorithm is used to classify the meteorological data processing software. Finally, the clustering is combined with the PCA results to explain the significance of the operating characteristics of each software class. The classification serves as a basis for optimizing the allocation of hardware resources to software and improving the efficiency of software operation.
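The two-stage pipeline described in this abstract (PCA for dimension reduction, then K-means on the component scores) can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not say how its K-means was modified, so plain Lloyd iterations with a deterministic farthest-first seeding are swapped in here, and the four resource features (CPU, memory, disk I/O, network) and their values are hypothetical.

```python
import numpy as np

def pca_project(X, n_components):
    """Standardize the columns, then project onto the leading principal axes."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Z @ Vt[:n_components].T, Vt[:n_components], explained[:n_components]

def farthest_first_seeds(points, k):
    """Deterministic seeding (an assumption, chosen for reproducibility)."""
    from_mean = np.linalg.norm(points - points.mean(axis=0), axis=1)
    centers = [points[np.argmax(from_mean)]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    return np.array(centers)

def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations; stands in for the unspecified modified K-means."""
    centers = farthest_first_seeds(points, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical run-time features: CPU %, memory %, disk I/O %, network %.
rng = np.random.default_rng(42)
cpu_bound = rng.normal([90, 20, 5, 5], 3, size=(20, 4))
io_bound = rng.normal([20, 70, 80, 60], 3, size=(20, 4))
X = np.vstack([cpu_bound, io_bound])

scores, loadings, explained = pca_project(X, n_components=2)
labels, centers = kmeans(scores, k=2)
print("variance share of PC1, PC2:", np.round(explained, 3))
print("cluster sizes:", np.bincount(labels))
```

The rows of `loadings` show how strongly each resource feature contributes to each component, which is the kind of information the abstract uses to interpret what each software class has in common.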
Funding: funded by the National Natural Science Foundation of China (42174131) and the Strategic Cooperation Technology Projects of CNPC and CUPB (ZLZX2020-03).
Abstract: This research proposes an integrated classification method based on principal component analysis, simulated annealing genetic algorithm and fuzzy cluster means (PCA-SAGA-FCM) for the unsupervised classification of tight sandstone reservoirs that lack prior information and core experiments. A variety of evaluation parameters were selected, including lithology, poro-permeability quality, engineering quality, and pore structure characteristic parameters. PCA was used to reduce the dimension of the evaluation parameters, and the low-dimensional data were used as input. The unsupervised classification of the tight sandstone reservoir was carried out by SAGA-FCM, and the characteristics of the reservoir in the different categories were analyzed and compared with the lithological profiles. The results on numerical simulation and actual logging data show that: 1) compared with the FCM algorithm, SAGA-FCM has stronger stability and higher accuracy; 2) the proposed method can cluster the reservoir flexibly and effectively according to the degree of membership; 3) the results of the integrated reservoir classification match the lithologic profile well, which demonstrates the reliability of the classification method.
Abstract: After many years of research, seismologists in China have proposed about 80 earthquake prediction factors that reflect precursor information of earthquakes. How to concentrate the information carried by these 80 factors, and how to choose the main factors so as to predict earthquakes precisely, has become one of the topics in seismology. The principal component-discrimination model consists of principal component analysis, correlation analysis, a weighted method of principal factor coefficients, and Mahalanobis distance discrimination analysis. The model combines maximization of the earthquake prediction factor information with the weighted method of principal factor coefficients and correlation analysis to choose the prediction variables, and then applies Mahalanobis distance discrimination to establish the earthquake prediction discrimination model. Applied to earthquake data from Northern China, the model obtained good prediction results.
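The final step of this model, Mahalanobis distance discrimination, assigns a sample to the class whose mean is nearest in Mahalanobis distance. The sketch below illustrates that rule only. The pooled covariance matrix, the two-class setup, the class labels and the synthetic data are all assumptions made for illustration; the abstract does not give these details.

```python
import numpy as np

def fit_classes(X, y):
    """Estimate per-class means and the inverse of a pooled covariance matrix."""
    classes = np.unique(y)
    means = {int(c): X[y == c].mean(axis=0) for c in classes}
    centered = np.vstack([X[y == c] - means[int(c)] for c in classes])
    pooled_cov = centered.T @ centered / (len(X) - len(classes))
    return means, np.linalg.inv(pooled_cov)

def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance from x to a class mean."""
    d = x - mean
    return float(d @ cov_inv @ d)

def classify(x, means, cov_inv):
    """Pick the class whose mean is closest in Mahalanobis distance."""
    return min(means, key=lambda c: mahalanobis_sq(x, means[c], cov_inv))

# Synthetic two-variable example; labels 0 and 1 are hypothetical classes.
rng = np.random.default_rng(0)
class0 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))
class1 = rng.normal([4.0, 4.0], 1.0, size=(50, 2))
X = np.vstack([class0, class1])
y = np.array([0] * 50 + [1] * 50)

means, cov_inv = fit_classes(X, y)
print(classify(np.array([0.2, -0.1]), means, cov_inv))
print(classify(np.array([3.8, 4.1]), means, cov_inv))
```

In a pipeline like the one in the abstract, `X` would hold the selected prediction variables (or principal component scores) rather than raw synthetic values.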
Abstract: The weather in Nagano Prefecture, Japan, can be roughly classified into four types according to principal component analysis and k-means clustering. We predicted the extreme values of the maximum daily and hourly precipitation in Nagano Prefecture using extreme value theory. For the maximum daily precipitation, the values of ξ in Matsumoto, Karuizawa, Sugadaira, and Saku were positive; therefore, the distribution has no upper bound and tends to take large values, so caution is required. The values of ξ in Nagano, Kisofukushima, and Minamishinano were determined to be zero; therefore, there is no upper limit, and although the probability of obtaining a large value is low, caution is still required. We predicted the maximum return levels for return periods of 10, 20, 50, and 100 years, along with the respective 95% confidence intervals, in Nagano, Matsumoto, Karuizawa, Sugadaira, Saku, Kisofukushima, and Minamishinano. In Matsumoto, the 100-year return level was 182 mm, with a 95% CI [129, 236]. In Minamishinano, the 100-year return level was 285 mm, with a 95% CI [173, 398]. The 100-year return levels for the maximum daily rainfall were 285, 271, and 271 mm in Minamishinano, Saku, and Karuizawa, respectively, where the changes in the daily maximum rainfall were larger than those at other points. Because these values are large, caution is required during heavy rainfall. The 100-year return levels for the maximum daily and hourly precipitation were similar in Karuizawa and Saku. In Sugadaira, the 100-year return level for the maximum hourly rainfall (107.2 mm) was larger than that for the maximum daily rainfall. Hence, it is necessary to be careful about short-term rainfall events.
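The role of the shape parameter ξ in the return levels quoted above can be made concrete with the standard return-level formula for the generalized extreme value (GEV) distribution. The sketch below assumes the abstract's ξ is the GEV shape parameter fitted to block maxima; the location, scale and shape values used are hypothetical and are not the paper's fitted parameters.

```python
import math

def gev_return_level(mu, sigma, xi, period):
    """Level exceeded on average once every `period` blocks (e.g. years).

    mu, sigma, xi are the GEV location, scale and shape parameters.
    """
    if period <= 1:
        raise ValueError("period must be greater than 1")
    y = -math.log(1.0 - 1.0 / period)
    if abs(xi) < 1e-9:
        # xi = 0 is the Gumbel case: unbounded, but with a light upper tail.
        return mu - sigma * math.log(y)
    # xi > 0 gives a heavy upper tail, so return levels grow faster with period.
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)

# Hypothetical parameters in mm, for illustration only.
mu, sigma = 60.0, 15.0
for period in (10, 20, 50, 100):
    heavy = gev_return_level(mu, sigma, 0.1, period)
    gumbel = gev_return_level(mu, sigma, 0.0, period)
    print(f"{period:>3}-year level: xi=0.1 -> {heavy:.1f} mm, xi=0 -> {gumbel:.1f} mm")
```

With the same location and scale, a positive ξ yields a larger long-period return level than ξ = 0, which matches the abstract's distinction between stations with positive ξ and stations with ξ equal to zero.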
Abstract: Investigating the genetic diversity of geographically distant wheat genotypes is a useful approach in wheat breeding for developing efficient crop varieties. This article presents multivariate cluster and principal component analyses (PCA) of some yield traits of wheat, such as thousand-kernel weight (TKW), grain number, grain yield and plant height. Based on the results, an evaluation of economically valuable attributes by eigenvalues made it possible to determine the components that contribute significantly to the yield of common wheat genotypes. Twenty-five genotypes were grouped into four clusters on the basis of average linkage. The PCA showed four principal components (PC) with eigenvalues > 1, explaining approximately 90.8% of the total variability. According to the PC analysis, the eigenvalues were greatest for PC-1 (4.33), followed by PC-2 (1.86) and PC-3 (1.01). The cluster analysis classified the 25 accessions into four diverse groups. Averages, standard deviations and variances for the clusters based on morpho-physiological traits showed that the average values for grain yield (742.2), biomass (1756.7), grains per square meter (18,373.7), and grains per spike (45.3) were highest in cluster C. Cluster D exhibited the maximum thousand-kernel weight (TKW) (46.6).
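Retaining the principal components whose eigenvalues exceed 1, as reported above, is commonly called the Kaiser criterion. The sketch below shows how such eigenvalues and variance shares are computed from a correlation matrix. The data are synthetic (25 rows standing in for genotypes, 6 hypothetical traits), so the printed numbers bear no relation to the paper's results.

```python
import numpy as np

def kaiser_components(X):
    """Eigenvalues of the correlation matrix, variance shares, and the
    number of components with eigenvalue > 1."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh is ascending; reverse it
    share = eigvals / eigvals.sum()
    n_keep = int(np.sum(eigvals > 1.0))
    return eigvals, share, n_keep

# Synthetic traits driven by two latent factors plus noise (illustration only).
rng = np.random.default_rng(1)
latent = rng.normal(size=(25, 2))
mix = rng.normal(size=(2, 6))
traits = latent @ mix + 0.3 * rng.normal(size=(25, 6))

eigvals, share, n_keep = kaiser_components(traits)
print("eigenvalues:", np.round(eigvals, 2))
print("components kept:", n_keep)
print("variance explained by kept components:", round(float(share[:n_keep].sum()), 3))
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue above 1 means the component carries more variance than any single standardized trait.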
Abstract: Clustering is an important unsupervised classification method that divides data into different groups based on some similarity metric. K-means has become an increasingly popular clustering method and is widely used in different applications. The centroid initialization strategy is the key step in K-means clustering. In general, K-means has three efficient initialization strategies to improve its performance: Random, K-means++ and PCA-based K-means. In this paper, we design an experiment to evaluate these three strategies on the UCI ML hand-written digits dataset. The experimental results show that the three initialization strategies find almost identical cluster centroids and produce almost the same clustering results, but the PCA-based K-means strategy significantly improves running time and is faster than the other two strategies.
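Two of the three initialization strategies named above can be sketched directly: random seeding picks k data points uniformly, while K-means++ picks each further seed with probability proportional to its squared distance from the nearest seed already chosen. The PCA-based strategy is left out because the abstract does not define it. The function names and the synthetic three-blob data are assumptions for illustration, not the paper's experiment on the digits dataset.

```python
import numpy as np

def random_init(X, k, rng):
    """Random strategy: k distinct data points chosen uniformly."""
    return X[rng.choice(len(X), size=k, replace=False)]

def kmeans_pp_init(X, k, rng):
    """K-means++ strategy: spread seeds out using squared-distance weights."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        diffs = X[:, None, :] - np.array(centers)[None, :, :]
        # Squared distance from each point to its nearest chosen seed.
        d2 = (diffs ** 2).sum(axis=2).min(axis=1)
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)

# Synthetic data: three blobs in two dimensions.
rng = np.random.default_rng(7)
blobs = [rng.normal(c, 0.3, size=(30, 2)) for c in ([0, 0], [10, 10], [20, 0])]
X = np.vstack(blobs)

print("random seeds:\n", np.round(random_init(X, 3, rng), 2))
print("k-means++ seeds:\n", np.round(kmeans_pp_init(X, 3, rng), 2))
```

Either set of seeds would then be handed to the usual assign-and-update iterations; the strategies differ only in where those iterations start.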
Abstract: Water-borne ailments are a serious public health concern in the Gilgit-Baltistan (GB) region of Pakistan. The pollution load on the glacio-fluvial streams and surface water resources of the Chapurson Valley in the Hunza Nagar area of GB is increasing as a result of anthropogenic activities and tourism. The present study focuses on the public health quality of the drinking water of the Chapurson Valley. It addresses the fundamental drinking water quality criteria in order to understand the state of public health in the valley. To ascertain the current status of physico-chemical, metal, and bacteriological parameters, 25 water samples were collected through a deterministic sampling strategy and examined accordingly. The physico-chemical parameters of the water samples were found to meet the World Health Organization (WHO) guidelines for drinking water. The water samples showed a pattern of mean metal concentrations in the order Arsenic (As) > Lead (Pb) > Iron (Fe) > Zinc (Zn) > Copper (Cu) > Magnesium (Mg) > Calcium (Ca). The As, Cu, Zn, Ca and Mg concentrations were within the WHO guideline ranges. However, Pb and Fe were present at much higher concentrations than the WHO guidelines recommend. Similarly, the bacteriological analysis indicates that the water samples are heavily contaminated with organisms of public health importance: total coliforms (TCC), total faecal coliforms (TFC) and total faecal streptococci (TFS) all exceed 3 MPN/100 mL. Principal component analysis (PCA) revealed three principal components, accounting for 48.44% of the total variance. The PCA showed bacteriological parameters to be the main determinants of water quality. The dendrogram of the cluster analysis using Ward's method confirmed the same traits of the sampling locations that were found to be contaminated in the geospatial analysis using the Inverse Distance Weighting (IDW) method. Based on these findings, it is most likely that anthropogenic activities, essentially tourism, produce the pollution load from upstream channels. Metals may be released into surface water and groundwater from a few underlying sources as a result of weathering and erosion. This study suggests that the valley's water resources are more susceptible to bacteriological contamination, and that no water treatment facilities or protective measures have been put in place to counter the pollution load. People are drinking the contaminated water without questioning its quality. It is recommended that the water resources of the valley be monitored using a standard protocol, so as to protect public health and to safeguard sustainable tourism in the valley.
Abstract: In order to monitor malt quality in the malting industry despite yearly variations in barley quality, 394 barley samples were analysed using conventional analyses (moisture, protein and β-glucan content) and mid-infrared Fourier transform spectroscopy (FT-IR). The experimental dataset included barley from three harvest years, two barley species, 77 barley varieties, and two-row and six-row barley, from 16 cultivation sites. For each sample, the malt quality indices were also assessed according to European Brewing Convention (EBC) standards. Principal component analysis (PCA) was carried out on mean-centred, normalized and derivative spectra using spectral bands 200 cm⁻¹ wide. The most informative spectral bands were observed in the 800-1,000 cm⁻¹ and 1,000-1,200 cm⁻¹ ranges. PCA revealed that barley harvested in 2010 and in 2011 had bands that were very close together, while the 2009 harvest clearly differed in quality. PCA made it possible to distinguish the two species and confirmed that two-row winter barley quality was closer to two-row spring barley quality than to six-row winter barley quality. The results indicate that mid-infrared spectrometry (MIR) could be a very useful and rapid analytical tool for assessing barley quality.
Abstract: A diversity of socio-economic and cultural factors contributes to the maintenance of, and changes in, people's dietary patterns. People around the world have therefore adopted different types of dietary patterns for their survival. The aim of this study was to investigate the most relevant factors influencing human dietary patterns. The sample, selected by stratified sampling, consists of 390 families residing around the Abatenna estate, Bandarawela municipal council, Sri Lanka. Principal component analysis and correlation analysis were employed to identify the most relevant factors affecting human dietary patterns. The results indicate that socio-economic conditions, monthly income, number of children in a family, dietary patterns and weight-related behaviors are highly correlated with each other. These findings suggest that education and awareness programs on nutrition should target low-income groups to enhance their knowledge of dietary patterns.
Abstract: Many conceptual frameworks exist to date for categorizing nations by development status. The Human Development Index (HDI), published by the United Nations Development Programme, is widely accepted and used by academics, politicians, and donor organizations. However, although the HDI has gone through many revisions since its formulation in 1990, even the current version of the index formulation, published in 2016, needs further research to better understand it and to fill gaps in the knowledge base, which could enhance the index formulation and help direct attention, for example in the release of funds. Therefore, in this paper, principal component analysis and the K-means clustering algorithm are applied to the life expectancy index (LEI), education index (EI), and income index (II) to categorize and rank the UN member states, using R, an open-source, extensible programming language for statistical computing and graphics. The results show that the first principal component (PCA-1) explains more than 85% of the total variance (i.e., of the sum of the eigenvalues). Moreover, the proportion explained by PCA-1 increases from year to year, though the increase is not significant, while the proportions explained by PCA-2 and PCA-3 decrease with time. Therefore, the loss of information incurred by choosing PCA-1 alone to represent the explanatory variables (LEI, EI, and II) may diminish with time if this increasing trend continues. The correlation between EI and PCA-1 also increases with time, although the magnitude of the increase is not very significant; the same trend is observed for II. In contrast, the correlation between PCA-1 and LEI decreases with time. These findings imply that the contributions of EI and II to PCA-1 increase with time, while the contribution of LEI decreases. In addition, according to the Hopkins statistic, the clusterability of the information conveyed by PCA-1 alone is far better than that of the full set of PCA scores (PCA-1, PCA-2, and PCA-3) or of the explanatory variables. Therefore, the case for choosing PCA-1 to represent the explanatory variables is becoming stronger.
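The workflow described in this abstract (PCA on three indices, then K-means on the first component) can be illustrated with a minimal sketch. The study itself used R; the Python below is only an illustration under stated assumptions: the data are synthetic stand-ins for LEI, EI, and II, the number of clusters (`k=4`) is a guess rather than a value taken from the paper, and the helper names `pca_explained_variance` and `kmeans_1d` are hypothetical.

```python
import numpy as np

def pca_explained_variance(X):
    """PCA on column-standardized data; returns (variance ratios, scores)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # eigh returns eigenvalues in ascending order, so reverse them.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Each ratio is that component's share of the total eigenvalue sum.
    return eigvals / eigvals.sum(), Z @ eigvecs

def kmeans_1d(values, k, n_iter=100):
    """Plain Lloyd's K-means on a 1-D array, initialised at quantiles."""
    centers = np.quantile(values, (np.arange(k) + 0.5) / k)
    labels = np.zeros(len(values), dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centers[j] = values[labels == j].mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in data (assumption): three correlated indices in [0, 1]
# for 190 hypothetical countries. These are NOT the UNDP figures.
rng = np.random.default_rng(0)
latent = rng.uniform(0.3, 0.95, size=190)
X = np.column_stack([
    np.clip(latent + rng.normal(0, 0.05, size=190), 0, 1) for _ in range(3)
])

ratios, scores = pca_explained_variance(X)
labels, centers = kmeans_1d(scores[:, 0], k=4)  # cluster on PCA-1 only
print("explained variance ratios:", np.round(ratios, 3))
print("cluster sizes:", np.bincount(labels, minlength=4))
```

Clustering on the first component alone mirrors the abstract's argument that PCA-1 carries most of the information; whether that holds for real data depends on how strongly the three indices are correlated.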
Funding: Supported by the National Basic Research Program of China (2006CB403406), the National Natural Science Foundation of China (51079154), and the National High-Tech Research & Development Program of China (2011AA100502).
Abstract: In this study, principal component analysis (PCA) and geographically weighted regression (GWR) are combined to estimate the spatial distribution of the water requirement of winter wheat in North China, taking into account the effects of macro- and micro-topographic factors as well as meteorological factors on the crop water requirement. The spatial distribution of the water requirement of winter wheat in North China and its formation are analyzed based on the spatial variation of the main affecting factors and the regression coefficients. The findings reveal that collinearity can be effectively removed when PCA is applied to all of the affecting factors. The GWR regression coefficients display strong spatial variability, which better explains the spatial differences in the effect of the affecting factors on the crop water requirement. According to the evaluation indices, the proposed method performs better than the widely used Kriging method. In addition, it clearly shows the effect of the affecting factors on the crop water requirement at different spatial locations and provides more detailed information on regions where those factors change abruptly. In sum, the method is a useful reference for estimating regional crop water requirement.
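The combination described above can be sketched as follows: PCA first turns collinear predictors into uncorrelated scores, and GWR then fits a separate weighted least-squares regression at each location. This is a minimal illustration and not the authors' implementation. The Gaussian kernel, the fixed `bandwidth`, the synthetic data, and the function name `gwr_coefficients` are all assumptions made for the example.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Local weighted least squares at every location (Gaussian kernel).

    Returns an (n, p + 1) array: intercept and slopes for each location.
    """
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)  # nearer points weigh more
        XtW = Xd.T * w                           # equals Xd.T @ diag(w)
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return betas

# Synthetic data (assumption): 200 sites in a unit square with four
# strongly collinear predictors built from two latent factors.
rng = np.random.default_rng(1)
n = 200
coords = rng.uniform(0, 1, size=(n, 2))
latent = rng.normal(size=(n, 2))
F = np.column_stack([
    latent[:, 0], latent[:, 0] + 0.01 * rng.normal(size=n),
    latent[:, 1], latent[:, 1] + 0.01 * rng.normal(size=n),
])

# PCA via SVD of the standardized predictors; keep two components.
Z = (F - F.mean(axis=0)) / F.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt[:2].T  # mutually uncorrelated, so collinearity is removed

# The response depends on the first component with an effect that grows
# from west (x = 0) to east (x = 1), so the true slope runs from 1 to 3.
y = (1.0 + 2.0 * coords[:, 0]) * scores[:, 0] + 0.1 * rng.normal(size=n)

betas = gwr_coefficients(coords, scores, y, bandwidth=0.2)
west = coords[:, 0] < 0.2
east = coords[:, 0] > 0.8
print("mean local slope, west:", betas[west, 1].mean())
print("mean local slope, east:", betas[east, 1].mean())
```

The local slopes recover the west-to-east gradient that a single global regression would average away, which is the kind of spatial variability in coefficients the abstract refers to. In practice the bandwidth would be chosen by cross-validation or an information criterion rather than fixed by hand.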
Funding: The water-saving project funding of the Ministry of Water Resources of P.R. China (code: 200970); the research funding of North China University of Water Conservancy and Electric Power of 2006; the project of Henan Excellent Teacher Funding of 2006; Henan Science and Technology project (092102310197); Henan natural science research project of the Education Department (2009A170004).