The core of smoothed particle hydrodynamics (SPH) is the nearest neighbor search subroutine. In this paper, a nearest neighbor search algorithm based on multiple background grids and supporting variable smoothing lengths is introduced. Tests on lid-driven cavity flow show that the method provides high accuracy. Analysis and experiments on its parallelism show that the method parallelizes well and that its accuracy improves as processors are added, so that efficiency grows in step with accuracy.
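To make the grid idea concrete, here is a minimal sketch of a cell-linked-list neighbor search over a single uniform background grid; the function name, the fixed cutoff h, and the 3×3×3 cell scan are illustrative assumptions, and the paper's multiple-grid, variable-smoothing-length scheme extends this basic structure.

```python
import numpy as np
from collections import defaultdict

def grid_neighbor_search(positions, h):
    """Cell-linked-list search: bin particles into cubic cells of side h,
    then for each particle test only particles in the 3x3x3 adjacent cells."""
    cells = defaultdict(list)
    keys = np.floor(positions / h).astype(int)
    for i, key in enumerate(map(tuple, keys)):
        cells[key].append(i)
    offsets = [(dx, dy, dz) for dx in (-1, 0, 1)
                            for dy in (-1, 0, 1)
                            for dz in (-1, 0, 1)]
    neighbors = [[] for _ in range(len(positions))]
    for i, key in enumerate(map(tuple, keys)):
        for off in offsets:
            for j in cells.get(tuple(np.add(key, off)), []):
                if j != i and np.linalg.norm(positions[i] - positions[j]) < h:
                    neighbors[i].append(j)
    return neighbors

# illustrative usage on random particles
pts = np.random.rand(1000, 3)
nbrs = grid_neighbor_search(pts, h=0.1)
```

Binning costs O(N), and each query inspects only nearby cells rather than all N particles, which is also what makes the method easy to parallelize over particles.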
In this paper, sixty-eight research articles published between 2000 and 2017, as well as textbooks, which employed four classification algorithms: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF), and Neural Network (NN) as their main statistical tools were reviewed. The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performances, strengths, and shortcomings of each algorithm were examined, and a conclusion was reached on which one performs best. It was evident from the literature reviewed that RF is too sensitive to small changes in the training dataset, is occasionally unstable, and tends to overfit. KNN is easy to implement and understand but has the major drawback of becoming significantly slower as the data grow, and the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability to deal with unbalanced data. Larger input datasets lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of its time-consuming parameter tuning procedure, the high complexity of its computational processing, the numerous NN architectures to choose from, and the many training algorithms available, most researchers recommend SVM and RF as easier, more readily used methods that repeatedly achieve highly accurate results and are often faster to implement.
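As a hedged illustration of how such a comparison can be set up, the sketch below cross-validates the four classifiers on a small public dataset with scikit-learn; the dataset and hyperparameters are placeholders, not those of the reviewed studies.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),   # K is a placeholder choice
    "SVM": SVC(kernel="rbf"),
    "RF":  RandomForestClassifier(n_estimators=100),
    "NN":  MLPClassifier(max_iter=2000),          # NN needs the most tuning
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold accuracy
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")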
This paper describes the nearest neighbor (NN) search algorithm on the GBD (generalized BD) tree. The GBD tree is a spatial data structure suitable for two- or three-dimensional data and has good performance characteristics in a dynamic data environment. On GIS and CAD systems, the R-tree and its successors have been used, and NN search algorithms have been proposed to obtain good performance from the R-tree. The GBD tree, on the other hand, is superior to the R-tree with respect to exact match retrieval, because the GBD tree carries auxiliary data that uniquely determines the position of an object in the structure. The proposed NN search algorithm depends on this property of the GBD tree. The NN search algorithm on the GBD tree was studied and its performance was evaluated through experiments.
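The abstract does not give the GBD-tree algorithm itself, but the following sketch shows the generic best-first, MINDIST-pruned NN search used with R-tree-like structures, which tree-specific algorithms such as this one refine; the Node layout and two-dimensional rectangles are assumptions for illustration.

```python
import heapq, math
from dataclasses import dataclass, field

@dataclass
class Node:
    rect: tuple                 # (xmin, ymin, xmax, ymax) bounding box
    is_leaf: bool = False
    entries: list = field(default_factory=list)   # leaf: [(object, (x, y)), ...]
    children: list = field(default_factory=list)  # internal: [Node, ...]

def mindist(point, rect):
    """Lower bound on the distance from point to anything inside rect."""
    dx = max(rect[0] - point[0], 0, point[0] - rect[2])
    dy = max(rect[1] - point[1], 0, point[1] - rect[3])
    return math.hypot(dx, dy)

def nn_search(root, query):
    """Best-first NN search: expand nodes in order of MINDIST; stop when the
    nearest unexpanded node is farther than the best object found so far."""
    best, best_d = None, math.inf
    heap = [(mindist(query, root.rect), id(root), root)]
    while heap:
        d, _, node = heapq.heappop(heap)
        if d >= best_d:
            break                      # nothing left can be closer
        if node.is_leaf:
            for obj, pos in node.entries:
                dd = math.hypot(pos[0] - query[0], pos[1] - query[1])
                if dd < best_d:
                    best, best_d = obj, dd
        else:
            for child in node.children:
                heapq.heappush(heap, (mindist(query, child.rect), id(child), child))
    return best, best_d

# illustrative usage on a two-level toy tree
leaf = Node(rect=(0, 0, 1, 1), is_leaf=True,
            entries=[("p1", (0.2, 0.3)), ("p2", (0.9, 0.8))])
root = Node(rect=(0, 0, 1, 1), children=[leaf])
print(nn_search(root, (0.25, 0.25)))   # -> ('p1', ...)
```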
Today computers are used to store data in memory and then process them. In our big data era, we face the challenge of storing and processing data simply because of their ever-growing size. Quantum computation offers solutions to these two prominent issues quantum mechanically and elegantly. Through careful design employing superposition, entanglement, and interference of quantum states, a quantum algorithm can allow a quantum computer to store a dataset of exponentially large size in a linear amount of quantum memory and then process it in parallel. Quantum computing has found its way into machine learning, where new ideas and approaches are in great need as classical computers have reached their capacity and the demand for processing big data grows much faster than the computing power classical computers can provide today. Nearest neighbor algorithms are simple, robust, and versatile supervised machine learning algorithms, which store all training data points as their learned "model" and predict a new test data point by computing the distances between the query point and all the training data points. Quantum counterparts of these classical algorithms provide efficient and elegant ways to deal with the two major issues of storing data in memory and computing the distances. The purpose of our study is to select two similar quantum nearest neighbor algorithms and use a simple dataset to give insight into how they work, highlight their quantum nature, and compare their performances on IBM's quantum simulator.
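As a small, hedged illustration of the quantum flavor of such distance computations, the sketch below implements the standard swap test, a common primitive for estimating the overlap |⟨a|b⟩|² between two encoded data points (for normalized real vectors, ‖a−b‖² = 2 − 2⟨a|b⟩); the single-qubit amplitude encodings are illustrative, and this is not necessarily either of the two algorithms compared in the paper.

```python
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

# Encode two (hypothetical) normalized data points as single-qubit states:
# |a> = 0.8|0> + 0.6|1>,  |b> = 0.6|0> + 0.8|1>
qc = QuantumCircuit(3)
qc.ry(2 * np.arcsin(0.6), 1)
qc.ry(2 * np.arcsin(0.8), 2)

# Swap test: ancilla qubit 0 ends in |0> with probability 1/2 + |<a|b>|^2 / 2
qc.h(0)
qc.cswap(0, 1, 2)
qc.h(0)

p0 = Statevector(qc).probabilities([0])[0]   # exact P(ancilla = 0)
overlap_sq = 2 * p0 - 1                      # |<a|b>|^2, here 0.96^2 = 0.9216
print(f"estimated |<a|b>|^2 = {overlap_sq:.4f}")
```

On a real device or sampler the ancilla would be measured repeatedly and p0 estimated from counts; the statevector call above just gives the exact probability for a compact demonstration.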
Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data, i.e., data that sum to a constant such as 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest, and linear regression is applied across many fields to find relationships between variables. When estimating linear regression parameters, which are useful for tasks such as prediction and partial-effects analysis of the independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, and recovering the data can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performed on a simulated compositional dataset with missing observations, using both robust least squares and ordinary least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
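A minimal sketch of the E/M loop in the simplified setting of a linear regression with missing responses (the study itself treats compositional data and also a robust fit) might look as follows; the function name and initialization are illustrative.

```python
import numpy as np

def em_linear_regression(X, y, n_iter=100, tol=1e-10):
    """EM sketch for OLS with missing responses (np.nan in y): the E step
    fills each missing y_i with its conditional mean x_i @ beta under the
    current fit; the M step re-maximizes the expected log-likelihood, which
    for Gaussian errors is ordinary least squares on the completed data."""
    miss = np.isnan(y)
    beta, *_ = np.linalg.lstsq(X[~miss], y[~miss], rcond=None)  # warm start
    y_work = y.copy()
    for _ in range(n_iter):
        y_work[miss] = X[miss] @ beta                           # E step
        beta_new, *_ = np.linalg.lstsq(X, y_work, rcond=None)   # M step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# illustrative usage: true beta = [2.0, -1.0], ~30% of responses missing
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=200)
y[rng.random(200) < 0.3] = np.nan
print(em_linear_regression(X, y))
```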
To address the difficulty and large errors in computing the damage volume of engine blades, a point-cloud-based method for measuring the damage volume of compressor blades is proposed. First, a structured-light scanner acquires a complete point-cloud model and a damaged point-cloud model, which are registered and segmented to obtain the defect point cloud. Next, the defect point cloud is pose-transformed, analyzed against the principal component axes, layered, sliced, and projected to obtain two-dimensional point-cloud contours. Finally, a unidirectional double-pass nearest-neighbor point search algorithm is proposed to extract the two-dimensional contours in order; the area of each projected face is solved by the coordinate analytic method, and the final volume is obtained by accumulating the product of each layer's area and the slice spacing. Experimental results show that slicing along the first principal component axis yields better volume estimates, and that the contour extraction algorithm is more accurate and efficient than the convex hull method, bidirectional nearest-neighbor search, and the improved nearest point search (INPS) algorithm; compared with Geomagic software, the mean relative error is below 0.3%, demonstrating the algorithm's efficiency and effectiveness.
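The final area-and-volume step can be sketched as follows, assuming each two-dimensional contour has already been extracted in order by the nearest-neighbor algorithm; the shoelace formula below is a standard reading of the coordinate analytic method, and the function names are illustrative.

```python
import numpy as np

def shoelace_area(contour):
    """Area of a simple 2D polygon given as an ordered (N, 2) vertex array
    (the shoelace / coordinate analytic formula)."""
    x, y = contour[:, 0], contour[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def sliced_volume(contours, slice_spacing):
    """Damage volume as the sum of (layer area x slice spacing) over all
    slices, one ordered contour per slice."""
    return sum(shoelace_area(c) for c in contours) * slice_spacing

# illustrative usage: five identical square slices of area 4, spacing 0.1
square = np.array([[0, 0], [2, 0], [2, 2], [0, 2]], dtype=float)
print(shoelace_area(square))             # 4.0
print(sliced_volume([square] * 5, 0.1))  # 2.0
```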
Objective: To analyze tiger nuts (Cyperus esculentus) by near-infrared spectroscopy and trace their geographical origin using chemometric pattern recognition. Methods: Near-infrared spectroscopy combined with chemometric software was used to trace the origin of 408 tiger nut samples from Hebei, Hunan, Shandong, Xinjiang, Yunnan, and other regions. Three spectral preprocessing methods (multiplicative scatter correction, vector normalization, and vector normalization coupled with detrending) and five recognition models (support vector machine (SVM), soft independent modeling of class analogy (SIMCA), orthogonal partial least squares discriminant analysis (OPLS-DA), partial least squares discriminant analysis (PLS-DA), and the K-nearest neighbor algorithm (KNN)) were applied for origin identification. Results: The modeling recognition rates of SVM, SIMCA, OPLS-DA, PLS-DA, and KNN were 91.89%, 94.47%, 62.37%, 65.32%, and 100.00%, respectively. KNN was selected as the origin identification model; the influence of preprocessing method, data pretreatment, and sample distance metric on the stability of its predictions was analyzed, and the optimal model parameters were selected. With multiplicative scatter correction as the spectral preprocessing method, any of UV scaling, Pareto scaling, autoscaling, or mean centering as the data pretreatment, and city-block distance as the sample distance, the recognition rate on the test set reached 100.00%. Conclusion: Near-infrared spectroscopy combined with the KNN model offers fast analysis, simple operation, easy sample preparation, and nondestructive, online qualitative and quantitative analysis, and has promising application prospects.
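A hedged sketch of the selected pipeline, multiplicative scatter correction followed by KNN with city-block distance, is shown below using scikit-learn; the synthetic spectra, labels, and n_neighbors value are placeholders, since the abstract does not report the exact K.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def msc(spectra):
    """Multiplicative scatter correction: regress each spectrum on the mean
    spectrum and remove the fitted offset and slope."""
    ref = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        slope, offset = np.polyfit(ref, s, 1)
        corrected[i] = (s - offset) / slope
    return corrected

# placeholder spectra and origin labels (the study used 408 real samples)
rng = np.random.default_rng(0)
X_train = rng.random((40, 100))        # 40 samples x 100 wavelength points
y_train = rng.integers(0, 5, 40)       # 5 origin classes

# KNN with city-block (Manhattan) distance, as selected in the study;
# n_neighbors is illustrative
knn = KNeighborsClassifier(n_neighbors=3, metric="manhattan")
knn.fit(msc(X_train), y_train)
print(knn.predict(msc(X_train))[:5])
```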
A fast encoding algorithm is presented which makes full use of two characteristics of a vector: its sum and its variance. In this paper, a vector is separated into two subvectors, one containing the first half of the coordinates and the other containing the remaining coordinates. Three inequalities based on the sums and variances of a vector and its two subvectors are introduced to reject codewords that cannot be the nearest codeword. Simulation results show that the proposed algorithm is faster than the improved equal-average equal-variance nearest neighbor search (EENNS) algorithm.
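A sketch of the rejection idea, using one common form of the mean/variance lower bound on the squared Euclidean distance rather than the paper's three subvector inequalities, might look like this; all names are illustrative. The bound follows from splitting each vector into its mean component along the all-ones direction and the orthogonal remainder: d(x, c) ≥ k(m_x − m_c)² + (v_x − v_c)², where m is the mean and v the norm of the mean-removed vector.

```python
import numpy as np

def fast_nearest_codeword(x, codebook):
    """Nearest-codeword search with mean/variance rejection: a codeword whose
    lower bound already exceeds the best distance found so far cannot be the
    nearest, so its full distance is never computed."""
    k = len(x)
    m_x = x.mean()
    v_x = np.sqrt(((x - m_x) ** 2).sum())
    means = codebook.mean(axis=1)
    vs = np.sqrt(((codebook - means[:, None]) ** 2).sum(axis=1))
    best_i, best_d = -1, np.inf
    # visit codewords in order of |mean difference| so strong candidates
    # come first and tighten the rejection threshold early
    for i in np.argsort(np.abs(means - m_x)):
        bound = k * (means[i] - m_x) ** 2 + (vs[i] - v_x) ** 2
        if bound >= best_d:
            continue                   # rejected by the inequality
        d = ((x - codebook[i]) ** 2).sum()
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d

# illustrative usage
rng = np.random.default_rng(0)
codebook = rng.random((256, 16))
x = rng.random(16)
print(fast_nearest_codeword(x, codebook))
```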
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 11002086) and the Shanghai Leading Academic Discipline Project (Grant No. J50103).