International Journal on Data Science and Technology 2016; 2(6): 62-71
http://www.sciencepublishinggroup.com/j/ijdst
doi: 10.11648/j.ijdst.20160206.12
ISSN: 2472-2200 (Print); ISSN: 2472-2235 (Online)

A Hybrid Generic Algorithm for Dynamic Data Mining in Investment Decision Making

Kangzhi Yu 1, Yufang Li 2, Zhengying Cai 1, *

1 College of Computer and Information Technology, China Three Gorges University, Yichang, China
2 College of Economics and Management, China Three Gorges University, Yichang, China
* Corresponding author

To cite this article: Kangzhi Yu, Yufang Li, Zhengying Cai. A Hybrid Generic Algorithm for Dynamic Data Mining in Investment Decision Making. International Journal on Data Science and Technology. Vol. 2, No. 6, 2016, pp. 62-71. doi: 10.11648/j.ijdst.20160206.12

Received: October 16, 2016; Accepted: November 8, 2016; Published: December 9, 2016

Abstract: To address the risk and uncertainty in investment decision-making, a dynamic data mining architecture is introduced here. First, the investment decision-making process is examined and the risks involved are analyzed. Accordingly, a dynamic data mining architecture is proposed that exploits the dynamic search ability of the genetic algorithm. Second, a hybrid algorithm with dynamic learning ability is proposed to overcome the local minima problem prevalent in dynamic data mining. Whenever new data are generated, the data mining algorithm can dynamically collect the original input data without any reconstruction, realizing dynamic updates for investment decision-making. Last, an example is presented to verify the proposed model, and the results show that it effectively improves the robustness of investment decision-making in a risky environment.

Keywords: Dynamic Data Mining, Investment Decision, Hybrid Genetic Algorithms, Risk Management

1. Introduction

The investment decision-making problem is very important in the modern economy, but the risks in the investment environment make it more difficult to reach a correct decision. To address these risks, data mining has been introduced into decision support systems. Zanin (2016) analyzed in depth the combination of complex network analysis and data mining, described how to extract information from complex systems, and finally created a new compact quantitative representation that combines complex networks and data mining [1]. In Data mining on vast data sets as a cluster system benchmark, Heinecke (2016) optimized data mining algorithms to solve regression and classification problems on vast data sets [2]. In Tutorial on practical tips of the most influential data preprocessing algorithms in data mining, Garcia (2016) summarized the most influential data preprocessing algorithms, discussed the impact of each, and reviewed current and future research [3]. In A decision-theoretic rough set approach for dynamic data mining, Chen (2015) proposed an approximation method for the dynamic maintenance of objects and attributes [4]. In Toll policy for load balancing research based on data mining in port logistics, Chen (2015) discussed how logistics fee policies are formulated and developed, studying the impact of charges on consumers' choice of logistics and even of departure time [5]. Moreover, machine learning and artificial intelligence have been applied in data mining to improve its performance. In Data mining techniques for detecting household characteristics based on smart meter data, Gajowniczek (2015) proposed and applied a set of unsupervised machine learning techniques to reveal specific usage patterns [6].
In Ultra-fast data-mining hardware architecture based on stochastic computing, Morro (2015) performed similarity searches of data against different pre-stored categories [7]. Zheng (2015) conducted a systematic survey of the main research on trajectory data mining, providing a panoramic view of the field as well as the scope of its research topics, in trajectory
Return on shareholders' equity 64.86% 63.81% 57.18%
Net sales rate 17.85% 14.23% 13.43%
Current ratio (flow rate) = current assets / current liabilities
Quick ratio (speed ratio) = quick assets / current liabilities
Asset-liability ratio = total liabilities / total assets
Accounts receivable turnover = net credit sales / average accounts receivable
Inventory turnover = cost of sales / inventory balance
Total asset turnover = sales / total assets
Net profit margin on total assets = net profit / total assets
Return on shareholders' equity = net profit / total stockholders' equity
Net sales rate = net profit / net sales
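To make the ratio definitions above concrete, here is a minimal Python sketch that computes them from one period's statement figures. The `FinancialStatement` class, its field names, and the sample figures in the usage lines are hypothetical; receivables turnover is taken as net credit sales over average receivables, and the net sales rate as net profit over sales, which is our reading of the definitions rather than code from the paper.

```python
from dataclasses import dataclass


@dataclass
class FinancialStatement:
    """Hypothetical container for the statement items the ratios need."""
    current_assets: float
    quick_assets: float
    current_liabilities: float
    total_liabilities: float
    total_assets: float
    net_credit_sales: float
    average_accounts_receivable: float
    cost_of_sales: float
    inventory_balance: float
    sales: float
    net_profit: float
    shareholders_equity: float


def financial_ratios(s: FinancialStatement) -> dict[str, float]:
    """Solvency, turnover, and profitability ratios, as plain fractions."""
    return {
        # Solvency
        "current_ratio": s.current_assets / s.current_liabilities,
        "quick_ratio": s.quick_assets / s.current_liabilities,
        "asset_liability_ratio": s.total_liabilities / s.total_assets,
        # Turnover
        "receivables_turnover": s.net_credit_sales / s.average_accounts_receivable,
        "inventory_turnover": s.cost_of_sales / s.inventory_balance,
        "total_asset_turnover": s.sales / s.total_assets,
        # Profitability
        "net_profit_margin_on_assets": s.net_profit / s.total_assets,
        "return_on_equity": s.net_profit / s.shareholders_equity,
        "net_sales_rate": s.net_profit / s.sales,
    }


# Usage with made-up figures (not the company's data).
example = FinancialStatement(
    current_assets=200.0, quick_assets=150.0, current_liabilities=100.0,
    total_liabilities=400.0, total_assets=1000.0, net_credit_sales=900.0,
    average_accounts_receivable=300.0, cost_of_sales=600.0,
    inventory_balance=200.0, sales=1200.0, net_profit=120.0,
    shareholders_equity=600.0,
)
ratios = financial_ratios(example)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```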
However, the company's total asset turnover has not changed much. Notably, all three of the company's profitability indicators are declining. The above analysis shows that, although the company's solvency has improved, its asset turnover has not accelerated and its profitability is declining. It is therefore important to strengthen sales and strictly control costs and expenses in order to reverse the company's declining profitability.
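The claim that profitability is declining can be checked mechanically. The sketch below uses the two profitability rows that survive in the table excerpt above (return on shareholders' equity and net sales rate) and assumes their three columns are in chronological order, since the column headings are not preserved; the helper names are illustrative.

```python
# Profitability rows from the table excerpt above, in percent.
# Assumption: the three columns are consecutive years in chronological order.
indicators = {
    "Return on shareholders' equity": [64.86, 63.81, 57.18],
    "Net sales rate": [17.85, 14.23, 13.43],
}


def is_declining(series: list[float]) -> bool:
    """True if every value is strictly lower than the one before it."""
    return all(later < earlier for earlier, later in zip(series, series[1:]))


def total_change(series: list[float]) -> float:
    """Change from the first to the last period, in percentage points."""
    return series[-1] - series[0]


for name, values in indicators.items():
    print(f"{name}: declining={is_declining(values)}, "
          f"change={total_change(values):+.2f} pp")
```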
4.2. Results Analysis
The mean square error curve is shown in Figure 5. The network was trained for a total of 800 epochs, which took 7 seconds; the reported average variance of training is 0.01 and the mean variance is 0.001. Figure 5 shows the performance indicator of the neural network over the 800 training epochs; the three lines in the figure are the actual training performance, the best performance, and the target (goal) line, respectively.
Figure 5. Mean square error.
The figure shows that convergence is very fast in the initial stage of training but slows down markedly in the later stage.
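The three curves in Figure 5 (training error, best error so far, and goal) can be reproduced on a toy problem. This Python sketch is an illustration under stated assumptions, not the authors' network: it fits a single linear neuron y = w*x + b to noise-free synthetic data by batch gradient descent, records the training MSE and its running minimum each epoch, and stops at an 800-epoch cap (borrowed from the text) or when the MSE reaches a goal of 0.001 (chosen here only as an example threshold). The function name and learning rate are hypothetical.

```python
def train_with_performance_log(xs, ys, lr=0.5, max_epochs=800, goal=1e-3):
    """Fit y ~ w*x + b by batch gradient descent, logging three curves.

    Returns (train_mse, best_mse, goal): best_mse[k] is the lowest training
    MSE seen up to epoch k, and goal is the constant target line. Training
    stops at max_epochs or as soon as the training MSE reaches the goal.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    train_mse, best_mse = [], []
    for _ in range(max_epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        mse = sum(e * e for e in errors) / n
        train_mse.append(mse)
        best_mse.append(min(mse, best_mse[-1]) if best_mse else mse)
        if mse <= goal:
            break
        # Gradients of the MSE with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return train_mse, best_mse, goal


# Synthetic data on [0, 1] (illustrative only).
xs = [i / 19 for i in range(20)]
ys = [3.0 * x + 2.0 for x in xs]
train_mse, best_mse, goal = train_with_performance_log(xs, ys)
print(f"epochs={len(train_mse)}, first MSE={train_mse[0]:.3f}, "
      f"last MSE={train_mse[-1]:.5f}")
```

Printing `train_mse` epoch by epoch should show the shape described above: large drops in the first epochs, then ever smaller improvements as the slow direction of the error surface dominates.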
Figure 6 shows the gradient and the validation checks during training.
Figure 6. Gradient and validation checks.
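Validation checks of the kind plotted in Figure 6 are commonly implemented as in MATLAB's neural network training tools: a counter increases each epoch the validation error fails to improve on its best value, resets on improvement, and training stops when it reaches a limit (the `max_fail` parameter, 6 by default there). The Python function below is a hedged sketch of that rule; the function name and the sample error sequence are hypothetical, and the paper does not state which limit it used.

```python
def early_stopping_epoch(val_errors, max_fail=6):
    """Return (stop_epoch, best_epoch) for per-epoch validation errors.

    A validation check is counted for each consecutive epoch whose error
    does not improve on the best value so far; training stops once
    max_fail checks accumulate. Otherwise it runs to the last epoch.
    """
    best_err = float("inf")
    best_epoch = -1
    checks = 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            # Improvement: remember it and reset the validation-check counter.
            best_err, best_epoch, checks = err, epoch, 0
        else:
            checks += 1
            if checks >= max_fail:
                return epoch, best_epoch
    return len(val_errors) - 1, best_epoch


# Hypothetical validation errors: improvement until epoch 3, then six failures.
val = [0.9, 0.5, 0.3, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.20]
print(early_stopping_epoch(val))  # (9, 3)
```

In practice the weights saved at `best_epoch` are the ones kept, which is what protects the model from the overfitting discussed next.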
The selected explanatory variables clearly have the ability to explain the target variables, but when many variables are available, it is difficult to inspect so much data on the observed variables manually. Because data mining often deals with massive data, an apparent relationship may arise purely from sampling error, or even by coincidence, rather than from a property of the population as a whole. Automatic tools with strong fitting ability can therefore be very helpful, but they are also prone to overfitting, and once overfitting becomes serious, the predictive value of the whole model is greatly reduced. It is therefore necessary to assess the validity of the model to ensure that it is robust and reliable. The algorithm proposed in this paper fits the data closely, with small error. The new hybrid algorithm significantly improves the fitness of the neural network: fitness improves markedly in the initial stage of evolution, whereas in the late stage the improvement is no longer obvious.
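The fitness behavior described above (rapid improvement early in evolution, little visible gain later) is typical of elitist genetic algorithms. The sketch below is a plain real-coded GA, not the paper's hybrid: it evolves the two parameters of a toy linear model, maps the mean squared error to a fitness 1 / (1 + MSE), and records the best fitness per generation. Tournament selection, arithmetic crossover, Gaussian mutation, and all numeric settings are illustrative choices.

```python
import random


def mse(params, xs, ys):
    """Mean squared error of the linear model y = w*x + b."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fitness(params, xs, ys):
    """Map the error to a fitness in (0, 1]: lower MSE means higher fitness."""
    return 1.0 / (1.0 + mse(params, xs, ys))


def evolve(xs, ys, pop_size=30, generations=60, sigma=0.3, seed=0):
    """Elitist real-coded GA; returns the best fitness of each generation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(pop, key=lambda p: fitness(p, xs, ys), reverse=True)
        history.append(fitness(scored[0], xs, ys))

        def tournament():
            c1, c2 = rng.sample(scored, 2)
            return c1 if fitness(c1, xs, ys) >= fitness(c2, xs, ys) else c2

        next_pop = [list(scored[0])]  # elitism: the best individual survives unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            alpha = rng.random()
            # Arithmetic crossover followed by Gaussian mutation of every gene.
            child = [alpha * g1 + (1.0 - alpha) * g2 + rng.gauss(0.0, sigma)
                     for g1, g2 in zip(p1, p2)]
            next_pop.append(child)
        pop = next_pop
    return history


xs = [i / 19 for i in range(20)]
ys = [3.0 * x + 2.0 for x in xs]
history = evolve(xs, ys)
print([round(h, 3) for h in history[:5]], round(history[-1], 4))
```

Because fitness is bounded by 1, each further reduction in error buys a smaller fitness gain, which is one reason the late-stage improvement looks flat.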
4.3. Further Discussion
Because data mining models are very powerful, they can easily produce an overfitted model. To establish a true and useful model, overfitting must be prevented, so the validity of the model must be evaluated to ensure that its predictions are robust and reliable. Although the genetic algorithm can in theory guarantee convergence to the optimal solution, it is difficult to determine how many generations the evolution requires.
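Because the number of generations is hard to fix in advance, a common practical compromise is to combine a generation budget with a stall criterion. The following function is a generic sketch of that idea, not the paper's rule; the function name, parameter names, and defaults are assumptions.

```python
def should_stop(best_history, max_generations=200, stall_generations=20, tol=1e-6):
    """Decide whether to stop a GA run from its best-fitness history.

    Stops when the generation budget is used up, or when the best fitness
    has improved by no more than `tol` over the last `stall_generations`
    generations.
    """
    gen = len(best_history)
    if gen >= max_generations:
        return True
    if gen > stall_generations:
        recent_gain = best_history[-1] - best_history[-1 - stall_generations]
        return recent_gain <= tol
    return False


print(should_stop([0.5] * 25))                     # stalled for 20 generations
print(should_stop([0.1 * i for i in range(25)]))   # still improving
```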
The comparison of investment and production is shown in Figure 7. If a model predicts well beyond the range of the sample, it is said to extrapolate, but many models cannot extrapolate effectively because of overfitting. An overfitted model explains not only the variation that can be observed in the population as a whole but also the errors caused by fluctuations of individual samples.
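The extrapolation argument can be seen in a small numerical experiment. The sketch fits a straight line and a degree-9 polynomial to the same noisy samples of a linear trend on [0, 1], then evaluates both on [1.5, 2], outside the sample range. The data, noise level, seed, and degrees are arbitrary illustrative choices; the fitting uses NumPy's `Polynomial.fit`.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 30)
# Linear trend plus sample-level noise.
y_train = 2.0 * x_train + 1.0 + rng.normal(0.0, 0.1, x_train.size)
x_out = np.linspace(1.5, 2.0, 10)  # beyond the sample range
y_out = 2.0 * x_out + 1.0          # noise-free trend outside the range

results = {}
for degree in (1, 9):
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse = float(np.mean((model(x_train) - y_train) ** 2))
    extrap_mse = float(np.mean((model(x_out) - y_out) ** 2))
    results[degree] = (train_mse, extrap_mse)
    print(f"degree {degree}: train MSE={train_mse:.4f}, "
          f"extrapolation MSE={extrap_mse:.4f}")
```

The degree-9 model reaches a training error no larger than the line's, since it contains the linear model as a special case, but it absorbs the fluctuations of individual samples and its predictions diverge outside the sample range.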
Figure 7. Comparison of investment and production.
To prevent overfitting, data mining generally relies on data splitting: the sample data are divided in a certain proportion into three separate sets, namely a training set, a validation set, and a test set, and the training set is used to fit the selected model.
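The three-way split can be sketched in a few lines. The 70/15/15 proportions, the fixed seed, and the function name are assumptions for illustration, since the paper does not report its ratios.

```python
import random


def split_dataset(samples, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle samples and split them into disjoint training, validation,
    and test sets. The default proportions are illustrative assumptions."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Only the training set is used for fitting; the validation set drives choices such as early stopping, and the test set is held back for the final, independent evaluation.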
The sample data distribution map is shown in figure 8.
Figure 8. Sample data distribution map.
With the sample data split as shown in the distribution map, overfitting can be avoided to the greatest extent possible and the stability of the model ensured. Note, however, that data splitting is a luxury: it is possible only when a sufficient number of samples is available. The investment parameters and returns are listed in Table 2.
Table 2. Cost and expense statement.
Index parameter              2012      2013       2014
Actual value (million)       5178      6648       7612.6
Predicted value (million)    5016      6472.65    7578.56
Error rate (%)               3.13      2.64       0.45
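The error rates in Table 2 are consistent with the relative error |actual - predicted| / actual. The paper does not state the formula, so that definition is our assumption, but this short sketch reproduces the three tabulated values; the helper name is illustrative.

```python
years = [2012, 2013, 2014]
actual = [5178.0, 6648.0, 7612.6]       # million, from Table 2
predicted = [5016.0, 6472.65, 7578.56]  # million, from Table 2


def error_rate_percent(actual_value, predicted_value):
    """Relative prediction error |actual - predicted| / actual, in percent."""
    return abs(actual_value - predicted_value) / actual_value * 100.0


for year, a, p in zip(years, actual, predicted):
    print(f"{year}: {error_rate_percent(a, p):.2f}%")  # 3.13%, 2.64%, 0.45%
```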
As the table shows, costs are growing rapidly, which results in slow growth of net profit. The company should therefore strengthen management and work to reduce investment risk and losses, especially by strengthening cost control in its main business. This would make substantial growth in the company's net profit possible.
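The cost growth behind this recommendation can be quantified from the actual values in Table 2. A short sketch of the year-over-year growth calculation (the helper name is illustrative):

```python
costs = {2012: 5178.0, 2013: 6648.0, 2014: 7612.6}  # actual values, Table 2 (million)


def yoy_growth_percent(series):
    """Year-over-year growth in percent for a {year: value} mapping."""
    years = sorted(series)
    return {y: (series[y] / series[prev] - 1.0) * 100.0
            for prev, y in zip(years, years[1:])}


growth = yoy_growth_percent(costs)
for year, g in growth.items():
    print(f"{year}: {g:.2f}%")
```

This gives roughly 28.4% growth in 2013 and 14.5% in 2014: the rate of increase slowed, but costs were still rising quickly.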
Because the test data in this part are completely independent of the modeling data set, while being drawn from the same population as the modeling samples, this test can be regarded as a test of the extrapolation validity of the proposed model, and the model performs very well in it, with prediction error rates between 0.45% and 3.13% (Table 2).
5. Conclusion
This paper introduces a dynamic data mining scheme for decision-making problems under risk, based on a proposed hybrid genetic algorithm model. The error function of the neural network is combined with the fitness function to form the objective function, which is used to optimize the dynamic topology, weights, and thresholds of the BP neural network. The experimental study presents an encoding technique and an evolution strategy optimized by the genetic algorithm to overcome the arbitrariness of the training process caused by network risks. The proposed model not only helps investors evaluate risky investments more efficiently than traditional techniques but also keeps the decision problem from easily falling into local optima.
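Optimizing a BP network's weights and thresholds with a genetic algorithm rests on encoding them as one chromosome and deriving fitness from the network's error function. This Python sketch shows one common real-valued encoding for a network with one hidden layer; the gene layout, the sigmoid hidden layer, the fitness 1 / (1 + E), the function names, and the XOR data are assumptions for illustration, not the paper's exact design.

```python
import math
import random


def chromosome_length(n_in, n_hidden, n_out):
    """Genes = all connection weights plus one threshold per hidden and output neuron."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out


def decode(chromosome, n_in, n_hidden, n_out):
    """Split a flat real-valued chromosome into weights and thresholds, layer by layer."""
    genes = iter(chromosome)
    w1 = [[next(genes) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [next(genes) for _ in range(n_hidden)]
    w2 = [[next(genes) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [next(genes) for _ in range(n_out)]
    return w1, b1, w2, b2


def forward(x, w1, b1, w2, b2):
    """Sigmoid hidden layer, linear output layer; thresholds are subtracted."""
    hidden = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) - theta)))
              for row, theta in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) - theta
            for row, theta in zip(w2, b2)]


def network_error(chromosome, data, shape):
    """Error function E: sum of squared output errors over the training set."""
    params = decode(chromosome, *shape)
    return sum((target - out) ** 2
               for x, y in data
               for target, out in zip(y, forward(x, *params)))


def fitness(chromosome, data, shape):
    """Fitness derived from E: lower error gives higher fitness, bounded by 1."""
    return 1.0 / (1.0 + network_error(chromosome, data, shape))


# Usage: a 2-3-1 network on the XOR truth table (illustrative data only).
shape = (2, 3, 1)
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
        ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
rng = random.Random(1)
chromosome = [rng.uniform(-1.0, 1.0) for _ in range(chromosome_length(*shape))]
print(len(chromosome), round(fitness(chromosome, data, shape), 4))
```

A GA would evolve such chromosomes using this fitness; in a hybrid scheme the best chromosome then serves as the starting weights for ordinary BP training.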
Future work will further test and compare the practical results and apply the optimization method to international investment problems.
Acknowledgements
This research was supported by the National Natural
Science Foundation of China (No. 71471102), and Science
and Technology Research Program, Hubei Provincial
Department of Education in China (Grant No. D20101203).
References
[1] Zanin M., Papo D., Sousa P. A., Menasalvas E., Nicchi A., Kubik E., Boccaletti S., Combining complex networks and data mining: Why and how, Physics Reports-Review Section of Physics Letters, 635, (2016), 1-44.
[2] Heinecke Alexander, Karlstetter Roman, Pflueger Dirk, Bungartz Hans-Joachim, Data mining on vast data sets as a cluster system benchmark, Concurrency and Computation-Practice & Experience, 28, (2016), 2145-2165.
[3] Garcia Salvador, Luengo Julian, Herrera Francisco, Tutorial on practical tips of the most influential data preprocessing algorithms in data mining, Knowledge-Based Systems, 98, (2016), 1-29.
[4] Hongmei Chen, Tianrui Li, Chuan Luo, Shi-Jinn Horng, Guoyin Wang, A Decision-Theoretic Rough Set Approach for Dynamic Data Mining, IEEE Transactions on Fuzzy Systems, 23, (2015), 1958-1970.
[5] Dafeng Chen, Yifei Chen, Bingqing Han, Toll Policy for Load Balancing Research Based on Data Mining in Port Logistics, Journal of Coastal Research, 73, (2015), 82-88.
[6] Gajowniczek Krzysztof, Zabkowski Tomasz, Data Mining Techniques for Detecting Household Characteristics Based on Smart Meter Data, Energies, 8, (2015), 7407-7427.
[7] Morro Antoni, Canals Vincent, Oliver Antoni, Alomar Miquel L., Rossello Josep L., Ultra-Fast Data-Mining Hardware Architecture Based on Stochastic Computing, PLoS One, 10, (2015), e0124176.
[8] Zheng Yu, Trajectory Data Mining: An Overview, ACM Transactions on Intelligent Systems and Technology, 6, (2015), 29.
[9] Chunlin Li, Xiaofu Xie, Yuejiang Huang, Hong Wang, Changxi Niu, Distributed Data Mining Based on Deep Neural Network for Wireless Sensor Network, International Journal of Distributed Sensor Networks, (2015).
[10] Boland Giles W., Thrall James H., Duszak Richard Jr., Business Intelligence, Data Mining, and Future Trends, Journal of The American College of Radiology, 12, (2015), 9-11.
[11] Chi-Sen Li, Mu-Chen Chen, A data mining based approach for travel time prediction in freeway with non-recurrent congestion, Neurocomputing, 41, (2014), 5416-5430.
[12] Kwon Kyunglag, Kang Daehyun, Yoon Yeochang, Sohn Jong-Soo, Chung In-Jeong, A real time process management system using RFID data mining, Computers in Industry, 65, (2014), 721.
[13] Lopez-Yanez Itzama, Sheremetov Leonid, Yanez-Marquez Cornelio, A novel associative model for time series data mining, Pattern Recognition Letters, 41, (2014), 23-33.
[14] Tahat Amani, Marti Jordi, Khwaldeh Ali, Tahat Kaher, Pattern recognition and data mining software based on artificial neural networks applied to proton transfer in aqueous environments, Chinese Physics B, 23, (2014).
[15] H. Hassani, G. Saporta and E. S. Silva, Data Mining and Official Statistics: The Past, the Present and the Future, Big Data, 2, (2014), 34-43.
[16] Xylogiannopoulos Konstantinos F., Karampelas Panagiotis, Alhajj Reda, Experimental Analysis on the Normality of pi, e, phi, root 2 Using Advanced Data-Mining Techniques, Experimental Mathematics, 23, (2014), 105-128.
[17] Chun-Wei Tsai, Chin-Feng Lai, Ming-Chao Chiang, Laurence T. Yang, Data Mining for Internet of Things: A Survey, IEEE Communications Surveys and Tutorials, 16, (2014), 77-97.
[18] Musolesi Mirco, Big Mobile Data Mining: Good or Evil?, IEEE Internet Computing, 18, (2014), 78-81.
[19] Information Security in Big Data: Privacy and Data Mining, IEEE Access, 2, (2014), 1149-1176.
[20] Xindong Wu, Xingquan Zhu, Gong-Qing Wu, Wei Ding, Data Mining with Big Data, IEEE Transactions on Knowledge and Data Engineering, 26, (2014), 97-107.
[21] C. Lima, M. Lidio, O. Limao, C. Roberto and M. Roisenberg, Optimization of neural networks through grammatical evolution and a genetic algorithm, Expert Systems with Applications, 56, (2016), 368-384.
[22] O. H. Yuregir and C. Sagiroglu, Solar Energy Validation for Strategic Investment Planning via Comparative Data Mining Methods: An Expanded Example within the Cities of Turkey, International Journal of Photoenergy, 8506193, (2016).
[23] Hongmei Chen, Tianrui Li, Chuan Luo, Shi-Jinn Horng, Guoyin Wang, A Decision-Theoretic Rough Set Approach for Dynamic Data Mining, IEEE Transactions on Fuzzy Systems, 23, (2015), 1958-1970.
[24] Junbo Zhang, Tianrui Li, Hongmei Chen, Composite rough sets for dynamic data mining, Information Sciences, 257, (2014), 81-100.