J. Sci. Technol. Environ. Inform. | Volume 09, Issue 02, 665-672 | https://doi.org/10.18801/jstei.090220.67
Article type: Research article, Article received: 03.07.2020; Revised: 22.07.2020; First published online: 30 July 2020.
Analysis of Wisconsin Breast Cancer original dataset using data mining and machine learning algorithms for breast cancer prediction
Md. Toukir Ahmed, Md. Niaz Imtiaz and Animesh Karmakar
Dept. of Computer Science and Engineering, Pabna University of Science and Technology, Bangladesh.
✉ Corresponding author: [email protected] (Ahmed M.T.).
Abstract
Breast cancer has become a serious concern in recent years, and its incidence among women appears to have increased significantly. The disease can be fatal if it goes undiagnosed, and when it is diagnosed only at the last stage, surgical removal of the affected part is in many cases the only way to stop it. A good predictor can therefore contribute to successful diagnosis. This paper applies several machine learning classification algorithms to predict the target class, and then tries to improve the prediction by checking how effective particular attributes of the original Wisconsin Breast Cancer dataset (WDBC) are for breast cancer diagnosis. After running the classifiers on the dataset, we compared them to find the best-performing algorithm and then analyzed the effective attributes of the dataset to improve performance further. The algorithms used are Naïve Bayes, Support Vector Machine (SVM), Multilayer Perceptron (MLP), J48 and Random Forest. The results are compared using the following performance metrics: accuracy, Kappa statistic, precision, recall, F-measure, MCC, ROC area and PRC area. Based on the values of these metrics, the Naïve Bayes classifier gave the best result among the algorithms used. We also tried to optimize our proposed model and compared it with state-of-the-art approaches proposed by other researchers on the same dataset.
Key Words: Classification, Decision tree, MLP, WDBC, Naïve Bayes and SVM
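Several of the performance metrics named in the abstract can be derived directly from a binary confusion matrix. The sketch below is illustrative only and is not the authors' code: the function name `binary_metrics` and the example counts are hypothetical, and it assumes a two-class problem in which every denominator is non-zero. ROC area and PRC area are omitted because they require ranked prediction scores rather than counts.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from confusion-matrix counts.

    tp, fp, fn, tn are the numbers of true positives, false positives,
    false negatives and true negatives (hypothetical inputs).
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)

    # Matthews correlation coefficient (MCC)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    # Cohen's kappa: observed agreement corrected for chance agreement
    p_pos = ((tp + fp) / n) * ((tp + fn) / n)
    p_neg = ((tn + fn) / n) * ((tn + fp) / n)
    p_chance = p_pos + p_neg
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Made-up counts, not results from the paper
    for name, value in binary_metrics(tp=40, fp=10, fn=10, tn=40).items():
        print(f"{name}: {value:.3f}")
```

Reporting Kappa and MCC alongside accuracy is useful when the two classes are not equally frequent, since both correct for agreement that would occur by chance.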
Article Citations
MLA
Ahmed, et al. “Analysis of Wisconsin Breast cancer original dataset using data mining and machine learning algorithms for breast cancer prediction.” Journal of Science, Technology and Environment Informatics, 09(02) (2020): 665-672.
APA
Ahmed, M. T., Imtiaz, M. N. and Karmakar, A. (2020). Analysis of Wisconsin Breast cancer original dataset using data mining and machine learning algorithms for breast cancer prediction. Journal of Science, Technology and Environment Informatics, 09(02), 665-672.
Chicago
Ahmed, M. T., Imtiaz, M. N. and Karmakar, A. “Analysis of Wisconsin Breast cancer original dataset using data mining and machine learning algorithms for breast cancer prediction” Journal of Science, Technology and Environment Informatics, 09(02), (2020): 665-672.
Harvard
Ahmed, M. T., Imtiaz, M. N. and Karmakar, A. 2020. Analysis of Wisconsin Breast cancer original dataset using data mining and machine learning algorithms for breast cancer prediction. Journal of Science, Technology and Environment Informatics, 09(02), pp. 665-672.
Vancouver
Ahmed, MT, Imtiaz, MN and Karmakar, A. Analysis of Wisconsin Breast cancer original dataset using data mining and machine learning algorithms for breast cancer prediction. Journal of Science, Technology and Environment Informatics, 2020 July 09(02), 665-672.
© 2020 The Authors. This article is freely available for anyone to read, share, download and print, and may be used and built upon without restriction, provided that the original author(s) and publisher are given due credit. All published articles are distributed under the Creative Commons Attribution 4.0 International License.
Journal of Science, Technology and Environment Informatics, EISSN 2409-7632.