The objective of this study is to cluster and then classify data by combining the k-means and C4.5 methods. The process involves clustering followed by classification. The classification was validated with 10-fold cross-validation (k-folds = 10) using stratified sampling. The study evaluates illiteracy (analphabetism) in Indonesia among the population aged 15 years and over (15+). The data are the percentages of illiterate people between 2017 and 2019. The dataset was obtained from https://www.bps.go.id and is accessible at https://osf.io/crwug. The Davies-Bouldin index (DBI) was used to determine the number of clusters; the optimal value was obtained at k = 2, with a DBI of 0.121. The resulting cluster map of Indonesian territories shows a low cluster (C0 = 22 provinces) and a high cluster (C1 = 11 provinces) of illiteracy for k = 2. The clustering results were then classified, yielding an accuracy of 97.50%, a recall of 90.91%, a precision of 100.00%, and an AUC (optimistic) of 0.95 (excellent classification).
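The cluster-count selection described above can be illustrated with a minimal sketch. It assumes plain Lloyd's k-means and the standard Davies-Bouldin definition (mean over clusters of the worst ratio of summed within-cluster scatter to centroid distance; lower is better). The data are made-up three-year percentages for 33 hypothetical regions, not the BPS dataset, and the helper names (`dist`, `mean_point`, `kmeans`, `davies_bouldin`) are illustrative rather than taken from the study. The C4.5 classification step is not covered here.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points, k, seed=0, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as starting centroids
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    return labels


def davies_bouldin(points, labels):
    """Davies-Bouldin index over the non-empty clusters (lower is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centroids = {c: mean_point(m) for c, m in clusters.items()}
    scatter = {
        c: sum(dist(p, centroids[c]) for p in m) / len(m)
        for c, m in clusters.items()
    }
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Synthetic stand-in data (assumption): one tuple per region, giving an
# illiteracy percentage for each of three years. Not the study's data.
low = [(1.0 + 0.1 * i, 0.9 + 0.1 * i, 0.8 + 0.1 * i) for i in range(22)]
high = [(9.0 + 0.3 * i, 8.8 + 0.3 * i, 8.6 + 0.3 * i) for i in range(11)]
points = low + high

# Score each candidate k and keep the one with the smallest DBI.
scores = {k: davies_bouldin(points, kmeans(points, k)) for k in range(2, 6)}
best_k = min(scores, key=scores.get)
labels = kmeans(points, best_k)
sizes = sorted(labels.count(c) for c in set(labels))
print(best_k, round(scores[best_k], 3), sizes)
```

On this synthetic input the two groups are well separated, so the smallest DBI occurs at k = 2. Real data would need several random restarts per k, because a single k-means run can settle in a poor local optimum.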