Journal of Health Sciences & Research


Volume 12, Issue 2 (July-December, 2021)


Evaluation of First and Second Dose of COVID-19 Vaccination Using k-means Clustering Model and Visualization of Indian States and Union Territories

R Lakshmi Priya, N Manjula Devi, Manimannan Ganesan

Keywords : COVID-19 vaccination, Data science and visualization, k-means, NFHS, Silhouette distance score

Citation Information : Priya RL, Devi NM, Ganesan M. Evaluation of First and Second Dose of COVID-19 Vaccination Using k-means Clustering Model and Visualization of Indian States and Union Territories. J Health Sci Res 2021; 12 (2):34-39.

DOI: 10.5005/jp-journals-10042-1104

License: CC BY-NC 4.0

Published Online: 25-03-2022

Copyright Statement:  Copyright © 2021; The Author(s).


Aim and objective: This study attempted to identify patterns in first- and second-dose COVID-19 vaccination coverage using machine learning (ML) methods.

Settings and designs: Secondary COVID-19 vaccination data were collected from the National Informatics Centre (NIC), India, up to April 30, 2021, based on Census 2011 data. The original data consist of total population, first dose, second dose, percentage of the first dose, percentage of the second dose, and the cumulative percentage of the population for the states and union territories of India.

Materials and methods: Orange data mining software was used to determine the clusters and to plot the vaccination data for the states and union territories of India. The File widget loads the vaccination data set, and k-means with k-means++ initialization is run for two to nine clusters, with each clustering evaluated by its silhouette distance score.

Results: Silhouette distance scores and cluster assignments were obtained. Three zones are visualized and colored green, blue, and red, corresponding to cluster 1 (C1), cluster 2 (C2), and cluster 3 (C3), respectively. The C1 zone contains the highly vaccinated states and union territories, the C2 zone the moderately vaccinated ones, and the C3 zone the low-vaccinated ones.

Conclusion: In India, Sikkim, Tripura, Ladakh, and Lakshadweep have a low population density but fall in the highly vaccinated zone for first and second doses. Goa, Mizoram, Delhi, Arunachal Pradesh, Chandigarh, Uttarakhand, Gujarat, Rajasthan, Kerala, Jammu and Kashmir, Dadra and Nagar Haveli, Daman and Diu, Himachal Pradesh, Chhattisgarh, and Andaman and Nicobar Islands have diverse population densities and fall in the low-vaccinated zone for first and second doses. Manipur, Meghalaya, Nagaland, Odisha, West Bengal, Haryana, Karnataka, Andhra Pradesh, Maharashtra, Telangana, Jharkhand, Madhya Pradesh, Punjab, Assam, Uttar Pradesh, Tamil Nadu, Puducherry, and Bihar have high population density and fall in the moderately vaccinated zone for first- and second-dose COVID-19 vaccination.
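
The model-selection step described under Materials and methods (k-means with k-means++ initialization, run for two to nine clusters and compared by silhouette score) can be illustrated with a minimal sketch. This is not the authors' Orange workflow: scikit-learn is swapped in for Orange, the function name `best_k_by_silhouette` is hypothetical, standardizing the features is an assumption, and the input is synthetic placeholder data rather than the NIC vaccination figures.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def best_k_by_silhouette(X, k_values=range(2, 10), seed=0):
    """Fit k-means (k-means++ init) for each k; return (best_k, scores, labels)."""
    # Assumption: features are standardized so no single column dominates.
    X_scaled = StandardScaler().fit_transform(X)
    scores, labelings = {}, {}
    for k in k_values:
        model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=seed)
        labels = model.fit_predict(X_scaled)
        # Mean silhouette over all points; higher means better-separated clusters.
        scores[k] = silhouette_score(X_scaled, labels)
        labelings[k] = labels
    best_k = max(scores, key=scores.get)
    return best_k, scores, labelings[best_k]


# Synthetic placeholder data, NOT the paper's figures: 36 made-up regions with
# two columns standing in for first-dose % and second-dose % coverage.
rng = np.random.default_rng(0)
centers = np.array([[30.0, 10.0], [15.0, 4.0], [5.0, 1.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(12, 2)) for c in centers])

best_k, scores, labels = best_k_by_silhouette(X)
for k in sorted(scores):
    print(f"k={k}: silhouette={scores[k]:.3f}")
print("selected k:", best_k)
```

With real data, `X` would hold one row per state or union territory, and the labels for the selected k would define the zones that are then colored on the map.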


© Jaypee Brothers Medical Publishers (P) LTD.