Nicodemus Nzoka Maingi*, Ismail Ateya Lukandu and Matilu Mwau
Background: The disease outbreak management operations of most countries, notably Kenya, offer numerous novel ideas on how best to use notifiable disease data to effect proactive interventions. Notifiable disease data is reported, aggregated and consumed in various ways. Over the years there has been a deluge of such data, and the challenge for the entities that manage it has been how to aggregate it objectively and dynamically so that it can be consumed efficiently and inform relevant mitigation measures. Various models have been explored, tried and tested, with varying results; some are purely mathematical and statistical, while others combine mathematics with software-driven modelling.
Methods: One of the tools that has been explored is Artificial Intelligence (AI), a set of techniques that enables computers to perform tasks usually reserved for human experts. AI presents a great opportunity to redefine how notifiable disease data is processed and packaged more meaningfully. This research explores Machine Learning (ML), a branch of AI, as a differentiator in crunching notifiable disease data and adding perspective to it. An algorithm has been designed to test different notifiable disease outbreak data cases; it represents a shift towards managing disease outbreaks through the symptoms they generally manifest. Each notifiable disease is broken down into a set of symptoms, dubbed symptom burden variables, which are then categorized into eight clusters: bodily, gastrointestinal, muscular, nasal, pain, respiratory, skin and other symptoms. ML decision tree theory is used to determine the entropy and information gain of each symptom cluster, based on selected test data sets.
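The entropy and information gain computation described above can be sketched as follows. This is a minimal illustration of the standard decision tree (Shannon entropy) definitions, not the authors' implementation. The function names, the binary present/absent encoding of symptom clusters, the toy rows and the class labels are all assumptions introduced for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, feature):
    """Reduction in label entropy obtained by splitting on one feature.

    rows    -- list of dicts mapping a symptom cluster name to a value
    labels  -- list of class labels, one per row
    feature -- name of the symptom cluster to split on
    """
    total = len(labels)
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data (NOT from the paper): 1 = cluster present, 0 = absent.
rows = [
    {"gastro_intestinal": 1, "nasal": 0},
    {"gastro_intestinal": 1, "nasal": 1},
    {"gastro_intestinal": 0, "nasal": 0},
    {"gastro_intestinal": 0, "nasal": 1},
]
labels = ["case", "case", "no_case", "no_case"]

gains = {name: information_gain(rows, labels, name) for name in rows[0]}
```

In this toy set the hypothetical `gastro_intestinal` column separates the two labels perfectly, so it receives the full information gain, while `nasal` carries none.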
Results: Once the entropies and information gains have been determined, the variables are ranked in descending order of information gain, from the highest to the lowest, giving a clear-cut criterion for ordering them. The ranked variables are then used to construct a binary decision tree, which represents the variables graphically and structurally. Variables that tie in the information gain ranking are given equal importance in the construction of the tree. From the presented data, the computed information gains are ordered as follows: gastrointestinal, bodily, pain, skin, respiratory, other, muscular and finally nasal symptoms. The corresponding binary decision tree is then constructed.
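The ranking and tree construction steps can be illustrated with the sketch below. The abstract does not specify how the binary decision tree is assembled from the ranking, so the nested structure used here (each node tests one rank level, and tied clusters share a node) is only one possible reading. The function names and the information gain values are hypothetical placeholders, not the paper's results.

```python
from itertools import groupby


def rank_with_ties(gains, ndigits=6):
    """Order clusters by descending information gain, grouping ties.

    Returns a list of levels; each level is a list of cluster names that
    share the same (rounded) information gain.
    """
    rounded = {name: round(gain, ndigits) for name, gain in gains.items()}
    ordered = sorted(rounded.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        [name for name, _ in group]
        for _, group in groupby(ordered, key=lambda kv: kv[1])
    ]


def build_tree(levels):
    """Build a nested binary tree: one node per rank level (assumed layout).

    Each node has two branches: 'if_present' flags the node's clusters as
    the diagnostic focus, 'if_absent' moves on to the next rank level.
    """
    if not levels:
        return None
    return {
        "clusters": levels[0],
        "if_present": "diagnostic focus",
        "if_absent": build_tree(levels[1:]),
    }


# Placeholder gains for illustration only (NOT values from the paper).
example_gains = {
    "gastro_intestinal": 0.40,
    "bodily": 0.25,
    "pain": 0.25,
    "nasal": 0.05,
}

levels = rank_with_ties(example_gains)
tree = build_tree(levels)
```

Rounding before grouping is a design choice that keeps floating point noise from splitting values that should tie; the number of digits is an assumption and would need to match the precision of the real computation.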
Conclusions: The algorithm successfully singles out the disease burden variable(s) that are most critical as the point of diagnostic focus, enabling the relevant authorities to take the necessary, informed interventions. It provides a good basis for a country's localized diagnostic activities, driven by data from reported notifiable disease cases. Because the algorithm is not tied to any particular data set, it offers a dynamic mechanism for analyzing and aggregating any notifiable disease data set.
Published Date: 2022-12-16; Received Date: 2022-07-25