Malaria parasite cultures
Sample collection and parasite culturing were completed at Ifakara Health Institute’s laboratories in Bagamoyo, Tanzania, using previously described protocols with slight adjustments [27, 28]. Group O+ blood was obtained from four malaria-free volunteers and kept in tubes containing the anticoagulant ethylenediaminetetraacetic acid (EDTA) for continuous culturing of P. falciparum strains NF54 and FCR3. The blood was centrifuged at 2500 rpm at 24 °C for 10 min to obtain RBCs. The RBCs were then washed, diluted to 50% haematocrit with uncomplemented medium, namely Roswell Park Memorial Institute (RPMI) 1640 medium supplemented with hypoxanthine, neomycin and 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES), and stored at 4 °C.
The washed RBCs were used to culture P. falciparum in vitro for up to 7 days. The asexual malaria parasites were grown in uninfected washed O+ RBCs as host cells at 5% haematocrit, maintained in RPMI-1640 medium supplemented with 25 mM HEPES, 100 µM hypoxanthine, neomycin, 5% Albumax II and 24 mM sodium bicarbonate. The parasite culture was gassed with a mixture of 5% CO2, 3% O2, and 93% N2 and incubated at 37 °C. Parasitaemia was estimated daily by examining 10 fields of field-stained (Hemacolor® rapid staining) thin blood smears under a compound microscope. Two rounds of parasite synchronization were performed to ensure that only ring-stage P. falciparum remained [29]. The culture was maintained until the ring-stage parasitaemia level exceeded 6%, at which point it was used for experimental dilutions. Parallel malaria-free cultures containing only media and O+ RBCs from the same volunteers were maintained to create controls. The process was repeated for up to 11 batches, until all 70 volunteers had been recruited and their blood diluted.
Recruitment of malaria-free volunteers
Malaria-free individuals were recruited from tertiary-level colleges in Bagamoyo, eastern Tanzania, following sensitization meetings during which the objectives, procedures, potential risks, and benefits of the study were explained. Participants who expressed interest were given a unique identity number, contacted by phone, and requested to provide informed consent. Consenting participants were screened for malaria parasites by rapid diagnostic test (RDT) on a finger-prick blood sample, followed by confirmatory quantitative polymerase chain reaction (qPCR). Participants who tested negative for malaria were enrolled in the study, while those who tested positive were treated following Tanzania’s malaria treatment guidelines and excluded from the study [30]. A total of 70 volunteers were recruited, all between 20 and 40 years old. A total of 40 mL of venous blood was drawn into an EDTA tube from each participant and used for laboratory dilutions of cultured malaria parasites to create different malaria parasitaemia and haematocrit ratios.
Haematocrit dilutions to mimic anaemic conditions
For each participant, two sets of haematocrit dilutions were created to simulate different anaemic conditions for both infected and uninfected blood. One set comprised malaria-free blood at 50%, 25%, and 12.5% haematocrit, while the other comprised infected red blood cells (iRBCs) from cultured parasites, adjusted to 50%, 25%, and 12.5% haematocrit ratios using plasma from the respective participants. For uninfected blood, the 40 mL of venous blood from each participant was split into 5 mL and 35 mL portions and centrifuged to separate plasma from RBCs. After separation, plasma was transferred to empty 1.5 mL tubes for use in the haematocrit dilutions. RBCs from the 5 mL portion were used to formulate a 50% haematocrit stock solution by adding plasma from the same volunteer. Serial dilutions were done by transferring 2.5 mL of the stock solution to a second tube and adding 2.5 mL of the previously obtained plasma, to simulate moderate anaemia (25% haematocrit) and severe anaemia (12.5% haematocrit) conditions (Fig. 1). For infected blood, when the culture reached > 6% ring-stage parasitaemia, it was centrifuged to separate iRBCs from the culture medium and washed twice. The washed iRBC volume (0.5 mL) was mixed with an equal volume of participant plasma (0.5 mL) to create a 50% haematocrit stock solution, which was then serially diluted to 25% and 12.5% (Fig. 1).
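As a worked illustration of this equal-volume dilution scheme, the short Python sketch below computes the haematocrit obtained at each step; the function name and working volumes are illustrative assumptions, not part of the study protocol.

```python
# Minimal sketch (illustrative only): haematocrit obtained when plasma is added
# to a packed-RBC stock, following the equal-volume serial-dilution scheme above.

def dilute(stock_volume_ml: float, stock_hct: float, plasma_volume_ml: float) -> float:
    """Haematocrit after adding plasma to a stock RBC suspension."""
    rbc_volume = stock_volume_ml * stock_hct           # mL of packed RBCs in the stock
    total_volume = stock_volume_ml + plasma_volume_ml  # mL after adding plasma
    return rbc_volume / total_volume

hct_50 = 0.50                          # 50% haematocrit stock (non-anaemic)
hct_25 = dilute(2.5, hct_50, 2.5)      # 2.5 mL stock + 2.5 mL plasma -> 0.25 (moderate anaemia)
hct_125 = dilute(2.5, hct_25, 2.5)     # repeat on the 25% tube -> 0.125 (severe anaemia)
print(hct_25, hct_125)                 # 0.25 0.125
```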
Serial dilutions for different parasite densities and controls
The parasite dilutions were performed at the respective haematocrit ratios. Initially, the cultured parasitaemia was standardized to a 6% stock; in cases where the initial densities were higher, they were lowered to 6%. Serial dilutions of the stock solution were then performed to create three additional parasitaemia levels: 0.1%, 0.002% and 0.00003%. The control group included malaria-negative samples at haematocrits of 50%, 25%, and 12.5%, prepared from uninfected RBCs from the culture with plasma from the same individual participants. To ensure that the RBC distribution in the controls matched that of the malaria parasitaemia dilutions, uninfected RBCs from the culture were diluted with uninfected RBCs from participants in volumes equal to those used for the parasitaemia dilutions (Fig. 1).
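To make the dilution factors concrete, the sketch below computes the transfer and diluent volumes needed to step the 6% stock down to each target density using C1V1 = C2V2; the 5 mL working volume per tube is an assumption for illustration only.

```python
# Minimal sketch (illustrative only): serial-dilution volumes for the parasitaemia
# targets at a fixed haematocrit, computed from C1*V1 = C2*V2.

targets = [0.1, 0.002, 0.00003]   # target parasitaemia levels (%)
final_volume_ml = 5.0             # assumed working volume per tube

source = 6.0                      # start from the 6% stock
for target in targets:
    transfer_ml = target * final_volume_ml / source   # V1 = C2*V2 / C1
    diluent_ml = final_volume_ml - transfer_ml         # uninfected blood at the same haematocrit
    print(f"{source}% -> {target}%: transfer {transfer_ml:.4f} mL + {diluent_ml:.4f} mL diluent")
    source = target                                     # next step starts from this dilution
```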
Preparation of dried blood spots
For each individual volunteer, five replicates of dried blood spots (DBS) were created for each parasite density at each haematocrit level, resulting in a total of 75 DBS per participant. For each DBS, 50 μL of blood was spotted onto a circular area of a Whatman™ filter paper card. The experimental design ensured that all malaria-positive and malaria-negative samples used the same type of filter paper card, to standardize any potential impact of the filter paper on the ML analysis. The cards were air-dried for up to 3 h and labelled with batch number, date, parasitaemia level, haematocrit ratio, and participant ID. To prevent cross-contamination, each card was sealed in a plastic bag. The cards were then grouped by participant ID and stored in a cool environment in a larger bag with desiccant packets and a humidity card, awaiting transportation to another Ifakara Health Institute facility, the VectorSphere, in Ifakara, Tanzania, for infrared scanning. During transport, the bags were kept in a cooler box with ice packs separated by plastic sheeting.
Acquisition and pre-processing of infrared spectra
The DBS were individually scanned using a Fourier transform infrared (FT-IR) spectrometer over a wavenumber range of 4000–500 cm−1 at a resolution of 2 cm−1. The instrument was a compact Bruker Alpha FT-IR interferometer equipped with a Platinum Attenuated Total Reflectance (ATR) module incorporating a diamond crystal. Scanning was done after 3–5 days of DBS storage. For scanning, each blood spot was punched out, positioned on the diamond crystal, and subjected to pressure using an anvil to increase the contact area with the crystal, thereby optimizing the depth of light penetration. Each spot was scanned 32 times to obtain an averaged spectrum, which was labelled with the project initials, study site, participant ID, haematocrit ratio, parasitaemia level, and date.
To quantify the depth of light penetration (d) in whole blood, a theoretical approach was employed that considers the wavelength of light (λ), the incidence angle (θ), and the refractive indices of whole blood (n1) and the diamond crystal (n2) used in the spectrometer. The penetration depth d (see Fig. S1) was calculated using the following formula:
$$d = \frac{\lambda n_{1}}{2\pi \sqrt{\sin^{2}\theta - \left(\frac{n_{2}}{n_{1}}\right)^{2}}}$$
Given that the incidence angle (θ) was fixed at 45°, the refractive index of whole blood (n1) was determined using the Sellmeier equation (with λ expressed in micrometres) [31]:
$$n_{1}(\lambda) = 1 + \left(\frac{0.7960\,\lambda^{2}}{\lambda^{2} - 1.0772 \times 10^{4}}\right) + \left(\frac{5.1819\,\lambda^{2}}{\lambda^{2} - 7.8301 \times 10^{5}}\right)$$
Similarly, the refractive index of the diamond crystal (n2) was ascertained through its corresponding Sellmeier equation [31]:
$$n_{2}(\lambda) = 1 + \left(\frac{4.3356\,\lambda^{2}}{\lambda^{2} - 0.1060^{2}}\right) + \left(\frac{0.3306\,\lambda^{2}}{\lambda^{2} - 0.1750^{2}}\right)$$
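For orientation, the Python sketch below evaluates the conventional ATR penetration-depth expression numerically at a few mid-infrared wavenumbers. It uses fixed approximate indices (diamond ≈ 2.4, whole blood ≈ 1.35 in the mid-IR) and places the crystal index in the position it occupies in the standard form of the formula, so it is an illustration under those assumptions rather than a reproduction of the study's wavelength-dependent calculation.

```python
import numpy as np

# Minimal numeric sketch (illustrative only) of the conventional ATR penetration-depth
# formula d = lambda / (2*pi*n_crystal*sqrt(sin^2(theta) - (n_sample/n_crystal)^2)).
# Fixed indices below are assumptions; the study derived wavelength-dependent indices
# from the Sellmeier equations above.

theta = np.deg2rad(45.0)      # incidence angle of the ATR accessory
n_crystal = 2.4               # approximate refractive index of diamond
n_sample = 1.35               # approximate refractive index of whole blood

wavenumbers = np.array([3000.0, 1650.0, 1000.0])   # cm^-1
wavelengths_um = 1e4 / wavenumbers                  # convert wavenumber to wavelength (um)

d_um = wavelengths_um / (
    2 * np.pi * n_crystal * np.sqrt(np.sin(theta) ** 2 - (n_sample / n_crystal) ** 2)
)
print(dict(zip(wavenumbers, np.round(d_um, 2))))    # roughly 0.5-1.5 um across the fingerprint region
```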
The acquired spectra were then pre-processed using a Python program to compensate for atmospheric interference from water vapour and carbon dioxide (CO2), and to discard poor-quality spectra, as described by González-Jiménez et al. [18]. The pre-processed spectra were subsequently used for training, testing, and validating the machine-learning algorithms.
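As an illustration of the kind of quality screen applied at this stage, the sketch below flags spectra with weak protein-band signal or strong atmospheric CO2 absorption. The band windows and thresholds are assumptions for illustration only and are not the criteria of González-Jiménez et al. [18].

```python
import numpy as np

# Minimal sketch (illustrative only) of a spectral quality screen. Band windows and
# thresholds are assumed values, not the published pre-processing criteria.

def keep_spectrum(wavenumbers: np.ndarray, absorbance: np.ndarray) -> bool:
    """Return True if a spectrum passes simple quality checks."""
    amide_i = absorbance[(wavenumbers > 1600) & (wavenumbers < 1700)]   # protein (amide I) band
    co2_band = absorbance[(wavenumbers > 2300) & (wavenumbers < 2400)]  # atmospheric CO2 band
    enough_signal = amide_i.max() > 0.1    # discard weak or empty scans
    low_co2 = co2_band.max() < 0.2         # discard scans dominated by atmospheric CO2
    return bool(enough_signal and low_co2)
```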
Selection of machine-learning models
Machine learning analysis was conducted using the Python programming language, version 3.9. Employing a supervised ML classification approach, seven classifiers were evaluated: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting (XGB), Decision Tree (DT), Extra Trees (ET), and Bagging Classifier (BC). The non-anaemic class (50% haematocrit) was used for ML algorithm selection and training, while the two other haematocrit ratios (25% and 12.5%) were kept separate and used to assess the impact of anaemia on the ability of the models to classify infected versus non-infected specimens from sets of previously unseen spectra. To do this, the non-anaemic data were shuffled and split into two portions: 70% for model selection, training, and testing, while the remaining 30% was kept separate as an unseen dataset for validating the trained models. The 70% portion was further divided into an 80:20 train-test split. For model selection, training, and testing, balanced classes were ensured through random under-sampling of the majority class.
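A minimal sketch of this partitioning is shown below, assuming the pre-processed spectra and infection labels are held in arrays X and y (the placeholder data, variable names, and random seed are illustrative assumptions); it uses scikit-learn for the splits and imbalanced-learn for random under-sampling.

```python
import numpy as np
from collections import Counter
from sklearn.model_selection import train_test_split
from imblearn.under_sampling import RandomUnderSampler

# Minimal sketch (illustrative only) of the data partitioning described above.
X = np.random.rand(500, 1700)        # placeholder spectra (rows = DBS scans); replace with real data
y = np.random.randint(0, 2, 500)     # placeholder labels (1 = malaria-positive)

# 30% held out as a completely unseen validation set
X_model, X_unseen, y_model, y_unseen = train_test_split(
    X, y, test_size=0.30, shuffle=True, stratify=y, random_state=42
)

# the remaining 70% split 80:20 into training and test portions
X_train, X_test, y_train, y_test = train_test_split(
    X_model, y_model, test_size=0.20, shuffle=True, stratify=y_model, random_state=42
)

# balance classes by randomly under-sampling the majority class
X_train_bal, y_train_bal = RandomUnderSampler(random_state=42).fit_resample(X_train, y_train)
print(Counter(y_train_bal))
```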
Stratified shuffle-split, ten-fold cross-validation (SSS-CV) was employed to select the best machine-learning algorithm for identifying malaria infections. The seven algorithms listed above were evaluated, and the best one was selected based on accuracy scores for distinguishing malaria infections within the non-anaemic class, using three approaches: (i) cross-validation using datasets with high contrast (positive class = 6%, N = 230) against the negative class (negative = 0%, N = 230); (ii) cross-validation with all concentrations (6%, 0.1%, 0.002%, and 0.00003%) combined as the positive class (N = 220) against the malaria-negative class (N = 220); and (iii) cross-validation with low contrast (positive class = 0.00003%, N = 226) against the negative class (N = 226). Model selection, training and validation were performed on standardized absorption intensities relative to their wavenumbers.
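The sketch below illustrates the SSS-CV comparison, reusing the balanced training arrays from the partitioning sketch above; the hyperparameters are scikit-learn defaults, and GradientBoostingClassifier stands in for the gradient-boosting (XGB) classifier as an assumption.

```python
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              ExtraTreesClassifier, BaggingClassifier)

# Minimal sketch (illustrative only) of stratified shuffle-split, ten-fold CV over the
# seven candidate classifiers; assumes X_train_bal / y_train_bal from the sketch above.
classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "RF": RandomForestClassifier(),
    "XGB": GradientBoostingClassifier(),   # stand-in for the gradient-boosting classifier
    "DT": DecisionTreeClassifier(),
    "ET": ExtraTreesClassifier(),
    "BC": BaggingClassifier(),
}

cv = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X_train_bal, y_train_bal, cv=cv, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```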
Training, testing and validation of machine learning models to identify malaria parasite presence in non-anaemic spiked blood
The best ML algorithms selected through SSS-CV were then trained on 80% of the spectral data from non-anaemic blood using three distinct approaches: (i) High Contrast: models were trained using the highest parasite density (6%) as positives against negatives (0%); (ii) All Concentrations: models were trained using all parasite densities combined (6%, 0.1%, 0.002% and 0.00003%) as positives against negatives (0%); and (iii) Low Contrast: models were trained using the lowest parasite density (0.00003%) as positives against negatives (0%). The trained models were fine-tuned through grid search for hyperparameter optimization.
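A minimal sketch of this tuning step is given below; the choice of logistic regression as the selected classifier and the parameter grid are assumptions for illustration, not the grid used in the study.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit

# Minimal sketch (illustrative only) of grid-search hyperparameter tuning on the 80%
# training portion; assumes X_train_bal / y_train_bal from the sketches above.
param_grid = {"C": [0.01, 0.1, 1, 10, 100], "penalty": ["l1", "l2"], "solver": ["liblinear"]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=42),
    scoring="accuracy",
)
search.fit(X_train_bal, y_train_bal)
best_model = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
```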
For testing purposes, the final tuned classifiers were tested on the same parasitaemia class used for training. For instance, the model trained on 80% of the data with high contrast (6% against negatives) was also tested on the remaining 20% of the data at the highest contrast against the negative class. In addition to accuracy, other evaluation metrics, namely sensitivity (recall), specificity, and F1-score, were calculated on the test set (20%). Finally, the best classifiers were validated on a completely unseen dataset, the 30% kept separate at the start. Beginning with the non-anaemic classes at different parasitaemia levels, the numbers of DBS included for model validation were as follows: positive at 6% parasitaemia (N = 82) against negative at 0% (N = 82); positive at 0.1% (N = 82) against negative (N = 82); positive at 0.002% (N = 82) against negative (N = 82); and positive at 0.00003% (N = 82) against negative (N = 82).
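The sketch below shows how sensitivity and specificity follow from the confusion matrix for one contrast, assuming best_model, X_test and y_test from the sketches above (1 = malaria-positive); it is an illustration, not the study's evaluation script.

```python
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score

# Minimal sketch (illustrative only): test-set metrics for one parasitaemia contrast.
y_pred = best_model.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

sensitivity = tp / (tp + fn)   # recall on the positive (infected) class
specificity = tn / (tn + fp)   # recall on the negative (uninfected) class
print(accuracy_score(y_test, y_pred), sensitivity, specificity, f1_score(y_test, y_pred))
```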
Evaluating the effect of anaemia on the performance of MIR-ML for distinguishing between blood samples with and without malaria parasites
The best classifiers developed for predicting malaria infections without considering anaemia were evaluated on a new dataset comprising cases of moderate anaemia (25% haematocrit) and severe anaemia (12.5% haematocrit). This evaluation was structured across four distinct categories: (i) malaria-positive at 6% parasitaemia (N = 101) versus malaria-negative at 0% parasitaemia (N = 101); (ii) malaria-positive at 0.1% parasitaemia (N = 101) versus malaria-negative (N = 101); (iii) malaria-positive at 0.002% parasitaemia (N = 101) versus malaria-negative (N = 101); and (iv) malaria-positive at 0.00003% parasitaemia (N = 101) versus malaria-negative (N = 101). The accuracy, sensitivity, and specificity of these models were computed using a bootstrap method with 100 iterations, and 95% confidence intervals were established for these metrics.
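A minimal sketch of the bootstrap confidence-interval calculation is shown below for one anaemic-class evaluation; the placeholder labels and predictions, sample size, and random seed are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Minimal sketch (illustrative only): bootstrap 95% CI for one metric (accuracy here).
rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, 202)   # placeholder labels (e.g. N = 101 positive + 101 negative)
y_pred = rng.integers(0, 2, 202)   # placeholder model predictions

scores = []
for _ in range(100):                                    # 100 bootstrap iterations
    idx = rng.integers(0, len(y_true), len(y_true))     # resample with replacement
    scores.append(accuracy_score(y_true[idx], y_pred[idx]))

lower, upper = np.percentile(scores, [2.5, 97.5])       # 95% confidence interval
print(f"accuracy {np.mean(scores):.3f} (95% CI {lower:.3f}-{upper:.3f})")
```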
Finally, a generalized linear model was used to test whether anaemia and parasitaemia had statistically significant effects on the performance of the MIR-ML approach in predicting malaria infections.
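A minimal sketch of such a model is given below, fitting a binomial GLM to per-spectrum classification outcomes with anaemia (haematocrit) and parasitaemia as categorical predictors; the data-frame layout, column names, and placeholder values are assumptions, not the study's analysis code.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Minimal sketch (illustrative only): binomial GLM relating classification outcomes to
# haematocrit and parasitaemia levels. The placeholder data below stand in for real results.
rng = np.random.default_rng(0)
results = pd.DataFrame({
    "correct": rng.integers(0, 2, 300),                        # 1 = spectrum correctly classified
    "haematocrit": rng.choice([50, 25, 12.5], 300),            # % haematocrit
    "parasitaemia": rng.choice([6, 0.1, 0.002, 0.00003], 300), # % parasitaemia
})

model = smf.glm(
    "correct ~ C(haematocrit) + C(parasitaemia)",
    data=results,
    family=sm.families.Binomial(),
).fit()
print(model.summary())
```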