Experiments
To assess the effectiveness of endorsements and testimonials, we conduct four RCTs, which together comprise a control group and seven treatment arms. The experiments are embedded in two original SMS surveys collected as part of this study: one of the survey questions is used to inform respondents, and the information conveyed differs depending on the randomized subgroup. All treatment subgroups build on the same base campaign, and the treatment arms are variations of it.
The four experiments conducted in this study are briefly described here; full details are shown in Table 1.
Experiment 1 – Endorsement variations: The control group receives a base campaign message concerning handwashing, while the first treatment arm pairs the base campaign message with an endorsement by a soccer player on Tanzania’s national team, and the second treatment arm pairs the base campaign with an endorsement by Tanzania’s Ministry of Health.
Experiment 2 – Endorsement variations: The control group receives the same base campaign message, while the first treatment uses the endorsement of the same soccer player, and the other the endorsement of a medical doctor who appears in Tanzanian media as a health commentator.
Experiment 3 – Endorsements and testimonials: The control group is not shown any base campaign message, while the first treatment arm informs participants that sanitation is endorsed by a Tanzanian model and philanthropist, and the second treatment arm trials a testimonial relaying the fictional story of Devota, a woman who lost her three-year-old baby Julius after his exposure to fecal matter.
Experiment 4 – Positively and negatively framed testimonials: The control group is not shown any base campaign message, while the two treatment arms trial testimonials which are variations of Devota’s fictional story described above.
Institutional review board approval was received from Solutions IRB (https://www.solutionsirb.com/; Protocol #2020/08/32) and all procedures were performed in accordance with this protocol, including receiving informed consent from all participants.
Outcomes
Treatment effects are measured along outcomes which are self-reported and collected, after treatments are administered, in the survey where the experiments are embedded. This study draws on three outcome variables in total, shown in full in Table 2. The first outcome measures respondents’ use of water (water use): respondents were asked to allocate a fixed amount of water to handwashing, at the expense of other water needs. The second outcome, sanitation investment, measures investment in sanitation by capturing the share of a fixed grant amount that respondents allocate to a toilet, when the alternative investment categories are water and electricity. The third outcome measures the respondent’s prioritization of sanitation (sanitation priority), indicating whether the respondent perceives having an improved toilet to be the most important way to keep children healthy. For the two continuous outcome variables, water use and sanitation investment, the survey questions are worded in a forward-looking manner so that respondents’ answers can capture treatment effects; that is, we asked respondents how they intend to act in the near future, rather than how they acted in the past. We recode outcomes for use in regressions, as shown in the last column of Table 2. As a convention throughout this paper, all outcomes are coded in ascending order of desirability.
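The recoding convention can be illustrated with a minimal sketch. The function and variable names below are hypothetical (the actual codings are given in the last column of Table 2): allocation outcomes become shares in [0, 1], and the priority outcome becomes a binary indicator, so that higher values are always more desirable.

```python
# Illustrative recoding so all outcomes ascend in desirability
# (hypothetical names; actual codings are in Table 2).

def recode_share(amount, total):
    """Recode an allocation (e.g., water assigned to handwashing) as a share in [0, 1]."""
    return amount / total

def recode_binary_priority(answer, desirable="improved toilet"):
    """1 if the respondent names the desirable option as most important, else 0."""
    return 1 if answer == desirable else 0

water_use = recode_share(6, 10)  # 6 of 10 units allocated to handwashing -> 0.6
sanitation_priority = recode_binary_priority("improved toilet")  # -> 1
```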
Not all outcomes are asked in each experiment; rather, we ask the outcome questions corresponding to the WASH topic of the experiment. This means one outcome for the handwashing experiments (water use) and two outcomes for sanitation (sanitation investment and sanitation priority). In Supplementary Figs. S1–S4, we provide one diagram for each experiment, showing the treatment subgroups alongside the outcome numbers used to measure effects.
Data and sampling
This study draws on original data from two survey rounds collected in January 2021 (Survey #1) and June 2021 (Survey #2), with more than 1,000 respondents each. The surveys were administered in Swahili, Tanzania’s national language, and their observation unit is the individual. Almost two hundred individuals participated in more than one of the SMS surveys with experiments that we conducted since the start of the pandemic (one survey in summer 2020 without experimentation using endorsements or testimonials, in addition to the two surveys presented in the present paper). To focus on unique observations, their surplus observations were dropped. The sample sizes are 1,279 for Survey #1 and 1,122 for Survey #2; after surplus observations are removed, these yield a total of 2,238 unique individuals. As a robustness check, we also run a model where surplus observations are retained (see below).
The surveys were collected by self-fill SMS, a collection mode where questions and responses are administered by text via the respondents’ phones. The survey starts with an invitation to take part. If respondents provide consent, they receive the survey’s questions one by one and text back their answers. Depending on the answers they text, the flow of questions adjusts, following patterns programmed ex ante. Answering the survey is free of charge to the respondents, as the costs are paid by the survey firm. Respondents who complete the survey receive airtime, sent within two days to the phone number they used to take the survey. The gross amount of airtime is TZS 1,000, inclusive of TZS 153 VAT, i.e., a net airtime amount of TZS 847, equal to USD 0.36 at the time of writing. This amount of compensation is in line with local customs: high enough to show appreciation, while not excessive.
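The compensation arithmetic can be checked directly; the exchange rate below is not stated in the text but is implied by the reported figures (TZS 847 ≈ USD 0.36).

```python
# Compensation arithmetic using the figures reported in the text.
gross_tzs = 1000
vat_tzs = 153
net_tzs = gross_tzs - vat_tzs          # net airtime: TZS 847

# Implied exchange rate (an inference, not stated in the text): ~TZS 2,353 per USD.
usd_value = round(net_tzs / 2353, 2)   # ~USD 0.36
```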
The sample frame for these surveys starts from lists of SIM card users registered with the three largest phone operators in Tanzania: Vodacom, Airtel, and Tigo. At the beginning of every year, the survey firm conducts a process called “user indexing”, where phone users are approached and asked whether they would agree to take part in future SMS surveys. Those who agree eventually form the sample frame used to build the sample. Sampled phone numbers receive a text inviting them to participate; those who opt to do so enter the survey. To support representativeness, the sample is built to match national-level statistics along four variables (region, rural vs urban, gender, and age groups), whose targets are provided in Supplementary Table S1. The target sample size is 1,000 respondents per survey; however, as a result of filling these quotas, the final sample sizes slightly exceed 1,000. Supplementary Tables S2 and S3 display descriptive statistics for the study population in each survey. A step-by-step guide describing the study implementation is provided in the Supplementary Information.
It should be noted that when several experiments are conducted in the same survey, the respondents from that survey participate in all those experiments. For instance, Experiments 1 and 3 were both conducted in Survey #1; therefore, by the time respondents participate in Experiment 3, they have already received the treatments from Experiment 1. Importantly, random assignments across experiments are independent of each other. Thus, treatment status from earlier experiments is expected to be balanced in later ones, and therefore not to drive the results of later experiments.
Balance
The baseline balance tables for the experiments are shown in Supplementary Tables S4–S9, and we confirm that randomization ensured balance across observed measures. For instance, Supplementary Table S6 assesses balance for Experiment 3. It includes 13 pre-intervention covariates; for each, the table shows the means of subgroups, the mean differences across subgroups, and their statistical significance. With three subgroups, 13 covariates result in 39 mean comparisons, for which the expected number of “by-chance” imbalanced tests at the 5% level is 1.95. We find one test imbalanced at the 5% level and four at the 10% level, all with small magnitudes.
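The expected count of by-chance rejections follows from multiplying the number of tests by the significance level; a quick check of the figures above:

```python
# Expected number of "by-chance" significant balance tests in Experiment 3.
covariates = 13
subgroups = 3
comparisons_per_covariate = subgroups * (subgroups - 1) // 2  # 3 pairwise comparisons
n_tests = covariates * comparisons_per_covariate              # 39 mean comparisons
expected_at_5pct = n_tests * 0.05                             # 1.95 expected rejections
```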
Because this paper analyzes results by pooling observations, below we also report balance at the pooled level. To allay concerns that the existing imbalances could partly drive the findings, we further run a model where all covariates found to be imbalanced at 10% in any of the experiments are included as controls. We discuss this as a robustness check in the subsection below, and find that the results are not driven by these imbalances.
Empirical strategy
Randomized assignment of the interventions is secured using a cryptographic random number generator (cryptographic service provider), which assigns respondents to the various subgroups randomly following a uniform distribution. Random assignment is done across all SMS enumerations active at a given point in time. We do not anticipate spillovers between participants, as they are spread across the sample’s regions and are not connected to each other.
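A minimal sketch of such uniform assignment, using Python's cryptographically secure `secrets` module as a stand-in for the study's cryptographic service provider (the subgroup labels and group count are illustrative):

```python
import secrets

# Illustrative subgroups: control plus two treatment arms.
SUBGROUPS = ["control", "treatment_1", "treatment_2"]

def assign(respondent_ids):
    """Assign each respondent to a subgroup uniformly at random,
    drawing from the OS's cryptographically secure randomness source."""
    return {rid: secrets.choice(SUBGROUPS) for rid in respondent_ids}

assignment = assign(range(1000))  # one subgroup label per respondent
```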
To organize findings on the behavioral designs trialed in this study, we pool the seven treatment arms into endorsements and testimonials. This section starts by describing how the observations are pooled, and then presents the econometric specification.
Pooling of treatment
We start by appending the data sets from the two surveys. Then, for each of the two behavioral change strategies examined in this paper, we build treatment variables in two steps. First, we build an overall pooled treatment variable. For instance, five treatment arms in this study relate to endorsements, and the pooled treatment variable equals 1 if the observation belongs to any of them, and 0 otherwise. Second, we build “subpool” treatment variables, where we pool together only some of these treatment arms, based on a given characteristic they share. For instance, within the pooled analysis of endorsements, we build one subpool treatment variable for all endorsements by celebrities, and one for those by non-celebrities. This allows us to explore different levels of heterogeneity in treatment. One particular case arises when the pooled analysis of a behavioral design draws on several experiments from the same survey. For instance, the pooled analysis of endorsements draws on the treatment arms from Experiment 1 and the first treatment arm from Experiment 3. Generating the overall pooled treatment variable is not immediately possible, as a given individual from Survey #1 may be in control under Experiment 1 and in treatment under Experiment 3. In this case, the data include two observations for the same individual: one corresponds to the treatment assignment and outcome variable in the first experiment, and the other to the treatment assignment and outcome variable in the third experiment. The econometric specification controls for the experiment that generated the observation (see Equation 1).
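The pooling step can be sketched on toy data. Arm labels and variable names below are illustrative, not the study's actual codings; note how the same individual appears as two records, one per experiment:

```python
# Illustrative arm labels (not the study's variable names).
ENDORSEMENT_ARMS = {"endorse_soccer", "endorse_ministry", "endorse_doctor", "endorse_model"}
CELEBRITY_ARMS = {"endorse_soccer", "endorse_doctor", "endorse_model"}

# One record per (individual, experiment); individual 101 is in control
# under Experiment 1 but treated under Experiment 3.
rows = [
    {"id": 101, "experiment": 1, "arm": "control"},
    {"id": 101, "experiment": 3, "arm": "endorse_model"},
    {"id": 102, "experiment": 1, "arm": "endorse_ministry"},
]

for r in rows:
    # Overall pooled treatment: any endorsement arm.
    r["endorsement_pooled"] = int(r["arm"] in ENDORSEMENT_ARMS)
    # Subpool treatments: celebrity vs non-celebrity endorsements.
    r["endorsement_celebrity"] = int(r["arm"] in CELEBRITY_ARMS)
    r["endorsement_noncelebrity"] = int(r["arm"] in ENDORSEMENT_ARMS - CELEBRITY_ARMS)
```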
For endorsements, we draw on a total of five treatment arms from three experiments conducted across two surveys. The details of these treatment arms are shown in Supplementary Fig. S5. They include endorsements by Tanzania’s Ministry of Health and by three celebrities: a soccer player on Tanzania’s national team; a fashion model and philanthropist; and a medical doctor providing health commentary in Tanzanian media.
The pooled analysis of testimonials draws on three treatment arms from two experiments conducted across Survey #1 and Survey #2, and the details are shown in Supplementary Fig. S6. The three treatment arms relay the fictional story of an everyday person named Devota and her three-month-old son Julius. Two treatment arms (Testimonials #1 and #3) explain that Julius died from dehydration, due to lack of an improved toilet in the household. In contrast, the remaining treatment arm (Testimonial #2) explains that Julius is in good health, and that the household benefits from having an improved toilet.
Pooling of outcomes
We pool experiments which sometimes use different outcomes to measure effects. This occurs primarily when the experiments being pooled pertain to different WASH topics (handwashing, clean water, etc.). Specifically, for the pooled endorsement analysis, we draw on experiments relating to handwashing (all using the same outcome, water use) and on one treatment arm relating to sanitation (using two outcomes, sanitation investment and sanitation priority). As all testimonial arms relate to the topic of improved toilets, they use the same two outcomes, i.e., sanitation investment and sanitation priority.
To support comparability across experiments, we produce pooled outcomes for each pooled treatment analysis, i.e., one for endorsements (index of handwashing and sanitation) and one for testimonials (index of improved sanitation). Following the approach of standardized mean differences commonly found in meta-analyses [33], we build pooled outcomes in two steps. First, we recode the outcomes obtained in a given experiment to range from 0 to 1 and average the recoded outcomes into an experiment index. Second, we standardize the experiment index to have mean 0 and standard deviation 1 in the experiment’s control group. For instance, respondents receiving the model’s endorsement of improved toilets were asked the questions for the outcomes sanitation investment and sanitation priority; we calculate the average over the two recoded outcomes and standardize this pooled variable.
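The two steps can be sketched as follows, with toy numbers rather than study data and assuming outcomes are already recoded to [0, 1]. The population standard deviation is used here for simplicity; the text does not specify the SD convention.

```python
from statistics import mean, pstdev

def experiment_index(recoded_outcomes):
    """Step 1: average an individual's recoded (0-1) outcomes into an experiment index."""
    return mean(recoded_outcomes)

def standardize(indices, control_indices):
    """Step 2: standardize indices using the control group's mean and SD."""
    mu, sigma = mean(control_indices), pstdev(control_indices)
    return [(x - mu) / sigma for x in indices]

# Toy example: sanitation investment = 0.5 and sanitation priority = 1 -> index 0.75.
idx = experiment_index([0.5, 1.0])
control = [0.2, 0.4, 0.6]        # toy control-group indices: mean 0.4, SD ~0.163
z = standardize([idx], control)  # standardized pooled outcome
```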
The details of these calculations are provided at the bottom of the diagrams detailing the pooling procedure in Supplementary Figs. S5 and S6.
Econometric specification
The pooled analyses of the behavioral designs are conducted using the econometric specification described below. Here, we describe the specification using endorsements as an illustration, but the same specification applies to testimonials. We probe the robustness of the results with respect to clustering the standard errors by individual and including imbalanced baseline covariates. Across the study, we declare in Stata’s survey design that the sample is stratified by region.
Equation 1 specifies the model for average treatment effects of endorsements, using the terms below.
$$\begin{aligned} Y_{i}=\alpha _0+\sum _{j=1}^{J}\alpha _jEndorsement_{ij}+\sum _{k=1}^{K}\beta _kExp_{ik}+\varepsilon _{i} \end{aligned}$$
(1)
\(Y_{i}\) is the outcome variable for individual i.
\(Endorsement_{ij}\) is the treatment variable for an endorsement of type j, equal to 1 if individual i was treated with it, and 0 otherwise. For instance, if the model includes a subpool of all endorsement arms by celebrities and a subpool of all endorsement arms by non-celebrities, we include \(Endorsement_{i1}\) as the treatment variable for the former, and \(Endorsement_{i2}\) as the treatment variable for the latter.
\(\alpha _j\) is the coefficient of interest for endorsement of type j.
\(Exp_{ik}\) is an indicator equal to 1 if individual i received experiment k, and 0 otherwise. As individual experiments are self-contained within each survey round, i.e., specific treatments are provided solely in one of the two survey rounds, the binary variables indicating the experiment simultaneously control for the survey in which the observation was collected. For example, the first and third experiments were both conducted within the first survey round, which is why their binary indicators additionally capture fixed effects pertaining to that survey round.
The set of observations starts from the dataset appending the two surveys. Each respondent participated in one of the two survey rounds. We drop from the dataset respondents who did not participate in any experiment related to the behavioral design at hand.
We run two additional models as robustness checks. First, we add to Eq. (1) a vector of covariates \(X_{i}\) found to be imbalanced across subgroups. Second, we add regional fixed-effects to Eq. (1). The results are presented in the additional model section in the Supplementary Information (Tables S10–S17).