Concurrent Validity and Feasibility of Short Tests Currently Used to Measure Early Childhood Development in Large Scale Studies: Methodology and Results

In low- and middle-income countries (LMICs), measuring early childhood development (ECD) with standard tests in large-scale surveys (e.g., evaluations of interventions) is difficult and expensive. Multi-dimensional screeners and single-domain tests ('short tests') are frequently used as alternatives, but their validity in these circumstances is unknown. We examine the feasibility, reliability, and concurrent validity of three multi-dimensional screeners (the Ages and Stages Questionnaires, ASQ-3; the Denver Developmental Screening Test, Denver-II; and the Battelle Developmental Inventory screener, BDI-2) and two single-domain tests (the MacArthur-Bates Short-Forms, SFI and SFII; and the WHO Motor Milestones, WHO-Motor) in 1,311 children aged 6 to 42 months in Bogota, Colombia. We compare scores on these short tests with those on the Bayley Scales of Infant and Toddler Development (Bayley-III), which we take as the 'gold standard'. The Bayley-III was administered at a center by psychologists, whereas the short tests were administered at home by interviewers, as in a survey setting. Concurrent validity of the multi-dimensional tests' cognitive, language, and fine motor scales with the corresponding Bayley-III scales is low below 19 months but increases with age, becoming moderate-to-high above 30 months. In contrast, the concurrence of the gross motor scales is high below 19 months and then decreases. Of the single-domain tests, the WHO-Motor has high validity for gross motor development below 16 months, and the SFI and SFII expressive scales show moderate correlations with language below 30 months. Overall, the Denver-II appears to be the most feasible and valid multi-dimensional test, while the ASQ-3 performs poorly below 31 months. By domain, gross motor development shows the highest concurrence below 19 months, and language above that age. Results do not vary by household socio-economic status. Nonetheless, an investigation of predictive validity is needed to further guide the choice of instruments for large-scale studies.