For my first postdoctoral position, I was lucky enough to become a member of the Gestational Project, a research project funded by the Bill and Melinda Gates Foundation that was a collaboration between the School of Computer Science and the School of Medicine at the University of Nottingham.
The goal of the project was to create a system that would combine photographs of a newborn’s face, foot and ear to calculate their gestational age right after birth. The gestational age of a baby is important because it measures how premature they are at birth, which directly influences the type of treatment that they will receive. The most accurate dating method right now is an ultrasound scan, but these machines are expensive, require trained personnel and cannot always be deployed to remote areas. In the absence of ultrasound scans, there is a manual method that can be used, the Ballard Score. The Ballard Score looks at the posture and measurements of a newborn in order to determine their gestational age. However, this method is highly subjective and results can vary widely depending on the experience of the person carrying out the exam.
The ultimate idea was to create an accurate system that would combine the objectivity and accuracy of an ultrasound scan and the postnatal measurements of the Ballard Score, therefore allowing non-trained personal to measure the gestational age of newborns. Such method would be useful in remote or under-funded areas where ultrasound machines or trained physicians might not be available or easily accessible.
I found the project fascinating and its area of application, healthcare, was new for me. In my PhD, I had worked in environmental problems, where, if I needed data, I could either download free images from sources like Geograph or go outside and take a picture of plants (you may think I’m exaggerating, but I am not). In other words, data collection was fast, straightforward and I had to think very little about its effects on the design of my algorithm. If accuracy was low, I could go through the motions again, download or take more photos, and see if that helped.
However, when I started working in this project, I realised that my data collection and maintenance habits had to become much more nuanced and serious. Just to get through NHS Ethics took 8 months, which is more than reasonable, since we needed to take photographs of such a protected group. But, even after getting the approval, recruiting participants was, also understandably, a challenge.
To ensure privacy and safety, images had to be stored in encrypted disks and access to them was limited to only the four people working on the project. Only the team at QMC were allowed to recruit and take the photos, so I had no information about the parents or newborns other than their photos and measurements. A downside of this measures is that I have never been able to express my thanks to all the parents (and babies!) who were generous enough to allow the team take photos, even when they were in dangerous circumstances. It was only due to their kindness that we were able to collect any data at all and to them we owe them the success of the project.
The characteristics of the data, including how challenging it was to collect, became a deciding factor in the design of the algorithm. If the accuracy in the calculations was low, I could not rely on additional and instantaneous data collection. Instead, I had to rethink the algorithm design and change it to work with the data we had. Incorporating these notions into our design, we were able to create a system that combined deep learning and regression and that was able to calculate the gestational age of newborns with 6 days of error in the worst case scenario [1].
All in all, I can say that I loved working on the Gestational Project and that I learned a lot. I think that its challenges made me a more conscious researcher, one who treats data collection and management much more carefully now, independently of the area of application.
Right now, we are working to use the Centre for Doctoral Training Impact Fund to continue to increase the size and diversity of our dataset, hopefully collecting data in other parts of the country and the world. You can find more information about the GestATional Project here.
[1] Torres, M.T., Valstar, M.F., Henry, C., Ward, C. and Sharkey, D., 2017, May. Small Sample Deep Learning for Newborn Gestational Age Estimation. In Automatic Face & Gesture Recognition (FG 2017), 2017 12th IEEE International Conference on (pp. 79-86). IEEE
Mercedes Torres Torres, Transitional Assistant Professor, Horizon Digital Economy Research