DeathGPT is the nickname given by the FT to an algorithm developed by a group of AI researchers. It mirrors two theories I have already covered in these Newsletters.
The Danish Government has a huge collection of data on each of its citizens. Anyone born in Denmark has a unique identification number, as does anyone who lives there for at least three months. The Government maintains many medical registries: registries of cancer and other diseases, of treatments, of hospital and GP visits, and even of prescriptions. Its administrative data tracks major life events such as births, deaths, marriages, divorces and children. It also tracks occupation, income level, residency, working hours and education. The unique identification number allows all 8m residents to be followed as individuals. Major life events can be tracked by the day for each person.
Building an AI model requires a database on which the system can learn. These researchers used the Danish registries. The system learnt from the lives of millions of individuals over a whole decade: literally millions of observations on hundreds of variables. The researchers took a novel approach to the analysis. They converted the data into a life story for each person. “Fractured a forearm”. “Worked in manufacturing of hand tools”. Each incident was tagged with either the age of the person or the date of the incident. They could then apply the same “large language model” techniques that underpin ChatGPT.
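The idea of turning registry records into an ordered “life story” can be sketched in a few lines. This is only an illustration of the encoding step, with invented event descriptions and a made-up token scheme, not the researchers’ actual registry codes:

```python
# Minimal sketch: converting (age, event) records into a chronological
# token sequence, the kind of "life story" a language model can read.
# The token format here is invented for illustration.

def encode_life(events):
    """Convert (age, event) records into an ordered token sequence."""
    tokens = []
    for age, event in sorted(events):                    # chronological order
        tokens.append(f"AGE_{age}")                      # when it happened
        tokens.append(event.upper().replace(" ", "_"))   # what happened
    return tokens

person = [
    (25, "worked in manufacturing of hand tools"),
    (31, "fractured a forearm"),
    (28, "moved residence"),
]

print(encode_life(person))
```

Once every life is a sequence of tokens like this, the same machinery that predicts the next word in a sentence can predict the next event in a life.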
life2vec
When building such a model we want to know whether it worked. The first thing they discovered was that the AI had structured the data for itself, and that structure made logical sense. It had worked out “salary” and understood its relationship with time. It had grouped health issues together. It had separated maternal and baby issues from other diseases. There were many other examples. The model was robust: the same structure reappeared in subsets of the data. They called it “life2vec”.
Forecasts
They held out the last five years of the data and retrained the underlying model on the rest. It then forecast how many people between 35 and 65 would survive those five years (hence DeathGPT). Trained on 3.5m people, the model outperformed all standard and AI-based forecasting methods on a test sample of 100,000 people. It also successfully forecast emigration over the same period.
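The holdout design can be sketched simply: events up to a cutoff year become the model’s input, and survival through the following five-year window becomes the label it must predict. The records, cutoff and function below are stand-ins for illustration, not the actual life2vec pipeline:

```python
# Sketch of a time-based holdout: keep history up to the cutoff year,
# label each person by whether they survive the following window.
# Cutoff, horizon and records are invented for illustration.

def make_example(events, death_year, cutoff=2015, horizon=5):
    """Split one person's record into (input history, survival label)."""
    history = [e for e in events if e[0] <= cutoff]       # visible to the model
    survived = death_year is None or death_year > cutoff + horizon
    return history, survived

events = [(2008, "NEW_JOB"), (2013, "MARRIED"), (2017, "MOVED")]
history, label = make_example(events, death_year=None)
print(history, label)
```

The point of the design is that the model never sees the holdout window, so its survival forecasts are genuine out-of-sample predictions.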
Finally, they took a sample of 5,000 people for whom they had psychological profiles. They forecast their scores on various dimensions of extraversion, and were remarkably successful. The underlying life course model remains the same; all that changes is which variables within the model each forecast draws on. Removing variables reduced the accuracy of the forecasts.
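The “same model, different forecast” idea is the standard shared-representation pattern: one learned vector per person, with a separate simple predictor on top for each target. A minimal sketch, with a tiny made-up embedding and made-up weights standing in for life2vec’s 280-dimensional vectors:

```python
# Sketch of a shared representation with per-target heads: the same
# person embedding feeds separate linear predictors for survival and
# extraversion. All numbers are invented; no training is shown.

def linear_head(embedding, weights, bias):
    """Score a person embedding with one target-specific linear head."""
    return sum(e * w for e, w in zip(embedding, weights)) + bias

person_embedding = [0.2, -0.5, 1.1]   # stand-in for a 280-dim life2vec vector

survival_score     = linear_head(person_embedding, [0.4, 0.1, -0.2], 0.0)
extraversion_score = linear_head(person_embedding, [-0.3, 0.8, 0.5], 0.1)

print(round(survival_score, 3), round(extraversion_score, 3))
```

Because the embedding is shared, improving or degrading it (for example by removing input variables) affects every forecast built on top of it.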
The Exposome and the Life Course model
Implicit in such a study is a test of the Exposome. In 2004 researchers had just mapped the human genome for the first time. They knew the genome would explain only a small part of the health and ageing of the population; the environment, in all its forms, would account for far more of the individual differences in health. The EXPOSOME was conceived to fill the gap. In fact, a study of 44,000 pairs of twins over their lifetimes confirmed this. It examined the incidence of many different cancers: if one twin got cancer, what were the odds the other would get it, compared to the odds for the general population? This allowed the researchers to measure how far each cancer was inherited. Some cancers, such as breast and prostate cancer, had a clear hereditary component: at least 25% and 45% respectively. Others did not. Across the twenty-eight cancers studied, the overall conclusion was that our genetics accounts for only a small part of susceptibility to cancer; the environment is the principal contributor. The same has been found for many other diseases.
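The twin-study comparison is an odds ratio: the odds a co-twin develops cancer given their twin has it, divided by the odds in the general population. A tiny worked sketch, using invented rates rather than the study’s actual figures:

```python
# Sketch of the twin-study comparison: co-twin odds of cancer relative
# to the population base rate. The 9% and 3% rates below are invented
# for illustration, not the study's numbers.

def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)

def odds_ratio(p_cotwin, p_population):
    """How much higher are the co-twin's odds than the base rate?"""
    return odds(p_cotwin) / odds(p_population)

# hypothetical: 9% of co-twins affected vs a 3% population rate
print(round(odds_ratio(0.09, 0.03), 2))
```

An odds ratio well above 1 points to a hereditary component; one near 1 suggests the environment dominates, which is what the twin study found for most cancers.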
The Exposome model assumes that the cumulative impacts of the environment over a lifetime affect health and other outcomes. The complexity of the relationships within the Exposome is huge, and researchers have had trouble proving its validity. They have focused instead on one factor at a time, for example the impact of diet on health (Newsletter #106 The Genome and the Exposome). These researchers in computational science never reference the Exposome. They may in fact be making it real.
“The Life Course” approach sees life as a series of events. As a concept it was first developed in sociology. It assumes ageing is an individual development process. Its perspective is that there are sequential life stages, and the transitions from one stage to the next form pathways. Each transition generates stressors, and our responses to those stressors develop us. Some transitions are transformational, as when we move from being an “individual” to being a “parent”. Others change our perspective on life: to lose parents or close friends can have profound effects on our motivations. The impact depends on our trajectory.
The events can influence our satisfaction with life. A recent study tried to forecast life satisfaction. It used chronological age (studies show that life satisfaction has a low point in mid-life), and it also constructed each individual’s life course pattern. The life course was clearly the better predictor of life satisfaction: not the status of one’s life today, but the history that got you to that point (Newsletter #115 Models of Life).
Life2vec is a life course model for 8m people. It encompasses all possible trajectories through life, and its stability validates the “life course model”. It can be used to make forecasts; it can even forecast one’s personality. Academic research is very siloed: the computational science article never mentions the existing “life course” literature either!
Missing the Implications
Like every other AI model, life2vec is partially a black box. It cannot tell us the intricate relationships that would be so important; it is, after all, working in a 280-dimensional space. Those relationships matter because they could tell us what to do. The model cannot tell us whether our personality determines our life course, or whether our life course instead shapes our personality.