Research Articles

The creation of LIFE-M: The Longitudinal, Intergenerational Family Electronic Micro-Database project

Pages 138-159 | Published online: 17 Aug 2023
 

Abstract

This paper describes the creation of the Longitudinal, Intergenerational Family Electronic Micro-Database (LIFE-M), a new data resource linking vital records and decennial censuses for millions of individuals and families living in the late 19th and 20th centuries in the United States. This combination of records provides a life-course and intergenerational perspective on the evolution of health and economic outcomes. Vital records also enable the linkage of women, because they contain a crosswalk between women’s birth (i.e., “maiden”) and married names. We describe (1) the data sources, coverage, and linking sequence; (2) the process and supervised machine-learning methods used to link records longitudinally and across generations; and (3) the resulting linked samples, including linking rates, representativeness, and weights.

Notes

1 Historical linking to create longitudinal data has been done extensively outside the United States. For the United Kingdom (U.K.), the Cambridge Group for the History of Population and Social Structure hosts several different datasets, primarily representing different areas and time periods for England. The most widely cited database is the family reconstitution data for 26 English parishes (Wrigley et al. 2018), which has been used to conduct individual-level studies of fertility, mortality and nuptiality. The Victorian Panel Study links vital and census records from 1851 to 1901 in Great Britain (Schürer 2007). Data from Sweden are linked longitudinally from 1650 to 1950 across full-count censuses, emigration records and death records (Wisselgren et al. 2014; Berger et al. 2023). Available through the Swedpop project, these links have been used in multiple studies and include women as well as men (Dribe, Eriksson, and Scalone 2019). The Scanian Economic Demographic Database covers the entire population of five rural and semi-urban parishes and an industrializing town in southern Sweden between 1815 and 2017 (Dribe and Quaranta 2020; Bengtsson and Dribe 2021). For Canada, the BALSAC database covers the population from an area in Quebec from 1621 to 1965 (Vézina and Bournival 2020). The Canadian Peoples contains over 40 million linked census records, representing three generations from the mid-19th century to early 20th century (Foxcroft, Inwood, and Antonie 2022). Researchers have also linked records for the U.K. (Long and Ferrie 2013) and Norway (Modalsli 2017, 2023) as well as between Norway and the U.S. (Abramitzky, Boustan, and Eriksson 2013; Biavaschi and Elsner 2013). We focus our discussion on U.S. databases which are most related to LIFE-M.

2 This large, linked sample follows two earlier linking projects. Guest (1987) created a national sample of men in the 1880 Census linked to the 1900 Census (N = 4,014, linkage rate 39.4%). Ferrie (1996) linked a nationally representative sample of men in the 1850 Census to the 1860 Census (N = 4,938, linkage rate 19.3%).

3 Mohammed and Mohnen (2023) also use a subset of the linked dataset used in Bailey, Mohammed, and Mohnen (2022) to study the impact of Rosenwald schools on labor market outcomes for both men and women.

4 The NLS has subsequently tracked supplemental samples. One covers ages 14–22 in 1979 (N = 12,686) (and children for women in this survey) and another ages 12–16 in 1996 (N = 9,000).

5 A variety of independent administrative and restricted data sources offer a third type of longitudinal, intergenerational data. The National Longitudinal Mortality Study (NLMS) links the Current Population Surveys and other records to death certificates to examine the relationship of demographic and socio-economic characteristics with mortality rates. These large microdata samples (N > 340,000 deaths) generally link individuals ages 50 and older to demographic and socio-economic information in the CPS from about age 40. Researchers have also conducted labor-intensive hand-linkages across censuses (Ferrie 1996; Guest 1987; Long and Ferrie 2013; Collins and Wanamaker 2014, 2015, 2022; Bleakley and Ferrie 2013, 2014, 2016). Many of these linked samples are the property of the researchers who collected or linked them and are not available for public use. Lack of access to these data and substantial barriers to creating such samples limit replication, new research using these data, and analyses of data quality.

6 LIFE-M links more than 170,000 Black Americans and more than 368,000 foreign-born people.

7 Age misreporting is common in the census (e.g., there are many more 50- and 60-year-olds than 51- and 63-year-olds) as well as on marriage certificates, where ages were misstated to circumvent minimum age requirements (Blank, Charles, and Sallee 2009). Age misreporting is more common for Black Americans (Elo and Preston 1994; Logan and Parman 2011).

8 Multiple matches have been so problematic that past work has eliminated common names entirely from samples to be linked (Ferrie 1996; Ruggles 2002).

9 We use the terms “birth family” and “marriage family” to distinguish between when someone is a child (birth family) and when they are married or a parent (marriage family).

10 Completed education is first available in the 1940 Census; literacy is available in censuses prior to 1940.

11 These refer to the full-count censuses for the entire United States.

12 The project also tracked and provided trainers with feedback on their speed, which was determined using the metadata collected from time-stamped uploads and downloads of each batch from the distribution system. Tracking trainer speed helped minimize training costs due to inattention. Increasing accuracy also minimized training costs by reducing the number of records sent for discrepancy review.

13 This is due to name misspellings, incomplete names (e.g., nicknames, initials), transposed first and middle names, and other idiosyncrasies in historical records. The recording of age in the census tends to reflect “age heaping,” the common practice of rounding ages to the nearest multiple of five (A’Hearn, Baten, and Crayen 2009; Hacker 2013).
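
To make the idea of age heaping concrete, the sketch below computes Whipple's index, a conventional summary of how strongly reported ages cluster on multiples of five. The note does not say which heaping measure, if any, the project used, so this is only an illustration; the function name and the sample ages are hypothetical.

```python
def whipple_index(ages):
    """Whipple's index over reported ages 23-62 (inclusive).

    100 means no heaping on ages ending in 0 or 5;
    500 means every reported age ends in 0 or 5.
    """
    in_range = [a for a in ages if 23 <= a <= 62]
    if not in_range:
        raise ValueError("no ages between 23 and 62")
    # Count reported ages that are multiples of five.
    heaped = sum(1 for a in in_range if a % 5 == 0)
    # One in five ages is a multiple of five if ages are reported evenly,
    # so the share is scaled by 5 and expressed on a base of 100.
    return 500.0 * heaped / len(in_range)


# Hypothetical inputs, not census data.
evenly_reported = list(range(23, 63))   # each age 23..62 reported once
fully_heaped = [30, 40, 50, 60] * 10    # every age rounded to a multiple of five

no_heaping = whipple_index(evenly_reported)
full_heaping = whipple_index(fully_heaped)
```

Values well above 100 indicate that exact age is an unreliable linking field, which is one reason a linking procedure may need to tolerate small differences in reported birth year.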

14 “Linkability” is determined by the completeness of name and birth year and is described in the notes of Table 4.

15 Linking with 97% precision means that the error rate is only 3%. For the 1940 Census and death records, we can also link with higher error rates of 5% and 10%. The advantage of a higher error rate is more links and thus larger samples. However, the samples only increase in size by, at most, a few hundred thousand.
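
The trade-off between error rate and sample size can be illustrated with a small sketch. It assumes each candidate link carries a match score and a hand-verified correctness label; the scores, labels, thresholds, and function name are all hypothetical and do not represent the LIFE-M procedure or its data.

```python
def precision_and_count(scored_links, threshold):
    """Return (precision, n_links) for links scoring at or above threshold.

    scored_links: list of (score, is_correct) pairs, where is_correct is a
    hand-verified label. The error rate is 1 - precision.
    """
    kept = [is_correct for score, is_correct in scored_links if score >= threshold]
    if not kept:
        return None, 0
    return sum(kept) / len(kept), len(kept)


# Hypothetical scored candidate links (toy data).
links = [
    (0.99, True), (0.97, True), (0.95, True), (0.90, True),
    (0.85, False), (0.80, True), (0.70, False), (0.60, True),
]

strict = precision_and_count(links, 0.90)  # fewer links, all correct
loose = precision_and_count(links, 0.60)   # more links, some incorrect
```

In this toy example the strict threshold keeps 4 links with precision 1.0, while the loose threshold keeps 8 links with precision 0.75, that is, an error rate of 25%. Accepting a higher error rate yields more links, which is the trade-off the note describes.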

16 LIFE-M links more than 170,000 Black Americans and more than 368,000 foreign-born people.

Additional information

Funding

This project was generously supported by the National Science Foundation (SMA1539228), the National Institute on Aging (R21AG05691201), the University of Michigan Population Studies Center Small Grants (R24HD041028), the Michigan Center for the Demography of Aging (MiCDA, P30 AG012846-21), the University of Michigan Associate Professor Fund, and the Michigan Institute on Research and Teaching in Economics (MITRE). We gratefully acknowledge the use of the Population Studies Center’s services and facilities at the University of Michigan (R24HD041028). The study team gratefully acknowledges the use of the services and facilities of the Population Studies Center at the University of Michigan (P2CHD041028) and the California Center for Population Research at UCLA (P2CHD041022). We are grateful to Dora Costa, Shari Eli, Adriana Lleras-Muney, Joseph Price, and the board members of the LIFE-M project, including Eytan Adar, George Alter, Hoyt Bleakley, Matias Cattaneo, William Collins, Katie Genadek, Maggie Levenstein, Bhash Mazumder, Evan Roberts, and Steven Ruggles for their helpful suggestions. We are also grateful to Garrett Anstreicher, Sarah Anderson, Meizi Li, Morgan Henderson, Alfia Karimova, Catherine Massey, and Annie Wentz for their excellent contributions to the LIFE-M project and assistance with this project.
