Research Article

Duplicate Records in WorldCat for 20th-Century American, British, and Canadian Books: A Comparison of Duplication Rates and Causes

Received 01 Mar 2024, Accepted 22 Apr 2024, Published online: 11 May 2024
 

Abstract

Bibliographic record duplication rates in WorldCat for 20th-century books cataloged in English and published in New York, London, and Montreal were estimated by sampling records created by Concordia University. Duplicate sets were identified according to OCLC WorldCat record merging guidelines. New York and London records had similar duplication rates, higher than those for Montreal. Changing descriptive cataloging standards, brief cataloging, and both typographical and MARC coding errors caused failures in duplicate record detection. Records for editions, reproductions, fine arts titles, and conference publications were particularly problematic. Earlier cataloging standards were reviewed to uncover how differences among them have caused duplicates.

Acknowledgements

Thanks are due to Jay Weitz, retired Senior Consulting Database Specialist at OCLC, and Pat Riva, Associate University Librarian, Collection Services, Concordia University, for providing comments on a draft of this article.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 More information about the OCLC Member Merge Program is available in the WorldShare Record Manager Release Notes dated August 1, 2023.

2 “OCLC Member Quality Assurance,” Bibliographic Formats and Standards, OCLC, last modified April 6, 2023, accessed January 18, 2024, https://www.oclc.org/bibformats/en/quality.html#oclcmemberqualityassurance.

3 Jay Weitz, e-mail to author, August 16, 2022.

4 “OCLC Member Quality Assurance.”

5 Jenny Toves, “Machine Learning and WorldCat: Improving Records for Cataloging and Discovery,” Hanging Together, the OCLC Research Blog, August 14, 2023, accessed January 18, 2024, https://hangingtogether.org/machine-learning-and-worldcat-improving-records-for-cataloging-and-discovery/.

6 Ilija Subasic, Nebojsa Gvozdenovic, and Kris Jack, “De-Duplicating a Large Crowd-Sourced Catalogue of Bibliographic Records,” Program 50, no. 2 (2016): 143.

7 “Record Merge Field Transfer and Merge Matrix,” OCLC, last modified April 12, 2023, accessed January 18, 2024, https://help.oclc.org/WorldCat/Metadata_Quality/Member_Merge/Record_merge_field_transfer_and_merge_matrix.

8 Ahmed K. Elmagarmid, Panagiotis G. Ipeirotis, and Vassilios S. Verykios, “Duplicate Record Detection: A Survey,” IEEE Transactions on Knowledge and Data Engineering 19, no. 1 (2007): 2.

9 Siv Hunstad, “Norwegian Bibliographic Databases and the Problem of Duplicate Records,” Cataloging & Classification Quarterly 8, no. 3–4 (1988): 239–48.

10 Shirley Anne Cousins, “Duplicate Detection and Record Consolidation in Large Bibliographic Databases: The COPAC Database Experience,” Journal of Information Science 24, no. 4 (1998): 231–40.

11 Anestis Sitas and Sarantos Kapidakis, “Duplicate Detection Algorithms of Bibliographic Descriptions,” Library Hi Tech 26, no. 2 (2008): 287–301.

12 Shoichi Taniguchi, “Duplicate Bibliographic Record Detection with an OCR-Converted Source of Information,” Journal of Information Science 39, no. 2 (2012): 153–68.

13 Subasic, Gvozdenovic, and Jack, “De-Duplicating a Large Crowd-Sourced Catalogue,” 138–56.

14 Sitas and Kapidakis, “Duplicate Detection Algorithms,” 297.

15 Thomas B. Hickey and David J. Rypka, “Automatic Detection of Duplicate Monographic Records,” Journal of Library Automation 12, no. 2 (1979): 125–42.

16 Judith J. Johnson and Clair S. Josel, “Quality Control and the OCLC Data Base: A Report on Error Reporting,” Library Resources & Technical Services 25, no. 1 (1981): 40–47.

17 Patricia Dwyer Wanninger, “Is the OCLC Database Too Large? A Study of the Effect of Duplicate Records in the OCLC System,” Library Resources & Technical Services 26, no. 4 (1982): 353–61.

18 Douglas A. Cargille, “Variant Edition Cataloging on OCLC: Input or Adapt?,” Library Resources & Technical Services 26, no. 1 (1982): 47–51.

19 Barbara Jones and Arno Kastner, “Duplicate Records in the Bibliographic Utilities: A Historical Review of the Printing versus Edition Problem,” Library Resources & Technical Services 27, no. 2 (1983): 211–20.

20 Edward T. O’Neill and Diane Vizine-Goetz, “Quality Control in Online Databases,” Annual Review of Information Science and Technology (ARIST) 23 (1988): 125–56.

21 Mary Anne Fox and Barbara G. Preece, “Upgrading Minimal Level Monographic Records: A Study and Conclusions,” Technical Services Quarterly 8, no. 4 (1991): 25–35.

22 Barbara G. Preece and Mary Anne Fox, “Preliminary LC Records for Monographs in OCLC,” Information Technology and Libraries 11, no. 1 (1992): 3–9.

23 Edward T. O’Neill, Sally A. Rogers, and W. Michael Oskins, “Characteristics of Duplicate Records in OCLC’s Online Union Catalog,” Library Resources & Technical Services 37, no. 1 (1993): 59–71.

24 Martha M. Yee, “Manifestations and Near-Equivalents: Theory, with Special Attention to Moving-Image Materials,” Library Resources & Technical Services 38, no. 3 (1994): 227–55.

25 Jeffrey Beall, “The Impact of Vendor Records on Cataloging and Access in Academic Libraries,” Library Collections, Acquisitions, and Technical Services 24, no. 2 (2000): 229–37.

26 Laura D. Shedenhelm and Bartley A. Burk, “Book Vendor Records in the OCLC Database: Boon or Bane?,” Library Resources & Technical Services 45, no. 1 (2001): 10–19.

27 Gail Thornburg and W. Michael Oskins, “Misinformation and Bias in Metadata Processing: Matching in Large Databases,” Information Technology and Libraries 26, no. 2 (2007): 15–26.

28 OCLC, WorldCat Quality: An OCLC Report (Dublin, OH: OCLC Online Computer Library Center, 2011); accessed January 18, 2024, https://www.oclc.org/content/dam/oclc/reports/worldcatquality/214660usb_WorldCat_Quality.pdf.

29 Cathy Blackman, Erica Rae Moore, Michele Seikel, and Mandi Smith, “WorldCat and SkyRiver: A Comparison of Record Quantity and Fullness,” Library Resources & Technical Services 58, no. 3 (2014): 178–86.

30 “When to Input a New Record,” Bibliographic Formats and Standards, OCLC, last modified July 15, 2021, accessed January 18, 2024, https://www.oclc.org/bibformats/en/input.html.

31 “Merging Duplicate Books Records: A Field-by-Field Comparison,” OCLC, last modified March 23, 2023, accessed January 18, 2024, https://help.oclc.org/WorldCat/Metadata_Quality/Member_Merge/Merging_duplicate_books_records_A_field_by_field_comparison.

33 O’Neill and Vizine-Goetz, “Quality Control in Online Databases,” 133.

34 Jay Weitz, e-mail to author, February 22, 2024.

35 “Library of Congress Rule Interpretations (LCRI),” Cataloging Service Bulletin 92 (Spring 2001): 11.

36 “Rule Interpretations,” Cataloging Service Bulletin 2 (Fall 1978): 2.

37 Anglo-American Cataloguing Rules, Second Edition, 2002 Revision, 2005 Update (Chicago: American Library Association, 2005); accessed January 18, 2024, https://original.rdatoolkit.org/.

38 “Library of Congress Rule Interpretations,” Cataloging Service Bulletin 11 (Winter 1981): 5.

39 “Rule Interpretations,” Cataloging Service Bulletin 1 (Summer 1978): 6.

40 “Library of Congress Rule Interpretations (LCRI),” Cataloging Service Bulletin 13 (Summer 1981): 10.

41 “Library of Congress Rule Interpretations (LCRI),” Cataloging Service Bulletin 102 (Fall 2003): 18.

42 Differences Between, Changes Within: Guidelines on When to Create a New Record (Chicago: Association for Library Collections & Technical Services, 2007); accessed January 18, 2024, https://www.ala.org/alcts/sites/ala.org.alcts/files/content/resources/org/cat/differences07.pdf.

43 Jay Weitz, e-mail to author, April 19, 2024.

44 Jay Weitz, e-mail to author, August 16, 2022.

45 O’Neill, Rogers, and Oskins, “Characteristics of Duplicate Records,” 71, 68.

46 Ibid., 62.

47 Shedenhelm and Burk, “Book Vendor Records in the OCLC Database,” 15.

48 Jay Weitz, e-mail to author, February 22, 2024.

49 Jones and Kastner, “Duplicate Records in the Bibliographic Utilities,” 213.

50 “Words Indicating Both ‘Edition’ and ‘Printing’,” Cataloging Service Bulletin 17 (Summer 1982): 32–33.

51 Yee, “Manifestations and Near-Equivalents,” 231.

52 Hickey and Rypka, “Automatic Detection of Duplicate Monographic Records,” 133.

53 Subasic, Gvozdenovic, and Jack, “De-Duplicating a Large Crowd-Sourced Catalogue,” 153.

54 Hickey and Rypka, “Automatic Detection of Duplicate Monographic Records,” 133.

55 O’Neill, Rogers, and Oskins, “Characteristics of Duplicate Records,” 70.

56 “Merging Duplicate Books Records: A Field-by-Field Comparison.”
