AnSim: A Dataset of Animal Pair Similarity Judgments

Semantic fluency tasks, in which a patient must list as many examples of a particular category as possible within a fixed time period, are routinely administered during neuropsychological examination. One of the most widely used semantic fluency categories is animals. This dataset contains mean human similarity judgments for 3785 pairs of animals. Each of these pairs was observed in a set of approximately 500 previously collected semantic fluency responses produced by individuals ranging in age from 4 to 90, some with neurological conditions such as autism and dementia.

All animal names were first manually standardized for spelling, formatting, and grammatical number. Pairs observed in both orders were mapped to a single order; for example, the pairs "cat dog" and "dog cat" were both observed in our data, but we submitted only "cat dog" to our human judges. Human judges were recruited through Amazon Mechanical Turk, and the judgments were collected using Qualtrics. Each worker was assigned 50 pairs of animals and asked to rate the similarity of the two animals on a scale from 1 ("completely different") to 7 ("almost identical"). The input interface used a slider rather than radio buttons, allowing raters to select decimal values for their ratings. Workers were also allowed to skip any pair by checking a box indicating that they were not familiar with one or both of the animals. Repeated pairs and distractors, either highly similar (e.g., "groundhog ~ woodchuck") or highly dissimilar (e.g., "beetle ~ alpaca"), were used to ensure worker compliance. Each animal pair was presented to at least 5 workers. Because workers could skip pairs, not every pair received 5 ratings: the majority (89%) of pairs have at least 5 similarity ratings, and all but 42 pairs have 2 or more ratings. Each worker was paid 1 USD for rating a set of 50 animal pairs.
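The order-collapsing and averaging steps described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in this description: that the single order chosen for each pair is alphabetical, that names are compared after lowercasing, and that a skipped pair is recorded as None. The function names and the sample ratings are hypothetical and are not taken from the dataset.

```python
from statistics import mean


def canonical_pair(a, b):
    """Map a pair seen in either order to one fixed order.

    Assumption: alphabetical order after lowercasing; the ordering
    rule actually used for AnSim is not documented here.
    """
    return tuple(sorted((a.strip().lower(), b.strip().lower())))


def unique_pairs(observed):
    """Collapse observed (a, b) pairs so each unordered pair appears once."""
    return sorted({canonical_pair(a, b) for a, b in observed})


def mean_rating(ratings):
    """Average the ratings for one pair, ignoring skipped (None) entries.

    Returns None if every rater skipped the pair.
    """
    kept = [r for r in ratings if r is not None]
    return mean(kept) if kept else None


# Hypothetical observations: "cat dog" and "dog cat" collapse to one pair.
observed = [("dog", "cat"), ("cat", "dog"), ("beetle", "alpaca")]
pairs = unique_pairs(observed)

# Hypothetical slider ratings on the 1-7 scale; None marks a skipped pair.
ratings = {
    ("cat", "dog"): [5.5, 4.5, None],
    ("alpaca", "beetle"): [1.0, 1.5],
}
means = {pair: mean_rating(ratings[pair]) for pair in pairs}
print(means)
```

Skipped pairs are dropped rather than counted as a low rating, which is consistent with some pairs having fewer than 5 ratings even though every pair was shown to at least 5 workers.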

Download the AnSim dataset here

These similarity judgments were collected by researchers at Boston College under IRB# 19.176.01e. The semantic fluency responses from which the animal pairs were extracted (not yet publicly available) were collected as part of NIH awards R01DC012033, R21DC017000, R03DC010891, P30 AG008017, and R01AG024059, and with the support of the Oregon Brain Aging Study. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the NIH or the OADC.