Guidelines for Referring to the HapMap Populations in Publications and Presentations
It is important to exercise care when labeling the populations whose samples were used to develop the HapMap in any publications and presentations that describe the Project or use Project data. This document provides guidelines on how to refer to the populations and includes other relevant background information about each population.
Rationale
The way that a population is named in studies of genetic variation, such as the HapMap, has important ramifications scientifically, culturally, and ethically. From a scientific standpoint, precision in describing the population from which the samples were collected is an essential component of sound study design; the source of the data must be accurately described in order for the data to be interpreted correctly. From a cultural standpoint, precision in labeling reflects acknowledgement of and respect for the local norms of the communities that have agreed to participate in the research. From an ethical standpoint, precision is part of the obligation of researchers to participants, and helps to ensure that the research findings are neither under-generalized nor over-generalized inappropriately. The use of careless and inconsistent terminology when describing the populations represents a failure in all three of these areas.
The populations included in the HapMap should not be named in a way that singles out small, discrete communities of individuals and implies that those communities are somehow genetically unique, of special interest, or very different from their close neighbors. Labels that are too specific could also infringe on the privacy interests of communities (or even, conceivably, of individual sample donors).
On the other hand, describing the populations in terms that are too broad could result in inappropriate over-generalization. This could erroneously lead those who interpret HapMap data to equate geography (the basis on which populations were defined for the HapMap) with race (an imprecise and mostly socially constructed category). This, in turn, could reinforce social and historical stereotypes, and lead to group stigmatization and discrimination in places where members of the named populations or of closely related populations are minorities.
The guidelines in this document take into account the above considerations. They also incorporate input obtained in some of the communities during the course of extensive community consultations about how the samples collected in those communities should be named, at least the first time that they are described in a publication or presentation.
Recommended Descriptors
The complete recommended language for naming the populations included in the HapMap (which reflects both the ancestral geography of each population and the geographic location where the samples from that population were collected) is:
* Yoruba in Ibadan, Nigeria (abbreviation: YRI)
* Japanese in Tokyo, Japan (abbreviation: JPT)
* Han Chinese in Beijing, China (abbreviation: CHB)
* CEPH (Utah residents with ancestry from northern and western Europe) (abbreviation: CEU)
After the complete descriptor for a population has been provided, it is acceptable to use a shorthand label for that population (e.g., "Yoruba," "Japanese," "Han Chinese," "CEPH") or the abbreviation for that population (e.g., "YRI," "JPT," "CHB," "CEU") in the remainder of the article or presentation. However, the full descriptor for each population should be provided before such shorthand labels are used. This will help to avoid the risks associated with over-generalization of findings.
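For authors who generate figure legends or table captions programmatically from HapMap data, the first-mention rule above can be sketched in code. The following is a minimal illustration only: the `PopulationLabeler` class and its `label` method are hypothetical names introduced here and are not part of any HapMap tool. The descriptors and abbreviations are copied from the list above.

```python
# Full descriptors keyed by abbreviation, copied from the recommended list.
FULL_DESCRIPTORS = {
    "YRI": "Yoruba in Ibadan, Nigeria",
    "JPT": "Japanese in Tokyo, Japan",
    "CHB": "Han Chinese in Beijing, China",
    "CEU": "CEPH (Utah residents with ancestry from northern and western Europe)",
}


class PopulationLabeler:
    """Hypothetical helper: full descriptor on first mention, abbreviation afterwards."""

    def __init__(self):
        # Abbreviations whose full descriptor has already been emitted.
        self._introduced = set()

    def label(self, abbreviation):
        # Unknown codes raise KeyError instead of passing through unlabeled.
        full = FULL_DESCRIPTORS[abbreviation]
        if abbreviation in self._introduced:
            return abbreviation
        self._introduced.add(abbreviation)
        return f"{full} ({abbreviation})"


labeler = PopulationLabeler()
first_mention = labeler.label("YRI")  # full descriptor plus abbreviation
later_mention = labeler.label("YRI")  # abbreviation only
```

Raising an error for an unrecognized code is a deliberate choice in this sketch: it keeps a broad or ad hoc label from slipping into a publication without its full descriptor.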
The sample sets should not be described as having come from "normal controls." Because no phenotypic information was collected with the samples, there is no way of knowing what medical conditions the donors may have.
Recommended Language for Describing Criteria for Population Assignments
In addition to providing the complete descriptor for each population when first describing the populations, the criteria used to assign membership in each population should be noted. Appropriate language for doing this is:
For the Yoruba, donors were required to have four of four Yoruba grandparents. For the Han Chinese, donors were required to have at least three of four Han Chinese grandparents. For the Japanese, donors were simply told that the aim was to collect samples from persons whose ancestors were from Japan. The criteria used to assign membership in the CEPH population have not been specified, except that all donors were residents of Utah.
Additional Background about the Populations
* Yoruba in Ibadan, Nigeria (YRI)
These samples were collected in a particular community in Ibadan, Nigeria, from individuals who identified themselves as having four Yoruba grandparents. It is important to include a reference to "Ibadan, Nigeria" when describing the source of these samples out of respect for the community's wishes. Including the name of the city and country where these particular Yoruba samples were collected also reinforces the point that the sample set does not necessarily represent all Yoruba people, whose population history is complex. These samples should not be described merely as "African," "Sub-Saharan African," "West African," or "Nigerian," since each of those designators encompasses many populations with many different ancestral geographies. Note that the adjective form is "Yoruba," as in "the Yoruba samples," not "Yoruban." The accent is on the first syllable (YOR-u-ba).
* Japanese in Tokyo, Japan (JPT)
These samples were collected in the Tokyo metropolitan area, from people who came from (or whose ancestors came from) many different parts of Japan. Thus, this set of samples can be viewed as representative of the majority population in Japan. It is considered culturally insensitive in Japan to inquire about ancestral origins; thus, prospective donors were told that the aim was to include samples from people whose grandparents were all from Japan. These samples should not be described merely as "Asian" or as "East Asian," terms that encompass many populations whose ancestors came from places other than Japan.
* Han Chinese in Beijing, China (CHB)
These samples were collected from individuals living in the residential community at Beijing Normal University who self-identified as having at least three out of four Han Chinese grandparents. Although residents of this community came from many different parts of China, the sample set was not designed to be representative of all Han Chinese people. The samples also should not be seen as representing all people in China, where there are 56 officially recognized ethnicities. Like the samples from Japan, these samples should not be described merely as "Asian" or "East Asian."
* CEPH (Utah residents with ancestry from northern and western Europe) (CEU)
These samples were collected from people living in Utah with ancestry from northern and western Europe. The term "CEPH" stands for the Centre d'Etude du Polymorphisme Humain, the organization that collected these samples in 1980. Because the importance of precision in assigning group membership to prospective donors based on ancestral geography was not well appreciated in 1980, it is unclear how accurately these samples reflect the patterns of genetic variation in people with northern and western European ancestry. These samples should not be described as "European," nor seen as representing people with ancestry from other parts of Europe (e.g., southern or eastern Europe). The samples also should not be described as "Caucasian," a term that carries racial overtones, and that technically refers only to people from the area between the Black and Caspian seas.