Genetic Ancestry, Jewish

Background

The human genome refers to approximately three billion chemical letters (nucleotides) comprising the sequence of deoxyribonucleic acid (DNA) in almost every cell of each human being. There are four different nucleotides (adenine, guanine, cytosine, thymine), such that each of the approximately three billion sites of the human DNA sequence is occupied by one of these four chemical letters. Human genome analysis has revealed that, on average, any two individuals anywhere on the planet differ from each other at fewer than 0.1% (1 in 1,000) of these sites. These differences among individuals arise from inaccuracies during the process wherein DNA is replicated and transmitted from generation to generation. Furthermore, the variable sites are not randomly scattered across the three-billion-nucleotide genome. Rather, certain combinations of variable sites are often transmitted together in blocks known as haplotypes.
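As a rough illustration of the scale involved, the arithmetic can be sketched as follows. The figures are the rounded estimates quoted above (about three billion sites, with two individuals differing at fewer than about 0.1% of them); the variable names are illustrative only.

```python
# Rounded estimates taken from the text above; names are illustrative.
GENOME_SITES = 3_000_000_000   # ~3 billion nucleotide sites
ONE_IN = 1_000                 # 0.1% expressed as "1 site in 1,000"

percent = 100 / ONE_IN                     # 0.1 (percent)
differing_sites = GENOME_SITES // ONE_IN   # upper bound on differing sites

print(f"{percent}% of {GENOME_SITES:,} sites = {differing_sites:,} sites")
```

Under these rounded figures, any two genomes differ at no more than roughly three million sites, which is the pool of variation from which DNA markers are drawn.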

DNA sequence variants are detected by genotyping or DNA sequencing methods. In a minority of cases, such variable sites may predispose to disease (disease-predisposing mutations), but for the most part they simply serve as "neutral DNA markers." In addition to medical and forensic applications, DNA sequence variation markers are convenient for tracing shared ancestries, family relations, genealogic networks, migratory patterns, and geographic origins of individuals, communities, and populations. This discipline is called DNA sequence-based phylogenetics or phylogeography.

While analysis of the genome provides important insights with respect to population history, including Jewish origins and history, such analysis does not, for both scientific and ethical reasons, provide an appropriate tool for establishing Jewish or any other religious or ethnic identity at an individual or community level. Scientifically, the variation in DNA sequence among Jews is too broad, and overlaps that of non-Jews sufficiently, so as to negate the concept of unique or characteristic genomic markers for Jews. Furthermore, Jewish identity is a concept based on tradition, law, culture, and custom, rather than on physical considerations, including DNA sequence. Attempts to use any biological markers to establish Jewish identity in individuals have been fraught with unwanted and tragic consequences in the past. Therefore, inferences regarding patterns of DNA sequence variation should be interpreted with great caution, with regard to both scientific and societal considerations.

DNA markers are distributed across all of the various distinct regions of the genome, which in humans consists of 22 pairs of autosomal chromosomes, the sex chromosomes (XX in females and XY in males), and mitochondrial DNA. Most of the genome is diploid, meaning that there is representation of each nucleotide site from both parents. However, the Y-chromosome in males, and mitochondrial DNA in both males and females, are haploid, meaning that there is representation from only one parent (uniparental). In the case of the Y-chromosome, the DNA sequence including its variable site markers is transmitted only from fathers to their sons. In the case of mitochondrial DNA, the DNA sequence including its variable site markers is transmitted only from mothers to both their male and female offspring. Furthermore, these uniparentally inherited haploid regions are free of a process called recombination, which does occur in the diploid regions of the genome. Recombination shuffles markers between the two parental copies at corresponding genomic regions. For most of the length of the Y-chromosome (the non-recombining or NRY region) and for the entire mitochondrial DNA, no recombination occurs. Thus, analysis of DNA markers on the NRY region of the Y-chromosome and on mitochondrial DNA has emerged as a powerful tool in the phylogenetics of male and female lineages respectively. Markers outside of these haploid regions have also been used in genome-based phylogeographic analysis. However, the dual inheritance, with biparental representation together with recombination, renders the interpretation of shared ancestry and phylogenetics more complex and often ambiguous. It should be noted that when DNA sequence variants anywhere in the genome are disease-predisposing mutations, differences in their frequency among Jewish communities in comparison with non-Jews can contribute to certain health and disease epidemiologic patterns (see *Genetic Diseases in Jews).
The current entry will be divided into a description of genomic analysis of Jewish populations along male and female lineages, followed by an integrated overview.

Application of Phylogenetics to Jewish Populations

The molecular principles described above have been usefully applied to the evolutionary studies of humankind as a whole, as well as to the phylogenetics of various populations of interest. These studies address questions related to geographic origins, ancestry, history, migration, and demography of populations. Likewise, it is possible to phrase similar questions with regard to the parental ancestry of contemporary Jews. To do so, it is necessary, first, to delineate accepted nomenclatures and classifications for Jewish communities and second, to clarify how the use of different classes of genetic markers enables distinct questions of interest to be addressed. To this end, contemporary Jews can be considered as descending from two large population groups which had somewhat separate demographic histories during the past approximately two millennia. These are the Ashkenazi and non-Ashkenazi groups, which in turn are composed of numerous different communities. It is clear that this division oversimplifies the relations and hierarchy between the various Jewish communities. Thus the Ashkenazi population of Europe, which refers to Jews whose recent ancestry traces to Central and Eastern Europe, is often regarded as one population subgroup, despite clearly being composed of multiple communities. This classification has emerged because of shared adherence to similar religious rituals and liturgical style, the shared use of the Yiddish language, and a shared geographic location in Central and Eastern Europe. Of relevance to phylogenetics was the practice of a high level of endogamy, wherein Ashkenazi Jews married within the population subgroup. The non-Ashkenazi population subgroup is a much more culturally and geographically diverse population. The majority of the non-Ashkenazi population is composed of communities that resided in the Near and Middle East, North Africa, and geographic locations to which the Jews fled following the Iberian expulsions, beginning in 1492 C.E.
These communities share similar religious rituals, most probably due to their presumed common historical origin from a gradual movement of Babylonian Jews, and are sometimes collectively referred to as the "Sephardi (Spanish)" or "Mizraḥi (Eastern)" Jews. In the current entry, we shall adhere to this convention. However, where the available information permits, we shall use the term "Spanish exile" to refer to members of Jewish communities descended from the Iberian expulsions, and we shall use the term "non-Ashkenazi" when the geographic origin is not known in enough detail to permit a more precise description. Moreover, neither the term "Sephardi" nor "Mizraḥi" takes into account some additional Jewish communities such as some of the Italian, Georgian, Yemenite, and Indian communities.

Following the foregoing definitions, two complementary sets of questions arise. First, what is the overall pattern of the contemporary NRY and mitochondrial DNA sequence variation at the level of the entire Jewish population in comparison to non-Jews, and of individual population subgroups or communities? More specifically this set of questions relates to our overall ability to trace recent or contemporary Jewish communities to a particular geographic origin such as the Near East, and allows analysis of parameters such as admixture and gene flow with Diaspora host populations. Second, DNA marker analysis enables clarification of micro-evolutionary mechanisms and events that have shaped the population history of each of the Jewish communities. These include the actual number of founding ancestors, their rate of expansion, their most likely geographic origin, and the level of identity between the various Jewish founding ancestors in different Jewish communities. The answers to both sets of questions are addressed separately for paternal and maternal population history, using the NRY-region of the Y-chromosome and mitochondrial DNA respectively, and in some cases these are expected to yield different patterns.

To gain a clearer understanding of the way in which these questions can be addressed, it is important to clarify the different kinds of DNA sequence variation markers that are available for analysis, and the ways in which they can be combined to generate phylogenetic trees, with different levels of temporal resolution. Haplogroups are generally defined by a series of hierarchically arranged stable variations or polymorphisms in DNA sequence (usually at a single nucleotide site and hence termed single nucleotide polymorphisms or SNPs) that have usually occurred only once in the course of human evolution. These are binary or bi-allelic, since there are only two variants in the human population, rather than multiple different variants. Numerous such binary sites are located throughout the NRY, and when combined they define major haplogroups. Individuals belonging to the same NRY haplogroup share common paternal ancestry at a level of resolution and timeframe that is a function of the number and choice of such binary sites. In the case of mitochondrial DNA, these binary sites are usually located in the portion of the circular mitochondrial DNA genome that is termed the coding region, and these define maternal haplogroups. Haplogroups enable the most basic level of phylogenetic assignment of humans into populations on the basis of shared paternal or maternal ancestry and hence phylogeographic origin. Such haplogroup analysis has been used to trace African origins and subsequent major migration routes for all anatomically modern humans on the planet. Paternal haplogroups, defined by binary markers on the NRY, have been given designations of major haplogroups A through R, based on the use of a few dozen binary markers, and each such haplogroup can be further refined and subdivided into a hierarchical tree of subhaplogroups, using many additional binary markers. These subhaplogroups are given additional lower case letter and number designations.
As an example, NRY haplogroups A and B are dominant in Africa and absent in the Americas. Of relevance to the origin of Jewish populations, the Near East as a whole is populated by a varied mix of major haplogroups among which the most frequent are E and J. Similarly the mitochondrial major haplogroups are designated by letters A through Z, and then again further subdivided using numbers and lower case letters, using additional coding region binary markers. In the case of mitochondrial DNA haplogroups, the major L haplogroup is dominant in Africa and absent in the Americas. Of relevance to Jewish population origins, and as is the case for the Y-chromosome, the Near East is populated with a long list of major mitochondrial haplogroups, among which H, J, T, U, and K are frequent. It is important to emphasize that the most common major haplogroups can be found across very large geographic expanses, and in turn comprise numerous lineages that usually coalesced many thousands of years ago. Lineages refer to branches within a given haplogroup or subhaplogroup which can be related to each other by additional classes of DNA sequence variation markers. Many such additional classes of markers exist, and together they are distinguished from haplogroup-defining binary markers in several ways. First, there may be more than two variants, as in the case of simple tandem repeat markers (STRs) on the NRY. Second, they represent DNA sequence mutation events which may occur at a much more rapid rate than those at haplogroup-defining binary markers, and which may therefore have occurred at a given site many times in human history, as occurs in the D-loop or control region of mitochondrial DNA. Such rapidly mutating markers are often said to define haplotypes within haplogroups, or lineages. Thus a phylogenetically defined lineage represents a cluster of related evolving haplotypes within a haplogroup.
As noted, a haplogroup at any level of binary marker resolution is composed of numerous such coalescing lineages, whose relatedness can be determined using analysis of haplotype-defining repeat markers. Thus, while documentation or comparison of haplogroup frequencies within or among populations of interest provides important information regarding large but specific geographic origins, this does not effectively allow determination of the real number of ancestral parental lineages that gave rise to the present-day diversity in a population. This can be likened to the hands on a clock, in which haplogroups are like the hour hand, and haplotypes are like the minute hand, and a lineage represents a given number of minutes within the interval defined by the hour hand. There is a slight difference in the way haplotypes are measured and determined for the NRY and for mitochondrial DNA, with a greater emphasis on the use of STRs in the case of the NRY, and use of D-loop sequence variants in the case of mitochondrial DNA. Haplotype-defining repeat markers are invaluable in the study of the genomic structure of population groups, since they evolve quickly enough to trace recent historical events from DNA samples of living individuals. It is this genomic tool which has provided several important insights regarding Jewish populations, whose demographics and histories had previously been described on the basis of oral tradition, archival records, and linguistic and liturgical analysis. Analysis of the genome has provided a complementary tool to these more classical approaches, and yielded additional insights.

Jewish Paternal Ancestry – View from the NRY Markers of the Y-Chromosome

The first recorded studies at the level of genomic DNA sequence variation appeared in 1993, and compared Sephardi and Ashkenazi Jews with non-Jewish Czech males. These reported that the two Jewish population subgroups show a great similarity of NRY DNA marker frequencies, and appear to show very little evidence for admixture with host non-Jewish neighbors. Of interest, comparison with Lebanese non-Jews supported the notion of a shared Near East origin for both the Ashkenazi and Sephardi Jewish population subgroups examined. Studies over the subsequent decade utilized progressively larger and more diverse sample sets, and a greater number of DNA sequence markers. Taken together, this decade of work on the NRY markers strongly supports the hypothesis that the paternal gene pool of Jewish communities from Europe, North Africa, and the Middle East descended from a common Near Eastern ancestral population, and suggests that most Jewish communities have remained relatively isolated from neighboring non-Jewish communities during and after the Diaspora. The two most prevalent major NRY haplogroup affiliations shared among all Jewish communities are those denoted J and E. Further research based on haplogroup markers has shown that, with some notable rare exceptions, the NRY chromosome pool of both Ashkenazi Jews and non-Ashkenazi Jews originates as an integral part of the genetic landscape of the Near East. Further analysis at the haplotype level suggested that the pattern of haplotype differentiation within these shared haplogroups differs between the Jewish population and non-Jewish Near Eastern populations. This is entirely consistent with a shared remote Near East origin but subsequent separation of the ancestors of contemporary Jews from their non-Jewish Near East shared ancestral population. Such separation involved the establishment of a separate ethnic identity and restriction in marital admixture.
The separation would have been accentuated by migration of the Jewish population from the Near East and into other parts of the world, during the Diasporas. In other words, the biological events leading to the emergence of the major haplogroups observed in Jews and non-Jews with whom they share common Near East ancestry are much older than the populations in which these haplogroups are found. While the similar and shared Near Eastern background at the haplogroup level predates the ethnogenesis in the region, the haplotype structure is more recent and has evolved after the establishment of the Jews as a population group. To date, the Ashkenazi subpopulation of the Jews has been studied in the greatest detail, though there is a steadily increasing accumulation of comparably detailed genomic information for non-Ashkenazi communities. In the most detailed paternal phylogenetic study of the Ashkenazi to date, by Behar and Skorecki in collaboration with an international team of scientific colleagues, a detailed resolution of the haplogroup structure according to the Y Chromosome Consortium recommendations was obtained. Based on the genotyping results, the Ashkenazi haplogroups were divided into the following three categories: major founder haplogroups, minor founder haplogroups, and shared haplogroups. The first two categories included those haplogroups likely to be present in the founding Ashkenazi population (and that now occur at high and low frequency respectively). The third category is composed of haplogroups that either entered the Ashkenazi Jewish gene pool recently as the result of introgression from European host populations, and/or were present in both European and Jewish populations before the dispersal of the ancestral Ashkenazi population through Europe.

Haplogroup E-M35 and haplogroup J-12f2a fit the criteria for major Ashkenazi Jewish founding subhaplogroups, because they are widespread both in Ashkenazi Jewish communities and in Near Eastern populations, and occur at much lower frequencies in European non-Jewish populations. Subhaplogroups G-M201 and Q-P36 show a similar pattern, but are found at lower frequency, and are therefore considered minor founding subhaplogroups that were also part of the founding paternal Ashkenazi Y-chromosome pool. It has not yet been established if these minor subhaplogroups are shared with non-Ashkenazi Jews. The best candidates for subhaplogroups that entered the Ashkenazi Jewish population more recently via admixture from the neighboring European populations include I-P19, R-P25, and R-M17. Taken together, these results confirmed that the majority of NRY haplogroups found among contemporary Ashkenazi Jews originated in the Near East, with an approximately 8% introgression from non-Jewish European populations. Two events of interest seem to have made very specific independent contributions to this minor degree of introgression, and these will be described in the subsequent section. However, overall genomic analysis provides definitive evidence refuting a major contribution to the Ashkenazi Y-chromosome pool of any large-scale entry into the population from the Caucasus, the putative geographic location of the Khazarian Kingdom, or from any other European or Eurasian source population. While a study of this detail in non-Ashkenazi communities is still to be done, multiple lines of evidence from the genomic literature strongly support a common Near Eastern paternal origin for all Jewish communities, with low levels of introgression from neighboring non-Jews in the Diasporas.
These findings also provided the backdrop for detailed analysis of lineages to clarify demographic patterns and microevolutionary forces that have shaped the detailed population structure of different Jewish communities and Jewish population subgroups. A number of illustrative examples are provided herein.

GENOMIC ANALYSIS OF THE JEWISH PRIEST AND LEVITE CASTES

Phylogenetic analysis is based upon relatedness of individuals within a group. Genetic analysis has confirmed that all of humankind is phylogenetically related as descendants of a common maternal and paternal ancestor. In some societies, extensive records are maintained which document relationships and establish pedigrees extending over many generations, and this information can be used to facilitate genomic studies. While such biparental pedigree information is not available extending back to the early history of the Jewish people, there exists an oral tradition which may provide information about shared paternal ancestry, which has proven to be of interest, and must be taken into account in phylogenetic studies of Jews. In particular, a long-established system of Jewish male tribal or caste affiliation categorizes Jewish men into three groups: Jewish *priests or kohanim, *levites, and Israelites. Within the Jewish community, membership in the male castes noted above is determined by patrilineal descent. Kohanim are, in biblical tradition, the descendants of Aaron, who along with his brother Moses was a male descendant of Levi, the third son of the biblical patriarch Jacob. According to the same tradition, levites are considered to be those remaining male descendants of Levi who are not kohanim. These categories are recognized, and the affiliation of individual Jewish males to one of the three castes is widely known, in virtually all Jewish communities, including Sephardi, Ashkenazi, and others.

More specifically, self-identification with the Jewish priestly caste reflects an oral tradition of transmission by inheritance from father to son with no halakhically sanctioned mechanism for introgression of males who are not descendants along the paternal line from the founder of this male dynasty. Accordingly, this tradition carries with it specific scientific predictions based on the molecular genomics of the Y-chromosome. Since, as noted, the Y-chromosome is also transmitted from fathers only to their male offspring, it is predicted that the Y-chromosomes of historically and geographically dispersed priests should have a significantly greater similarity of DNA sequence markers compared to Y-chromosomes of other groups. Comprehensive clarification of the patterns of paternal relatedness, based on NRY marker analysis, requires combining haplogroup with haplotype analysis, to trace actual lineages. Indeed, several research studies beginning in 1997, and carried out over many years and across several continents, reveal a statistically significant greater degree of similarity of such NRY markers among contemporary Jewish priests compared to other groups tested. This similarity applied equally when tested across Ashkenazi and non-Ashkenazi communities. This finding has been durable and has withstood the test of a decade of verification. Utilization of NRY STR markers, which are thought to mutate at a rapid rate, enabled the tracing of lineages and also determination of lineage coalescence times, in order to bracket an approximate timeframe for the establishment of this patrilineal Jewish priestly dynasty. Thus, for example, using a set of six STR markers (DYS19, DYS388, DYS390, DYS391, DYS392, and DYS393), a single haplotype, termed the Cohen Modal Haplotype, was found to be the most frequent, and to be shared among priests from both the non-Ashkenazi and Ashkenazi communities.
The scores (corresponding to the number of repeats in each named STR marker respectively) for this six-STR haplotype are 14, 16, 23, 10, 11, and 12, and the haplotype is now known to belong to NRY haplogroup J, which, as noted above, is the most frequent haplogroup in the Near East and among Jews in particular. In a 1998 study, the modal haplotype frequencies were found to be 0.449 and 0.561 for the Ashkenazi and Sephardi kohanim, respectively. The corresponding modal frequencies for the Ashkenazi and Sephardi Israelites in this same study were found to be 0.132 and 0.098, respectively. This lower frequency highlights the difference in criteria for overall Jewish affiliation compared to affiliation with the Jewish priesthood. Overall Jewish identity, since at least talmudic times (100 B.C.E.–500 C.E.), has traditionally been acquired either by descent from a Jewish woman, or alternatively by rabbinically authorized conversion, without the need to establish descent from a common male (or female) ancestor. In contrast, as noted above, affiliation to the Jewish priesthood was restricted along patrilineal lines of descent. The use of one-step mutation haplotypes, termed the Cohen Modal Cluster, allowed the calculation of the coalescence time to the most recent common ancestor using standard accepted mutation rates. This calculation gave an estimate of approximately 106 generations, which for a generation time of 25 years gives an estimated range which brackets a mean of 2,650 years before the present. These results establish the common origin of the Jewish priesthood caste in the Near East, coinciding with a timeframe beginning at approximately the biblically attributed date of the exodus from Egypt and extending to the Temple period. However, it should be noted that such dating estimates are based on numerous inherent assumptions and carry with them a wide error margin.
The availability of more binary as well as STR markers for the NRY is now enabling further refinement at both the haplogroup and haplotype levels, and these numerical estimates may change based on future genome analysis. Furthermore, the discovery of a modal haplotype and cluster is based on statistical analysis, and does not permit specific validation of priestly status for a given individual. The latter depends upon cultural, religious, and social considerations which are not related to genome analysis for a given individual.
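The haplotype comparison and the generation arithmetic described above can be sketched in a few lines. This is a minimal illustration, not the method of the cited studies: it assumes that the Cohen Modal Cluster consists of the modal haplotype together with haplotypes that differ from it by a single repeat at one marker, and the function names and the example haplotypes in the usage lines are hypothetical. The marker names, modal repeat counts, 106 generations, and 25-year generation time are the figures quoted in the text. As noted above, such a comparison is statistical and says nothing about the priestly status of any individual.

```python
# Six NRY STR markers and the Cohen Modal Haplotype repeat counts quoted in the text.
MARKERS = ("DYS19", "DYS388", "DYS390", "DYS391", "DYS392", "DYS393")
COHEN_MODAL_HAPLOTYPE = (14, 16, 23, 10, 11, 12)


def step_distance(a, b):
    """Total number of single-repeat steps separating two STR haplotypes."""
    if len(a) != len(b):
        raise ValueError("haplotypes must score the same markers")
    return sum(abs(x - y) for x, y in zip(a, b))


def in_modal_cluster(haplotype, modal=COHEN_MODAL_HAPLOTYPE):
    """Assumed cluster definition: the modal haplotype or a one-step neighbor."""
    return step_distance(haplotype, modal) <= 1


def coalescence_years(generations, years_per_generation=25):
    """Convert a coalescence estimate in generations to years."""
    return generations * years_per_generation


# Hypothetical example haplotypes (not data from any study).
exact_match = (14, 16, 23, 10, 11, 12)
one_step = (14, 16, 24, 10, 11, 12)   # one extra repeat at DYS390
two_steps = (14, 15, 23, 10, 11, 13)  # differs at DYS388 and DYS393

print(in_modal_cluster(exact_match), in_modal_cluster(one_step), in_modal_cluster(two_steps))
print(coalescence_years(106))  # 106 generations at 25 years each
```

The conversion to years is linear in the assumed generation time, which is one reason the resulting dates carry a wide margin of error.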

Of interest, the same studies in 1997 and 1998 found high frequencies of multiple haplogroups in the levites, indicating that no single recent origin could be inferred for the majority of this group, despite an oral tradition of a patrilineal descent similar to that of the kohanim (with some exceptions outlined in talmudic tractate *Bekhorot). This led to a more detailed NRY analysis of the levites. In particular, given the importance of the paternally defined levite caste in Jewish history, together with multiple theories of the ethnogenesis of the Ashkenazi Jewish community, and a suggestion that Yiddish is a relexified Slavic tongue, Behar and Skorecki, together with an international team of scientific collaborators, reported in 2003 a detailed investigation of the paternal genetic history of Ashkenazi levites. They compared the results with matching data from neighboring populations among which the Ashkenazi community lived during its formation and subsequent demographic expansion. The findings clearly demonstrated, among the Ashkenazi levites, a major tightly clustered lineage within NRY haplogroup R-M17, which comprises 74% of Ashkenazi levites within this haplogroup and 52% of Ashkenazi levites overall. The presence of the R-M17 haplogroup within Ashkenazi levites is striking for several reasons. First, this haplogroup is found at high frequency in the Ashkenazi levites but not in Sephardi levites, nor in any other geographically or religiously designated Jewish grouping examined to date. This means that a large and closely related subgroup of the Ashkenazi levites differs in paternal ancestry from the Sephardi levites. This is a very different pattern from that observed among the kohanim. Second, the STR marker-based haplotypes within this Ashkenazi levite haplogroup form an exceedingly tight phylogenetic cluster, indicative of a very recent origin from a single common ancestor.
Coalescence calculations following the same principles used for the Cohen Modal Haplotype point to a founding event that occurred approximately 1,000 years before the present, with the same caveats regarding time estimates based on genomic analysis as were pointed out above. Third, the haplogroup is extremely rare in other Jewish groups and in non-Jewish groups of Near Eastern origin, but is found at high frequency in populations of East European origin. This contrasts with the Cohen Modal Haplotype, which belongs to a haplogroup that is abundant in the Near East. For the reasons stated above, it is likely that the event leading to a high frequency of R-M17 Y-chromosomes within the Ashkenazi levites involved very few, and possibly only one, founding paternal ancestor. The question then arises regarding the possible origins of the founder(s). Haplogroup R-M17 is found at very low frequency in other Jewish groups. It is possible, therefore, that this haplogroup was also present at very low frequency among the levites within the Ashkenazi founding community, and that exceptional reproductive success then rendered the descendants of one such levite, with this rare haplogroup, more numerous. Likewise, the haplogroup is also found at very low frequency within some populations of Near Eastern origin. It is therefore also possible that a conversion event prior to the establishment of the Ashkenazi founding population led to the founding of this haplogroup and its subsequent emergence at high frequency within the Ashkenazi levites. While it is not possible to formally refute either of these two possible explanations, it would be a remarkable coincidence that the geographic origins and demographic expansion of the Ashkenazi levites are within northern and eastern Europe and that this haplogroup is found at very high frequency within neighboring non-Jewish populations of European origin, but not at high frequency elsewhere.
An alternative explanation, therefore, would postulate a founder(s) of non-Jewish European ancestry, whose descendants were able to assume levite status. While neither the NRY haplogroup composition of the majority of Ashkenazi Jews nor the STR haplotype composition of the R-M17 haplogroup within Ashkenazi levites is consistent with a major Khazar or other European origin for the Ashkenazi community, as has been speculated by some scholars, one cannot rule out the important contribution of a single or a very few individual male founders from the Khazarian or another Eurasian population group among contemporary Ashkenazi levites. A similar study focusing on non-Ashkenazi levites is yet to be carried out, and will no doubt shed additional light on the detailed paternal lineages comprising contemporary levites.

DUTCH JEWS AND LEMBA

Two additional illustrative examples of geographic rather than caste designation can be given wherein genomic analysis of NRY marker variation has provided insights of relevance to Jewish population history. NRY analysis of Ashkenazi Dutch Jewish males has shown that approximately 25% of their NRY chromosomes belong to the most prevalent haplogroup in Western Europe and one that is rare in the Near East, R-P25. Therefore, when various indices of genetic distances are measured between this Ashkenazi community and the non-Jewish host population, greater similarities are observed, reflecting more substantial male-origin gene flow from the host population to the Ashkenazi Dutch community. This is consistent with greater religious tolerance which may have characterized Dutch society. Interestingly, the pattern of this possible introgression is different from that observed for the R-M17 haplogroup described for the levites. The genetic distances between the haplotypes comprising haplogroup R-P25 in contemporary Ashkenazi Dutch Jews coalesce prior to the migration of Jews to Europe and therefore are likely explained by repetitive introgression events (admixture) of European non-Jewish males into this community. Another group of interest has been the *Lemba tribes of Southern Africa. While not identified as Jews in religious or halakhic terms, these individuals relate an oral tradition of descending from a group of men who migrated via the Hadramout from the ancient kingdom of Judea in the Near East. Following their eventual settlement in their current villages, located in modern-day South Africa, Mozambique, and Zimbabwe, the Lemba founders are said to have intermarried with local Bantu-speaking women, and to have adopted the language and many cultural practices of their neighbors. However, they also maintained some traditions, reminiscent of a Near East and Jewish origin. 
Genomic analysis of NRY markers at the haplogroup and haplotype levels indeed confirmed a pattern of admixture. There was clear-cut evidence of Y-chromosomes of Near East origin in a substantial number of Lemba males, at frequencies approaching those found in some Ashkenazi and Sephardi Diaspora Jewish communities, including a strikingly high frequency of Lemba males carrying the Cohen Modal Haplotype. These Y-chromosome types are virtually absent among the non-Lemba neighboring populations. More detailed STR-based lineage and coalescence analysis with a large number of markers could provide additional insights of historical interest.
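
The kind of STR-based coalescence analysis referred to here can be illustrated with a minimal sketch. It assumes a strict single-step mutation model, under which the average squared distance (ASD) in repeat counts between sampled chromosomes and the founding haplotype grows by roughly one mutation rate per generation. The haplotypes, the mutation rate, and the generation time below are hypothetical values chosen only for illustration and are not taken from any of the studies described in this entry.

```python
def asd_to_founder(haplotypes, founder):
    """Average squared distance in repeat counts between each sampled
    haplotype and the presumed founder, averaged over loci and samples."""
    n_loci = len(founder)
    total = 0
    for hap in haplotypes:
        if len(hap) != n_loci:
            raise ValueError("every haplotype must cover the same loci")
        total += sum((a - b) ** 2 for a, b in zip(hap, founder))
    return total / (len(haplotypes) * n_loci)


def coalescence_years(haplotypes, founder, mu_per_generation, years_per_generation):
    """Under the single-step model, expected ASD is about mu * t,
    so t (in generations) is estimated as ASD / mu."""
    generations = asd_to_founder(haplotypes, founder) / mu_per_generation
    return generations * years_per_generation


# Hypothetical four-locus STR haplotypes (repeat counts per locus).
founder = (14, 16, 23, 10)
sampled = [
    (14, 16, 23, 10),
    (14, 16, 23, 10),
    (15, 16, 23, 10),  # one step away from the founder at the first locus
    (14, 16, 22, 10),  # one step away from the founder at the third locus
    (14, 16, 23, 10),
]

asd = asd_to_founder(sampled, founder)
# Assumed values: 0.002 mutations per locus per generation, 25 years per generation.
years = coalescence_years(sampled, founder, mu_per_generation=0.002, years_per_generation=25)
print(f"ASD = {asd:.3f}, estimated coalescence = {years:.0f} years")
```

An estimate of this kind is very sensitive to the assumed mutation rate, to the generation time, and to the choice of founding haplotype, which is one reason a large number of markers is desirable.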

Additional studies have been carried out, and others are continuing, that focus on the mechanisms that shaped the population genomic structure of the remaining majority of Jewish groups and communities. Questions of special interest amenable to this type of analysis include the following: How limited is the number of founders that gave rise to the contemporary global Jewish population? Do Ashkenazi and various non-Ashkenazi Jewish populations share overlapping or distinct founding lineages? Can geographic origins for each of the Jewish haplogroups be determined with greater accuracy? Studies carried out between 2002 and 2004 have provided some initial information in this regard. By focusing initially on the Ashkenazi population and investigating the STR marker variation within each of the founding haplogroups, Behar and Skorecki, together with an international group of scientific collaborators, confirmed previous findings that Ashkenazi Jews show high levels of haplogroup diversity compared with their non-Jewish counterparts. However, a vastly reduced number of haplotypes within Ashkenazi Jewish haplogroups, as well as reduced haplotype variance within haplogroups, was clearly observed. What do these contrasting patterns tell us about the possible role of a bottleneck in the Ashkenazi population? Despite the fact that Ashkenazi Jews represent a recently founded population in Europe, they appear to derive from a large and diverse ancestral source population in the Near East, a population that may have been larger than the source population from which European non-Jews derived. This is consistent with the finding that contemporary Ashkenazi Jews display higher levels of haplogroup diversity than European non-Jewish populations. The reduced haplotype diversity within Ashkenazi Jewish haplogroups compared to non-Jewish populations may be the signature of a founder event or population bottleneck in Ashkenazi population history.
Indeed, the extremely low STR-based haplotype diversity of some of the less frequent founding haplogroups (e.g., NRY haplogroups R-M17, Q-P36) suggests a single male lineage expansion comprising most or all of these and other haplogroups in Ashkenazi Jews. Comparable analyses have yet to be carried out for the many non-Ashkenazi communities. In addition, the study demonstrated that the many different Ashkenazi communities in Central and Eastern Europe cannot be readily distinguished from each other at either the haplogroup or the haplotype level. This can be attributed to a common origin from a shared ancestral deme and to continuous migration among the Ashkenazi communities, and is entirely consistent with non-genetic disciplines identifying all Ashkenazi communities as a relatively homogeneous population.
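
The contrast drawn above between high diversity among haplogroups and low haplotype diversity within each haplogroup can be made concrete with a minimal sketch. It uses Nei's unbiased gene diversity, h = n/(n-1) × (1 − Σp²), a standard summary statistic; whether the studies described here used exactly this estimator is not stated in this entry, so its use is an assumption. The sample is a hypothetical toy data set, and the haplogroup labels and repeat counts are invented for illustration.

```python
from collections import Counter, defaultdict


def gene_diversity(labels):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    counts = Counter(labels)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample: one (haplogroup, STR haplotype) pair per sampled male.
sample = [
    ("J", (14, 16, 23)), ("J", (14, 16, 23)), ("J", (14, 16, 24)),
    ("E", (13, 17, 24)), ("E", (13, 17, 24)),
    ("R", (15, 15, 25)), ("R", (15, 15, 25)),
    ("Q", (13, 19, 22)),
]

# Diversity among haplogroups across the whole sample.
haplogroup_h = gene_diversity([hg for hg, _ in sample])

# Haplotype diversity within each haplogroup that has at least two chromosomes.
by_haplogroup = defaultdict(list)
for hg, haplotype in sample:
    by_haplogroup[hg].append(haplotype)
within_h = {
    hg: gene_diversity(haps) for hg, haps in by_haplogroup.items() if len(haps) >= 2
}

print(f"haplogroup diversity: {haplogroup_h:.3f}")
for hg, h in sorted(within_h.items()):
    print(f"haplotype diversity within {hg}: {h:.3f}")
```

In this toy sample the chromosomes are spread across four haplogroups, yet two of the haplogroups contain only a single haplotype, which is the qualitative pattern expected after a founder event drawn from a diverse source population.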

Jewish Maternal Ancestry: View from Mitochondrial DNA

The available data on the maternally inherited mitochondrial DNA in Jewish communities are still scant, but are being collected at a rapid rate as DNA sequencing and genotyping technology improves, fueled also by public interest in genealogic questions. An initial study, which focused only on a region of the D-loop of mitochondrial DNA known as hyper-variable sequence 1 (HVS-1), demonstrated greatly reduced mitochondrial DNA diversity in the Jewish populations in comparison with the host populations, together with a wide range of different modal haplotypes specific to each of the different communities. The results indicated specific founding events in the Jewish populations. A simple explanation for this exceptional pattern of mitochondrial variation across Jewish populations was that each of the different Jewish communities is composed of descendants of a small group of maternal founders. After the establishment of these communities, inward gene flow from the host populations must have been very limited. As the study focused on haplotype diversity and did not include deep haplogroup analysis, a putative origin of each of the founding lineages was not suggested. A subsequent study conducted by Behar and Skorecki, together with an international group of scientific collaborators, focused in greater detail on the Ashkenazi population using a large set of samples from descendants of numerous communities across Europe, and utilized markers which permitted deep phylogeographic analysis at the mitochondrial haplogroup and haplotype levels. The analysis of Ashkenazi mitochondrial sequence variation portrays a pattern of highly reduced diversity, with an unusually large proportion of haplotypes that are unique to the Ashkenazi gene pool, and a reduction in frequency of rare haplotypes and singleton sites compared with both European and Near Eastern populations.
At the haplogroup level, the Ashkenazi mitochondrial DNA variation was found to have a number of peculiarities. For example, in two separate studies nearly ten years apart, haplogroup K appears as the most common haplogroup, with its frequency almost an order of magnitude greater than among European or Near Eastern non-Jewish populations. More detailed sequence analysis enabled the construction of mitochondrial DNA-based phylogenetic networks, which resolved the haplogroup K samples into three separate lineages, whose phylogeographic origins are thought to antedate by far the founding of the Ashkenazi population. Furthermore, mitochondrial DNA haplogroup N1b, rare in most European populations, was found to comprise nearly 10% of the Ashkenazi mitochondrial DNA pool, and strikingly, haplotype analysis of this N1b haplogroup in Ashkenazi Jews revealed only a single lineage. These Ashkenazi mitochondrial DNA lineages were virtually absent from surrounding non-Jewish populations, and therefore provide a genetic signature of the Ashkenazi maternal gene pool, and bear witness to the strong effects of genetic drift acting on this population. Similar to the observation for male ancestry based on Y-chromosome analysis in the Ashkenazi population, the mitochondrial DNA results also show that the various Ashkenazi communities throughout Central and Eastern Europe cannot be readily distinguished from each other, likely reflecting shared recent origins from a common small ancestral deme, followed by continuous migration among the Ashkenazi communities.

Micro-Evolutionary Mechanisms that Have Shaped Mitochondrial DNA Sequence Variation in Jewish Communities

Based on the foregoing, and with the development of advanced technological approaches to facilitate DNA sequence analysis, the highest possible level of maternal phylogeographic resolution can be obtained from complete sequencing of the entire approximately 16,500 nucleotides of mitochondrial DNA from samples of interest. Recent studies by Behar and Skorecki and their international scientific collaborators, as well as other research groups, are utilizing such an approach in an attempt to shed light on the absolute number of individual women who gave rise to the lineages among Ashkenazi Jews, and on their putative origin. Based on the complete sequencing analysis in Ashkenazi Jews and existing complete sequences from non-Jews, the exact phylogenetic branches to which the Ashkenazi lineages could be traced were identified. The new information was used to screen a global set of haplogroup K samples to include or exclude them from these Ashkenazi lineages. The results showed that the Ashkenazi lineages were virtually absent in other populations, with the important exception of low frequencies among non-Ashkenazi Jews. These results indicate that the three Ashkenazi haplogroup K lineages are virtually restricted to this population, and are likely to be of Near Eastern rather than European origin. The same approach was followed for mitochondrial DNA haplogroup N1b, and it was concluded that for this haplogroup all samples belong to one expanding lineage. Taken together, these four lineages indicate that four individual women gave rise to fully 40% of contemporary Ashkenazi Jews, or approximately 3.5 million people. The coalescence times for the expansion of these four lineages coincide well with the historical timeframe of less than 2,000 years for Ashkenazi population expansion from a small founding deme, providing the most powerful and detailed information about the maternal Ashkenazi population founding event.
Similar studies in non-Ashkenazi Jewish communities remain to be carried out, and should provide comparable information regarding absolute numbers of founding maternal lineages, as well as their approximate founding dates and possible ancestral locations.
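
The screening step described above, in which samples are included in or excluded from a founding lineage, can be sketched as a simple set comparison. This is a minimal illustration and not the procedure actually used in the studies: it assumes a sample belongs to a lineage when it carries every variant that defines that lineage. The lineage names, sample names, and variant labels (`v101` and so on) are hypothetical placeholders, not real mitochondrial DNA positions.

```python
def matching_lineages(sample_variants, lineage_definitions):
    """Return the names of all lineages whose defining variants
    are all present among the variants carried by the sample."""
    carried = set(sample_variants)
    return sorted(
        name
        for name, defining in lineage_definitions.items()
        if set(defining) <= carried
    )


# Hypothetical lineage-defining variants (placeholder labels only).
lineage_definitions = {
    "lineage_1": {"v101", "v202"},
    "lineage_2": {"v303"},
}

# Hypothetical samples, each described by the set of variants it carries.
samples = {
    "sample_a": {"v101", "v202", "v999"},  # carries both lineage_1 variants
    "sample_b": {"v101"},                  # carries only one of them: excluded
    "sample_c": {"v303", "v404"},          # carries the lineage_2 variant
}

assignments = {
    sample_id: matching_lineages(variants, lineage_definitions)
    for sample_id, variants in samples.items()
}

# Fraction of samples that fall into at least one of the defined lineages.
lineage_share = sum(1 for hits in assignments.values() if hits) / len(samples)

for sample_id, hits in sorted(assignments.items()):
    print(sample_id, hits if hits else "no defined lineage")
```

Applied to a real data set, the fraction of samples assigned to the founding lineages is the quantity that corresponds to the 40% figure cited above.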

Integration of the Paternal and Maternal Genetic History

Taken together, the data available from Y-chromosome and mitochondrial DNA phylogenetic analysis of Jewish populations have been very informative in uncovering patterns and mechanisms that complement information gleaned from more conventional historical, linguistic, archival, liturgical, and archeological approaches. Furthermore, NRY and mitochondrial DNA markers continue to be used to seek possible Near East origins for communities which claim shared remote ancestry with the majority of Jewish population groups (so-called "Lost Tribes"). At the population level it seems that the genetic histories of the maternal and paternal ancestors tell different stories about the population genomic structure of the Jews. Y-chromosome genomic analysis strongly points to a common origin in the Near East, while the genetic data from the mitochondrial DNA point to separate local events with a putative geographic origin that might or might not be in the Near East. Y-chromosome and mitochondrial DNA analyses are congruent in suggesting that a limited number of founding ancestors gave rise to the various Jewish communities, with remarkably low levels of introgression from the host populations. It is also clear that many questions remain unanswered and the scope of future studies is potentially very large. Data on the non-Ashkenazi populations are needed to answer more accurately questions pertaining to the mechanisms that have shaped each of the communities and the possible connections among them and with the Ashkenazi and host populations. It is important to note that the study of the haploid regions of the genome provides information that is of relevance to population-level genomic effects. Population-level effects, such as founder and bottleneck events, influence overall patterns of DNA sequence variation across the genome as a whole.
Thus a founder effect, followed by population expansion, may lead to the drift to high frequencies of specific disease-predisposing or phenotype-modifying sequence variants at other parts of the genome. However, analyses of the haploid regions do not substitute for direct analysis of the diploid, autosomal regions of the genome in ascertaining such mutations. Furthermore, recombination, which characterizes the pattern of inheritance at the diploid regions of the genome, accounts for the influence of even small degrees of admixture of Jews with their non-Jewish neighbors on diverse traits or phenotypes that are determined by DNA sequence variation throughout the genome. This partly explains some of the differences in physical features that may be noted among Jewish communities, despite common ancestral origins and high levels of intra-community endogamy. Interestingly, it has recently been shown that other parts of the genome may also contain regions of limited recombination, in which DNA sequence variation markers are inherited in a block-like pattern. This finding may open up the ability to utilize such diploid regions to enhance our understanding of population genomic history, especially with respect to disease predisposition. The potential implications of findings such as the paucity of ancestors, and their possible effect on other parts of the genome, especially those relevant for diseases prevalent among Jews, remain an important continuing frontier for study with respect to genomic analysis of Jewish populations. These questions are particularly important for the Ashkenazi community, in which the reasons for the well-documented excess of rare recessive disorders have been repeatedly discussed without a definitive resolution.
It is anticipated that future studies integrating analysis of the haploid genomic regions with that of other genomic regions, such as the X-chromosome and the autosomes, will be complementary and will shed additional light on questions of historical and population health importance. The future holds great promise in clarifying these important chapters in the history of the Jewish people.