
Genetic Diversity in the Arabian Horse by Beth Minnich, with Michael Bowling


Genetic Diversity & Complex Ancestry in the Arabian Horse

by Beth Minnich and Michael Bowling


DNA as a Storyteller

“All of us in trying to understand today and tomorrow look to yesterday; to try to get some glimpse of what came before us and some lessons that can help us meet the future.”
– Kenneth W. Harl, PhD, Professor, Dept of History / Tulane University

The Arabian horse possesses a rich history, intimately connected to the physical environment of its ancestral homeland and to the culture of the Bedouin caretakers who nurtured its development. It is unclear when and where the horse was introduced to the area historically known as ‘Arabia’. Yet, from a regional proto-Arabian type developed over millennia, the nomadic horse-breeding tribes cultivated the foundation of the breed now known as the Arabian horse.

Since the horse genome was mapped in 2007, genomic tools have been developed that offer expanded perspectives on breed ancestry. The resulting observations provide a gateway to a deeper understanding of the Arabian breed, helping to connect the threads of culture, history and genetics. With these threads, we can weave a more detailed tapestry depicting the origins of the Arabian horse, to help guide preservation efforts for this iconic animal.

To help tell this story, we turn to a paper entitled ‘Genome Diversity and the Origin of the Arabian Horse’ in the journal Scientific Reports (June 2020). Conducted by an international team of researchers from the USA, Qatar, Iran, Austria, Poland and Hong Kong, the project sought to study genomic diversity in the Arabian horse and its relationship to other breeds, particularly the Thoroughbred racehorse. Led by Dr. Samantha Brooks (University of Florida) and Dr. Doug Antczak (Cornell University), the study used a large number of Arabian horses of diverse origin and bloodlines. Project funding was provided by a grant from the Qatar National Research Fund (a member of the Qatar Foundation) to study the genomes of the Arabian horse, Arabian oryx and Dromedary camel, three species with deep cultural ties to the Middle East. Project support was also received from several additional organizations, including non-financial support from the Arabian Horse Foundation.

There are several storylines in the paper, so there is a lot to sort through conceptually. Due to space limitations, this article will focus on the genetic diversity component of the project. For reference, the main summary points across the study include:
1. The Arabian breed has a unique genetic profile marked by broad variation and underlying complex ancestry consistent with an ancient origin:
   • Globally, Arabian horses display a large degree of genetic diversity, more than many other breeds of horse.
   • Registered Arabian horses were identified in the Middle East that carried expanded genetic and phenotypic diversity.
   • Straight Egyptians have a distinct genetic signature and less genetic variation than the other Arabian bloodline groups considered.
2. Genomic regions were identified that may be associated with important traits of the Arabian horse, such as head shape and athletic ability.
3. Little overall genetic similarity of Arabians to Thoroughbreds was detected, including a lack of evidence for Arabian stallion Y chromosome ancestry.
4. Strong evidence was found for recent interbreeding of Thoroughbreds with Arabians used for flat racing.

Before we begin this journey, some background is needed. DNA samples were collected from 378 Arabian horses of diverse bloodlines from Qatar, Iran, UAE, Poland, USA, Egypt, Jordan, Kuwait, United Kingdom, Australia, Denmark and Canada. This data set was then expanded to include additional samples from past studies on Arabians and other breeds, with the final data set including genotype data for 917 samples from 19 different horse breeds. The relatedness of the sampled horses was also evaluated so that closely related horses could be removed, ensuring a diverse group of pedigrees.

This study posed two primary questions: What is the relationship of the Arabian to other horse breeds, and how diverse are Arabians? Before getting to the answers, it will be helpful to meet some of the main characters involved.
• The first one to become acquainted with is a SNP (pronounced ‘snip’), which stands for ‘single nucleotide polymorphism.’ These single-base changes in the genetic code, passed down through the generations, are used to look for genetic variation within a population and to identify genetic similarity or dissimilarity between individuals (see Figure 1, at right).
• Next is Principal Component Analysis (PCA), a statistical method that condenses the genetic variation at SNP markers across the genome into a few summary axes, so that the relatedness between individuals can be plotted. This analysis used a genotyping tool called a ‘SNP chip’ that assesses markers at 640,000 unique locations in the horse genome. PCA lets the computer identify patterns of relatedness from the DNA markers alone, with no consideration of the pedigree.
• And finally, how do you read a PCA plot? Each symbol represents a horse, and each color indicates a known population (i.e. a breed, or a subgroup within a breed, as identified by the registration of the horse). The spacing between individuals indicates genetic relationship: loose clusters show genetic diversity within the group, and tight clusters represent genetically similar individuals. In addition, groups that are more closely related will cluster nearer to one another.

Figure 1. Modified from https://whatisdna.net/wiki/single-nucleotide-polymorphisms/
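To make the PCA idea concrete, here is a minimal sketch in plain Python. It assumes each genotype is coded as the number of copies (0, 1 or 2) of one allele at a SNP, centers each SNP column, and finds the first principal component by power iteration on the horse-by-horse similarity matrix. The six horses and six SNPs are hypothetical; real analyses use dedicated software, hundreds of thousands of markers and usually several components. As in the study, no pedigree information enters the calculation.

```python
import random

def center_columns(genotypes):
    """Subtract each SNP's mean genotype so every column averages to zero."""
    n = len(genotypes)
    means = [sum(column) / n for column in zip(*genotypes)]
    return [[g - m for g, m in zip(row, means)] for row in genotypes]

def first_principal_component(genotypes, iterations=200, seed=0):
    """Return a unit-length vector proportional to each horse's PC1 score,
    found by power iteration on G = X X^T of the centered genotypes X."""
    x = center_columns(genotypes)
    n = len(x)
    # Horse-by-horse similarity: large when two horses share SNP deviations.
    g = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]  # random starting direction
    for _ in range(iterations):
        w = [sum(g[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = sum(val * val for val in w) ** 0.5
        if norm == 0:
            raise ValueError("no genetic variation to summarize")
        v = [val / norm for val in w]
    return v

# Hypothetical genotypes (copies of one allele, 0/1/2) at six SNPs.
horses = {
    "Horse 1": [2, 2, 1, 0, 0, 1],
    "Horse 2": [2, 1, 2, 0, 1, 0],
    "Horse 3": [1, 2, 2, 1, 0, 0],
    "Horse 4": [0, 0, 1, 2, 2, 1],
    "Horse 5": [0, 1, 0, 2, 1, 2],
    "Horse 6": [1, 0, 0, 1, 2, 2],
}
scores = first_principal_component(list(horses.values()))
for name, score in zip(horses, scores):
    print(f"{name}: PC1 = {score:+.3f}")
```

Horses 1 to 3 and Horses 4 to 6 receive PC1 scores of opposite sign, so they would plot as two separate clusters. The sign of a principal component is arbitrary; only the relative positions carry meaning.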

Story #1 – What is the relationship of the Arabian to other horse breeds?

Referencing Figure 2 (PCA Plot A), below, the Arabian grouping is shown in purple and forms a separate group, far removed from most of the other breeds evaluated. The closest breed groups in this analysis include the Dareshouri (light blue) and Kurdish breeds from Iran (khaki green), with the furthest group being the Thoroughbred (black). Arabians also form a broad cluster occupying much of the left-hand side of the plot, indicating a high degree of genetic variation within the global Arabian horse population. Of note, even with the breed’s wide distribution across the globe, the Arabian breed overall maintains a unique genetic identity separate from the other breeds.

Figure 2 (PCA Plot A). The Arabian is a distinct breed with diverse lineages, having little apparent relationship to the Thoroughbred. Principal component analysis of 378 Arabian horses sampled in this study among a reference set including samples from 18 additional global breeds, with symbol shape indicating data source and symbol color indicating breed. Cosgrove, EJ, et al. Genome Diversity and the Origin of the Arabian Horse. Sci Rep. 2020 Jun 16;10(1):9702. http://creativecommons.org/licenses/by/4.0/

Story #2, Part 1 – How much diversity is there in the Arabian breed?

DNA vs. the Historical Record

Genomic studies are informative for showing relationships, but it is still human interpretation that gives the information its meaning. In other words, there is ‘data’, and then there is ‘interpretation of the data’. So, what happens when the DNA appears to tell a different story from the historical record? Although it is not surprising that the Arabian horse forms its own unique group, what is surprising is the lack of genetic similarity between the Arabian and the Thoroughbred. This result runs counter to the extensively documented history of the two breeds. So why does this dissimilarity exist?

Option #1: The Arabian provided little to no contribution to the founding of the Thoroughbred.

This is an easy conclusion to draw, but the historical record indicates otherwise. Does commonly recited breed history overestimate the contribution of the Arabian? Probably. But even allowing for errors in documentation, and for the label ‘Arabian horse’ being applied to horses coming from the Middle East regardless of whether they were Arabian or another Oriental breed, is the breadth and depth of the historic record that far off the mark?*

Option #2: History isn’t wrong. But if this is the case, why do the two breeds cluster so far apart?

The Thoroughbred has a very old closed studbook, with hundreds of years of intense selection behind it. When these elements are added to the scenario, the separation between the groups becomes easier to account for:
• The Thoroughbred did not continue to sample from desert sources. As a result, genetic drift, coupled with intense selection, will move the breeds apart. An argument can therefore be made that a representative sample of Thoroughbred founding ancestors and early-era horses might not cluster near current-era Thoroughbreds either.
• The Thoroughbred experienced continued positive selection toward performance traits that need not have come from the Arabian lines.
• The Arabian horse has also experienced 400 years of selection, with an unknown number of population bottlenecks due to wars and migrations, so the modern Arabian is also likely to differ from the population sampled in early Thoroughbred breeding.
• Additionally, the lack of evidence for modern Arabian stallion Y chromosome ancestry in the Thoroughbred could be explained if the foundation stallions came from Arabian male lines that have since died out.
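The first point, that drift alone moves closed populations apart, can be illustrated with a toy Wright–Fisher simulation, a standard textbook model of drift that was not part of the study. The sketch below splits one hypothetical founder population into two closed studbooks of equal size and tracks allele frequencies at independent loci with no selection, migration or mutation, and with non-overlapping generations; the function names and all parameter values are illustrative assumptions.

```python
import random

def wright_fisher(p0, pop_size, generations, rng):
    """Drift of one allele in a closed population of pop_size diploid animals:
    each generation, 2 * pop_size gene copies are drawn at random from the last."""
    p = p0
    copies_per_generation = 2 * pop_size
    for _ in range(generations):
        copies = sum(1 for _ in range(copies_per_generation) if rng.random() < p)
        p = copies / copies_per_generation
    return p

def mean_frequency_gap(n_loci, pop_size, generations, seed=1):
    """Split one founder population into two closed studbooks of equal size and
    return the average absolute allele-frequency difference after drift alone."""
    rng = random.Random(seed)
    total_gap = 0.0
    for _ in range(n_loci):
        p0 = rng.uniform(0.2, 0.8)  # shared starting frequency at this locus
        studbook_a = wright_fisher(p0, pop_size, generations, rng)
        studbook_b = wright_fisher(p0, pop_size, generations, rng)
        total_gap += abs(studbook_a - studbook_b)
    return total_gap / n_loci

# Hypothetical: 100 independent loci, two closed groups of 50 horses each.
for generations in (0, 1, 10, 30):
    print(generations, round(mean_frequency_gap(100, 50, generations), 3))
```

At generation 0 the two groups are identical; with every further generation of isolation the average gap tends to widen, even though no trait is being selected. Intense selection, as in the Thoroughbred, would add to this divergence rather than replace it.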

Past genetic studies using blood typing markers and STR (short tandem repeat) data from parentage analysis did show the Arabian and Thoroughbred as each other’s closest match (e.g. Bowling 1992). Blood typing involves functional genes, so the range of potential variation would be limited by natural selection; this system could reflect an ancestral connection even while the rest of the genome diverged. STRs sample fewer than 20 sites in the genome and were chosen to be informative for parentage purposes, so they are necessarily less representative of overall variation than the hundreds of thousands of SNPs on a genotyping chip.

It is clear modern-era Thoroughbreds and Arabians are not genetically similar across most of their genomes. For comparison purposes, hopefully DNA samples from historic horses (such as Eclipse) can be accessed and included in the analysis. In the bigger picture, it is important the narrative developed from these findings takes into consideration the historical record (flaws and all) along with the genomic data.

*These comments are specific to the PCA analysis. The Y chromosome component of the study involves a different set of issues that require a separate article to cover.

As previously mentioned, the Arabian horse overall displays a large degree of genetic diversity across bloodlines, more than many other breeds of horse. Looking within the Arabian breed cluster, Figure 3 (PCA Plot B), opposite, identifies several lineage subgroupings, including Bahrain, Iran, Poland, Saudi Arabia, Syria and Tunisia, along with Straight Egyptians, and Arabians with multi-origin ancestry. In general, horses from the same lineage group cluster together, with differences noted between all the Arabian subgroups examined in the study. Of note, three subgroups segregated strongly: Straight Egyptian (green), Poland (red) and Saudi Arabia (orange), with Straight Egyptians being notably far from the main Arabian cluster. Multi-origin lineages common to main studbook horses in the USA and Europe (light blue) are found across the plot, with some horses located closer to the Thoroughbred (black) cluster.

There is also evidence of relatively high inbreeding within some individuals, especially in the Straight Egyptian subgroup, which had a group mean inbreeding coefficient (F) of 30%. The F value is the probability of inheriting two copies of the same allele from an ancestor who appears on both sides of the pedigree; such alleles are referred to as ‘identical by descent’. This is an important finding, since the paper notes the level of inbreeding “may be reaching levels sufficient to impact animal health.” High inbreeding values were also observed in individual horses of the multi-origin subgroup, even those with relatively diverse pedigrees, though as a whole that group had a much lower level of inbreeding (F = 14%). The high inbreeding values may result from historical population bottlenecks, along with the impact of the Popular Sire Effect. The F values for the other lineage subgroups are Iran (12%), Poland (14%), Bahrain/Syria/Tunisia (17%), and Saudi Arabia (20%).
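The pedigree definition of F given above can be computed with the classic recursive coancestry (kinship) method: a horse’s F equals the coefficient of coancestry between its sire and dam. The Python sketch below illustrates that definition only; the pedigree and horse names are hypothetical, and it is not presented as the method the study used to obtain its F values.

```python
from functools import lru_cache

# Hypothetical pedigree: horse -> (sire, dam); None marks an unknown parent.
# Parents must be listed before their offspring, because list position is
# used as birth order in kinship() below.
PEDIGREE = {
    "Sire A": (None, None),
    "Mare B": (None, None),
    "Mare C": (None, None),
    "Colt D": ("Sire A", "Mare B"),   # D and E are half-siblings through Sire A
    "Filly E": ("Sire A", "Mare C"),
    "Foal F": ("Colt D", "Filly E"),  # product of a half-sibling mating
}
BIRTH_ORDER = {name: i for i, name in enumerate(PEDIGREE)}

@lru_cache(maxsize=None)
def kinship(x, y):
    """Coefficient of coancestry: the probability that an allele drawn at random
    from x and one drawn at random from y are identical by descent."""
    if x is None or y is None:
        return 0.0  # unknown ancestors are treated as unrelated founders
    if x == y:
        sire, dam = PEDIGREE[x]
        return 0.5 * (1.0 + kinship(sire, dam))
    # Step back through the parents of the younger horse, so we never pass
    # through a horse that is itself an ancestor of the other.
    if BIRTH_ORDER[x] < BIRTH_ORDER[y]:
        x, y = y, x
    sire, dam = PEDIGREE[x]
    return 0.5 * (kinship(sire, y) + kinship(dam, y))

def inbreeding_coefficient(horse):
    """Wright's F for a horse equals the kinship between its sire and dam."""
    sire, dam = PEDIGREE[horse]
    return kinship(sire, dam)

print(inbreeding_coefficient("Foal F"))  # → 0.125 (half-sibling mating)
```

A genomic F, as used in DNA-based studies, is measured from the horse’s own genotypes instead, which is why two horses with identical pedigrees can end up with different values.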

Figure 3 (PCA Plot B). The Arabian is a distinct breed with diverse lineages. Principal component analysis of 378 Arabian horses sampled in this study with 71 Persian Arabian and 11 Turkemen samples, and 17 Thoroughbred samples collected in this study, with symbol shape indicating breed, and symbol color indicating Arabian breed lineage, except for the Thoroughbred and Turkemen groups. Cosgrove, EJ, et al. Genome Diversity and the Origin of the Arabian Horse. Sci Rep. 2020 Jun 16;10(1):9702. http://creativecommons.org/licenses/by/4.0/

Genome-wide heterozygosity was also measured, ranging from a high of 33% for the Iran subgroup to a low of 26% for the Straight Egyptian subgroup. A locus (plural, loci) is the single physical location of a specific gene or marker on a chromosome. Heterozygosity, an indicator of genetic variation, measures the proportion of loci that carry two different alleles. Higher heterozygosity means more genetic variation within that single genome. Mean heterozygosity values for the other lineage subgroups are Poland (32%), Multi-origin ancestry (32%), Bahrain/Syria/Tunisia (31%), and Saudi Arabia (30%).
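Once genotypes are in hand, a single horse’s heterozygosity is simple arithmetic. The sketch below assumes each SNP call is coded as 0, 1 or 2 copies of one allele (so 1 means heterozygous), with None for a failed call; the coding, the function name and the example genotypes are hypothetical and not taken from the study’s data.

```python
def observed_heterozygosity(genotypes):
    """Share of successfully genotyped loci at which a horse carries two
    different alleles. Genotypes are 0, 1 or 2 copies of one allele, so a 1
    marks a heterozygous locus; None marks a failed call and is skipped."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no genotype calls")
    return sum(1 for g in called if g == 1) / len(called)

# Hypothetical genotypes for two horses at ten SNPs.
horse_1 = [0, 1, 2, 1, 1, 0, None, 2, 1, 0]
horse_2 = [1, 1, 0, 2, 1, 1, 0, 1, 2, 1]
h1 = observed_heterozygosity(horse_1)
h2 = observed_heterozygosity(horse_2)
group_mean = (h1 + h2) / 2  # a subgroup value averages its horses
print(f"Horse 1: {h1:.1%}  Horse 2: {h2:.1%}  group mean: {group_mean:.1%}")
```

Subgroup figures such as the 26% to 33% range reported above are averages of this kind of per-horse value, taken over far more loci.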

Story #2, Part 2 – Complex ancestry in the Arabian breed

To dig deeper into the story of genetic diversity within the Arabian breed, we shift from using SNPs to construct PCA plots, which identify relatedness among groups, to using SNPs to construct a STRUCTURE plot (Figure 4, below), which indicates the genetic ancestry of individuals within a group. Similar to DNA ancestry reporting for humans, the STRUCTURE program estimates the proportion of an individual’s genome that originates from each of the genetically distinct source groups included in the analysis. The benefit of this type of plot is that it provides a look at genome diversity and ancestry at different levels.

Figure 4 (STRUCTURE Plot). Individual lineages of the Arabian breed display complex ancestry. STRUCTURE cluster assignments are plotted for number of clusters K = 2, 5, 8, and 11 (top to bottom panels). Each cluster in a given analysis (panel) is represented by a separate color. The plotted 312 samples represent Arabian breed subgroups as well as Turkemen, Icelandic and Thoroughbred breeds. The purple bar and asterisk mark the cluster of multi-origin samples showing shared ancestry with Thoroughbred samples. Cosgrove, EJ, et al. Genome Diversity and the Origin of the Arabian Horse. Sci Rep. 2020 Jun 16;10(1):9702. http://creativecommons.org/licenses/by/4.0/

To read a STRUCTURE plot, note that each column represents an individual horse and each color represents a source group defined only by the DNA marker data and the computational analysis. A column with a single color indicates simple ancestry, while multiple colors in a column indicate complex ancestry. The basic concept is that SNPs occur at different frequencies in different populations and have their own patterns of distribution among breeds and breed subgroups. By analyzing the horses included in Figure 3 (PCA Plot B), clusters of genetic similarity were identified, with each cluster assigned a different color. The K designations refer to the number of clusters selected for evaluation; i.e. K = 2 means inclusion of two ancestral groups (colors), K = 4 means inclusion of four ancestral groups, and so on. For this study, the computer analysis identified 11 clusters as the most likely description of the diversity captured in the group of horses examined.
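The idea that ancestry can be read from allele frequencies can be sketched in a few lines. The example below is not the STRUCTURE program, which uses a Bayesian model fitted by MCMC sampling and infers the source groups’ allele frequencies along with each horse’s ancestry. Instead, it is a simplified maximum-likelihood (expectation-maximization) estimate of one horse’s ancestry proportions, assuming the allele frequencies of K = 2 hypothetical source groups are already known and lie strictly between 0 and 1. All frequencies and genotypes are invented for illustration.

```python
def estimate_admixture(genotypes, source_freqs, iterations=500):
    """Estimate one horse's ancestry proportions q[k] by expectation-maximization,
    holding the source groups' allele frequencies fixed.

    genotypes:    copies (0, 1 or 2) of the reference allele at each SNP
    source_freqs: source_freqs[k][l] = reference-allele frequency at SNP l in group k
    """
    k_groups = len(source_freqs)
    n_loci = len(genotypes)
    q = [1.0 / k_groups] * k_groups  # start from equal ancestry
    for _ in range(iterations):
        expected = [0.0] * k_groups
        for locus, g in enumerate(genotypes):
            ref = [p[locus] for p in source_freqs]
            alt = [1.0 - p[locus] for p in source_freqs]
            # The horse carries g reference copies and 2 - g alternate copies.
            for count, freqs in ((g, ref), (2 - g, alt)):
                if count == 0:
                    continue
                weights = [q[k] * freqs[k] for k in range(k_groups)]
                total = sum(weights)
                for k in range(k_groups):
                    # E-step: share of these copies credited to group k.
                    expected[k] += count * weights[k] / total
        # M-step: new proportions are each group's share of all 2L allele copies.
        q = [e / (2 * n_loci) for e in expected]
    return q

# Hypothetical reference-allele frequencies at six SNPs in two source groups.
POP_A = [0.9, 0.9, 0.1, 0.8, 0.2, 0.9]
POP_B = [0.1, 0.2, 0.9, 0.1, 0.8, 0.1]

typical_a = estimate_admixture([2, 2, 0, 2, 0, 2], [POP_A, POP_B])
mixed = estimate_admixture([2, 2, 0, 0, 2, 0], [POP_A, POP_B])
print("group-A-like horse:", [round(x, 3) for x in typical_a])
print("mixed horse:      ", [round(x, 3) for x in mixed])
```

In a STRUCTURE bar, the first horse would be a column of almost a single color, while the second would be split between two colors: the ‘complex ancestry’ pattern described above.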

Figure 4 (STRUCTURE Plot) shows that horses from Syria, Bahrain, Tunisia, and some of the Iranian Arabians have complex ancestry, reflecting a high level of genetic diversity. Conversely, the Straight Egyptian and Polish groups are quite homogeneous, reflecting reduced genetic diversity resulting from historically closed and focused breeding programs. As K values increase, the components of ancestral origin can be seen for the various Arabian lineage subgroups. This type of visualization is valuable for better understanding breed ancestry and origin, especially when trying to identify ancestral components for purposes of preservation.

When thinking about the establishment of Arabian horse breeding programs outside the cradle countries, the homogeneity seen in the plot for the Egyptian and Polish sub-groups makes sense. Even with the volumes of information written about the diaspora of the Arabian horse from its ancestral homeland, only a small number of horses left; most of the population stayed in the region. This sets the stage for the difference in levels of genetic diversity found in the global subpopulations of the Arabian horse. This loss of genetic diversity has been driven by 1) the limited number of exported horses (founder animals), and 2) strong selection by breeders to achieve a specific ‘breed standard’ that was developed outside of the Arabian horse’s native region.

Since source groups of ancestry can be much older than the formation of breeds, the STRUCTURE plot can also indicate the amount of historic shared ancestry between breeds. For comparison purposes, several out-groups were added to the analysis, including samples from Turkemen, Icelandic and Thoroughbred horses. The Turkemen grouping included Akhal-Tekes, as well as horses from the closely related Yamut breed. Not surprisingly, the Thoroughbred and the geographically isolated Icelandic breed appear relatively homogeneous.

Of great significance, this study identified Arabian horses from homeland countries that clustered with the Arabian breed but carried expanded genetic diversity. These horses show higher levels of variation, compared to the progeny of exported Arabians in other regions of the world. In particular, the Syrian, Bahraini, Tunisian and Persian (Irani) subgroups evaluated in this project displayed what the paper describes as “a high degree of genetic variation and complex ancestry” (see Figure 5, below). The paper also notes this finding is supported by previous work from other research groups that detected higher genetic diversity in Arabian horses from the Middle East than elsewhere.

At first glance, these STRUCTURE Plots can be mistakenly interpreted as meaning ‘complex ancestry’ is akin to being a ‘mutt’. However, a population’s center of diversity is one estimate of its point of origin; prior to tight selection for breed creation or going through a genetic bottleneck, there is a lot of genetic diversity [see sidebar]. As noted in the paper, “these desert-bred Arabian horses have a diversity of physical characteristics and increased genetic diversity typical of a landrace; this includes the breed’s ability to thrive in a hot, dry environment. Yet, these horses still clearly cluster genetically with other modern Arabians. The increased diversity seen in these subgroups is consistent with a Middle Eastern origin for the modern Arabian horse.”

While some clusters are readily identifiable, for some multi-ancestry horses there are large proportions of the genome that do not yet have a historical label. To put it another way, at this point some of the colors shown (e.g. dusty rose and periwinkle) have not yet been associated with a single modern population. They may be rare, inaccessible, or possibly extinct. Another cluster of interest that has not yet been identified is the ‘chocolate brown’ group, which is important for many of the older racing Arabian lines that do not show as much Thoroughbred admixture. The chocolate brown cluster also appears in small proportions in some Thoroughbred genomes. Hopefully, further study will show where these clusters fit in the origin of the Arabian breed and its shared ancestry with other breeds.

From Plots to Practical Applications

Having had a quick journey through some of the storylines of the study, how can this information be effectively used by Arabian horse breeders?

What is genetic diversity and why does it matter?

Genetic diversity is a measure of variation that can be used as an indicator of the genetic health of a breed or individual, including potential impacts on the immune system and reproductive fitness, and incidence of inherited disease.

Figure 5: High-diversity Arabian breed sub-groups

Image courtesy of Dr. Samantha Brooks, as adapted from Cosgrove et al. 2020.

The challenge for any breed is that loss of genetic diversity begins at the time of its establishment. This loss comes from numerous factors, foremost among them ‘artificial selection’, in which only a limited number of specifically chosen animals are used for breeding. Additional causes include selection pressure for specific traits desired in the breed, closed populations, genetic drift, historical genetic bottlenecks, inbreeding, and the Popular Sire Effect. Small closed groups are especially affected, so working to maintain beneficial genetic diversity among a breed’s subgroups is critical.
• Genetic drift is a change in the frequency of an allele (a gene variant) owing to chance rather than selection. Population size influences both the rate of genetic drift and the likelihood of inbreeding in the population; small groups tend to lose genetic diversity more quickly than large ones. Additionally, in a smaller population individuals are more likely to breed with close relatives. Notably, in small closed groups the individuals of each generation are more closely related to one another than were the individuals of the previous generation. As a result, inbreeding and drift together reduce genetic diversity.
• Genetic load is the presence of unfavorable alleles in a population, which decreases the fitness of the average individual. Although deleterious recessive alleles exist in all groups and in each individual, increased inbreeding raises the chance of any single individual inheriting two copies of the same allele at a locus from a common ancestor. Since recessive alleles are exposed by homozygosity, narrowing a program to aim at a specific goal is a double-edged sword: more type-consistent foals are produced, but heterozygosity is reduced, and this can bring harmful recessive traits to light. Inevitably, when an allele’s frequency gets high enough, carrier-to-carrier matings will occur. While judicious use of genetic testing can avoid the production of foals affected by some heritable diseases, only a small number of genetic disorders are currently testable.
• Two individuals with identical pedigrees may not have the same genetic load.
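How population size drives drift, and one way the Popular Sire Effect shrinks a population, can be put into numbers with two textbook formulas from Sewall Wright: the effective population size when unequal numbers of males and females breed, Ne = 4·Nm·Nf / (Nm + Nf), and the expected fraction of heterozygosity remaining after t generations of drift, (1 − 1/(2·Ne))^t. Both assume an idealized, randomly mating population with no selection or migration, so the sketch below, with hypothetical herd sizes, shows direction and rough scale only.

```python
def effective_population_size(stallions, mares):
    """Wright's effective population size when unequal numbers of males and
    females breed: Ne = 4 * Nm * Nf / (Nm + Nf)."""
    return 4.0 * stallions * mares / (stallions + mares)

def heterozygosity_retained(ne, generations):
    """Expected fraction of the starting heterozygosity left after drift alone
    in an idealized population of effective size Ne: (1 - 1/(2 Ne)) ** t."""
    return (1.0 - 1.0 / (2.0 * ne)) ** generations

# Hypothetical closed group of 100 mares, bred to either 5 or 50 stallions.
for stallions in (5, 50):
    ne = effective_population_size(stallions, 100)
    kept = heterozygosity_retained(ne, 10)
    print(f"{stallions} stallions: Ne = {ne:.1f}; "
          f"heterozygosity kept after 10 generations = {kept:.1%}")
```

With only 5 stallions, the 105 breeding animals behave, as far as drift is concerned, like a population of about 19, so concentrating breeding on a few sires speeds the loss of diversity even when the mare base is large.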

All domestic species carry genetic load and have some level of inbreeding, but these levels will be influenced by breeding decisions. Breeders working in small closed groups need to take pedigree, conformation, and health into consideration. Using health as a selection criterion can bring a heterozygote advantage, helping to keep DNA-based inbreeding coefficients level.
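The arithmetic behind carrier-to-carrier matings, and behind inbreeding’s effect on recessive disease, can be sketched as follows. The first function enumerates the four equally likely allele combinations from two parents (a Punnett square). The second uses the standard population-genetics expression for the expected frequency of homozygous recessive offspring with inbreeding coefficient F, q²(1 − F) + qF, where q is the recessive allele’s frequency. The allele frequency and F values used are hypothetical.

```python
from collections import Counter
from itertools import product

def cross(sire, dam):
    """Expected genotype ratios among foals of two parents, e.g. cross("Aa", "Aa").
    Each parent passes on one of its two alleles with equal probability."""
    foals = Counter("".join(sorted(pair)) for pair in product(sire, dam))
    total = sum(foals.values())
    return {genotype: count / total for genotype, count in foals.items()}

def affected_fraction(q, f=0.0):
    """Expected share of foals homozygous for a recessive allele of frequency q
    when the foals' inbreeding coefficient is f: q^2 * (1 - f) + q * f."""
    return q * q * (1.0 - f) + q * f

print(cross("Aa", "Aa"))  # → {'AA': 0.25, 'Aa': 0.5, 'aa': 0.25}
for f in (0.0, 0.125, 0.30):
    print(f"F = {f:.3f}: expected affected foals = {affected_fraction(0.05, f):.4f}")
```

A carrier-to-carrier mating gives a one-in-four chance of an affected foal. For a recessive allele at 5% frequency, raising F from 0 to 0.125 more than triples the expected share of affected foals, which is why rising inbreeding brings hidden genetic load to light.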

Complex Ancestry and Breed Origins

While the Arabian horse is a domesticated breed, it also shows some historic landrace characteristics. A landrace is “a local variety of a species having distinctive characteristics arising from development and adaptation over time to conditions of a localized geographic region and typically displaying greater genetic diversity than types subjected to formal breeding practices.” A breed is “a group related by descent from common ancestors and visibly similar in most characteristics.” The differentiation between the two is useful when viewing the origins of the modern-day Arabian horse. From its beginnings, the Arabian horse has been a horse of utility. Bedouin Arab horse breeders focused on producing horses that could survive the severe living conditions of the region and that were fast, and therefore good for raiding. Together with the camel, the horse was an integral part of the daily life and survival of the Bedouin. Because functionality was the desired goal, a variety of physical types are found among Bedouin-bred horses. In other words, selecting for a single physical type was not part of the Bedouin breeding tradition.
• For interpreting genetic diversity and complex ancestry data, it is helpful to view the original desert-bred foundation stock through the lens of a landrace rather than a breed.
• For a parallel from studies of human genetic diversity: there is greater genetic diversity in Africa than in the rest of the world combined, because the migrations out of Africa carried only small samples of human genetic diversity. Of particular interest, the Khoisan people of Southern Africa have the greatest nuclear-genetic diversity among all human populations. The same concept applies to the Arabian horse, with genetic diversity being greater in the cradle countries than in countries outside the region. Similarly, only a small number of Arabian horses were exported, with most of the population remaining in the region.
• An important take-home point is that homozygosity does not equal ‘breed purity’. In fact, this skewed concept of ‘purity’ is directly at odds with horses in the Middle East exhibiting increased genetic diversity and complex ancestry.
• As mentioned in this article’s opening, the Arabian horse is intimately connected to the physical environment of its native region and the culture of the Bedouin Arab horse breeders. The Arab Bedouin tribes defined ‘Arabian horse’, and that definition was based on a framework steeped in their cultural values. An entire series of articles could be written on the Bedouin notion of authenticity and the standards of rasan (strain) and the marbat. The origins of the breed must be viewed within these parameters and not from a Western idea of ‘purity of blood’, an idea that originated as a religious construct and then morphed into a racial ideology. From a Bedouin perspective, the notion of breed purity is more cultural than genetic, and this perspective must be the primary consideration.
