19 July 2016

To SNP or not to SNP, That is the Question


I wrote the following article for The Bulletin, published by the Genealogical Forum of Oregon (Portland), 16 Jun 2015, as part of a series of DNA lessons for the membership. I am asked rather often about taking a SNP test.

To SNP or not to SNP, That is the Question
by Emily D. Aulicino

Every plant and animal has a phylogenetic tree, including humankind, of course. A phylogenetic tree shows the inferred evolution of a species. Genetic genealogists often refer to the human phylogenetic tree as the haplogroup tree. There are haplogroup trees for the all-male and all-female lines. Haplogroups are decided through testing either the Y-chromosome DNA or the mitochondrial DNA. Testing the full mitochondria provides the haplogroup in detail, and no other testing is needed. However, further testing is needed to fine-tune a haplogroup for the Y-chromosome DNA (“Y-DNA”); therefore, additional SNP testing is done only for the Y-chromosome.
Many people who are new to genetic testing for genealogy are confused by the terms STR (short tandem repeat, pronounced by the individual letters, S-T-R); SNP (single nucleotide polymorphism, pronounced SNiP); haplotype (DNA results, explained further below); and haplogroup (a group of related haplotypes constituting a twig on the world family tree). That is, STR marker results make up a Y-chromosome haplotype, or Y-test result, and a group of haplotypes that share the same SNP forms a haplogroup. Knowing these terms will help the researcher more clearly understand the various Y-DNA tests and how they relate to genealogy.

STRs and Haplotypes
An STR is a short pattern of the four bases in our DNA, namely adenine (A), cytosine (C), guanine (G), and thymine (T) repeated in tandem. The number of times this pattern is repeated determines a marker result for a Y-STR test. For example: GATAGATAGATA is a pattern repeated three times. Thus, the marker result would be “3” on a report. The repeating pattern can be two to five bases long. Each marker has a range in which it repeats. For instance, DYS 393 is an area on the Y chromosome known to repeat its pattern from 9 to 17 times (normally), so the result of that marker in a tested person could be any number from 9 to 17. Y-DNA test results are determined by the number of STRs or short tandem repeats on different places on the Y chromosome. The test results are referred to as the DNA signature or haplotype.
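
The counting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how a testing lab actually reads a marker: the function name count_tandem_repeats is made up for this example, and the only real data used are the GATA pattern and the DYS 393 range of 9 to 17 quoted in the paragraph above.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Keep stepping forward one motif at a time while the pattern repeats.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# Range quoted in the article for DYS 393 (normal values).
DYS393_RANGE = (9, 17)


def in_normal_range(value: int, marker_range: tuple) -> bool:
    """Check whether a marker result falls inside its usual range."""
    low, high = marker_range
    return low <= value <= high


print(count_tandem_repeats("GATAGATAGATA", "GATA"))  # 3
print(in_normal_range(13, DYS393_RANGE))             # True
```

The number printed for the first call is the same "3" that would appear on a report for that stretch of DNA.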


An example of a Y-DNA STR Result

Note that Y-37, Y-67, and higher number Y tests are really Y-DNA STR tests, but most people just refer to them as Y-DNA tests, thus adding to the confusion. The number after the “Y” indicates how many STRs are being tested.

SNPs and Haplogroups
A SNP is the most common type of genetic variation among people. Each SNP represents a difference in a single DNA building block, called a nucleotide, which is also comprised of one of the four bases in our DNA, among other things. For example, the base cytosine (C) may be replaced with the base thymine (T) in a certain stretch of DNA (public domain information from the National Library of Medicine [NLM]). To be classified as a SNP, a change must be present in at least one percent of the general population.
SNPs have unique names such as M207 or P224. The letter indicates what lab found the SNP (M is for Peter Underhill, Ph.D. of Stanford University and P is for Michael Hammer, Ph.D. of the University of Arizona) while the number indicates the number of SNPs that have been located by the lab. That is, M207 is the 207th SNP found by this lab.



DNA Double Helix graphic is courtesy of Apers0n, via Wikimedia Commons

A person tests either positive or negative for a particular SNP, and this helps determine where a tester is on the phylogenetic tree (the world’s family tree). That is, testing SNPs helps determine the haplogroup. The more SNPs tested, the more detailed or refined the haplogroup will be. DNA testing companies originally used an alternating letter and number system; however, these strings of letters and numbers became quite long as more information was acquired and tests improved. Therefore, companies now use the terminal SNP as the haplogroup designation. The terminal SNP is the last (as in chronologically the most recent) SNP for which a person tests positive. Of course, as more SNPs are discovered and more testing is done, the terminal SNP will change. No company, lab, or organization has a full list of Y-DNA SNPs yet.

Old-style haplogroup: R1a1a1b2a2b1b
New-style haplogroup: R-F2935
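
As a rough sketch of how a terminal SNP is picked, the Python below walks a list of SNPs ordered from oldest to most recent and keeps the last one that tested positive. The SNP names come from the Haplogroup N chart later in this article (one name per level); the tester's results and the function name terminal_snp are hypothetical, made up only for illustration.

```python
def terminal_snp(path, results):
    """Return the most recent SNP on `path` that tested positive, or None.

    `path` is ordered from the oldest SNP to the most recent one.
    `results` maps SNP name -> True (positive) or False (negative).
    """
    terminal = None
    for snp in path:
        if results.get(snp) is True:
            terminal = snp
    return terminal


# One SNP name per level of the article's chart: N, N1, N1c, N1c1, N1c1a.
path_to_n1c1a = ["M231", "L735", "L729.1", "M46", "M178"]

# Hypothetical tester: positive down to M46, negative for M178.
results = {"M231": True, "L735": True, "L729.1": True, "M46": True, "M178": False}

snp = terminal_snp(path_to_n1c1a, results)
print(f"N-{snp}")  # N-M46
```

If a later test showed this tester to be positive for M178, the same function would return M178, which is the sense in which a terminal SNP changes as more testing is done.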

The International Society of Genetic Genealogy (ISOGG) Tree places SNPs based upon evidence of where they belong on their haplogroup tree. They have a public standard so people can know how the organization determines what SNPs to place where. ISOGG attempts to update the tree and the new haplogroups frequently. You can find the listing criteria standard for inclusion of SNPs into the ISOGG Y-DNA Haplogroup Tree here: 
The ISOGG Y-DNA SNP tree is at http://www.isogg.org/tree/

Members of a haplogroup share the same common ancestor. Unfortunately, this common ancestor is very likely beyond genealogical records. Therefore, haplogroup project administrators are interested in more ancient migration patterns whereas the usual DNA tester is a genealogist trying to further his or her family history.
After receiving the result of a Y-DNA STR test, it is important to join the appropriate haplogroup project as well as your surname project. Haplogroup administrators run projects that look at ancient ancestry, and these tend to be quite different from projects for a surname. Y-DNA testers may receive a request from their haplogroup administrator to do testing for particular SNP markers. These requests, seemingly out of the blue, can be quite a puzzle for genealogists. So why are they beneficial, and how can they help the tester?

WHY IS SNP TESTING BENEFICIAL?
The more refined the haplogroup, the closer the testers of that haplogroup are to each other genetically.
Haplogroup trees have grown immensely since the recent increased interest in SNP testing. The following N haplogroup for Y-DNA currently seems to be one of the smallest and, therefore, a relatively easy group to use as an example. If you think of a haplogroup as its own tree with branches and twigs, then in this case N is the trunk of the tree with N* and N1 being major branches. N (or any other solo letter in the phylogenetic tree) is sometimes called the parent haplogroup. When the parent haplogroup designator is followed by an asterisk, the testers who fall under that haplogroup either do not possess any additional unique markers or have unique markers that are yet to be discovered. When (or if) such additional unique SNP markers are discovered, the testers involved will be given a new, unique subclade (branch of the tree).
Off the major branch N1, there are smaller branches N1*, N1a, N1b, and N1c, as seen in the following chart. The term subclade is used for any haplogroup that is beneath (contains more alternating letters and numbers) the basic haplogroup. In this case N*, N1, N1b1, etc. are all subclades of Haplogroup N.
SNPs break down the haplogroup and subclades into smaller subsets. As previously stated, these SNPs have unique names determined by the lab that discovered them; however, if multiple labs discover the same SNP each may name it, so some SNPs may have multiple names. Notice in the following chart, some SNPs are separated by a forward slash (/) while others are separated by a comma (,). Those with the slash were discovered and named by multiple labs while the others were not.
The SNPs listed on each line are those required for that subclade. For example, a person who is in subclade N1b1 must test positive for every SNP on that line (L731 and L733) as well as every SNP above it back to N. Of course, a person in a haplogroup like N must also test positive for every SNP from N back to Y-DNA Adam. Remember N is just one of the branches of the oldest known haplogroup A00 (Y-DNA Adam). (See the ISOGG Y-Haplogroup Tree as previously mentioned.)


N M231/Page91, M232/M2188
• N* -
• N1 CTS11499/L735/M2291
• • N1* -
• • N1a P189.2
• • N1b L732
• • • N1b* -
• • • N1b1 L731, L733
• • N1c L729.1/M2087.1/Z15.1/Z548.1
• • • N1c* -
• • • N1c1 M46/Page70/Tat, L395/M2080, P105
• • • • N1c1* -
• • • • N1c1a M178, P298
• • • • • N1c1a* -
• • • • • N1c1a1 L708/Z1951, F4325/L839v
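
The rules for reading this chart can be sketched in Python. This is a minimal illustration under stated assumptions, not ISOGG's method: the TREE dictionary simply copies the named subclades and SNP lines from the chart above (the asterisk lines, which list no SNPs, are left out), while the function names and the sample tester are hypothetical. The sketch stops at N rather than going all the way back to Y-DNA Adam.

```python
# subclade -> (parent subclade, SNP line as printed in the chart)
TREE = {
    "N": (None, "M231/Page91, M232/M2188"),
    "N1": ("N", "CTS11499/L735/M2291"),
    "N1a": ("N1", "P189.2"),
    "N1b": ("N1", "L732"),
    "N1b1": ("N1b", "L731, L733"),
    "N1c": ("N1", "L729.1/M2087.1/Z15.1/Z548.1"),
    "N1c1": ("N1c", "M46/Page70/Tat, L395/M2080, P105"),
    "N1c1a": ("N1c1", "M178, P298"),
    "N1c1a1": ("N1c1a", "L708/Z1951, F4325/L839v"),
}


def parse_snp_line(line):
    """Commas separate different SNPs; slashes separate names for the same SNP."""
    return [group.strip().split("/") for group in line.split(",")]


def required_snps(subclade):
    """List (subclade, SNP groups) from N down to the requested subclade."""
    chain = []
    node = subclade
    while node is not None:
        parent, line = TREE[node]
        chain.append((node, parse_snp_line(line)))
        node = parent
    chain.reverse()
    return chain


def belongs_to(subclade, positives):
    """True if every required SNP is positive under at least one of its names."""
    return all(
        any(name in positives for name in group)
        for _, groups in required_snps(subclade)
        for group in groups
    )


for name, groups in required_snps("N1b1"):
    print(name, groups)

# Hypothetical tester, reported under a mix of equivalent SNP names.
tester = {"Page91", "M232", "L735", "L732", "L731", "L733"}
print(belongs_to("N1b1", tester))  # True
```

Because names joined by a slash are treated as one SNP, a tester reported as positive for Page91 satisfies the M231/Page91 requirement just as well as one reported under M231.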

After more people do SNP testing on any of these branches, more branches and twigs will appear. These would be named N2, N3, etc. which would line up in the same column as N1 with their own subclades and SNPs. See the contrived haplogroup tree below.

N M231/Page91, M232/M2188
• N* -
• N1 CTS11499/L735/M2291
• • N1* -
• • N1a P189.2 etc.
• N2 (plus newly found SNPs)
• • N2* -
• • N2a (plus newly found SNPs) etc.
• N3 (plus newly found SNPs)
• • N3* -
• • N3a (plus newly found SNPs) etc.

One of the goals of a haplogroup administrator is to narrow the distance between written records and the ancient migration pattern(s) of their group. By doing some selective SNP testing, the administrator can determine what groups were established more recently than others because SNPs accumulate over time. Geneticists have estimated when particular SNPs occurred, and the data from additional SNP testing will help them perfect their timelines and determine more recent haplogroups, thus placing testers into groups that formed closer to genealogical time.
When a haplogroup administrator asks a tester to take a SNP test, that administrator is trying to narrow this gap and determine which participants are more closely related to each other than they are to the whole group. SNP testing helps the entire haplogroup in establishing closely related testers. But how does this benefit the tester who is more interested in his genealogy?

HOW DO SNP TESTS BENEFIT GENEALOGISTS?
Genealogists use DNA tests to verify their lineage and to find others with whom they can research. Taking advantage of all types of DNA testing helps all aspects of our genealogy and ensures the accuracy and understanding of our results. The following examples may illustrate how SNP testing is important to the genealogist.

Confirming a Haplogroup
A few years ago, a DNA testing company reported a wrong haplogroup for an accountant from Florida, stating that the man was a genetic descendant of Genghis Khan. Two major U.S. newspapers reported this finding, and after Family Tree DNA (FTDNA) tested the man, his haplogroup was clarified. The newspapers wrote retractions, and Bennett Greenspan, President of FTDNA, began the company’s SNP assurance program that, in essence, states if the haplogroup cannot be derived from the haplotype, then the SNP testing would be performed free of charge.
With only a few marker results it can be difficult to assess the haplogroup, especially in the more common haplogroups. For this reason, a tester should test at a Y-37 marker level or higher.

Confirming the Paper Trail
An African American member of a surname group was predicted by the testing company to be in Haplogroup I1b. This haplogroup suggests that his paternal line came from Europe, rather than Africa. The participant had traced his ancestry through traditional genealogical research back to a slave who lived in the mid-1800s, and he wondered if the slave might have been the son of someone in the family who owned him. However, a descendant of the owner’s family in the project did not match his STR profile. SNP testing was ordered and the participant was found to be in Haplogroup B, which is found almost exclusively in sub-Saharan Africa. Now the participant knows the real origin of his paternal line.
- Contributed by Whit Athey

Determining Extremely Rare DNA
Several dozen people tested positive for M201, so they were within Haplogroup G, but they were found to be negative for every other SNP within G then being offered commercially. Finally, a few members of this group were tested in a small research study for what was thought to be an extremely rare SNP, M377; this resulted in defining Haplogroup G5, which had only been observed previously in two Pakistani men. Now the European branch of this haplogroup has something that clearly unifies them and adds to their sense of identity. Essentially all in this group are Ashkenazi Jews from Eastern Europe, though some did not previously know their origin.
- Contributed by Whit Athey

Creating Subgroups within a Larger Haplogroup
SNP testing refines ancestral origins and helps to differentiate between members of the same haplogroup. Testing positive for additional SNPs puts a person in a more select group with others in the same haplogroup. This means you can narrow the people with whom you match. Those who do not match you on the SNPs have not been related to you for thousands of years. With each SNP for which you test positive, your DNA signature gets closer to indicating relationships within recorded history.
The Talley Project had three to four people whose haplogroups could not be determined without doing SNP testing. The testing helped determine whether those with no haplogroup predictions were related, even remotely, or not related in any recent timeframe. It also showed whether there would be a new haplogroup for the surname. SNP testing would also indicate if these testers could be a product of convergence; that is, they match on the haplotype, but are not members of the same haplogroup and therefore are not related. The result of testing indicated that the testers were more closely related to each other than to the entire group. They became their own subgroup within the haplogroup.
- Contributed by Emily Aulicino - Administrator for the Talley DNA Project
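
The idea of convergence can be sketched as a simple check: two testers whose STR results agree, but whose SNP results disagree, are flagged as a possible case of convergence. Everything in this Python example is hypothetical, including the function names, the marker values, and the two testers; it also assumes an exact STR match, whereas real comparisons usually allow for small differences.

```python
def same_haplotype(strs_a, strs_b):
    """True if both testers have identical values on every marker they share."""
    shared = strs_a.keys() & strs_b.keys()
    return bool(shared) and all(strs_a[m] == strs_b[m] for m in shared)


def snp_conflict(snps_a, snps_b):
    """True if one tester is positive and the other negative for a shared SNP."""
    shared = snps_a.keys() & snps_b.keys()
    return any(snps_a[s] != snps_b[s] for s in shared)


def possible_convergence(a, b):
    """Matching STR haplotype combined with conflicting SNP results."""
    return same_haplotype(a["strs"], b["strs"]) and snp_conflict(a["snps"], b["snps"])


# Two made-up testers with the same STR values but different SNP results.
tester_a = {
    "strs": {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    "snps": {"M201": True, "M377": True},
}
tester_b = {
    "strs": {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    "snps": {"M201": True, "M377": False},
}

print(possible_convergence(tester_a, tester_b))  # True
```

A result of True here would only be a prompt to look more closely, since the check says nothing about how far back the two lines actually separated.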

Narrowing the Gap
SNP testing narrows the gap between written genealogy and ancient genealogy. I tested my paternal Doolin cousin with the Y-111 test. He matches a couple of Doolins and many other surnames, such as Lawlor, Kelley, Moore, etc. The paper trail ends about 1750 in Virginia. I know the line was Irish or Scots-Irish, but where in the native land, I had no idea. I joined my cousin to a subclade haplogroup according to his terminal SNP at that time.
The haplogroup administrators e-mailed to ask him to take a SNP test when they saw that my Doolin cousin and the six other names had common markers. I did so for the sake of the group and because I know those administrators are trying to use the SNPs to lessen the gap between the genealogical records timeframe and ancient migrations. I followed their suggestions and now know that the surname was probably O’Dowling in the mid-1600s in County Laois, Ireland. We are one of the Seven Septs of Laois that the British tried to disband in the mid-1600s. I now have about a 100-year gap between my paper trail and my ancestral origins, instead of infinity. Recent analysis by the haplogroup administrators estimates that my surname existed about 1300 AD and that the terminal SNP L1402 began about 800 AD. I realize that my line may have lived in other locations before coming to America, but it gives me a place to start researching, and in time, haplogroup administrators will learn more through their SNP testing.
 - Contributed by Emily Aulicino

Determining Unique Novel SNPs
With the advent of the Big Y test at FTDNA (www.familytreedna.com), a male can be tested for 25,000 SNPs. Although not everyone will test positive for all 25,000, the more people who take this test the higher the likelihood that testers in the same haplogroup subgroup will find that they are more closely related than they thought. A great benefit from this test is that novel (newly found) SNPs will allow the creation of more subclades within a haplogroup, thus bringing the common ancestor nearer to genealogical time. Private SNPs can be discovered as well. These SNPs may or may not remain private; that is, belonging to a family for the past few generations. Over time, some of these private SNPs may be found more extensively and thus help narrow the subclades as well. The Big Y test is not a test to use for finding matches within a genealogical time frame, but is for more ancient ancestry, which makes it of more interest to the haplogroup administrators. However, the test could be of interest for those who wish to contribute to the overall knowledge of genetic testing. Besides the Big Y, FTDNA offers individual SNP testing along with various haplogroup SNP panels which are being created in collaboration between haplogroup administrators and FTDNA. See http://www.isogg.org/wiki/Y-DNA_SNP_testing_chart

SNP TESTING RESOURCES
Astrid Krahn who, along with her husband Thomas Krahn, owns YSEQ (http://www.yseq.net/) states that their company “offers every public or private SNP on the male specific region of the Y chromosome as long as it can be technically tested with the Sanger sequencing method” and that “there is no practical limit to the number of SNPs that YSEQ offers since every SNP can be wished for. The number on the menu (top left) on our website only reflects the SNPs that have been practically ordered and that we have confirmed with actual sequencing results.” As of printing time, their website lists over 11,000 SNPs and 59 Custom SNPs. Tests can be ordered separately or in panels.
Other companies conducting SNP testing include Genographic Geno 2.0, although it is not used as much as it used to be (https://genographic.nationalgeographic.com/), and YFull, which is helpful to people with ancestry in Eastern Europe or Asia (http://www.yfull.com/). Also, both Full Genomes (https://www.fullgenomes.com/) and BritainsDNA Chromo 2.0 (https://www.britainsdna.com/) are used by those very interested in SNP testing.
ISOGG has a comparison chart for some of these companies at http://www.isogg.org/wiki/Y-DNA_SNP_testing_chart. The ISOGG Y-DNA Haplogroup Tree is so powerful that not only the genetic genealogists use it, but various genetic labs around the world also visit.

SUMMARY
SNP testing can be beneficial to the genetic genealogy community as a whole as well as to individual testers, depending upon their desire to determine who is more specifically related on the Y-chromosome and to narrow the gap between genealogical time and ancient migrations. The exact number of SNPs for the Y-chromosome is not yet known, but as of February 2015 Alice Fairhurst (team leader for the ISOGG Y-DNA Haplogroup Tree) reported that there are 15,888 uniquely named SNPs whose locations on the tree are identified. ISOGG YBrowse has more than 120,000 SNP names, but as of this writing, the site is not operational.
Both Thomas Krahn’s company YSEQ and the ISOGG tree show the equivalent names of SNPs that were discovered by multiple labs and so given multiple names.
When you know a little about a subject, it is easy to make judgements based on that knowledge. However, as knowledge increases, beliefs change. In the early years, geneticists discovered SNPs that helped them place testers into haplogroups. More SNPs were discovered and those haplogroups were refined, creating many subclades. Some testers’ haplogroups were changed completely. Now that thousands of SNPs have been discovered, geneticists are seeing some unique situations surrounding these special markers. Some scientists question the quality of some SNPs, believing that they are not viable enough to use for haplogroups, while others disagree with how some SNPs are placed on the haplogroup tree. All this will take time to sort out as we gain more knowledge in understanding these markers. And, just as scientists now believe, based on new discoveries, that Haplogroup R is more recent than previously thought, we may find major changes in the structure of the phylogenetic tree as more information surfaces.
No doubt, the SNP testing currently available is only a small step toward what the future holds for genealogy testing as this is just scratching the surface of the estimated 12.8 million SNPs in the human genome according to the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/books/NBK44423/). The decision to SNP or not to SNP should be left to the individual tester with guidance from the haplogroup administrators.

Permission has been given by the International Society of Genetic Genealogy (ISOGG) to use any references to their website, including the Success Story examples.



Originally written for the Genealogical Forum of Oregon’s Bulletin, Volume 64, No. 4, June 2015, p. 16-20.
