DNA ethnicity estimation has improved significantly since its earliest days. In my 2016 article “DNA Testing, What Do My Origin Percentages Mean?”, I noted that ethnicity estimates are generally accurate to the continent level and should improve with time. Now two of the major testing companies are attempting to pinpoint origins in regions within countries. 23andMe did this first. Ancestry has followed and has defined hundreds of regions within Europe alone. Many more people have been added to reference populations, which is great. However, we shouldn’t lose sight of the fact that in some areas, the distinction between countries is still quite rough.
France has been a difficult country to define genetically. One challenge is that consumer DNA testing is illegal in France. See a discussion of the topic in the news article “In France, it’s illegal for consumers to order a DNA spit kit. Activists are fighting over lifting the ban.” Some residents of France have taken the test despite the law, but not many. This limits the number of Ancestry DNA testers with full French ancestry.
Ancestry DNA uses data from its users who have consented to research as part of its ethnicity calculations (see Ancestry Ethnicity Estimate 2019 White Paper, p. 5). Ideal candidates have a long history in a region. Because so few residents of France have tested, Ancestry doesn’t have a strong pool of French sample candidates. Consider the genetic communities that Ancestry has been able to create based on the trees of testers with significant history in various regions. Ireland and Scotland together have 116 regions, and Norway has 99. France has none.
Ancestry tested its ethnicity estimator on samples that were each 100% representative of one region. Many regions could be predicted with nearly 100% accuracy. These include Finland, Japan, and Polynesia. France was one of the four worst regions, with only 53% accuracy. The lowest score was Indigenous Cuba at 30%, followed by Mongolia at 51%, France at 53%, and Spain at 55%. The bright side is that the ethnicity is assigned to nearby regions instead (see Ancestry white paper, p. 22). Future improvements should focus on these weaker areas.
In an era when ethnicity estimates predict origins in small areas within countries, be aware that some areas still have much less accurate predictions, even when they sit next to areas with scores of sub-regions. When working with origins in places like Indigenous Cuba, Mongolia, France, and Spain, consider that ethnicity results may be inaccurate and may show up as neighboring communities instead.