Number of duplicates in the world wide tree
edited September 28, 2020 in Suggest an Idea
terry blair said: In my little corner of the world I spend much of my time merging duplicate records (principally from the Northern US). Unfortunately, some of the duplicates are self-inflicted, but most seem to be left over from earlier days, although some, besides my own, are modern. Based simply on my experience, I would venture to say that perhaps one quarter to one third of the records in the US portion of the tree are duplicates. Does the number of duplicates vary by region of the world? What percentage of the records being added on a monthly or yearly basis are duplicates? I suppose it doesn't really matter, as all the duplicates have to be found and eliminated, but are there any better ideas or actual numbers?
joe martel said: Terry, duplication rates have been measured since the early days of the organization. (Check out the book "Hearts Turned to the Fathers" by James Allen et al.)
Today duplication rates for added persons of the major products (FS and third party) are tracked. Duplication rates for the various products ebb and flow over time, and they are not published.
Your question about certain regions is a good one. Analysis suggests that the duplicates being added are related to the particular user doing the add. It appears some users choose to ignore suggested matches. So users in certain regions may overlap into your part of the tree, and the duplicates they inject cause more work on your related ancestors.
We all wish duplication rates were lower, and unfortunately some users are more exposed to the damage done by less careful users. There are a number of ideas to stem this, but they are involved.
Paul said: From a personal viewpoint, I find the number of duplicates far less worrying than the high volume of careless merges that make genuinely separate individuals disappear.
Yes, merging duplicate IDs does take up a lot of my time, but so does restoring individuals who have little detail in common with the person with whom they have been merged.
Robert Wren said: The original post seems to be asking: What is the "Number of duplicates in the world wide tree"? and NOT how many are being added.
IMO, both are good questions (but the type that FS generally does not directly respond to).
Some stats on the number of completed merges (or "not a match") might be interesting, or inclusion of individual merge counts in MyContributions on the FS App might also be helpful.
Jeff Wiseman said: Terry,
FWIW, my view on this is that there are two categories of duplicates in the existing FamilySearch FamilyTree.
The first group are from IGI records. Several years back when the FamilyTree database was first being formed, all of the existing records that the church had were migrated into the new tree database. I believe that a very large number of those duplicate profiles came from the IGI database.
The structure of the IGI database was totally different from the FamilyTree database that was being created. It did not have the ability to directly represent all the person to person relationships that a single person had. So when a family of records was brought over from the IGI, a single person (such as a father) had their profile created multiple times, one for each relationship record they had in the IGI (i.e., one from their relationship to their own father, one for their mother, one for their spouse, and one for EACH of their children). So you can see that if the person had 10 children, they would likely wind up with at least 13 new profiles for themselves in the FamilyTree after the migration. This was just a natural thing since the new structure in the FamilyTree was just reflecting the original structure that was in the IGI.
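To put rough numbers on that, here is a minimal sketch of the arithmetic in Python. The function name and the one-profile-per-relationship-record rule are assumptions made purely for illustration, based only on the description above; this is not how the actual migration was implemented.

```python
def igi_profile_count(num_children: int,
                      has_father: bool = True,
                      has_mother: bool = True,
                      has_spouse: bool = True) -> int:
    """Hypothetical count of profiles created for ONE person, assuming one
    profile per relationship record (father, mother, spouse, each child)."""
    return int(has_father) + int(has_mother) + int(has_spouse) + num_children


# A person with both parents, one spouse, and 10 children:
print(igi_profile_count(10))  # 1 + 1 + 1 + 10 = 13
```

Under this assumption the count is a minimum: any additional relationship record (a second spouse, for example) would add one more profile.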
These types of duplicates are no longer being created because the IGI has already been completely migrated into the FamilyTree. For these types of duplicates, the "Duplication Rate" today is equal to 0. However, as you have observed, there are many of those still in the tree that need to be merged away. This task is exacerbated by the fact that most of those profiles did not have much information included with them, so it is hard to tell where actual duplicates really exist due to the lack of details. As a result, these sparsely documented profiles can get mistakenly merged into other profiles where they don't belong.
However, the second group of duplicates is a totally different story. These are duplicates being created today (either intentionally or in ignorance) by people working in the FS FamilyTree. As Joe has pointed out, these rates are tracked but not published. This is a complicated issue also, as it comes both from people not really familiar with the system and from people intentionally trying to bypass some of the "limitations" of using a shared and collaborative tree. Also, with the new syncing mechanisms with third party sites and the ability to import GEDCOM file data into the FSFT, the ability to generate large numbers of changes in a short period of time can be, and has been, abused (again either by ignorance or by intention).
So IMHO, tracking the numbers is important to figure out how to reduce the rate of duplication and (as Paul wrote above) incorrect merging of duplicates--especially since correcting the damage after the fact takes orders of magnitude more time than it took to create the problem.
Adrian Bruce said: FWIW, on a personal level, I have never considered the "duplicates" from the IGI (as per Jeff's reply above) as being duplicates in any real sense. Instead, I prefer to think of them as multiple partial profiles. The system identifies them as duplicates, of course, and they are dealt with in the same way as the second class of duplicates, so you may think I'm playing with words, but given that they are a population with no growth, created for perfectly good reasons, I prefer to put them to one side in the consideration of duplicates and not to worry about them. (Yes, I fix them when I can, but I don't stress over them. Usually...).
It's the second class of duplicates that's the problem.
What I don't know is if the FS duplicate stats measure both classes of dupes. I can't see how to analyse them separately - particularly since FSFT profiles probably exist that are a mix, after merging, of both IGI created and directly created dupes. But I don't know for certain.
Incidentally, I really don't remember if I've seen confirmation of this - but I assume (bad idea?) that the duplicate count is actually a count of dupes that have been merged and are thus no longer a problem, individually??? Rather than ... err... not sure what. (Even if the precise profiles are no longer a problem, I'm sure that projecting the stats forward is eminently sensible).
S. said: There are many issues I find that I wish they would clean up. Duplicates is one, and another is careless merges, and there are others. All I can say is I hope they get them all worked out fairly for everyone.