Remove "Possible duplicates" feature for Scandinavian records--most suggestions are records that are
LegacyUser
Kathleen Hedberg said: There is a real problem with the “possible duplicates” feature in Family Tree as it applies to Scandinavian records. The problem appears to be that the computer uses names, dates, and family relationships to determine possible matches, but does not consider place. That works fine for the United States, where there is a great variety of names and where people moved a lot, but it is a real problem for Denmark, Norway, and Sweden, where the variety of names is small and where people in the early years did not move much. It is quite common for several Per Perssons with birth dates in the same year to have parents, or spouses, or children who also have similar names. The computer then suggests that these are possible duplicates, even though they were living in totally different parts of Sweden. People who are unfamiliar with Swedish genealogy merge these names without realizing that they refer to two different individuals.
Could the computer programmers change the algorithms for recommending possible duplicates for ancestors in Scandinavian countries? Or if they do not want to make separate algorithms for different countries, could they remove the Scandinavian countries from the possible duplicates program? It has been my experience that about 80% or more of all Scandinavian names suggested as possible duplicates are not duplicates. And, on the other hand, because of the variation in the spelling of names–Per, Pehr, Päder, Petter, Pähr, Peter, etc.–the computer finds the real duplicates only about 30% of the time. It would be so much easier for those of us who do Scandinavian research to use the “find” feature to search for duplicates rather than have the computer make so many wrong suggestions.
I have done a lot of Swedish research for many years, and am attempting to add sources to the ancestral lines that I submitted earlier. As I do this, I find that I am spending most of my time correcting incorrect merges. Some of these incorrect combinations occurred in Ancestral File or new.familysearch.org, but others have occurred recently. I am grateful that I can now make corrections, but the process is time-consuming. I feel like I am playing “Whack a Mole” as these incorrect merges keep occurring, in spite of my putting “watch” on those I straighten out, and marking “not a match” on incorrect “possible duplicates”. I cannot keep up with the computer, which keeps adding new “possible duplicate” suggestions to my Swedish ancestors, thus encouraging well-meaning individuals to make incorrect merges.
I would appreciate anything anyone could do to correct this problem.
Comments
Gordon Collett said: A couple of months ago there was a case discussed on this board where two different families were getting combined incorrectly due to similar confusion about names. When I took a look at the records and tried to help out, I found as you did that there was a huge list of incorrect possible duplicates.
However, I also found that the individuals' detail pages were not very complete. After I went into the parish records and the Husförhörslängder and added complete birth and christening information and made sure the green standard date and place boxes were set properly, the list of possible duplicates dropped dramatically, usually to zero.
Do you have an example of an individual with too many incorrect duplicate suggestions? It would be interesting to see why there are so many. It would also help the software programmers refine the routine and try to solve the problem you are having.
I do agree completely that the Find routine is often far more useful in finding duplicates for the Norwegian and Swedish people I have mainly been working with, but Possible Duplicates has shown me some that I would not have found using Find.
As far as your other concern, that spelling variants of names cause problems for the search routines, I have to say that I have always been impressed by how well Family Search handles that problem. To take your example of Per, here is the list of names that Per would match with in the Find, Search, and Possible Duplicates routines:
As you can see, all your examples are on the list (umlauts are ignored when matching names) but if any were missing, you could request additions.
I can certainly sympathize with your frustration with the ongoing need to fix other people's "corrections" and mis-merges at this point in Family Tree. Be sure to add as many sources as you can to every one of your people, enter vital information as completely as possible, add everyone to your watch list, use the "Not A Match" feature, and keep teaching everyone you meet how to use Family Tree properly.
Kathleen Hedberg said: Gordon,
Thank you for your suggestions. The number of matches for “Per” is impressive! My major concern, however, is with incorrect merges that result from “possible duplicates”. The problem is not so much that one individual has many incorrect possible duplicates, as that so many individuals have one or two incorrect ones. Here is an example. Four children in the same family have possible duplicates suggested:
Eric Olsson L41J-CDL The possible duplicate has Sweden listed as a birth place, no birth date, no parents; but the spouses’ names are the same. There is no information about the spouse of the possible duplicate, but information about a child born to the duplicate couple shows that the child was born in a different parish and län far from Karlskoga.
Maria Olsdotter MM2W-8JD has two possible duplicates. One shows an approximate birth year close to her birth date, but a birth place in Kroppa, Värmland, which, while in a different län, is not too far from Karlskoga. A spouse’s name is given, but no parents’ names. The other possible duplicate shows the birth place as Karlskoga, but with a birth date ten years later and different names of parents.
Elisabeth Olsdotter MM2Y-TZQ The possible duplicate has a different name Anna Lisa (though Lisa and Elisabeth are interchangeable names), born in a different parish and län far from Karlskoga, and the father’s last name is different.
Brita Olsdotter MM28-232 The possible duplicate has the same birth year, but the birth place is in a parish and län far from Karlskoga, and the parents’ names are different.
So in this family there are five possible duplicates listed. The first duplicate for Maria has a remote chance of being correct, as she could have moved to Kroppa and married there, but she should not be merged without thorough research in the original records. The other four suggested possible duplicates have no chance of being correct. Yet it has been my experience that people merge possible duplicates like these. That is why I think that, unless new algorithms can be created, it would be better to have the computer stop suggesting possible merges for Scandinavian records.
Gordon Collett said: I know. That is why I am frantically working through my wife's Norwegian family and getting the possible duplicates as empty as possible by merging when appropriate and marking "Not A Match" when needed.
By far the biggest problem I'm running into, however, is all the bad merges from New Family Search. That, at least, is over with.
I am also putting every direct ancestor, and their children on my watch list so I can reverse merges as soon as possible that should not have been done.
Also, take advantage of the tools FamilySearch provides. On that duplicate for Eric Olsson, click "Not A Match" and write in the reason statement exactly what you wrote here: "The possible duplicate has Sweden listed as a birth place, no birth date, no parents; but the spouses’ names are the same. There is no information about the spouse of the possible duplicate, but information about a child born to the duplicate couple shows that the child was born in a different parish and län far from Karlskoga."
That should prevent anyone from merging. This type of incorrect Possible Duplicate is not unique to Scandinavia; with the information given, it would match up in any country. Looking at your other examples as well, I doubt these kinds of problems are unique to Scandinavia. Ideally, the proper people will see these examples of yours and study them to find out what the problem is with the match routine. I'm sure fine-tuning it is an ongoing process.
I do think the warning statement at the top of the possible duplicates page could be a lot stronger. It now says, "Merging is a complex process in which you decide if two people are the same person. If they are, you choose which information should be kept. Please take the time necessary to carefully review each possible duplicate."
Maybe it should say, in a large font size, "WARNING! The possible duplicates here should NOT be merged until you have done sufficient research to prove beyond any reasonable doubt (with apologies to James Tanner, who just wrote several blog posts about the inappropriate use of legal terms in genealogy) that these are the same people. Please examine all family relationships as part of this process."
Good luck in your continued work to straighten out and guard your ancestors' pages.
Kathleen Hedberg said: Thanks again for your reply. I really like your idea of a better warning.
Kathleen Hedberg said: Here is another example of problems caused by the lack of place information in the duplicate-matching algorithms for Scandinavian records. This is an ancestor of someone I was helping at our Family History Center. He is Nils Nilsson L7QJ-LY7, born 29 January 1733 in Västra Ämtervik, Värmland, Sweden; his wife was Marit Persdotter. Birth records for this parish were indexed under the old extraction program, so we had to merge the parents of each of Nils Nilsson’s nine children. Because these were extracted records, only the names of the parents were listed with each child, with nothing about the parents’ birth dates or places. Because we had good original records for all the children, it was not difficult to identify and merge the correct possible duplicates for the parents of the children. But many other Nils Nilssons were also listed as possible duplicates. Those possible duplicates also had no identifying information except the names of spouses. All of the spouses’ first names were Marit, and several were Marit Persdotter, the same name as the spouse of the Nils Nilsson in Västra Ämtervik. A child was listed for each of them. The only way to find out that those other Nils Nilssons were not duplicates was to click on the name of the child and see that the child was not born in Västra Ämtervik.
We marked “not a match” for all of the 16 incorrect possible duplicates for Nils Nilsson, including 5 for a Nils Nilsson whose wife was named Marit Persdotter but whose children were born in a different parish. But I am concerned that many people will not realize that they need to click on the name of the child to find the birth place, as it would seem logical to merge two spouses with the exact same names and no other identifying information.
This is not an uncommon problem in Scandinavian records. Names such as Nils, Olof, Eric, Maria, Anna, Catharina, etc. are so common that often two couples with exactly the same names can be found. This is why it would be so helpful if algorithms could be created for Scandinavia that take place names into account, or if the computer would stop suggesting possible duplicates for people living in Scandinavia.
Dale Hein said: It's been a long time since I posted on this topic. I just watch constantly for Possible Duplicates in my lines so I can catch them before someone else merges an incorrect one and adds a whole new [incorrect] family to my ancestors. But I had to tell everybody who has been complaining about the bad Possible Duplicates for a very long time what FamilySearch is doing to help with the problem! I just found out about this. They have set up a program, one of many in Get Involved, in which they are asking for volunteers to do "Label Matching." You'd have to go there yourself for details, but basically we would be helping train the system to be more accurate with the Possible Duplicates put on ancestors' pages. This is GREAT! So we should all "Get Involved" now that FamilySearch is really trying to correct this huge problem. They are also trying to make Record Hints better; Record Hints are part of the "Label Matching" effort too. So here is the link. https://www.familysearch.org/blog/en/... They're finally trying to solve this problem, so let's help them!
This discussion has been closed.