Possible Duplicates Algorithm
Jordi Kloosterboer said: In the image below, you can see that there are duplicate parents. The profiles in one set contain only first names, and those names match the other set of parents. I think the possible duplicates algorithm should cover this case: where a child's other set of parents has the same first names as the first set of parents.
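To make the suggested rule concrete, here is a minimal sketch of how such a check might look. It is not how FamilySearch's algorithm works (nobody in this thread knows that); the `Person` record, the `pid` values, and both helper functions are hypothetical and exist only for illustration. The rule it encodes: when a child has two sets of parents, flag the pairs whose first names match and whose surnames do not conflict (a missing surname counts as "no conflict").

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Person:
    pid: str                       # hypothetical profile ID, not a real FamilySearch ID
    given: str                     # first name
    surname: Optional[str] = None  # None when the profile has only a first name


def _norm(text: Optional[str]) -> str:
    """Lower-case and trim a name part so comparisons ignore case and spacing."""
    return (text or "").strip().lower()


def names_compatible(a: Person, b: Person) -> bool:
    """First names must match; surnames must match only if both are present."""
    if _norm(a.given) != _norm(b.given):
        return False
    if not _norm(a.surname) or not _norm(b.surname):
        return True
    return _norm(a.surname) == _norm(b.surname)


def possible_duplicate_parents(
    parent_sets: List[Tuple[Person, Person]],
) -> List[Tuple[str, str]]:
    """Given the (father, mother) sets attached to one child, return the
    ID pairs that this rule would offer as possible duplicates."""
    hits = []
    for (f1, m1), (f2, m2) in combinations(parent_sets, 2):
        # Require BOTH parents to line up, so one shared first name is not enough.
        if names_compatible(f1, f2) and names_compatible(m1, m2):
            hits.append((f1.pid, f2.pid))
            hits.append((m1.pid, m2.pid))
    return hits


# The case from this thread, with made-up IDs.
full = (Person("F-FULL", "Thomas", "Thornborough"), Person("M-FULL", "Jane", "Walker"))
bare = (Person("F-BARE", "Thomas"), Person("M-BARE", "Jane"))
print(possible_duplicate_parents([full, bare]))
```

Requiring both parents to match is a deliberate choice here, since it keeps the rule from firing on a single common first name. It would only ever produce a suggestion; a person would still need to review it before merging.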
Comments
-
Tom Huber said: Fortunately we have the merge by ID option. If the merge algorithm is too loose then we will get too many merges that shouldn’t take place. Inexperienced users often merge based upon the possible duplicates and do not take the time to really look at the two records to make sure they are for the same person.
-
Jordi Kloosterboer said: True about the merge by ID. I don't think this would make the algorithm too loose.
-
Paul said: Jordi
Since you have now merged Thomas with Thomas Thornborough I am finding it difficult to follow how this situation arose in the first place.
I thought "Thomas" might have been carried across from Eliza's christening source, where he had appeared without a surname (only Eliza being indexed as Thornborough), but I see this was not the case. So I wonder how he got on the system without a surname, unless another user created him (the merged ID) as such (just "Thomas").
In general, the algorithm does seem to look for matching surnames, but I am sure I have found inconsistencies when it comes to the mother. After merging the two IDs relating to the father of the child, I am sure that (using this case as an example) the ID for "Jane" would not be offered as a possible duplicate for Jane Walker. However, I'm sure there have been exceptions to the rule; I just can't work out why they happen. Sometimes the "possible duplicate spouse" message has appeared on the husband's/father's person page, other times not.
I know this does not directly address your question, but it does show the algorithm has been programmed in too complex a manner for me to understand! Hopefully, an experienced user or FS employee can explain how the algorithm works in this and similar cases.
-
Jordi Kloosterboer said: It's from the old system. (Here's the old change log: https://www.familysearch.org/tree/per...) I don't need to know how the algorithm works. I'm just suggesting that a clause be added to the algorithm to look for this sort of information.
-
Paul said: However, I was about to add something, from my "inconsistency" claim, regarding how the algorithm "works".
I believe the occasions when I have received the "possible duplicate spouse" messages have been where I am in the process of merging, say, ten IDs for a John Smith - one created against the christenings of each of his ten children, under an earlier (nFS?) system. On these occasions, it does seem to "recognise" the wife/mother "Mary" is probably the same individual, in all cases.
I just don't think it's as simple as adding something to the existing algorithm so that (in your example) Thomas (with surname) is "recognised" as probably being the Thomas without a surname.
For example, in another thread I made what I thought was a straightforward request: that "John Wright" not be offered as a possible duplicate for a John Wrightson. (I have found it very rare for Wrightson to be indexed as Wright.) However, as Tom suggests, it must be very difficult for the programmers to strike the right balance between too loose and too tight an algorithm, both in my example and in yours.
Until they are able to improve how these algorithms work, we'll just have to carry on using the "Not a match" and "Merge by ID" functions, I guess, however annoying or time-consuming we might find this to be. Hopefully, the programmers ARE continually considering improvements, from suggestions like these and otherwise.
-
Cindy Hecker said: I agree with Tom; we don't want a computer making the suggestion. The second set of parents has no dates. I think the computer should have a few pieces of information, such as name, date and location, before suggesting a duplicate. You would only see this obvious situation when looking at the child's page. When looking from either parent's page you would not get this big picture, and could not be sure it was a duplicate if just the names match.
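The "few pieces of information" idea could be expressed as a simple gate that runs before any duplicate is suggested. This is only a sketch under stated assumptions: the field names `name`, `date` and `place`, the dictionary layout, and the `enough_to_suggest` function are all hypothetical, and the sample values are placeholders rather than data from the profiles discussed here.

```python
from typing import Mapping, Optional

# Hypothetical field names; a real profile will be structured differently.
REQUIRED_FIELDS = ("name", "date", "place")


def enough_to_suggest(profile: Mapping[str, Optional[str]], minimum: int = 3) -> bool:
    """Return True only if at least `minimum` of the required fields are filled in."""
    filled = sum(1 for field in REQUIRED_FIELDS if (profile.get(field) or "").strip())
    return filled >= minimum


# Placeholder values, for illustration only.
detailed = {"name": "Jane Walker", "date": "1800", "place": "Somewhere"}
name_only = {"name": "Jane", "date": None, "place": ""}

print(enough_to_suggest(detailed))   # all three fields present
print(enough_to_suggest(name_only))  # name only, so no suggestion
```

Under a gate like this, the name-only parents in the original post would never be suggested, which is exactly the trade-off being debated in this thread.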
-
Jordi Kloosterboer said: Yeah, I'm sure it is a complicated algorithm, and I don't know how it looks, so I can't really say how easy it will be to add this.
This discussion has been closed.