Use AI to resolve duplication

David Nishimoto · April 30, 2023

Use AI to find the best solutions for resolving duplicate names. If the probability of duplication is 90%, then let the computer resolve the duplication.

Gordon Collett · April 30, 2023

Maybe to find possible duplicates, if the current routine needs to be improved, but never to resolve them. Users who know their families really need the final say on a merge.

Now what could be helpful is a routine that prevents merges: "I'm sorry, I can't let you merge these two people, Person A was created to be John Smith born 1820 in New York and Person B was created to be Henry James born 1920 in California."

I think the current possible duplicate routine only presents a match if the probability is at least 98%.
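To make the two ideas in this post concrete, here is a rough Python sketch: a threshold that only decides whether a pair is shown to a person for review, and a guard that refuses an obviously implausible merge. Everything in it is hypothetical. The names `Person`, `should_suggest`, and `merge_blocker`, the 10-year tolerance, and the use of 0.98 as the cutoff are assumptions for illustration, not how the actual possible duplicate routine works.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values for illustration only; not the real system's settings.
REVIEW_THRESHOLD = 0.98   # the 98% figure guessed at in the post
MAX_BIRTH_YEAR_GAP = 10   # assumed tolerance for estimated birth years


@dataclass
class Person:
    name: str
    birth_year: Optional[int]
    birth_place: Optional[str]


def should_suggest(score: float) -> bool:
    """Decide whether to show a pair to a human as a possible duplicate.

    This never merges anything; the user keeps the final say.
    """
    return score >= REVIEW_THRESHOLD


def merge_blocker(a: Person, b: Person) -> Optional[str]:
    """Return a reason to refuse a merge, or None if nothing obviously conflicts."""
    if a.birth_year is not None and b.birth_year is not None:
        gap = abs(a.birth_year - b.birth_year)
        if gap > MAX_BIRTH_YEAR_GAP:
            return f"birth years differ by {gap} years ({a.birth_year} vs {b.birth_year})"
    return None


john = Person("John Smith", 1820, "New York")
henry = Person("Henry James", 1920, "California")
print(merge_blocker(john, henry))
```

The guard in this sketch looks only at birth years and leaves names out of the check, because an indexing error can put the wrong name on the right person.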

Julia Szent-Györgyi · April 30, 2023

I second Gordon's opinion that the computer should never, ever be allowed to resolve duplicates without human input. Ever.

I'm currently in the middle of a multi-day legacy-data-cleanup spree, so I know exactly how tedious duplicate-resolution gets, but I still don't ever want the computer to make those decisions for me. The reason for this is that the specifics always matter, but the computer can only operate on patterns.

For example, a baptism was misindexed with the mother as Clara instead of the correct Estera. (There's a wrinkle in the register page making the name hard to read, and the indexer made a wrong guess.) In a preceding system, that incorrect index was taken at full face value and used for the creation of three profiles: father, mother, daughter. Those profiles were then imported into Family Tree, when it was "seeded", over a decade ago now. Also included in that initial import were the similar family triptychs created based on the indexed baptisms of Esther's other daughters. Those register pages were easier to read, so those profiles have Esther's name correct.

Cleaning up this legacy data means merging all of the Esthers, including the one that says Clara. Does this mean that I think Clara and Esther are the same name? Should the computer henceforth treat them as equivalent? No, absolutely not, not by a long shot. They're totally different names, normally; it's just an indexing error, and a weird one, at that, as Est- doesn't resemble Cl- in any handwriting style I know of. At the other extreme, do I want the computer excluding "Clara" from the merge, based on the different name? Again, absolutely not.

Genealogy is full of such cases where it's all in the little details. For example, in Hungarian, Örse is a form of Elizabeth, while Orsi is a nickname for Ursula. To an American diacritic-blind eye, I'm sure they look like they ought to be the same thing. Meanwhile, Stephanus and István and Pista actually are all the same name. I don't think the computer can ever be programmed with all of this kind of "local knowledge", and I do not want it behaving as if it has been.

Adrian Bruce1 · April 30, 2023

Regrettably, the standard that AI is reaching right now is dreadfully poor for anyone who cares about precise things. Well-written stuff should have references - I've seen analyses where someone has asked the tool for text with references and the AI has simply made up some of the references.

Active · Last Updated April 30, 2023
