Do you have any volunteer projects going to merge duplicate parent records created by extraction projects?
LegacyUser
TManning said: Extraction of civil birth and baptism records obviously created a lot of duplicate records for the parents, one for each sibling. Are there any projects to merge these duplicates or are you just counting on the users to do so when they reach that point in the tree? I know many users seem to be clueless about this.
And merging would then create a larger family group that the hinting engine could match to census records.
I have been working on this for Ireland for several years now but cannot hope to live long enough to finish.
Comments
gasmodels said: There is no project because that would assume it is easy to tell whether parents are duplicates. In the extracted records there is frequently only a father's name and a mother's given name. There are so many parents with the same names in, say, England that only someone who takes the time to evaluate the children carefully would be able to merge them properly. I have found relatives with 40+ children because the user simply assumed that every John Platt married to a Mary was the same family. If we allowed volunteers to do this, I believe it would be a complete disaster.
Sometimes even I have a hard time, because there are duplicate parents in the same parish in the same timeframe, and it takes a lot of detective work to figure out what is correct; I expect I have made mistakes. I do not believe the issue is so major that we need volunteers working on it.
TManning said: I see your point with some records, such as those in England. The Irish records, however, contain both the mother's first name and her maiden name. The actual images also give the father's occupation and the townland in rural areas or the street address in cities. So long as you check, you should get it right.
My experience, after working on them non-stop for the last four years, is that it remains a huge problem. I have merged birth records for Chicago, New York City, and Philadelphia in addition to Irish civil birth records. I estimate there are still over half a million records in Ireland alone that need to be merged. I personally have completed over 20,000 of them and have not come close to finishing my own family names. I know the LDS Church extracted tens of millions of names from microfilms over several decades, so I believe this remains a big cleanup problem that needs to be addressed. If it were, the hints would work better: once a family was put together, the census records would hint better. For Ireland that is important because so many families moved overseas, and to locate them in a census you need them connected to their siblings.
Paul said: gasmodels
I agree with all you say, except your implication that this is not a major issue. In my experience, GEDCOM duplicates have taken up little of the time I spend merging IDs. In contrast, those created by the extraction projects (the classic example being ten John Wilson IDs, one created for each of his ten children) have been a major, unwanted drain on the time I have spent working on Family Tree.
If only there had been some way of avoiding the creation of this problem, because - as you say - a volunteer project (or computer-based one) could create even more problems, by incorrectly merging individuals with very similar identities.
I can't see the overall problem being addressed in any of our lifetimes!
TManning said: I was hoping that during this coronavirus lockdown of the Family History Centers, the volunteer missionaries might be able to help with this. Most have experience and training that would help them be more accurate than the average new user. Many of the geographical areas covered, such as Chicago or Philadelphia, also have matching census records to compare against. They could also attach source records, leaving small nuclear families for others to attach to their own families. As I mentioned, I have done this for over 20,000 individuals in the last few years.
Juli said: This is the one reason I'm glad so many of my relatives were unindexed Lutherans in Hungary. They're very hard to find, but when I do find them, I know they don't already have half a dozen profiles each to merge. When I'm working on Catholics or Calvinists, or northern Lutherans (in Slovakia), I spend as much time merging as I do searching on my unindexed lines. Each activity is tedious in its own way, but at least searching comes with the possibility of the thrill of discovery. Merging, on the other hand, is never-ending drudgery, without even the possibility of a sense of completion, because there are always the parents and siblings and their families, and where do you draw the line?
Another annoyance with these extraction-based legacy duplicates is that I'm constantly dealing with ridiculous misindexing, and the accompanying worry that my record searches and the duplicates algorithms are missing someone because it was indexed so badly. The other problem is that even if I manage to find and correctly sort all of the legacy profiles, I haven't generally completed the families, because the boys very seldom have profiles. It's almost always only the girls (or the boys who were misindexed as girls). This adds yet another tedious step, of working around Source Linker's shortcomings to add profiles for the boys and citations for everyone.
Unfortunately, I don't think the problem lends itself to a "volunteer project"-type solution. Part of the reason is the scope issue I mentioned: where do you stop? The other problem is the need for close familiarity with the names and places and so on, so that you can tell whether it's just a spelling variation (or transcription error) or actually a different family, and so you can tell that no, this is not the same woman having children over a span of 45 years, but two different couples with the same names (in the same small village). There are all sorts of such pitfalls in these profiles and the records they're based on, and I'd really rather not have to sort through a stranger's conclusions about them. As tedious as merging is, undoing merges is even worse.
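Juli's example of one woman apparently bearing children over 45 years points to at least one red flag that could be checked mechanically before a merge is accepted. The sketch below is a hypothetical illustration, not a FamilySearch feature: the `Child` type, the `implausible_span` function, and the 25-year threshold are all assumptions chosen for the example. It only flags suspicious merges; deciding whether they are really two couples still takes the local knowledge described above.

```python
from dataclasses import dataclass

# Hypothetical threshold: the longest plausible gap between a mother's first
# and last child. Chosen for illustration only; adjust for the records at hand.
MAX_CHILDBEARING_SPAN_YEARS = 25


@dataclass
class Child:
    given_name: str
    birth_year: int


def implausible_span(children: list[Child],
                     max_span: int = MAX_CHILDBEARING_SPAN_YEARS) -> bool:
    """Return True if the children's birth years span more than max_span years.

    A True result suggests the merged "couple" may really be two or more
    couples with the same names, so the merge should be re-examined.
    """
    if len(children) < 2:
        return False
    years = [c.birth_year for c in children]
    return max(years) - min(years) > max_span


if __name__ == "__main__":
    # Made-up sample data: a 45-year span, as in Juli's example.
    merged = [Child("John", 1820), Child("Mary", 1824), Child("Patrick", 1865)]
    print(implausible_span(merged))  # True
```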
Tom Huber said: The last time volunteers were used to merge records was back with nFS (new FamilySearch) and the tree in that system.
The biggest problem was that the volunteers (from BYU) had no vested interest in the people they were merging. As a result, the effort was deemed a failure and, according to what was reported, was discontinued before a great deal of damage was done.
In her last paragraph above, Juli sums up the problem. A lack of familiarity with the period, the families and their naming patterns, and the places can all contribute to major problems. And if the families involved lived in the same place for many generations spanning two centuries or more, the problem is magnified, because families tend to reuse the same given names and to marry into families that do the same.
TManning said: Thanks, Tom, for the information on the BYU project. To everyone else: as someone who has been doing this almost daily for the last four years, I can say it is not that hard. Many of the record sources that were extracted cover a very limited time span; in the case of Irish births, 17 years. You have no problem with multiple generations; you do not even have one full generation. You know clearly when to stop: when you have merged each nuclear family in that particular collection and attached its civil birth record. Both parents are given with a first and last name, and for women it is their maiden name. The images, which can be pulled up on irishgenealogy.ie in a separate window, also contain the father's occupation, as well as an exact street address for those living in a town or a townland for those in a rural area. Unless you show no care whatsoever, it is hard to get wrong. In fact, if the father's occupation and the street address or townland had been indexed fields, a computer program could easily do the whole country.
And the situation is similar for most of the American civil birth records I have worked with (New York, Chicago, and Philadelphia): relatively small record sets, with the mother's maiden name making each father-and-mother pair unique to one nuclear family.
Perhaps you are working in records where this would not work, such as England, where the mother's maiden name was not recorded. But there are many collections where volunteers could be very useful and where cleaning up the database would help.
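The "computer program" TManning describes amounts to grouping records within one collection by a key that should identify a single couple. A minimal sketch, assuming the father's occupation and townland or street had been transcribed alongside the indexed names (per the post, they are not indexed today); the `BirthRecord` fields, the function names, and the sample data are all hypothetical:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BirthRecord:
    # Hypothetical fields; occupation and address would have to be
    # transcribed from the images, since they are not indexed fields.
    child_given_name: str
    birth_year: int
    father_name: str
    mother_given_name: str
    mother_maiden_name: str
    father_occupation: str
    townland_or_street: str


def normalise(text: str) -> str:
    """Crude normalisation: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def family_key(rec: BirthRecord) -> tuple[str, ...]:
    """Key intended to identify one couple within a single collection."""
    return (
        normalise(rec.father_name),
        normalise(rec.mother_given_name),
        normalise(rec.mother_maiden_name),
        normalise(rec.father_occupation),
        normalise(rec.townland_or_street),
    )


def group_candidate_families(
    records: list[BirthRecord],
) -> dict[tuple[str, ...], list[BirthRecord]]:
    """Group records sharing a key; each group is a candidate sibling set."""
    groups: dict[tuple[str, ...], list[BirthRecord]] = defaultdict(list)
    for rec in records:
        groups[family_key(rec)].append(rec)
    return dict(groups)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    records = [
        BirthRecord("Mary", 1868, "John Wilson", "Bridget", "Kelly", "Labourer", "Ballymore"),
        BirthRecord("Patrick", 1871, "John  Wilson", "Bridget", "Kelly", "labourer", "Ballymore"),
        BirthRecord("Anne", 1869, "John Wilson", "Bridget", "Murphy", "Farmer", "Ballymore"),
    ]
    print(len(group_candidate_families(records)))  # 2
```

Exact matching on normalised text will miss spelling variants and transcription errors, which is exactly the concern raised earlier in the thread, so each group is best treated as a candidate for a person to confirm against the images rather than as an automatic merge.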
Adrian Bruce said: While I can see that it might work with Irish birth registrations, there are lots and lots of collections where it won't. You say that it won't work in "England, where the mother's maiden name was not recorded." It certainly won't work with the English collections in FS, but it's not that England didn't record mothers' maiden names (MMNs): it did, in the direct equivalent, the English & Welsh birth registrations. Rather, it's that the English collections in FS don't have that data, while the ones that do aren't in FS.
The simple truth is that I don't trust FS to identify which collections are potentially do-able in this manner. Such identifications are not simple - either extensive knowledge is needed or extensive sampling. Both, even. And Murphy's Law would say that the sampling would hit some baptisms where the mother's maiden names were included and the wrong decision would be made.
TManning said: Then why don't we do the sampling and clean up the collections we can? Just because we cannot do everything does not mean we should not do what we can. And I believe the 20,000 Irish names I have already done qualify as an adequate sample for the Irish civil birth records.
And I am sure that among their missionary volunteers there are people competent enough to test each collection before working on it. They know what the collections in their area of expertise contain.
And if the collections that contain the data are available online, we can train the volunteers to check them before merging, the same way the images of Irish birth records can be checked on irishgenealogy.ie. I am not suggesting we use random volunteers, but trained FamilySearch center missionaries. Perhaps they would like projects they can sink their teeth into for half an hour or an hour a day, something different from indexing.
This discussion has been closed.