Gedcom Challenge: Show Us The Data
LegacyUser
✭✭✭✭
rotkapchen said: For well over 2 years individuals have been lamenting the issues associated with GEDCOM loads (see a collection of former discussions at http://gsfn.us/t/52s44).
FamilySearch is purported to be a collaborative environment, only it is not managed or designed to operate efficiently as one. It appears to be only as collaborative as 'the powers that be' want to allow it to be. [As an example, this system wouldn't even let me post the word GEDCOM in caps in the topic -- the system chastised me that I was shouting and would not let me continue without changing it. Seriously? Can we get some help with some real issues?]
Collaborative environments need to be managed at some level; otherwise chaos reigns. It is not very collaborative to allow countless individuals to dump hundreds of duplicates into the system without everyone else having some sort of recourse (other than spending weeks finding and merging all of the duplicates, one by one, themselves) to report it, and/or some administrative tool to manually review the data that was dumped in and correct it directly, rather than by happenstance.
We are repeatedly told that the negative impact of the GEDCOM loads is far less than the benefit. That is not my personal experience.
So I am specifically requesting that MY data be tested and that MY family be used as an example as to what it is I am experiencing (along with anyone else that wants to add examples to be evaluated and tested).
In a separate entry here, I will outline the economic 'cost' that GEDCOMs are adding to the work in the temple - what the time impact is for each ordinance to be done and what that 'lost cost' is for all the duplicate work I find EVERY time I get on this system to correct errors in my tree.
Let's start with one example. Just this week I stumbled onto a record I hadn't reviewed in over 6 months. It was a disaster (there are published errors that keep being repeated because people mindlessly pick them up and refuse to read any of the many notes throughout the record refuting the bad published information). All of that aside, I found 5 duplicate GEDCOM records for the same person. Two or three of them were actively being worked on in the temple; another two were either reserved or had been RELEASED to the temple for someone else to waste their time.
So here's the challenge:
Pierre Lejeune dit Briard
from 1626 to 1628 – 1661 • LRSQ-HKX
Across all merges, I want a count of each of the ordinances completed for this person, separating each count for B/C/I/E/SP/SS
This individual has only one unnamed spouse and yet I know he has been sealed over time to multiple women and multiple parents.
Show it all and then I'll run the numbers for the wasted time.
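To make the "run the numbers" part of the challenge concrete, here is a minimal Python sketch of the arithmetic. The ordinance counts and per-ordinance times below are hypothetical placeholders, not figures from the record above; the point is only to show how duplicated completions translate into wasted temple time once real counts are known.

# Hypothetical counts of completed ordinances found across all merged
# duplicates of one person (B/C/I/E/SP/SS), plus hypothetical average
# minutes of proxy/temple time per ordinance type.
completed_counts = {"B": 4, "C": 4, "I": 3, "E": 3, "SP": 2, "SS": 5}
minutes_per_ordinance = {"B": 10, "C": 5, "I": 15, "E": 90, "SP": 10, "SS": 10}

def wasted_minutes(counts, minutes):
    """Each ordinance type only needs doing once; every completion beyond
    the first of a given type is counted as duplicated (wasted) time."""
    total = 0
    for code, count in counts.items():
        duplicates = max(count - 1, 0)
        total += duplicates * minutes[code]
    return total

total = wasted_minutes(completed_counts, minutes_per_ordinance)
print(f"Duplicated ordinance time: {total} minutes ({total / 60:.1f} hours)")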
Comments
-
Tom Huber said: The current system makes it very difficult for a user to add all of the people in a GEDCOM file they upload to the massive tree. The change that took place several months ago increased the difficulty tremendously.
Unfortunately, due mostly to bad original "old and clunky" code, the changes only impact newly compared uploaded files against the tree. One cannot rerun the compare to get the new list and use the new system.
It has been reported, and I tend to believe it, that newly uploaded GEDCOM files are now the least likely to produce duplicates. I believe the efforts put into making the compare process better, along with its new view process, are the reason for that.
That does not resolve the problem with files that were uploaded and compared before the latest change. Those are still going to be a source of duplicates, and unless FS does what we've asked for (which it seems unwilling to do), which is to not allow a compare to be run or its results viewed on any uploaded GEDCOM regardless of when it was uploaded, we are going to continue to be plagued with the problem.
It is well known, and was well known back in the days of newFamilySearch, that a major problem existed with duplicate ordinances and records. The nFS system was abandoned because it could not handle (that is, it could not scale in capacity to) the sheer numbers involved, especially in the "Mormon" corridor states. This was long before the current system existed.
I'm not sure when the compare and view/add/change functionality was added to the upload process, but it definitely predated FamilySearch (which is why the code is considered "old and clunky").
I would be very surprised if anyone took you up on your challenge. The issues are well known and there is no solution for what has already been done. Like almost everything else, the matter is up to us, as users, to resolve. I'm just glad that I have only a very small segment of my ancestors who were members of the Church and living in the corridor.0 -
Tom Huber said: By the way, Get Satisfaction (this site) is not owned or operated by the Church or any of its departments.0
-
Tom Huber said: The upload process has existed for a long time. The original system fed the old CD-based Ancestral File. After that was abandoned, the uploads fed the Pedigree Resource File CDs. Both of those make up the bulk of the trees in the Genealogies section of the site.
In addition to feeding the two CD systems (AF and PRF), a user could run a routine called Temple Ready, which queried the CDs containing what we now see as the IGI. That series of CDs (two sets were released) contained the actual ordinance dates for all but the confirmation and initiatory ordinances. I believe that newFamilySearch was an attempt to bring that process online over the internet, but that was where the scaling problem was discovered. The handcart of the CD-based system was replaced by a horse and buggy, but they could not handle the interstate highways and speeds.
So at best, newFamilySearch was just a step above the horse and buggy and didn't work very well or consistently. Yeah, we could travel down the interstate, but FamilySearch could not bring everything to a halt to replace it, so it was done a little bit at a time, like changing a tire while continuing down the highway. Jim Greene has commented that the current car needed replacement and that's what is going on now. Many older systems still exist and high priority was given to the temple reservation and ordinance sections. I suspect that when that is done, a lot of the problems that we continue to see today will simply "go away", if the new merge comparison screen is any indicator (we can now see more and more of both records).
Only time will tell, and despite our desire to see a lot of the problems resolved right now, there is no simple solution, not with a site as complex as FamilySearch. Even the ability to correct index entries ran into some major issues that are still being worked out, and that was only for correcting a person's name.0 -
joe said: I'm always interested in how new data gets added and changed. GEDCOM and some partner products make it easy to add lots of Persons in a short period of time. I know you and others feel the pain more than other users because of the family lines that are more likely to be involved.
The thing that helps us see your issues is providing PIDs of Persons or relationships very recently added as a result of GEDCOM ingests. This process has been the same for years, aside from sundry bugs that have been resolved. So the indicators are the Reason statement ("uploaded from GEDCOM"), and if it's within the last 90 days then there's more of a data transaction history to investigate.
Also, if you can describe how much time you generally take to correct it, that would be interesting as well.0 -
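To illustrate the kind of check Joe describes, here is a minimal Python sketch that filters a person's change log for entries whose reason statement mentions a GEDCOM upload and that fall within the last 90 days. The entries, field names, and dates are invented for illustration; this is not the FamilySearch API.

from datetime import datetime, timedelta

# Hypothetical change-log export for illustration only.
change_log = [
    {"pid": "LRSQ-HKX", "date": "2019-05-01", "reason": "Uploaded from GEDCOM file"},
    {"pid": "ABCD-123", "date": "2018-11-20", "reason": "Corrected birth date from parish register"},
]

def recent_gedcom_changes(entries, days=90, today=None):
    """Return entries whose reason mentions GEDCOM and whose date falls within `days`."""
    today = today or datetime.utcnow()
    cutoff = today - timedelta(days=days)
    return [
        e for e in entries
        if "gedcom" in e["reason"].lower()
        and datetime.strptime(e["date"], "%Y-%m-%d") >= cutoff
    ]

# Example run with a fixed "today" so the result is reproducible.
for entry in recent_gedcom_changes(change_log, today=datetime(2019, 6, 1)):
    print(entry["pid"], entry["date"], entry["reason"])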
Stephanie Spencer Booth said: I love this idea, and I'm waiting to see what the numbers are on this individual.
President Nelson has said we are running out of time. Who are we spending it on? Pierre and countless others who are already done. Ordinances Ready only helps us do the ordinances for the same people even faster, because its design does not encourage the user to check for duplicates that the system does not catch, and there are a lot of those.
You may want to ask support how many of each ordinance have been done for Pierre. I don't know if they will tell you, but there is a team that has the tools to view all of the ordinance records for an individual.0 -
Tom Huber said: GEDCOM, the way it is now configured, takes a LOT of time to add more than just a record or two.
First, I cannot add a record for which an existing record has been identified. That takes care of one of the problems right there.
Second, I cannot look at just those records that can be added. I have to work my way through every record in the GEDCOM, just to get to all of the Add records. Even the first one was the 42nd name out of 78, so the process is not easy or quick. By the way, 73 of the 78 records were already in the tree and so no duplicates could be added via that means.
As I mentioned, that doesn't help those files that have been previously uploaded and the compare function run before the latest change, but the old assumption that it is quick and easy to add a person to the tree via a GEDCOM file no longer applies.
It is far quicker and easier to manually add a person via the site and keyboard and then ignore the possible duplicates. The GEDCOM process doesn't allow that to happen if it has found a match.
The system is not foolproof, but it goes a long way to stopping some of the madness that took place just a few months ago.0 -
David Robertson said: Agreed, FSFT needs some sort of management to keep chaos from taking over.0
-
S. said: I see the need for GEDCOMs, but they need to come up with a way to get rid of all the duplicates when importing.0
-
S. said: If Pres. Nelson says we are running out of time, explain to me what they are doing better to help with these issues, and why they are making it semi-difficult for users. And there could be more questions I could ask. X(( Very mad!0
-
Tom Huber said: They already have, and as a result it is very difficult for newly uploaded GEDCOMs to add duplicates. That is why the process now results in the lowest number of duplicates. What happened before the latest changes is part of history, but newly uploaded files no longer produce duplicates in any significant number, unlike those that were uploaded in the past and had the compare function run on them at that time.
The date the GEDCOM data is added to the tree is not necessarily the date the file was uploaded or the compare function was run.0 -
Robert Wren said: OK, Tom, I'll bite.
What DOES the GEDCOM date mean or indicate?0 -
joe martel said: In the change log for a Person added via GEDCOM ingest you will see this date: it is the day the user added this person to FT.0
-
Robert Wren said: Thanks, Joe. Is that date significantly different from the date the GEDCOM was added to "genealogies"?0
-
Tom Huber said: It certainly can be significantly different, which is why you cannot count on the date in the changelog to know when the file was uploaded and the compare function run.0
-
joe martel said: That date above is when that user ingested their GEDCOM into FT. It could be way later than the date they uploaded their GEDCOM to Genealogies. To know when it was uploaded, you have to go find that upload by doing a Search | Genealogies; the bottom-left pane has the upload info.0
-
Tom Huber said: Thanks for this very helpful information, Joe. It definitely helps to know when the GEDCOM was uploaded (the submission date).0
-
rotkapchen said: When was this implemented? Because I'm still seeing massive loads of duplicates.0
-
rotkapchen said: I'm sorry but why would something earlier than the ingest date be of any significance? The results are the issue.0
-
Tom Huber said: I've already explained the significance in the second paragraph of my first reply. It says, "Unfortunately, due mostly to bad original "old and clunky" code, the changes only impact newly compared uploaded files against the tree. One cannot rerun the compare to get the new list and use the new system."0
-
Jeff Wiseman said: Tom,
I have not gone near the GEDCOM compare utilities, mainly because they are of no use for ANYTHING that I need on FSFT (and frankly, I have not been able to understand why they are deemed so ABSOLUTELY ESSENTIAL by FS). But from your descriptions, it sounds as though some "inertia" has been added to slow down the creation of duplicates.
Ok, so that would be a plus.
However, my main concern on this has NEVER been about the duplicates. It has always been about the FAR GREATER PROBLEM that is created when the tool actually FINDS what it thinks is a match, and then throws you into a merge-type operation. With sometimes thousands of names in a GEDCOM, the owner would see all these comparisons show up and, wanting to QUICKLY preserve "their tree" from their GEDCOM file, would just click-click-click through all of the vitals for a person, replacing all the pre-existing data in FSFT with the data from their GEDCOM file.
REGARDLESS of whether the match provided by the tool is correct or not, within 15-30 seconds that person has overwritten most of the data for that record while totally ignoring all of the sources, logic, notes, discussions, and previous reasons for change that already existed on it. It can take hours just to analyze the problem, determine a proper solution, and then fix a single person's corrupted record back to the correct shape it was in originally.
So MY question is, has anything in that utility been changed to slow down the rate at which GEDCOM data gets dumped over EXISTING data in the Tree? Because when it happens, it takes orders of magnitude LONGER to correct those records and return them to their original correct values.1 -
Tom Huber said: Short answer to your question: Nope.
The same problem exists with any system except the merge two Family Tree records together, because the merge screen now shows a lot more information than was previously available.
The GEDCOM upload system, when used to update the tree, needs to treat both records as if they are from the tree. That way, all of the reason statements are in place, along with the life sketch.
Unfortunately, nothing can stop stupidity from still making a mess of things, so it doesn't matter what FS does as far as comparison screens are concerned; the process still needs considerable improvement.
If the system would error-check and flag discrepancies, requiring extra steps, that would go a long way toward slowing down and preventing some of the disastrous updates, including those from bad hints.
Hopefully, at some point, FS will improve that process so that the existing record is fully displayed, even when using the source linker, which will go a long way toward overcoming most careless attachments and overwrites.
But it is far from there at this point in time.0 -
Tom Huber said: I envision the use of Error meters that range from green through yellow to Red. With Green being the least likely to have problems.
I wrote meters (plural) because I envision using multiple meters to indicate errors in the following areas:
Dates (green with all dates being within a year or two of those in the existing record, yellow being within a decade, and red being more than a decade off, with the extreme being at least a century off).
Places (green with all places being within the same town, township, or county as the existing record; yellow being outside the same town, et al., but not more than a bordering place; and red being outside the same country or continent).
Parents (green being that they are the same, yellow that at least one is the same, and red that both are not the same in terms of names, dates and places)
Spouses (same as parents).
Children (same as parents).
While this is not going to help those who are color blind, it should go a long way to help determine the potential match of the hint, duplicate, and person being "ingested" such as via a GEDCOM or third-party system. For the color blind, hopefully the system will allow switching to a gray scale from white (same as green) to black (same as red).0 -
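As an illustration of the meters Tom describes, here is a minimal Python sketch using his thresholds for dates (within a year or two, within a decade, beyond that) and a simplified exact-name version of the parents/spouses/children comparison. The function names and record shapes are hypothetical assumptions, not anything FamilySearch actually implements.

def date_meter(existing_year, incoming_year):
    """Classify how far an incoming date is from the existing record's date."""
    if existing_year is None or incoming_year is None:
        return "yellow"  # missing data: flag for review rather than pass or fail
    gap = abs(existing_year - incoming_year)
    if gap <= 2:
        return "green"
    if gap <= 10:
        return "yellow"
    return "red"

def relative_meter(existing_names, incoming_names):
    """Parents/spouses/children meter: green if all match, yellow if any
    match, red if none match (a simplification of Tom's name/date/place test)."""
    existing, incoming = set(existing_names), set(incoming_names)
    if existing and existing == incoming:
        return "green"
    if existing & incoming:
        return "yellow"
    return "red"

print(date_meter(1626, 1627))   # green: within a year or two
print(date_meter(1626, 1750))   # red: more than a century off
print(relative_meter({"Pierre Lejeune"}, {"Pierre LeJeune dit Briard"}))  # red: no exact name match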
David Newton said: "One cannot rerun the compare to get the new list and use the new system."
Yes one can. That is perfectly doable. It may require some resources, but it can be done. Alternatively they could STOP THE IMPORT OF THOSE GEDCOMS! That would also require some work, but not nearly as much work.0 -
Robert Wren said: Jeff:
"It has always been about the FAR GREATER PROBLEM that is created when the tool actually FINDS what it thinks is a match, and then throws you into a merge type operation." . . . "So MY question is, has anything in that utility been changed to slow down the rate whereby dumping of GEDCOM data over EXISTING data in the Tree happens?"
Tom:
"Short answer to your question: Nope."
Assuming Tom is correct (as he often IS), a Question to Joe Martel (or Ron Tanner):
"Would Someone in Upper FS Management Please Explain WHY it is Necessary and/or Desirable to allow GEDCOM submission to the FSTree?" (and please don't say it is so people don't have to type.)
------------------------------------------------
Tom: "Unfortunately, nothing can stop stupidity from still making a mess of things."
BUT, stopping (stupid) GED imports into the FStree would go a long way toward stopping "a mess of things" from wholesale mass stupid 'corrections' (because MY Aunt Millie never made a mistake, regardless of what RESEARCH & SOURCES indicate!!!)
Anyone interested, please read (or RE-read) at least the promoted comments in
https://getsatisfaction.com/familysea...1 -
Jeff Wiseman said: How about a minimal mandatory Reason for Change statement FOR EACH DATA ITEM THAT IS BEING REPLACED? If it is less than a minimum amount of characters (say about 25) or contains the word "gedcom" in any form, the change would be refused.
As far as FS improving the process, I'll not hold my breath. For some reason that is totally beyond my comprehension, FS has already decided (and is immovable on this) that leaving those functions in place is an extremely high priority. In the entire time that I've been a member of this forum, there has NEVER been an explanation for this that justifies the extremes of frustration, anger, database corruption, and wasted time that it always causes as a side effect of its current existence.0 -
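For what it is worth, the rule Jeff proposes in the previous comment is simple to express. Here is a minimal Python sketch of such a check; the 25-character minimum, the function name, and the examples are illustrative assumptions drawn from his suggestion, not an existing FamilySearch feature.

MIN_REASON_LENGTH = 25  # assumed threshold from Jeff's "say about 25" suggestion

def reason_is_acceptable(reason: str) -> bool:
    """Refuse a replacement unless the reason statement is long enough
    and does not merely cite a GEDCOM upload as its justification."""
    text = (reason or "").strip()
    if len(text) < MIN_REASON_LENGTH:
        return False
    if "gedcom" in text.lower():
        return False
    return True

print(reason_is_acceptable("Uploaded from GEDCOM"))                                  # False
print(reason_is_acceptable("Birth year corrected per 1627 parish register image."))  # True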
Jeff Wiseman said: Robert,
As I just pointed out to Tom above, it's unfortunate but even though it is one of the biggest threads to exist in this forum (and that even with large chunks of it having been removed), your basic topic question will never be answered.
I've seen similar behavior to this in other industries many times. It's where management has already decided on something justified by some local form of "logic" that is accepted among themselves. They will not reveal that logic outside of those circles because they know that once it is revealed, it won't hold up to the scrutiny of a lot of other sound reasoning and (usually better) logic that it will be faced with.
I don't like to be judgmental, and I have no idea what goes on in FS regarding this issue. But in all of the other situations I've seen similar to this, it was always based on what I just described. I do not believe that you will ever see "FS Management...Explain WHY it is Necessary and/or Desirable to allow GEDCOM submission to the FSTree" because there is most probably NO good reason for it that is justifiable when you look at the overall picture.
Statistics on duplications is just a deflection. The documented original and main reason that the FSFT was created, was to reduce duplicate work being done in the temples. And yet here we have a top priority feature that INCREASES the duplicate work being done (among all of its other undesirable side effects).
There is something that appears quite dysfunctional there and I doubt that it will ever be revealed on a public forum like this.
As I've said before, I hope that someone will prove me wrong, but my personal belief is that with this issue we are all beating a dead horse.1 -
Jeff Wiseman said: When I last tested this, Tom was correct. When you first upload the file, you are deceptively routed into the compare function. At that time, the only way to REDO it was to delete your GEDCOM file and upload it again, but that is quite easy.
All of the GEDCOM files that are uploaded can be useful for finding other things and I wouldn't want to lose that part, but the confusing and deceptive method of leading a person who has uploaded a file into dumping the contents of that file into (or over top of) the tree requires so much work to make it reasonable that IT needs to be shut down until fixed (if ever).0 -
Tom Huber said: My tests, conducted quite recently, indicate that the user has absolutely no means by which they can rerun the compare function. The statement holds true, despite your claim otherwise.
What Jeff indicated is that yes, it can be run on the same file again, but the only way to do that is to delete the original and reload it, which makes it a new upload, or to rename the file and upload the renamed file, which still makes it a new upload.0 -
Tom Huber said: The option to enter a reason statement is already part of the new system, but it is like all reason statements: not mandatory.0
This discussion has been closed.