We need some way to identify hijacked records and stabilize identities
When I train patrons to use Family Tree, I always emphasize to them the importance of verifying that a record has not already been hijacked.
By hijacked, I mean that the record which came into the system as one unique individual has now had all the data changed on the record to the point it now represents an entirely different unique individual.
Record L6VN-J7G provides one example of this. The record was originally for Edmund Lord, born 1819 and christened in July 1820. In June, a patron changed the name to Thomas Howarth and changed the birth date to 1860. There were no merges performed on this record. The patron just "hijacked" the record for Edmund Lord and turned it into a record for Thomas Howarth. All the sources and relationships on the record still belonged to Edmund Lord; the record details just no longer represented Edmund. I have since repaired this damage, so please look in the change history to observe the issue I am referencing.
This is not an isolated incident. I have a Google spreadsheet in which I have recorded 150+ other examples, and I expect that most patrons who have used the tree extensively have encountered similar examples.
I would like to propose that something should be done to mitigate this type of damage. The original data that uniquely identifies the record should not be hidden at the very bottom of the change history. Accessing that information is very difficult once the change history becomes burdensomely long.
I would like to see one of the following options:
Idea 1) Place a button at the top of the change history that will open the original seed data that created the record. For example, identify the earliest contributor (often this is FamilySearch) and when the patron hits the button, pull up a list that contains only that first contributor data.
Idea 2) Place a button at the top of the change history that will take the user to the bottom of the change history page.
Idea 3) In the details screen, when a patron clicks on the "Name", include the original name from the record in that popup screen (along with the reason statements and tags and "see all changes" options). Similarly, do this for the birth, christening, and death, but the trick with those events is that if the original contributor left them blank, then the original data should be listed as blank. Every record must be added with a name, so the earliest name is always the original intention, but if the record was entered without a birth/christening/death/burial, then sometimes those fields can contain hijacked information. It may also be useful to encourage patrons to be able to enter a reason statement about why the current name/date/location is vastly different from the original name (if the current name/date/location is vastly different from the original).
Idea 4) Include another box at the very bottom of the details screen that shows the facts that existed on the record originally, before later contributors edited it. That portion of the record would not be editable, though a comment section in that area would be appropriate.
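To make Ideas 1, 3, and 4 a little more concrete, here is a minimal sketch in Python of how the "original identity" might be pulled out of a change history and compared with the current name. Everything in it is an assumption for illustration only: the `Change` record, its field names, the rule that "seed data" means whatever the earliest contributor entered on the earliest date, and the 0.5 similarity threshold are all hypothetical and are not FamilySearch's actual data model or API.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical shape of one change-history entry (not FamilySearch's real model).
@dataclass
class Change:
    timestamp: str    # ISO date such as "2012-06-09", so string order is date order
    contributor: str
    field: str        # "name", "birth", "christening", "death", or "burial"
    value: str

VITALS = ("name", "birth", "christening", "death", "burial")

def original_identity(history: list[Change]) -> dict[str, Optional[str]]:
    """Return the seed data: what the earliest contributor entered on the earliest date.

    Fields the original contributor left empty stay None (blank), as Idea 3 asks.
    """
    seed: dict[str, Optional[str]] = {f: None for f in VITALS}
    if not history:
        return seed
    ordered = sorted(history, key=lambda c: c.timestamp)
    first = ordered[0]
    for change in ordered:
        is_seed_entry = (change.contributor == first.contributor
                         and change.timestamp == first.timestamp)
        if is_seed_entry and change.field in seed and seed[change.field] is None:
            seed[change.field] = change.value
    return seed

def looks_hijacked(original_name: str, current_name: str, threshold: float = 0.5) -> bool:
    """Flag a record whose current name is vastly different from the original name.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    ratio = SequenceMatcher(None, original_name.lower(), current_name.lower()).ratio()
    return ratio < threshold

# Example using the Ann Sladen name changes described in this thread.
history = [
    Change("2012-06-09", "FamilySearch", "name", "Ann Sleaden"),
    Change("2022-06-29", "another user", "name", "Lydia Armitage"),
]
seed = original_identity(history)
flagged = looks_hijacked(seed["name"], "Lydia Armitage")
```

A flag like this would only be a prompt to look at the record; a spelling correction such as Sleaden to Sladen stays well above the threshold, while a wholesale name replacement does not.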
Comments
-
Restoring names and other singular conclusions is actually pretty easy. Yes, it may involve scrolling down interminably in the change log, but you'll get there eventually. Or you can use the filters.
What's really challenging when a profile is made into someone else is figuring out and fixing the relationships. When one of my ancestors was conflated with another man with the same name (but different religion, birthplace, residence, etc.), I found it easiest to just detach him from my ancestors and create a new profile for the correct individual. Sorting through the dozen children and the resulting three dozen identically-titled source citations was just too daunting a task.
1 -
"It may involve scrolling down interminably in the change log, but you'll get there eventually." This should be a fairly easy fix for the programmers. Why leave us to scroll interminably, when they can give us a button to move us down there and save us a lot of time.
I agree that "Restoring names and other singular conclusions is actually pretty easy." My concern is when people don't realize that they NEED to restore the names or other singular conclusions, and they wind up making a bigger mess (by conflating the relationships) based on the hijacked record. My thought is that a lot of the relationship conflation that you mention as being challenging could possibly be prevented if we had a field that tipped us off to the hijacking, which is what Ideas 3 and 4 would accomplish. Ideas 1 and 2 would simply make it easier to avoid the "scrolling down interminably."
0 -
@HESM Most people will probably feel like Julia - "I found it easiest to just detach him from my ancestors and create a new profile for the correct individual. Sorting through the dozen children and the resulting three dozen identically-titled source citations was just too daunting a task."
I am not sure whether the proper procedure will be to RESTORE the original - especially if conflated with more than 2 people - or the merge mixed up several families. The Source Linker/Merge process - which is where the conflation point originates - does present both the record and the Tree profile or both Tree profiles (whichever is the case) - so the problem point from my understanding is with the person doing that comparison (or lack thereof). Unfortunately the process may change more than just the relations of those two people - it may involve the families of both - plus any subsequent changes ... It does become almost too difficult to try to find the end of the ball of yarn at some point. Perhaps some with more expert Tree cleaning abilities can comment. I know @dontiknowyou does these sorts of things all the time.
As far as impeding changes, I think the person doing the merge/source link - should definitely be reminded when data is not a good match - unfortunately the current process does not seem to prevent such from occurring often (though I have no data to suggest how often). I tend to look on open-edit structure with a little disfavor - though to date I haven't suggested any great viable alternative. I am not sure about Idea 3) - extra display of data might not be the cleanest approach - but I do agree that Vital records should be a point of focus for establishing unique identity. Perhaps it could be combined into the Alert Note new feature (since that is the latest impedance feature it might as well bear some burden).
I would actually like to see something like - once there is some original documentation for a vital record or possibly all the profile vitals - mark it/them read-only (establish unique identity) - any subsequent change would be required to disprove that combined identity and family relationships prior to the change (merge or conflated record) could go through (basically impede or make more onerous the merge/attach process). So development of: 1) a process through which there could be some agreement/conclusion as to uniqueness and 2) a process through which one could submit a differing proof. The current process does not require you to disprove anything - it basically asks, "are you sure? OK, click OK."
This would not help with current conflated profiles though... so maybe develop a process that identifies conflated profiles and suggests the records/Tree profiles that existed prior - before the merge/attachment - so that it would be easier to identify/separate those from the current profile. So a Recent Changes/Change Log tool that helps separate and retain all unique profiles based on reversing the merge/attachment(s). Whether such a tool, other impedance measures or as Julia - just detach and create the correct profile newly - there then need to be some impeding processes to prevent such from reoccurring (especially if the other ball of yarn hasn't been dealt with). I have some other Ideas that I hope to work up one of these days and present.
Suggestion for making a case for Ideas: If one works up a comparison of how the Idea would appear within the current FamilySearch interface (minimally through screenshots) - pictures may convince more than a page of words. I like to envision how an Idea would appear within the interface - how it might affect workflow, etc.
1 -
I don't know if this is browser- or OS-dependent, but for Idea 2, what's wrong with just dragging the scroll bar all the way down?
What's harder to find is the intermediate changes, if you're for example cleaning up a mess from years ago on a frequently-edited person -- but as I said, there are filters.
But what really messes up the changelog (or makes it nigh-impossible to parse) is merging, especially merging family members. Say I have Father and Mother with Daughter1, and I find FatherDuple and MotherDuple with Daughter2. I merge Father and FatherDuple, with a detailed reason statement (ReasonOne), then I merge Mother with MotherDuple, with a slightly different detailed reason statement (ReasonTwo). What ends up in the surviving Father's changelog is: (1) Merge with ReasonOne, (2) Relationship Added with ReasonTwo, (3) Relationship Deleted with ReasonTwo, (4) another Relationship Deleted with ReasonTwo, and (5) yet a third Relationship Deleted with ReasonTwo. The names and dates all match, so the only way to figure out that (2) is the surviving Mother's relationship to Daughter2 (I think) is to compare the PIDs. (Number (3) is the merge-deleted spousal relationship, and numbers (4) and (5) are the merge-deleted parental relationships.)
And all that is for a single correct merge that I did myself and tried to document in the reason statement. If the hijacking involves adding someone else's spouse and children via multiple merges, done by a lazy/careless user who fills in the reason for all merges as "Doppeleintrag" (German for "duplicate entry"), I think I can be forgiven for throwing up my hands and abandoning the poor guy to his ignominious fate.
1 -
I have often wondered why changing the name of an individual doesn't lead to a request for a reason statement. Then I realised changing Vitals and other details doesn't prompt any warning, either. Why should it be more important to add a reason for adding details (say when adding a source) than changing or deleting them?
When I queried a "hijacker's" actions, she didn't seem to have any malicious intent, but could not explain why she felt the need to use an ID completely unconnected to her relative (different family / first name / last name), instead of creating a new one for them.
1 -
I'm not even talking about merges here. I'm talking about just flat out hijacking a record. Here's a real example. Go look at
"What's wrong with just dragging the scroll bar all the way down?" On a short change history, that is a valid action. However, I'm dealing with long change histories for which I drag the scroll bar all the way to the bottom, and then the system has to load the next set of changes, and I drag the scroll bar to the bottom of those, and then the system has to load the next set of changes... and I end up doing that over and over until I get to the bottom of the change list. It seems that everyone in this conversation is agreeing that getting to the bottom of a change history can be useful. How hard can it be to make a button at the top of the page to do that? Why wouldn't we want that capability?
0 -
I'm not talking about bad merges here. That is an entirely different can of worms. What I'm talking about is when someone goes to a record and just changes the name entirely. For example, look at MF89-7S4 for Ann Sladen. On June 9 2012, FamilySearch populated the original record with Ann Sleaden. On June 29 2022, another user changed the name to Lydia Armitage. This weekend I changed it back to Ann Sleaden/Sladen. Here's another example: 9H4Y-8LF Annie Parkinson. On June 20 2012 FamilySearch populated the record as Annie. On June 26 2022 another user changed the name to Irene Cooper (Butterworth). This weekend I changed it back to Annie/Annie Parkinson. I can provide you with 150+ more examples of this exact same behavior. I've been saving them all in a spreadsheet. This sort of patron behavior has enormous impact on the integrity of the tree. I'm looking for a solution. As I'm correcting this issue on these records, I'm realizing that there is a lot of time wasted by not at least attempting to prevent this sort of damage. Other patrons come to the record and attempt to work with it, and the problems snowball. I can't imagine that it can hurt to make the original intention of a record more easily accessible and visible. What are the drawbacks of making it more visible? I can think of a lot of pros to any of these ideas. I'm not seeing drawbacks. What would any of you recommend? Is there any reason why you wouldn't want to be able to see the earliest identity? It sounds to me like all of you have experience with needing to access that information. Why don't you want easier access?
0 -
I really like the idea of working it into the Alert/Note feature. I think that is a very appropriate spot for it. It fits with the intention of the Alert.
1 -
For a nice discussion of this whole issue see: https://www.youtube.com/watch?v=RZeqgY47zdA
I think it would be great if an automatic routine could search through the change log tree of a person gathering information on the originally created names and vitals of every person merged into that final survivor and every person merged into the people merged into the survivor and create a report of all this.
It can take a long time to search through a person's change log to find all the merges done then check all the change logs for all the deleted people to find all the merges they were survivors of and find all the beginning people at the ends of all these chains of change logs to find the original identities of all the dozens of people now hidden under the final survivor.
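As a rough illustration of the kind of routine described above, here is a minimal Python sketch that walks the chain of merges from a surviving profile back to every profile that was merged into it, directly or indirectly, and reports each one's originally created name. The `Profile` structure, the `merged_in` list, and the PIDs and names in the example are hypothetical placeholders; how FamilySearch actually stores merge history is not something this sketch assumes to know.

```python
from dataclasses import dataclass, field

# Hypothetical record of one profile: its PID, the name it was created with,
# and the PIDs of the profiles that were merged into it.
@dataclass
class Profile:
    pid: str
    original_name: str
    merged_in: list[str] = field(default_factory=list)

def original_identities(survivor_pid: str, profiles: dict[str, Profile]) -> list[tuple[str, str]]:
    """List (PID, original name) for the survivor and every profile merged into it,
    including profiles merged into those merged profiles, and so on."""
    report: list[tuple[str, str]] = []
    seen: set[str] = set()
    stack = [survivor_pid]
    while stack:
        pid = stack.pop()
        # Skip PIDs already reported (guards against loops) and unknown PIDs.
        if pid in seen or pid not in profiles:
            continue
        seen.add(pid)
        profile = profiles[pid]
        report.append((pid, profile.original_name))
        # Reverse so merged profiles are visited in the order they are listed.
        stack.extend(reversed(profile.merged_in))
    return report

# Made-up example: B and C were merged into A, and D had earlier been merged into B.
profiles = {
    "AAAA-111": Profile("AAAA-111", "John Smith", ["BBBB-222", "CCCC-333"]),
    "BBBB-222": Profile("BBBB-222", "John Smyth", ["DDDD-444"]),
    "CCCC-333": Profile("CCCC-333", "Jon Smith"),
    "DDDD-444": Profile("DDDD-444", "James Smith"),
}
report = original_identities("AAAA-111", profiles)
```

The report lists each hidden identity once, which is the information that currently has to be collected by opening one change log after another.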
This is pretty straightforward if there have been no merges. Just go to the change log and filter by name to see every name that has been on the profile:
3 -
> Why don't you want easier access?
It's not that, really. I agree that easy access to who the profile was originally meant to be for is good and needful. But in all of the things going on on FS, I don't feel any great need for easier access than what we've already got. It wouldn't help with the things I have trouble with, since the whole point of almost all of them is that the name is the same (but absolutely nothing else is).
In fact, even your cited examples of totally-different names go back to same-name conflation: Ann Sladen became Lydia Armitage when a user attached an indexed census record for a man with the same surname (but not given name!) as Ann's husband. Similarly, Annie Parkinson became Irene Cooper when a user attached a National Register index entry for a household with someone matching Annie's daughter's name. Unfortunately, the Change Log doesn't tell us the source or motivation for such attachments: was it the hinting system's fault, or did the careless/inattentive user come up with them all on his own? (I rather suspect it's the former: a nice endless loop of bad hint attachments generating more bad hints generating more bad attachments.)
2 -
I do think the Record Hints have become better - but it is still a problem if people think all hints are correct and just attach all hints. And I do agree ... in the attachment process there need to be BIG alerts pointing out mismatching data.
0 -
@Gordon Collett you have described the excruciating process that I use to correct bad merges. This "give us access to the original identity" could be helpful with that too. Once they've generated the "original identity" data, that data should never change for a record. Once a record is merged, they could also keep that "original identity" data in the change log with the merge. We could load the change history, find a merge, hit the drop down on the merge and see the "original identity" data there. @Julia Szent-Györgyi would you think that would be helpful? I just keep thinking that generating this "original data" for these records cannot be that difficult to program. The data is already available in the change log, and humans know how to find it. Can it be that hard to teach a computer to find it? It seems like it should be a quick fix to a widespread problem. I agree it isn't a solution to all the data corruption problems that come about through bad merges and bad source links and bad data changes, but it helps a tiny bit with prevention and a lot with reparation.
0 -
> Once they've generated the "original identity" data, that data should never change for a record.
I think the premise that open-edit should help 'correct identities' 'rise to the top' is great in theory - but I think we've all seen how the obscuring/conflating of identity can result - the opposite of what the open-edit structure intended. To be clear - hopefully - adding a PID in the correct Tree location with correct relationships can be difficult - I think I may have added one incorrectly the other day (but I removed them and gave a reason). Such difficulty is compounded the further back the generations go and the sparser the records become. It is my belief that with more recent generations and greater availability of records, this problem is more 'visible'/problematic - especially for living descendants. Part of the problem deals with 'sufficiently complete/proven' identity. Obviously creation/attachment of Sources is one method to do so: a sourced Name combined with a sourced birth, etc., should mean this identity is unique and excludes merging with a different identity. So a Record Hint wanting me to select 'Not a match' is only helpful IF it establishes a separate identity - something I'm not always proactive about pursuing (just focusing on my relations, not others) ...
I tend to agree with keeping the original/intended identity -excepting possibly ones that may have been entered in the wrong relationships originally (some original seeds might have been bad).
If we agree that once identity is established that should be 'locked' - why opposition to making it read-only? As mentioned in other threads - the person may just create an open-edit duplicate and not request permission to edit. At least that would keep others from changing the one 'that is established'. I've got another Idea kicking around that I want to present separately for 1) establishing identity 2) hopefully impeding changes - BUT which retains open-edit structure! I'm hoping my roundtuit is found sooner rather than later ...
For this Idea and as Gordon contributed - if there were such a 'tree cleaning' routine it might be useful to present all of the 'merge identities' on a separate 'Identities' tab? Would that help with ease of access concerns? So minimally I would think all the person and relationship states would be presented - original/intended, after merge #1, after merge #2 ... Basically the same as the change log - but combining all those line items into a resulting person/relationship state rather than listing them individually? I don't know... But another tab might be useful...? Or if not another tab - accessible like Recent Changes/Change Log - essentially taking you to another 'tab'.
1 -
I would like a magic splitting tool but that's asking a lot. The detangling process doesn't have a lot of routine to it, so there isn't much to put into a tool.
What I do is mostly ignore the Change Logs and focus on the historical records. Once I get historical records sorted, I closely examine all the attached profiles in the tree for spurious names and other relationships and event dates and places left over from prior conflation. Usually in the process I find many more historical records. Very often I also find and merge duplicate profiles. I believe other contributors recognize there is a conflation problem but don't know how to fix it so just start over.
2 -
> Once I get historical records sorted
@dontiknowyou Are you changing titles to sort them - or how are you sorting to differentiate?
0 -
> Are you changing titles to sort them
Absolutely not!
The steps of my splitting process, more or less:
- On the profile being split (profile A) look at the bottom of the change log to determine the intended person. If the intended person isn't obvious, just decide profile A is whoever actually is married to a certain spouse or the child of a certain parent or parent of a certain child in the tree. This is always determinable: look for relationships in the change log. If the profile is conflated and has never been attached to any other profile, then whatever the decision, it's good.
- In the profile A source list expand all the attached source records, read the indexed information, and read the source image. If a source doesn't pertain to the intended person simply detach it from the profile.
- On the remaining sources check for similar historical records not attached anywhere, review them, and attach them if they pertain.
- Go to (or create) profile B, and attach those detached historical records as sources. If the Hints system doesn't offer up the sources, find them at the top of the profile A change log.
- Go through all the sources, one by one, examining them in the Source Linker and making additional corrections there.
- Go through the profiles and remove debris: wrong names, places, dates, etc.
- Work any remaining hints. If there are bad hints, that's a big clue that the splitting operation is not yet finished. A related profile in the tree may also be conflated and need splitting or other work. Only rarely do I dismiss hints; instead, I work long and hard to find and correct the errors in the tree that are causing the bad hints.
1