Untangle Tool: Change Log Map & Search
As I work on tangles, I find digging through change logs to understand prior merges very tedious and complex. This is especially a problem when trying to retrieve work that was done before the tangle, such as work done on people before FamilySearch added them to the Tree, or work users did when they first added them to the Tree.
Two suggestions:
- Provide a map similar to the landscape family tree chart which displays all the people that have been merged into an ID (including branches of the merge tree):
- basic information displayed about each person in a box
- additional detail available with popups or drop-downs
- lines showing the merges
- capability to open a deleted person for even more detail
- Provide a search that can search through a change log, including merged branches, for:
- names
- dates
- places
- etc.
Comments
-
I applaud you for your work. Deconflating profiles is by far the most difficult work on Family Tree.
That said, I won't be upvoting this idea.
Remember that Family Tree is based on historical records. I rarely find it necessary to dig through change logs. I completely agree with you that understanding prior merges is tedious and complex. Prior merges are largely irrelevant. If I do go into the change logs, it is only at the very end of my detangling process. Only rarely do I reverse a merge. I avoid reversing because it leaves behind the equally or more tangled attached sources.
So you know where I am coming from: My stats are nearly 200k, including over 20k new person profiles. I estimate I've merged far more profiles than I've created, and detangled almost as many. Most of the 3500 profiles in my Following list are there because they appear to be conflated. I follow them, then ignore them while I work on their parents, spouses, and children. Often working the close relatives has the side effect of repairing most or all of the conflation. Often the relatives are conflated too, so I move outside the whole rat's nest and start from there, always working from the outside in. Although I love searching and analyzing, for the most effective work I make building profiles a priority. Family Tree then does most of my search and analysis for me.
-
Sounds like some good ideas to help with restoring and unmerging.
Going through the Change Log and identifying every single merge that has ever taken place in a profile is extremely important and really should be done whenever one needs to untangle a mixed-up person. That is the only way to fix them properly. Of course, this won't always be possible because some profiles were combined prior to 2012 in New Family Search and we cannot reverse any of those at this point.
Prior incorrect merges are not irrelevant because that is where we will find and can then fix not just the person who is the end result of incorrect merges but also all the other people contained in that end result. Ignoring those people and just removing incorrect information means those other people are lost from the tree. Just creating new profiles for those people from scratch risks losing the hard work of the other users who originally entered them and never getting them properly reconnected to their extended family that already exists in the tree.
Using the Change Log to pull out incorrect merges is also the one way we have to correct the Ordinances page which for some of us is an important part of this untangling process.
-
Just to clarify my comment: I do not detangle by creating new profiles; that just makes duplicates, compounding the problem. I also do not ignore any attached profiles; they too often are tangled. Nor do I merely delete what seems to be incorrect information. What I do is follow historical records.
-
If I could just ask for a bit more clarification then. A recent project I undertook was to untangle the Family Tree record for Erik Hansen Musgjerd which had hidden in it:
- Kristoffer Hansen abt 1733 Musgjerd-Trondhjem, Tronhjem, Norway
- Erik Hansen Musgjerd abt 1731 of Sundalen, Nordmore, Norway
- Knud Nielsen Halse b. abt. 1735 Grytten, Romsdal, Norway
- Hans Hansen 1733 Asker, Akerhus, Norway
- Knud Hansen Fiellestad 1737 of Ovrebo, Kristiansand, Norway
- Knut Hansen 1733 Asker, Buskerud, Norway
- Kristoffer Hansen Olssen 21 Jan 1737 of Asker, Akerhus, Norway
- Knud Hansen ABT 1733 Of Denmark
- Hansen b. abt 1735 Tryggelev, Den.
By starting with digging through the Change Log, sorting out incorrect merges, and restoring the separate individuals, I was then able to shift historical record sources and data to where they actually belonged. (I did need to create a couple of new profiles, since some of the incorrect merging was done in New Family Search, and to have Support fix the ordinance pages for those new profiles.) This project also involved sorting out the families for these nine men and getting parents, spouses, and children all connected properly.
How would you have done this without using the Change Log to find all the people hiding here?
-
I have not had 9 people merged into one tangle yet. But I recently had a tangle of 3 into one, several other family members tangled, persons who were not part of the family incorrectly added and people who were not added that should have been. Some tangles can be undone relatively easily but some are real messes and a system generated map of the change log would be very helpful. Gordon brings up a point I had not mentioned related to people incorrectly merged out on a branch of the change log that could be lost if focused on correcting later errors.
My original thought about a map mostly had to do with mapping merges. But thinking about Gordon's comment -- having the capacity to see relationships added/deleted could be very helpful. I don't know if we would want to see all of the relationship adds/deletes as a default. But, having a count of the adds and deletes with the option to view them when the ID is selected may be very useful.
-
How would you have done this without using the Change Log to find all the people hiding here?
Assuming this question is directed to me, . . .
As I do not have an LDS account, I necessarily work blind with respect to ordinances. If I mess up any ordinances, someone else will need to fix them, consulting the change logs in the process. That said, it must be easier to do that as a final step, not earlier. There is a famous dictum in computer science that applies equally to Family Tree: Premature optimization is the root of all evil (or at least most of it). This certainly is the root of much edit warring. Exactly like profiles, ordinances are supposed to follow historical records.
I am mindful that ordinances may be involved, so I do try to work conservatively. I look to see which currently attached historical records were attached first, and retain those attachments. (But I am aware that some contributors systematically detach and re-attach sources so that it appears the contributor did all the work; for this reason I do check change logs to see who really did what.) Profiles follow records.
Often, I also do a very quick check of the change log for merges and origin of the profile. Where an "edit warrior" or vandal has made changes I pay close attention to the change log. However, I know a change log can be redacted and is likely to have been redacted if there has been vandalism. So, change logs can be an aid but are not to be relied on.
When stumped I do examine change logs.
That said, I want to stress that I am not guided by change logs. I follow historical records. Years ago I did try to adhere to the common advice to detangle Family Tree messes by analyzing the change logs. But on Family Tree I found analyzing the change logs was inefficient, ineffective, and led me to make mistakes I otherwise could have avoided entirely. Also, most families I work have no LDS descendants, so ignoring a deleted PID is of no consequence to anyone.
Change logs are not full of people; they are full of PIDs. The difference is subtle but very significant.
Have I found change logs stuffed with 10 or more PIDs, all relating to other persons? Yes, absolutely. I see this most often, in fact shockingly often, in England where novice Family Tree contributors have willy-nilly merged together every couple with the same names regardless of dates or places. Once that starts, bad hints cause it to snowball. Apparently farmer Tom Smith and Jane Jones were a frenetic couple, producing 20+ children in 10+ English shires and a few in Wales too. Um, no.
Some snowballs have even packed me in à la "The Biggest Snowball Ever!". Fortunately, escape is possible.
-
We do have a basic difference in viewpoint which is fine. Family Tree is designed to handle about anything.
I view these tangled-up individuals as having Change Logs full of people who just happen to have an ID number assigned to them. They are potentially well-researched people whose families spent hours in the past finding information on them, even if that documentation is not currently included in Family Tree. To have these people ripped from their actual families and interred in another person's Change Log is a minor tragedy that I will continue to try to reverse as I am able.
-
I do appreciate that you see the PIDs that way, and I empathize. You are not alone. Many contributors do get attached to the PIDs. I don't, but I do feel the tug; I know the potential for attachment is there, and I resist because I know such attachment would serve no one. If someone had published a book that somehow used PIDs as keys to disambiguate family members, then I might care. A published book is essentially what LDS ordinances are, but to me ordinances are invisible.
Even if I were attached to PIDs, I would recognize that building one tree out of many necessarily involves the creation and subsequent merging and loss of many, many PIDs as we all work to get profiles correct. In the process, very many cathected PIDs will necessarily get merged into others.
That is, if the work is done in Family Tree. If one is working elsewhere, importing profiles to Family Tree only when perfectly formed, then one might create no surplus PIDs. Maybe. But if one's goal were to create only a single PID per person, that would be troubling. As James Tanner often says about Family Tree, perfection is the enemy of progress. It has a modern variant: Premature optimization is the root of all evil (or at least most of it) in programming.
All those merged PIDs, if they ever were attached to a family, still exist in the change logs of that family's current profiles, regardless of PID. Many PIDs were created in FamilySearch data processing; no contributor made them.
Try not to get hung up about PIDs. Nothing good can come of that.
-
One reason to disentangle existing profiles instead of just creating new ones is that at this point, seasoned users expect legacy profiles to exist corresponding to many index entries. When they're not there, we get confused.
-
No one in this discussion is advocating just blindly creating new profiles. That said, it seems to happen a lot with, ahem, "GEDCOM upload" users.
When I detangle a conflated profile, more often than not I find sources still attached to other PIDs. By working the process from the sources I preserve and reconstruct those PIDs. This accomplishes a conservative detangle without spending a huge amount of time and brainpower analyzing the change log.
-
I'm not actually interested in PIDs. I know how they change and know how little they mean.
I am interested in existing people. If I come to a family that looks well established in Family Tree, where historical records show they have three sons named Hans but Family Tree only shows one son by that name, the first thing I will do, not the last, is look in Hans's Change Log, not for PIDs, but for his two brothers. By finding his brothers and restoring them, I can be sure that not a bit of the brothers' information remains in or under Hans's information that could cause any difficulty for the Family Tree routines.
A tool to show all the branching structure of past merges for Hans, as originally requested here, would be helpful in this hunt.
-
Gordon
It's 'Brett'.
Just in passing ...
In the "Family Tree" Part of 'FamilySearch' ...
The difference is that, like you and me, MANY of us Users/Patrons are working on Our OWN "Ancestral" Lines.
[ Or, especially if one is a Member of the Church, directly acting as a "Helper" to help/assist others to do the same. ]
So, we ARE being CAREFUL (and spending whatever 'Time' is necessary), as these are OUR "Ancestral" Lines.
The "ChangeLog" is a very important and necessary TOOL for such work.
We are NOT working on 'One-Name'/'Surname' Studies/Projects, or the like, where it really matters not.
Plus ...
Personally, I prefer to maintain an OLDER / ORIGINAL 'FamilySearch Person Identifier' (PID) rather than a recently "Created" one, regardless of how "Detailed" the Older/Original PID is, as using the more recent PID can sometimes cause problems/issues, especially regarding "Temple" Work.
And, as you mention, for Users/Patrons who are Members of the Church, it is the ACTUAL "Temple" Work that is IMPORTANT, rather than a specific 'FamilySearch Person Identifier' (PID).
Of course, for those Users/Patrons who are not members of the Church, the "Temple" Work side of things is NOT a problem/issue, and is 'unseen'.
Plus, the true purpose of the "Family Tree" Part of 'FamilySearch' (why the Church created it, to follow the Tenets of the Church) is NOT a consideration and simply does not concern Users/Patrons who are not Members of the Church.
Just my thoughts.
Brett
-
I don't leave any information behind, but on the other hand I prefer not to propagate WAGs. I find Family Tree loaded with WAGs and more than anything else they contribute to tangles.
@Gordon Collett, could you give a concrete example of the kind of conflation you are thinking of, and the diagram you would want FamilySearch to generate? Find an example family, don't fix it, but instead show us the tool that would help you sort it all out.
-
WAG?
I'll try to find an example but that may take a while. I keep a close eye on the people I follow for my relatives and my wife's relatives so they are in good shape. Problems there I untangled long ago. The ones I have fixed recently have all come up from questions on these boards, such as the one I mentioned above, that I helped out with because of the work the profiles needed.
-
WAG = guessing.
I am sure others will come up. Perhaps apply to the next one that comes up here.
-
What I would find most useful, and I think is along the lines of what @GlennBarlowHammer originally proposed here, is a diagram that starts with the final surviving person and shows who got merged in. Because each group of merges produces a tree-like structure, it can be difficult to be sure one has found and analyzed all the branches. Something like this in which each entry is a link to the individual's page:
Here you can quickly see that the current John Smith had three other John Smith, probably correctly but maybe not, merged into him. Two of those three John Smiths had other John Smiths merged into them. And so on. I usually create a spreadsheet by hand to show the same structure but, again, it's easy to miss a branch.
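A rough sketch of the structure being described, in Python, assuming a hypothetical record layout (this is not a FamilySearch API, and the PIDs and names below are made up): each record lists the records merged into it, and a recursive walk prints an indented outline with the surviving person first. It also counts every record it visits, which is the check a hand-built spreadsheet lacks.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MergeNode:
    """One person record in a merge tree (hypothetical structure)."""
    pid: str
    name: str
    merged_in: list[MergeNode] = field(default_factory=list)


def outline(node: MergeNode, depth: int = 0) -> list[str]:
    """Indented outline, surviving record first, one line per record."""
    lines = ["  " * depth + f"{node.name} ({node.pid})"]
    for child in node.merged_in:
        lines.extend(outline(child, depth + 1))
    return lines


def count_records(node: MergeNode) -> int:
    """Total records in the tree, so a missed branch shows up as a wrong count."""
    return 1 + sum(count_records(child) for child in node.merged_in)


# Made-up example: three records merged into the survivor, two of which
# had earlier merges of their own.
tree = MergeNode("PID-1", "John Smith", [
    MergeNode("PID-2", "John Smith", [MergeNode("PID-5", "Henry Jones")]),
    MergeNode("PID-3", "John Smith"),
    MergeNode("PID-4", "John Smith", [MergeNode("PID-6", "John Smith")]),
])

print("\n".join(outline(tree)))
print(count_records(tree))
```

Comparing the count against one's own tally of merges found in the change logs is a quick way to spot a branch that was missed.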
-
So, do you ignore what names were on the profiles when they were merged?
What about unmerges and splits? Do you ignore them?
-
What @Gordon Collett suggests is a great base display of what I had in mind.
For me the next level of understanding is key biographical information, which could be an optional view, perhaps by "node", "generation" or "branch" of the base tree. The additional information I would like to easily/quickly see without having to go to the individual's detail:
- Full birth date and location
- Full death date and location
- Father and Mother names
- Spouse name(s)
- Ordinance "bar" (for church members)
Other information that would be handy at some point or by option:
- Birth and Death detail information for father, mother, spouse
- Number of sources
- Date of merge
- Stats of activity for each branch of the tree (provided between nodes and before the first node of each branch). These could be calculated and provided from the change log between merges and up to the first merge. The stats could include numbers of unmerges, sources added & deleted, relationships added & deleted, etc. (These stats would be easy for the system to generate and would give a quick view of how big the mess is.)
Note: As a career computer type, I operate on the philosophy of using computer power to do what it does best (e.g. create lists and stats) and giving people the time and tools to do what they do best (e.g. look at patterns and make judgements).
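As a sketch of how such branch stats could be rolled up, again in Python with a made-up structure: each record carries the change-log entry types logged on it before it was merged (labels such as "source added" are hypothetical, not actual change-log wording), and the totals for any branch are the sum over that record and everything merged into it.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Branch:
    """A record in a merge tree plus the change-log entry types logged on it (hypothetical)."""
    pid: str
    entries: list[str] = field(default_factory=list)
    merged_in: list[Branch] = field(default_factory=list)


def branch_stats(branch: Branch) -> Counter:
    """Count entry types for this record and every record merged into it."""
    totals = Counter(branch.entries)
    for child in branch.merged_in:
        totals += branch_stats(child)
    return totals


# Made-up example data.
root = Branch("PID-1", ["source added", "relationship added"], [
    Branch("PID-2", ["source added", "source deleted", "unmerge"]),
    Branch("PID-3", ["relationship deleted"], [Branch("PID-4", ["source added"])]),
])

print(branch_stats(root))               # whole tree
print(branch_stats(root.merged_in[1]))  # just the PID-3 branch
```

Calling it on the surviving record gives the whole-tree picture; calling it on one merged-in record gives just that branch, which matches the idea of showing stats between nodes.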
-
I don't ignore anything.
Let me explain my process a bit further.
After compiling the merge tree, I would scan it for a good place to start. In my theoretical example, that would be with Henry Jones since something is clearly amiss here. I start with a few assumptions: 1) Henry Jones is a real person and either 2) his great-granddaughter spent a lot of time and effort researching him in 1940, submitted his information to FamilySearch, and expected it to be maintained there, or 3) his record is the outcome of an indexing/extraction project and his information came directly from an original historical source document.
I would use his ID number to open his page. If it has been restored, wonderful! I don't need to do anything. If it still shows as him being merged into a John Smith, then I open his Change Log and scroll to the very first entry where I will find how he was originally created in Family Tree. Then I track forward to see what edits might have been made to his record prior to him being merged away. Once I thoroughly understand his record and its history and who he was and confirm that he was never the surviving John Smith and that the merge was completely spurious, I would restore his record.
Next I would use his Change Log to confirm that any sources on him restored properly and that family relationships that were lost also were restored. If not, then I do this manually. I remove any information that shows on him that the merge created.
When I am done, his great-great-grandson will be able to find his record again, just as his mother assumed he would be able to.
Finally I go to the primary John Smith and comb through his record making sure that I remove all information, sources and anything else that came from Henry Jones.
I would then repeat this process one name at a time for everyone in the merge tree, leaving behind good reason statements. Sometimes I will place a note in more complex Change Logs regarding what I have done. I do this by creating a Custom Fact with a reason statement such as "I have gone through this individual's record and fully confirmed correct merges and reversed all incorrect merges. Merges prior to this point in the Change Log do not need to be evaluated again." I then delete the Custom Fact which puts it as a nice marker in the Change Log.
-
I am a very visual thinker. So, thinking about what visual aids might help me, I worked up my own version of a merge tree from the change logs of some profiles I have worked on for a few years now. Here is my result.
Summary: Recently, there was a brief merge of GWQG-WXG and LWYN-SFP followed by a restore that left children behind. Both profiles previously had other profiles merged in. During post-restore cleanup this week another duplicate was merged in.
I don't find these merge diagrams very useful, but perhaps others would. I would like to hear from more contributors about their detangling process.
I don't find these merge diagrams very useful because when I detangle profiles I work from sources, from the profile Sources page, first and foremost. There are many reasons why I work this way, but a big one is that I would be reworking the sources anyway, so I "kill two birds with one stone". Usually, by the time the sources are done most of the detangling is done and I can see clearly how to split or merge the profiles. Working with sources means spending a lot of time in the Source Linker. There, I find and fix cross-linked sources. I find and merge duplicates before the Hints system suggests them. I find additional children and spouses. I see which children belong to a couple and which don't.
When I know a profile is conflated the first thing I do is decide what does not belong. (Often I can see what does not belong but am not sure yet about what does belong.) I may look at the bottom of the change log to see the original name, but often the first contributor's intention is ambiguous: "Maria" and the profile conflates 2+ Marias, or "Mrs His Name", or "?" So, more often I look at the order in which the sources were attached. That usually tells me all I need to know, and most profiles have much shorter source lists than change logs.
The next thing I do is detach sources belonging to the other person. Then I remove details (event dates and places, etc.) belonging to the other person.
Basically, Gordon Collett and I work in opposite directions to achieve the same result.
My direction of work involves very little analysis. Rarely do I need to diagram. Detaching sources has no impact on ordinances. Deleting Residence dates and places also has no impact on ordinances. Yet, this weeding clears the flower bed so I can see the flowers.
-
I've heard it said that weeds are just flowers growing in the wrong place.
When you weed out a residence date and place by deleting it, what do you do with that information? Just discard it? In a conflated record, it belongs to someone.
When I analyze the Change Log first, I am able to use that residence information to confirm who is who in the list of underlying individuals and transplant the information to the correct record by restoring the record it came from.
-
When you weed out a residence date and place by deleting it, what do you do with that information? Just discard it? In a conflated record, it belongs to someone.
Usually I just delete it from the profile. I do not worry about finding a home for every iota.
- In a conflated record, unsourced detail often does not belong to anyone.
- Events not coming from historical records usually are guesses, and most guesses are wrong. This is misinformation and does not belong anywhere on Family Tree.
- Events are required to come from historical records. If an event is not in an attached historical record, or at least a note, I have no way of knowing which person it belongs to.
- Very often, when I delete the unsourced misinformation I am rewarded immediately with a burst of new duplicates and source hints.
- If I find credible life history details in a reason statement I diligently work them into the profile or put them in a note, and reference the Change Log.
- Anything I delete remains in the Change Log for future reference.
- When I work the other profile (as I usually do), I add to its details page any events in the attached historical records.
In short, I prune hard and don't worry about the twigs I cut off. I keep it simple.
-
Perhaps this idea could be broken into smaller, more atomic pieces. I suggested two:
Merge tool display Change Logs
Merge tool option to close and open sections (the current actual title is an error; I have asked mods to fix it)
-
The suggestions by @dontiknowyou are viable ideas which should be considered.
The change log map as described by @Gordon Collett and myself would be ideal for the way I deal with tangles. I first survey the landscape to get an idea of the scope of the problem and hopefully the key issue(s), then I look for a specific problem to solve that will simplify the tangle. Depending on the situation, the specific problem may be a peripheral issue that clears away part of the mess or a basic issue that may split the tangle into parts. Solve and document that bit. Then pick another bit to solve. I keep going until I don't see any more problems I can solve.
The change log map as described would be used as a single tool with options to easily survey the landscape, quickly focus on specific issues, go to the related historical records and initial data loaded into FamilySearch, formulate an action plan based on the facts as now known, and go to work based on the specific situation. One does not need/want/use all of the functions at once or every time.