Is the user "TreeBuilding Project" taking the tree forward or wasting time?
Answers
-
@MandyShaw1 I try to be as responsive as I can with all of the questions that I receive. The only thing that I have ever asked of people in my posts on Facebook or elsewhere is to just share an example of a duplicate we created or propose other ways that we can ensure that no one is missed on the Family Tree. Sometimes I don't respond because I don't know about a thread on one of our projects. For example, I didn't know about this thread until yesterday. I'm going to try to respond to as many of the comments/questions in this thread as I can.
One thing to note if you go back and look at the Facebook comments, is that the people I interact with have rarely provided an example that we can discuss. Even in this thread, the examples were all about sources that were attached by individuals, not via automation. We are working to improve the quality of the work of the people helping. Here is a link to the principles that guide our efforts as a lab: https://bit.ly/rll-principles You can see we put a lot of focus on involving more people in helping and allowing them to get better at the work they help with.
0 -
@No one in particular You are correct that we are very focused on engagement and onboarding new users. We are a research lab on a college campus and so we are very focused on the rising generation. We always want to be as inclusive as possible in who participates in this work. Here is a link to our principles and metrics: https://bit.ly/rll-principles Most of our work is focused on parts of the Family Tree that have been overlooked in the past (African Americans, Puerto Rico, Immigrants from non-European countries). However, given the scale of what we do, sometimes the work we do touches on more trafficked parts of the tree. It is helpful to note that if someone has already attached the 1910 census or the Numident record, it would not even be in our sample to begin with. Especially with the 1910 census, when we add a duplicate to the tree, it is often a record that has eluded detection by experienced researchers for a long time (usually because of misspelled names and such). This is why I always ask for examples of duplicates we have created. Often the value of the info in the record that we helped be found is worth much more than the time it takes to do the merge. I'm not trying to add to people's workload; it is just that, as an economist, I always look at both the costs and benefits of what we do.
0 -
Even if some of your comments do not address the ongoing problems many of us long-standing Family Tree users are experiencing as a result of profiles / data being added (to FT) through your projects, I am very grateful for the time and effort you are applying, in your responses to us as individuals. Thank you for such engagement.
6 -
@MandyShaw1 Thanks for letting me know. I won't edit comments in the future. I always notice the typos after I hit post and go back and fix them.
0 -
I've finished the investigation I've been doing into the effects of the BYU NUMIDENT activities.
The write-up is here: Numident summary.pdf (I won't paste it here as I'm sure I'll want to correct things after the 4 hour cutoff).
The document includes some general thoughts, some statistics based on a range of FT profiles in my analysis database, and a list of associated FS-identified FT duplicates for @Joe Price 4 (my sample is too small and ill defined to be statistically significant, but out of 241 profiles created by BYU (automatically or otherwise), 21 are currently identified by FS as belonging to one of 14 duplicate sets).
If anyone feels there is anything that needs correcting or that could do with more detail, please reply to this.
3 -
@MandyShaw1 Thank you for providing some detailed analysis. I particularly appreciate your discovery that in several cases the NUMIDENT sources are attached to only the parent-child relationship, and not to the individuals. That seems to be an odd choice, given that FSFT record hinting is primarily presented for individuals, and the failure to attach these sources to individuals means that they will often appear on the person profile pages as unattached hints. I don't want to get into yet another lengthy discussion of the challenges with important information stored in parent-child or couple relationships being difficult to find, but wanted to thank you for contributing that important note.
A couple of suggestions for your document:
- You refer to "20th century US Censuses" but the projects include the 1880 and 1900 US Censuses, which were in the 19th Century. Maybe just drop the "20th Century" qualifier.
- The abbreviation BYULL is quite odd — I've never seen it used except in this thread. The BYU Record Linking Lab refers to itself as RLL or sometimes more completely as BYU RLL. It would be better to refer to the entity as it refers to itself.
2 -
Many thanks @Alan E. Brown - changes made as suggested.
I agree that the child/parent relationship point is important, especially given that two out of 14 of the duplicate candidate sets overlap with the 8 profiles showing the C/P behaviour (not necessarily cause and effect, obviously, could just be multiple but unconnected errors from the same user).
0 -
@Joe Price 4 Another example of an inaccurate treehelper contribution.
Common sense tells us that a child born 4 years after the death of the father is not a child of that couple.
A treehelper added Francis Conlin GRMH-VD7, born 1922, as a child of Frank Conlin GZ6N-LQ9, who died in 1918, and his wife Kathryn Kinsella GZ6N-BB9.
https://www.familysearch.org/tree/person/changelog/GRMH-VD7
The NUMIDENT used as the basis of that addition shows the mother's name as Katherine Conlin - another red flag since the Social Security application should contain the mother's maiden name.
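The date check described above can be sketched in a few lines. This is only an illustration under stated assumptions, not anything the treehelper tools are known to run: the function name and the one-year allowance are hypothetical, and only whole years are compared.

```python
from typing import Optional

# Assumption for illustration: a child may be born up to about one
# year after the father's death, but not later than that.
MAX_YEARS_AFTER_FATHER_DEATH = 1


def birth_conflicts_with_father_death(
    child_birth_year: int, father_death_year: Optional[int]
) -> bool:
    """Return True when the child's birth year is implausibly late."""
    if father_death_year is None:
        # No death year recorded, so there is nothing to compare against.
        return False
    gap = child_birth_year - father_death_year
    return gap > MAX_YEARS_AFTER_FATHER_DEATH


# Years taken from the example above: child born 1922, father died 1918.
print(birth_conflicts_with_father_death(1922, 1918))  # True
```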
1 -
@Joe Price 4 The NUMIDENT projects mass generate 3-packs of orphan profiles. These groups of orphan profiles aren't attached to any family. NUMIDENT-project creations haven't exclusively been orphan profiles but what we're seeing indicates non-orphans will be the exception.
This is the same primary issue that made the USCensusProject an utter misery to coexist with on FS. USCP created an uncountable number (millions by every indication) of orphan profiles and then walked away from them. It's a massive slog and we're still stuck cleaning it up. Because BYULL won't.
Creating orphan profiles in bulk is also what NUMIDENT project members are doing. Here are some of the issues with doing that.
- Under scrutiny, it is difficult to see how orphan profiles bring worthwhile value to the FS tree.
The selling point of NUMIDENT is that it connects project members to family history, is it not? An orphan group is connected to nothing at all. They are abandoned and aren't part of the FamilySearch tree until sometime later - when a volunteer happens upon them and stops their own work to reconcile them.
- This is a much more important reason. Orphan profiles eat time. They convert trivial tasks into onerous ones. Attaching 3 NUMIDENT records takes seconds.
All the possibilities that can happen after orphans are created - they will take much, much longer.
Every time I come across an orphan profile I get reminded of that. After the 50th it feels stressful. After hundreds, I've run out of adjectives to describe how it feels. Until a couple of months ago, the frequency was slowly declining. Now it's not.
3 -
I will see what my analysis database can come up with re these 3-packs. It would be good to know whether you are seeing FS flag these duplications (I now have flagged duplicate information for each profile), or whether they are more deeply hidden, or a mixture of the two, please.
Currently looking at extending the analysis to cover USCensusProject, obviously I'll include the above in that.
1 -
@Joe Price 4 wrote:
you will have to clarify the thing about 12 years.
As @No one in particular wrote:
Orphan profiles eat time. They convert trivial tasks into onerous ones.
The "seed" data in the Tree, which was created twelve years ago, consisted in very large part of such orphan, disconnected triptychs (parents and one child). Cleaning up such a group of duplicate profiles into a single set of parents with all of their children involves many, many merges, which are difficult and tedious and stressful. It doesn't help that the task seems endless: every time I work on a branch from a place and denomination that had indexes on FS before 2012, I encounter these index-based disconnected duplicates. Every. Time. Even after twelve years.
4 -
A while after I add members to a family, a duplicate notice pops up. It's alerting me about a duplicate in a NUMIDENT orphan 3-pack.
Sometimes it happens an attachment or two later. Sometimes it's closer to when the family is filled out. Today I had one pop up days after I created the individual. A few times, the orphan group didn't surface until I searched for additional records.
Regardless, I then stop my work and begin the prep surrounding the merge process. Over and over and over again this is happening - and did happen - and will keep happening.
This week, the orphan creator I see most often is the USCensusProject, sometimes with spacing between the words (seemingly different usernames).
The person who responds when I send Chat requests to those accounts is @Joe Price 4.
2 -
@MandyShaw1 Combining your post and mine. Yours first:
out of 241 profiles created by BYU (automatically or otherwise), 21 are currently identified by FS as belonging to one of 14 duplicate sets.
and mine
Sometimes [the duplicate notification arrives] an attachment or two later. Sometimes it's closer to when the family is filled out. Today I had one pop up days after I created the individual. A few times, the orphan group didn't surface until I searched for additional records.
The inconsistency of duplicate notification might indicate that FS is sharply under-detecting duplicates.
I've no complaint with that, there are a ton of variables tied to detection and over-detection can bring real problems. Under-detection is prudent.
edit: Maybe there's a way to bolster those detection numbers.
more edit: The ways that duplicates come into being vary. It isn't as simple as "Does this person already exist?" It's also "Is there a likelihood this person will be created organically in the future?"
I suggest this is why orphan profiles are a clear metric to gauge the burden created by NUMIDENT programs. If an orphan profile exists, someone is going to have to spend extra time on it.
2 -
Perhaps a more meaningful metric would be, "what percentage of RLL-created profiles have ever been merged"? From what Joe has said here, it seems like they keep a list, so it should be easy enough to generate.
Obviously it won't include the previously-cited 2% of outstanding unmerged profiles already flagged as duplicates, but it would be another way to assess the effectiveness, or lack thereof, of these projects.
2 -
@Joe Price 4 Folks are trying to tell you just one thing. Bulk orphan profile creation needs to stop. Forever.
That's what to not do. What to do?
Only attach NUMIDENT records to profiles that are already attached to the tree.
If tree-attached profiles cannot be found, STOP.
If another orphan profile group is found, STOP. Do not expand that mess by attaching there.
Alternatively, if you found another orphan group, locate the family that orphan group belongs to. Merge the orphans in. Start doing the work that BYULL left for us to do.
Does any of this seem unreasonable?
0 -
I certainly agree that bulk orphan profile creation needs to stop. BUT, I'm also seeing existing profiles damaged by the attachment of similar-name NUMIDENT records. We don't need that damage to the integrity of the FSFT.
4 -
Yeah. Now that you bring it up, I've run into that.
0 -
To all: How do you think the NUMIDENT program should conduct itself?
0 -
I opened my Following list first thing this morning. What was at the top of the list?
A treehelper contribution attaching the 1920 census to GD4H-TVY who died in 1907.
You know what I'll be doing this morning, since that census was also attached to his whole family, in error.
The Profile Quality Score (PQS) rightly calls out a conflict of "The death happened before the residence." Shouldn't the treehelpers check for error messages and conflicts?
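The conflict quoted above comes down to a simple year comparison. The sketch below only illustrates that rule; it is not how the PQS is implemented. The function name and the sample data are hypothetical (the 1900 census entry is made up for contrast), and real records would need full dates rather than years.

```python
from typing import Dict, List, Optional


def events_after_death(
    death_year: Optional[int], event_years: Dict[str, int]
) -> List[str]:
    """List the attached events dated after the recorded death year."""
    if death_year is None:
        # Without a death year there is no conflict to report.
        return []
    return sorted(
        label for label, year in event_years.items() if year > death_year
    )


# Death year 1907 and the 1920 census come from the example above;
# the 1900 census entry is hypothetical.
attached = {"1900 census": 1900, "1920 census": 1920}
print(events_after_death(1907, attached))  # ['1920 census']
```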
2 -
@MandyShaw1 This report that you have created is very useful. I was able to create a short report about the two duplicates that were in your report that were created by the Tree Building Project: https://docs.google.com/document/d/1URcEw1vr8lkrBBJI8v3TXxJZOqzbnA_FzVRcOFlIfcQ/edit?usp=sharing This is the type of analysis that I try to do when people send me examples of duplicates.
All of the other duplicates that you flagged were created by tree_helper accounts which are not automated. That was really helpful to have you point that out. I will work with FamilySearch to identify all flagged possible duplicates that were created by those accounts and we'll work to resolve those duplicates similar to what we are doing for our automated accounts. If you want to email me the list of the PIDs you are working with, I can probably automate some of the statistics that you are gathering, which I think are great.
0 -
@No one in particular I can describe the value of adding seedlings to the Family Tree. Our primary goal with the Numident project is to ensure that when people search the Family Tree for their family, they will find one of their closest deceased ancestors. One of the great things about searching the tree is that they can do that without having an account. In the future, this means that it will also come up more often on Google searches and allow people to learn about FamilySearch. I think this will be one of the best ways that we draw more people to work on the Family Tree.
I agree that a connected tree is the best tree but even a single seedling (what you call an orphan) provides a place where memories and photos can be preserved. Also, seedlings grow over time as we attach additional record hints. There will be cases where the seedlings won't grow. Many people in the Numident were born in other countries where they don't have good records. We still want them on the Family Tree.
If you are coming across the seedlings that we have added to the tree from the Numident so often, then just send us the next few that you come across. It should be someone that you are related to. I would be willing to write up a report about these duplicates, similar to what I did for MandyShaw1 this morning. If you already sent one via FamilySearch messages, I'll get to it soon.
0 -
My thoughts are (mostly cut and pasted from my document linked to previously):
Despite the long-term negative view of these activities within the wider FS user community, apparent on many Community threads and indeed on BYU RLL’s Facebook page, neither BYU RLL nor FS management appears to have made any effort to establish communication channels (in particular, for the reporting and resolution of issues), to set expectations on either side, or even to publicise the aims or workings of any of these projects to the wider community (LDS and non-LDS). All this needs to change.
‘Treehelper’ users should respond in a timely and meaningful way to other FT users who wish to discuss changes with them, escalating as necessary where they do not have the skills or knowledge to deal with the matter.
All changes made either by BYU RLL automation or by ‘treehelper’ users need to have meaningful reason statements, including (but definitely not limited to) identification of the project involved.
Those using the Power Linker, Source Linker via BYU RLL hint emails, etc., need to be better informed of the context in which they are working, so that they understand the impact their change would have on the FT before they make it. Just because a match looks great in the Power Linker (or indeed the Source Linker) does not mean that there is no contra-indication somewhere else in the FT profile's data, or Research Helps that should be taken into account, or an alert that tells them not to interfere, etc. And it may or may not be the best match to existing FT data. (My feeling, incidentally, is that the Source Linker was probably not designed to be used stand-alone in the way BYU RLL is using it.)
2 -
I understand that everyone needs to learn how to research and especially how to manage working within a collaborative tree. We were all novices once; we've all made mistakes.
Much of what I see is the conflation of similar- or same-name families. I do a great deal of New York City research, and many of the issues I've encountered are in NYC and Philadelphia. Both cities had/have large Irish populations, with many people with the same most common Irish forenames and surnames.
It takes experience and time to review similarly named family groups. It's not a project for quick attach and run.
As I've detached and repaired these families, the same records are being suggested, again, by the hinting algorithm. And the profiles I've unmerged are suggested, within minutes, as possible duplicates.
When I repair the damage done to people I'm following, I always try to find the right family group and attach the relevant records. Today, as I worked my way through the Moran family of GD4H-TVY (mentioned in my earlier comment), I found the correct people with other incorrect records attached both by treehelpers and the census project.
It feels like a never-ending kudzu vine.
5 -
@Joe Price 4 wrote:
One of the great things about searching the tree is that they can do that without having an account.
While I appreciate the attempt at preventing Yet Another Login, the fact is, if you want to do genealogy, you're going to need an account on FamilySearch.
Joe also wrote:
even a single seedling (what you call an orphan) provides a place where memories and photos can be preserved.
But only if you're logged in.
I disagree about the value of disconnected profiles. Yes, it's good to teach people to search the Tree for existing profiles, but finding that floating triptych that may or may not represent your grandfather will not be the sort of discovery experience that you want people to have: it'll preserve all of the errors and incomplete parts of the index that it's based on. For many (most?) newbies, that gives the impression of setting those not-necessarily-facts in stone, and if you're doing this specifically so they don't need to log in, you can't even demonstrate how that impression is wrong.
6 -
'If you want to email me the list of the PIDs you are working with, I can probably automate some of the statistics that you are gathering, which I think are great.'
There are over 50,000 PIDs in my analysis database, and they are pretty random, so I would suggest that you would be better off looking at the cohorts of PIDs that the TreeBuilding Project and treehelper* users have used your tools to access.
My 'Rhode Island' cohort however is well defined, it uses this search:
q.surname=ti%2A&c.birthLikePlace1=on&f.birthLikePlace0=10&c.birthLikePlace2=on&f.birthLikePlace1=10%2CRhode%20Island
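For anyone who wants to read the search parameters more easily, the query string above can be decoded with Python's standard library. This is only a sketch that decodes the text; it makes no claim about how FamilySearch interprets the `c.*` and `f.*` keys.

```python
from urllib.parse import parse_qs

# The query string exactly as quoted above.
query = (
    "q.surname=ti%2A&c.birthLikePlace1=on&f.birthLikePlace0=10"
    "&c.birthLikePlace2=on&f.birthLikePlace1=10%2CRhode%20Island"
)

# parse_qs percent-decodes the values and returns a list per key;
# each key appears once here, so keep only the first value.
params = {key: values[0] for key, values in parse_qs(query).items()}

for key, value in params.items():
    print(f"{key} = {value}")
```

Decoded, the surname term is `ti*` and the last place filter is `10,Rhode Island`.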
2 -
G85Z-R8L is my 1C1R. I entered the relevant NUMIDENT to his profile on 16 Mar 2023. Can you explain why a treehelper added the same NUMIDENT to his profile, with the same URL, today? You indicated:
It is helpful to note that if someone has already attached the 1910 census or the Numident record, it would not even be in our sample to begin with.
6 -
@Joe Price 4 (and anyone else interested)
I have added the following to my Numident summary.pdf document as linked above:
6 more TreeBuilding Project duplicate candidate sets
4 more 'treehelper' duplicate candidate sets
Some UKCensusProject statistics
89 UKCensusProject duplicate candidate sets
The more I think about it, the more I see a "would it be a duplicate candidate?" pre-creation API as a good idea (as briefly discussed earlier in this thread, iirc).
2 -
This thread is relevant:
0 -
And now I see that the USCensusProject is ALSO attaching NUMIDENTs but attaching the mother's record to the father's profile. This changelog is for my 1C1R: https://www.familysearch.org/tree/person/changelog/K46T-J83
That is the same kind of inaccurate attachment that created the problem here:
0 -
And vice versa, of course - I see the father's (Joseph's) record has been attached to the mother (Kathryn).
I hope @Joe Price 4 is still with us here and will try to address this, as well as the other serious flaws connected with volunteers (or automated processes) adding data to Family Tree, via these projects.
1