Is the user "TreeBuilding Project" taking the tree forward or wasting time?
Answers
-
Two questions which I feel need answers...
What does Prof Joe mean by 'doing projects with FS'? Is this about providing tools and manpower for existing well-defined FS-governed projects, or is it just about being given access to the FS APIs? Is FS management happy enough with BYU's activities to let it set the agenda?
And does BYU see the ordinary family history/genealogy researcher, whether professional, experienced, or starting out, as someone for whose experience it has any responsibility?
Despite considerable Googling and poking around on LinkedIn I have not identified anyone in FS who has overall responsibility for FS data as an information asset, which surprises me. There is clearly a Data Protection function which might be worth trying.
2 -
@MandyShaw1 Regarding your points:
What does Prof Joe mean by 'doing projects with FS'? Is this about providing tools and manpower for existing well-defined FS-governed projects, or is it just about being given access to the FS APIs?
My read on Prof Joe is that he seems driven by a 3-piece vision, where the pieces are
- An ever-increasing user base
- Tech
- 1+2 firehosing data into FamilySearch
To achieve this he's laser focused on getting tools-into-hands. All the critical bits (like sufficient user training) get stapled onto that - instead of being early design goals. The result is that some Best Practices exist and can be pointed to (See? There!) but they don't often influence the output in a meaningful way.
2 -
@MandyShaw1 cont.
And does BYU see the ordinary family history/genealogy researcher, whether professional, experienced, or starting out, as someone for whose experience it has any responsibility?
@KAClark2 is strong evidence that experienced researchers are connected to BYULL. But reading thru the Facebook feed we find the BYULL's excitement is about engagement and onboarding new users, not experience, not training and certainly not FS/tree integrity.
1 -
@No one in particular I meant the many ordinary researchers with no connection with BYU whose work ends up being adversely affected by BYU's activities, like the posters here in Community - does BYU accept any responsibility for their experience?
3 -
This angle definitely needs to be included in the summary you are preparing.
My deeper concern is FS' attitude to this damage by a third party to their chief information asset (not to mention the resulting damage to the perception and morale of many of the long-term users who do care about the quality of the data).
Incidentally I've nearly finished collecting up a specific data set (FT profiles born in Rhode Island with surnames starting TI, about 5,000, plus numident record personas matching the same criteria, just under 800) on which to analyse the RLL's impact in detail.
2 -
I'm sorry to not post sooner. I wasn't aware of this thread until "No one in particular" mentioned it to me in a direct message via FamilySearch. I can't post my email here but it is easy to google. I've never had my email disabled and I am really responsive via email. When people contact me via email or Facebook, I usually ask them to provide specific recent examples that they came across while doing their own family history since that allows us to have a more concrete discussion. I'm grateful for the examples shared in the posts here and we'll look at each of those and discuss them in our meetings with FamilySearch.
The Numident is a really unique and amazing record collection. It includes data for about 150 million people (if you include the parents). We are focused on the 33.8 million people that were born before 1940 and have both parents listed in the record (about 101 million people if you include the parents). We use FamilySearch's match files to check which of these people are already attached to the Family Tree and which have possible matches to the tree. For any families, where any of the family members have a hint or families where one or two people are attached but not all three, we set these up as opportunities for volunteers to help with. We share these opportunities through the Map App, the Button, Power Linker, Goldie May, or one of our google sheets. All of these can be accessed at: https://record-linking-lab.byu.edu/volunteer All of these are done by volunteers by hand and this doesn't involve any automation. We split the tasks by difficulty and many of these tools allow volunteers to work on hints for people with a specific surname or who were born in a specific place. We also work with BYU Pathway students who use the username "treehelper" followed by a letter and number so we can audit their work and provide feedback and instruction.
For the families where none of the three people have a possible match to the Family Tree, we add the family to the Family Tree using an automated tool. We run code on the profiles of these newly added seedlings to check for possible duplicates using FamilySearch's API. If FamilySearch identifies a duplicate for any of the three people, we remove the family from the tree. We can only do this if no one edits the profile. In cases where we can't remove the family, we have volunteers help us merge those duplicates. We also have volunteers that attach census records to the families that we have added. The parents in the Numident only have their names listed (with the maiden name for the mother). As we attach census records to the family, we often obtain birth info for the parents and this can help trigger the detection of a duplicate on FamilySearch, which we have volunteers help us merge.
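For readers who find it easier to follow as logic, the post-add check described above might be sketched like this in Python. It is only an illustration under stated assumptions: `Profile`, `flagged_duplicate`, `edited_by_others`, and `review_added_family` are hypothetical names, and the real check runs against FamilySearch's API, which is not modelled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Profile:
    """Hypothetical stand-in for one automatically added Family Tree profile."""
    pid: str
    flagged_duplicate: bool  # assumed: a possible duplicate was reported
    edited_by_others: bool   # assumed: someone else has edited the profile


def review_added_family(family: List[Profile]) -> str:
    """Decide what happens to an automatically added family of three."""
    # No possible duplicate reported for anyone: the family stays.
    if not any(p.flagged_duplicate for p in family):
        return "keep"
    # A duplicate exists, but the family can only be removed if untouched.
    if any(p.edited_by_others for p in family):
        return "volunteer_merge"
    return "remove"
```

For example, a family in which one parent is flagged and nobody has edited any profile returns "remove", while the same family with an edited profile returns "volunteer_merge".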
I hope that describing our process is helpful. Our motivation for this project is twofold. First, we want people who use FamilySearch for the first time to find their parent or grandparent on the Family Tree. People usually know when their ancestor passed away and so having the death info from the Numident is really helpful for facilitating this discovery experience. Second, we want to make sure that no one is missed on the Family Tree. The public version of the Numident currently has coverage for most deaths in the US between 1988-2005. Making sure that everyone in this record collection is connected to a profile on the Family Tree is one way to make sure that no one in this particular group is missed.
0 -
Again, we get very different explanations of the same processes. From one, we hear "everything is automated." From another, we read "nothing is automated."
Is the work of treehelper members audited before it goes live? If so, the auditing team may need some training. I've had to do extensive clean-up and correction of treehelper contributions.
3 -
I can't post my email here but it is easy to google. I've never had my email disabled and I am really responsive via email.
I can attest JP was responsive. At least until his email address stopped accepting my mail, mid-conversation. That happened at the same moment the USCP shut down so I took it to be a strategic retreat.
I considered that my address had been blocked but the test would be mailing from a diff account - which might be construed as harassment under those circumstances. I let it lie.
Also, the address JP referred to is different than the one I was conversing with. It had looked like he switched accounts but perhaps not.
0 -
@Joe Price 4: With kindness and to streamline future Q&A, I want to ask for something important.
Before you respond with "send me a link to an example";
- Please fully read and understand the question you were asked and
- Please answer that Q in a way that meaningfully addresses the context.
I'm mentioning this here because in BYULL comms, there is a history of misanswering questions when concerns are raised. It's kind of awful to be on the receiving end of it. More to the point, it comes off as deflection and that can raise frustration levels.
Your proactive understanding here would be enormously helpful.
1 -
@No one in particular I am fine responding to each of the comments and questions in this feed. The system here doesn't have a great way to do that for each comment individually so I would be willing to paste all of the comments in a google doc and provide a response to each one. Or feel free to propose an alternative for how you would like me to respond to each of the comments and questions.
I still would like to ask two things. First, provide the next pid that you come across while doing your own family history that is a person from the Numident who is a duplicate created by my lab. Second, let me know if you would be willing to explore ways to add families from the Numident that have no possible matches to the tree. Just let me know how many you want to try out and I can send you a random sample among the families that are left with no matches. I am open to trying any approaches that help ensure that everyone in the Numident (born before 1940, with two parents) is included on the Family Tree.
0 -
What you are continuing to ignore is the fact that the adding of names to Family Tree, through this and other projects, appears to be universally unpopular amongst Family Tree users. You have obviously convinced FamilySearch management of the benefits of adding the individuals from Numident and census projects to the tree, but - judging by the comments regularly added here - everyday FT users have a general dislike for the way your work is being carried out and monitored.
Not only are duplicates still slipping through, but (against accepted practice) female spouses are nearly always added with their married names. Also, placenames are inputted as recorded, which might be acceptable practice in indexing, but results in a huge amount of standardization being required on incorrectly spelled / formatted place names, after they are added to profile pages.
Personally, I have not found duplication of names the main issue - just the poor (yes, unacceptable) way the data is being added to newly created profiles on the tree.
I have nothing against the basic projects, which - presented in another form - offer very useful detail. It is the adding of indexed records (which, by accepted practice, should be recorded as written in the original document) in such a form to Family Tree: where entering maiden names, standardization of placenames, etc., are the accepted practice.
In short, the two main parts of your projects (indexing and then adding that data to Family Tree) are just not compatible.
2 -
@Joe Price 4 If you click on Quote below each comment to which you wish to respond, you will have the option you desire, to respond to each poster directly and individually.
Should you not wish to respond in this public forum, the option exists to send a private message to any poster in this thread by clicking on the specific username.
PIDs impacted by treehelpers, as you requested:
John Joseph O'Connor G7HD-1KP and his parents GZDM-WKJ and KG27-TVN (multiple times)
John Francis Brown LYCK-SDP, his wife LYCK-SYG, and their children.
2 -
@Joe Price 4 here's another.
1 -
Those examples were really helpful. I have copied these examples to share with my students that work with the Pathway students that work on these projects. When you see treehelper followed by a letter and number, that is a Pathway student working on a project (and not automation). We audit random samples of their work and we can also go back and check the work of specific students. I will have my students go back and check on the work done by a34 and a410. For the 7:43am comment, I'm not sure why the person (a110) doing the hint added in Milton again. I'm not sure what it looked like in Sourcelinker at the time. If Cora is the mother of all of the children, it might be good to make that connection so the kids all show up in the same place. That might have been what caused the Gedcom import to create a duplicate as well. For the 7:14am comment, I just went and created the family on the tree for the first example you mentioned and I noticed you did the same for the record that was incorrectly attached to John Francis Brown. That really helps to make sure the record isn't attached again to that profile.
One thing I do want to emphasize is that none of these examples involved automation. If it was automation, you'll usually see that the person was created by Tree Building Project, Census Project, or Community Census Project. The examples you provided will help us go back and fix any additional mistakes made by those 3 Pathway students. We audit random samples and their accuracy is really high but they do make mistakes.
0 -
@Paul W I recognize that our projects are unpopular with some users of FamilySearch, particularly people that have worked on the Family Tree for many years. However, our projects are a huge help and blessing to people that are using FamilySearch for the first time. The Numident project we are doing now will allow most people in the US to find their parent or grandparent on the tree. That seedling we put on the tree is then a great starting point for their discovery experience, especially since the Numident has such great info. One way to test out what we are trying to make possible is to just help random people you meet see their family on the Family Tree (especially individuals who are African American or have ancestors from Puerto Rico). We just did this as a booth in Midway and it was incredible when we could help people see their family. We could do this in less than a minute without them having an account. Not everyone could find their family. That is what we are trying to make possible. If they see their family, they are much more likely to set up a FamilySearch account and add in all of the info about their family they know + photos and memories.
For your comment about indexing, I agree that is an issue with census records. For the Numident, the files came from the Social Security Administration. The quality in the data is really high and the record is unique in providing the maiden name of the person's mother. We put in the maiden name for mothers but based on how sourcelinker works, we used the primary name for the focus person from the Numident record. For women, this can be a married name. We have code that can go back and replace this with the maiden name. We are just working with FamilySearch on how/when to do this.
0 -
@Áine Ní Donnghaile Some of the work we do is automated and some is done by hand by people helping with the project. Anytime you see treehelper as the user name, that is a Pathway student attaching sources to the tree. We've asked them to use specific user names so we can audit their work. I agree that the examples you shared in this thread were mistakes and having you share those were really helpful since we can go back and check all of the hints the individual has worked on. We audit random batches each week of all of the Pathway students and their average quality is fantastic though we can do more to improve that quality.
0 -
@No one in particular Power Linker doesn't have any ability to create duplicates. It is only set up to work on hints in the Numident where all three people are on the Family Tree so there are no new people to add. We usually use this tool in the context of a group we are working where we can provide live training and be available to answer questions. Based on your comment, we can add a video to that page. That is a good idea.
All of the Power Linker hints are reviewed by two people and if either says "skip" then the hint goes into one of our normal sourcelinker tools. Also, we created this as a non-logged in experience to solve a particular problem. Whenever we work with youth groups, there is always a part of the group that can't log in (or who require a long time to get logged in). We use Power Linker at the start of the activity so they can learn how to evaluate sources and make good decisions. Then we have them log in so they can use one of the other tools and those that can't log in can then continue to use Power Linker.
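The two-reviewer rule described above can be pictured as a small routing function. This is a minimal sketch, assuming each review is recorded as either "match" or "skip"; the function name and the return labels are made up for illustration and are not taken from Power Linker itself.

```python
def route_hint(first_review: str, second_review: str) -> str:
    """Route a record hint based on two independent volunteer reviews."""
    reviews = (first_review, second_review)
    for review in reviews:
        if review not in ("match", "skip"):
            raise ValueError(f"unexpected review value: {review!r}")
    # Both reviewers must agree the hint is correct before anything is attached.
    if all(review == "match" for review in reviews):
        return "automated_attach"
    # A single "skip" sends the hint to the ordinary, logged-in tools.
    return "sourcelinker_queue"
```

The point of the design is that one doubtful reviewer is enough to keep a hint out of the automated path.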
The rising generation wants to help. My experience in the past is that they would click on recommended tasks or crawl their tree looking for things to help with and it would often get them into tasks that were beyond their ability. The Power Linker hints are usually pretty easy and are a great place to start out beginners so they grow their skills.
0 -
So, just to take it right back to the beginning, in my initial post at the top of this thread, I referred to
… user "TreeBuilding Project" add[ing] a source record index (which is from the "United States, Social Security Numerical Identification Files (NUMIDENT), 1936-2007") to profile GFS1-ZS6, Amelia Matthews …
If I understand you correctly, based on the user name, that is an automated addition (which took place on 4 July 2024). One of my concerns listed at the top is that "they", or, as I now understand it, some software, ignored her death date of June 1994 which is there on the NUMIDENT record.
Should the software have ignored that death date? If it shouldn't have ignored it, can someone explain why it did? The profile is there for your guys to have a look at if necessary but please note that I have added the death date myself, subsequently.
(NB - if your guys wade through the latest Changes, they will see that I detached and re-added that NUMIDENT source in an endeavour to understand what might have been going on. )
Edit - clarified qn from "If it shouldn't have ignored it, can someone explain why?" to "If it shouldn't have ignored it, can someone explain why it did?"
1 -
OK, so this is an automated addition to the tree, then?
I'd characterize that profile as junk: the NUMIDENT file has a glaring error, which has now been propagated verbatim to the Tree. (His name was clearly Jozsef. There's no such thing as Fozsef.) This type of indexing-error-preservation is a longstanding problem with FamilySearch's Tree data, because the "seed" profiles from prior systems consist almost entirely of verbatim index entries. The fact that such additions are still being made is highly disheartening, given that the aforementioned seed data still hasn't been cleaned up, after twelve years of work.
1 -
@Adrian Bruce1 This was not an automated add. Amelia was added to the tree in 2022. This was an automated attach in July 2024 that came from our Power Linker Tool. In Power Linker, two different volunteers will evaluate a record hint and if both agree it is correct, then we use an automated attach tool to connect the Numident record to the profiles of the three people. That automated attach uses the FamilySearch API so I'm not sure why it brought over some info but not the death year. I can look into this. We have been talking with FamilySearch about going back and making edits to the tree for these profiles and possibly other profiles that have the Numident attached. We would love to bring over the maiden name, exact birth dates, exact birth places, and death dates to the profiles. We don't have permission to do that yet.
Our Power Linker tool is the only tool that creates automated attaches. All of the other tools, like Map App, the Button, 5-a-day, etc., involve people making live edits to the Family Tree using Sourcelinker. In those cases, people can use the edit features in sourcelinker to bring the more detailed info over to the profile, if they choose.
0 -
@Julia Szent-Györgyi Yes, this was an automated addition to the Family Tree. I agree that the first name of Joseph Selmeczi's father is likely spelled wrong. However, I am really glad that we added Joseph Selmeczi to the Family Tree. The info in the Numident for him is really detailed and this was the only record on either FamilySearch or Ancestry that listed the names of his parents. He shows up in the SSDI and an obit but neither has his parents' names. I would prefer a slightly misspelled name to none at all. Joseph was born in 1930, nearly the same year as my grandfather. Now any one of his grandchildren will be able to find his profile on the Family Tree and add to it their photos and memories and other info they know.
0 -
@Áine Ní Donnghaile Anything you see done on the Family Tree by "TreeBuilding Project" was done using an automated tool. If you see that "TreeBuilding Project" added the person to the tree then it was our automated add tool and if you see that the person was added previously and that "TreeBuilding Project" did the attach, then it was a record hint completed by two volunteers using our Power Linker tool.
The automated add tool is only used for Numident records where none of the three people in the record have a possible match on the Family Tree based on the matches that we obtain using the FamilySearch API. The automated attach tool is only used for easy hints where all three people in the Numident have a match on the Family Tree and for which that hint has been verified by human volunteers using Power Linker.
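Put together with the earlier description of the volunteer tools, the criteria above amount to a three-way split. Here is a rough Python sketch of that split, assuming each Numident record covers three people and that all we know about each is whether a possible match exists on the Family Tree. The function name and labels are hypothetical, and the "easy hint" judgement is not modelled.

```python
from typing import Sequence


def triage_family(has_match: Sequence[bool]) -> str:
    """Pick a workflow for one Numident family (child, father, mother)."""
    if len(has_match) != 3:
        raise ValueError("expected exactly three people per record")
    matched = sum(has_match)
    if matched == 0:
        # Nobody has a possible match: candidate for the automated add.
        return "automated_add"
    if matched == 3:
        # Everyone has a match: two volunteers review it in Power Linker
        # before any automated attach happens.
        return "power_linker_review"
    # Partial matches go to the hands-on volunteer tools.
    return "volunteer_tools"
```

So `triage_family([False, False, False])` lands in the automated add path, and a family with only one matched person is left for volunteers.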
0 -
@Gordon Collett "Tree Building Project" is not a group of volunteers, it is the username we use when we do automated adds or automated attaches to the Family Tree. In the previous post, you can see the criteria we use for each of those. We work closely with FamilySearch to decide on the record collections and inclusion criteria for these projects. In terms of the ability to message us, we try to respond to all of the messages that we get on FamilySearch for this username and we have BYU students that help with this. I didn't know about this thread until yesterday or I would have responded to your questions earlier. Sorry about that.
We also provide lots of ways that volunteers can help with our Numident project. Here is a link to a google doc that we shared on our Facebook page with 10 ways that volunteers can help with the Numident project:
https://docs.google.com/document/d/1sLvol63Q5TQcWEEd_PhBfaoNjiM8so0DRl5UsGz8hzw/edit?usp=sharing
0 -
@No one in particular In answer to your question, we do have an amazing group of volunteers that are helping merge duplicates and attach additional census records to the families that we have added to the Family Tree as part of our previous projects (including the USCP). Each week, we run code that checks for duplicates and hints for the PIDs from those previous projects and we work with a group of 1,300 volunteers that do really amazing work. We also often remove families from the tree where one of the people in the family has a possible duplicate (if no one has edited the tree) and turn that original family into a record hint.
0 -
@No one in particular I'll try to provide a response to your questions. (1) The US Census Project had several phases. First, we added 1.6 million African American individuals from the 1900 census (April 2020). Second, we added 553,000 individuals living in Puerto Rico in the 1910 census (Nov 2020). Third, we added 15 million individuals from the 1910 census (March 2021). Fourth, we added 807,000 African American individuals from the 1910 census (April 2022). Fifth, we added 3 million individuals from the 1910 census (August 2022). Sixth, we added 5.5 million individuals from the 1900 census (Sept 2022). Seventh, we added 11.6 million individuals from the 1920 census who were born before 1913 (Nov 2022). Eighth, we added 5.5 million individuals from the 1880 census (Feb 2023). We have other projects we have done but those are the ones for the US Census Project.
Right now, 2% of those PIDs have a flagged possible duplicate on the Family Tree. We are working to merge those duplicates. 2% is a low duplication rate and lower than what I usually find when I crawl the 6 generations up and down of a member of the church. However, the number of people we have added to the tree for these projects is large (43.6 million) so it does result in lots of duplicates. However, this is why I ask people to send us an example when they are frustrated about having to merge one of these duplicates because I usually find that they often still helped add new people to their family line and also often provide a census record (with all its rich info) that eluded detection for many years.
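To put the percentages in absolute terms, here is a quick check in Python using only the figures quoted in this post. The variable names are mine; the phase sizes, the 2% rate, and the 43.6 million total are the ones stated above.

```python
# Phase sizes in millions, in the order listed for the US Census Project.
phases_millions = [1.6, 0.553, 15.0, 0.807, 3.0, 5.5, 11.6, 5.5]

total_millions = sum(phases_millions)    # about 43.56, i.e. the quoted 43.6
flagged_millions = total_millions * 0.02  # 2% with a flagged possible duplicate

print(f"total added: {total_millions:.2f} million")
print(f"flagged possible duplicates: about {flagged_millions * 1_000_000:,.0f}")
```

The eight phases do add up to the quoted total, and 2% of it works out to roughly 870,000 profiles with a flagged possible duplicate.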
(2) The BYU Record Linking Lab has thousands of volunteers. We have built some tools that people really like to use. These include the Map App, the Button, 5-a-day, Power Linker, and our google sheets. We also partner with Goldie May which has a volunteering tool that reaches lots of volunteers. We also work with 1,300 church service missionaries at FamilySearch that are really amazing. Those volunteers help merge duplicates, work on challenging Numident hints, and find info needed to convert yellow temples into green temples. We organize Just Serve activities, work with youth and YSA groups, teach classes at BYU education week and other venues, and make helpful videos. We really focus as a lab on helping more people get involved with family history. They might make some mistakes in the beginning, but they will be the rising generation that takes on this work at a much larger scale than we thought possible in the past.
One last thing is just the math we are working with. The FamilySearch Newsroom reported that there were 80 million people added to the Family Tree in 2023. In that same year, 61 million people died in the world. So we made a net gain of about 19 million in terms of gathering all of God's children to the Family Tree. However, 107 billion people have already lived on earth. Even if we just focus on the 30 billion people that have lived on earth more recently, that is still a lot of people to include and would take over a thousand years at the current pace. I am excited, though, about ways we can use AI and human volunteers to accelerate this work.
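The "over a thousand years" estimate can be reproduced from the numbers in this paragraph. A minimal sketch in Python, assuming the pace is measured as people added minus deaths in the same year (variable names are mine):

```python
added_2023 = 80_000_000            # people added to the Family Tree in 2023
deaths_2023 = 61_000_000           # deaths worldwide in 2023
recent_population = 30_000_000_000  # people who lived on earth more recently

net_gain = added_2023 - deaths_2023            # net progress per year
years_needed = recent_population / net_gain    # years at that constant pace

print(f"net gain per year: {net_gain:,}")
print(f"years needed: {years_needed:,.0f}")
```

The subtraction gives a net gain of 19 million per year, and at that rate 30 billion people would take a little under 1,600 years.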
0 -
@Joe Price 4 wrote:
The info in the Numident for him is really detailed and this was on the only record on either FamilySearch or Ancestry that listed the names of his parents.
And the automated addition has served to deprive his relatives of the joy of discovery of that NUMIDENT record, and of the pride of entering his profile and that of his parents, replacing joy and pride with the annoyance — or probably more like outrage — of needing to correct an error, and very likely, the need to merge a duplicate.
No, I don't think Mr. Selmeczi's profile is a duplicate. Yet. But newcomers to FS never start with searching for their relatives. Ever. They start with entering them. That's what they've learned to expect from genealogy websites and software, and it's what FS tells them to do, on the Overview page under Family Tree: "Step 1: Start by adding what you know about your family." (Yes, it goes on to imply that they only meant living people, and Step 2 talks about searching for connections, but do you seriously think anybody actually reads that far?)
One letter in a given name may be a small error by itself, but it all adds up. Twelve years and there's no end in sight, and then you start adding more?!? As I said, disheartening.
1 -
I agree that in the past, the typical experience was for people to set up a FamilySearch account and then add in what they know and work back. There are lots more ways now to help people look for their relatives on the Family Tree (even without logging in). This is what we did at Swiss Days in Midway this last weekend, at a huge book fair in Argentina, at a county fair in Oregon, and with almost every Uber driver I have ever ridden with. There are lots of community groups starting to do the same thing.
I think you and I are thinking of two different audiences. Both are really important. The audience you are thinking of are people that already know about FamilySearch. They take the time to set up a FamilySearch account and add in what they know. The audience that I am thinking about are the people that don't know about FamilySearch but learn about it at a county fair or other event. Someone at the booth will say, "FamilySearch has a free app that lets us see our family. Would you like to see if we can find your family?" If we can give that discovery experience to them in 1-2 minutes then they will be much more likely to set up a FamilySearch account and do the process that you are talking about.
It's okay that you and I see this experience differently and focus on different audiences. I just wanted to explain the group that I most want to reach. The Family Tree is the most amazing thing ever created and I would love to help more people want to use it. I believe that being able to quickly show people their own ancestors on the Family Tree is one of the best ways to draw them in to use FamilySearch. I could be wrong about that, so maybe we could do an experiment at a county fair or some other place where we can reach out to people who have never heard of FamilySearch and see what works best.
Also, you will have to clarify the thing about 12 years. I am not sure what that refers to. I am guessing it is something about how a new FamilySearch website was created in 2012. I wasn't doing family history at that time, so I'm not sure what you mean or what it means for there to be no end in sight.
1 -
@Joe Price 4 you may need to re-post a comment you added after Julia's above and for which I got a notification. Sometimes editing comments in Community makes them disappear (a long-term but apparently intractable bug).
1 -
12 years = most of us in this thread are long-standing FamilySearch users who have been cleaning up and merging the seed profiles for many years. The thought of doing it all over again is disheartening.
3 -
That's Joe's disappeared comment, many thanks Aine.
1