Is the user "TreeBuilding Project" taking the tree forward or wasting time?
Answers
-
Yes, sorry, Paul, I took it as understood that if one side of the index was attached to the wrong parent so was the other.
I guess I should be glad the contributor (automated or otherwise) did not change the names as shown in the other example.
1 -
On a similar note: The NUMIDENT record for Margaret Jane Haney LWQW-HZM was attached today with her father's record attached to her mother, and her mother's record attached to her father. On this occasion by USCensusProject.
I suspect there are other unannounced projects underway. My watchlist is flooded with users attaching Scotland Census 1901 records. The reason I suspect a project is that they are ignoring all the other hints for these profiles: they simply attach the record and move on.
2 -
@ColinCameron - did you really mean Scotland Census 1901? Only I thought such records were firmly behind the paywall of ScotlandsPeople, and I certainly can't find a collection of any similar name. I hope they've not gotten hold of it since I was blissfully not expecting any issues in my Scottish lines.
I do have to keep an eye on the England & Wales censuses but I've not had any issues there (crossing my fingers) possibly because all the families that I follow in that era, I'd already put the 1901 on.
1 -
I believe viewing of the Scotland Census collections is solely possible by visiting a FamilySearch establishment. I have added records as sources whilst at Hyde Park Temple (though that dates back quite some years), much in the same way as for the 1939 National Register. As my Scottish ancestors / relatives were nearly all in England (or emigrated to Canada or the US) by 1901, I can only find sources I've attached to their profiles up to 1891. However, if you look at the Wiki page at https://www.familysearch.org/en/wiki/Scotland_Census (see also https://www.familysearch.org/search/collection/3212239 ) there is no indication that access rights are any different from any of the collections up to 1901. These are among the few "index only" collections I have come across with restricted viewing rights - the good thing being that (whether I or another user has attached them whilst at a FSC) the records can then be viewed from home.
So, yes, given the stated circumstances for access to the collection, I believe users are able to attach the records as sources to Family Tree IDs (not for 1911, though).
1 -
@Joe Price 4 'If you are coming across the seedlings that we have added to the tree from the Numident so often, then just send us the next few that you come across. It should be someone that you are related to. I would be willing to write up a report about these duplicates...'
Joe, the root problems here are not about specific profiles, and while the report you prepared for me is very interesting it is clearly not scalable and it doesn't attack the underlying issues (a point that others have made on the BYU RLL Facebook page).
Obviously specific examples are useful to help you diagnose root causes in your hints, in your automation code, and/or in your instructions to volunteers, but I for one don't need to know the detail, just (eventually) how you have remediated BYU RLL's approaches as a result of all this evidence.
(And I personally am definitely not limiting my investigations to my relatives.)
Also, the report you did for me indicated that you could see problems in FS' duplicate identification algorithm. I can only go on what I am told by FS as an ordinary UI user. Optimising that algorithm surely should be central to everything you and FS are doing together?
0 -
'(And I personally am definitely not limiting my investigations to my relatives.)'
Nor am I. Most of the examples I have shared in this thread are from my relatives, but I often assist others with their research.
2 -
@Paul W - thanks for that. I had no idea that FS had some of the Scottish censuses anywhere, in any form. Please note that I would never be churlish enough to complain about attendees at FHCs having access to such information when I, sitting at my home pc, don't. Rather the issue is one of whether I have to monitor my Scottish relatives for any issues created by a census project. I thought I didn't have to do so but clearly I have to - at least to some degree.
However, there are 2 things that may or may not be issues in practice with such monitoring.
Firstly, I have no means of checking the images to see if updates to profiles were correct (the mitigation for this is that I've possibly already bought those images from ScotlandsPeople).
Secondly, if there is a "cross-threaded" attachment of father to mother and vice versa (as mentioned above), I have no idea right now whether I can swap those attachments over without permanently losing the index that I'm trying to re-attach, given that I can't access the index in the Historical Records. (If I do lose access to the index in those circumstances, then I'm going to have to completely remove it from the parents. I think).
OK - so I need to watch out for those things, thanks.
1 -
And today, another cross-threaded NUMIDENT attached by USCensusProject.
@Joe Price 4 Are you still with us? https://www.familysearch.org/tree/person/changelog/LB68-1Y4
Of course, the PQS, quite rightly, calls out a conflict in the gender -
2 -
I've had a look at '3-packs' in my analysis database.
All 3 profiles created by USCensusProject on same day: 37, creation dates ranging from April 2021 to August 2024 (with 16 of the 3-packs created between 16 and 24 August 2024).
Example: GYNJ-7GD | GYNN-5HD | GYNN-KZB
All 3 profiles created by TreeBuilding Project on same day: 33, creation dates ranging from May to August 2024.
Example: GYXG-3PM | GYXG-617 | GYXL-YPH
All 3 profiles created by CommunityCensus Project on same day: 14, creation dates ranging from August 2020 to August 2024.
Example: GYXR-BBB | GYXR-FQ3 | GYXR-XV4
(Is CommunityCensus Project one of your usernames @Joe Price 4?)
Interestingly none of the 252 profiles in these 3-packs are as far as I am aware flagged by FS as duplicates.
I can provide the detailed lists if anyone wants them.
I'll have a look at the swapped over mothers and fathers next.
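For anyone wanting to repeat the 3-pack count on their own data, here is a minimal sketch of the grouping. It is not the query actually used above: the `Profile` fields, the `family_key` that ties a child to its two parents, and the sample IDs are all hypothetical placeholders.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class Profile(NamedTuple):
    fsid: str        # Family Tree person ID (placeholder values below)
    family_key: str  # hypothetical key tying a child to its two parents
    creator: str     # username that created the profile
    created: date    # creation date


def find_three_packs(profiles):
    """Return {creator: [fsid triples]} for families whose three profiles
    were all created by the same username on the same day."""
    families = defaultdict(list)
    for p in profiles:
        families[p.family_key].append(p)

    packs = defaultdict(list)
    for members in families.values():
        if len(members) != 3:
            continue
        creators = {m.creator for m in members}
        days = {m.created for m in members}
        if len(creators) == 1 and len(days) == 1:
            packs[creators.pop()].append(tuple(sorted(m.fsid for m in members)))
    return dict(packs)


# Invented sample data: one genuine 3-pack and one mixed-creator family.
day = date(2024, 8, 20)
sample = [
    Profile("AAAA-001", "fam1", "USCensusProject", day),
    Profile("AAAA-002", "fam1", "USCensusProject", day),
    Profile("AAAA-003", "fam1", "USCensusProject", day),
    Profile("BBBB-001", "fam2", "USCensusProject", day),
    Profile("BBBB-002", "fam2", "SomeoneElse", day),
    Profile("BBBB-003", "fam2", "USCensusProject", day),
]
print(find_three_packs(sample))
```

The counts quoted above are consistent with each other: (37 + 33 + 14) 3-packs of 3 profiles each gives 252 profiles.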
0 -
@MandyShaw1 - re the 3-packs, triplets, triptychs, whatever, not being flagged as dupes. Assuming the trios are 2 parents and a child, with the child being the principal, how many of those principals were women entered under a married name only?
I ask because my one genuine case of a Numident duplicate trio had the principal as a woman entered by the project only under her married name, with her undiscovered duplicate already in FSFT being entered only under her maiden name.
I gave the "original" principal an Alternate name of her married name, and within a minute or two, she and her Numident were being flagged as potential dupes. Things weren't quite as simple as that, however, since there was a single letter spelling mistake on her father's surname on the Numident triptych that dates back to the original NARA files - I checked to see if it was a FamilySearch error - it wasn't.
With only one such example, and with that spelling mistake, it may signify nothing for my small sample, but it might be worth thinking about.
1 -
And another cross-threaded/reversed NUMIDENT attachment by the USCensusProject
1 -
Yes, your examples include the three main user names that we use with our automated tool: USCensusProject, Community Census Project, and Tree Building Project. We check the families from the Numident that we add for the FS duplicate flag and work with volunteers to resolve them. I imagine that as we attach additional sources to the parents in these seedlings that they will connect in with their own families.
0 -
I think it would be worth pausing all of these automations and volunteer work to re-evaluate the goals of these projects. The goals are to improve engagement and inclusivity. The first could be evaluated based on statistics, though it may be a challenge to tie engagement or an increase in activity to these projects. Inclusion is a bit nebulous. How does one track that? I would bet there has not been a significant increase in engagement, though I am open to being proven wrong. And again, how does one demonstrate inclusion in any statistically significant way? Joe says people are discouraged when they can't find their family. I have interacted with individuals who are surprised, not in a good way, that their family were on the tree. I think it goes both ways and is BEST when a person enters their own family if they are not on the tree.
3 -
I agree that the cross-threading of the parents is a problem. It stems from the way that a small fraction of the parents in the Numident get placed in Sourcelinker with the father and mother switched. We let FamilySearch know about this issue and we'll be able to go back and flag and fix these. I don't want to discount or downplay this issue, since I agree that it is bothersome. However, it doesn't really affect the tree a lot aside from the source being attached to the wrong parent. It is easy to flag and fix and we will definitely work on that.
0 -
@MandyShaw1 What I was trying to show in my report of those specific examples is the cost and benefit of the families that we added to the tree. The second example is useful because it highlights that the family we added did require doing a merge but it resulted in connecting together 3 generations of Ronald's family. This was a part of the Family Tree that hadn't gotten a lot of previous attention, but it will now be possible for Ronald's descendants to see pretty far back when they use FamilySearch for the first time.
The interaction that I have with the FS duplication identification algorithm is the same as what you experience as an ordinary UI user. The only difference is that I use the API to access the same info that you see on the profile. This just allows me to check larger samples of profiles.
I think that all of us wish that the FS algorithm could identify more of the true duplicates. They just have a natural dilemma: in order to flag more of the true duplicates, they would also have to reveal more false positives. This is the precision-recall trade-off with all machine learning. If they changed the cut-off they use, then we would probably end up with more munged families (where two distinct families are merged together). This is also why in the first example you shared, I noted that I wasn't sure if those two families are the same and I couldn't find any other records to help me resolve my doubts about it.
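To make the cut-off point concrete, here is a small sketch with invented match scores. Nothing in it reflects how the FS algorithm actually scores pairs; the scores, labels and cut-offs are assumptions chosen only to show the direction of the trade-off.

```python
def precision_recall(scored_pairs, cutoff):
    """scored_pairs: list of (score, is_true_duplicate).
    Pairs scoring at or above the cutoff are flagged as possible duplicates."""
    flagged = [is_dup for score, is_dup in scored_pairs if score >= cutoff]
    true_pos = sum(flagged)
    total_true = sum(is_dup for _, is_dup in scored_pairs)
    precision = true_pos / len(flagged) if flagged else 1.0
    recall = true_pos / total_true if total_true else 1.0
    return precision, recall


# Invented toy data: (match score, whether the pair really is a duplicate).
pairs = [
    (0.95, True), (0.90, True), (0.80, False), (0.75, True),
    (0.60, False), (0.55, True), (0.40, False), (0.30, False),
]

for cutoff in (0.85, 0.50):
    p, r = precision_recall(pairs, cutoff)
    print(f"cutoff={cutoff:.2f} precision={p:.2f} recall={r:.2f}")
```

With these toy numbers the strict cut-off flags only true duplicates but finds half of them, while the loose cut-off finds all of them at the price of two false flags out of six.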
0 -
@melanes I think we could come up with some metrics for inclusion and we could measure this for different parts of the world or even for specific groups of people (e.g. immigrants in Oregon from the Philippines). I think the best metric is to just interact with random people from the place or group and ask if they would like to see if their family is on the Family Tree. When we did this at a county fair in Oregon several years ago it was 50%. For a while, when we did this with African Americans in the US, it was 5%. When we do it now with many immigrant groups in the US, it is still 0%. This is a reasonable proxy for the inclusivity of the Family Tree.
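On the question of statistical significance, one way to put error bars on a "found on the tree" percentage is a Wilson score interval. This is only a sketch: the thread gives no sample sizes, so the 40 people assumed below are purely hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical sample size: 40 people asked, none found on the tree.
low, high = wilson_interval(0, 40)
print(f"0 of 40: between {low:.1%} and {high:.1%}")
```

Under that assumed sample size, an observed 0% is still compatible with a true rate of several percent, so comparisons between groups or over time would need the sample sizes stated alongside the percentages.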
I'm lucky to be at BYU where I can try this out with students from lots of countries. I also try it out with many people when I travel (probably 600+ people in the last few years). I have seen so much joy when people see their family on the Family Tree. I've never seen the other thing you describe though I imagine it is possible (though very rare).
We are always adjusting the parameters of our projects based on feedback. If you are willing, let me propose two things we can do. I know I've mentioned them in other posts but they really are at the core of the costs and benefits of these projects. First, just share the next duplicate Numident family that you come across while doing your own family history. Second, let me know if you'd be willing to help add families from the Numident to the tree using whatever approach you want. These are families that we can't find a connection to the tree yet. We would be open to finding other approaches to making sure they are included on the tree. I would send you some Numident records for which we can't find any possible matches to the tree yet.
0 -
@No one in particular What you call orphans are what I call seedlings. They are a core part of the way that the Family Tree will grow in the future, especially for immigrants or other groups that have been largely overlooked by past efforts. Many of these seedlings on the tree are created when someone uses FamilySearch for the first time and adds in what they know but only knows about their grandparents. This will show up on the public tree as little seedlings. But that is where it all starts. With both ways that seedlings are created, we can then add additional sources and grow out the seedlings until they become trees and then connect into the Family Tree.
If you are sceptical of our seedling approach then just reach out to someone with ancestors from Puerto Rico or ancestors who are African Americans. These are two groups for which our seedling approach has dramatically improved the experience for people using FamilySearch for the first time. We hope to do the same for many other groups (Guatemala, Philippines, France, Uruguay, Argentina, etc.)
You and I see the value of seedlings differently and that is okay. I view seedlings as a way for people to see their ancestors when they use FamilySearch for the first time (or participate in the various booths that FamilySearch and others set up at public events). I also view seedlings as the receptacle of memories. Even a seedling of one person can still be a place where memories and photos can be preserved forever.
It would be great if you could share the next seedling that my lab has created that you come across as a duplicate when doing your own family history. I bet looking at it would help us highlight some of the benefits of the seedling approach.
0 -
I actually encountered a NUMIDENT-based triplet today. I was, shall we say, not happy about it: it meant that I couldn't use Source Linker's profile-adding shortcut at all. Instead, I had a choice between the tedium of merging or the aggravation of editing Every Single Field on three profiles. (I ended up doing one-third one way and two-thirds the other way.)
The people were my spouse's distant cousins, so the whole "a stranger touched my grandmother!" squickiness didn't apply in this case, but I am very grateful that my dad falls outside the date range for this project.
My experience points out another problem with purposely adding orphans like this: the mother in the triptych shared her full (maiden) name with one aunt and her married surname and residence with another aunt. In other words, it would've been dead easy to connect up those orphans the wrong way 'round. The existence of those three profiles made sorting things out extra-complicated, since the parents had no vitals added whatsoever, so there was no way to tell whether I was looking at the aunt or the niece.
I wonder how many of the people expressing joy to Joe were just being polite, and how many of them were happy because they didn't have the faintest clue what they were actually looking at. In my experience, people only like seeing data about their close relatives if it's all 100% correct — and we all know that that happens approximately never.
4 -
@Julia Szent-Györgyi were any of the profiles on your triplet flagged by FS as duplicates?
0 -
@Joe Price 4 In about 2003 I ran a stall at a non-genealogy show at the Birmingham NEC at which, besides more normal marketing stuff, we looked up England and Wales BMD index entries for people. They loved it and we had a queue, unlike the other stalls near us (Ha). Do you not think people would get just as much joy and engagement out of seeing their NUMIDENT records and personas on the screen, with zero risk to the integrity of the Tree? If anything, the record display is easier to interpret for the newbie, in my view. (And Julia's '100% correct' point definitely applies here.)
3 -
Joe, my point here was the absolutely key role of appropriate duplicate checking.
Agreed that FS is walking a tightrope in duplicate checking for its UI users, to avoid inappropriate merges. But your need is different, you can safely leave any profile with even a sniff of a danger of duplication for manual intervention.
I would suggest you should be asking for a differently configured algorithm for your use (and indeed the use of gedcom import) that doesn't care about false duplicate identifications because these bulk imports can just ignore them/flag them to sort out manually.
Along with which goes my previously mentioned suggestion of a pre-create 'is there any danger of this being a duplicate?' API (which could obviously use the 'bulk import' identification algorithm I've just suggested).
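A rough sketch of what such a pre-create check could look like follows. It is not an existing FS API: the `triage` function, the profile fields and the loose cut-off are all hypothetical, and the name comparison uses Python's standard `difflib` purely for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude name similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage(candidate, existing_profiles, cutoff=0.6, max_year_gap=2):
    """Decide whether a bulk import may create a profile automatically.

    The cut-off is deliberately loose: any existing profile that looks
    even vaguely similar sends the candidate to manual review."""
    matches = []
    for prof in existing_profiles:
        if abs(prof["birth_year"] - candidate["birth_year"]) > max_year_gap:
            continue
        if similarity(prof["name"], candidate["name"]) >= cutoff:
            matches.append(prof["id"])
    return ("review", matches) if matches else ("create", [])


# Invented data for illustration only.
tree = [{"id": "X1", "name": "Mary Ann Smith", "birth_year": 1901}]
print(triage({"name": "Mary A Smith", "birth_year": 1902}, tree))
print(triage({"name": "Zebulon Quartermaine", "birth_year": 1850}, tree))
```

A check this simple would still miss the married-name versus maiden-name case described earlier in the thread, so a real version would need to compare alternate names and relatives as well.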
0 -
@MandyShaw1, no, there were no duplicate flags involved in my case, since I was in the process of adding the aunt when I found the niece's nearly-blank profile. As I said, that led to extra-special fun times while I figured out that no, the SSA (and BYU) hadn't gotten things switched around. I ended up creating a duplicate of the niece and merging the NUMIDENT-based profile into it, because it was the only way I could trust myself to keep the aunt and niece straight (and get the birth and death data entered correctly and all at once, instead of engaging in the well-scattered click-fest of profile editing). It would all have been Much Easier if the NUMIDENT record had simply come up as a hint on the newly-created profile(s), the way the naturalization and censuses did.
I agree with @melanes that it is best for people to enter their own families. As @MandyShaw1 says, Search - Records would offer the same discoveries and engagement opportunities without damaging the Tree.
5 -
I agree with @Julia Szent-Györgyi and others that Search can bring about discoveries and engagement. Further, it encourages increased engagement because individuals can see something to do. I would bet that people who see their families on the tree at public events do not engage with the tree further than that event. Why should they? The work appears to have been done by someone else.
Performing experiments like this that touch existing data is never good. It leads to data corruption, as we have seen. These activities should be happening on a cloned version of the tree so they can be thoroughly tested and then offered up as record hints on the production tree. It's the safest approach, won't disenfranchise power users, and will still create discoveries and engagement for new users.
4 -
@melanes I completely agree.
Unfortunately Joe's initiative seems to have developed a vast head of steam.
There have been several years of negative comment here in Community and elsewhere, but formal communication channels with either BYU or FS management on the subject seem non-existent.
I have failed despite much googling to identify whoever in FS is the owner of these precious information assets (who seems the obvious person to raise this with).
3 -
I see the treehelpers have now moved on to the 1930 US Census. And they do like to add all the unattached clutter:
1 -
These last six months most of my time has been spent removing some of that clutter. A "Married" status being added to someone who already has a spouse and a marriage date and location is pointless clutter. A "Single" status, with no associated date, added to anyone is just pointless. We're all single, until we're not.
6 -
Another day, another cross-threaded NUMIDENT.
In the case of Joseph Burns GJ68-RCX, the TreeBuilding Project cross-attached his parents, did not update his profile to include the full date of birth, did not add his middle name, did not add his date of death.
I fail to see the purpose.
Edit to add: It seems the Community has unfortunately changed the way it treats the image posted from Imgur. That's disappointing.
0 -
This is not necessarily an issue related to this topic (TreeBuilding Project). See https://community.familysearch.org/en/discussion/comment/565432#Comment_565432 where there has previously been a discussion about the lack of usefulness of transferring marital status to the Details page, without there being any date specified. I obviously agree with your point, but the problem is a much wider issue.
1 -
This also illustrates another problem with these projects. When it's not your family and you don't study the context, then you don't know what is important or not. This happens with other casual users, but at the scale of these projects the problem becomes a very big problem for the future.
3 -
In my analysis database of approx 60,000 profiles I found four BYU RLL-attached cross-thread pairs as examples for you @Joe Price 4 (all NUMIDENT and all attached by USCensusProject on either 14 or 15 September 2024):
attachDate | record (1:2:) | persona (1:1:) | fsid | surname | given | sourceTitle
14/09/2024 | HSJ6-T1VL | 6K9Q-M434 | MWV9-TSN | McFall | John Charles | Lois B Tinkham in entry for Frederick T McFall, 'United States, Social Security Numerical Identification Files (NUMIDENT), 1936-2007'
14/09/2024 | HSJ6-T1VL | 6K9Q-M43W | LZRZ-PBG | Tinkham | Lois Brooks | John C McFall in entry for Frederick T McFall, 'United States, Social Security Numerical Identification Files (NUMIDENT), 1936-2007'
14/09/2024 | HSJ7-LJKN | 6KMN-R2ZB | L4DX-FFJ | Titus | John Crocker | Avis Edin in entry for Joan Frnnces Barnes, 'United States, Social Security Numerical Identification Files (NUMIDENT), 1936-2007'
14/09/2024 | HSJ7-LJKN | 6KMN-R2ZY | L2CM-B56 | Titus | Avis E | John C Titus in entry for Joan Frnnces Barnes, 'United States, Social Security Numerical Identification Files (NUMIDENT), 1936-2007'
15/09/2024 | HSJW-MYB8 | 6KMV-SF84 | 9VNP-L4P | Gellett | Grace E. | Harry G Timkham in entry for Ruth E Phillips, 'United States, Social Security Numerical Identification Files (NUMIDENT), 1936-2007'
15/09/2024 | HSJW-MYB8 | 6KMV-SF88 | LCRN-RK6 | Tinkham | Harry Garfield | Grace E Gillette in entry for Ruth E Phillips, 'United States, Social Security Numerical Identification Files (NUMIDENT), 1936-2007'
14/09/2024 | HSVJ-WP6W | 6K3C-8W1L | G9PB-J59 | Haney | John | Sophia Kelly in entry for Margaret Jane Haney, 'United States, Social Security Numerical Identification Files (NUMIDENT), 1936-2007'
14/09/2024 | HSVJ-WP6W | 6K3C-8W1V | 9Q86-73L | Kelley | Sophia | John Haney in entry for Margaret Jane Haney, 'United States, Social Security Numerical Identification Files (NUMIDENT), 1936-2007'
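For illustration, a swap like the ones listed above could be flagged with a very simple first-name comparison. This is a sketch of one possible heuristic, not the check used to build the list: the row layout is assumed, the source titles are shortened, and only two of the four pairs are included.

```python
from collections import defaultdict


def first_name(full_name: str) -> str:
    """First whitespace-separated token, lower-cased."""
    return full_name.split()[0].lower()


def find_cross_threaded(rows):
    """Return record IDs where exactly two attachments look swapped, i.e. each
    persona's first name matches the OTHER profile's first given name."""
    by_record = defaultdict(list)
    for row in rows:
        persona_name = row["sourceTitle"].split(" in entry for ")[0]
        by_record[row["record"]].append(
            (first_name(row["given"]), first_name(persona_name))
        )

    swapped = []
    for record, pairs in by_record.items():
        if len(pairs) != 2:
            continue
        (profile_a, persona_a), (profile_b, persona_b) = pairs
        if profile_a != persona_a and profile_a == persona_b and profile_b == persona_a:
            swapped.append(record)
    return swapped


# Two of the pairs listed above, with the source titles truncated.
rows = [
    {"record": "HSJ6-T1VL", "fsid": "MWV9-TSN", "given": "John Charles",
     "sourceTitle": "Lois B Tinkham in entry for Frederick T McFall"},
    {"record": "HSJ6-T1VL", "fsid": "LZRZ-PBG", "given": "Lois Brooks",
     "sourceTitle": "John C McFall in entry for Frederick T McFall"},
    {"record": "HSVJ-WP6W", "fsid": "G9PB-J59", "given": "John",
     "sourceTitle": "Sophia Kelly in entry for Margaret Jane Haney"},
    {"record": "HSVJ-WP6W", "fsid": "9Q86-73L", "given": "Sophia",
     "sourceTitle": "John Haney in entry for Margaret Jane Haney"},
]
print(find_cross_threaded(rows))
```

Comparing first names rather than surnames matters here, because a mother indexed under her married name shares the father's surname.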
0