Is the user "TreeBuilding Project" taking the tree forward or wasting time?
Answers
-
@Paul W agreed on all counts (and thank you), though BMD records in E&W post 1837 do seem more logical and structured (even if a bit lacking in detail) when compared with the US.
I'm just doing some more analysis (also using the BYU RLL Census Tree Dec 2023 snapshot which I found in the Computer-Generated Trees section of FS Genealogies) to look at these delays in more detail. More on this soon.
0 -
I know that RootsTech is a few months away. If anyone is planning to attend, it would be worth the effort to try to talk to FamilySearch staff in person. Clearly the emails are easy for them to ignore or set aside as not that big of a deal. Joe clearly has connections with FamilySearch. I'm assuming that FamilySearch had to have approved these projects because the BYU RLL are using scripts (maybe APIs); otherwise the activity would be flagged by systems engineers. The registering of dozens of "volunteer" accounts on a regular basis would normally be flagged as sock puppets and yet they are allowed to exist. It also appears that FamilySearch is either not reviewing these types of projects on a regular basis, both for outcomes and data integrity, or not communicating about it. This kind of activity, despite not originating from FamilySearch, reflects poorly on an otherwise amazing service and platform.
And while these projects are full of good intentions, they are being done against good standards of practice established by genealogy organizations around the world. It's easy to become enthusiastic about advances in technology and assume that technology will make the work easier. We are seeing just the opposite.
3 -
@melanes BYU has applications (though not re this functionality) listed on the partner solutions directory, which would I guess give them formal access to the update APIs; and anyway I get the impression that there is a long and close relationship between FS and BYU, with the latter doing a lot of the research and providing academic underpinning and recognition.
I would love to go to RootsTech (for lots of reasons, not just bending people's ears about BYU RLL and data quality - the sessions I joined online last time were excellent) but as I live in England it's not realistic.
Completely agree with your 'technology' point - just because you /can/ do it doesn't mean you /should/. FS, in my honest opinion, really need to keep far more of a beady eye on this.
1 -
Further email sent to FS Support Europe:
'In response to your points below:
‘We at FamilySearch Support Europe have no specific access to anyone in the BYU Record Linking team beyond the email we have already supplied … Perhaps you do not fully understand how the projects work …’
You clearly know considerably more than we have ever been told - surely you must have FS contact(s) inside or outside the Support team who would be able to help us?
To be clear, the married name point is just one of the issues we have encountered, many of which relate to communication rather than to any functionality or information matter.
Based on the extensive discussions on FamilySearch Community, plus some detailed data analysis and some use of publicly available BYU RLL tools, I have put together a problem summary document which will hopefully help you identify a contact for us: see Numident summary.pdf (please let me know if you have any trouble using this link).
Many thanks for your responsiveness and help.'
(I've also updated the summary document, as linked to in the above email text, quite a bit - as always, comments gratefully received.)
0 -
Here are some census married name vs maiden name statistics:
| Description | Number | Comments |
|---|---|---|
| Total profiles in analysis database that are set as Wife on at least one attached census record persona | 6882 | |
| Of which created by USCensusProject or CommunityCensus Project | 551 | the below values refer only to these |
| With at least one husband present | 544 | |
| Husband present and same surname | 440 | potentially require work |
| Husband present and different surname | 104 | potentially correctly set to maiden name |
| With at least one set of parents present | 52 | clear need to identify parents for as many as possible of the others |
| With at least one duplicate flagged | 50 | clear need to either merge or state not a match |
| Husband present and same surname, father present and same surname | 0 | so no-one is known to have married person with same surname |
| Husband present and same surname, father present and different surname | 1 | potentially fixable married name -> maiden name |
| Husband present and same surname, duplicate flagged and same surname | 9 | may have married person with same surname but needs checking |
| Husband present and same surname, duplicate flagged and different surname | 28 | potentially fixable married name -> maiden name |
| Husband present and different surname, father present and same surname | 51 | some minor spelling differences |
| Not changed by anyone since 1 hour after creation | 278 | so just over half of the 551 have not been touched |
| Not changed by anyone since creation in 2021 | 47 | |
| Not changed by anyone since creation in 2022 | 163 | |
| Not changed by anyone since creation in 2023 | 67 | |
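For anyone wanting to run a similar tally on their own data, here is a minimal sketch of how the surname comparisons above might be computed. It is not the actual analysis database logic: the `WifeProfile` structure, its field names, and the category labels are all hypothetical, and the sample names are made up purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class WifeProfile:
    """Hypothetical record: a profile set as Wife on a census persona."""
    surname: str
    husband_surname: Optional[str] = None  # None = no husband present
    father_surname: Optional[str] = None   # None = no father present


def same_surname(a: str, b: str) -> bool:
    # Exact comparison ignoring case and surrounding spaces (no fuzzy matching).
    return a.strip().casefold() == b.strip().casefold()


def classify(p: WifeProfile) -> str:
    if p.husband_surname is None:
        return "no husband present"
    if not same_surname(p.surname, p.husband_surname):
        return "husband present, different surname"
    if p.father_surname is not None and not same_surname(p.surname, p.father_surname):
        # Surname matches the husband but not the father: probably a married name.
        return "same surname as husband, father differs"
    return "same surname as husband"


def tally(profiles: Iterable[WifeProfile]) -> Counter:
    return Counter(classify(p) for p in profiles)


if __name__ == "__main__":
    # Made-up sample data, not taken from the Family Tree.
    sample = [
        WifeProfile("Smith", "Smith"),
        WifeProfile("Jones", "Smith"),
        WifeProfile("Smith", "Smith", "Jones"),
    ]
    for category, count in tally(sample).items():
        print(f"{category}: {count}")
```

The "same surname as husband, father differs" category corresponds to the "potentially fixable married name -> maiden name" rows; every result would still need manual checking.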
I have also been looking at the Dec 2023 Census Tree in Genealogies (https://www.familysearch.org/search/genealogies/submission/10000130/MMJZ-JR7). This could definitely be referenced to plug more of the maiden name gaps. Its algorithms link census records for different years, and in some cases, while the linking looks accurate, the records are attached in FT to profiles with different surnames that aren't necessarily flagged as duplicates.
The Genealogies-provided source display for each Census Tree entry is really useful. See
for an example (which does demonstrate the obvious dodginess of some of the linking). It helpfully looks up the source on FT for you and gives you a link to whatever it finds (and the profile links are all up-to-date, too). So a thank you to BYU RLL for providing this data set.
I might have a go at fixing the profiles referenced in my table above myself sometime, in which case I will post what I find.
0 -
Census Tree statistics following from previous comment:
| Description | Number | Comments |
|---|---|---|
| Tinkham surnames among the 551 profiles from previous comment | 42 | FS fuzzy surname matching |
| Surname matches husband's surname exactly | 32 | |
| Of these 32, Census-Tree-identified duplicates with different surnames | 4 | |
| Ditto with no parents or FS duplicate flags present on BYU RLL-created profile | 4 | demonstrating usefulness of Census Tree in obtaining hints |
| Apparently genuine duplicates with different surnames (manual investigation) | 3 | potentially fixable married name -> maiden name (via merge) |
I shall definitely be trying the Census Tree in future when I can't identify a maiden name and the dates make it sensible.
1 -
Well, I am distinctly underwhelmed to report that FS Support Europe have responded as follows:
'As we said in our latest email we at FamilySearch Support Europe have no specific access to anyone in the BYU Record Linking team beyond the email we have already supplied. If you are not receiving replies, we suggest you contact BYU more generally via [the main BYU website].'
I propose to write back pointing out that (as clearly indicated by my summary document) there are 2 problems, a) BYU's damaging updates, and b) FS' lack of control over them, and that therefore, wherever the conversation with BYU may go, we still need to escalate this within FS.
Meanwhile I will try the main BYU enquiries channel also.
Anyone got any better ideas? (Does someone want to give Report Abuse a try? I am not optimistic, but you never know until you try, and one of you might have a lot more clout than me.)
0 -
Whilst I (and I'm sure many other FT users) am supportive of the efforts you are making on the issue, it was stated from early on (by Professor Joe) that his projects have the full backing of FamilySearch management. As we have found with many other issues that have caused us concern in the past, there is no direct channel of communication with FS engineers, let alone management, so we are completely in the hands of support staff when wanting any issue escalated to that level. I am not optimistic about that happening here, I'm afraid.
1 -
@Paul W I have to agree here. As I stated on another thread, the Report Abuse channel tests our patience even on issues that are proven to be abuse. I foresee the chance of snowballs in a hot place in this instance.
0 -
Back in 2022 when I first noticed how much of a problem these projects created, I reported several profiles for abuse. I got these responses from Data Administration:
9/8/2022 "Thank you for bringing this to our attention. We have passed the feedback along to those over the project." (received twice when reporting two different duplicates of the same person)
9/14/2022 "Thank you for bringing this to our attention. Appropriate action will be taken."
10/3/2022: (received twice when reporting a husband and wife) "We have reviewed a record that you reported in Family Tree as containing inappropriate content or abuse and have determined that this situation does not qualify as abuse.
Types of inappropriate content to report might include the following:
Offensive or abusive language or content
Information that might harm or embarrass living relatives
Links to external web pages with inappropriate content
Solicitations for businesses or research services
Harassment
Political statement
Copyright infringement
Please do not use the Report Abuse feature to report inaccurate information about individuals or families, such as incorrect names or dates, or to request that the record be deleted or corrected. To correct these errors, work with the other contributors by using the discussions or internal messaging features, or use the Help feature in Family Tree to report your concerns."
Responses like the latter make me extremely cynical about whether Data Administration has any interest at all in addressing these issues. Clearly they were lying to me when they said "appropriate action will be taken", as the problem is ongoing, and I have my doubts about whether they actually passed the feedback along to the project in the first place, although it's also clear that these projects ignore this type of feedback on a regular basis.
2 -
Thank you all for your input. I do entirely see all your points.
It's sadly entirely clear that FS doesn't rein in any of BYU RLL's activities - what I am asking myself is, are the relevant FS people aware of the full current implications of activities to which they have at some point in the past given the green light? Which is why I am inclined to have one more go at escalating this, followed, if necessary, by submitting a formal complaint.
It feels to me as if BYU RLL could achieve many of their objectives in a much less damaging way if they only communicated and collaborated properly with the rest of the FT community. That is something FS could make happen, I'd have thought.
0 -
I decided to re-send my previous email (of Sunday 13th October) to BYU RLL 'in case it had got lost in transit'. I will give them a few days to respond.
0 -
I have submitted my thoughts on how duplicate handling could be improved for automated mechanisms to Suggest an Idea, given that Ideas do now seem to be 'sticking', even if most of them aren't visible to (most?) Community users.
0 -
Since recent Suggestions have all disappeared, I hope you kept a copy. I believe the only ones that have "stuck" were posted during the weekend hours when staff is not on duty.
0 -
The text is in my 'numident summary' document (and is also quoted in the comment above, in fact).
The Idea I submitted in September never got saved at all as far as I could see; the more recent ones show Permission Problem after a bit, but at least they appear to have been saved somewhere.
0 -
Hot news folks, we have had an answer from Joe to the email I sent twice to BYU RLL. As follows.
'I'm responding to your email to rll@byu.edu. Normally, those emails forward directly to my BYU email but I'm not sure why they didn't with your emails on Oct 13th and Nov 2nd. I just noticed them by chance while checking on something else.
Maybe what might help best is to just summarize your main concerns about the approach that we are taking and I can work with FamilySearch to find the best solution. I want to be respectful of your concerns and I will do all I can to respond and adjust our approach.
We would also be open to ideas for how you would ensure that everyone in the Numident (or other important datasets) are on the Family Tree. We don't want anyone to be missed. There are still 1.5 million families in the Numident where no one in the family seems to have a direct match to the Family Tree (based on the FamilySearch match files). If you would like to propose a way to add those families to the Family Tree, I would be certainly open to your ideas and your help.'
I think this is really quite positive. I will start drafting a response (and post it here before sending it, obviously), but all thoughts very welcome.
1 -
Here's my proposed response to Joe's first point (second one to follow). Comments please!
Point 1:
Maybe what might help best is to just summarize your main concerns about the approach that we are taking and I can work with FamilySearch to find the best solution. I want to be respectful of your concerns and I will do all I can to respond and adjust our approach.
Communication
There appear to be no communication channels in place between BYU RLL and the rest of the FT user base for either of the following:
Publicising the aims and workings of BYU RLL projects to the wider FT user base (LDS and non-LDS), and setting expectations.
Reporting and resolution of issues, including non-profile-specific concerns such as those listed here.
Accountability
BYU RLL projects don’t currently demonstrate the expected collaborative approach to editing the Family Tree:
BYU RLL needs to respond meaningfully and in a timely manner to messages from other FT users.
Appropriate reason statements need to be put on all changes made by BYU RLL.
Failure to take current FT information into account
Those using the Power Linker, Source Linker via BYU RLL hint emails, etc., don’t necessarily understand the impact a resulting change would have on the FT before they make or approve it:
Just because a match looks great in the Power Linker (or indeed the Source Linker) does not mean that there is no contra-indication somewhere else in the FT profile's data, in its Research Helps (or Profile Quality Score guidance), or in an alert.
Starting from a BYU RLL-chosen candidate match will always mean that the user has had no chance to judge whether this is the best match available to existing FT data.
Timing problems and best practice
We understand from FS Support that it requires multiple interventions to implement some BYU RLL changes, often with considerable delay between the steps.
These situations may leave a profile in a state that does not reflect genealogy best practice, potentially for a long time (in my analysis database, over 50% of the ‘census wife’ profiles created by BYU RLL using married names have not been touched since - including many profiles created in 2021 and 2022).
Such pending work is in no way communicated to other FT users, resulting in confusion and time-wasting (while FT reason statements and notes/Alerts are available to help with this, they do not appear to be used).
Additionally, changes that may have been made by others in between BYU RLL visits are frequently not taken into account by BYU RLL’s activities.
Handling of duplicates by BYU RLL profile creation
We understand that BYU RLL uses FS’ standard duplicate flagging algorithm in its decision making.
It appears that FS tunes the standard duplicate algorithm to minimise the flagging of duplicates that aren’t actually matches, i.e. to minimise false positives.
But BYU RLL’s profile creation needs to be able to minimise false negatives, thereby protecting both the integrity of FT and the experience of its other users.
The provision and use of a differently configured duplicate flagging algorithm therefore seems appropriate.
A pre-create 'check for duplication' API (or the option to ‘reject if duplicate’ on the Create Person API) would also be beneficial.
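To illustrate the false positive / false negative trade-off and the 'reject if duplicate' idea, here is a minimal sketch. It does not use or represent any real FamilySearch API or algorithm: the `Candidate` structure, the scoring weights, the thresholds, and the function names are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical minimal person record."""
    name: str
    birth_year: int


def similarity(a: Candidate, b: Candidate) -> float:
    # Toy score in [0, 1]: 70% name similarity, 30% birth-year closeness.
    name_score = SequenceMatcher(None, a.name.casefold(), b.name.casefold()).ratio()
    year_score = max(0.0, 1.0 - abs(a.birth_year - b.birth_year) / 10)
    return 0.7 * name_score + 0.3 * year_score


def possible_duplicates(new: Candidate, existing: Sequence[Candidate],
                        threshold: float) -> List[Candidate]:
    # A lower threshold flags more candidates: fewer false negatives,
    # at the cost of more false positives to review.
    return [e for e in existing if similarity(new, e) >= threshold]


def create_unless_duplicate(new: Candidate, existing: Sequence[Candidate],
                            threshold: float = 0.8) -> Tuple[str, List[Candidate]]:
    # Hypothetical 'reject if duplicate' behaviour for a bulk create.
    matches = possible_duplicates(new, existing, threshold)
    if matches:
        return ("rejected", matches)
    return ("created", [])
```

With a strict threshold a bulk create would go ahead wherever the score falls just short; with a looser one the same candidate would be held back for review, which is the behaviour profile creation arguably needs.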
3 -
@mandyshaw1 I think it is a well-written response. I have an underlying philosophical problem with the lab and the work they are doing, but it's probably best not to argue that. Mentioning genealogical best practices is a good start.
0 -
Thanks @melanes, & also to the 'likers'.
Extra para to go on the end of the 'duplicates' section:
Even the standard duplicate algorithm will work better the more information it is given. The timing problems mentioned in the previous section may well, therefore, lead to unnecessary delays in identifying duplicates. One obvious example is that profile creation from Numident records appears initially to ignore the frequent treasure trove of alternate names present on the record as it follows the individual through time.
1 -
Proposed response to Joe's second point follows. I'm planning on sending both halves off to him tomorrow afternoon unless anyone has any objection.
Point 2:
We would also be open to ideas for how you would ensure that everyone in the Numident (or other important datasets) are on the Family Tree. We don't want anyone to be missed. There are still 1.5 million families in the Numident where no one in the family seems to have a direct match to the Family Tree (based on the FamilySearch match files). If you would like to propose a way to add those families to the Family Tree, I would be certainly open to your ideas and your help.
My initial thoughts on this follow (and may or may not add anything!)
I assume that:
You have already created these 1.5 million FT profiles, and that they are currently in small family groups (‘triplets’ etc.) – as you know, we are seeing this a lot.
You are looking for merge opportunities that would link your family group accurately to the main Tree.
I did an experiment using some of the 40 triplets created by TreeBuilding Project that I found within my analysis database.
I successfully used the following methods to build up the evidence needed to identify potential linkages (which would obviously require detailed review before any action was taken):
1. Flagged duplicates. The standard algorithm minimises false positives, but they remain the best place to start.
2. ‘Research Helps’ that are already attached to other profiles (Research Helps are also accessible to certified partner solutions as webhints).
3. ‘Similar Records’ on the Numident that are already attached to other profiles.
4. Additional information from the Numident metadata.
5. ‘Find Similar People’, with judicious use of Exact/wildcards.
6. Lookups on the Census Tree in FS Genealogies (for possible links that FS’ algorithms can’t see).
7. Find a Grave entries (for possible links between profiles, though obviously to be taken with a pinch of salt).
8. ‘Research Helps’ not already attached to other profiles (for potential additional evidence).
My examples are here: Numident evidence collection examples.pdf
0 -
This thread has regained its correct date on my Bookmarks and on the main Discussions list!
(Edit, having checked re this comment) …. or perhaps not.
0 -
-
Yup, back to its old tricks, my Bookmarks say it was last changed by you at 1.05pm (GMT).
0 -
The more I think about this the more I realise that bulk insert activities, if they are ever to work in a non-disruptive way, have to have more APIs available to them; just as one example, allowing identification of any attached profile(s) on a given record persona. (Plus the 'duplicate precheck' I have banged on about previously.)
I can see these connections in my analysis database because I have already pulled the data, in particular re one specific family who happen to have meaningful but manageable amounts of data, and can therefore match this information at the database level, but that's not in any way a usable mechanism for actual bulk inserts.
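As a rough idea of the database-level matching described here, a minimal sketch follows. It assumes the data has already been pulled into simple (persona ID, profile ID) attachment pairs; the function names and ID formats are hypothetical, and this is not a real FamilySearch API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_persona_index(attachments: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each record persona ID to the set of profile IDs it is attached to."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for persona_id, profile_id in attachments:
        index[persona_id].add(profile_id)
    return index


def profiles_for_persona(index: Dict[str, Set[str]], persona_id: str) -> List[str]:
    # Returns an empty list if the persona is not attached to any profile.
    return sorted(index.get(persona_id, ()))


def personas_with_multiple_profiles(index: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    # A persona attached to more than one profile hints at possible duplicates.
    return {pid: sorted(profiles) for pid, profiles in index.items() if len(profiles) > 1}
```

A bulk insert could make this kind of lookup before creating a profile, but only if an equivalent API were available; pulling the data first, as here, does not scale.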
0 -
Here's the response I finally sent to Joe: BYU RLL response.pdf
I'll keep you all posted.
1