Is the user "TreeBuilding Project" taking the tree forward or wasting time?

MandyShaw1 · October 25, 2024

@Paul W agreed on all counts (and thank you), though BMD records in E&W post 1837 do seem more logical and structured (even if a bit lacking in detail) when compared with the US.

I'm just doing some more analysis (also using the BYU RLL Census Tree Dec 2023 snapshot which I found in the Computer-Generated Trees section of FS Genealogies) to look at these delays in more detail. More on this soon.

melanes · October 27, 2024

I know that RootsTech is a few months away. If anyone is planning to attend, it would be worth the effort to try and talk to FamilySearch staff in person. Clearly the emails are easy for them to ignore or set aside as not that big of a deal. Joe clearly has connections with FamilySearch. I'm assuming that FamilySearch had to have approved these projects because the BYU RLL are using scripts (maybe APIs); otherwise the activity would be flagged by systems engineers. The registering of dozens of "volunteer" accounts on a regular basis would normally be flagged as sock puppets and yet they are allowed to exist. It also appears that FamilySearch is not reviewing these types of projects on a regular basis, both for outcomes and data integrity, or is not communicating about it. This kind of activity, despite not originating from FamilySearch, reflects poorly on an otherwise amazing service and platform.

And while these projects are full of good intentions, they are being done against good standards of practice established by genealogy organizations around the world. It's easy to become enthusiastic about advances in technology and assume that technology will make the work easier. We are seeing just the opposite.

MandyShaw1 · October 28, 2024

@melanes BYU has applications (though not re this functionality) listed on the partner solutions directory, which would I guess give them formal access to the update APIs; and anyway I get the impression that there is a long and close relationship between FS and BYU, with the latter doing a lot of the research and providing academic underpinning and recognition.

I would love to go to RootsTech (for lots of reasons, not just bending people's ears about BYU RLL and data quality - the sessions I joined online last time were excellent) but as I live in England it's not realistic.

Completely agree with your 'technology' point - just because you /can/ do it doesn't mean you /should/. FS, in my honest opinion, really need to keep far more of a beady eye on this.

MandyShaw1 · October 30, 2024

Further email sent to FS Support Europe:

'In response to your points below:

‘We at FamilySearch Support Europe have no specific access to anyone in the BYU Record Linking team beyond the email we have already supplied … Perhaps you do not fully understand how the projects work …’

You clearly know considerably more than we have ever been told - surely you must have FS contact(s) inside or outside the Support team who would be able to help us?

To be clear, the married name point is just one of the issues we have encountered, many of which relate to communication rather than to any functionality or information matter.

Based on the extensive discussions on FamilySearch Community, plus some detailed data analysis and some use of publicly available BYU RLL tools, I have put together a problem summary document which will hopefully help you identify a contact for us: see Numident summary.pdf (please let me know if you have any trouble using this link).

Many thanks for your responsiveness and help.'

(I've also updated the summary document, as linked to in the above email text, quite a bit - as always, comments gratefully received.)

MandyShaw1 · October 30, 2024

Here are some census married name vs maiden name statistics:

Description	Number	Comments
Total profiles in analysis database that are set as Wife on at least one attached census record persona	6882
Of which created by USCensusProject or CommunityCensus Project	551	the below values refer only to these
With at least one husband present	544
Husband present and same surname	440	potentially require work
Husband present and different surname	104	potentially correctly set to maiden name
With at least one set of parents present	52	clear need to identify parents for as many as possible of the others
With at least one duplicate flagged	50	clear need to either merge or state not a match
Husband present and same surname, father present and same surname	0	so no-one is known to have married person with same surname
Husband present and same surname, father present and different surname	1	potentially fixable married name -> maiden name
Husband present and same surname, duplicate flagged and same surname	9	may have married person with same surname but needs checking
Husband present and same surname, duplicate flagged and different surname	28	potentially fixable married name -> maiden name
Husband present and different surname, father present and same surname	51	some minor spelling differences
Not changed by anyone since 1 hour after creation	278	so just over half of the 551 have not been touched
Not changed by anyone since creation in 2021	47
Not changed by anyone since creation in 2022	163
Not changed by anyone since creation in 2023	67
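
As a rough check on the proportions in the table, here is a minimal Python sketch. The figures are copied from the table; the variable names and the `share` helper are illustrative only and are not part of any FamilySearch or BYU RLL tool.

```python
# Figures copied from the table above; names are illustrative only.
created_by_projects = 551
husband_present = 544
same_surname = 440
different_surname = 104
untouched_since_creation = 278


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The two surname rows partition the "husband present" row.
assert same_surname + different_surname == husband_present

same_surname_pct = share(same_surname, husband_present)
untouched_pct = share(untouched_since_creation, created_by_projects)

print(f"Same surname as husband: {same_surname_pct}% of {husband_present}")
print(f"Untouched since creation: {untouched_pct}% of {created_by_projects}")
```

On these figures roughly 80.9% of the wives with a husband present share his surname, and about 50.5% of the 551 project-created profiles have not been touched since creation.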

I have also been looking at the Dec 2023 Census Tree in Genealogies (https://www.familysearch.org/search/genealogies/submission/10000130/MMJZ-JR7). This could definitely be referenced to plug more of the maiden name gaps. Its algorithms link census records for different years, and in some cases, while the linking looks accurate, the records are attached in FT to profiles with different surnames that aren't necessarily flagged as duplicates.

The Genealogies-provided source display for each Census Tree entry is really useful. See https://www.familysearch.org/service/gen/sforge/hints-view.html?personId=/ark:/61903/2:5:7JSN-NDY for an example (which does demonstrate the obvious dodginess of some of the linking). It helpfully looks up the source on FT for you and gives you a link to whatever it finds (and the profile links are all up-to-date, too). So a thank you to BYU RLL for providing this data set.

I might have a go at fixing the profiles referenced in my table above myself sometime, in which case I will post what I find.

MandyShaw1 · October 31, 2024

Census Tree statistics following from previous comment:

Description	Number	Comments
Tinkham surnames among the 551 profiles from previous comment	42	FS fuzzy surname matching
Surname matches husband's surname exactly	32
Of these 32, Census-Tree-identified duplicates with different surnames	4
Ditto with no parents or FS duplicate flags present on BYU RLL-created profile	4	demonstrating usefulness of Census Tree in obtaining hints
Apparently genuine duplicates with different surnames (manual investigation)	3	potentially fixable married name -> maiden name (via merge)

I shall definitely be trying the Census Tree in future when I can't identify a maiden name and the dates make it sensible.
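
The kind of check summarised in the table above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row layout, the IDs and the `surname_conflicts` helper are all hypothetical, and the sample data is invented rather than taken from the Census Tree or Family Tree.

```python
from collections import defaultdict

# Hypothetical rows: (Census Tree person ID, attached FT profile ID, profile surname).
# IDs and names are invented for illustration only.
rows = [
    ("CT-1", "P-100", "Tinkham"),
    ("CT-1", "P-200", "Baker"),
    ("CT-2", "P-300", "Tinkham"),
    ("CT-2", "P-300", "Tinkham"),
    ("CT-3", "P-400", "Tinkham"),
    ("CT-3", "P-500", "Tinkam"),
]


def surname_conflicts(rows):
    """Map each Census Tree person to its attached profiles when those
    profiles carry more than one distinct surname."""
    profiles = defaultdict(dict)  # Census Tree ID -> {profile ID: surname}
    for tree_id, profile_id, surname in rows:
        profiles[tree_id][profile_id] = surname
    return {
        tree_id: attached
        for tree_id, attached in profiles.items()
        if len({s.casefold() for s in attached.values()}) > 1
    }


candidates = surname_conflicts(rows)
for tree_id, attached in candidates.items():
    print(tree_id, attached)
```

Because the comparison is exact, a minor spelling variant (CT-3 in the sample) is reported alongside a possible married/maiden name pair (CT-1), so every candidate still needs manual investigation before any merge.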

MandyShaw1 · November 1, 2024

Well, I am distinctly underwhelmed to report that FS Support Europe have responded as follows:

'As we said in our latest email we at FamilySearch Support Europe have no specific access to anyone in the BYU Record Linking team beyond the email we have already supplied. If you are not receiving replies, we suggest you contact BYU more generally via [the main BYU website].'

I propose to write back pointing out that (as clearly indicated by my summary document) there are 2 problems, a) BYU's damaging updates, and b) FS' lack of control over them, and that therefore, wherever the conversation with BYU may go, we still need to escalate this within FS.

Meanwhile I will try the main BYU enquiries channel also.

Anyone got any better ideas? (Does someone want to give Report Abuse a try? I am not optimistic, but you never know until you try, and one of you might have a lot more clout than me.)

Paul W · November 1, 2024

@MandyShaw1

Whilst I (and, I'm sure, many other FT users) am supportive of the efforts you are making on the issue, it was stated from early on (by Professor Joe) that his projects have the full backing of FamilySearch management. As we have found with many other issues that have caused us concern in the past, there is no direct channel of communication with FS engineers, let alone management, so we are completely in the hands of support staff when wanting any issue escalated to that level. I am not optimistic about that happening here, I'm afraid.

Áine Ní Donnghaile · November 1, 2024

@Paul W I have to agree here. As I stated on another thread, the Report Abuse channel tests our patience even on issues that are proven to be abuse. I foresee the chance of snowballs in a hot place in this instance.

JD Cowell · November 1, 2024

Back in 2022 when I first noticed how much of a problem these projects created, I reported several profiles for abuse. I got these responses from Data Administration:

9/8/2022 "Thank you for bringing this to our attention. We have passed the feedback along to those over the project." (received twice when reporting two different duplicates of the same person)

9/14/2022 "Thank you for bringing this to our attention. Appropriate action will be taken."

10/3/2022: (received twice when reporting a husband and wife) "We have reviewed a record that you reported in Family Tree as containing inappropriate content or abuse and have determined that this situation does not qualify as abuse.

Types of inappropriate content to report might include the following:

Offensive or abusive language or content
Information that might harm or embarrass living relatives
Links to external web pages with inappropriate content
Solicitations for businesses or research services
Harassment
Political statement
Copyright infringement

Please do not use the Report Abuse feature to report inaccurate information about individuals or families, such as incorrect names or dates, or to request that the record be deleted or corrected. To correct these errors, work with the other contributors by using the discussions or internal messaging features, or use the Help feature in Family Tree to report your concerns."

Responses like the latter make me extremely cynical about whether Data Administration has any interest at all in addressing these issues. Clearly they were lying to me when they said "appropriate action will be taken", as the problem is ongoing, and I have my doubts about whether they actually passed the feedback along to the project in the first place, although it's also clear that these projects ignore this type of feedback on a regular basis.

MandyShaw1 · November 1, 2024

Thank you all for your input. I do entirely see all your points.

It's sadly entirely clear that FS doesn't rein in any of BYU RLL's activities - what I am asking myself is, are the relevant FS people aware of the full current implications of activities to which they have at some point in the past given the green light? Which is why I am inclined to have one more go at escalating this, followed, if necessary, by submitting a formal complaint.

It feels to me as if BYU RLL could achieve many of their objectives in a much less damaging way if they only communicated and collaborated properly with the rest of the FT community. That is something FS could make happen, I'd have thought.

MandyShaw1 · November 2, 2024

I decided to re-send my previous email (of Sunday 13th October) to BYU RLL 'in case it had got lost in transit'. I will give them a few days to respond.

MandyShaw1 · November 3, 2024

I have submitted my thoughts on how duplicate handling could be improved for automated mechanisms to Suggest an Idea, given that Ideas do now seem to be 'sticking', even if most of them aren't visible to (most?) Community users.

Handling of duplicates by automated Family Tree profile creation mechanisms

MandyShaw1

Nov 3, 2024

We understand from the BYU Record Linking Lab that their various census and NUMIDENT projects use FS’ standard duplicate flagging algorithm in their decision making.
FamilySearch is clearly walking a tightrope in duplicate checking for its web interface and for the interactive users of partner solutions: minimising the number of duplicates that go unflagged (and thus, probably, unmerged), but avoiding encouraging inappropriate merges.
The key point here is that merging, while not easy, is a lot simpler and less disruptive than undoing a merge. I therefore assume that FS are obliged to tune the standard duplicate algorithm to minimise false positives, i.e. the flagging of a duplicate that is not actually a match. I would suspect that they mind false negatives less, i.e. failure to flag a duplicate that is in fact a match.
In my view BYU RLL’s profile creation automation, and other automated bulk insert mechanisms such as GEDCOM import, have quite different needs. It seems to me that these mechanisms really have to avoid false negatives, because it is fundamental both to the integrity of FT and to the experience of its other users that they avoid inserting duplicates. Meanwhile such automated mechanisms have the option of leaving any profile showing any danger at all of a match to be handled separately/manually, so false positives should not be a big concern.
So, I propose that a differently configured duplicate flagging algorithm is provided by FS for automated mechanisms’ use.
A pre-create 'check for duplication' API (or the option to ‘reject if duplicate’ on the Create Person API) is also needed in my view. At present automated mechanisms appear to have to first create the profile and then later check it for matches, which feels backwards to me.
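
To make the false positive / false negative trade-off concrete, here is a minimal Python sketch. It assumes a matcher that produces a score between 0 and 1; the scores, the truth labels and both thresholds are invented for illustration and say nothing about how FS' actual algorithm works or is tuned.

```python
# Hypothetical match scores in [0, 1], each with a hand-labelled truth flag
# (True = the pair really is a duplicate). Invented data, not FS output.
scored_pairs = [
    (0.95, True),
    (0.80, True),
    (0.65, True),
    (0.60, False),
    (0.40, True),
    (0.30, False),
    (0.10, False),
]


def error_counts(pairs, threshold):
    """Count (false positives, false negatives) at a given flagging threshold."""
    false_pos = sum(1 for score, is_dup in pairs if score >= threshold and not is_dup)
    false_neg = sum(1 for score, is_dup in pairs if score < threshold and is_dup)
    return false_pos, false_neg


INTERACTIVE_THRESHOLD = 0.75  # assumed: strict, to avoid encouraging bad merges
BULK_THRESHOLD = 0.25         # assumed: loose, to avoid inserting duplicates

interactive = error_counts(scored_pairs, INTERACTIVE_THRESHOLD)
bulk = error_counts(scored_pairs, BULK_THRESHOLD)

print("interactive (FP, FN):", interactive)
print("bulk        (FP, FN):", bulk)
```

With this sample the strict threshold flags nothing wrongly but misses two real duplicates, while the loose threshold misses none at the cost of two flags that would need manual dismissal, which is the trade an automated bulk process can afford to make.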

Áine Ní Donnghaile · November 3, 2024

Since recent Suggestions have all disappeared, I hope you kept a copy. I believe the only ones that have "stuck" were posted during the weekend hours when staff is not on duty.

MandyShaw1 · November 3, 2024

The text is in my 'numident summary' document (and is also quoted in the comment above, in fact).

The Idea I submitted in September never got saved at all as far as I could see; the more recent ones show Permission Problem after a bit, but at least they appear to have been saved somewhere.

MandyShaw1 · November 4, 2024

Hot news folks, we have had an answer from Joe to the email I sent twice to BYU RLL. As follows.

'I'm responding to your email to rll@byu.edu. Normally, those emails forward directly to my BYU email but I'm not sure why they didn't with your emails on Oct 13th and Nov 2nd. I just noticed them by chance while checking on something else.

Maybe what might help best is to just summarize your main concerns about the approach that we are taking and I can work with FamilySearch to find the best solution. I want to be respectful of your concerns and I will do all I can to respond and adjust our approach.

We would also be open to ideas for how you would ensure that everyone in the Numident (or other important datasets) are on the Family Tree. We don't want anyone to be missed. There are still 1.5 million families in the Numident where no one in the family seems to have a direct match to the Family Tree (based on the FamilySearch match files). If you would like to propose a way to add those families to the Family Tree, I would be certainly open to your ideas and your help.'

I think this is really quite positive. I will start drafting a response (and post it here before sending it, obviously), but all thoughts very welcome.

MandyShaw1 · November 9, 2024

Here's my proposed response to Joe's first point (second one to follow). Comments please!

Point 1:

Maybe what might help best is to just summarize your main concerns about the approach that we are taking and I can work with FamilySearch to find the best solution. I want to be respectful of your concerns and I will do all I can to respond and adjust our approach.

Communication

There appear to be no communication channels in place between BYU RLL and the rest of the FT user base for either of the following:

Publicising the aims and workings of BYU RLL projects to the wider FT user base (LDS and non-LDS), and setting expectations.

Reporting and resolution of issues, including non-profile-specific concerns such as those listed here.

Accountability

BYU RLL projects don’t currently demonstrate the expected collaborative approach to editing the Family Tree:

BYU RLL needs to respond meaningfully and in a timely manner to messages from other FT users.

Appropriate reason statements need to be put on all changes made by BYU RLL.

Failure to take current FT information into account

Those using the Power Linker, Source Linker via BYU RLL hint emails, etc., don’t necessarily understand the impact a resulting change would have on the FT before they make or approve it:

Just because a match looks great in the Power Linker (or indeed the Source Linker) does not mean that there is no contra-indication somewhere else in the FT profile's data, in its Research Helps (or Profile Quality Score guidance), or in an alert.

Starting from a BYU RLL-chosen candidate match will always mean that the user has had no chance to judge whether this is the best match available to existing FT data.

Timing problems and best practice

We understand from FS Support that it requires multiple interventions to implement some BYU RLL changes, often with considerable delay between the steps.

These situations may leave a profile in a state that does not reflect genealogy best practice, potentially for a long time (in my analysis database, over 50% of the ‘census wife’ profiles created by BYU RLL using married names have not been touched since - including many profiles created in 2021 and 2022).

Such pending work is in no way communicated to other FT users, resulting in confusion and time-wasting (while FT reason statements and notes/Alerts are available to help with this, they do not appear to be used).

Additionally, changes that may have been made by others in between BYU RLL visits are frequently not taken into account by BYU RLL’s activities.

Handling of duplicates by BYU RLL profile creation

We understand that BYU RLL uses FS’ standard duplicate flagging algorithm in its decision making.

It appears that FS tunes the standard duplicate algorithm to minimise the flagging of duplicates that aren’t actually matches, i.e. to minimise false positives.

But BYU RLL’s profile creation needs to be able to minimise false negatives, thereby protecting both the integrity of FT and the experience of its other users.

The provision and use of a differently configured duplicate flagging algorithm therefore seems appropriate.

A pre-create 'check for duplication' API (or the option to ‘reject if duplicate’ on the Create Person API) would also be beneficial.
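
As a sketch of what 'check before create' could look like from the caller's side, here is a toy Python example. Everything in it is hypothetical: `ToyTree`, `possible_duplicates`, `create_person` and the `reject_if_duplicate` flag are stand-ins written for this illustration, not the FamilySearch Create Person API, and the matching rule is deliberately crude.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    birth_year: int


@dataclass
class ToyTree:
    """In-memory stand-in for a tree service; not the FamilySearch API."""
    people: list = field(default_factory=list)
    review_queue: list = field(default_factory=list)

    def possible_duplicates(self, cand: Candidate) -> list:
        # Deliberately loose rule: same name (case-insensitive), birth year within 2.
        return [
            p for p in self.people
            if p.name.casefold() == cand.name.casefold()
            and abs(p.birth_year - cand.birth_year) <= 2
        ]

    def create_person(self, cand: Candidate, reject_if_duplicate: bool = True) -> bool:
        """Create the profile, or queue it for manual review if a match exists."""
        if reject_if_duplicate and self.possible_duplicates(cand):
            self.review_queue.append(cand)
            return False
        self.people.append(cand)
        return True


tree = ToyTree()
first = tree.create_person(Candidate("Mary Tinkham", 1880))
second = tree.create_person(Candidate("mary tinkham", 1881))
print(first, second, len(tree.people), len(tree.review_queue))
```

The second candidate is never inserted; it goes to a review queue instead, so the tree is not left holding a probable duplicate while waiting for someone to notice it.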

melanes · November 11, 2024

@mandyshaw1 I think it is a well-written response. I have an underlying philosophical problem with the lab and the work they are doing, but it's probably best not to argue that. Mentioning genealogical best practices is a good start.

MandyShaw1 · November 11, 2024

Thanks @melanes, & also to the 'likers'.

Extra para to go on the end of the 'duplicates' section:

Even the standard duplicate algorithm will work better the more information it is given. The timing problems mentioned in the previous section may well, therefore, lead to unnecessary delays in identifying duplicates. One obvious example is that profile creation from Numident records appears initially to ignore the frequent treasure trove of alternate names present on the record as it follows the individual through time.

MandyShaw1 · November 13, 2024

Proposed response to Joe's second point follows. I'm planning on sending both halves off to him tomorrow afternoon unless anyone has any objection.

Point 2:

We would also be open to ideas for how you would ensure that everyone in the Numident (or other important datasets) are on the Family Tree. We don't want anyone to be missed. There are still 1.5 million families in the Numident where no one in the family seems to have a direct match to the Family Tree (based on the FamilySearch match files). If you would like to propose a way to add those families to the Family Tree, I would be certainly open to your ideas and your help.

My initial thoughts on this follow (and may or may not add anything!)

I assume that:

You have already created these 1.5 million FT profiles, and that they are currently in small family groups (‘triplets’ etc.) – as you know, we are seeing this a lot.

You are looking for merge opportunities that would link your family group accurately to the main Tree.

I did an experiment using some of the 40 triplets created by TreeBuilding Project that I found within my analysis database.

I successfully used the following methods to build up the evidence needed to identify potential linkages (which would obviously require detailed review before any action was taken):

1. Flagged duplicates. The standard algorithm minimises false positives, but they remain the best place to start.

2. ‘Research Helps’ that are already attached to other profiles (Research Helps are also accessible to certified partner solutions as webhints).

3. ‘Similar Records’ on the Numident that are already attached to other profiles.

4. Additional information from the Numident metadata.

5. ‘Find Similar People’, with judicious use of Exact/wildcards.

6. Lookups on the Census Tree in FS Genealogies (for possible links that FS’ algorithms can’t see).

7. Find a Grave entries (for possible links between profiles, though obviously to be taken with a pinch of salt).

8. ‘Research Helps’ not already attached to other profiles (for potential additional evidence).

My examples are here: Numident evidence collection examples.pdf

MandyShaw1 · November 13, 2024

This thread has regained its correct date on my Bookmarks and on the main Discussions list!

(Edit, having checked re this comment) …. or perhaps not.

Áine Ní Donnghaile · November 13, 2024

It's on Page 1 of Recent Discussions, just barely, on my 34 inch monitor:

MandyShaw1 · November 13, 2024

Yup, back to its old tricks, my Bookmarks say it was last changed by you at 1.05pm (GMT).

MandyShaw1 · November 14, 2024

The more I think about this the more I realise that bulk insert activities, if they are ever to work in a non-disruptive way, have to have more APIs available to them; just as one example, allowing identification of any attached profile(s) on a given record persona. (Plus the 'duplicate precheck' I have banged on about previously.)

I can see these connections in my analysis database because I have already pulled the data, in particular re one specific family who happen to have meaningful but manageable amounts of data, and can therefore match this information at the database level, but that's not in any way a usable mechanism for actual bulk inserts.
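
For anyone curious what matching 'at the database level' might look like, here is a minimal Python/sqlite3 sketch. The `attachment` table, its columns and the sample IDs are assumptions made up for this illustration; they are not the layout of my analysis database or of any FS API response.

```python
import sqlite3

# Hypothetical local table linking record personas to the FT profiles
# they are attached to. Schema and IDs are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attachment (persona_id TEXT, profile_id TEXT);
    INSERT INTO attachment VALUES
        ('persona-A', 'profile-1'),
        ('persona-A', 'profile-2'),
        ('persona-B', 'profile-3');
""")


def attached_profiles(conn, persona_id):
    """Return the profile IDs a record persona is already attached to."""
    rows = conn.execute(
        "SELECT profile_id FROM attachment WHERE persona_id = ? ORDER BY profile_id",
        (persona_id,),
    )
    return [profile_id for (profile_id,) in rows]


already_attached = attached_profiles(conn, "persona-A")
unattached = attached_profiles(conn, "persona-C")
print(already_attached, unattached)
```

A bulk insert process with access to an equivalent lookup could skip, or route to manual review, any persona that already has one or more profiles attached rather than creating another.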

MandyShaw1 · November 17, 2024

Here's the response I finally sent to Joe: BYU RLL response.pdf

I'll keep you all posted.

MandyShaw1 · December 8, 2024

Fyi I have had no response bar a quick reply from Joe confirming that what I had sent was what he wanted.

MandyShaw1 · December 13, 2024

This thread is relevant and insightful:

Record Linking Labs 5-A-Day Project Tips

Sue Maxwell

Dec 12, 2024

I have been working these every day and finding it a very worthwhile project!
I’ve found a way that works for me to put the family together in a most complete manner. Some days it might take but 5 minutes to get through the five families and on other days I might spend an hour or two cleaning up the family. I may be doing more than necessary, but I’d prefer to clean up the tree.
As I work each member of the family I add the sources. Then I go back to the beginning of the family and review each member again. Often I find that by adding the sources for each person, new sources appear for one or more members of the family unit. Then I add those sources and take one more pass at each one. I often find potential duplicates and work those. Putting the duplicates together can also trigger new potential sources, so I make another pass looking for those sources.
Then I go back and check each person’s vitals and work any non-standard places.
If someone is missing a vital record I search FS Records for a record that the program missed. If necessary, I might even check Ancestry records for that vital and add that information.
This project has been extremely worthwhile. 😊

MandyShaw1 · January 5

Has this problem quietened down, or are people just no longer reporting it?

I don't propose to chase Joe until early March, to give him a chance to discuss (and hopefully action) our submission internally.

melanes · January 6

I am still finding examples from work done by users called TreeBuilding and whatnot as recently as October/November 2024. I just haven't taken the time to report it here. I cleaned up a mess just yesterday involving multiple duplicates in the same family because the volunteers don't understand the context in which they are working.

MandyShaw1 · January 6

Thanks @melanes.
