Collection indexed to the wrong location
Many or most of the entries in this book of christenings for St. Dunstans, Stepney, Middlesex, England are indexed to the wrong location, depending on what was entered as the original location.
Records indexed as "Stepney" are linked to St. George's, Stepney, Barbados.
Records indexed as "St. Dunstans" are linked to Canterbury St Dunstan, Canterbury, Kent, England.
I've only found a few that have "St. Dunstans Stepney" in the original location and are linked to the correct church.
This is another example of the same bug that hit the 1790 and 1840 U.S. census reports, and a few others. Bizarrely, it seems to be getting fixed on a record-by-record basis rather than by tracking down whatever change initially broke the whole thing two years ago, so, good luck with that.
Answers
-
Another one -- The United States Border Crossings from Canada to United States, 1895-1956 database:
Here are ~9,000 records that show people crossing the Canadian border into Iowa, I assume by trebuchet. It seems like the system is parsing "Immigration Place (Original): Quebec" as Quebec, Hardin, Iowa, United States.
2 -
Trebuchet rather than catapult or ballista? :-)
One (somewhat pedantic/nitpicky) correction: none of these are indexing errors. The indexed location was "Quebec", which is completely correct -- as far as it goes. The impossible relocation into Iowa was made by the computer, as part of the auto-standardization flustercluck that has completely corrupted FS's entire database, but which FS continues to expect to fix one reported error at a time. (Given that this approach will not have a noticeable effect within the current century, I no longer bother to try reporting the errors.)
For future reference, on a record detail page (like https://www.familysearch.org/ark:/61903/1:1:X21Z-QJ3), you can recognize the work of the auto-standardization bot by the presence of two location fields, one of them labeled "(Original)". That's the one that was actually indexed. The other one is what the computer associated it with. Since there was apparently no data validation step involved at all, this association is almost always at least slightly wrong, in my experience, and it is often spectacularly wrong.
2 -
@Julia Szent-Györgyi No, I totally get what's causing the bug: what was indexed is correct but isn't getting standardized correctly. I'd still call that an indexing error, because the process of transcribing the image didn't produce fully functional results. That's not pointing a finger at the indexers. It could very well be because the indexing guidelines and requirements themselves were insufficient.
But that's one reason why I swear it used to work much better most of the time. I feel like just entering "Quebec" for location used to be sufficient, because the location of the collection itself was taken into account by the standardization process. Just appending "Canada" to that location would probably make it standardize correctly. I think that if it had always behaved the current way, the guidelines wouldn't have allowed just the granular locality name to be entered, or standardization would have been checked as part of the indexing process.
It's kind of a guess, but I really think that's where this broke, and how it could be easily fixed. Instead of just standardizing based on that original location field, standardize based on [original location] + [collection location]. For U.S. census reports, we have to navigate to state → county → (sometimes) township/district → city/town, so everything needed to populate the standardization search string is already available.
And I'm sure it used to work because the only reason I wrote it up is because I noticed when it broke -- I'd been working with those same places in the same censuses and realized that suddenly searches weren't working the same and that a bunch of location fields that were filled by those sources consistently had a wrong location with a similar name. I checked a few that I'd created just a couple weeks earlier and they were fine.
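A rough sketch of the [original location] + [collection location] idea, just to make it concrete. Everything here is an assumption for illustration: the gazetteer entries, the `standardize` function, and the way the collection location is written are all made up, and none of this reflects how FamilySearch actually stores or matches places.

```python
# Hypothetical gazetteer; the entries are invented for illustration only.
GAZETTEER = [
    "Quebec, Canada",
    "Quebec, Hardin, Iowa, United States",
    "Stepney, Middlesex, England",
    "Stepney, Saint George, Barbados",
]


def components(place):
    """Split 'Town, County, Country' into lowercase parts."""
    return [part.strip().lower() for part in place.split(",")]


def standardize(original, collection_location=None):
    """Return gazetteer entries whose first component matches `original`.

    If a collection location is given, keep only candidates that sit
    inside it (i.e. whose trailing components equal the collection's).
    """
    name = original.strip().lower()
    candidates = [p for p in GAZETTEER if components(p)[0] == name]
    if collection_location:
        ctx = components(collection_location)
        narrowed = [p for p in candidates if components(p)[-len(ctx):] == ctx]
        if narrowed:  # fall back to the full list if nothing fits
            candidates = narrowed
    return candidates


# The bare name is ambiguous; adding the collection's location is not.
print(standardize("Stepney"))
print(standardize("Stepney", "Middlesex, England"))
```

With only the bare name, both Stepneys are candidates and the system has to guess. With the collection's location supplied, only the Middlesex entry is left.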
0 -
No, the change that has caused all of these location problems is more fundamental than that: FS has attempted to change location and date searching from text-based to entity-based. Text-based searching is tried-and-true but slow and resource-intensive compared to a search that's essentially just a single-item database query. The problem is, the query is highly dependent on the database being correctly set up, and that's where it's completely failing on FS: they tried to take shortcuts in converting their text data into entities, so now the location fields may as well not exist, or not be searchable.
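To illustrate the difference, here is a toy comparison of the two kinds of search. The records, the place IDs, and both function names are invented for this sketch; a real entity search would use a database index rather than a loop, and nothing here is FamilySearch's actual schema.

```python
# Hypothetical place entities and records; IDs are made up.
PLACES = {
    1: "Quebec, Canada",
    2: "Quebec, Hardin, Iowa, United States",
}

RECORDS = [
    # Indexed text is right, but the record was attached to the Iowa entity.
    {"name": "A", "original": "Quebec", "place_id": 2},
    {"name": "B", "original": "Quebec", "place_id": 1},
]


def text_search(term):
    """Text-based: scan the indexed (original) location strings."""
    term = term.lower()
    return [r["name"] for r in RECORDS if term in r["original"].lower()]


def entity_search(place_id):
    """Entity-based: match on the standardized place ID only."""
    return [r["name"] for r in RECORDS if r["place_id"] == place_id]


print(text_search("quebec"))  # finds both records
print(entity_search(1))       # misses record A, which points at Iowa
```

The entity query is cheap, but it can only be as good as the place IDs attached to the records. Record A has the correct indexed text and still cannot be found under Quebec, Canada.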
1 -
OK, but I still think there's got to be a way to define the entity itself to include the collection location, even if it means writing a function that takes [original location]+[collection location] then regenerates the standardized location from that. The function would only need to be run on collections that are flagged as wrong, or where the parent level (e.g. state or province) of the standardized location of the record doesn't match the parent level of the collection.
Or they could just admit their mistake and revert to the old method until they figure out how to optimize it correctly.
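The parent-level check could be sketched something like this. The record layout, the field name `standardized`, and the functions `is_mismatch` and `flag_records` are all hypothetical, and the collection location is assumed to be written as a comma-separated place such as "Middlesex, England".

```python
def components(place):
    """Split 'Town, County, Country' into lowercase parts."""
    return [part.strip().lower() for part in place.split(",")]


def is_mismatch(standardized, collection_location):
    """True if the standardized place does not sit inside the collection's place."""
    ctx = components(collection_location)
    return components(standardized)[-len(ctx):] != ctx


def flag_records(records, collection_location):
    """Return only the records whose standardized place needs regenerating."""
    return [r for r in records if is_mismatch(r["standardized"], collection_location)]


# Made-up records from a Stepney, Middlesex christening collection.
records = [
    {"id": "r1", "standardized": "Stepney, Middlesex, England"},
    {"id": "r2", "standardized": "Canterbury St Dunstan, Canterbury, Kent, England"},
    {"id": "r3", "standardized": "Stepney, Saint George, Barbados"},
]

flagged = flag_records(records, "Middlesex, England")
print([r["id"] for r in flagged])
```

Only the flagged records would then need to be re-standardized from [original location] + [collection location], so the fix would not have to touch records that already landed in the right place.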
0 -
It appears that you have received some excellent information concerning the problem and its genesis; so, let me just add that FamilySearch is working hard to correct these problems.
I will forward this instance of the problem so that it can be fixed. There is a large backlog of such problems, so it is impossible to say when it will be fixed.
We are all looking forward to seeing this problem go away.
1 -
Solving the problem using the current method (and available resources) can probably be compared (without exaggeration!) to trying to empty a swimming pool with a spoon. Many of us have realised the pure futility of providing individual examples, given the enormity of the task of addressing the problem in a piecemeal manner.
4