St John The Baptist, Hoxton, Middlesex, England is not in Yorkshire!!
Comments
-
Hoxton is already in the Places database (https://www.familysearch.org/research/places/?focusedId=2244976&searchTypeaheadInputText=Hoxton&text=Hoxton). That's not the problem. The problem is that the automated process that associated the indexed text "St John The Baptist, Hoxton, England" with a place on the map made a totally incorrect choice, ignoring the actual placename in favor of the church's name, and choosing the first thing that came up on the global list for that. (We should be happy that it found one in the correct country, and not somewhere clear on the other side of the globe, as it did for many other places.)
You can recognize the work of the auto-standardization bot by the presence of two "event place" fields on an index entry's details page. The one labeled with a parenthetical "Original" is the text that was actually in the index; the other one is what the computer picked to go with that text.
2 -
In the hope that the placename algorithm will disappear, @N Tychonievich.
0 -
Am I wrong in not even bothering to highlight the many examples I encounter? The situation reminds me of a song about "tears in an ocean".
2 -
To be honest, @Paul W, I wonder if my efforts are worthwhile, but I live in hope that a preponderance of evidence will show that the algorithm needs to disappear. It is still active, since I have found instances in the recently released 1950 US Census.
0 -
@Ronald P. Tilby I will report the inaccurate auto-standardization for you.
@Áine Ní Donnghaile Yes, the algorithm is still active. Yes, corrections are being made. I have to report each instance separately, and I see them corrected daily. I have not heard whether the folks in power are considering discontinuing the tool. They are certainly aware of the errors it is generating; I make sure of that.
3 -
Could I also point out that (at least in the examples I looked at) the Original Place omitted the county, which is presumably an indexing issue (post processing?). If "Middlesex" had been included in the Original Place, I wonder if the auto standardisation might have swung towards the correct value? (I have no idea how that algorithm works but wonder if a count of matching letters is involved?)
0 -
In this example from the 1950 census, the county is included, but the algorithm chose a placename from colonial times rather than the current placename. https://www.familysearch.org/ark:/61903/1:1:6F91-JH73
And in this one, the algorithm went a little bonkers: the entire record set is from a single county in New Jersey, but the majority of the records in "New Jersey, Essex County, Superintendent of Soldiers' Burials, 1776-1979" have been transported to anywhere other than New Jersey. https://community.familysearch.org/en/discussion/130585/error-report-robo-index-collection
0 -
@Áine Ní Donnghaile - now I'm baffled - not about the auto algorithm (that's pretty much a given, and I have a feeling that it's getting more and more complex) but about what the indexing process is supposed to produce.
I had thought that "Event Place (Original)" was supposed to contain something useful out of the indexing process. Instead, in one of your Barrett examples, I see "Event Place (Original) = St John". Just "St John". Well, there's an awful lot of places called "St. John" so the fact that one of your examples decided it was going to be "St. John, Virgin Islands" wasn't surprising.
This, from the UK's 1911 census, is roughly how I thought it worked:
- Event Place (Original) = Monks Coppenhall, Crewe, Cheshire, England
- Event Place = Monks Coppenhall, Cheshire, England, United Kingdom
That's a minimal step and requires only the "Event Place (Original)" for it to work. So the indexing is supposed to produce a full placename. I thought. The Auto process was only there to match the full placename to the standards. I thought.
But "Event Place (Original) = St John" is just way off that path and suggests that I'm totally missing a chunk of knowledge about how the algorithm and/or the output indexing works - as if there's some place data associated with the collection? Film? Whatever it is, I can't see it.
Any clarity welcomed!
0 -
@Adrian Bruce1, the indexing is simply supposed to record what's on the page. If it just says "St. John", then it should be indexed as "St. John". The rest of the location can be figured out from the metadata. Yes, there is always some place data associated both with each collection as a whole and with each specific film or image group (or portion thereof). Unfortunately, the autostandardizing bot appears not to consult that metadata at all.
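To make the metadata point concrete, here is a minimal sketch of what consulting the collection's place data could look like. It is not FamilySearch's actual code, which none of us can see. The `Place` class, the `standardize` function, and the candidate list are all hypothetical, and the candidate entries are made up for illustration rather than taken from the Places database.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Place:
    """A hypothetical standardized place: a name plus its jurisdiction chain."""
    name: str
    jurisdiction: Tuple[str, ...]  # broadest first, e.g. (country, state, county)


# Illustrative candidates only; not real entries from the Places database.
CANDIDATES = [
    Place("St. John", ("United States Virgin Islands",)),
    Place("St. John's Cemetery", ("United States", "New Jersey", "Essex")),
    Place("St. John", ("United States", "Kansas", "Stafford")),
]


def normalize(text: str) -> str:
    """Lowercase and drop periods and possessives so 'St. John's' matches 'St John'."""
    return text.lower().replace(".", "").replace("'s", "").strip()


def standardize(original: str,
                collection_scope: Tuple[str, ...],
                candidates: List[Place]) -> List[Place]:
    """Return candidates whose name matches the indexed text AND whose
    jurisdiction falls inside the scope given by the collection metadata.

    An empty scope means "no metadata", which reproduces the global
    name-only lookup that causes the trouble described in this thread.
    """
    key = normalize(original)
    name_hits = [p for p in candidates if normalize(p.name).startswith(key)]
    depth = len(collection_scope)
    return [p for p in name_hits if p.jurisdiction[:depth] == tuple(collection_scope)]


if __name__ == "__main__":
    scope = ("United States", "New Jersey", "Essex")
    for place in standardize("St John", scope, CANDIDATES):
        print(place.name, "/", ", ".join(place.jurisdiction))
```

With the scope supplied, only the Essex County candidate survives; with an empty scope, all three "St. John" candidates remain and the choice among them is arbitrary. If several candidates survive even with the scope applied, the sensible output is the whole short list (or no standardized place at all) rather than a guess.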
There are record types with a wide array of places, such as passenger manifests, on which FS's idea of searching for points on a map instead of text strings is just never going to work. Neither a bot nor a human indexer can decide whether the "B. Keresztur" written on the page is for Bácskeresztúr, Balatonkeresztúr, Berekeresztúr, Bethlenkeresztúr, or Bodrogkeresztúr. That has to be the researcher's job, not the finding aid's. I do not see any effort on FS's part to address this problem.
3 -
And since the cards clearly show that the entire record set refers to cemeteries in Essex County, New Jersey, by any logic, St John is a cemetery in Essex County. That's a pretty short list.
2 -
@Julia Szent-Györgyi and @Áine Ní Donnghaile - thanks hugely for the explanation. I had clearly missed the implications of "the indexing is simply supposed to record what's on the page". In my defence, when I've just been sampling my sources, the majority of my parish register entries just have an "Event Place" - no "Event Place (Original)". So they must predate the pairing of "Event Place (Original)" with an automatically standardized "Event Place".
So for instance, an Event Place of "St Paul, Crewe, Cheshire, England" (no "Event Place (Original)") must have drawn in "England" from the metadata, because "England" is not present in the original register (I checked).
Plenty of my census stuff has the dual Event Place, but the district headers will have a lot more components to the placename, presumably providing the full "Event Place (Original)".
So between the full data in "Event Place (Original)" from censuses and the full data (from metadata) in the original PR Event Places, I'd missed examples of a short "Event Place (Original)" such as "St. John" and was still under the impression that the collection's metadata fed into the value of "Event Place (Original)". Sorry...
0