Do the engineers intend to address the issue of incorrectly standardized placenames?
There are probably at least two (maybe more) reasons why users are finding that their relatives' placename events have been incorrectly standardized. The most common one, of late, relates to what has turned out to be the misconceived idea of an automated placename standardization procedure.
@N Tychonievich has been of great help in passing on individual examples reported on this forum, but the issue needs much wider attention if the current widespread mess, which is hampering our efforts to trace our relatives' records, is to be resolved.
The example below is an interesting one, in which members of the same household in this census record, all born in "London", have somehow had their birthplaces standardized differently. I admit I have not taken the time to analyse the cause of this problem, if it is indeed possible to find one. The URL is https://www.familysearch.org/search/record/results?count=20&q.anyDate.from=1911&q.anyDate.to=1911&q.anyPlace=london&q.anyPlace.exact=on&q.surname=plat%2Aman&q.surname.exact=on&f.collectionId=1921547, if anyone would like to check it out. It seems quite baffling that Marks and Morris have had their London birthplace standardized as "London, Moral Township, Shelby, Indiana, United States", whereas Louis (and David, not shown in the screenshot) have had theirs correctly standardized as "London, England, United Kingdom".
Would someone at FamilySearch please give this, and other "standardization issues", some serious attention, rather than merely addressing the "auto-standardized" examples on a one-by-one basis?
Answers
-
I'm told the link above produces some weird behaviour (confirmed), so please try this instead:
-
I'd like some clarification on how places that change names and/or jurisdictions are handled.
For example, Versmold was in Ravensberg until about 1800 and in Westphalia thereafter. I've been reluctant to use the correct place (in Ravensberg) because I don't know whether the event will be found by a search specifying Westphalia.
-
@Paul W, I think that we all agree that this is a real problem, but I am puzzled by your suggestion that the problem needs "much wider attention" and "some serious attention, rather than merely addressing the "auto-standardized" examples on a one-by-one basis". Since the engineers are the most knowledgeable about the algorithm/tool, and the only ones who can fix these problems, the focus can only be applied there. That they are giving the standardization issue serious attention can be inferred from the "squeaky wheel" principle and from the large number of related issues that are constantly bombarding them.
The standardization issues are addressed one by one because the engineers have requested this approach. I accept that their knowledge of the algorithm and tools makes this the most effective approach at this time. But "one by one" generally covers hundreds or thousands of instances of the error. For example, the standardization error that you reported includes over 700 instances in the England and Wales Census, 1911.
I hope that this may help a little. Perhaps I have not said anything that you didn't already know. In any case, let me do what I can here.
The error that you describe affects the ability to find an individual in the England and Wales Census, 1911, when searching with a birthplace/date. For example, I tried to find the record for Marks Platman using a birthplace of "London, England", "London" or "England", all without success. My searches for Morris Platman were equally unsuccessful. Although this is to be expected given the standardization error, its impact would be significant for anyone who includes birth information in a search.
I will move this issue into the queue to be reviewed and resolved by the engineers. As usual, we cannot say when the issue will be fixed.
I appreciate your thoughtfulness in discussing the issue, and the example that you provided.
-
@Mike357, unfortunately I think the one-by-one approach advocated by the engineers is based on a serious underestimation of the scope of the problem. The engineers think it affects only a few records here and there, which we report and they eventually fix.
In my explorations, I have found an error rate between 1.5% and 2%. I can't find any information about the size of FamilySearch's database, but the weekly blog posts on "New Free Historical Records on FamilySearch" talk about half a million records here and two million records there, so we're easily talking billions of records. One percent of one billion is ten million.
This means that the database contains well over 10 million records where the place has been rendered Completely Wrong.
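To make the arithmetic above explicit: if the 1.5% to 2% error rate from these explorations held across a database of at least one billion records (both are assumptions, not FamilySearch figures), the expected number of wrongly placed records would be 15 to 20 million, with the 1% figure as a conservative floor. A minimal Python sketch of that back-of-envelope estimate:

```python
# Back-of-envelope estimate of mis-standardized records.
# ASSUMPTIONS (not FamilySearch figures): the database holds at least
# one billion records, and the error rate observed in one user's sample
# applies across the whole database.

def estimated_bad_records(total_records: int, error_rate: float) -> int:
    """Return the expected number of records with a wrong standardized place."""
    return round(total_records * error_rate)


ONE_BILLION = 1_000_000_000

# 1% is the conservative floor used above; 1.5% and 2% are the observed range.
for rate in (0.01, 0.015, 0.02):
    print(f"{rate:.1%} of one billion -> {estimated_bad_records(ONE_BILLION, rate):,}")
```

Since the database is described as "billions" of records, these figures are lower bounds under the stated assumptions.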
The engineers will never be able to fix this with the current one-by-one approach. They need to revert all of the automated changes and start over, employing proper data validation and human oversight steps this time, so that records from Hungary don't end up in Uganda, and South Africa doesn't end up in Norway.
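As an illustration of the kind of validation step meant here, a minimal sketch: before an automated change is applied, compare the country (taken naively as the last comma-separated part) of the old and new standardized places, and send any change that crosses a country boundary to a human for review instead of applying it automatically. The `PlaceChange` record, its fields and `split_changes` are hypothetical; this is not how FamilySearch actually stores or processes its data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PlaceChange:
    """HYPOTHETICAL: one proposed automated change to a record's standard place."""
    record_id: str
    old_standard: str  # e.g. "Budapest, Hungary"
    new_standard: str  # e.g. "Kampala, Uganda"


def country_of(place: str) -> str:
    """Naively treat the last comma-separated component as the country."""
    return place.rsplit(",", 1)[-1].strip()


def split_changes(
    changes: List[PlaceChange],
) -> Tuple[List[PlaceChange], List[PlaceChange]]:
    """Split changes into (safe to auto-apply, needs human review)."""
    auto, review = [], []
    for change in changes:
        if country_of(change.old_standard) == country_of(change.new_standard):
            auto.append(change)
        else:
            # The change moves the place to another country: don't apply blindly.
            review.append(change)
    return auto, review
```

With this naive rule, a change such as "Yorkshire, England" to "Yorkshire, England, United Kingdom" would also be flagged, since "England" and "United Kingdom" differ as strings; a real check would need a proper jurisdiction hierarchy.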
-
I feel it is just a shame that the whole auto-standardization exercise was ever implemented. Even where it hasn't changed the standardized value to a place on the other side of the world and has effectively "got it right", the changed event place in most of the records I come across refers to exactly the same location as before! However, for a pre-1801 "England event" where the name had been correctly indexed as (say) "Yorkshire, England", the computer program changed this to an "Yorkshire, England, United Kingdom", which is incorrect for that period.
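The pre-1801 problem is the kind of thing a date-aware check could catch, since the United Kingdom of Great Britain and Ireland only came into existence on 1 January 1801. A minimal sketch, assuming a hypothetical table of the year from which each top-level jurisdiction name is valid (only the one entry relevant here is filled in):

```python
# HYPOTHETICAL table: first year in which each top-level name is valid.
# Only the entry discussed here is included; a real table would need many more.
JURISDICTION_START_YEAR = {
    "United Kingdom": 1801,  # Union of Great Britain and Ireland, 1 January 1801
}


def country_of(place: str) -> str:
    """Naively treat the last comma-separated component as the country."""
    return place.rsplit(",", 1)[-1].strip()


def is_anachronistic(standardized_place: str, event_year: int) -> bool:
    """True if the place's top-level name did not yet exist in event_year."""
    start = JURISDICTION_START_YEAR.get(country_of(standardized_place))
    return start is not None and event_year < start
```

A 1750 event standardized to "Yorkshire, England, United Kingdom" would be flagged, while the same event left as "Yorkshire, England" would not.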
Overall, my experience of the auto-standardization exercise is that it has proved (at "best") a waste of time and (at "worst") a really damaging piece of programming.
As I have stated, this exercise is not the only reason that placenames have become messed up: many of the examples I find do not show an "Original event" place, so (as in my screenshot example) they must be due to another, completely different piece of faulty programming (or some other cause not yet understood by the engineers) that equally needs to be addressed.