Project for correcting computer-indexed records
A lot of my research is focused on Brazilian records, and I see an ever-increasing number of records being indexed with information that is utter nonsense because it was indexed by a computer. Most of the time when I see something like that, I click on the "report an error" section of the record, but the changes are not accepted straightaway. I miss having a consistent and organized way to go through batches of these computer generated indexes to check them for mistakes.
Comments
-
@BiancaCM - Thank you for your suggestion! :)
Would you be able to give more context about how your project would work, from start to finish, as best you can?
0 -
Yes, I have some utter nonsense too, which is not editable. And I do mean nonsense. In an 1810 marriage record, ancestors of mine were apparently beamed by Scotty from Knox County, Indiana to Knox Atoll in the Marshall Islands, Pacific Ocean, for a quick beach wedding. Where does one report these? I have come across another one recently too.
0 -
I have a related idea (I am sure it has been posted before, but I can no longer find it when searching the Ideas category).
Problem: AI (artificial intelligence/OCR indexer) has created an unintelligible index.
General Indexing Algorithm (which should/could also correct these problems?):
- Segment the record into discrete words that can be 'grouped' as Index Fields (names, places, dates, relationships).
- Link the Index Fields to the segments in the image. (Currently this is only possible if AI processed the image, meaning it did some segmentation/highlighting of fields, OR a researcher has Edit availability for the record.) It would be 'nice' if this step were part of the Indexing process (as the AI indexer currently attempts), because then the 'highlight' of the segment would already be attached for the researcher, and all they would be correcting is the transcription.
- Index the fields. Review of AI indexes should be available OR just continue to the final step. Perhaps Bianca's Project idea would pick up here, making AI-indexed gibberish available for a human indexer review/pass.
- Make the fields editable for correction once the collection/index is published. If 'report an error' is all that is available, then I assume corrections will be internal to FamilySearch teams (there is not much else users can do but 'report an error' if there are no other options).
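To make the four steps above a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `IndexField` type, the `confidence` score, the 0.9 threshold and the pixel coordinates are all made up, and nothing here reflects how FamilySearch actually stores or processes indexes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical types for illustration only; not FamilySearch's data model.
BoundingBox = Tuple[int, int, int, int]  # (left, top, right, bottom) in image pixels


@dataclass
class IndexField:
    kind: str                        # step 1: "name", "place", "date" or "relationship"
    text: str                        # the transcription, from the AI pass or a person
    segment: Optional[BoundingBox]   # step 2: link back to the highlighted image region
    source: str = "ai"               # "ai" or "human"
    confidence: float = 0.0          # assumed score in [0, 1], only meaningful for AI fields
    reviewed: bool = False


def review_queue(fields: List[IndexField], threshold: float = 0.9) -> List[IndexField]:
    """Step 3: unreviewed AI fields below the threshold are queued for a human pass."""
    return [
        f for f in fields
        if f.source == "ai" and not f.reviewed and f.confidence < threshold
    ]


def correct(field: IndexField, new_text: str) -> IndexField:
    """Step 4: a correction changes only the transcription; the segment link is kept."""
    field.text = new_text
    field.source = "human"
    field.reviewed = True
    return field


# Made-up example data: one doubtful AI place and one name already checked by a person.
fields = [
    IndexField("place", "Knox Atoll, Marshall Islands", (120, 340, 410, 372), confidence=0.41),
    IndexField("name", "Gamez", (80, 200, 190, 230), source="human", reviewed=True),
]
queue = review_queue(fields)
fixed = correct(queue[0], "Knox County, Indiana")
```

The point of keeping `segment` on every field is the one made in the second step: if the highlight is already attached, a reviewer only has to fix the text.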
As far as reporting these errors goes, I believe the best current process involves tagging @N Tychonievich. There used to be a Report tab, but since that has been done away with in favor of Community, one needs to report issues here. I believe there is a 'report an error' link/button for AI indexed collections (as Bianca is referring to). Further, placename AI passes, which may have changed original index locations to some other incorrect location (as you indicate with your specific example), are a separate but related issue (much discussed here in Community). There is/was a dedicated thread for reporting these, headed by @Gordon Collett (I believe). But ntychonievich indicates that perhaps reports in separate threads are easier to track/pass along to development.
1 -
Good morning, genthusiast and @N Tychonievich. There is no button or link to report a problem on this index. See the link below to the actual index. I attached a screenshot of what I see, in case there are different views for church members. To see the actual document I have to go to a FHL, and it is possible THAT is where the "report a problem" link is. Link: https://www.familysearch.org/ark:/61903/1:1:41X6-D42M?from=lynx1UIV8&treeref=29HC-9Z8
0 -
@Gail S Watson First off, you do not need to @ mention me to report errors in Historical Records. We have quite a few support folks who watch for issues and can get them reported as needed. Mentioning me might even slow things down since others see that I was mentioned and they leave it until I happen to get around to it. I do take time off once in a while, after all :) So just come into Community, click FamilySearch Help and then Search and post your report.
Looking at your screenshot and the URL (thank you for providing it), I see this as an auto-standardization issue we have been seeing a lot of. We'll get it off to the engineers. They have a lot of issues they are currently dealing with, so don't expect an immediate fix. Meanwhile, you can still use the record as a source. When you use the source linker to attach it, you can record the error in your "reason to attach" notes. Also, in source linker, when you bring over a place, you can edit it so that it shows correctly in the attached source.
1 -
N Tychonievich Thank you. I can imagine how massive this issue is. It has not prevented me from using the source, and I am sure it will be fixed at some point. Thank you.
0 -
I review a lot of 17th, 18th and 19th century records in Guatemala. The biggest error I see is that the family name GAMEZ is often read as GOMEZ. Sometimes, I agree, it's hard to distinguish the "a" from an "o." But often, the "a" is quite clear. These two names/families are distinct. We see Gamez frequently in San Martin Jilotepeque (Chimaltenango) and the department of Quiche, where there is a high concentration of intermarriage among the Estrada and Dubon families. But certainly, there are Gamez elsewhere in Guatemala and even in other Latin American countries. If the algorithm is making an assumption that something close to GxMxZ is GOMEZ, it would be nice if that could be fixed so that the name reads correctly. Also, as I come across these, I report them and attempt to correct them, but I have noted that the records are not updating correctly. @N Tychonievich
Kind regards,
Justin Estrada-Gamez
0 -
"I miss having a consistent and organized way to go through batches of these computer generated indexes to check them for mistakes." - me too!
The automatic algorithm usually indexes Brazilian documents under the name of the priest, and it recognizes random names amongst words such as "anointed with holy oil" on old baptism certificates. The biggest problem is dates being indexed wrong, such as 1893 becoming 1903 when indexed by computer.
Well, I am looking forward to a way to easily check all those generated Brazilian records, but I have already found precious information thanks to this automation nevertheless.
0 -
I have a similar complaint about some records I've been working with in Mexico. I've specifically reported a few of the errors, but I think it would be worthwhile to reindex the records in question in an organized fashion. Widespread problems:
- The blue box either goes around two entries, or only part of one entry, so it either mixes together information from two people, or doesn't get important information from one entry
- When a name is split across two lines, it's indexed as two people
- Phrases like "I baptized" indexed as a person
- The place indexed as a person, or as part of a person's name
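Some of the problems above could probably be flagged automatically before a person ever looks at them. Below is a rough sketch in Python of what such a check might look like for the last two items. The function name, the phrase list and the place list are all hypothetical, the check would certainly produce false positives, and it is only meant to sort entries into a "needs a human look" pile, not to correct anything.

```python
# Hypothetical phrase list; a real project would need lists per language and region.
RITUAL_PHRASES = {"i baptized", "bautice"}


def flag_person_entry(name, known_places, ritual_phrases=RITUAL_PHRASES):
    """Return the reasons (possibly none) why an indexed 'person' looks suspicious.

    known_places is assumed to be a set of lowercase place names.
    """
    reasons = []
    # Lowercase and collapse repeated whitespace so "I  Baptized" matches "i baptized".
    normalized = " ".join(name.lower().split())
    if normalized in ritual_phrases:
        reasons.append("ritual phrase indexed as a person")
    if normalized in known_places:
        reasons.append("place indexed as a person")
    elif any(normalized.endswith(" " + place) for place in known_places):
        reasons.append("place may be part of the name")
    return reasons


places = {"puebla"}  # made-up example list
for entry in ["I Baptized", "Puebla", "Maria Lopez Puebla", "Maria Lopez"]:
    print(entry, "->", flag_person_entry(entry, places))
```

An entry that comes back with an empty list is not proven correct; it just did not trip either of these two simple checks.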
0