"This record was indexed by a computer..."

LegacyUser · July 27, 2020

m said: "This record was indexed by a computer..."

Wow. Is this new? Is it working out, or is it full of errors?

LegacyUser · July 22, 2020

Cindy Hecker said: They have been trying and testing out computer technology for some time. My verdict so far is that people still do a better job. Yesterday I found an obituary, clearly labeled as indexed by a computer, from a partner, GenealogyBank. The names were almost all wrong. The mother's name was indexed correctly and she was listed as the mother, but the deceased, her adult child who died at age 22, was not. Her name was Gloria Gaviati, and the computer indexed it as Ollie Liowers. Way off! The names of her husband, parents and married sister were in the obituary, but they ended up in odd places in the index and not all were spelled right. I was glad there was an image so I could read it myself. I am grateful it was indexed, because that is how I found it when searching for the mother, but it was full of errors.

LegacyUser · July 22, 2020

Tom Huber said: Other than the GenealogyBank obituaries, I do not know of any other material where a computer was used to index records. For the most part, the indexes are accurate, but a lot depends on the original image.

LegacyUser · July 23, 2020

m said: I can't remember if it was an obit or not.

LegacyUser · July 23, 2020

Juli said: The computer-indexed data is all GenealogyBank obituaries, which means that non-LDS like me cannot check the accuracy without forking over $$. I therefore do not know whether the incorrect relationships and misspelled names in obituaries of The Famous Relative are original to the newspapers, or created by the computer index.

LegacyUser · July 24, 2020

Tom Huber said: I have encountered some obituaries where there were a lot of errors, even in the newspaper copy. I’m not surprised that there are bad spellings, relationship errors and so on.

LegacyUser · July 24, 2020

m said: It might have been the obit of a person who died sometime in the 1800s, attached to a person born in the 1600s. I detached it without reading it, since the match was impossible.

LegacyUser · July 25, 2020

Adrian Bruce said: Both Ancestry and FindMyPast, as well as other sites like TROVE, use OCR to index sources like directories, electoral registers and newspapers - some of those indexes may get through into FS.

As I see it, there are essentially two problem areas:
1. Legibility of individual words - the classic OCR problems;
2. A failure to parse the block of OCR'd text correctly into the right items of given names, family names, addresses, etc, etc. I'm sure we've all seen things like an address of 32 John Street being interpreted as a person of that name. This is easy enough where the format is fixed - less so if, like electoral registers, the format constantly changes.

LegacyUser · July 26, 2020

Juli said: All of the online repository sites do some form of OCR-based search for printed materials. The difference between those and the "indexed by computer" obituaries is that the latter attempts to actually interpret the words to find the names and the relationships between them.

LegacyUser · July 26, 2020

Adrian Bruce said: That's interesting, Juli, thanks. I know that FindMyPast have attempted a degree of interpretation / parsing on the UK electoral registers - their first OCR attempt was quite crude in interpretation and suffered from taking a given name off one line and surname from the next and similar issues. They have a second collection on the same images with rather better interpretation / parsing, though I don't know if it's complete yet or how successful it is overall - our registers do change format a lot.

LegacyUser · July 27, 2020

m said: If it is OCR, then I don't think it was handwritten text. So it was probably that one obit I detached for being 200 years out of place.

David Peterson · August 4, 2021

A year later... it is no longer confined to OCR and indexing of obituaries. FS is now trying it on other records. I am currently looking at church records from the 1800s in the Philippines "indexed by computer". EVERY ONE of them has at least one transcription error. They also miss names that are split across two lines (which happens regularly). The indexing often fails to identify correctly how the principal person is related to the other people in the record. It also indexes the names of public officials and priests, which usually have little genealogical value and just end up cluttering the index and source links.

Furthermore, if you attach one of these records as a source, the source detail does NOT show: (1) event type, (2) event date, (3) event place, even though that data has been captured. You have to review the attachment in the source linker and open the primary person's detail in order to see this information.

In my opinion, indexes done solely by computer, especially for handwritten text, are not yet ready to be released directly for use. It would be better to have the computer do the first pass of the two-step indexing process and then have a human review it.

dontiknowyou · August 4, 2021

My guess is that the index editing tool is meant not just to correct individual records but also to:

train the handwriting OCR system
detect the most badly transcribed batches and flag them for re-processing

There are simply too many historical records waiting to be processed to do them all entirely by hand, and the backlog must be growing rapidly.

Dennis J Yancey · August 5, 2021

Machine intelligence and OCR are always a moving target, but one should consider that the OCR used to transcribe a specific newspaper article may have been run 20 years ago and never updated since. Comparing that technology to what is available today can be like night and day.

It's not always easy to tell whether something we see today that turned out badly was the result of work done 20 years ago, or of more recent work done with 20-year-old technology.

So it's not always fair to judge today's technology by a specific item you see today when you don't know when and how it was produced.
